internationalization - How can I programmatically find the list of codecs known to Python?


I know I can do the following:

>>> import encodings, pprint
>>> pprint.pprint(sorted(encodings.aliases.aliases.values()))
['ascii', 'base64_codec', 'big5', 'big5hkscs', 'bz2_codec', 'cp037',
 'cp1026', 'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254',
 'cp1255', 'cp1256', 'cp1257', 'cp1258', 'cp424', 'cp437', 'cp500',
 'cp775', 'cp850', 'cp852', 'cp855', 'cp857', 'cp860', 'cp861', 'cp862',
 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp932', 'cp949', 'cp950',
 'euc_jis_2004', 'euc_jisx0213', 'euc_jp', 'euc_kr', 'gb18030', 'gb2312',
 'gbk', 'hex_codec', 'hp_roman8', 'hz', 'iso2022_jp', 'iso2022_jp_1',
 'iso2022_jp_2', 'iso2022_jp_2004', 'iso2022_jp_3', 'iso2022_jp_ext',
 'iso2022_kr', 'iso8859_10', 'iso8859_11', 'iso8859_13', 'iso8859_14',
 'iso8859_15', 'iso8859_16', 'iso8859_2', 'iso8859_3', 'iso8859_4',
 'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9', 'johab',
 'koi8_r', 'latin_1', 'mac_cyrillic', 'mac_greek', 'mac_iceland',
 'mac_latin2', 'mac_roman', 'mac_turkish', 'mbcs', 'ptcp154',
 'quopri_codec', 'rot_13', 'shift_jis', 'shift_jis_2004',
 'shift_jisx0213', 'tactis', 'tis_620', 'utf_16', 'utf_16_be',
 'utf_16_le', 'utf_32', 'utf_32_be', 'utf_32_le', 'utf_7', 'utf_8',
 'uu_codec', 'zlib_codec']

I know for sure that this is not a complete list, since it includes only encodings for which an alias exists (e.g. "cp737" is missing), and at least some pseudo-encodings are missing as well (e.g. "string_escape").

As the title of the question says: how can I programmatically get the list of codecs/encodings known to Python?

If not programmatically: is there a complete list available online?

I don't think a complete list is stored anywhere in the Python standard library. Instead, encodings are loaded on demand through calls to encodings.search_function(encoding). If you study the code there, it looks like the encoding string is first normalized, and then the encodings package is searched for a submodule whose name matches the encoding.
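To see the on-demand lookup in action, here is a small sketch that goes through the public codecs.lookup() function rather than calling encodings.search_function directly. The spellings and the name "no_such_codec" are just illustrative choices, and the sketch assumes Python 3.

```python
import codecs

# Several spellings of the same encoding. The name is normalized (case,
# hyphens) and checked against the alias table before a submodule of the
# encodings package is imported.
spellings = ["latin-1", "Latin_1", "iso-8859-1", "L1"]
canonical = [codecs.lookup(s).name for s in spellings]
for s, name in zip(spellings, canonical):
    print(s, "->", name)

# A name that matches neither an alias nor a submodule raises LookupError.
try:
    codecs.lookup("no_such_codec")
    found = True
except LookupError:
    found = False
print("no_such_codec found:", found)
```

All four spellings should resolve to the same codec, which is why a plain list of names can never cover every string that Python will accept.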

The following uses pkgutil to list the submodules of the encodings package, and adds them to the names listed in encodings.aliases.aliases.

Unfortunately, encodings.aliases.aliases contains one encoding, tactis, that is not found by the submodule scan, so I tried to generate a complete list by taking the union of the two sets.

import encodings
import os
import pkgutil

# Names of all submodules found in the encodings package directory.
modnames = set([modname for importer, modname, ispkg in pkgutil.walk_packages(
    path=[os.path.dirname(encodings.__file__)], prefix='')])
# Canonical names that the alias table points to.
aliases = set(encodings.aliases.aliases.values())

# Submodules that no alias points to.
print(modnames - aliases)
# set(['charmap', 'unicode_escape', 'cp1006', 'unicode_internal', 'punycode',
# 'string_escape', 'aliases', 'palmos', 'mac_centeuro', 'mac_farsi',
# 'mac_romanian', 'cp856', 'raw_unicode_escape', 'mac_croatian', 'utf_8_sig',
# 'mac_arabic', 'undefined', 'cp737', 'idna', 'koi8_u', 'cp875', 'cp874',
# 'iso8859_1'])

# Alias targets that have no submodule.
print(aliases - modnames)
# set(['tactis'])

codec_names = modnames.union(aliases)
print(codec_names)
# set(['bz2_codec', 'cp1140', 'euc_jp', 'cp932', 'punycode', 'euc_jisx0213',
# 'aliases', 'hex_codec', 'cp500', 'uu_codec', 'big5hkscs', 'mac_romanian',
# 'mbcs', 'euc_jis_2004', 'iso2022_jp_3', 'iso2022_jp_2', 'iso2022_jp_1',
# 'gbk', 'iso2022_jp_2004', 'unicode_internal', 'utf_16_be', 'quopri_codec',
# 'cp424', 'iso2022_jp', 'mac_iceland', 'raw_unicode_escape', 'hp_roman8',
# 'iso2022_kr', 'cp875', 'iso8859_6', 'cp1254', 'utf_32_be', 'gb2312',
# 'cp850', 'shift_jis', 'cp852', 'cp855', 'iso8859_3', 'cp857', 'cp856',
# 'cp775', 'unicode_escape', 'cp1026', 'mac_latin2', 'utf_32',
# 'mac_cyrillic', 'base64_codec', 'ptcp154', 'palmos', 'mac_centeuro',
# 'euc_kr', 'hz', 'utf_8', 'utf_32_le', 'mac_greek', 'utf_7', 'mac_turkish',
# 'utf_8_sig', 'mac_arabic', 'tactis', 'cp949', 'zlib_codec', 'big5',
# 'iso8859_9', 'iso8859_8', 'iso8859_5', 'iso8859_4', 'iso8859_7', 'cp874',
# 'iso8859_1', 'utf_16_le', 'iso8859_2', 'charmap', 'gb18030', 'cp1006',
# 'shift_jis_2004', 'mac_roman', 'ascii', 'string_escape', 'iso8859_15',
# 'iso8859_14', 'tis_620', 'iso8859_16', 'iso8859_11', 'iso8859_10',
# 'iso8859_13', 'cp950', 'utf_16', 'cp869', 'mac_farsi', 'rot_13', 'cp860',
# 'cp861', 'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'shift_jisx0213',
# 'johab', 'mac_croatian', 'cp1255', 'latin_1', 'cp1257', 'cp1256', 'cp1251',
# 'cp1250', 'cp1253', 'cp1252', 'cp437', 'cp1258', 'undefined', 'cp737',
# 'koi8_r', 'cp037', 'koi8_u', 'iso2022_jp_ext', 'idna'])
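
Note that the union still contains names that are not codecs (the helper module 'aliases', for instance) and names that may not be usable on every platform. One possible refinement, sketched here under the assumption of Python 3, is to keep only the candidates that codecs.lookup() actually accepts. The function name usable_codecs is made up for this example and is not part of the standard library.

```python
import codecs
import encodings
import pkgutil


def usable_codecs():
    """Return the sorted candidate names that codecs.lookup() accepts here."""
    # Candidates: every submodule of the encodings package ...
    candidates = set(
        name for _, name, _ in pkgutil.iter_modules(encodings.__path__))
    # ... plus every canonical name the alias table points to.
    candidates.update(encodings.aliases.aliases.values())

    found = []
    for name in sorted(candidates):
        try:
            codecs.lookup(name)
        except LookupError:
            # Helper module (e.g. 'aliases') or a codec that is not
            # available on this platform or in this Python version.
            continue
        found.append(name)
    return found


if __name__ == "__main__":
    names = usable_codecs()
    print(len(names), "codecs found")
    print(names)
```

The result depends on the Python version and the platform, so it is best treated as a snapshot of the running interpreter rather than a definitive list. Codecs registered by third-party packages through codecs.register() will not show up either, since they do not live in the encodings package.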
