Character Sets And Code Pages At The Push Of A Button
Code Pages and Character Encodings from Software Vendors and Standards Bodies
Here you can find character set and code page information from software vendors (Microsoft, HP, IBM, Sun, etc.) and international standards organizations (e.g., ISO, ECMA, and INCITS). Push any "button" and you will be taken either to a vendor's chart of a code page or to the vendor's web page of links to code page charts. This gives you fast access to popular code pages, as well as to more complete lists of code page charts.
Organization
The links are (mostly) organized by vendor or standards organization. Some code pages are listed more than once, usually because different vendors describe the same code page. Sometimes the difference matters: one vendor's view of a code page may differ from another's, and their character conversion (mapping) tables can differ considerably. In other cases, a code page has been updated and one vendor still refers to an earlier version.
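Python's built-in codecs make the mapping-table point concrete. The minimal sketch below, assuming Python 3 (`compare` and `code_points` are illustrative names, not part of any library), decodes the same bytes with two codecs. The first pair shows two tables for the same Shift-JIS bytes (Python's JIS-based `shift_jis` versus Microsoft's `cp932`); the second shows Windows CP 1252 against ISO 8859-1, which are often confused. These are Python's tables, which may not match every vendor's chart exactly.

```python
def code_points(text):
    """Format each character of text as U+XXXX."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def compare(data, codec_a, codec_b):
    """Decode the same bytes with two codecs; return both results as code points."""
    return code_points(data.decode(codec_a)), code_points(data.decode(codec_b))


SAMPLES = [
    # 0x8160: WAVE DASH in Python's JIS-based table,
    # FULLWIDTH TILDE in Microsoft's CP 932 table.
    (b"\x81\x60", "shift_jis", "cp932"),
    # 0x80: a C1 control in ISO 8859-1, the euro sign in Windows CP 1252.
    (b"\x80", "latin_1", "cp1252"),
]

if __name__ == "__main__":
    for data, codec_a, codec_b in SAMPLES:
        a, b = compare(data, codec_a, codec_b)
        print(f"{data.hex()}: {codec_a} -> {a}, {codec_b} -> {b}")
```

Run as a script, it prints `8160: shift_jis -> U+301C, cp932 -> U+FF5E` and `80: latin_1 -> U+0080, cp1252 -> U+20AC`: identical bytes, different characters, depending on whose table you use.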
Character Encodings, Transformation Formats, Double-Byte, Multi-Byte, UTF...
Note that a "code page" is also known by various other names: codepage, encoding, charset, character set, coded character set (CCS), graphic character set, character map, and so on. Some have more specific names, such as DBCS (double-byte character set) and MBCS (multi-byte character set). Some encodings are the result of transformations and are known as transformation formats; examples include the Unicode encodings UTF-8, UTF-16, and UTF-32.
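To see how transformation formats differ from a multi-byte code page, a short Python 3 sketch can count the bytes each encoding needs for the same characters. The codec names are Python's; `encoded_length` is a hypothetical helper. The big-endian `_be` variants are used so that no byte order mark is added to the UTF-16 and UTF-32 output.

```python
SAMPLES = ["A", "\u00e9", "\u3042", "\U0001F600"]   # A, é, あ, 😀
CODECS = ["utf_8", "utf_16_be", "utf_32_be", "cp932"]


def encoded_length(ch, codec):
    """Number of bytes ch needs in codec, or None if the codec cannot encode it."""
    try:
        return len(ch.encode(codec))
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for ch in SAMPLES:
        lengths = {codec: encoded_length(ch, codec) for codec in CODECS}
        print(f"U+{ord(ch):04X}", lengths)
```

UTF-8 needs 1 to 4 bytes per character, UTF-16 needs 2 or 4 (4 being a surrogate pair), and UTF-32 always needs 4. CP 932, by contrast, uses 1 or 2 bytes and cannot represent characters outside its repertoire at all (reported here as `None`).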
Unicode UTF-16 Surrogate Code Points, or Supplementary Characters
If you are interested in UTF-16 surrogate code points, or supplementary characters, see
Setting up Microsoft Windows NT, 2000 or Windows XP to Support Unicode Supplementary Characters and
Conversion Table: Unicode Surrogates to Scalar Value/UTF-32.
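The surrogate-to-scalar conversion follows a fixed formula: subtract 0x10000 from the code point, then put the top 10 bits of the resulting 20-bit value into a high surrogate (0xD800 to 0xDBFF) and the bottom 10 bits into a low surrogate (0xDC00 to 0xDFFF). Here is a minimal Python 3 sketch of both directions, with hypothetical function names, cross-checked against Python's own UTF-16 encoder:

```python
def to_surrogates(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def to_scalar(high, low):
    """Combine a high and a low surrogate into a Unicode scalar value (UTF-32)."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    cp = 0x1F600                                 # GRINNING FACE
    high, low = to_surrogates(cp)
    print(f"U+{cp:X} -> {high:04X} {low:04X}")   # U+1F600 -> D83D DE00
    assert to_scalar(high, low) == cp
    # Cross-check against Python's own UTF-16 (big-endian) encoder.
    assert chr(cp).encode("utf_16_be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```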
Other Unicode pages on this site that may be of interest include: Cheat Sheet: Unicode-Enabling Microsoft C/C++ Source Code, Hiragana Characters, Hebrew Characters, Benefits of the Unicode Standard, and the Compelling Unicode Demo.
Contents
Unicode, Standards Organizations, Assorted Web Pages, The Go To Guys, Czyborra's Site, Great Sites, China's GB18030, Hong Kong Supplementary Character Set (HKSCS), Library of Congress Machine-Readable Cataloging (MARC)
Microsoft's ISO code pages, Microsoft Windows code pages, Microsoft double-byte character sets, Microsoft DOS code pages
IBM ICU Character Conversion Data, IBM's ISO code pages, IBM Windows code pages, IBM Asian code pages, IBM DOS code pages
Assorted Web Pages
I18n Guy's Hiragana Unicode Chart
Dik Winter's Character Set History
Piotr Trzcionkowski's Polish code page site (in Polish)
Cyrillic.com Character Sets
I18nGuru's Character Sets page
VT320, VT102, VT52, Heath-19
DEC Terminals VT100, VT220, VT320
Kostis' Character Sets
Kostis' Apple Macintosh Roman
Japanese Encoding Differences
Koichi Yasuoka's Character Tables

Unicode Charts
Unicode Charts
Unicode character name index
UTF-32 (TR-19)
Character Encoding Model (TR-17)
Basic Latin
Latin-1 Supplement
Latin Extended-A
Combining Diacritical Marks
Greek
Cyrillic
Hebrew
I18n Guy's Hebrew Unicode Chart
Arabic
Currency Symbols
Hangul Jamo
Hiragana
I18n Guy's Hiragana Unicode Chart
Katakana

Standards Organizations
ISO
INCITS
ECMA Standards
ISO 6429 = ECMA-48 (PDF, control codes)
ISO/IEC International register of coded character sets to be used with escape sequences: links to many code page charts!
IANA Character Set Registry
RFC Index
RFC 1555 Hebrew Character Encoding for Internet Messages
RFC 1556 Handling of Bi-directional Texts in MIME (defines ISO-8859-6-e, ISO-8859-6-i, ISO-8859-8-e, ISO-8859-8-i)
Armenian Character Sets: ArmSCII
Thai TIS 620-2533
TIS 620-2533 (in Thai)
Annotated reference to the Thai implementations
The Go To Guys
Michael Everson's site
Ken Lunde's CJK.inf
Ken Lunde's Character set server
Mark Davis's site

Czyborra's Site
www.czyborra.com/charsets is offline. Fortunately, Kevin Atkinson has mirrored it at aspell.net/charsets, and these buttons now link to his mirror. Thanks, Kevin.
Roman Czyborra's site
Czyborra's Vendor Codepages
Czyborra's Vietnamese page
Czyborra's ASCII/ISO 646 page
Czyborra's ISO 8859 Alphabet Soup
So vat's Unicode? Chicken soup?

Great Sites
Frank da Cruz's Character Sets
Frank da Cruz's Character Set Tables
Korpela's Tutorial on character code issues
Korpela's Character and encoding site
GB18030 Web Pages
ICU's Markus Scherer on GB18030
Sun on GB18030-2000
Microsoft GB18030 Support Package (in GB2312)
(Adobe) Dirk Meyer's Summary of GB18030

Hong Kong Supplementary Character Set (HKSCS)
Hong Kong Supplementary Character Set (HKSCS)
Hong Kong ITF on ISO 10646

MARC Bibliographic
MARC 21
MARC-8
MARC UCS (Unicode)
MARC Code Tables
IBM ICU Character Conversion Data
Here are many transcoding tables expressed as XML files in the Character Mapping Markup Language (CharMapML, UTR #22). This conversion data is used in International Components for Unicode (ICU), an open source library.

IBM Character Data
IBM Code pages (Appendix F)
IBM Character lists (Appendix I)
IBM Sort Sequences (Appendix C)

IBM ISO Code Pages
CP 00819 (ISO 8859-1) Latin Alphabet No. 1
CP 00813 (ISO 8859-7) Greece
CP 00916 (ISO 8859-8) Hebrew
CP 00920 (ISO 8859-9) Turkey

IBM Windows Code Pages
CP 01250 (Windows) Latin 2
CP 01252 (Windows) Latin 1
CP 01253 (Windows) Greek
CP 01254 (Windows) Turkish
CP 01255 (Windows) Hebrew
CP 01256 (Windows) Arabic
CP 01257 (Windows) Baltic Rim
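As a rough illustration of what the ICU CharMapML tables contain, the Python 3 sketch below reads byte-to-Unicode assignments with the standard `xml.etree.ElementTree` module. It assumes assignments appear as `<a>` elements with space-separated hex bytes in a `b` attribute and hex code points in a `u` attribute; check that against UTR #22 and the actual ICU files before relying on it. The XML fragment is a tiny hypothetical sample, not an excerpt from ICU, `load_assignments` is a hypothetical name, and fallback, range, and any other elements are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the CharMapML style (assumed shape: <a b=".." u=".."/>).
# The three entries agree with Python's cp932 codec.
SAMPLE = """<characterMapping id="hypothetical-sample">
  <assignments>
    <a b="41" u="0041"/>
    <a b="82 A0" u="3042"/>
    <a b="81 60" u="FF5E"/>
  </assignments>
</characterMapping>"""


def load_assignments(xml_text):
    """Map byte sequences to Unicode strings, reading only <a> elements.

    Assumes "b" holds space-separated hex bytes and "u" holds
    space-separated hex code points. Other elements are ignored.
    """
    root = ET.fromstring(xml_text)
    table = {}
    for elem in root.iter("a"):
        raw = bytes(int(h, 16) for h in elem.get("b").split())
        text = "".join(chr(int(cp, 16)) for cp in elem.get("u").split())
        table[raw] = text
    return table


if __name__ == "__main__":
    for raw, text in load_assignments(SAMPLE).items():
        points = " ".join(f"U+{ord(ch):04X}" for ch in text)
        # Compare each entry with Python's built-in cp932 decoder.
        print(raw.hex(" "), "->", points, text == raw.decode("cp932"))
```

Comparing a loaded table against another converter, as the last line does, is one simple way to spot the vendor-to-vendor mapping differences mentioned earlier on this page.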
Microsoft Double-Byte Character Sets
In the following web pages, lead bytes are shaded light gray. Each lead byte links to a new page showing the 256-position block of characters that begin with that lead byte. Unused lead bytes have a darker gray background.
I18n Guy's Hiragana Unicode Chart
Japanese Shift-JIS (CP 932)
Conversion Problems: CP 932 & Unicode
Simplified Chinese GBK (CP 936)
Korean (CP 949)
Traditional Chinese Big5 (CP 950)
Hong Kong Character Set (HKSCS)

Microsoft Windows Code Pages
Microsoft's Windows code pages
Microsoft's Windows code pages by country
Windows CP 1250 (Central Europe)
Windows CP 1251 (Cyrillic)
Windows CP 1252 (Latin I)
Windows CP 1253 (Greek)
Windows CP 1254 (Turkish)
Windows CP 1255 (Hebrew)
Windows CP 1256 (Arabic)
Windows CP 1257 (Baltic)
Windows CP 1258 (Viet Nam)
Windows CP 874 (Thai)

Microsoft's ISO Code Page Charts
Globalization site: GlobalDev
ISO Code Pages at Microsoft's site
ISO/IEC 8859-1 (Latin 1)
ISO/IEC 8859-2 (Latin 2)
ISO/IEC 8859-3 (Latin 3)
ISO/IEC 8859-4 (Baltic)
ISO/IEC 8859-5 (Cyrillic)
ISO/IEC 8859-6 (Arabic)
ISO/IEC 8859-7 (Greek)
ISO/IEC 8859-8 (Hebrew)
ISO/IEC 8859-9 (Turkish)
ISO/IEC 8859-15 (Latin 9)
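The lead-byte shading on the Microsoft double-byte pages reflects how such text has to be parsed: a lead byte says that the next byte belongs to the same character. Below is a minimal Python 3 sketch for CP 932. It assumes the Shift-JIS lead-byte ranges 0x81 to 0x9F and 0xE0 to 0xFC; other DBCS code pages such as CP 936, 949, and 950 use different ranges, and the function names are hypothetical. It shows why this matters: the trail byte of 表 is 0x5C, the ASCII backslash, so a naive byte search finds a backslash that is not really there.

```python
# Assumed lead-byte ranges for CP 932 (Shift-JIS); other DBCS code pages differ.
LEAD_RANGES = ((0x81, 0x9F), (0xE0, 0xFC))


def is_lead_byte(value):
    """True if the byte value starts a two-byte character in CP 932."""
    return any(lo <= value <= hi for lo, hi in LEAD_RANGES)


def split_characters(data):
    """Split CP 932 bytes into one chunk per character using the lead-byte rule."""
    chunks, i = [], 0
    while i < len(data):
        # A lead byte consumes the following trail byte; a dangling lead byte
        # at the very end is kept as a one-byte chunk.
        width = 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
        chunks.append(data[i:i + width])
        i += width
    return chunks


if __name__ == "__main__":
    data = "A表".encode("cp932")            # 表 is 0x95 0x5C in CP 932
    print(b"\\" in data)                    # naive byte search: True
    print(b"\\" in split_characters(data))  # character-aware: False
```

Half-width katakana (0xA1 to 0xDF) fall outside both ranges, so the sketch correctly treats them as single-byte characters.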
IBM DOS Code Pages
CP 00437 (IBM PC) USA
CP 00850 (IBM PC) Multilingual
CP 00851 (IBM PC) Greece
CP 00852 Latin-2 PC
CP 00855 (IBM PC) Cyrillic
CP 00856 (IBM PC) Hebrew
CP 00857 (IBM PC) Turkey
CP 00860 (IBM PC) Portugal
CP 00861 (IBM PC) Iceland
CP 00862 (IBM PC) Israel
CP 00863 (IBM PC) Canadian French
CP 00864 (IBM PC) Arabic
CP 00865 (IBM PC) Nordic
CP 00866 (IBM PC) Cyrillic #2
CP 00869 (IBM PC) Greece
CP 00870 Latin-2 Multilingual
CP 00874 (IBM PC) Thai Extended

Microsoft OEM (DOS) Code Pages
Microsoft's OEM code pages
DOS CP 437 (US)
DOS CP 720 (Arabic)
DOS CP 737 (Greek)
DOS CP 775 (Baltic)
DOS CP 850 (Western Europe)
DOS CP 852 (Central Europe)
DOS CP 855 (Cyrillic)
DOS CP 857 (Turkish)
DOS CP 862 (Hebrew)
DOS CP 866 (Cyrillic II)

IBM Asian Code Pages
I18n Guy's Hiragana Unicode Chart
CP 00290 (EBCDIC) Japanese (Katakana) Non-extended
CP 00290 (EBCDIC) Japanese (Katakana) Extended
CP 00833 (EBCDIC) Korea Extended
CP 00836 (EBCDIC) Simplified Chinese Extended
CP 00891 (IBM PC) Korea
CP 00895 Japan 7-Bit
CP 00897 (IBM PC) Japan PC #1
CP 00903 (IBM PC) People's Republic of China (PRC)
CP 00904 (IBM PC) Republic of China (ROC)
CP 00905 (EBCDIC) Turkey Extended
CP 01027 (EBCDIC) Japanese (Latin) Extended
CP 01040 (IBM PC) Korean Extended
CP 01041 (IBM PC) Japanese Extended
CP 01042 (IBM PC) Simplified Chinese Extended
CP 01043 (IBM PC) Traditional Chinese
CP 01088 (IBM PC) Korean
CP 01114 Traditional Chinese (Big5)
CP 01115 Simplified Chinese (GB)
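The single-byte code page charts linked above are essentially 16-by-16 grids of byte values and the characters they map to. As a rough local stand-in, the Python 3 sketch below rebuilds the upper half (0x80 to 0xFF) of such a chart from one of Python's built-in codecs. `chart_rows` is a hypothetical name, and, as with any vendor's table, Python's mapping may not match a given vendor's chart in every cell.

```python
def chart_rows(codec, start=0x80, stop=0x100):
    """Return a single-byte code page chart as text rows of 16 cells each.

    A cell shows the decoded character, '.' if it is not printable
    (controls, no-break space), or '-' if the codec leaves the byte undefined.
    """
    rows = []
    for row_start in range(start, stop, 16):
        cells = []
        for value in range(row_start, row_start + 16):
            try:
                ch = bytes([value]).decode(codec)
            except UnicodeDecodeError:
                ch = "-"
            else:
                if not ch.isprintable():
                    ch = "."
            cells.append(ch)
        rows.append(f"{row_start:02X} " + " ".join(cells))
    return rows


if __name__ == "__main__":
    for line in chart_rows("cp437"):
        print(line)
```

For `cp437` (DOS CP 437), the row labeled B0 starts with the shading characters ░ ▒ ▓. Calling `chart_rows("cp1252")` instead shows bytes Python leaves undefined, such as 0x81, as `-`.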