
The Most Complete Collection of Code Pages and Character Sets: Character Sets And Code Pages At The Push Of A Button


Code Pages, Character Encodings from Software Vendors and Standards Bodies

Here you can find character set and code page information from software vendors (Microsoft, HP, IBM, Sun, etc.) and international standards organizations (e.g. ISO, ECMA, INCITS). Push any "button" and you will be taken either to a code page chart provided by the vendor or to the vendor's web page of links to code page charts. This gives you fast access to popular code pages, as well as to more complete lists of code page charts.

Organization

The links are (mostly) organized by vendor or standards organization. Some code pages are listed more than once, usually because different vendors describe the same code page. Sometimes the difference matters: one vendor's version of a code page may differ from another's, and their character conversion (mapping) tables can differ considerably. In other cases a code page has been updated while one vendor still refers to an earlier version.
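Python's standard library happens to ship two vendor views of the same Japanese code page, which makes the point concrete. The minimal sketch below decodes the same bytes with the `shift_jis` codec (built on the JIS X 0208 mapping) and the `cp932` codec (Microsoft's variant). The helper name `compare_decodings` is our own, not a library function. The byte pair 0x81 0x60 is the classic case: one table maps it to U+301C WAVE DASH, the other to U+FF5E FULLWIDTH TILDE.

```python
import unicodedata


def compare_decodings(data: bytes, codecs=("shift_jis", "cp932")) -> dict:
    """Decode the same bytes with several codecs and describe each result."""
    results = {}
    for name in codecs:
        try:
            text = data.decode(name)
            # Report each decoded character as "U+XXXX NAME".
            results[name] = [
                f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text
            ]
        except UnicodeDecodeError as exc:
            results[name] = [f"<undecodable: {exc.reason}>"]
    return results


if __name__ == "__main__":
    for codec, chars in compare_decodings(b"\x81\x60").items():
        print(codec, chars)
    # shift_jis ['U+301C WAVE DASH']
    # cp932 ['U+FF5E FULLWIDTH TILDE']
```

Text decoded with one vendor's table and re-encoded with the other's may therefore change or fail to convert, which is the kind of issue covered by the CP932 conversion links further down.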

Character Encodings, Transformation Formats, Double-Byte, Multi-byte, UTF...

Note that a "code page" is also known by various other names: codepage, encoding, charset, character set, coded character set (CCS), graphic character set, character map, et al. Some of these have more specific names, such as DBCS (double-byte character set) and MBCS (multi-byte character set). Some encodings are the result of transformations and are known as transformation formats; examples include Unicode's UTF-8, UTF-16 and UTF-32.
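To make "transformation format" concrete, here is a small Python sketch (the helper name `utf_forms` is ours) that shows the bytes of a single code point in UTF-8, UTF-16 and UTF-32. It uses the big-endian codec names, which, unlike plain `utf-16`, add no byte order mark. `bytes.hex(" ")` needs Python 3.8 or later.

```python
def utf_forms(ch: str) -> dict:
    """Return the hex bytes of one character in each Unicode transformation format."""
    # The "-be" codecs are big-endian and emit no byte order mark.
    forms = (("UTF-8", "utf-8"), ("UTF-16", "utf-16-be"), ("UTF-32", "utf-32-be"))
    return {form: ch.encode(codec).hex(" ").upper() for form, codec in forms}


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u20ac"):
        print(f"U+{ord(ch):04X}", utf_forms(ch))
    # U+0041 {'UTF-8': '41', 'UTF-16': '00 41', 'UTF-32': '00 00 00 41'}
    # U+00E9 {'UTF-8': 'C3 A9', 'UTF-16': '00 E9', 'UTF-32': '00 00 00 E9'}
    # U+20AC {'UTF-8': 'E2 82 AC', 'UTF-16': '20 AC', 'UTF-32': '00 00 20 AC'}
```

The code point is the same in every row; only the byte representation produced by each transformation differs.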

Unicode UTF-16 Surrogate Code Points, or Supplementary Characters

If you are interested in UTF-16 surrogate code points or supplementary characters, see "Setting up Microsoft Windows NT, 2000 or Windows XP to Support Unicode Supplementary Characters" and "Conversion Table: Unicode Surrogates to Scalar Value/UTF-32".
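The surrogate-to-scalar conversion is plain arithmetic: subtract 0x10000 from the code point, put the top 10 bits of the remaining 20-bit value into a high surrogate (0xD800 to 0xDBFF) and the low 10 bits into a low surrogate (0xDC00 to 0xDFFF). A minimal Python sketch follows; the function names are ours, and the result is cross-checked against Python's own UTF-16 encoder.

```python
def to_surrogates(cp: int) -> tuple:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a high/low surrogate pair back into a scalar value (its UTF-32 code unit)."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    cp = 0x1F600
    high, low = to_surrogates(cp)
    print(f"U+{cp:X} -> {high:04X} {low:04X}")  # U+1F600 -> D83D DE00
    # Cross-check against Python's built-in UTF-16 (big-endian) encoder.
    assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
    assert from_surrogates(high, low) == cp
```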

Other Unicode pages on this site that may be of interest include: Cheat Sheet: Unicode-Enabling Microsoft C/C++ Source Code, Hiragana Characters, Hebrew Characters, Benefits of the Unicode Standard, and the Compelling Unicode Demo.

TABLE OF CONTENTS

Unicode

Standards Organizations

Assorted web pages

The Go To Guys

Czyborra's Site

Great Sites

China's GB18030

Hong Kong Supplementary Character Set (HKSCS)

Library of Congress MAchine-Readable Cataloging (MARC)

Microsoft's ISO code pages

Microsoft Windows code pages

Microsoft double-byte character sets

Microsoft DOS code pages

IBM ICU

Character Conversion Data

IBM's ISO code pages

IBM Windows code pages

IBM Asian code pages

IBM DOS code pages

Push A Button To Get Code Page Information

Assorted Web Pages

I18n Guy's Hiragana Unicode Chart

Dik Winter's Character Set History

Piotr Trzcionkowski's Polish code page site (in Polish)

Cyrillic.com Character Sets

I18nGuru's Character Sets page

VT320, VT102, VT52, Heath-19

DEC Terminals VT100, VT220, VT320

Kostis' Character Sets

Kostis' Apple Macintosh Roman

Japanese Encoding Differences

Koichi Yasuoka's Character Tables

Unicode Charts

Unicode Charts

Unicode character name index

UTF-32 (TR-19)

Character Encoding Model (TR-17)

Basic Latin

Latin-1 Supplement

Latin Extended-A

Combining Diacritical Marks

Greek

Cyrillic

Hebrew

I18n Guy's Hebrew Unicode Chart

Arabic

Currency Symbols

Hangul Jamo

Hiragana

I18n Guy's Hiragana Unicode Chart

Katakana

Standards Organizations

ISO

INCITS

ECMA Standards

ISO 6429 = ECMA-48 (PDF) (control codes)

ISO/IEC International register of coded character sets to be used with escape sequences

Links to many code page charts!

IANA Character Set Registry

RFC Index

RFC 1555 Hebrew Character Encoding for Internet Messages

RFC 1556 Handling of Bi-directional Texts in MIME

RFC 1556 defines ISO-8859-6-e, ISO-8859-6-i, ISO-8859-8-e and ISO-8859-8-i

Armenian Character Sets ArmSCII

Thai TIS 620-2533

TIS 620-2533 (in Thai)

Annotated reference to the Thai implementations

The Go To Guys

Michael Everson's site

Ken Lunde's CJK.inf

Ken Lunde's Character set server

Mark Davis's site

Czyborra's Site

www.czyborra.com/charsets is offline. Fortunately, Kevin Atkinson has mirrored it at aspell.net/charsets, and these buttons now link to his mirror. Thanks, Kevin.

Roman Czyborra's site

Czyborra's Vendor Codepages

Czyborra's Vietnamese page

Czyborra's ASCII/ISO 646 page

Czyborra's ISO 8859 Alphabet Soup

So vat's Unicode? Chicken soup?

Great Sites

Frank da Cruz's Character Sets

Frank da Cruz's Character Set Tables

Korpela's Tutorial on character code issues

Korpela's Character and encoding site

GB18030 Web Pages

ICU's Markus Scherer on GB18030

Sun on GB18030-2000

Microsoft GB18030 Support Package (in GB2312)

(Adobe) Dirk Meyer's Summary of GB18030

Hong Kong Supplementary Character Set (HKSCS)

Hong Kong Supplementary Character Set (HKSCS)

Hong Kong ITF on ISO 10646

MARC Bibliographic

MARC 21

MARC-8

MARC UCS (Unicode)

MARC Code Tables

Here are many transcoding tables expressed as XML files in the Character Mapping Markup Language (CharMapML, UTR #22). This encoding conversion data is used in the International Components for Unicode (ICU) open source library.

IBM ICU

Character Conversion Data
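To give a feel for how CharMapML tables are consumed, here is a minimal Python sketch that reads the one-to-one assignment elements, written `<a b="..." u="..."/>` in UTR #22, into a byte-to-text dictionary. The embedded XML is a hypothetical three-entry excerpt written for illustration, not copied from an ICU data file; real files carry more attributes and also fallback elements (such as `fub` and `fbu`), which this sketch deliberately ignores. `load_assignments` is our own name.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt in the style of a UTR #22 (CharMapML) mapping file.
SAMPLE_XML = """
<characterMapping id="example-cp932-excerpt" version="1">
  <assignments>
    <a b="41" u="0041"/>
    <a b="81 40" u="3000"/>
    <a b="82 A0" u="3042"/>
  </assignments>
</characterMapping>
"""


def load_assignments(xml_text: str) -> dict:
    """Map byte sequences to Unicode text using only the <a> (assignment) elements."""
    root = ET.fromstring(xml_text.strip())
    table = {}
    for elem in root.iter():
        # Compare local names so the code works with or without an XML namespace.
        if elem.tag.rsplit("}", 1)[-1] != "a":
            continue
        raw = bytes.fromhex(elem.get("b"))  # "81 40" -> b"\x81\x40"
        # "u" may hold several space-separated hex code points (a sequence).
        text = "".join(chr(int(cp, 16)) for cp in elem.get("u").split())
        table[raw] = text
    return table


if __name__ == "__main__":
    for raw, text in load_assignments(SAMPLE_XML).items():
        # Cross-check each entry against Python's own cp932 codec.
        print(raw.hex(" ").upper(), " ".join(f"U+{ord(c):04X}" for c in text),
              raw.decode("cp932") == text)
```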

IBM Character Data

IBM Code pages (Appendix F)

IBM Character lists (Appendix I)

IBM Sort Sequences (Appendix C)

IBM ISO Code Pages

CP 00819 (ISO 8859-1) Latin Alphabet No. 1

CP 00813 (ISO 8859-7) Greece

CP 00916 (ISO 8859-8) Hebrew

CP 00920 (ISO 8859-9) Turkey

IBM Windows Code pages

CP 01250 (Windows) Latin 2

CP 01252 (Windows) Latin 1

CP 01253 (Windows) Greek

CP 01254 (Windows) Turkish

CP 01255 (Windows) Hebrew

CP 01256 (Windows) Arabic

CP 01257 (Windows) Baltic Rim

In the following web pages, lead bytes are indicated by light gray background shading. Each lead byte links to a new page showing the 256-character block associated with it. Unused lead bytes are identified by a darker gray background.

Microsoft Double-Byte Character Sets

I18n Guy's Hiragana Unicode Chart

Japanese Shift-JIS (CP 932)

Conversion Problems CP932 & Unicode

Simplified Chinese GBK (CP 936)

Korean (CP 949)

Traditional Chinese Big5 (CP 950)

Hong Kong Character Set (HKSCS)
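The lead-byte structure behind the double-byte charts above can be sketched in a few lines of Python. This assumes the CP932 (Shift-JIS) lead-byte ranges 0x81 to 0x9F and 0xE0 to 0xFC, where every lead byte is followed by exactly one trail byte. The other double-byte code pages listed here (CP 936, 949, 950) use different ranges, and `split_cp932` is a hypothetical helper, not a library function.

```python
# Lead-byte ranges for CP932 (Shift-JIS). Other DBCS code pages use different ranges.
CP932_LEAD_RANGES = ((0x81, 0x9F), (0xE0, 0xFC))


def is_lead_byte(b: int) -> bool:
    """True if byte b starts a two-byte character in CP932."""
    return any(lo <= b <= hi for lo, hi in CP932_LEAD_RANGES)


def split_cp932(data: bytes) -> list:
    """Split CP932 bytes into per-character chunks of one or two bytes."""
    chunks = []
    i = 0
    while i < len(data):
        width = 2 if is_lead_byte(data[i]) else 1
        if i + width > len(data):
            raise ValueError(f"lead byte 0x{data[i]:02X} at offset {i} has no trail byte")
        chunks.append(data[i:i + width])
        i += width
    return chunks


if __name__ == "__main__":
    raw = "A\u3042\uff71\u8868".encode("cp932")  # A, hiragana a, halfwidth katakana a, kanji hyo
    for chunk in split_cp932(raw):
        print(chunk.hex(" ").upper(), chunk.decode("cp932"))
```

Walking the lead bytes matters because trail bytes can fall in the ASCII range: the kanji 表 encodes as 0x95 0x5C, and 0x5C on its own is the ASCII backslash, so a naive byte-by-byte search would split that character.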

Microsoft Windows Code Pages

Microsoft's Windows code pages

Microsoft's Windows code pages by country

Windows CP 1250 (Central Europe)

Windows CP 1251 (Cyrillic)

Windows CP 1252 (Latin I)

Windows CP 1253 (Greek)

Windows CP 1254 (Turkish)

Windows CP 1255 (Hebrew)

Windows CP 1256 (Arabic)

Windows CP 1257 (Baltic)

Windows CP 1258 (Viet Nam)

Windows CP 874 (Thai)

Microsoft's ISO Code Page Charts

Globalization site: GlobalDev

ISO Code Pages at Microsoft's site

ISO/IEC 8859-1 (Latin 1)

ISO/IEC 8859-2 (Latin 2)

ISO/IEC 8859-3 (Latin 3)

ISO/IEC 8859-4 (Baltic)

ISO/IEC 8859-5 (Cyrillic)

ISO/IEC 8859-6 (Arabic)

ISO/IEC 8859-7 (Greek)

ISO/IEC 8859-8 (Hebrew)

ISO/IEC 8859-9 (Turkish)

ISO/IEC 8859-15 (Latin 9)

IBM DOS Code pages

CP 00437 (IBM PC) USA

CP 00850 (IBM PC) Multilingual

CP 00851 (IBM PC) Greece

CP 00852 Latin-2 PC

CP 00855 (IBM PC) Cyrillic

CP 00856 (IBM PC) Hebrew

CP 00857 (IBM PC) Turkey

CP 00860 (IBM PC) Portugal

CP 00861 (IBM PC) Iceland

CP 00862 (IBM PC) Israel

CP 00863 (IBM PC) Canadian French

CP 00864 (IBM PC) Arabic

CP 00865 (IBM PC) Nordic

CP 00866 (IBM PC) Cyrillic #2

CP 00869 (IBM PC) Greece

CP 00870 Latin-2 Multilingual

CP 00874 (IBM PC) Thai Extended

Microsoft OEM (DOS) Code Pages

Microsoft's OEM code pages

DOS CP 437 (US)

DOS CP 720 (Arabic)

DOS CP 737 (Greek)

DOS CP 775 (Baltic)

DOS CP 850 (Western Europe)

DOS CP 852 (Central Europe)

DOS CP 855 (Cyrillic)

DOS CP 857 (Turkish)

DOS CP 862 (Hebrew)

DOS CP 866 (Cyrillic II)

IBM Asian Code pages

I18n Guy's Hiragana Unicode Chart

CP 00290 (EBCDIC) Japanese (Katakana) Non-extended

CP 00290 (EBCDIC) Japanese (Katakana) Extended

CP 00833 (EBCDIC) Korea Extended

CP 00836 (EBCDIC) Simplified Chinese Extended

CP 00891 (IBM PC) Korea

CP 00895 Japan 7-Bit

CP 00897 (IBM PC) Japan PC #1

CP 00903 (IBM PC) People's Republic of China (PRC)

CP 00904 (IBM PC) Republic of China (ROC)

CP 00905 (EBCDIC) Turkey Extended CP

CP 01027 (EBCDIC) Japanese (Latin) Extended

CP 01040 (IBM PC) Korean Extended

CP 01041 (IBM PC) Japanese Extended

CP 01042 (IBM PC) Simplified Chinese Extended

CP 01043 (IBM PC) Traditional Chinese

CP 01088 (IBM PC) Korean

CP 01114 Traditional Chinese (Big5)

CP 01115 Simplified Chinese (GB)