- Background
- Typical questions
- Converting a Unicode string to UTF-8 and ISO-8859-1
- How mojibake arises
- The encoding used by getBytes()
- Summary
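Background
I have run into quite a few Java encoding problems at work, and every time garbled text (mojibake) showed up it took a long time to fix, which really hurt productivity.
The pattern was always the same: fiddle with the problem for ages until it went away, without ever understanding what was actually wrong. The next time something similar came up, the fiddling started all over again, which is a huge waste of time.
To solve encoding problems efficiently you have to get to the root cause. After that, similar problems can be traced step by step and fixed easily.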
Typical questions
- Which character set does Java use?
- How many bytes does Java need to store one character?
- Which encoding does String.getBytes() use: ISO-8859-1 or UTF-8?
- What causes mojibake?
If you can already answer all of these, you can skip this article.
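Converting a Unicode string to UTF-8 and ISO-8859-1
Java uses the Unicode character set. A char is stored in two bytes: it is one UTF-16 code unit, so characters outside the Basic Multilingual Plane need two chars (a surrogate pair).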
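As a quick check of the two-byte claim, here is a minimal sketch. The class and method names are illustrative and not part of the original article, and it assumes Java 8 or later (for `Character.BYTES`). Unicode escapes are used so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

class CharWidthDemo {
    /** Number of UTF-16 code units (Java chars) in the string. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u5C0F";                 // U+5C0F, inside the Basic Multilingual Plane
        String supplementary = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair

        System.out.println(Character.BYTES);          // 2: one char is two bytes
        System.out.println(codeUnits(bmp));           // 1
        System.out.println(codeUnits(supplementary)); // 2
        System.out.println(codePoints(supplementary)); // 1
        // UTF-16BE bytes: 2 for the BMP character, 4 for the supplementary one
        System.out.println(bmp.getBytes(StandardCharsets.UTF_16BE).length);
        System.out.println(supplementary.getBytes(StandardCharsets.UTF_16BE).length);
    }
}
```

So "one character, two bytes" holds for the Chinese characters used in this article, but not for every Unicode character.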
The following program fragment converts a Unicode string to UTF-8 and ISO-8859-1 bytes:
String cnName = "I am 小佳";
// Java's in-memory form: Unicode (UTF-16 code units)
System.out.println(printHexString(cnName.toCharArray()));
// ISO-8859-1 encoding
System.out.println(printHexString(cnName.getBytes("ISO-8859-1")));
// UTF-8 encoding
System.out.println(printHexString(cnName.getBytes("UTF-8")));

/**
 * Converts a byte array to a hexadecimal string.
 * @param b the bytes to convert
 * @return two hex digits per byte
 */
private String printHexString(byte[] b) {
    String a = "";
    for (int i = 0; i < b.length; i++) {
        // Mask with 0xFF so negative bytes are not sign-extended.
        String hex = Integer.toHexString(b[i] & 0xFF);
        if (hex.length() == 1) {
            hex = '0' + hex;
        }
        a = a + hex;
    }
    return a;
}

/**
 * Converts a char array to a hexadecimal string.
 * @param b the chars (UTF-16 code units) to convert
 * @return at least two hex digits per char
 */
private String printHexString(char[] b) {
    String a = "";
    for (int i = 0; i < b.length; i++) {
        // A char is 16 bits wide, so keep the low 16 bits.
        String hex = Integer.toHexString(b[i] & 0xFFFF);
        if (hex.length() == 1) {
            hex = '0' + hex;
        }
        a = a + hex;
    }
    return a;
}
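Converting the string "I am 小佳" from Unicode to UTF-8 and to ISO-8859-1 yields different bytes (shown in hexadecimal). The main differences are:
- For English (ASCII) characters, UTF-8 and ISO-8859-1 produce the same byte values, so plain English text is not garbled when these encodings are mixed up.
- For Chinese characters, different encodings give different results: each of these characters takes 2 bytes as a Java char and 3 bytes in UTF-8.
- ISO-8859-1 does not support Chinese. Any Chinese character encoded with it becomes 3F (the byte for "?"), so Chinese text encoded as ISO-8859-1 turns into mojibake full of question marks, such as "????????????????".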
Encoding | I | (space) | a | m | (space) | 小 | 佳 |
---|---|---|---|---|---|---|---|
Unicode | 49 | 20 | 61 | 6D | 20 | 5C0F | 4F73 |
ISO-8859-1 | 49 | 20 | 61 | 6D | 20 | 3F | 3F |
UTF-8 | 49 | 20 | 61 | 6D | 20 | E5B08F | E4BDB3 |
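The rows of the table can be reproduced with a small self-contained program. This is only a sketch: the class name `EncodingTable` and the helper `hex` are made up for illustration, and UTF-16BE stands in for the "Unicode" row, so each char appears as four hex digits instead of the shortened form used in the table.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingTable {
    /** Encodes s with the given charset and returns the bytes as uppercase hex. */
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            // Mask with 0xFF so negative bytes print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cnName = "I am \u5C0F\u4F73"; // "I am 小佳", written with escapes
        System.out.println("UTF-16BE   " + hex(cnName, StandardCharsets.UTF_16BE));
        // UTF-16BE   004900200061006D00205C0F4F73
        System.out.println("ISO-8859-1 " + hex(cnName, StandardCharsets.ISO_8859_1));
        // ISO-8859-1 4920616D203F3F
        System.out.println("UTF-8      " + hex(cnName, StandardCharsets.UTF_8));
        // UTF-8      4920616D20E5B08FE4BDB3
    }
}
```

Passing a `Charset` constant from `StandardCharsets` instead of a charset name avoids the checked `UnsupportedEncodingException` that `getBytes(String)` declares.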
How mojibake arises
Encoding with one charset and decoding with a different one is the root cause of mojibake. Two common cases are analyzed below.
String cnName = "I am 小佳";
// Encode with ISO-8859-1, decode with UTF-8
String iso88591Str = new String(cnName.getBytes("ISO-8859-1"), "UTF-8");
// Encode with UTF-8, decode with ISO-8859-1
String utf8Str = new String(cnName.getBytes("UTF-8"), "ISO-8859-1");
System.out.println(iso88591Str); // prints mojibake: I am ??
System.out.println(utf8Str);     // prints mojibake: I am å°ä½³ (with an invisible U+008F control character after °)
Source string | Encoded with | Decoded with | Result |
---|---|---|---|
I am 小佳 | UTF-8 | ISO-8859-1 | I am å°ä½³ |
I am 小佳 | ISO-8859-1 | UTF-8 | I am ?? |
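The mojibake "I am å°ä½³" appears because ISO-8859-1 is a single-byte encoding: each of the three UTF-8 bytes of a Chinese character is decoded as a separate character.
The mojibake "I am ??" appears because ISO-8859-1 turns every Chinese character into 3F, and 3F maps to the character "?".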
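The two cases differ in one practical way, which the sketch below tries to show: decoding UTF-8 bytes as ISO-8859-1 garbles the text but keeps every byte, so it can be reversed, whereas encoding Chinese with ISO-8859-1 throws the characters away. The class and method names are hypothetical, chosen only for this illustration.

```java
import java.nio.charset.StandardCharsets;

class MojibakeRoundTrip {
    /** Encodes as UTF-8, then decodes as ISO-8859-1: garbled, but no byte is lost. */
    static String utf8ReadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    /** Undoes utf8ReadAsLatin1: re-encode as ISO-8859-1, decode as UTF-8. */
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    /** Encodes as ISO-8859-1 first: unmappable characters are replaced by '?'. */
    static String latin1ReadAsUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String cnName = "I am \u5C0F\u4F73"; // "I am 小佳"

        String garbled = utf8ReadAsLatin1(cnName);
        System.out.println(garbled.length());               // 11: 5 ASCII chars + 6 single-byte chars
        System.out.println(repair(garbled).equals(cnName)); // true: the original is recovered

        String lost = latin1ReadAsUtf8(cnName);
        System.out.println(lost);                           // I am ??
        System.out.println(lost.equals(cnName));            // false: the characters are gone
    }
}
```

This is why a "?" in the output usually means the damage happened at encoding time, while accented Latin gibberish usually points to a wrong charset at decoding time.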
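The encoding used by getBytes()
String.getBytes() uses the JVM's default charset, which normally comes from the operating system's settings. See the getBytes() source code below.
The default charset is obtained in defaultCharset(), which reads it from the system property "file.encoding".
As an aside, defaultCharset() is worth studying: it is thread-safe and caches its result.
If you use getBytes() and get garbled Chinese, setting file.encoding to UTF-8 fixes it. Because the value is cached, set it when the JVM starts (for example with -Dfile.encoding=UTF-8), or pass the charset to getBytes explicitly.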
// java.lang.String
public byte[] getBytes() {
    return StringCoding.encode(value, 0, value.length);
}

// java.lang.StringCoding
static byte[] encode(char[] ca, int off, int len) {
    // Get the name of the default charset
    String csn = Charset.defaultCharset().name();
    try {
        // use charset name encode() variant which provides caching.
        return encode(csn, ca, off, len);
    } catch (UnsupportedEncodingException x) {
        warnUnsupportedCharset(csn);
    }
    try {
        return encode("ISO-8859-1", ca, off, len);
    } catch (UnsupportedEncodingException x) {
        // If this code is hit during VM initialization, MessageUtils is
        // the only way we will be able to get any kind of error message.
        MessageUtils.err("ISO-8859-1 charset not available: "
                         + x.toString());
        // If we can not find ISO-8859-1 (a required encoding) then things
        // are seriously wrong with the installation.
        System.exit(1);
        return null;
    }
}

// java.nio.charset.Charset
public static Charset defaultCharset() {
    if (defaultCharset == null) {
        synchronized (Charset.class) {
            // Read the default charset name from the file.encoding property
            String csn = AccessController.doPrivileged(
                new GetPropertyAction("file.encoding"));
            Charset cs = lookup(csn);
            if (cs != null)
                defaultCharset = cs;
            else
                defaultCharset = forName("UTF-8");
        }
    }
    return defaultCharset;
}
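Since the no-argument getBytes() depends on a JVM-wide setting, one way to sidestep the problem is to name the charset at every call site. The sketch below is an illustration under that assumption, not the article's own code. The class name `ExplicitCharsetDemo` and the helper `toUtf8` are hypothetical. What the second print shows depends on the machine: it is true when the default charset is UTF-8 (the default since JDK 18) and may be false otherwise.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ExplicitCharsetDemo {
    /** Encodes with an explicit charset, independent of file.encoding. */
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String cnName = "I am \u5C0F\u4F73"; // "I am 小佳"

        // The charset that the no-argument getBytes() will use on this JVM.
        System.out.println("default charset: " + Charset.defaultCharset());

        byte[] implicit = cnName.getBytes(); // platform dependent
        byte[] explicit = toUtf8(cnName);    // always 11 bytes for this string

        System.out.println("explicit length: " + explicit.length);
        System.out.println("same as default? " + Arrays.equals(implicit, explicit));
    }
}
```

Running it once normally and once with `java -Dfile.encoding=ISO-8859-1 ExplicitCharsetDemo` on an older JDK is an easy way to see the default change while the explicit result stays the same.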
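Summary
This article is only a starting point: it walks through converting Unicode to UTF-8 and ISO-8859-1 and explains the root cause of mojibake.
If you are interested in encoding problems, a good next step is to explore whether converting between GBK, GB2312, and UTF-8 produces mojibake.
Encoding problems also test how deeply you think, and quite a few interviewers ask questions specifically about them. So they are well worth studying.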