Avoiding garbled Chinese text when opening UTF-8 CSV files

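Recently I again had to provide a CSV download feature. The difference this time was that it had to be implemented in Java. I figured it would be simple, so I rewrote my old PHP version, generated a CSV, and opened it in Excel 2007. All the Chinese text was garbled, and I was stumped. How could a feature that had always worked suddenly break? I had been using 2007 all along! So began a long journey through Google.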

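Everything I read said that a UTF-8 CSV needs a BOM (byte order mark; look it up if you haven't met it) written at the very start of the file, i.e. the three bytes 0xEF 0xBB 0xBF. That only made things more baffling: I was clearly doing exactly that, yet it still didn't work. Then I found this answer: http://stackoverflow.com/a/9337150/1794493 . It points out that Excel 2007 needs SP3 installed before it recognizes the BOM. Damn! So that was it! The same answer also says that output encoded as UTF-16LE seems to be more widely compatible, and my testing confirmed this. However, the UTF-16LE BOM is 0xFF 0xFE; the answer gets it wrong! Here are the results of a simple test: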

Excel version  Service pack  Encoding   Chinese displayed correctly
2007           SP3           UTF-8      yes
2007           none          UTF-8      no
2007           SP3           UTF-16LE   yes
2007           none          UTF-16LE   yes
2011           -             UTF-8      no
2011           -             UTF-16LE   yes

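I only had these versions available to test, but the results suggest UTF-16LE is the more widely compatible encoding. Note that the UTF-16LE file separates fields with tabs rather than commas: Excel generally opens a UTF-16LE file like this as Unicode text and splits columns on tabs, so with commas each row would typically land in a single column. Below is the Java code. The main method writes the UTF-16LE file and then calls the UTF-8 method, so running it produces one CSV file in each encoding: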

import java.io.*;
import java.nio.charset.StandardCharsets;

/**
 * Created by zhaozhi on 15-5-29.
 */
public class TestCSV {

    // Join the elements with the delimiter only between them.
    // (The earlier version left a trailing delimiter on one-element arrays.)
    public static String join(String[] strArr, String delim) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < strArr.length; i++) {
            if (i > 0) {
                sb.append(delim);
            }
            sb.append(strArr[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String[] heads = {"日期", "産品", "訂單數"};
        String[][] rows = {
                {"20150228", "安卓", "23"},
                {"20150301", "web", "34"}
        };
        // UTF-16LE byte order mark: 0xFF 0xFE
        byte[] bom = {(byte) 0xFF, (byte) 0xFE};
        String fname = "d:\\utf-16le.csv";
        // try-with-resources flushes and closes the stream even if a write fails
        try (BufferedOutputStream bo = new BufferedOutputStream(new FileOutputStream(fname))) {
            bo.write(bom);
            // Fields are tab-separated in the UTF-16LE file
            bo.write(join(heads, "\t").getBytes(StandardCharsets.UTF_16LE));
            bo.write("\n".getBytes(StandardCharsets.UTF_16LE));
            for (String[] row : rows) {
                bo.write(join(row, "\t").getBytes(StandardCharsets.UTF_16LE));
                bo.write("\n".getBytes(StandardCharsets.UTF_16LE));
            }
        }

        writeUtf8();
    }

    public static void writeUtf8() throws IOException {
        String line = "中文,标題,23";
        // Resources close in reverse order: the writer first (flushing the
        // encoded text into the stream), then the file stream itself
        try (OutputStream os = new FileOutputStream("d:/utf-8.csv");
             Writer w = new OutputStreamWriter(os, StandardCharsets.UTF_8)) {
            // UTF-8 BOM: 0xEF 0xBB 0xBF, written as raw bytes before any text
            os.write(0xEF);
            os.write(0xBB);
            os.write(0xBF);
            // Everything written through the writer is encoded as UTF-8
            w.write(line);
        }
    }
}
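Both BOM values in the test come from the same place: a BOM is simply the character U+FEFF encoded in the file's charset. The sketch below, a hypothetical helper class `BomDemo` that is not part of the original code, derives each BOM from the charset instead of hard-coding the bytes, and adds a small check (`detectBom`, also hypothetical) for confirming which BOM a generated file starts with. It assumes Java 7 or later for `StandardCharsets`. The check is deliberately simplified: it only recognizes the UTF-8, UTF-16LE and UTF-16BE marks.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BomDemo {

    // A BOM is U+FEFF encoded in the target charset. StandardCharsets.UTF_16LE
    // does not emit a BOM by itself, so encoding U+FEFF explicitly produces it.
    public static byte[] bomFor(Charset cs) {
        return "\uFEFF".getBytes(cs);
    }

    // Return the encoded text prefixed with the charset's BOM.
    public static byte[] encodeWithBom(String text, Charset cs) {
        byte[] bom = bomFor(cs);
        byte[] body = text.getBytes(cs);
        byte[] out = Arrays.copyOf(bom, bom.length + body.length);
        System.arraycopy(body, 0, out, bom.length, body.length);
        return out;
    }

    // Name the BOM at the start of the data, or null if none is recognized.
    // Simplified: UTF-32 marks are not distinguished from UTF-16 ones.
    public static String detectBom(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        return null;
    }

    // Format bytes as space-separated uppercase hex, e.g. "EF BB BF".
    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 BOM:    " + hex(bomFor(StandardCharsets.UTF_8)));    // EF BB BF
        System.out.println("UTF-16LE BOM: " + hex(bomFor(StandardCharsets.UTF_16LE))); // FF FE
        byte[] data = encodeWithBom("日期\t訂單數\n", StandardCharsets.UTF_16LE);
        System.out.println(detectBom(data)); // UTF-16LE
    }
}
```

Deriving the BOM from `"\uFEFF"` keeps the marker and the body in the same charset, which rules out the kind of mismatch described above. To check a file written by `TestCSV`, one could read its bytes (for example with `java.nio.file.Files.readAllBytes`) and pass them to `detectBom`.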