天天看點

【小筆記】大資料量excel解析工具性能對比

1. Excel存儲格式及解析流程

1.1 存儲格式

Excel本質上是以xml存儲的,這些xml内容符合office open xml規範。把字尾名改為壓縮檔案的字尾名(zip,tar等等)可以看到其基本結構:

【小筆記】大資料量excel解析工具性能對比

其主要資料内容儲存于sharedString.xml以及worksheets裡的xml:

【小筆記】大資料量excel解析工具性能對比

實際上對于Excel的解析就是對于XML的解析,但是各個xml之間存在着關系,解析時更為複雜。

1.2 解析流程

【小筆記】大資料量excel解析工具性能對比

2. 寫入性能對比

2.1 測試代碼:

見文末

2.2 結果

模式 10萬資料 100萬資料
POI(XSSF) 10833ms GC overhead limit exceeded
POI(SXSSF) 1378ms 9274ms
EasyExcel 1339ms 9077ms

結論:

10萬級别POI的SXSSF和EasyExcel的速度基本一緻。

100萬級别POI的XSSF模式直接無法生成,而SXSSF模式和EasyExcel的速度基本一緻。

2.3 分析

POI傳統模式XSSF會全部寫入記憶體,記憶體占用很高,然後一次性刷盤;而SXSSF模式基于滑動視窗,部分刷盤,是以避免了大量GC時間及記憶體占用。

SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.

https://poi.apache.org/components/spreadsheet/how-to.html#sxssf

是以,通過這個問題也能看出,把資料完全放在記憶體操作在某些場景下并不是最好的,性能反而不如多次IO操作,因為其中可能會産生FULL CG而占用大量時間。

3. 讀取性能對比

3.1 測試代碼

3.2 結果

資料量 POI耗時 EasyExcel耗時
10萬 4223ms 2813ms

10萬資料量,EasyExcel比POI快了1.4秒左右,資料量更大時差異更明顯。

3.3 分析

4. 多線程解析表格

線程數 耗時
1 1807ms
2 1425ms
4 1311ms
8 2335ms
36356ms
18064ms
9082ms
4532ms
16 2662ms
32 1287ms
64 688ms
128 380ms
256 335ms
512 1253ms

5. 測試代碼

import cn.hutool.core.date.StopWatch;
import cn.hutool.core.util.RandomUtil;
import com.alibaba.excel.EasyExcel;
import com.alibaba.excel.ExcelReader;
import com.alibaba.excel.ExcelWriter;
import com.alibaba.excel.context.AnalysisContext;
import com.alibaba.excel.event.AnalysisEventListener;
import com.alibaba.excel.read.metadata.ReadSheet;
import com.alibaba.excel.write.metadata.WriteSheet;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.junit.Test;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.*;

public class ExcelUtilTest {
    private static final String[] USERNAME = {"趙", "錢", "孫", "李", "周", "吳", "鄭", "王", "馮", "陳", "褚", "衛", "蔣", "沈", "韓", "楊", "朱", "秦", "尤", "許",
            "何", "呂", "施", "張", "孔", "曹", "嚴", "華", "金", "魏", "陶", "姜", "戚", "謝", "鄒", "喻", "柏", "水", "窦", "章", "雲", "蘇", "潘", "葛", "奚", "範", "彭", "郎",
            "魯", "韋", "昌", "馬", "苗", "鳳", "花", "方", "俞", "任", "袁", "柳", "酆", "鮑", "史", "唐", "費", "廉", "岑", "薛", "雷", "賀", "倪", "湯", "滕", "殷",
            "羅", "畢", "郝", "邬", "安", "常", "樂", "于", "時", "傅", "皮", "卞", "齊", "康", "伍", "餘", "元", "蔔", "顧", "孟", "平", "黃", "和",
            "穆", "蕭", "尹", "姚", "邵", "湛", "汪", "祁", "毛", "禹", "狄", "米", "貝", "明", "臧", "計", "伏", "成", "戴", "談", "宋", "茅", "龐", "熊", "紀", "舒",
            "屈", "項", "祝", "董", "梁", "杜", "阮", "藍", "闵", "席", "季"};
    private static final String GIRL = "秀娟英華慧巧美娜靜淑惠珠翠雅芝玉萍紅娥玲芬芳燕彩春菊蘭鳳潔梅琳素雲蓮真環雪榮愛妹霞香月莺媛豔瑞凡佳嘉瓊勤珍貞莉桂娣葉璧璐娅琦晶妍茜秋珊莎錦黛青倩婷姣婉娴瑾穎露瑤怡婵雁蓓纨儀荷丹蓉眉君琴蕊薇菁夢岚苑婕馨瑗琰韻融園藝詠卿聰瀾純毓悅昭冰爽琬茗羽希甯欣飄育滢馥筠柔竹霭凝曉歡霄楓芸菲寒伊亞宜可姬舒影荔枝思麗 ";
    private static final String BOY = "偉剛勇毅俊峰強軍平保東文輝力明永健世廣志義興良海山仁波甯貴福生龍元全國勝學祥才發武新利清飛彬富順信子傑濤昌成康星光天達安岩中茂進林有堅和彪博誠先敬震振壯會思群豪心邦承樂紹功松善厚慶磊民友裕河哲江超浩亮政謙亨奇固之輪翰朗伯宏言若鳴朋斌梁棟維啟克倫翔旭鵬澤晨辰士以建家緻樹炎德行時泰盛雄琛鈞冠策騰楠榕風航弘";

    /**
     * 寫測試
     */
    @Test
    public void test() throws IOException {
        int number = 100000;

        StopWatch sw = new StopWatch();

        sw.start();
        poiTest(number, "XSSF");
        sw.stop();
        System.out.println("POI(XSSF)寫入" + number + "條資料耗時" + sw.getLastTaskTimeMillis() + "ms");

        sw.start();
        poiTest(number, "SXSSF");
        sw.stop();
        System.out.println("POI(SXSSF)寫入" + number + "條資料耗時" + sw.getLastTaskTimeMillis() + "ms");

        sw.start();
        easyExcelTest(number);
        sw.stop();
        System.out.println("EasyExcel寫入" + number + "條資料耗時" + sw.getLastTaskTimeMillis() + "ms");

    }

    private void poiTest(int number, String type) throws IOException {
        String path = "D:\\tmp\\test.xlsx";
        try (Workbook wb = "SXSSF".equals(type) ? new SXSSFWorkbook() : new XSSFWorkbook()) {
            Sheet sheet = wb.createSheet();
            Row row = sheet.createRow(0);
            row.createCell(0).setCellValue("ID");
            row.createCell(1).setCellValue("姓名");
            row.createCell(2).setCellValue("年齡");
            row.createCell(3).setCellValue("性别");
            row.createCell(4).setCellValue("是否會員");
            for (int i = 1; i < number; i++) {
                row = sheet.createRow(i);
                List<String> randomData = this.getRandomData();
                row.createCell(0).setCellValue(i);
                for (int col = 0; col < randomData.size(); col++) {
                    row.createCell(col + 1).setCellValue(randomData.get(col));
                }
            }
            wb.write(new FileOutputStream(path));
        }
    }

    private void easyExcelTest(int number) throws IOException {
        List<List> dataList = new LinkedList<>();
        List<String> header = new LinkedList<>();
        header.add("ID");
        header.add("姓名");
        header.add("年齡");
        header.add("性别");
        header.add("是否會員");
        dataList.add(header);
        for (int i = 1; i < number; i++) {
            List<String> randomData = this.getRandomData();
            List<String> data = new LinkedList<>();
            data.add(String.valueOf(i));
            data.addAll(randomData);
            dataList.add(data);
        }
        String path = "D:\\tmp\\test.xlsx";
        File file = new File(path);
        ExcelWriter excelWriter = EasyExcel.write(file).build();
        WriteSheet writeSheet = EasyExcel.writerSheet(0).build();
        excelWriter.write(dataList, writeSheet);
        excelWriter.finish();
    }

    private List<String> getRandomData() {
        int usernameRandom = RandomUtil.randomInt(0, USERNAME.length - 1);
        String name = USERNAME[usernameRandom];
        if (usernameRandom % 2 == 0) {
            name += GIRL.substring(usernameRandom % GIRL.length(), (usernameRandom + 2) % GIRL.length());
        } else {
            name += BOY.substring(usernameRandom % BOY.length(), (usernameRandom + 1) % BOY.length());
        }
        String age = String.valueOf(RandomUtil.randomInt(10, 50));
        String sex = usernameRandom % 2 == 0 ? "女" : "男";
        String isVip = usernameRandom % 2 == 0 ? "是" : "否";

        return Arrays.asList(name, age, sex, isVip);
    }


    /**
     * 讀測試
     */
    @Test
    public void readTest() throws IOException {
        StopWatch sw = new StopWatch();
        sw.start();
        poiReadTest();
        sw.stop();
        System.out.println("POI讀取資料耗時" + sw.getLastTaskTimeMillis() + "ms");

        sw.start();
        easyExcelReadTest();
        sw.stop();
        System.out.println("EasyExcel讀取資料耗時" + sw.getLastTaskTimeMillis() + "ms");
    }

    private void poiReadTest() throws IOException {
        String path = "D:\\tmp\\test.xlsx";
        File file = new File(path);
        List<List<String>> result = new LinkedList<>();
        Workbook wb = WorkbookFactory.create(file);
        Sheet sheet = wb.getSheetAt(0);
        Iterator<Row> rowIterator = sheet.rowIterator();
        Row row;
        while (rowIterator.hasNext()) {
            row = rowIterator.next();

            Iterator<Cell> cellIterator = row.cellIterator();
            List<String> data = new LinkedList<>();
            while (cellIterator.hasNext()) {
                data.add(cellIterator.next().getStringCellValue());
            }
            result.add(data);
        }
        wb.close();
        System.out.println("擷取到" + result.size() + "條資料");
    }

    private void easyExcelReadTest() throws IOException {
        String path = "D:\\tmp\\test.xlsx";
        File file = new File(path);
        ExcelReader excelReader = EasyExcel.read(file, new ExcelListener()).build();
        ReadSheet readSheet = new ReadSheet(0);
        excelReader.read(readSheet);
        excelReader.finish();
    }

    private class ExcelListener extends AnalysisEventListener<LinkedHashMap> {

        private List<LinkedHashMap> result = new ArrayList<>();

        @Override
        public void invoke(LinkedHashMap linkedHashMap, AnalysisContext analysisContext) {
            result.add(linkedHashMap);
        }

        @Override
        public void doAfterAllAnalysed(AnalysisContext analysisContext) {
            System.out.println("擷取到" + result.size() + "條資料");
        }
    }
}

           

網絡上志同道合,我們一起學習網絡安全,一起進步。