Java: reading a Parquet file into JSON output

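Reading the Parquet file works, but I get an indented text format instead of the JSON output I want. Any ideas? I suspect I need to change GroupRecordConverter, but I have not been able to find much documentation on it. If you can point me to some, that would help too. Thanks very much for your help.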

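import java.io.IOException;
import org.apache.parquet.column.page.PageReadStore;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.convert.GroupRecordConverter;
import org.apache.parquet.format.converter.ParquetMetadataConverter;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.ParquetMetadata;
import org.apache.parquet.io.ColumnIOFactory;
import org.apache.parquet.io.MessageColumnIO;
import org.apache.parquet.io.RecordReader;
import org.apache.parquet.schema.MessageType;

// Fragment from inside a method; conf, path and numLines are defined elsewhere.
long num = numLines;
try {
    ParquetMetadata readFooter = ParquetFileReader.readFooter(conf, path, ParquetMetadataConverter.NO_FILTER);
    MessageType schema = readFooter.getFileMetaData().getSchema();
    ParquetFileReader r = new ParquetFileReader(conf, path, readFooter);
    PageReadStore pages = null;
    try {
        while (null != (pages = r.readNextRowGroup())) {
            final long rows = pages.getRowCount();
            System.out.println("Number of rows: " + rows);
            final MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
            final RecordReader<Group> recordReader = columnIO.getRecordReader(pages, new GroupRecordConverter(schema));
            // Print at most numLines records in total across all row groups.
            for (int i = 0; i < rows && num-- > 0; i++) {
                // Group.toString() produces the indented text shown below, not JSON.
                System.out.println(recordReader.read().toString());
            }
        }
    } finally {
        r.close();
    }
} catch (IOException e) {
    e.printStackTrace();
}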

Current indented output:

data1: value1
data2: value2
models
  map
    key: data3
    value
      array: value3
  map
    key: data4
    value
      array: value4
data5: value5
...

Desired JSON output:

"data1": "value1",
"data2": "value2",
"models": {
    "data3": [
        "value3"
    ],
    "data4": [
        "value4"
    ]
},
"data5": "value5"
...
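
One possible direction, rather than changing GroupRecordConverter, is to leave the reader as it is and walk each record recursively, emitting JSON instead of calling toString(). The following is only a minimal sketch of that recursion using the Java standard library. `Node` is a hypothetical stand-in for a Parquet Group (it is not part of any library), and the rule that entries named "map" and "array" collapse into a JSON object and a JSON array is an assumption based purely on the two listings above; a real field that happens to be called "array" or "map" would be collapsed too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

public class GroupToJsonSketch {

    /** Hypothetical stand-in for a Parquet Group: ordered (name, value) pairs. */
    static final class Node {
        final List<String> names = new ArrayList<>();
        final List<Object> values = new ArrayList<>(); // each value is a String or a Node

        Node add(String name, Object value) {
            names.add(name);
            values.add(value);
            return this;
        }

        /** True when the node is non-empty and every entry carries the given name. */
        boolean allNamed(String name) {
            return !names.isEmpty() && names.stream().allMatch(name::equals);
        }

        /** First value stored under the given name, or null if absent. */
        Object first(String name) {
            int i = names.indexOf(name);
            return i < 0 ? null : values.get(i);
        }
    }

    /** Recursively renders a value (String leaf or Node) as compact JSON. */
    static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (!(value instanceof Node)) {
            return quote(value.toString());
        }
        Node node = (Node) value;
        if (node.allNamed("array")) {
            // Assumption: repeated "array" entries collapse into a JSON array.
            StringJoiner array = new StringJoiner(",", "[", "]");
            for (Object element : node.values) {
                array.add(toJson(element));
            }
            return array.toString();
        }
        // Assumption: a node made only of nested "map" entries is a key/value map.
        boolean isMap = node.allNamed("map")
                && node.values.stream().allMatch(v -> v instanceof Node);
        StringJoiner object = new StringJoiner(",", "{", "}");
        for (int i = 0; i < node.names.size(); i++) {
            if (isMap) {
                // Use the entry's "key" as the JSON field name and its "value" as the content.
                Node entry = (Node) node.values.get(i);
                object.add(quote(String.valueOf(entry.first("key"))) + ":" + toJson(entry.first("value")));
            } else {
                object.add(quote(node.names.get(i)) + ":" + toJson(node.values.get(i)));
            }
        }
        return object.toString();
    }

    /** Minimal JSON string escaping (backslash, double quote, control characters). */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Builds the sample record from the listings above. */
    static Node sampleRecord() {
        Node models = new Node()
            .add("map", new Node().add("key", "data3").add("value", new Node().add("array", "value3")))
            .add("map", new Node().add("key", "data4").add("value", new Node().add("array", "value4")));
        return new Node()
            .add("data1", "value1")
            .add("data2", "value2")
            .add("models", models)
            .add("data5", "value5");
    }

    public static void main(String[] args) {
        // Prints one compact JSON object per record.
        System.out.println(toJson(sampleRecord()));
    }
}
```

With the real library, the same recursion would presumably read field names from group.getType() and values through Group methods such as getFieldRepetitionCount, getValueToString and getGroup. That mapping is not shown or tested here, and a JSON library would be a safer choice than hand-written escaping for production output.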