
[python][spark] An example of reading multiple files with wholeTextFiles

$pwd 

/home/training/mydir

$cat file1.json

{

"firstName":"Fred",

"lastName":"Flintstone",

"userid":"123"

}

$cat file2.json

"firstName":"Barney",

"lastName":"Rubble",

[training@localhost ~]$ hdfs dfs -put /home/training/mydir

[training@localhost ~]$ 

[training@localhost ~]$ hdfs dfs -ls

Found 4 items

drwxrwxrwx - training supergroup 0 2017-09-23 19:26 .sparkStaging

-rw-rw-rw- 1 training supergroup 48 2017-09-25 05:31 cats.txt

drwxrwxrwx - training supergroup 0 2017-09-25 15:39 mydir ***

-rw-rw-rw- 1 training supergroup 34 2017-09-23 06:16 test.txt

[training@localhost ~]$

myrdd1 = sc.wholeTextFiles("mydir")

myrdd1.count()

Out[32]: 2

In [35]: myrdd1.take(2)

Out[35]: 

[(u'hdfs://localhost:8020/user/training/mydir/file1.json',

u'{\n "firstName":"Fred",\n "lastName":"Flintstone",\n "userid":"123"\n}\n'),

(u'hdfs://localhost:8020/user/training/mydir/file2.json',

u'{\n "firstName":"Barney",\n "lastName":"Rubble",\n "userid":"456"\n}\n')]

This article is reposted from the cnblogs blog 健哥的資料花園. Original link: http://www.cnblogs.com/gaojian/p/7594782.html. Please contact the original author before reposting.
