
Runtime: Program Scheduling (1) | Study Notes

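Study notes for the Developer Academy course "Quick Start with Spark, a Real-Time Big Data Computing Framework: Runtime: Program Scheduling (1)". The notes follow the course closely so that readers can pick up the material quickly.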

Course URL:

https://developer.aliyun.com/learning/course/100/detail/1650

Runtime: Program Scheduling (1)

Internally, each RDD is characterized by five main properties:

- A list of partitions

- A function for computing each split

- A list of dependencies on other RDDs

- Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)

- Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)
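
To make the five properties concrete, here is a minimal sketch in plain Scala. It is a simplified model, not Spark's actual classes: the names Part, MiniRDD, SourceRDD and FilteredRDD and the host strings are hypothetical and exist only for illustration.

```scala
// Simplified model of the five RDD properties; these are NOT Spark's real classes.
final case class Part(index: Int)

trait MiniRDD[+T] {
  def partitions: Seq[Part]                               // 1. a list of partitions
  def compute(split: Part): Iterator[T]                   // 2. a function for computing each split
  def dependencies: Seq[MiniRDD[Any]]                     // 3. a list of dependencies on other RDDs
  def partitioner: Option[Any => Int] = None              // 4. optional partitioner (key-value data only)
  def preferredLocations(split: Part): Seq[String] = Nil  // 5. optional preferred locations per split
}

// A source dataset: no parents, one partition per chunk,
// each chunk pinned to a (hypothetical) host name.
final class SourceRDD[T](chunks: Vector[Seq[T]], hosts: Vector[String]) extends MiniRDD[T] {
  def partitions: Seq[Part] = chunks.indices.map(i => Part(i))
  def compute(split: Part): Iterator[T] = chunks(split.index).iterator
  def dependencies: Seq[MiniRDD[Any]] = Nil
  override def preferredLocations(split: Part): Seq[String] = Seq(hosts(split.index))
}

// A derived dataset: keeps the parent's partitions and computes each split
// by filtering the corresponding split of the parent.
final class FilteredRDD[T](parent: MiniRDD[T], p: T => Boolean) extends MiniRDD[T] {
  def partitions: Seq[Part] = parent.partitions
  def compute(split: Part): Iterator[T] = parent.compute(split).filter(p)
  def dependencies: Seq[MiniRDD[Any]] = Seq(parent)
  override def preferredLocations(split: Part): Seq[String] = parent.preferredLocations(split)
}
```

In this model a derived dataset stores no data of its own. It only remembers its parent and how to compute each split from it, which is what allows the work to be scheduled later, partition by partition.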

Spark runtime

Flow overview

Distributed file system (File system): load the dataset

Transformations: lazily executed operations on RDDs

Actions: trigger execution
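
The split between lazy transformations and eager actions can be illustrated without Spark. The sketch below is only an analogy: it uses a lazy view from the Scala standard library in place of an RDD, and the object name LazyFlowSketch is hypothetical. It counts how often the filter predicate runs before and after a result is requested.

```scala
object LazyFlowSketch {
  // Returns (predicate calls before the action, predicate calls after it, result of the action).
  def run(lines: Seq[String]): (Int, Int, Int) = {
    var calls = 0
    // "Transformation": the lazy view only records the filter; no element is examined yet.
    val errors = lines.view.filter { l => calls += 1; l.startsWith("ERROR") }
    val before = calls
    // "Action": asking for a result forces the recorded pipeline to run over the data.
    val n = errors.size
    (before, calls, n)
  }
}
```

The predicate is not called at all until the size is requested. Spark follows the same idea at cluster scale: transformations only describe the computation, and an action submits the actual job.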

Code example

val lines = sc.textFile("hdfs://...")

Loads the data as an RDD

val errors = lines.filter(_.startsWith("ERROR"))

Transformation

errors.persist()

Caches the RDD

val mysql_errors = errors.filter(_.contains("MySQL")).count

Action: triggers execution

val http_errors = errors.filter(_.contains("Http")).count

Action: triggers execution
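
For experimenting locally, the same pipeline can be run without a cluster. The following is a minimal sketch under stated assumptions: spark-core is on the classpath, a local master is used instead of a real deployment, and sc.parallelize over a handful of in-memory lines stands in for sc.textFile("hdfs://..."). The object name LogErrorCounts and the sample log lines are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LogErrorCounts {
  // Returns (number of MySQL error lines, number of Http error lines).
  def count(sample: Seq[String]): (Long, Long) = {
    // Assumption: local mode, so no cluster or HDFS is required.
    val conf = new SparkConf().setMaster("local[*]").setAppName("log-error-counts")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.parallelize(sample)                // stands in for sc.textFile("hdfs://...")
      val errors = lines.filter(_.startsWith("ERROR"))  // transformation: nothing runs yet
      errors.persist()                                  // marks the RDD for caching; still lazy
      val mysqlErrors = errors.filter(_.contains("MySQL")).count() // action: runs a job
      val httpErrors = errors.filter(_.contains("Http")).count()   // action: runs a second job
      (mysqlErrors, httpErrors)
    } finally {
      sc.stop() // always release the local context
    }
  }
}
```

Because errors is persisted before the first action, the second count can reuse the cached partitions instead of filtering the input again.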
