(Applies to Hadoop 2.7 and later)
ResourceManager REST APIs:
<a href="https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/resourcemanagerrest.html">https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/resourcemanagerrest.html</a>
WebHDFS REST API:
<a href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/webhdfs.html">https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/webhdfs.html</a>
MapReduce History Server REST APIs:
<a href="https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/historyserverrest.html">https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/historyserverrest.html</a>
Spark Monitoring and Instrumentation:
<a href="http://spark.apache.org/docs/latest/monitoring.html">http://spark.apache.org/docs/latest/monitoring.html</a>
URL:
<a href="http://emr-header-1:50070/webhdfs/v1/?user.name=hadoop&op=getcontentsummary">http://emr-header-1:50070/webhdfs/v1/?user.name=hadoop&op=getcontentsummary</a>
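Response:
Notes on the response:
Note how length relates to spaceConsumed: length is the logical size of the data, while spaceConsumed counts every replica, so the ratio between them depends on the HDFS replication factor (roughly 3x with the default replication of 3).
To get usage statistics for each group's working directory, send a request like the following: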
<a href="http://emr-header-1:50070/webhdfs/v1/user/feed_aliyun?user.name=hadoop&op=getcontentsummary">http://emr-header-1:50070/webhdfs/v1/user/feed_aliyun?user.name=hadoop&op=getcontentsummary</a>
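As a minimal sketch of how the per-directory request and the length / spaceConsumed relationship fit together, the Python below builds the GETCONTENTSUMMARY URL and reads the two fields from a response body. The helper names `content_summary_url` and `summarize` are hypothetical, and the numbers in `sample` are made up for illustration (they are not output from a real cluster); only the `ContentSummary` wrapper and its field names follow the WebHDFS documentation. The response text is assumed to have been fetched already.

```python
import json
from urllib.parse import quote, urlencode


def content_summary_url(host, path, user="hadoop", port=50070):
    """Build a WebHDFS GETCONTENTSUMMARY URL for an HDFS path (hypothetical helper)."""
    query = urlencode({"user.name": user, "op": "GETCONTENTSUMMARY"})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def summarize(response_text):
    """Extract length, spaceConsumed and their ratio from a response body."""
    summary = json.loads(response_text)["ContentSummary"]
    length = summary["length"]
    consumed = summary["spaceConsumed"]
    # The ratio approximates the replication factor; undefined for empty directories.
    ratio = consumed / length if length else None
    return {"length": length, "spaceConsumed": consumed, "ratio": ratio}


# Made-up sample body, shaped like a WebHDFS ContentSummary response.
sample = (
    '{"ContentSummary": {"directoryCount": 2, "fileCount": 1, '
    '"length": 24930, "quota": -1, "spaceConsumed": 74790, "spaceQuota": -1}}'
)

print(content_summary_url("emr-header-1", "/user/feed_aliyun"))
print(summarize(sample))
```

Looping `content_summary_url` over a list of group directories would give one request per group; the list itself has to come from your own cluster layout.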
<a href="http://emr-header-1:8088/ws/v1/cluster">http://emr-header-1:8088/ws/v1/cluster</a>
Response:
<a href="http://emr-header-1:8088/ws/v1/cluster/scheduler">http://emr-header-1:8088/ws/v1/cluster/scheduler</a>
For a description of each parameter, see:
<a href="https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/resourcemanagerrest.html#cluster_application_queue_api">https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/resourcemanagerrest.html#cluster_application_queue_api</a>
<a href="http://emr-header-1:8088/ws/v1/cluster/apps">http://emr-header-1:8088/ws/v1/cluster/apps</a>
To restrict the statistics to a fixed time window, append the parameters "?finishedTimeBegin={timestamp}&finishedTimeEnd={timestamp}" (timestamps in milliseconds since the epoch), for example:
<a href="http://emr-header-1:8088/ws/v1/cluster/apps?finishedTimeBegin=1496742124000&finishedTimeEnd=1496742134000">http://emr-header-1:8088/ws/v1/cluster/apps?finishedTimeBegin=1496742124000&finishedTimeEnd=1496742134000</a>
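The time-window query can be sketched in Python as follows. The ResourceManager API names these parameters `finishedTimeBegin` and `finishedTimeEnd` and expects milliseconds since the epoch; the helper names `to_millis` and `finished_apps_url` are hypothetical, chosen only for this illustration.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def to_millis(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return round(dt.timestamp() * 1000)


def finished_apps_url(host, begin, end, port=8088):
    """Build a /ws/v1/cluster/apps URL limited to apps that finished in [begin, end]."""
    query = urlencode({
        "finishedTimeBegin": to_millis(begin),
        "finishedTimeEnd": to_millis(end),
    })
    return f"http://{host}:{port}/ws/v1/cluster/apps?{query}"


# The same 10-second window as the example URL above.
begin = datetime.fromtimestamp(1496742124, tz=timezone.utc)
end = datetime.fromtimestamp(1496742134, tz=timezone.utc)
print(finished_apps_url("emr-header-1", begin, end))
```

Using timezone-aware datetimes avoids an accidental shift when the script runs on a machine whose local time zone differs from the cluster's.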
The amount of data a job scans has to be queried through the history server's RESTful API, and MapReduce and Spark differ somewhat in how this is done.
<a href="http://emr-header-1:19888/ws/v1/history/mapreduce/jobs/job_1495123166259_0962/counters">http://emr-header-1:19888/ws/v1/history/mapreduce/jobs/job_1495123166259_0962/counters</a>
In the response, BYTES_READ under org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter is the amount of data scanned by the job.
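A minimal sketch of pulling that counter out of a parsed counters response is shown below. The function name `bytes_read` is hypothetical and the values in `sample` are made up for illustration; the nesting (`jobCounters`, `counterGroup`, `counter`, `totalCounterValue`) follows the layout described in the History Server REST documentation.

```python
GROUP = "org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter"


def bytes_read(counters_json):
    """Return BYTES_READ for a job, or None if the counter is absent."""
    groups = counters_json["jobCounters"].get("counterGroup", [])
    for group in groups:
        if group["counterGroupName"] != GROUP:
            continue
        for counter in group["counter"]:
            if counter["name"] == "BYTES_READ":
                return counter["totalCounterValue"]
    return None


# Made-up sample, trimmed to the one counter group of interest.
sample = {
    "jobCounters": {
        "id": "job_1495123166259_0962",
        "counterGroup": [
            {
                "counterGroupName": GROUP,
                "counter": [
                    {
                        "name": "BYTES_READ",
                        "mapCounterValue": 1048576,
                        "reduceCounterValue": 0,
                        "totalCounterValue": 1048576,
                    }
                ],
            }
        ],
    }
}

print(bytes_read(sample))
```

Returning None rather than 0 when the counter is missing keeps jobs that did not use FileInputFormat distinguishable from jobs that read nothing.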
<a href="http://emr-header-1:18080/api/v1/applications/application_1495123166259_1050/executors">http://emr-header-1:18080/api/v1/applications/application_1495123166259_1050/executors</a>
The sum of totalInputBytes across all executors is the total amount of data scanned by the job.
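The summation can be sketched as below, assuming the executors endpoint returns a JSON array with one object per executor (including the driver entry). The function name `total_input_bytes` is hypothetical and the sample values are made up for illustration.

```python
def total_input_bytes(executors):
    """Sum totalInputBytes over every entry in an executors response."""
    return sum(entry.get("totalInputBytes", 0) for entry in executors)


# Made-up sample shaped like the executors list; real entries carry many more fields.
sample = [
    {"id": "driver", "totalInputBytes": 0},
    {"id": "1", "totalInputBytes": 2048},
    {"id": "2", "totalInputBytes": 4096},
]

print(total_input_bytes(sample))
```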