
Collecting cluster statistics with the Hadoop RESTful APIs

(Applies to Hadoop 2.7 and later.)

ResourceManager REST APIs:

https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/resourcemanagerrest.html

WebHDFS REST API:

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/webhdfs.html

MapReduce History Server REST APIs:

https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/historyserverrest.html

Spark Monitoring and Instrumentation:

http://spark.apache.org/docs/latest/monitoring.html

Example request (WebHDFS content summary of the HDFS root directory):

<a href="http://emr-header-1:50070/webhdfs/v1/?user.name=hadoop&amp;op=getcontentsummary">http://emr-header-1:50070/webhdfs/v1/?user.name=hadoop&amp;op=getcontentsummary</a>

Response:
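The original response body is not reproduced here. Per the WebHDFS documentation, GETCONTENTSUMMARY returns a ContentSummary object shaped roughly as follows (the numbers are illustrative only, for a directory whose files are replicated three times):

{
  "ContentSummary": {
    "directoryCount": 2,
    "fileCount": 1,
    "length": 24930,
    "quota": -1,
    "spaceConsumed": 74790,
    "spaceQuota": -1
  }
}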

Notes on the response:

Note the relationship between length and spaceConsumed: spaceConsumed counts every block replica, so with the default HDFS replication factor of 3 it is roughly 3 × length.

To report usage for each group's working directory, issue a request like the following:

<a href="http://emr-header-1:50070/webhdfs/v1/user/feed_aliyun?user.name=hadoop&amp;op=getcontentsummary">http://emr-header-1:50070/webhdfs/v1/user/feed_aliyun?user.name=hadoop&amp;op=getcontentsummary</a>

<a href="http://emr-header-1:8088/ws/v1/cluster">http://emr-header-1:8088/ws/v1/cluster</a>

Response:
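The response body is not reproduced here. The Cluster Information API returns a clusterInfo object whose fields include id, startedOn, state, haState, resourceManagerVersion and hadoopVersion.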

<a href="http://emr-header-1:8088/ws/v1/cluster/scheduler">http://emr-header-1:8088/ws/v1/cluster/scheduler</a>

For detailed field descriptions, see:

<a href="https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/resourcemanagerrest.html#cluster_application_queue_api">https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/resourcemanagerrest.html#cluster_application_queue_api</a>

<a href="http://emr-header-1:8088/ws/v1/cluster/apps">http://emr-header-1:8088/ws/v1/cluster/apps</a>

To restrict the statistics to a fixed time window, append the parameters "?finishedTimeBegin={timestamp}&finishedTimeEnd={timestamp}" (epoch milliseconds), for example:

<a href="http://emr-header-1:8088/ws/v1/cluster/apps?finishedtimebegin=1496742124000&amp;finishedtimeend=1496742134000">http://emr-header-1:8088/ws/v1/cluster/apps?finishedtimebegin=1496742124000&amp;finishedtimeend=1496742134000</a>

The amount of data a job scans has to be obtained from the history server REST APIs, and the MapReduce and Spark APIs differ slightly.

<a href="http://emr-header-1:19888/ws/v1/history/mapreduce/jobs/job_1495123166259_0962/counters">http://emr-header-1:19888/ws/v1/history/mapreduce/jobs/job_1495123166259_0962/counters</a>

In the response, the BYTES_READ counter in the group org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter is the amount of data the job scanned.
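A sketch of pulling that counter out of the response, assuming the documented jobCounters layout (a counterGroup list whose entries hold a counter list with totalCounterValue); the job id is the one from the example URL:

# Minimal sketch: read BYTES_READ from the MapReduce history server counters API.
import requests

HISTORY = "http://emr-header-1:19888"
JOB_ID = "job_1495123166259_0962"
GROUP = "org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter"

url = "%s/ws/v1/history/mapreduce/jobs/%s/counters" % (HISTORY, JOB_ID)
job_counters = requests.get(url).json()["jobCounters"]

bytes_read = 0
for group in job_counters.get("counterGroup", []):
    if group.get("counterGroupName") == GROUP:
        for counter in group.get("counter", []):
            if counter.get("name") == "BYTES_READ":
                bytes_read = counter.get("totalCounterValue", 0)

print("job %s scanned %d bytes" % (JOB_ID, bytes_read))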

<a href="http://emr-header-1:18080/api/v1/applications/application_1495123166259_1050/executors">http://emr-header-1:18080/api/v1/applications/application_1495123166259_1050/executors</a>

The sum of totalInputBytes over all executors is the total amount of data scanned by the job.
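A sketch of that summation against the Spark history server, assuming the totalInputBytes field of ExecutorSummary from the Spark monitoring documentation and the application id from the example URL:

# Minimal sketch: total bytes read by a Spark application, summed over its executors.
# The executor list also contains the driver, which normally reports 0 input bytes.
import requests

SPARK_HS = "http://emr-header-1:18080"
APP_ID = "application_1495123166259_1050"

executors = requests.get("%s/api/v1/applications/%s/executors" % (SPARK_HS, APP_ID)).json()
total_input = sum(e.get("totalInputBytes", 0) for e in executors)

print("application %s scanned %d bytes" % (APP_ID, total_input))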