
maxSingleShuffleLimit should be less than mergeThresholdmaxSingleShuffleLimit

After changing the Tez configuration through Ambari, the Tez shuffle phase failed on every run.

TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : attempt_1573557434627_0062_1_01_000001_3:java.lang.RuntimeException: Invlaid configuration: maxSingleShuffleLimit should be less than mergeThresholdmaxSingleShuffleLimit: 807013504, mergeThreshold: 665786176
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.<init>(MergeManager.java:294)
	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle.<init>(Shuffle.java:156)
	at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.createShuffle(OrderedGroupedKVInput.java:151)
	at org.apache.tez.runtime.library.input.OrderedGroupedKVInput.start(OrderedGroupedKVInput.java:131)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable._callInternal(LogicalIOProcessorRuntimeTask.java:512)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:501)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$StartInputCallable.callInternal(LogicalIOProcessorRuntimeTask.java:487)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:1, Vertex vertex_1573557434627_0062_1_01 [Reducer 2] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=Map 1, vertexId=vertex_1573557434627_0062_1_00, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:0, Vertex vertex_1573557434627_0062_1_00 [Map 1] killed/failed due to:OTHER_VERTEX_FAILURE]
Vertex killed, vertexName=Reducer 3, vertexId=vertex_1573557434627_0062_1_02, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1573557434627_0062_1_02 [Reducer 3] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:2, counters=Counters: 55, org.apache.tez.common.counters.DAGCounter, NUM_FAILED_TASKS=7, NUM_KILLED_TASKS=1, NUM_SUCCEEDED_TASKS=15, TOTAL_LAUNCHED_TASKS=22, DATA_LOCAL_TASKS=12, RACK_LOCAL_TASKS=3, AM_CPU_MILLISECONDS=13050, AM_GC_TIME_MILLIS=56, File System Counters, FILE_BYTES_READ=16080, FILE_BYTES_WRITTEN=63685, HDFS_BYTES_READ=2267964, HDFS_READ_OPS=30, HDFS_OP_OPEN=30, org.apache.tez.common.counters.TaskCounter, SPILLED_RECORDS=4564, GC_TIME_MILLIS=1647, TASK_DURATION_MILLIS=28468, CPU_MILLISECONDS=51110, PHYSICAL_MEMORY_BYTES=19831717888, VIRTUAL_MEMORY_BYTES=64829505536, COMMITTED_HEAP_BYTES=19831717888, INPUT_RECORDS_PROCESSED=4206, INPUT_SPLIT_LENGTH_BYTES=562522068, OUTPUT_RECORDS=4564, OUTPUT_LARGE_RECORDS=0, OUTPUT_BYTES=50277, OUTPUT_BYTES_WITH_OVERHEAD=61385, OUTPUT_BYTES_PHYSICAL=55645, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILL_COUNT=0, SHUFFLE_CHUNK_COUNT=15, HIVE, DESERIALIZE_ERRORS=0, RECORDS_IN_Map_1=4202617, RECORDS_OUT_INTERMEDIATE_Map_1=6993, RECORDS_OUT_OPERATOR_GBY_10=4564, RECORDS_OUT_OPERATOR_MAP_0=0, RECORDS_OUT_OPERATOR_RS_11=6993, RECORDS_OUT_OPERATOR_SEL_9=4202617, RECORDS_OUT_OPERATOR_TS_0=4202617, TaskCounter_Map_1_INPUT_ip_rubik_u_result, INPUT_RECORDS_PROCESSED=4206, INPUT_SPLIT_LENGTH_BYTES=562522068, TaskCounter_Map_1_OUTPUT_Reducer_2, ADDITIONAL_SPILLS_BYTES_READ=0, ADDITIONAL_SPILLS_BYTES_WRITTEN=0, ADDITIONAL_SPILL_COUNT=0, OUTPUT_BYTES=50277, OUTPUT_BYTES_PHYSICAL=55645, OUTPUT_BYTES_WITH_OVERHEAD=61385, OUTPUT_LARGE_RECORDS=0, OUTPUT_RECORDS=4564, SHUFFLE_CHUNK_COUNT=15, SPILLED_RECORDS=4564, org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters, GROUPED_INPUT_SPLITS_Map_1=15, INPUT_DIRECTORIES_Map_1=1, INPUT_FILES_Map_1=15, RAW_INPUT_SPLITS_Map_1=15
2019-11-13 09:24:59,703 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: Ignoring multiple aborts for vertex: vertex_1573557434627_0062_1_01 [Reducer 2]
2019-11-13 09:24:59,703 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: Ignoring multiple aborts for vertex: vertex_1573557434627_0062_1_02 [Reducer 3]
2019-11-13 09:24:59,704 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|: Ignoring multiple aborts for vertex: vertex_1573557434627_0062_1_00 [Map 1]
2019-11-13 09:24:59,705 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: DAG: dag_1573557434627_0062_1 finished with state: FAILED
2019-11-13 09:24:59,705 [INFO] [Dispatcher thread {Central}] |impl.DAGImpl|: dag_1573557434627_0062_1 transitioned from TERMINATING to FAILED due to event DAG_VERTEX_COMPLETED
2019-11-13 09:24:59,706 [INFO] [Dispatcher thread {Central}] |container.AMContainerImpl|: Container container_e70_1573557434627_0062_01_000012 exited with diagnostics set to Container failed, exitCode=-105. [2019-11-13 09:24:59.533]Container killed by the ApplicationMaster.
[2019-11-13 09:24:59.554]Container killed on request. Exit code is 143
[2019-11-13 09:24:59.564]Container exited with a non-zero exit code 143. 

           

Changing the settings


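Setting tez.runtime.shuffle.fetch.buffer.percent to a value smaller than tez.runtime.shuffle.memory.limit.percent is enough to fix it. The error is raised because the memory allowed for a single shuffle input (maxSingleShuffleLimit: 807013504 in the log above) is larger than the shuffle merge threshold (mergeThreshold: 665786176).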
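The condition named in the exception can be checked by hand. The sketch below is only an illustration, not Tez code: the function names are hypothetical, and it assumes that both limits are fractions of one shared shuffle buffer. As far as I understand the Tez source (an assumption worth verifying for your version), the single-input limit is scaled by tez.runtime.shuffle.memory.limit.percent and the merge threshold by tez.runtime.shuffle.merge.percent, so the latter may be worth checking as well.

```python
def shuffle_limits(shuffle_buffer_bytes, single_limit_fraction, merge_fraction):
    """Derive both limits from one shuffle buffer size (assumed model, not Tez code)."""
    max_single_shuffle_limit = int(shuffle_buffer_bytes * single_limit_fraction)
    merge_threshold = int(shuffle_buffer_bytes * merge_fraction)
    return max_single_shuffle_limit, merge_threshold


def config_is_valid(max_single_shuffle_limit, merge_threshold):
    """Condition from the exception: the single limit must be below the threshold."""
    return max_single_shuffle_limit < merge_threshold


# Values copied from the exception message in the log.
logged_single, logged_merge = 807013504, 665786176
print(config_is_valid(logged_single, logged_merge))  # False, 807013504 > 665786176

# Hypothetical fractions on a hypothetical 1000-byte buffer.
single, merge = shuffle_limits(1000, 0.25, 0.75)
print(single, merge, config_is_valid(single, merge))
```

Under this model the buffer size cancels out, so the check passes whenever the single-input fraction is smaller than the merge fraction.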

For reference, here is the Chinese-language documentation for Tez's TezRuntimeConfiguration settings:

https://s0tez0apache0org.icopy.site/releases/0.8.4/tez-runtime-library-javadocs/configs/TezRuntimeConfiguration.html
