Failed to View Spark Task Logs
Symptom
- Symptom 1: A user fails to view logs while a task is running.
- Symptom 2: A user fails to view logs after a task is complete.
Cause Analysis
- Symptom 1: The MapReduce component is abnormal.
- Symptom 2:
- The JobHistory service of Spark is abnormal.
- The log size is too large, and NodeManager times out during log aggregation.
- The permissions on the HDFS log storage directory (/tmp/logs/Username/logs by default) are incorrect.
- Logs have been deleted. By default, Spark JobHistory retains event logs for seven days (specified by spark.history.fs.cleaner.maxAge), and MapReduce JobHistory retains task logs for 15 days (specified by mapreduce.jobhistory.max-age-ms).
- If the task cannot be found on the Yarn web UI, it may have been cleared by Yarn. By default, Yarn retains 10,000 completed applications (specified by yarn.resourcemanager.max-completed-applications). See the configuration sketch after this list.
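The retention behavior described above is controlled by cluster configuration. The following is a minimal sketch, in Python, for reading the three retention parameters from client-side configuration files; the file paths are assumptions and must be replaced with the actual locations in your cluster, and any key that is not set falls back to the default described above.

    import xml.etree.ElementTree as ET

    # Assumed client-side configuration file locations; adjust to your cluster layout.
    SPARK_DEFAULTS = "/opt/client/Spark2x/spark/conf/spark-defaults.conf"
    MAPRED_SITE = "/opt/client/Yarn/config/mapred-site.xml"
    YARN_SITE = "/opt/client/Yarn/config/yarn-site.xml"

    def read_hadoop_xml(path, key):
        """Return the value of the named property from a Hadoop *-site.xml file, or None."""
        root = ET.parse(path).getroot()
        for prop in root.findall("property"):
            if prop.findtext("name") == key:
                return prop.findtext("value")
        return None

    def read_spark_conf(path, key):
        """Return the value of key from spark-defaults.conf (whitespace- or '='-separated), or None."""
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                parts = line.replace("=", " ").split(None, 1)
                if len(parts) == 2 and parts[0] == key:
                    return parts[1]
        return None

    for label, value in [
        ("spark.history.fs.cleaner.maxAge",
         read_spark_conf(SPARK_DEFAULTS, "spark.history.fs.cleaner.maxAge")),
        ("mapreduce.jobhistory.max-age-ms",
         read_hadoop_xml(MAPRED_SITE, "mapreduce.jobhistory.max-age-ms")),
        ("yarn.resourcemanager.max-completed-applications",
         read_hadoop_xml(YARN_SITE, "yarn.resourcemanager.max-completed-applications")),
    ]:
        print(label, "=", value if value is not None else "not set (default applies)")

If the configured values are smaller than expected, the missing logs may simply have been aged out rather than lost.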
Procedure
- Symptom 1: Check whether the MapReduce component is running properly. If it is abnormal, restart it. If the fault persists, check the JobHistoryServer log file in the background.
- Symptom 2: Perform the following checks in sequence (a command-line sketch follows this list):
- Check whether JobHistory of Spark is running properly.
- On the application details page of the Yarn web UI, check whether the log file is too large. If log aggregation has failed, Log Aggregation Status is displayed as Failed or Timeout.
- Check whether the permissions on the corresponding log directory are correct.
- Check whether a file for the corresponding application ID (appid) exists in the directory. In MRS 3.x or later, event log files are stored in the hdfs://hacluster/spark2xJobHistory2x directory; in versions earlier than MRS 3.x, they are stored in the hdfs://hacluster/sparkJobHistory directory. Task run logs are stored in the hdfs://hacluster/tmp/logs/Username/logs directory.
- Check whether the appid or the current job ID has been removed from the historical records because the number of retained completed applications exceeded the maximum.
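For Symptom 2, the checks above can be scripted from any node where the cluster client is installed. The sketch below assumes the standard hdfs and yarn CLI commands are available and that the user has a valid authentication session; the application ID, username, and directory paths are placeholders based on the defaults named above and must be replaced with real values.

    import subprocess

    APP_ID = "application_1234567890123_0001"               # hypothetical application ID
    USERNAME = "sparkuser"                                   # hypothetical submitting user
    EVENT_LOG_DIR = "hdfs://hacluster/spark2xJobHistory2x"   # MRS 3.x or later
    AGG_LOG_DIR = f"hdfs://hacluster/tmp/logs/{USERNAME}/logs"

    def run(cmd):
        """Run a client command and return (exit code, combined stdout/stderr)."""
        proc = subprocess.run(cmd, capture_output=True, text=True)
        return proc.returncode, proc.stdout + proc.stderr

    # 1. Application status; recent Hadoop versions also print Log Aggregation Status here.
    print(run(["yarn", "application", "-status", APP_ID])[1])

    # 2. Permissions on the user's aggregated-log directory.
    print(run(["hdfs", "dfs", "-ls", "-d", AGG_LOG_DIR])[1])

    # 3. Aggregated container logs for this application.
    code, out = run(["hdfs", "dfs", "-ls", f"{AGG_LOG_DIR}/{APP_ID}"])
    print(out if code == 0 else f"No aggregated logs found for {APP_ID}")

    # 4. Spark event log file for this application in the JobHistory directory.
    code, out = run(["hdfs", "dfs", "-ls", EVENT_LOG_DIR])
    matches = [line for line in out.splitlines() if APP_ID in line]
    print("\n".join(matches) if matches else f"No event log found for {APP_ID}")

If check 3 or 4 finds nothing and the application is older than the retention periods listed in Cause Analysis, the logs have most likely been cleaned up rather than lost.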
Parent topic: Using Spark