Why Is the Flink Job Abnormal Due to Heartbeat Timeout Between JobManager and TaskManager?

Symptom

The heartbeat between JobManager and TaskManager times out. As a result, the Flink job becomes abnormal.

Figure 1 Error information


Possible Causes

  1. The network is intermittently disconnected or the cluster load is high, so heartbeat messages are delayed or lost.
  2. Full GC occurs frequently, possibly because of a memory leak in the job code. A long Full GC pause can keep the process from sending or handling heartbeats in time.

    Figure 2 Full GC


Handling Procedure

  • If Full GC occurs frequently, check the job code for memory leaks.
  • Allocate more resources to each TaskManager.
  • Contact technical support to modify the cluster heartbeat configuration.
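
For reference, the last two items correspond to the following settings in open source Apache Flink. This is only a sketch under stated assumptions: the values are illustrative rather than recommendations, the TaskManager memory key shown applies to Flink 1.10 and later (earlier versions use a different key), and on this cluster the heartbeat settings should still be changed through technical support as described above.

```yaml
# flink-conf.yaml (illustrative values only, not tuned for any specific job)

# Total memory of each TaskManager process (Flink 1.10 and later).
# Assumption: 4096m is an example size; choose a value that fits the job and the node.
taskmanager.memory.process.size: 4096m

# Interval, in milliseconds, at which heartbeat requests are sent.
# 10000 is the open source default.
heartbeat.interval: 10000

# Time, in milliseconds, after which a missing heartbeat is treated as a timeout.
# The open source default is 50000; 120000 here is an assumed example of a larger value.
heartbeat.timeout: 120000
```

Raising heartbeat.timeout only hides long pauses. If Full GC remains frequent, the memory leak or undersized TaskManager still needs to be addressed.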