Why Is the Flink Job Abnormal Due to Heartbeat Timeout Between JobManager and TaskManager?
Symptom
The heartbeat between the JobManager and a TaskManager times out. As a result, the Flink job becomes abnormal.
Figure 1 Error information

Possible Causes
- The network is intermittently disconnected or the cluster load is high, so heartbeat messages are delayed or lost.
- Full GC occurs frequently, for example because of a memory leak in the job code. Long GC pauses keep the process from sending or answering heartbeats in time.
Figure 2 Full GC
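To check whether GC pressure is a plausible cause, the collector statistics of a JVM can be read through the standard `java.lang.management` API. The following is a minimal sketch, not part of Flink: the class name `GcPressureCheck` and the helper `gcTimeFraction` are hypothetical, and the code reports on the JVM it runs in, so it would have to be called from inside the TaskManager process (for example from user code) or adapted to a remote JMX connection.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints per-collector GC statistics for the current JVM.
final class GcPressureCheck {

    // Fraction of JVM uptime spent in GC; returns 0.0 if uptime is not positive.
    static double gcTimeFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long totalGcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            long time = gc.getCollectionTime();   // cumulative ms, -1 if not reported
            System.out.printf("%s: collections=%d, time=%d ms%n", gc.getName(), count, time);
            if (time > 0) {
                totalGcMillis += time;
            }
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC time since JVM start: %.2f%% of uptime%n",
                gcTimeFraction(totalGcMillis, uptime) * 100);
    }
}
```

Collector names depend on the GC algorithm in use. The old-generation collector (for example `PS MarkSweep` or `G1 Old Generation`) is the one whose count and time matter for Full GC. A count that keeps rising while the GC time fraction stays high suggests that memory is not being reclaimed.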
Handling Procedure
- Check whether the network is intermittently disconnected and whether the cluster load is high.
- If Full GC occurs frequently, check the job code for memory leaks.
- Allocate more resources, such as memory, to each TaskManager.
- Contact technical support to modify the cluster heartbeat configuration.
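As an illustration of the resource and heartbeat settings involved, the fragment below uses option names from open-source Flink (1.10 or later for `taskmanager.memory.process.size`). The values are assumptions for the sake of the example, not recommendations, and your platform may expose these options through job parameters instead of a configuration file.

```yaml
# flink-conf.yaml (illustrative values only)

# Total memory of each TaskManager process.
taskmanager.memory.process.size: 4g

# Slots in one TaskManager share its memory, so fewer slots leave more per slot.
taskmanager.numberOfTaskSlots: 2

# Heartbeat options, shown with the open-source Flink defaults in milliseconds.
# As stated in the procedure, change these only through technical support.
# heartbeat.interval: 10000
# heartbeat.timeout: 50000
```

Raising the heartbeat timeout only hides long pauses. If the underlying cause is a memory leak or an overloaded node, the job is likely to fail again later.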