
High CPU Usage on a Zero-Load RegionServer

Symptom

The CPU usage of a RegionServer is high even though no services are running on it.

Cause Analysis

  1. Run the top command to obtain the CPU usage of the RegionServer processes and note the IDs of the processes with high CPU usage.
  2. Obtain the CPU usage of the threads in these processes based on the RegionServer process IDs.

    Run top -H -p <PID>, replacing <PID> with the actual RegionServer process ID. As shown in the following output, the CPU usage of some threads reaches about 80% to 90%.

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    75706 omm 20 0 6879444 1.0g 25612 S 90.4 1.6 0:00.00 java
    75716 omm 20 0 6879444 1.0g 25612 S 90.4 1.6 0:04.74 java
    75720 omm 20 0 6879444 1.0g 25612 S 88.6 1.6 0:01.93 java
    75721 omm 20 0 6879444 1.0g 25612 S 86.8 1.6 0:01.99 java
    75722 omm 20 0 6879444 1.0g 25612 S 86.8 1.6 0:01.94 java
    75723 omm 20 0 6879444 1.0g 25612 S 86.8 1.6 0:01.96 java
    75724 omm 20 0 6879444 1.0g 25612 S 86.8 1.6 0:01.97 java
    75725 omm 20 0 6879444 1.0g 25612 S 81.5 1.6 0:02.06 java
    75726 omm 20 0 6879444 1.0g 25612 S 79.7 1.6 0:02.01 java
    75727 omm 20 0 6879444 1.0g 25612 S 79.7 1.6 0:01.95 java
    75728 omm 20 0 6879444 1.0g 25612 S 78.0 1.6 0:01.99 java
  3. Obtain the stack information of all threads in the RegionServer process:

    jstack 12345 > allstack.txt (replace 12345 with the actual RegionServer process ID)

  4. Convert the ID of a high-CPU thread into hexadecimal. For example, for thread 30648:

    printf "%x\n" 30648

    The output is 77b8, which is the hexadecimal TID.

  5. Search allstack.txt for the hexadecimal TID, which appears in the thread header as nid=0x77b8. The stack shows that the thread is performing a compaction.

  6. Repeat steps 4 and 5 for the other high-CPU threads. All of them turn out to be compaction threads.
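Steps 2 to 5 above can be sketched in Python when the check has to be repeated on several threads. This is a minimal, hypothetical helper: the function names are illustrative and do not come from any tool. It assumes the column layout of the top -H output shown above, with %CPU in the ninth column. It also assumes the JDK 8-style jstack thread header, in which the native thread ID appears as nid=0x<hex>. Newer JDKs may print this field differently, so check a real dump before relying on the lookup.

```python
from typing import List, Optional, Tuple


def high_cpu_threads(top_output: str, threshold: float = 80.0) -> List[Tuple[int, float]]:
    """Return (tid, %CPU) pairs from `top -H -p <PID>` text, busiest first."""
    threads = []
    for line in top_output.splitlines():
        fields = line.split()
        # Data rows have 12 columns and start with a numeric thread ID;
        # the header row ("PID USER ...") and top's summary lines are skipped.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        tid, cpu = int(fields[0]), float(fields[8])  # ninth column is %CPU
        if cpu >= threshold:
            threads.append((tid, cpu))
    return sorted(threads, key=lambda t: t[1], reverse=True)


def to_hex_tid(tid: int) -> str:
    # Same result as the shell command: printf "%x\n" <TID>
    return format(tid, "x")


def find_thread_stack(jstack_output: str, tid: int, max_lines: int = 30) -> Optional[str]:
    """Return the jstack entry whose header contains nid=0x<hex TID>, or None.

    Collects at most max_lines lines after the header, stopping at the
    blank line that ends the thread's entry.
    """
    marker = "nid=0x" + to_hex_tid(tid)
    lines = jstack_output.splitlines()
    for i, line in enumerate(lines):
        # Whole-token match, so nid=0x77b8 does not also match nid=0x77b80.
        if marker in line.split():
            stack = [line]
            for follow in lines[i + 1 : i + 1 + max_lines]:
                if not follow.strip():
                    break
                stack.append(follow)
            return "\n".join(stack)
    return None


if __name__ == "__main__":
    # Rows copied from the top -H output above.
    sample = """PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
75706 omm 20 0 6879444 1.0g 25612 S 90.4 1.6 0:00.00 java
75725 omm 20 0 6879444 1.0g 25612 S 81.5 1.6 0:02.06 java
75728 omm 20 0 6879444 1.0g 25612 S 78.0 1.6 0:01.99 java"""
    for tid, cpu in high_cpu_threads(sample):
        print(tid, to_hex_tid(tid), cpu)
```

Stopping at the blank line, instead of always taking a fixed 30 lines, keeps the result limited to one thread's entry.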

Solution

This is normal behavior.

The threads that consume the most CPU are compaction threads. Some of them are running the Snappy compression algorithm, and others are reading and writing HDFS data. Each region holds a large amount of data spread across many data files, all compressed with Snappy. Compaction reads these files, decompresses and merges them, and writes new compressed files, so it consumes a large amount of CPU even when the RegionServer is serving no requests.

Fault Locating Methods

  1. Run the top command to check the process with high CPU usage.
  2. Check the threads with high CPU usage in the process.

    Run top -H -p <PID> to display the CPU usage of each thread in the process.

    Identify the thread with the highest CPU usage from the output. Alternatively, run ps -mp <PID> -o THREAD,tid,time | sort -k2 -rn, which sorts the threads numerically by the %CPU column.

    View the command output to obtain the ID of the thread with the highest CPU usage.

  3. Obtain the stack of the faulty thread.

    jstack is one of the most effective and reliable tools for locating problems in Java processes. It is located in the java/bin directory of the JDK.

    Run the following command to write the stacks of all threads in the process to a local file:

    jstack <PID> > allstack.txt

  4. Convert the thread ID into hexadecimal:

    printf "%x\n" <thread ID>

    The output is the hexadecimal TID.

  5. Run the following command to extract the stack of that thread and write it to a local file:

    jstack <PID> | grep <TID> -A 30 > Onestack.txt

    To view the stack on the command line only, without saving it to a file, run the following command:

    jstack <PID> | grep <TID> -A 30

    -A 30 prints the 30 lines that follow each matching line, which covers the top of the thread's stack.
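The ps route in step 2 and the hexadecimal conversion in step 4 can be combined in a short script. This is a minimal sketch with hypothetical function names. It assumes that procps ps prints a header containing %CPU and TID columns for -o THREAD,tid,time. To avoid depending on exact column positions, it locates both columns by header name. It sorts numerically on %CPU and skips the per-process summary row, whose TID column is "-".

```python
import subprocess
from typing import List, Tuple


def parse_ps_threads(ps_output: str) -> List[Tuple[int, float]]:
    """Return (tid, %CPU) pairs from `ps -mp <PID> -o THREAD,tid,time`, busiest first."""
    lines = [line for line in ps_output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    cpu_col, tid_col = header.index("%CPU"), header.index("TID")
    threads = []
    for line in lines[1:]:
        fields = line.split()
        # The first data row summarizes the whole process; its TID column is "-".
        if len(fields) <= max(cpu_col, tid_col) or not fields[tid_col].isdigit():
            continue
        threads.append((int(fields[tid_col]), float(fields[cpu_col])))
    # Numeric sort on %CPU, the same intent as `sort -k2 -rn` in the shell pipeline.
    return sorted(threads, key=lambda t: t[1], reverse=True)


def hottest_thread(pid: int) -> Tuple[int, str]:
    """Run ps against a live process and return (tid, hex tid) of its busiest thread."""
    out = subprocess.run(
        ["ps", "-mp", str(pid), "-o", "THREAD,tid,time"],
        capture_output=True, text=True, check=True,
    ).stdout
    threads = parse_ps_threads(out)
    if not threads:
        raise RuntimeError(f"no threads found for PID {pid}")
    tid, _ = threads[0]
    return tid, format(tid, "x")  # same as: printf "%x\n" <TID>
```

The hexadecimal value returned by hottest_thread is what you pass to grep in step 5. ps reports %CPU averaged over each thread's lifetime, while top shows recent usage, so the two methods can rank threads differently. top is the better choice for a current spike.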