
How to troubleshoot Linux server memory issues

Some unexpected behavior on the server side can be caused by system resource limitations. By design, Linux aims to use all of the available physical memory as efficiently as possible; in practice, the kernel follows the basic rule that a page of free RAM is wasted RAM. The system holds a lot more in RAM than just application data, most importantly data mirrored from storage drives for faster access. This debugging guide explains how to identify how much of your resources are actually in use and how to recognize genuine resource shortages.

Process stopped unexpectedly

Suddenly killed tasks are often the result of the system running out of memory, which is when the so-called Out-of-memory (OOM) killer steps in. If a task is killed to save memory, it gets logged in the various log files stored under /var/log/.

You can search the logs for out-of-memory alerts with the following command.

sudo grep -i -r 'out of memory' /var/log/

Grep searches through all the logs under the directory and will therefore also match the command you just ran, as recorded in /var/log/auth.log. Actual log entries for OOM-killed processes look something like the following.

kernel: Out of memory: Kill process 9163 (mysqld) score 511 or sacrifice child

The log entry shows that the process killed was mysqld, with PID 9163 and an OOM score of 511 at the time it was killed. Your log messages may vary depending on the Linux distribution and system configuration.
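
On systems that use systemd, you can also query the kernel messages through the journal or the kernel ring buffer instead of grepping the log files. The commands below are a reasonable starting point on most current distributions; the -T flag for dmesg (human-readable timestamps) may be missing on older versions, in which case it can simply be dropped.

sudo journalctl -k | grep -i 'out of memory'
sudo dmesg -T | grep -i 'out of memory'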

If, for example, a process crucial to your web application was killed as a result of an out-of-memory situation, you have a couple of options: reduce the amount of memory requested by the process, disallow processes from overcommitting memory, or simply add more memory to your server configuration.

Current resource usage

Linux comes with a few handy tools for tracking processes that can help identify possible resource outages. The command below, for example, can track memory usage.

free -h

The command prints out the current memory statistics. For example, on a 1 GB system the output looks something like the example underneath.

                   total   used    free    shared  buffers cached
Mem:               993M    738M    255M    5.7M    64M     439M
-/+ buffers/cache: 234M    759M
Swap:              0B      0B      0B

It is important to distinguish between application-used memory, buffers, and caches here. On the Mem line of the output, it would appear that nearly 75% of our RAM is in use, but over half of the used memory is occupied by cached data.

The difference is that while applications reserve memory for their own use, the cache is simply commonly used hard drive data that the kernel keeps temporarily in RAM for faster access; at the application level, this space is counted as free memory.

Keeping that in mind, it’s easier to understand why used and free memory are listed twice. The second line shows the actual memory usage once the memory occupied by buffers and cache is taken into account.

In this example, the system uses only 234MB of the total available 993MB, and no process is in danger of being killed to save resources.
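
If you want more detail than free offers, the same counters can be read directly from /proc/meminfo. On reasonably recent kernels, the MemAvailable field gives a single estimate of how much memory could be handed to new applications without pushing the system into swap.

grep -E 'MemTotal|MemFree|MemAvailable|Buffers|^Cached' /proc/meminfo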

Another useful tool for memory monitoring is ‘top’, which displays continuously updated information about processes’ memory and CPU usage, runtime, and other statistics. It is particularly useful for identifying resource-intensive tasks.

top

You can scroll the list using the Page Up and Page Down keys on your keyboard. The program runs in the foreground until you quit by pressing ‘q’. Resource usage is shown in percentages, giving an easy overview of your system’s workload.

top - 17:33:10 up 6 days,  1:22,  2 users,  load average: 0.00, 0.01, 0.05
Tasks:  72 total,   2 running,  70 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   1017800 total,   722776 used,   295024 free,    66264 buffers
KiB Swap:        0 total,        0 used,        0 free.   484748 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
    1 root      20   0   33448   2784   1448 S  0.0  0.3   0:02.91 init
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S  0.0  0.0   0:00.02 ksoftirqd/0
    5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H
    6 root      20   0       0      0      0 S  0.0  0.0   0:01.92 kworker/u2:0
    7 root      20   0       0      0      0 S  0.0  0.0   0:05.48 rcu_sched

In the example output shown above, the system is idle, and the memory usage is nominal.
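
To focus on the heaviest memory consumers, top can also be sorted by memory usage. With the procps-ng version of top shipped by most current distributions, press Shift+M while the program is running, or start it with the sort column set on the command line.

top -o %MEM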

Check if your process is at risk

If your server’s memory is used up to the extent that it can threaten system stability, the Out-of-memory killer will choose which process to eliminate based on many variables, such as the amount of work done that would be lost and the total memory freed. Linux keeps a score for each running process, representing the likelihood of the process being killed in an OOM situation.

The score is stored in the file /proc/<pid>/oom_score, where pid is the identification number of the process you are looking into. The PID can easily be found using the following command.

ps aux | grep <process name>

The command output when searching for ‘mysql’, for example, would be similar to the example below.

mysql     5872  0.0  5.0 623912 51236 ?        Ssl  Jul16   2:42 /usr/sbin/mysqld

Here, the process ID is the first number on the row, 5872 in this case. It can then be used to obtain further information on this particular task.

cat /proc/5872/oom_score

The readout is a single numerical value representing the likelihood of the OOM killer axing the process. The higher the number, the more likely the task is to be chosen if an out-of-memory situation arises.
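
As a shortcut, the PID lookup and the score read can be combined into a single line. The example below assumes the pgrep utility (part of the procps package) is available; the -o flag selects the oldest matching process, which is usually the main daemon.

cat /proc/$(pgrep -o mysqld)/oom_score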

If an important process has a very high OOM score, it may be wasting memory and should be looked into. However, if the memory usage remains nominal, a high OOM score alone is no reason for concern. The OOM killer can be disabled, but this is not recommended, as it might cause unhandled exceptions in out-of-memory situations, possibly leading to a kernel panic or even a system halt.
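
A gentler alternative to disabling the OOM killer is to make a specific process less likely to be chosen by writing to its oom_score_adj file, which accepts values from -1000 to 1000. The sketch below reuses the example PID 5872 from above and an illustrative value of -500; a value of -1000 exempts the process entirely, so reserve that for genuinely critical services.

echo -500 | sudo tee /proc/5872/oom_score_adj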

Disable overcommit

In major Linux distributions, the kernel allows processes to request more memory than is currently free in the system by default to improve memory utilization. This is based on the heuristic that processes never truly use all the memory they request. However, if your system is at risk of running out of memory and you wish to prevent losing tasks to OOM killer, it is possible to disallow memory overcommit.

To change how the system handles overcommit calls, Linux has an application called ‘sysctl’ that is used to modify kernel parameters at runtime. You can list all sysctl-controlled parameters using the following.

sudo sysctl -a

The parameters that control overcommit behavior are imaginatively named vm.overcommit_memory and vm.overcommit_ratio. To change the overcommit mode, use the command below.

sudo sysctl -w vm.overcommit_memory=2

This parameter has 3 different values:

  • 0 means “Estimate if we have enough RAM”
  • 1 equals “Always allow”
  • 2, used here, tells the kernel to “Say no if the system doesn’t have the memory”

The important part of changing the overcommit mode is remembering to also change the overcommit_ratio. When overcommit_memory is set to 2, the committed address space is not permitted to exceed swap space plus this percentage of physical RAM. To be able to use all of the system’s memory, use the next command.

sudo sysctl -w vm.overcommit_ratio=100
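
You can check the resulting limits by reading the commit accounting counters from /proc/meminfo: CommitLimit is the total amount of memory the kernel is now willing to hand out, and Committed_AS is how much has already been promised to running processes.

grep -i commit /proc/meminfo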

These changes are applied immediately but will only persist until the next system reboot. To make them permanent, the same parameter values need to be added to the /etc/sysctl.conf file. Open the configuration file for editing.

sudo nano /etc/sysctl.conf

Add the same lines to the end of the file.

vm.overcommit_memory=2
vm.overcommit_ratio=100

Save the changes (Ctrl+O) and exit (Ctrl+X) the editor. Your server will read the configuration at every boot and prevent applications from overcommitting memory.
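
Should you edit the file again later, the values can also be reloaded from it without rebooting by pointing sysctl at the configuration file.

sudo sysctl -p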

Add more memory to your server

The safest and most future-proof option for solving out-of-memory issues is adding more memory to your system. In a traditional server environment, you would need to order new memory modules, wait for them to arrive, and install them into your system, but with cloud servers, all you have to do is increase the amount of RAM you wish to have available.
