Troubleshooting Linux Time Synchronization with NTP
The Network Time Protocol, better known as NTP, helps computer systems synchronize their clocks. In the past, it was not a big issue if your system was a few minutes off. This changed with the interconnected world we now live in. A good example is a network relying on the Kerberos authentication protocol. If your system time is not correct, you may not be able to authenticate, because granted tickets carry timestamps as a built-in protection against replay attacks. While you may not be an attacker, the system will refuse to work when it sees requests that appear to come from the past or the future.
When your local clock is not correct, serious damage can follow. Database records and log files could carry incorrect timestamps, resulting in data loss at worst. For forensics, it may become very hard to reconstruct the steps that occurred during a security incident. So having your Linux systems happily synchronized is a must. Let’s have a look at how things work and how we can troubleshoot when they don’t.
History of Time
In the past we relied on the system itself to keep time. This was done with a hardware component named the real-time clock (RTC). But no device or component is 100% reliable, so your system time could slowly drift. If the clock ran a little too fast, your computer would be living in the future; if it ran too slow, it would be living in the past. Nowadays systems are connected to other networks, which makes it possible to synchronize our clocks to very precise ones. We call those atomic clocks. Instead of relying on ordinary electronic components, they measure time by the radiation of atoms. That time can then be distributed, for example via radio waves, so other systems can synchronize to it.
Linux and Time
Most Linux systems use one of the following options to synchronize time:
- No synchronization
- NTP daemon
- NTP client
- Other clients
The first option, “none”, is obvious: there is no software installed on the system to maintain the time. While this may sound like a guarantee of getting out of sync, that isn’t always the case. Virtualized systems, for example, may use the host system to get the right time. When such a system starts, it gets the time from the host and can maintain it correctly during uptime. There is a risk of “skewing” (getting out of sync) if the client system is not able to count the cycles correctly, e.g. when the CPU speed is adjusted. Another risk is that the host system does not always give each client the same amount of time per CPU cycle, resulting in small variations in counting.
The next option is an NTP daemon. On Linux this is typically a running process, or daemon, with the name ntpd. This process receives time from several trusted sources. When it knows with sufficient confidence what the time is, it instructs the kernel to use this new time, and usually synchronizes it with the hardware clock as well. This way the hardware clock, the Linux kernel, and the NTP daemon have the same understanding of the time. When the NTP daemon sees some skewing again, it adjusts the time again.
The process of adjusting the time usually happens in small steps. This way other software on the system doesn’t suddenly get confused. For example: it is now 4:43:52 PM and we log something to a file. Then our NTP daemon decides to set the clock 10 minutes back. Three minutes later we log another line to our file, which will suddenly be stamped 4:36:52 PM. Not only is this confusing in log files, it may also corrupt data in databases and in processes relying on network synchronization.
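The logging example above can be reproduced with a few lines of Python. This is only an illustrative sketch: the date is arbitrary, and find_backward_jumps is a hypothetical helper name, not part of any NTP tool. It shows how a single large step produces a log entry that appears to be older than the one before it.

```python
from datetime import datetime, timedelta


def find_backward_jumps(timestamps):
    """Return (previous, current) pairs where time appears to run backwards."""
    jumps = []
    for previous, current in zip(timestamps, timestamps[1:]):
        if current < previous:
            jumps.append((previous, current))
    return jumps


# First log entry at 4:43:52 PM (the date itself is arbitrary).
first_entry = datetime(2024, 1, 1, 16, 43, 52)
# The clock is set back 10 minutes, then 3 minutes pass before the next entry.
second_entry = first_entry - timedelta(minutes=10) + timedelta(minutes=3)

print(second_entry.strftime("%H:%M:%S"))  # 16:36:52, i.e. 4:36:52 PM
print(len(find_backward_jumps([first_entry, second_entry])))  # 1
```

A check like this could be run over parsed log timestamps to spot moments where the clock was stepped backwards.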
Besides ntpd, other NTP daemon implementations exist, such as openntpd (OpenBSD project).
A much simpler option is using an NTP client. It does a similar thing as an NTP daemon, except that it does not track the time from many sources. Instead, it requests the time from a trusted source and acts upon that information. Tools like ntpdate or rdate are used this way, scheduled by a cron job to regularly check the time and synchronize.
The last category is the other clients. This option might be used on virtualized systems. A toolkit like VMware Tools is then installed on the client, which does system housekeeping in the background. It exchanges data with the host system to ensure things are in sync, including the time.
As with most software, things can go wrong. Many of us rarely check if our time sources are properly configured and still work correctly. We just assume the time is correct and the system does the synchronization correctly, right? Especially when using an NTP daemon, things can go wrong. Its configuration needs to be set up correctly and checked regularly. If not, sooner or later the time will skew and end up a few minutes off.
The first category of NTP troubles is the so-called “false-ticker”. Just like our own system, a trusted time source can be incorrect. This can happen on purpose, through misconfiguration, or because of hardware issues. If we rely on such a source, our time will be wrong as well. If you are using the NTP daemon together with ntpq, these false-tickers can be recognized by an “x” in front of the entry.
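To make the idea of "requesting the time from a trusted source" concrete, here is a minimal sketch of a simple (SNTP-style) query in Python. It is not how ntpdate or rdate are implemented. The function names are made up for this example, and the default host name pool.ntp.org is only an assumption; use a source you actually trust. The sketch ignores network delay and does not set the clock.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request():
    """Build a 48-byte client request: leap=0, version=3, mode=3 (client)."""
    return b"\x1b" + 47 * b"\0"


def parse_transmit_time(packet):
    """Return the server's transmit timestamp as Unix seconds (float)."""
    if len(packet) < 48:
        raise ValueError("NTP packet too short")
    # The transmit timestamp occupies bytes 40-47: 32-bit seconds + 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_time(server="pool.ntp.org", timeout=5.0):
    """Ask a single server for the time (the host name is an example only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
    return parse_transmit_time(packet)
```

Calling query_time() and subtracting the result from time.time() gives a rough estimate of how far the local clock is off. Unlike a daemon, this trusts one answer from one source, which is exactly the trade-off described above.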
Another thing to check for is “stratum 16” entries. We refer to an atomic clock or other reference clock as stratum 0. Stratum 1 devices collect the time from a stratum 0 device, usually via radio waves (GPS, CDMA, etc). Our own systems are then usually at stratum 2 or 3. If an entry shows stratum 16, something is wrong: it is considered unsynchronized. This may occur when the source can’t be reached, which can be caused by something as simple as iptables filtering too much traffic.
The last category consists of sources which are unreliable. Because the NTP daemon receives time information from a configured set of systems, it checks them at regular intervals. It compares the data received from the sources and takes factors like distance and network delay into account. When it finds that a source provides unexpected results, the source is marked as unreliable. You can solve this by using different sources which are closer to you, or even internal. If it is already an internal network source, then something might be wrong with the device; in that case multiple systems will most likely mark the same source as unreliable. When using an NTP daemon (and ntpq), these items are marked with a minus (-).
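The three categories above (false-tickers, stratum 16 entries, and unreliable sources) can all be read from the peer table that ntpq prints. The following Python sketch shows one way to flag them. The function names are hypothetical, and the sample table is made up for illustration using documentation-only IP addresses; it assumes the usual column layout of `ntpq -pn`, where the first character of each row is the status marker.

```python
def parse_peers(ntpq_output):
    """Parse the peer table printed by `ntpq -pn` into a list of dicts."""
    peers = []
    for line in ntpq_output.splitlines():
        fields = line[1:].split()
        # Skip blank lines, the column header, and the "====" separator.
        if len(fields) < 10 or fields[0] == "remote":
            continue
        peers.append({
            "tally": line[0],        # status marker in the first column
            "remote": fields[0],
            "stratum": int(fields[2]),
        })
    return peers


def find_problems(peers):
    """Return (remote, reason) pairs for the three trouble categories."""
    problems = []
    for peer in peers:
        if peer["tally"] == "x":
            problems.append((peer["remote"], "false-ticker"))
        elif peer["tally"] == "-":
            problems.append((peer["remote"], "unreliable"))
        if peer["stratum"] >= 16:
            problems.append((peer["remote"], "stratum 16"))
    return problems


# Made-up sample output, for illustration only.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*192.0.2.10      .GPS.            1 u   33   64  377    1.204    0.153   0.087
+192.0.2.11      192.0.2.10       2 u   12   64  377    2.871   -0.412   0.131
x192.0.2.12      192.0.2.10       2 u   40   64  377    3.015  612.004   4.220
-192.0.2.13      198.51.100.7     3 u   51   64  377   48.339    9.870   2.405
 192.0.2.14      .INIT.          16 u    -   64    0    0.000    0.000   0.000
"""

for remote, reason in find_problems(parse_peers(SAMPLE)):
    print(remote, reason)
```

On a real system the input could come from running ntpq via subprocess instead of the sample string.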
Time out of sync
It is good to know that NTP daemons usually won’t synchronize in big steps, as previously described. If the time is too far off, the daemon may even stop functioning, which is on purpose. This is an indirect warning that the time should be corrected manually. The best way to handle this is to first stop all processes relying on time synchronization, then manually synchronize the time with a tool like ntpdate or rdate.
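How a daemon reacts depends on the size of the offset. As a rough sketch, assuming the default thresholds commonly documented for the reference ntpd (a step threshold of 0.128 seconds and a panic threshold of 1000 seconds), the decision could be modelled as below. Other daemons and custom configurations will differ, so treat the constants and the function name as assumptions to verify against your own setup.

```python
# Assumed ntpd defaults; verify against your daemon's documentation.
STEP_THRESHOLD = 0.128    # seconds: above this, the clock may be stepped
PANIC_THRESHOLD = 1000.0  # seconds: above this, the daemon gives up


def expected_reaction(offset_seconds):
    """Classify an offset as 'slew', 'step' or 'panic' (illustrative only)."""
    magnitude = abs(offset_seconds)
    if magnitude > PANIC_THRESHOLD:
        return "panic"  # daemon stops; the time must be corrected manually
    if magnitude > STEP_THRESHOLD:
        return "step"   # clock is set in one jump
    return "slew"       # clock is adjusted gradually


for offset in (0.05, -600, 7200):
    print(offset, expected_reaction(offset))
```

With these assumptions, an offset of two hours falls in the "panic" range, which corresponds to the situation where manual correction is needed.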
Discover Time Issues
So now we know it is important to track the time and keep it properly synchronized. Using the ntpq utility we can query the details of our time synchronization. In particular, we can see which sources are used and whether there are any issues.
The best way to discover time synchronization issues is by monitoring the output of ntpq when using an NTP daemon. If you are using an NTP client, it makes sense to compare your time to a trusted source and check that it does not differ too much (e.g. by more than a few seconds). You could add tests to your monitoring tool to validate your time configuration on a regular basis.
For those who already use our security auditing tool Lynis, you are covered when using an NTP daemon. Lynis will parse the output and inform you if any false-tickers or unreliable sources are used on your Linux system.
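A monitoring test for the client case could look like the following Python sketch. It compares a local timestamp with one obtained from a trusted source and returns a status. The thresholds, the function name, and the exit-code convention (0, 1, 2 in the style used by Nagios-like plugins) are assumptions chosen for illustration; adapt them to your monitoring tool.

```python
# Exit codes in the style of Nagios-like plugins (an assumption, adapt as needed).
OK, WARNING, CRITICAL = 0, 1, 2


def check_offset(local_time, reference_time, warn_after=2.0, crit_after=5.0):
    """Compare two Unix timestamps and return (exit_code, message)."""
    offset = local_time - reference_time
    if abs(offset) >= crit_after:
        return CRITICAL, "CRITICAL: clock is off by %+.3f s" % offset
    if abs(offset) >= warn_after:
        return WARNING, "WARNING: clock is off by %+.3f s" % offset
    return OK, "OK: clock is off by %+.3f s" % offset


# Example with fixed numbers: the local clock is 3 seconds ahead.
code, message = check_offset(1000.0, 997.0)
print(code, message)  # 1 WARNING: clock is off by +3.000 s
```

In practice local_time would come from time.time() and reference_time from a query to the trusted source, with the returned code passed to sys.exit().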