Linux Security Principle: Containment of Failure
Everyone who used Windows 95 or 98 in the past is familiar with the concept of failure. One crashing application was enough to bring the whole system to a halt. Fortunately, Linux systems are built on a strong foundation, including privilege separation and memory management. When things go wrong, the impact is reduced to a minimum. This is called containment.
Linux Memory Management
Memory is like the storage capacity of your brain. Every bit should be stored properly, or strange things will happen. Linux systems have powerful memory management to ensure that data is properly stored and the right permissions are assigned. For example, an ELF binary, the most common binary format on Linux, has separate sections for executable code and data. On top of that, each section gets different permissions in memory. Code, for example, can be marked as read-only, to prevent it from being overwritten by itself or another process. The sketch below shows this principle at the level of a single memory page.
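To make this concrete, here is a minimal sketch (not how the loader itself works, just the same underlying mechanism) that uses the mmap(2) and mprotect(2) system calls to flip one page from writable to read-only:

```c
/* Minimal sketch of per-region memory permissions.
 * Compile with: cc -o protect protect.c */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);

    /* Ask the kernel for one page, initially readable and writable. */
    char *region = mmap(NULL, page, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    strcpy(region, "hello");   /* allowed: region is writable */

    /* Flip the page to read-only, similar to how the loader
     * maps the .text section of an ELF binary. */
    if (mprotect(region, page, PROT_READ) != 0) {
        perror("mprotect");
        return 1;
    }

    printf("still readable: %s\n", region);

    /* Uncommenting the next line would now trigger SIGSEGV;
     * the kernel contains the fault to this process only. */
    /* region[0] = 'H'; */

    munmap(region, page);
    return 0;
}
```

Uncommenting the final write is an easy way to see containment in action: only this process is killed, and the rest of the system keeps running.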
As you can imagine, memory management is an important area of the Linux kernel. A single implementation mistake is the difference between a stable system and one that crashes for no apparent reason.
Privilege Separation
One of the primary reasons that Linux systems are stable is the clear separation of privileges. We have already seen it in action in Linux memory management, where different structures are kept apart. The same idea goes much further on other levels of the system, including which functions an executable is allowed to perform (e.g. via Linux capabilities).
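As a small illustration, the following sketch uses libcap to drop a single capability. CAP_NET_RAW is chosen purely as an example; after the call, the process can no longer open raw sockets, even if it started with that privilege:

```c
/* Minimal sketch of dropping one capability with libcap.
 * Compile with: cc -o dropcap dropcap.c -lcap */
#include <stdio.h>
#include <sys/capability.h>

int main(void)
{
    cap_t caps = cap_get_proc();           /* current capability sets */
    cap_value_t drop[] = { CAP_NET_RAW };  /* example: raw socket access */

    if (caps == NULL) {
        perror("cap_get_proc");
        return 1;
    }

    /* Clear CAP_NET_RAW from both the effective and permitted sets,
     * so this process can never use or regain it. */
    if (cap_set_flag(caps, CAP_EFFECTIVE, 1, drop, CAP_CLEAR) != 0 ||
        cap_set_flag(caps, CAP_PERMITTED, 1, drop, CAP_CLEAR) != 0 ||
        cap_set_proc(caps) != 0) {
        perror("dropping CAP_NET_RAW failed");
        cap_free(caps);
        return 1;
    }

    cap_free(caps);
    printf("CAP_NET_RAW dropped; raw sockets are off-limits now\n");
    return 0;
}
```

Dropping what you do not need, as early as possible, is privilege separation in its purest form: a later compromise of the process simply has less to work with.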
Build for Impact Reduction
When you are building systems, there is a valuable lesson to learn from the containment features of Linux. Every system should be built in such a way that, when the inevitable crash occurs, the impact on the full environment is limited. This containment of failure can be achieved by a clear separation of functions. If one function goes down, it should only affect that function. Where possible, indirect damage should be limited or avoided, as the sketch below illustrates at the process level.
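A classic way to apply this on a single host is to run risky work in a separate process. The sketch below is a toy example: risky_work() is a hypothetical function that crashes, and the parent survives because the kernel contains the fault to the child:

```c
/* Minimal sketch of containment through process separation.
 * Compile with: cc -o contain contain.c */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker that crashes. */
static void risky_work(void)
{
    char *p = NULL;
    *p = 'x';   /* deliberate NULL dereference -> SIGSEGV */
}

int main(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {            /* child: do the dangerous part */
        risky_work();
        _exit(0);
    }

    int status;
    waitpid(pid, &status, 0); /* parent: observe, not share, the fate */
    if (WIFSIGNALED(status))
        printf("worker died with signal %d; service still up\n",
               WTERMSIG(status));
    return 0;
}
```

The same principle scales up: what fork() does for two functions on one machine, separate services and machines do for a full environment.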
Never Fail A Little Bit
Systems will fail. Linux systems, while stable from the foundation up, can fail as well. The worst outcome is a system that only half provides its services: it is not down, but not really up either. When you design your web server cluster, ensure that the load is properly shared among the nodes. Complete it with the right amount of monitoring, so a node never gets stuck in “half” operation. This can happen when a node is overloaded, yet the load balancer thinks it still has resources left. It is better to fail completely than just a little bit.
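One simple building block for this is a health check that fails hard. The sketch below is a hypothetical example: it reads the 1-minute load average from /proc/loadavg and exits non-zero above an arbitrary threshold of 8.0, so a load balancer polling it would take the node out of rotation instead of letting it limp along:

```c
/* Minimal sketch of a hard-failing health check.
 * Compile with: cc -o healthcheck healthcheck.c */
#include <stdio.h>

int main(void)
{
    double load1;
    FILE *fp = fopen("/proc/loadavg", "r");

    /* If we cannot even measure, fail hard rather than guess. */
    if (fp == NULL || fscanf(fp, "%lf", &load1) != 1)
        return 1;
    fclose(fp);

    if (load1 > 8.0) {          /* arbitrary example threshold */
        fprintf(stderr, "unhealthy: load %.2f\n", load1);
        return 1;               /* take me out of rotation */
    }

    printf("healthy: load %.2f\n", load1);
    return 0;
}
```

Note the design choice: an unreadable /proc/loadavg also reports unhealthy. A check that shrugs and says “probably fine” is exactly the half-failure we want to avoid.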
Conclusion
Everyone wants a stable system. Stability is the sum of many factors combined, like privilege separation, proper memory management, and containment. To achieve a stable operating system, and a stable environment around it, these factors all need to be in balance and correctly implemented. In upcoming blog posts, we will have a look at the more technical aspects.