How to Determine a File Type on Linux

Finding Files and Understanding Their Content

You may encounter a file on your system with known contents or goal. Usually, the first thing we do is then use cat to show the contents, or execute it. While that makes sense, it may be dangerous to do. It might be a piece of malware, disrupt your screen output or even hang the terminal. Here is a better way to do it, using the file command. Great for forensics, malware analysis, intrusion detection, and normal day-to-day system administration.

The file command

Most systems will have the file command available. It is a nifty small tool which helps you quickly determine what the purpose of a file is. Besides just telling if it is binary code or data, it will include additional details. For binaries, it may share that it is an ELF binary, for 64 bits systems, how it is linked and if it depends on external function libraries.

How does file work?

Even veteran administrators might never have looked into the details of the file command, but taken its power for granted. The tool is pretty nifty, because it uses a staged set of tests, working towards a final answer. Depending on the outcome of each test it continues, till it finds useful details to share.

Stage 1: File system tests

The file command starts with determining if a file is a “simple” file. It can be a symbolic link to another file, or a directory. Yes, directories are files as well. To help with this, file uses the stat(2) system call, which is also a standalone utility.

Screenshot of stat utility showing file details

Regular file is shown by stat utility

From this output, we can see that the stat command does not reveal much. It is considered to be a regular file, which might hold any type of data. So time to go the next phase.

Stage 2: Magic discovery

When the file command knows the type of file we are dealing with, it can test more in-depth. This is done via a magic file, which represents many text strings, or character combinations. For example a file starting with PK, might be a compressed file.

Screenshot of file -l with magic strings

Output of file -l displaying magic strings

With this predefined list of strings and regular expressions, most file types can be discovered.

Stage 3: Text files

The last stage is determining if the file is a text file. If it didn’t find a match by using tips from the magic dataset, it will assume it is a normal file with text in it. To be sure, it will check the character set used (ASCII, UTF-8). Also if line breaks are used and what type, like applied line feed and carriage returns, which differ between files created in MS-DOS/Windows, Mac OS and Linux systems.

Common types of output are:

  • ASCII text
  • ASCII text, with very long lines
  • gzip compressed data, from Unix, last modified: <date>

File Command and Parameters

The file utility is very easy to use, as it actually does not require any parameter, except the file you want to analyze. While there are parameters available, most of them cover very specific cases. An example is changing the behavior of the tool, or the output itself.

  • brief (-b) – Do not show the file name
  • uncompress (-z) – Uncompress the data file for further inspection

See the man page for more specific use cases.

Automate security audits and know your risks
Lynis Enterprise screenshot to help with system hardening

This blog post is part of our Linux security series to get Linux and Unix-based systems more secure.

Is system hardening taking a lot of time for you? Don't know where to start? We solved that problem: Lynis Enterprise.

Leave a Reply

Your email address will not be published. Required fields are marked *