How to Determine a File Type on Linux
Finding Files and Understanding Their Content
You may encounter a file on your system with known contents or goal. Usually, the first thing we do is then use cat to show the contents, or execute it. While that makes sense, it may be dangerous to do. It might be a piece of malware, disrupt your screen output or even hang the terminal. Here is a better way to do it, using the file command. Great for forensics, malware analysis, intrusion detection, and normal day-to-day system administration.
The file command
Most systems will have the file command available. It is a nifty small tool which helps you quickly determine what the purpose of a file is. Besides just telling if it is binary code or data, it will include additional details. For binaries, it may share that it is an ELF binary, for 64 bits systems, how it is linked and if it depends on external function libraries.
How does file work?
Even veteran administrators might never have looked into the details of the file command, but taken its power for granted. The tool is pretty nifty, because it uses a staged set of tests, working towards a final answer. Depending on the outcome of each test it continues, till it finds useful details to share.
Stage 1: File system tests
The file command starts with determining if a file is a “simple” file. It can be a symbolic link to another file, or a directory. Yes, directories are files as well. To help with this, file uses the stat(2) system call, which is also a standalone utility.
From this output, we can see that the stat command does not reveal much. It is considered to be a regular file, which might hold any type of data. So time to go the next phase.
Stage 2: Magic discovery
When the file command knows the type of file we are dealing with, it can test more in-depth. This is done via a magic file, which represents many text strings, or character combinations. For example a file starting with PK, might be a compressed file.
With this predefined list of strings and regular expressions, most file types can be discovered.
Stage 3: Text files
The last stage is determining if the file is a text file. If it didn’t find a match by using tips from the magic dataset, it will assume it is a normal file with text in it. To be sure, it will check the character set used (ASCII, UTF-8). Also if line breaks are used and what type, like applied line feed and carriage returns, which differ between files created in MS-DOS/Windows, Mac OS and Linux systems.
Common types of output are:
- ASCII text
- ASCII text, with very long lines
- gzip compressed data, from Unix, last modified: <date>
File Command and Parameters
The file utility is very easy to use, as it actually does not require any parameter, except the file you want to analyze. While there are parameters available, most of them cover very specific cases. An example is changing the behavior of the tool, or the output itself.
- brief (-b) – Do not show the file name
- uncompress (-z) – Uncompress the data file for further inspection
See the man page for more specific use cases.