How to see the file type?

Did you come across a file, but don’t know what type it is? Let’s learn how to analyze it.

The unknown file

You may encounter a file on your system with known contents or goal. Usually, the first thing we do is then use cat to show the contents, or execute it. While that makes sense, it may be dangerous to do. It might be a piece of malware, disrupt your screen output or even hang the terminal. Here is a better way to do it, using the file command. Great for forensics, malware analysis, intrusion detection, and normal day-to-day system administration.

The file command

Most systems will have the file command available. It is a nifty small tool which helps you quickly determine what the purpose of a file is. Besides just telling if it is binary code or data, it will include additional details. For binaries, it may share that it is an ELF binary, for 64 bits systems, how it is linked and if it depends on external function libraries.

How does file work?

Even veteran administrators might never have looked into the details of the file command, but taken its power for granted. The tool is pretty nifty, because it uses a staged set of tests, working towards a final answer. Depending on the outcome of each test it continues, till it finds useful details to share.

Stage 1: File system tests

The file command starts with determining if a file is a “simple” file. It can be a symbolic link to another file, or a directory. Yes, directories are files as well. To help with this, file uses the stat(2) system call, which is also a standalone utility.

From this output, we can see that the stat command does not reveal much. It is considered to be a regular file, which might hold any type of data. So time to go the next phase.

Stage 2: Magic discovery

When the file command knows the type of file we are dealing with, it can test more in-depth. This is done via a magic file, which represents many text strings, or character combinations. For example, a file starting with PK might be a compressed file.

With this predefined list of strings and regular expressions, most file types can be discovered.

Stage 3: Text files

The last stage is determining if the file is a text file. If it didn’t find a match by using tips from the magic dataset, it will assume it is a normal file with text in it. To be sure, it will check the character set used (ASCII, UTF-8). Also if line breaks are used and what type, like applied line feed and carriage returns, which differ between files created in MS-DOS/Windows, Mac OS and Linux systems.

Common types of output are:

  • ASCII text
  • ASCII text, with very long lines
  • gzip compressed data, from Unix, last modified: 

File Command and Parameters

The file utility is very easy to use, as it actually does not require any parameter, except the file you want to analyze. While there are parameters available, most of them cover very specific cases. An example is changing the behavior of the tool, or the output itself.

  • brief (-b) - Do not show the file name
  • uncompress (-z) - Uncompress the data file for further inspection

See the man page for more specific use cases.

Feedback

Small picture of Michael Boelen

This article has been written by our Linux security expert Michael Boelen. With focus on creating high-quality articles and relevant examples, he wants to improve the field of Linux security. No more web full of copy-pasted blog posts.

Discovered outdated information or have a question? Share your thoughts. Thanks for your contribution!

Mastodon icon