Opening files with the right program: are we really stuck with the same brain-dead extension system that Windows uses? This topic was suggested to our team by an anonymous visitor to our website, and in keeping with our ‘You suggest, We publish’ approach, here is an article that explains how important file extensions are in Linux.
The concept of file extensions in Linux
A file name extension is a suffix, separated from the rest of the file name by a dot, that indicates the file’s content or usage. For example, the .txt extension indicates a text file, and in the same way the .html extension indicates an HTML file.
File extensions make life simpler, but they also introduce a weakness, because they are easy to misuse: a malicious program, for instance, can be given a harmless-looking extension. Also, if the extension is accidentally deleted, a system that relies only on extensions has no way of knowing which application is needed to read the file.
File Extensions in Linux
Linux itself generally does not rely on file extensions. For example, a text file in Linux can be given any of the following names:
text
text.txt
text.text
anytext.lin.txt
Instead, the type of a file is determined from its content, typically from a magic number (discussed later). For example, shell scripts do not need any extension: the #! line at the beginning of the script tells the system which interpreter should execute it. Other file types carry their own magic numbers in the same way.
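To make the #! mechanism concrete, here is a minimal Python sketch that reads the first line of a file and reports the interpreter it names, whatever the file happens to be called. The function name shebang_interpreter is hypothetical, and the sketch is simplified: it returns only the first word after #! and ignores the optional argument that may follow the interpreter path.

```python
import os
import tempfile


def shebang_interpreter(path):
    """Return the interpreter named on a file's '#!' line, or None.

    Simplified: only the first word after '#!' is returned; any optional
    argument that follows it is ignored.
    """
    with open(path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        return None
    words = first_line[2:].split()
    return words[0].decode() if words else None


if __name__ == "__main__":
    # A script saved under a name with no extension at all.
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write("#!/bin/sh\necho hello\n")
    print(shebang_interpreter(tmp.name))  # prints /bin/sh
    os.remove(tmp.name)
```

Note that the temporary file has no extension; only its first line is consulted.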
However, some applications do require a particular extension, but that is a limitation of the application, not of the operating system.
The gcc compiler is one example:
$ gcc f1.txt -o f1
f1.txt: file not recognized: File format not recognized
collect2: ld returned 1 exit status
$ mv f1.txt f1.c
$ gcc f1.c -o f1
$
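gcc decides how to process each input file from its extension (.c, .cpp, and so on). A file with an unrecognized extension such as .txt is passed straight to the linker, which rejects it. (gcc’s -x option can be used to state the language explicitly instead.)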
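This behaviour can be modelled as a lookup from suffix to language. The sketch below assumes a small, hypothetical subset of gcc’s suffix table and a made-up function name, input_language; gcc’s real rules cover many more suffixes.

```python
from pathlib import PurePath

# Hypothetical, much-reduced version of the suffix rules gcc applies.
# Suffixes are case-sensitive, as in gcc (.s and .S differ).
SUFFIX_LANGUAGES = {
    ".c": "c",
    ".cc": "c++",
    ".cpp": "c++",
    ".cxx": "c++",
    ".s": "assembler",
    ".S": "assembler-with-cpp",
}


def input_language(filename, forced_language=None):
    """Decide how a gcc-like driver would treat an input file.

    forced_language plays the role of gcc's -x option: when given, the
    file name is not consulted at all.
    """
    if forced_language is not None:
        return forced_language
    suffix = PurePath(filename).suffix
    # Unrecognised suffixes are handed to the linker as object files.
    return SUFFIX_LANGUAGES.get(suffix, "linker input")


print(input_language("f1.txt"))       # linker input
print(input_language("f1.c"))         # c
print(input_language("f1.txt", "c"))  # c, like `gcc -x c f1.txt`
```

The decision is made purely from the name; the file’s contents are never examined, which is why renaming f1.txt to f1.c was enough.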
Even where extensions are not needed, they are frequently used so that users can tell a file’s type at a glance. You have probably seen shell scripts named with .sh or Perl scripts with .pl; the system does not need these extensions, which are there only for the user’s sake.
A magic number is a short sequence of bytes, usually at the very beginning of a file, that identifies the file’s format. Not every file has one; plain text files, for example, are recognized by other tests. The Linux command-line utility file, which displays information about a file, relies on magic numbers. It uses the libmagic library, which reads a file’s leading bytes and matches them against the magic database, /usr/share/file/magic. (The location of this file may vary between Linux distributions.)
The number of bytes that make up the magic number varies from one file type to another. To see this in action, let’s look at how the file utility works:
$ file image
image: PC bitmap, Windows 3.x format, 1180 x 622 x 24
Note that the file is named image, with no extension, yet file still recognized it as a bitmap.
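A rough Python sketch of magic-number matching, using a few well-known signatures of different lengths (8 bytes for PNG, 5 for PDF, 4 for ELF, 2 for gzip and BMP). The table and function names are illustrative assumptions; libmagic’s database is far larger and checks many more fields, so a bare 2-byte test such as BM is much weaker than what file actually does.

```python
# A hand-picked subset of well-known signatures: (offset, bytes, description).
# Real magic databases contain thousands of entries and test further fields.
MAGIC_SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image data"),  # 8 bytes
    (0, b"%PDF-", "PDF document"),                 # 5 bytes
    (0, b"\x7fELF", "ELF"),                        # 4 bytes
    (0, b"\x1f\x8b", "gzip compressed data"),      # 2 bytes
    (0, b"BM", "PC bitmap"),                       # 2 bytes
]


def sniff(data):
    """Return a description for the first signature found in data."""
    for offset, signature, description in MAGIC_SIGNATURES:
        if data[offset:offset + len(signature)] == signature:
            return description
    return "unknown"


def sniff_file(path):
    # Read only as many leading bytes as the longest signature needs.
    needed = max(offset + len(sig) for offset, sig, _ in MAGIC_SIGNATURES)
    with open(path, "rb") as f:
        return sniff(f.read(needed))
```

Calling sniff_file("image") on the bitmap from the transcript would report "PC bitmap" because the file starts with the two bytes BM; the name plays no part.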
Here is another example:
$ file f1
f1: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped
So the file utility detected that f1 is a 32-bit ELF executable for the Intel 80386 architecture.
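The details file printed for f1 come from the ELF header. Below is a hedged sketch that decodes a few of those fields following the ELF specification’s layout: the class (32- or 64-bit) and byte order in e_ident, then e_type and e_machine as 16-bit values at offsets 16 and 18. The lookup tables cover only a handful of values, and the function names are hypothetical.

```python
import struct

ELF_CLASSES = {1: "32-bit", 2: "64-bit"}
ELF_DATA = {1: ("LSB", "<"), 2: ("MSB", ">")}
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core file"}
ELF_MACHINES = {3: "Intel 80386", 40: "ARM", 62: "x86-64", 183: "ARM aarch64"}


def describe_elf(header):
    """Summarise the first 20 bytes of an ELF file, roughly like `file`."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = ELF_CLASSES.get(header[4], "unknown class")
    order_name, order_char = ELF_DATA[header[5]]
    # e_type and e_machine are 16-bit fields at offsets 16 and 18,
    # stored in the byte order announced by e_ident[5].
    e_type, e_machine = struct.unpack_from(order_char + "HH", header, 16)
    kind = ELF_TYPES.get(e_type, f"type {e_type}")
    machine = ELF_MACHINES.get(e_machine, f"machine {e_machine}")
    return f"ELF {bits} {order_name} {kind}, {machine}"


def describe_elf_file(path):
    with open(path, "rb") as f:
        return describe_elf(f.read(20))
```

For a binary like f1 above, describe_elf_file("f1") would be expected to return "ELF 32-bit LSB executable, Intel 80386", matching the start of the file output.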
The file utility determines these details from its magic file. To learn more about the magic file, see here.
MIME stands for Multipurpose Internet Mail Extensions. MIME types (such as text/plain or image/png) identify the type of a file’s content. A file’s MIME type is usually determined from both its magic number and its extension, and file managers use it for file association.
Applications such as Nautilus can detect the MIME type of a file as follows:
- The application first uses a content sniffer to look for a particular pattern (magic number) that is registered in the MIME type database.
- If the content sniffer fails to identify the MIME type, the application matches the file name against the file name and extension patterns registered in the MIME type database.
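The two steps above can be sketched in Python with a tiny, hypothetical MIME database: magic prefixes are tried first, then file name globs. The database contents and the function name detect_mime are assumptions for illustration. The freedesktop.org shared MIME-info specification itself recommends a somewhat different order (globs first, with magic sniffing when the glob match fails or is ambiguous), so treat this only as an illustration of the steps as described here.

```python
from fnmatch import fnmatchcase
from pathlib import PurePath

# Hypothetical miniature MIME database: (MIME type, magic prefixes, globs).
MIME_DB = [
    ("image/png", [b"\x89PNG\r\n\x1a\n"], ["*.png"]),
    ("application/pdf", [b"%PDF-"], ["*.pdf"]),
    ("text/x-diff", [b"diff\t", b"***\t"], ["*.diff", "*.patch"]),
    ("text/x-python", [], ["*.py"]),
]


def detect_mime(filename, head):
    """Guess a MIME type from a file name and the file's leading bytes."""
    # Step 1: content sniffing against the registered magic patterns.
    for mime, magics, _ in MIME_DB:
        if any(head.startswith(m) for m in magics):
            return mime
    # Step 2: fall back to the file name and extension patterns.
    name = PurePath(filename).name
    for mime, _, globs in MIME_DB:
        if any(fnmatchcase(name, g) for g in globs):
            return mime
    return "application/octet-stream"  # generic "unknown binary" type


print(detect_mime("picture", b"\x89PNG\r\n\x1a\n"))  # image/png (no extension)
print(detect_mime("fake.png", b"%PDF-1.4"))         # application/pdf
print(detect_mime("script.py", b"print('hi')"))     # text/x-python
```

Because content is checked first in this sketch, a PDF renamed to fake.png is still reported as a PDF.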
The MIME type then determines which application is used to open the file.
The freedesktop.org shared MIME-info specification (here) describes how to define new MIME types.
An example from the specification:
<?xml version="1.0"?>
<mime-info xmlns='http://www.freedesktop.org/standards/shared-mime-info'>
  <mime-type type="text/x-diff">
    <comment>Differences between files</comment>
    <comment xml:lang="af">verskille tussen lêers</comment>
    ...
    <magic priority="50">
      <match type="string" offset="0" value="diff\t"/>
      <match type="string" offset="0" value="***\t"/>
      <match type="string" offset="0" value="Common subdirectories: "/>
    </magic>
    <glob pattern="*.diff"/>
    <glob pattern="*.patch"/>
  </mime-type>
</mime-info>
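To see how such a definition might be consumed, here is a sketch that uses Python’s xml.etree.ElementTree to pull out the globs and magic rules for each MIME type. SPEC_XML is the example above with the elided comment lines removed, and load_mime_types is a hypothetical name. Flattening nested match elements is a simplifying assumption: in the specification, a nested match must succeed together with its parent.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.freedesktop.org/standards/shared-mime-info}"

# The example above, minus the elided comment lines.
SPEC_XML = r"""<?xml version="1.0"?>
<mime-info xmlns='http://www.freedesktop.org/standards/shared-mime-info'>
  <mime-type type="text/x-diff">
    <comment>Differences between files</comment>
    <magic priority="50">
      <match type="string" offset="0" value="diff\t"/>
      <match type="string" offset="0" value="***\t"/>
      <match type="string" offset="0" value="Common subdirectories: "/>
    </magic>
    <glob pattern="*.diff"/>
    <glob pattern="*.patch"/>
  </mime-type>
</mime-info>"""


def load_mime_types(xml_text):
    """Collect globs and (flattened) magic rules per MIME type."""
    root = ET.fromstring(xml_text)
    types = {}
    for mime in root.iter(NS + "mime-type"):
        globs = [g.get("pattern") for g in mime.iter(NS + "glob")]
        rules = []
        for magic in mime.iter(NS + "magic"):
            priority = int(magic.get("priority", "50"))  # default is 50
            # Simplification: nested <match> elements are flattened here.
            for match in magic.iter(NS + "match"):
                rules.append((priority, match.get("type"),
                              match.get("offset"), match.get("value")))
        types[mime.get("type")] = {"globs": globs, "magic": rules}
    return types


for mime_type, info in load_mime_types(SPEC_XML).items():
    print(mime_type, info["globs"], len(info["magic"]), "magic rules")
```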
Here is what some of the important elements signify (excerpt from here):
- Glob elements have a pattern attribute. Any file whose name matches this pattern is given this MIME type.
- Magic elements contain a list of match elements, any of which may match, and an optional priority attribute for all of the contained rules. Low numbers should be used for more generic types (such as ‘gzip compressed data’) and higher values for specific subtypes (such as a word processor format that happens to use gzip to compress the file). The default priority value is 50, and the maximum is 100.
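The priority rule can be illustrated with a short sketch: when several magic rules match, the one with the highest priority wins, so a specific format that happens to be gzip-compressed beats the generic gzip rule. Both rules, their priorities, and the application/x-hypothetical-doc type are invented for illustration, and each rule is reduced to a single byte string at a fixed offset.

```python
from dataclasses import dataclass


@dataclass
class MagicRule:
    mime_type: str
    priority: int  # 0-100; the specification's default is 50
    offset: int
    value: bytes


def best_magic_match(data, rules):
    """Among all rules whose bytes match, prefer the highest priority."""
    matching = [r for r in rules
                if data[r.offset:r.offset + len(r.value)] == r.value]
    if not matching:
        return None
    return max(matching, key=lambda r: r.priority).mime_type


# Hypothetical rules: a generic gzip rule and a more specific format that
# is gzip-compressed but carries its own marker at offset 10.
RULES = [
    MagicRule("application/gzip", 40, 0, b"\x1f\x8b"),
    MagicRule("application/x-hypothetical-doc", 80, 10, b"HYPO"),
]

plain_gzip = b"\x1f\x8b\x08\x00" + bytes(12)
special_doc = b"\x1f\x8b\x08\x00" + bytes(6) + b"HYPO"
print(best_magic_match(plain_gzip, RULES))   # application/gzip
print(best_magic_match(special_doc, RULES))  # application/x-hypothetical-doc
```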
Each match element has a number of attributes:
- type (required): string, host16, host32, big16, big32, little16, little32 or byte.
- offset (required): The byte offset(s) in the file to check. This may be a single number or a range in the form `start:end’, indicating that all offsets in the range should be checked. The range is inclusive.
- value (required): The value to compare the file contents with, in the format indicated by the type attribute.
- mask (optional): The number to AND the value in the file with before comparing it to `value’. Masks for numerical types can be any number, while masks for strings must be in base 16, and start with 0x.
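A hedged sketch of how a single match element could be evaluated against a file’s bytes, covering the numeric types (via Python’s struct module, where "=" means host byte order), inclusive offset ranges, and numeric masks. Two simplifications are assumptions of this sketch rather than part of the specification’s text: string values are decoded with C-style backslash escapes (so diff\t means diff followed by a tab), and string masks are not supported. The names offsets and rule_matches are hypothetical.

```python
import struct

# struct format for each numeric match type ("=" means native/host order).
NUMERIC_FORMATS = {
    "byte": "B",
    "big16": ">H", "big32": ">I",
    "little16": "<H", "little32": "<I",
    "host16": "=H", "host32": "=I",
}


def offsets(spec):
    """'4' -> 4..4, '0:10' -> 0..10; the range is inclusive."""
    start, _, end = spec.partition(":")
    return range(int(start), int(end or start) + 1)


def rule_matches(data, type_, offset, value, mask=None):
    """Check one <match> element against the leading bytes of a file."""
    for pos in offsets(offset):
        if type_ == "string":
            # Assumption: C-style escapes such as \t are decoded; string
            # masks are not supported in this sketch.
            expected = value.encode("latin-1").decode("unicode_escape").encode("latin-1")
            if data[pos:pos + len(expected)] == expected:
                return True
        else:
            fmt = NUMERIC_FORMATS[type_]
            if pos + struct.calcsize(fmt) > len(data):
                break  # later offsets are even further past the end
            (number,) = struct.unpack_from(fmt, data, pos)
            if mask is not None:
                number &= int(mask, 0)  # AND the file value with the mask
            if number == int(value, 0):
                return True
    return False
```

For example, rule_matches(b"diff\tfile1 file2\n", "string", "0", r"diff\t") applies the first match element of the text/x-diff example.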
In the end
Here are a couple of relevant discussions about the same topic from around the internet:
An excerpt from discussion-1:
Linux, on the other hand, has an executable bit in its permissions model. If it’s on, the file is executable; if it’s off, it’s not. Downloaded files are never set to execute, unless they are first extracted from an archive, and that’s never done automatically either.
There are further UI improvements, too. For example, Nautilus will identify mislabelled files as such when you try to open them. It also will not automatically execute files with misleading names, even if they are set to be executable.
So if I take a shell script and name it .jpg, it won’t execute from my file manager even if it’s set to be executable.
So Linux can, and sometimes does, pay attention to the file extension. For the file to run, I either have to give it a name Nautilus likes or run it from the command line.
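The policy described in this excerpt (launch a file only if an execute bit is set and its name does not misrepresent its content) can be sketched as follows. This is not Nautilus’s actual code; the extension table and the function names is_executable, extension_mismatch and safe_to_launch are hypothetical.

```python
import os
import stat
import tempfile
from pathlib import PurePath

# Hypothetical table: the leading bytes each extension promises.
EXPECTED_PREFIX = {
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".png": b"\x89PNG\r\n\x1a\n",
    ".sh": b"#!",
}


def is_executable(path):
    """True if any of the user/group/other execute bits is set."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


def extension_mismatch(path):
    """True if the content contradicts what the extension promises."""
    expected = EXPECTED_PREFIX.get(PurePath(path).suffix.lower())
    if expected is None:
        return False  # unknown extension: nothing to contradict
    with open(path, "rb") as f:
        return f.read(len(expected)) != expected


def safe_to_launch(path):
    """Launch only files that are executable and not mislabelled."""
    return is_executable(path) and not extension_mismatch(path)


if __name__ == "__main__":
    # A shell script disguised as a picture, with its execute bits set.
    with tempfile.NamedTemporaryFile("w", suffix=".jpg", delete=False) as tmp:
        tmp.write("#!/bin/sh\necho surprise\n")
    os.chmod(tmp.name, 0o755)
    print(is_executable(tmp.name), safe_to_launch(tmp.name))  # True False
    os.remove(tmp.name)
```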
An excerpt from discussion-2:
The file manager (Nautilus, by default) uses the MIME type of a file to determine which program to open it with. When an application is installed, it can specify what MIME types it can open and the command to use to open the files in the .desktop file which is placed in /usr/share/applications. This is the file used for menus, desktop short-cuts, etc.
For example, GIMP has the following .desktop file:
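The association mechanism from this excerpt can be sketched in Python by reading the MimeType and Exec keys of .desktop entries and building a map from MIME type to command. The entry below is invented for illustration (the application example-viewer is hypothetical and this is not GIMP’s file), and using configparser is an assumption: it approximates the INI-like Desktop Entry format closely enough for this simple case.

```python
import configparser

# A hypothetical .desktop entry (an invented application, not GIMP's file).
EXAMPLE_ENTRY = """\
[Desktop Entry]
Type=Application
Name=Example Viewer
Exec=example-viewer %U
MimeType=image/png;image/jpeg;
"""


def parse_desktop_entry(text):
    """Return (Exec command, list of MIME types) from a .desktop file."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep key case: MimeType, Exec
    parser.read_string(text)
    entry = parser["Desktop Entry"]
    mime_types = [m for m in entry.get("MimeType", fallback="").split(";") if m]
    return entry["Exec"], mime_types


def build_associations(entries):
    """Map each MIME type to the Exec commands of the apps that accept it."""
    associations = {}
    for text in entries:
        command, mime_types = parse_desktop_entry(text)
        for mime in mime_types:
            associations.setdefault(mime, []).append(command)
    return associations


print(build_associations([EXAMPLE_ENTRY])["image/png"])
```

A file manager that has determined a file’s MIME type could then look that type up in such a map to find candidate applications.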
Some important Links