A practical approach to learning the Linux hexdump command

By | 11/02/2013

Hexdump is a Linux command that provides many options for dumping file contents. It can display them in formats such as hexadecimal, octal, decimal, and ASCII. The command takes its input either from a file or from standard input. Hexdump is a very helpful utility for debugging and for verifying the contents of files written by an application. In this article, we will first cover the basic usage of hexdump and then discuss some practical scenarios where it comes in really handy.

To begin with, here is a list of the hexdump command-line options discussed in this article:

-x option
-o option
-d option
-C option
-b option
-c option
-s option
-n option
-e option

Given below is a text file to be used for running hexdump utility with its options.

$ cat test.txt
ABCD

Hexdump command options

 
Hexdump utility provides options to display file contents in a 2-byte format. Let’s discuss those options.
 

-x option

This hexdump option displays file contents in hexadecimal format. It reads the input 2 bytes at a time and prints each pair as a 4-digit hexadecimal value. For example:

$ hexdump -x test.txt 
0000000    4241    4443    000a                                        
0000005

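In the above output, AB is shown as 4241 because the ASCII value of A is 65, which is 41 in hexadecimal, and B is 66, which is 42. hexdump reads the file in chunks of 2 bytes and treats each chunk as one 16-bit number in the machine’s byte order. On a little-endian machine (such as x86), the first byte, A, is the least significant byte (LSB) and the second byte, B, is the most significant byte (MSB), so the pair AB is printed as 4241, as illustrated in the following figure:

[Figure: lsb]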

NOTE: hexdump also shows the newline character (i.e. ‘\n’, whose ASCII value is 10), which is 0x0a in hexadecimal.
The input offset is shown in hexadecimal at the start of each line. Here, 0000000 is shown on the first line of output and 0000005 on the second; the 5 indicates that 5 bytes were dumped (i.e. 4 bytes for ABCD and 1 byte for ‘\n’).
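
The byte order described above can be checked with plain shell arithmetic. This is a minimal sketch, assuming a little-endian machine such as x86 (which is what the output above reflects); the variable names are illustrative only.

```shell
# First two bytes of test.txt: 'A' (0x41) at offset 0, 'B' (0x42) at offset 1.
first=0x41
second=0x42

# Little-endian: the byte at the lower offset is the least significant one,
# so the 16-bit value is (second << 8) | first.
word=$(printf '%04x' $(( (second << 8) | first )))
echo "$word"    # prints 4241

# The trailing newline (0x0a) has no partner byte, so the high byte is zero.
last=$(printf '%04x' $(( (0x00 << 8) | 0x0a )))
echo "$last"    # prints 000a
```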

With more characters in the file:

$ cat test.txt
ABCDEFGHIJKLMNOPQRST
$ hexdump -x test.txt
0000000    4241    4443    4645    4847    4a49    4c4b    4e4d    504f
0000010    5251    5453    000a                                        
0000015

Here, you can see the ASCII values of the letters A to T (i.e. 65-84) shown in hexadecimal (i.e. 41 to 54).

NOTE: A line of output contains at most 8 entries, i.e. 16 bytes of input. If an entry covers only 1 byte, like the last byte of this file, it is still displayed in the 2-byte format. Here, 000a is shown for the ‘\n’ byte.

-o option

This hexdump option displays file contents in octal format. It reads the input 2 bytes at a time and prints each pair as a 6-digit octal value.
For example:

$ cat test.txt
ABCD
$ hexdump -o test.txt
0000000  041101  042103  000012                                        
0000005

Let us work through the above output.

We have already seen that hexdump takes AB as BA.
BA in ASCII, byte by byte = 66 65
BA in binary, byte by byte = [01000010] [01000001]
Now, BA in octal, taking these 2 bytes together as one 16-bit value = 041101

This is how hexdump displays file contents in octal format. The notes given for the ‘x’ option also apply here.

-d option

This hexdump option displays file contents in decimal format. It reads the input 2 bytes at a time and prints each pair as a 5-digit decimal value.
For example:

$ cat test.txt
ABCD
$ hexdump -d test.txt
0000000   16961   17475   00010                                        
0000005

We have already seen that hexdump takes AB as BA.
BA in ASCII, byte by byte = 66 65
BA in binary, byte by byte = [01000010] [01000001]
Now, BA in decimal, taking these 2 bytes together as one 16-bit value = 16961

This is how hexdump displays file contents in decimal format. The notes given for the ‘x’ option also apply here.
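
The values printed by the -x, -o and -d options for the first two bytes are the same 16-bit number, only shown in a different base. A small sketch with shell arithmetic, assuming a little-endian machine (the variable names are illustrative):

```shell
# Bytes 'A' (0x41) and 'B' (0x42) combined into one little-endian 16-bit value.
word=$(( (0x42 << 8) | 0x41 ))

as_hex=$(printf '%04x' "$word")   # the base used by -x
as_oct=$(printf '%06o' "$word")   # the base used by -o
as_dec=$(printf '%05d' "$word")   # the base used by -d

echo "$as_hex $as_oct $as_dec"    # prints 4241 041101 16961
```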

 
Hexdump utility provides options to display file contents byte by byte in different formats. Let’s discuss those options.
 

-C option

This hexdump option displays file contents in hexadecimal plus ASCII format. It displays output byte by byte in hexadecimal plus ASCII.
For example:

$ cat test.txt
ABCD
$ hexdump -C test.txt
00000000  41 42 43 44 0a                                    |ABCD.|
00000005

The above output is quite simple to read. Every letter from A to D is dumped sequentially in hexadecimal, followed by the newline (0a).
A in ASCII = 65
A in hexadecimal = 41
The same applies to B, C and D.
Note that the file contents are also shown in ASCII, enclosed within ‘|’; non-printable bytes such as the newline appear as a dot. The input offset at the start of each line is shown in hexadecimal, as with the other options covered above.

-b option

This hexdump option displays file contents in octal format. It displays output byte by byte in octal.
For example:

$ cat test.txt
ABCD
$ hexdump -b test.txt
0000000 101 102 103 104 012                                            
0000005

In the above output, every letter from A to D is dumped sequentially in octal.
A in ASCII = 65
A in octal = 101
The same applies to B, C and D.
Note that here too, the input offset at the start of each line is shown in hexadecimal, as with the other options covered above.

-c option

This hexdump option displays file contents in character display format. It displays output byte by byte in character.
For example:

$ cat test.txt
ABCD
$ hexdump -c test.txt
0000000   A   B   C   D  \n                                            
0000005

In the above output, every letter from A to D is dumped sequentially as a character.
Note that here too, the input offset at the start of each line is shown in hexadecimal, as with the other options covered above.

 
Hexdump also lets you control which part of the input is dumped. You can skip a number of bytes at the start of the input, or dump only a specified number of bytes. Let’s discuss the options hexdump provides for this purpose.
 

-s option

This option skips the specified number of bytes from the start of the input before dumping.

$ cat test.txt
ABCD
$ hexdump -s 2 -b test.txt
0000002 103 104 012                                                    
0000005

Here, the first 2 bytes (i.e. A and B) are skipped in the dumped output.

-n option

This option dumps only the specified number of bytes, in the format specified.

$ cat test.txt
ABCD
$ hexdump -n 2 -x test.txt
0000000    4241                                                        
0000002

Here, only the first 2 bytes (i.e. A and B) of the input file are dumped, in the format selected by the ‘x’ option.
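
The -s and -n options can also be combined to dump a slice from the middle of a file. The sketch below recreates the sample file in a temporary location (the variable names are illustrative) and dumps 2 bytes starting at offset 1:

```shell
# Recreate the 5-byte sample file (ABCD plus newline) in a temporary location.
sample=$(mktemp)
printf 'ABCD\n' > "$sample"

# Skip the first byte, then dump only the next 2 bytes (B and C).
slice=$(hexdump -s 1 -n 2 -C "$sample")
echo "$slice"

rm -f "$sample"
```

The dump should start at offset 00000001 and contain only the bytes 42 and 43.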

 
The hexdump command also provides options to control the display formatting of the output. Let’s discuss the options here.
 

-e option

This option lets the user supply a format string that controls how the data is displayed. A format string can contain any number of format units. The option is used as shown below:

$ hexdump -e 'format string' file

Each format unit in the format string can include up to 3 items, written as iteration count/byte count "format":

Iteration count: The number of times the format option is applied
Byte count:        The number of bytes processed and displayed by hexdump on each iteration
Format option:  The conversion that tells hexdump how to display the data. It is written as “%” followed by a conversion character, for example %x

The format options that can be used with hexdump are given below.

Format options that process one byte at a time: %_c, %_p, %_u, %c

Format options that process one, two or four bytes at a time: %d, %i, %o, %u, %X, %x. By default, these options process four bytes.

Format options that process four, eight or twelve bytes at a time: %E, %e, %f, %G, %g. By default, these options process eight bytes.

To customize the display further, the user may give the -e option several times and use ‘_a’ or ‘_A’ to display the input offset. For example, _ad, _ao and _ax display the input offset in decimal, octal and hexadecimal respectively. ‘_A’ is the same as ‘_a’, except that it displays the input offset only once, at the end, after all of the input data has been processed and displayed by hexdump.
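
To see the iteration count, byte count and format option working together with an offset, the following sketch prints 4 bytes per line, each line prefixed with its offset. The input string and the 4-bytes-per-line layout are arbitrary choices made for illustration.

```shell
# "%_ax: "      -> input offset in hexadecimal, followed by ": "
# 4/1 "%02x "   -> iteration count 4, byte count 1: four bytes, two hex digits each
# "\n"          -> end the line after every 4-byte block
layout=$(printf 'ABCDEFGH' | hexdump -e '"%_ax: " 4/1 "%02x " "\n"')
echo "$layout"
```

With the 8-byte input above, this should produce two lines, one for the bytes at offsets 0 to 3 and one for the bytes at offsets 4 to 7.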

The examples below should help you understand and explore the ‘-e’ option.

$ echo abcdefghijklm | hexdump -e '/1 "%_ax) "' -e '/1 "%02X" "\n"'
0) 61
1) 62
2) 63
3) 64
4) 65
5) 66
6) 67
7) 68
8) 69
9) 6A
a) 6B
b) 6C
c) 6D
d) 0A

Here, the input offset is shown in hexadecimal by the format string '/1 "%_ax) "', and the string “abcdefghijklm” is shown byte by byte in hex by the format string '/1 "%02X" "\n"'. Note that no iteration count is specified in the format strings. In this case, the format option is applied to all of the input data, and the data is processed byte by byte because the number after ‘/’, which gives the number of bytes the format option is applied to, is 1.

$ echo abcd | hexdump -e '/1 "%_Ax) "' -e '/1 "%02X" "\n"'
61
62
63
64
0A
5)

The only difference from the previous example is the use of ‘_A’ instead of ‘_a’. We can clearly see that after all the data (i.e. “abcd” and the newline) has been processed, 5) is shown: this is the input offset, which equals the number of bytes processed.

Below are more examples with different format strings.

$ cat test.txt
ABCD
$ hexdump -e '8/1 "%o ""\t"" "' -e '8/1 "%c"' test.txt
101 102 103 104 12       ABCD

Here, all the input data in the file is processed, since the iteration count is 8 and the byte count is 1. The data is displayed first in octal and then as characters.

$ hexdump -e '2/1 "%o ""\t"" "' -e '6/1 "%c"' test.txt
101 102  ABCD

Here, the first format string processes 2 bytes, one per iteration, and displays them in octal. The second format string processes all the input data in the file, since its iteration count of 6 is greater than the number of bytes in the file, and displays it as characters.

$ hexdump -e '2/1 "_%d"' test.txt
_65_66_67_68_10_

Note that even though the iteration count is 2, all the data in the file is processed and displayed as per the format string. This is because hexdump applies the format string to the input one block at a time and repeats it until the end of the input is reached. Here a block is 2 bytes (iteration count 2, byte count 1), while the file holds 5 bytes (ABCD plus the newline), so the format string is applied three times. Hence, every byte of the input is displayed in decimal, prefixed with “_” as given in the format string.

Having gone through several options of the hexdump utility, you may find it helpful to try other combinations of these options to process and display input data as per your need.

 

In the Real World

In the above sections, we learned in detail about the various options and the many ways hexdump can be used. However, in the real world, why and when do we need hexdump? We mostly need it when dealing with non-ASCII files. That does not mean we never read the hex values of an ASCII text file; it is just that we use hexdump more often when working with non-text files. Moreover, programmers commonly read memory contents as hex values.

So what are the scenarios where we would need to use hexdump? Let’s try to find an answer in the following example scenario.

We might come across media or image files that have incorrect extensions, or that need some effort to match with the right application to open or play them. In such cases, hexdump can be pretty useful with some minimal knowledge of the file format. Confirming the format through a file’s header is a well-proven method: a file’s format can be confirmed by its magic number.

Lists of the magic numbers corresponding to various file formats and their variants are available. Note that such a list also specifies the offset at which the magic number is expected to be present. Magic numbers are also called file signatures.
The process is simple: dump the first few bytes of the file, say the first 16 bytes, using hexdump and match them against the magic number in hex, which confirms the format of the file.

Suppose I have an audio file on my system, test.mp3. To see the first 16 bytes of this mp3 file, I run

$ hexdump -n 16 -C test.mp3

To reiterate,
the -n option specifies how many bytes from the beginning of the file to dump,
and -C displays the file contents in hex and ASCII.

Here is the output I get:

00000000  49 44 33 03 00 00 00 00  21 76 54 49 54 32 00 00  |ID3.....!vTIT2..|
00000010

If we look up mp3 in the list, we see two variants: one with magic number “FF FB” and the other with “49 44 33”. Now, look at the offset where the magic number is expected in the file. It is zero in both cases. Next, check the first three bytes of our hexdump: they are “49 44 33”, which matches the magic number of one of the mp3 variants. Yes, the file we have is indeed an mp3.

If the hex magic numbers are difficult to match, one can also look for the ASCII string, which is

ID3

Now let us try to determine the format of an image file. Let’s call it ‘image’, with no extension, since we will determine the format using the hex dump. Dump the first 16 bytes of the ‘image’ file as

$ hexdump -n 16 -C image
00000000  89 50 4e 47 0d 0a 1a 0a  00 00 00 0d 49 48 44 52  |.PNG........IHDR|
00000010

Note the first few bytes of the dump in ASCII

.PNG

This suggests that the image format is ‘PNG’. To confirm it, look up the hex magic number for the PNG format in the list and match it against the hexdump output. The first 8 bytes of the file are

89 50 4e 47 0d 0a 1a 0a

i.e. at offset zero.

So, we were able to find the format of the file ‘image’ without relying on the file extension.
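
The manual comparison above can be scripted. The following is a minimal sketch, not a replacement for a full signature list: the function names read_signature and guess_format are hypothetical, and only the signatures discussed in this article (PNG and the two mp3 variants, all at offset zero) are checked. The -v option, not covered above, is added as an assumption about what is needed here: it tells hexdump to display all input data rather than collapsing repeated bytes into a ‘*’ line.

```shell
# Print the first $2 bytes of file $1 as one lowercase hex string, e.g. 494433.
# -v keeps runs of identical bytes from being squeezed into a "*" line.
read_signature() {
    hexdump -v -n "$2" -e '/1 "%02x"' "$1"
}

# Guess the format of file $1 from the signatures discussed in this article.
guess_format() {
    sig=$(read_signature "$1" 8)
    case "$sig" in
        89504e470d0a1a0a) echo "png" ;;
        494433*)          echo "mp3 (ID3 tag)" ;;
        fffb*)            echo "mp3 (no ID3 tag)" ;;
        *)                echo "unknown" ;;
    esac
}
```

For the two files dumped above, guess_format image should report png and guess_format test.mp3 should report the ID3 variant of mp3.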

Note that for a few formats, such as CD/DVD image files, dumping the first 16 bytes may not be sufficient, so one may need to keep experimenting and exploring with the hexdump utility. So far we have worked through simple format recognition exercises. There are other, more complex situations where hexdump makes our life relatively easier. Here are some example use cases.

Case 1:
To start with, consider a Linux C programmer who wants to alter an ELF executable binary to include another instruction in the .ro section. Inserting an instruction amounts to inserting numbers at the binary level, since numbers are all we have at the machine level. Before beginning to alter the executable, the programmer should be clear about what each and every byte of the executable is. Hence, hexdump can be pretty handy here.

Hexdump helps by dumping the ELF binary byte by byte, so the programmer can see which value follows which and relate that to their understanding of the file. With hexdump, the programmer can determine at what offset which value has to be inserted to add the extra instruction.

Case 2:
In the networking world, engineers deal with a lot of data packets. Sometimes these packets, which are certainly not text strings or otherwise human-readable, need to be decoded. This creates the need for hexdump, which helps in analysing data coming in, going out and passing through each layer of the protocol stack.

Case 3:
When a media player plays content, its internals perform parsing, decoding and possibly decryption. In case of a bug that disrupts the playback for a few seconds, the hexdump utility can help in analysing the data flow in the media player and determining which part of the player’s control flow is causing the problem.

I hope the above example scenarios make clear where hexdump can help and what kind of assistance it offers.

 


