Linux Processes explained – Part II

By | 08/04/2013

In Part I, we got a basic understanding of Linux processes and saw how various Linux commands help us explore the processes running on a system. In this part, we shall get our hands dirty by creating a few processes and examining their memory layout in RAM.

We often use the terms process and program interchangeably. Strictly speaking, however, they are different things. When we write the source code for some logic, it is a file stored on the hard disk, and so is the executable obtained after compiling and linking it. This file on the disk is called a program. When we run the program executable, it gets its own memory space in RAM, and this running instance is a process. A process can be started in two significant ways: by running a built executable or command from a Linux shell, as just mentioned, or from within another program. We shall discuss the programmatic approach in later sections.

 

Linux Processes in Memory

Well, now we know that an active process occupies memory in RAM. The immediate question is: how is this memory allocated, and what does it contain?

A clear and comprehensive understanding of the memory layout of Linux processes involves many operating system concepts and a good deal of memory management theory. In this part, we shall cover some of the key ideas needed to grasp the overall picture.

While a process is running, it is performing some task, so it may have to use system resources. A simple example of a system resource is a file on the disk. In order to read a file from the file system, the process has to access it. Now, what if another process has to access the same file? Someone has to decide whether that is allowed and, if it is, manage all the accesses to the file. That someone is Linux.

Linux manages its resources by not allowing any user program to access a system resource directly. Instead, a program makes system calls from user space to use system resources. Only when a process enters kernel space can it access the file, and the kernel keeps track of the opened files in kernel file tables.
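To make the idea of system calls concrete, here is a minimal sketch in which a user-space function reaches a file only through the open(), read() and close() system calls. The helper name read_first_bytes is made up for this illustration and is not part of any library.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: read up to cap-1 bytes from path using system calls.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_first_bytes(const char *path, char *buf, size_t cap)
{
	int fd;
	ssize_t n;

	if (cap == 0)
		return -1;

	fd = open(path, O_RDONLY);	/* system call: the kernel opens the file */
	if (fd < 0)
		return -1;

	n = read(fd, buf, cap - 1);	/* system call: the kernel copies data to us */
	close(fd);			/* system call: release the descriptor */

	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```

Calling it with a path of your choice, for example read_first_bytes("/etc/passwd", buf, sizeof(buf)), returns the number of bytes the kernel copied into buf. The path here is just an assumed example.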

In memory, every process is given 4GB of virtual address space, considering a 32-bit architecture. The lower 3GB of virtual addresses are accessible to the user-space portion of the process, and the upper 1GB is accessible to the kernel-space portion.
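As a rough illustration of the split, the sketch below assumes the common 3G/1G layout on 32-bit x86, where user space ends at 0xBFFFFFFF and kernel space begins at 0xC0000000. The helper name is_user_address and the constant name KERNEL_SPACE_START are invented for this example.

```c
#include <stdint.h>

/* Assumption: 3G/1G split, so kernel space starts at 3GB = 0xC0000000 */
#define KERNEL_SPACE_START UINT32_C(0xC0000000)

/* Hypothetical helper: 1 if addr lies in the user-space part, else 0 */
int is_user_address(uint32_t addr)
{
	return addr < KERNEL_SPACE_START;
}
```

With this assumption, an address such as 0x08048000 is reported as a user-space address, while 0xC0000000 is not.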

Following is a diagram illustrating the 4GB process virtual address space.

[Figure: MemoryLayout, the 4GB process virtual address space]

The above split of 1G for kernel space and 3G for user space can be changed as per need. However, in most cases the 1G/3G split suits best. The programs we generally write run in user space, so we shall concentrate on the user-space memory layout of the process. Zooming into the user-space address range, a process has various segments. The following diagram makes the picture clearer.

[Figure: MemoryLayout1, the segments of a process in user-space memory]

The above diagram is an abstract snapshot of a generic process image in memory. We see the memory has been categorized into various segments in accordance with the ELF format.

Let us walk through these segments and see what is in store there.

Text Segment

The executable binary is mapped onto the text segment, which is therefore read-only for the process. This segment contains the instructions to be executed by the processor, in a language that the processor understands. Hence it is also called the code segment. When multiple processes are created from the same program, Linux allows them to share the text segment to avoid loading the same instructions again.

Data Segment

This segment cannot be shared among processes. The data segment is internally divided into two areas. One part holds all the initialized data, i.e. initialized global and static variables. The second part, also called the BSS segment of the process image, holds the uninitialized data of the program, which includes the global and static variables that have not been explicitly initialized. When the process starts, all the data in the BSS segment is zero-initialized.
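The zero-initialization of the BSS can be observed with a small sketch like the one below. The variable and function names are hypothetical, chosen just for this example.

```c
#include <stddef.h>

int g_initialized = 123;	/* initialized data area */
int g_uninitialized;		/* BSS: global without an initializer */
static char s_buffer[16];	/* BSS: static without an initializer */

/* Hypothetical helper: 1 if the uninitialized objects are all zero */
int bss_is_zeroed(void)
{
	size_t i;

	if (g_uninitialized != 0)
		return 0;
	for (i = 0; i < sizeof(s_buffer); i++)
		if (s_buffer[i] != 0)
			return 0;
	return 1;
}
```

Called at the start of a program, before anything has written to these variables, bss_is_zeroed() returns 1.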

Heap

Whenever the process allocates memory dynamically, the heap is the segment used for the allocation. In the C source code of a user program, the standard library functions ‘malloc()’, ‘calloc()’ and ‘realloc()’ are used to allocate memory dynamically.
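Here is a minimal sketch of heap usage with calloc() and realloc(); malloc() is used in the same way, except that it does not zero the memory. The function name heap_demo is invented for this example.

```c
#include <stdlib.h>

/* Hypothetical helper: returns the sum 1+2+...+8 computed in heap memory,
 * or -1 if an allocation fails. */
int heap_demo(void)
{
	int i, sum = 0;
	int *bigger;
	int *values = calloc(4, sizeof(*values));	/* 4 zero-filled ints */

	if (values == NULL)
		return -1;
	for (i = 0; i < 4; i++)
		values[i] = i + 1;

	bigger = realloc(values, 8 * sizeof(*values));	/* grow to 8 ints */
	if (bigger == NULL) {
		free(values);	/* the original block is still valid */
		return -1;
	}
	values = bigger;
	for (i = 4; i < 8; i++)
		values[i] = i + 1;

	for (i = 0; i < 8; i++)
		sum += values[i];

	free(values);	/* hand the memory back to the allocator */
	return sum;
}
```

If both allocations succeed, heap_demo() returns 36, the sum of 1 to 8.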

Stack

The stack contains the local variables used in the program, according to their respective scopes. Whenever a local variable comes into scope, it is pushed onto the stack, and whenever its scope ends, it is popped off. The stack may grow towards lower or higher addresses, depending upon the underlying architecture.
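The direction of stack growth on a given machine can be guessed with a sketch like the following, which compares the address of a local variable in a caller with one in a callee. This is only a rough experiment under the assumption that the compiler does not inline the callee; the function names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 if a local of this (callee) frame sits at a lower address
 * than the caller's local whose address was passed in. */
static int callee_is_lower(uintptr_t caller_addr)
{
	volatile int callee_local = 0;

	return (uintptr_t)&callee_local < caller_addr;
}

/* Hypothetical helper: 1 if the stack appears to grow towards lower
 * addresses, 0 otherwise. */
int stack_grows_down(void)
{
	volatile int caller_local = 0;

	return callee_is_lower((uintptr_t)&caller_local);
}
```

On x86 Linux built without optimization we would expect stack_grows_down() to return 1, but treat the result as an observation, not a guarantee.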

Shared Libraries

This is not a segment as such. If we have dynamically linked our program to shared objects, those shared objects are loaded at runtime. This part of memory helps the process map and access the functions it uses from the shared libraries.

The process virtual memory status

We have learnt how the various parts of a process are held in memory. Get ready to see the real statistics. However, going deeper into memory requires understanding concepts such as pages and physical address mappings, which are out of the scope of this article.

Still, in this section, we shall touch upon some of it.

To begin with, let us code our ancient hello-world C program.

#include <stdio.h>
#include <unistd.h>

int main()
{
	char mesg[] = "HelloWorld";

	/*We need time to check process stats*/
	sleep(100);
	printf("%s\n", mesg);
	return 0;
}

Well, the above program is pretty straightforward, except for the ‘sleep()’ call before we print our message. We want the process to stay alive for some time so that we can explore its memory layout. While the process is sleeping, we can inspect the status of its memory.

We compile the program and run it in the background.

$gcc helloworld.c -Wall -o helloworld
$./helloworld &

To get any information about a process, the PID is the primary key. Hence, we need to retrieve the PID assigned to our process. (Refer to Part I to know what a PID is.)

$ps
  PID TTY      	TIME CMD
 1746 pts/1	00:00:00 bash
 2808 pts/1	00:00:00 helloworld
 2809 pts/1	00:00:00 ps

In order to get the complete status of our newly created process, including its state, the sizes of its memory segments, etc., we use the command

$cat /proc/2808/status

Note: 2808 in the above command is the PID of our process of interest. The generic syntax looks like

cat /proc/<PID>/status

This command retrieves data through the /proc file system (procfs), a special Linux file system that provides information about the system and its processes. More about the /proc file system can be found in the proc(5) manual page. As we will see, procfs is really helpful for finding out more about running processes.
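The same file can also be read from within a program. The sketch below, with a hypothetical helper print_vm_fields(), prints only the lines beginning with "Vm" from an already opened status file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every line of a status file that starts
 * with "Vm" and return how many such lines were found. */
int print_vm_fields(FILE *fp)
{
	char line[256];
	int count = 0;

	while (fgets(line, sizeof(line), fp) != NULL) {
		if (strncmp(line, "Vm", 2) == 0) {
			fputs(line, stdout);
			count++;
		}
	}
	return count;
}
```

A caller could open "/proc/self/status" (where self refers to the calling process) with fopen() and pass the resulting FILE pointer to print_vm_fields().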

The output I see on my system is

Name:    helloworld
State:    S (sleeping)
Tgid:    2808
Pid:    2808
PPid:    1746
TracerPid:    0
Uid:    1000    1000    1000    1000
Gid:    1000    1000    1000    1000
FDSize:    256
Groups:    4 24 27 30 46 109 124 1000
VmPeak:    	2048 kB
VmSize:    	1988 kB
VmLck:       	0 kB
VmPin:       	0 kB
VmHWM:     	280 kB
VmRSS:     	280 kB
VmData:      	32 kB
VmStk:     	136 kB
VmExe:       	4 kB
VmLib:    	1788 kB
VmPTE:      	16 kB
VmSwap:       	0 kB
Threads:    1
SigQ:    0/3851
SigPnd:    0000000000000000
ShdPnd:    0000000000000000
SigBlk:    0000000000000000
SigIgn:    0000000000000000
SigCgt:    0000000000000000
CapInh:    0000000000000000
CapPrm:    0000000000000000
CapEff:    0000000000000000
CapBnd:    ffffffffffffffff
Cpus_allowed:    1
Cpus_allowed_list:    0
Mems_allowed:    1
Mems_allowed_list:    0
voluntary_ctxt_switches:    4
nonvoluntary_ctxt_switches:    0

The output gives us a wealth of information about our process.

Name : Name of the process
State   : Process state
Tgid   : Thread group ID, equal to the PID of the process (and this is also the TID of the first thread started in the process).
Pid     :  PID i.e. Process Identifier
PPid  : PID of the parent process.
TracerPid : PID of process tracing this process (0 if the process is not traced).
Uid    : Set of four User IDs: real, effective, saved, filesystem.
Gid   : Set of four Group IDs: real, effective, saved, filesystem.
FDSize : Number of file descriptor slots currently allocated.
Groups : List of GIDs of supplementary groups.
VmPeak : Peak size of the virtual memory of the process.
VmSize : Current total size of the virtual memory.
VmLck : Size of locked memory.
VmPin : Size of pinned pages, i.e. pages which are never swappable.
VmHWM : Peak size of the resident set (“high water mark”). The resident set is the part of process memory which currently resides in the physical memory.
VmRSS : Current size of the resident set.
VmData : Size of the data segment.
VmStk : Size of the stack segment.
VmExe : Size of the text segment
VmLib : Size of shared libraries loaded by this process
VmPTE : Size of page table entries
VmSwap : Size of swap space used by this process.
Threads : Number of threads.
SigQ : Two values separated by “/”: Current number of queued signals/maximum allowed number of queued signals. (Signals are one of the means of communication among processes. They are also used to send alerts/notifications to the processes.)
SigPnd : A bitmap (as a hexadecimal value) of pending signals for the thread whose TID == TGID == PID.
ShdPnd : A bitmap (as a hexadecimal value) of pending signals targeted on the whole process.
SigBlk : A bitmap (as a hexadecimal value) of blocked signals for the thread whose TID == TGID == PID.
SigIgn : A bitmap (as a hexadecimal value) of ignored signals (per-process).
SigCgt : A bitmap (as a hexadecimal value) of signals a signal handler is installed for (per-process).
CapInh : The inheritable capability set (a bitmap shown as a hexadecimal value).
CapPrm : The permitted capability set (a bitmap shown as a hexadecimal value).
CapEff  : The effective capability set (a bitmap shown as a hexadecimal value).
CapBnd : The bounding capability set (a bitmap shown as a hexadecimal value).
Cpus_allowed : A bitmap (as a hexadecimal value) of CPUs where the process may run.
Cpus_allowed_list : A list of CPU numbers where the process may run (same information as Cpus_allowed).
Mems_allowed : A bitmap (as a hexadecimal value) of memory nodes the process can use.
Mems_allowed_list : A list of memory nodes the process can use
voluntary_ctxt_switches : Number of voluntary context switches.
nonvoluntary_ctxt_switches : Number of involuntary context switches.

It is fine if one is not able to understand what each and every value in the output means. Linux processes are an ocean of concepts and knowledge, and one cannot take in everything in one gulp. The principal mantra is to grasp as much as one can. Things will link up on their own in the journey of learning.

For the moment, though, the following are significant to know:

State: S (sleeping)
The process is still alive at this point only because of the sleep() call, so, as expected, it is reported as sleeping.

Tgid: 2808
The thread group ID(Tgid) here is same as the PID of the process

Pid: 2808
Nothing new here, we already knew this number.

PPid: 1746
The parent ID. In our case, the bash shell is the parent process as we executed the binary from the shell. To confirm, just note the PID of ‘bash’ from the ‘ps’ command output.

$ps 
  PID TTY      	TIME CMD
 1746 pts/1	00:00:00 bash

So this confirms our understanding.

VmPeak: 2048 kB
The peak virtual memory size. That it differs from the current size below tells us that the memory size of the process varies over its lifetime.

VmSize: 1988 kB
The current size of the virtual memory being used by the process.

VmData: 32 kB
The size of the data segment of the process in memory.

VmStk: 136 kB
The size of the stack segment of the process in memory.

VmExe: 4 kB
The size of the text segment of the process in memory is 4 kB.

VmLib: 1788 kB
The size of virtual memory occupied by the shared objects is 1788 kB.

Threads: 1
The number of threads. Since we never started any new thread, just the main thread is running.
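The Pid and PPid values can also be obtained from inside a program with the getpid() and getppid() calls. A minimal sketch, where print_ids is a name made up for this example:

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: print the same two values that appear as the
 * Pid and PPid fields of /proc/<PID>/status for this process. */
void print_ids(void)
{
	pid_t pid = getpid();	/* our own process ID */
	pid_t ppid = getppid();	/* the process ID of our parent */

	printf("Pid: %ld\nPPid: %ld\n", (long)pid, (long)ppid);
}
```

When such a program is started from the shell, the PPid printed should match the PID of ‘bash’ in the ‘ps’ output.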

The real addresses

It is time to jump into the addresses of the segments we discussed in the previous section. In kernel terminology, the process segments are also termed virtual memory areas. It is worth mentioning that the addresses we see in this section are all virtual addresses.

We shall pick up the same ‘helloworld’ program that we wrote in the previous section.
Let us create our process and get it running and sleeping :)

$./helloworld &

We will again use procfs to get the virtual memory mapping of each segment of the process. The generic command would be

 cat /proc/<PID>/maps

The process ID will be different this time, so we need to get it through the Linux ‘ps’ command.

$ps
  PID TTY      	TIME CMD
 1746 pts/1	00:00:00 bash
 2866 pts/1	00:00:00 helloworld
 2867 pts/1	00:00:00 ps

Therefore, with the new process ID as ‘2866’, we read the procfs as:

$ cat /proc/2866/maps

The memory maps on my system look like

$ cat /proc/2866/maps
08048000-08049000 r-xp 00000000 08:01 393767 	/home/ubuntu/progs/helloworld
08049000-0804a000 r--p 00000000 08:01 393767 	/home/ubuntu/progs/helloworld
0804a000-0804b000 rw-p 00001000 08:01 393767 	/home/ubuntu/progs/helloworld
b75b0000-b75b1000 rw-p 00000000 00:00 0
b75b1000-b7750000 r-xp 00000000 08:01 658874 	/lib/i386-linux-gnu/libc-2.15.so
b7750000-b7752000 r--p 0019f000 08:01 658874 	/lib/i386-linux-gnu/libc-2.15.so
b7752000-b7753000 rw-p 001a1000 08:01 658874 	/lib/i386-linux-gnu/libc-2.15.so
b7753000-b7756000 rw-p 00000000 00:00 0
b7765000-b7768000 rw-p 00000000 00:00 0
b7768000-b7769000 r-xp 00000000 00:00 0      	[vdso]
b7769000-b7789000 r-xp 00000000 08:01 658854 	/lib/i386-linux-gnu/ld-2.15.so
b7789000-b778a000 r--p 0001f000 08:01 658854 	/lib/i386-linux-gnu/ld-2.15.so
b778a000-b778b000 rw-p 00020000 08:01 658854 	/lib/i386-linux-gnu/ld-2.15.so
bf8d2000-bf8f3000 rw-p 00000000 00:00 0      	[stack]

A plethora of hexadecimal numbers, correct? Welcome to the world of memory addresses! First of all, let us understand what each column of the output signifies.

  • The 1st column from the left is the start address and the end address of a particular segment.
  • The 2nd column is a string of permission flags, where:

r means readable
w means writable
x means executable
p means private, i.e. it is not shared
s means shared

  • The 3rd column is the offset in the file where the mapping begins. Not every segment is mapped from a file; the offset is zero in that case.
  • The 4th column is of the form “major number : minor number” and identifies the device holding the file from which this segment is mapped. For segments which are not mapped from any file, the value is 00:00.
  • The 5th column is the i-node of the related file.
  • The last column on the right is the path of the related file. It is blank in case there is no related file.
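To see how the columns fit together, here is a minimal sketch that splits one line of the maps output into its fields with sscanf(). The struct map_entry type and the parse_maps_line() function are hypothetical names for this illustration, and paths containing spaces are not handled.

```c
#include <stdio.h>

/* Hypothetical container for the columns of one maps line */
struct map_entry {
	unsigned long start, end;	/* 1st column: address range */
	char perms[5];			/* 2nd column: e.g. "r-xp" */
	unsigned long offset;		/* 3rd column: offset in the file */
	unsigned int dev_major, dev_minor;	/* 4th column: device */
	unsigned long inode;		/* 5th column: i-node */
	char path[256];			/* last column, may be empty */
};

/* Returns 0 if at least the first five columns were parsed, -1 otherwise */
int parse_maps_line(const char *line, struct map_entry *e)
{
	int n;

	e->path[0] = '\0';	/* anonymous mappings have no path */
	n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255s",
		   &e->start, &e->end, e->perms, &e->offset,
		   &e->dev_major, &e->dev_minor, &e->inode, e->path);
	return n >= 7 ? 0 : -1;
}
```

Feeding it the first row of the output above should give start = 0x08048000, end = 0x08049000, perms = "r-xp", device 8:1 and inode 393767.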

In order to identify the segments, we need to look at the access permissions associated with each one. Let us take the first segment, i.e. the first row of the output.

 
08048000-08049000 r-xp 00000000 08:01 393767 	/home/ubuntu/progs/helloworld

The access permissions imply that it is a readable and executable private segment. From the segment descriptions above, we can make out that it is the text segment: it is read-only and contains the instructions from the binary. Looking at the related file, it is not hard to guess that the region from virtual address ‘08048000’ to ‘08049000’ is the text segment of our helloworld program.
Moving on to the next row,

08049000-0804a000 r--p 00000000 08:01 393767 	/home/ubuntu/progs/helloworld

From the permissions, this looks like a read-only private segment. This is one of the additional segments a process image can possess.

The next one,

0804a000-0804b000 rw-p 00001000 08:01 393767 	/home/ubuntu/progs/helloworld

Again, noticing the access permissions, it is a read/write segment, which points to the data segment of the process.

The following row again looks like a segment with read/write permissions.

b75b0000-b75b1000 rw-p 00000000 00:00 0

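However, it is not related to any file (offset = major # = minor # = inode = 0, and the file is blank). Such an anonymous mapping is backed by zero-filled pages, making it a zeroed memory area, much like BSS. Note, though, that the uninitialized globals of our own program need not live here: as we shall see later in this article, they can fall inside the read/write data mapping of the binary itself.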

The next rows again read like a text segment, a read-only segment, a data segment and a BSS area respectively, but the related file is a shared object. Our program is implicitly dynamically linked to the ‘libc’ standard library, which we can see here.

b75b1000-b7750000 r-xp 00000000 08:01 658874 	/lib/i386-linux-gnu/libc-2.15.so
b7750000-b7752000 r--p 0019f000 08:01 658874 	/lib/i386-linux-gnu/libc-2.15.so
b7752000-b7753000 rw-p 001a1000 08:01 658874 	/lib/i386-linux-gnu/libc-2.15.so
b7753000-b7756000 rw-p 00000000 00:00 0

Note that although these segments are mapped from a shared object, the mappings themselves are not shared in this virtual memory area. Hence the permissions mark them as private memory mappings: anything this process writes to them is never visible to other processes.

We have a similar set of segments (text, read-only and data) for the dynamic linker/loader, ld.so, which is itself mapped into the process.

b7769000-b7789000 r-xp 00000000 08:01 658854 	/lib/i386-linux-gnu/ld-2.15.so
b7789000-b778a000 r--p 0001f000 08:01 658854 	/lib/i386-linux-gnu/ld-2.15.so
b778a000-b778b000 rw-p 00020000 08:01 658854 	/lib/i386-linux-gnu/ld-2.15.so

Besides, we have three more segments, as listed below.

b7765000-b7768000 rw-p 00000000 00:00 0
b7768000-b7769000 r-xp 00000000 00:00 0      	[vdso]
bf8d2000-bf8f3000 rw-p 00000000 00:00 0      	[stack]

The first one is yet another anonymous read/write segment, which may be used as a backing store.
The second one is not related to any file, but identifies itself as [vdso]. It is a special memory area holding a kernel-provided shared library that helps user space call into kernel space. The abbreviation stands for virtual dynamic shared object.
The third one is the stack segment. We already know that it holds all the local variables.

A simple exercise

We have one local variable in our helloworld.c program. Let us check out which segment it belongs to in the process image. It is pretty simple. What we need to do is just print the address of the variable and find the correct address range from the memory mapping which includes the address of our variable.

To print the address of the variable, we modify our helloworld.c to

#include <stdio.h>
#include <unistd.h>

int main()
{
	char mesg[] = "HelloWorld";

	/*We need time to check process stats*/
	sleep(100);

	/*Cast the pointer so that it matches the %lx conversion*/
	printf("Address of string mesg is %lx\n", (unsigned long)mesg);
	printf("%s\n", mesg);

	return 0;
}

Let us build and run the binary and, while we wait for the address of our ‘mesg’ variable to be printed, get the memory mapping of the newly created process.

$gcc helloworld.c -o helloworld
$./helloworld &
$ps
  PID TTY      	TIME CMD
 1746 pts/1	00:00:00 bash
 3036 pts/1	00:00:00 helloworld
 3037 pts/1	00:00:00 ps

Retrieving the memory segment mapping,

$ cat /proc/3036/maps
08048000-08049000 r-xp 00000000 08:01 393656 	/home/ubuntu/progs/helloworld
08049000-0804a000 r--p 00000000 08:01 393656 	/home/ubuntu/progs/helloworld
0804a000-0804b000 rw-p 00001000 08:01 393656 	/home/ubuntu/progs/helloworld
b759e000-b759f000 rw-p 00000000 00:00 0
b759f000-b773e000 r-xp 00000000 08:01 658874 	/lib/i386-linux-gnu/libc-2.15.so
b773e000-b7740000 r--p 0019f000 08:01 658874 	/lib/i386-linux-gnu/libc-2.15.so
b7740000-b7741000 rw-p 001a1000 08:01 658874 	/lib/i386-linux-gnu/libc-2.15.so
b7741000-b7744000 rw-p 00000000 00:00 0
b7754000-b7756000 rw-p 00000000 00:00 0
b7756000-b7757000 r-xp 00000000 00:00 0      	[vdso]
b7757000-b7777000 r-xp 00000000 08:01 658854 	/lib/i386-linux-gnu/ld-2.15.so
b7777000-b7778000 r--p 0001f000 08:01 658854 	/lib/i386-linux-gnu/ld-2.15.so
b7778000-b7779000 rw-p 00020000 08:01 658854 	/lib/i386-linux-gnu/ld-2.15.so
bff47000-bff68000 rw-p 00000000 00:00 0      	[stack]

After 100 secs since the launch of our process, we get,

Address of string mesg is bff66561
HelloWorld

We need to find out which memory segment the address 0xbff66561 lies in. Therefore, look for a memory segment whose end address is greater than 0xbff66561. We can see only one memory segment entry whose end address is greater than 0xbff66561.
That is,

bff47000-bff68000 rw-p 00000000 00:00 0      	[stack]

Now, check whether the start address of this segment is less than or equal to 0xbff66561. Yes, it is. Hence we are sure the variable ‘mesg’ is stored in the above memory segment, which is the stack.
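The lookup we just did by hand can be sketched in code. The hypothetical helper find_mapping() below scans maps-formatted text and returns the line whose address range contains a given address, treating the end address as exclusive.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan maps-formatted text from fp and copy the line
 * whose range contains addr into out. Returns 0 if found, -1 otherwise. */
int find_mapping(FILE *fp, unsigned long addr, char *out, size_t cap)
{
	char line[512];
	unsigned long start, end;

	if (cap == 0)
		return -1;

	while (fgets(line, sizeof(line), fp) != NULL) {
		/* only the first column is needed for the lookup */
		if (sscanf(line, "%lx-%lx", &start, &end) != 2)
			continue;
		if (addr >= start && addr < end) {
			strncpy(out, line, cap - 1);
			out[cap - 1] = '\0';
			return 0;
		}
	}
	return -1;
}
```

Passing it a FILE pointer opened on "/proc/self/maps" together with the address of a local variable, cast to unsigned long, should yield the [stack] line.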

One more

So now we know the virtual addresses of the process memory areas, and we have also confirmed our understanding that local variables go on the stack. But did we miss the heap in the output discussed above? Well, maybe it was never used.
Let us write another program with a dynamic memory allocation, which uses the heap. To make it realistic, let us convert a number to its corresponding string. The C source looks like

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int g_init_number = 123;
int g_uninit_temp;

/*convert a number to a string*/
int main()
{
	int local_size = 1;
	char *dynmem_str = NULL;
	int local_i = 0;
	int local_temp_int = 0;

	/*Determine the length of our output
    string i.e how many numbers form our number*/
	g_uninit_temp = g_init_number;
	while (1)
	{
    	g_uninit_temp /= 10;
    	if (!g_uninit_temp)
    	{
        	break;
    	}
    	local_size++;
	}

	/*Allocate memory for string of
	*   	 computed size + 1 for terminating
	*   	 null character*/      	 
	dynmem_str = (char*) malloc ((local_size + 1) * sizeof(char));
	if (dynmem_str == NULL)
	{
    	printf("Memory error!\n");
    	exit(1);
	}

	g_uninit_temp = g_init_number;
	for (local_i = local_size - 1;local_i >= 0 ; local_i--)
	{
    	local_temp_int = g_uninit_temp % 10;
    	dynmem_str[local_i] = local_temp_int + '0';
    	g_uninit_temp /= 10;
	}

	dynmem_str[local_size] = 0;

	printf("The output string is %s\n", dynmem_str);

            /*We need time to get hold of the memory maps*/
	sleep(100);

	/*Pointers are cast so that they match the %lx conversion*/
	printf("\n\nAddresses of local variables: &local_size = %lx\n \t &local_i = %lx\n\t &local_temp_int = %lx\n", (unsigned long)&local_size, (unsigned long)&local_i, (unsigned long)&local_temp_int);
	printf("Address of global initialized variable g_init_number = %lx\n", (unsigned long)&g_init_number);
	printf("Address of global un-initialized variable g_uninit_temp = %lx\n", (unsigned long)&g_uninit_temp);
	printf("Address of dynamically allocated variable dynmem_str = %lx\n", (unsigned long)dynmem_str);

	free(dynmem_str);
	return 0;
}

In the above source, we first determine the size of our output string, i.e. the number of digits that form the number. We make the effort to determine the size because we wish to allocate the memory for the output string dynamically, for which we need the size.

Once we have the size, we allocate that much memory and populate the newly allocated memory with the string equivalent of our number.

In the end, we print the addresses of all the data, i.e. the variables used in the program. We shall repeat the exercise of determining which variable goes into which memory segment. This will put a foolproof seal on our understanding.
We are all set for the fun. Let us build and run the program.

$ gcc memmaps.c  -o memmaps
$ ./memmaps &
[2] 2423

To get hold of the PID of our newly created process,

$ ps
  PID TTY      	TIME CMD
 2004 pts/1	00:00:00 bash
 2423 pts/1	00:00:00 memmaps
 2424 pts/1	00:00:00 ps

Now using the PID of our ‘memmaps’ process, we retrieve the memory maps

$ cat /proc/2423/maps
00110000-00111000 r-xp 00000000 00:00 0      	[vdso]
00514000-006b7000 r-xp 00000000 08:01 15729750   /lib/i386-linux-gnu/libc-2.15.so
006b7000-006b8000 ---p 001a3000 08:01 15729750   /lib/i386-linux-gnu/libc-2.15.so
006b8000-006ba000 r--p 001a3000 08:01 15729750   /lib/i386-linux-gnu/libc-2.15.so
006ba000-006bb000 rw-p 001a5000 08:01 15729750   /lib/i386-linux-gnu/libc-2.15.so
006bb000-006be000 rw-p 00000000 00:00 0
00993000-009b3000 r-xp 00000000 08:01 15732611   /lib/i386-linux-gnu/ld-2.15.so
009b3000-009b4000 r--p 0001f000 08:01 15732611   /lib/i386-linux-gnu/ld-2.15.so
009b4000-009b5000 rw-p 00020000 08:01 15732611   /lib/i386-linux-gnu/ld-2.15.so
08048000-08049000 r-xp 00000000 08:01 7345013	/home/ubuntu/aprograms/memmaps
08049000-0804a000 r--p 00000000 08:01 7345013	/home/ubuntu/aprograms/memmaps
0804a000-0804b000 rw-p 00001000 08:01 7345013	/home/ubuntu/aprograms/memmaps
082f1000-08312000 rw-p 00000000 00:00 0      	[heap]
b76ec000-b76ed000 rw-p 00000000 00:00 0
b7700000-b7703000 rw-p 00000000 00:00 0
bfe5a000-bfe7b000 rw-p 00000000 00:00 0      	[stack]

After 100 seconds of sleeping of the process, we get the addresses of all the variables as well.

Addresses of local variables: &local_size = bfe79520
	  &local_i = bfe79524
   	   &local_temp_int = bfe79528
Address of global initialized variable g_init_number = 804a028
Address of global un-initialized variable g_uninit_temp = 804a034
Address of dynamically allocated variable dynmem_str = 82f1008

We have all the data to analyse. First, look at the procfs output: this time we have a memory map entry for the heap as well. It covers the area from virtual address 082f1000 to 08312000.

Studying the addresses printed by the program, it is not difficult to affirm that the local variables belong to the stack segment, as the stack memory begins at address 0xbfe5a000 and ends at 0xbfe7b000. The address of the local variable ‘local_size’ is 0xbfe79520, which satisfies

0xbfe5a000	 <	0xbfe79520	 <	 0xbfe7b000
|stack-start-addr|	|local_size addr|	|stack-end-addr|

Similarly, the addresses of ‘local_i’ (0xbfe79524) and ‘local_temp_int’ (0xbfe79528) lie within the address limits of the stack.

Next, let us check which memory segment the global variables belong to. The initialized global variable ‘g_init_number’ is at memory address 0x804a028 and the un-initialized global variable ‘g_uninit_temp’ is at memory address 0x804a034. Both fall in the range of the read/write memory segment

0804a000-0804b000 rw-p 00001000 08:01 7345013	/home/ubuntu/aprograms/memmaps

We rightly identified it as the data segment. As per our understanding, the un-initialized globals belong to the bss portion of the data segment; here that portion is small enough to sit in the same page-sized mapping as the initialized data.

Last but not least, we have the pointer ‘dynmem_str’, which stores the address of the dynamically allocated memory, i.e. 0x82f1008. Looking at the memory map, this address belongs to

082f1000-08312000 rw-p 00000000 00:00 0      	[heap]

This memory area represents the heap. So we have even verified that ‘malloc()’ allocated memory from the heap segment.

Creating new processes from existing processes

The C library on Linux lets a program in execution create new processes. In fact, this is how most processes come into being: they are created by an already running process. Another paramount fact is that when a process creates a new process, the two processes have their own separate memories, each with its own individual memory segments.

There are two popular ways of creating new processes from a program.

The system() method

Standard C gives programmers a convenient option to run command-line commands or executables from within a program. It is done through the system() method.
The syntax from the man page looks like:

   	#include <stdlib.h>
   	int system(const char *command);

It takes a command in the form of a string as its input argument and returns the status of the command after it completes.
Here is an example program to illustrate the use of the system() method

#include <stdio.h>
#include <stdlib.h>

int main()
{
	int retcode = 0;
	retcode = system("cat sys.c");

	return retcode;
}

Compiling and running the built binary,

$ gcc sys.c -Wall -o sys
$./sys

The output I get is:

#include <stdio.h>
#include <stdlib.h>

int main()
{
	int retcode = 0;
	retcode = system("cat sys.c");

	return retcode;
}

Pretty much as we expected. It runs the command given to the system() method which displays the contents of the ‘sys.c’ file on the standard output.

However, it is interesting to know that it is not the ‘sys’ process which directly creates the new process running the ‘cat’ executable. The system() method creates a new shell process and hands the command over to it, and the shell in turn runs the ‘cat’ command, creating that process. To confirm, let us run our ‘helloworld’ binary from our source and check its parent process.

Here is the modified source, to trigger ‘helloworld’

#include <stdio.h>
#include <stdlib.h>

int main()
{
	int retcode = 0;
	retcode = system("./helloworld");

	return retcode;
}

We build the source and run the binary in the background.

$ gcc sys.c -Wall -o sys
$./sys &

We know that running the ‘helloworld’ executable is a blocking call, as it has a 100 second sleep().
Therefore, in the meanwhile, we first wish to see which processes are running and their PIDs.

$ps
  PID TTY      	TIME CMD
 1912 pts/1	00:00:00 bash
 2112 pts/1	00:00:00 sys
 2113 pts/1	00:00:00 sh
 2114 pts/1	00:00:00 helloworld
 2115 pts/1	00:00:00 ps

The running ‘sh’ process gives us an inkling that it must have been started by the system() call in the ‘sys’ process. Along with it, we also notice the ‘helloworld’ process, which has been launched from within our program using the ‘system()’ call.

To check the parent process of ‘helloworld’, we use the Linux command ‘ps’ as

$ps -f
UID    	PID  PPID  C STIME TTY      	TIME CMD
ubuntu  1912  1902  0 16:12 pts/1	00:00:00 bash
ubuntu  2112  1912  0 18:07 pts/1	00:00:00 ./sys
ubuntu  2113  2112  0 18:07 pts/1	00:00:00 sh -c ./helloworld
ubuntu  2114  2113  0 18:07 pts/1	00:00:00 ./helloworld
ubuntu  2116  1912  0 18:08 pts/1	00:00:00 ps -f

Note: One can also use the procfs as /proc/2114/status to know the PPID of the ‘helloworld’ process.

From the ‘ps -f’ output, our line of interest i.e.

ubuntu  2114  2113  0 18:07 pts/1	00:00:00 ./helloworld

shows the PPID as ‘2113’, which is the PID of the shell process. Moreover, the PPID of the shell process is ‘2112’, which is the PID of our ‘sys’ process. The numbers say it all.

Caveat  : Although using the ‘system()’ method is pretty convenient, it has limitations and drawbacks too. Since it uses a shell to run the command, the behavior of the command depends upon which shell is used. If the security of the shell is compromised, using the ‘system()’ method could be a hazard.
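One more practical point about system(): the value it returns is a wait status, not the plain exit code of the command. A minimal sketch of decoding it with the WIFEXITED and WEXITSTATUS macros follows; the helper name run_and_get_exit_code is hypothetical.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run command through system() and return its exit
 * code, or -1 if it could not be run or did not exit normally. */
int run_and_get_exit_code(const char *command)
{
	int status = system(command);

	if (status == -1)
		return -1;	/* no child process could be created */
	if (WIFEXITED(status))
		return WEXITSTATUS(status);	/* normal termination */
	return -1;		/* e.g. terminated by a signal */
}
```

For example, run_and_get_exit_code("exit 3") should return 3, since the shell started by system() exits with that code.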

The fork() method

The fork() call creates a child process which is a duplicate of the parent process calling it. From the point where the fork() call was made, both the parent and the child process continue executing the same program.
The syntax from the man page is

#include <unistd.h>
pid_t fork(void);

When this method returns, we have two processes running the same program. In the parent process, the return value is the PID of the newly created child process, while in the newly created child process, the return value is zero. (If no child could be created, fork() returns -1 in the parent.) This return value is how the program distinguishes the child process from the parent process. We shall see more in the later discussions.

Although we shall discuss the data structures related to processes in the next part, we cannot ignore the datatype used for the PID:

pid_t

Therefore, if we want to store a PID in a variable, we now know what data-type it should be.

Next, let us try out the fork() method in a program.

#include <stdio.h>
#include <unistd.h>

int main()
{
	pid_t pid_child = 0;
	int a = 12;
	int b = 3;
	int result = -1;

	printf("All set to create another process\n");
	pid_child = fork();
	if (pid_child < 0)
	{
		/*fork() failed, no child was created*/
		perror("fork");
		return 1;
	}
	else if (pid_child)
	{
		result = a + b;
		printf("Parent process : The result is %d\n", result);
	}
	else
	{
		/*The pid_child = 0*/
		result = a * b;
		printf("Child process : The result is %d\n", result);
	}

	/*Buying time to check the running processes*/
	sleep(100);
	return 0;
}

In the above example, we perform different operations in the child and parent processes. We distinguish what to do in the parent process from what has to be done in the child process by checking the value returned by the fork() call.

In the parent process, the value returned by fork() is a valid PID, i.e. a non-zero positive value. In the child process, however, the returned value is zero. Finally, we send both processes to sleep for 100 seconds.

Time to compile and launch the process:

$ gcc frk.c -Wall -o frk
$./frk &
[3] 2153
$ All set to create another process
Parent process : The result is 15
Child process : The result is 36

Both processes are now sleeping after printing their results. Meanwhile, let us check the active process information.

$ps -f
UID    	PID  PPID  C STIME TTY      	TIME CMD
ubuntu  1912  1902  0 16:12 pts/1	00:00:00 bash
ubuntu  2153  1912  0 18:57 pts/1	00:00:00 ./frk
ubuntu  2154  2153  0 18:57 pts/1	00:00:00 ./frk
ubuntu  2155  1912  0 18:57 pts/1	00:00:00 ps -f

From the above output, we identify our two processes with the same command. One of them (1) is the parent process, whose PPID is the PID of ‘bash’, and the other (2) is the child process, whose PPID is the PID of (1).

(1) ubuntu  2153  1912  0 18:57 pts/1	00:00:00 ./frk
(2) ubuntu  2154  2153  0 18:57 pts/1	00:00:00 ./frk
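The earlier point that parent and child have separate memories can be checked with a sketch like the one below. The child changes a variable and exits, and the parent, after waiting for the child, still sees the old value. The function name child_change_is_private is made up for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a change made by the child is not
 * visible in the parent, 0 if it is, and -1 on error. */
int child_change_is_private(void)
{
	int value = 12;
	pid_t pid = fork();

	if (pid < 0)
		return -1;	/* fork() failed */

	if (pid == 0) {
		value = 99;	/* the child modifies only its own copy */
		_exit(0);
	}

	/* parent: wait until the child has finished */
	if (waitpid(pid, NULL, 0) < 0)
		return -1;

	return value == 12;
}
```

We would expect child_change_is_private() to return 1, because the assignment in the child happens in the child's own copy of the memory.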

Conclusion

Linux Processes Part II is an effort to show readers how processes reside in the virtual memory of Linux, along with a pragmatic approach to visualizing the memory areas. The article also discussed how we can create processes programmatically. We hope this is a small contribution to the vast and diverse topic of Linux processes, and that it spawns a child process in the minds of readers which begins playing around with memory addresses and launching new, more complex processes.

For now,

 $ps
 PID CMD
 222 Article Linux Process Part II
 $kill -9 222

In Linux Processes Part III, we shall get acquainted with the data structures related to processes, and much more.
