27-09-2012, 04:17 PM
Linux Process Management
Linux Process.doc (Size: 148.5 KB / Downloads: 99)
Abstract:
In theory, there is no difference in theory and practice, but in practice, there is. Learning the standard operating system concepts breaks the ice for studying a practical operating system implementation. In this seminar we will have a look on Linux process management and scheduler. I will explain the Linux 2.6.8.1 CPU scheduler. Process management is also in context of this scheduler.
The various attributes of the process descriptor are used to manage and control the lifetime of a process. We will see in detail how the process descriptor is implemented. We will see the data structures kernel uses for managing all the processes.
Till 2.4.x series of schedulers, the running time of scheduler increases linearly with the number of processes in the system. The asymptotic running time of these schedulers is of the order O(n), where n is the number of processes in the system. But the 2.6.8.1 scheduler performs all it’s duties in O(1) time. It means that it take a constant amount of time for scheduling purpose, independent of the number of processes in the system. There is no algorithm in this scheduler that takes more time than this. We will see how this is achieved.
Process
There are two kinds of execution contexts in Linux. Process and lightweight processes. However, Linux makes no distinction among these forms from scheduling point of view. Linux uses lightweight processes to provide the features of a multithreaded application. The Linux kernel use to store information about each process in the process descriptor. Which is implemented as task_struct.
The process list:
Kernel uses dynamic memory to store the process descriptors. It uses doubly linked lists to store the process descriptors. For each type of process state it uses a process descriptor list for it. Through the doubly linked lists it can scan efficiently all the process’s descriptors in a particular state. The special thing about these lists is that a pointer for these lists is embedded in the process descriptor itself. Following is an example list.
Locking:
Only one task may modify a CPU’s runqueue at any given time, and as such any task that wishes to modify a runqueue must obtain its lock first. Obtaining multiple runqueue locks must be done by order of ascending runqueue address in order to avoid deadlocks. A convenient function for obtaining two runqueue locks is double_rq_lock(rq1, rq2), which handles lock ordering itself. Its opposite, double_rq_unlock(rq1,rq2), does the same but unlocks instead of locks. Locking a runqueue that a certain task is in can be done with task_rq_lock(task, &flags).
Priority Arrays:
This data structure is responsible for the amazing working of the scheduler in O(1) time. The Linux 2.6.8.1 scheduler finds always the highest priority task in the system and schedules it. If there is more than one task on the same priority then, these tasks are scheduled in a round robin manner.
The scheduler finds the task having maximum priority in a constant time. Also if multiple tasks are on same priority these can be scheduled in a round robin fashion, in a constant time. When all the tasks complete their time slices the swapping of the active and expired array is done in constant time by just exchanging the two pointers.
Sleeping and Waking Tasks:
A task or process cannot always be running. If a task needs input from user then it can not move further until and unless the user types the input string in. It may take a few seconds. But in these few seconds number of other processes may complete their execution. So this process is blocked until user enters the input.
Sleeping is a special state in which tasks cannot be scheduled or run, which is important since if they could get scheduled or run execution would proceed when it shouldn’t and sleeping would have to be implemented as a busy loop. For example - if a task could be run after requesting data from a hard drive but before it was sure the data had arrived, it would have to constantly check (via a loop) to see whether or not the data had arrived.
Linux Process.doc (Size: 148.5 KB / Downloads: 99)
Abstract:
In theory, there is no difference in theory and practice, but in practice, there is. Learning the standard operating system concepts breaks the ice for studying a practical operating system implementation. In this seminar we will have a look on Linux process management and scheduler. I will explain the Linux 2.6.8.1 CPU scheduler. Process management is also in context of this scheduler.
The various attributes of the process descriptor are used to manage and control the lifetime of a process. We will see in detail how the process descriptor is implemented. We will see the data structures kernel uses for managing all the processes.
Till 2.4.x series of schedulers, the running time of scheduler increases linearly with the number of processes in the system. The asymptotic running time of these schedulers is of the order O(n), where n is the number of processes in the system. But the 2.6.8.1 scheduler performs all it’s duties in O(1) time. It means that it take a constant amount of time for scheduling purpose, independent of the number of processes in the system. There is no algorithm in this scheduler that takes more time than this. We will see how this is achieved.
Process
There are two kinds of execution contexts in Linux. Process and lightweight processes. However, Linux makes no distinction among these forms from scheduling point of view. Linux uses lightweight processes to provide the features of a multithreaded application. The Linux kernel use to store information about each process in the process descriptor. Which is implemented as task_struct.
The process list:
Kernel uses dynamic memory to store the process descriptors. It uses doubly linked lists to store the process descriptors. For each type of process state it uses a process descriptor list for it. Through the doubly linked lists it can scan efficiently all the process’s descriptors in a particular state. The special thing about these lists is that a pointer for these lists is embedded in the process descriptor itself. Following is an example list.
Locking:
Only one task may modify a CPU’s runqueue at any given time, and as such any task that wishes to modify a runqueue must obtain its lock first. Obtaining multiple runqueue locks must be done by order of ascending runqueue address in order to avoid deadlocks. A convenient function for obtaining two runqueue locks is double_rq_lock(rq1, rq2), which handles lock ordering itself. Its opposite, double_rq_unlock(rq1,rq2), does the same but unlocks instead of locks. Locking a runqueue that a certain task is in can be done with task_rq_lock(task, &flags).
Priority Arrays:
This data structure is responsible for the amazing working of the scheduler in O(1) time. The Linux 2.6.8.1 scheduler finds always the highest priority task in the system and schedules it. If there is more than one task on the same priority then, these tasks are scheduled in a round robin manner.
The scheduler finds the task having maximum priority in a constant time. Also if multiple tasks are on same priority these can be scheduled in a round robin fashion, in a constant time. When all the tasks complete their time slices the swapping of the active and expired array is done in constant time by just exchanging the two pointers.
Sleeping and Waking Tasks:
A task or process cannot always be running. If a task needs input from user then it can not move further until and unless the user types the input string in. It may take a few seconds. But in these few seconds number of other processes may complete their execution. So this process is blocked until user enters the input.
Sleeping is a special state in which tasks cannot be scheduled or run, which is important since if they could get scheduled or run execution would proceed when it shouldn’t and sleeping would have to be implemented as a busy loop. For example - if a task could be run after requesting data from a hard drive but before it was sure the data had arrived, it would have to constantly check (via a loop) to see whether or not the data had arrived.