Linux SMP and Multicore

As the processor frequencies reach their limit ,today’s  SoC are packaged with multiple processor cores with the attempt to boost the computing power and performance of the devices. The chip-level multiprocessing solutions provides more CPUs on a single chip with performance oriented architecture devising dedicated  cache / memory  architectures. For the Linux OS , implemented as multi core aware operating system kernel does its part to optimize the load across the available CPUs (from threads to virtualized operating systems). All that’s left is to ensure that the application can be sufficiently multi-threaded to exploit the power in SMP. So in an SMP configured system ,Linux with its default SMP aware configuration will manage the migration of process and Interrupt resource between different processors/core over time depending upon how you have set affinity. By default a process uses all the individual processors(cores). Although multiple processors improve the overall system performance, it sometimes do impact the performance of several processes. This is because when a process is switched from one core to another the new processor has to flush its cache. Now imagine if your process is running on a 8 core system, a violent and aggressive migration scheme will eventually lead to many cache invalidation and flushes during its lifetime resulting in performance degradation.

Linux scheduling goals

The Linux OS scheduler implementation follows the principle to avoid starvation and boost interactivity . Fast response to user despite high load Achieved by inferring interactive processes and dynamically increasing their priorities per the default scheduling policies and will scale well with number of processes O(1) scheduling overhead

SMP goals

With Linux supporting SMP (symmetric multi processing) [1] ,it has to cater to certain additional scheduling goals becomes very implicit in a  multi core environment [2]. The system in this configuration will have an additional requirement to scale well with number of processors and need to have mechanisms to

Load balance: no CPU should be idle if there is work
CPU affinity: no random bouncing of processes

Reference: Documentation/sched-design.txt 14

Multiprocessor scheduling

Per-CPU runqueue – Possible for one processor to be idle while others have jobs waiting in their run queues Periodically, OS rebalance runqueues and migration threads move processes from one runqueue to another The kernel always locks runqueues in the same order for deadlock prevention

To simplify things a bit: the OS sets up a timer which interrupts the system at a fixed interval. A single interval is known as a time slice. Everytime this interrupt occurs, the OS runs the scheduling routine, which picks the next thread that is due to be executed. The context of the core is then switched from the currently running thread to the new thread, and execution continues.

2015-11-05 10_38_44-Clipboard

This diagram can be seen as a simplified view of the run-queue management and not necessarily reflect the complexities involved in deriving the heuristics based on scheduling policies setup for priority, inter-activeness and affinity .

Linux allows the user to specify which CPUs processes and interrupt handlers are bound.   Each process has a bitmask saying what CPUs it can run on By default, all CPUs Processes can change the mask Inherited by child processes (and threads), thus tending to keep them on the same CPU rebalancing does not override affinity

Performance boost Myth!

This section to me looks really off the topic ,but I purposely included in this post as this really challenges us to work cautiously in detailing the parallelization objectives to get the best out from the multi core architecture. I think this needs a lot of detailing considering the problems in the domain for which the solution is being worked on.

Amdahl’s law deals with these limitations of parallel computing. In one sentence, it says this:

The speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program

The sequential parts result in situations where threads for one step in the algorithm have to wait for the threads of the previous step to signal that they’re ready. The more sequential parts there are in a program, the less benefit it will have from multiple cores. And also, the more benefit it will have from the single-threaded performance of each core.

2015-11-05 11_08_41-Linux and symmetric multiprocessing

http://www.ibm.com/developerworks/library/l-linux-smp/

The top line shows the number of processors. Ideally, this is what you’d like to see when you add additional processors to solve a problem. Unfortunately, because not all of the problem can be parallelized and there’s overhead in managing the processors, the speedup is quite a bit less. At the bottom (purple line) is the case of a problem that is 90% sequential. In the best case for this graph, the brown line shows a problem that’s 10% sequential and, therefore, 90% parallelizable. Even in this case, ten processors perform only slightly better than five

And that brings me back to the original point: people who think that the number of cores is the only factor in performance of multithreaded software.

Another side-effect of the faster single core is that parts that are strictly sequential in nature (where only one thread is active) are processed faster. Amdahl’s law essentially formulates a rather paradoxical phenomenon:

The more cores you add to a CPU, the faster the parallel parts of an application are processed, so the more the performance becomes dependent on the performance in the sequential parts

In the next post,we will review the ways to observe and audit the performance of system running with default affinity scheme and details the ways to  set the CPU affinity of the process/Interrupts to get them pinned to  specific or set of desired cores to improve or balance  the performance objectives

References

[1] : Symmetric Multi Processing : https://en.wikipedia.org/wiki/Symmetric_multiprocessing

[2] : Linux Kernel SMP support
http://www.ibm.com/developerworks/library/l-linux-smp/

Leave a comment