Multi-processor Scheduling
Objectives
We would like to:
- Discuss the impact of multiple cores/processors on scheduling.
Notes
- SMP - Symmetric Multiprocessor
- All cores/processors have equal access to all memory
- But each processor has a dedicated cache.
- The term SMP is also used for symmetric multiprocessing, where each processor schedules itself.
- NUMA - Non Uniform Memory Access
- Each processor (or group of processors) has its own local memory.
- Each processor has dedicated cache.
- Moving processes on these different architectures has different costs.
- Staying on the same core is best.
- Staying on the same processor is good.
- Moving between processors becomes expensive, with the cost depending on the architecture.
- A first thought:
- Dedicate one processor to all kernel work (asymmetric multiprocessing).
- But this can lead to bottlenecks and slowdowns.
- And it is not very efficient.
- So then, what do we do about scheduling?
- Choice A: have a single shared queue; the next selected process goes to the next available core.
- Choice B: have a per-core queue.
- In general, a critical section leads to inefficiency, since only one core can be inside it at a time.
- Removing a process from the queue would be critical.
- As would placing a new process on the queue.
- In the single queue case, this is expensive.
- So choice A is not good.
- In addition, choice A does not promote processor affinity.
- Choice B, however, leads to balance issues
- One core can become lightly loaded while others are heavily loaded.
- The author proposes two solutions to load balancing
- Push migration: a periodic task rebalances the load among processors/cores.
- Pull migration: an idle core searches for processes to run.
- The author notes that some OSes do both.
- Hardware thread scheduling is covered in 5.5.2
- Hardware manufacturers have built mechanisms to allow hardware to handle multiple threads per core.
- This is from studies indicating bursty execution: a core often stalls (e.g., waiting on memory), leaving idle cycles for another thread.
- The core can support quickly swapping several threads
- This is done by the hardware, not the OS.
- On a system with n cores, the OS believes there are 2n (or an even larger multiple of n) cores.
- The OS schedules a thread on each core, then the hardware schedules between the two threads on a single core.
- But if the OS is aware of the hardware's scheduling algorithm, it can make decisions that improve performance.
- Be aware that for CPU-bound applications, hyper-threading can be slower.
- Processor Affinity
- If a scheduler tries to keep a thread on the same processor but still allows migration, this is called soft affinity.
- If the thread is restricted to a specified set of processors and may not migrate off of them, this is called hard affinity.
- Apparently Linux will support either.
- Affinity is very important on NUMA systems, since a process's memory may be local to one node.
- Apparently there is a real conflict between load balancing and processor affinity: balancing wants to move threads, affinity wants to keep them in place.
- The author states "Thus scheduling algorithms for modern multicore NUMA systems have become quite complex."
- The last issue they introduce here is heterogeneous multiprocessing
- My phone has a 2x2.2 GHz Cortex-A76 and a 6x2.0 GHz Cortex-A55 CPU.
- These are both ARM processors
- But have different cache sizes.
- And apparently different pipeline characteristics.
- The A-55 is slow but low energy.
- The A-76 is fast but high energy.
- The scheduler is (hopefully) aware of this and will run
- Low priority, long term tasks on the A-55
- High priority, short term tasks on the A-76
- And can run without the A-76 entirely if we need to preserve power.