Multi-processor Scheduling
Objectives
We would like to:
- Discuss the impact of multiple cores/processors on scheduling.
Notes
- SMP - Symmetric Multiprocessor
- All cores/processors have equal access to all memory
- But each processor has a dedicated cache.
- The term SMP is also used for symmetric multiprocessing, where each processor schedules itself.
- NUMA - Non Uniform Memory Access
- Each processor (or group of processors) has its own local memory.
- Each processor has dedicated cache.
- Moving processes on these different architectures has different costs.
- Staying on the same core is best.
- Staying on the same processor is good.
- Moving between processors becomes expensive, with the cost depending on the architecture.
- A first thought:
- Dedicate one processor to all kernel work (asymmetric multiprocessing).
- But this can lead to bottlenecks and slowdowns.
- And it is not very efficient.
- So then, what do we do about scheduling?
- Choice A: have a single shared queue; the next selected process goes to the next available core.
- Choice B: have a per-core queue.
- In general, a critical section leads to inefficiency, since only one core can be inside it at a time.
- Removing a process from the queue would be critical.
- As would placing a new process on the queue.
- In the single queue case, this is expensive.
- So choice A is not good.
- In addition, choice A does not promote processor affinity.
- Choice B, however, leads to balance issues
- One core can become lightly loaded while others are heavily loaded.
- The author proposes two solutions to load balancing
- Push migration: a periodic task rebalances the load among processors/cores.
- Pull migration: an idle core searches for processes to run.
- The author notes that some OSes do both.
- Hardware thread scheduling is covered in 5.5.2
- Hardware manufacturers have built mechanisms to allow hardware to handle multiple threads per core.
- This is from studies indicating bursty execution: a core often stalls (e.g., waiting on memory), leaving idle cycles for another thread.
- The core can support quickly swapping several threads
- This is done by the hardware, not the OS.
- On a system with n cores, the OS believes there are 2n (or an even larger multiple of n) cores.
- The OS schedules a thread on each core, then the hardware schedules between the two threads on a single core.
- But if the OS is aware of the hardware's scheduling algorithm, it can make decisions that improve performance.
- Be aware that for CPU-bound applications, hyper-threading can be slower.
- Processor Affinity
- If a scheduler tries to keep a thread on the same processor but still allows migration, this is called soft affinity.
- If the thread is restricted to a specified set of processors and may not migrate off of them, this is called hard affinity.
- Apparently Linux will support either.
- Affinity is very important on NUMA systems, since a process's memory may be local to one node.
- Apparently there is a real conflict between load balancing and processor affinity: balancing wants to move threads, affinity wants to keep them in place.
- The author states "Thus scheduling algorithms for modern multicore NUMA systems have become quite complex."
- The last issue they introduce here is heterogeneous multiprocessing
- My phone has a 2x2.2 GHz Cortex-A76 and a 6x2.0 GHz Cortex-A55 CPU.
- These are both ARM processors
- But have different cache sizes.
- And apparently different pipeline characteristics.
- The A-55 is slow but low energy.
- The A-76 is fast but high energy.
- The scheduler is (hopefully) aware of this and will run
- Low priority, long term tasks on the A-55
- High priority, short term tasks on the A-76
- And can run without the A-76 entirely if we need to preserve power.