Introduction to OpenMP
Objectives
We would like to:
- Understand the basics of OpenMP
Notes
- I am using the open source book The Art of HPC, volume 2, by Victor Eijkhout.
- I think we should start with OpenMP, because I really don't have MPI set up anywhere.
- I was thinking of moving to a more tutorial-style class.
- I was thinking of an image-based problem for our first computation:
- A fractal computation
- Conway's game of life
- OpenMP
- An extension to most C/C++ and Fortran compilers
- It provides a library of functions
- And a set of compiler directives (pragmas)
- OpenMP uses several system-level concepts
- Threads:
- These are lightweight processes
- Or the basic unit of computation today.
- They have
- Independent IR/PC/CU (instruction register, program counter, control unit)
- Stack
- But share
- Data/BSS/text segments
- Heap
- They operate in the same process
- But each can run simultaneously on different processors.
- There is a great picture here.
- The master thread (or main program), shown in red
- May create more threads to work on the parallel parts
- But these rejoin the master for the sequential parts (a minimal fork-join example follows below).
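A minimal sketch of the fork-join model in C (assuming a GCC-style compiler with OpenMP support, compiled with something like `gcc -fopenmp hello_omp.c`):

```c
#include <stdio.h>
#include <omp.h>   // the OpenMP library functions

int main(void) {
    // Sequential part: only the master thread runs this.
    printf("master thread starting\n");

    // Parallel region: the master thread forks a team of threads here.
    #pragma omp parallel
    {
        int id = omp_get_thread_num();   // library call: this thread's id
        int n  = omp_get_num_threads();  // library call: size of the team
        printf("hello from thread %d of %d\n", id, n);
    }   // implicit join: the threads rejoin the master here

    // Sequential part again: back to just the master thread.
    printf("master thread finishing\n");
    return 0;
}
```

Note how both pieces of OpenMP show up: the `#pragma omp parallel` compiler directive and the `omp_get_*` library functions.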
- Synchronization
- A barrier forces all threads to reach the same point before any of them continue.
- We mark a section as "critical" so that only one thread at a time executes it, to control things like concurrent memory writes (see the sketch below).
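A small sketch of a critical section protecting a shared counter (the variable name `counter` is just for illustration; without the `critical` directive the increments from different threads could race):

```c
#include <stdio.h>

int main(void) {
    long counter = 0;   // shared by all threads in the team

    #pragma omp parallel for
    for (int i = 0; i < 1000000; i++) {
        // Only one thread at a time may execute this block, so the
        // read-modify-write on counter is not a data race.
        #pragma omp critical
        counter++;
    }

    printf("counter = %ld\n", counter);   // always prints 1000000
    return 0;
}
```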
- The goal of a parallel program is speedup
- See speedup at Wikipedia.
- Speedup S = T_sequential / T_parallel
- If it takes 20 seconds to run on a single processor, and 5 seconds on a parallel machine
- The speedup is 20/5 = 4.
- If we have four processors/cores, this is linear speedup: the ideal case.
- If we have more than four processors/cores, the speedup is sublinear: not ideal, but ok.
- If we have fewer than four processors/cores, the speedup is superlinear, which is unusual and could indicate a problem with the measurement.
- Normally we do not expect superlinear speedup.
- But it can legitimately happen if the parallel run somehow gains other resources,
- usually more total cache, memory, or swap space.
- Generally parallelization does not produce superlinear, or even perfectly linear, speedup.
- There is at least some overhead in the parallelization itself.
- Plus not everything in a program is parallel.
- Consider lab 1:
- The generation of the image might be perfectly parallelizable,
- But writing the image out is not (see the timing sketch below).
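A rough sketch of that split for lab 1, with a hypothetical pixel_value() standing in for the real fractal / Game of Life kernel and omp_get_wtime() used for timing:

```c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define WIDTH  1024
#define HEIGHT 1024

// Placeholder for the real per-pixel computation (fractal, Game of Life, ...).
static unsigned char pixel_value(int x, int y) {
    double v = 0.0;
    for (int k = 1; k <= 200; k++)       // some per-pixel work
        v += 1.0 / (k + x + y + 1.0);
    return (unsigned char)((int)(v * 255.0) & 0xff);
}

int main(void) {
    unsigned char *img = malloc((size_t)WIDTH * HEIGHT);

    // Generation: every pixel is independent, so this loop parallelizes cleanly.
    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (int y = 0; y < HEIGHT; y++)
        for (int x = 0; x < WIDTH; x++)
            img[y * WIDTH + x] = pixel_value(x, y);
    double t_gen = omp_get_wtime() - t0;

    // Writing: the file has to come out in order, so this part stays sequential.
    t0 = omp_get_wtime();
    FILE *f = fopen("image.pgm", "wb");
    fprintf(f, "P5\n%d %d\n255\n", WIDTH, HEIGHT);
    fwrite(img, 1, (size_t)WIDTH * HEIGHT, f);
    fclose(f);
    double t_write = omp_get_wtime() - t0;

    printf("generate: %.3f s   write: %.3f s\n", t_gen, t_write);
    free(img);
    return 0;
}
```

To estimate the speedup S = T_sequential / T_parallel of the generation step, run the same program once with OMP_NUM_THREADS=1 and once with all cores, and compare the "generate" times.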
- Amdahl's Law
- The overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is used.
- If the part you can make faster is 10% of the computation, 90% of the computation remains unchanged.
- So if a task takes 100 seconds,
- It will still take 90 seconds + 10/speedup seconds, no matter how fast that 10% becomes.
- So it only pays to optimize/parallelize when most of the task is parallelizable (see the sketch at the end of these notes).
- Speedup is defined better here.
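A tiny sketch of that arithmetic in C, using the 90%/10% figures from above:

```c
#include <stdio.h>

int main(void) {
    double total = 100.0;           // total time in seconds
    double serial_part   = 90.0;    // the 90% we cannot improve
    double parallel_part = 10.0;    // the 10% we can speed up

    for (int speedup = 1; speedup <= 16; speedup *= 2) {
        double t = serial_part + parallel_part / speedup;
        printf("part speedup %2dx -> total %.1f s, overall speedup %.2fx\n",
               speedup, t, total / t);
    }
    return 0;
}
```

Even with an infinite speedup of the 10%, the task can never take less than 90 seconds, so the overall speedup is capped at about 1.11x.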