Introduction to OpenMP
Objectives
We would like to:
- Understand the basics of OpenMP
Notes
- I am using the open source book The Art of HPC, volume 2, by Victor Eijkhout.
- I think we should start with OpenMP, because I really don't have MPI set up anywhere.
- I was thinking of moving to a more tutorial-style class.
- I was thinking of an image-based problem for our first computation:
- A fractal computation
- Conway's game of life
- OpenMP
- An extension to most C/C++ and Fortran compilers
- It provides a library of functions
- And a set of compiler directives (pragmas)
- OpenMP uses several system-level concepts
- Threads:
- These are lightweight processes
- Or the basic unit of computation today.
- They have
- Independent IR/PC/CU (instruction register, program counter, control unit)
- Stack
- But share
- Data/BSS/text segments
- Heap
- They operate in the same process
- But each can run simultaneously on different processors.
- There is a great picture here.
- The master thread (or main program), shown in red
- May create more threads to work on the parallel parts
- But these rejoin the master for the sequential parts (a minimal fork-join example follows below).
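A minimal sketch of the fork-join model in C (assuming a GCC-style compiler with OpenMP support, compiled with something like `gcc -fopenmp hello_omp.c`):

```c
#include <stdio.h>
#include <omp.h>   // the OpenMP library functions

int main(void) {
    // Sequential part: only the master thread runs this.
    printf("master thread starting\n");

    // Parallel region: the master thread forks a team of threads here.
    #pragma omp parallel
    {
        int id = omp_get_thread_num();   // library call: this thread's id
        int n  = omp_get_num_threads();  // library call: size of the team
        printf("hello from thread %d of %d\n", id, n);
    }   // implicit join: the threads rejoin the master here

    // Sequential part again: back to just the master thread.
    printf("master thread finishing\n");
    return 0;
}
```

Note how both pieces of OpenMP show up: the `#pragma omp parallel` compiler directive and the `omp_get_*` library functions.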
- Synchronization
- A barrier forces all threads to reach the same point before any of them continue.
- We mark a section as "critical" so that only one thread at a time executes it, to control things like concurrent memory writes (see the sketch below).
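A small sketch of a critical section protecting a shared counter (the variable name `counter` is just for illustration; without the `critical` directive the increments from different threads could race):

```c
#include <stdio.h>

int main(void) {
    long counter = 0;   // shared by all threads in the team

    #pragma omp parallel for
    for (int i = 0; i < 1000000; i++) {
        // Only one thread at a time may execute this block, so the
        // read-modify-write on counter is not a data race.
        #pragma omp critical
        counter++;
    }

    printf("counter = %ld\n", counter);   // always prints 1000000
    return 0;
}
```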
- The goal of a parallel program is speedup
- See speedup at Wikipedia.
- Speedup S = T_sequential / T_parallel
- If it takes 20 seconds to run on a single processor, and 5 seconds on a parallel machine
- The speedup is 20/5 = 4.
- If we have four processors/cores, this is linear speedup: the ideal case.
- If we have more than four processors/cores, the speedup is sublinear: not ideal, but ok.
- If we have fewer than four processors/cores, the speedup is superlinear, which is unusual and could indicate a problem with the measurement.
- Normally we do not expect superlinear speedup.
- But it can legitimately happen if the parallel run somehow gains other resources,
- usually more total cache, memory, or swap space.
- Generally parallelization does not produce superlinear, or even perfectly linear, speedup.
- There is at least some overhead in the parallelization itself.
- Plus not everything in a program is parallel.
- Consider lab 1:
- The generation of the image might be perfectly parallelizable,
- But writing the image out is not (see the timing sketch below).
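A rough sketch of that split for lab 1, with a hypothetical pixel_value() standing in for the real fractal / Game of Life kernel and omp_get_wtime() used for timing:

```c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define WIDTH  1024
#define HEIGHT 1024

// Placeholder for the real per-pixel computation (fractal, Game of Life, ...).
static unsigned char pixel_value(int x, int y) {
    double v = 0.0;
    for (int k = 1; k <= 200; k++)       // some per-pixel work
        v += 1.0 / (k + x + y + 1.0);
    return (unsigned char)((int)(v * 255.0) & 0xff);
}

int main(void) {
    unsigned char *img = malloc((size_t)WIDTH * HEIGHT);

    // Generation: every pixel is independent, so this loop parallelizes cleanly.
    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (int y = 0; y < HEIGHT; y++)
        for (int x = 0; x < WIDTH; x++)
            img[y * WIDTH + x] = pixel_value(x, y);
    double t_gen = omp_get_wtime() - t0;

    // Writing: the file has to come out in order, so this part stays sequential.
    t0 = omp_get_wtime();
    FILE *f = fopen("image.pgm", "wb");
    fprintf(f, "P5\n%d %d\n255\n", WIDTH, HEIGHT);
    fwrite(img, 1, (size_t)WIDTH * HEIGHT, f);
    fclose(f);
    double t_write = omp_get_wtime() - t0;

    printf("generate: %.3f s   write: %.3f s\n", t_gen, t_write);
    free(img);
    return 0;
}
```

To estimate the speedup S = T_sequential / T_parallel of the generation step, run the same program once with OMP_NUM_THREADS=1 and once with all cores, and compare the "generate" times.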
- Amdahl's Law
- The overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is used.
- If the part you can make faster is 10% of the computation, 90% of the computation remains unchanged.
- So if a task takes 100 seconds,
- It will still take 90 seconds + 10/speedup seconds, no matter how fast that 10% becomes.
- So it only pays to optimize/parallelize when most of the task is parallelizable (see the sketch at the end of these notes).
- Speedup is defined better here.
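A tiny sketch of that arithmetic in C, using the 90%/10% figures from above:

```c
#include <stdio.h>

int main(void) {
    double total = 100.0;           // total time in seconds
    double serial_part   = 90.0;    // the 90% we cannot improve
    double parallel_part = 10.0;    // the 10% we can speed up

    for (int speedup = 1; speedup <= 16; speedup *= 2) {
        double t = serial_part + parallel_part / speedup;
        printf("part speedup %2dx -> total %.1f s, overall speedup %.2fx\n",
               speedup, t, total / t);
    }
    return 0;
}
```

Even with an infinite speedup of the 10%, the task can never take less than 90 seconds, so the overall speedup is capped at about 1.11x.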