Reductions

We would like to combine values computed by many threads into a single result.

Variables declared outside the parallel block are shared.
Variables declared inside the block are private to each thread.
Unsynchronized reads and writes to shared variables can cause data races.
- You can do the update in a critical region, which only one thread executes at a time.
- #pragma omp critical
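
The critical-region approach can be sketched as follows. This is a minimal illustration, not code from the course files; the function name sum_with_critical and its signature are made up for the example.

```cpp
#include <vector>

// Hypothetical helper: sums `data`, guarding the shared accumulator
// with a critical region.
long sum_with_critical(const std::vector<int>& data) {
    long total = 0;  // declared outside the parallel block, so it is shared
    const long n = static_cast<long>(data.size());

    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        // Only one thread at a time may execute this block.
        #pragma omp critical
        {
            total += data[i];
        }
    }
    return total;
}
```

The result is correct, but every update waits its turn on the same critical region, so the loop gains little from running in parallel.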
You can do as we did earlier and give each thread its own variable (one slot per thread in a vector), then combine the slots afterwards.
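
A sketch of the one-slot-per-thread idea, assuming one vector entry per thread. The names sum_with_per_thread_slots and partial are hypothetical, and the #ifdef guards are only there so the code also builds without OpenMP enabled.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: each thread accumulates into its own slot,
// and the slots are added up serially afterwards.
long sum_with_per_thread_slots(const std::vector<int>& data) {
#ifdef _OPENMP
    const int max_threads = omp_get_max_threads();
#else
    const int max_threads = 1;  // serial fallback
#endif
    std::vector<long> partial(max_threads, 0);  // one slot per thread
    const long n = static_cast<long>(data.size());

    #pragma omp parallel
    {
#ifdef _OPENMP
        const int tid = omp_get_thread_num();
#else
        const int tid = 0;
#endif
        #pragma omp for
        for (long i = 0; i < n; ++i) {
            partial[tid] += data[i];  // no other thread touches this slot
        }
    }

    long total = 0;
    for (long p : partial) {
        total += p;  // serial combine after the parallel region
    }
    return total;
}
```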
Or you can use a reduction
- #pragma omp parallel for reduction(operator:variable)
  - Each thread gets its own private copy of the variable.
  - Each private copy is initialized to the operator's identity value.
  - Each thread accumulates into its private copy.
  - At the end of the parallel region, the private copies are combined with the operator into the original shared variable. The runtime takes care of the synchronization; how it does so is up to the implementation.
- Reduction operators: reference.
  - +, -, |, ^, || are initialized to 0.
  - *, && are initialized to 1.
  - & is initialized to all bits set (~0).
  - max is initialized to the smallest value of the type.
  - min is initialized to the largest value of the type.
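
The reduction clause described above might be used like this. It is a small sketch with two built-in operators; the struct SumAndMax and the function sum_and_max are hypothetical names for the example.

```cpp
#include <limits>
#include <vector>

// Hypothetical result type for the example.
struct SumAndMax {
    long sum;
    int largest;
};

SumAndMax sum_and_max(const std::vector<int>& data) {
    long sum = 0;
    int largest = std::numeric_limits<int>::lowest();
    const long n = static_cast<long>(data.size());

    // Each thread works on private copies of `sum` (starting at 0) and
    // `largest` (starting at the smallest int). The copies are combined
    // into the original variables when the loop finishes.
    #pragma omp parallel for reduction(+:sum) reduction(max:largest)
    for (long i = 0; i < n; ++i) {
        sum += data[i];
        if (data[i] > largest) {
            largest = data[i];
        }
    }
    return {sum, largest};
}
```

No critical region or per-thread vector appears in the source; the clause sets up the private copies and the final combine.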
Reductions are most likely performed in a tree-like manner.
- Consider 8 threads doing a sum, where thread i holds the partial sum s_i.
- Step 1: on threads 0 through 3: s_i = s_i + s_{i+4}
- Step 2: on threads 0 through 1: s_i = s_i + s_{i+2}
- Step 3: on thread 0: s_0 = s_0 + s_1
I used these pairings only because the notation is easy to write.
- An actual implementation may well pair the threads differently.
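
The pairing pattern in the steps above can be simulated serially to check the arithmetic. This sketch does not show how any OpenMP runtime actually combines values; the function tree_reduce is hypothetical and assumes the number of partial sums is a power of two.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical serial simulation of the tree pattern: in each step the
// first `stride` slots absorb the slots `stride` positions to their right.
// Assumes s.size() is a power of two (or zero).
long tree_reduce(std::vector<long> s) {
    if (s.empty()) {
        return 0;
    }
    for (std::size_t stride = s.size() / 2; stride >= 1; stride /= 2) {
        // In a real parallel version these additions are independent
        // and could run at the same time.
        for (std::size_t i = 0; i < stride; ++i) {
            s[i] += s[i + stride];
        }
    }
    return s[0];
}
```

With 8 partial sums the outer loop runs 3 times (strides 4, 2, 1), compared with 7 sequential additions if one thread adds everything up.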
You can define your own reductions with #pragma omp declare reduction.
- See reductions.cpp for an example.
- The combiner can be an expression or a call to a function.
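
A user-defined reduction whose combiner is a function call could look like the sketch below. It is not the contents of reductions.cpp; the type MinMax, the function merge_min_max, and the reduction name minmax are all invented for this example.

```cpp
#include <limits>
#include <vector>

// Hypothetical type holding the smallest and largest value seen so far.
struct MinMax {
    int lo;
    int hi;
};

// Hypothetical combiner function: merges two partial results.
MinMax merge_min_max(MinMax a, MinMax b) {
    return {a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi};
}

// omp_out and omp_in are the two values being combined. Each private
// copy (omp_priv) starts as a copy of the original variable (omp_orig).
#pragma omp declare reduction(minmax : MinMax : \
        omp_out = merge_min_max(omp_out, omp_in)) \
    initializer(omp_priv = omp_orig)

MinMax find_min_max(const std::vector<int>& data) {
    MinMax r{std::numeric_limits<int>::max(),
             std::numeric_limits<int>::lowest()};
    const long n = static_cast<long>(data.size());

    #pragma omp parallel for reduction(minmax : r)
    for (long i = 0; i < n; ++i) {
        r = merge_min_max(r, MinMax{data[i], data[i]});
    }
    return r;
}
```

Initializing from omp_orig works here because merging a min/max pair with itself changes nothing; an operator like + would need its own identity value instead.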