3.7.1 General: OpenMP settings
To activate OpenMP, set the corresponding environment variable (see Sect. 3.5) before calling the configure script, e.g.
export F90_PARALLEL=openmp
./configure
make
This inserts the corresponding compiler switch (e.g. -openmp, -omp, -mp, ...; see the following sections) into the compiler calls in the makefile (see Sect. 3.5).
The calls to the timing routines that would be executed in parallel
are removed by (not) setting the appropriate compiler macros (see Sect. 3.6).
In addition, the switch rhd_shortrad_dir_l02 (see Sect. 3.6) may be set if experience shows that it improves performance.
The user has to find optimum values for the parameters n_hydcellsperchunk (for the Roe solver module, see Sect. 5.4.7) and n_viscellsperchunk (for the tensor viscosity module, see Sect. 5.4.8) to optimize the size of the chunk given to one thread at a time.
For several modules the environment variable OMP_SCHEDULE can be set (before running CO5BOLD) to control their OpenMP scheduling behavior.
Important parallel loops in the SHORTrad module have a SCHEDULE(RUNTIME)
modifier that allows this external control.
The old default is achieved by not defining the variable or by setting
export OMP_SCHEDULE="STATIC,1"
On some machines (e.g. Intel Xeon with Linux and the PGI compiler), dynamic scheduling, activated with
export OMP_SCHEDULE="DYNAMIC,1"
is advantageous. The size of the individual chunks may be set to values larger than 1 (the value used in the examples above).
The optimal value has to be found empirically. A good starting point is number_of_grid_points_in_1D/number_of_threads, which, for one example model on a 4-processor machine, gives
export OMP_NUM_THREADS=4
export OMP_SCHEDULE="STATIC,43"
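The starting-point rule above amounts to a single integer division. The following is a minimal sketch, assuming a hypothetical model with 172 grid points in one dimension. This number is an illustrative assumption, chosen only so that the division reproduces the chunk size 43 used above; the variable names n_grid_1d and chunk are likewise not part of CO5BOLD.

```shell
# Hypothetical values, for illustration only
n_grid_1d=172          # assumed number of grid points in one dimension
OMP_NUM_THREADS=4      # number of threads (4-processor machine)

# Starting chunk size: grid points in 1D divided by the number of threads
chunk=$(( n_grid_1d / OMP_NUM_THREADS ))

# Avoid a chunk size of 0 when there are more threads than grid points
[ "$chunk" -ge 1 ] || chunk=1

export OMP_NUM_THREADS
export OMP_SCHEDULE="STATIC,${chunk}"
echo "OMP_SCHEDULE=$OMP_SCHEDULE"
```

The value obtained this way is only a first guess; the chunk size still has to be tuned by trial runs.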
The behavior of the other modules is not affected.
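Since the best schedule has to be found by trial, a small loop over candidate settings can help. The following is only a sketch: the variable RUN_CMD is a hypothetical placeholder for whatever command starts a short test run on your system (it defaults to the no-op `true` here), and the list of candidates simply repeats the values mentioned above.

```shell
# Hypothetical placeholder: set RUN_CMD to the command that starts a short test run
run_cmd=${RUN_CMD:-true}

# Try each candidate schedule and report the wall-clock time in seconds
for schedule in "STATIC,1" "DYNAMIC,1" "STATIC,43"; do
    export OMP_SCHEDULE="$schedule"
    start=$(date +%s)
    $run_cmd
    end=$(date +%s)
    echo "$schedule: $(( end - start )) s"
done
```

Because `date +%s` has a resolution of one second, the test run should last long enough for differences between the schedules to be visible.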
The number of threads should equal the number of available processors and has to be set at run time with the environment variable OMP_NUM_THREADS, e.g. with
export OMP_NUM_THREADS=16
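Where the processor count is not known in advance, it can be queried instead of hard-coded. The following is a minimal sketch, assuming a system on which `getconf _NPROCESSORS_ONLN` is available (as on typical Linux installations); the variable name n_proc and the fallback value of 1 are assumptions of this example.

```shell
# Number of processors currently online; fall back to 1 if the query fails
n_proc=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

export OMP_NUM_THREADS="$n_proc"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

On shared machines or under a batch system, the number of processors actually allocated to a job may be smaller than the number reported here, in which case the value should be set by hand.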