To activate OpenMP, set the corresponding environment variable (see Sect. 3.5) before calling the configure script.
This inserts the corresponding compiler switch (e.g. -openmp, -omp, -mp, ...; cf. the following sections) into the compiler calls in the makefile (see Sect. 3.5).
Calls to timing routines that would otherwise be executed in parallel are removed by setting (or not setting) the appropriate compiler macros (see Sect. 3.6).
In addition, the switch rhd_shortrad_dir_l02 (see Sect. 3.6) may be set if experience shows that it improves performance.
The user has to find optimum values for the parameters n_hydcellsperchunk (for the Roe solver module, see Sect. 5.4.7) and n_viscellsperchunk (for the tensor viscosity module, see Sect. 5.4.8), which control the size of the chunk of work handed to one thread at a time.
So far, only the SHORTrad module honors the environment variable OMP_SCHEDULE, which can be set before running CO5BOLD to control its OpenMP scheduling behavior. Important parallel loops in the SHORTrad module carry a SCHEDULE(RUNTIME) clause that allows this external control.
The old default is obtained by leaving the variable undefined or by setting it explicitly to the corresponding schedule.
On some machines (e.g. Intel Xeon under Linux with the PGI compiler), dynamic scheduling selected via OMP_SCHEDULE is advantageous. The size of the individual chunks may also be set to values larger than 1.
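The value of OMP_SCHEDULE follows the OpenMP format kind[,chunk], where kind is one of static, dynamic, guided or auto. As a minimal sketch, the hypothetical helper omp_schedule below (not part of CO5BOLD) builds and checks such a value before it is exported. It deliberately omits the optional monotonic/nonmonotonic modifiers and, as a conservative choice, refuses a chunk size together with auto.

```python
from typing import Optional

# Schedule kinds defined by the OpenMP specification.
VALID_KINDS = ("static", "dynamic", "guided", "auto")


def omp_schedule(kind: str, chunk: Optional[int] = None) -> str:
    """Return an OMP_SCHEDULE value of the form 'kind' or 'kind,chunk'.

    Hypothetical helper for illustration; modifiers are not supported.
    """
    kind = kind.lower()
    if kind not in VALID_KINDS:
        raise ValueError(f"unknown schedule kind: {kind!r}")
    if chunk is None:
        return kind
    if kind == "auto":
        # Conservative choice: do not combine 'auto' with a chunk size.
        raise ValueError("'auto' does not take a chunk size here")
    if chunk < 1:
        raise ValueError("chunk size must be a positive integer")
    return f"{kind},{chunk}"


if __name__ == "__main__":
    print(omp_schedule("dynamic", 1))   # dynamic,1
    print(omp_schedule("dynamic", 35))  # dynamic,35
```

The resulting string is what would be assigned to OMP_SCHEDULE in the shell before starting the run.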
The optimal value has to be found empirically. A good starting point is number_of_grid_points_in_1D / number_of_threads; on a 4-processor machine, this is one quarter of the number of grid points in one dimension.
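The starting-point rule for the chunk size can be written as a one-line computation. In this sketch the function name starting_chunk is hypothetical, and rounding up is our own choice: it guarantees that the grid points along one dimension are covered by at most number_of_threads chunks, so no thread is left with a small leftover chunk. The 140-point model below is purely illustrative.

```python
def starting_chunk(n_points_1d: int, n_threads: int) -> int:
    """First guess for the OpenMP chunk size: grid points in 1D / threads.

    Rounds up (ceiling division) so that at most n_threads chunks are needed.
    """
    if n_points_1d < 1 or n_threads < 1:
        raise ValueError("both arguments must be positive")
    return -(-n_points_1d // n_threads)  # ceiling division for positive ints


if __name__ == "__main__":
    # Hypothetical model with 140 grid points per dimension, 4 processors.
    print(starting_chunk(140, 4))  # 35
```

From this first guess, neighboring values can be timed to locate the empirical optimum.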
The behavior of the other modules is not affected.
The number of threads should equal the number of available processors and has to be set at run time with the environment variable OMP_NUM_THREADS.
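When runs are started from a script, both run-time variables can be prepared in one place. The sketch below assumes a Python launcher. The function openmp_env is hypothetical, and the path of the CO5BOLD executable is not given here. It defaults the thread count to os.cpu_count(), which counts logical CPUs (possibly including hyper-threads) and may return None. It also removes OMP_SCHEDULE when no schedule is requested, which corresponds to the old default of leaving the variable undefined.

```python
import os
from typing import Dict, Optional


def openmp_env(n_threads: Optional[int] = None,
               schedule: Optional[str] = None,
               base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with OpenMP run-time variables set.

    Hypothetical helper; pass the result as env= to subprocess.run().
    """
    env = dict(os.environ if base is None else base)
    if n_threads is None:
        # Logical CPU count; fall back to 1 if it cannot be determined.
        n_threads = os.cpu_count() or 1
    if n_threads < 1:
        raise ValueError("n_threads must be positive")
    env["OMP_NUM_THREADS"] = str(n_threads)
    if schedule is None:
        # Leaving OMP_SCHEDULE undefined gives the old default behavior.
        env.pop("OMP_SCHEDULE", None)
    else:
        # Only the SHORTrad loops with SCHEDULE(RUNTIME) read this value.
        env["OMP_SCHEDULE"] = schedule
    return env


if __name__ == "__main__":
    print(openmp_env(4, "dynamic,35", base={}))
    # {'OMP_NUM_THREADS': '4', 'OMP_SCHEDULE': 'dynamic,35'}
```

The returned dictionary can be passed to subprocess.run([path_to_executable], env=env), where path_to_executable stands for the actual CO5BOLD binary.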