-omp, -mp, ... confer the following sections) into the compiler calls in the
Makefile (see Sect.4.3).
The calls to the timing routines that would be executed in parallel
are removed by (not) setting the appropriate compiler macros (see Sect.4.4).
In addition, the switch rhd_shortrad_dir_l02 (see Sect.4.4.7.16)
might be set if experience shows that it improves performance.
The user has to find optimum values for the parameters
n_hydcellsperchunk
(for the Roe and the HLLMHD solver module,
see Sect.7.1.8.9)
and
n_viscellsperchunk
(for the tensor-viscosity module,
see Sect.7.1.11.17)
to optimize the size of the chunk given to one thread at a time.
For the Roe solver of the hydrodynamics module,
there are also the optional parameters
n_hydcellsperchunk2
(see Sect.7.1.9.1)
and
n_hydcellsperchunk3
(see Sect.7.1.9.2).
For several modules the environment variable
OMP_SCHEDULE
can be set (before running CO5BOLD) to
control its OpenMP scheduling behavior.
Important parallel loops in the SHORTrad module have a SCHEDULE(RUNTIME)
modifier that allows this external control.
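As a minimal sketch of this external control: OpenMP defines the value of OMP_SCHEDULE as `kind[,chunk]`, where `kind` is one of `static`, `dynamic`, `guided`, or `auto`. The particular value below is only an illustration and an assumption, not a setting recommended by this manual.

```shell
# Illustrative value only (an assumption, not taken from the manual).
# General OpenMP form: kind[,chunk] with kind = static | dynamic | guided | auto.
export OMP_SCHEDULE="dynamic,1"

# Show what loops declared with SCHEDULE(RUNTIME) will pick up.
echo "OMP_SCHEDULE=${OMP_SCHEDULE}"
```

The variable has to be exported in the shell (or job script) that starts the run, so that the program inherits it.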
The old default is achieved by not defining the variable or by setting
is advantageous. The size of the individual chunks might be set to larger values
than 1 (in the examples above).
The optimal value has to be found empirically. A good starting point
is number_of_grid_points_in_1D/Number_of_threads, which
gives for a model with grid points on a 4-processor machine
However, usually the general default
is a good choice.
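The starting-point rule above can be sketched with shell arithmetic. The grid size of 200 points per dimension and the `dynamic` schedule kind are hypothetical values chosen for illustration; only the 4-processor count comes from the text.

```shell
# Hypothetical example values: 200 grid points in one dimension, 4 threads.
n_grid_1d=200
n_threads=4

# Integer division gives the starting chunk size (here 50).
chunk=$(( n_grid_1d / n_threads ))

# Assumption: dynamic scheduling; any valid OpenMP schedule kind could be used.
export OMP_SCHEDULE="dynamic,${chunk}"
echo "OMP_SCHEDULE=${OMP_SCHEDULE}"
```

This only yields a first guess. The value still has to be tuned empirically, as stated above.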
The number of threads should equal the number of available
processors and has to be set at run-time with the environment variable
OMP_NUM_THREADS,
e.g. with
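A minimal sketch of such a setting in a Bourne-type shell, assuming a machine with 4 available processors (the count is an assumption and should be adapted to the actual hardware):

```shell
# Assumption: 4 available processors; adapt to the machine at hand.
export OMP_NUM_THREADS=4
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```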
The size of the stack per thread can be set with
OMP_STACKSIZE, as e.g. in
Usually, the default value is too small.
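A stack-size setting might look like the following sketch. OpenMP accepts a number with an optional unit suffix (B, K, M, or G; kilobytes if no suffix is given). The value of 512 MB is a hypothetical starting point, not a recommendation from this manual.

```shell
# Hypothetical value: 512 MB of stack per thread; tune to model size and memory.
export OMP_STACKSIZE=512M
echo "OMP_STACKSIZE=${OMP_STACKSIZE}"
```

If a run crashes at startup or at the first parallel region with a segmentation fault, a stack that is too small is a plausible cause and a larger value is worth trying.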
On machines with many cores, experiments with KMP_AFFINITY
might improve performance, as e.g. in
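One possible experiment is sketched below. KMP_AFFINITY is read by the Intel OpenMP runtime, so it is assumed here that the code was built with an Intel compiler. The chosen value is only one of several valid combinations and is an assumption, not a recommendation from this manual.

```shell
# Assumption: Intel OpenMP runtime. "compact" places threads on neighboring
# cores; "scatter" would spread them out instead. Both are worth comparing.
export KMP_AFFINITY="granularity=fine,compact"
echo "KMP_AFFINITY=${KMP_AFFINITY}"
```

Adding the `verbose` modifier (for example `verbose,granularity=fine,compact`) makes the runtime print the resulting thread-to-core binding, which helps when comparing settings.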