Next: 3.7.2 General: Inlining Up: 3.7 Optimization, Compiler Switches Previous: 3.7 Optimization, Compiler Switches   Contents   Index


3.7.1 General: OpenMP settings

To activate OpenMP, set the environment variable F90_PARALLEL (see Sect. 3.5) before calling the configure script, e.g.

export F90_PARALLEL=openmp
./configure
make

This inserts the corresponding compiler switch (e.g. -openmp, -omp, -mp, ...; see the following sections) into the compiler calls in the makefile (see Sect. 3.5).

The calls to the timing routines that would be executed in parallel are removed by (not) setting the appropriate compiler macros (see Sect. 3.6).

In addition, the switch rhd_shortrad_dir_l02 (see Sect. 3.6) may be set if experience on the given machine shows that it improves performance.

The user has to find optimum values for the parameters n_hydcellsperchunk (for the Roe solver module, see Sect. 5.4.7) and n_viscellsperchunk (for the tensor viscosity module, see Sect. 5.4.8). They control the size of the chunk of cells given to one thread at a time.

So far, the environment variable OMP_SCHEDULE has an effect only on the SHORTrad module. It can be set (before running CO5BOLD) to control the OpenMP scheduling behavior of that module: important parallel loops in the SHORTrad module have a SCHEDULE(RUNTIME) clause that allows this external control. The old default is achieved by not defining the variable or by setting

export OMP_SCHEDULE="STATIC,1"

On some machines (e.g. Intel Xeon with Linux and the PGI compiler), dynamic scheduling, activated with

export OMP_SCHEDULE="DYNAMIC,1"

is advantageous. The chunk size may be set to values larger than 1 (the value used in the examples above). The optimal value has to be found empirically. A good starting point is number_of_grid_points_in_1D/number_of_threads, rounded up. For a model with $171^3$ grid points on a 4-processor machine, this gives 171/4 = 42.75, i.e. a chunk size of 43:

export OMP_NUM_THREADS=4
export OMP_SCHEDULE="STATIC,43"
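The starting value above can also be computed in the shell instead of by hand. The following is a minimal sketch, assuming a POSIX-compatible shell; the variable names n_grid, n_threads, and chunk are introduced here for illustration only and are not part of CO5BOLD.

```shell
# Illustrative helper variables (not CO5BOLD parameters)
n_grid=171      # number of grid points in one dimension
n_threads=4     # number of OpenMP threads

# Integer ceiling of n_grid / n_threads
chunk=$(( (n_grid + n_threads - 1) / n_threads ))

export OMP_NUM_THREADS=$n_threads
export OMP_SCHEDULE="STATIC,$chunk"
echo "OMP_SCHEDULE=$OMP_SCHEDULE"
```

For the values shown, this sets OMP_SCHEDULE to STATIC,43, matching the example above.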

The behavior of the other modules is not affected.
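Since the best SHORTrad schedule has to be found empirically, the search could be scripted along the following lines. This is only a sketch: RUN_CMD is a hypothetical placeholder for whatever command starts a short CO5BOLD test run on the given machine (it defaults to the no-op `true` here), and the list of chunk sizes is an arbitrary example.

```shell
# Hypothetical placeholder: set RUN_CMD to the command that starts
# a short test run; the default "true" does nothing.
RUN_CMD=${RUN_CMD:-true}

export OMP_NUM_THREADS=4

for kind in STATIC DYNAMIC; do
  for chunk in 1 8 43; do          # example chunk sizes, adjust as needed
    export OMP_SCHEDULE="$kind,$chunk"
    start=$(date +%s)              # wall-clock time in seconds
    $RUN_CMD
    end=$(date +%s)
    echo "$OMP_SCHEDULE: $((end - start)) s"
  done
done
```

With a one-second clock resolution, the test run should be long enough that differences between the settings are clearly visible.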

The number of threads should equal the number of available processors and has to be set at run-time with the environment variable OMP_NUM_THREADS, e.g. with

export OMP_NUM_THREADS=16
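Instead of hard-coding the number, the thread count can be taken from the system. The sketch below assumes that either the GNU coreutils command nproc or `getconf _NPROCESSORS_ONLN` is available. Both report logical processors, which on machines with simultaneous multithreading may exceed the number of physical cores, so the result should be treated as a starting value.

```shell
# Query the number of online processors; fall back to getconf
# if nproc is not installed.
n_cpu=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

export OMP_NUM_THREADS=$n_cpu
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```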

