3.7.11 Linux: Intel Compiler


With versions 7.0 and 7.1 of the Intel compiler, CO5BOLD could be compiled (with tricks, see below). Version 8.0 still caused trouble. With version 9.1 and later, everything compiles smoothly.

The native format on Intel machines is little_endian. With

export F_UFMTENDIAN=big

(to be set at run time, i.e. after compilation and before running CO5BOLD) the default can be changed to big_endian. The preprocessor switches that control the modern (single) version of uio_mac_module.F90 are listed in Sect. 3.6. The compiler is called with ifort (ifc on older compiler versions).
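As a minimal sketch of how the run-time setting can be scoped: in a POSIX shell, a variable can also be set for a single command instead of being exported for the whole session. The `sh -c` call below is only a stand-in for the actual CO5BOLD executable (whose name is not given here) and merely prints what that process would see.

```shell
# Make sure the variable is not set in the current shell.
unset F_UFMTENDIAN

# Set F_UFMTENDIAN for one command only. 'sh -c' is a stand-in for the
# real executable; it prints the value visible inside that process.
seen_by_run=$(F_UFMTENDIAN=big sh -c 'printf "%s" "$F_UFMTENDIAN"')

echo "the run sees F_UFMTENDIAN=$seen_by_run"
echo "the shell itself has F_UFMTENDIAN=${F_UFMTENDIAN:-<unset>}"
```

With this form, other programs started from the same shell keep the native little_endian default.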

Important switches are:

-Vaxlib: Link proper library to make the machine understand e.g. call flush(6).
-fpp: Activate the preprocessor (silently).
-O3: General optimization flag.
-tpp6 -xK: Optimization especially for Pentium III (and Athlon, includes SSE vector commands).
-tpp7 -xW: Optimization especially for Pentium IV (includes SSE2 vector commands).
-xP: Optimization especially for Core 2 Duo and similar architectures.
-ip: Optimization: activate interprocedural optimization within each source file. This enables inlining.
-DMSrad_raytas=2: Optimization: choose non-default version of loop in SUBROUTINE raytas in file MSrad3D.F90. See Sect. 3.6.
-Drhd_shortrad_dir1_l01=1: Optimization: Transpose arrays and use routine rhd_shortrad_dir3 for rays in x1 direction. See Sect. 3.6.
-openmp: Parallelization: OpenMP directives are activated. Note that for compiler versions before 9.0 the UIO routines should be compiled without OpenMP support (even though they do not contain any OpenMP directives themselves).
-fast: General optimization flag to choose (close to) optimum optimization for the local machine. However, on AMD machines this works less than perfectly, because the features of the processors are not well recognized.
-i_dynamic: Helpful against "undefined reference to `__ctype_b'" errors.
-r8 -fpconstant: Useful to force compilation in double precision (see 3.7.3).
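The switches above are usually combined on one command line. The following is a minimal sketch of how the architecture-specific flags could be selected; the helper name `intel_flags`, the CPU labels, and the file name `example.F90` are illustrative assumptions and not part of CO5BOLD or its configure script. The command is only printed, not executed.

```shell
# Hypothetical helper: print a set of ifort flags for a given CPU class.
# The flags themselves are the ones from the list above.
intel_flags() {
    case "$1" in
        pentium3) arch="-tpp6 -xK" ;;   # Pentium III / Athlon (SSE)
        pentium4) arch="-tpp7 -xW" ;;   # Pentium IV (SSE2)
        core2)    arch="-xP" ;;         # Core 2 Duo and similar
        *)        arch="" ;;            # unknown CPU: no architecture flag
    esac
    flags="-fpp -O3"                    # preprocessor + general optimization
    if [ -n "$arch" ]; then
        flags="$flags $arch"
    fi
    echo "$flags -ip"                   # inlining within each source file
}

# Print (do not run) the resulting command for inspection.
echo "ifort $(intel_flags core2) -c example.F90"
```

Printing the command first makes it easy to check the flag set before handing it to the actual build.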

On Macintosh machines the typical optimization flags are -O3 -no-prec-div -fno-alias -ip. A big problem is the tiny stack size on those machines: large arrays taken from the stack should be avoided. For the SHORTrad module, this can be achieved by setting -Drhd_arrays_l01=2 during compilation. In addition, relatively small chunk sizes should be specified in rhd.par, see Sect. 5.4.7 and Sect. 5.4.8.
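Independent of the compile-time switch, the stack limit of the shell can be inspected and, within the bounds the system allows, raised before the run. This is a general shell facility (`ulimit`) and not something prescribed by CO5BOLD; whether the permitted maximum is large enough depends on the machine, so the sketch below is an assumption about what may help, not a guaranteed fix.

```shell
# Show the current (soft) and maximum (hard) stack limits of this shell.
soft_stack=$(ulimit -S -s)
hard_stack=$(ulimit -H -s)
echo "stack limit: soft=$soft_stack hard=$hard_stack"

# Try to raise the soft limit to the hard limit for this shell and the
# programs started from it; continue with a message if the system refuses.
ulimit -S -s "$hard_stack" 2>/dev/null || echo "could not raise the stack limit"
```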

With Intel compiler versions before 9.1, there was a problem with the UIO modules when OpenMP was activated. This was odd, because the UIO modules do not contain any OpenMP directives. For the same reason, however, OpenMP can safely be deactivated for these modules. A proposed compile sequence (with all modules activated) is:

export F90_LHDRAD=1
export F90_MSRAD=1
export F90_SHORTRAD=1
export F90_DUST=1
export F90_MHD=1
export F90_PARALLEL=scalar
./configure
make UIO
export F90_PARALLEL=openmp
./configure
make

For OpenMP (see Sect. 3.7.1), the number of threads can be set for instance with

export OMP_NUM_THREADS=16

for a machine with 16 threads (e.g., 2 processors, 4 cores per processor, 2 threads per core). Experimenting with the scheduling, e.g., with

export OMP_SCHEDULE=DYNAMIC,1

export OMP_SCHEDULE=GUIDED,2

might improve the performance (see Sect. 3.7.1). The two OpenMP variables above (OMP_NUM_THREADS and OMP_SCHEDULE) are recognized by several compilers. However, there are also Intel-specific ones:

In some cases it might be helpful to set

export LD_ASSUME_KERNEL=2.4.19

when encountering problems with OpenMP. However, this no longer seems to be necessary with recent compiler versions. Still, the stack memory per thread is often too small; it can be increased, e.g., with

export KMP_STACKSIZE=300000000

To optimize the performance, particularly on many-core systems, the thread affinity (see "Intel Thread Affinity Interface") can be specified, e.g., with

export KMP_AFFINITY=verbose,granularity=core,compact
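The run-time settings of this section can be collected in one small setup script that is sourced before starting the code. The following is a minimal sketch under the assumptions of the example machine above (2 processors, 4 cores per processor, 2 threads per core); the three helper variable names are illustrative, and the values are the example values from the text rather than tuned recommendations.

```shell
# Thread count = processors x cores per processor x threads per core.
# These are the example numbers from the text, not detected values.
processors=2
cores_per_processor=4
threads_per_core=2
OMP_NUM_THREADS=$((processors * cores_per_processor * threads_per_core))
export OMP_NUM_THREADS

# Standard OpenMP scheduling variable (recognized by several compilers).
export OMP_SCHEDULE=DYNAMIC,1

# Intel-specific variables: per-thread stack size and thread affinity.
export KMP_STACKSIZE=300000000
export KMP_AFFINITY=verbose,granularity=core,compact

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"   # prints OMP_NUM_THREADS=16
```

Keeping the settings in one place makes it easier to experiment with one variable at a time, for instance the scheduling, while leaving the others fixed.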
