CO5BOLD compiled with versions 7.0 and 7.1 of the Intel compiler (with tricks, see below). Version 8.0 still caused trouble. With version 9.1 and later, everything compiles smoothly.
The native format on Intel machines is little_endian.
With
big_endian
.
Sect. 3.6 lists the preprocessor switches that control the modern (single) version, uio_mac_module.F90.
The compiler is called with ifort (ifc on older compiler versions).
Important switches are:
-Vaxlib: Link the proper library to make the machine understand e.g. call flush(6).
-fpp: Activate the preprocessor (silently).
-O3: General optimization flag.
-tpp6 -xK: Optimization especially for Pentium III (and Athlon; includes SSE vector commands).
-tpp7 -xW: Optimization especially for Pentium IV (includes SSE2 vector commands).
-xP: Optimization especially for Core 2 Duo and similar architectures.
-ip: Optimization: activate interprocedural optimization within each source file. This enables inlining.
-DMSrad_raytas=2: Optimization: choose the non-default version of the loop in SUBROUTINE raytas in file MSrad3D.F90. See Sect. 3.6.
-Drhd_shortrad_dir1_l01=1: Optimization: transpose arrays and use routine rhd_shortrad_dir3 for rays in x1 direction. See Sect. 3.6.
-openmp: Parallelization: OpenMP directives are activated. Note that for compiler versions before 9.0 the UIO routines should be compiled without OpenMP support (even if they do not contain any OpenMP directives themselves).
-fast: General optimization flag to choose (close to) optimum optimization for the local machine. However, on AMD machines this works less than perfectly, because the features of the processors are not well recognized.
-i_dynamic: Helpful against "undefined reference to `__ctype_b'" errors.
-r8 -fpconstant: Useful to force compilation in double precision (see 3.7.3).
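As an illustration of how these switches combine, the following is a minimal shell sketch that picks the architecture-specific flags from the list above and prints a compile command. The function name arch_flags, the keys p3, p4 and core2, and the variable names are assumptions made for this example; only the flags themselves and the file name MSrad3D.F90 are taken from the text.

```shell
#!/bin/sh
# Sketch: choose architecture-specific ifort switches from the list above.
# arch_flags and its keys (p3, p4, core2) are hypothetical names.
arch_flags() {
    case "$1" in
        p3)    echo "-tpp6 -xK" ;;   # Pentium III (and Athlon), SSE
        p4)    echo "-tpp7 -xW" ;;   # Pentium IV, SSE2
        core2) echo "-xP" ;;         # Core 2 Duo and similar architectures
        *)     echo "" ;;            # unknown key: no architecture switch
    esac
}

ARCH=${ARCH:-core2}
FFLAGS="-fpp -O3 -ip $(arch_flags "$ARCH")"

# Only print the command; remove the leading 'echo' to actually compile.
echo ifort $FFLAGS -c MSrad3D.F90
```

Setting ARCH in the environment before running the fragment (for instance ARCH=p4) selects a different flag set without editing the script.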
On Macintosh machines the typical optimization flags are -O3 -no-prec-div -fno-alias -ip.
A big problem is the tiny stack size on those machines: large arrays taken from the stack should be avoided. For the SHORTrad module, this can be achieved by setting -Drhd_arrays_l01=2 during compilation. In addition, relatively small chunk sizes should be specified in rhd.par, see Sect. 5.4.7 and Sect. 5.4.8.
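A minimal shell sketch of the stack-related setup described above: it shows the current stack limit of the shell and assembles a flag string from the switches named in the text. The variable names STACK_LIMIT and FFLAGS are assumptions for this example, and adding -fpp (so that the -D switch is seen by the preprocessor) is a choice made here rather than something this paragraph prescribes.

```shell
#!/bin/sh
# Sketch: inspect the stack limit and collect flags for a small-stack machine.
# 'ulimit -s' reports the soft stack limit of this shell
# ("unlimited" or a size, in kbytes on common shells).
STACK_LIMIT=$(ulimit -s)
echo "stack limit: $STACK_LIMIT"

# Typical optimization flags from the text, plus the SHORTrad switch
# that avoids taking large arrays from the stack.
FFLAGS="-fpp -O3 -no-prec-div -fno-alias -ip -Drhd_arrays_l01=2"
echo "FFLAGS=$FFLAGS"
```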
With Intel compiler versions before 9.1, the UIO modules caused problems when OpenMP was activated. This was odd because the UIO modules do not contain any OpenMP directives, but it also means that OpenMP can safely be deactivated for these modules. A proposed compiling sequence is (all modules activated):
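A minimal sketch of such a two-step sequence, not the full list of modules: the UIO source is compiled without -openmp, everything else with it. Only the file names uio_mac_module.F90 and MSrad3D.F90 occur in this section; the split into two loops, the helper run, and the variables FC, DRYRUN, UIO_FLAGS and OMP_FLAGS are assumptions for illustration. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of a two-step compile: UIO modules without OpenMP, the rest with it.
# The file lists are placeholders; extend them with the actual module files.
FC=${FC:-ifort}
DRYRUN=${DRYRUN:-1}              # 1: print commands only, 0: also execute them
BASE_FLAGS="-fpp -O3 -ip"
UIO_FLAGS="$BASE_FLAGS"          # no -openmp for the UIO modules
OMP_FLAGS="$BASE_FLAGS -openmp"  # OpenMP directives activated elsewhere

# Print a command and, unless in dry-run mode, execute it.
run() {
    echo "$@"
    if [ "$DRYRUN" = "0" ]; then "$@"; fi
}

for f in uio_mac_module.F90; do
    run $FC $UIO_FLAGS -c "$f"
done
for f in MSrad3D.F90; do
    run $FC $OMP_FLAGS -c "$f"
done
```

Running it with DRYRUN=0 executes the printed commands, which requires the compiler and the source files to be present.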
For OpenMP (see Sect. 3.7.1), the number of threads can be set for instance with
for a machine with 16 threads (e.g.: 2 processors, 4 cores per processor, 2 threads per core). Experimenting with the scheduling, e.g., with
or
might improve the performance (see Sect. 3.7.1). The last two OpenMP variables are recognized by several compilers. However, there are Intel-specific ones:
In some cases it might be helpful to set
when encountering problems with OpenMP. However, that seems not to be necessary with recent compiler versions. Still, often the stack memory per thread is too small, which can be increased e.g., with
To optimize the performance, particularly on many-core systems, the thread affinity (see "Intel Thread Affinity Interface") can be specified e.g., with
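The run-time settings discussed in this section can be collected in a small fragment for a Bourne-type shell. This is a sketch under assumptions: the thread count of 16 follows the example machine above, while the schedule, the stack size and the affinity string are illustrative values that need tuning for the machine at hand. OMP_NUM_THREADS and OMP_SCHEDULE are standard OpenMP variables; KMP_STACKSIZE and KMP_AFFINITY are specific to the Intel runtime.

```shell
#!/bin/sh
# Sketch: OpenMP run-time settings for a 16-thread machine.
# All values except the thread count of 16 are illustrative assumptions.
export OMP_NUM_THREADS=16          # 2 processors x 4 cores x 2 threads
export OMP_SCHEDULE="dynamic,4"    # used only by loops with schedule(runtime)

# Intel-specific run-time variables
export KMP_STACKSIZE=64m                        # stack size per OpenMP thread
export KMP_AFFINITY="granularity=fine,compact"  # bind threads close together

# Show what has been set.
env | grep -E '^(OMP|KMP)_' | sort
```

On csh-type shells the same variables would be set with setenv instead of export.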