3.6 Compiler Macros
Some of the modules of the CO5BOLD code (those with suffix ``.F90'') employ compiler macros
to switch between code versions at compile time.
Typically you define at least one of the three switches rhd_r01, rhd_r02, or rhd_r03
to choose a radiation transport module.
The other macros have reasonable default values.
To find the combination with the best performance, see Sect. 3.7.
The macros are sorted into different categories:
- Some activate a certain feature (like a radiation transport module or
  the dust module). They have to be selected by the user (typically via
  environment variables and the configure script, see Sect. 3.5) each time
  the code is compiled for a certain purpose.
- Other macros are meant to improve the performance by offering the
  choice between e.g. different loop structures or case distinctions. These
  macros are set by the configure script to the best knowledge of the author(s).
  Ideally, they should be checked and modified if necessary each time CO5BOLD is
  compiled on a new machine. It should be safe to modify these settings: the
  results of runs with different settings should differ only slightly, due
  to round-off errors.
- Some macros select between different numerical approximations. A change here
  should be visible as a (more or less drastic) change in the results of a simulation.
  Usually, the default values should be accepted. Other settings typically exist only
  to allow comparison with older versions of CO5BOLD or because of new
  developments that have not yet become the default.
- A couple of macros only activate timing measurements and result in additional output.
  Some of them are not thread-safe and should only be activated for runs on one thread
  (as done by the configure script). It is always safe to switch any of them off
  (by removing or undefining them).
- The macros in the category test mark parts of the code under development.
  The default values should only be changed with great care (typically by the author of
  that code segment). The configure script does not touch these settings.
General:
timing_c_factor: in timing_module.F90, (``timing count factor'').
Category: account for property of machine.
To produce the timing statistics printed at the end of a
simulation run, the standard Fortran routine SYSTEM_CLOCK is used. The macro
timing_c_factor specifies by how much the count rate of this routine is reduced
when storing its count value. This does not prevent all overflows but can make the
output much more useful.
Values:
- 1: (default) count rate of SYSTEM_CLOCK is used directly.
- otherwise: e.g. 1000, count rate of SYSTEM_CLOCK is reduced by this factor.
By a proper choice of this factor the timing measurements of individual routines can
be made meaningful: the reduction of the count rate prevents overflows due to the
addition of several measurements.
An overflow during an individual measurement cannot be prevented.
Therefore, the counter for the entire program still tends to overflow.
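To get a feeling for the numbers involved, the following Python sketch estimates how much accumulated time fits into a signed 4-byte counter for a given timing_c_factor. It is not part of CO5BOLD. The function name and the count rate of 10^6 ticks per second are assumptions chosen for illustration, as is the assumption that the stored count is simply the SYSTEM_CLOCK count divided by the factor.

```python
# Illustrative sketch only, not CO5BOLD code.
# Assumption: counters are signed 4-byte integers (largest value 2**31 - 1)
# and the stored count is the SYSTEM_CLOCK count divided by timing_c_factor.

MAX_INT32 = 2**31 - 1


def seconds_until_overflow(count_rate_hz, timing_c_factor=1):
    """Accumulated time (in seconds) that fits into a 4-byte counter."""
    stored_ticks_per_second = count_rate_hz / timing_c_factor
    return MAX_INT32 / stored_ticks_per_second


if __name__ == "__main__":
    rate = 1_000_000  # hypothetical clock with microsecond resolution
    for factor in (1, 1000):
        t = seconds_until_overflow(rate, factor)
        print(f"timing_c_factor={factor}: overflow after about {t:.0f} s")
```

With this hypothetical clock the accumulated counter would overflow after roughly 36 minutes for a factor of 1, and after roughly 25 days for a factor of 1000.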
timing_c_range: in timing_module.F90, (``timing counter range'').
Category: account for property of machine.
To produce the timing statistics printed at the end of a
simulation run, the standard Fortran routine SYSTEM_CLOCK is used. The macro
timing_c_range specifies the number of decimal digits (the ``range'') used for the
integers storing the counters.
The value appears in the code e.g. in
integer(kind=selected_int_kind(timing_c_range)) :: count_total
Default Fortran integers are 4 bytes long on most compilers.
However, that is often not sufficient and can lead to overflows.
Many compilers can use longer integers, though.
That can be tried by setting timing_c_range
not to 9 (the default) but e.g. to 15.
If it works, overflows are essentially ruled out.
If it does not work, the compilation stops with an error message.
Values:
- 9: (default) use standard 4-byte integers for timing counters.
- 15 or otherwise: use this number for the range of Fortran integers.
By a proper choice of this number all the timing measurements can
be made meaningful, even for long runs on machines with fast counters.
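The relation between the requested range and the integer size can be sketched as follows. The Python snippet mimics what selected_int_kind does (pick the smallest integer type that holds all values n with -10^r < n < 10^r). It is only an illustration: the list of available integer sizes is an assumption, and a real compiler may offer more or fewer kinds.

```python
# Illustrative sketch only, not CO5BOLD code.
# Assumption: the compiler offers signed integers of 1, 2, 4 and 8 bytes.

AVAILABLE_BYTES = (1, 2, 4, 8)


def bytes_for_range(timing_c_range):
    """Smallest integer size (bytes) covering -10**r < n < 10**r, or None."""
    for nbytes in AVAILABLE_BYTES:
        largest = 2 ** (8 * nbytes - 1) - 1
        if 10 ** timing_c_range - 1 <= largest:
            return nbytes
    # No suitable type: corresponds to selected_int_kind returning -1,
    # which makes the declaration invalid and stops the compilation.
    return None


if __name__ == "__main__":
    for r in (9, 15, 19):
        print(r, bytes_for_range(r))
```

Under this assumption a range of 9 maps to 4-byte integers and a range of 15 to 8-byte integers, while a range of 19 cannot be satisfied.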
timing_r_type: in timing_module.F90, (``timing rate type'').
Category: account for property of machine.
During a CO5BOLD run, the total execution time is measured and printed after
each time step. The type of clock used can be chosen with:
- 1: Use call system_clock() (default).
- 2: Use call Date_and_Time().
- 3: Use t=etime() (Sun version, used in most cases).
- 4: Use call etime() (gfortran version).
- 5: Use call tremain() (Cray version).
- 6: Use call clock() (Hitachi version).
gasinter_l01: in gasinter_routines.F90, (``gas interpolation l01'').
Category: performance enhancement.
This switch determines how temporary arrays are handled to improve performance.
Values:
- 0: (default) Temporary coefficient arrays are actually copied.
- 1: Temporary coefficient arrays just get a pointer link into the big arrays.
- 2: No temporary coefficient arrays.
gasinter_l02: in gasinter_routines.F90, (``gas interpolation l02'').
Category: performance enhancement.
This switch determines how coefficient arrays are handled to improve performance.
Values:
- 0: Coefficients are stored as pointer arrays (old version).
- 1: (default) Coefficients are stored as allocatable arrays.
rhd_box_arrays01: in rhd_box_module.F90, (``rhd box arrays 01'').
Category: performance enhancement.
Switch to choose between
the (``classical'') use of pointer arrays within the box structure to store arrays
and the (``new'') version with allocatable arrays, which is potentially faster.
Values:
- 0: classical pointer arrays (default).
- 1: new allocatable arrays (probably faster).
rhd_box_grav01: in rhd_box_module.F90, (``rhd box gravitation 01'').
Category: feature activation.
Switch to activate the array for the gravitational potential in the box
structure.
If the switch is set to 1, a 3D array for the potential is created, copied,
removed, ... There is no module to compute the gravitational potential yet. Therefore,
the switch has no practical value so far.
Values:
- 0: (default) no handling of array.
- 1: array handling activated.
rhd_box_quc01: in rhd_box_module.F90 and rhd.F90, (``rhd box quantity centered 01'').
Category: feature activation.
If this compiler switch is activated, CO5BOLD is able to handle a number of further quantities
(quc: ``quantity centered'') in addition to the basic hydrodynamics quantities.
These additional quantities can be e.g. densities of
dust distribution moments or densities of molecules.
They are required for the treatment of chemical reaction
networks and time-dependent hydrogen ionization. For the latter, the quc arrays
contain the number densities of the atomic level populations.
For the chemical reaction network, the arrays contain the species number densities in
cm^-3, with headers following this example: "Number density of H2".
Values:
- 0: (default) no handling of additional quantities (density arrays).
- 1: handling of additional density arrays is activated.
To actually include dust formation in a simulation, it is necessary to
- set the switch -Drhd_box_quc01=1 during compilation (this is done by the
  configure script if the environment variable F90_DUST is set to 1,
  see the description of the variable in Sect. 3.5),
- put arrays specifying the initial conditions of the additional densities into
  the start model (as real quc001, real quc002, ...),
- select a proper model describing dust (or molecule) formation in the parameter
  file (with character dustscheme).
rhd_box_bmag01: in rhd_box_module.F90 and rhd.F90, (``rhd box b magnetic 01'').
Category: feature activation.
CO5BOLD can handle magnetic field arrays if this compiler switch is set.
Values:
- 0: (default) no handling of magnetic field arrays.
- 1: handling of magnetic field arrays is activated.
To actually account for magnetic fields in a simulation, it is necessary to
- set the switch -Drhd_box_bmag01=1 during compilation (this is done by the
  configure script if the environment variable F90_MHD is set to 1,
  see Sect. 3.5),
- put arrays specifying the initial conditions of the
  boundary-centered magnetic field arrays into
  the start model (as real bb1, real bb2, real bb3),
- select a hydrodynamics scheme that is able to handle magnetic fields in the parameter
  file (with character hdscheme).
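The link between the environment variables and the compiler switches for the dust and magnetic field features can be summarized in a few lines. The Python sketch below is not the configure script and does not reproduce its logic. The function name is hypothetical, and only the two variables mentioned in this section (F90_DUST and F90_MHD) are considered.

```python
# Illustrative sketch only; the real configure script is not reproduced here.
# Assumption: only the two environment variables named in this section matter.
import os


def feature_flags(env):
    """Map F90_DUST and F90_MHD to the corresponding -D compiler flags."""
    flags = []
    if env.get("F90_DUST") == "1":
        flags.append("-Drhd_box_quc01=1")   # additional density arrays
    if env.get("F90_MHD") == "1":
        flags.append("-Drhd_box_bmag01=1")  # magnetic field arrays
    return flags


if __name__ == "__main__":
    # Print the flags implied by the current environment (may be empty).
    print(" ".join(feature_flags(os.environ)))
```

Setting the compiler switch is only the first of the three steps listed above: the start model and the parameter file still have to be prepared separately.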
Input/output (UIO):
uio_switch_system_l01: in uio_mac_module.F90, (``uio switch system list 01'').
Category: I/O, account for property of machine.
Sometimes it is useful to have information about the system that wrote the file
in the header of a UIO file. How Fortran gets this information depends on the machine.
Values:
- 0: Read information from file uioenvfile.txt. Don't write or delete the file.
- 1: Call external function system() to produce the file uioenvfile.txt.
  Read and delete the file (default).
- 2: Call routine system() to produce uioenvfile.txt.
  Read and delete the file.
- 3: Call external routine HF_SH() to produce uioenvfile.txt.
  Read and delete it.
- 4: Call external function uname() to get system information.
- 5: Call external routine pxfuname() to get system information.
uio_switch_native_l01: in uio_mac_module.F90, (``uio switch native list 01'').
Category: I/O, account for property of machine.
All machines can read and write their native format, but this format may differ from
one machine to another. The option is there if needed, but should be avoided in practice.
Values:
- 1: Get information about word lengths from uio_deform() (default).
- 0,2: Set word lengths explicitly.
uio_switch_ieeebe_l01: in uio_mac_module.F90, (``uio switch ieee big endian list 01'').
Category: I/O, account for property of machine.
This is the standard for files read or written by CO5BOLD. All machine/compiler combinations
should support it, either natively or via some conversion process.
Values:
- 0: Don't include this conversion type (default).
- 1: Get information about word lengths from uio_deform().
- 2: Set word lengths explicitly.
uio_switch_ieeele_l01: in uio_mac_module.F90, (``uio switch ieee little endian list 01'').
Category: I/O, account for property of machine.
Values:
- 0: Don't include this conversion type (default).
- 1: Get information about word lengths from uio_deform().
- 2: Set word lengths explicitly.
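The big-endian and little-endian IEEE formats differ only in the order of the bytes within each word. The following Python sketch (unrelated to the UIO routines themselves; the helper name is hypothetical) shows the four bytes of the single-precision number 1.0 in both orders.

```python
# Illustrative sketch only, not part of the UIO module.
import struct


def float32_bytes(value, byte_order):
    """Return the 4 bytes of an IEEE single-precision number.

    byte_order is either "big" or "little".
    """
    prefix = {"big": ">", "little": "<"}[byte_order]
    return struct.pack(prefix + "f", value)


if __name__ == "__main__":
    print("big endian:   ", float32_bytes(1.0, "big").hex())
    print("little endian:", float32_bytes(1.0, "little").hex())
```

A file written in one byte order and read in the other without conversion yields meaningless numbers, which is why the conversion type has to match the file.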
uio_switch_ieee_l01: in uio_mac_module.F90, (``uio switch ieee list 01'').
Category: I/O, account for property of machine.
This is a fallback for reading UIO files
that should be activated for compilation if either of the endian formats above is allowed.
It should not be used for writing.
Values:
- 0: Don't include this conversion type (default).
- 1: Get information about word lengths from uio_deform().
- 2: Set word lengths explicitly (4-byte words).
- 3: Set word lengths explicitly (8-byte words).
uio_switch_crayxmp_l01: in uio_mac_module.F90, (``uio switch cray x-mp list 01'').
Category: I/O, account for property of machine.
This is the native format on old Cray X-MP and Cray TS machines.
Values:
- 0: Don't include this conversion type (default).
- 1: Get information about word lengths from uio_deform().
uio_switch_open_l01: in uio_mac_module.F90, (``uio switch open list 01'').
Category: I/O, account for property of machine.
Many compilers support a transparent conversion between number representations
during reading and writing. This is not part of the Fortran standard, though, and has to
be activated in different ways:
- 0: No conversion (default).
- 1: Use the convert keyword in the open() statement
  (Intel and gfortran compilers on Linux machines, standard compiler on Dec Alphas).
- 2: Use asnunit(), Cray X-MP style.
- 3: Use asnunit(), Cray TS style.
Hydrodynamics (Roe solver):
rhd_hyd_gravcorr_p01: in rhd_hyd_module.F90, (``rhd hydrodynamics gravitation correction parameter 01'').
Category: selection of approximation.
This parameter controls the way the Roe solver handles the source terms
due to gravity. A different choice results in different simulation results
and not just in slightly faster (or slower) code.
The problem is that the original Roe solver interprets the pressure gradient
in a hydrostatic stratification as a fluctuation due to shock waves.
In case of strong stratification this can lead to weird effects.
With activated correction the Roe solver treats only the deviations from
a hydrostatic stratification as due to waves (or shocks).
Several correction formulas have been tried.
The latest is the recommended default.
Values:
- 0: No pressure correction terms in Roe solver.
- 1: Simple correction with rhomean, no new average pressure.
- 2: Simple correction with rhomean, new average pressure.
- 3: Correction with local rho, limited, new average pressure.
- 4: Correction with local rho, new (different formula) average pressure.
- 5: (default) Correction with local rho, new limit, new average pressure.
- 6: Modification of 5 with different geometry factors in case of
  non-equidistant grid.
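The idea behind the correction can be illustrated with a toy example. The Python sketch below does not implement any of the correction formulas listed above, which are not reproduced in this section. It merely assumes a discrete hydrostatic balance of the form P_right - P_left = -rhomean * g * dx with an arithmetic rhomean, and shows that subtracting this hydrostatic part leaves no pressure jump that a solver could misinterpret as a wave. All names and numbers are hypothetical.

```python
# Toy illustration only; not the CO5BOLD correction formulas.


def hydrostatic_pressures(rho, g, dx, p_bottom):
    """Cell pressures in exact discrete hydrostatic balance (toy model).

    Index 0 is the bottom cell; gravity g > 0 points towards lower indices.
    """
    p = [p_bottom]
    for i in range(1, len(rho)):
        rhomean = 0.5 * (rho[i - 1] + rho[i])
        p.append(p[-1] - rhomean * g * dx)
    return p


def pressure_jump(p, rho, g, dx, i, corrected):
    """Pressure jump across the boundary between cells i and i+1."""
    jump = p[i + 1] - p[i]
    if corrected:
        rhomean = 0.5 * (rho[i] + rho[i + 1])
        jump += rhomean * g * dx  # remove the hydrostatic part
    return jump


if __name__ == "__main__":
    rho = [4.0, 2.0, 1.0, 0.5]  # strongly stratified toy atmosphere
    g, dx = 1.0, 0.5
    p = hydrostatic_pressures(rho, g, dx, p_bottom=10.0)
    for i in range(len(rho) - 1):
        raw = pressure_jump(p, rho, g, dx, i, corrected=False)
        fixed = pressure_jump(p, rho, g, dx, i, corrected=True)
        print(f"boundary {i}: raw jump {raw}, corrected jump {fixed}")
```

In this toy model the raw jumps are large in the dense lower cells although the atmosphere is at rest, while the corrected jumps vanish.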
rhd_hyd_entropyfix_p01: in rhd_hyd_module.F90, (``rhd hydrodynamics entropy fix parameter 01'').
Category: performance enhancement.
The entropy fix can be done in one of two ways
to get optimum performance (with essentially the same results).
Values:
- 0: (default) ``if...then...else'' construction.
- 1: use a mask and the signum function.
rhd_hyd_upwind_p01: in rhd_hyd_module.F90, (``rhd hydrodynamics upwind parameter 01'').
Category: performance enhancement.
The determination of the upwind direction can be done in one of two ways
to get optimum performance (with essentially the same results).
Values:
- 0: (default) ``if...then...else'' construction.
- 1: use a mask and the signum function.
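The difference between an ``if...then...else'' construction and a mask built from the signum function can be shown with a generic upwind selection. The Python sketch below is not taken from rhd_hyd_module.F90. The function names and the simple flux a*u are assumptions for illustration only.

```python
# Generic illustration only; not the code in rhd_hyd_module.F90.
import math


def upwind_flux_branch(a, u_left, u_right):
    """Upwind flux a*u, chosen with an if/else case distinction."""
    if a >= 0.0:
        return a * u_left
    else:
        return a * u_right


def upwind_flux_mask(a, u_left, u_right):
    """Same flux, chosen with a 0/1 weight built from the sign of a."""
    sign = math.copysign(1.0, a)      # +1.0 or -1.0, similar to Fortran SIGN(1.0, a)
    weight_left = 0.5 * (1.0 + sign)  # 1.0 for positive a, 0.0 for negative a
    return a * (weight_left * u_left + (1.0 - weight_left) * u_right)


if __name__ == "__main__":
    for a in (2.0, -3.0, 0.0):
        print(a, upwind_flux_branch(a, 1.5, -0.25), upwind_flux_mask(a, 1.5, -0.25))
```

The masked version avoids a branch inside the loop, which some compilers and machines handle better, at the price of a few extra floating-point operations.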
rhd_roe1d_flux_l01: in rhd_hyd_module.F90, (``rhd roe 1 dimension flux loop 01'').
Category: test.
By setting this switch an alternative way of computing the upwind centered Roe states
is activated (only for 'constant' reconstruction,
for performance test purposes only: do not activate!).
Values:
- undefined: (default) Use standard method to compute the Roe states.
- defined: Use non-standard method to compute the Roe states.
rhd_bound_t01: in rhd_hyd_module.F90, (``rhd bound timing 01'').
Category: additional output.
Produce timing information for the ``inner boundary'' routine (central potential)
or the lower and upper boundary routines (constant gravitation).
It can be used together with OpenMP.
Values:
- undefined: (default) no timing information.
- defined: call subroutines to measure elapsed time.
rhd_roe1d_flux_t01: in rhd_hyd_module.F90, (``rhd roe 1 dimension flux timing 01'').
Category: additional output.
Produce timing information for the routine which computes the Roe fluxes.
It should not be used in conjunction with OpenMP.
Values:
- undefined: (default) no timing information.
- defined: call subroutines to measure elapsed time.
rhd_roe1d_step_t01: in rhd_hyd_module.F90, (``rhd roe 1 dimension step timing 01'').
Category: additional output.
Produce timing information for the routine which performs the Roe step.
It should not be used in conjunction with OpenMP.
Values:
- undefined: (default) no timing information.
- defined: call subroutines to measure elapsed time.
Hydrodynamics (tensor viscosity):
rhd_vis_density_p01: in rhd_vis_module.F90, (``rhd viscosity density parameter 01'').
Category: selection of approximation.
Choose the formula for the density average at the cell boundary in the tensor viscosity routines.
Values:
- 0: rhomean=min(rholeft,rhoright)
- 1: (default) rhomean=0.5 * (rholeft + rhoright)
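The two averaging formulas can be compared directly. The following Python sketch simply evaluates both expressions from the list of values. The function wrapper and its argument names are assumptions for illustration and do not appear in rhd_vis_module.F90.

```python
# Illustrative sketch only; evaluates the two formulas quoted above.


def rhomean(rho_left, rho_right, rhd_vis_density_p01=1):
    """Density average at a cell boundary for the given switch value."""
    if rhd_vis_density_p01 == 0:
        return min(rho_left, rho_right)
    if rhd_vis_density_p01 == 1:
        return 0.5 * (rho_left + rho_right)
    raise ValueError("rhd_vis_density_p01 must be 0 or 1")


if __name__ == "__main__":
    for switch in (0, 1):
        print(switch, rhomean(1.0, 3.0, switch))
```

For a density contrast of 1:3 across the boundary, setting 0 yields the lower density, whereas setting 1 yields the arithmetic mean, so the two settings differ most where the density changes steeply from cell to cell.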
rhd_vis_t01: in rhd_vis_module.F90, (``rhd viscosity timing 01'').
Category: additional output.
Produce timing information for the 2D/3D tensor viscosity routines.
It should not be used in conjunction with OpenMP.
Values:
- undefined: (default) no timing information.
- defined: call subroutines to measure elapsed time.
Radiation transport:
rhd_r01: in rhd.F90, (``rhd radiation 01'').
Category: feature activation.
Switch to include the LHDrad radiation transport module.
It uses long characteristics and is restricted to an equidistant grid and
open boundaries at all surfaces (old ``supergiant module'').
Values:
- undefined: (default) LHDrad routines are deactivated.
- 1: LHDrad routines are recognized by the compiler.
rhd_r02: in rhd.F90, (``rhd radiation 02'').
Category: feature activation.
Switch to include the MSrad radiation transport module.
It uses long characteristics. The lateral boundaries have to be periodic.
Top and bottom can be closed or open (``solar module'').
Values:
- undefined: (default) MSrad routines are deactivated.
- 1: MSrad routines are recognized by the compiler.
rhd_r03: in rhd.F90, (``rhd radiation 03'').
Category: feature activation.
Switch to include the SHORTrad radiation transport module.
It uses short characteristics and is restricted to an equidistant grid and
open boundaries at all surfaces (new ``supergiant module'').
Values:
- undefined: (default) SHORTrad routines are deactivated.
- 1: SHORTrad routines are recognized by the compiler.
rhd_rad3d_toray_l01: in rhd_lhdrad_module.F90, (``rhd radiation 3 dimensions to ray loop 01'').
Category: performance enhancement.
There might be a performance gain by splitting the main loop in routine rhd_rad3d_toray
into three separate loops. Typically, one big loop is to be preferred.
Values:
- undefined: (default) One big loop.
- defined: Three smaller loops.
rhd_rad3d_fromray_l01: in rhd_lhdrad_module.F90, (``rhd radiation 3 dimensions from ray loop 01'').
Category: performance enhancement.
There might be a performance gain by splitting a big loop in routine rhd_rad3d_fromray
into two separate loops. Typically, one big loop is to be preferred.
Values:
- undefined: (default) One big loop.
- defined: Two smaller loops.
rhd_rad3d_r02: in rhd_lhdrad_module.F90, (``rhd radiation 3 dimensions radiation 02'').
Category: test.
Module rhd_lhdrad_module contains a routine for the handling of periodic
boundaries.
It is in an experimental state and is deactivated by default.
Values:
- undefined: (default) Skip routine rhd_rad3d_dirper during compilation.
- defined: Compile routine rhd_rad3d_dirper.
rhd_rad3d_solve_t01: in rhd_lhdrad_module.F90, (``rhd radiation 3 dimensions solve timing 01'').
Category: additional output.
Produce timing information for the routine which solves the 1D radiation transport
equation along a single ray.
This routine is called very frequently. The timing measurement might slow it down somewhat.
It should not be used in conjunction with OpenMP.
Values:
- undefined: (default) no timing information.
- defined: call subroutines to measure elapsed time.
rhd_rad3d_dir_t01: in rhd_lhdrad_module.F90, (``rhd radiation 3 dimensions direction timing 01'').
Category: additional output.
Produce timing information for the routine which solves the radiation transport
equation for one direction field.
The timing measurements are called very frequently and might slow down the code.
It should not be used in conjunction with OpenMP.
Values:
- undefined: (default) no timing information.
- defined: call subroutines to measure elapsed time.
rhd_rad3d_step_t01: in rhd_lhdrad_module.F90, (``rhd radiation 3 dimensions step timing 01'').
Category: additional output.
Produce timing information for the main 3D radiation transport routine.
It can be used together with OpenMP and should cause no noticeable performance loss.
Values:
- undefined: (default) no timing information.
- defined: call subroutines to measure elapsed time.
rhd_shortrad_operator_l01: in rhd_shortrad_module.F90, (``rhd short-characteristics radiation operator loop 01'').
Category: performance enhancement, selection of approximation.
Choose the type of short characteristics operator. The operators usually come in pairs (1/2, 3/4, 5/6).
There is a development from 1/2 via 3/4 to 5/6 towards higher stability.
Both members of each pair should perform the same operation but use different ways
to do a case distinction. The 'even' operator has in some cases the better
performance, but the 'odd' operator might be safer to use.
Values:
- 0: simple test operator, fast but results are utterly useless!
- 1: case distinction with ``if..then..else'' construct.
- 2: case distinction with masks (weights 0.0 or 1.0).
- 3: case distinction with ``if..then..else'' construct,
  slope reduction of source function.
- 4: case distinction with masks (weights 0.0 or 1.0),
  slope reduction of source function.
- 5: case distinction with ``if..then..else'' construct,
  modified slope reduction of source function.
- 6: (default) case distinction with masks (weights 0.0 or 1.0),
  modified slope reduction of source function.
- 8: test version.
rhd_shortrad_operator_l02: in rhd_shortrad_module.F90, (``rhd short-characteristics radiation operator loop 02'').
Category: performance enhancement.
Select the way the short characteristics operator is accessed.
Values:
- 0: (default) The routine with the short characteristics operator is called
  within a loop and should be inlined.
- 1: The program fragment with the short characteristics operator is included.
  No inlining necessary.
rhd_shortrad_dtauop_l01: in rhd_shortrad_module.F90, (``rhd short-characteristics radiation delta tau operator loop 01'').
Category: performance enhancement.
Choose the type of short characteristics tau coupling operator.
Values:
- 1: case distinction with ``if..then..else'' construct,
  default if rhd_shortrad_operator_l01=1,3,5.
- 2: case distinction with masks (weights 0.0 or 1.0),
  default if rhd_shortrad_operator_l01=2,4,6.
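The dependence of the default on rhd_shortrad_operator_l01 can be written down compactly. The Python sketch below only restates the rule from the list of values. The function name is hypothetical, and the section gives no default for operator values other than 1 to 6, so the sketch returns None in that case.

```python
# Illustrative sketch only; restates the default rule quoted above.


def default_dtauop(rhd_shortrad_operator_l01):
    """Default rhd_shortrad_dtauop_l01 for a given operator setting."""
    if rhd_shortrad_operator_l01 in (1, 3, 5):
        return 1  # ``if..then..else'' variants go together
    if rhd_shortrad_operator_l01 in (2, 4, 6):
        return 2  # mask variants go together
    return None   # no default stated in this section for other values


if __name__ == "__main__":
    for op in range(0, 9):
        print(op, default_dtauop(op))
```

In other words, the tau coupling operator by default uses the same kind of case distinction as the short characteristics operator it is paired with.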
rhd_shortrad_dtauop_l02: in rhd_shortrad_module.F90, (``rhd short-characteristics radiation delta tau operator loop 02'').
Category: performance enhancement.
Select the way the operator for the tau coupling (short characteristics module) is accessed.
Values:
- 0: (default) The routine with the tau coupling operator is called
  within a loop and should be inlined.
- 1: The program fragment with the tau coupling operator is included.
  No inlining necessary.
rhd_shortrad_formal_l01: in rhd_shortrad_module.F90, (``rhd short-characteristics radiation formal loop 01'').
Category: performance enhancement.
Select the version of loop splitting for the exp(-dtau) computation.
Values:
- 0: (default) dtauhalf, exp_mdtauhalf, expl2t_mdtauhalf are computed in a single loop.
- 1: (dtauhalf, exp_mdtauhalf), (expl2t_mdtauhalf) are computed in separate loops.
  This prevents the SUN1 machine (Sunfire, Solaris, Forte 6.2) from doing
  some performance-degrading optimization.
rhd_shortrad_dir1_l01: in rhd_shortrad_module.F90, (``rhd short-characteristics radiation direction 1 loop 01'').
Category: performance enhancement.
Choose the routine version for rays in x1 direction.
Values:
- 0: (default) Use routine with permuted indices for rays in x1 direction.
  In this case the innermost loop index is the third array index.
  The transposition of arrays is not needed but some machines (e.g. SUN1) do not like
  this index arrangement.
- 1: Transpose arrays and use routine rhd_shortrad_dir3 for rays in x1 direction.
  The extra step for the transposition of some arrays (and the reverse procedure)
  needs some time. But now the routine with the optimum index ordering can be used.
rhd_shortrad_dir_l02: in rhd_shortrad_module.F90, (``rhd short-characteristics radiation direction loop 02'').
Category: performance enhancement, OpenMP.
Determine the position of the PARALLEL statement relative to the outer loop in rhd_shortrad_dirX.
Both settings give the same results but might show a different performance on a specific machine.
Values:
- 0: (default) PARALLEL statement inside of outer loop.
- 1: PARALLEL statement outside of outer loop.
rhd_shortrad_lambda_l01: in rhd_shortrad_module.F90, (``rhd short-characteristics radiation lambda loop 01'').
Category: feature activation.
Handling of extra arrays to allow partially implicit Lambda* iteration.
Values:
- 0: (default) Only fully implicit Lambda* iteration allowed
  (or fully explicit treatment).
- 1: Also partially implicit Lambda* iteration allowed.
rhd_shortrad_formal_t01: in rhd_shortrad_module.F90, (``rhd short-characteristics radiation formal timing 01'').
Category: additional output.
Produce timing information for the routine which gives the formal solution of the
radiation transport equation with the help of short characteristics.
It can be used together with OpenMP and should cause no noticeable performance loss.
Values:
- undefined: (default) no timing information.
- defined: call subroutines to measure elapsed time.
rhd_shortrad_step_t01: in rhd_shortrad_module.F90, (``rhd short-characteristics radiation step timing 01'').
Category: additional output.
Produce timing information for the main short characteristics routine.
It can be used together with OpenMP and should cause no noticeable performance loss.
Values:
- undefined: (default) no timing information.
- defined: call subroutines to measure elapsed time.
MSrad_raytas: in MSrad3D.F90, (``Matthias Steffen radiation ray tau s'').
Category: performance enhancement.
Values:
- 0: (default) Loop with IF..THEN..ELSE
- 1: Loop with ABS,SIGN
- 2: Loop with MIN,MAX
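The three loop variants presumably express the same case distinction with different language constructs. Since the loop body in MSrad3D.F90 is not shown in this section, the following Python sketch uses a generic stand-in (the positive part of a number) to show how a branch, an ABS-based expression, and a MIN/MAX-based expression can give the same result. All function names are hypothetical.

```python
# Generic illustration only; not the loop body from MSrad3D.F90.


def positive_part_if(x):
    """Case distinction with an explicit branch."""
    if x > 0.0:
        return x
    else:
        return 0.0


def positive_part_abs(x):
    """Branch-free variant based on the absolute value."""
    return 0.5 * (x + abs(x))


def positive_part_minmax(x):
    """Branch-free variant based on max."""
    return max(x, 0.0)


if __name__ == "__main__":
    for x in (3.0, -2.5, 0.0):
        print(x, positive_part_if(x), positive_part_abs(x), positive_part_minmax(x))
```

Which formulation runs fastest depends on the compiler and the machine, which is why such switches are offered in the first place.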