Some information about the Hitachi compiler is here.
The appropriate machine dependent UIO module is uio_mac_hitachi_module.f90.
The appropriate machine dependent RHD module is rhd_mac_hitachi_module.f90.
-conti199
: Up to 199 continuation lines can be interpreted
(otherwise not more than 39 continuation lines are accepted).
-limit
: Limits the amount of time and memory for compilation.
-opt=ss
: Use the highest possible optimization level.
-nopredicate
: This option switches off a sub-option activated
by -opt=ss. The -predicate option must be disabled because
the code otherwise crashes with a segmentation violation. The switch must
appear after -opt=ss.
-pvfunc=2
: References the pseudo-vectorizing mathematical
function and applies the temporary array to reference the pseudo-vectorizing
mathematical function.
-omp -parallel=1
: Parallelize based on
OpenMP directives only.
-procnum=8
: Generate code for 8 processors on one node.
-orphaned=1
: Checks at run time whether sequentially executed regions
contain orphaned directives when PROCNUM=8 is specified.
If a sequentially executed region contains an orphaned directive, the
system outputs a message and terminates the program.
-nestcheck=1
: Checks for nesting errors in parallel regions.
If a parallel region is nested, the system returns an error and terminates
the program. Without this option, the code aborts with an error message
indicating illegal nesting (compiler bug?).
-pmpar
: Collects the performance monitor information for each
parallelization unit.
-pmfunc
: Collects the performance monitor information for each
procedure.
-Drhd_hyd_roe1d_l01=1
: Optimization: choose the non-standard set of routines for
the Roe solver.
See Sect. 3.6.
-DMSrad_raytas=0
: Optimization: choose the default version
of the loop in SUBROUTINE raytas
in file MSrad3D.F90.
See Sect. 3.6.
-subchk
: Array bound checking. Without this checking
option, some UIO routines do not work properly (compiler bug?).
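The ordering rule for -nopredicate lends itself to a small mechanical check. The following Python sketch collects some of the options listed above and verifies that -nopredicate follows -opt=ss before building a command line. The compiler name `f90`, the source file name `example.f90`, and the helper names are hypothetical and do not come from this manual; the selection of options is likewise an assumption made for illustration.

```python
# Options copied from the list above. Which of them belong in a real build
# is an assumption; the profiling and -D switches are left out here.
FLAGS = [
    "-conti199",
    "-limit",
    "-opt=ss",
    "-nopredicate",   # must come after -opt=ss
    "-pvfunc=2",
    "-omp",
    "-parallel=1",
    "-procnum=8",
    "-orphaned=1",
    "-nestcheck=1",
    "-subchk",
]


def nopredicate_order_ok(flags):
    """Return True unless -nopredicate appears before -opt=ss."""
    if "-nopredicate" not in flags or "-opt=ss" not in flags:
        return True
    return flags.index("-nopredicate") > flags.index("-opt=ss")


def command_line(compiler, flags, source):
    """Join compiler, flags, and source file into one command string."""
    if not nopredicate_order_ok(flags):
        raise ValueError("-nopredicate must appear after -opt=ss")
    return " ".join([compiler, *flags, source])


if __name__ == "__main__":
    # "f90" and "example.f90" are placeholder names, not taken from the manual.
    print(command_line("f90", FLAGS, "example.f90"))
```

Keeping the flags in a list makes the position of each switch explicit, which matters here because the effect of -nopredicate depends on where it appears.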
A proposed compiling sequence is (only default modules activated):
Performance tests on hwwsr8k
The tests vary the chunk size parameters n_hydcellsperchunk
and n_viscellsperchunk
(see Sect. 5.3.7 and
Sect. 5.3.8).
Two different models have been used, one consisting of
128x128x192 grid cells, the other of 252x252x188 grid cells.
Grey radiative transfer has been performed with the MSrad module.
Different values for the chunk size have been tried, with the
hydrodynamics and the viscosity parameters set to the same value.
In all cases three time steps have been computed.
The results are shown in Fig. 2.
The number of resulting chunks for step HYD1 (the values for HYD2, HYD3, and VIS are very similar),
total memory, performance, and the wall clock duration of the hydrodynamics and the viscosity
routines are shown as functions of the chunk size parameter(s).
Clearly, the number of chunks decreases towards larger chunk sizes whereas the required
memory increases, in particular for very large chunk size values.
Moreover, performance and CPU time can be optimized by choosing the right parameter values.
Interestingly, the optimum chunk size is different for hydrodynamics and viscosity.
Based on these tests, a larger value seems to be preferable for the viscosity
(n_viscellsperchunk).
In the case of the smaller model, 50000 seems to be fine for the hydrodynamics whereas the
optimum viscosity chunk size is 200000.
This difference explains the double-peaked structure of performance and CPU time.
Note that the optimum values depend not only on the architecture used but also on the
dimensions of the model.
We recommend testing several chunk size values, since this may yield higher performance.
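To get a feeling for how the chunk size relates to the number of chunks, the following Python sketch gives a rough estimate for the two test models. It assumes that the chunk count is simply the total number of grid cells divided by the cells-per-chunk value, rounded up. The actual partitioning in the code may differ (for example by respecting grid planes), so the numbers are only indicative. The function name and the list of candidate values are hypothetical.

```python
def estimated_chunks(dims, cells_per_chunk):
    """Rough chunk count: total cells / cells per chunk, rounded up.

    Assumption: chunks are filled evenly regardless of grid geometry.
    """
    nx, ny, nz = dims
    total_cells = nx * ny * nz
    # Integer ceiling division without floating point.
    return -(-total_cells // cells_per_chunk)


if __name__ == "__main__":
    models = {
        "small": (128, 128, 192),
        "large": (252, 252, 188),
    }
    # Candidate values; 50000 and 200000 are the ones named in the text.
    candidates = [50000, 200000]
    for name, dims in models.items():
        for size in candidates:
            print(name, size, estimated_chunks(dims, size))
```

Under this assumption the smaller model has 3145728 cells, which gives about 63 chunks at 50000 cells per chunk and about 16 at 200000.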