

3.7.8 Hitachi SR8000

Some information about the Hitachi compiler is here.

Sect. 3.6 lists the preprocessor switches that control the modern (single) version of uio_mac_module.F90.

A suggested compile sequence (with only the default modules activated) is:

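# First pass: build only the UIO target with -subchk in F90_PREFLAGS
export F90_PREFLAGS="-subchk"
./configure
make UIO

# Second pass: clear F90_PREFLAGS, reconfigure, and build the rest
export F90_PREFLAGS=
./configure
make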

Performance tests on hwwsr8k

Figure 2: Performance tests on Hitachi SR8000 at HLR Stuttgart. For models with 128x128x192 and 252x252x188 grid cells, different values for the hydrodynamics and viscosity chunk size parameters were used. See text for more details.
Tests were run on the machine hwwsr8k at HLR Stuttgart to determine the optimum chunk sizes, which are set by the parameters n_hydcellsperchunk and n_viscellsperchunk (see Sect. 5.4.7 and Sect. 5.4.8). Two models were used, one with 128x128x192 grid cells and one with 252x252x188. Grey radiative transfer was performed with the MSrad module. Several chunk sizes were tried, with the hydrodynamics and the viscosity parameter set to the same value in each run. In all cases three time steps were computed.

The results are shown in Fig. 2. The number of chunks for step HYD1 (the values for HYD2, HYD3, and VIS are very similar), the total memory, the performance, and the wall-clock duration of the hydrodynamics and viscosity routines are plotted as functions of the chunk size parameter(s). As expected, the number of chunks decreases towards larger chunk sizes, whereas the required memory increases, in particular for very large chunk sizes.

Performance and CPU time can be optimized by choosing suitable parameter values. Interestingly, the optimum chunk size differs between hydrodynamics and viscosity: based on these tests, a larger value seems preferable for the viscosity (n_viscellsperchunk). For the smaller model, 50000 seems fine for the hydrodynamics, whereas the optimum viscosity chunk size is 200000. This difference explains the double-peaked structure of performance and CPU time.

Note that the optimum values depend not only on the architecture but also on the dimensions of the model. We therefore recommend testing several chunk size values, since this may yield higher performance.
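
When choosing values to test, a rough estimate of how many chunks each candidate produces can help. The following is only a sketch: the helper estimate_chunks is hypothetical, and it assumes that the number of chunks is simply the total cell count divided by the chunk size parameter, rounded up. The actual partitioning is not described in this section and may differ, so the numbers are an orientation only, not a prediction of what Fig. 2 shows.

```shell
#!/bin/sh
# Hypothetical helper: estimate the chunk count for a grid and a chunk size.
# ASSUMPTION: chunks = ceil(total cells / cells per chunk). The real
# partitioning scheme is not documented here and may differ.
estimate_chunks() {
    # $1, $2, $3: grid dimensions; $4: cells-per-chunk parameter value
    total=$(( $1 * $2 * $3 ))
    echo $(( (total + $4 - 1) / $4 ))
}

# Candidate values to try. 50000 and 200000 are the optima quoted in the
# text for the smaller model; the other values are arbitrary illustrations.
for size in 25000 50000 100000 200000 400000; do
    small=$(estimate_chunks 128 128 192 "$size")
    large=$(estimate_chunks 252 252 188 "$size")
    echo "chunk size $size: ~$small chunks (128x128x192), ~$large chunks (252x252x188)"
done
```

The same candidate list can then be used for n_hydcellsperchunk and n_viscellsperchunk in short test runs of a few time steps, comparing the timings of the hydrodynamics and viscosity routines separately.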

