RUNNING ENZO
Procedure for a generic system (updated February 2015)
Procedure for JURECA/JUROPATEST (updated June 2015)
Post-processing & analysis of snapshots
Running AMR runs on JURECA (updated July 2015)

This short guideline is only meant as informal guidance for close collaborators, and concerns how to design simulations, produce their initial conditions and run them on the supercomputers that we usually use, with no attempt to be accurate, complete etc.. In case of inconsistency with the information in the official "Developer Guide" by the ENZO collaboration (see also the links below), it is extremely likely that the latter is the more trustworthy!

What follows is written for version 2.3 of ENZO (but should be general enough).

Preliminary steps:

http://enzo-project.org/#gettingenzo
http://enzo-project.org/BootCamp.html
http://enzo-project.org/doc/

Now, let's assume ENZO (the main program and all components) has been installed on the machine we want to use.
(If this is not true, a list of useful Makefiles for the compilation is here.)


Then, let's get started.


Again, it is good to have a look at other, more specific pages on the ENZO website.

http://enzo-project.org/doc/tutorials/RunTestProblem.html
http://enzo-project.org/doc/tutorials/RunCosmologySimulation.html
http://enzo-project.org/doc/user_guide/RunningEnzo.html
http://enzo-project.org/doc/user_guide/CUDAEnzo.html

Now, let's go to specific tasks.




PRODUCING INITIAL CONDITIONS.
Two methods are available: "inits" and "mpgrafic".  -> http://enzo-project.org/doc/user_guide/CosmologicalInitialConditions.html
inits used to be the method of choice for many versions. It is a program installed alongside the enzo folder in the main installation; it runs serially and produces monolithic 3D files for the initial grid density, velocity and DM particle positions and velocities. The bottleneck is of course that for very large grids (>2000^3) one has to find a very big serial machine with some hundreds of GB of RAM to run the program, which is impractical.
After this is done, in order to run with more than 64 processors, one has to distribute the initial conditions for parallel I/O, and for this we use another installed program, named "ring". This is a parallel program that needs to be run on the same number of processors as the enzo job we plan to execute.

mpgrafic is instead a parallel tool that in a single step produces the initial conditions and writes them in a form already suitable for parallel I/O by Enzo, so ring is not needed anymore. The bad news is that mpgrafic is usually hard to install, as it makes use of several different libraries and compilation procedures.

Notice: the sets of parameters for inits and mpgrafic are different... which means that you have to use the same program to produce higher-resolution versions of your favourite set of ICs, and you cannot, for instance, use mpgrafic to produce a larger set of ICs from a lucky combination of parameters you earlier produced with inits.

 Other tools for ICs have recently been released: http://www.phys.ethz.ch/~hahn/MUSIC/
 But I have no experience with those.

Important: the ICs change if we want to change the resolution/volume or the cosmological parameters. They stay the same for all runs where we instead only want to change the physics (i.e. cooling, magnetic fields, star formation, etc...).



RUNNING A FIXED GRID SIMULATION.
Now it's going to be easier. ENZO simulations are launched with
./enzo.exe -d param.dat >> run.dat  (serial)
mpirun -n 64 ./enzo.exe -d param.dat >> run.dat  (parallel; similar with aprun on other systems).

To restart:
./enzo.exe -d -r RD0001/RD0001 >> run.dat  to restart from a Restart file,
or ./enzo.exe -d -r DD0001/DD0001 >> run.dat  to restart from a Dump file (there is no real difference between the two; the idea is just that RD outputs are something we will store after the simulation, while DD outputs can be erased).

The "run.dat" just pipes the screen output of enzo into a file written on disk.

Here is a sample parameter file:

ParallelRootGridIO        = 1  # -> necessary for runs using more than 64 procs; assumes we ran ring before
ParallelParticleIO        = 1  # -> same as the line above
Unigrid                   = 0  # -> 0 for small enough runs, i.e. <1024^3
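
Besides these I/O switches, a cosmology parameter file also sets the problem type, root grid size, cosmology and output cadence. A minimal sketch for orientation (the parameter names are the standard ENZO 2.x ones, but all the values below are purely illustrative, not a recommended setup):

ProblemType                = 30        # cosmology simulation
TopGridRank                = 3
TopGridDimensions          = 256 256 256
CosmologyOmegaMatterNow    = 0.3
CosmologyOmegaLambdaNow    = 0.7
CosmologyHubbleConstantNow = 0.7       # in units of 100 km/s/Mpc
CosmologyComovingBoxSize   = 100.0     # in Mpc/h
CosmologyInitialRedshift   = 30.0
CosmologyFinalRedshift     = 0.0
dtDataDump                 = 2.0       # code-time interval between DD dumps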


RUNNING AN AMR SIMULATION.
Exactly as above to run the job; only the parameter file is obviously changed, see here.

We now need to specify the region for active refinement, the total number of levels and the criteria to be used (with their threshold values):


StaticHierarchy           = 0  # -> 0 = AMR activated, 1 = no AMR
MaximumRefinementLevel    = 5  # -> 0 is the root grid level
MaximumGravityRefinementLevel = 4 # -> same as above, but only for the gravity solver; usually set to L or L-1 if L is the value in the line above
RefineBy                  = 2  # -> refinement factor for each level; 2 ensures the best stability
CellFlaggingMethod        = 2 4   # -> criteria for the flagging, see ENZO's site for references
MinimumEfficiency         = 0.35 # -> fraction of a "patch" that must fulfill the criteria for it to be tagged for AMR; a low value gives larger refined regions but fewer separate patches, i.e. fewer grid objects in the Hierarchy file
MinimumOverDensityForRefinement = 3 # -> overdensity with respect to the root grid; for nested runs, see the manual
RefineRegionLeftEdge = 0.46 0.46 0.46  # -> boundaries of the AMR region; only one region can be specified at a time
RefineRegionRightEdge = 0.54 0.54 0.54


This is only going to change the structure of the output, obviously, since we have more AMR levels to accommodate.


RUNNING A NESTED AMR SIMULATION.

In this case both the initial condition files and the parameter file must be changed.


RUNNING A SIMULATION WITH MHD.

We just have to play with the parameter file, changing the hydro solver to the MHD one and specifying a few important things.

UseDivergenceCleaning  = 0
DualEnergyFormalism    = 1
HydroMethod            = 4    # -> 4 selects the MHD solver
CourantSafetyNumber    = 0.4
Theta_Limiter          = 1.0
InterpolationMethod    = 1
CosmologySimulationInitialUniformBField = 1e-8 1e-8 1e-8  # in Gauss, physical units


RUNNING A SIMULATION WITH COOLING AND OTHER STUFF.

We just have to modify the parameter file in the appropriate way.
Example:

MultiSpecies         = 1
MultiMetals = 1
RadiativeCooling     = 1
RadiationFieldType        = 1
RiemannSolver = 1
RiemannSolverFallback = 1
ReconstructionMethod = 0
ParticleSubgridDepositMode     = 0
MetalCooling = 3
CloudyCoolingGridFile     = solar_2008_3D_metals.h5







QUICK START FOR THE FIRST RUNS ON JURECA - JUROPATEST

Starting & resubmitting a run (with already existing IC)

The scratch directory:
>/work/hhh30/hhh300   (also $WORK as symbolic link on the machine)

The folders for each different cluster:
>/work/hhh30/hhh300/tracA/trac1 etc..

The initial condition files are
Grid** for the grid quantities and
Particle** for the DM particle quantities.

To start a run, do
>sbatch enzo_amr.go

where inside you have an instruction like
> srun -n 128 enzo.exe -d param_amr.dat  >> run_amr8.dat

To restart from a dump, you modify the above line into
>srun -n 128 enzo.exe -d -r 8_3amrRD0004/8_3amrRD0004   >> run_amr8.dat
and resubmit; this restarts from the 8_3amrRD0004/8_3amrRD0004 file.

The run_amr8.dat file is written at run time during the simulation; you can watch the run proceeding by typing
> tail -f run_amr8.dat
on the command line.
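
For reference, a minimal enzo_amr.go batch script might look like the following (the account, partition and resource numbers are placeholders to adapt to your allocation; the module versions are the ones listed in the post-processing notes further below):

#!/bin/bash
#SBATCH --account=hhh30            # placeholder project account
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=8        # fewer tasks per node = more memory per task
#SBATCH --time=24:00:00
#SBATCH --partition=batch          # assumed partition name

module load intel/2015.03
module load HDF5/1.8.13-gpfs

srun -n 128 ./enzo.exe -d param_amr.dat >> run_amr8.dat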


Other useful commands and output files.

> squeue -u hhh300 [your user]
shows your running or queued jobs.

> scancel JOBID
erases the job.

>q_cpuquota
tells you the remaining CPU time from the allocation.

Some other interesting ENZO files:
> OutputLevelInformation
updates you on the hierarchy of AMR levels.
> amr.out
tells you other stuff that Enzo is outputting (like output files etc.).
> OutputLog
(similar)
> performance.out
some performance benchmarks of the latest run.
> Evtime
timing in code units of each time step.

>slurm-XXXX.out
generated by the system, usually contains errors at the end if the run crashed.



POST-PROCESSING & ANALYSIS OF RUNS.


First, we need to assemble a monolithic 3D dataset from the hierarchy of AMR files written by each processor (i.e. the various RD or DD files).
Two methods are possible: one for runs with <64 procs and another that works even for bigger runs.
To find the active AMR regions in ENZO, search for
RefineRegionLeftEdge and RefineRegionRightEdge
inside the DDXXXX/DDXXXX or RDXXXX/RDXXXX files (or also in the initial parameter file).
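For example, with grep on the command line:
> grep RefineRegion RD0002/RD0002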
For both methods listed below: one can reconstruct the output ALSO up to a level lower than the maximum level, just to generate a first low-resolution dataset for testing.




First method: use enzo itself for the reconstruction.
Run (in a batch script) a command that calls enzo in parallel (using the same number of processors as the simulation):
> enzo.exe -d -x -l 2 -b 0.1 0.1 0.1 -f 0.2 0.2 0.2 RD0002/RD0002
This will generate a 3D dataset up to the 2nd level (-l 2) of the region
-b 0.1 0.1 0.1 -f 0.2 0.2 0.2 (in units of the box length) of snapshot RD0002. As a result, one separate file for each field of the simulation is generated, in hdf5 format.
This is the method provided with ENZO, but it cannot work for big runs or for "deep" AMR structures: even though it runs in parallel, it can easily run into memory issues, because all the volume within the selection is mapped at the highest resolution in a dummy way, and this can easily exceed the memory per processor.

Second method: use MeshMerger (+CreaInput.tcl)
This is an original C++ code written by C. Gheller; it works in serial but is very efficient.

It's a 2-step procedure.

 a) run CreaInput.tcl on the hierarchy file:
  > ./CreaInput.tcl
   [then input the appropriate hierarchy file, for example
   8_3amrRD0006/8_3amrRD0006.hierarchy]
 and then simply press enter at the second question, unless the directory has changed since the simulation (for example if one has moved the raw data somewhere else). This second question is necessary because in some versions of Enzo the filenames were absolute.
 This creates an easier-to-interpret "8_3amrRD0006.hierarchy.reduced" for MeshMerger, which is called in the next step.

b) use MeshMerger to do the real job. Here is what should happen (the lines prefixed by > are what has to be typed on the command line):
>./MeshMergerB
Available fields:
0, /Dark_Matter_Density
1, /Density
2, /Temperature
3, /Bx
4, /By
5, /Bz
6, /Gas_Energy
7, /Total_Energy
Input the list of required fields (e.g.: 0,3,7)
>0,1,2
SELECTED FIELD 0 /Dark_Matter_Density
SELECTED FIELD 1 /Density
SELECTED FIELD 2 /Temperature
NUMBER OF SELECTED FIELD 3
Input Reduced Hierarchy file name:
>8_3amrRD0006/8_3amrRD0006.hierarchy.reduced
8_3amrRD0006/8_3amrRD0006.hierarchy.reduced
Input output file name: amr_out
Input number of cells in the base level (int) --> x =
256
Input number of cells in the base level (int) --> y =
256
Input number of cells in the base level (int) --> z =
256
Input first sub-box position (0-1) --> x =
0.1
Input first sub-box position (0-1) --> y =
0.1
Input first sub-box position (0-1) --> z =
0.1
Input second sub-box position (0-1) --> x =
0.2
Input second sub-box position (0-1) --> y =
0.2
Input second sub-box position (0-1) --> z =
0.2
Input Max refinemt level (0=Base level):
3
Merging block 1202
-----> 0.003906, 0.000488, 8.000000 3.000000
Merging block 1203
-----> 0.003906, 0.000488, 8.000000 3.000000
Merging block 1204
-----> 0.003906, 0.000488, 8.000000 3.000000
Merging block 1205
.....and so on, until all blocks have been merged.

At the end, a single hdf5 file is generated, with all the requested fields as internal arrays.
For very big grids or a large number of AMR levels, the program can take many hours (e.g. for a 2400^3 grid it takes some 6-7 hours). But the memory usage is such that everything can be done in serial, without any crash due to memory issues (thanks Claudio).

Some additional important notes:

-Problems with boundaries:
the sub-box positions can be any fraction of the box length BUT, when multiplied by the root grid number of cells, they should give an integer number; otherwise nasty empty regions might appear at the boundaries.
For example, to compute the left border x1 of the AMR region to extract: if the center of the AMR region is at x_AMR (in box length), the side of the AMR region is dr2 (in box length) and the root grid dimension is N_root, then

 x1 = uint(N_root*(x_AMR - dr2*0.5))/float(N_root)
etc...

Note that the ugly black boxes at the boundaries can appear in the final reconstruction by MeshMerger even when imposing exactly the same AMR region used in the simulation at run time, if the edges do not give an integer number in root grid units!
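
A one-line check of the formula above from the command line (a minimal sketch; N_root, x_AMR and dr2 below are example values):

> awk 'BEGIN { N=256; x=0.50; dr2=0.08; printf "x1 = %f\n", int(N*(x - dr2/2))/N }'
x1 = 0.457031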

- Shell script

This is a shell script by F. Vazza to run the above combination of programs for any given large sequence of snapshots.
It is launched simply by

sh merger.go
after all modules have been loaded (see below).

merger.go:

for l in 0
do
for j in 0
do
for i in 1 2 3 4 5
do
 # pass the hierarchy file of this snapshot to CreaInput.tcl
 echo 6RD0$l$j$i/6RD0$l$j$i.hierarchy > te.txt
 ./CreaInput.tcl < te.txt
 cp 6RD0$l$j$i/6RD0$l$j$i.hierarchy.reduced 6RD0$l$j$i/hierarchy.reduced
 cd 6RD0$l$j$i
 rm amr_3d
 # density, temperature etc. and velocity fields
 ../MeshMergerV < ../3d0.job
 mv amr_3d   /homeb/hhh22/hhh220/CP/256/6amr_dt_z0$l$j$i
 rm amr_3d
 # magnetic field components
 ../MeshMergerB < ../3d2.job
 mv amr_3d /homeb/hhh22/hhh220/CP/256/6amr_v_z0$l$j$i
 rm amr_3d
 cd ..
done
done
done

The code loops over l,j,i through the requested sequence of snapshots, calls CreaInput and MeshMerger using the input values given in 3d0.job / 3d2.job, and generates a temporary amr_3d file that is immediately moved elsewhere in the repository (e.g. /homeb/hhh22/hhh220/CP/256/) with an incremental name.
The 3d0.job and 3d2.job input files should be written as:

3d0.job:


0,1,2,3,4,5
hierarchy.reduced
amr_3d
256
256
256
0
0
0
1
1
1
0



3d2.job:


3,4,5
hierarchy.reduced
amr_3d
256
256
256
0
0
0
1
1
1
0


[There are two different input files here because MeshMergerB generates the magnetic field values and MeshMergerV generates density, temperature etc. and the velocity fields.]
[Of course one is free to generate the output datasets as desired, e.g. calling a single 3d.job file only once with all the fields in it, as sketched below.]
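
Such a combined job file would follow the same format as above (field list; reduced hierarchy name; output name; the three root grid dimensions; the six sub-box coordinates; the maximum refinement level). A sketch, assuming the field numbering of the interactive session shown earlier:

0,1,2,3,4,5,6,7
hierarchy.reduced
amr_3d
256
256
256
0
0
0
1
1
1
0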

- Notes for Juropa:


!!!very important: before running the two programs above, one should load the modules for hdf5.
In the case of Juropatest, they are the same ones called by the batch file, i.e.
 module load intel/2015.03
 module load gpsolf/2015.03
 module load HDF5/1.8.13-gpfs

!!!

- For the present simulations, one should use MeshMergerB to reconstruct the B-field components and MeshMergerV to reconstruct the velocity components. For all other fields either of the two can be used.





PIPELINE FOR PRODUCING NEW AMR CLUSTERS (ON JURECA)

1. Run the fixed grid run (see here ). Sets of initial conditions and parameter files are already present in
   /work/hhh30/hhh300/new_runs

2. Reconstruct the monolithic 3D grid (for problems equal to or below 512^3, typically one single grid can be generated).
    Use MeshMerger.

3. Move the monolithic grid into a space *without the automatic cleaning after 1 month*, like
     /homea/hhh30/hhh300/DATA/new_runs
    (it is also useful to have a copy of the ICs and the uniform grid in /gpfs2).

4. Use the usual clusters_simple.pro (in the necessary 2 steps) to locate all halos. The root grid size, filename, resolution etc. must be hardcoded in both functions.
    A copy is also here: /homea/hhh30/hhh300/IDL/catalog/clusters_simple.pro (needs.pro and temp_tab05_2kev_new03.dat are also necessary).

5. Compute the edges of each wanted AMR region, using compute_AMR.pro (a worked numerical sketch is given after this list).
   The ideal goal is to use a comoving cubic region of side 4 Rvir(z=0) for each cluster, considering that a 10^14 Msol cluster has a virial radius of order 1 Mpc and a 10^15 Msol cluster has a virial radius of order 3 Mpc.

6. Run the AMR simulation as here. Make sure that each cluster runs in a separate folder (the set of ICs might be copied multiple times or moved from time to time). Make sure the CycleSkipDataDump inside the restart file (e.g. DD00XX/DD00XX) is set to 10 for z>1 and to 1 for z<1.

7. Extract all final datasets for each snapshot like here.
   Also generate the sequence of .conv2 files with the conversion factors using conv.job.
 
8. Move the reconstructed datasets into some safe place where there is no automatic cleaning:
   either /homea/hhh30/hhh300/DATA/new_runs and/or /gpfs2.
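
Worked numerical sketch for step 5 (all numbers are made up for illustration): take a 10^15 Msol cluster, so Rvir ~ 3 Mpc and the comoving AMR cube has side 4*3 = 12 Mpc; in a 100 Mpc box this is 0.12 of the box length. The edges must then be snapped to root grid cells as in the boundary notes above, e.g. for N_root = 256 and a cluster centered at x = 0.5:

> awk 'BEGIN { N=256; box=100.0; rvir=3.0; x=0.5;
    dr2 = 4*rvir/box;                        # AMR cube side in box units
    x1 = int(N*(x - dr2/2))/N;               # snapped left edge
    x2 = (int(N*(x + dr2/2)) + 1)/N;         # snapped right edge
    printf "RefineRegionLeftEdge  = %f\nRefineRegionRightEdge = %f\n", x1, x2 }'
RefineRegionLeftEdge  = 0.437500
RefineRegionRightEdge = 0.562500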
 


      Usual problems:

 
Runs can be interrupted for a number of reasons.
 To get an idea of the problem, doing a tail -100 run.out or tail -100 slurm-XXX.out will suggest the solution. Different kinds of failures can appear in one or the other file.
 On Jureca (and with ENZO in general) these are the most common reasons for code crashes in the kind of simulations we run:


-> NOT ENOUGH MEMORY.
The AMR runs can be very memory demanding, especially if the high-resolution region is
equivalent to or larger than 256^3. This is both because of the number of cells and because of the much larger hierarchy tree. To get the simulation running, we usually need to relaunch the batch script a few times, using more memory per core, until the run proceeds.
On JURECA, some configurations that were able to run are:
-  a root grid of 400^3 + 1 AMR region (3 levels), equivalent to 360^3, using
#SBATCH --ntasks-per-node=12
#SBATCH --nodes=32 

-  a root grid of 256^3 + 1 AMR region (3 levels), equivalent to 392^3, using
#SBATCH --ntasks-per-node=8
#SBATCH --nodes=30

This of course varies with the size of the AMR region (which depends on the cluster mass etc..)

-> Typical error messages (in slurm**.out) related to not having enough memory are

 what():  std::bad_alloc
terminate called after throwing an instance of 'std::bad_alloc'

(a memory issue on just one core is enough to cause the crash).
Solution: increase the amount of memory per core (at the price of not using all cores per node).
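
In practice this means lowering --ntasks-per-node in the batch script while raising --nodes, so the total task count stays fixed (the numbers below are illustrative):

#SBATCH --ntasks-per-node=8   # was 12: each task now gets 1.5x more memory
#SBATCH --nodes=48            # was 32: total tasks stay at 384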

-> PARTICLE INTERPOLATION ERRORS
The DM particles cause errors in the interpolation to the highest resolution of the AMR grid.
 The typical error message (in run.out) is
 cloudsize > cellsize in cic_deposit!
=== ENZO ERROR ===   cic_deposit.F: 81   node 207

To my shame, I have not tracked down the source of this well enough, since it happens only in a rather random fashion in some runs. My understanding is however that this happens when some cell does not contain enough DM particles to allow a proper interpolation (by means of this "cloud") onto the AMR grid for a proper computation of the potential.
The solution, however, is to change the interpolation scheme of the particles; the ENZO guide has some information about this: http://enzo.readthedocs.org/en/latest/parameters/gravity.html (ParticleSubgridDepositMode).
Therefore, one would explicitly insert the instruction
ParticleSubgridDepositMode     = 0
in the initial parameter file.

-> RANDOM NODE FAILURE
That's by far the most common reason of failure, and the easiest to cure: just restart from the last available restart file.
The cause is that we obviously tend to employ the smallest machine configuration that can afford the problem, so we are at the edge of the memory allocation on the requested number of nodes. Even if the structure of the AMR grid does not change over time, as in our forced refinement scheme, ENZO's buffers may not be efficiently deallocated, and some pool of memory can slowly fill up timestep after timestep, until all memory is used.
Or, some node/processor can randomly fail, which often happens for large runs.
The typical error message in run.out is

TopGrid dt = 2.288327e-01     time = 9.2397920852744    cycle = 83    z = 4.7823373760727
srun: error: jrc[0145-0147,0293-0296,0411-0417,0429-0434]: tasks 0-239: Exited with exit code 1

Solution: just restart from the latest available snapshot (making sure it has been completely written, i.e. the failure did not occur while writing). If the failures become
systematic (i.e. every few tens of snapshots), the only solution is to decrease the number of cores per node (keeping the total number of cores the same, because this cannot be changed during an ENZO run).
In very heroic runs, where all the available memory of the cluster was already in use, some users had to manually restart the simulation every 4-5 timesteps... a hundred times, across many weeks....