<p>As an example, one might request GPUs on a SLURM system using the following:</p>
176
176
<p><b>Disclaimer</b>: IBM's JSRUN on LSF-managed computers does not use the traditional node-based approach to allocate resources. Therefore, MFC constructs equivalent resource sets from the task and GPU counts.</p>
177
177
<h2><a class="anchor" id="autotoc_md77"></a>
178
-
Profiling with NVIDIA Nsight</h2>
178
+
GPU Profiling</h2>
179
+
<h3><a class="anchor" id="autotoc_md78"></a>
180
+
NVIDIA GPUs</h3>
179
181
<p>MFC provides two arguments to facilitate profiling with NVIDIA Nsight. <b>Place the chosen argument at the end of the command so that the profiler's own flags can be appended after it.</b></p><ul>
180
-
<li>Nsight Systems (Nsys): <code>./mfc.sh run ... --nsys [nsys flags]</code> allows one to visualize MFC's system-wide performance with <ahref="https://developer.nvidia.com/nsight-systems">NVIDIA Nsight Systems</a>. NSys is best for understanding the order and execution times of major subroutines (WENO, Riemann, etc.) in MFC. When used, <code>--nsys</code> will run the simulation and generate <code>.nsys-rep</code> files in the case directory for all targets. These files can then be imported into Nsight System's GUI, which can be downloaded <ahref="https://developer.nvidia.com/nsight-systems/get-started#latest-Platforms">here</a>. It is best to run case files with a few timesteps to keep the report files small. Learn more about NVIDIA Nsight Systems <ahref="https://docs.nvidia.com/nsight-systems/UserGuide/index.html">here</a>.</li>
181
-
<li>Nsight Compute (NCU): <code>./mfc.sh run ... --ncu [ncu flags]</code> allows one to conduct kernel-level profiling with <ahref="https://developer.nvidia.com/nsight-compute">NVIDIA Nsight Compute</a>. NCU provides profiling information for every subroutine called and is more detailed than NSys. When used, <code>--ncu</code> will output profiling information for all subroutines, including elapsed clock cycles, memory used, and more after the simulation is run. Adding this argument will significantly slow the simulation and should only be used on case files with a few timesteps. Learn more about NVIDIA Nsight Compute <ahref="https://docs.nvidia.com/nsight-compute/NsightCompute/index.html">here</a>.</li>
182
+
<li>Nsight Systems (Nsys): <code>./mfc.sh run ... -t simulation --nsys [nsys flags]</code> allows one to visualize MFC's system-wide performance with <a href="https://developer.nvidia.com/nsight-systems">NVIDIA Nsight Systems</a>. Nsys is best for understanding the order and execution times of major subroutines (WENO, Riemann, etc.) in MFC. When used, <code>--nsys</code> will run the simulation and generate <code>.nsys-rep</code> files in the case directory for all targets. These files can then be imported into the Nsight Systems GUI, which can be downloaded <a href="https://developer.nvidia.com/nsight-systems/get-started#latest-Platforms">here</a>. It is best to run case files with a few timesteps to keep the report files small. Learn more about NVIDIA Nsight Systems <a href="https://docs.nvidia.com/nsight-systems/UserGuide/index.html">here</a>.</li>
183
+
<li>Nsight Compute (NCU): <code>./mfc.sh run ... -t simulation --ncu [ncu flags]</code> allows one to conduct kernel-level profiling with <a href="https://developer.nvidia.com/nsight-compute">NVIDIA Nsight Compute</a>. NCU provides profiling information for every subroutine called and is more detailed than Nsys. When used, <code>--ncu</code> will output profiling information for all subroutines, including elapsed clock cycles, memory used, and more after the simulation is run. Adding this argument significantly slows the simulation, so it should only be used on case files with a few timesteps. Learn more about NVIDIA Nsight Compute <a href="https://docs.nvidia.com/nsight-compute/NsightCompute/index.html">here</a>.</li>
182
184
</ul>
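<p>The reason for placing the profiler argument last can be illustrated with a minimal sketch. It assumes that everything following <code>--nsys</code> or <code>--ncu</code> is forwarded to the profiler unchanged. The helper <code>split_profiler_args</code> is hypothetical and is not MFC's actual argument parser.</p>

```python
# Hypothetical illustration only; MFC's real argument handling may differ.
PROFILER_FLAGS = ("--nsys", "--ncu")


def split_profiler_args(argv):
    """Split a command line at the first profiler flag.

    Returns (mfc_args, profiler, profiler_args), where profiler_args is
    everything that follows the profiler flag.
    """
    for i, arg in enumerate(argv):
        if arg in PROFILER_FLAGS:
            return argv[:i], arg, argv[i + 1:]
    # No profiler requested: all arguments belong to MFC.
    return list(argv), None, []


mfc_args, profiler, profiler_args = split_profiler_args(
    ["run", "case.py", "-t", "simulation", "--nsys", "--trace=cuda"]
)
print(mfc_args, profiler, profiler_args)
```

<p>Under this assumption, any MFC option written after the profiler argument would be handed to the profiler instead of MFC, which is why the profiler argument goes at the end.</p>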
183
-
<h2><a class="anchor" id="autotoc_md78"></a>
185
+
<h3><a class="anchor" id="autotoc_md79"></a>
186
+
AMD GPUs</h3>
187
+
<ul>
188
+
<li>Rocprof (ROC): <code>./mfc.sh run ... -t simulation --roc --hip-trace [rocprof flags]</code> allows one to visualize MFC's system-wide performance with <a href="https://ui.perfetto.dev/">Perfetto UI</a>. When used, <code>--roc</code> will run the simulation and generate files in the case directory for all targets. <code>results.json</code> can then be imported into <a href="https://ui.perfetto.dev/">Perfetto's UI</a>. It is best to run case files with a few timesteps to keep the report files small. Learn more about AMD Rocprof <a href="https://rocm.docs.amd.com/projects/rocprofiler/en/docs-5.5.1/rocprof.html">here</a>.</li>
189
+
<li>Omniperf (OMNI): <code>./mfc.sh run ... -t simulation --omni [omniperf flags]</code> allows one to conduct kernel-level profiling with <a href="https://rocm.github.io/omniperf/introduction.html#what-is-omniperf">AMD Omniperf</a>. When used, <code>--omni</code> will output profiling information for all subroutines, including rooflines, cache usage, register usage, and more after the simulation is run. Adding this argument will moderately slow down the simulation and run the MFC executable several times, so it should only be used with case files that have a few timesteps.</li>
190
+
</ul>
191
+
<h2><a class="anchor" id="autotoc_md80"></a>
184
192
Restarting Cases</h2>
185
193
<p>When running a simulation, MFC generates a <code>./restart_data</code> folder in the case directory that contains <code>lustre_*.dat</code> files, which can be used to restart a simulation from saved timesteps. This allows a user to simulate up to some timestep $X$, then continue the run to a later timestep $Y$, where $Y > X$. The user can also choose to add new patches at the intermediate timestep.</p>
<divclass="textblock"><p><aclass="anchor" id="autotoc_md80"></a> To run MFC's test suite, run </p><divclass="fragment"><divclass="line">./mfc.sh test -j <thread count></div>
137
+
<div class="textblock"><p><a class="anchor" id="autotoc_md82"></a> To run MFC's test suite, run </p><div class="fragment"><div class="line">./mfc.sh test -j &lt;thread count&gt;</div>
138
138
</div><!-- fragment --><p>It will generate and run test cases, comparing their output to that of previous runs from versions of MFC considered to be accurate. The <em>golden files</em> stored in the <code>tests/</code> directory contain this reference data, aggregated from the <code>.dat</code> files generated when running MFC. A test passes when its output matches the golden file within MFC's error tolerances, which helps maintain a high level of stability and accuracy. Run <code>./mfc.sh test -h</code> for a full list of accepted arguments.</p>
139
139
<p>Most notably, you can consult the full list of tests by running </p><div class="fragment"><div class="line">./mfc.sh test -l</div>
140
140
</div><!-- fragment --><p>To restrict to a given range, use the <code>--from</code> (<code>-f</code>) and <code>--to</code> (<code>-t</code>) options. To run a (non-contiguous) subset of tests, use the <code>--only</code> (<code>-o</code>) option instead.</p>
141
-
<h2><a class="anchor" id="autotoc_md81"></a>
141
+
<h2><a class="anchor" id="autotoc_md83"></a>
142
142
Creating Tests</h2>
143
143
<p>To (re)generate <em>golden files</em>, append the <code>--generate</code> option: </p><div class="fragment"><div class="line">./mfc.sh test --generate -j 8</div>
144
144
</div><!-- fragment --><p>When generating golden files for new test cases, it is recommended to specify a range, as described in the previous section, so that the golden files of existing test cases are not regenerated.</p>
<p>If a trace is empty (that is, the empty string <code>""</code>), it will not appear in the final trace, but any case parameter variations associated with it will still be applied.</p>
184
184
<p>Finally, the case is appended to the <code>cases</code> list, which will be returned by the <code>generate_cases</code> function.</p>
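<p>The behaviour of empty traces can be sketched as follows. This is a simplified illustration and not MFC's test-suite code: the <code>CaseStack</code> class, its method names, the <code>" -> "</code> separator, and the parameter values are all assumptions made for the example.</p>

```python
class CaseStack:
    """Hypothetical stack of (trace, parameter variation) pairs."""

    def __init__(self):
        self.traces = []
        self.params = []

    def push(self, trace, params):
        self.traces.append(trace)
        self.params.append(params)

    def pop(self):
        self.traces.pop()
        self.params.pop()

    def trace(self):
        # Empty traces are skipped when building the final trace.
        return " -> ".join(t for t in self.traces if t != "")

    def merged_params(self):
        # Every pushed variation is applied, even if its trace is empty.
        merged = {}
        for variation in self.params:
            merged.update(variation)
        return merged


def generate_cases():
    stack, cases = CaseStack(), []
    stack.push("1D", {"m": 299, "n": 0, "p": 0})  # illustrative values
    stack.push("", {"weno_order": 5})  # empty trace, parameters still applied
    cases.append({"trace": stack.trace(), "params": stack.merged_params()})
    stack.pop()
    stack.pop()
    return cases


print(generate_cases())
```

<p>In this sketch, the second variation contributes its parameters to the case but leaves no entry in the trace.</p>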
185
-
<h2><a class="anchor" id="autotoc_md82"></a>
185
+
<h2><a class="anchor" id="autotoc_md84"></a>
186
186
Testing Post Process</h2>
187
187
<p>To test updated post process code, append the <code>-a</code> or <code>--test-all</code> option: </p><div class="fragment"><div class="line">./mfc.sh test -a -j 8</div>
188
188
</div><!-- fragment --><p>This argument will re-run the test stack with <code>parallel_io=True</code>, which generates silo_hdf5 files. It will also turn most write parameters (<code>*_wrt</code>) on. Then, it searches through the silo files using <code>h5dump</code> to ensure that there are no NaNs or infinities. Although adding this option does not guarantee that accurate silo files are generated, it does ensure that post process does not fail or produce malformed data. </p>
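<p>The NaN and infinity check can be pictured with a small sketch. It assumes the textual output of <code>h5dump</code> has already been captured as a string. The function name <code>find_non_finite</code> and the sample dump are illustrative and are not taken from MFC's test suite.</p>

```python
import re

# Matches standalone nan / inf / infinity tokens, optionally signed.
# The lookarounds avoid matching inside longer identifiers such as "info".
_NON_FINITE = re.compile(
    r"(?<![A-Za-z0-9_])[-+]?(?:nan|inf(?:inity)?)(?![A-Za-z0-9_])",
    re.IGNORECASE,
)


def find_non_finite(dump_text):
    """Return (line_number, line) pairs for lines holding NaN or Inf tokens."""
    hits = []
    for lineno, line in enumerate(dump_text.splitlines(), start=1):
        if _NON_FINITE.search(line):
            hits.append((lineno, line.strip()))
    return hits


# Made-up text in the general style of an h5dump listing.
sample = """DATASET "alpha1" {
   DATA {
   (0): 1, 0.5, nan,
   (3): -inf, 2
   }
}"""

for lineno, line in find_non_finite(sample):
    print(f"line {lineno}: {line}")
```

<p>A check of this kind only detects non-finite values. As noted above, it says nothing about whether the remaining values are accurate.</p>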
<divclass="textblock"><p><aclass="anchor" id="autotoc_md83"></a> Post-processed database in Silo-HDF5 format can be visualized and analyzed using VisIt. VisIt is an open-source interactive parallel visualization and graphical analysis tool for viewing scientific data. Versions of VisIt after 2.6.0 have been confirmed to work with the MFC databases for some parallel environments. Nevertheless, installation and configuration of VisIt can be environment-dependent and are left to the user. Further remarks on parallel flow visualization, analysis and processing of MFC database using VisIt can also be found in <ahref="references.md#Coralic15">Coralic (2015)</a>; <ahref="references.md#Meng16">Meng (2016)</a>.</p>
138
-
<h1><a class="anchor" id="autotoc_md84"></a>
137
+
<div class="textblock"><p><a class="anchor" id="autotoc_md85"></a> Post-processed databases in Silo-HDF5 format can be visualized and analyzed using VisIt. VisIt is an open-source interactive parallel visualization and graphical analysis tool for viewing scientific data. Versions of VisIt after 2.6.0 have been confirmed to work with the MFC databases for some parallel environments. Nevertheless, installation and configuration of VisIt can be environment-dependent and are left to the user. Further remarks on parallel flow visualization, analysis, and processing of MFC databases using VisIt can also be found in <a href="references.md#Coralic15">Coralic (2015)</a>; <a href="references.md#Meng16">Meng (2016)</a>.</p>
138
+
<h1><a class="anchor" id="autotoc_md86"></a>
139
139
Procedure</h1>
140
140
<p>After post-processing the simulation data (see section <a href="running.md#running-1">Running</a>), a folder named <code>silo_hdf5</code> that contains a Silo-HDF5 database is created. <code>silo_hdf5</code> includes a directory named <code>root</code> that contains index files for the flow field data at each saved time step. The user can launch VisIt and open the index files under <code>/silo_hdf5/root</code>. Once the database is loaded, flow field variables contained in the database can be added to the plot.</p>
141
141
<p>As an example, the figure below shows the iso-contour of the liquid void fraction (<code>alpha1</code>) in the database generated by the example case <code>3D_sphbubcollapse</code>. For analysis and processing of the database using VisIt's capabilities, the user is encouraged to consult the <a href="https://wci.llnl.gov/simulation/computer-codes/visit/manuals">VisIt user manual</a>.</p>
<p>*Iso-contour of the liquid void fraction (<code>alpha1</code>) in the database generated by example case <code>3D_sphbubcollapse</code>*</p>
144
-
<h1><a class="anchor" id="autotoc_md85"></a>
144
+
<h1><a class="anchor" id="autotoc_md87"></a>
145
145
Serial data output</h1>
146
146
<p>If <code>parallel_io = F</code>, then MFC will output the conservative variables to a directory <code>D/</code>. If multiple cores are used ($\mathtt{ppn > 1}$), then a separate file is created for each core. If there is only one coordinate dimension (<code>n = 0</code> and <code>p = 0</code>), then the primitive variables will also be written to <code>D/</code>. The file names correspond to the variables associated with each equation solved by MFC. They are written at every <code>t_step_save</code> time step. The conservative variables are</p>
['problem_201d_12',['problem 1D',['../md_examples.html#autotoc_md44',1,'Lax shock tube problem (1D)'],['../md_examples.html#autotoc_md50',1,'Shu-Osher problem (1D)'],['../md_examples.html#autotoc_md41',1,'Titarev-Toro problem (1D)']]],
16
16
['problem_202d_13',['Isentropic vortex problem (2D)',['../md_examples.html#autotoc_md38',1,'']]],
17
17
['problem_202d_14',['Lid-Driven Cavity Problem (2D)',['../md_examples.html#autotoc_md47',1,'']]],