MFlowCode
diff --git a/documentation/doxygen_crawl.html b/documentation/doxygen_crawl.html
Lines changed: 6 additions & 4 deletions
diff --git a/documentation/md_running.html b/documentation/md_running.html
Lines changed: 13 additions & 5 deletions
diff --git a/documentation/md_testing.html b/documentation/md_testing.html
Lines changed: 3 additions & 3 deletions
diff --git a/documentation/md_visualization.html b/documentation/md_visualization.html
Lines changed: 3 additions & 3 deletions
diff --git a/documentation/navtreedata.js b/documentation/navtreedata.js
Lines changed: 8 additions & 5 deletions
diff --git a/documentation/navtreeindex0.js b/documentation/navtreeindex0.js
Lines changed: 6 additions & 4 deletions
diff --git a/documentation/search/all_14.js b/documentation/search/all_14.js
Lines changed: 2 additions & 3 deletions
diff --git a/documentation/search/all_15.js b/documentation/search/all_15.js
Lines changed: 1 addition & 1 deletion
diff --git a/documentation/search/all_16.js b/documentation/search/all_16.js
Lines changed: 4 additions & 4 deletions
diff --git a/documentation/search/all_17.js b/documentation/search/all_17.js
Lines changed: 2 additions & 2 deletions
@@ -101,11 +101,13 @@
 <a href="md_running.html#autotoc_md77"/>
 <a href="md_running.html#autotoc_md78"/>
 <a href="md_running.html#autotoc_md79"/>
+<a href="md_running.html#autotoc_md80"/>
+<a href="md_running.html#autotoc_md81"/>
 <a href="md_testing.html"/>
-<a href="md_testing.html#autotoc_md81"/>
-<a href="md_testing.html#autotoc_md82"/>
+<a href="md_testing.html#autotoc_md83"/>
+<a href="md_testing.html#autotoc_md84"/>
 <a href="md_visualization.html"/>
-<a href="md_visualization.html#autotoc_md84"/>
-<a href="md_visualization.html#autotoc_md85"/>
+<a href="md_visualization.html#autotoc_md86"/>
+<a href="md_visualization.html#autotoc_md87"/>
 </body>
 </html>
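Most of this commit is mechanical renumbering: the two new headings ("NVIDIA GPUs" and "AMD GPUs") take the ids `autotoc_md78` and `autotoc_md79`, and every later `autotoc_mdN` id in `md_running.html`, `md_testing.html`, and `md_visualization.html` moves up by two. The JavaScript below is a minimal sketch of that remapping, assuming (as the diff suggests, not as taken from Doxygen's source) that these anchors are numbered by one counter shared across all Markdown pages. `shiftAnchor` and `shiftAnchorsInText` are hypothetical helpers, not part of Doxygen or MFC.

```javascript
// Hypothetical helper: renumber a Doxygen-style "autotoc_mdN" id after `count`
// new headings were inserted starting at id `firstNew`.
// Assumption (inferred from this diff): one global counter, so every id >= firstNew
// moves up by `count`, while earlier ids are unchanged.
function shiftAnchor(id, firstNew, count) {
  const m = /^autotoc_md(\d+)$/.exec(id);
  if (!m) return id; // not an auto-generated anchor; leave it alone
  const n = Number(m[1]);
  return n >= firstNew ? `autotoc_md${n + count}` : id;
}

// Apply the same shift to every anchor reference found in a chunk of HTML or JS.
function shiftAnchorsInText(text, firstNew, count) {
  return text.replace(/autotoc_md\d+/g, (match) => shiftAnchor(match, firstNew, count));
}

console.log(shiftAnchorsInText('<a href="md_visualization.html#autotoc_md85"/>', 78, 2));
// <a href="md_visualization.html#autotoc_md87"/>
```

With `firstNew = 78` and `count = 2`, this reproduces every `-`/`+` anchor pair in the hunks of this commit; it does not model which page a heading belongs to.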
@@ -175,12 +175,20 @@ <h1><a class="anchor" id="autotoc_md75"></a>
 <p>As an example, one might request GPUs on a SLURM system using the following:</p>
 <p><b>Disclaimer</b>: IBM's JSRUN on LSF-managed computers does not use the traditional node-based approach to allocate resources. Therefore, MFC constructs equivalent resource sets based on the task and GPU counts.</p>
 <h2><a class="anchor" id="autotoc_md77"></a>
-Profiling with NVIDIA Nsight</h2>
+GPU Profiling</h2>
+<h3><a class="anchor" id="autotoc_md78"></a>
+NVIDIA GPUs</h3>
 <p>MFC provides two arguments to facilitate profiling with NVIDIA Nsight. <b>Please ensure that the profiling argument is placed last so that its respective flags can be appended.</b></p><ul>
-<li>Nsight Systems (Nsys): <code>./mfc.sh run ... --nsys [nsys flags]</code> allows one to visualize MFC's system-wide performance with <a href="https://developer.nvidia.com/nsight-systems">NVIDIA Nsight Systems</a>. NSys is best for understanding the order and execution times of major subroutines (WENO, Riemann, etc.) in MFC. When used, <code>--nsys</code> will run the simulation and generate <code>.nsys-rep</code> files in the case directory for all targets. These files can then be imported into Nsight System's GUI, which can be downloaded <a href="https://developer.nvidia.com/nsight-systems/get-started#latest-Platforms">here</a>. It is best to run case files with a few timesteps to keep the report files small. Learn more about NVIDIA Nsight Systems <a href="https://docs.nvidia.com/nsight-systems/UserGuide/index.html">here</a>.</li>
-<li>Nsight Compute (NCU): <code>./mfc.sh run ... --ncu [ncu flags]</code> allows one to conduct kernel-level profiling with <a href="https://developer.nvidia.com/nsight-compute">NVIDIA Nsight Compute</a>. NCU provides profiling information for every subroutine called and is more detailed than NSys. When used, <code>--ncu</code> will output profiling information for all subroutines, including elapsed clock cycles, memory used, and more after the simulation is run. Adding this argument will significantly slow the simulation and should only be used on case files with a few timesteps. Learn more about NVIDIA Nsight Compute <a href="https://docs.nvidia.com/nsight-compute/NsightCompute/index.html">here</a>.</li>
+<li>Nsight Systems (Nsys): <code>./mfc.sh run ... -t simulation --nsys [nsys flags]</code> allows one to visualize MFC's system-wide performance with <a href="https://developer.nvidia.com/nsight-systems">NVIDIA Nsight Systems</a>. Nsys is best for understanding the order and execution times of major subroutines (WENO, Riemann, etc.) in MFC. When used, <code>--nsys</code> runs the simulation and generates <code>.nsys-rep</code> files in the case directory for each selected target. These files can then be imported into the Nsight Systems GUI, which can be downloaded <a href="https://developer.nvidia.com/nsight-systems/get-started#latest-Platforms">here</a>. It is best to run case files with only a few timesteps to keep the report files small. Learn more about NVIDIA Nsight Systems <a href="https://docs.nvidia.com/nsight-systems/UserGuide/index.html">here</a>.</li>
+<li>Nsight Compute (NCU): <code>./mfc.sh run ... -t simulation --ncu [ncu flags]</code> allows one to conduct kernel-level profiling with <a href="https://developer.nvidia.com/nsight-compute">NVIDIA Nsight Compute</a>. NCU provides profiling information for every subroutine called and is more detailed than Nsys. When used, <code>--ncu</code> outputs profiling information for all subroutines, including elapsed clock cycles, memory usage, and more, after the simulation is run. Adding this argument significantly slows the simulation, so it should only be used with case files that have only a few timesteps. Learn more about NVIDIA Nsight Compute <a href="https://docs.nvidia.com/nsight-compute/NsightCompute/index.html">here</a>.</li>
 </ul>
-<h2><a class="anchor" id="autotoc_md78"></a>
+<h3><a class="anchor" id="autotoc_md79"></a>
+AMD GPUs</h3>
+<ul>
+<li>Rocprof (ROC): <code>./mfc.sh run ... -t simulation --roc --hip-trace [rocprof flags]</code> allows one to visualize MFC's system-wide performance with <a href="https://ui.perfetto.dev/">Perfetto UI</a>. When used, <code>--roc</code> runs the simulation and generates files in the case directory for each selected target. The resulting <code>results.json</code> can then be imported into <a href="https://ui.perfetto.dev/">Perfetto's UI</a>. Learn more about AMD Rocprof <a href="https://rocm.docs.amd.com/projects/rocprofiler/en/docs-5.5.1/rocprof.html">here</a>. It is best to run case files with only a few timesteps to keep the report files small.</li>
+<li>Omniperf (OMNI): <code>./mfc.sh run ... -t simulation --omni [omniperf flags]</code> allows one to conduct kernel-level profiling with <a href="https://rocm.github.io/omniperf/introduction.html#what-is-omniperf">AMD Omniperf</a>. When used, <code>--omni</code> outputs profiling information for all subroutines, including rooflines, cache usage, register usage, and more, after the simulation is run. Adding this argument moderately slows down the simulation and runs the MFC executable several times, so it should only be used with case files that have only a few timesteps.</li>
+</ul>
+<h2><a class="anchor" id="autotoc_md80"></a>
 Restarting Cases</h2>
 <p>When running a simulation, MFC generates a <code>./restart_data</code> folder in the case directory that contains <code>lustre_*.dat</code> files, which can be used to restart a simulation from saved timesteps. This allows a user to simulate up to some timestep $X$ and then continue the run to a later timestep $Y$, where $Y &gt; X$. The user can also choose to add new patches at the intermediate timestep.</p>
 <p>If you want to restart a simulation,</p>
@@ -268,7 +276,7 @@ <h2><a class="anchor" id="autotoc_md78"></a>
 <div class="line">./mfc.sh run examples/1D_vacuum_restart/restart_case.py -t pre_process simulation</div>
 <div class="line">./mfc.sh run examples/1D_vacuum_restart/case.py -t post_process</div>
 <div class="line">./mfc.sh run examples/1D_vacuum_restart/restart_case.py -t post_process</div>
-</div><!-- fragment --><h2><a class="anchor" id="autotoc_md79"></a>
+</div><!-- fragment --><h2><a class="anchor" id="autotoc_md81"></a>
 Example Runs</h2>
 <ul>
 <li>Oak Ridge National Laboratory's <a href="https://www.olcf.ornl.gov/summit/">Summit</a>:</li>
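The profiling bullets in the `md_running.html` hunk share one rule: the profiler switch (`--nsys`, `--ncu`, `--roc`, or `--omni`) goes last so that the profiler-specific flags written after it can be appended. The JavaScript below is a hypothetical sketch of a wrapper that enforces this ordering when scripting MFC runs. `buildRunArgs` is not part of MFC; the case path comes from the restart commands on the same page, and `--hip-trace` is passed through exactly as in the Rocprof bullet.

```javascript
// Hypothetical helper (not part of MFC): builds the argument list for
// `./mfc.sh run`, keeping the profiler switch and its pass-through flags last.
const PROFILER_SWITCHES = new Set(["--nsys", "--ncu", "--roc", "--omni"]);

function buildRunArgs(caseFile, targets, profiler, profilerFlags = []) {
  if (!PROFILER_SWITCHES.has(profiler)) {
    throw new Error(`unknown profiler switch: ${profiler}`);
  }
  // Per the documentation, flags after the profiler switch are handed to the
  // profiler, so MFC's own options (case file, -t targets) must come first.
  return ["run", caseFile, "-t", ...targets, profiler, ...profilerFlags];
}

// Mirrors `./mfc.sh run ... -t simulation --roc --hip-trace [rocprof flags]`.
const args = buildRunArgs(
  "examples/1D_vacuum_restart/case.py",
  ["simulation"],
  "--roc",
  ["--hip-trace"],
);
console.log(["./mfc.sh", ...args].join(" "));
// ./mfc.sh run examples/1D_vacuum_restart/case.py -t simulation --roc --hip-trace
```

The returned array can be handed directly to Node's `spawn("./mfc.sh", args)` from `node:child_process`, which avoids shell quoting issues when profiler flags contain spaces.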
 
@@ -134,11 +134,11 @@
   <div class="headertitle"><div class="title">Testing</div></div>
 </div><!--header-->
 <div class="contents">
-<div class="textblock"><p><a class="anchor" id="autotoc_md80"></a> To run MFC's test suite, run </p><div class="fragment"><div class="line">./mfc.sh test -j &lt;thread count&gt;</div>
+<div class="textblock"><p><a class="anchor" id="autotoc_md82"></a> To run MFC's test suite, run </p><div class="fragment"><div class="line">./mfc.sh test -j &lt;thread count&gt;</div>
 </div><!-- fragment --><p>This generates and runs test cases, comparing their output to that of previous runs from versions of MFC considered to be accurate. These reference outputs, called <em>golden files</em>, are stored in the <code>tests/</code> directory and aggregate the <code>.dat</code> files generated when running MFC. A test passes when its output agrees with the golden file within MFC's error tolerances, which helps maintain a high level of stability and accuracy. Run <code>./mfc.sh test -h</code> for a full list of accepted arguments.</p>
 <p>Most notably, you can consult the full list of tests by running </p><div class="fragment"><div class="line">./mfc.sh test -l</div>
 </div><!-- fragment --><p>To restrict to a given range, use the <code>--from</code> (<code>-f</code>) and <code>--to</code> (<code>-t</code>) options. To run a (non-contiguous) subset of tests, use the <code>--only</code> (<code>-o</code>) option instead.</p>
-<h2><a class="anchor" id="autotoc_md81"></a>
+<h2><a class="anchor" id="autotoc_md83"></a>
 Creating Tests</h2>
 <p>To (re)generate <em>golden files</em>, append the <code>--generate</code> option: </p><div class="fragment"><div class="line">./mfc.sh test --generate -j 8</div>
 </div><!-- fragment --><p>It is recommended that a range be specified when generating golden files for new test cases, as described in the previous section, in an effort not to regenerate the golden files of existing test cases.</p>
@@ -182,7 +182,7 @@ <h2><a class="anchor" id="autotoc_md81"></a>
 </ul>
 <p>If a trace is empty (that is, the empty string <code>""</code>), it will not appear in the final trace, but any case parameter variations associated with it will still be applied.</p>
 <p>Finally, the case is appended to the <code>cases</code> list, which will be returned by the <code>generate_cases</code> function.</p>
-<h2><a class="anchor" id="autotoc_md82"></a>
+<h2><a class="anchor" id="autotoc_md84"></a>
 Testing Post Process</h2>
 <p>To test updated post process code, append the <code>-a</code> or <code>--test-all</code> option: </p><div class="fragment"><div class="line">./mfc.sh test -a -j 8</div>
 </div><!-- fragment --><p>This option re-runs the test suite with <code>parallel_io=True</code>, which generates Silo-HDF5 files, and turns on most write parameters (<code>*_wrt</code>). It then searches the Silo files using <code>h5dump</code> to ensure that they contain no NaNs or infinities. Although this option does not guarantee that the generated Silo files are accurate, it does ensure that post-processing neither fails nor produces malformed data. </p>
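The post-process test scans the Silo files with `h5dump` for NaNs and infinities. MFC's own implementation of that check is not shown in this diff, so the JavaScript below is only a hypothetical sketch of the scanning step. It assumes the `h5dump` text output has already been captured as a string and that non-finite values appear as tokens such as `nan`, `-nan`, `inf`, or `-inf` (the exact spelling depends on the platform's C library).

```javascript
// Hypothetical sketch: find non-finite values in captured h5dump text output.
// Assumes values are separated by whitespace, commas, parentheses, braces, or
// colons, and that NaN/Inf print as nan, inf, or infinity (any case, optional sign).
const NON_FINITE = /^[+-]?(nan|inf|infinity)$/i;

function findNonFinite(dumpText) {
  return dumpText
    .split(/[\s,():{}]+/)
    .filter((token) => NON_FINITE.test(token));
}

// Illustrative input only; real h5dump output contains more header lines.
const sample = 'DATASET "alpha1" {\n  DATA {\n  (0): 0.5, nan, 1.25, -inf\n  }\n}';
console.log(findNonFinite(sample)); // [ 'nan', '-inf' ]
```

Matching whole tokens rather than substrings keeps dataset names or words such as `Infinity` inside labels from being counted only when they are exact value tokens, at the cost of relying on the separator assumption above.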
 
@@ -134,14 +134,14 @@
   <div class="headertitle"><div class="title">Flow visualization</div></div>
 </div><!--header-->
 <div class="contents">
-<div class="textblock"><p><a class="anchor" id="autotoc_md83"></a> Post-processed database in Silo-HDF5 format can be visualized and analyzed using VisIt. VisIt is an open-source interactive parallel visualization and graphical analysis tool for viewing scientific data. Versions of VisIt after 2.6.0 have been confirmed to work with the MFC databases for some parallel environments. Nevertheless, installation and configuration of VisIt can be environment-dependent and are left to the user. Further remarks on parallel flow visualization, analysis and processing of MFC database using VisIt can also be found in <a href="references.md#Coralic15">Coralic (2015)</a>; <a href="references.md#Meng16">Meng (2016)</a>.</p>
-<h1><a class="anchor" id="autotoc_md84"></a>
+<div class="textblock"><p><a class="anchor" id="autotoc_md85"></a> Post-processed databases in Silo-HDF5 format can be visualized and analyzed using VisIt, an open-source interactive parallel visualization and graphical analysis tool for viewing scientific data. Versions of VisIt after 2.6.0 have been confirmed to work with MFC databases in some parallel environments. Nevertheless, installing and configuring VisIt can be environment-dependent and is left to the user. Further remarks on parallel flow visualization, analysis, and processing of MFC databases using VisIt can be found in <a href="references.md#Coralic15">Coralic (2015)</a> and <a href="references.md#Meng16">Meng (2016)</a>.</p>
+<h1><a class="anchor" id="autotoc_md86"></a>
 Procedure</h1>
 <p>After post-processing the simulation data (see section <a href="running.md#running-1">Running</a>), a folder named <code>silo_hdf5</code> that contains a Silo-HDF5 database is created. <code>silo_hdf5</code> includes a directory named <code>root</code> that contains index files for the flow field data at each saved time step. The user can launch VisIt and open the index files under <code>/silo_hdf5/root</code>. Once the database is loaded, the flow field variables it contains can be added to a plot.</p>
 <p>As an example, the figure below shows the iso-contour of the liquid void fraction (<code>alpha1</code>) in the database generated by the example case <code>3D_sphbubcollapse</code>. For analysis and processing of the database using VisIt's capabilities, the user is encouraged to consult the <a href="https://wci.llnl.gov/simulation/computer-codes/visit/manuals">VisIt user manual</a>.</p>
 <p><img src="../res/visit.png" alt="" class="inline"/></p>
 <p>*Iso-contour of the liquid void fraction (<code>alpha1</code>) in the database generated by example case <code>3D_sphbubcollapse</code>*</p>
-<h1><a class="anchor" id="autotoc_md85"></a>
+<h1><a class="anchor" id="autotoc_md87"></a>
 Serial data output</h1>
 <p>If <code>parallel_io = F</code>, then MFC will output the conservative variables to a directory <code>D/</code>. If multiple cores are used ($\mathtt{ppn &gt; 1}$), then a separate file is created for each core. If there is only one coordinate dimension (<code>n = 0</code> and <code>p = 0</code>), then the primitive variables will also be written to <code>D/</code>. The file names correspond to the variables associated with each equation solved by MFC. They are written at every <code>t_step_save</code> time step. The conservative variables are</p>
 <p>$$ {(\rho \alpha)}_{1}, \dots, (\rho\alpha)_{N_c}, \rho u_{1}, \dots, \rho u_{N_d}, E, \alpha_1, \dots, \alpha_{N_c} $$</p>
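The ordering of the conservative variables in the `md_visualization.html` hunk can be made concrete with a short JavaScript sketch. It reads $N_c$ as the number of components and $N_d$ as the number of spatial dimensions (an assumption; this excerpt does not define them), so there are $2N_c + N_d + 1$ variables in total. The string labels are hypothetical and do not correspond to MFC's actual output file names.

```javascript
// Hypothetical sketch: list conservative variables in the documented order
//   (rho*alpha)_1..N_c, rho*u_1..N_d, E, alpha_1..N_c
// Labels are illustrative only, not MFC's output file names.
function conservativeVariables(nComponents, nDims) {
  const range = (n) => Array.from({ length: n }, (_, i) => i + 1);
  return [
    ...range(nComponents).map((i) => `(rho*alpha)_${i}`),
    ...range(nDims).map((i) => `rho*u_${i}`),
    "E",
    ...range(nComponents).map((i) => `alpha_${i}`),
  ];
}

// Two components in one dimension: 2*2 + 1 + 1 = 6 variables.
console.log(conservativeVariables(2, 1).join(", "));
// (rho*alpha)_1, (rho*alpha)_2, rho*u_1, E, alpha_1, alpha_2
```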
 
@@ -119,15 +119,18 @@ var NAVTREE =
     [ "Running", "md_running.html", [
       [ "Interactive Execution", "md_running.html#autotoc_md75", null ],
       [ "Batch Execution", "md_running.html#autotoc_md76", [
-        [ "Profiling with NVIDIA Nsight", "md_running.html#autotoc_md77", null ],
-        [ "Restarting Cases", "md_running.html#autotoc_md78", null ],
-        [ "Example Runs", "md_running.html#autotoc_md79", null ]
+        [ "GPU Profiling", "md_running.html#autotoc_md77", [
+          [ "NVIDIA GPUs", "md_running.html#autotoc_md78", null ],
+          [ "AMD GPUs", "md_running.html#autotoc_md79", null ]
+        ] ],
+        [ "Restarting Cases", "md_running.html#autotoc_md80", null ],
+        [ "Example Runs", "md_running.html#autotoc_md81", null ]
       ] ]
     ] ],
     [ "Testing", "md_testing.html", null ],
     [ "Flow visualization", "md_visualization.html", [
-      [ "Procedure", "md_visualization.html#autotoc_md84", null ],
-      [ "Serial data output", "md_visualization.html#autotoc_md85", null ]
+      [ "Procedure", "md_visualization.html#autotoc_md86", null ],
+      [ "Serial data output", "md_visualization.html#autotoc_md87", null ]
     ] ]
   ] ]
 ];
 
@@ -79,11 +79,13 @@ var NAVTREEINDEX0 =
 "md_running.html#autotoc_md75":[7,0],
 "md_running.html#autotoc_md76":[7,1],
 "md_running.html#autotoc_md77":[7,1,0],
-"md_running.html#autotoc_md78":[7,1,1],
-"md_running.html#autotoc_md79":[7,1,2],
+"md_running.html#autotoc_md78":[7,1,0,0],
+"md_running.html#autotoc_md79":[7,1,0,1],
+"md_running.html#autotoc_md80":[7,1,1],
+"md_running.html#autotoc_md81":[7,1,2],
 "md_testing.html":[8],
 "md_visualization.html":[9],
-"md_visualization.html#autotoc_md84":[9,0],
-"md_visualization.html#autotoc_md85":[9,1],
+"md_visualization.html#autotoc_md86":[9,0],
+"md_visualization.html#autotoc_md87":[9,1],
 "pages.html":[]
 };
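The `NAVTREEINDEX0` hunk mirrors the nesting added to `NAVTREE`: each `NAVTREE` entry is a `[title, url, children]` triple, and the new NVIDIA and AMD subsections become children of "GPU Profiling" under "Batch Execution", hence paths such as `[7,1,0,0]`. The JavaScript below sketches how such index paths follow from a depth-first walk. `indexPaths` is a hypothetical helper, and the `[7, 1]` prefix is copied from the existing `"md_running.html#autotoc_md76":[7,1]` entry rather than computed from the full tree.

```javascript
// Hypothetical sketch: flatten NAVTREE-style nodes ([title, url, children])
// into a url -> index-path map, the shape NAVTREEINDEX0 appears to use.
// Anything in the children slot that is not an array is treated as a leaf.
function indexPaths(nodes, prefix = [], out = {}) {
  nodes.forEach(([, url, children], i) => {
    const path = [...prefix, i];
    out[url] = path;
    if (Array.isArray(children)) indexPaths(children, path, out);
  });
  return out;
}

// Children of "Batch Execution" after this change (copied from the diff).
const batchChildren = [
  ["GPU Profiling", "md_running.html#autotoc_md77", [
    ["NVIDIA GPUs", "md_running.html#autotoc_md78", null],
    ["AMD GPUs", "md_running.html#autotoc_md79", null],
  ]],
  ["Restarting Cases", "md_running.html#autotoc_md80", null],
  ["Example Runs", "md_running.html#autotoc_md81", null],
];

const paths = indexPaths(batchChildren, [7, 1]);
console.log(paths["md_running.html#autotoc_md79"]); // [ 7, 1, 0, 1 ]
```

Every path this produces matches the `+` lines of the `navtreeindex0.js` hunk, which is a quick way to confirm the two files were regenerated consistently.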
@@ -1,7 +1,6 @@
 var searchData=
 [
   ['norms_0',['Density Norms',['../md_examples.html#autotoc_md40',1,'']]],
-  ['nsight_1',['Profiling with NVIDIA Nsight',['../md_running.html#autotoc_md77',1,'']]],
-  ['nvidia_20nsight_2',['Profiling with NVIDIA Nsight',['../md_running.html#autotoc_md77',1,'']]],
-  ['nvidia_20v100_20gpu_3',['NVIDIA V100 GPU',['../md_expectedPerformance.html#autotoc_md57',1,'NVIDIA V100 GPU'],['../md_expectedPerformance.html#autotoc_md60',1,'NVIDIA V100 GPU']]]
+  ['nvidia_20gpus_1',['NVIDIA GPUs',['../md_running.html#autotoc_md78',1,'']]],
+  ['nvidia_20v100_20gpu_2',['NVIDIA V100 GPU',['../md_expectedPerformance.html#autotoc_md57',1,'NVIDIA V100 GPU'],['../md_expectedPerformance.html#autotoc_md60',1,'NVIDIA V100 GPU']]]
 ];
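The search-index keys in these hunks follow a visible pattern: the term is lower-cased, spaces become `_20`, a dot becomes `_2e` (as in `readme_2emd_0`), and a trailing `_N` gives the entry's position in the file, which is why `nvidia_20v100_20gpu_3` becomes `nvidia_20v100_20gpu_2` once two entries are removed. The JavaScript below is a hypothetical reconstruction inferred only from the keys in this diff, not Doxygen's actual code; how Doxygen escapes underscores or non-ASCII characters is not covered.

```javascript
// Hypothetical reconstruction of the key pattern seen in this diff
// (e.g. 'nvidia_20gpus_1', 'readme_2emd_0'); not Doxygen's implementation.
// Lower-case the term, escape each character outside [a-z0-9] as "_" plus its
// two-digit hex code (ASCII only), then append "_" and the entry's position.
function searchKey(term, position) {
  const escaped = Array.from(term.toLowerCase(), (ch) =>
    /[a-z0-9]/.test(ch) ? ch : "_" + ch.charCodeAt(0).toString(16).padStart(2, "0")
  ).join("");
  return `${escaped}_${position}`;
}

console.log(searchKey("NVIDIA GPUs", 1)); // nvidia_20gpus_1
console.log(searchKey("readme.md", 0));   // readme_2emd_0
```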
@@ -5,5 +5,5 @@ var searchData=
   ['ordering_2',['Ordering',['../md_case.html#autotoc_md25',1,'Conservative Variables Ordering'],['../md_case.html#autotoc_md26',1,'Primitive Variables Ordering']]],
   ['osher_20problem_201d_3',['Shu-Osher problem (1D)',['../md_examples.html#autotoc_md50',1,'']]],
   ['output_4',['7. Formatted Output',['../md_case.html#autotoc_md15',1,'']]],
-  ['output_5',['Serial data output',['../md_visualization.html#autotoc_md85',1,'']]]
+  ['output_5',['Serial data output',['../md_visualization.html#autotoc_md87',1,'']]]
 ];
@@ -8,14 +8,14 @@ var searchData=
   ['performance_20results_5',['Performance Results',['../md_expectedPerformance.html',1,'']]],
   ['phase_20change_20model_6',['11. Phase Change Model',['../md_case.html#autotoc_md19',1,'']]],
   ['points_7',['Points',['../md_expectedPerformance.html#autotoc_md61',1,'16M Grid Points'],['../md_expectedPerformance.html#autotoc_md62',1,'64M Grid Points']]],
-  ['post_20process_8',['Testing Post Process',['../md_testing.html#autotoc_md82',1,'']]],
+  ['post_20process_8',['Testing Post Process',['../md_testing.html#autotoc_md84',1,'']]],
   ['power9_20cpu_9',['Power9 CPU',['../md_expectedPerformance.html#autotoc_md58',1,'IBM Power9 CPU'],['../md_expectedPerformance.html#autotoc_md63',1,'IBM Power9 CPU']]],
   ['primitive_20variables_10',['Analytical Definition of Primitive Variables',['../md_case.html#autotoc_md8',1,'']]],
   ['primitive_20variables_20ordering_11',['Primitive Variables Ordering',['../md_case.html#autotoc_md26',1,'']]],
   ['problem_201d_12',['problem 1D',['../md_examples.html#autotoc_md44',1,'Lax shock tube problem (1D)'],['../md_examples.html#autotoc_md50',1,'Shu-Osher problem (1D)'],['../md_examples.html#autotoc_md41',1,'Titarev-Toro problem (1D)']]],
   ['problem_202d_13',['Isentropic vortex problem (2D)',['../md_examples.html#autotoc_md38',1,'']]],
   ['problem_202d_14',['Lid-Driven Cavity Problem (2D)',['../md_examples.html#autotoc_md47',1,'']]],
-  ['procedure_15',['Procedure',['../md_visualization.html#autotoc_md84',1,'']]],
-  ['process_16',['Testing Post Process',['../md_testing.html#autotoc_md82',1,'']]],
-  ['profiling_20with_20nvidia_20nsight_17',['Profiling with NVIDIA Nsight',['../md_running.html#autotoc_md77',1,'']]]
+  ['procedure_15',['Procedure',['../md_visualization.html#autotoc_md86',1,'']]],
+  ['process_16',['Testing Post Process',['../md_testing.html#autotoc_md84',1,'']]],
+  ['profiling_17',['GPU Profiling',['../md_running.html#autotoc_md77',1,'']]]
 ];
@@ -3,14 +3,14 @@ var searchData=
   ['readme_2emd_0',['readme.md',['../readme_8md.html',1,'']]],
   ['references_1',['References',['../md_references.html',1,'']]],
   ['references_2emd_2',['references.md',['../references_8md.html',1,'']]],
-  ['restarting_20cases_3',['Restarting Cases',['../md_running.html#autotoc_md78',1,'']]],
+  ['restarting_20cases_3',['Restarting Cases',['../md_running.html#autotoc_md80',1,'']]],
   ['result_4',['Result',['../md_examples.html#autotoc_md34',1,'Result'],['../md_examples.html#autotoc_md37',1,'Result'],['../md_examples.html#autotoc_md43',1,'Result'],['../md_examples.html#autotoc_md46',1,'Result'],['../md_examples.html#autotoc_md52',1,'Result']]],
   ['results_5',['Performance Results',['../md_expectedPerformance.html',1,'']]],
   ['riemann_20test_202d_6',['2D Riemann Test (2D)',['../md_examples.html#autotoc_md29',1,'']]],
   ['running_7',['Running',['../md_running.html',1,'']]],
   ['running_20an_20example_20case_8',['Running an Example Case',['../md_getting-started.html#autotoc_md69',1,'']]],
   ['running_20the_20test_20suite_9',['Running the Test Suite',['../md_getting-started.html#autotoc_md68',1,'']]],
   ['running_2emd_10',['running.md',['../running_8md.html',1,'']]],
-  ['runs_11',['Example Runs',['../md_running.html#autotoc_md79',1,'']]],
+  ['runs_11',['Example Runs',['../md_running.html#autotoc_md81',1,'']]],
   ['runtime_12',['1. Runtime',['../md_case.html#autotoc_md5',1,'']]]
 ];
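Because anchors, navigation indexes, and search entries are renumbered in several separate files, a small consistency check can catch stale links after regenerating the documentation. The JavaScript below is a hypothetical sketch: it assumes each `searchData` entry has the shape `[key, [label, [url, flag, context], ...]]` seen in these hunks, and that the set of valid anchors has already been collected, for example from the `<a href="..."/>` lines of `doxygen_crawl.html`.

```javascript
// Hypothetical consistency check: report search-index URLs whose target is
// not among the known anchors. Entry shape assumed from the diff:
//   [key, [label, [url, flag, context], [url, flag, context], ...]]
function staleSearchTargets(searchData, knownUrls) {
  const stale = [];
  for (const [key, [, ...targets]] of searchData) {
    for (const [url] of targets) {
      // Search URLs are relative ("../md_running.html#..."); strip that prefix.
      const target = url.replace(/^\.\.\//, "");
      if (!knownUrls.has(target)) stale.push({ key, url: target });
    }
  }
  return stale;
}

const known = new Set(["md_running.html#autotoc_md80", "md_running.html#autotoc_md81"]);
const data = [
  ["restarting_20cases_3", ["Restarting Cases", ["../md_running.html#autotoc_md80", 1, ""]]],
  ["runs_11", ["Example Runs", ["../md_running.html#autotoc_md79", 1, ""]]], // stale id
];
console.log(staleSearchTargets(data, known).map((s) => s.url).join("\n"));
// md_running.html#autotoc_md79
```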