<p>MFC has been benchmarked on several CPUs and GPU devices. This page is a summary of these results.</p>
<h1><a class="anchor" id="autotoc_md67"></a>
Figure of merit: Grind time performance</h1>
<p>The following table outlines observed performance as nanoseconds per grid point (ns/gp) per equation (eq) per right-hand side (rhs) evaluation (lower is better), also known as the grind time. We solve an example 3D, inviscid, 5-equation model problem with two advected species (8 PDEs) and 8M grid points (158-cubed uniform grid). The numerics are WENO5 finite volume reconstruction and HLLC approximate Riemann solver. This case is located in <code>examples/3D_performance_test</code>. You can run it via <code>./mfc.sh run -n &lt;num_processors&gt; -j $(nproc) ./examples/3D_performance_test/case.py -t pre_process simulation --case-optimization</code>, which builds an optimized version of the code for this case and then executes it. If the above does not work on your machine, see the rest of this documentation for other ways to use the <code>./mfc.sh run</code> command.</p>
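<p>As a rough, back-of-the-envelope interpretation (not part of the benchmark table): the grind time is the wall-clock time of one rhs evaluation divided by the number of grid points times the number of equations, converted to nanoseconds. For this case, 8M grid points &times; 8 equations gives about 64M point-equation updates per rhs evaluation, so a grind time of 1 ns corresponds to roughly 0.064 s of wall-clock time per rhs evaluation. The number of rhs evaluations per time step depends on the time integrator used.</p>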
<p>Results are for MFC v4.9.3 (July 2024 release), though numbers have not changed meaningfully since then. Similar performance is also seen for other problem configurations, such as the Euler equations (4 PDEs). All results are for the compiler that gave the best performance. Note:</p><ul>
<li>CPU results may be obtained on processors with more cores than reported in the table: we sweep the number of cores used on each device and report the best performance achieved on a single socket (or die). Results are reported as (X/Y cores), where X is the number of cores used and Y is the total number of cores on the die.</li>
<li>GPU results are for a single GPU device. For single-precision (SP) GPUs, we performed computation in double precision via conversion in compiler/software; these numbers are <em>not</em> for single-precision computation. AMD MI250X and MI300A GPUs have multiple graphics compute dies (GCDs) per device; we report results for one <em>GCD</em>, though one can quickly estimate the full-device runtime by dividing the grind time by the number of GCDs on the device (the MI250X has 2 GCDs). We gratefully acknowledge LLNL, HPE/Cray, and AMD for permission to release the MI300A performance numbers.</li>
</ul>
<table class="markdownTable">
<tr class="markdownTableHead">
<th class="markdownTableHeadRight">Hardware </th><th class="markdownTableHeadRight"></th><th class="markdownTableHeadRight">Grind Time [ns] </th><th class="markdownTableHeadLeft">Compiler </th><th class="markdownTableHeadLeft">Computer </th></tr>
<p>Strong scaling results are obtained by keeping the problem size constant and increasing the number of processes so that work per process decreases.</p>
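<p>As a standard definition (not specific to MFC): if <em>T</em>(<em>N</em>) is the wall-clock time on <em>N</em> processes and <em>N</em><sub>0</sub> is the process count of the base case, the speedup is <em>S</em> = <em>T</em>(<em>N</em><sub>0</sub>)/<em>T</em>(<em>N</em>) and the strong-scaling (parallel) efficiency is <em>E</em> = <em>N</em><sub>0</sub> <em>T</em>(<em>N</em><sub>0</sub>) / (<em>N</em> <em>T</em>(<em>N</em>)); ideal strong scaling corresponds to <em>E</em> = 1.</p>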
<h2><a class="anchor" id="autotoc_md73"></a>
NVIDIA V100 GPU</h2>
<p>For these tests, the base case uses 8 GPUs with one MPI process per GPU. Performance is analyzed at two problem sizes, 16M and 64M grid points, corresponding to 2M and 8M grid points per process in the base case.</p>
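<p>As an illustrative sketch only (the process counts and the reuse of the <code>3D_performance_test</code> case here are assumptions; the published results used dedicated 16M- and 64M-point cases), a strong-scaling sweep keeps the case file fixed while increasing the number of ranks: </p><div class="fragment"><div class="line"># Hypothetical strong-scaling sweep: same fixed-size case, increasing rank count</div>
<div class="line">for n in 8 16 32 64; do</div>
<div class="line">    ./mfc.sh run -n $n ./examples/3D_performance_test/case.py -t pre_process simulation --case-optimization</div>
<div class="line">done</div>
</div><!-- fragment -->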
<li><a href="https://strawberryperl.com/">Strawberry Perl</a> (install it and add <code>C:\strawberry\perl\bin</code>, or your installation path, to your <a href="https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/">PATH</a>). Please note that Visual Studio must be installed first, and the oneAPI Toolkits need to be configured with the installed Visual Studio, even if you plan to use a different IDE.</li>
</ul>
<p>Then, to initialize your development environment, run the following command (adjusting the path to match your installation) in the command prompt: </p><div class="fragment"><div class="line">"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"</div>
</div><!-- fragment --><p> Alternatively, you can run the following command in PowerShell: </p><div class="fragment"><div class="line">cmd.exe "/K" '"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" && powershell'</div>
</div><!-- fragment --><p> You can verify the initialization by typing <code>where mpiexec</code> in the command prompt (this does not work in PowerShell); it should return the path to the Intel MPI executable. To continue following this guide, please stay in the initialized terminal window and replace <code>./mfc.sh</code> with <code>.\mfc.bat</code> in all commands.</p>
<p>If <code>.\mfc.bat build</code> produces errors, please run the command again. Repeating this process three times should resolve all errors (once each for pre_process, simulation, and post_process). If the same error persists after each attempt, please verify that you have installed all required software and properly initialized the development environment. If uncertain, you could try deleting the build directory and starting over.</p>
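<p>For example (assuming the build directory is named <code>build</code> at the repository root; the directory name is an assumption), a clean restart from the command prompt could look like: </p><div class="fragment"><div class="line">rem Remove the existing build directory, then rebuild from scratch</div>
<div class="line">rmdir /s /q build</div>
<div class="line">.\mfc.bat build</div>
</div><!-- fragment -->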
</div><!-- fragment --><p>An editor should open. Please paste the following lines into it before saving the file. Modify the first assignment if you wish to use a different version of GNU's GCC. These lines ensure that LLVM's Clang and Apple's modified version of GCC are not used to compile MFC. Further reading on <code>open-mpi</code> incompatibility with <code>clang</code>-based <code>gcc</code> on macOS: <a href="https://stackoverflow.com/questions/27930481/how-to-build-openmpi-with-homebrew-and-gcc-4-9">here</a>. We do <em>not</em> support <code>clang</code> due to conflicts with the Silo dependency.</p>
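<p>A minimal sketch of the kind of assignments intended, assuming GCC 13 installed via Homebrew (the variable set and version number here are illustrative assumptions; the exact lines may differ): </p><div class="fragment"><div class="line"># Use Homebrew's GNU compilers instead of Apple Clang / clang-based gcc</div>
<div class="line">export GCC_VER=13</div>
<div class="line">export CC=gcc-$GCC_VER CXX=g++-$GCC_VER FC=gfortran-$GCC_VER</div>
<div class="line"># Point Open MPI's wrapper compilers at the same toolchain</div>
<div class="line">export OMPI_CC=$CC OMPI_CXX=$CXX OMPI_FC=$FC</div>
</div><!-- fragment -->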
<p>Docker is a lightweight, cross-platform, and performant alternative to Virtual Machines (VMs). We build a Docker Image that contains the packages required to build and run MFC on your local machine.</p>
</div><!-- fragment --><p>To fetch the prebuilt Docker image and enter an interactive bash session with the recommended settings applied, run</p>
<divclass="fragment"><divclass="line">./mfc.sh docker # If on \*nix/macOS</div>
<divclass="line">.\mfc.bat docker # If on Windows</div>
</div><!-- fragment --><p>We automatically mount and configure the proper permissions for you to access your local copy of MFC, available at <code>~/MFC</code>. You will be logged in as the <code>me</code> user with root permissions.</p>
<p>⚠️ The state of your container is entirely transient, except for the MFC mount. Thus, any modification outside of <code>~/MFC</code> should be considered permanently lost upon session exit.</p>
<p><em>⚠️ The <code>--gpu</code> option requires that your compiler supports OpenACC for Fortran on your target GPU architecture.</em></p>
<p>When these options are given to <code>mfc.sh</code>, they will be remembered when you issue future commands. You can enable and disable features at any time by passing any of the arguments above. For example, if you previously built MFC with MPI support and no longer wish to run using MPI, you can pass <code>--no-mpi</code> once, and the setting persists until you change it again.</p>
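<p>For instance, using only the flags mentioned above (shown here as an illustration): </p><div class="fragment"><div class="line"># Rebuild without MPI support; the choice is remembered for future commands</div>
<div class="line">./mfc.sh build --no-mpi</div>
</div><!-- fragment -->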
<p>MFC comprises three codes, each being a separate <em>target</em>. By default, all targets (<code>pre_process</code>, <code>simulation</code>, and <code>post_process</code>) are selected. To only select a subset, use the <code>-t</code> (i.e., <code>--targets</code>) argument. For a detailed list of options, arguments, and features, please refer to <code>./mfc.sh build --help</code>.</p>
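<p>For example, combining the <code>-t</code> and <code>-j</code> flags described in this guide to build only the <code>pre_process</code> and <code>simulation</code> targets: </p><div class="fragment"><div class="line">./mfc.sh build -t pre_process simulation -j 8</div>
</div><!-- fragment -->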
<p>Most first-time users will want to build MFC using 8 threads (or more!) with MPI support: </p><div class="fragment"><div class="line">./mfc.sh build -j 8</div>