<p>Reference: Hillewaert, K. (2013). TestCase C3.5 - DNS of the transition of the Taylor-Green vortex, Re=1600 - Introduction and result summary. 2nd International Workshop on high-order methods for CFD.</p>
<h2><a class="anchor" id="autotoc_md35"></a>
Final Condition</h2>
<p>This figure shows the isosurface of zero Q-criterion. <img src="result-3D_TaylorGreenVortex-example.png" alt="" class="inline" title="Density"/></p>
<h1><a class="anchor" id="autotoc_md36"></a>
Shu-Osher problem (1D)</h1>
<p>Reference: C. W. Shu, S. Osher, Efficient implementation of essentially non-oscillatory shock-capturing schemes, Journal of Computational Physics 77 (2) (1988) 439–471. doi:10.1016/0021-9991(88)90177-5.</p>
<p>Reference: V. A. Titarev, E. F. Toro, Finite-volume WENO schemes for three-dimensional conservation laws, Journal of Computational Physics 201 (1) (2004) 238–260.</p>
<p>The <a href="case.py"><b>Scaling</b></a> case can exercise both weak and strong scaling. It adjusts itself to the number of requested ranks.</p>
<p>This directory also contains a collection of scripts used to test strong scaling on OLCF Frontier. They required modifying MFC to collect certain metrics, but they are meant to serve as a reference for users wishing to run similar experiments.</p>
<h2><a class="anchor" id="autotoc_md49"></a>
Weak Scaling</h2>
<p>Pass <code>--scaling weak</code>. The <code>--memory</code> option controls (approximately) how much memory each rank should use, in gigabytes. The number of cells in each dimension is then adjusted according to the number of requested ranks and an approximation of the relation between cell count and memory usage. The problem size increases linearly with the number of ranks.</p>
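<p>For instance, a weak-scaling run that targets roughly 4 GB per rank might be launched as sketched below. The invocation assumes that case arguments are forwarded to <code>case.py</code> after <code>--</code> (as in the example later in this section); adapt the remaining flags to your system.</p>
<div class="fragment"><div class="line"># Illustrative only; adapt the targets and flags to your machine.</div>
<div class="line">./mfc.sh run examples/scaling/case.py -t pre_process simulation -- --scaling weak --memory 4</div>
</div>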
<h2><a class="anchor" id="autotoc_md50"></a>
Strong Scaling</h2>
<p>Pass <code>--scaling strong</code>. The <code>--memory</code> option controls (approximately) how much memory should be used in total during the simulation, across all ranks, in gigabytes. The problem size remains constant as the number of ranks increases.</p>
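<p>Similarly, a strong-scaling run that caps the total memory footprint at about 32 GB across all ranks could, under the same assumption about forwarding case arguments after <code>--</code>, look like the sketch below:</p>
<div class="fragment"><div class="line"># Illustrative only; the problem size stays fixed as ranks are added.</div>
<div class="line">./mfc.sh run examples/scaling/case.py -t pre_process simulation -- --scaling strong --memory 32</div>
</div>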
<h2><a class="anchor" id="autotoc_md51"></a>
Example</h2>
<p>For example, to run a weak-scaling test that uses ~4 GB of GPU memory per rank on 8 nodes with 2 ranks each, with case optimization enabled, one could run:</p>
<divclass="fragment"><divclass="line">./mfc.sh run examples/scaling/case.py -t pre_process simulation \</div>
<p>Reference: P. D. Lax, Weak solutions of nonlinear hyperbolic equations and their numerical computation, Communications on pure and applied mathematics 7 (1) (1954) 159–193.</p>
<p>Reference: Chamarthi, A., Hoffmann, N., Nishikawa, H., & Frankel, S. (2023). Implicit gradients based conservative numerical scheme for compressible flows. arXiv:2110.05461.</p>
<p>MFC has been benchmarked on several CPU and GPU devices. This page summarizes those results.</p>
<h1><a class="anchor" id="autotoc_md67"></a>
Figure of merit: Grind time performance</h1>
<p>The following table outlines observed performance as nanoseconds per grid point (ns/GP) per equation (eq) per right-hand side (rhs) evaluation (lower is better), also known as the grind time. We solve an example 3D, inviscid, 5-equation model problem with two advected species (8 PDEs) and 8M grid points (158-cubed uniform grid). The numerics are WENO5 finite volume reconstruction and HLLC approximate Riemann solver. This case is located in <code>examples/3D_performance_test</code>.</p>
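<p>To make the metric concrete, the hypothetical helper below (not part of MFC) converts a measured wall-clock time per right-hand-side evaluation into a grind time, using only the definition implied by the units:</p>
<div class="fragment"><div class="line"># Hypothetical helper, not an MFC utility: grind time in ns/gp/eq/rhs.</div>
<div class="line">def grind_time_ns(seconds_per_rhs, grid_points, num_equations): return seconds_per_rhs * 1e9 / (grid_points * num_equations)</div>
<div class="line"> </div>
<div class="line"># e.g. 0.167 s per RHS evaluation on 8M grid points and 8 PDEs gives ~2.6 ns/gp/eq/rhs</div>
<div class="line">print(grind_time_ns(0.167, 8_000_000, 8))</div>
</div>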
<p>Results are for MFC v4.9.3 (July 2024 release), though numbers have not changed meaningfully since then. All results are for the compiler that gave the best performance. Note:</p><ul>
<p><b>All grind times are in nanoseconds (ns) per grid point (gp) per equation (eq) per right-hand side (rhs) evaluation, so X ns/gp/eq/rhs. Lower is better.</b></p>
<h1><a class="anchor" id="autotoc_md68"></a>
Weak scaling</h1>
<p>Weak scaling results are obtained by increasing the problem size with the number of processes so that work per process remains constant.</p>
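<p>In other words, weak-scaling efficiency is the base-case runtime divided by the runtime at scale for the same per-process workload; the snippet below is a hypothetical illustration of that ratio, not an MFC utility:</p>
<div class="fragment"><div class="line"># Hypothetical: weak-scaling efficiency from runtimes at fixed work per process.</div>
<div class="line">def weak_scaling_efficiency(t_base, t_scaled): return t_base / t_scaled</div>
<div class="line"> </div>
<div class="line"># e.g. 100 s for the base case vs 104 s at scale corresponds to ~96% efficiency</div>
<div class="line">print(weak_scaling_efficiency(100.0, 104.0))</div>
</div>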
<h2><a class="anchor" id="autotoc_md69"></a>
AMD MI250X GPU</h2>
<p>MFC weak scales to (at least) 65,536 AMD MI250X GPUs on OLCF Frontier with 96% efficiency. This corresponds to 87% of the entire machine.</p>
<p>Strong scaling results are obtained by keeping the problem size constant and increasing the number of processes so that work per process decreases.</p>
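<p>The corresponding figures of merit are the speedup relative to the base case and the parallel efficiency, i.e. speedup divided by the increase in process count; the snippet below is again only an illustrative sketch, not an MFC utility:</p>
<div class="fragment"><div class="line"># Hypothetical: strong-scaling speedup and efficiency relative to a base case.</div>
<div class="line">def strong_scaling(t_base, t_n, rank_ratio): return t_base / t_n, (t_base / t_n) / rank_ratio</div>
<div class="line"> </div>
<div class="line"># e.g. doubling the ranks while runtime drops from 100 s to 55 s: speedup ~1.8, efficiency ~0.91</div>
<div class="line">print(strong_scaling(100.0, 55.0, 2))</div>
</div>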
<h2><a class="anchor" id="autotoc_md73"></a>
NVIDIA V100 GPU</h2>
<p>For these tests, the base case uses 8 GPUs with one MPI process per GPU. Performance is analyzed at two problem sizes, 16M and 64M grid points, so the base case has 2M and 8M grid points per process, respectively.</p>
<p>:warning: The state of your container is entirely transient, except for the MFC mount. Thus, any modification outside of <code>~/MFC</code> should be considered as permanently lost upon session exit.</p>
</details>
<h1><a class="anchor" id="autotoc_md80"></a>
Building MFC</h1>
<p>MFC can be built with support for various (compile-time) features:</p>