137 | 137 | <div class="textblock"><p><a class="anchor" id="autotoc_md54"></a> MFC has been benchmarked on several CPUs and GPU devices. This page shows a summary of these results.</p> |
138 | 138 | <h1><a class="anchor" id="autotoc_md55"></a> |
139 | 139 | Expected time-steps/hour</h1> |
140 | | -<p>The following table outlines observed performance as nanoseconds per grid point (ns/GP) per right-hand side evaluation (lower is better). We solve an example 3D, inviscid, 5-equation model problem with two advected species (a total of 8 PDEs). The numerics are WENO5 and the HLLC approximate Riemann solver. We report results for various numbers of grid points per CPU die (or GPU device) and hardware.</p> |
| 140 | +<p>The following table reports observed performance in nanoseconds per grid point (ns/GP) per equation (eq) per right-hand side (rhs) evaluation (lower is better). The test case is a 3D, inviscid, 5-equation model problem with two advected species (8 PDEs in total), solved with WENO5 reconstruction and the HLLC approximate Riemann solver. The case is located in <code>examples/3D_performance_test</code>. Results are given for several hardware platforms and several numbers of grid points per CPU die (or GPU device).</p> |
141 | 141 | <table class="markdownTable"> |
142 | 142 | <tr class="markdownTableHead"> |
143 | 143 | <th class="markdownTableHeadRight">Hardware </th><th class="markdownTableHeadCenter"></th><th class="markdownTableHeadCenter">1M GPs </th><th class="markdownTableHeadCenter">4M GPs </th><th class="markdownTableHeadCenter">8M GPs </th><th class="markdownTableHeadCenter">Compiler </th><th class="markdownTableHeadLeft">Computer </th></tr> |
144 | 144 | <tr class="markdownTableRowOdd"> |
145 | | -<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 device </td><td class="markdownTableBodyCenter">96 </td><td class="markdownTableBodyCenter">104 </td><td class="markdownTableBodyCenter">104 </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr> |
| 145 | +<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 device </td><td class="markdownTableBodyCenter">12.0 </td><td class="markdownTableBodyCenter">13.0 </td><td class="markdownTableBodyCenter">13.0 </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr> |
146 | 146 | <tr class="markdownTableRowEven"> |
147 | | -<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 device </td><td class="markdownTableBodyCenter">101 </td><td class="markdownTableBodyCenter">104 </td><td class="markdownTableBodyCenter">104 </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">OLCF Summit </td></tr> |
| 147 | +<td class="markdownTableBodyRight">NVIDIA V100 </td><td class="markdownTableBodyCenter">1 device </td><td class="markdownTableBodyCenter">12.6 </td><td class="markdownTableBodyCenter">13.0 </td><td class="markdownTableBodyCenter">13.0 </td><td class="markdownTableBodyCenter">NVHPC 22.11 </td><td class="markdownTableBodyLeft">OLCF Summit </td></tr> |
148 | 148 | <tr class="markdownTableRowOdd"> |
149 | | -<td class="markdownTableBodyRight">NVIDIA A100 </td><td class="markdownTableBodyCenter">1 device </td><td class="markdownTableBodyCenter">71 </td><td class="markdownTableBodyCenter">56 </td><td class="markdownTableBodyCenter">59 </td><td class="markdownTableBodyCenter">NVHPC 23.5 </td><td class="markdownTableBodyLeft">Wingtip </td></tr> |
| 149 | +<td class="markdownTableBodyRight">NVIDIA A100 </td><td class="markdownTableBodyCenter">1 device </td><td class="markdownTableBodyCenter">8.9 </td><td class="markdownTableBodyCenter">7.0 </td><td class="markdownTableBodyCenter">7.4 </td><td class="markdownTableBodyCenter">NVHPC 23.5 </td><td class="markdownTableBodyLeft">Wingtip </td></tr> |
150 | 150 | <tr class="markdownTableRowEven"> |
151 | | -<td class="markdownTableBodyRight">AMD MI250X </td><td class="markdownTableBodyCenter">1 GCD </td><td class="markdownTableBodyCenter">108 </td><td class="markdownTableBodyCenter">90 </td><td class="markdownTableBodyCenter">96 </td><td class="markdownTableBodyCenter">CCE 16.0.1 </td><td class="markdownTableBodyLeft">OLCF Frontier </td></tr> |
| 151 | +<td class="markdownTableBodyRight">AMD MI250X </td><td class="markdownTableBodyCenter">1 GCD </td><td class="markdownTableBodyCenter">13.5 </td><td class="markdownTableBodyCenter">11.3 </td><td class="markdownTableBodyCenter">12.0 </td><td class="markdownTableBodyCenter">CCE 16.0.1 </td><td class="markdownTableBodyLeft">OLCF Frontier </td></tr> |
152 | 152 | <tr class="markdownTableRowOdd"> |
153 | | -<td class="markdownTableBodyRight">Intel Xeon Gold 6226 </td><td class="markdownTableBodyCenter">12 cores </td><td class="markdownTableBodyCenter">1963 </td><td class="markdownTableBodyCenter">1688 </td><td class="markdownTableBodyCenter">1686 </td><td class="markdownTableBodyCenter">GNU 10.3.0 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr> |
| 153 | +<td class="markdownTableBodyRight">Intel Xeon Gold 6226 </td><td class="markdownTableBodyCenter">12 cores </td><td class="markdownTableBodyCenter">245 </td><td class="markdownTableBodyCenter">211 </td><td class="markdownTableBodyCenter">211 </td><td class="markdownTableBodyCenter">GNU 10.3.0 </td><td class="markdownTableBodyLeft">PACE Phoenix </td></tr> |
154 | 154 | <tr class="markdownTableRowEven"> |
155 | | -<td class="markdownTableBodyRight">Apple M2 </td><td class="markdownTableBodyCenter">6 cores </td><td class="markdownTableBodyCenter">2919 </td><td class="markdownTableBodyCenter">245 </td><td class="markdownTableBodyCenter">4500 </td><td class="markdownTableBodyCenter">GNU 13.2.0 </td><td class="markdownTableBodyLeft">N/A </td></tr> |
| 155 | +<td class="markdownTableBodyRight">Apple M2 </td><td class="markdownTableBodyCenter">6 cores </td><td class="markdownTableBodyCenter">365 </td><td class="markdownTableBodyCenter">306 </td><td class="markdownTableBodyCenter">563 </td><td class="markdownTableBodyCenter">GNU 13.2.0 </td><td class="markdownTableBodyLeft">N/A </td></tr> |
156 | 156 | </table> |
157 | | -<p><b>All results are in nanoseconds (ns) per grid point (gp) per right-hand side (rhs) evaluation. Lower is better.</b></p> |
| 157 | +<p><b>All results are in nanoseconds (ns) per grid point (gp) per equation (eq) per right-hand side (rhs) evaluation, i.e., ns/gp/eq/rhs. Lower is better.</b></p> |
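The relationship between the old per-RHS figures and the new per-equation figures, and how a grind time translates into time-steps per hour, can be sketched as follows. This is a minimal illustration, not part of MFC: the function names are hypothetical, "1M GPs" is taken to mean exactly 1,000,000 grid points, and three RHS evaluations per time step (as in a three-stage Runge-Kutta scheme) is an assumption rather than something stated on this page. The estimate also assumes the RHS evaluation dominates the cost of a step.

```python
def per_equation_grind_time(ns_per_gp_per_rhs: float, num_equations: int) -> float:
    """Convert ns/gp/rhs to ns/gp/eq/rhs by dividing by the number of PDEs."""
    return ns_per_gp_per_rhs / num_equations


def time_steps_per_hour(
    grind_ns: float,       # ns/gp/eq/rhs, as reported in the table
    grid_points: int,      # grid points on one CPU die or GPU device
    num_equations: int,    # 8 PDEs for the benchmark case
    rhs_per_step: int,     # ASSUMPTION: RHS evaluations per time step
) -> float:
    """Estimate time-steps per hour, assuming RHS cost dominates each step."""
    ns_per_step = grind_ns * grid_points * num_equations * rhs_per_step
    ns_per_hour = 3600e9
    return ns_per_hour / ns_per_step


# V100 on PACE Phoenix at 1M grid points: 96 ns/gp/rhs over 8 PDEs.
v100_grind = per_equation_grind_time(96, 8)  # 12.0 ns/gp/eq/rhs

# ASSUMPTION: 3 RHS evaluations per step and exactly 1,000,000 grid points.
steps = time_steps_per_hour(v100_grind, 1_000_000, 8, rhs_per_step=3)
print(f"{v100_grind} ns/gp/eq/rhs -> about {steps:.0f} time-steps/hour")
```

Dividing by the equation count makes the figures comparable across models with different numbers of PDEs, which is presumably why the table was rescaled.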
158 | 158 | <h1><a class="anchor" id="autotoc_md56"></a> |
159 | 159 | Weak scaling</h1> |
160 | 160 | <p>Weak scaling results are obtained by increasing the problem size with the number of processes so that work per process remains constant.</p> |
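As a rough illustration of the definition above, the sketch below grows the total grid with the process count and computes weak scaling efficiency as the ratio of the baseline wall time to the wall time at scale. The function names and the timings are hypothetical and are not measured MFC results.

```python
def total_grid_points(points_per_process: int, num_processes: int) -> int:
    """Weak scaling: the total problem size grows linearly with process count."""
    return points_per_process * num_processes


def weak_scaling_efficiency(base_time_s: float, scaled_time_s: float) -> float:
    """Ratio of baseline wall time to wall time at scale (1.0 is ideal)."""
    return base_time_s / scaled_time_s


# HYPOTHETICAL numbers for illustration only.
points_per_process = 8_000_000
print(total_grid_points(points_per_process, 64))   # total size on 64 processes
print(weak_scaling_efficiency(10.0, 12.5))         # baseline 10.0 s, 12.5 s at scale
```

Because the work per process is fixed, any increase in wall time at scale reflects communication and synchronization overhead rather than extra computation.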
|