Hyesun Hong,
SYCL Extension for PIM/PNM
* Work in collaboration with the Codeplay Software team
* Goals

  * Seamlessly integrate PIM/PNM operations into SYCL
  * Allow combining xGPU and PIM/PNM in one device kernel
  * Not specific to one hardware

* Design

  * Vector operations seem like a natural fit
  * However, they give no convergence guarantee, and the vector size is explicit

* Model as a special function unit

  * Aligns with the trend of modeling special functional units inside accelerators
  * Automatic mapping by the compiler is often not possible
  * joint_matrix-like interface

* Group functions

  * Easy to use
  * Can easily be combined with device code
  * Give the necessary convergence guarantees

* Recap of SYCL work-items, work-groups and group functions

  * Group functions must be encountered in converged control flow

* Extension

  * Extended the group functions with an additional overload of joint_reduce and new joint_transform and joint_inner_product functions
  * Block size as a template parameter, number of blocks as a runtime parameter
    * Allows calculating the number of elements to process

* Extension for PNM

  * Added new overloads of joint_exclusive_scan, joint_inclusive_scan, reduce_over_group

* PNM standalone has less opportunity for parallelism and is also limited by the memory controller

  * -> Combine PNM and PIM; the PNM generates commands for the PIM blocks

* Two modes

  * PIM mode: PIM blocks can operate independently; the number of blocks can be chosen
  * PNM mode: synchronized execution on multiple PIM blocks

* Mapping

  * Every PIM block is one work-item
  * The PNM with its attached PIM blocks forms one work-group

* Execution

  * Work-item operations map to PIM operations
  * Group functions map to PNM operations

* Example

  * Work-item execution maps to PIM
  * The group function maps to PNM

* Conclusion

  * Integrates support for PIM/PNM into SYCL

Q&A
* Are the proposed functions specific to PIM, or could they also be used with other HW?

  * They can also be used with other hardware
  * The semantics are not PIM-specific, but a translation of C++ to SYCL
  * They can also map nicely to other types of hardware, for example a vector processor

* Why have the user explicitly specify a block size?

  * It is not a hardware detail
  * Rather a promise by the user that the data blocks will always be at least that big
  * The promise allows the device compiler to perform optimizations, such as efficient looping inside the PIM unit

* Could the num_blocks runtime parameter be replaced by an iterator?

  * This would require the range to be divisible by the block size
  * Yes, that is possible; it is mainly a design question
  * The current version might have additional implications regarding alignment