This repository was archived by the owner on Feb 5, 2024. It is now read-only.

Commit 6e2c43b

Author: Ruyman Reyes Castro
Re-formatting lists
1 parent 8846ef4 commit 6e2c43b

1 file changed: +48 -9 lines changed


language/README.rst

Lines changed: 48 additions & 9 deletions
@@ -79,52 +79,91 @@ Hyesun Hong,
 SYCL Extension for PIM/PNM
 * Work in collaboration with Codeplay Software team
 * Goals
+
 * Seamlessly integrate PIM/PNM operation into SYCL
 * Allow combination of xGPU and PIM/PNM in one device kernel
 * Not specific to one hardware
+
 * Design
-* Vector operation seem like natural fit, but no convergence guarantee and vector size explicit
+
+* Vector operation seem like natural fit
+* no convergence guarantee and vector size explicit
+
 * Model as special function unit
+
 * Aligns with trends to model special functional units inside accelerators
 * Compiler automatic mapping often not possible
-* joint_matrix
+* joint_matrix-like interface
+
+
 * Group functions
+
 * Easy to use
 * Can easily be combined with device code
 * Give necessary convergence guarantees
+
+
 * Recap of SYCL work-item, work-group and group functions
+
 * Group functions must be encountered in converged control flow
+
 * Extension
-* Extended group functions with additional overload of joint_reduce and new joint_transform and joint_inner_product
-* Block size as template parameter, number of blocks as runtime parameter -> allows calculation of number of elements to process
+
+* Extended group functions with additional overload of joint_reduce
+* and new joint_transform and joint_inner_product
+* Block size as template parameter, number of blocks as runtime parameter
+* allows calculation of number of elements to process
+
 * Extension for PNM
+
 * Added new overloads of joint_exclusive_scan, joint_inclusive_scan, reduce_over_group
+
 * PNM standalone has less opportunity for parallelism, also limited by memory controller
+
 * -> Combine PNM and PIM, PNM generates commands for PIM blocks
+
 * Two modes
+
 * PIM mode: PIM blocks can operate independently, can choose number of blocks
 * PNM mode: Synchronized execution on multiple PIM blocks
+
 * Mapping
+
 * Every PIM block is one work-item
 * PNM with attached PIM blocks forms one work-group
+
 * Execution
+
 * Work-item operations map to PIM operation
 * Group functions map to PNM operation
+
 * Example
+
 * work-item execution maps to PIM
 * group function maps to PNM
+
 * Conclusion
+
 * Integrate support for PIM/PNM into SYCL
 
 Q&A
-* Are the proposed functions specific to PIM or could also be used with other HW?
-* Can also be used with other hardware. Semantics not PIM-specific, but translation of C++ to SYCL
+* Are the proposed functions specific to PIM, could also be used with other HW?
+
+* Can also be used with other hardware.
+* Semantics not PIM-specific, but translation of C++ to SYCL
 * Can also map nicely to other types of hardware, for example vector processor
+
 * Why have the user explicitly specify a block-size?
+
 * Not a hardware detail
-* Rather a promise by the user that data-blocks will always be at least that big
-* Promise allows device compiler to perform optimizations, efficient looping inside PIM unit
-* Could num_blocks runtime parameter be replaced by iterator, requiring to be divisable by block-size
+* Rather a promise by the user that data-blocks
+will always be at least that big
+* Promise allows device compiler to perform optimizations,
+efficient looping inside PIM unit
+
+* Could num_blocks runtime parameter be replaced by iterator?
+
+* requires to be divisable by block-size
 * Yes, that is possible, mainly a design question
 * Current version might have additional implications regarding alignment
 
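The notes in this diff describe a block size passed as a template parameter and a number of blocks passed at run time, from which the number of elements to process follows. The following is a minimal host-side C++ sketch of that idea, not the proposed SYCL extension itself: the names `element_count`, `blocked_reduce` and `example_sum` are hypothetical, and the real overloads of `joint_reduce` are not shown in the source. The per-block partial result loosely mirrors "every PIM block is one work-item", and the final combine loosely mirrors the group function.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: the element count is fully determined by the
// compile-time block size and the run-time number of blocks.
template <std::size_t BlockSize>
constexpr std::size_t element_count(std::size_t num_blocks) {
    static_assert(BlockSize > 0, "BlockSize must be positive");
    return BlockSize * num_blocks;
}

// Hypothetical stand-in for a blocked reduction (assumption: not the
// extension's actual signature). `op` is assumed to be associative.
template <std::size_t BlockSize, typename T, typename BinaryOp>
T blocked_reduce(const T* first, std::size_t num_blocks, T init, BinaryOp op) {
    static_assert(BlockSize > 0, "BlockSize must be positive");
    T result = init;
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const T* block = first + b * BlockSize;
        // Inner loop has a fixed trip count known at compile time, which is
        // the kind of promise that lets a compiler unroll or vectorize it.
        T partial = block[0];
        for (std::size_t i = 1; i < BlockSize; ++i) {
            partial = op(partial, block[i]);
        }
        // Combine the per-block partial results.
        result = op(result, partial);
    }
    return result;
}

// Usage: sum 4 blocks of 8 integers holding the values 1..32.
int example_sum() {
    constexpr std::size_t kBlockSize = 8;
    const std::size_t num_blocks = 4;
    std::vector<int> data(element_count<kBlockSize>(num_blocks));
    std::iota(data.begin(), data.end(), 1);
    return blocked_reduce<kBlockSize>(data.data(), num_blocks, 0, std::plus<int>{});
}
```

In this sketch the caller must supply exactly `BlockSize * num_blocks` elements. The Q&A suggests the actual promise is weaker (data blocks are "at least that big"), so the precondition here is a simplification.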