This repository was archived by the owner on Feb 5, 2024. It is now read-only.

Commit 8846ef4

Fixing indentation on list
Author: Ruyman Reyes Castro
1 parent 7dcdbe9, commit 8846ef4


language/README.rst

Lines changed: 30 additions & 29 deletions
@@ -77,38 +77,39 @@ Hyesun Hong,
7777
* CXL-PNM is the CXL variant for PNM, can work with multiple PIM
7878

7979
SYCL Extension for PIM/PNM
80-
* Goals
81-
* Seamlessly integrate PIM/PNM operation into SYCL
82-
* Allow combination of xGPU and PIM/PNM in one device kernel
83-
* Not specific to one hardware
84-
* Design
85-
* Vector operation seem like natural fit, but no convergence guarantee and vector size explicit
86-
* Model as special function unit
87-
* Aligns with trends to model special functional units inside accelerators
88-
* Compiler automatic mapping often not possible
89-
* joint_matrix
90-
* Group functions
91-
* Easy to use
92-
* Can easily be combined with device code
93-
* Give necessary convergence guarantees
94-
* Recap of SYCL work-item, work-group and group functions
95-
* Group functions must be encountered in converged control flow
80+
* Work in collaboration with Codeplay Software team
81+
* Goals
82+
* Seamlessly integrate PIM/PNM operation into SYCL
83+
* Allow combination of xGPU and PIM/PNM in one device kernel
84+
* Not specific to one hardware
85+
* Design
86+
* Vector operation seem like natural fit, but no convergence guarantee and vector size explicit
87+
* Model as special function unit
88+
* Aligns with trends to model special functional units inside accelerators
89+
* Compiler automatic mapping often not possible
90+
* joint_matrix
91+
* Group functions
92+
* Easy to use
93+
* Can easily be combined with device code
94+
* Give necessary convergence guarantees
95+
* Recap of SYCL work-item, work-group and group functions
96+
* Group functions must be encountered in converged control flow
9697
* Extension
97-
* Extended group functions with additional overload of joint_reduce and new joint_transform and joint_inner_product
98-
* Block size as template parameter, number of blocks as runtime parameter -> allows calculation of number of elements to process
98+
* Extended group functions with additional overload of joint_reduce and new joint_transform and joint_inner_product
99+
* Block size as template parameter, number of blocks as runtime parameter -> allows calculation of number of elements to process
99100
* Extension for PNM
100-
* Added new overloads of joint_exclusive_scan, joint_inclusive_scan, reduce_over_group
101+
* Added new overloads of joint_exclusive_scan, joint_inclusive_scan, reduce_over_group
101102
* PNM standalone has less opportunity for parallelism, also limited by memory controller
102-
* -> Combine PNM and PIM, PNM generates commands for PIM blocks
103+
* -> Combine PNM and PIM, PNM generates commands for PIM blocks
103104
* Two modes
104105
* PIM mode: PIM blocks can operate independently, can choose number of blocks
105106
* PNM mode: Synchronized execution on multiple PIM blocks
106107
* Mapping
107108
* Every PIM block is one work-item
108109
* PNM with attached PIM blocks forms one work-group
109110
* Execution
110-
* Work-item operations map to PIM operation
111-
* Group functions map to PNM operation
111+
* Work-item operations map to PIM operation
112+
* Group functions map to PNM operation
112113
* Example
113114
* work-item execution maps to PIM
114115
* group function maps to PNM
@@ -117,15 +118,15 @@ SYCL Extension for PIM/PNM
117118

118119
Q&A
119120
* Are the proposed functions specific to PIM or could also be used with other HW?
120-
* Can also be used with other hardware. Semantics not PIM-specific, but translation of C++ to SYCL
121-
* Can also map nicely to other types of hardware, for example vector processor
121+
* Can also be used with other hardware. Semantics not PIM-specific, but translation of C++ to SYCL
122+
* Can also map nicely to other types of hardware, for example vector processor
122123
* Why have the user explicitly specify a block-size?
123-
* Not a hardware detail
124-
* Rather a promise by the user that data-blocks will always be at least that big
125-
* Promise allows device compiler to perform optimizations, efficient looping inside PIM unit
124+
* Not a hardware detail
125+
* Rather a promise by the user that data-blocks will always be at least that big
126+
* Promise allows device compiler to perform optimizations, efficient looping inside PIM unit
126127
* Could num_blocks runtime parameter be replaced by iterator, requiring to be divisable by block-size
127-
* Yes, that is possible, mainly a design question
128-
* Current version might have additional implications regarding alignment
128+
* Yes, that is possible, mainly a design question
129+
* Current version might have additional implications regarding alignment
129130

130131

131132
2023-06-05
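
Note on the "Extension" bullets in the diff above: the notes say the block size is a template parameter and the number of blocks a runtime parameter, so the number of elements to process can be calculated from the two. The following is a minimal host-side C++ sketch of that idea only. It is not the proposed SYCL API: the names `element_count` and `blocked_reduce_sketch` are hypothetical, and the real `joint_reduce` overload signature is not given in the notes.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: with the block size fixed at compile time and the
// number of blocks supplied at run time, the element count is their product.
template <std::size_t BlockSize>
constexpr std::size_t element_count(std::size_t num_blocks) {
    static_assert(BlockSize > 0, "block size must be positive");
    return BlockSize * num_blocks;
}

// Hypothetical stand-in for a blocked reduction (plain C++, no SYCL).
// The inner loop has a compile-time trip count, which is the kind of
// guarantee the notes say lets a device compiler loop efficiently.
template <std::size_t BlockSize, typename T>
T blocked_reduce_sketch(const T* first, std::size_t num_blocks, T init) {
    T result = init;
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const T* block = first + b * BlockSize;  // start of block b
        T partial = T{};
        for (std::size_t i = 0; i < BlockSize; ++i) {
            partial += block[i];
        }
        result += partial;
    }
    return result;
}
```

Under these assumptions, calling `blocked_reduce_sketch<4>(ptr, 2, 0)` sums the `element_count<4>(2)` elements starting at `ptr`; the caller is responsible for making sure that many elements exist.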
