tensor_parser.builder.build_tensor() should really be parallelized. Parallelism can come from either parsing multiple CSV files at once, or from starting with a split of the data and then parallelizing over the splits.
- Index maps can be constructed in parallel as long as the process-local sets are unioned and counts are summed.
- Tensor non-zeros can similarly be written in parallel as long as the resulting per-process tensor files are concatenated afterward.
- Sorting is already parallel.
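As a rough illustration of the index-map point above, here is a minimal sketch of the union-and-sum merge. The names `count_keys` and `build_index_map` are hypothetical and are not part of tensor_parser; partitions are assumed to be plain lists of keys for one tensor mode, and a thread pool stands in for whatever worker model the builder would actually use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_keys(partition):
    """Process-local work: count how often each key appears in one partition."""
    return Counter(partition)


def build_index_map(partitions, max_workers=4):
    """Merge per-partition counts into a global index map (hypothetical helper).

    Returns (index_map, total_counts), where index_map assigns a dense
    index to every key seen in any partition.
    """
    # A thread pool is used only to keep the sketch self-contained; a process
    # pool or MPI-style workers would fill the same role.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        local_counts = list(pool.map(count_keys, partitions))

    total_counts = Counter()
    for local in local_counts:
        # Counter.update adds counts, so this both unions the key sets
        # and sums the per-key counts.
        total_counts.update(local)

    # Assumption: indices are assigned in sorted key order so the result
    # does not depend on how the data was partitioned.
    index_map = {key: i for i, key in enumerate(sorted(total_counts))}
    return index_map, total_counts
```

Assigning indices only after the merge keeps the map independent of the partitioning, at the cost of a second pass to translate keys into indices.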
Merging duplicates will take some more thought, as duplicates could cross partitions.
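One possible approach, sketched under some assumptions: if the non-zeros are globally sorted and each partition holds a contiguous range, duplicates are adjacent, so each partition can be merged independently and only the partition boundaries need a serial stitch. The helper names below are hypothetical, and summing duplicate values is an assumption; a different merge rule would slot in the same way.

```python
def merge_sorted_run(entries):
    """Merge adjacent duplicates in one sorted partition.

    entries: list of (coord, value) pairs sorted by coord.
    Assumption: duplicates are combined by summing their values.
    """
    merged = []
    for coord, val in entries:
        if merged and merged[-1][0] == coord:
            merged[-1] = (coord, merged[-1][1] + val)
        else:
            merged.append((coord, val))
    return merged


def stitch(merged_parts):
    """Concatenate locally merged partitions, fixing boundary duplicates.

    Only the last entry written so far and the first entry of the next
    partition can still collide, because each partition is already merged.
    """
    out = []
    for part in merged_parts:
        if not part:
            continue
        if out and out[-1][0] == part[0][0]:
            # The duplicate crosses the partition boundary: fold it in.
            out[-1] = (out[-1][0], out[-1][1] + part[0][1])
            out.extend(part[1:])
        else:
            out.extend(part)
    return out
```

The per-partition `merge_sorted_run` calls are independent and could run in parallel; `stitch` touches one boundary entry per partition plus the concatenation itself.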