
Parallelization #18

@ShadenSmith


tensor_parser.builder.build_tensor() should be parallelized. Parallelism can come either from parsing multiple CSV files at once, or from splitting the input data up front and then processing the splits in parallel.
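For the second option, a single large CSV file could be cut into byte ranges that each begin at the start of a line, so every worker parses only whole rows. This is a minimal sketch, not part of tensor_parser: the helper names `line_aligned_splits` and `read_split` are hypothetical, and it assumes one non-zero per line, no header row, and no quoted fields that contain newlines.

```python
import os


def line_aligned_splits(path, nsplits):
    """Split a file into about nsplits byte ranges, each starting at the beginning of a line.

    Hypothetical helper; assumes one record per line and no quoted embedded newlines.
    """
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, nsplits):
            target = size * i // nsplits
            if target <= bounds[-1]:
                continue  # this cut point falls inside the previous range
            f.seek(target)
            f.readline()  # skip the rest of the line that contains byte `target`
            pos = f.tell()  # now at the start of the next full line, or at EOF
            if pos < size:
                bounds.append(pos)
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def read_split(path, start, end):
    """Read the whole lines contained in bytes [start, end) of the file."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start).decode("utf-8").splitlines()
```

Each `(start, end)` pair could then be handed to a separate worker process, which calls `read_split` and parses only its own rows. Cutting at raw byte offsets and then advancing to the next newline keeps the split cheap: no worker has to scan the whole file to find its starting row.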

  • Index maps can be constructed in parallel, as long as the process-local sets are unioned and their counts are summed.
  • Tensor non-zeros can similarly be extracted in parallel, as long as the per-process tensor files are concatenated afterward.
  • Sorting is already parallel.
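The first two points amount to a map-reduce. Below is a minimal sketch, assuming each worker receives a list of already-split CSV rows, that the non-zero value is the last column, and that "counts" means per-key occurrence counts. The names `parse_chunk`, `merge_results`, and `build_parallel` are hypothetical and are not the existing builder API; the executor is passed in so a `concurrent.futures.ProcessPoolExecutor` can be used for actual CPU parallelism.

```python
from collections import Counter


def parse_chunk(rows, mode_columns):
    """Map step over one split: per-mode key sets, per-mode counts, raw non-zeros.

    Assumes each row is a list of CSV fields with the non-zero value in the last column.
    """
    keys = [set() for _ in mode_columns]
    counts = [Counter() for _ in mode_columns]
    nnz = []
    for row in rows:
        coord = tuple(row[c] for c in mode_columns)
        for m, key in enumerate(coord):
            keys[m].add(key)
            counts[m][key] += 1
        nnz.append((coord, float(row[-1])))
    return keys, counts, nnz


def merge_results(results, nmodes):
    """Reduce step: union the key sets, sum the counts, concatenate the non-zeros."""
    keys = [set() for _ in range(nmodes)]
    counts = [Counter() for _ in range(nmodes)]
    raw_nnz = []
    for local_keys, local_counts, local_nnz in results:
        for m in range(nmodes):
            keys[m] |= local_keys[m]  # union of process-local sets
            counts[m].update(local_counts[m])  # Counter.update adds counts
        raw_nnz.extend(local_nnz)  # concatenation of per-process non-zeros
    # Dense IDs come from the sorted union, so they do not depend on how the data was split.
    index_maps = [{k: i for i, k in enumerate(sorted(keys[m]))} for m in range(nmodes)]
    nnz = [(tuple(index_maps[m][k] for m, k in enumerate(coord)), val)
           for coord, val in raw_nnz]
    return index_maps, counts, nnz


def build_parallel(chunks, mode_columns, executor):
    """Map parse_chunk over the chunks with any concurrent.futures executor, then reduce."""
    futures = [executor.submit(parse_chunk, chunk, mode_columns) for chunk in chunks]
    return merge_results([f.result() for f in futures], len(mode_columns))
```

Assigning IDs from the sorted union rather than in first-seen order keeps the index maps identical no matter how the input was partitioned. With a `ProcessPoolExecutor`, `parse_chunk` has to be defined at module top level so it can be pickled, and the driver code should sit under an `if __name__ == "__main__":` guard.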

Merging duplicates will take more thought, since copies of the same non-zero could end up in different partitions.
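One possible approach, assuming duplicates are combined by summing their values and that the partitions are consecutive slices of the globally sorted non-zero list (which the existing parallel sort would provide): after sorting, equal coordinates are adjacent, so each partition can merge its own runs independently, and only a run that straddles a partition boundary needs a sequential fix-up. The helper names below are hypothetical.

```python
def merge_sorted_run(nnz):
    """Sum the values of adjacent equal coordinates in a coordinate-sorted list."""
    out = []
    for coord, val in nnz:
        if out and out[-1][0] == coord:
            out[-1] = (coord, out[-1][1] + val)  # duplicate: accumulate (assumes summation)
        else:
            out.append((coord, val))
    return out


def merge_partitions(partitions):
    """Merge duplicates across consecutive slices of one globally sorted non-zero list.

    The per-partition merges are independent and could run in parallel; afterwards a
    duplicate run can only straddle a boundary, so one sequential stitch pass fixes it.
    """
    merged = [merge_sorted_run(p) for p in partitions]  # independent per-partition work
    out = []
    for part in merged:
        if out and part and out[-1][0] == part[0][0]:
            # The run continues across the boundary: fold the first entry into the last one.
            out[-1] = (out[-1][0], out[-1][1] + part[0][1])
            part = part[1:]
        out.extend(part)
    return out
```

Because each merged partition holds at most one entry per coordinate, the stitch pass only ever inspects one entry per boundary, even when a single coordinate spans several whole partitions.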
