Skip to content

JIT LTO Cagra Search#1807

Open
divyegala wants to merge 229 commits intorapidsai:mainfrom
divyegala:cagra-search-jit-lto
Open

JIT LTO Cagra Search#1807
divyegala wants to merge 229 commits intorapidsai:mainfrom
divyegala:cagra-search-jit-lto

Conversation

Comment thread cpp/src/neighbors/detail/cagra/jit_lto_kernels/compute_distance_kernel.cu.in Outdated
Comment thread cpp/src/neighbors/detail/cagra/jit_lto_kernels/setup_workspace_kernel.cu.in Outdated
Comment thread cpp/src/neighbors/detail/cagra/jit_lto_kernels/setup_workspace_kernel.cu.in Outdated
Comment thread cpp/src/neighbors/detail/cagra/jit_lto_kernels/compute_distance_kernel.cu.in Outdated
@@ -0,0 +1,98 @@
/*
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please restore interleaved_scan_fragments.hpp and remove this file. I'd like to keep the fragment/tag headers consistent.


#include "@fatbin_header_file@"
#include <cuvs/detail/jit_lto/FragmentEntry.hpp>
#include <cuvs/detail/jit_lto/registration_tags.hpp>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After restoring interleaved_scan_fragments.hpp, please remove this #include.

Comment on lines +52 to +53
# Registration fragment tags (FRAGMENT_TAG_FORMAT) can track GPU template params in .cu.in; map
# those scalars to tag_* names here so matrix JSON does not duplicate tag strings.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the existing conventions that we use in other matrix files. We use the *_abbrev substitutions for the kernel name and the tag name. Please don't pollute core CMake code with these implementation details.

- ${{ stdlib("c") }}
host:
- libnvjitlink-dev
- cuda-nvrtc-dev
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already added in #1804. Please remove it along with the rest.

Suggested change
- cuda-nvrtc-dev

Comment on lines +15 to +16
uint32_t PQ_BITS,
uint32_t PQ_LEN,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The absence of these template parameters from the arguments indicates that we may be able to reduce the number of dimensions we multiply this fragment out over. Let's do that and some general convention cleanup in a follow-up.

Comment thread cpp/CMakeLists.txt
)
target_compile_features(jit_lto_kernel_usage_requirements INTERFACE cuda_std_20)
target_link_libraries(jit_lto_kernel_usage_requirements INTERFACE rmm::rmm raft::raft CCCL::CCCL)
target_compile_definitions(jit_lto_kernel_usage_requirements INTERFACE CUVS_ENABLE_JIT_LTO)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed? We've eliminated the non-JIT path at this point.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, due to the compute_distance.hpp header that's used both by JIT fatbins and library TUs.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can rework the file structure in a follow-up to remove this preprocessor definition (unless you want to do it in this PR).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure at the moment if it is possible to do that. Do you have any ideas?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at how IVF-flat and IVF-PQ are structured. Each fragment includes device_functions.cuh, and the implementations are split up into their own .cuh files.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not the issue here. It's that the fatbins and library TUs both need full visibility of the types defined by compute_distance_standard/vpq.

Both these types have extern functions that are linked in at runtime for JIT, but keeping the extern definition for library TUs is not valid. So the macro helps with determining how to complete the type.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing that's currently gated on CUVS_ENABLE_JIT_LTO is whether dist_op() and apply_normalization_standard() are extern or defined in that file. AFAICT, those functions are not used anywhere except in JIT+LTO fragments, and are orthogonal to standard_dataset_descriptor_init_kernel() and standard_descriptor_spec. Is it not possible to split these into their own separate files?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions are used by compute_distance_standard-impl.cuh and compute_distance_vpq-impl.cuh files. Those 2 files are defining types that have member functions that call these functions, and need to be visible to the kernels but also to the normal TUs.

Comment thread dependencies.yaml
- libcusolver-dev
- libcusparse-dev
- libnvjitlink-dev
- cuda-nvrtc-dev
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this as it was already added in #1804.

struct tag_metric_inner_product {};
struct tag_metric_cosine {};
struct tag_metric_hamming {};
struct tag_team_8 {};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use integer values instead of dedicated types for the registration tags. That's what we did for IVF-flat and IVF-PQ.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature request New feature or request non-breaking Introduces a non-breaking change

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Apply JIT-LTO to CAGRA search kernels

4 participants