Conversation
e870afc to
dca579a
Compare
| @@ -0,0 +1,98 @@ | |||
| /* | |||
There was a problem hiding this comment.
Please restore interleaved_scan_fragments.hpp and remove this file. I'd like to keep the fragment/tag headers consistent.
|
|
||
| #include "@fatbin_header_file@" | ||
| #include <cuvs/detail/jit_lto/FragmentEntry.hpp> | ||
| #include <cuvs/detail/jit_lto/registration_tags.hpp> |
There was a problem hiding this comment.
After restoring interleaved_scan_fragments.hpp, please remove this #include.
| # Registration fragment tags (FRAGMENT_TAG_FORMAT) can track GPU template params in .cu.in; map | ||
| # those scalars to tag_* names here so matrix JSON does not duplicate tag strings. |
There was a problem hiding this comment.
Please follow the existing conventions that we use in other matrix files. We use the *_abbrev substitutions for the kernel name and the tag name. Please don't pollute core CMake code with these implementation details.
| - ${{ stdlib("c") }} | ||
| host: | ||
| - libnvjitlink-dev | ||
| - cuda-nvrtc-dev |
There was a problem hiding this comment.
This was already added in #1804. Please remove it along with the rest.
| - cuda-nvrtc-dev |
| uint32_t PQ_BITS, | ||
| uint32_t PQ_LEN, |
There was a problem hiding this comment.
The absence of these template parameters from the arguments indicates that we may be able to reduce the number of dimensions we multiply this fragment out over. Let's do that and some general convention cleanup in a follow-up.
| ) | ||
| target_compile_features(jit_lto_kernel_usage_requirements INTERFACE cuda_std_20) | ||
| target_link_libraries(jit_lto_kernel_usage_requirements INTERFACE rmm::rmm raft::raft CCCL::CCCL) | ||
| target_compile_definitions(jit_lto_kernel_usage_requirements INTERFACE CUVS_ENABLE_JIT_LTO) |
There was a problem hiding this comment.
Is this still needed? We've eliminated the non-JIT path at this point.
There was a problem hiding this comment.
Yes, due to the compute_distance.hpp header that's used both by JIT fatbins and library TUs.
There was a problem hiding this comment.
We can rework the file structure in a follow-up to remove this preprocessor definition (unless you want to do it in this PR).
There was a problem hiding this comment.
I am not sure at the moment if it is possible to do that. Do you have any ideas?
There was a problem hiding this comment.
Take a look at how IVF-flat and IVF-PQ are structured. Each fragment includes device_functions.cuh, and the implementations are split up into their own .cuh files.
There was a problem hiding this comment.
That's not the issue here. It's that the fatbins and library TUs both need full visibility of the types defined by compute_distance_standard/vpq.
Both these types have extern functions that are linked in at runtime for JIT, but keeping the extern definition for library TUs is not valid. So the macro helps with determining how to complete the type.
There was a problem hiding this comment.
The only thing that's currently gated on CUVS_ENABLE_JIT_LTO is whether dist_op() and apply_normalization_standard() are extern or defined in that file. AFAICT, those functions are not used anywhere except in JIT+LTO fragments, and are orthogonal to standard_dataset_descriptor_init_kernel() and standard_descriptor_spec. Is it not possible to split these into their own separate files?
There was a problem hiding this comment.
These functions are used by compute_distance_standard-impl.cuh and compute_distance_vpq-impl.cuh files. Those 2 files are defining types that have member functions that call these functions, and need to be visible to the kernels but also to the normal TUs.
| - libcusolver-dev | ||
| - libcusparse-dev | ||
| - libnvjitlink-dev | ||
| - cuda-nvrtc-dev |
There was a problem hiding this comment.
Please remove this as it was already added in #1804.
| struct tag_metric_inner_product {}; | ||
| struct tag_metric_cosine {}; | ||
| struct tag_metric_hamming {}; | ||
| struct tag_team_8 {}; |
There was a problem hiding this comment.
Please use integer values instead of dedicated types for the registration tags. That's what we did for IVF-flat and IVF-PQ.
Apply updates from
CAGRA related PRs:
cudaFuncAttributeMaxDynamicSharedMemorySizewith thread-safety #1771JIT related PRs: