
mgyoo86 commented Dec 17, 2025

Summary

Adds pool-aware convenience functions (zeros!, ones!, similar!) and a DisabledPool{Backend} type that preserves the backend when pooling is disabled.

Key Changes

Convenience Functions

| Function | Returns | Description |
|---|---|---|
| zeros!(pool, [T,] dims...) | View | Zero-initialized; uses acquire! internally |
| ones!(pool, [T,] dims...) | View | One-initialized |
| similar!(pool, A, ...) | View | Matches eltype/size of an existing array |
| unsafe_zeros!, unsafe_ones!, unsafe_similar! | Array | Native (raw) array variants |

Default element type: Float64 (CPU), Float32 (CUDA)
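A minimal sketch of the view vs. raw-array split in the table above, assuming a hypothetical ToyPool that keeps a stack of reusable backing vectors per element type. ToyPool, toy_acquire!, toy_zeros!, toy_ones!, toy_similar!, toy_unsafe_zeros! and toy_reset! are illustrative names, not the package's API, and checkpoint/rewind is reduced to a single reset:

```julia
# Hypothetical toy pool: per element type, a stack of reusable backing vectors
# and a count of how many of them are currently handed out.
mutable struct ToyPool
    slots::Dict{DataType,Vector{Any}}
    inuse::Dict{DataType,Int}
end
ToyPool() = ToyPool(Dict{DataType,Vector{Any}}(), Dict{DataType,Int}())

# Hand out the next backing vector for T, resized to exactly n elements.
function _toy_next_buffer!(pool::ToyPool, ::Type{T}, n::Int) where {T}
    slots = get!(Vector{Any}, pool.slots, T)
    k = get(pool.inuse, T, 0) + 1
    pool.inuse[T] = k
    k > length(slots) && push!(slots, Vector{T}(undef, 0))
    return resize!(slots[k]::Vector{T}, n)
end

# "Reset" makes every slot reusable; arrays handed out earlier must not be used afterwards.
toy_reset!(pool::ToyPool) = (empty!(pool.inuse); pool)

# View-returning acquire: an N-d reshaped view over a pooled vector.
function toy_acquire!(pool::ToyPool, ::Type{T}, dims::Vararg{Int,N}) where {T,N}
    n = prod(dims)
    return reshape(view(_toy_next_buffer!(pool, T, n), 1:n), dims)
end

const TOY_DEFAULT_ELTYPE = Float64  # CPU-style default when T is omitted

toy_zeros!(pool::ToyPool, ::Type{T}, dims::Int...) where {T} = fill!(toy_acquire!(pool, T, dims...), zero(T))
toy_zeros!(pool::ToyPool, dims::Int...) = toy_zeros!(pool, TOY_DEFAULT_ELTYPE, dims...)
toy_ones!(pool::ToyPool, ::Type{T}, dims::Int...) where {T} = fill!(toy_acquire!(pool, T, dims...), one(T))
toy_ones!(pool::ToyPool, dims::Int...) = toy_ones!(pool, TOY_DEFAULT_ELTYPE, dims...)

# similar!: same eltype and size as the template; contents are left uninitialized.
toy_similar!(pool::ToyPool, A::AbstractArray) = toy_acquire!(pool, eltype(A), size(A)...)

# Raw-array variant (1-D only here): returns the pooled Vector itself, not a view.
toy_unsafe_zeros!(pool::ToyPool, ::Type{T}, n::Int) where {T} = fill!(_toy_next_buffer!(pool, T, n), zero(T))
```

The view-returning helpers hand out wrappers over pool memory, while the unsafe_ variant returns the backing Vector itself; in both cases the result must not be used after the pool is reset.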

DisabledPool{Backend}

Zero-cost singleton that preserves backend context when pooling is disabled.

using AdaptiveArrayPools, CUDA

# USE_POOLING=false or MAYBE_POOLING_ENABLED[]=false
@with_pool :cuda pool begin
    # pool::DisabledPool{:cuda}
    v = zeros!(pool, 100)  # → CUDA.zeros(Float32, 100), not Array!
end

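Why it's needed: previously, pool was set to nothing when pooling was disabled. That lost the :cuda context, so zeros!(pool, ...) in a :cuda block silently returned a CPU Array instead of a CuArray.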
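The backend tag lives in the type parameter, so plain dispatch can pick the right allocator with no runtime state. The sketch below is hypothetical (ToyDisabledPool, TOY_DISABLED_CPU, ToyBackendNotLoadedError and toy_zeros! are illustrative names) and only implements the :cpu path, since it cannot assume CUDA.jl is installed. Any other backend falls through to an explicit error rather than silently returning a CPU Array:

```julia
# Hypothetical backend-tagged "pooling disabled" marker: no fields, so every
# instance is a zero-size singleton and the backend lives only in the type.
struct ToyDisabledPool{Backend} end
const TOY_DISABLED_CPU = ToyDisabledPool{:cpu}()

# Hypothetical error for a backend whose methods have not been loaded.
struct ToyBackendNotLoadedError <: Exception
    backend::Symbol
end
Base.showerror(io::IO, e::ToyBackendNotLoadedError) =
    print(io, "ToyBackendNotLoadedError: backend :", e.backend, " is not loaded")

# CPU backend: with pooling disabled, just allocate with Base constructors.
toy_zeros!(::ToyDisabledPool{:cpu}, ::Type{T}, dims::Int...) where {T} = zeros(T, dims...)
toy_zeros!(p::ToyDisabledPool{:cpu}, dims::Int...) = toy_zeros!(p, Float64, dims...)

# Any other backend fails loudly instead of silently returning a CPU Array.
toy_zeros!(::ToyDisabledPool{B}, args...) where {B} = throw(ToyBackendNotLoadedError(B))
```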

Utility Functions

  • pooling_enabled(pool) - Returns true for active pool, false for DisabledPool
  • default_eltype(pool) - Returns Float64 (CPU) or Float32 (CUDA)
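Both utilities can be expressed as small dispatch tables. A sketch with hypothetical stand-in types (ToyAbstractPool, ToyCPUPool, ToyGPUPool, ToyDisabledPool) and function names (toy_pooling_enabled, toy_default_eltype); the real package's type hierarchy may differ:

```julia
# Hypothetical stand-ins for pool types.
abstract type ToyAbstractPool end
struct ToyCPUPool <: ToyAbstractPool end
struct ToyGPUPool <: ToyAbstractPool end   # plays the role of a CUDA-backed pool
struct ToyDisabledPool{Backend} end

# Predicate: true for any active pool, false for the disabled marker.
toy_pooling_enabled(::ToyAbstractPool) = true
toy_pooling_enabled(::ToyDisabledPool) = false

# Element type used when the caller omits T.
toy_default_eltype(::ToyAbstractPool) = Float64
toy_default_eltype(::ToyGPUPool) = Float32
toy_default_eltype(::ToyDisabledPool{:cpu}) = Float64
toy_default_eltype(::ToyDisabledPool{:cuda}) = Float32
```

Each method returns a constant determined by the argument's type alone, so the compiler can typically resolve these calls at compile time.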

Internal: Explicit _impl! Delegators

Macros transform zeros!(pool, ...) → _zeros_impl!(pool, ...). For DisabledPool, these delegate back to the public API with explicit type signatures for proper CUDA inlining:

# Instead of variadic args... (may not inline well)
@inline _zeros_impl!(p::DisabledPool, ::Type{T}, dims::Vararg{Int,N}) where {T,N} = zeros!(p, T, dims...)
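The call rewriting itself is ordinary expression manipulation. A minimal sketch, assuming a hypothetical rename_convenience_calls helper and a hard-coded name table; it is not the package's _transform_acquire_calls and it ignores qualified names such as AdaptiveArrayPools.zeros!:

```julia
# Hypothetical mapping from public names to internal implementation names.
const IMPL_NAMES = Dict{Symbol,Symbol}(
    :zeros! => :_zeros_impl!,
    :ones! => :_ones_impl!,
    :similar! => :_similar_impl!,
)

# Leaves (symbols, literals, line numbers) pass through unchanged.
rename_convenience_calls(x) = x

# Recursively rewrite f(args...) to IMPL_NAMES[f](args...) when f is in the table.
function rename_convenience_calls(ex::Expr)
    args = Any[rename_convenience_calls(a) for a in ex.args]
    if ex.head === :call && args[1] isa Symbol && haskey(IMPL_NAMES, args[1])
        args[1] = IMPL_NAMES[args[1]]
    end
    return Expr(ex.head, args...)
end
```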

Breaking Changes

  • Removed Nothing fallbacks: pool = nothing no longer works
  • Migration: Replace pool === nothing with !pooling_enabled(pool)
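A sketch of the migration on a user-side helper, with hypothetical stand-ins (ToyActivePool, ToyDisabledPool, toy_pooling_enabled, old_pool_mode, pool_mode). Note that the old check does not fail loudly: a disabled pool is never nothing, so pool === nothing simply evaluates to false and the helper takes the pooled branch:

```julia
# Hypothetical stand-ins for an active pool and the disabled marker.
struct ToyActivePool end
struct ToyDisabledPool{Backend} end
toy_pooling_enabled(::ToyActivePool) = true
toy_pooling_enabled(::ToyDisabledPool) = false

# Before: relied on the disabled pool being `nothing`.
old_pool_mode(pool) = pool === nothing ? :disabled : :pooled

# After: ask the pool itself.
pool_mode(pool) = toy_pooling_enabled(pool) ? :pooled : :disabled
```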

Files Changed

  • src/convenience.jl - Convenience functions + DisabledPool fallbacks + _impl! delegators
  • src/acquire.jl - DisabledPool acquire methods + _impl! delegators
  • src/types.jl - DisabledPool{Backend}, DISABLED_CPU, pooling_enabled()
  • src/macros.jl - Emit DisabledPool{backend}() instead of nothing
  • ext/.../convenience.jl - CUDA DisabledPool methods
  • docs/ - Updated documentation

- Add zeros!(pool, [T], dims...) for zero-initialized arrays
- Add ones!(pool, [T], dims...) for one-initialized arrays
- Add similar!(pool, array, [T], [dims...]) for template-based allocation
- Update macros.jl with _impl! transformation for macro optimization
- Support typed checkpoint extraction for convenience functions

User-facing functions always splat tuples into Vararg before calling
_impl!, making the NTuple versions dead code. Removed:
- _zeros_impl!(pool, T, dims::NTuple)
- _zeros_impl!(pool, dims::NTuple)
- _ones_impl!(pool, T, dims::NTuple)
- _ones_impl!(pool, dims::NTuple)
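A reduced sketch of why the NTuple methods were unreachable (pool argument omitted; toy_tuple_zeros! and _toy_tuple_zeros_impl! are hypothetical names): the tuple-taking public method splats before delegating, so a single Vararg{Int,N} implementation serves both call forms:

```julia
# Hypothetical internal implementation: a single Vararg{Int,N} method.
_toy_tuple_zeros_impl!(::Type{T}, dims::Vararg{Int,N}) where {T,N} = zeros(T, dims...)

# Public entry points. The tuple form splats before delegating,
# so the implementation never receives an NTuple.
toy_tuple_zeros!(::Type{T}, dims::Vararg{Int,N}) where {T,N} = _toy_tuple_zeros_impl!(T, dims...)
toy_tuple_zeros!(::Type{T}, dims::NTuple{N,Int}) where {T,N} = _toy_tuple_zeros_impl!(T, dims...)
```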

Add test_convenience.jl with tests for zeros!, ones!, similar!:
- All API signatures (explicit type, default Float64, NTuple)
- Nothing fallbacks for disabled pooling
- Integration with @with_pool macro
- Pool state management verification

Add _extract_acquire_types tests for zeros!, ones!, similar!:
- zeros!/ones! with default type (Float64) and explicit type
- similar! with same type as template (nargs == 3)
- similar! with explicit type (nargs >= 4, type arg)
- similar! with dims only (nargs >= 4, dims only)
- Mixed convenience functions integration test
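For context on what these tests exercise, here is a hypothetical sketch of reading an element type off a call expression by argument count (nargs counts the function name, so similar!(pool, A) has nargs == 3). toy_extract_type and looks_like_type are illustrative; in particular, the rule that a capitalized symbol or a {...} expression is a type is an assumption made for this sketch, not the package's logic:

```julia
# Assumed heuristic: a capitalized symbol (Float32) or a parametric type
# expression (Complex{Float64}) is an element type; anything else is a dimension.
looks_like_type(x) = (x isa Symbol && isuppercase(first(string(x)))) ||
                     (x isa Expr && x.head === :curly)

# Return an expression for the element type a convenience call will request,
# or `nothing` for calls this sketch does not recognize.
function toy_extract_type(ex::Expr)
    ex.head === :call || return nothing
    args = ex.args                 # args[1] is the function name
    nargs = length(args)
    f = args[1]
    if f in (:zeros!, :ones!) && nargs >= 2
        pool = args[2]
        # zeros!(pool, T, dims...) names T; zeros!(pool, dims...) uses the pool default.
        if nargs >= 3 && looks_like_type(args[3])
            return args[3]
        end
        return :(default_eltype($pool))
    elseif f === :similar! && nargs >= 3
        template = args[3]
        # similar!(pool, A, T, ...) names T; otherwise the template's eltype is kept.
        if nargs >= 4 && looks_like_type(args[4])
            return args[4]
        end
        return :(eltype($template))
    end
    return nothing
end
```

Under this heuristic, a dims argument spelled with a capital letter (e.g. N) would be misread as a type.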

- Add zeros!, ones! for CuAdaptiveArrayPool with Float32 as default type
  (matching CUDA.zeros() behavior)
- Clean up redundant imports in extension module
  (each sub-file handles its own imports)
- Add comprehensive CUDA convenience function tests

…nctions

Add raw-array (non-view) variants of initialized convenience functions:

- unsafe_zeros!(pool, [T,] dims...): zero-initialized raw arrays
- unsafe_ones!(pool, [T,] dims...): one-initialized raw arrays
- unsafe_similar!(pool, template, [T,] [dims...]): raw arrays from template

Implementation includes:
- Full macro transformation support (_impl! functions)
- CUDA Float32 defaults (matching zeros!/ones! behavior)
- Nothing fallbacks for disabled pooling
- Comprehensive test coverage for CPU and CUDA backends

- Add Convenience Functions subsection to README with zeros!/ones!/similar!
- Simplify thread-safety section to one line with link to multi-threading docs
- Document all convenience functions in api.md (view-returning and array-returning)
- Note CUDA Float32 default behavior matching CUDA.zeros()
- Fix benchmark numbers: 91 MiB → 2.75 GiB (90k allocations, 31% GC)
- Expand why manual buffer passing is impractical:
  - API pollution, nested calls, dynamic shapes, package boundaries
- Update solution example to use similar! convenience function
- Simplify comparison table (Naive vs AdaptiveArrayPools)
- Add emoji annotations for visual emphasis (⚠️/✅)

…type dispatch

Add default_eltype(pool) function that returns the default element type for
convenience functions when type is not specified:
- CPU (AbstractArrayPool): Float64
- CUDA (CuAdaptiveArrayPool): Float32

This fixes a bug where the macro transformation bypassed CUDA's Float32 default:
it rewrote zeros!(pool, 10) → _zeros_impl!(pool, 10), which existed only in the
CPU module with a hardcoded Float64.

Changes:
- Add default_eltype to src/convenience.jl with AbstractArrayPool → Float64
- Override in CUDA extension with CuAdaptiveArrayPool → Float32
- Update _*_impl! functions to use default_eltype(pool) instead of Float64
- Simplify CUDA extension from 69 lines to 14 lines (remove redundant overrides)
- Update macro type extraction to generate default_eltype(pool) expressions
- Add _filter_static_types handling for default_eltype expressions
- Export default_eltype from main module
- Add macroexpand tests for convenience function expansion
- Update test_macro_internals to expect default_eltype(pool) expressions

Zero performance overhead verified via LLVM IR analysis: default_eltype(pool)
is fully constant-folded at compile time.

Add compile-time type deduplication in @generated checkpoint! and rewind!
functions. When duplicate types are passed (e.g., Float64, Float64), the
generated code now only calls _checkpoint_typed_pool!/_rewind_typed_pool!
once per unique type.

This optimization eliminates redundant push/pop operations that occurred
when a block using both zeros!(pool, 10) and zeros!(pool, Float64, 10) generated
checkpoint!(pool, default_eltype(pool), Float64), which resolved to
checkpoint!(pool, Float64, Float64) on CPU pools.

The deduplication happens entirely at compile time via the @generated
function machinery, with zero runtime overhead.
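A reduced sketch of compile-time deduplication with @generated, using hypothetical names (toy_checkpoint!, _toy_checkpoint_unique!, _toy_checkpoint_typed!) and a plain Vector{Any} standing in for pool state. To keep the generator input simple, it works on a single tuple type. The varargs front end builds that tuple type at run time, so only the deduplication inside _toy_checkpoint_unique! happens during code generation:

```julia
# Hypothetical per-type operation: record each type that gets checkpointed.
_toy_checkpoint_typed!(seen::Vector{Any}, ::Type{T}) where {T} = push!(seen, T)

# The generator runs once per tuple type TT: duplicates are dropped while the
# method body is generated, so the emitted code has one call per unique type.
@generated function _toy_checkpoint_unique!(seen::Vector{Any}, ::Type{TT}) where {TT<:Tuple}
    unique_types = unique(collect(TT.parameters))
    calls = [:(_toy_checkpoint_typed!(seen, $T)) for T in unique_types]
    return Expr(:block, calls..., :(return seen))
end

# Varargs front end, e.g. toy_checkpoint!(seen, Float64, Float64, Int).
toy_checkpoint!(seen::Vector{Any}, types::Type...) = _toy_checkpoint_unique!(seen, Tuple{types...})
```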

When USE_POOLING=false or MAYBE_POOLING_ENABLED[]=false, macros now
return DisabledPool{backend}() instead of nothing. This preserves
backend semantics:

- DisabledPool{:cpu} → zeros!/ones! return an Array (Julia default)
- DisabledPool{:cuda} → zeros!/ones! return a CuArray (CUDA default)

Changes:
- Add DisabledPool{B} parametric type and DISABLED_CPU singleton
- Add pooling_enabled(pool) predicate for backward compatibility
- Add BackendNotLoadedError for explicit failure on unknown backends
- Update @with_pool and @maybe_with_pool to emit DisabledPool
- Add @maybe_with_pool :backend variants for backend-specific macros
- Add DisabledPool{:cpu} fallbacks for all convenience/acquire functions
- Add DisabledPool{:cuda} fallbacks in CUDA extension
- Add state management no-ops (checkpoint!, rewind!, reset!, empty!)
- Update tests to use pooling_enabled() instead of pool === nothing

This fixes the issue where @maybe_with_pool :cuda with USE_POOLING=false
would silently return CPU Array instead of CuArray.

Remove all ::Nothing method fallbacks since macros now exclusively
use DisabledPool{backend}() when pooling is disabled.

Changes:
- Remove zeros!/ones!/similar! Nothing fallbacks from convenience.jl
- Remove unsafe_zeros!/unsafe_ones!/unsafe_similar! Nothing fallbacks
- Remove acquire!/unsafe_acquire! Nothing fallbacks from acquire.jl
- Remove checkpoint!/rewind!/reset!/empty! Nothing fallbacks from state.jl
- Remove pooling_enabled(::Nothing) from types.jl
- Replace _validate_pool_return(::Nothing) with ::DisabledPool in utils.jl
- Update all tests to use DISABLED_CPU instead of nothing

This enforces a consistent API pattern: DisabledPool{B} is the only
way to represent disabled pooling, eliminating confusion for extension
developers implementing new backends.

- Add test/test_coverage.jl for CPU coverage (120 tests)
  - DisabledPool convenience functions (zeros!, ones!, similar!, etc.)
  - DisabledPool acquire functions
  - BackendNotLoadedError handling
  - _impl! delegators for DisabledPool
  - Macro internals (_filter_static_types, _generate_*, etc.)
  - Qualified name transformation in _transform_acquire_calls

- Add test/cuda/test_disabled_pool.jl for CUDA DisabledPool (253 lines)
  - DISABLED_CUDA singleton and default_eltype
  - zeros!/ones! with type and default Float32
  - similar!/unsafe_similar! with CuArray and CPU->GPU conversion
  - acquire!/unsafe_acquire! variants

Replace variadic args... with explicit overloads for _impl! delegators
to ensure proper CUDA inlining and type specialization:
- _zeros_impl!, _ones_impl!, _similar_impl!
- _unsafe_zeros_impl!, _unsafe_ones_impl!, _unsafe_similar_impl!
- _acquire_impl!, _unsafe_acquire_impl!

Add comprehensive tests covering all explicit overloads including
tuple dimension variants.
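A sketch of the explicit-overload pattern with hypothetical names (ToyDisabledPool, toy_zeros!, _toy_zeros_impl!). Julia's performance tips note that the compiler may avoid specializing on a plain args... Vararg when the arguments are only passed through to another function, and that writing Vararg{T,N} with a where {N} parameter forces specialization. The delegators below use that form, one method per call shape:

```julia
# Hypothetical disabled marker and its public function (CPU path only).
struct ToyDisabledPool{Backend} end
toy_zeros!(::ToyDisabledPool{:cpu}, ::Type{T}, dims::Vararg{Int,N}) where {T,N} = zeros(T, dims...)
toy_zeros!(p::ToyDisabledPool{:cpu}, dims::Vararg{Int,N}) where {N} = toy_zeros!(p, Float64, dims...)

# One explicit signature per call shape, instead of a catch-all
# `_toy_zeros_impl!(p::ToyDisabledPool, args...) = toy_zeros!(p, args...)`.
@inline _toy_zeros_impl!(p::ToyDisabledPool, ::Type{T}, dims::Vararg{Int,N}) where {T,N} = toy_zeros!(p, T, dims...)
@inline _toy_zeros_impl!(p::ToyDisabledPool, dims::Vararg{Int,N}) where {N} = toy_zeros!(p, dims...)
```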

- Update maybe_with_pool.md: pool becomes DisabledPool{backend}()
  instead of nothing when pooling is disabled
- Update api.md: add DisabledPool{Backend} type and pooling_enabled()
  utility function documentation
- Update configuration.md: document DisabledPool behavior when
  USE_POOLING=false
codecov bot commented Dec 17, 2025

Codecov Report

❌ Patch coverage is 97.50000% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.03%. Comparing base (52fc1d0) to head (4c240ba).
⚠️ Report is 21 commits behind head on master.

| File with missing lines | Patch % | Lines |
|---|---|---|
| src/macros.jl | 91.57% | 8 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #10      +/-   ##
==========================================
+ Coverage   94.48%   97.03%   +2.55%     
==========================================
  Files           7        8       +1     
  Lines         671      945     +274     
==========================================
+ Hits          634      917     +283     
+ Misses         37       28       -9     


Copilot AI left a comment

Pull request overview

This PR introduces convenience functions (zeros!, ones!, similar! and their unsafe_* variants) along with a backend-preserving DisabledPool{Backend} type that replaces the previous nothing pattern when pooling is disabled. This is a well-designed breaking change that improves type safety and maintains backend context (:cpu vs :cuda) throughout the codebase.

Key changes:

  • New convenience functions with sensible defaults (Float64 for CPU, Float32 for CUDA)
  • DisabledPool{Backend} type replaces nothing to preserve backend context when pooling is disabled
  • Breaking change: removes Nothing fallbacks (migration path: use pooling_enabled(pool) instead of pool === nothing)

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| src/types.jl | Adds DisabledPool{Backend} type and pooling_enabled() utility |
| src/convenience.jl | Implements new convenience functions with DisabledPool fallbacks and _impl! delegators |
| src/acquire.jl | Updates DisabledPool fallbacks for acquire functions |
| src/state.jl | Adds type deduplication in @generated checkpoint!/rewind!, DisabledPool no-ops |
| src/macros.jl | Updates macro expansion to use DisabledPool, adds convenience function support |
| src/utils.jl | Updates _validate_pool_return signature |
| src/AdaptiveArrayPools.jl | Exports new functions |
| ext/AdaptiveArrayPoolsCUDAExt/convenience.jl | Implements CUDA-specific DisabledPool methods with Float32 default |
| ext/AdaptiveArrayPoolsCUDAExt/AdaptiveArrayPoolsCUDAExt.jl | Includes convenience.jl |
| test/test_convenience.jl | Comprehensive tests for new convenience functions |
| test/test_coverage.jl | Extensive coverage tests for edge cases and internal functions |
| test/cuda/* | CUDA-specific tests for DisabledPool and convenience functions |
| test/*.jl | Updates all tests from nothing to DISABLED_CPU |
| docs/*.md | Updates documentation with DisabledPool pattern and convenience function API |
| README.md | Updates examples and performance benchmarks |

The implementation is sound with excellent test coverage. The breaking change is well-documented, and the migration path is clear. No critical issues identified.



mgyoo86 merged commit 00787fd into master Dec 17, 2025
6 checks passed
mgyoo86 deleted the feat/convenience_dispatches branch December 17, 2025 00:24