Conversation

@yoyolicoris commented May 29, 2025

import torch
import torchlpc
from torchlpc.core import lpc_cuda  # Numba CUDA implementation

from timeit import timeit
from time import sleep

batch_size = 64
samples = 2**14
order = 2

lpc_A = torch.zeros(batch_size, samples, order).cuda() + 0j
lpc_zi = torch.randn(batch_size, order).cuda() + 0j
lpc_x = torch.randn(batch_size, samples).cuda() + 0j

lpc_cuda(lpc_x, lpc_A, lpc_zi)  # warm-up: triggers Numba JIT compilation
t_numba_lpc = timeit(
    "lpc_cuda(lpc_x, lpc_A, lpc_zi)",
    globals=globals(),
    number=100,
)
print(f"Numba LPC time: {t_numba_lpc:.4f} seconds")

sleep(1)  # Ensure the GPU is ready for the next operation

t_torch_lpc = timeit(
    "torch.ops.torchlpc.lpc(lpc_x, lpc_A, lpc_zi)",
    globals=globals(),
    number=100,
)
print(f"Torch LPC time: {t_torch_lpc:.4f} seconds")
print(f"Torch LPC is {t_numba_lpc / t_torch_lpc:.2f}x faster than Numba")
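For context on what is being timed: torchlpc evaluates a time-varying all-pole (IIR) recurrence. Below is a minimal pure-Python sketch of the first-order case, assuming the convention y[t] = x[t] - A[t]·y[t-1] with zi as the initial state; the actual kernels additionally handle arbitrary order, batching, and complex dtypes.

```python
def lpc_ref_order1(x, a, zi):
    """Reference first-order all-pole recurrence: y[t] = x[t] - a[t] * y[t-1]."""
    y, prev = [], zi
    for xt, at in zip(x, a):
        prev = xt - at * prev  # subtract the weighted previous output
        y.append(prev)
    return y

# Example: an impulse through a constant-coefficient filter decays
# geometrically with alternating sign.
print(lpc_ref_order1([1.0, 0.0, 0.0], [0.5, 0.5, 0.5], 0.0))  # → [1.0, -0.5, 0.25]
```

The benchmark above fills `lpc_A` with zeros, so both implementations compute a trivial filter; the timing difference therefore reflects kernel and launch overhead rather than numerical work.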

The results on a 5060 Ti GPU on a Linux machine:

Numba LPC time: 0.8524 seconds
Torch LPC time: 0.0152 seconds
Torch LPC is 56.07x faster than Numba

Don't know why, but the .cu version is significantly faster.
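One caveat about these numbers: CUDA kernel launches are asynchronous, so wall-clock timing via `timeit` only captures true kernel time if the device is synchronized before the clock stops (the `sleep(1)` above does not guarantee this). A generic sketch of a sync-aware timing helper; in this setting the `sync` callable would be `torch.cuda.synchronize`, and the helper name itself is illustrative:

```python
import timeit

def bench(fn, number=100, sync=None):
    """Time `number` calls of fn, optionally synchronizing the device
    before the clock starts and after it stops (e.g. torch.cuda.synchronize)."""
    fn()                      # warm-up: JIT compilation, caching, autotuning
    if sync is not None:
        sync()                # drain pending work before starting the clock
    start = timeit.default_timer()
    for _ in range(number):
        fn()
    if sync is not None:
        sync()                # wait for the last queued kernel to finish
    return timeit.default_timer() - start
```

With `sync=torch.cuda.synchronize`, both implementations are charged for all queued kernels, which may shift the absolute numbers even if the relative ordering stays the same.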

@yoyolicoris mentioned this pull request May 5, 2025
@yoyolicoris requested a review from Copilot May 29, 2025 10:11
Copilot AI left a comment

Pull Request Overview

This PR introduces a new LPC CUDA kernel implementation and updates related recurrence functions to leverage both CUDA and CPU runners. Key changes include refactoring the recurrence functions to use lambdas for kernel dispatching, adding CUDA kernel implementations in C++ under torchlpc/csrc/cuda/lpc.cu, and extending test coverage to include additional sample sizes and device options.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File | Description
torchlpc/recurrence.py | Updated recurrence functions to choose between CUDA and CPU runners via lambdas.
torchlpc/csrc/cuda/lpc.cu | Added new CUDA kernels for LPC computation, including support for complex types.
torchlpc/core.py | Adjusted LPC forward logic to conditionally dispatch based on EXTENSION_LOADED.
tests/test_extension.py | Expanded test parameters for sample sizes and devices (CPU/CUDA) for LPC equivalence.
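The lambda-based dispatch described for torchlpc/recurrence.py follows a common pattern: bind the device-appropriate runner once, then call it uniformly. A hypothetical sketch of the idea (the function and runner names are illustrative, not the actual torchlpc internals):

```python
def pick_runner(device_type, cuda_runner, cpu_runner):
    """Return a uniform callable bound to the backend for `device_type`."""
    if device_type == "cuda":
        return lambda x, a, zi: cuda_runner(x, a, zi)
    return lambda x, a, zi: cpu_runner(x, a, zi)

# Example with stub backends standing in for the real kernels:
run = pick_runner(
    "cpu",
    cuda_runner=lambda x, a, zi: ("cuda", x),
    cpu_runner=lambda x, a, zi: ("cpu", x),
)
print(run([1, 2, 3], None, None))  # → ('cpu', [1, 2, 3])
```

Callers only hold the returned lambda, so the device branch is taken once per call site rather than inside every kernel invocation.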

@yoyolicoris merged commit 37e8115 into main May 29, 2025
6 of 8 checks passed
@yoyolicoris deleted the feat/native-lpc-cuda branch May 29, 2025 13:06