Commit 07c1e73

Fix cuDNN convolution precision on Ampere+ GPUs (#3127)
On Ampere and later GPUs (SM 8.0+), cuDNN's default math mode permits TF32 Tensor Core operations, which use reduced mantissa precision (10 bits versus FP32's 23). This causes numerical differences when comparing CUDA vs CPU convolution results, particularly in cudnnConvolutionBackwardFilter(). Explicitly set CUDNN_FMA_MATH to force true FP32 computation for consistent numerical results across all GPU architectures.
1 parent 60adc65 commit 07c1e73

1 file changed

Lines changed: 9 additions & 0 deletions

File tree

dlib/cuda/cudnn_dlibapi.cpp

@@ -1044,6 +1044,15 @@ namespace dlib
                 CUDNN_CROSS_CORRELATION)); // could also be CUDNN_CONVOLUTION
 #endif
 
+#if CUDNN_MAJOR >= 8
+            // On Ampere and later GPUs, CUDNN_DEFAULT_MATH permits TF32 Tensor Core
+            // operations which have reduced precision. Use CUDNN_FMA_MATH to force
+            // true FP32 computation for consistent numerical results.
+            CHECK_CUDNN(cudnnSetConvolutionMathType(
+                (cudnnConvolutionDescriptor_t)conv_handle,
+                CUDNN_FMA_MATH));
+#endif
 
             CHECK_CUDNN(cudnnGetConvolution2dForwardOutputDim(
                 (const cudnnConvolutionDescriptor_t)conv_handle,
                 descriptor(data),
