Problem
After #1515 fixes the BE byteswap crash in read_geotiff_gpu, predictor=2 BE multi-byte TIFFs no longer raise but instead return wrong values.
Reproducer
import numpy as np, tifffile
from xrspatial.geotiff import read_geotiff_gpu
from xrspatial.geotiff._reader import read_to_array
rng = np.random.RandomState(20260507)
arr = rng.randint(-1_000_000, 1_000_000, size=(32, 48), dtype=np.int64).astype(np.int32)
tifffile.imwrite('be_pred2.tif', arr, byteorder='>', predictor=2,
                 compression='deflate', tile=(16, 16))
cpu, _ = read_to_array('be_pred2.tif') # correct
gpu = read_geotiff_gpu('be_pred2.tif').data # ~93% of pixels mismatch
CPU path is correct (PR #1507 fixed BE predictor=2 on the CPU side).
Root cause sketch
_apply_predictor_and_assemble in _gpu_decode.py runs _gpu_predictor2_decode before the final BE byteswap. The kernel views the byte buffer as native uint16/uint32 and accumulates the horizontal differences (a prefix sum along each row) in that interpretation. BE files store samples MSB-first, so the carries propagate between the wrong bytes and the later byteswap cannot repair the result. The accumulation has to run on native-endian samples: either the byteswap happens first (at the byte-buffer level, which is awkward across tile rows), or the kernel takes an endian flag, as _fp_predictor_decode_kernel already does for predictor=3.
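The ordering problem can be shown without a GPU. The following is a minimal pure-Python sketch, not the code in _gpu_decode.py: all helper names are hypothetical, the samples are uint16, and the "native" view is assumed to be little-endian. It encodes one row with horizontal differencing in BE byte order, then decodes it in both orders.

```python
import struct

MASK = 0xFFFF  # uint16 wraparound, as in a 16-bit predictor=2 stream


def encode_pred2_be(samples):
    """Horizontally difference one row and pack it big-endian (hypothetical helper)."""
    diffs = [samples[0]] + [(b - a) & MASK for a, b in zip(samples, samples[1:])]
    return struct.pack(f">{len(diffs)}H", *diffs)


def prefix_sum_u16(values):
    """Running sum with uint16 wraparound: the predictor=2 decode step."""
    out, acc = [], 0
    for v in values:
        acc = (acc + v) & MASK
        out.append(acc)
    return out


def swap16(v):
    """Swap the two bytes of a 16-bit value."""
    return ((v & 0xFF) << 8) | (v >> 8)


def decode_sum_then_swap(buf):
    """Order described in the issue: accumulate on the little-endian view, swap after."""
    wrong_view = struct.unpack(f"<{len(buf) // 2}H", buf)
    return [swap16(v) for v in prefix_sum_u16(wrong_view)]


def decode_swap_then_sum(buf):
    """Interpret the bytes as big-endian samples first, then accumulate."""
    samples = struct.unpack(f">{len(buf) // 2}H", buf)
    return prefix_sum_u16(samples)


row = [1000, 1003, 999, 1500, 1499, 70]
buf = encode_pred2_be(row)
good = decode_swap_then_sum(buf)
bad = decode_sum_then_swap(buf)

print("swap then sum:", good)
print("sum then swap:", bad)
```

In this sketch the first sample survives either order because nothing has been accumulated yet. Later samples diverge as soon as a carry crosses the byte boundary, which is consistent with most (but not all) pixels mismatching in the reproducer.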
Severity
LOW for now: the silent fallback in the wrapper still produces correct output. After #1515 lands, predictor=2 BE files will hit this code path for real and return wrong data, and the wrapper does not detect the mismatch.
Related
#1508, #1515, #1507.