Pull requests: huggingface/datasets
fix: handle nested null types in feature alignment for multi-proc map
#8047
opened Mar 5, 2026 by
ain-soph
Write IterableDataset to parquet incrementally instead of materializing entire shard in memory
#8045
opened Mar 5, 2026 by
HaukurPall
Fix silent data loss in push_to_hub when num_proc > num_shards
#8044
opened Mar 5, 2026 by
HaukurPall
Fix schema enforcement in streaming _convert_to_arrow
#8042
opened Mar 5, 2026 by
HaukurPall
Use num_examples instead of len(self) for iterable_dataset's SplitInfo
#8041
opened Mar 5, 2026 by
HaukurPall
Fix the logic for allowed extensions when creating build configurations (#8034)
#8040
opened Mar 5, 2026 by
Nexround
Fix non-deterministic by sorting metadata extensions (#8034)
#8039
opened Mar 5, 2026 by
Nexround
follow cache_dir in download option when loading datasets
#8036
opened Mar 1, 2026 by
TsXor
Fix torchcodec audio decoding to respect 'num_channels'
#8028
opened Feb 27, 2026 by
AsymptotaX
feat: add return_file_name support to Parquet packaged builder
#8020
opened Feb 23, 2026 by
dhruvildarji
feat: add return_file_name support to CSV packaged builder
#8019
opened Feb 23, 2026 by
dhruvildarji
3 tasks
Improve error message for deprecated dataset scripts with migration guidance
#8017
opened Feb 22, 2026 by
suryanshbt211
Speed up local 'get_data_patterns' by avoiding repeated recursive scans
#8014
opened Feb 21, 2026 by
AsymptotaX
fix: prevent duplicate keywords in load_dataset_builder (#4910)
#8008
opened Feb 16, 2026 by
DhyeyTeraiya
fix save_to_disk/load_from_disk with pathlib.Path input
#8004
opened Feb 13, 2026 by
Mr-Neutr0n
Fix Dataset.map writer initialization when early examples return None
#7996
opened Feb 8, 2026 by
veeceey
✨ Add 'SparseCsv' builder and 'sparse_collate_fn' for efficient high-dimensional sparse data loading
#7993
opened Feb 4, 2026 by
Ebraheem1
Fix index out of bound error with original_shard_lengths.
#7987
opened Feb 4, 2026 by
jonathanasdf