Add Json() type #8027

Merged
lhoestq merged 12 commits into main from json-type on Mar 9, 2026
@lhoestq commented Feb 26, 2026

A Json() type is needed when some fields don't have a fixed type, e.g. the "tools" and "tool_calls" fields in conversation + tool-calling datasets such as the dataclaw datasets. Cc @peteromallet, @WoctorDho and @Nanbeige for viz.

Examples of supported tool-calling / dataclaw datasets:

from datasets import load_dataset

ds = load_dataset("peteromallet/dataclaw-peteromallet")  # happens to not need Json() since tool types are fixed
ds = load_dataset("woctordho/dataclaw")
ds = load_dataset("Nanbeige/ToolMind")

The Json() type is applied automatically when loading JSON files in which mixed types are found. It is set on the list that contains the objects with mixed types, so the "tools" and "tool_calls" columns end up with a List(Json()) type.
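To illustrate what "mixed types" means here, the following is a minimal sketch in plain Python. It is not the detection logic used by the library (which this description does not show); the helper name `find_mixed_keys` and the sample `tools` list are hypothetical, and the check only looks at top-level keys.

```python
from typing import Any


def find_mixed_keys(objects: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Return the keys whose non-null values have more than one Python type.

    Hypothetical helper for illustration only; it inspects top-level keys
    and ignores None, which a fixed-type column can usually hold as null.
    """
    seen: dict[str, set[str]] = {}
    for obj in objects:
        for key, value in obj.items():
            if value is None:
                continue
            seen.setdefault(key, set()).add(type(value).__name__)
    # Keep only the keys that were observed with at least two distinct types.
    return {key: sorted(types) for key, types in seen.items() if len(types) > 1}


# Made-up tool definitions: "default" is an int in one object and a str in the other.
tools = [
    {"name": "search", "default": 5},
    {"name": "lookup", "default": "en"},
]
mixed = find_mixed_keys(tools)
print(mixed)
```

A list like `tools` cannot be described by a single fixed type for "default", which is the kind of situation where a List(Json()) column is useful.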

It is also possible to define a Dataset with a Json() type like this:

>>> from datasets import load_dataset, Dataset, Features, Json, List
>>> example = {"a": [{"key": 0}, {"another-key": "another-type"}]}
>>> features = Features({"a": List(Json())})
>>> ds = Dataset.from_list([example], features=features)
>>> ds[0]
{'a': [{'key': 0}, {'another-key': 'another-type'}]}

close #7869
close #4120
close #5827
close #7418
related to #7092
related to #6162
related to #2799
related to https://huggingface.co/datasets/PatoFlamejanteTV/LocalLLaMA/discussions/1

@lhoestq merged commit d560b58 into main on Mar 9, 2026
15 checks passed
@lhoestq deleted the json-type branch on March 9, 2026 at 15:58