LLM & AI Agent Dataset Doc|Agent Tool Use Web|AI Agent Marketplace | Huggingface
datasets>=4.4.0
torch>=2.2.0
transformers>=4.57.0
trl>=0.26.2
Register at DeepNLP Datasets and new users will have bonus credit at Billing enought to download the datasets.
For more production ready commercial dataset, Please contact deepnlp.contact@gmail.com
In this tutorial, we will run function call SFT datasets and open source LLM model.
We are using the transformers and trl package as the base lib for SFT.
Firstly, download data and put the json file in folder e.g. "download_dataset_path": ../dataset/function_call/example_deepnlp_agent_function_call_202510.json
Download Full Dataset Document and 50 examples function call Direct Download
mkdir ../train/dataset/function_call
cd ../train/dataset/function_call
wget https://static.aiagenta2z.com/scripts/doc/file/06766e91894147319ffd0116b04ff94d/example_deepnlp_agent_function_call_202510.json
cd ../../scripts
python train_function_call_sft_qwen3.py
Start SFT Output
In [54]: trainer.train()
...: trainer.save_model(output_model_dir)
...:
### Logs
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': 151643}.
0%| | 0/21 [00:00<?, ?it/s]/Library/Frameworks/
num_items_in_batch = self.accelerator.gather(num_items_in_batch.to(device)).sum()
Download Full Dataset Document and 50 examples Function Call RL Data DatasetsDirect Download
This RL scenario is suitable to improve on a list of tool call and messages session list, not the tool_call output.
Converted Dataset Example:
prompt: is a list of messages, in which users asked Agent to draw of pie-chart of 30 categories. rewards: 1.0, user clicks on the final responses, the MCP tool call and the returned the tool results can generated url https://mdn.alipayobjects.com/one_clip/afts/img/JEeRRbKd44wAAAAAYzAAAAgAoEACAQFr/original
Use the scalar_lookup_reward_fn() in the tutorial.
{
"prompt":
[{'role': 'user',
'content': "Draw a pie chart of these data, display percentage not raw number. {'finance': 79, 'AI AGENT': 304, 'knowledge-and-memory': 98, 'Audio Generator': 53, 'Heathcare': 22, 'other': 93, 'art-and-culture': 13, 'BUSINESS': 12, 'AI Agent Marketplace': 38, 'calendar-management': 140, 'Search Agent': 88, 'Travel': 85, 'RESEARCH': 68, 'AI Search Agent': 35, 'AI AGENT PLATFORM': 41, 'AI Tutor Agent': 31, 'Recommendation Agent': 14, 'BENCHMARK': 21, 'browser-automation': 87, 'Trip Planning': 19, 'Coding Agent': 69, 'Robotaxi': 23, 'WORKFLOW': 73, 'Tool Agent': 62, 'Data Analysis': 21, 'CHATBOT': 93, 'AI Docs': 26, 'Email Writing': 65, 'AI Agent Tool': 18, 'Assistant': 40, 'CODING AGENT': 217, 'Ads': 33, 'Music Generator': 115, 'Science': 16, 'Quadruped Robot': 34, 'ASSISTANT': 63, 'Marketing': 18, 'AI Law Agent': 17, 'AI AGENTS FRAMEWORKS': 108, 'Humanoid Robot': 34, 'file-systems': 60, 'Website Builder': 33, 'entertainment-and-media': 51, 'Developer': 8, 'AI Agent Framework': 7, 'Healthcare': 39, 'Orchestrating Agents': 36, 'Image Generator': 50, 'Law': 36, 'AI4Science Agent': 38, 'TRANSLATION': 13, 'FINANCE': 19, 'SCIENCE': 19, 'Tool Libraries': 29, 'Recruiting': 30, 'VIDEO GENERATOR': 3, 'developer-tools': 34, 'communication': 100, 'Search Engine Agent': 30, 'Entertainment': 16, 'Legal': 5, 'AI Agent Employees': 19, 'AI Agent Teacher': 16, 'Business': 18, 'Chatbot': 41, 'Deep Research Agent': 11, 'Education': 22, 'Benchmark': 13, 'Software Testing': 30, 'Sales': 23, 'AI Agent Healthcare': 33, 'Embodied AI': 41, 'research-and-data': 41, 'search': 18, 'Productivity': 15, 'AI Agent Finance': 11, 'DESIGN': 3, 'Desktop Use': 15, 'AI Shopping': 64, 'PRODUCTIVITY': 9, 'AI SECURITY': 12, 'Video Generator': 58, 'AI Agent': 50, 'Gaming Agent': 25, 'AI Agent Directory': 11, 'Search Recommendation': 9, 'Digital Workers': 23, 'AI Security': 19, 'Research': 19, 'Blog Writing': 11, 'AI Agent Platform': 20, 'AI Agent Legal': 22, 'AI AGENT MARKETPLACE': 2, 'location-services': 27, 'AI Agent Employee': 12, 'AI Agent Memory': 14, 'AI HR Agent': 1, 'AI Agent Orchestration': 9, 'AI Agents Frameworks': 40, 'AI Agent Search': 10, 'SOCIAL': 1, 'AI Employee': 31, 'SALES': 13, 'Customer Service': 42, 'BLOG WRITING': 3, 'EMBODIED AI': 4, 'AI Gadget': 7, 'Mobile Use': 3, 'HR Agent': 12, 'FILE': 2, 'File System': 1, 'Database': 4, 'Finance': 13, 'PAYMENT': 2, 'AI DOCS': 3, 'AI SHOPPING': 1, 'Workflow': 6, 'WEB': 2, 'AI AGENT MEMORY': 12, 'AI AGENT BUILDER': 9, 'EDUCATION': 3, 'AUTONOMOUS AGENT': 2, 'Workplace': 15, 'Operations': 8, 'ADVERTISING': 1, 'Gaming Agents': 8, 'MAP': 2, 'Translation': 14, 'Browser': 1, 'BLOCKCHAIN': 1, 'Autonomous Agent': 5, 'Workflow ': 1, 'DATA': 1, 'HEALTHCARE': 4, 'IMAGE GENERATOR': 3, 'AI Agent Frameworks': 4, 'MARKETING': 2, 'SEARCH': 1, 'AI Entertainment Agent': 2, 'AI Agent Law': 4, 'AI AGENT DIRECTORY': 1, 'Official': 1, 'COPILOT AGENT': 1, 'DATABASE': 1, 'ENTERTAINMENT': 1, 'AI Agent Index': 1, 'Map': 1, 'Memory': 2, 'Thinking': 1, 'AI Operations Agent': 2, 'Browser Use': 1}"},
{'role': 'user',
'content': '<div class="segment-container"><div class="file-container-grid">\n <div class="message_file_display_wrapper" data-file-name="employees.xlsx" data-file-url="/files-wd/download/demo/demo_session_id/employees.xlsx" data-file-type="xlsx">\n <div class="file_display_icon_wrapper">\n <img class="file_display_icon" src="/static/img/excel-icon-transparent.png"></img>\n </div>\n <div>\n <span class="file-name">employees.xlsx</span>\n </div>\n </div>\n </div></div>'},
{'role': 'assistant',
'content': '<img class="agent_display_icon" src="https://agent.deepnlp.org/static/img/icon_chatbot_agent.png"> Multi Agents Start Running<br>Agent (plan_agent) Start Planning Tasks<br>Agent (base_agent) Execution Start Tasks <br><p>Iteration 1 Running Tool <span class="span_planning_highlight">generate_pie_chart</span> of MCP Server <span class="span_planning_highlight">mcp-server-chart</span></p><br><p>Iteration 2 Running Tool <span class="span_planning_highlight">generate_pie_chart</span> of MCP Server <span class="span_planning_highlight">mcp-server-chart</span></p><br><p>Iteration 3 Running Tool <span class="span_planning_highlight">generate_pie_chart</span> of MCP Server <span class="span_planning_highlight">mcp-server-chart</span></p><br><br>Agent (base_agent) Execution End <br>Agent (generation_agent) Start Generating Answers<br>\n<url>https://mdn.alipayobjects.com/one_clip/afts/img/JEeRRbKd44wAAAAAYzAAAAgAoEACAQFr/original</url>'}],
"rewards": 1.0
}
Preparing Datasets for trl
mkdir ../train/dataset/rl
cd ../train/dataset/rl
wget https://static.aiagenta2z.com/scripts/doc/file/4619bcee15bf4a189044052791c44eac/example_deepnlp_rl_feedback_202510.json
Running RL Training of the Datasets
trl package
cd ../../scripts
python train_rl_grpo_qwen3_openai_trl.py
verl package
cd ../../scripts
python train_rl_grpo_qwen3_openai_verl.py
## verl_train.parquet, "../train/dataset/rl/verl_train.parquet", Download Cache Model From Huggingface/Dashscope
python -m verl.trainer.main_ppo \
data.train_files="../train/dataset/rl/verl_train.parquet" \
data.val_files="../train/dataset/rl/verl_train.parquet" \
data.train_batch_size=1 \
data.max_prompt_length=512 \
data.max_response_length=512 \
algorithm.adv_estimator=grpo \
actor_rollout_ref.model.path="../train/model/qwen3/Qwen3-0.6B/Qwen/Qwen3-0___6B" \
actor_rollout_ref.actor.ppo_mini_batch_size=1 \
actor_rollout_ref.rollout.n=4 \
actor_rollout_ref.rollout.name=hf \
trainer.n_gpus_per_node=0 \
trainer.nnodes=1 \
trainer.project_name="grpo_qwen3_rl_verl"
If you want to improve the tool_call accuracy Pass@K, such as 'tool choice' accuracy or parameters inference accuracy, you can organize the datasets together with the function call datasets and get the completion history of available tools, prompt, and tool_call output.
## Example Dataset Formating using tool_call_reward_fn reward function
prompt Input: [{"role": "user", "content": "What is the weather in SF?"}]
ground truth: get_weather(city='SF')
reward function:
{
"data_source": "tool_call_weather",
"prompt": [{"role": "user", "content": "What is the weather in SF?"}],
"ability": "tool_call",
"reward_model": {
"style": "rule",
"ground_truth": "get_weather(city='SF')"
}
}
## function call result in the function call dataset
## Completion Output
{"tool_calls":[{"id":"call_41d68645f2c94f1db65dd5","type":"tool_use","function":{"name":"maps_geocode","arguments":"{\"address\": \"John F. Kennedy International Airport, New York\"}"}}]}
Use the tool_call_reward_fn in the tutorial, which fetch the tools_calls, and agaist the ground_truth if available.
MCP Marketplace
AI Agent Marketplace
ChatGPT Apps Store Marketplace