Commit 701a698

Merge branch 'develop' into add_original_names
2 parents a2f80c6 + 5081fe8 commit 701a698

246 files changed

Lines changed: 245932 additions & 1482 deletions

.github/workflows/Validate-GPU.yml

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -85,6 +85,16 @@ jobs:
8585
bash ${work_dir}/tools/ci/check_validate.sh
8686
'
8787
88+
- name: Run Unit Test
89+
env:
90+
work_dir: ${{ github.workspace }}
91+
if: steps.check-bypass.outputs.can-skip != 'true'
92+
run: |
93+
docker exec -t ${{ env.container_name }} /bin/bash -c '
94+
source ${{ github.workspace }}/../../../proxy
95+
bash ${work_dir}/tools/ci/run_unittest.sh
96+
'
97+
8898
- name: Terminate and delete the container
8999
if: always()
90100
run: |

README.md

Lines changed: 4 additions & 3 deletions
@@ -8,9 +8,10 @@
88
<a href="https://github.com/user-attachments/assets/125e3494-25c9-4494-9acd-8ad65ca85d03"><img src="https://img.shields.io/badge/微信-green?logo=wechat&amp"></a>
99
</div>
1010

11-
**GraphNet** is a large-scale dataset of deep learning **computation graphs**, built as a standard benchmark for **tensor compiler** optimization. It provides over 2.7K computation graphs extracted from state-of-the-art deep learning models spanning diverse tasks and ML frameworks. With standardized formats and rich metadata, GraphNet enables fair comparison and reproducible evaluation of the general optimization capabilities of tensor compilers, thereby supporting advanced research such as AI for System on compilers.
11+
**GraphNet** is a large-scale dataset of deep learning **computation graphs**, built as a standard benchmark for **tensor compiler** optimization. It provides over 2.7K computation graphs extracted from state-of-the-art deep learning models spanning diverse tasks and ML frameworks. With standardized formats and rich metadata, GraphNet enables fair comparison and reproducible evaluation of the general optimization capabilities of tensor compilers, thereby supporting advanced research such as AI for System on Compilers.
1212

1313
## 📣 News
14+
- [2025-11-19] ✨ Keynote Speech at GTOC Forum 2025: [GraphNet Empowering the AI Software Stack](https://b23.tv/PFzSKK1)
1415
- [2025-10-14] ✨ Our technical report is out: a detailed study of dataset construction and compiler benchmarking, introducing the novel performance metrics Speedup Score S(t) and Error-aware Speedup Score ES(t). [📘 GraphNet: A Large-Scale Computational Graph Dataset for Tensor Compiler Research](https://arxiv.org/abs/2510.24035)
1516
- [2025-8-20] 🚀 The second round of [open contribution tasks](https://github.com/PaddlePaddle/Paddle/issues/74773) was released. (completed ✅)
1617
- [2025-7-30] 🚀 The first round of [open contribution tasks](https://github.com/PaddlePaddle/GraphNet/issues/44) was released. (completed ✅)
@@ -84,7 +85,7 @@ python -m graph_net.plot_violin \
8485
--output-dir $GRAPH_NET_BENCHMARK_PATH
8586
```
8687

87-
The scripts are designed to process a file structure as `/benchmark_path/category_name/`, and items on x-axis are identified by name of the sub-directories. After executing, several summary plots of result in categories (model tasks, libraries...) will be exported to `$GRAPH_NET_BENCHMARK_PATH`.
88+
The scripts are designed to process a file structure of the form `/benchmark_path/category_name/`, and items on the x-axis are identified by the names of the sub-directories. After execution, several summary plots of the results by category (model tasks, libraries, etc.) are exported to `$GRAPH_NET_BENCHMARK_PATH`.
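
The directory convention described above can be illustrated with a short sketch. This is only an assumption about how category names might be collected from sub-directories; `list_categories` and the category names `nlp` and `cv` are hypothetical and are not part of the `graph_net` plotting scripts.

```python
import tempfile
from pathlib import Path
from typing import List


def list_categories(benchmark_path: str) -> List[str]:
    """Return sorted sub-directory names; each one would label an x-axis item."""
    root = Path(benchmark_path)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Build a throwaway layout of the form /benchmark_path/category_name/.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("nlp", "cv"):           # hypothetical category names
        (Path(tmp) / name).mkdir()
    (Path(tmp) / "log.log").touch()      # plain files are not categories
    print(list_categories(tmp))
```

Only directories are treated as categories here, so a log file stored next to them does not show up as an x-axis item.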
8889

8990
### Hardware Regression Testing
9091
We also provide a two-step workflow that validates compiler correctness and performance against a "golden" reference, which is crucial for hardware-specific testing and regression tracking. Details can be found in this [guide](./docs/hardware_test.md).
@@ -105,7 +106,7 @@ Check out the [Construction Guide](./docs/README_contribute.md) for details on t
105106

106107
## GraphNet Community
107108

108-
You can join our community via following group chats. Welcome to ask any questions about using and building GraphNet.
109+
You can join our community through the following group chats. Feel free to ask any questions about using and building GraphNet.
109110

110111
<div align="center">
111112
<table>

README_cn.md

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
1+
2+
<h1 align="center">GraphNet: A Large-Scale Computational Graph Dataset for Tensor Compiler Research</h1>
3+
4+
<div align="center">
5+
6+
![](https://img.shields.io/github/issues/PaddlePaddle/GraphNet?label=open%20issues)
7+
[![arXiv](https://img.shields.io/badge/arXiv-2510.24035-b31b1b.svg)](https://arxiv.org/abs/2510.24035)
8+
<a href="https://github.com/user-attachments/assets/125e3494-25c9-4494-9acd-8ad65ca85d03"><img src="https://img.shields.io/badge/微信-green?logo=wechat&amp"></a>
9+
</div>
10+
11+
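**GraphNet** is a large-scale dataset of deep learning **computation graphs**, built as a standard benchmark for **tensor compiler** optimization. It provides over 2,700 computation graphs extracted from state-of-the-art deep learning models spanning diverse tasks and ML frameworks. With standardized formats and rich metadata, GraphNet enables fair comparison and reproducible evaluation of the general optimization capabilities of tensor compilers, thereby supporting advanced research such as "AI for System" on compilers.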
12+
13+
## 📣 News
14+
- [2025-11-19] ✨ Keynote speech at GTOC Forum 2025: [GraphNet Empowering the AI Software Stack](https://b23.tv/PFzSKK1)
15+
- [2025-10-14] ✨ Our technical report is out: a detailed study of dataset construction and compiler benchmarking, introducing the novel performance metrics Speedup Score S(t) and Error-aware Speedup Score ES(t). [📘 GraphNet: A Large-Scale Computational Graph Dataset for Tensor Compiler Research](https://arxiv.org/abs/2510.24035)
16+
- [2025-8-20] 🚀 The second round of [open contribution tasks](https://github.com/PaddlePaddle/Paddle/issues/74773) was released. (completed ✅)
17+
- [2025-7-30] 🚀 The first round of [open contribution tasks](https://github.com/PaddlePaddle/GraphNet/issues/44) was released. (completed ✅)
18+
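## 📊 Benchmark Results
19+
We evaluate two representative tensor compiler backends, CINN (PaddlePaddle) and TorchInductor (PyTorch), on the NLP and CV subsets of GraphNet. The evaluation uses two quantitative metrics introduced in the [technical report](https://arxiv.org/abs/2510.24035):
20+
- **Speedup Score** S(t): evaluates compiler performance under varying numerical tolerance levels.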
21+
<div align="center">
22+
<img src="/pics/St-result.jpg" alt="Speedup Score S_t Results" width="80%">
23+
</div>
24+
25+
- **Error-aware Speedup Score** ES(t): further accounts for runtime and compilation errors.
26+
<div align="center">
27+
<img src="/pics/ESt-result.jpg" alt="Error-aware Speedup Score ES_t Results" width="80%">
28+
29+
</div>
30+
31+
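## ⚡ Quick Start
32+
This section shows compiler users and developers how to evaluate tensor compilers and reproduce the benchmark results, and shows GraphNet contributors how to contribute new computation graphs.
33+
34+
### ⚖️ Compiler Evaluation
35+
36+
**Step 1: Benchmark**
37+
38+
Use `graph_net.torch.test_compiler` to benchmark GraphNet samples with specific batch and logging configurations: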
39+
40+
```bash
41+
# Set your benchmark directory
42+
export GRAPH_NET_BENCHMARK_PATH=/home/yourname/graphnet_benchmark/
43+
44+
# Run the benchmark
45+
python -m graph_net.torch.test_compiler \
46+
--model-path $GRAPH_NET_EXTRACT_WORKSPACE/model_name/ \
47+
--compiler /custom/or/builtin/compiler/ \
48+
--device /device/to/execute/ \
49+
--warmup /times/to/warmup/ \
50+
--trials /times/to/test/ \
51+
> $GRAPH_NET_BENCHMARK_PATH/log.log 2>&1
52+
53+
# Note: if --compiler is omitted, PyTorch's built-in compiler is used by default.
54+
```
55+
56+
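After execution, `graph_net.torch.test_compiler` will:
57+
1. Run the original model in eager mode to record a baseline.
58+
2. Compile the model with the specified backend (e.g., CINN, TVM, Inductor, TensorRT, XLA, BladeDISC).
59+
3. Execute the compiled model and collect its runtime and outputs.
60+
4. If no execution failure occurs, compare the compiled results against the baseline and compute the speedup.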
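
The four steps above can be sketched without any compiler dependency. The following is a minimal, hypothetical illustration of the timing and comparison logic only: `measure` and `speedup` are names invented here and are not part of `graph_net`, the two stand-in functions replace the eager and compiled models, and the median-based ratio is an assumption about how a speedup might be computed, not the tool's actual formula.

```python
import statistics
import time
from typing import Callable, List


def measure(fn: Callable[[], object], warmup: int, trials: int) -> List[float]:
    """Time `fn` for `trials` runs after `warmup` untimed runs (hypothetical helper)."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(trials):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def speedup(baseline: List[float], compiled: List[float]) -> float:
    """Ratio of median baseline time to median compiled time (assumed definition)."""
    return statistics.median(baseline) / statistics.median(compiled)


# Stand-ins for the eager model and the compiled model.
def eager_model() -> int:
    return sum(i * i for i in range(20_000))


def compiled_model() -> int:
    return sum(i * i for i in range(20_000))


baseline_times = measure(eager_model, warmup=2, trials=5)
compiled_times = measure(compiled_model, warmup=2, trials=5)
print(f"speedup: {speedup(baseline_times, compiled_times):.2f}x")
```

The `warmup` and `trials` arguments mirror the `--warmup` and `--trials` flags shown earlier; a value above 1.0 would mean the compiled run was faster than the baseline.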
61+
62+
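**Step 2: Analysis**
63+
64+
Use the three scripts `graph_net.plot_St`, `graph_net.plot_ESt`, and `graph_net.plot_violin` to generate St plots, ESt plots, and [violin plots](https://en.m.wikipedia.org/wiki/Violin_plot) from the speedup, correctness, and runtime information in the benchmark logs: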
65+
66+
```bash
67+
python -m graph_net.plot_St \
68+
--benchmark-path $GRAPH_NET_BENCHMARK_PATH/log.log \
69+
--output-dir $GRAPH_NET_BENCHMARK_PATH \
70+
--negative-speedup-penalty penalty/power/for/negative/speedup \
71+
--fpdb base/penalty/for/severe/errors
72+
73+
python -m graph_net.plot_ESt \
74+
--benchmark-path $GRAPH_NET_BENCHMARK_PATH/log.log \
75+
--output-dir $GRAPH_NET_BENCHMARK_PATH \
76+
--negative-speedup-penalty penalty/power/for/negative/speedup \
77+
--fpdb base/penalty/for/severe/errors
78+
79+
# Note: if --negative-speedup-penalty is omitted, p=0 is used by default.
80+
# If --fpdb is omitted, b=0.1 is used by default.
81+
82+
python -m graph_net.plot_violin \
83+
--benchmark-path $GRAPH_NET_BENCHMARK_PATH/JSON_results/ \
84+
--output-dir $GRAPH_NET_BENCHMARK_PATH
85+
```
86+
87+
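The scripts are designed to process a file structure of the form `/benchmark_path/category_name/`, and items on the x-axis are identified by the names of the sub-directories. After execution, summary plots of the results by category (model tasks, libraries, etc.) are exported to `$GRAPH_NET_BENCHMARK_PATH`.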
88+
89+
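### Hardware Regression Testing
90+
We also provide a two-step workflow that validates compiler correctness and performance against a "golden" reference, which is crucial for hardware-specific testing and regression tracking. Details can be found in this [guide](./docs/hardware_test_cn.md).
91+
92+
### 🧱 Construction & Contribution Guide
93+
Want to understand how GraphNet is built or contribute new samples? Check out the [Construction Guide](./docs/README_contribute_cn.md) for details on the extraction and validation workflow.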
94+
95+
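## 🚀 Future Roadmap
96+
97+
1. Scale GraphNet to 10,000+ computation graphs.
98+
2. Annotate GraphNet samples with finer-grained sub-categories.
99+
3. Extract samples from multi-GPU scenarios to support benchmarking and optimization of large-scale distributed computing.
100+
4. Support splitting full graphs into independently optimizable subgraphs and operator sequences.
101+
102+
**Vision**: GraphNet aims to lay the foundation for AI for Compiler by enabling **large-scale, systematic evaluation** of tensor compiler optimizations and by **providing a dataset for models to learn** and transfer optimization strategies.
103+
104+
## GraphNet Community
105+
106+
You can join our community by scanning the group chat QR code below. Feel free to ask any questions about using and building GraphNet.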
107+
108+
<div align="center">
109+
<table>
110+
<tr>
111+
<td align="center">
112+
<img width="200" src="https://github.com/user-attachments/assets/125e3494-25c9-4494-9acd-8ad65ca85d03" />
113+
</td>
114+
<td align="center">
115+
<img width="150" src="https://cdn.prod.website-files.com/6257adef93867e50d84d30e2/67d00cf7266d2c75571aebde_Example.svg" />
116+
<p><a href="https://discord.gg/vyeAydwh">Channel</a> is also available.</p>
117+
</td>
118+
</tr>
119+
</table>
120+
</div>
121+
122+
## License and Acknowledgement
123+
124+
GraphNet is released under the [MIT License](./LICENSE).
125+
126+
If you find this project helpful for your research or work, please cite:
127+
128+
```bibtex
129+
@misc{li2025graphnetlargescalecomputationalgraph,
130+
title={GraphNet: A Large-Scale Computational Graph Dataset for Tensor Compiler Research},
131+
author={Xinqi Li and Yiqun Liu and Shan Jiang and Enrong Zheng and Huaijin Zheng and Wenhao Dai and Haodong Deng and Dianhai Yu and Yanjun Ma},
132+
year={2025},
133+
eprint={2510.24035},
134+
archivePrefix={arXiv},
135+
primaryClass={cs.LG},
136+
url={https://arxiv.org/abs/2510.24035},
137+
}
138+
```

docs/CONTRIBUTE_TUTORIAL.md

Lines changed: 6 additions & 19 deletions
@@ -255,35 +255,22 @@ python -m graph_net.config \
255255
--email "john@example.com"
256256
```
257257
258-
2. **Package the graph**
258+
2. **Commit the changes**
259259
260+
Move the new sample to the **samples** directory and commit.
260261
```bash
261-
python -m graph_net.pack --output /path/to/output.zip --clear-after-pack True
262-
```
263-
264-
This API:
265-
266-
a. Packages all files under `$GRAPH_NET_EXTRACT_WORKSPACE` into `/path/to/output.zip` (You can set it to `GraphNet/samples`)
267-
268-
b. Clears the workspace if `--clear-after-pack` is `True`
269-
270-
Note: If third-party ops are used, contributors must include them manually in the package. As long as `validate` passes, no specific folder structure is required.
271-
272-
3. **Commit the changes**
273-
274-
Move the packaged computational graph in the previous step to **samples** directory and commit.
275-
```bash
276-
git add <the packaged computational graph>
262+
git add <the new sample>
277263
git commit -m "Description"
278264
```
265+
Note: If third-party ops are used, contributors must include them manually in the package.
279266
280-
4. **Push the branch to your fork**
267+
3. **Push the branch to your fork**
281268
282269
```bash
283270
git push origin feature/your-branch-name
284271
```
285272
286-
5. **Submit a Pull Request**
273+
4. **Submit a Pull Request**
287274
288275
> **Note**: For clarity and maintainability, each PR should follow the Single Responsibility Principle (SRP). Submit only a single graph or a focused feature improvement per PR. For example, if you both update extraction logic and collect multiple models, each graph and each method update should be a separate PR.
289276

docs/CONTRIBUTE_TUTORIAL_cn.md

Lines changed: 7 additions & 17 deletions
@@ -247,32 +247,22 @@ python -m graph_net.config \
247247
--username "john_doe" \
248248
--email "john@example.com"
249249
```
250-
2. **Package the graph**
251250

252-
```bash
253-
python -m graph_net.pack --output /path/to/output.zip --clear-after-pack True
254-
```
255-
This API:
256-
257-
a. Packages all files under `$GRAPH_NET_EXTRACT_WORKSPACE` into `/path/to/output.zip` (you can set it to `GraphNet/samples`)
258-
259-
b. Clears `$GRAPH_NET_EXTRACT_WORKSPACE` after packing if `--clear-after-pack` is `True`
251+
2. **Commit the changes**
260252

261-
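Note: If third-party ops are used, contributors must include them manually in the packaged graph archive. No specific folder structure is required for now; as long as `validate` passes, the sample meets the acceptance criteria.
262-
263-
3. **Commit the changes**
264-
265-
Move the graph archive packaged in the previous step to the **samples** directory and commit.
253+
Move the new graph sample to the **samples** directory and commit.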
266254
```bash
267-
git add <the packaged graph archive>
255+
git add <the new graph sample>
268256
git commit -m "Description"
269257
```
270-
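4. **Push the branch to the remote** (your fork)
258+
Note: If third-party ops are used, contributors must include them manually in the packaged graph archive.
259+
260+
3. **Push the branch to the remote** (your fork)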
271261

272262
```bash
273263
git push origin feature/your-branch-name
274264
```
275-
5. **Submit a Pull Request**
265+
4. **Submit a Pull Request**
276266

277267
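> **Note**: For ease of management, each PR should follow the Single Responsibility Principle (SRP): **add only a single computation graph, or focus on a single feature improvement**, and avoid mixing multiple changes in one submission. For example, if you modify the extraction method and then collect data to support a class of models, each individual model's graph and the updated extraction method should each be opened as a separate PR.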
278268

docs/README_contribute.md

Lines changed: 4 additions & 4 deletions
@@ -25,7 +25,7 @@ export GRAPH_NET_EXTRACT_WORKSPACE=/home/yourname/graphnet_workspace/
2525
# Extract the ResNet‑18 computation graph
2626
python graph_net/test/vision_model_test.py
2727

28-
# Validate the extracted graph (e.g. /home/yourname/graphnet_workspace/resnet18/)
28+
# Validate the extracted graph (e.g., /home/yourname/graphnet_workspace/resnet18/)
2929
python -m graph_net.torch.validate \
3030
--model-path $GRAPH_NET_EXTRACT_WORKSPACE/resnet18/
3131
```
@@ -45,7 +45,7 @@ Wrap the model with the extractor — that’s all you need:
4545
```python
4646
import graph_net
4747

48-
# Instantiate the model (e.g. a torchvision model)
48+
# Instantiate the model (e.g., a torchvision model)
4949
model = ...
5050

5151
# Extract your own model
@@ -58,14 +58,14 @@ For more details, see docstring of `graph_net.torch.extract` defined in `graph_n
5858

5959
**Step 2: graph_net.torch.validate**
6060

61-
To verify that the extracted model meets requirements, we use `graph_net.torch.validate` in CI tool and also ask contributors to self-check in advance:
61+
To verify that the extracted model meets requirements, we use `graph_net.torch.validate` in the CI tool and also ask contributors to self-check in advance:
6262

6363
```bash
6464
python -m graph_net.torch.validate \
6565
--model-path $GRAPH_NET_EXTRACT_WORKSPACE/model_name
6666
```
6767

68-
All the **construction constraints** will be examined automatically. After passing validation, a unique `graph_hash.txt` will be generated and later checked in CI procedure to avoid redundant.
68+
All **construction constraints** will be examined automatically. After passing validation, a unique `graph_hash.txt` will be generated and later checked in the CI procedure to avoid redundancy.
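
As a rough illustration of how a hash file can guard against duplicate submissions, the sketch below hashes a graph's source text and checks it against hashes already recorded. Everything here is an assumption made for illustration: the use of SHA-256, the `graph_hash` and `is_duplicate` helpers, and hashing the raw source text are not taken from `graph_net.torch.validate`, whose actual hashing scheme is not described in this document.

```python
import hashlib
from typing import Set


def graph_hash(source: str) -> str:
    """Hex SHA-256 digest of a graph's source text (assumed scheme)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def is_duplicate(source: str, known_hashes: Set[str]) -> bool:
    """True if an identical graph source has already been recorded."""
    return graph_hash(source) in known_hashes


# Hashes of graphs already in the dataset (hypothetical content).
known = {graph_hash("def forward(x): return x + 1")}

print(is_duplicate("def forward(x): return x + 1", known))
print(is_duplicate("def forward(x): return x * 2", known))
```

A content hash of this kind only catches byte-identical sources; two graphs that differ in naming but not in structure would need a canonical form before hashing.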
6969

7070
## 📁 Repository Structure
7171
This repository is organized as follows:

docs/README_contribute_cn.md

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
1+
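# Contributing to GraphNet
2+
To guarantee the dataset's overall quality, reproducibility, and cross-compiler compatibility, we define the following construction **constraints**:
3+
4+
1. Runnable: computation graphs must be executable in imperative (eager) mode.
5+
2. Serializable: computation graphs and their corresponding Python code must support serialization and deserialization.
6+
3. Decomposable: the full graph can be decomposed into two disjoint subgraphs.
7+
4. Statically analyzable: operator names within each computation graph must be statically parseable.
8+
5. Custom operators accessible: if custom operators are used, their implementation code must be fully accessible.
9+
10+
## Graph Extraction & Validation
11+
GraphNet provides automated tools for graph extraction and validation.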
12+
13+
<div align="center">
14+
<img src="/pics/graphnet_overview.jpg" alt="GraphNet Architecture Overview" width="65%">
15+
</div>
16+
17+
**Example: Extract and validate ResNet‑18**
18+
```bash
19+
git clone https://github.com/PaddlePaddle/GraphNet.git
20+
cd GraphNet
21+
22+
# Set your workspace directory
23+
export GRAPH_NET_EXTRACT_WORKSPACE=/home/yourname/graphnet_workspace/
24+
25+
# Extract the ResNet-18 computation graph
26+
python graph_net/test/vision_model_test.py
27+
28+
# Validate the extracted graph (e.g., /home/yourname/graphnet_workspace/resnet18/)
29+
python -m graph_net.torch.validate \
30+
--model-path $GRAPH_NET_EXTRACT_WORKSPACE/resnet18/
31+
```
32+
33+
**Workflow Illustration**
34+
35+
<div align="center">
36+
<img src="/pics/dataset_composition.png" alt="GraphNet Extract Sample" width="65%">
37+
</div>
38+
39+
> Note: The source code of custom_op is required **only when** the corresponding custom operator is used in the module, and **no specific format is required**.
40+
41+
**Step 1: `graph_net.torch.extract`**
42+
43+
Wrap the model with the extractor; that's all you need:
44+
45+
```python
46+
import graph_net
47+
48+
# Instantiate the model (e.g., a torchvision model)
49+
model = ...
50+
51+
# Extract your own model
52+
model = graph_net.torch.extract(name="model_name", dynamic=True)(model)
53+
```
54+
55+
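After running, the extracted graph will be saved to `$GRAPH_NET_EXTRACT_WORKSPACE/model_name/`.
56+
57+
For more details, see the docstring of `graph_net.torch.extract` defined in `graph_net/torch/extractor.py`.
58+
59+
**Step 2: `graph_net.torch.validate`**
60+
61+
To verify that the extracted model meets the requirements, we use `graph_net.torch.validate` in the CI tool and also ask contributors to self-check in advance: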
62+
63+
```bash
64+
python -m graph_net.torch.validate \
65+
--model-path $GRAPH_NET_EXTRACT_WORKSPACE/model_name
66+
```
67+
68+
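All **construction constraints** are checked automatically. After validation passes, a unique `graph_hash.txt` is generated and later checked in the CI procedure to avoid duplicates.
69+
70+
## 📁 Repository Structure
71+
This repository is organized as follows:
72+
73+
| Directory | Description |
74+
|------------|--------------|
75+
| **graph_net/** | Core module for graph extraction, validation, and benchmarking |
76+
| **paddle_samples/** | Computation graph samples extracted from PaddlePaddle |
77+
| **samples/** | Computation graph samples extracted from PyTorch |
78+
| **docs/** | Technical documents and contributor guides |
79+
80+
Below is the structure of the **graph_net/** directory: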
81+
```text
82+
graph_net/
83+
├─ config/ # Config files, params
84+
├─ paddle/ # PaddlePaddle graph extraction & validation
85+
├─ torch/ # PyTorch graph extraction & validation
86+
├─ test/ # Unit tests and example scripts
87+
└─ *.py # Benchmark & analysis scripts
