
Tasks created with the general analysis tool always show "export failed" after execution #171

@wenbindeng


The detailed log is below. Could you help check what is causing this?

2026-04-13 15:19:10 | INFO     | data_engine.ops.base_op:635 - ============================================================
2026-04-13 15:19:10 | INFO     | data_engine.ops.base_op:635 - [flagged_words_filter] Filter Summary Statistics
2026-04-13 15:19:10 | INFO     | data_engine.ops.base_op:635 - ============================================================
2026-04-13 15:19:10 | INFO     | data_engine.ops.base_op:635 - Total samples: 2
2026-04-13 15:19:10 | INFO     | data_engine.ops.base_op:635 - Kept samples: 2 (100.00%)
2026-04-13 15:19:10 | INFO     | data_engine.ops.base_op:635 - Filtered samples: 0 (0.00%)
2026-04-13 15:19:10 | INFO     | data_engine.ops.base_op:635 - 
2026-04-13 15:19:10 | INFO     | data_engine.ops.base_op:635 - No samples filtered. All samples passed the filter.
2026-04-13 15:19:10 | INFO     | data_engine.ops.base_op:635 - 
2026-04-13 15:19:10 | INFO     | data_engine.ops.base_op:635 - Filter parameters:
2026-04-13 15:19:10 | INFO     | data_engine.ops.base_op:635 -   - Language: zh
2026-04-13 15:19:10 | INFO     | data_engine.ops.base_op:635 -   - Tokenization: False
2026-04-13 15:19:10 | INFO     | data_engine.ops.base_op:635 -   - Max ratio: 0.001 (0.10%)
2026-04-13 15:19:10 | INFO     | data_engine.ops.base_op:635 -   - Use words augmentation: True
2026-04-13 15:19:10 | INFO     | data_engine.ops.base_op:635 - ============================================================
2026-04-13 15:19:10 | INFO     | data_engine.core.data:202 - OP [flagged_words_filter] Done in 0.808s. Left 2 samples.
2026-04-13 15:19:10 | DEBUG    | data_engine.utils.process_utils:30 - Setting multiprocess start method to 'fork'
2026-04-13 15:19:10 | DEBUG    | data_engine.ops.base_op:182 - Op [text_length_filter] running with number of procs:3
2026-04-13 15:19:10 | DEBUG    | data_engine.ops.base_op:182 - Op [text_length_filter] running with number of procs:3

text_length_filter_compute_stats (num_proc=2):   0%|          | 0/2 [00:00<?, ? examples/s]
text_length_filter_compute_stats (num_proc=2):  50%|#####     | 1/2 [00:00<00:00,  9.21 examples/s]
text_length_filter_compute_stats (num_proc=2): 100%|##########| 2/2 [00:00<00:00,  8.82 examples/s]
2026-04-13 15:19:11 | DEBUG    | data_engine.ops.base_op:182 - Op [text_length_filter] running with number of procs:3

text_length_filter_process (num_proc=2):   0%|          | 0/2 [00:00<?, ? examples/s]
text_length_filter_process (num_proc=2): 100%|##########| 2/2 [00:00<00:00, 10.29 examples/s]
2026-04-13 15:19:11 | INFO     | data_engine.ops.base_op:635 - ============================================================
2026-04-13 15:19:11 | INFO     | data_engine.ops.base_op:635 - [text_length_filter] Filter Summary Statistics
2026-04-13 15:19:11 | INFO     | data_engine.ops.base_op:635 - ============================================================
2026-04-13 15:19:11 | INFO     | data_engine.ops.base_op:635 - Total samples: 2
2026-04-13 15:19:11 | INFO     | data_engine.ops.base_op:635 - Kept samples: 2 (100.00%)
2026-04-13 15:19:11 | INFO     | data_engine.ops.base_op:635 - Filtered samples: 0 (0.00%)
2026-04-13 15:19:11 | INFO     | data_engine.ops.base_op:635 - 
2026-04-13 15:19:11 | INFO     | data_engine.ops.base_op:635 - No samples filtered. All samples passed the filter.
2026-04-13 15:19:11 | INFO     | data_engine.ops.base_op:635 - 
2026-04-13 15:19:11 | INFO     | data_engine.ops.base_op:635 - Filter parameters:
2026-04-13 15:19:11 | INFO     | data_engine.ops.base_op:635 -   - Min length: 10 characters
2026-04-13 15:19:11 | INFO     | data_engine.ops.base_op:635 -   - Max length: 999999 characters
2026-04-13 15:19:11 | INFO     | data_engine.ops.base_op:635 - ============================================================
2026-04-13 15:19:11 | INFO     | data_engine.core.data:202 - OP [text_length_filter] Done in 0.692s. Left 2 samples.
2026-04-13 15:19:11 | INFO     | data_engine.tools.legacies.analyzer:101 - Exporting dataset to disk...
2026-04-13 15:19:11 | INFO     | data_engine.exporter.base_exporter:130 - Exporting computed stats into a single file...

Creating json from Arrow format:   0%|          | 0/1 [00:00<?, ?ba/s]
Creating json from Arrow format: 100%|##########| 1/1 [00:00<00:00, 144.71ba/s]
2026-04-13 15:19:11 | ERROR    | data_server.job.JobExecutor:147 - Job 54 execution failed with error: [Errno 2] No such file or directory: '/data/dataflow_data/dwb-test222_edba79c2-c1f0-4fe3-8925-1616158e803d/output/_df_dataset_stats.jsonl/_data/x_stats.jsonl'
2026-04-13 15:19:11 | INFO     | data_server.job.JobExecutor:157 - Job 54 marked as FAILED
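
One possible reading of the final error, not confirmed against the data_engine source: the path in the `[Errno 2]` message treats `_df_dataset_stats.jsonl` as a directory and then writes `_data/x_stats.jsonl` beneath it, so the write would fail if that parent directory was never created. The sketch below only reproduces that general Python behaviour in a temporary directory. The helper names `write_stats` and `demo` and the sample record are hypothetical, and nothing here is taken from the exporter's actual code.

```python
import os
import tempfile


def write_stats(path: str, lines: list[str], ensure_parent: bool) -> None:
    """Write JSONL lines to `path`, optionally creating its parent directory first."""
    if ensure_parent:
        # exist_ok=True makes this safe to call when the directory already exists.
        os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


def demo() -> tuple[bool, bool]:
    """Return (failed_without_parent, file_exists_after_creating_parent)."""
    with tempfile.TemporaryDirectory() as root:
        # Same nesting as the path in the log, rooted in a throwaway directory.
        target = os.path.join(
            root, "output", "_df_dataset_stats.jsonl", "_data", "x_stats.jsonl"
        )
        record = '{"text_len": 12}'  # placeholder record, not real stats

        try:
            write_stats(target, [record], ensure_parent=False)
            failed_without_parent = False
        except FileNotFoundError:
            # open() raises [Errno 2] when any parent directory is missing.
            failed_without_parent = True

        write_stats(target, [record], ensure_parent=True)
        return failed_without_parent, os.path.isfile(target)


if __name__ == "__main__":
    print(demo())
```

If this reading applies, checking whether `/data/dataflow_data/.../output/_df_dataset_stats.jsonl/_data/` exists on the worker, and whether the job's user can create it, may narrow down the cause.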
