This directory provides scripts for benchmarking the time cost of knowledge graph extraction and concept generation. Unlike the parallel_generation directory, which focuses on parallel processing, this directory is designed specifically for measuring and analyzing extraction performance.
- `1_slice_kg_extraction.py`: Benchmarks the time cost of entity-event triple extraction from text documents, with detailed timing metrics.
- `2_concept_generation.py`: Benchmarks the time cost of concept node generation and graph construction from the extracted triples.
This benchmark suite helps you:
- Measure extraction speed for different LLM models
- Compare performance across different hardware configurations
- Optimize batch sizes for maximum throughput
- Estimate processing time for large-scale datasets
- Profile bottlenecks in the extraction pipeline
Run entity-event extraction timing:
```bash
python 1_slice_kg_extraction.py \
    --shard 0 \
    --total_shards 1 \
    --port 8135
```

Key parameters:

- `--shard`: Which data shard to process (default: 0)
- `--total_shards`: Total number of data shards (default: 1)
- `--port`: vLLM/SGLang server port (default: 8135)
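The shard parameters split the input across independent runs. The exact partitioning scheme lives inside `1_slice_kg_extraction.py`; a common approach, shown here as a hypothetical sketch, is to keep every `total_shards`-th file starting at index `shard`:

```python
# Hypothetical sketch of how --shard / --total_shards could partition the
# input files; the actual logic is inside 1_slice_kg_extraction.py.
def select_shard(files, shard, total_shards):
    """Keep every total_shards-th file, starting at index `shard`."""
    return [f for i, f in enumerate(sorted(files)) if i % total_shards == shard]

files = ["a.json", "b.json", "c.json", "d.json", "e.json"]
print(select_shard(files, shard=0, total_shards=2))  # ['a.json', 'c.json', 'e.json']
print(select_shard(files, shard=1, total_shards=2))  # ['b.json', 'd.json']
```

With this scheme, every file is processed exactly once across all shards, so the shards can run on separate machines or GPUs without coordination.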
Run concept generation timing:
```bash
python 2_concept_generation.py \
    --shard 0 \
    --total_shards 1 \
    --port 8135
```

The total extraction time is recorded in the last object of the output JSON file:
- Location: `output_dir/kg_extraction/xxx_1_in_1.json`
- Key: `total_extraction_time_seconds`
Example:
```json
{
  "id": "doc_12345",
  "text": "...",
  "triples": [...],
  "total_extraction_time_seconds": 245.67
}
```

The concept generation time is recorded in the last line of the logging file:
- Location: `output_dir/concepts/logging.txt`
- Format: `Total concept generation time: xxx seconds`
Example:
```text
Processing concepts...
Creating CSV files...
Total time: 89.34 seconds
```
Both scripts set `benchmark=True` and `record=True` in `ProcessingConfig`:

```python
kg_extraction_config = ProcessingConfig(
    model_path=model_name,
    data_directory="/data/AutoSchema/processed_data/cc_en_head",
    filename_pattern=keyword,
    batch_size_triple=16,    # Extraction batch size
    batch_size_concept=64,   # Concept generation batch size
    output_directory=f'/data/AutoSchema/processed_data/cc_en_head/{model_name}',
    current_shard_triple=args.shard,
    total_shards_triple=args.total_shards,
    record=True,             # Save detailed results
    max_new_tokens=8192,     # Max tokens (extraction: 8192, concept: 512)
    benchmark=True           # Enable timing metrics
)
```
Start LLM Server

```bash
# Example: vLLM server
vllm serve Qwen/Qwen2.5-7B-Instruct --port 8135
```
Run Triple Extraction Benchmark

```bash
python 1_slice_kg_extraction.py --port 8135
```
Check Extraction Time

```bash
# View last object in JSON output
tail -n 20 output_dir/kg_extraction/xxx_1_in_1.json | grep total_extraction_time_seconds
```
Run Concept Generation Benchmark

```bash
python 2_concept_generation.py --port 8135
```
Check Concept Time

```bash
# View last line of logging file
tail -n 1 output_dir/concepts/logging.txt
```
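The two shell checks can also be done programmatically, which is handy when aggregating results across many shards. A hedged sketch (it assumes the extraction output is a JSON array of objects; if the file is JSON Lines, parse the last line instead):

```python
import json
import re

def extraction_seconds(path):
    """Read total_extraction_time_seconds from the last object of the
    extraction output, assuming the file is a JSON array of objects."""
    with open(path) as f:
        records = json.load(f)
    return records[-1]["total_extraction_time_seconds"]

def concept_seconds(log_path):
    """Pull the seconds value out of the last line of the concept log,
    matching either logged phrasing ('Total time: ... seconds' etc.)."""
    with open(log_path) as f:
        last = f.readlines()[-1]
    m = re.search(r"([\d.]+)\s*seconds", last)
    return float(m.group(1)) if m else None
```

For example, `extraction_seconds("output_dir/kg_extraction/xxx_1_in_1.json")` returns the value that the `tail | grep` command above only prints.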
- Model Size: Larger models (70B) are slower but more accurate than smaller models (7B)
- Batch Size: Larger batches improve throughput but require more memory
- Max Tokens: Higher token limits allow more complex extractions but increase latency
- Hardware: GPU memory and compute capability directly impact speed
- Concurrency: `max_workers` parameter controls parallel API calls
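These factors feed directly into estimating processing time for large datasets. A back-of-the-envelope sketch (the document counts and shard count below are illustrative, not measured values):

```python
def estimate_eta_hours(bench_docs, bench_seconds, total_docs, num_shards=1):
    """Extrapolate a benchmark run to a full dataset, assuming the same
    per-document throughput and perfectly parallel shards."""
    docs_per_second = bench_docs / bench_seconds
    return total_docs / docs_per_second / num_shards / 3600

# e.g. a benchmark shard of 500 docs took 245.67 s; scale to 1M docs
# spread across 8 parallel shards
print(round(estimate_eta_hours(500, 245.67, 1_000_000, num_shards=8), 1))  # 17.1
```

Real runs rarely scale perfectly (server warm-up, uneven document lengths, shared GPU contention), so treat the result as a lower bound.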
After benchmarking, you'll find:
```text
output_dir/
├── kg_extraction/
│   └── xxx_1_in_1.json         # Contains total_extraction_time_seconds
├── concepts/
│   ├── logging.txt             # Contains total concept generation time
│   ├── concept_nodes.csv
│   └── concept_edges.csv
└── graphml/
    └── knowledge_graph.graphml
```
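As a sanity check after a benchmark run, you can count nodes and edges in the final GraphML file. A standard-library sketch (a graph library such as networkx would work equally well); the GraphML namespace below is the standard one, but the output path follows the layout above:

```python
import xml.etree.ElementTree as ET

def graphml_counts(path):
    """Return (node_count, edge_count) for a GraphML file."""
    ns = {"g": "http://graphml.graphdrawing.org/xmlns"}
    root = ET.parse(path).getroot()
    nodes = root.findall(".//g:node", ns)
    edges = root.findall(".//g:edge", ns)
    return len(nodes), len(edges)

# e.g. graphml_counts("output_dir/graphml/knowledge_graph.graphml")
```

Comparing these counts across models or batch sizes gives a quick check that faster settings are not silently extracting fewer triples.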