GENE Log Analysis + Excel Charts
This Python script analyzes GENE (or RCCL) simulation log files (.out, .err, .cper) and generates an Excel report with detailed statistics, runtime/system info, and visualizations. It is designed to handle multiple logs in a folder and summarize PASS vs FAIL jobs, file sizes, elapsed times, and hardware/runtime settings.
Features
Log Metadata Extraction
Collects filename, file type, size (KB), timestamp, and application type (gene or rccl).
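The metadata pass can be pictured as a single directory scan. The following is a minimal sketch, not the script's actual code: the function name collect_metadata, the dictionary keys, the use of the file modification time as the timestamp, and the filename-based guess for the application type are all assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path

LOG_SUFFIXES = {".out", ".err", ".cper"}


def collect_metadata(log_dir):
    """Return one metadata dict per log file found directly in log_dir."""
    rows = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() not in LOG_SUFFIXES:
            continue
        stat = path.stat()
        rows.append({
            "filename": path.name,
            "file_type": path.suffix.lower(),
            "size_kb": round(stat.st_size / 1024, 2),
            # Assumption: the timestamp is the file's last-modified time.
            "timestamp": datetime.fromtimestamp(stat.st_mtime).isoformat(timespec="seconds"),
            # Assumption: the application type is guessed from the filename.
            "app_type": "rccl" if "rccl" in path.name.lower() else "gene",
        })
    return rows
```

Each returned dict maps naturally to one row of the log_metadata sheet.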
Elapsed Time & Job Status Parsing
Detects elapsed time from .out files.
Determines job status (PASS or FAIL) from log content.
Runtime/System Info Extraction
Extracts key parameters from .out files:
Parallelization choice
Alternative routines
Number of blocks
Fastest tagvals
Time for chosen settings
MPI processes per node
MPI tasks
GPU devices per node
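Runtime parameters like these are typically pulled out with one regular expression per field. The sketch below shows the idea for four of the fields; the exact line formats in your .out files (for example "Choice for parallelization:" or "nblocks =") are assumptions here, so adjust the patterns to what your logs actually print. The names RUNTIME_PATTERNS and extract_runtime_info are hypothetical.

```python
import re

# Assumed line formats; adapt each pattern to the text in your own logs.
RUNTIME_PATTERNS = {
    "parallelization": re.compile(r"Choice for parallelization:\s*(.+)"),
    "n_blocks": re.compile(r"nblocks\s*=\s*(\d+)"),
    "mpi_tasks": re.compile(r"MPI tasks\s*[:=]\s*(\d+)", re.IGNORECASE),
    "gpus_per_node": re.compile(r"GPU devices per node\s*[:=]\s*(\d+)", re.IGNORECASE),
}


def extract_runtime_info(text):
    """Return the first match for each pattern, or None when a field is absent."""
    info = {}
    for key, pattern in RUNTIME_PATTERNS.items():
        match = pattern.search(text)
        info[key] = match.group(1).strip() if match else None
    return info
```

Returning None for missing fields keeps every row the same shape, which makes the result easy to load into a DataFrame.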
Node Failure Detection
Parses .err files to detect failed nodes and failure reasons.
Summary Statistics
Total number of logs, number of failed jobs, and maximum and average elapsed time for failed jobs.
Average and maximum file sizes.
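As an illustration of how these statistics can be derived with pandas, here is a minimal sketch. It assumes a DataFrame with hypothetical columns status, elapsed_s, and size_kb; the script's real column names may differ.

```python
import pandas as pd


def summarize(df):
    """Compute summary statistics from a per-log DataFrame.

    Assumed (hypothetical) columns: status, elapsed_s, size_kb.
    """
    failed = df[df["status"] == "FAIL"]
    return {
        "total_logs": len(df),
        "failed_jobs": len(failed),
        "max_elapsed_failed_s": failed["elapsed_s"].max(),
        "avg_elapsed_failed_s": round(failed["elapsed_s"].mean(), 2),
        "avg_size_kb": round(df["size_kb"].mean(), 2),
        "max_size_kb": df["size_kb"].max(),
    }
```

If there are no failed jobs, the two elapsed-time entries come back as NaN rather than raising an error.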
Histograms
File size distribution (PASS vs FAIL jobs)
Elapsed time distribution for failed jobs.
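To compare PASS and FAIL jobs in one histogram, both groups need to share the same bin edges. The sketch below shows one way to do that with NumPy; the function name size_histogram and the default of five bins are assumptions, not the script's actual settings.

```python
import numpy as np


def size_histogram(pass_sizes_kb, fail_sizes_kb, n_bins=5):
    """Bin PASS and FAIL file sizes on a shared set of bin edges."""
    all_sizes = np.concatenate([pass_sizes_kb, fail_sizes_kb])
    # Compute the edges once from the combined data so both groups align.
    edges = np.histogram_bin_edges(all_sizes, bins=n_bins)
    pass_counts, _ = np.histogram(pass_sizes_kb, bins=edges)
    fail_counts, _ = np.histogram(fail_sizes_kb, bins=edges)
    return edges, pass_counts, fail_counts
```

The same approach applies to the elapsed time histogram: convert seconds to hours first, then bin the failed jobs only.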
Excel Report
All extracted and aggregated data are saved to GENE_Log_Analysis_Report_PASS_FAIL.xlsx.
Includes interactive bar charts for:
File size histogram
Elapsed time histogram (failed jobs)
PASS vs FAIL per MPI tasks
Requirements
Python 3.8+
Packages:
pip install pandas numpy openpyxl
Usage
Update Configuration
log_dir = r"C:\path\to\logs"
Replace with the path containing your .out, .err, and .cper files.
Run the Script
python gene_log_analysis.py
Output
The Excel report is saved in the log folder:
GENE_Log_Analysis_Report_PASS_FAIL.xlsx
Contains multiple sheets:
log_metadata
elapsed_info
runtime_info
summary
summary_stats
hist_file_size
hist_elapsed
mpi_task_chart (data behind the PASS vs FAIL per MPI tasks chart)
File Parsing Logic
.out files:
Elapsed time: read from lines containing Elapsed: or from the JobID summary.
Job status: PASS if the log contains the line Time for GENE simulation; otherwise FAIL.
Runtime/hardware info extracted using regex patterns.
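The elapsed time and status rules for .out files can be sketched as follows. This is an illustration under assumptions: the elapsed value is taken to be in HH:MM:SS form with an optional leading day count (D-HH:MM:SS), and the names ELAPSED_RE and parse_out_text are hypothetical.

```python
import re

# Assumed format: "Elapsed: HH:MM:SS" or "Elapsed: D-HH:MM:SS".
ELAPSED_RE = re.compile(r"Elapsed:\s*(?:(\d+)-)?(\d{1,2}):(\d{2}):(\d{2})")
PASS_MARKER = "Time for GENE simulation"


def parse_out_text(text):
    """Return elapsed seconds (or None) and PASS/FAIL for one .out log."""
    elapsed_s = None
    match = ELAPSED_RE.search(text)
    if match:
        days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
        elapsed_s = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    # A job counts as PASS only if the completion marker appears in the log.
    status = "PASS" if PASS_MARKER in text else "FAIL"
    return {"elapsed_s": elapsed_s, "status": status}
```

Note that a log with no completion marker is reported as FAIL even when an elapsed time is present, which matches the rule stated above.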
.err files:
Detect failed node names and failure reasons.
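How failed nodes appear in a .err file depends on the job launcher. The sketch below assumes Slurm-style messages such as "srun: error: NODE: task N: REASON"; if your logs use a different launcher or wording, the pattern needs to change. The names NODE_FAIL_RE and find_failed_nodes are hypothetical.

```python
import re

# Assumed message shape: "srun: error: <node>: task(s) <ids>: <reason>".
NODE_FAIL_RE = re.compile(
    r"srun: error: (?P<node>\S+): tasks? [\d,\-]+: (?P<reason>.+)"
)


def find_failed_nodes(err_text):
    """Map each failed node name to the first failure reason reported for it."""
    failures = {}
    for match in NODE_FAIL_RE.finditer(err_text):
        failures.setdefault(match["node"], match["reason"].strip())
    return failures
```

Keeping only the first reason per node avoids flooding the report when one node emits the same error for many tasks.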
Excel Charts
File Size Distribution
PASS vs FAIL jobs
Histogram by file size bins
Elapsed Time Histogram
For failed jobs
Binned by hours
PASS vs FAIL per MPI tasks
Aggregated counts of jobs grouped by mpi_tasks
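For reference, a chart like PASS vs FAIL per MPI tasks can be built with openpyxl by writing the aggregated counts to a sheet and pointing a BarChart at them. This is a minimal sketch rather than the script's own code; the function name write_mpi_task_chart, the input shape (a list of (mpi_tasks, status) tuples), and the chart position are assumptions.

```python
from collections import Counter

from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference


def write_mpi_task_chart(jobs, xlsx_path):
    """Write PASS/FAIL counts per MPI task count plus a bar chart.

    jobs: iterable of (mpi_tasks, status) tuples (hypothetical input shape).
    """
    counts = Counter(jobs)
    task_values = sorted({tasks for tasks, _ in counts})

    wb = Workbook()
    ws = wb.active
    ws.title = "mpi_task_chart"
    ws.append(["mpi_tasks", "PASS", "FAIL"])
    for tasks in task_values:
        ws.append([tasks, counts[(tasks, "PASS")], counts[(tasks, "FAIL")]])

    chart = BarChart()
    chart.type = "col"
    chart.title = "PASS vs FAIL per MPI tasks"
    chart.x_axis.title = "MPI tasks"
    chart.y_axis.title = "Jobs"
    last_row = len(task_values) + 1
    # The data range includes the header row so series are named PASS and FAIL.
    data = Reference(ws, min_col=2, max_col=3, min_row=1, max_row=last_row)
    categories = Reference(ws, min_col=1, min_row=2, max_row=last_row)
    chart.add_data(data, titles_from_data=True)
    chart.set_categories(categories)
    ws.add_chart(chart, "E2")
    wb.save(xlsx_path)
```

Because the chart references worksheet cells, editing the counts in Excel updates the bars.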
Notes
The script ignores files that cannot be read due to permissions or encoding errors.
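Skipping unreadable files usually comes down to catching two kinds of exception around the read. A minimal sketch, assuming UTF-8 decoding and a hypothetical helper named read_log_text:

```python
def read_log_text(path):
    """Return the file's text, or None if it cannot be read or decoded."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    except (OSError, UnicodeDecodeError):
        # OSError covers missing files and permission problems;
        # UnicodeDecodeError covers content that is not valid UTF-8.
        return None
```

Callers can then filter out None results and continue with the remaining logs.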
File size histogram bins are computed from whichever .out logs are present.
All numeric values are rounded for clarity in Excel.