This repository contains a script to decode and analyze CPER and GENE application logs.

bkumar-amd/crash_data_analysis

GENE Log Analysis + Excel Charts

This Python script analyzes GENE (or RCCL) simulation log files (.out, .err, .cper) and generates an Excel report with detailed statistics, runtime/system info, and visualizations. It is designed to handle multiple logs in a folder and summarize PASS vs FAIL jobs, file sizes, elapsed times, and hardware/runtime settings.

Features

Log Metadata Extraction

Collects filename, file type, size (KB), timestamp, and application type (gene or rccl).
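A minimal sketch of how this metadata could be collected with the standard library. The function names, the record keys, the use of the file modification time as the timestamp, and the rule that guesses the application type from the file name are all assumptions made for illustration; the actual script may do this differently.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict, List

LOG_SUFFIXES = {".out", ".err", ".cper"}


def detect_app_type(filename: str) -> str:
    """Guess the application from the file name (hypothetical rule)."""
    name = filename.lower()
    if "rccl" in name:
        return "rccl"
    if "gene" in name:
        return "gene"
    return "unknown"


def collect_metadata(log_dir: str) -> List[Dict]:
    """Return one metadata record per log file found directly in log_dir."""
    records = []
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() not in LOG_SUFFIXES:
            continue
        stat = path.stat()
        records.append({
            "filename": path.name,
            "file_type": path.suffix.lower(),
            "size_kb": round(stat.st_size / 1024, 2),
            # Assumption: "timestamp" means the file's last modification time.
            "timestamp": datetime.fromtimestamp(stat.st_mtime).isoformat(timespec="seconds"),
            "app_type": detect_app_type(path.name),
        })
    return records
```

The resulting list of dictionaries can be passed directly to pandas.DataFrame to build a metadata table.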

Elapsed Time & Job Status Parsing

Detects elapsed time from .out files.

Determines job status (PASS or FAIL) from log content.

Runtime/System Info Extraction

Extracts key parameters from .out files:

Parallelization choice

Alternative routines

Number of blocks

Fastest tagvals

Time for chosen settings

MPI processes per node

MPI tasks

GPU devices per node
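The parameters listed above could be pulled out with one regular expression per label, as in the sketch below. The labels come from the list above, but the line layout ("label: value" or "label = value"), the dictionary keys, and the function name are assumptions; adjust the patterns to match the real .out files.

```python
import re

# Labels are from the README; the "<label>: <value>" or "<label> = <value>"
# layout of the log lines is an assumption.
RUNTIME_LABELS = {
    "parallelization_choice": "Parallelization choice",
    "alternative_routines": "Alternative routines",
    "number_of_blocks": "Number of blocks",
    "fastest_tagvals": "Fastest tagvals",
    "time_for_chosen_settings": "Time for chosen settings",
    "mpi_processes_per_node": "MPI processes per node",
    "mpi_tasks": "MPI tasks",
    "gpu_devices_per_node": "GPU devices per node",
}

RUNTIME_PATTERNS = {
    key: re.compile(re.escape(label) + r"\s*[:=]\s*(.+)", re.IGNORECASE)
    for key, label in RUNTIME_LABELS.items()
}


def extract_runtime_info(text):
    """Return the first value found for each label, or None if it is absent."""
    info = {}
    for key, pattern in RUNTIME_PATTERNS.items():
        match = pattern.search(text)
        info[key] = match.group(1).strip() if match else None
    return info
```

Values are returned as strings so that multi-valued entries such as a parallelization choice are preserved; convert to int or float where a single number is expected.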

Node Failure Detection

Parses .err files to detect failed nodes and failure reasons.

Summary Statistics

Total logs, failed jobs, maximum and average elapsed time for failed jobs.

Average and maximum file sizes.
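As a rough illustration, these statistics can be computed from a list of per-job records. The record keys (status, elapsed_s, size_kb) and the function name are hypothetical, and the sketch uses the standard library rather than pandas to stay self-contained.

```python
from statistics import mean


def summarize(records):
    """Aggregate per-job records into the summary statistics listed above."""
    failed = [r for r in records if r["status"] == "FAIL"]
    # Jobs without a parsed elapsed time are left out of the elapsed stats.
    failed_elapsed = [r["elapsed_s"] for r in failed if r.get("elapsed_s") is not None]
    sizes = [r["size_kb"] for r in records]
    return {
        "total_logs": len(records),
        "failed_jobs": len(failed),
        "max_elapsed_failed_s": max(failed_elapsed) if failed_elapsed else None,
        "avg_elapsed_failed_s": round(mean(failed_elapsed), 2) if failed_elapsed else None,
        "avg_size_kb": round(mean(sizes), 2) if sizes else None,
        "max_size_kb": max(sizes) if sizes else None,
    }
```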

Histograms

File size distribution (PASS vs FAIL jobs)

Elapsed time distribution for failed jobs.
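One way to build the PASS vs FAIL file size histogram is to compute a shared set of bin edges and count each group against it, so the two series line up in a chart. The function name, the default bin count, and the output layout below are assumptions, not the script's actual code.

```python
import numpy as np


def size_histogram(sizes_pass, sizes_fail, n_bins=5):
    """Count PASS and FAIL file sizes (KB) in the same equal-width bins."""
    all_sizes = np.asarray(list(sizes_pass) + list(sizes_fail), dtype=float)
    # Shared edges so both groups are binned identically.
    edges = np.histogram_bin_edges(all_sizes, bins=n_bins)
    pass_counts, _ = np.histogram(sizes_pass, bins=edges)
    fail_counts, _ = np.histogram(sizes_fail, bins=edges)
    return [
        {"bin": f"{lo:.0f}-{hi:.0f} KB", "PASS": int(n_pass), "FAIL": int(n_fail)}
        for lo, hi, n_pass, n_fail in zip(edges[:-1], edges[1:], pass_counts, fail_counts)
    ]
```

The same approach applies to the elapsed time histogram for failed jobs, with elapsed hours in place of file sizes and a single series.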

Excel Report

All extracted and aggregated data is saved to GENE_Log_Analysis_Report_PASS_FAIL.xlsx.

Includes interactive bar charts for:

File size histogram

Elapsed time histogram (failed jobs)

PASS vs FAIL per MPI tasks
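The sketch below shows how a histogram table and a matching bar chart could be written to a worksheet with openpyxl. The sheet name hist_file_size is taken from this README; the function name, column layout, chart titles, and chart position are assumptions for illustration.

```python
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference


def write_histogram_sheet(rows, xlsx_path):
    """Write bin/PASS/FAIL rows to a sheet and attach a clustered bar chart."""
    wb = Workbook()
    ws = wb.active
    ws.title = "hist_file_size"
    ws.append(["bin", "PASS", "FAIL"])
    for row in rows:
        ws.append([row["bin"], row["PASS"], row["FAIL"]])

    chart = BarChart()
    chart.type = "col"  # vertical bars
    chart.title = "File size distribution"
    chart.x_axis.title = "File size bin"
    chart.y_axis.title = "Jobs"

    last_row = len(rows) + 1
    # Data includes the header row so series are named PASS and FAIL.
    data = Reference(ws, min_col=2, max_col=3, min_row=1, max_row=last_row)
    categories = Reference(ws, min_col=1, min_row=2, max_row=last_row)
    chart.add_data(data, titles_from_data=True)
    chart.set_categories(categories)

    ws.add_chart(chart, "E2")
    wb.save(xlsx_path)
```

Because the chart references worksheet cells instead of embedding an image, it updates when the underlying values are edited in Excel.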

Requirements

Python 3.8+

Packages:

pip install pandas numpy openpyxl

Usage

Update Configuration

log_dir = r"C:\path\to\logs"

Replace this with the path to the folder containing your .out, .err, and .cper files.

Run the Script

python gene_log_analysis.py

Output

The Excel report is saved in the log folder:

GENE_Log_Analysis_Report_PASS_FAIL.xlsx

Contains multiple sheets:

log_metadata

elapsed_info

runtime_info

summary

summary_stats

hist_file_size

hist_elapsed

mpi_task_chart (data for the PASS vs FAIL per MPI tasks chart)

File Parsing Logic

.out files:

Elapsed time: taken from lines containing "Elapsed:" or from the JobID summary.

Job status: PASS if the log contains "Time for GENE simulation"; otherwise FAIL.

Runtime/hardware info extracted using regex patterns.
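The elapsed time and status rules for .out files might look like the following sketch. It assumes the "Elapsed:" value uses an HH:MM:SS layout with an optional leading day count (D-HH:MM:SS); that format, the function name, and the returned keys are assumptions, and the JobID summary fallback is not covered.

```python
import re

# Assumed layout: "Elapsed: HH:MM:SS" or "Elapsed: D-HH:MM:SS".
ELAPSED_RE = re.compile(r"Elapsed:\s*(?:(\d+)-)?(\d+):(\d{2}):(\d{2})")
PASS_MARKER = "Time for GENE simulation"


def parse_out_text(text):
    """Return job status and elapsed seconds parsed from .out file content."""
    status = "PASS" if PASS_MARKER in text else "FAIL"
    elapsed_s = None
    match = ELAPSED_RE.search(text)
    if match:
        days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
        elapsed_s = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return {"status": status, "elapsed_s": elapsed_s}
```

Storing the elapsed time in seconds keeps later steps simple: dividing by 3600 gives the hours used for histogram binning.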

.err files:

Detect failed node names and failure reasons.
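The README does not show what the .err lines look like, so the sketch below assumes Slurm-style messages of the form "srun: error: <node>: task <n>: <reason>". The pattern, the function name, and the choice to keep only the first reason per node are all assumptions; replace the regular expression with one that matches the real error output.

```python
import re

# Assumed Slurm-style line: "srun: error: node017: task 3: Segmentation fault"
NODE_FAIL_RE = re.compile(
    r"^srun: error: (?P<node>[\w.-]+): tasks? [\d,\-]+: (?P<reason>.+)$",
    re.MULTILINE,
)


def find_failed_nodes(err_text):
    """Map each failed node name to the first failure reason reported for it."""
    failures = {}
    for match in NODE_FAIL_RE.finditer(err_text):
        failures.setdefault(match.group("node"), match.group("reason").strip())
    return failures
```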

Excel Charts

File Size Distribution

PASS vs FAIL jobs

Histogram by file size bins

Elapsed Time Histogram

For failed jobs

Binned by hours

PASS vs FAIL per MPI tasks

Aggregated counts of jobs grouped by mpi_tasks
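The per-mpi_tasks aggregation can be expressed as a pandas group-by, as sketched here. The column names mpi_tasks and status and the function name are assumed; the result has one row per mpi_tasks value with PASS and FAIL counts, which is the shape a clustered bar chart needs.

```python
import pandas as pd


def status_by_mpi_tasks(df):
    """Count PASS and FAIL jobs for each distinct mpi_tasks value."""
    counts = (
        df.groupby(["mpi_tasks", "status"])
        .size()
        .unstack(fill_value=0)
        # Guarantee both columns exist even if one status never occurs.
        .reindex(columns=["PASS", "FAIL"], fill_value=0)
    )
    return counts.reset_index()
```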

Notes

The script ignores files that cannot be read due to permissions or encoding errors.

File size histograms automatically adjust to available .out logs.

All numeric values are rounded for clarity in Excel.
