Commit 1e1684c

feat: add optional CSV export for findings and warnings
1 parent e3063f3 commit 1e1684c

1 file changed: +1 -213 lines changed

README.md

Lines changed: 1 addition & 213 deletions

# LogLens

[![CI](https://github.com/stacknil/LogLens/actions/workflows/ci.yml/badge.svg)](https://github.com/stacknil/LogLens/actions/workflows/ci.yml)
[![CodeQL](https://github.com/stacknil/LogLens/actions/workflows/codeql.yml/badge.svg)](https://github.com/stacknil/LogLens/actions/workflows/codeql.yml)

C++20 defensive log analysis CLI for Linux authentication logs, with parser coverage telemetry, configurable detection rules, CI, and CodeQL.

It parses `auth.log` / `secure`-style syslog input and `journalctl --output=short-full`-style input, normalizes authentication evidence, applies configurable rule-based detections, and emits deterministic Markdown and JSON reports, with optional CSV exports for findings and warnings.

## Project Status

LogLens is an MVP / early release. The repository is stable enough for public review, local experimentation, and extension, but the parser and detection coverage are intentionally narrow.

## Why This Project Exists

Many small security tools can detect a handful of known log patterns. Fewer tools make their parsing limits visible.

LogLens is built around three ideas:

- detection engineering over offensive functionality
- parser observability over silent failure
- repository discipline over throwaway scripts

The project reports suspicious login activity while also surfacing parser coverage, unknown-line buckets, CI status, and code scanning hygiene.

## Scope

LogLens is a defensive, public-safe repository.
It is intended for log parsing, detection experiments, and engineering practice.
It does not provide exploitation, persistence, credential attack automation, or live offensive capability.

## Repository Checks

LogLens includes two minimal GitHub Actions workflows:

- `CI` builds and tests the project on `ubuntu-latest` and `windows-latest`
- `CodeQL` runs GitHub code scanning for C/C++ on pushes, pull requests, and a weekly schedule

Both workflows are intended to stay stable enough to be required checks on pull requests to `main`.
Release-facing documentation is split across [`CHANGELOG.md`](./CHANGELOG.md), [`docs/release-process.md`](./docs/release-process.md), [`docs/release-v0.1.0.md`](./docs/release-v0.1.0.md), and the repository's GitHub release notes. The repository hardening note is in [`docs/repo-hardening.md`](./docs/repo-hardening.md), and vulnerability reporting guidance is in [`SECURITY.md`](./SECURITY.md).

## Threat Model

LogLens is designed for offline review of `auth.log` / `secure`-style and `journalctl --output=short-full`-style text logs collected from systems you own or administer. The MVP focuses on common, high-signal patterns that often appear during credential guessing, username enumeration, or bursty privileged command use.

The current tool helps answer:

- Is one source IP generating repeated SSH failures in a short window?
- Is one source IP trying several usernames in a short window?
- Is one account running sudo unusually often in a short window?

It does not attempt to replace a SIEM, correlate across hosts, enrich IPs, or decide on its own whether a finding is malicious.
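The first question above amounts to a sliding-window count for each source IP. The C++20 sketch below shows one way to compute it. `FailureEvent` and `find_bursty_ips` are hypothetical names, not LogLens's actual API. The sketch assumes events have already been normalized to seconds since the Unix epoch and sorted by time. The threshold and window correspond to the `brute_force` values in the sample config (5 failures, 10 minutes).

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified event; LogLens's real event types are not shown in this README.
struct FailureEvent {
    std::chrono::seconds time;  // seconds since the Unix epoch, already normalized
    std::string source_ip;
};

// Returns each source IP that accumulates at least `threshold` failures
// within any span of `window`. Assumes `events` is sorted by time.
std::vector<std::string> find_bursty_ips(const std::vector<FailureEvent>& events,
                                         std::size_t threshold,
                                         std::chrono::minutes window) {
    std::map<std::string, std::deque<std::chrono::seconds>> recent;  // per-IP timestamps
    std::set<std::string> flagged;                                   // report each IP once
    std::vector<std::string> result;

    for (const auto& event : events) {
        auto& times = recent[event.source_ip];
        times.push_back(event.time);
        // Evict timestamps that are more than `window` older than the newest event.
        while (event.time - times.front() > window) {
            times.pop_front();
        }
        if (times.size() >= threshold && flagged.insert(event.source_ip).second) {
            result.push_back(event.source_ip);
        }
    }
    return result;
}
```

Each per-IP deque holds only the timestamps within `window` of that IP's most recent failure, and every timestamp is pushed and evicted at most once. An IP that reaches the threshold is reported once, even if later failures keep its window full.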

## Detections

LogLens currently detects:

- Repeated SSH failed password attempts from the same IP within 10 minutes
- One IP trying multiple usernames within 15 minutes
- Bursty sudo activity from the same user within 5 minutes

LogLens also parses and reports these auth patterns, in addition to the core detector inputs:

- `Accepted publickey` SSH successes
- `Failed publickey` SSH failures, which count toward SSH brute-force detection by default
- `pam_unix(...:auth): authentication failure`
- `pam_unix(...:session): session opened`
- selected `pam_faillock(...:auth)` failure variants
- selected `pam_sss(...:auth)` failure variants

LogLens also tracks parser coverage telemetry, so unsupported or malformed lines are counted instead of silently dropped:

- `total_lines`
- `parsed_lines`
- `unparsed_lines`
- `parse_success_rate`
- `top_unknown_patterns`

LogLens does not currently detect:

- Lateral movement
- MFA abuse
- SSH key misuse
- Many PAM-specific failures beyond the parsed `pam_unix`, `pam_faillock`, and `pam_sss` sample patterns
- Cross-file or cross-host correlation

## Build

```bash
cmake -S . -B build
cmake --build build
ctest --test-dir build --output-on-failure
```

For fresh-machine setup and repeatable local presets, see [`docs/dev-setup.md`](./docs/dev-setup.md).

## Run

```bash
./build/loglens --mode syslog --year 2026 ./assets/sample_auth.log ./out
./build/loglens --mode journalctl-short-full ./assets/sample_journalctl_short_full.log ./out-journal
./build/loglens --config ./assets/sample_config.json ./assets/sample_auth.log ./out-config
./build/loglens --mode syslog --year 2026 --csv ./assets/sample_auth.log ./out-csv
```

The CLI writes `report.md` and `report.json` into the output directory you provide. If you omit the output directory, the files are written into the current working directory.

When you add `--csv`, LogLens also writes:

- `findings.csv`
- `warnings.csv`

The CSV schema is intentionally small and stable:

- `findings.csv`: `rule`, `subject_kind`, `subject`, `event_count`, `window_start`, `window_end`, `usernames`, `summary`
- `warnings.csv`: `kind`, `message`

When an input spans multiple hostnames, both `report.md` and `report.json` add compact host-level summaries without changing detector thresholds or introducing cross-host correlation logic.

## Sample Output

For sanitized sample input, see [`assets/sample_auth.log`](./assets/sample_auth.log) and [`assets/sample_journalctl_short_full.log`](./assets/sample_journalctl_short_full.log).

`report.md` summary excerpt:

```markdown
## Summary
- Input mode: syslog_legacy
- Parsed events: 14
- Findings: 3
- Parser warnings: 2
```

`report.json` summary excerpt:

```json
{
  "input_mode": "syslog_legacy",
  "parsed_event_count": 14,
  "finding_count": 3,
  "warning_count": 2
}
```

The config file schema is intentionally small and strict. For example:

```json
{
  "input_mode": "syslog_legacy",
  "timestamp": {
    "assume_year": 2026
  },
  "brute_force": { "threshold": 5, "window_minutes": 10 },
  "multi_user_probing": { "threshold": 3, "window_minutes": 15 },
  "sudo_burst": { "threshold": 3, "window_minutes": 5 },
  "auth_signal_mappings": {
    "ssh_failed_password": {
      "counts_as_attempt_evidence": true,
      "counts_as_terminal_auth_failure": true
    },
    "ssh_invalid_user": {
      "counts_as_attempt_evidence": true,
      "counts_as_terminal_auth_failure": true
    },
    "ssh_failed_publickey": {
      "counts_as_attempt_evidence": true,
      "counts_as_terminal_auth_failure": true
    },
    "pam_auth_failure": {
      "counts_as_attempt_evidence": true,
      "counts_as_terminal_auth_failure": false
    }
  }
}
```

The `auth_signal_mappings` block lets LogLens normalize parsed events into detection signals before it applies the brute-force or multi-user probing rules. By default, `pam_auth_failure` is treated as lower-confidence attempt evidence. It does not count as a terminal authentication failure unless the config sets its `counts_as_terminal_auth_failure` flag to `true`.
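The mapping step can be pictured as a lookup from event kind to the two flags, done before any rule runs. The README does not say which flag feeds which rule. This C++ sketch therefore assumes that terminal failures feed brute-force counting and that any attempt evidence feeds multi-user probing. `SignalMapping`, `DetectionSignals`, `classify`, and `kSampleMappings` are hypothetical names, and the table simply copies the sample config above.

```cpp
#include <map>
#include <string>

// One entry of `auth_signal_mappings`, mirroring the two flags in the config.
struct SignalMapping {
    bool counts_as_attempt_evidence = false;
    bool counts_as_terminal_auth_failure = false;
};

// Hypothetical table copied from the sample config; not loaded from LogLens itself.
const std::map<std::string, SignalMapping> kSampleMappings = {
    {"ssh_failed_password", {true, true}},
    {"ssh_invalid_user", {true, true}},
    {"ssh_failed_publickey", {true, true}},
    {"pam_auth_failure", {true, false}},
};

// Which rule inputs an event feeds, under the assumed split described above.
struct DetectionSignals {
    bool brute_force = false;         // assumed: driven by terminal failures
    bool multi_user_probing = false;  // assumed: driven by attempt evidence
};

DetectionSignals classify(const std::string& event_kind,
                          const std::map<std::string, SignalMapping>& mappings) {
    const auto it = mappings.find(event_kind);
    if (it == mappings.end()) {
        return {};  // unmapped kinds (e.g. successful logins) produce no signal
    }
    return {it->second.counts_as_terminal_auth_failure,
            it->second.counts_as_attempt_evidence};
}
```

Under these assumptions, `pam_auth_failure` contributes to multi-user probing but not to brute-force counts until its `counts_as_terminal_auth_failure` flag is set to `true`.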

Timestamp handling is explicit:

- `--mode syslog` or `input_mode: syslog_legacy` requires `--year` or `timestamp.assume_year`
- `--mode journalctl-short-full` or `input_mode: journalctl_short_full` parses the embedded year and timezone and ignores `assume_year`

## Example Input

`auth.log` / `secure`-style syslog example:

```text
Mar 10 08:11:22 example-host sshd[1234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2
Mar 10 08:12:10 example-host sshd[1235]: Accepted password for alice from 203.0.113.20 port 51111 ssh2
Mar 10 08:15:00 example-host sudo: alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/systemctl restart ssh
Mar 10 08:27:10 example-host sshd[1243]: Failed publickey for invalid user svc-backup from 203.0.113.40 port 51240 ssh2
Mar 10 08:28:33 example-host pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=203.0.113.41 user=alice
Mar 10 08:29:50 example-host pam_unix(sudo:session): session opened for user root by alice(uid=0)
Mar 10 08:30:12 example-host sshd[1244]: Connection closed by authenticating user alice 203.0.113.50 port 51290 [preauth]
Mar 10 08:31:18 example-host sshd[1245]: Timeout, client not responding from 203.0.113.51 port 51291
```

`journalctl --output=short-full`-style example:

```text
Tue 2026-03-10 08:11:22 UTC example-host sshd[2234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2
Tue 2026-03-10 08:13:10 UTC example-host sshd[2236]: Failed password for test from 203.0.113.10 port 51040 ssh
Tue 2026-03-10 08:18:05 UTC example-host sshd[2238]: Failed publickey for invalid user deploy from 203.0.113.10 port 51060 ssh2
Tue 2026-03-10 08:31:18 UTC example-host sshd[2245]: Connection closed by authenticating user alice 203.0.113.51 port 51291 [preauth]
```

## Known Limitations

- `syslog_legacy` requires an explicit year; LogLens does not guess one.
- `journalctl_short_full` currently supports `UTC`, `GMT`, `Z`, and numeric timezone offsets, not arbitrary timezone abbreviations.
- Parser coverage is still selective: it covers common `sshd`, `sudo`, `pam_unix`, and selected `pam_faillock` / `pam_sss` variants rather than broad Linux auth-family support.
- Unsupported lines are surfaced as parser telemetry and warnings, not as detector findings.
- `pam_unix` auth failures remain lower-confidence by default unless signal mappings explicitly upgrade them.
- Detector configuration uses a fixed JSON config schema rather than partial overrides or alternate config formats.
- Findings are rule-based triage aids, not incident verdicts or attribution.

## Future Roadmap

- Additional auth patterns and PAM coverage
- Larger sanitized test corpus
