# LogLens

[CI](https://github.com/stacknil/LogLens/actions/workflows/ci.yml)
[CodeQL](https://github.com/stacknil/LogLens/actions/workflows/codeql.yml)

C++20 defensive log analysis CLI for Linux authentication logs, with parser coverage telemetry, configurable detection rules, CI, and CodeQL.

It parses `auth.log` / `secure`-style syslog input and `journalctl --output=short-full`-style input, normalizes authentication evidence, applies configurable rule-based detections, and emits deterministic Markdown and JSON reports, with optional CSV exports for findings and warnings.

## Project Status

LogLens is an MVP / early release. The repository is stable enough for public review, local experimentation, and extension, but the parser and detection coverage are intentionally narrow.

## Why This Project Exists

Many small security tools can detect a handful of known log patterns. Fewer tools make their parsing limits visible.

LogLens is built around three ideas:

- detection engineering over offensive functionality
- parser observability over silent failure
- repository discipline over throwaway scripts

The project reports suspicious login activity while also surfacing parser coverage, unknown-line buckets, CI status, and code scanning hygiene.

## Scope

LogLens is a defensive, public-safe repository.
It is intended for log parsing, detection experiments, and engineering practice.
It does not provide exploitation, persistence, credential attack automation, or live offensive capability.

## Repository Checks

LogLens includes two minimal GitHub Actions workflows:

- `CI` builds and tests the project on `ubuntu-latest` and `windows-latest`
- `CodeQL` runs GitHub code scanning for C/C++ on pushes, pull requests, and a weekly schedule

Both workflows are intended to stay stable enough to serve as required checks on pull requests to `main`.
Release-facing documentation is split across [`CHANGELOG.md`](./CHANGELOG.md), [`docs/release-process.md`](./docs/release-process.md), [`docs/release-v0.1.0.md`](./docs/release-v0.1.0.md), and the repository's GitHub release notes. The repository hardening note is in [`docs/repo-hardening.md`](./docs/repo-hardening.md), and vulnerability reporting guidance is in [`SECURITY.md`](./SECURITY.md).

## Threat Model

LogLens is designed for offline review of `auth.log` and `secure` style text logs collected from systems you own or administer. The MVP focuses on common, high-signal patterns that often appear during credential guessing, username enumeration, or bursty privileged command use.

The current tool helps answer:

- Is one source IP generating repeated SSH failures in a short window?
- Is one source IP trying several usernames in a short window?
- Is one account running sudo unusually often in a short window?

It does not attempt to replace a SIEM, correlate across hosts, enrich IPs, or decide whether a finding is malicious on its own.
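
The first question above (repeated SSH failures from one source IP in a short window) can be answered with a sliding-window count. The following is a minimal sketch of that idea, not LogLens's actual implementation: `FailureEvent` and `find_bursty_sources` are hypothetical names, and the sketch assumes the events are already parsed and sorted by timestamp.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical event type for illustration; not LogLens's data model.
struct FailureEvent {
    std::chrono::system_clock::time_point when;
    std::string source_ip;
};

// Returns the source IPs that reach `threshold` failures inside any span of
// length `window`. Assumes `events` is sorted by ascending timestamp.
std::vector<std::string> find_bursty_sources(
    const std::vector<FailureEvent>& events,
    std::size_t threshold,
    std::chrono::minutes window) {
    using TimePoint = std::chrono::system_clock::time_point;
    std::map<std::string, std::deque<TimePoint>> recent;
    std::set<std::string> flagged;  // ordered set keeps the output deterministic

    for (const auto& event : events) {
        auto& times = recent[event.source_ip];
        times.push_back(event.when);
        // Drop timestamps that fall outside the window ending at this event.
        while (event.when - times.front() > window) {
            times.pop_front();
        }
        if (times.size() >= threshold) {
            flagged.insert(event.source_ip);
        }
    }
    return {flagged.begin(), flagged.end()};
}
```

Called with a threshold of 5 and a 10-minute window, this flags an IP only when five of its failures fall within ten minutes of each other. The same pattern would extend to the other two questions by keying the window on a different field, though how LogLens structures its detectors internally is not stated here.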
## Detections

LogLens currently detects:

- Repeated SSH failed password attempts from the same IP within 10 minutes
- One IP trying multiple usernames within 15 minutes
- Bursty sudo activity from the same user within 5 minutes

LogLens currently parses and reports these additional auth patterns beyond the core detector inputs:

- `Accepted publickey` SSH successes
- `Failed publickey` SSH failures, which count toward SSH brute-force detection by default
- `pam_unix(...:auth): authentication failure`
- `pam_unix(...:session): session opened`
- selected `pam_faillock(...:auth)` failure variants
- selected `pam_sss(...:auth)` failure variants

LogLens also tracks parser coverage telemetry for unsupported or malformed lines, including:

- `total_lines`
- `parsed_lines`
- `unparsed_lines`
- `parse_success_rate`
- `top_unknown_patterns`

LogLens does not currently detect:

- Lateral movement
- MFA abuse
- SSH key misuse
- Many PAM-specific failures beyond the parsed `pam_unix`, `pam_faillock`, and `pam_sss` sample patterns
- Cross-file or cross-host correlation

## Build

```bash
cmake -S . -B build
cmake --build build
ctest --test-dir build --output-on-failure
```

For fresh-machine setup and repeatable local presets, see [`docs/dev-setup.md`](./docs/dev-setup.md).

## Run

```bash
./build/loglens --mode syslog --year 2026 ./assets/sample_auth.log ./out
./build/loglens --mode journalctl-short-full ./assets/sample_journalctl_short_full.log ./out-journal
./build/loglens --config ./assets/sample_config.json ./assets/sample_auth.log ./out-config
./build/loglens --mode syslog --year 2026 --csv ./assets/sample_auth.log ./out-csv
```

The CLI writes:

- `report.md`
- `report.json`

into the output directory you provide. If you omit the output directory, the files are written into the current working directory.

When you add `--csv`, LogLens also writes:

- `findings.csv`
- `warnings.csv`

The CSV schema is intentionally small and stable:

- `findings.csv`: `rule`, `subject_kind`, `subject`, `event_count`, `window_start`, `window_end`, `usernames`, `summary`
- `warnings.csv`: `kind`, `message`

When an input spans multiple hostnames, both reports add compact host-level summaries without changing detector thresholds or introducing cross-host correlation logic.

## Sample Output

For sanitized sample input, see [`assets/sample_auth.log`](./assets/sample_auth.log) and [`assets/sample_journalctl_short_full.log`](./assets/sample_journalctl_short_full.log).

`report.md` summary excerpt:

```markdown
## Summary
- Input mode: syslog_legacy
- Parsed events: 14
- Findings: 3
- Parser warnings: 2
```

`report.json` summary excerpt:

```json
{
  "input_mode": "syslog_legacy",
  "parsed_event_count": 14,
  "finding_count": 3,
  "warning_count": 2
}
```

The config file schema is intentionally small and strict:

```json
{
  "input_mode": "syslog_legacy",
  "timestamp": {
    "assume_year": 2026
  },
  "brute_force": { "threshold": 5, "window_minutes": 10 },
  "multi_user_probing": { "threshold": 3, "window_minutes": 15 },
  "sudo_burst": { "threshold": 3, "window_minutes": 5 },
  "auth_signal_mappings": {
    "ssh_failed_password": {
      "counts_as_attempt_evidence": true,
      "counts_as_terminal_auth_failure": true
    },
    "ssh_invalid_user": {
      "counts_as_attempt_evidence": true,
      "counts_as_terminal_auth_failure": true
    },
    "ssh_failed_publickey": {
      "counts_as_attempt_evidence": true,
      "counts_as_terminal_auth_failure": true
    },
    "pam_auth_failure": {
      "counts_as_attempt_evidence": true,
      "counts_as_terminal_auth_failure": false
    }
  }
}
```

This mapping lets LogLens normalize parsed events into detection signals before applying brute-force or multi-user rules. By default, `pam_auth_failure` is treated as lower-confidence attempt evidence and does not count as a terminal authentication failure unless the config explicitly upgrades it.
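
To make the effect of the two mapping flags concrete, here is a minimal sketch of how such a table might be consulted. The names `SignalMapping`, `default_signal_mappings`, and `count_terminal_failures` are hypothetical, and which flag each detector actually reads is an assumption; only the keys and default values are taken from the sample config.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Mirrors the two flags in `auth_signal_mappings`. The type and function
// names are hypothetical and exist only for this illustration.
struct SignalMapping {
    bool counts_as_attempt_evidence = false;
    bool counts_as_terminal_auth_failure = false;
};

using SignalMappings = std::map<std::string, SignalMapping>;

// Default values copied from the sample config in this README.
SignalMappings default_signal_mappings() {
    return {
        {"ssh_failed_password", {true, true}},
        {"ssh_invalid_user", {true, true}},
        {"ssh_failed_publickey", {true, true}},
        {"pam_auth_failure", {true, false}},
    };
}

// Counts the events that a rule keyed on terminal failures would consider.
// Event kinds without a mapping entry are ignored.
std::size_t count_terminal_failures(const std::vector<std::string>& event_kinds,
                                    const SignalMappings& mappings) {
    std::size_t count = 0;
    for (const auto& kind : event_kinds) {
        const auto it = mappings.find(kind);
        if (it != mappings.end() && it->second.counts_as_terminal_auth_failure) {
            ++count;
        }
    }
    return count;
}
```

With the defaults, a `pam_auth_failure` event is skipped by this count. Setting its `counts_as_terminal_auth_failure` flag to `true` makes it count, which is the "upgrade" the paragraph above describes.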

Timestamp handling is now explicit:

- `--mode syslog` or `input_mode: syslog_legacy` requires `--year` or `timestamp.assume_year`
- `--mode journalctl-short-full` or `input_mode: journalctl_short_full` parses the embedded year and timezone and ignores `assume_year`

## Example Input

```text
Mar 10 08:11:22 example-host sshd[1234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2
Mar 10 08:12:10 example-host sshd[1235]: Accepted password for alice from 203.0.113.20 port 51111 ssh2
Mar 10 08:15:00 example-host sudo: alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/systemctl restart ssh
Mar 10 08:27:10 example-host sshd[1243]: Failed publickey for invalid user svc-backup from 203.0.113.40 port 51240 ssh2
Mar 10 08:28:33 example-host pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=203.0.113.41 user=alice
Mar 10 08:29:50 example-host pam_unix(sudo:session): session opened for user root by alice(uid=0)
Mar 10 08:30:12 example-host sshd[1244]: Connection closed by authenticating user alice 203.0.113.50 port 51290 [preauth]
Mar 10 08:31:18 example-host sshd[1245]: Timeout, client not responding from 203.0.113.51 port 51291
```

`journalctl --output short-full` style example:

```text
Tue 2026-03-10 08:11:22 UTC example-host sshd[2234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2
Tue 2026-03-10 08:13:10 UTC example-host sshd[2236]: Failed password for test from 203.0.113.10 port 51040 ssh
Tue 2026-03-10 08:18:05 UTC example-host sshd[2238]: Failed publickey for invalid user deploy from 203.0.113.10 port 51060 ssh2
Tue 2026-03-10 08:31:18 UTC example-host sshd[2245]: Connection closed by authenticating user alice 203.0.113.51 port 51291 [preauth]
```

## Known Limitations

- `syslog_legacy` requires an explicit year; LogLens does not guess one implicitly.
- `journalctl_short_full` currently supports `UTC`, `GMT`, `Z`, and numeric timezone offsets, not arbitrary timezone abbreviations.
- Parser coverage is still selective: it covers common `sshd`, `sudo`, `pam_unix`, and selected `pam_faillock` / `pam_sss` variants rather than broad Linux auth-family support.
- Unsupported lines are surfaced as parser telemetry and warnings, not as detector findings.
- `pam_unix` auth failures remain lower-confidence by default unless signal mappings explicitly upgrade them.
- Detector configuration uses a fixed `config.json` schema rather than partial overrides or alternate config formats.
- Findings are rule-based triage aids, not incident verdicts or attribution.

## Future Roadmap

- Additional auth patterns and PAM coverage
- Larger sanitized test corpus