diff --git a/CHANGELOG.md b/CHANGELOG.md index 2995156..bc5bdcb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,10 +1,42 @@ # Changelog -All notable changes to this project will be documented in this file. +All notable user-visible changes should be recorded here. + +## Unreleased + +### Added + +- None yet. + +### Changed + +- None yet. + +### Fixed + +- None yet. + +### Docs + +- None yet. ## v0.1.0 + +### Added + +- Parser support for `syslog_legacy` and `journalctl_short_full` authentication log input. +- Rule-based detections for SSH brute force, multi-user probing, and sudo burst activity. +- Parser coverage telemetry including parsed/unparsed counts and unknown-pattern buckets. +- Repository automation and hardening with CI, CodeQL, pinned GitHub Actions, security policy, and Dependabot for workflow updates. + +### Changed + +- Established deterministic Markdown and JSON reporting for the MVP release. + +### Fixed + +- None. + +### Docs -- Added parser support for `syslog_legacy` and `journalctl_short_full` authentication log input. -- Added rule-based detections for SSH brute force, multi-user probing, and bursty sudo activity. -- Added parser coverage telemetry, including parsed/unparsed counts and unknown-pattern buckets. -- Added repository automation and hardening with CI, CodeQL, pinned GitHub Actions, security policy, and Dependabot for workflow updates. +- Added CI, CodeQL, repository hardening guidance, and release-facing project documentation for the first public release. diff --git a/README.md b/README.md index d90bafb..07a59a2 100644 --- a/README.md +++ b/README.md @@ -1,220 +1,207 @@ -# LogLens - -[![CI](https://github.com/stacknil/LogLens/actions/workflows/ci.yml/badge.svg)](https://github.com/stacknil/LogLens/actions/workflows/ci.yml) -[![CodeQL](https://github.com/stacknil/LogLens/actions/workflows/codeql.yml/badge.svg)](https://github.com/stacknil/LogLens/actions/workflows/codeql.yml) - -C++20 defensive log analysis CLI for Linux authentication logs, with parser coverage telemetry, configurable detection rules, CI, and CodeQL. - -It parses `auth.log` / `secure`-style syslog input and `journalctl --output=short-full`-style input, normalizes authentication evidence, applies configurable rule-based detections, and emits deterministic Markdown and JSON reports. - +# LogLens + +[![CI](https://github.com/stacknil/LogLens/actions/workflows/ci.yml/badge.svg)](https://github.com/stacknil/LogLens/actions/workflows/ci.yml) +[![CodeQL](https://github.com/stacknil/LogLens/actions/workflows/codeql.yml/badge.svg)](https://github.com/stacknil/LogLens/actions/workflows/codeql.yml) + +C++20 defensive log analysis CLI for Linux authentication logs, with parser coverage telemetry, configurable detection rules, CI, and CodeQL. + +It parses `auth.log` / `secure`-style syslog input and `journalctl --output=short-full`-style input, normalizes authentication evidence, applies configurable rule-based detections, and emits deterministic Markdown and JSON reports. + ## Project Status LogLens is an MVP / early release. The repository is stable enough for public review, local experimentation, and extension, but the parser and detection coverage are intentionally narrow. ## Why This Project Exists - -Many small security tools can detect a handful of known log patterns. Fewer tools make their parsing limits visible. - + +Many small security tools can detect a handful of known log patterns. Fewer tools make their parsing limits visible. + LogLens is built around three ideas: - detection engineering over offensive functionality - parser observability over silent failure - repository discipline over throwaway scripts -## Scope - -LogLens is a defensive, public-safe repository. -It is intended for log parsing, detection experiments, and engineering practice. -It does not provide exploitation, persistence, credential attack automation, or live offensive capability. - - -## Repository Checks - -LogLens includes two minimal GitHub Actions workflows: - -- `CI` builds and tests the project on `ubuntu-latest` and `windows-latest` -- `CodeQL` runs GitHub code scanning for C/C++ on pushes, pull requests, and a weekly schedule - -Both workflows are intended to stay stable enough to require on pull requests to `main`. The repository hardening note is in [`docs/repo-hardening.md`](./docs/repo-hardening.md), and vulnerability reporting guidance is in [`SECURITY.md`](./SECURITY.md). - -## Threat Model - -LogLens is designed for offline review of `auth.log` and `secure` style text logs collected from systems you own or administer. The MVP focuses on common, high-signal patterns that often appear during credential guessing, username enumeration, or bursty privileged command use. - -The current tool helps answer: - -- Is one source IP generating repeated SSH failures in a short window? -- Is one source IP trying several usernames in a short window? -- Is one account running sudo unusually often in a short window? - -It does not attempt to replace a SIEM, correlate across hosts, enrich IPs, or decide whether a finding is malicious on its own. - -## Detections - -LogLens currently detects: - -- Repeated SSH failed password attempts from the same IP within 10 minutes -- One IP trying multiple usernames within 15 minutes -- Bursty sudo activity from the same user within 5 minutes - -LogLens currently parses and reports these additional auth patterns: - -- `Failed publickey` SSH failures, which count toward SSH brute-force detection by default -- `pam_unix(...:auth): authentication failure` -- `pam_unix(...:session): session opened` - -LogLens also tracks parser coverage telemetry for unsupported or malformed lines, including: - -- `total_lines` -- `parsed_lines` -- `unparsed_lines` -- `parse_success_rate` -- `top_unknown_patterns` - -LogLens does not currently detect: - -- Lateral movement -- MFA abuse -- SSH key misuse -- PAM-specific failures beyond the parsed sample patterns -- Cross-file or cross-host correlation - -## Build - -```bash -cmake -S . -B build -cmake --build build -ctest --test-dir build --output-on-failure -``` - -## Run - -```bash -./build/loglens --mode syslog --year 2026 ./assets/sample_auth.log ./out -./build/loglens --mode journalctl-short-full ./assets/sample_journalctl_short_full.log ./out-journal -./build/loglens --config ./assets/sample_config.json ./assets/sample_auth.log ./out-config -``` - -The CLI writes: - -- `report.md` -- `report.json` - -into the output directory you provide. If you omit the output directory, the files are written into the current working directory. - -The config file schema is intentionally small and strict: - -```json -{ - "input_mode": "syslog_legacy", - "timestamp": { - "assume_year": 2026 - }, - "brute_force": { "threshold": 5, "window_minutes": 10 }, - "multi_user_probing": { "threshold": 3, "window_minutes": 15 }, - "sudo_burst": { "threshold": 3, "window_minutes": 5 }, - "auth_signal_mappings": { - "ssh_failed_password": { - "counts_as_attempt_evidence": true, - "counts_as_terminal_auth_failure": true - }, - "ssh_invalid_user": { - "counts_as_attempt_evidence": true, - "counts_as_terminal_auth_failure": true - }, - "ssh_failed_publickey": { - "counts_as_attempt_evidence": true, - "counts_as_terminal_auth_failure": true - }, - "pam_auth_failure": { - "counts_as_attempt_evidence": true, - "counts_as_terminal_auth_failure": false - } - } -} -``` - -This mapping lets LogLens normalize parsed events into detection signals before applying brute-force or multi-user rules. By default, `pam_auth_failure` is treated as lower-confidence attempt evidence and does not count as a terminal authentication failure unless the config explicitly upgrades it. - -Timestamp handling is now explicit: - -- `--mode syslog` or `input_mode: syslog_legacy` requires `--year` or `timestamp.assume_year` -- `--mode journalctl-short-full` or `input_mode: journalctl_short_full` parses the embedded year and timezone and ignores `assume_year` - -## Example Input - -```text -Mar 10 08:11:22 example-host sshd[1234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2 -Mar 10 08:12:10 example-host sshd[1235]: Accepted password for alice from 203.0.113.20 port 51111 ssh2 -Mar 10 08:15:00 example-host sudo: alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/systemctl restart ssh -Mar 10 08:27:10 example-host sshd[1243]: Failed publickey for invalid user svc-backup from 203.0.113.40 port 51240 ssh2 -Mar 10 08:28:33 example-host pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=203.0.113.41 user=alice -Mar 10 08:29:50 example-host pam_unix(sudo:session): session opened for user root by alice(uid=0) -Mar 10 08:30:12 example-host sshd[1244]: Connection closed by authenticating user alice 203.0.113.50 port 51290 [preauth] -Mar 10 08:31:18 example-host sshd[1245]: Timeout, client not responding from 203.0.113.51 port 51291 -``` - -`journalctl --output short-full` style example: - -```text -Tue 2026-03-10 08:11:22 UTC example-host sshd[2234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2 -Tue 2026-03-10 08:13:10 UTC example-host sshd[2236]: Failed password for test from 203.0.113.10 port 51040 ssh -Tue 2026-03-10 08:18:05 UTC example-host sshd[2238]: Failed publickey for invalid user deploy from 203.0.113.10 port 51060 ssh2 -Tue 2026-03-10 08:31:18 UTC example-host sshd[2245]: Connection closed by authenticating user alice 203.0.113.51 port 51291 [preauth] -``` - -## Example Output - -`report.md` excerpt: - -```markdown -# LogLens Report - -## Summary -- Input mode: syslog_legacy -- Assume year: 2026 -- Timezone present: false -- Total lines: 16 -- Parsed lines: 14 -- Unparsed lines: 2 -- Parse success rate: 87.50% -- Parsed events: 14 -- Findings: 3 -- Parser warnings: 2 -``` - -`report.json` excerpt: - -```json -{ - "tool": "LogLens", - "input_mode": "syslog_legacy", - "assume_year": 2026, - "timezone_present": false, - "parser_quality": { - "total_lines": 16, - "parsed_lines": 14, - "unparsed_lines": 2, - "parse_success_rate": 0.8750 - }, - "parsed_event_count": 14, - "finding_count": 3 -} -``` - -## Known Limitations - -- `syslog_legacy` mode requires an explicit year; LogLens no longer guesses one implicitly. -- `journalctl_short_full` parsing currently supports `UTC`, `GMT`, `Z`, and numeric timezone offsets such as `+0000` or `+00:00`, not arbitrary timezone abbreviations. -- The parser supports a small set of common `sshd`, `sudo`, and `pam_unix` patterns from `auth.log` or `secure`, not every distro-specific variant. -- Unsupported lines are surfaced as parser telemetry and warnings only; they do not generate detector findings on their own. -- `pam_unix` auth failures remain lower-confidence by default; they influence detectors only if `auth_signal_mappings` explicitly upgrades them. -- Detector thresholds and auth signal mappings are configurable only through the fixed `config.json` schema shown above; partial overrides and alternative config formats are not supported. -- Findings are intentionally rule-based and conservative; they are not attribution or incident verdicts. - -## Future Roadmap - -- Additional auth patterns and PAM coverage -- Better host-level summaries -- Optional CSV export -- Larger sanitized test corpus +The project reports suspicious login activity while also surfacing parser coverage, unknown-line buckets, CI status, and code scanning hygiene. + +## Scope + +LogLens is a defensive, public-safe repository. +It is intended for log parsing, detection experiments, and engineering practice. +It does not provide exploitation, persistence, credential attack automation, or live offensive capability. + +## Repository Checks + +LogLens includes two minimal GitHub Actions workflows: + +- `CI` builds and tests the project on `ubuntu-latest` and `windows-latest` +- `CodeQL` runs GitHub code scanning for C/C++ on pushes, pull requests, and a weekly schedule + +Both workflows are intended to stay stable enough to require on pull requests to `main`. Release-facing documentation is split across [`CHANGELOG.md`](./CHANGELOG.md), [`docs/release-process.md`](./docs/release-process.md), [`docs/release-v0.1.0.md`](./docs/release-v0.1.0.md), and the repository's GitHub release notes. The repository hardening note is in [`docs/repo-hardening.md`](./docs/repo-hardening.md), and vulnerability reporting guidance is in [`SECURITY.md`](./SECURITY.md). + +## Threat Model + +LogLens is designed for offline review of `auth.log` and `secure` style text logs collected from systems you own or administer. The MVP focuses on common, high-signal patterns that often appear during credential guessing, username enumeration, or bursty privileged command use. + +The current tool helps answer: + +- Is one source IP generating repeated SSH failures in a short window? +- Is one source IP trying several usernames in a short window? +- Is one account running sudo unusually often in a short window? + +It does not attempt to replace a SIEM, correlate across hosts, enrich IPs, or decide whether a finding is malicious on its own. + +## Detections + +LogLens currently detects: + +- Repeated SSH failed password attempts from the same IP within 10 minutes +- One IP trying multiple usernames within 15 minutes +- Bursty sudo activity from the same user within 5 minutes + +LogLens currently parses and reports these additional auth patterns: + +- `Failed publickey` SSH failures, which count toward SSH brute-force detection by default +- `pam_unix(...:auth): authentication failure` +- `pam_unix(...:session): session opened` + +LogLens also tracks parser coverage telemetry for unsupported or malformed lines, including: + +- `total_lines` +- `parsed_lines` +- `unparsed_lines` +- `parse_success_rate` +- `top_unknown_patterns` + +LogLens does not currently detect: + +- Lateral movement +- MFA abuse +- SSH key misuse +- PAM-specific failures beyond the parsed sample patterns +- Cross-file or cross-host correlation + +## Build + +```bash +cmake -S . -B build +cmake --build build +ctest --test-dir build --output-on-failure +``` + +## Run + +```bash +./build/loglens --mode syslog --year 2026 ./assets/sample_auth.log ./out +./build/loglens --mode journalctl-short-full ./assets/sample_journalctl_short_full.log ./out-journal +./build/loglens --config ./assets/sample_config.json ./assets/sample_auth.log ./out-config +``` + +The CLI writes: + +- `report.md` +- `report.json` + +into the output directory you provide. If you omit the output directory, the files are written into the current working directory. + +## Sample Output + +For sanitized sample input, see [`assets/sample_auth.log`](./assets/sample_auth.log) and [`assets/sample_journalctl_short_full.log`](./assets/sample_journalctl_short_full.log). + +`report.md` summary excerpt: + +```markdown +## Summary +- Input mode: syslog_legacy +- Parsed events: 14 +- Findings: 3 +- Parser warnings: 2 +``` + +`report.json` summary excerpt: + +```json +{ + "input_mode": "syslog_legacy", + "parsed_event_count": 14, + "finding_count": 3, + "warning_count": 2 +} +``` + +The config file schema is intentionally small and strict: + +```json +{ + "input_mode": "syslog_legacy", + "timestamp": { + "assume_year": 2026 + }, + "brute_force": { "threshold": 5, "window_minutes": 10 }, + "multi_user_probing": { "threshold": 3, "window_minutes": 15 }, + "sudo_burst": { "threshold": 3, "window_minutes": 5 }, + "auth_signal_mappings": { + "ssh_failed_password": { + "counts_as_attempt_evidence": true, + "counts_as_terminal_auth_failure": true + }, + "ssh_invalid_user": { + "counts_as_attempt_evidence": true, + "counts_as_terminal_auth_failure": true + }, + "ssh_failed_publickey": { + "counts_as_attempt_evidence": true, + "counts_as_terminal_auth_failure": true + }, + "pam_auth_failure": { + "counts_as_attempt_evidence": true, + "counts_as_terminal_auth_failure": false + } + } +} +``` + +This mapping lets LogLens normalize parsed events into detection signals before applying brute-force or multi-user rules. By default, `pam_auth_failure` is treated as lower-confidence attempt evidence and does not count as a terminal authentication failure unless the config explicitly upgrades it. + +Timestamp handling is now explicit: + +- `--mode syslog` or `input_mode: syslog_legacy` requires `--year` or `timestamp.assume_year` +- `--mode journalctl-short-full` or `input_mode: journalctl_short_full` parses the embedded year and timezone and ignores `assume_year` + +## Example Input + +```text +Mar 10 08:11:22 example-host sshd[1234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2 +Mar 10 08:12:10 example-host sshd[1235]: Accepted password for alice from 203.0.113.20 port 51111 ssh2 +Mar 10 08:15:00 example-host sudo: alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/systemctl restart ssh +Mar 10 08:27:10 example-host sshd[1243]: Failed publickey for invalid user svc-backup from 203.0.113.40 port 51240 ssh2 +Mar 10 08:28:33 example-host pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=203.0.113.41 user=alice +Mar 10 08:29:50 example-host pam_unix(sudo:session): session opened for user root by alice(uid=0) +Mar 10 08:30:12 example-host sshd[1244]: Connection closed by authenticating user alice 203.0.113.50 port 51290 [preauth] +Mar 10 08:31:18 example-host sshd[1245]: Timeout, client not responding from 203.0.113.51 port 51291 +``` + +`journalctl --output short-full` style example: + +```text +Tue 2026-03-10 08:11:22 UTC example-host sshd[2234]: Failed password for invalid user admin from 203.0.113.10 port 51022 ssh2 +Tue 2026-03-10 08:13:10 UTC example-host sshd[2236]: Failed password for test from 203.0.113.10 port 51040 ssh +Tue 2026-03-10 08:18:05 UTC example-host sshd[2238]: Failed publickey for invalid user deploy from 203.0.113.10 port 51060 ssh2 +Tue 2026-03-10 08:31:18 UTC example-host sshd[2245]: Connection closed by authenticating user alice 203.0.113.51 port 51291 [preauth] +``` + +## Known Limitations + +- `syslog_legacy` requires an explicit year; LogLens does not guess one implicitly. +- `journalctl_short_full` currently supports `UTC`, `GMT`, `Z`, and numeric timezone offsets, not arbitrary timezone abbreviations. +- Parser coverage is intentionally narrow and focused on common `sshd`, `sudo`, and `pam_unix` variants. +- Unsupported lines are surfaced as parser telemetry and warnings, not as detector findings. +- `pam_unix` auth failures remain lower-confidence by default unless signal mappings explicitly upgrade them. +- Detector configuration uses a fixed `config.json` schema rather than partial overrides or alternate config formats. +- Findings are rule-based triage aids, not incident verdicts or attribution. + +## Future Roadmap + +- Additional auth patterns and PAM coverage +- Better host-level summaries +- Optional CSV export +- Larger sanitized test corpus diff --git a/docs/release-process.md b/docs/release-process.md new file mode 100644 index 0000000..e3b6348 --- /dev/null +++ b/docs/release-process.md @@ -0,0 +1,18 @@ +# Release Process + +## Changelog Discipline + +- Add user-visible changes to `CHANGELOG.md` as they land. +- Keep entries under `Unreleased` until a version is cut. +- Use the stable categories `Added`, `Changed`, `Fixed`, and `Docs`. +- Move `Unreleased` entries into a versioned section during release prep. + +## Where Information Belongs + +- `README.md`: what the tool is, how to build and run it, sample output, and current limitations. +- `CHANGELOG.md`: concise version-by-version history of user-visible changes. +- GitHub release notes: a short release announcement built from the changelog, with highlights and upgrade context. + +## Practical Rule + +If a change affects external readers or users, it should usually touch either `README.md`, `CHANGELOG.md`, or both.