3 changes: 2 additions & 1 deletion README.md
@@ -44,12 +44,13 @@ EXPECTED_VERSION=X.Y.Z bash tools/ci/verify_nuget_release.sh
## 5. Compatibility / TFMs
- Library target frameworks: `netstandard2.0`, `net8.0`, and `net10.0`
- Release versioning: the Git tag `vX.Y.Z` (optionally with `-prerelease`) is the SSOT
- Current pre-release channel of the `5.2.0` line: `v5.2.0-rc.6`

## 6. Architecture overview
### 6.1 Core classes (data flow)
| Core class | Primary inputs | Primary outputs | Core logic |
|---|---|---|---|
| `FileTypeDetector` | `path`, `byte[]`, `verifyExtension` | `FileType`, `DetectionDetail`, `bool`, `IReadOnlyList<ZipExtractedEntry>` | Header/magic (`FileTypeRegistry`) plus archive gate (`ArchiveTypeResolver` + `ArchiveSafetyGate`) and optional OOXML refinement (`OpenXmlRefiner`). |
| `FileTypeDetector` | `path`, `byte[]`, `verifyExtension` | `FileType`, `DetectionDetail`, `bool`, `IReadOnlyList<ZipExtractedEntry>` | Header/magic (`FileTypeRegistry`) plus archive gate (`ArchiveTypeResolver` + `ArchiveSafetyGate`) and optional container refinement (`OpenXmlRefiner` for OOXML/OpenDocument, `LegacyOfficeBinaryRefiner` for OLE2 Office). |
| `ArchiveProcessing` | `path`, `byte[]` | `bool`, `IReadOnlyList<ZipExtractedEntry>` | Facade: path-based validation/extraction delegates to `FileTypeDetector` (`TryValidateArchive` / `ExtractArchiveSafeToMemory`); byte-based paths use `ArchivePayloadGuard` and `ArchiveEntryCollector`. |
| `FileMaterializer` | `byte[]`, `destinationPath`, `overwrite`, `secureExtract` | `bool` | Byte-based persistence only: raw write or (with `secureExtract=true` and an archive-capable payload) safe extraction via `ArchiveExtractor`. |
| `EvidenceHashing` | `path`, `byte[]`, `IReadOnlyList<ZipExtractedEntry>`, optional hash options | `HashEvidence`, `HashRoundTripReport` | Detection + archive collection (`ArchiveEntryCollector`) and deterministic manifest/payload hashes, including round trip via `FileMaterializer`. |
97 changes: 53 additions & 44 deletions docs/audit/compat/003_NETSTANDARD2_COMPAT_EVIDENCE.MD
@@ -1,12 +1,12 @@
# Evidence Report: netstandard2 Compat

## 1. Purpose
This report documents the technical implementation and verification of net48 compatibility via `netstandard2.0`.
This report documents the technical implementation and verification of net48 compatibility via `netstandard2.0`, including fail-closed Office/archive refinement.

## 2. Scope
- Library: `src/FileTypeDetection/FileTypeDetectionLib.vbproj`
- Hashing core facade: `src/FileTypeDetection/EvidenceHashing.vb`
- Provider abstractions and TFM providers under `src/FileTypeDetection/Abstractions/Providers`, `src/FileTypeDetection/Composition`, `src/FileTypeDetection/Providers`
- Detection/refinement: `src/FileTypeDetection/FileTypeDetector.vb`, `src/FileTypeDetection/Detection/FileTypeRegistry.vb`, `src/FileTypeDetection/Infrastructure/CoreInternals.vb`
- Tests: `tests/FileTypeDetectionLib.Tests/Unit/*`

## 3. Rules/architecture
### 3.1 Before/After TargetFrameworks
@@ -33,84 +33,93 @@ Dieser Report dokumentiert die technische Umsetzung und Verifikation fuer die ne
- SHA256: `SHA256.HashData`
- Hex: `Convert.ToHexString(...).ToLowerInvariant()`
- FastHash: `XxHash3.HashToUInt64(...).ToString("x16")`
- FastHash is **not** disabled on `netstandard2.0`.
- FastHash is not disabled on `netstandard2.0`.
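The output contract listed above (lowercase hex SHA-256, 16-digit lowercase hex for the 64-bit fast hash) can be sketched independently of the .NET providers. This is a minimal illustration in Python, not the library's implementation: `sha256_hex_lower` and `format_fast_hash` are hypothetical names, `hashlib` stands in for `SHA256.HashData`, and the XxHash3 computation itself is left out because it is not part of the Python standard library.

```python
import hashlib


def sha256_hex_lower(payload: bytes) -> str:
    """Return the SHA-256 digest as lowercase hex (hexdigest() is already lowercase)."""
    return hashlib.sha256(payload).hexdigest()


def format_fast_hash(value: int) -> str:
    """Format an unsigned 64-bit hash value like .NET's ToString("x16"):
    lowercase hex, zero-padded to 16 digits."""
    if not 0 <= value < 2**64:
        raise ValueError("fast hash value must fit into an unsigned 64-bit integer")
    return format(value, "016x")
```

Whatever primitive a provider uses per TFM, the strings it emits would have to match this shape for hashes to stay comparable across target frameworks.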

### 3.4 Provider selection (compile-time)
MSBuild conditions in `src/FileTypeDetection/FileTypeDetectionLib.vbproj`:
- global: `<Compile Remove="Providers/**/*.vb"/>`
- `netstandard2.0`: `<Compile Include="Providers/NetStandard2_0/**/*.vb"/>`
- `net8.0|net10.0`: `<Compile Include="Providers/Net8_0Plus/**/*.vb"/>`

### 3.5 Office/archive semantics (fail-closed)
- Office/OpenDocument extensions are resolved via aliases to grouped types (`Docx`, `Xlsx`, `Pptx`).
- Legacy OLE2 (`.doc/.xls/.ppt`) is refined marker-based and fail-closed via `LegacyOfficeBinaryRefiner`.
- `TryValidateArchive` and extraction accept only real, extractable archive containers (`Zip`); Office containers are not treated as extractable archives.
- The extension check remains a downstream policy and is enforced as an error path only when the verify flag is set explicitly.
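The fail-closed archive rule can be illustrated with a small Python sketch. It is an assumption-laden stand-in, not the library's logic: the labels `Ole2`, `OfficeContainer`, and `Unknown`, and the functions `classify` and `try_validate_archive`, are hypothetical, and the structured-container check (presence of `[Content_Types].xml` for OOXML or `mimetype` for OpenDocument) is a simplification of whatever `OpenXmlRefiner` actually inspects. Only the ZIP and OLE2 magic bytes are taken from the public file formats.

```python
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"                          # ZIP local file header signature
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")     # OLE2 compound file signature


def classify(payload: bytes) -> str:
    """Classify a payload; anything that cannot be verified yields "Unknown"."""
    if payload.startswith(OLE2_MAGIC):
        return "Ole2"                              # would still need marker refinement
    if not payload.startswith(ZIP_MAGIC):
        return "Unknown"
    try:
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            names = set(archive.namelist())
    except (zipfile.BadZipFile, OSError, ValueError):
        return "Unknown"                           # fail closed on corrupt containers
    if "[Content_Types].xml" in names or "mimetype" in names:
        return "OfficeContainer"                   # hypothetical label for OOXML/ODF
    return "Zip"


def try_validate_archive(payload: bytes) -> bool:
    """Only plain ZIP archives count as extractable; Office containers do not."""
    return classify(payload) == "Zip"
```

The point of the design is that the container kind is decided first and the safety gate only ever sees payloads already known to be plain archives.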

## 4. Verification/evidence
### 4.1 Commands and exit codes
1. `dotnet --info` -> `0`
1. `dotnet --info` -> `0` (`artifacts/ci/netstandard2-compat/dotnet-info.txt`)
2. `dotnet restore FileClassifier.sln -v minimal` -> `0`
3. `dotnet build FileClassifier.sln -c Release --no-restore -warnaserror -v minimal` -> `0`
4. `dotnet test tests/FileTypeDetectionLib.Tests/FileTypeDetectionLib.Tests.csproj -c Release --no-build -v minimal` -> `0` (`414` tests green)
5. `dotnet pack src/FileTypeDetection/FileTypeDetectionLib.vbproj -c Release --no-build -o artifacts/ci/netstandard2-compat/nuget -v minimal` -> `0`
6. `dotnet build src/FileTypeDetection/FileTypeDetectionLib.vbproj -c Release -f netstandard2.0 -v diag > artifacts/ci/netstandard2-compat/build-netstandard2.0.log` -> `0`
7. `dotnet build src/FileTypeDetection/FileTypeDetectionLib.vbproj -c Release -f net8.0 -v diag > artifacts/ci/netstandard2-compat/build-net8.0.log` -> `0`
8. `dotnet build src/FileTypeDetection/FileTypeDetectionLib.vbproj -c Release -f net10.0 -v diag > artifacts/ci/netstandard2-compat/build-net10.0.log` -> `0`
9. `python3 tools/check-doc-consistency.py` -> `0`
10. `python3 tools/check-docs.py` -> `0`
11. `bash tools/versioning/verify-version-convergence.sh` -> `0`
12. `bash tools/ci/bin/run.sh security-nuget` -> `0`
13. `EXPECTED_RELEASE_TAG=v5.2.0-rc.3 REQUIRE_RELEASE_TAG=1 bash tools/ci/check-versioning-svt.sh --repo-root . --out artifacts/ci/versioning-svt/versioning-svt-summary.json` -> `0`
14. `bash tools/ci/release/gate2_version_policy.sh release v5.2.0-rc.3 artifacts/nuget/Tomtastisch.FileClassifier.5.2.0-rc.3.nupkg` -> `0`
15. `VERIFY_ONLINE=0 bash tools/ci/release/gate4_verify_postpublish.sh 5.2.0-rc.3 artifacts/nuget/Tomtastisch.FileClassifier.5.2.0-rc.3.nupkg` -> `0`
16. `VERIFY_ONLINE=0 bash tools/ci/release/gate4_verify_postpublish.sh 5.2.0 artifacts/ci/netstandard2-compat/nuget/Tomtastisch.FileClassifier.5.2.0.nupkg` -> `0`
3. `dotnet restore --locked-mode FileClassifier.sln -v minimal` -> `0`
4. `dotnet build FileClassifier.sln -c Release --no-restore -warnaserror -v minimal` -> `0`
5. `dotnet test tests/FileTypeDetectionLib.Tests/FileTypeDetectionLib.Tests.csproj -c Release --no-build -v minimal` -> `0` (`544` tests green)
6. `dotnet pack src/FileTypeDetection/FileTypeDetectionLib.vbproj -c Release --no-build -o artifacts/ci/netstandard2-compat/nuget -v minimal` -> `0`
7. `dotnet build src/FileTypeDetection/FileTypeDetectionLib.vbproj -c Release -f netstandard2.0 -v diag > artifacts/ci/netstandard2-compat/build-netstandard2.0.log` -> `0`
8. `dotnet build src/FileTypeDetection/FileTypeDetectionLib.vbproj -c Release -f net8.0 -v diag > artifacts/ci/netstandard2-compat/build-net8.0.log` -> `0`
9. `dotnet build src/FileTypeDetection/FileTypeDetectionLib.vbproj -c Release -f net10.0 -v diag > artifacts/ci/netstandard2-compat/build-net10.0.log` -> `0`
10. `python3 tools/check-doc-consistency.py` -> `0`
11. `python3 tools/check-docs.py` -> `0`
12. `EXPECTED_RELEASE_TAG=v5.2.0-rc.6 REQUIRE_RELEASE_TAG=1 bash tools/ci/bin/run.sh versioning-svt` -> `0`
13. `bash tools/ci/bin/run.sh version-convergence` -> `0`
14. `bash tools/ci/bin/run.sh security-nuget` -> `0`
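A command list like the one above can be replayed and its exit codes recorded with a small runner. The following Python sketch is illustrative only: `run_steps`, `all_passed`, and `DEMO_STEPS` are hypothetical names, and the demo steps invoke the Python interpreter so the sketch runs without the .NET SDK. In practice the argument lists would be the `dotnet`, `python3`, and `bash` commands from the report.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_steps(steps: Sequence[Sequence[str]]) -> List[Tuple[str, int]]:
    """Run each step in order, record (command, exit code), stop at the first failure."""
    results: List[Tuple[str, int]] = []
    for argv in steps:
        completed = subprocess.run(
            list(argv),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,  # we want the exit code, not an exception
        )
        results.append((" ".join(argv), completed.returncode))
        if completed.returncode != 0:
            break  # fail closed: later steps are not evidence if an earlier one failed
    return results


def all_passed(results: Sequence[Tuple[str, int]]) -> bool:
    """True only if at least one step ran and every recorded exit code is 0."""
    return bool(results) and all(code == 0 for _, code in results)


# Stand-in steps (assumption): exit codes 0, 3, 0 produced by the interpreter itself.
DEMO_STEPS = [
    [sys.executable, "-c", "raise SystemExit(0)"],
    [sys.executable, "-c", "raise SystemExit(3)"],
    [sys.executable, "-c", "raise SystemExit(0)"],
]
```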

### 4.2 Build/pack proof
- Build matrix succeeded:
- `src/FileTypeDetection/bin/Release/netstandard2.0/Tomtastisch.FileClassifier.dll`
- `src/FileTypeDetection/bin/Release/net8.0/Tomtastisch.FileClassifier.dll`
- `src/FileTypeDetection/bin/Release/net10.0/Tomtastisch.FileClassifier.dll`
- NUPKG contents (`unzip -l ... | rg "lib/"`):
- NUPKG contents (`artifacts/ci/netstandard2-compat/nuget/Tomtastisch.FileClassifier.5.2.0.nupkg`):
- `lib/netstandard2.0/Tomtastisch.FileClassifier.dll`
- `lib/net8.0/Tomtastisch.FileClassifier.dll`
- `lib/net10.0/Tomtastisch.FileClassifier.dll`
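Because a `.nupkg` is a ZIP archive, the `lib/` listing above can also be checked programmatically. This Python sketch is one possible way to do it, assuming the conventional `lib/<tfm>/<assembly>.dll` layout; `lib_tfms` and `EXPECTED_TFMS` are hypothetical names and not part of the repository's tooling.

```python
import io
import zipfile
from typing import Set

# The three target frameworks the report expects inside the package.
EXPECTED_TFMS = {"netstandard2.0", "net8.0", "net10.0"}


def lib_tfms(nupkg_bytes: bytes) -> Set[str]:
    """Return the TFM folder names that contain at least one DLL under lib/."""
    tfms: Set[str] = set()
    with zipfile.ZipFile(io.BytesIO(nupkg_bytes)) as package:
        for name in package.namelist():
            parts = name.split("/")
            if len(parts) >= 3 and parts[0] == "lib" and parts[-1].endswith(".dll"):
                tfms.add(parts[1])
    return tfms
```

A check such as `lib_tfms(data) == EXPECTED_TFMS` would then fail both when a target framework is missing and when an unexpected one ships.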

### 4.3 Provider compile proof
- Build logs contain the expected provider paths per TFM:
- `artifacts/ci/netstandard2-compat/build-netstandard2.0.log` with `Providers/NetStandard2_0/HashPrimitivesProvider.vb`
- `artifacts/ci/netstandard2-compat/build-net8.0.log` with `Providers/Net8_0Plus/HashPrimitivesProvider.vb`
- `artifacts/ci/netstandard2-compat/build-net10.0.log` with `Providers/Net8_0Plus/HashPrimitivesProvider.vb`
- Near-runtime marker probe from the three build artifacts:
- Build task proof per TFM: `artifacts/ci/netstandard2-compat/provider-compile-proof-short.txt`
- `netstandard2.0` -> `Providers/NetStandard2_0/HashPrimitivesProvider.vb`
- `net8.0` -> `Providers/Net8_0Plus/HashPrimitivesProvider.vb`
- `net10.0` -> `Providers/Net8_0Plus/HashPrimitivesProvider.vb`
- Runtime marker proof: `artifacts/ci/netstandard2-compat/provider-marker-proof.txt`
- `netstandard2.0:NetStandard2_0`
- `net8.0:Net8_0Plus`
- `net10.0:Net8_0Plus`
- Probe command:
```bash
tmpdir=$(mktemp -d)
cd "$tmpdir"
dotnet new console -n Probe -f net10.0
# Program.cs loads each TFM DLL in its own AssemblyLoadContext and reads ProviderMarker via reflection.
dotnet run -c Release --no-restore
```

### 4.4 Forbidden API grep proof (core)
Command:
```bash
rg -n "Convert\.ToHexString|SHA256\.HashData|System\.IO\.Hashing|Microsoft\.AspNetCore\.App" src/FileTypeDetection/Core
```
Result:
- no matches (`forbidden_core_refs=none`)
- no matches (`artifacts/ci/netstandard2-compat/core-forbidden-apis.txt` has `0` lines)
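The same check can be expressed without `rg`, for example when ripgrep is not installed on a CI image. The Python sketch below reuses the pattern from the command above; `scan_text` and `scan_tree` are hypothetical helpers, and unlike `rg` the tree scan is restricted to `*.vb` files, which is an assumption about what matters in the core folder.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Same alternation as in the rg command: APIs the core must not reference directly.
FORBIDDEN = re.compile(
    r"Convert\.ToHexString|SHA256\.HashData|System\.IO\.Hashing|Microsoft\.AspNetCore\.App"
)


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for every line matching the pattern."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if FORBIDDEN.search(line)
    ]


def scan_tree(root: Path) -> List[Tuple[str, int, str]]:
    """Scan all *.vb files below root; an empty result corresponds to "no matches"."""
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(root.rglob("*.vb")):
        content = path.read_text(encoding="utf-8", errors="replace")
        for number, line in scan_text(content):
            hits.append((str(path), number, line))
    return hits
```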

### 4.5 Test/semantics proof (Office/OpenOffice/archive)
- New/extended tests cover, among others:
- wrong extension vs. content detection (`verifyExtension=false/true`)
- legacy OLE Office (`doc/xls/ppt`)
- OpenDocument (`odt/ods/odp`)
- real archives vs. Office containers
- corrupt payloads and conflicting markers (fail-closed)
- Relevant test files:
- `tests/FileTypeDetectionLib.Tests/Unit/EndToEndFailClosedMatrixUnitTests.cs`
- `tests/FileTypeDetectionLib.Tests/Unit/LegacyOfficeBinaryRefinerUnitTests.cs`
- `tests/FileTypeDetectionLib.Tests/Unit/OpenXmlRefinerUnitTests.cs`
- `tests/FileTypeDetectionLib.Tests/Unit/ExtensionCheckUnitTests.cs`
- `tests/FileTypeDetectionLib.Tests/Unit/ArchiveExtractionUnitTests.cs`
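The "wrong extension vs. content detection" case can be made concrete with a short Python sketch. It assumes the alias groups named in this PR (`doc/docx/odt`, `xls/xlsx/ods`, `ppt/pptx/odp`) and a hypothetical `Unknown` result for the error path; `extension_matches` and `apply_extension_policy` are illustrative names, and the real return value of the library on a mismatch may differ.

```python
from pathlib import PurePath

# Alias groups as listed in the changelog; the dictionary layout is an assumption.
ALIAS_GROUPS = {
    "Docx": {"doc", "docx", "odt"},
    "Xlsx": {"xls", "xlsx", "ods"},
    "Pptx": {"ppt", "pptx", "odp"},
}


def extension_matches(path: str, detected_kind: str) -> bool:
    """True if the file extension belongs to the alias group of the detected kind."""
    extension = PurePath(path).suffix.lower().lstrip(".")
    return extension in ALIAS_GROUPS.get(detected_kind, set())


def apply_extension_policy(path: str, detected_kind: str, verify_extension: bool) -> str:
    """Content detection wins unless verification is requested explicitly;
    with verification on, a mismatch fails closed to "Unknown" (assumed label)."""
    if verify_extension and not extension_matches(path, detected_kind):
        return "Unknown"
    return detected_kind
```

With `verify_extension=False`, a Word document renamed to `report.pdf` is still reported by content; with `verify_extension=True`, the same input is rejected.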

### 4.5 CI partial checks
- `artifacts/ci/versioning-svt/versioning-svt-summary.json` -> `status: pass` (pre-release `v5.2.0-rc.3`, core-match `5.2.0`)
### 4.6 Version/release convergence
- `artifacts/versioning_report.json` -> `status: pass`, `expected_version: 5.2.0-rc.6`
- `artifacts/ci/versioning-svt/versioning-svt-summary.json` -> `status: pass`
- `artifacts/ci/version-convergence/summary.json` -> `status: pass`, `repo_version=5.2.0`, `vbproj_version=5.2.0`, `docs_latest_version=5.2.0`
- `artifacts/ci/security-nuget/result.json` -> `status: pass`
- Gate 4 pre-release probe (`VERIFY_ONLINE=0`) shows `require_registration=0`.
- Gate 4 stable probe (`VERIFY_ONLINE=0`) shows `require_registration=1`.
- RC pre-release NUPKG for the SVT probe: `artifacts/nuget/Tomtastisch.FileClassifier.5.2.0-rc.6.nupkg`

### 4.6 Policy/convergence note
### 4.7 Policy/convergence note
Ambiguity between:
- `docs/versioning/001_POLICY_VERSIONING.MD:43` (no static version fields in PR/CI), and
- the existing SVT/convergence setup (`verify-version-convergence.sh`, `check-versioning-svt.sh`), which expects `RepoVersion` and `Version`/`PackageVersion` in `FileTypeDetectionLib.vbproj`.
- `docs/versioning/001_POLICY_VERSIONING.MD` (tag `vX.Y.Z[-prerelease]` as SSOT for publishing), and
- repo convergence rules (`RepoVersion`/`Version`/`PackageVersion` remain the core version `X.Y.Z`).

Decision for this scope:
- fail-closed per the existing CI/repo contract: versions kept in sync at `5.2.0` and verified by `versioning-svt` + `version-convergence`.
- Pre-releases are represented via tag `v5.2.0-rc.N`; the project fields remain semantically at core version `5.2.0`.
- fail-closed per the existing CI/repo contract: core versions remain convergent at `5.2.0`.
- The pre-release is represented via tag/NUPKG version `v5.2.0-rc.6` and checked via SVT.
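The split between tag version and core version can be sketched as a small parser. This Python example is an illustration under assumptions: the tag grammar `vX.Y.Z[-prerelease]` is taken from the policy text, while `ReleaseTag`, `parse_release_tag`, and `is_convergent` are hypothetical names and do not reflect the actual SVT scripts.

```python
import re
from typing import NamedTuple, Optional

# vX.Y.Z with an optional "-prerelease" suffix, e.g. v5.2.0 or v5.2.0-rc.6
TAG_PATTERN = re.compile(r"v(\d+\.\d+\.\d+)(?:-([0-9A-Za-z.-]+))?")


class ReleaseTag(NamedTuple):
    core_version: str            # e.g. "5.2.0", what the project fields carry
    prerelease: Optional[str]    # e.g. "rc.6", or None for a stable tag

    @property
    def package_version(self) -> str:
        """Version string used for the NUPKG, e.g. "5.2.0-rc.6"."""
        if self.prerelease is None:
            return self.core_version
        return f"{self.core_version}-{self.prerelease}"


def parse_release_tag(tag: str) -> ReleaseTag:
    """Parse a release tag; anything that does not match the grammar is rejected."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a release tag: {tag!r}")
    return ReleaseTag(match.group(1), match.group(2))


def is_convergent(tag: str, project_version: str) -> bool:
    """The project fields converge if they equal the tag's core version."""
    return parse_release_tag(tag).core_version == project_version
```

Under this reading, `v5.2.0-rc.6` and a project version of `5.2.0` are convergent, while the package file carries `5.2.0-rc.6`.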

## 5. Limits/non-goals
- No public API signature changed.
1 change: 1 addition & 0 deletions docs/references/001_REFERENCES_CORE.MD
@@ -46,6 +46,7 @@ Quelle: `FileTypeDetector.vb`.
| `ArchiveStructuredRefined` | Archive was refined structurally (e.g. OOXML) |
| `ArchiveRefined` | Archive type was refined by content |
| `ArchiveGeneric` | Archive stayed generic |
| `OfficeBinaryRefined` | Legacy OLE document was recognized as an Office type via marker refinement |

## 4. Internal core paths (guided reading)
| Internal path | File | Meaning | Detail README |
1 change: 1 addition & 0 deletions docs/references/101_REFERENCES_CORE.MD
@@ -46,6 +46,7 @@ Source: `FileTypeDetector.vb`.
| `ArchiveStructuredRefined` | archive was refined structurally (e.g. OOXML) |
| `ArchiveRefined` | archive kind was refined by content |
| `ArchiveGeneric` | archive stayed generic |
| `OfficeBinaryRefined` | legacy OLE document was mapped to an Office kind by marker refinement |

## 4. Internal core paths (guided reading)
| Internal path | File | Meaning | Detail README |
2 changes: 1 addition & 1 deletion docs/versioning/002_HISTORY_VERSIONS.MD
@@ -12,7 +12,7 @@ Heuristik fuer die Rueckwirkungs-Zuordnung:
- `docs|test|ci|chore|tooling|refactor|fix` => Patch

Current development state:
- The current development line includes `5.x` (current pre-release state: `v5.2.0-rc.2`, next stable target: `5.2.0`; details in `docs/versioning/003_CHANGELOG_RELEASES.MD`).
- The current development line includes `5.x` (current pre-release state: `v5.2.0-rc.6`, next stable target: `5.2.0`; details in `docs/versioning/003_CHANGELOG_RELEASES.MD`).

Note:
- The `Keyword` column uses the technical classification value from the history.
5 changes: 5 additions & 0 deletions docs/versioning/003_CHANGELOG_RELEASES.MD
@@ -10,9 +10,14 @@ der Git-Tag `vX.Y.Z` (optional `-prerelease`) als SSOT.
## [Unreleased]
- Added:
- Completed in-code documentation for the TFM provider methods (`HashPrimitivesProvider` for `netstandard2.0` and `net8.0+`).
- Introduced legacy Office refinement (`LegacyOfficeBinaryRefiner`) for OLE2 documents with fail-closed marker logic.
- Added extended E2E matrix tests for wrong extensions, corrupt payloads, and Office/OpenDocument variants.
- Changed:
- Aligned public XML documentation with Policy 045: removed invalid `<exception>` tags from fail-closed APIs.
- Harmonized German log/documentation strings with correct umlauts.
- Consolidated Office/OpenOffice alias resolution in `FileTypeRegistry` (`doc/docx/odt`, `xls/xlsx/ods`, `ppt/pptx/odp`).
- Archive extraction now accepts only real, extractable archives (`Zip`); Office containers are no longer treated as extractable archives.
- `TryValidateArchive` explicitly checks the detected container type before the safety gate.
- Made Gate 4 (`tools/ci/release/gate4_verify_postpublish.sh`) robust for pre-release tags:
- longer retry window,
- `registration` decoupled by default for `vX.Y.Z-<label>`,
2 changes: 1 addition & 1 deletion docs/versioning/102_HISTORY_VERSIONS.MD
@@ -12,7 +12,7 @@ Heuristics for retroactive classification:
- `docs|test|ci|chore|tooling|refactor|fix` => patch

Current state:
- Current release line contains `5.x` (current pre-release state: `v5.2.0-rc.2`, next stable target: `5.2.0`; details in `docs/versioning/103_CHANGELOG_RELEASES.MD`).
- Current release line contains `5.x` (current pre-release state: `v5.2.0-rc.6`, next stable target: `5.2.0`; details in `docs/versioning/103_CHANGELOG_RELEASES.MD`).

Note:
- The "short description" column follows the original commit/PR intent text for deterministic traceability and is not normalized to a single language.
5 changes: 5 additions & 0 deletions docs/versioning/103_CHANGELOG_RELEASES.MD
@@ -9,9 +9,14 @@ All changes are documented here in technical terms. The release version itself i
## [Unreleased]
- Added:
- Completed in-code documentation for TFM provider methods (`HashPrimitivesProvider` for `netstandard2.0` and `net8.0+`).
- Added legacy Office refinement (`LegacyOfficeBinaryRefiner`) for OLE2 documents with fail-closed marker logic.
- Added extended E2E matrix tests for wrong extensions, corrupted payloads, and Office/OpenDocument variants.
- Changed:
- Aligned public XML docs with Policy 045 by removing invalid `<exception>` tags from fail-closed APIs.
- Harmonized German log/doc strings to use proper umlauts.
- Consolidated Office/OpenOffice alias resolution in `FileTypeRegistry` (`doc/docx/odt`, `xls/xlsx/ods`, `ppt/pptx/odp`).
- Archive extraction now accepts only real, extractable archives (`Zip`); Office containers are no longer treated as extractable archives.
- `TryValidateArchive` now verifies detected container kind before archive safety checks.
- Hardened Gate 4 (`tools/ci/release/gate4_verify_postpublish.sh`) for pre-release tags:
- longer retry window,
- `registration` decoupled by default for `vX.Y.Z-<label>`,