
feat: self-service data upload for PDP, AR, course files (#86)#94

Merged
William-Hill merged 16 commits into main from feature/86-self-service-data-upload on Apr 1, 2026

Conversation

@William-Hill (Collaborator)

Summary

  • 3-step upload wizard at /admin/upload — drop a file, preview with auto-detected schema and column mapping, confirm and batch-upsert into Postgres
  • Auto-detection of 5 file types (PDP Cohort AR, PDP Cohort Submission, Course AR, Course Submission, ML Predictions) from CSV/XLSX headers, using a scoring algorithm with required-column gating
  • Hybrid column mapping — known schemas auto-match; unknown columns get dropdown remapper
  • Batch upsert — multi-row INSERT (500 rows/batch) with ON CONFLICT for idempotent re-uploads; per-row fallback on batch failure for error isolation
  • Upload history at /admin/upload/history — stat cards + paginated table with status badges
  • Role-gated to admin/ir via existing RBAC middleware
  • 23 unit tests (Vitest) covering schema detection, header normalization, column mapping, and file parsing
  • Test data generator (operations/generate_test_data.py) producing 7 synthetic test files (PDP AR, Submission, Course, ML predictions, edge cases, 62MB oversized)
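The header-based auto-detection in the first bullets can be sketched roughly as follows. This is an illustrative sketch, not the repo's actual code: `SchemaDef`, `normalize`, `scoreSchema`, and `detectSchema` are assumed names, and the scoring formula (fraction of a schema's known columns present in the file) is one plausible reading of the description above.

```typescript
// Hypothetical sketch of header-based schema detection; SchemaDef, normalize,
// scoreSchema, and detectSchema are illustrative names, not the repo's exports.
type SchemaDef = { id: string; columns: string[] };

// Normalize headers so "Student ID", "student_id", and "STUDENT-ID" compare equal.
const normalize = (h: string): string =>
  h.trim().toLowerCase().replace(/[^a-z0-9]+/g, "_");

// Score = fraction of a schema's known columns found in the file's headers.
function scoreSchema(headers: string[], schema: SchemaDef): number {
  const seen = new Set(headers.map(normalize));
  const hits = schema.columns.filter((c) => seen.has(normalize(c))).length;
  return schema.columns.length ? hits / schema.columns.length : 0;
}

// Pick the best-scoring schema; the caller decides whether the score clears
// the confidence thresholds before auto-accepting the match.
function detectSchema(
  headers: string[],
  schemas: SchemaDef[]
): { schema: SchemaDef; score: number } | null {
  let best: { schema: SchemaDef; score: number } | null = null;
  for (const schema of schemas) {
    const score = scoreSchema(headers, schema);
    if (!best || score > best.score) best = { schema, score };
  }
  return best;
}
```

A file whose normalized headers cover all of one schema's columns scores 1.0 for that schema and near 0 for unrelated ones, which is what makes header-based detection workable across the five file types.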

New dependencies

  • papaparse + @types/papaparse — CSV parsing
  • xlsx — Excel parsing
  • vitest (dev) — unit testing

Pre-merge checklist

  • Run upload_history SQL migration against Supabase (see spec)
  • Smoke test with dev server + test CSV files
  • Verify role gating (non-admin/ir users get 403)

Test plan

  • Unit tests pass: cd codebenders-dashboard && npx vitest run (23 tests)
  • Build passes: npm run build
  • Drop data/test_uploads/test_pdp_cohort_ar.csv on /admin/upload — verify auto-detection, preview, column mapping
  • Drop data/test_uploads/test_oversized.csv — verify 50MB rejection
  • Drop data/test_uploads/test_bad_headers.csv — verify low-confidence amber banner
  • Verify /admin/upload/history shows upload log after commit
  • Verify non-admin roles cannot access /admin/upload

🤖 Generated with Claude Code

12 tasks covering schema registry, file parsing, API routes, UI wizard,
upload history, nav update, and test data generation. TDD approach with
Vitest for pure-logic modules.
Build a module-level schemaMap for safe O(1) lookups (replaces non-null
assertions). In detectSchema, gate the >=0.6 confidence band behind an
all-required-columns check — a small-subset file can no longer be
auto-accepted against a large schema even when recall is 1.0. Update the
course_submission test fixture to include its missing required columns,
and add a regression test that a 3-column subset stays below 0.6.
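The gating rule described above can be illustrated as follows. This is a sketch under assumed names: only the 0.6/0.3 thresholds and the "confident band requires all required columns" behavior come from the PR; `confidenceBand` and its signature are hypothetical.

```typescript
// Hypothetical sketch of confidence banding with required-column gating.
// The thresholds mirror the PR's 0.6/0.3 bands; the names are assumptions.
const CONFIDENT_THRESHOLD = 0.6;
const TENTATIVE_THRESHOLD = 0.3;

type Band = "confident" | "tentative" | "none";

function confidenceBand(score: number, hasAllRequired: boolean): Band {
  // A score in the confident band only counts when every required column is
  // present, so a small-subset file with perfect recall against a large
  // schema is demoted to tentative instead of being auto-accepted.
  if (score >= CONFIDENT_THRESHOLD && hasAllRequired) return "confident";
  if (score >= TENTATIVE_THRESHOLD) return "tentative";
  return "none";
}
```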
Adds operations/generate_test_data.py which creates synthetic Bishop State
CSV fixtures in data/test_uploads/ for upload pipeline testing: PDP cohort AR
(500 rows, 90 cols), cohort submission (500 rows, 35 cols), course AR (5000
rows, 39 cols), ML predictions (2500 rows), bad headers, mixed casing, and a
~62MB oversized file. Also adds data/test_uploads/ to .gitignore.
…e, derive state

- Export CONFIDENT_THRESHOLD/TENTATIVE_THRESHOLD constants from upload-schemas.ts and replace all hardcoded 0.6/0.3/0.59 values with them
- Remove selectedSchemaLabel state; derive it from SCHEMAS.find() alongside selectedSchemaObj so ColumnMapper receives the real schema object instead of null
- Wire the "Wrong? Change type" button via showSchemaOverride boolean state; renders schema picker inline below green banner when toggled; resets on resetWizard
- Pre-compute matchedCount/unmappedCount once and replace all inline columns.filter() calls in JSX
- Refactor upsertRows to build SQL template once outside the loop and execute one multi-row INSERT per 500-row batch (N+1 → N/BATCH_SIZE); per-row fallback fires only on batch failure to identify bad rows
- Remove stale "// Log to upload_history" comment
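The N+1 → one-INSERT-per-batch refactor above boils down to building a single parameterized multi-row statement per 500-row batch. A rough sketch, where the table name and the `student_id` conflict target are hypothetical stand-ins:

```typescript
// Hypothetical sketch of building one parameterized multi-row upsert per batch.
// The "student_id" conflict target is an assumption for illustration.
const BATCH_SIZE = 500;

function buildBatchInsert(table: string, columns: string[], rowCount: number): string {
  const tuples: string[] = [];
  for (let r = 0; r < rowCount; r++) {
    // Placeholders continue numbering across rows: ($1, $2), ($3, $4), ...
    const placeholders = columns.map((_, c) => `$${r * columns.length + c + 1}`);
    tuples.push(`(${placeholders.join(", ")})`);
  }
  const updates = columns.map((c) => `${c} = EXCLUDED.${c}`).join(", ");
  return (
    `INSERT INTO ${table} (${columns.join(", ")}) ` +
    `VALUES ${tuples.join(", ")} ` +
    `ON CONFLICT (student_id) DO UPDATE SET ${updates}`
  );
}
```

Per the PR description, the per-row fallback only runs when a whole batch fails, so the common path executes rowCount/BATCH_SIZE statements instead of one per row.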
…ts, XLSX key trim

- Guard parseInt against NaN for page/pageSize query params
- Add statusCounts aggregate query to history API and return in response
- Replace page-level statusCounts reduce with API-sourced state
- Add res.ok check in fetchHistory before parsing JSON
- Remove .xls from drop-zone accept attribute
- Switch column-mapper to index-based identification in handleRemap
- Catch JSON.parse error in commit route and return 400
- Trim XLSX row keys to match trimmed headers
- Use rowCount ?? 0 for accurate insert count (avoid inflating on conflict)
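Two of the defensive fixes above (the NaN-safe pagination params and the `rowCount ?? 0` insert count) can be sketched like this; the function and parameter names are illustrative, not the actual route code:

```typescript
// Sketch of NaN-safe pagination parsing: a missing, non-numeric, or
// non-positive query param falls back to a sane default instead of NaN.
function parsePageParam(raw: string | null, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isNaN(n) || n < 1 ? fallback : n;
}

// Count inserted rows without inflating on conflict: a driver may report a
// null/undefined rowCount, which should count as zero rather than throw.
function countInserted(rowCount: number | null | undefined): number {
  return rowCount ?? 0;
}
```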
William-Hill merged commit 996f3d7 into main on Apr 1, 2026 (3 checks passed)
William-Hill deleted the feature/86-self-service-data-upload branch on April 1, 2026 at 19:49
