
feat: self-service data upload for PDP, AR, course files (#86)#94

Merged
William-Hill merged 16 commits into main from feature/86-self-service-data-upload on Apr 1, 2026

Conversation

@William-Hill (Collaborator)

Summary

  • 3-step upload wizard at /admin/upload — drop a file, preview with auto-detected schema and column mapping, confirm and batch-upsert into Postgres
  • Auto-detection of 5 file types (PDP Cohort AR, PDP Cohort Submission, Course AR, Course Submission, ML Predictions) from CSV/XLSX headers, using a scoring algorithm with required-column gating
  • Hybrid column mapping — known schemas auto-match; unknown columns get dropdown remapper
  • Batch upsert — multi-row INSERT (500 rows/batch) with ON CONFLICT for idempotent re-uploads; per-row fallback on batch failure for error isolation
  • Upload history at /admin/upload/history — stat cards + paginated table with status badges
  • Role-gated to admin/ir via existing RBAC middleware
  • 23 unit tests (Vitest) covering schema detection, header normalization, column mapping, and file parsing
  • Test data generator (operations/generate_test_data.py) producing 7 synthetic test files (PDP AR, Submission, Course, ML predictions, edge cases, 62MB oversized)
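The header-based auto-detection in the first bullets can be sketched roughly as follows. This is an illustrative sketch, not the repo's actual code: `SchemaDef`, `normalize`, `scoreSchema`, and `detectSchema` are assumed names, and the scoring formula (fraction of a schema's known columns present in the file) is one plausible reading of the description above.

```typescript
// Hypothetical sketch of header-based schema detection; SchemaDef, normalize,
// scoreSchema, and detectSchema are illustrative names, not the repo's exports.
type SchemaDef = { id: string; columns: string[] };

// Normalize headers so "Student ID", "student_id", and "STUDENT-ID" compare equal.
const normalize = (h: string): string =>
  h.trim().toLowerCase().replace(/[^a-z0-9]+/g, "_");

// Score = fraction of a schema's known columns found in the file's headers.
function scoreSchema(headers: string[], schema: SchemaDef): number {
  const seen = new Set(headers.map(normalize));
  const hits = schema.columns.filter((c) => seen.has(normalize(c))).length;
  return schema.columns.length ? hits / schema.columns.length : 0;
}

// Pick the best-scoring schema; the caller decides whether the score clears
// the confidence thresholds before auto-accepting the match.
function detectSchema(
  headers: string[],
  schemas: SchemaDef[]
): { schema: SchemaDef; score: number } | null {
  let best: { schema: SchemaDef; score: number } | null = null;
  for (const schema of schemas) {
    const score = scoreSchema(headers, schema);
    if (!best || score > best.score) best = { schema, score };
  }
  return best;
}
```

A file whose normalized headers cover all of one schema's columns scores 1.0 for that schema and near 0 for unrelated ones, which is what makes header-based detection workable across the five file types.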

New dependencies

  • papaparse + @types/papaparse — CSV parsing
  • xlsx — Excel parsing
  • vitest (dev) — unit testing

Pre-merge checklist

  • Run upload_history SQL migration against Supabase (see spec)
  • Smoke test with dev server + test CSV files
  • Verify role gating (non-admin/ir users get 403)

Test plan

  • Unit tests pass: cd codebenders-dashboard && npx vitest run (23 tests)
  • Build passes: npm run build
  • Drop data/test_uploads/test_pdp_cohort_ar.csv on /admin/upload — verify auto-detection, preview, column mapping
  • Drop data/test_uploads/test_oversized.csv — verify 50MB rejection
  • Drop data/test_uploads/test_bad_headers.csv — verify low-confidence amber banner
  • Verify /admin/upload/history shows upload log after commit
  • Verify non-admin roles cannot access /admin/upload

🤖 Generated with Claude Code

12 tasks covering schema registry, file parsing, API routes, UI wizard,
upload history, nav update, and test data generation. TDD approach with
Vitest for pure-logic modules.
Build a module-level schemaMap for safe O(1) lookups (replaces non-null
assertions). In detectSchema, gate the >=0.6 confidence band behind an
all-required-columns check — a small-subset file can no longer be
auto-accepted against a large schema even when recall is 1.0. Update the
course_submission test fixture to include its missing required columns,
and add a regression test that a 3-column subset stays below 0.6.
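The gating rule described above can be illustrated as follows. This is a sketch under assumed names: only the 0.6/0.3 thresholds and the "confident band requires all required columns" behavior come from the PR; `confidenceBand` and its signature are hypothetical.

```typescript
// Hypothetical sketch of confidence banding with required-column gating.
// The thresholds mirror the PR's 0.6/0.3 bands; the names are assumptions.
const CONFIDENT_THRESHOLD = 0.6;
const TENTATIVE_THRESHOLD = 0.3;

type Band = "confident" | "tentative" | "none";

function confidenceBand(score: number, hasAllRequired: boolean): Band {
  // A score in the confident band only counts when every required column is
  // present, so a small-subset file with perfect recall against a large
  // schema is demoted to tentative instead of being auto-accepted.
  if (score >= CONFIDENT_THRESHOLD && hasAllRequired) return "confident";
  if (score >= TENTATIVE_THRESHOLD) return "tentative";
  return "none";
}
```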
Adds operations/generate_test_data.py which creates synthetic Bishop State
CSV fixtures in data/test_uploads/ for upload pipeline testing: PDP cohort AR
(500 rows, 90 cols), cohort submission (500 rows, 35 cols), course AR (5000
rows, 39 cols), ML predictions (2500 rows), bad headers, mixed casing, and a
~62MB oversized file. Also adds data/test_uploads/ to .gitignore.
…e, derive state

- Export CONFIDENT_THRESHOLD/TENTATIVE_THRESHOLD constants from upload-schemas.ts and replace all hardcoded 0.6/0.3/0.59 values with them
- Remove selectedSchemaLabel state; derive it from SCHEMAS.find() alongside selectedSchemaObj so ColumnMapper receives the real schema object instead of null
- Wire the "Wrong? Change type" button via showSchemaOverride boolean state; renders schema picker inline below green banner when toggled; resets on resetWizard
- Pre-compute matchedCount/unmappedCount once and replace all inline columns.filter() calls in JSX
- Refactor upsertRows to build SQL template once outside the loop and execute one multi-row INSERT per 500-row batch (N+1 → N/BATCH_SIZE); per-row fallback fires only on batch failure to identify bad rows
- Remove stale "// Log to upload_history" comment
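The N+1 → one-INSERT-per-batch refactor above boils down to building a single parameterized multi-row statement per 500-row batch. A rough sketch, where the table name and the `student_id` conflict target are hypothetical stand-ins:

```typescript
// Hypothetical sketch of building one parameterized multi-row upsert per batch.
// The "student_id" conflict target is an assumption for illustration.
const BATCH_SIZE = 500;

function buildBatchInsert(table: string, columns: string[], rowCount: number): string {
  const tuples: string[] = [];
  for (let r = 0; r < rowCount; r++) {
    // Placeholders continue numbering across rows: ($1, $2), ($3, $4), ...
    const placeholders = columns.map((_, c) => `$${r * columns.length + c + 1}`);
    tuples.push(`(${placeholders.join(", ")})`);
  }
  const updates = columns.map((c) => `${c} = EXCLUDED.${c}`).join(", ");
  return (
    `INSERT INTO ${table} (${columns.join(", ")}) ` +
    `VALUES ${tuples.join(", ")} ` +
    `ON CONFLICT (student_id) DO UPDATE SET ${updates}`
  );
}
```

Per the PR description, the per-row fallback only runs when a whole batch fails, so the common path executes rowCount/BATCH_SIZE statements instead of one per row.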
…ts, XLSX key trim

- Guard parseInt against NaN for page/pageSize query params
- Add statusCounts aggregate query to history API and return in response
- Replace page-level statusCounts reduce with API-sourced state
- Add res.ok check in fetchHistory before parsing JSON
- Remove .xls from drop-zone accept attribute
- Switch column-mapper to index-based identification in handleRemap
- Catch JSON.parse error in commit route and return 400
- Trim XLSX row keys to match trimmed headers
- Use rowCount ?? 0 for accurate insert count (avoid inflating on conflict)
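Two of the defensive fixes above (the NaN-safe pagination params and the `rowCount ?? 0` insert count) can be sketched like this; the function and parameter names are illustrative, not the actual route code:

```typescript
// Sketch of NaN-safe pagination parsing: a missing, non-numeric, or
// non-positive query param falls back to a sane default instead of NaN.
function parsePageParam(raw: string | null, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isNaN(n) || n < 1 ? fallback : n;
}

// Count inserted rows without inflating on conflict: a driver may report a
// null/undefined rowCount, which should count as zero rather than throw.
function countInserted(rowCount: number | null | undefined): number {
  return rowCount ?? 0;
}
```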
William-Hill merged commit 996f3d7 into main on Apr 1, 2026 (3 checks passed)
William-Hill deleted the feature/86-self-service-data-upload branch on April 1, 2026 at 19:49
