feat: self-service data upload for PDP, AR, course files (#86) #94
Merged
William-Hill merged 16 commits into main on Apr 1, 2026
Conversation
12 tasks covering schema registry, file parsing, API routes, UI wizard, upload history, nav update, and test data generation. TDD approach with Vitest for pure-logic modules.
Build a module-level schemaMap for safe O(1) lookups (replaces non-null assertions). In detectSchema, gate the >=0.6 confidence band behind an all-required-columns check — a small-subset file can no longer be auto-accepted against a large schema even when recall is 1.0. Update the course_submission test fixture to include its missing required columns, and add a regression test that a 3-column subset stays below 0.6.
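The gating logic above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the Schema shape, the example schema, and the scoring formula are assumptions; only the names schemaMap, detectSchema, and the 0.6 threshold come from the description.

```typescript
// Hypothetical sketch of the confidence gate. Schema shape and scoring
// details are assumptions for illustration.
interface Schema {
  id: string;
  requiredColumns: string[];
}

const SCHEMAS: Schema[] = [
  // Invented example schema, not the real course_submission definition.
  { id: "course_submission", requiredColumns: ["student_id", "course_id", "grade", "term"] },
];

// Module-level map for safe O(1) lookups, replacing non-null assertions.
const schemaMap = new Map<string, Schema>(SCHEMAS.map((s) => [s.id, s]));

function detectSchema(fileColumns: string[]): { schema: Schema; confidence: number } | null {
  let best: { schema: Schema; confidence: number } | null = null;
  for (const schema of SCHEMAS) {
    const matched = schema.requiredColumns.filter((c) => fileColumns.includes(c));
    let confidence = matched.length / schema.requiredColumns.length;
    // Gate the confident band: >= 0.6 requires ALL required columns,
    // so a small-subset file cannot be auto-accepted against a large schema.
    const allRequired = matched.length === schema.requiredColumns.length;
    if (confidence >= 0.6 && !allRequired) confidence = 0.59;
    if (!best || confidence > best.confidence) best = { schema, confidence };
  }
  return best;
}
```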
Adds operations/generate_test_data.py which creates synthetic Bishop State CSV fixtures in data/test_uploads/ for upload pipeline testing: PDP cohort AR (500 rows, 90 cols), cohort submission (500 rows, 35 cols), course AR (5000 rows, 39 cols), ML predictions (2500 rows), bad headers, mixed casing, and a ~62MB oversized file. Also adds data/test_uploads/ to .gitignore.
…e, derive state

- Export CONFIDENT_THRESHOLD/TENTATIVE_THRESHOLD constants from upload-schemas.ts and replace all hardcoded 0.6/0.3/0.59 values with them
- Remove selectedSchemaLabel state; derive it from SCHEMAS.find() alongside selectedSchemaObj so ColumnMapper receives the real schema object instead of null
- Wire the "Wrong? Change type" button via showSchemaOverride boolean state; renders the schema picker inline below the green banner when toggled; resets on resetWizard
- Pre-compute matchedCount/unmappedCount once and replace all inline columns.filter() calls in JSX
- Refactor upsertRows to build the SQL template once outside the loop and execute one multi-row INSERT per 500-row batch (N+1 → N/BATCH_SIZE); the per-row fallback fires only on batch failure, to identify bad rows
- Remove stale "// Log to upload_history" comment
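The upsertRows batching described above can be sketched roughly as below. The table name, column list, and injected executor are assumptions for illustration (a real implementation would pass a pg client); only the 500-row batch size, the per-row fallback on batch failure, and the rowCount ?? 0 counting come from the PR.

```typescript
// Hypothetical sketch of the batched upsert. The executor is injected so
// the shape can be shown without a live Postgres connection.
type Exec = (sql: string, params: unknown[]) => Promise<{ rowCount: number | null }>;

const BATCH_SIZE = 500;

// Build the placeholder list for one multi-row INSERT: ($1, $2), ($3, $4), ...
function buildPlaceholders(rowCount: number, colCount: number): string {
  return Array.from({ length: rowCount }, (_, r) =>
    "(" + Array.from({ length: colCount }, (_, c) => `$${r * colCount + c + 1}`).join(", ") + ")"
  ).join(", ");
}

async function upsertRows(rows: unknown[][], exec: Exec): Promise<number> {
  const cols = ["student_id", "course_id", "grade"]; // assumed columns
  let inserted = 0;
  for (let i = 0; i < rows.length; i += BATCH_SIZE) {
    const batch = rows.slice(i, i + BATCH_SIZE);
    const sql =
      `INSERT INTO course_ar (${cols.join(", ")}) ` +
      `VALUES ${buildPlaceholders(batch.length, cols.length)} ` +
      `ON CONFLICT (student_id, course_id) DO UPDATE SET grade = EXCLUDED.grade`;
    try {
      const res = await exec(sql, batch.flat());
      inserted += res.rowCount ?? 0; // rowCount ?? 0 avoids inflating on conflict
    } catch {
      // Per-row fallback fires only when the whole batch fails,
      // so individual bad rows can be identified and skipped.
      for (const row of batch) {
        try {
          const res = await exec(
            `INSERT INTO course_ar (${cols.join(", ")}) ` +
              `VALUES ${buildPlaceholders(1, cols.length)} ` +
              `ON CONFLICT (student_id, course_id) DO UPDATE SET grade = EXCLUDED.grade`,
            row
          );
          inserted += res.rowCount ?? 0;
        } catch {
          /* bad row: skip and record in upload history (omitted here) */
        }
      }
    }
  }
  return inserted;
}
```

Building the placeholder string per batch (rather than per row) is what turns N single-row round trips into roughly N/BATCH_SIZE queries.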
…ts, XLSX key trim

- Guard parseInt against NaN for page/pageSize query params
- Add a statusCounts aggregate query to the history API and return it in the response
- Replace the page-level statusCounts reduce with API-sourced state
- Add a res.ok check in fetchHistory before parsing JSON
- Remove .xls from the drop-zone accept attribute
- Switch column-mapper to index-based identification in handleRemap
- Catch JSON.parse errors in the commit route and return 400
- Trim XLSX row keys to match trimmed headers
- Use rowCount ?? 0 for an accurate insert count (avoid inflating on conflict)
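The parseInt guard and the res.ok check from the list above could look roughly like this. The route path, default values, and helper name are assumptions, not the PR's actual code:

```typescript
// Hypothetical sketch of the query-param guard described above.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  // parseInt returns NaN for missing or garbage input; also reject
  // zero/negative values so page/pageSize stay sane.
  return Number.isNaN(n) || n < 1 ? fallback : n;
}

// Hypothetical sketch of the fetchHistory guard: check res.ok before
// parsing, so an HTML error page never reaches JSON parsing.
async function fetchHistory(page: number, pageSize: number): Promise<unknown> {
  const res = await fetch(`/api/upload/history?page=${page}&pageSize=${pageSize}`);
  if (!res.ok) {
    throw new Error(`history fetch failed: ${res.status}`);
  }
  return res.json();
}
```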
Summary
- /admin/upload — drop a file, preview with auto-detected schema and column mapping, confirm and batch-upsert into Postgres
- /admin/upload/history — stat cards + paginated table with status badges
- Test data generator (operations/generate_test_data.py) producing 7 synthetic test files (PDP AR, Submission, Course, ML predictions, edge cases, 62MB oversized)

New dependencies
- papaparse + @types/papaparse — CSV parsing
- xlsx — Excel parsing
- vitest (dev) — unit testing

Pre-merge checklist
- Run the upload_history SQL migration against Supabase (see spec)

Test plan
- cd codebenders-dashboard && npx vitest run (23 tests)
- npm run build
- Upload data/test_uploads/test_pdp_cohort_ar.csv on /admin/upload — verify auto-detection, preview, column mapping
- Upload data/test_uploads/test_oversized.csv — verify 50MB rejection
- Upload data/test_uploads/test_bad_headers.csv — verify low-confidence amber banner
- Verify /admin/upload/history shows the upload log after commit

🤖 Generated with Claude Code