-
Notifications
You must be signed in to change notification settings - Fork 3
Add Storage Finder CSV importer and weekly sync #228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
@kapuic Looks amazing! Thank you! |
|
Can the sync be weekly instead of hourly? We expect changes to be made once a month at best, so hourly sync is unnecessary. Thanks! |
|
Yes, sure! Updated to run weekly on Saturdays. By the way, it seems that the original JSON version (from the Drupal instance) had more data. I’m not familiar with which services and properties are available at NYU, or which ones should be displayed. I would recommend previewing the Google Sheet + converter version with the CLI script to see if more things should be added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a TypeScript-based CSV importer for the Storage Finder feature, designed to convert Google Sheets data into JSON format for the application. The implementation includes configurable facets with regex-based matchers, HTML sanitization utilities, and an automated GitHub Actions workflow to keep data synchronized.
Key Changes:
- New script system under
scripts/storage-finder-data-generator/with CLI support for CSV import, configurable output, and data transformation - GitHub Actions workflow for automated data synchronization (scheduled weekly)
- Addition of
csv-parsepackage for CSV processing
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
scripts/storage-finder-data-generator/types.ts |
TypeScript type definitions for CSV rows, service records, facets, and configuration structures |
scripts/storage-finder-data-generator/html.ts |
HTML sanitization utilities for escaping user content and generating safe HTML output |
scripts/storage-finder-data-generator/generate.ts |
Main CLI script with argument parsing, CSV loading, data transformation, and JSON generation |
scripts/storage-finder-data-generator/constants.ts |
Configuration constants including default Google Sheets URL, output paths, and environment variable names |
scripts/storage-finder-data-generator/config.ts |
Field definitions and facet configurations with regex matchers for categorizing storage services |
scripts/storage-finder-data-generator/MAINTENANCE.md |
Comprehensive maintainer documentation for using and extending the importer |
.github/workflows/storage-finder-sync.yml |
GitHub Actions workflow for automated data regeneration and commits |
package.json |
Added csv-parse dependency for CSV parsing functionality |
pnpm-lock.yaml |
Lockfile updates for the new csv-parse dependency |
eslint.config.mjs |
Added React version detection settings to ESLint configuration |
docusaurus.config.ts |
Formatting improvement (consolidated multi-line string) |
src/components/HomepageFeatures/index.tsx |
Formatting improvement (added parentheses around JSX in map callback) |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| on: | ||
| schedule: | ||
| - cron: "0 0 * * 6" |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR title and description mention "hourly sync", but the cron schedule "0 0 * * 6" runs weekly on Saturdays at midnight UTC, not hourly.
If hourly sync is intended, the cron should be "0 * * * *". If weekly is correct, the PR description should be updated to reflect this.
| - cron: "0 0 * * 6" | |
| - cron: "0 * * * *" |
|
|
||
| - name: Lint and format | ||
| run: | | ||
| bun lint --fix |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bun lint --fix command may not work as intended. The lint script runs both lint:js and lint:css, but the --fix flag will be appended to the entire command, not passed to the individual linters. This means it would try to run pnpm lint:js && pnpm lint:css --fix, which only fixes CSS issues.
Consider running the commands separately:
- name: Lint and format
run: |
bun lint:js --fix
bun lint:css --fix
bun formatOr define a lint:fix script in package.json that properly handles both linters.
| bun lint --fix | |
| bun lint:js --fix | |
| bun lint:css --fix |
| case "--csv": { | ||
| options.csvPath = argv[index + 1]; | ||
| index += 1; | ||
| break; | ||
| } | ||
| case "--output": { |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing validation for arguments that require values. When --csv or --output is used without = syntax, the code doesn't check if the next argument exists. This could result in undefined being assigned or an error when accessing argv[index + 1].
Add validation before accessing the next argument:
case "--csv": {
if (index + 1 >= argv.length) {
throw new Error("--csv requires a value");
}
options.csvPath = argv[index + 1];
index += 1;
break;
}
case "--output": {
if (index + 1 >= argv.length) {
throw new Error("--output requires a value");
}
options.outputDir = argv[index + 1];
index += 1;
break;
}| case "--csv": { | |
| options.csvPath = argv[index + 1]; | |
| index += 1; | |
| break; | |
| } | |
| case "--output": { | |
| case "--csv": { | |
| if (index + 1 >= argv.length) { | |
| throw new Error("--csv requires a value"); | |
| } | |
| options.csvPath = argv[index + 1]; | |
| index += 1; | |
| break; | |
| } | |
| case "--output": { | |
| if (index + 1 >= argv.length) { | |
| throw new Error("--output requires a value"); | |
| } |
| } | ||
| const safeUrl = escapeAttribute(normalizedUrl); | ||
| const safeLabel = | ||
| normalizedLabel.length === 0 ? normalizedUrl : escapeHtml(normalizedLabel); |
Copilot
AI
Dec 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fallback label uses the unescaped URL in HTML content. When normalizedLabel is empty, normalizedUrl is used directly inside the anchor tag without HTML escaping, which could allow XSS if the URL contains HTML special characters. The URL should be escaped with escapeHtml() when used as label text.
Suggested fix:
const safeLabel =
normalizedLabel.length === 0 ? escapeHtml(normalizedUrl) : escapeHtml(normalizedLabel);| normalizedLabel.length === 0 ? normalizedUrl : escapeHtml(normalizedLabel); | |
| normalizedLabel.length === 0 ? escapeHtml(normalizedUrl) : escapeHtml(normalizedLabel); |
scripts/storage-finder-data-generator) with configurable facets/choices/regex matchers to convert CSV data into JSON data.STORAGE_FINDER_SHEET_URLor read from a local csv file.pnpm dlx tsxor Bun) and updating facets/columns.Testing
pnpm dlx tsx scripts/storage-finder-data-generator/generate.ts --output src/data/storage-finderbun scripts/storage-finder-data-generator/generate.ts --output src/data/storage-finderCloses #200.