
@kapuic (Member) commented Dec 3, 2025

  • Add a TypeScript-based CSV importer (scripts/storage-finder-data-generator) with configurable facets, choices, and regex matchers that converts the CSV data into JSON.
  • Fetch the CSV from https://docs.google.com/spreadsheets/d/12vxBpVUpWTrPmZ-3e30IbyyoAG9qQ6ULpMq5O8okPtA/export?format=csv&gid=1073279644 by default. The source can be overridden with the STORAGE_FINDER_SHEET_URL environment variable, or a local CSV file can be read instead.
  • Provide a maintainer document for running the importer (with pnpm dlx tsx or Bun) and for updating facets/columns.
  • Introduce a GitHub Action that regenerates the JSON from the Google Sheet, runs lint/format, and auto-commits the result to the current branch.
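To make "configurable facets/choices/regex matchers" concrete, here is a minimal sketch of how one facet could map a free-text CSV cell onto choice IDs. The FacetConfig/ChoiceConfig shapes, the "Data Sensitivity" column, the choice IDs, and matchFacet are all hypothetical illustrations, not the actual contents of config.ts or types.ts.

```typescript
// Hypothetical shapes; the real types.ts / config.ts may differ.
interface ChoiceConfig {
  id: string; // stable identifier written to the JSON output
  label: string; // human-readable name
  matchers: RegExp[]; // any matching pattern assigns this choice
}

interface FacetConfig {
  id: string;
  column: string; // CSV header this facet reads from
  choices: ChoiceConfig[];
}

type CsvRow = Record<string, string>;

// Hypothetical facet: classify a free-text "Data Sensitivity" column.
const sensitivityFacet: FacetConfig = {
  id: "sensitivity",
  column: "Data Sensitivity",
  choices: [
    { id: "low", label: "Low risk", matchers: [/\blow\b/i, /public/i] },
    { id: "moderate", label: "Moderate risk", matchers: [/moderate/i] },
    { id: "high", label: "High risk", matchers: [/\bhigh\b/i, /\bphi\b/i, /hipaa/i] },
  ],
};

// Return the IDs of every choice whose matchers hit the row's cell, in config order.
function matchFacet(row: CsvRow, facet: FacetConfig): string[] {
  const cell = (row[facet.column] ?? "").trim();
  if (cell.length === 0) return [];
  return facet.choices
    .filter((choice) => choice.matchers.some((re) => re.test(cell)))
    .map((choice) => choice.id);
}

const row: CsvRow = { Service: "Example Storage", "Data Sensitivity": "Low, Moderate" };
console.log(matchFacet(row, sensitivityFacet)); // [ 'low', 'moderate' ]
```

Keeping the matchers free of the g flag matters in this design: RegExp.prototype.test on a global regex advances lastIndex, so reusing the same pattern across rows could silently skip matches.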

Testing

  • pnpm dlx tsx scripts/storage-finder-data-generator/generate.ts --output src/data/storage-finder
  • bun scripts/storage-finder-data-generator/generate.ts --output src/data/storage-finder
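The description lists three CSV sources: a local file, the STORAGE_FINDER_SHEET_URL override, and the default Google Sheet export. A minimal sketch of resolving and loading them follows. The precedence (local file first, then the environment variable, then the default) is an assumption, as are the CsvSource type and the resolveSource/loadCsvText names; only the URL and the environment variable name come from the PR.

```typescript
import { readFile } from "node:fs/promises";

// Default export URL taken from the PR description.
const DEFAULT_SHEET_URL =
  "https://docs.google.com/spreadsheets/d/12vxBpVUpWTrPmZ-3e30IbyyoAG9qQ6ULpMq5O8okPtA/export?format=csv&gid=1073279644";

type CsvSource = { kind: "file"; path: string } | { kind: "url"; url: string };

// Assumed precedence: explicit local CSV, then env override, then the default sheet.
function resolveSource(
  csvPath: string | undefined,
  env: Record<string, string | undefined>,
): CsvSource {
  if (csvPath) return { kind: "file", path: csvPath };
  const override = env.STORAGE_FINDER_SHEET_URL?.trim();
  if (override) return { kind: "url", url: override };
  return { kind: "url", url: DEFAULT_SHEET_URL };
}

// Load raw CSV text from the resolved source (global fetch exists in Node 18+ and Bun).
async function loadCsvText(source: CsvSource): Promise<string> {
  if (source.kind === "file") return readFile(source.path, "utf8");
  const response = await fetch(source.url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${source.url}: ${response.status} ${response.statusText}`);
  }
  return response.text();
}
```

In the CLI this would presumably be called as `await loadCsvText(resolveSource(options.csvPath, process.env))`. Separating resolution from I/O keeps the precedence rules testable without network access.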

Closes #200.

@kapuic self-assigned this Dec 3, 2025
@kapuic added the enhancement (New feature or request) label Dec 3, 2025
@kapuic linked an issue Dec 3, 2025 that may be closed by this pull request
@github-actions bot commented Dec 3, 2025

PR Preview Action v1.6.3
Preview removed because the pull request was closed.
2025-12-19 20:37 UTC

@genericdata (Contributor) commented Dec 5, 2025

@kapuic Looks amazing! Thank you!

@s-sajid-ali (Member) commented

Can the sync be weekly instead of hourly? We expect changes to be made once a month at best, so hourly sync is unnecessary. Thanks!

Copilot AI review requested due to automatic review settings December 6, 2025 10:00
@kapuic (Member, Author) commented Dec 6, 2025

Yes, sure! Updated to run weekly on Saturdays.

By the way, the original JSON version (from the Drupal instance) seems to contain more data than the new Google Sheet-based version. I'm not familiar with which services and properties are available at NYU, or which ones should be displayed, so I'd recommend previewing the output of the Google Sheet + converter pipeline with the CLI script to see whether anything else should be added.


Copilot AI left a comment


Pull request overview

This PR introduces a TypeScript-based CSV importer for the Storage Finder feature, designed to convert Google Sheets data into JSON format for the application. The implementation includes configurable facets with regex-based matchers, HTML sanitization utilities, and an automated GitHub Actions workflow to keep data synchronized.

Key Changes:

  • New script system under scripts/storage-finder-data-generator/ with CLI support for CSV import, configurable output, and data transformation
  • GitHub Actions workflow for automated data synchronization (scheduled weekly)
  • Addition of csv-parse package for CSV processing

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.

Summary per file

  • scripts/storage-finder-data-generator/types.ts: TypeScript type definitions for CSV rows, service records, facets, and configuration structures
  • scripts/storage-finder-data-generator/html.ts: HTML sanitization utilities for escaping user content and generating safe HTML output
  • scripts/storage-finder-data-generator/generate.ts: Main CLI script with argument parsing, CSV loading, data transformation, and JSON generation
  • scripts/storage-finder-data-generator/constants.ts: Configuration constants including the default Google Sheets URL, output paths, and environment variable names
  • scripts/storage-finder-data-generator/config.ts: Field definitions and facet configurations with regex matchers for categorizing storage services
  • scripts/storage-finder-data-generator/MAINTENANCE.md: Comprehensive maintainer documentation for using and extending the importer
  • .github/workflows/storage-finder-sync.yml: GitHub Actions workflow for automated data regeneration and commits
  • package.json: Added the csv-parse dependency for CSV parsing
  • pnpm-lock.yaml: Lockfile updates for the new csv-parse dependency
  • eslint.config.mjs: Added React version detection settings to the ESLint configuration
  • docusaurus.config.ts: Formatting improvement (consolidated multi-line string)
  • src/components/HomepageFeatures/index.tsx: Formatting improvement (added parentheses around JSX in map callback)
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported



on:
  schedule:
    - cron: "0 0 * * 6"

Copilot AI Dec 6, 2025


The PR title and description mention "hourly sync", but the cron schedule "0 0 * * 6" runs weekly on Saturdays at midnight UTC, not hourly.

If hourly sync is intended, the cron should be "0 * * * *". If weekly is correct, the PR description should be updated to reflect this.

Suggested change
-    - cron: "0 0 * * 6"
+    - cron: "0 * * * *"

Copilot uses AI. Check for mistakes.
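The disagreement comes down to how the five cron fields are read. The helper below is illustrative and not part of the PR: cronMatches is a hypothetical, deliberately minimal matcher that supports only * and single integers per field, evaluated in UTC as GitHub Actions schedules are, and firesPerWeek counts matching minutes over one week to compare the two expressions.

```typescript
// Minimal matcher for 5-field cron expressions using only "*" or a single integer per field.
// Ranges, lists, steps, names (e.g. "SAT"), and 7-as-Sunday are intentionally unsupported.
function cronMatches(expr: string, date: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error(`Expected 5 fields, got ${fields.length}`);
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields;

  const fieldMatches = (field: string, value: number): boolean => {
    if (field === "*") return true;
    if (!/^\d+$/.test(field)) throw new Error(`Unsupported cron field: ${field}`);
    return Number(field) === value;
  };

  // GitHub Actions evaluates schedules in UTC, so use the UTC getters.
  const domOk = fieldMatches(dayOfMonth, date.getUTCDate());
  const dowOk = fieldMatches(dayOfWeek, date.getUTCDay()); // 0 = Sunday ... 6 = Saturday
  // Classic cron rule: when both day fields are restricted, either one may match.
  const dayOk = dayOfMonth !== "*" && dayOfWeek !== "*" ? domOk || dowOk : domOk && dowOk;

  return (
    fieldMatches(minute, date.getUTCMinutes()) &&
    fieldMatches(hour, date.getUTCHours()) &&
    fieldMatches(month, date.getUTCMonth() + 1) &&
    dayOk
  );
}

// Count how often a schedule fires during one week, starting Sunday 2025-12-07 00:00 UTC.
function firesPerWeek(expr: string): number {
  const start = Date.UTC(2025, 11, 7);
  let count = 0;
  for (let m = 0; m < 7 * 24 * 60; m++) {
    if (cronMatches(expr, new Date(start + m * 60_000))) count++;
  }
  return count;
}

console.log(firesPerWeek("0 0 * * 6")); // 1 (Saturday 00:00 UTC)
console.log(firesPerWeek("0 * * * *")); // 168 (every hour)
```

This confirms the merged schedule runs once a week, matching the maintainer's request; the suggestion to switch back to hourly only applied if the PR title had been accurate.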

- name: Lint and format
  run: |
    bun lint --fix

Copilot AI Dec 6, 2025


The bun lint --fix command may not work as intended. The lint script runs both lint:js and lint:css, but the --fix flag will be appended to the entire command, not passed to the individual linters. This means it would try to run pnpm lint:js && pnpm lint:css --fix, which only fixes CSS issues.

Consider running the commands separately:

- name: Lint and format
  run: |
    bun lint:js --fix
    bun lint:css --fix
    bun format

Or define a lint:fix script in package.json that properly handles both linters.

Suggested change
-    bun lint --fix
+    bun lint:js --fix
+    bun lint:css --fix
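The review's reasoning rests on how a package-manager script call such as `bun lint --fix` forwards extra arguments: as the comment describes, they are appended to the end of the script's command string, and the shell then splits that string at &&. The sketch below only models that string concatenation. The lint script body is inferred from the comment (pnpm lint:js && pnpm lint:css), not copied from package.json, and expandScript/subcommands are hypothetical helpers.

```typescript
// Model of script-argument forwarding: extra CLI args are appended to the script string.
function expandScript(script: string, extraArgs: string[]): string {
  return extraArgs.length === 0 ? script : `${script} ${extraArgs.join(" ")}`;
}

// Split a shell command on "&&" to see which sub-command each flag ends up attached to.
function subcommands(command: string): string[] {
  return command.split("&&").map((part) => part.trim());
}

const lintScript = "pnpm lint:js && pnpm lint:css"; // assumed body of the "lint" script
const expanded = expandScript(lintScript, ["--fix"]);
console.log(subcommands(expanded));
// [ 'pnpm lint:js', 'pnpm lint:css --fix' ]: only the CSS linter receives --fix
```

This is why calling each linter with its own --fix, or adding a dedicated lint:fix script, avoids the problem.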

Comment on lines +64 to +69
case "--csv": {
  options.csvPath = argv[index + 1];
  index += 1;
  break;
}
case "--output": {

Copilot AI Dec 6, 2025


Missing validation for arguments that require values. When --csv or --output is used without the = syntax, the code doesn't check whether the next argument exists. If it doesn't, argv[index + 1] evaluates to undefined, which is silently assigned to the option instead of producing a clear error.

Add validation before accessing the next argument:

case "--csv": {
  if (index + 1 >= argv.length) {
    throw new Error("--csv requires a value");
  }
  options.csvPath = argv[index + 1];
  index += 1;
  break;
}
case "--output": {
  if (index + 1 >= argv.length) {
    throw new Error("--output requires a value");
  }
  options.outputDir = argv[index + 1];
  index += 1;
  break;
}
Suggested change
-case "--csv": {
-  options.csvPath = argv[index + 1];
-  index += 1;
-  break;
-}
-case "--output": {
+case "--csv": {
+  if (index + 1 >= argv.length) {
+    throw new Error("--csv requires a value");
+  }
+  options.csvPath = argv[index + 1];
+  index += 1;
+  break;
+}
+case "--output": {
+  if (index + 1 >= argv.length) {
+    throw new Error("--output requires a value");
+  }
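The suggestion checks only for a missing next argument. A slightly more defensive sketch follows that accepts both the `--flag value` and `--flag=value` forms and also rejects a following flag (e.g. `--csv --output dir`). CliOptions, readFlagValue, and this parseArgs are hypothetical and may not match the real generate.ts.

```typescript
// Hypothetical option shape; the real generate.ts may define more fields.
interface CliOptions {
  csvPath?: string;
  outputDir?: string;
}

// Read a flag's value from "--flag=value" or "--flag value".
// Returns the value and how many extra argv entries were consumed.
function readFlagValue(argv: string[], index: number, flag: string): [string, number] {
  const arg = argv[index];
  if (arg.startsWith(`${flag}=`)) {
    const value = arg.slice(flag.length + 1);
    if (value.length === 0) throw new Error(`${flag} requires a value`);
    return [value, 0];
  }
  const next = argv[index + 1];
  // Reject a missing value and a following flag such as "--csv --output dir".
  if (next === undefined || next.startsWith("--")) {
    throw new Error(`${flag} requires a value`);
  }
  return [next, 1];
}

function parseArgs(argv: string[]): CliOptions {
  const options: CliOptions = {};
  for (let index = 0; index < argv.length; index += 1) {
    const arg = argv[index];
    const flag = arg.split("=", 1)[0];
    switch (flag) {
      case "--csv": {
        const [value, consumed] = readFlagValue(argv, index, flag);
        options.csvPath = value;
        index += consumed;
        break;
      }
      case "--output": {
        const [value, consumed] = readFlagValue(argv, index, flag);
        options.outputDir = value;
        index += consumed;
        break;
      }
      default:
        throw new Error(`Unknown argument: ${arg}`);
    }
  }
  return options;
}
```

Node's built-in util.parseArgs (from node:util) is another option that handles both forms without a hand-written switch.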

}
const safeUrl = escapeAttribute(normalizedUrl);
const safeLabel =
  normalizedLabel.length === 0 ? normalizedUrl : escapeHtml(normalizedLabel);

Copilot AI Dec 6, 2025


The fallback label uses the unescaped URL in HTML content. When normalizedLabel is empty, normalizedUrl is used directly inside the anchor tag without HTML escaping, which could allow XSS if the URL contains HTML special characters. The URL should be escaped with escapeHtml() when used as label text.

Suggested fix:

const safeLabel =
  normalizedLabel.length === 0 ? escapeHtml(normalizedUrl) : escapeHtml(normalizedLabel);
Suggested change
-  normalizedLabel.length === 0 ? normalizedUrl : escapeHtml(normalizedLabel);
+  normalizedLabel.length === 0 ? escapeHtml(normalizedUrl) : escapeHtml(normalizedLabel);
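For context, here is a minimal sketch of escaping helpers with the suggested fix applied. escapeHtml and escapeAttribute are named in the review, but their bodies here are assumptions, and renderLink is a hypothetical wrapper; the real html.ts may differ.

```typescript
// Escape text for safe use between HTML tags. "&" must be replaced first.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Escape text for use inside a double-quoted attribute value.
function escapeAttribute(value: string): string {
  return escapeHtml(value);
}

// Build an anchor whose href and visible text are both escaped.
function renderLink(url: string, label = ""): string {
  const normalizedUrl = url.trim();
  const normalizedLabel = label.trim();
  const safeUrl = escapeAttribute(normalizedUrl);
  // The review's fix: escape the URL as well when it doubles as the label.
  const safeLabel =
    normalizedLabel.length === 0 ? escapeHtml(normalizedUrl) : escapeHtml(normalizedLabel);
  return `<a href="${safeUrl}">${safeLabel}</a>`;
}

console.log(renderLink("https://example.com/?a=1&b=<x>"));
// <a href="https://example.com/?a=1&amp;b=&lt;x&gt;">https://example.com/?a=1&amp;b=&lt;x&gt;</a>
```

Escaping alone does not block a `javascript:` URL in href. Restricting the scheme (for example, parsing with `new URL(...)` and allowing only http: and https:) would be a separate check.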

@kapuic changed the title from "Add Storage Finder CSV importer and hourly sync" to "Add Storage Finder CSV importer and weekly sync" Dec 10, 2025
@s-sajid-ali merged commit 0337f9b into main Dec 19, 2025
5 checks passed
@s-sajid-ali deleted the feature/storage-finder-import branch December 19, 2025 20:37

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Datafinder Easy Updates

4 participants