Conversation

@dklawren (Contributor) commented Dec 6, 2025

No description provided.

Copilot AI left a comment

Pull request overview

This PR adds a comprehensive GitHub ETL pipeline that extracts pull request data from GitHub repositories and loads it into Google BigQuery. The implementation uses a streaming/chunked architecture to process data in batches of 100 PRs, includes rate limit handling, and provides local testing capabilities through mock services.

  • Implements chunked extraction, transformation, and loading of GitHub PR data to BigQuery
  • Adds mock GitHub API server and BigQuery emulator for local testing without rate limits
  • Includes comprehensive documentation with setup instructions and schema definitions
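The chunked flow summarized above can be sketched roughly as follows. This is a minimal illustration, not the code in main.py: the names `chunked`, `run_pipeline`, and `seconds_until_reset` are hypothetical, and the `transform` and `load` callables stand in for the PR's transformation and BigQuery insertion steps. The rate-limit helper assumes its inputs come from GitHub's `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

CHUNK_SIZE = 100  # batch size stated in the pull request overview


def chunked(items: Iterable[dict], size: int = CHUNK_SIZE) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` items without materializing the input."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def run_pipeline(
    pull_requests: Iterable[dict],
    transform: Callable[[dict], dict],   # hypothetical: PR dict -> table row
    load: Callable[[list[dict]], None],  # hypothetical: insert one batch of rows
    size: int = CHUNK_SIZE,
) -> int:
    """Transform and load one chunk at a time; return the number of rows loaded."""
    total = 0
    for batch in chunked(pull_requests, size):
        rows = [transform(pr) for pr in batch]
        load(rows)
        total += len(rows)
    return total


def seconds_until_reset(remaining: int, reset_epoch: int, now: float) -> float:
    """Return how long to sleep once the rate-limit budget is exhausted, else 0."""
    if remaining > 0:
        return 0.0
    return max(0.0, reset_epoch - now)
```

Because each batch is transformed and loaded before the next one is read, memory use stays bounded by the chunk size rather than by the total number of pull requests.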

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 19 comments.

Show a summary per file
| File | Description |
| --- | --- |
| requirements.txt | Adds BigQuery client library and testing dependencies (pytest, pytest-mock, pytest-cov) |
| mock_github_api.py | Creates Flask-based mock GitHub API server that generates 250 sample PRs with commits, comments, and reviewers for testing |
| main.py | Implements the core ETL pipeline with chunked processing: extraction from GitHub API with pagination/rate limiting, data transformation to BigQuery format, and insertion using BigQuery client |
| docker-compose.yml | Orchestrates three services: mock GitHub API, BigQuery emulator, and the ETL service for local development |
| data.yml | Defines BigQuery table schemas for pull_requests, commits, reviewers, and comments tables used by the emulator |
| README.md | Provides comprehensive documentation including setup, architecture, authentication methods, and usage examples |
| Dockerfile.mock | Creates container image for the mock GitHub API service using Python 3.11 and Flask |

Comments suppressed due to low confidence (1)

main.py:18

  • Import of 'pprint' is not used.
import pprint

@dklawren dklawren requested review from cgsheeh and zzzeid December 8, 2025 23:19
Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 15 comments.

Copilot AI (Contributor) commented Jan 5, 2026

@dklawren I've opened a new pull request, #3, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI (Contributor) commented Jan 5, 2026

@dklawren I've opened a new pull request, #4, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 6 commits January 5, 2026 21:59
Co-authored-by: dklawren <826315+dklawren@users.noreply.github.com>
Co-authored-by: dklawren <826315+dklawren@users.noreply.github.com>
Co-authored-by: dklawren <826315+dklawren@users.noreply.github.com>
Co-authored-by: dklawren <826315+dklawren@users.noreply.github.com>
Refactor pagination to preserve request parameters
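
The last commit title refers to pagination that preserves request parameters. The diff itself is not shown in this conversation, so the following is only a sketch of the general idea under stated assumptions: `paginate` and `fetch_page` are hypothetical names, and `state`, `per_page`, and `page` are query parameters of the GitHub REST API for listing pull requests. The point is that the caller's filters are re-sent on every page, not just the first.

```python
from typing import Callable, Iterator


def paginate(
    fetch_page: Callable[[dict], list[dict]],  # hypothetical: query params -> one page of items
    params: dict,
    per_page: int = 100,
) -> Iterator[dict]:
    """Yield items page by page, re-sending the caller's params on every request."""
    page = 1
    while True:
        # Copy the caller's filters so they survive on pages after the first.
        query = {**params, "per_page": per_page, "page": page}
        items = fetch_page(query)
        yield from items
        if len(items) < per_page:
            break  # a short or empty page means nothing is left
        page += 1
```

Injecting `fetch_page` keeps the loop independent of any HTTP client, so it can be exercised against a mock service as easily as against the real API.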
3 participants