feat(etl): Script to export github data into bigquery #2
base: main
Changes from all commits: 86accbc, 548b90f, bad16ed, 065d716, 1c41701, 60f8ad4, da0d19e, ac58e37, 4410fb0, 140e48b, 193f3f9
`@@ -0,0 +1,16 @@`

```dockerfile
# Dockerfile for mock GitHub API service
FROM python:3.11-slim

WORKDIR /app

# Install Flask
RUN pip install --no-cache-dir flask

# Copy mock API script
COPY mock_github_api.py .

# Expose port
EXPOSE 5000

# Run the mock API
CMD ["python", "mock_github_api.py"]
```
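The Dockerfile copies `mock_github_api.py`, which is not part of this excerpt. To give a rough idea of what such a service does, here is a minimal sketch that uses the standard library's `http.server` in place of Flask (the Dockerfile installs Flask; this swap only keeps the sketch dependency-free). The route, the canned payload, and the helpers `make_server`, `start_in_background`, and `fetch_json` are hypothetical; the payload only mimics the shape of GitHub's `GET /repos/{owner}/{repo}/pulls` response.

```python
import http.client
import json
import re
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical canned data: one PR in roughly the shape GitHub's REST API returns.
FAKE_PULLS = [
    {
        "number": 1,
        "state": "open",
        "title": "Bug 123456 - Example change",
        "created_at": "2024-01-01T00:00:00Z",
        "updated_at": "2024-01-02T00:00:00Z",
        "merged_at": None,
        "labels": [{"name": "enhancement"}],
    }
]

# Matches /repos/<owner>/<repo>/pulls, optionally followed by a query string.
PULLS_ROUTE = re.compile(r"^/repos/[^/]+/[^/]+/pulls(\?.*)?$")


class MockGitHubHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if PULLS_ROUTE.match(self.path):
            status, payload = 200, FAKE_PULLS
        else:
            status, payload = 404, {"message": "Not Found"}
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def make_server(port=5000):
    # Bind to 0.0.0.0 so a port published by Docker ("5000:5000") is reachable.
    return ThreadingHTTPServer(("0.0.0.0", port), MockGitHubHandler)


def start_in_background(port=5000):
    """Start the mock on a daemon thread; call .shutdown() on the result to stop it."""
    server = make_server(port)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_json(port, path, host="127.0.0.1"):
    """Tiny client helper: GET a path from the mock, return (status, decoded JSON)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read())
    finally:
        conn.close()
```

Inside the container, `make_server().serve_forever()` would serve on port 5000. If the real script uses Flask's `app.run`, it would likewise need `host="0.0.0.0"`, since Flask binds to `127.0.0.1` by default and the published port would not reach it.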
`@@ -0,0 +1,86 @@`

```yaml
projects:
  - id: test
    datasets:
      - id: github_etl
        tables:
          - id: pull_requests
            columns:
              - name: pull_request_id
                type: INTEGER
              - name: current_status
                type: STRING
              - name: date_created
                type: TIMESTAMP
              - name: date_modified
                type: TIMESTAMP
              - name: target_repository
                type: STRING
              - name: bug_id
                type: INTEGER
              - name: date_landed
                type: TIMESTAMP
              - name: date_approved
                type: TIMESTAMP
              - name: labels
                type: STRING
                mode: REPEATED
              - name: snapshot_date
                type: DATE
          - id: commits
            columns:
              - name: pull_request_id
                type: INTEGER
              - name: target_repository
                type: STRING
              - name: commit_sha
                type: STRING
              - name: date_created
                type: TIMESTAMP
              - name: author_username
                type: STRING
              - name: author_email
                type: STRING
              - name: filename
                type: STRING
              - name: lines_removed
                type: INTEGER
              - name: lines_added
                type: INTEGER
              - name: snapshot_date
                type: DATE
          - id: reviewers
            columns:
              - name: pull_request_id
                type: INTEGER
              - name: target_repository
                type: STRING
              - name: date_review_requested
                type: TIMESTAMP
              - name: reviewer_email
                type: STRING
              - name: reviewer_username
                type: STRING
              - name: status
                type: STRING
              - name: snapshot_date
                type: DATE
          - id: comments
            columns:
              - name: pull_request_id
                type: INTEGER
              - name: target_repository
                type: STRING
              - name: comment_id
                type: INTEGER
              - name: date_created
                type: TIMESTAMP
              - name: author_email
                type: STRING
              - name: author_username
                type: STRING
              - name: character_count
                type: INTEGER
              - name: status
                type: STRING
              - name: snapshot_date
                type: DATE
```
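The schema fixes column names and types but not how GitHub responses map onto them. Below is a hedged sketch of two transform helpers for the `pull_requests` and `commits` tables. The function names, the `Bug NNNN` title convention used to fill `bug_id`, the `"merged"` status value, and leaving `date_approved` empty are all assumptions, not taken from this PR. Input field names (`number`, `state`, `merged_at`, `labels`, and a commit's `files` with `additions`/`deletions`) follow GitHub's REST API; as far as we know, `files` is only returned by the single-commit endpoint, not by the PR commit list.

```python
import re
from datetime import date

# Assumption: a Bugzilla-style ID appears in the PR title as "Bug 123456".
BUG_ID_PATTERN = re.compile(r"\bbug\s+(\d+)", re.IGNORECASE)


def pull_request_row(pr, repo, snapshot_date):
    """Map one GitHub pull-request object to a `pull_requests` row."""
    match = BUG_ID_PATTERN.search(pr.get("title") or "")
    # GitHub reports merged PRs as state "closed"; distinguish them via merged_at.
    status = "merged" if pr.get("merged_at") else pr["state"]
    return {
        "pull_request_id": pr["number"],
        "current_status": status,
        "date_created": pr["created_at"],  # ISO 8601 strings load as TIMESTAMP
        "date_modified": pr["updated_at"],
        "target_repository": repo,
        "bug_id": int(match.group(1)) if match else None,
        "date_landed": pr.get("merged_at"),
        "date_approved": None,  # would need the reviews endpoint; not covered here
        "labels": [label["name"] for label in pr.get("labels", [])],
        "snapshot_date": snapshot_date.isoformat(),
    }


def commit_rows(commit, pr_number, repo, snapshot_date):
    """Flatten one GitHub commit object into one `commits` row per changed file."""
    git_author = commit["commit"]["author"]
    # The top-level "author" is null when the commit email has no GitHub account.
    login = (commit.get("author") or {}).get("login")
    return [
        {
            "pull_request_id": pr_number,
            "target_repository": repo,
            "commit_sha": commit["sha"],
            "date_created": git_author["date"],
            "author_username": login,
            "author_email": git_author["email"],
            "filename": changed["filename"],
            "lines_removed": changed["deletions"],
            "lines_added": changed["additions"],
            "snapshot_date": snapshot_date.isoformat(),
        }
        for changed in commit.get("files", [])
    ]
```

One row per file keeps `lines_added`/`lines_removed` attributable to a `filename`, which matches the flat shape of the `commits` table rather than nesting files inside a commit record.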
`@@ -0,0 +1,53 @@`

```yaml
services:
  # Mock GitHub API service for testing without rate limits
  mock-github-api:
    build:
      context: .
      dockerfile: Dockerfile.mock
    ports:
      - "5000:5000"
    networks:
      - github_etl

  # BigQuery emulator for local testing
  bigquery-emulator:
    image: ghcr.io/goccy/bigquery-emulator:latest
    platform: linux/amd64
    ports:
      - "9050:9050"
      - "9060:9060"
    volumes:
      - ./data.yml:/data.yml
    command: |
      --project=test --data-from-yaml=/data.yml --log-level=debug
```
Comment on lines +21 to +22

Suggested change:

```diff
-    command: |
-      --project=test --data-from-yaml=/data.yml --log-level=debug
+    command: ["--project=test", "--data-from-yaml=/data.yml", "--log-level=debug"]
```
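The suggestion replaces the `|` block scalar with an exec-form list. As far as we understand Compose, a string `command` is split into arguments with shell-like word splitting, so both forms should yield the same argument vector; the list form simply makes argument boundaries explicit and drops the trailing newline that `|` keeps. The sketch below uses Python's `shlex.split` only as a stand-in for that splitting; it is not what Compose runs internally.

```python
import shlex

# What `command: |` produces in YAML: the literal block keeps a trailing newline.
block_scalar = "--project=test --data-from-yaml=/data.yml --log-level=debug\n"

# The suggested exec-form list, one argument per element.
exec_form = ["--project=test", "--data-from-yaml=/data.yml", "--log-level=debug"]

# shlex.split approximates shell-style word splitting: whitespace, including
# the trailing newline, only separates words and never ends up inside one.
split_args = shlex.split(block_scalar)

print(split_args == exec_form)  # True
```

The equivalence only holds while no argument contains spaces or quotes; once one does, the list form avoids any quoting ambiguity, which is the practical argument for the suggestion.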
Duplicate step numbering in authentication methods. All three authentication methods are numbered as "1." This should be "1. Service Account Key File", "2. Workload Identity", "3. Application Default Credentials" for clarity.