Copilot AI commented Jan 5, 2026

The pagination logic set params = None after the first page and relied on the URL in GitHub's Link header to carry every query parameter. This is fragile, because the original request parameters survive only if the next URL repeats all of them, and it makes the code harder to maintain.

Changes

  • Parse next URL to extract page parameter - Use urllib.parse to extract only the page value from the Link header URL
  • Preserve original params dict - Update only the page parameter instead of replacing the entire params object
  • Validate page parameter - Ensure page is a positive integer; log a warning and stop pagination on invalid values (missing, non-numeric, zero, negative)
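The parse-and-validate steps listed above can be sketched as a small standalone function. This is a minimal illustration, not code from the PR: the helper name extract_next_page and the example.test URLs are hypothetical, and it returns None where the PR logs a warning and breaks out of the loop.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_next_page(next_url: str) -> Optional[int]:
    """Return the positive page number found in next_url, or None if unusable.

    Hypothetical helper for illustration only; not part of the PR.
    """
    query_params = parse_qs(urlparse(next_url).query)
    values = query_params.get("page")
    if not values:
        # Missing parameter (parse_qs also drops blank values by default)
        return None
    try:
        page_num = int(values[0])
    except ValueError:
        # Non-numeric value such as "abc"
        return None
    # Zero and negative pages are treated as invalid
    return page_num if page_num > 0 else None


# Hypothetical URLs used only to exercise the helper
print(extract_next_page("https://example.test/pulls?state=all&page=3"))
print(extract_next_page("https://example.test/pulls?state=all"))
```

Returning None for every invalid case lets the caller treat "no next page" and "malformed next page" the same way and stop paginating.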

Before

# Pagination
next_url = resp.links.get("next", {}).get("url")
if not next_url or len(batch) == 0:
    break
base_url, params = next_url, None  # Fragile: relies on next URL having all params

After

# Pagination
next_url = resp.links.get("next", {}).get("url")
if not next_url or len(batch) == 0:
    break
parsed_url = urlparse(next_url)
query_params = parse_qs(parsed_url.query)
if "page" in query_params and query_params["page"]:
    try:
        page_num = int(query_params["page"][0])
        if page_num > 0:
            params["page"] = page_num
        else:
            logger.warning(f"Invalid page number {page_num}, stopping pagination")
            break
    except (ValueError, IndexError) as e:
        logger.warning(f"Invalid page parameter: {e}, stopping pagination")
        break
else:
    # No usable page value: stop rather than re-request the same page
    logger.warning("Missing page parameter in next URL, stopping pagination")
    break

This maintains the original request parameters (state, per_page, sort, direction) across all pages and prevents infinite loops from malformed pagination URLs.
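To see both effects together, here is a small simulation that needs no network access. It is a sketch under stated assumptions: FAKE_PAGES, fetch, and collect are hypothetical stand-ins for the real HTTP call and loop, and the canned "next" URLs deliberately omit the original filters to mimic the situation the change guards against.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical canned responses keyed by page number: (batch, next_url).
# The next URLs carry only "page", and the second one is malformed.
FAKE_PAGES = {
    1: (["a", "b"], "https://example.test/pulls?page=2"),
    2: (["c"], "https://example.test/pulls?page=oops"),
}


def fetch(params):
    """Stand-in for an HTTP GET that returns (batch, next_url)."""
    return FAKE_PAGES.get(params["page"], ([], None))


def collect(params):
    """Paginate with a persistent params dict, updating only 'page'."""
    items, seen_params = [], []
    while True:
        seen_params.append(dict(params))  # snapshot of what was "sent"
        batch, next_url = fetch(params)
        items.extend(batch)
        if not next_url or len(batch) == 0:
            break
        values = parse_qs(urlparse(next_url).query).get("page")
        try:
            page_num = int(values[0]) if values else 0
        except ValueError:
            page_num = 0
        if page_num <= 0:
            break  # malformed or missing page: stop instead of looping
        params["page"] = page_num
    return items, seen_params


items, seen = collect({"state": "all", "per_page": 100, "page": 1})
print(items)
print(seen)
```

In this run the second request still carries state and per_page even though the next URL did not include them, and the loop ends when the next URL has a non-numeric page value.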


Copilot AI and others added 4 commits January 5, 2026 21:59
Co-authored-by: dklawren <826315+dklawren@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update GitHub data export script for BigQuery" to "Refactor pagination to preserve request parameters" Jan 5, 2026
Copilot AI requested a review from dklawren January 5, 2026 22:05
dklawren marked this pull request as ready for review January 5, 2026 23:17
dklawren merged commit 140e48b into etl-script Jan 5, 2026
dklawren deleted the copilot/sub-pr-2 branch January 5, 2026 23:17