- Replace 50-line inline bash Snuba bootstrap with bootstrap-snuba.py (adopted from getsentry), which parallelizes ClickHouse DB setup with retries and cleaner error handling
- Extract failure aggregation from inline bash to report_shuffle_failures.py: deduplicates across shards, writes a consolidated GitHub Actions job summary, and upserts GitHub issues (creates or comments on existing)
- Add test_report_shuffle_failures.py with 33 tests covering all pure functions and the gh CLI boundary
- Add a report job that runs after all matrix shards, downloads failure artifacts, and calls the reporter — replacing per-shard issue creation with one centralized deduplicated pass
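The cross-shard deduplication idea can be sketched roughly as follows. This is a minimal sketch only; the field names `testid` and `shard` are assumptions, not the real schema used by report_shuffle_failures.py.

```python
def dedupe_failures(shard_payloads):
    """Merge per-shard failure lists into one entry per test id.

    shard_payloads: one list of failure dicts per shard artifact.
    Returns a single list with a `shards` field accumulating every
    shard that saw the same test fail.
    """
    merged = {}
    for payload in shard_payloads:
        for failure in payload:
            entry = merged.setdefault(failure["testid"], {**failure, "shards": []})
            entry["shards"].append(failure["shard"])
    return list(merged.values())
```

A consolidated summary can then be rendered from the merged list, so a test failing on three shards produces one entry instead of three.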
…mestamp The test used a hardcoded date of 2025-01-01 which is now outside ClickHouse's 90-day retention window, causing Snuba to return fewer rows than expected. Use a recent timestamp (1 hour ago) instead, matching the pattern already used in the neighbouring test_export_clickhouse_rows test.
The test was frozen to 2025-01-15, a date now outside Snuba's 90-day retention window. Snuba returns no data for timestamps that old, causing the result list to be empty and the index-0 access to raise IndexError. Use a module-level constant frozen to yesterday so the test remains deterministic while always staying within the retention window.
The test used base_time = datetime(2025, 10, 29, ...) which is now 163+ days ago, outside the EAP item retention_days=90 window. Snuba returns no data for that detector, causing the KeyError on data[str(detector.id)]. Use MOCK_DATETIME (already defined as yesterday) so base_time is always within retention. The class-level @freeze_time(MOCK_DATETIME) already establishes this as the frozen 'now', so using it for base_time is consistent with the rest of the test suite.
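The three fixes above share one pattern: replace a frozen calendar date with a timestamp derived from "now". A minimal sketch, where `before_now` is a stand-in for Sentry's testutils helper of the same name:

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90  # the ClickHouse/EAP retention window the tests assume

def before_now(**kwargs):
    # stand-in for sentry.testutils.helpers.datetime.before_now
    return datetime.now(timezone.utc) - timedelta(**kwargs)

# A module-level "yesterday at midnight" constant stays deterministic under
# freeze_time while always landing inside the retention window, unlike a
# hardcoded datetime(2025, 1, 15) that silently ages out.
MOCK_DATETIME = before_now(days=1).replace(hour=0, minute=0, second=0, microsecond=0)

assert datetime.now(timezone.utc) - MOCK_DATETIME < timedelta(days=RETENTION_DAYS)
```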
Add 7 new patterns drawn from 27038a4:
- #9 Snuba cross-test contamination: dedicated project + tight query window
- #10 Expired retention window: replace hardcoded 2025 dates with before_now()
- #11 Global state teardown gap: try/finally cleanup + autouse fixture
- #12 Hardcoded IDs causing uniqueness collisions: use uuid4().hex
- #13 Bucket-count boundary flakiness: freeze time mid-period
- #14 Batch timeout causing premature flush: set max_batch_time=300
- #15 Polling loops dependent on time.sleep: add poll_interval=0 parameter
Update the lookup checklist to cover the new patterns.
Cut SKILL.md by ~50% - collapse phases into numbered sections, inline the artifact-reading one-liner, remove prose repetition. Cut common-patterns.md by ~60% - lookup table up front, each pattern trimmed to symptom + fix code only, no diagnostic preamble.
Increase matrix from 11 to 16 shards for faster wall-clock time. Disable detect-test-pollution temporarily: it re-runs the full test suite for each failing shard (~10min overhead), which isn't worth the cost until the baseline flake rate is lower. Also remove the now-dead testids truncation and upload steps that only fed pollution detection. The failure artifact always writes type=flaky in the interim.
A prior test that opens a sentry_sdk span leaves the trace active in the SDK scope. get_trace_id() then returns a non-None value, failing the assertion that expects None when no span is active. Wrap the emission in sentry_sdk.isolation_scope() to ensure the handler runs with a clean scope regardless of what ran before.
reserve_model_ids(Monitor, 14) was called after setup_cross_db_deletion_data(), which creates monitors with auto-generated IDs. setUp() deletes all monitors but does not reset the DB sequence, so if a previous test consumed IDs 4 or lower, the next auto-ID could be 5, 7, 9, or 11 — exactly the IDs the test then tries to INSERT explicitly, causing an IntegrityError. Move reserve_model_ids to before any setup calls so all auto-generated monitors land above 14 and can never collide with the explicit IDs.
Adopt the setup-devservices action from master which starts devservices in the background before Python/venv setup begins. This lets docker image pulls and container starts overlap with pip install and other sentry env setup, saving ~1-2 min per shard. Also start bootstrap-snuba.py in background before wait.sh completes so Snuba Phase 1 (ClickHouse DB creation + bootstrap) overlaps with the final devservices health checks — saving another ~30s. Move bootstrap-snuba.py to .github/actions/setup-devservices/ where master keeps it, removing the duplicate in workflows/scripts/.
test_emit shares the same trace context leak as test_gke_emit: when test_emit_with_trace_id runs first it leaves an active SDK span, making get_trace_id() return non-None in the next test. Wrap the emit call in sentry_sdk.isolation_scope() across all parametrizations.
Drop GitHub issue creation from report_shuffle_failures.py entirely — the workflow now only generates the job summary. Remove GH_TOKEN, GH_REPO, and issues:write permission from the report job. Update tests to match.
assert_ttls (test_buffer.py): Skip TTL=-2 (the key vanished between keys() and ttl(), a TOCTOU race with concurrent xdist workers sharing Redis).
TestGetActiveOrgs.setUp (test_common.py): Wrap 100 create_organization/create_project calls in time_machine.travel(tick=True) so each gets a unique millisecond timestamp — class-level @freeze_time causes MaxSnowflakeRetryError because the generator can't advance the clock to resolve sequence exhaustion.
test_drain_shard_flush_all__upper_bound (test_outbox.py): Raise barrier timeout from 10s to 30s for both barrier tests — 10s is too tight under 16-shard CI runner load.
test_get_span_counts_with_many_projects (sampling spans): Same MaxSnowflakeRetryError as above — 200 create_project calls under class-level @freeze_time. Wrap the creation loop in time_machine.travel(tick=True).
test_delete_error_events (test_project.py): Snuba eventstream deletions are processed asynchronously. Add a retry loop (max 5s) after the scheduled deletion runs before asserting events == 0.
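The retry loop for the asynchronous Snuba deletion follows a standard polling shape. A generic sketch — the helper name and the usage comment are illustrative, not the actual test code:

```python
import time

def wait_until(condition, timeout=5.0, interval=0.1):
    """Poll `condition` until it returns truthy or `timeout` seconds elapse.

    Returns the final truthiness, so callers can simply assert on it.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# rough usage in the test, after the scheduled deletion task runs:
#   assert wait_until(lambda: snuba_event_count(project.id) == 0)
```

Using `time.monotonic()` rather than `time.time()` keeps the deadline immune to wall-clock adjustments on busy CI runners.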
test_updates_grouping_config_if_current_config_is_not_default: The
assertion 'expected_expiry - actual_expiry < 1' fails when both
timestamps are int()-truncated and land exactly 1 second apart. The
comment already says 'one-second tolerance necessary' — fix to <= 1.
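The off-by-one is easy to reproduce: two timestamps only a fraction of a second apart can truncate to integers exactly 1 apart, which the strict `< 1` check rejects.

```python
expected_expiry = int(101.0)  # truncates to 101
actual_expiry = int(100.9)    # truncates to 100, though only 0.1s earlier

diff = expected_expiry - actual_expiry
assert diff == 1
assert not (diff < 1)  # the old strict check fails on this boundary
assert diff <= 1       # the fixed check honours the documented one-second tolerance
```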
test_workflow_engine_serializer: set_value('success') was called at the
top of the test, long before the HTTP request. A concurrent xdist
worker's flushdb() (from the spans buffer fixture) can clear the Redis
key in that window, making the endpoint return 'pending'. Move
set_value() to immediately before the request.
Add a TODO section to fix-flaky-tests/SKILL.md for the two remaining
unfixed flakes (test_simple CrossTransactionAssertionError, test_unmerge
times_seen drift) so future sessions can pick them up.
…n_id=2 Another test's integration can have id=2. Use self.integration.id + 1_000_000 to guarantee the ID doesn't exist regardless of test ordering.
…ring
test_drain_shard_not_flush_all__upper_bound: Use Organization(id=2) for
outbox2 so it's in a different shard. Same-org outboxes coalesce —
processing outbox1 deletes outbox2 as a side effect, breaking the
assertion. A different-shard outbox is unaffected by the shard-1 drain.
Switch back to _PairwiseSync (no timeout) since CI load can break the
1s timeout even on valid test execution.
test_basic[new_group]: Move optimize_snuba_table('events') to BEFORE
get_event_by_processing_counter('x1'). The reprocessed event x1 may
still have the old group_id in Snuba until ClickHouse processes the
replace message — optimize must run first.
…p count @patch('sentry.conduit.tasks.time.sleep') patches time.sleep on the global time module, causing retry sleeps from sentry.utils.retries to accumulate in mock_sleep.call_count (giving ~2333 instead of 100). Patching 'sentry.conduit.tasks.time' instead replaces only the local module reference in tasks.py, so retries.py keeps using real time.sleep and the assert mock_time.sleep.call_count == NUM_DELTAS is reliable.
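The difference between the two patch targets can be demonstrated with stand-in modules. This sketch simulates the real modules (sentry.conduit.tasks and sentry.utils.retries) with types.ModuleType, since the key point is where the `time` reference lives:

```python
import time
import types
from unittest import mock

# Simulate two modules that each did `import time` at module scope.
tasks = types.ModuleType("tasks")
tasks.time = time
retries = types.ModuleType("retries")
retries.time = time

def run_task():
    tasks.time.sleep(0)    # the sleep the test wants to count

def run_retry():
    retries.time.sleep(0)  # retry machinery's sleep — must NOT be counted

# Patching the module reference inside `tasks` replaces only tasks.time;
# retries.time still points at the real time module, so its sleeps never
# inflate the mock's call count.
with mock.patch.object(tasks, "time") as mock_time:
    run_task()
    run_retry()
    assert mock_time.sleep.call_count == 1
```

Patching the global `time.sleep` instead would have intercepted both calls, which is exactly how the count drifted from 100 to ~2333.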
…ation in push_changes_timeout
TestRecalibrateOrgsTasks.setUp: Wrap 6 create_old_project calls in time_machine.travel(tick=True) to prevent MaxSnowflakeRetryError.
test_push_changes_timeout: time.time() side_effect=[0,0,200] exhausts when retry machinery makes extra calls. Use chain([0,0], repeat(200)) so additional calls keep returning 200 (still past timeout) without raising StopIteration.
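The itertools fix is self-contained enough to show directly:

```python
from itertools import chain, repeat
from unittest import mock

# side_effect=[0, 0, 200] raises StopIteration on the fourth call;
# chain([0, 0], repeat(200)) keeps yielding 200 forever, so extra calls
# from retry machinery still see a timestamp past the timeout.
fake_time = mock.Mock(side_effect=chain([0, 0], repeat(200)))

assert fake_time() == 0
assert fake_time() == 0
assert fake_time() == 200
assert fake_time() == 200  # would have raised with the plain list
```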
maxage=1 caused key 1 to expire before the assertion under slow CI (16 parallel shards slow down execution beyond 1 second).
…orting
Re-enables pollution detection in a lightweight form (~30s per shard instead of the previous ~10min full-suite re-run):
- After a shuffle-test failure, run the failing test in isolation (no shuffle). If it passes alone → mark failure.json as type="pollution". If it fails alone → keep type="flaky" (genuine intermittent failure).
- The existing 'report' job writes all failures to GITHUB_STEP_SUMMARY with a "passes in isolation (likely pollution)" label for pollution cases. No GitHub issues are created.
report_shuffle_failures.py: Handle pollution failures without polluting_testid (simple isolation detection) by showing the traceback with a "passes in isolation" label rather than requiring polluter info.
Restores the original detect_test_pollution.py-based bisection that automatically finds the polluting test. Replaces the lightweight isolation-only detection added in the previous commit. The script does binary search through testids (tests that ran before the failing one) to find the exact polluting test. Writes result to GITHUB_STEP_SUMMARY (no issue created). Failure artifacts get type='pollution' with polluting_testid when bisection succeeds, or type='flaky' when the test also fails in isolation.
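The bisection detect_test_pollution.py performs can be sketched as below. This is a simplified model that assumes a single polluting test and that the target passes in isolation; `still_fails(subset)` stands for "run the target test after this subset and check whether it fails".

```python
def find_polluter(candidates, still_fails):
    """Binary-search the tests that ran before the failure for the one
    whose presence makes the target test fail.

    candidates: test ids in original execution order.
    Returns the polluting test id, or None if the whole set does not
    reproduce the failure.
    """
    if not candidates or not still_fails(candidates):
        return None
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if still_fails(half):
            candidates = half          # polluter is in the first half
        else:
            candidates = candidates[len(half):]  # must be in the second half
    return candidates[0]
```

Each iteration halves the candidate set, so finding the polluter among N preceding tests costs about log2(N) re-runs rather than N.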
Documents all systemic infrastructure changes, individual test fixes grouped by root cause, Snuba changes, workflow changes, and current state.
test_recalibrate_orgs_with_sliding_window_org: A concurrent xdist worker's flushdb() can clear both the sliding-window cache keys and the recalibration factor keys in the gap between the first and second recalibrate_orgs() calls. Re-seed both sets of Redis keys before the second invocation so the double-down assertion is robust.
test_large_digest: Using self.notification (a full pickled+compressed Event object, potentially hundreds of KB per record) × 8192 records pushes the DIGEST_OPEN Lua response past rb.Cluster buffer limits, causing random truncation. Switch to a tiny string value since the test only checks record count.
Hardcoded group_id=2 for the control events collides with group.id when test ordering puts this test early in the session. Both events land in the same group, the suspect-tags scorer sees no cross-group signal, and the score collapses to 0.0 instead of 2.76. Use group.id + 1_000_000 to guarantee no collision regardless of ordering.
sentry.utils.snuba._snuba_pool is a module-level global created at import time from settings.SENTRY_SNUBA (default: port 1218). When the xdist worker later overrides SENTRY_SNUBA to the per-worker snuba-gw port (1230+), the pool still points at 1218, causing connection refused on every eventstream.insert call on that worker. Recreate _snuba_pool immediately after the settings override so all subsequent Snuba requests on the worker reach the correct container.
… import
The previous attempt imported sentry.utils.snuba at configure time, which triggered an import chain hitting sentry.api.helpers.error_upsampling before Django settings were fully ready in the worker subprocess, causing KeyError: 'analytics.backend' and crashing all workers. Fixed approach: check sys.modules instead of importing the module.
- If sentry.utils.snuba is already in sys.modules (imported before configure_for_worker ran via conftest loading), recreate _snuba_pool in-place pointing at the per-worker URL.
- If it is not yet imported, the next import will read the already-updated settings.SENTRY_SNUBA and create the pool with the correct URL automatically.
test_cache_invalidation_error_handling: self.project is a lazy cached property that initializes self.organization, self.team, and self.project on first access. When this test runs first in a shuffled order, that lazy init happens inside the patch(cache.delete, side_effect=Exception) context. Organization.save() post_save fires cache.delete, which raises the patched Exception and escapes the test. Fix: eagerly access self.project before entering the patch context.
test_get_active_orgs_no_max_projects: Add super().setUp() so base class initialization (including SnubaTestCase.init_snuba() and cell context setup) runs correctly. The missing super() call left the Snuba/cell state uninitialised in shuffle-first ordering, contributing to MaxSnowflakeRetryError.
…t_setup too The configure-time sys.modules check sometimes misses: if sentry.utils.snuba is first imported after configure_for_worker ran but before the settings override took effect (an edge case in plugin ordering), the pool gets created with port 1218 and nothing updates it afterwards. Add _ensure_snuba_pool() called from pytest_runtest_setup (before every test) as a safety net. It checks whether _snuba_pool already points at the correct per-worker port and skips the recreation if so — overhead is one attribute lookup per test. Use sentry.net.http.connection_from_url (matching what snuba.py uses) instead of urllib3 directly.
reviving #108286