Skip to content

Conversation

@wesm
Copy link
Contributor

@wesm wesm commented Dec 12, 2025

Addresses #2795, #4279.

Two main improvements to reduce unnecessary RPC calls and computation:

  1. Defer updates when tab is not visible

    • Track visibility state in grid instances
    • Defer schema/data updates until tab becomes active
    • Trigger deferred refresh when tab regains focus
  2. Fix duplicate data fetching on backend events

    • Backend fires onDidUpdateBackendState BEFORE onDidSchemaUpdate/onDidDataUpdate
    • Previously both handlers would fetch data, causing double work
    • Now: backend events handled only by schema/data handlers
    • UI operations (setRowFilters, sortData) explicitly update both caches

This eliminates redundant column profile computations and data fetches
that were happening on every schema/data update from the backend.

Release Notes

New Features

  • N/A

Bug Fixes

  • Data explorer does not do background computation in some cases when the tab is inactive.

QA Notes

Since this PR was LLM-assisted, we'll just want to be careful to check for regressions in e2e tests.

@:data-explorer

@wesm wesm requested a review from dhruvisompura December 12, 2025 23:34
@github-actions
Copy link

github-actions bot commented Dec 12, 2025

E2E Tests 🚀
This PR will run tests tagged with: @:critical @:data-explorer

readme  valid tags

@wesm wesm force-pushed the feature/de-background-rpcs branch from 0ed346b to d1350c5 Compare December 16, 2025 20:41
Two main improvements to reduce unnecessary RPC calls and computation:

1. Defer updates when tab is not visible
   - Track visibility state in grid instances
   - Defer schema/data updates until tab becomes active
   - Trigger deferred refresh when tab regains focus

2. Fix duplicate data fetching on backend events
   - Backend fires onDidUpdateBackendState BEFORE onDidSchemaUpdate/onDidDataUpdate
   - Previously both handlers would fetch data, causing double work
   - Now: backend events handled only by schema/data handlers
   - UI operations (setRowFilters, sortData) explicitly update both caches

This eliminates redundant column profile computations and data fetches
that were happening on every schema/data update from the backend.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@wesm wesm force-pushed the feature/de-background-rpcs branch from d1350c5 to 2c4a9fa Compare December 16, 2025 23:13
@wesm
Copy link
Contributor Author

wesm commented Dec 17, 2025

Did a bunch of work on this today, this should be ready for review!

Copy link
Contributor

@dhruvisompura dhruvisompura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some basic testing with a small file and a 10 million row x 1000 column file and things seemed to be working. I did have some questions about the changes, but it is mostly looking good!

One UX question I have is how deferring computations for larger datasets effects folks who may want to multi-task while computations are happening? Does this mean folks can't do something on a different editor tab while waiting for the data explorer to finish updating now?

Comment on lines +888 to +892
// Update the summary cache to refresh column profiles after filtering.
await this._tableSummaryCache.update({
invalidateCache: true,
columnIndices
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit odd to me that we have a reference to the tableSummaryCache here when the cache is something that should be managed by the tableSummaryDataGridInstance. Was there a specific reason for doing it this way?

An alternative pattern that seems better to me is the way we handle notifying the summary panel about pinned columns by using the PositronDataExplorerInstance:

.

What do you think about that? This would maintain the separation of concern here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try that approach — this is one of the fixes that was needed to avoid the double-compute bug in the summary pane for large data frames (or schema update events).

I think part of the problem is that setRowFilters should probably not be in this class in the first place, since row filters affect both the data grid and the summary grid.

Comment on lines +239 to +240
// DISABLED: Fetching data here causes double-fetching in some scenarios.
// await this.fetchData(InvalidateCacheFlags.Data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't recall why this was added here but I'm curious if commenting this out will cause any regressions? I'm also curious what specific scenarios caused double-fetching to happen. Would be really helpful to add a more detailed comment in case we need to revisit this for any reason

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the different places where backend state syncs were being requested and I don't think this will cause any regressions — if so they aren't represented by e2e tests. What was resulting in the double-summary-compute bug was that

  • schema update event comes in
  • backend state update requested -> cache invalidate, profiles recompute
  • onDidSchemaUpdate fired -> cache invalidated again, profiles compute a second time

I would guess that the fetch here was added as a workaround for something else and the double-compute bug was just never caught by e2e testing

Comment on lines +91 to +94
/**
* Reference to the updateLayoutEntries function for use in setVisible().
*/
private _updateLayoutEntries!: (state?: BackendState) => Promise<void>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than defining _updateLayoutEntries in the constructor and having a ref to this function, can we just make this a standalone private function? Or is there a specific reason its defined like this?

Comment on lines -193 to +231
// Update the layout entries.
await updateLayoutEntries();

// Update the cache.
await this.fetchData(InvalidateCacheFlags.Data);
// If not visible, defer the update until we become visible.
if (!this._visible) {
this._pendingDataUpdate = true;
return;
}
// Clear pending flag since we're doing the work now.
this._pendingDataUpdate = false;
await this.handleDataUpdate();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this has been updated so that we will rebuild sort keys every time which we weren't doing previously. I am a bit fuzzy on the sorting side of things and wondering if you could provide a bit more detail on the expectation with this change now that we rebuild the sort keys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants