Experiments Q1 2026 goals #14231
Conversation
This quarter we’re focused on making experiments faster, clearer, and easier to work with by improving performance and results loading, and fixing rough edges in how experiments work. We’re also expanding data and replay support, and starting to use AI to help teams come up with better experiments.
#### Query performance <TeamMember name="Juraj Majerik" photo />
For large-scale users, experiment queries are timing out or taking many minutes to load. We will build a way to precompute the heavy parts of these queries on a schedule, so that the final computation runs in a reasonable amount of time. This will be enabled for select customers and be easily toggleable. The goal is to use high-scale customers as a testing ground, so we have this solution in place before we onboard more users at this scale.
I want to take this on since I have experience scheduling computations in Dagster, and I expect this to be roughly similar. The goal is to keep it as simple as possible, focus on the largest part of the query (exposures?), and still be able to fall back to a real-time query when needed.
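To make the precompute-plus-fallback idea concrete, here is a minimal Python sketch, assuming a snapshot store and a freshness window. All names (`ExposureSnapshot`, `MAX_SNAPSHOT_AGE`, and the in-memory store) are illustrative, not the actual implementation:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable

# Hypothetical freshness window for precomputed results.
MAX_SNAPSHOT_AGE = timedelta(hours=6)

@dataclass
class ExposureSnapshot:
    experiment_id: int
    computed_at: datetime
    exposures_per_variant: dict[str, int]

# Stand-in for a persistent store (a ClickHouse table, Redis, etc.).
_snapshots: dict[int, ExposureSnapshot] = {}

def precompute_exposures(experiment_id: int,
                         run_query: Callable[[int], dict[str, int]]) -> None:
    """Scheduled job: run the heavy exposure query and persist the result."""
    _snapshots[experiment_id] = ExposureSnapshot(
        experiment_id=experiment_id,
        computed_at=datetime.now(timezone.utc),
        exposures_per_variant=run_query(experiment_id),
    )

def get_exposures(experiment_id: int,
                  run_query: Callable[[int], dict[str, int]]) -> dict[str, int]:
    """Serve precomputed exposures when fresh; fall back to real time."""
    snapshot = _snapshots.get(experiment_id)
    if snapshot and datetime.now(timezone.utc) - snapshot.computed_at < MAX_SNAPSHOT_AGE:
        return snapshot.exposures_per_variant
    # Stale or missing: compute in real time and refresh the snapshot.
    precompute_exposures(experiment_id, run_query)
    return _snapshots[experiment_id].exposures_per_variant
```

The scheduled job and the fallback share one code path, which keeps the toggle per customer simple: enabled customers get the scheduler, everyone else just always hits the fallback branch.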
#### AI features
Motivation: We’re pushing towards more automation, using AI to make experimentation easier to set up, interpret, and act on.
#### Anonymous -> Logged in experiments: full support <TeamMember name="Anders Asheim Hennum" photo />
Last quarter, we implemented support for running reliable experiments across the anonymous-to-authenticated flow, currently available in posthog-js. This quarter, we will extend this support to all libraries and gather user feedback on usability and whether it works well for them in practice.
Questionable whether we need to have this in all SDKs right away. But I think LLMs can help us get this done fast, so might be good to get this out of the way.
Yeah hopefully the SDK changes are mostly trivial
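The SDK changes may indeed be small, because the core mechanism is just deterministic bucketing on a stable pre-login identifier. A minimal sketch of that idea, assuming the SDK keeps the original anonymous/device ID available after identify(); this is illustrative, not the posthog-js implementation:

```python
import hashlib

def variant_for(experiment_key: str, stable_id: str,
                variants: list[tuple[str, float]]) -> str:
    """Deterministically bucket a user by hashing a stable ID.

    Using the same pre-login anonymous/device ID before and after
    identify() keeps the assigned variant consistent across the
    anonymous-to-authenticated transition.
    """
    digest = hashlib.sha1(f"{experiment_key}.{stable_id}".encode()).hexdigest()
    bucket = int(digest, 16) / 16**40  # uniform in [0, 1)
    cumulative = 0.0
    for name, share in variants:
        cumulative += share
        if bucket < cumulative:
            return name
    return variants[-1][0]  # guard against floating-point rounding

# Same stable_id before and after login => same variant.
print(variant_for("checkout-test", "device-abc123", [("control", 0.5), ("test", 0.5)]))
```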
* **Integrate Experiments into Tasks (AI)** – Deploy any code change behind an experiment, track its progress, and make a decision based on results. <TeamMember name="Juraj Majerik" photo /> <TeamMember name="Rodrigo Iloro" photo />
#### Experiment phases <TeamMember name="Anders Asheim Hennum" photo />
Currently, changing rollout conditions mid-experiment creates a poor experience, with confusing warnings and potentially invalid statistical analysis. We will introduce experiment phases: distinct periods within an experiment where rollout conditions stay the same. This makes it clear when and how an experiment changed over time, and makes the results easier to reason about.
This seems to be the most important quality improvement on the flags side. @andehen happy to reword this or include other flags-related points (UI/UX?) if that makes sense, as there’s quite some overlap.
Agree we should do this. I think the wording is good as it is 👍
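As a rough illustration of what a phase could look like as data, here is a hedged Python sketch; the field names are hypothetical, not the final schema:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class ExperimentPhase:
    started_at: datetime
    ended_at: Optional[datetime]      # None while the phase is still active
    rollout_percentage: int           # rollout conditions frozen for the phase
    release_conditions: dict = field(default_factory=dict)

def phase_for_exposure(phases: list[ExperimentPhase],
                       exposed_at: datetime) -> Optional[ExperimentPhase]:
    """Attribute an exposure event to the phase active at that time,
    so results are analyzed per phase rather than across changed conditions."""
    for phase in phases:
        if phase.started_at <= exposed_at and (
            phase.ended_at is None or exposed_at < phase.ended_at
        ):
            return phase
    return None
```

Editing rollout conditions would then close the current phase and open a new one, instead of mutating conditions under an ongoing analysis.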
| * **“Analyze results”** AI agent – An AI agent that reviews experiment results, highlights important findings, and explains them in plain language. It can suggest whether to continue the test, stop it early, or roll out a variant. The agent should also call out risks, like low sample size or unusual data patterns, to help teams make better decisions. <TeamMember name="Anders Asheim Hennum" photo /> | ||
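One concrete risk check the agent could apply is a plain exposure-count guardrail before it reasons about significance. A minimal sketch, with an illustrative threshold:

```python
MIN_EXPOSURES_PER_VARIANT = 100  # illustrative threshold, not a tuned value

def sample_size_warnings(exposures_per_variant: dict[str, int]) -> list[str]:
    """Flag variants whose exposure counts are too low to trust the results."""
    return [
        f"Variant '{variant}' has only {count} exposures; results may be unreliable."
        for variant, count in exposures_per_variant.items()
        if count < MIN_EXPOSURES_PER_VARIANT
    ]
```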
#### Integrate session replay recordings <TeamMember name="Rodrigo Iloro" photo />
PostHog now supports session replay summaries via chat. We will integrate replay summaries into experiments so they help explain and add context to the experiment results.
Having replay summaries integrated in experiments can be a massive differentiator, I think we should do this asap.
#### Extend data warehouse support <TeamMember name="Rodrigo Iloro" photo />
We will extend data warehouse experiment support to funnels. We will also improve the overall experience by providing clear guidance on how to run data warehouse experiments, and we will run two data warehouse experiments ourselves to build up hands-on knowledge within the team.
The data stack product is still evolving, but the ability to query the data warehouse via ClickHouse is unlikely to change. We need to support funnels, and we also need to test and dogfood the integration much more. Data warehouse tooling is strategically important, so it makes sense to spend time on it this quarter.
#### Feature flags foundation
Motivation: Experiments rely on flags, so we need to make sure the basics are solid and ready to support advanced use cases.
#### AI generated experiments <TeamMember name="Juraj Majerik" photo />
We will help users come up with experiment ideas interactively via chat. We will use existing insights, recordings, and "signals" as input, combine them with user-provided context, and then generate an experiment idea, set up the experiment with metrics, and recommend how to implement it in code.
This is very experimental, but it could help drive experiment creation. Some of it depends on upcoming work from team-signals and what we’ll be able to consume or build on. It would also force us to test and polish the experiment creation flow via MCP, which is still quite untested.
* **Anonymous -> Logged in users experiments** – Create a seamless experience for experiments that start with anonymous users but continue when the same user logs in. <TeamMember name="Anders Asheim Hennum" photo />
#### Great results loading experience <TeamMember name="Marcel Poelker" photo />
We already precompute timeseries results. We will use these timeseries as the main experiment results, so users see freshly calculated results when they open the app, instead of having to refresh manually and wait.
Perhaps this is too small a goal, but it's a good onboarding project for Marcel. There are edge cases to think through here, and having this done well will really improve UX.
- Before launch: let users enter their expected traffic and baseline conversion to estimate how long the experiment will take to reach significance.
- During the experiment: show how much longer the experiment likely needs based on current progress and live data (a rough estimation sketch follows below).
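For the before-launch estimate, a standard two-proportion power calculation is the likely starting point. A hedged Python sketch using the normal approximation for a two-sided test; all parameter names and defaults are illustrative:

```python
import math
from statistics import NormalDist

def required_sample_size(baseline_rate: float, mde_relative: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-variant sample size for a two-proportion z-test.

    baseline_rate: expected control conversion rate, e.g. 0.05
    mde_relative:  smallest relative lift worth detecting, e.g. 0.10 for +10%
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    pooled = (p1 + p2) / 2
    n = ((z_alpha * math.sqrt(2 * pooled * (1 - pooled))
          + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p2 - p1) ** 2)
    return math.ceil(n)

def estimated_days(daily_traffic: int, n_variants: int = 2, **kwargs) -> float:
    """Rough runtime estimate: total required exposures over daily traffic."""
    return required_sample_size(**kwargs) * n_variants / daily_traffic

# Example: 5% baseline, detect a 10% relative lift, 2,000 eligible users/day.
print(estimated_days(daily_traffic=2000, baseline_rate=0.05, mde_relative=0.10))
```

The during-experiment estimate would replace the user-supplied inputs with observed traffic and conversion so far, then re-run the same calculation against the remaining required sample.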
#### Retire legacy experiments to a read-only mode <TeamMember name="Marcel Poelker" photo />
Legacy experiments take up a large part of our codebase. We will cleanly separate them and move them to a read-only mode: still visible for audit and reference, but no longer editable or recalculable. This will reduce support burden and simplify the codebase.
A good opportunity for Marcel to get familiar with our code architecture and get this out of the way.
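The enforcement side could be as small as a single guard on the write paths. A minimal sketch, assuming some marker like an `is_legacy` attribute (hypothetical):

```python
class ReadOnlyExperimentError(Exception):
    """Raised when a write is attempted on a retired legacy experiment."""

def assert_editable(experiment) -> None:
    """Guard to run before any update or recalculation endpoint.

    `is_legacy` is an illustrative attribute; the real check would key off
    whatever marks an experiment as using the legacy engine.
    """
    if getattr(experiment, "is_legacy", False):
        raise ReadOnlyExperimentError(
            "Legacy experiments are read-only: visible for reference, "
            "but no longer editable or recalculable."
        )
```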
**andehen** left a comment
I think this looks good!
In addition to a great anonymous experience, I'd like to work on the possibility of running multiple experiments on a flag. That includes sending a tracking key on the events so that we know which rule caused a given flag value. It will be a stretch goal, but worth taking some steps towards this quarter.
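A minimal sketch of the tracking-key idea, not the posthog-python API: when a flag is evaluated, attach a key identifying which release-condition rule (and therefore which experiment) produced the value, so downstream analysis can attribute the exposure correctly. `$feature_flag_rule_key` is a hypothetical property name:

```python
def build_exposure_event(flag_key: str, variant: str, rule_key: str) -> dict:
    """Build an exposure event carrying the rule that produced the flag value."""
    return {
        "event": "$feature_flag_called",
        "properties": {
            "$feature_flag": flag_key,
            "$feature_flag_response": variant,
            "$feature_flag_rule_key": rule_key,  # hypothetical tracking key
        },
    }
```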
Changes
Add goals for Q1 2026.