Document new experiments methodology #10217

Merged: 49 commits, Jan 7, 2025

Conversation

danielbachhuber
Contributor

@danielbachhuber danielbachhuber commented Dec 24, 2024

Changes

Introduces a new set of docs to describe the statistical methodology introduced in PostHog/posthog#26713

CleanShot 2025-01-07 at 07 46 21@2x

I kept the old document around and linked to it as "Legacy methodology".

vercel bot commented Dec 24, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| posthog | ✅ Ready (Inspect) | Visit Preview | Jan 7, 2025 3:59pm |


Funnel experiments use Bayesian statistics with a beta model to evaluate the **win probabilities** and **credible intervals** for an experiment. [Read the statistics primer for an overview](/docs/experiments/statistics-primer) if you haven't already.

## What the heck is a Beta model?
Contributor

I'd probably tone down the casual tone here :D Though I generally appreciate it, Experiments has a high standard for rigor, so I feel more formal language better communicates this.

Contributor Author

@Lior539 @ivanagas What do y'all think?

Contributor

Let's drop it in the title but have fun with the example (like you do).

@jurajmajerik
Contributor

I only skimmed this due to a lack of time, but it seems like a good start! Maybe we want to avoid repeating the explanation of credible intervals and sampling three times, and instead cover it just once in the statistics primer? But I don’t have a strong opinion on this.

Contributor

@ivanagas ivanagas left a comment

I think this is well written, it's just a really tricky subject to write about well.

The biggest thing is having a clear "why someone should read this" for each doc. How does this make them a better engineer as well as someone who is better able to use our experiments product. Right now, it feels like it is missing a bit of that depth.

contents/docs/experiments/statistics-primer.mdx (outdated)
contents/docs/experiments/statistics-primer.mdx (outdated)
contents/docs/experiments/statistics-primer.mdx (outdated)
Say you just started an experiment a few hours ago and see these results:
* 1 in 10 people in the control group complete the funnel = 10% success rate.
* 1 in 9 people in the test variant group complete the funnel = 11% success rate.
* The control variant has a 46.7% probability of being better and the test variant has a 53.3% probability of being better.
Contributor

Feel free to tell me it is too complicated, but how did the probabilities get calculated? It seems just like magic, and it would be helpful to know.

At the very least, something like this:

Suggested change
* The control variant has a 46.7% probability of being better and the test variant has a 53.3% probability of being better.
* Using Bayesian analysis, we'll find the control variant has a 46.7% probability of being better and the test variant has a 53.3% probability of being better.

Contributor Author

Attempted to explain better with 5eb1f8f
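
To make the question in this thread concrete, here is a minimal sketch of one way such win probabilities could be computed. It assumes a uniform Beta(1, 1) prior and Monte Carlo sampling from each variant's posterior; the thread does not confirm that this matches the implementation in PostHog/posthog#26713, and the function name `win_probability` is hypothetical. The estimate lands close to, but not necessarily exactly on, the figures quoted in the example.

```python
import random


def win_probability(control, test, samples=100_000, seed=0):
    """Estimate P(test rate > control rate) from (successes, total) pairs.

    Assumes a uniform Beta(1, 1) prior on each conversion rate. That prior is
    an assumption made for this sketch, not something stated in the thread.
    """
    rng = random.Random(seed)
    c_success, c_total = control
    t_success, t_total = test
    test_wins = 0
    for _ in range(samples):
        # With a Beta(1, 1) prior, the posterior for each variant is
        # Beta(1 + successes, 1 + failures).
        c_rate = rng.betavariate(1 + c_success, 1 + c_total - c_success)
        t_rate = rng.betavariate(1 + t_success, 1 + t_total - t_success)
        if t_rate > c_rate:
            test_wins += 1
    return test_wins / samples


# Figures from the example: 1 of 10 in control, 1 of 9 in test.
p_test = win_probability(control=(1, 10), test=(1, 9))
print(f"test wins: {p_test:.1%}, control wins: {1 - p_test:.1%}")
```

With so little data the two posteriors overlap almost entirely, which is why neither variant gets far from a 50% win probability.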

contents/docs/experiments/statistics-primer.mdx (outdated)

## What the heck is a Beta model?

Imagine you run a pizza shop and want to know if customers say "yes" to adding pineapple. Some customers will say yes, others will say no.
Contributor

Why would you care about this?

You want to know how much to promote pineapple? You want to know how much pineapple to order?

Contributor Author

Added explanation in 5132470

Imagine you run a pizza shop and want to know if customers say "yes" to adding pineapple. Some customers will say yes, others will say no.

The **beta distribution** is a statistical model that's great for analyzing proportions or probabilities. It helps us understand:
1. The true probability of customers saying yes to adding pineapple.
Contributor

What is the difference between probability and true probability?

Suggested change
1. The true probability of customers saying yes to adding pineapple.
1. The true probability of customers saying yes to adding pineapple.

Contributor Author

Good question! Added an explanation c7aba65
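
As an illustration of the "true probability" idea discussed in this thread, the sketch below summarizes a beta posterior for the pineapple example at two sample sizes. The customer counts are made up for illustration, the Beta(1, 1) prior is an assumption, and `beta_posterior` is a hypothetical helper name. It uses the standard closed-form mean and variance of a beta distribution.

```python
from math import sqrt


def beta_posterior(yes, no, prior_alpha=1.0, prior_beta=1.0):
    """Return (mean, standard deviation) of the Beta posterior.

    The Beta(1, 1) default prior is an assumption made for this sketch.
    """
    alpha = prior_alpha + yes
    beta = prior_beta + no
    total = alpha + beta
    mean = alpha / total
    variance = (alpha * beta) / (total ** 2 * (total + 1))
    return mean, sqrt(variance)


# Hypothetical counts: the same 30% "yes" rate observed at two sample sizes.
small_mean, small_sd = beta_posterior(yes=3, no=7)      # 10 customers asked
large_mean, large_sd = beta_posterior(yes=300, no=700)  # 1,000 customers asked

print(f"10 customers:    mean={small_mean:.3f}, sd={small_sd:.3f}")
print(f"1,000 customers: mean={large_mean:.3f}, sd={large_sd:.3f}")
```

The observed rate is the same in both cases, but the posterior spread shrinks as more customers answer, which is the sense in which the estimate approaches the true probability.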


The **win probability** tells you how likely it is that a given variant has the highest conversion rate compared to all other variants in the experiment. It helps you determine whether the experiment shows a **statistically significant** real effect vs. simply random chance.

Let's say you're testing a new signup flow and have these results:
Contributor

Could we use the same pineapple on pizza example? If it doesn't work here, maybe we should use one that works for both.

Contributor

Maybe pizza website? Experiment to upsell pineapple?

Contributor Author

Yeah, good catch. Updated the examples with bb2d23c and b20beae


## Credible intervals

A **credible interval** tells you the range where the true conversion rate lies with 95% probability. Unlike traditional confidence intervals, credible intervals give you a direct probability statement about the conversion rate.
Contributor

give you a direct probability statement about the conversion rate

What does "direct" mean?

Contributor Author

Explained in 5c0e225
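
For readers wondering what a "direct probability statement" looks like in practice, here is a minimal sketch of a 95% credible interval for a conversion rate. It assumes a Beta(1, 1) prior and takes the 2.5th and 97.5th percentiles of posterior samples. The counts are hypothetical, and `credible_interval` is an illustrative name rather than anything from the PostHog codebase.

```python
import random


def credible_interval(successes, total, level=0.95, samples=50_000, seed=0):
    """Approximate an equal-tailed credible interval for a conversion rate.

    Draws from the Beta(1 + successes, 1 + failures) posterior, which assumes
    a uniform Beta(1, 1) prior (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    draws = sorted(
        rng.betavariate(1 + successes, 1 + total - successes)
        for _ in range(samples)
    )
    tail = (1 - level) / 2
    lower = draws[int(tail * samples)]
    upper = draws[int((1 - tail) * samples) - 1]
    return lower, upper


# Hypothetical data: 100 conversions out of 1,000 users.
low, high = credible_interval(successes=100, total=1000)
print(f"95% credible interval: [{low:.3f}, {high:.3f}]")
```

Under these assumptions, the statement "the conversion rate lies between `low` and `high` with 95% probability" is read directly off the posterior, which is the contrast with a frequentist confidence interval drawn in the doc.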

danielbachhuber and others added 4 commits January 6, 2025 04:35
Co-authored-by: Ian Vanagas <[email protected]>
Co-authored-by: Ian Vanagas <[email protected]>
Co-authored-by: Ian Vanagas <[email protected]>
Co-authored-by: Ian Vanagas <[email protected]>
@danielbachhuber
Contributor Author

Maybe we want to avoid repeating the explanation of credible intervals and sampling three times, and instead cover it just once in the statistics primer? But I don’t have a strong opinion on this.

@jurajmajerik Good call out. I'd like to keep the repetition because they're important concepts and the definitions are relevant to the corresponding context.

Contributor

@ivanagas ivanagas left a comment

Nice, much improved :)

contents/docs/experiments/funnels-statistics.mdx (outdated)
contents/docs/experiments/statistics-primer.mdx (outdated)
contents/docs/experiments/statistics-primer.mdx (outdated)
contents/docs/experiments/trends-continuous-statistics.mdx (outdated)
Comment on lines 14 to 15
- When we have very little data, the Gamma distribution is wide, saying "hey, the true rate could be anywhere in this broad range".
- As we collect more data, the Gamma distribution gets narrower, saying "we're getting more confident about what the true rate is".
Contributor

nice
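
The widening and narrowing described in the quoted lines can be sketched numerically. The example below assumes a gamma-Poisson setup with a Gamma(1, 1) prior (shape and rate), so that the posterior is Gamma(shape + total events, rate + exposure). The prior values, the counts, and the name `gamma_posterior` are all assumptions made for illustration.

```python
from math import sqrt


def gamma_posterior(total_events, exposure, prior_shape=1.0, prior_rate=1.0):
    """Return (mean, standard deviation) of the Gamma posterior for a rate.

    Uses gamma-Poisson conjugacy: a Gamma(shape, rate) prior updated with
    `total_events` observed over `exposure` units (for example, users).
    The Gamma(1, 1) default prior is an assumption made for this sketch.
    """
    shape = prior_shape + total_events
    rate = prior_rate + exposure
    mean = shape / rate
    sd = sqrt(shape) / rate
    return mean, sd


# Hypothetical data: about 2 events per user, observed early and later on.
early_mean, early_sd = gamma_posterior(total_events=20, exposure=10)
later_mean, later_sd = gamma_posterior(total_events=2000, exposure=1000)

print(f"10 users:    mean={early_mean:.2f}, sd={early_sd:.3f}")
print(f"1,000 users: mean={later_mean:.2f}, sd={later_sd:.3f}")
```

The posterior mean stays near 2 events per user in both cases, while the standard deviation drops by roughly a factor of ten with 100 times as much data.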

contents/docs/experiments/trends-count-statistics.mdx (outdated)
contents/docs/experiments/statistics-primer.mdx (outdated)
danielbachhuber and others added 7 commits January 6, 2025 10:04
Co-authored-by: Ian Vanagas <[email protected]>
Co-authored-by: Ian Vanagas <[email protected]>
Co-authored-by: Ian Vanagas <[email protected]>
Co-authored-by: Ian Vanagas <[email protected]>
Co-authored-by: Ian Vanagas <[email protected]>
Co-authored-by: Ian Vanagas <[email protected]>
Contributor

@andehen andehen left a comment

Good starting point! 🙌

As we talked about in the huddle, I would like to suggest a slightly different structure. But as this documentation is a topic for next quarter assigned to me, I'll save it for that.


Et voilà! The test variant's win probability increased significantly, and the credible intervals became narrower and more distinct. You can decide on the winner now or continue to wait, depending on your business requirements.

## Supported methodologies
Contributor

It's unclear, I think, what we mean by "methodologies" here. I would suggest calling this "Supported metric types" or something similar, as that is what our users care about.

I suggest something like this:

Supported metric types:

As different types of data have different shapes, we need to use different models to better match the true distribution of the data. For example, funnel conversions are always between 0 and 1 (0-100%), pageview counts can be any non-negative integer (0, 50, 280), and continuous values such as revenue can vary widely and tend to be right-skewed. The metric types we currently support are:

  • [funnels (conversion rates)](/docs/experiments/funnels-statistics)
  • [count data (page views, etc.)](/docs/experiments/trends-count-statistics)
  • [continuous values (revenue, etc.)](/docs/experiments/trends-property-value-statistics)

Contributor

I think we should only use capital letters in a title. In the middle of a sentence it should be "a gamma-poisson model". The exception is when one refers to a specific distribution, like this: "we use a Beta(1, 1) distribution as prior ..."

@@ -0,0 +1,71 @@
---
title: Statistical methodology for funnel experiments
Contributor

I think we should write "Statistical methodology for funnel metrics".
Now, with the multiple metrics feature, we can have both funnel metrics and trend metrics in the same experiment, so it makes more sense to refer to these as different metrics rather than different experiments.

Contributor Author

Updated with 1aafe97

contents/docs/experiments/trends-count-statistics.mdx (outdated)
contents/docs/experiments/trends-count-statistics.mdx (outdated)
@danielbachhuber
Contributor Author

@andehen Thanks for the review!

It's unclear, I think, what we mean by "methodologies" here. I would suggest calling this "Supported metric types" or something similar, as that is what our users care about.

Adapted with 1ef9b83

I think we should only use capital letters in a title. In the middle of a sentence it should be "a gamma-poisson model". The exception is when one refers to a specific distribution, like this: "we use a Beta(1, 1) distribution as prior ..."

Fixed up with d7da622

@danielbachhuber danielbachhuber enabled auto-merge (squash) January 7, 2025 15:47
@danielbachhuber danielbachhuber merged commit b87b539 into master Jan 7, 2025
4 checks passed
@danielbachhuber danielbachhuber deleted the experiments/new-stats-methodology branch January 7, 2025 15:59
4 participants