If you experience Type I and Type II errors, you incorrectly find, or fail to recognize, winners in your experiments. With both errors, you end up going with what seems to work (or not) rather than with what actually does. Misinterpreting test results doesn’t just result in misguided optimization efforts; it can also derail your optimization program.
The best time to spot these mistakes is before you even make them! Let’s see how you can avoid Type I and Type II errors in your optimization experiments. But before that, let’s look at the null hypothesis … because it’s the faulty rejection or non-rejection of the null hypothesis that causes Type I and Type II errors.
The null hypothesis: H0
When you run an experiment, you don’t jump straight to claiming that the proposed change will move a certain metric. You start by assuming that the proposed change doesn’t affect the metric in question, that the two have nothing to do with each other.
This is your null hypothesis (H0). H0 always states that there is no change. This is what you believe by default … until (and if) your experiment refutes it.
Your alternative hypothesis (Ha or H1) is that there is a positive change. H0 and Ha are always mathematical opposites.
Ha is the hypothesis in which you expect the proposed change to make a difference, and it’s the one you test with your experiment.
For example, if you want to run an experiment on your pricing page that adds another payment method, you first form a null hypothesis along the lines of: The additional payment method has no impact on sales. Your alternative hypothesis would be: The additional payment method increases sales.
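Written formally, the payment-method pair becomes a one-sided test. Here p_A and p_B are symbols introduced for illustration: p_A is the sales conversion rate of the current pricing page and p_B that of the page with the additional payment method.

```latex
\begin{aligned}
H_0 &: p_B \le p_A && \text{(the extra payment method does not increase sales)}\\
H_a &: p_B > p_A   && \text{(the extra payment method increases sales)}
\end{aligned}
```

Written this way, the two statements are exact complements, which is what makes them mathematical opposites; the boundary case p_B = p_A is the "no change" that H0 assumes by default.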
Running an experiment really means challenging the null hypothesis, or the status quo.
Type I and Type II errors occur when you incorrectly reject, or fail to reject, the null hypothesis.
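The four possible outcomes of a single test decision can be laid out in a few lines of Python. This is a minimal illustrative sketch: the function name `classify_decision` is hypothetical, and in a real experiment you never observe whether H0 is actually true, which is exactly why these errors are hard to spot.

```python
def classify_decision(null_is_true: bool, rejected_null: bool) -> str:
    """Label one test decision, given the (normally unknown) truth about H0."""
    if null_is_true and rejected_null:
        return "Type I error (false positive)"
    if not null_is_true and not rejected_null:
        return "Type II error (false negative)"
    return "correct decision"


# Print the full 2x2 grid of truth vs. decision.
for null_is_true in (True, False):
    for rejected_null in (True, False):
        truth = "H0 true " if null_is_true else "H0 false"
        action = "reject H0" if rejected_null else "keep H0  "
        print(f"{truth} | {action} -> {classify_decision(null_is_true, rejected_null)}")
```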
Understanding Type I Errors
Type I errors are also known as false positives or alpha (α) errors.
In a Type I error in hypothesis testing, your optimization test or experiment *seems to be successful*, and you (erroneously) conclude that the variation you’re testing performs differently (better or worse) than the original.
With Type I errors, you see lifts or drops that are only temporary and unlikely to hold up in the long term, and you end up rejecting your null hypothesis (and accepting your alternative hypothesis).
The null hypothesis can be wrongly rejected for various reasons, but the leading one is peeking (i.e., looking at your results while the experiment is still running) and calling tests before the predefined stopping criteria are reached.
Many testing methodologies discourage peeking, because treating interim results as final can lead to wrong conclusions, and those lead to Type I errors.
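To see why peeking inflates Type I errors, you can simulate A/A tests, in which both versions are identical, so every "winner" is by definition a false positive. This is a minimal sketch under stated assumptions: a 10% conversion rate, 100 visitors per version between looks, 10 looks, and a pooled two-proportion z-test at α = 0.05. The function names are hypothetical, and real testing tools may use different statistics.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation in the data, so no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))  # same as 2 * (1 - Phi(|z|))


def simulate_aa_tests(n_experiments=500, looks=10, batch=100, rate=0.10,
                      alpha=0.05, seed=42):
    """Return (false-positive rate when peeking after every batch,
    false-positive rate when checking only once at the planned end)."""
    rng = random.Random(seed)
    peeking_false_positives = 0
    final_look_false_positives = 0
    for _ in range(n_experiments):
        conv_a = conv_b = visitors = 0
        significant_at_any_look = False
        for _ in range(looks):
            # Both versions share the same true rate, so H0 is true by construction.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            visitors += batch
            # A peeker who stops at the first p < alpha would declare a winner here.
            if two_proportion_p_value(conv_a, visitors, conv_b, visitors) < alpha:
                significant_at_any_look = True
        peeking_false_positives += int(significant_at_any_look)
        # A disciplined tester checks once, after the planned sample is reached.
        final_p = two_proportion_p_value(conv_a, visitors, conv_b, visitors)
        final_look_false_positives += int(final_p < alpha)
    return (peeking_false_positives / n_experiments,
            final_look_false_positives / n_experiments)


peeking_rate, fixed_rate = simulate_aa_tests()
print(f"False positives with peeking:        {peeking_rate:.1%}")
print(f"False positives with one final look: {fixed_rate:.1%}")
```

Because the final look is one of the peeks, the peeking rate can never be lower than the single-look rate; with these settings it typically comes out well above the roughly 5% you get from one look at the end, even though nothing differs between the versions.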
Here’s how to make a Type I mistake:
Suppose you’re optimizing your B2B website’s landing page and hypothesize that adding badges or awards will reduce your prospects’ anxiety, increasing your form fill rate (which leads to more leads).
So your null hypothesis for this experiment becomes: Adding badges has no effect on form fills.
The stopping criterion for such an experiment is usually a set period of time and/or a certain number of conversions at the defined statistical significance level. Traditionally, optimizers aim for the 95% statistical confidence mark because it leaves only a 5% chance of making a Type I error, which is considered low enough for most optimization experiments. In general, the higher the confidence level, the lower the likelihood of a Type I error.
The confidence level you choose determines your probability of a Type I error (α).
So if you’re aiming for a 95% confidence level, your α is 5%. Here you accept a 5% chance of rejecting the null hypothesis when it is actually true.
Conversely, if you run an experiment at a 99% confidence level, your probability of a Type I error drops to 1%.
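The link between confidence level and α, and the |z| threshold a test statistic has to clear at each level, can be sketched with Python’s standard-library `statistics.NormalDist`. The helper names are hypothetical, and the sketch assumes a two-sided z-test, which is only one of the methods testing tools use.

```python
from statistics import NormalDist


def alpha_from_confidence(confidence: float) -> float:
    """Type I error rate (alpha) implied by a confidence level, e.g. 0.95 -> 0.05."""
    return 1 - confidence


def critical_z(confidence: float) -> float:
    """|z| a two-sided z-test must exceed to be significant at this confidence level."""
    alpha = alpha_from_confidence(confidence)
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"confidence {level:.0%}: alpha = {alpha_from_confidence(level):.2f}, "
          f"critical |z| = {critical_z(level):.3f}")
```

Raising the confidence level from 95% to 99% pushes the threshold from about 1.96 to about 2.58, which is why a stricter α makes a false "winner" less likely.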
For this experiment, say you become impatient and, instead of waiting for your experiment to end, you peek at your testing tool’s dashboard a day later. And you notice an “obvious” increase: your form fill rate has gone up by 29.2% at a 95% confidence level.
And BAM …
… you end your experiment.
… reject the null hypothesis (that badges have no impact on form fills).
… accept the alternative hypothesis (that badges increase form fills).
… and roll out the version with the awards.
But when you measure your leads over the month, you find the number is almost the same as what you recorded with the original version. The badges didn’t matter that much, and the null hypothesis was probably rejected in vain.
What happened here was that you ended your experiment too early, rejected the null hypothesis, and picked a false winner: a Type I error.
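The trap is that a single day of data can look convincing. The numbers below are hypothetical (chosen so the lift matches the 29.2% in the example), and the pooled two-proportion z-test is an assumption about how the testing tool computes significance.

```python
import math


def lift_and_p_value(conv_a, n_a, conv_b, n_b):
    """Relative lift of B over A, plus the two-sided p-value of a pooled two-proportion z-test."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return rate_b / rate_a - 1, p_value


# Hypothetical day-one numbers: 1,000 visitors per version, 12% vs. 15.5% form fills.
lift, p = lift_and_p_value(conv_a=120, n_a=1000, conv_b=155, n_b=1000)
print(f"lift = {lift:.1%}, p = {p:.3f}")  # prints: lift = 29.2%, p = 0.023
```

A p-value below 0.05 at this first look says nothing about whether the result survives the planned run; each additional peek is one more chance for noise to cross the threshold.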
Avoid Type I errors in your experiments
One reliable way to lower your chances of a Type I error is to use a higher confidence level. A 5% statistical significance level (which corresponds to a 95% statistical confidence level) is acceptable for most tests. It’s a bet most optimizers consider safe, because it limits false positives to an unlikely 5% of tests in which the null hypothesis is actually true.
A high confidence level alone isn’t enough; running your tests for long enough is also important. Test duration calculators can tell you how long you need to run your test (taking into account a certain effect size, among other things). If you let an experiment run its intended course, you significantly decrease the likelihood of a Type I error (provided you use a high confidence level). If you wait until your test reaches its planned sample size before judging significance, the chance of rejecting a null hypothesis that is actually true stays at your chosen α (typically 5%). In other words, use an adequate sample size, because it is crucial for achieving reliable, statistically significant results.
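Test duration calculators typically start from a sample size formula. As a minimal sketch, assuming a two-sided two-proportion z-test and one common normal-approximation formula (your testing tool’s calculator may use a different method), you can estimate the visitors needed per variation and the resulting duration. The function names and all input numbers are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, relative_mde, confidence=0.95, power=0.80):
    """Visitors needed in EACH variation to detect a relative lift of `relative_mde`
    with a two-sided two-proportion z-test (normal approximation, unpooled variance)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # controls Type I errors
    z_beta = NormalDist().inv_cdf(power)                        # controls Type II errors
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def estimate_duration_days(baseline_rate, relative_mde, daily_visitors, variations=2, **kwargs):
    """Days needed if `daily_visitors` are split evenly across all variations."""
    per_variation = sample_size_per_variation(baseline_rate, relative_mde, **kwargs)
    return math.ceil(per_variation * variations / daily_visitors)


# Hypothetical inputs: 12% baseline form fill rate, 10% relative MDE, 2,000 visitors/day.
n = sample_size_per_variation(0.12, 0.10)
days = estimate_duration_days(0.12, 0.10, daily_visitors=2000)
print(f"{n} visitors per variation, about {days} days")
```

A smaller minimum effect, a higher confidence level, or higher power each push the required sample, and therefore the test duration, up.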
So far, I’ve covered Type I errors, which relate to the confidence (or significance) level of your experiments. But there’s another type of error that can sneak into your tests: Type II errors.
Understanding Type II Errors
Type II errors are also known as false negatives or beta (β) errors.
In contrast to a Type I error, in a Type II error the experiment *seems to be unsuccessful (or inconclusive)*, and you (erroneously) conclude that the variation you’re testing doesn’t perform differently from the original.
With Type II errors, you don’t see the real lifts or drops, so you fail to reject the null hypothesis and wrongly dismiss the alternative hypothesis.
Here’s how to make a Type II mistake:
Back to the same B2B website from above …
This time, let’s suppose you hypothesize that adding a GDPR compliance disclaimer at the top of your form will encourage more prospects to fill it out (which leads to more leads).
Your null hypothesis for this experiment would then be: The GDPR disclaimer does not affect form completions.
And the alternative hypothesis would read: The GDPR disclaimer leads to more form fills.
The statistical power of a test determines how well it can detect a difference in performance between your original and challenger versions when one actually exists. Traditionally, optimizers aim for 80% statistical power, because the higher this metric, the less likely a Type II error.
Statistical power takes a value between 0 and 1 (often expressed as a percentage) and is tied to your probability of a Type II error (β). It is calculated as: power = 1 − β.
The higher the statistical power of your test, the less likely a Type II error.
If an experiment has 10% statistical power, it is very susceptible to a Type II error. If an experiment has 80% statistical power, the probability of a Type II error is much lower (20%).
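In probability notation (standard definitions, with H_a the alternative hypothesis), the two error rates and power relate as follows:

```latex
\begin{aligned}
\alpha &= P(\text{reject } H_0 \mid H_0 \text{ true}) && \text{(Type I error rate)}\\
\beta  &= P(\text{fail to reject } H_0 \mid H_a \text{ true}) && \text{(Type II error rate)}\\
\text{power} &= 1 - \beta = P(\text{reject } H_0 \mid H_a \text{ true})
\end{aligned}
```

So 80% power means β = 0.20, while 10% power means β = 0.90: a 90% chance of missing a real effect of the assumed size.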
You run your test, but this time you don’t notice any significant increase in form fills. Both versions report nearly identical conversions. So you stop your experiment and keep the original version without the GDPR disclaimer.
But when you dig deeper into your leads, you find that although the total number of leads from both versions (the original and the challenger) looked identical during the trial period, the GDPR version brought you a good, significant increase in leads from Europe. (Of course, you could have used audience targeting to show the experiment only to visitors from Europe, but that’s another story.)
What happened here was that you ended your test too early, without checking whether you had achieved enough power, and made a Type II error.
Avoid Type II errors in your experiments
Run tests with high statistical power to avoid Type II errors. Try to configure your experiments so that you reach at least the 80% statistical power mark. This is an acceptable level of statistical power for most optimization experiments. It means that, when a real effect at least as large as the one you’re looking for exists, you correctly reject the false null hypothesis in at least 80% of cases.
To do this, you need to consider the factors that affect power.
The biggest one is sample size (relative to the size of the effect you want to detect). Sample size ties directly to the power of a test: a large sample size means a high-powered test. Underpowered tests are very susceptible to Type II errors because your chances of detecting a difference between your challenger and original versions are significantly reduced, especially at low MEIs (more on this below). To avoid Type II errors, let the test collect enough data to reach adequate power. Ideally, in most cases, you want at least 80% power.
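How strongly sample size drives power can be sketched with a normal approximation, assuming a two-sided two-proportion z-test at 95% confidence. The function name and all inputs are hypothetical, and the approximation is rough at small samples.

```python
import math
from statistics import NormalDist


def approximate_power(baseline_rate, relative_lift, n_per_variation, confidence=0.95):
    """Approximate power of a two-sided two-proportion z-test when the true relative
    lift is `relative_lift` (normal approximation; the tiny chance of a significant
    result in the wrong direction is ignored)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    se = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variation)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return NormalDist().cdf(abs(p2 - p1) / se - z_alpha)


# Hypothetical: 12% baseline, true 10% relative lift, growing sample sizes.
for n in (1_000, 4_000, 8_000, 12_000, 20_000):
    print(f"n = {n:>6} per variation -> power = {approximate_power(0.12, 0.10, n):.0%}")
```

With these inputs, a test stopped at 1,000 visitors per variation would miss the real lift most of the time, while roughly 12,000 per variation is needed to reach the 80% mark.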
Another factor is the minimum effect of interest (MEI) you aim for in your experiment. MEI (also called MDE, minimum detectable effect) is the smallest difference in your KPI that you want to be able to detect. If you set a low MEI (for example, a 1.5% increase), your chances of a Type II error increase, because detecting small differences requires much larger sample sizes (to achieve adequate power).
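A common normal-approximation formula for a two-sided test on conversion rates makes the MEI’s role explicit (it is one of several formulas that calculators use, so treat it as an assumption rather than what your tool does). Here p_1 is the baseline rate, p_2 the rate at the MEI, δ = p_2 − p_1, and n the sample size per variation:

```latex
\begin{aligned}
n &\approx \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \left[p_1(1-p_1) + p_2(1-p_2)\right]}{\delta^2}\\[4pt]
\frac{n(\delta/2)}{n(\delta)} &\approx \frac{\delta^2}{(\delta/2)^2} = 4
\end{aligned}
```

Because n scales with 1/δ², halving the MEI roughly quadruples the required sample (only roughly, since the variance term in the brackets also shifts slightly when p_2 changes).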
Finally, note that there tends to be an inverse relationship between the probability of a Type I error (α) and the probability of a Type II error (β). For example, if you decrease α to lower the likelihood of a Type I error (say you set α to 1%, which means a 99% confidence level), the statistical power of your experiment (1 − β, its ability to detect a difference if one exists) also decreases, increasing the likelihood of a Type II error.
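A rough worked example shows the size of this trade-off. It assumes a two-sided z-test and the normal approximation power ≈ Φ(|δ|/SE − z_{1−α/2}), where δ is the true difference and SE its standard error, and takes a test sized for exactly 80% power at α = 0.05:

```latex
\begin{aligned}
\frac{|\delta|}{SE} &= z_{0.975} + z_{0.80} \approx 1.960 + 0.842 = 2.802\\
\text{power}_{\alpha = 0.01} &\approx \Phi\!\left(2.802 - z_{0.995}\right) = \Phi(2.802 - 2.576) = \Phi(0.226) \approx 0.59
\end{aligned}
```

With the same sample, tightening α from 5% to 1% drops power from 80% to roughly 59%, so β roughly doubles, from 0.20 to about 0.41.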
Accept one of the errors: Type I or Type II (and find a balance)
Reducing the likelihood of one error type increases the likelihood of the other (provided everything else stays the same).
So you have to make the call on which type of error you can be more tolerant of.
On the one hand, make a Type I error and roll out a change to all users, and it can cost you conversions and revenue. Even worse, it could be a conversion killer.
On the other hand, make a Type II error and fail to roll out a winning version to all users, and you lose the conversions you could otherwise have won.
Both errors always come with costs.
Depending on your experiment, however, one may be more acceptable to you than the other.
Generally, testers consider a Type I error about four times more serious than a Type II error.
If you want to take a more balanced approach, statistician Jacob Cohen suggests aiming for 80% statistical power, which comes with “an appropriate balance between alpha and beta risk.” (80% power is also the default in most testing tools.)
And as far as the confidence level is concerned, the standard is 95%.
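These two defaults are where the “four times” rule of thumb comes from: 95% confidence sets α = 0.05 and 80% power sets β = 0.20, so the convention tolerates four times as much Type II risk as Type I risk:

```latex
\frac{\beta}{\alpha} = \frac{1 - 0.80}{1 - 0.95} = \frac{0.20}{0.05} = 4
```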
Basically, it’s about compromise and how much risk you are willing to tolerate. If you really want to minimize the likelihood of both errors, you could aim for a 99% confidence level and 99% power. However, this would mean working with incredibly large samples for a very long time, and even then you would still leave some room for error.
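To put “incredibly large samples” in numbers: under the common normal approximation for a two-sided test, the required sample size per variation is proportional to (z_{1−α/2} + z_{1−β})² when the baseline rate and MEI are held fixed. Comparing 99% confidence with 99% power against the 95%/80% defaults gives:

```latex
\frac{n_{99\%/99\%}}{n_{95\%/80\%}}
\approx \left(\frac{z_{0.995} + z_{0.99}}{z_{0.975} + z_{0.80}}\right)^2
\approx \left(\frac{2.576 + 2.326}{1.960 + 0.842}\right)^2
= \left(\frac{4.902}{2.802}\right)^2
\approx 3.06
```

That is roughly three times as many visitors, and about three times the test duration at the same traffic.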
Every now and then, you will call an experiment wrong. However, this is part of the testing process. Investigating and retesting, or tracking your successful and failed experiments over time, is one way to confirm your results or discover that you made a mistake.
Over to you …
Have you ever encountered Type I or Type II errors in your experiments? If so, how did you find out that you chose the wrong winner or missed the overall winner? Tell us in the comments!
Note: We are not the author of this content. For the authentic and complete version, check its original source.