When you put something up on your website, how will you know if it works or not? No need for guesswork, because this is where A/B testing comes in.
The A/B test, sometimes known as split testing, is an experiment where you show two variants of the same product to different customers at the same time, then determine which variant drives more conversions. Usually, the variant with the higher conversion rate is the winner, and is used to help you sell your product better.
Developed by William Sealy Gosset around 1908, this test was used to identify the best barley for brewing beer at Guinness. His methodology allowed for a statistically rigorous comparison of two groups to determine whether there was a significant difference between them. Not only was he able to maintain high quality at Guinness, his test has become a powerful tool for determining what works and what doesn’t. It is still in use a hundred years later by everyone from audio engineers to Internet marketers.
It has especially taken off over the last few decades, as companies recognized that the Internet is made for A/B testing. App and website changes are quick and cheap, and their impact is easily quantifiable in clicks and sign-ups. A/B testing has been used to test everything from your favorite website’s design to clothing models’ looks and even sale offers. The way you act on a website is a piece of data gathered by engineers. In fact, some businesses find that minor changes can impact customer behavior dramatically, which can mean big money for a high-traffic website.
Website traffic, or the number of visitors on your website, plays an important role in expanding your business. The more traffic you have, the more opportunities you have to acquire new customers, build relationships with existing ones, and grow your business. Businesses want visitors to take action (also known as converting) on their website. The rate at which a website drives this action is called the “conversion rate”. Optimizing your conversion funnel means a higher chance that your visitors will convert.
The metrics for conversion are unique to each website. For e-commerce, it may be product sales; for B2B, it may be the generation of qualified leads. A/B testing is one component of the overarching process of Conversion Rate Optimization (CRO). Through CRO you gather both qualitative and quantitative user insights, use them to understand your potential customers, and optimize your conversion funnel based on that data.
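As a quick illustration, a conversion rate is simply conversions divided by visitors. A minimal sketch in Python, with made-up numbers:

```python
def conversion_rate(conversions, visitors):
    """Fraction of visitors who completed the goal action (a sale, a sign-up)."""
    if visitors == 0:
        return 0.0
    return conversions / visitors

# Hypothetical numbers: 120 sign-ups out of 4,000 visitors.
rate = conversion_rate(120, 4000)
print(f"{rate:.1%}")  # 3.0%
```

Whatever action counts as a conversion for your site, the arithmetic is the same; only the definition of "conversion" changes.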
Creating a marketing campaign around your product is the first step. If you made a website or an email newsletter, you’d want to know if it helps or hinders your sales. A/B testing lets you know which words, phrases, images, videos, and other elements work best. Even the simplest changes can impact conversion rates!
Doing A/B testing can solve a myriad of problems. First, it can resolve visitor pain points. When visitors come to your website, they have a specific goal they want to achieve: to learn more about your product, or even to buy it. Whatever their goal may be, they will likely face pain points along the way. Not being able to achieve their goal leads to a bad user experience, which in turn hurts conversion rates.
You can also get better ROI from your existing traffic. The cost of acquisition can be huge. Hence, A/B testing lets you make the most out of your existing traffic and helps you increase conversion without having to spend so much on new acquisition.
It can also help reduce bounce rate. A high bounce rate on your website may be due to too many options available, mismatched expectations, and so on. Through A/B testing, you can test multiple variations of an element in your website as much as possible until you find the best version. This improves user experience, which then leads visitors to spend more time on your website and reduce bounce rate.
It can also help achieve improvements that can affect your statistics. A/B testing is completely data driven, and has no room for guesswork or instincts. Through this, you can determine the best and the worst based on statistically significant improvement metrics like click-through rates, time spent on the page, and so on.
Your website’s conversion funnel is crucial to your business. It’s important to optimise each piece of content that reaches your users. This holds true for elements that can affect visitor behaviour and conversion rate. Here are some of the elements that you can consider doing an A/B test on:
It’s crucial to create an excellent user experience for your visitor. Ensure that your structure is clear so that visitors can easily find what they are looking for and don’t get lost. Creating an easy-to-navigate website and matching visitors’ expectations can increase the chances of conversion as well as create a great customer experience for them which may have the visitors return to your website.
The CTA determines whether or not visitors will finish their purchases and convert. Through A/B testing, you can test different copy, placement, and even colours and sizes for your CTA until you find the best one. You can then test that best version to optimise it further.
Your headline is the first thing your visitors see on your website. This is why it’s crucial to craft your website’s headline to be eye-catching and straight to the point. Be extra cautious about the writing style and formatting too. Test out the size, fonts, and messaging.
The body’s copy must be able to communicate what it has in store for the visitor. It should also connect to the website’s headline. A well-crafted body may help increase the chance of conversion.
If your website is selling a product, your product page is extremely important for conversion. Its design and layout must be optimized. Along with the copy, the design must include images and videos of the product – offers, demos, advertisements, etc. The product page must be able to provide simple yet clear information on what it is and what it can provide. Consider highlighting customer reviews to add credibility to your product.
While conducting A/B testing, keep an eye out for these mistakes to ensure that your testing goes smoothly and to avoid wasting time and resources:
One common mistake with A/B testing is ending the test too soon. Run your campaign for at least a full week and see how it performs. Only then should you consider tweaks and follow-up tests.
It’s best to run tests for a minimum of two weeks to keep the results as accurate as possible. If you’re not confident in the results within the first seven days, run the test for another seven days. The only time to break this rule is when your historical data indicates that the conversion rate is the same every single day. But even then, it’s always best to test at least one full week at a time. Keep an eye out for external factors like events or advertising campaigns, as these may skew your results. When in doubt, run a follow-up test.
In A/B testing, statistical significance tells you how confident you can be that Version A really is better than Version B, rather than the difference being due to chance. But this only holds if the sample size is large enough. To test your campaigns, you need enough people to get meaningful results. If you have a high-traffic site, you can complete A/B testing quickly because of the constant flow of visitors. If you have a low-traffic site, testing will take longer.
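To estimate how long a test must run, you can work backwards from the sample size needed per variant. The sketch below uses the standard normal approximation for comparing two proportions; the baseline and target rates are made-up examples, not figures from the text:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(p1, p2, alpha=0.05, power=0.80):
    """Approximate visitors needed in each variant to detect a change
    from conversion rate p1 to p2 with a two-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for 95% confidence
    z_power = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)

# Detecting a lift from a 3% to a 4% conversion rate needs thousands of
# visitors per variant -- small effects demand large samples.
print(sample_size_per_variant(0.03, 0.04))
```

Divide the required sample by your daily traffic per variant and you have a rough minimum test duration.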
A hypothesis is a proposed explanation made on the basis of limited evidence that serves as a jumping-off point for further research. You need proper conversion research to discover where the problems are, and then a hypothesis for how to solve them. If you do A/B testing without a clear hypothesis and find that B is better by 15%, what have you really learned? We want to learn about our audience, and a clear hypothesis helps us improve our customer theory so we can come up with better tests.
If A beats B by 105%, that’s not the whole picture. Averages lie. Many testing tools have built-in results segmentation, but they are no match for what you can do in Google Analytics. Sending your data to Google Analytics lets you segment it and build custom reports, so you get a better view of your results.
Spend your traffic on high-impact factors on your website instead of minor details like color. Test data-driven hypotheses.
So you set up a test and it failed. Most first tests fail. Run the test, learn from it, and improve your customer theory and hypothesis. Run a follow-up test, learn from it, improve, and so on.
Statistical significance is not the only thing to pay attention to. Understand what false positives are. The more variations you test, the higher the chance of a false positive. It’s often better to run a simple A/B test: you’ll get results faster and learn faster, so you can improve your hypothesis sooner.
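The inflation of false positives with extra variants is easy to quantify. Assuming each comparison is independent and run at a 5% significance level, the chance of at least one false positive grows quickly:

```python
def familywise_error(alpha, num_comparisons):
    """Probability of at least one false positive across independent
    comparisons, each tested at significance level alpha."""
    return 1 - (1 - alpha) ** num_comparisons

# One variant at alpha=0.05 keeps the risk at 5%; five variants push it to ~23%.
for k in (1, 3, 5):
    print(k, f"{familywise_error(0.05, k):.1%}")
```

This is why a test with many variations needs either more traffic or a stricter significance threshold than a simple two-way split.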
While running multiple tests at the same time may save you time, this may skew results if you’re not careful. If you want to test a new version of several layouts in the same flow at once, consider running multi-page experiments or multivariate testing to attribute results properly.
The reality is, if your website is already good, you’re not going to see huge lifts all the time. Massive lifts are, in fact, very rare. Most winning tests produce small gains, and it all depends on the absolute numbers you’re dealing with. Look at these gains from a 12-month perspective: keep collecting those small wins and they will add up.
A day without a test is a wasted day. Testing is learning about your audience: what works, and why it works. You won’t know what works until you test it. So keep optimizing your website.
So you have a decent sample size, confidence level, and test duration. But this doesn’t mean that your test results are valid. There are several threats to the validity of your test, such as the instrumentation effect, where the testing tools themselves produce flawed data. Sometimes it’s the history effect, where something happening in the outside world skews the data. It may also be the broken code effect, where a variation displays poorly on some devices. Keep an eye out for these threats to ensure smooth testing.
One of the biggest problems with A/B testing is testing the wrong pages, which wastes time, resources, and money. So how do you know which pages to test and optimize? Start with your home, contact, about, and product pages. If you run an eCommerce store, it’s especially important to test the product pages. Optimizing these pages may help boost sales.
One way to mess up your A/B testing results is changing your setup in the middle of the test. Sudden changes will invalidate your test and skew results. If you need to change anything from the test, you must start over so you can get a reliable set of results.
While case studies are helpful in knowing what can work for you, do note that certain techniques work differently for these companies. Avoid copying what worked for others, because your business is unique and what worked for others may not work for you. Instead, use these case studies as a starting point in crafting your A/B testing strategy.
When done right, A/B tests can help you improve your business and optimize conversion. Let’s look into three examples of successful A/B testing case studies below.
Electronic Arts, or EA, is a video game company popular for its Sims and SimCity franchises. When they released SimCity 5, the newest version of the game, they wanted to capitalize on its popularity and improve sales.
EA wanted to A/B test different versions of its sales page to learn how it could increase sales. The control version of the pre-order page offered 20 percent off a future purchase for anyone who bought SimCity 5, while the variation did not promote a discount.
The variation performed more than 40% better than the control. It turned out that fans of SimCity 5 simply wanted to buy the game and weren’t looking for an incentive. Following the test, half of the game’s sales were digital.
High-performing e-commerce stores have one thing in common: a great visual look. Appealing to shoppers visually gives you a better chance of converting them. This is what SmartWool looked into. Their goal was to increase the basket size per customer. They hypothesized that displaying product images at a uniform size in a repeating layout would result in more sales per visitor.
The store experimented with different layouts, creating a grid-style page where the product images appeared more uniform. They set up two pages for A/B testing: one with a mix of large and small images, and another with images at a uniform size. After testing with 25,000 visitors, they found that the grid layout led to a 17.1% jump in returning visitors.
A health insurance company called Humana worked with a marketing firm to run A/B tests for them. A small change to their banner resulted in a 433% increase in click-through rate.
Humana’s initial banner was cluttered, with too much copy and no easily visible CTA button. The redesigned banner cleaned up the layout and added an eye-catching CTA button that captured attention; the button’s copy and the background photo were also changed.
As a rule of thumb, A/B testing must be done for a minimum of seven days. This is to account for the fact that some days will have more traffic, sales, and other factors than others. This is also to ensure that you reach statistical significance. When it comes to data, it’s always better to have more data than less data. Factor testing time into your A/B testing plan at the start, so you won’t feel rushed or tempted to cut it short too early.
So now that you’ve run your tests and received returns on your conversions, you want to look into several metrics which will be valuable for you in the long run. Here’s what you need to look out for:
There will be visitors who land on your landing page and leave without doing anything. This is called the bounce rate. And that’s just how it is – you can’t retain every visitor. But improved conversions combined with a high bounce rate mean there’s more work to do. The improved conversions prove you’re doing something valuable.
This is similar to the bounce rate because it also measures departing visitors. The exit rate looks at people who leave your landing page, whether to explore your site further or to leave altogether. Visitors moving deeper into your site may mean you’ve piqued their interest enough to want to learn more about your product. But if you notice an inordinate number of visitors exiting from a certain page, that page may be turning them off.
These are simply averages: the average time people spend on a page, or the average number of people who visit it. If you’re not seeing the averages you want, you’ll need to rework the pages and relaunch the A/B test.
Multivariate testing uses the same core mechanism as A/B testing, but analyzes more variables and reveals more about how those variables interact with one another. As in an A/B test, page traffic is split between different versions of the design. The purpose of a multivariate test is to measure the effectiveness each design combination has on the ultimate goal.
Once a site has received enough traffic to run the test, the data from each variation is compared to find the most successful design, as well as to likely reveal which elements have the greatest positive or negative impact on a visitor’s interaction.
The most commonly cited example of multivariate testing is a page on which several elements are up for debate — for example, a page that includes a sign-up form, some kind of catchy header text, and a footer.
To run a multivariate test on this page, rather than creating a radically different design as in A/B testing, you might create two different lengths of sign-up form, three different headlines, and two footers. Next, funnel visitors to all possible combinations of these elements. This is also known as full factorial testing, and is one of the reasons why multivariate testing is often recommended only for sites that have a substantial amount of daily traffic. The more variations that need to be tested, the longer it takes to obtain meaningful data from the test.
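The combinatorics are why traffic requirements balloon. For the hypothetical page above (two form lengths, three headlines, two footers), a full factorial test needs traffic split across every combination:

```python
from itertools import product

# Hypothetical element variations from the example page.
forms = ["short form", "long form"]
headlines = ["headline A", "headline B", "headline C"]
footers = ["footer 1", "footer 2"]

combinations = list(product(forms, headlines, footers))
print(len(combinations))  # 12 distinct page versions to split traffic across
```

Each added element multiplies the number of versions, and each version gets a correspondingly thinner slice of your traffic.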
After the test has been run, the variables on each page variation are compared to each other, and to their performance in the context of other versions of the test. What emerges is a clear picture of which page is performing best, and which elements are most responsible for this performance. For example, varying a page footer may be shown to have very little effect on the performance of the page, while varying the length of the sign-up form has a huge impact.
When using multivariate tests, consider how they will fit into your cycle of testing and redesign as a whole. Even when you are armed with information about an element’s impact, consider doing additional A/B testing cycles to explore other ideas.
The p-value tells you the probability that the outcome of your A/B test is a result of chance. It is then used to determine statistical significance, which indicates how confident you can be that the result of your A/B test will hold up once you release the change.
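One standard way to get that probability is the pooled two-proportion z-test. A sketch with invented visitor counts; for real experiments, lean on your testing tool's built-in statistics:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two observed
    conversion rates, using a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# 100 conversions from 5,000 control visitors vs 135 from 5,000 treatment
# visitors gives p of roughly 0.02 -- below the conventional 0.05 threshold.
print(round(two_proportion_p_value(100, 5000, 135, 5000), 3))
```

A p-value below your chosen threshold (commonly 0.05) is what marketers mean when they say a result is "statistically significant".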
Lift is the percentage difference in conversion rate between your control version and a successful test treatment. In A/B testing, what matters is not the raw open or conversion metrics but the lift. It is calculated by splitting your traffic into test groups, then tracking both groups to see the difference in conversion.
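Computing lift is straightforward once you have both conversion rates. The rates below are invented for illustration:

```python
def lift(control_rate, treatment_rate):
    """Relative lift of the treatment over the control, as a percentage."""
    return (treatment_rate - control_rate) / control_rate * 100

# A control converting at 2.0% against a treatment converting at 2.7%
# is a +35% lift, even though the absolute difference is only 0.7 points.
print(f"{lift(0.020, 0.027):+.0f}%")
```

Reporting lift rather than the absolute difference makes small-but-meaningful gains on low baseline rates easier to see.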
A Type I error occurs when you incorrectly reject a true null hypothesis, or a false positive. For example, you conducted a test where you compared the happiness levels between people who bought a product on sale versus people who bought a product on regular price. Your null hypothesis would be that there is no statistically significant difference in happiness between the two parties.
Suppose that there was no real difference in happiness between the two groups. If your statistical test was significant, you would have then committed a Type I error as the null hypothesis is actually true. In other words, you found a significant result merely due to chance.
On the other hand is the Type II error, which fails to reject a false null hypothesis. This would be a “false negative.” To return to our product example, suppose that you found there was no statistically significant difference between your groups, but in reality, people who bought a product on sale are much happier. In this case, you incorrectly failed to reject the null hypothesis, because you said there was not a difference when one actually exists.
Let’s talk about basic building blocks of A/B testing so you can conduct your test smoothly:
Often called the average, the mean is a measure of the center of the data, and a useful predictor of any individual data point.
Variance can be thought of as the average variability of the data around the mean (center) of the data. It is a way to quantify just how much variability we have in our data. The main takeaway is that the higher the variability, the less precise the mean will be as a predictor of any individual data point.
This is a rule that assigns a probability to a result or outcome. Remember that the probability of the entire distribution sums to 1 (or 100%).
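The first two of these building blocks are cheap to compute with the standard library. The daily rates below are made-up numbers:

```python
from statistics import mean, variance

# Hypothetical daily conversion rates (%) over one week of testing.
daily_rates = [2.1, 2.4, 1.9, 2.6, 2.2, 3.0, 2.0]

print(round(mean(daily_rates), 2))      # center of the data
print(round(variance(daily_rates), 3))  # spread around that center
```

The higher the variance in daily results, the more days of data you need before the mean becomes a trustworthy summary.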
The test statistic is the value used in statistical tests to compare the results of two (or more) options, our ‘A’ and ‘B’. It may help to think of the test statistic as just another KPI. If this KPI is close to zero, we don’t have much evidence that the two options are really different. The further from zero it is, the more evidence we have that the two options are not performing the same.
This KPI combines the difference in the averages of the test options with the variability in the test results.
As you optimize your web pages, you may find that there are a number of variables you want to test. But to evaluate how effective a change is, you want to focus on one variable and measure its performance to ensure that this is responsible for changes in performance.
Keep in mind that even simple changes, like swapping an image in an email, can drive big improvements.
Although you’ll measure a number of metrics during any one test, choose a primary metric to focus on before you run the test. Do this before you even set up the second variation. Think about where you want this metric to be at the end of the A/B test. Then you can create an official hypothesis and examine your results based on this prediction.
Now that you have your variable and desired outcome, use this information to set up the unaltered version of whatever you’re testing as the control. From there, build the variation that you’ll test against your control.
For tests where you need control over your audience, it’s best to test with two or more audiences that are equal in order to have conclusive results.
How you determine the sample size will depend on your A/B testing tool, as well as the type of test you’re running. If you’re testing something that doesn’t have a finite audience like a website, then how long you keep your test running will directly affect your sample size. The test will have to run long enough to obtain enough views to get conclusive results.
Once you’ve picked your metrics, think about how significant your results must be to justify choosing one variation over another. This is where statistical significance comes into play: the higher the confidence level, the more certain you can be about your results.
Testing more than one thing at a time complicates your results. If several elements change at once, you can’t tell which one caused the change in your conversions, so test a single variable per experiment.
We’ve established how helpful A/B testing is to optimize conversion rates, especially when done well. However, there are at least three common A/B testing problems that may lead to inaccurate results, and even poor decisions.
Also called statistical fluctuation, this phenomenon describes random changes in a set of data that are unrelated to the stimulus being measured. A coin toss is a good example: across a long series of fair tosses, heads will appear about 50% of the time, and the same holds for tails, yet short runs can deviate wildly from that ratio. So how do you avoid this mistake? It may be as simple as increasing the sample size until a significant number of visitors have participated in the test.
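A quick simulation makes chance fluctuation concrete: small samples can drift far from the true 50%, while large samples settle close to it. The seed is fixed only to make the run repeatable:

```python
import random

def heads_ratio(num_tosses, seed=42):
    """Simulate fair coin tosses and return the observed fraction of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

# With only 10 tosses the ratio can land far from 0.5;
# with 100,000 tosses it hugs the true rate.
for n in (10, 1_000, 100_000):
    print(n, heads_ratio(n))
```

The same logic applies to conversion rates: a variant can "win" over a few dozen visitors purely by luck, which is why sample size matters.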
When A/B testing is used for conversion rate optimization, often the only element looked into is the conversion rate. And this may lead to inaccurate results, and even a flawed test. To avoid this, make sure to examine all available data, and not just the conversion rate.
One of the most common A/B tests is a website redesign. When done well, these tests can yield good insights. However, if the pages tested are radically different, the test may be trying to measure too many variables. Be mindful not to test too many things at once, to avoid wasting resources and time on insignificant elements.
Unless you’re an A/B testing expert, you’ll soon realise that most of your A/B tests won’t produce winning results. You may even have experienced this disappointment already. So what can you do with failed A/B tests?
The good news is you can turn them into better tests with a better chance of succeeding. To help ensure your success and maximise your learnings, here are some things to keep in mind:
A simple way to look for learnings and even uncover a possibly winning test result is to segment your A/B test result for key visitor groups. Ideally, you want to set up segments for each of your key visitor groups and analyze those. To improve this, you can analyze your test variations in tools like Google Analytics to understand differences in user behaviour for each test variation and look for more learnings.
A common reason for poor A/B test results is a weak hypothesis. Businesses often just guess at what to test, with no insights behind each idea. Without a good hypothesis, it’s hard to learn anything when a test fails. A strong hypothesis is built from insights gathered through conversion research such as web analytics, surveys, and user testing.
One thing to consider: if your variation did not win, it doesn’t always mean your hypothesis was wrong. If your research shows great support for your hypothesis, then look into your variation. Test it again, and consider making it a bolder variation. If it still fails, it may be time to change your hypothesis.
What you track is as important as who you test on. Optimizing a single metric gives you a skewed picture of how users respond to your changes. Dive into your analytics and segment everything about the pages you’ve tested. From there, dig into behaviour-related metrics and try to identify wins from your failed variation pages. If your primary goal is sign-ups, pay attention to micro-conversions like CTA clicks, bounce rates, or other actions.
Check whether the variations created for the test are different enough for visitors to notice. If the changes are subtle, such as small tweaks to images or wording, visitors often won’t notice them, and you won’t see a winning test result. Ensure the differences between variations are bold enough to be noticed, and create at least one bold variation.
Use a significance calculator to check whether the experiment has received a sufficient sample size to produce a significant result. If not, keep the test running until it does.
If the results are inconclusive, it’s best to stick with the control. If you keep adding changes to your website based on inconclusive results, you’ll end up with many elements that reduce your ability to reach significance in future tests.
An exception is when you’re testing a legal requirement or a shift in branding. In those cases, you will likely implement the treatment for all traffic.
A/B testing has become a valuable tool for marketers and researchers seeking to improve the digital space. However, it has become so central to decision making that we sometimes forget it is just a tool. Following what the data shows is rarely wrong, but it’s easy to be misled by what we assume is the right choice. Gathering enough data is very important, but considering the context of your users and turning it into something actionable may be even more important.
With connections and devices constantly improving and changing the way we engage, user behavior is not as predictable as it once was. Data scientists have found that conditions like internet connection speed, time of day, device type, and geolocation can heavily affect how a user responds to different website elements.
If an A/B test shows that 64% of visitors prefer the control over the variation, then the control is the best choice. But is it really? What about the 36% of visitors who preferred the variation? The downside of the test is that you’ll never know how that set of visitors would respond to your website change. This is where contextual testing comes into play. There should always be a control group when testing, but it’s important to segment visitors to see how each group responds to different variable changes.