A/B Testing Mobile Apps
Why A/B Test My App?
Last year over 70 million mobile app downloads generated $26 billion in revenue. Mobile devices are proliferating, and the native apps that run on them are becoming an indispensable medium through which to engage, entertain, transact with and build loyalty among your audience.
The problem is, users are picky and impatient. And with just one tap or swipe, mobile makes it easy for them to go elsewhere. So how do you get it right? By listening to your users. And they’re talking to you every day – through data.
A/B testing is the practice of showing different versions of your product to your users, and then identifying which version is better by observing how users respond. It is a proven method of product improvement, and has been successfully applied in the past to everything from telesales pitches and direct marketing campaigns to email subject lines and website landing pages.
But for release software like a native mobile app, the A/B testing technology used in the past does not work. Moreover, mobile user interfaces and user experiences are distinct from other media – meaning that both the things you test in an app and the metrics you use to measure how those changes affect performance require a mobile-first approach.
How It Works
A/B testing a native app is technically more challenging than testing a website or mobile website because an app is downloaded and run locally on your mobile device rather than accessed via a live connection through a browser. This makes it more difficult to test changes and roll out the best-performing ones on the fly.
It's a hot topic in mobile development, and engineers from Facebook, LinkedIn, WalmartLabs and Socialcam have all weighed in on how best to structure a mobile A/B testing solution. The consensus is to replace hard-coded static objects in your app with dynamic objects that can be controlled by a remote server.
This server-driven method raises a potential performance issue: What if the end user's device is not connected to pull dynamic objects from the server when needed? We’ve solved this issue by randomly bucketing users on first app launch and then saving dynamic objects directly to the users’ device memory so they can be accessed without a connection for every session thereafter.
Cohorting is the process of randomly bucketing users on first app launch. Users not selected for a cohort (either because they are not part of any targeted segment or because they do not have a live connection on first app launch) are shown a default variation. Data is collected for all users regardless of whether they are in a cohort or not. When you launch a new test, cohorting will happen randomly once again.
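The cohorting logic above can be sketched in a few lines. This is an illustrative Python sketch, not the Splitforce SDK itself (the real libraries are native Objective-C/Java, and all names here – the variation payloads, `cohort_user`, `DEFAULT_VARIATION` – are hypothetical):

```python
import random

# Hypothetical hard-coded fallback shown to users who are never cohorted.
DEFAULT_VARIATION = {"button_text": "Buy Now"}

def cohort_user(in_target_segment, has_connection, variations, rng=random.random):
    """Bucket a user on first app launch.

    Users outside the targeted segment, or without a live connection on
    first launch, are not cohorted and see the default variation.
    `variations` maps a name to (split weight, payload).
    """
    if not in_target_segment or not has_connection:
        return None, DEFAULT_VARIATION  # not cohorted; default shown
    # Random bucketing: walk the cumulative split percentages.
    r, cumulative = rng(), 0.0
    for name, (weight, payload) in variations.items():
        cumulative += weight
        if r < cumulative:
            return name, payload
    # Guard against floating-point rounding at the top of the range.
    name, (_, payload) = list(variations.items())[-1]
    return name, payload
```

Because the draw happens once per user on first launch, launching a new test simply triggers a fresh draw for everyone.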
Variation Persistence is the function of saving dynamic objects to the users’ device. It has the added benefit of preserving a consistent user experience across app sessions, and ensuring an experimentally rigorous process of attributing user data to a single test variation.
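The idea behind variation persistence is assign-once, read-forever: the first launch buckets the user and writes the result to device storage, and every later session reads the saved copy without needing a connection. A minimal sketch, assuming a hypothetical JSON file path and `assign` callback (the real SDK's storage mechanism will differ):

```python
import json
import os

def load_or_assign(path, assign):
    """Return the persisted variation if one exists; otherwise assign
    one, save it to device storage, and return it."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)  # offline-safe: no server call needed
    variation = assign()         # first launch: bucket the user
    with open(path, "w") as f:
        json.dump(variation, f)  # persist for all later sessions
    return variation
```

This is also what keeps the experiment rigorous: because the saved copy always wins, a user's data can only ever be attributed to the single variation they were first assigned.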
Default Variations are static (non-dynamic) objects that you can set for users who are not included in a cohort, either because they are not part of any targeted segment or because they do not have a live connection on first app launch. For these users, you can set a hard-coded variation to preserve an uninterrupted user experience.
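The fallback pattern looks like this in sketch form. The element names (`cta_text`, `cta_color`) and the `resolve` helper are hypothetical, standing in for whatever the SDK returns:

```python
# Hard-coded defaults compiled into the app for non-cohorted users.
DEFAULTS = {"cta_text": "Sign Up", "cta_color": "#007AFF"}

def resolve(element, dynamic_values):
    """Return the dynamic value for an element when the user is in a
    cohort; otherwise fall back to the hard-coded default.

    `dynamic_values` is None for users not in any cohort (no targeting
    match, or no connection on first app launch).
    """
    if dynamic_values is None:
        return DEFAULTS[element]
    return dynamic_values.get(element, DEFAULTS[element])
```

Either way the user sees a complete interface – the difference is only whether the value came from the server or from the binary.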
Debug Mode - To make sure everything is looking good before you go live in the app store, and to avoid polluting your test results with data generated during debugging and development, you can turn on a "Debug Mode" that turns off variation persistence (all platforms) and lets you shake the device to select a specific variation (iOS only).
Equal Distribution - It’s important to know that users in your experiment are assigned uniformly at random according to the ‘split’ percentages you set in the experiment builder. The Splitforce library uses a high-quality random number generator, so your variations converge quickly on the proportions you choose. Approaching the test this way can help you reach a more accurate conclusion in less time.
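You can see this convergence in a quick simulation. The sketch below is illustrative only (it uses Python's standard `random` module, not the SDK's generator), but it shows how the empirical proportions approach the configured splits as users accumulate:

```python
import random
from collections import Counter

def simulate(splits, n_users, seed=0):
    """Assign n_users to variations using the configured split
    percentages and return the empirical proportions observed."""
    rng = random.Random(seed)
    names, weights = list(splits), list(splits.values())
    counts = Counter(rng.choices(names, weights=weights, k=n_users))
    return {name: counts[name] / n_users for name in names}

# e.g. simulate({"A": 0.8, "B": 0.2}, 20000) yields proportions
# close to 0.8 / 0.2, and the gap shrinks as n_users grows.
```

The faster the observed split settles on the target split, the sooner per-variation sample sizes are comparable and the test can be read with confidence.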
But really, the challenges of A/B testing mobile apps go beyond the technical. Users' experience of mobile apps doesn't follow the same linear conversion funnel that is so often applied to the Web. Metrics related to engagement – like average session length, retention and the average number of times users complete a goal – are more significant.
The most important question behind any A/B test is: Why? Why do you dedicate time and resources to run a test? In other words, what do you want to achieve?
Answering this question will guide you in deciding which metrics are the best indicator of your app’s performance, and the corresponding Goal Types you can use with Splitforce.
You can choose from four different Goal Types when setting up a goal:
Conversion Goal can be any metric that is measured as a proportion of users that completed a certain action at least once. Its value is always between 0 and 1 (or 0% and 100%). Examples: In-App Purchase Rate, User Review Rate, User Registration Rate.
Event Goal can be any metric that is measured as an absolute amount of events. Its value is always greater than or equal to 0, and is measured in number of event occurrences. Examples: Average Purchases per User, Average Plays per User, Average Sessions per User.
Time Goal can be any metric that is measured as an amount of time. Its value is always greater than or equal to 0, and is measured in seconds, minutes, hours, etc. Examples: Average Session Length, Average Registration Completion Length, Average Screenview Length.
Quantity Goal can be any metric that is measured as a number of something in the app. Its value is always greater than or equal to 0, and is measured in number of occurrences or value. Examples: Average Purchase Amount per User, Average Items Purchased per User, Average Friends Invited per User.
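The arithmetic behind each Goal Type is simple ratios over your collected data. A hedged sketch, with hypothetical function names standing in for the dashboard's calculations:

```python
def conversion_rate(converted_users, total_users):
    """Conversion Goal: proportion of users who completed an action
    at least once. Always between 0 and 1."""
    return converted_users / total_users

def events_per_user(total_events, total_users):
    """Event Goal: average number of event occurrences per user."""
    return total_events / total_users

def avg_session_length(total_seconds, session_count):
    """Time Goal: average duration, here measured in seconds."""
    return total_seconds / session_count

def avg_purchase_amount(total_amount, purchasing_users):
    """Quantity Goal: average value per user, e.g. purchase amount."""
    return total_amount / purchasing_users
```

For instance, 25 purchasers out of 100 users is a 25% In-App Purchase Rate (a Conversion Goal), while those same purchasers spending $999 in total gives a Quantity Goal of $9.99 average purchase amount per purchasing user.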
Once you’ve identified the key metrics that indicate whether changes to your app are better or worse, it’s time to answer the next question: What changes should you test?
Elements are things in your app that you decide to change and test. Virtually anything in your app can be an Element, including:
- User interface elements such as text font or button color
- User experience elements like level of gravity or a recommendation algorithm
- User workflows such as tutorials or purchase checkout funnels
Answering the question ‘What to test?’ will give you the information you need to create your first experiment, and select the Element Types which correspond to the things you want to test.
HINT: Has your team ever had a disagreement or hesitated over which design or functionality to include in your app? It’s normal, and healthy to have competing ideas and opinions about what works and what doesn’t. This is fertile ground for testing, and arriving at decisions for your app that are built on data rather than intuition.
You can choose from five different Element Types when building an experiment:
Texts are elements in your app that are just plain text. They can be used to do things like instruct users through a tutorial, or prompt users to complete a specific task. Some examples of text elements include: button copy, product descriptions, in-app messages and menu items.
Colors are elements in your app that are a color. They are an important part of creating a visually appealing app that people love to use, and can also play a role in drawing attention to certain parts of your app. Here are some examples of color elements you may want to test: button color, menu color, background color and text color.
Numbers are elements in your app that can be represented by a numerical value. They can be used to test both visible changes, like the number of products displayed, and invisible changes, like the number of products viewed before offering a promotion. Here are some other examples of number elements you can test: the dimensions and coordinates of a button, or the speed and gravity of an object in a game.
Switches are elements in your app that can be represented by a Boolean (On or Off). They can be used to completely switch features on and off to see how the presence of those features affects your performance metrics.
Custom elements allow you to test anything you can code. This is where you can test really deep changes to virtually anything in your app. Some creative ways we’ve seen custom elements used to make improvements include: testing level design in a game, sequence of screen views in a purchase funnel, or inclusion of a feature for sharing content to social networks.
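Put together, a single variation is just a bundle of typed values overriding your app's defaults. This sketch is hypothetical – the element names and `apply_variation` helper are invented for illustration, not taken from the SDK:

```python
# Hypothetical variation payload covering the five Element Types.
variation = {
    "cta_text":     "Start Free Trial",       # Text
    "cta_color":    "#FF5A00",                # Color
    "gravity":      9.8,                      # Number (e.g. game physics)
    "social_share": True,                     # Switch (feature on/off)
    "level_layout": {"rows": 4, "cols": 6},   # Custom (anything you can code)
}

def apply_variation(view_config, variation):
    """Merge the dynamic values over the app's static defaults,
    leaving untested settings untouched."""
    merged = dict(view_config)
    merged.update(variation)
    return merged
```

Because the merge leaves unlisted keys alone, the same app binary can serve any mix of tested and untested elements.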