Extra: A Mindset-Change Strategy to Drive Business Growth #

Hello, I’m Kaiyue. It’s an honor to write an extra article for Teacher Bowei’s column. This article draws on my experience learning A/B testing and, since I work as a product manager at Geek Time, on how our team’s experimentation mindset grew from 0 to 1.

A year and a half ago, I started teaching myself A/B testing and searched online for articles and courses. Useful resources were scarce, their quality was uneven, and the explanations were rarely thorough. As a result, I spent a long time just judging whether the information was correct, and I made many mistakes along the way.

So after Teacher Bowei’s column went online, I followed it every week, and the more I learned, the stronger my interest grew. I kept thinking how wonderful it would have been to have this column in the early stages of my learning.

In this extra article, I will share with you, as fellow learners, the whole process by which our team introduced and applied A/B testing and established an experimentation mindset around it.

How an Experimentation Mindset Changes Decision-Making and Drives Business Growth #

Geek Time did not start using A/B testing from the early stages of its product. Instead, it went through four stages: correction, introduction, application, and summary, eventually forming a strong experimentation mindset.

  • Correction: Changing the incorrect understanding of A/B testing and establishing the correct understanding.
  • Introduction: Introducing the methods and tools of A/B testing into the decision-making process instead of making decisions on the fly.
  • Application: Using A/B testing to solve practical problems one by one.
  • Summary: Reviewing our experience and distilling it into an experimentation mindset.

After going through the four stages of development, we have established a complete experimentation process and formed an experimentation mindset. There are two key points:

  1. Whenever encountering a product decision-making problem, think of A/B testing first.
  2. Persistently use A/B testing in the long term.

Here, I would like to dwell on the correction stage. It was precisely this correction in our mindset that changed our decision-making pattern: we switched from a single pattern based on experience alone to a systematic pattern based on both experience and an experimentation mindset, and this continuously drives business growth.

Mindset Correction #

I used to think that A/B testing was simply setting up two versions, letting two groups of users use them, and whichever version had the higher conversion rate won and could be released.

But is that really how it works? Is such a decision-making process scientific? And if A/B testing were that simple, why do so many internet companies still not use it?

Let’s take an example. Version A of a details page has a conversion rate of 1.76%, and version B has 2.07%. As a product manager, which would you choose? Now suppose there is also a version C at 2.76% and a version D at 11.76%. Under the decision model of “launch whichever version converts best,” we should immediately launch version D. I used to think the same way, but that was a misunderstanding of A/B testing.

In the example I just mentioned, there are three issues:

  • First, the experiment drew conclusions from only a subset of users, not from all users. When all users are on version D, will the conversion rate still be 11.76%?
  • Second, versions B, C, and D convert at 2.07%, 2.76%, and 11.76%, lifts of 0.31, 1, and 10 percentage points over version A. Does a bigger lift mean more confidence to launch? Obviously not: we still have to ask whether these lifts are real or merely the product of experimental error.
  • Third, how large does the difference need to be before we can make a judgment? In other words, if we launch version B, will it really improve the conversion rate, and by how much?

These three questions cannot be answered without data to support them, so we cannot yet decide whether to launch version A or version B. We need to collect more information first.
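
To make the second question concrete, here is a minimal R simulation. The numbers are hypothetical, not from our experiments: both versions share the exact same true conversion rate, yet finite samples routinely produce a non-zero “lift” through sampling noise alone.

```r
# Both versions have the same true conversion rate (2%); any observed
# difference below comes purely from sampling error.
set.seed(42)
n <- 2000                          # hypothetical users per version
a <- rbinom(1, n, 0.02) / n        # observed rate, version A
b <- rbinom(1, n, 0.02) / n        # observed rate, version B
c(A = a, B = b, lift = b - a)      # a non-zero "lift" from noise alone
```

Run this a few times with different seeds and the “winning” version flips back and forth, which is exactly why significance testing matters.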

Answering these three questions requires understanding scientific A/B testing. What does scientific, standardized A/B testing look like? Bowei’s column has already provided the answer. A/B testing is not as simple as it appears: it is a scientific experiment involving sampling, significance testing, software engineering, psychology, and much more. The key is whether the experimental process is rigorous and the results reliable; only decisions based on such A/B tests can truly advance the business.

Introduction of A/B Testing #

Why introduce A/B testing? Geek Time’s user base had already passed one million, and we needed to move from rapid growth to refined operation. Neither user growth nor data-driven decision-making can do without A/B testing. It lets us experiment on a small portion of traffic without making sweeping changes, verify hypotheses, and optimize the product to improve user retention and engagement.

During the introduction, we took action in three aspects.

First, we studied A/B testing systematically. When we started, we gathered a great deal of material; the quantity was large, but most of it was repetitive. Through continued study we distilled our own experimental process and tried to apply it, though not without stepping into plenty of pitfalls and misconceptions along the way.

So when the editor planned the column “A/B Testing from 0 to 1,” we found it very practical. Bowei explains in detail the problems beginners and advanced learners run into, the details that remain unclear, and the “pitfalls” to avoid.

Second, we built our own traffic splitting system. Once the theory was in place, we asked the development team to build the tooling. We built our own traffic splitting system and got the end-to-end A/B testing process working, so that it could genuinely help decision-makers make judgments.
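
To illustrate the core idea (this is a sketch, not our actual implementation), a traffic splitter only needs to assign each UID to a group deterministically. A minimal R version, assuming the third-party digest package for stable hashing:

```r
library(digest)  # third-party package providing stable hash functions

# Deterministic bucketing: hashing the experiment name together with the
# UID keeps each user in the same group on every visit, and decorrelates
# group assignment across different experiments.
assign_group <- function(uid, experiment, treatment_share = 0.5) {
  h <- digest(paste(experiment, uid, sep = ":"), algo = "crc32")
  bucket <- strtoi(substr(h, 1, 6), base = 16L) %% 100L
  if (bucket < treatment_share * 100) "treatment" else "control"
}

assign_group("uid_12345", "coupon_ui")  # same UID, same answer every time
```

A real system adds experiment registration, mutually exclusive traffic layers, and logging, but the hash-and-bucket step above is the heart of it.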

Third, we incorporated A/B testing into the product iteration process. Now, before iterating on important products, we always conduct A/B testing with multiple versions. This has become the consensus of the team.

After introducing A/B testing and establishing the concept and awareness of it, the next step is practice. Bowei has emphasized many times in the column that A/B testing is highly practical and must be continuously iterated and improved in real business scenarios. Below, I will use two real cases to show how Geek Time used A/B testing to verify hypotheses and iterate on the product from 0 to 1.

Application of A/B Testing in Practice #

Geek Time tracks several important business indicators, among which conversion rate and repeat purchase rate matter most, so I have selected two representative cases. In case one, we used A/B testing to validate a hypothesis aimed at improving the repeat purchase rate. In case two, we used A/B testing to choose the details page with the higher conversion rate. Both cases demonstrate the necessity of experiments and of experimental awareness.

Case Study 1: Can Eye-Catching Coupon Design Increase Repeat Purchase Rate? #

Background #

The operations team wanted to increase the repeat purchase rate of users who had completed their first order. Their idea was to make the coupon display more eye-catching to encourage users to use it. The product manager, however, did not approve, for several reasons:

  1. Firstly, the current version already has a coupon display module.
  2. Secondly, the overall coupon usage rate is not high, and analysis of historical data shows that coupons are not effective in promoting repeat purchases.
  3. Lastly, and most importantly, the current version has a “reward for sharing” feature. Users can share courses in the form of posters on their Moments, and they will receive cash back if their friends purchase through the poster. This method can also promote repeat purchases and attract new users.

Both the operations team and the product manager had their reasons. Unable to convince each other, they decided to settle the question with an A/B test, and the result was quite unexpected.

Experiment Design #

The existing flow is that after a user completes their first order, a pop-up appears in which the user can either use a coupon to buy another course or share the course with other users for a cash reward. The operations team hypothesized that displaying high-value coupons in a more eye-catching style would increase the repeat purchase rate. Stated formally: “Eye-catching coupons encourage users to use the coupon immediately, thereby raising the probability of a repeat purchase.”

Here it should be noted that the system automatically sends the coupon to the user after they complete their first order, so there is no need for the user to manually claim it.

Therefore, the UI style for the experimental group was created:

Next, the experiment was designed according to the standardized process of A/B testing:

  • Define the goal and hypothesis. The goal is to increase repeat purchases, and the null hypothesis is that there is no difference in the repeat purchase rate between the experimental group and the control group.
  • Determine the metrics. Use the repeat purchase rate as the measurement metric, while also considering new user numbers and revenue. (Repeat purchase rate = number of users with 2 or more paid orders / number of users with only 1 paid order)
  • Determine the experimental unit. Use the UID as the experimental unit.
  • Determine the sample size. We set the minimum difference to detect between the experimental group and the control group at 0.6%; this quantity also goes by other names, such as minimum detectable effect or practical significance. The minimum required sample size worked out to at least 8,074 (a calculation sketch follows this list).
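
For reference, this kind of sample-size calculation can be done with base R’s power.prop.test. The baseline rate below is hypothetical (our actual baseline is not given here), so the resulting n will not match our 8,074 exactly; this is a sketch of the method, not a reproduction of our numbers:

```r
# Two-proportion power analysis: how many users per group are needed to
# detect a 0.6-percentage-point lift with 80% power at the 5% level?
power.prop.test(p1 = 0.020,    # hypothetical baseline repeat purchase rate
                p2 = 0.026,    # baseline + minimum detectable effect
                sig.level = 0.05,
                power = 0.80,
                alternative = "two.sided")
# $n in the printed output is the required sample size per group
```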

Implementation of the Test #

Analysis of historical data showed no significant periodic variation in the sharing rate or the coupon redemption rate, so the test duration was determined from the required sample size and the available traffic.
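
The duration arithmetic is simple. A hypothetical sketch (our real traffic numbers are not shown here, and we assume the 8,074 minimum applies per group):

```r
# Days needed = total required sample / eligible daily users
required_n  <- 8074 * 2   # both groups, assuming the minimum is per group
daily_users <- 2000       # hypothetical daily first-order completions in the test
ceiling(required_n / daily_users)  # days needed at this traffic level
```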

After these preparations, the development team launched the test on our self-built traffic splitting system.

Result Analysis #

A total of 17,652 users entered the experiment. With 80% power and a 95% confidence level, the confidence interval did not converge (it contained 0), and the p-value was greater than 0.05, so we failed to reject the null hypothesis. We continued the test for some time with the same result, and therefore concluded that the experimental group did not outperform the original version.

The results calculated using the R language’s prop.test function are shown in the following figure:
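
For readers who want to reproduce this type of analysis, here is a minimal prop.test sketch. The counts are hypothetical, not our actual experiment data:

```r
# Two-proportion test with an even split of ~17,652 users (hypothetical counts)
res <- prop.test(x = c(180, 168),    # repeat purchasers: control, treatment
                 n = c(8826, 8826),  # users per group
                 conf.level = 0.95)
res$p.value   # > 0.05 here: fail to reject the null hypothesis
res$conf.int  # contains 0: the "interval did not converge"
```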

The summary of the experiment results is shown in the table below:

During the experiment, we also collected two auxiliary indicators, the number of new users acquired and the transaction amount:

Through these auxiliary indicators, we found that the original version brought in more users and did more to motivate sharing and purchasing. Further analysis ruled out the possibility that “most of the new users were brought in by the sharing of just a few old users.”

Making a Decision #

Based on the experimental data, the confidence interval for the difference between groups contains “0”: the experimental group’s repeat purchase rate might increase by 0.098%, or it might decrease by 0.733%.

Furthermore, in acquiring new users the original version did 5 times better than the experimental group, and its transaction amount was 3.6 times higher. The gap was significant, which surprised us. Fortunately, we had the awareness to test the idea through A/B testing first; deciding without it would have cost the company real money.

For these reasons, we decided to keep the original version.

Case Study Reflection #

In this case, the team adopted a “bold hypothesis, careful verification” approach to decision-making. When the idea of “stimulating repeat purchases with eye-catching coupon design” was proposed, the product manager immediately thought of verifying its feasibility with an A/B test. Rather than dismissing the idea out of hand or accepting it blindly, the team, driven by experimental awareness, used A/B testing to collect and analyze data and reached a scientifically grounded decision. This is the first key point of the experimentation mindset mentioned at the start of this article: when facing a product-change decision, think of A/B testing first.

Case Study 2: Selecting a High Conversion Rate Details Page #

Having learned from past experiences, we have developed the habit of continuously using A/B testing during product iterations.

Background #

The course details page of our app was due for iteration, and the product manager wondered whether emphasizing the promotional price would improve the page’s conversion rate.

Experiment Design #

We designed two UI styles, as shown in the image below, and designed the experiment following the standard process:

  • Determine the Metric. Use conversion rate as the evaluation metric.
  • Determine the Experimental Unit. Use UID as the experimental unit.
  • Determine the Sample Size. We set the difference to detect between the experimental group and the control group at 1.5%, which called for a sample of about 17,000. Our traffic is large enough that, under the original traffic split, we would reach the minimum sample within 1-2 days. However, user activity drops sharply on weekends, so to cover a full weekly activity cycle and minimize novelty effects, we reduced the experiment’s share of total traffic and set the duration to one week (see the sizing sketch after this list).
  • Implement the Test. After making the necessary preparations, the development team launched the test.
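
The traffic-share arithmetic referenced above is simple. A hypothetical sketch (our actual traffic figure is not disclosed here):

```r
# Choose the experiment's share of traffic so that the required sample
# arrives over exactly one week, covering the weekly activity cycle.
required_n    <- 17000   # total sample from the power calculation
daily_traffic <- 60000   # hypothetical daily details-page visitors
required_n / (daily_traffic * 7)   # ~4% of traffic diverted to the test
```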

Results Analysis #

To rule out a “learning effect,” we monitored the metrics daily after launching the experiment. All metrics moved stably and as expected, so the learning effect was ruled out.

The experiment results were as follows: 23,686 users entered the experiment. With 80% power and a 95% confidence level, the confidence interval did not converge and the p-value was greater than 0.05, so we could not reject the null hypothesis. There was no significant difference between the two versions.

At this point we were at a deadlock: the results were not significant, and even enlarging the sample to reduce variance did not change the outcome. So how do we decide?

Making a Decision #

Since the confidence interval did not converge, we could not determine from the experiment alone which version to use, so other factors had to inform the decision. The app’s overall style is simple and light, without large blocks of color. Moreover, the eye-catching “big color block” did not raise the conversion rate; it merely split the page into two sections.

Based on the considerations of UI style, we decided to use Version A.

Case Reflection #

Experiment results may sometimes contradict intuition. Data obtained from rigorous experiments can effectively reflect the real situation of users. The premise of data-driven decisions is having data, and the premise of having data is consciously conducting experiments and collecting data.

Many experiments will not yield clear-cut decision criteria, and the product manager must then make a subjective call. This does not mean the experiments were useless. The role of experimentation is to settle, with sufficient evidence, every question that experimentation can settle; the questions it cannot settle are handed to the “expert system,” that is, to the experience of the person or team in charge.

Conclusion #

That concludes today’s main content, where I summarized some experiences our team has accumulated in optimizing decision-making models and driving business growth.

The A/B testing method is a validated best practice, and we should incorporate the awareness of experimentation into our mindset. When encountering growth or decision-making problems, our first thought should be, “A/B testing may be a good solution.” This is the first key point of experiment awareness.

The second key point of experiment awareness is that A/B testing must be sustained over the long term, forming an ascending cycle rather than a single closed loop. If we treat the path from discovering a problem, to deploying an experiment’s result, to regression-checking the effect as one closed loop, then we need to prepend “continuously” to the first step: continuously discovering problems. That turns the experimentation mindset into a cycle that trends upward. The importance of this awareness lies not in whether any individual experiment succeeds, but in maintaining the habit of verifying through experiments before making decisions.

With an experimentation mindset in place, our decision-making model is no longer limited to experience and intuition: experimentation combined with experience becomes our decision-making system. Because the system’s advantage is probabilistic, any individual decision may or may not pay off, but over the long run the accumulation of small improvements drives the overall growth of the business, and the experience embedded in the system compounds into striking results.