14 Reflect on Three Points for A/B Testing: Interview Essentials Part 2

Hello, I am Bo Wei.

In today’s interview lesson, you will find that you already know most of the material being tested. Knowledge is the foundation of your professional growth and an important part of what interviews assess. Even more crucial, though, is your ability to apply that knowledge in different scenarios. Turning knowledge into problem-solving skill is your competitive edge in interviews.

Alright, let’s get straight to it and continue with the interview questions, so that you can strengthen your foundation and stay calm in interviews!

Interview Question 1 #

Assume you are responsible for an A/B test that, based on the sample size calculation, is expected to run for 2 weeks. However, your colleagues in the business department monitor the results every day, and after one week they observe significant results. Since the results are already significant, they want you to stop the test and ship the change being tested. Given that these colleagues are not familiar with statistics, how would you explain to them in plain language why you cannot stop the experiment at this time?

Key points:

  1. Multiple testing issue
  2. Layman’s explanation of statistical principles

Solution:

In fact, I covered a similar practical scenario when analyzing test results in Lesson 7. Repeatedly checking the results before the required sample size is reached causes a multiple testing issue, and once that happens the experiment’s conclusions can no longer be trusted, so the earlier effort is wasted. Therefore, in this question, you need to first point out the multiple testing issue and then explain the reason behind it, as well as its specific consequences (an increased false positive rate and inaccurate experimental results).
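To make the increased false positive rate concrete, here is a minimal simulation sketch. It is not taken from the lesson: the function name `simulate_peeking`, the 100 users per group per day, and the normally distributed metric are all assumptions chosen for illustration. It runs many A/A tests (no real difference between groups) and compares "stop at the first significant daily check" with "test once after 14 days".

```python
import math
import random


def simulate_peeking(n_sims=2000, days=14, users_per_day=100, z_crit=1.96, seed=0):
    """Simulate A/A tests (no true difference) under two stopping rules.

    Returns a pair of false positive rates:
    (significant at ANY daily look, significant at the planned final look).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    significant_at_any_look = 0
    significant_at_end = 0
    for _ in range(n_sims):
        cumulative_diff = 0.0  # running sum of (treatment - control) outcomes
        saw_significance = False
        z = 0.0
        for day in range(1, days + 1):
            # Each user's outcome is N(0, 1) in both groups, so the day's
            # difference of group sums is N(0, 2 * users_per_day).
            cumulative_diff += rng.gauss(0.0, math.sqrt(2 * users_per_day))
            n = users_per_day * day  # users per group so far
            z = cumulative_diff / math.sqrt(2 * n)  # z-score of the mean difference
            if abs(z) > z_crit:
                saw_significance = True
        significant_at_any_look += saw_significance
        significant_at_end += abs(z) > z_crit  # z now holds the final-day value
    return significant_at_any_look / n_sims, significant_at_end / n_sims


if __name__ == "__main__":
    peek_rate, final_rate = simulate_peeking()
    print(f"false positives with daily peeking: {peek_rate:.1%}")
    print(f"false positives with one final test: {final_rate:.1%}")
```

Under these assumptions, the single final test should stay near the nominal 5% level, while daily peeking produces a noticeably higher rate even though nothing was changed between the groups.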

You can also see that if the question only focuses on the multiple testing issue, it would be too easy. Consider the question carefully, and you will realize that the interviewer wants to test your ability to express and understand concepts in plain language, in other words, how you can use plain language to explain complex and difficult statistical concepts and principles to your business colleagues. In actual work, there is often a need to communicate with colleagues without a statistical background about A/B testing-related topics, so interviewers also like to assess the candidates’ ability in this area.

Moreover, this is an indirect way of testing whether the interviewee has truly internalized the relevant statistical knowledge. After all, if you only memorize the concepts, you will not be able to apply these principles flexibly in practice, let alone explain them in plain language to someone without a statistical background.

You may ask, what exactly does it mean to use plain language? It’s actually quite simple: speak in everyday language. In my experience, the key is “avoid one thing and use more of two things.”

“Avoid one thing” means avoiding statistical terms (such as p-value, Type I error, and hypothesis testing) as much as possible.

On the one hand, professional statistical terms raise the time cost and the barriers in your communication. Business colleagues do not understand these terms, so you would have to spend extra time explaining the terms themselves. The other party still finds it hard to follow, and the purpose of the conversation is not achieved.

On the other hand, think about why you are explaining anything to your business colleagues in the first place. The main goal is to tell them that the experiment cannot be stopped yet. So just explain clearly why it cannot be stopped, using as few terms as possible.

“Use more of two things” means using more analogies and more examples, especially ones drawn from everyday life.

For example, A/B testing is actually comparing the performance of two groups, and since there is a comparison, there is a concept of winning or losing. In this case, you can choose any event in daily life where there are different results of winning or losing each time it occurs as an analogy.

I prefer to use sports as an analogy. Basketball, for example, is very similar to A/B testing. Each NBA game has a predetermined length of 48 minutes, and the winner is decided by the score when the game ends. If you check the score at any time before the end, either side could be leading, but we do not treat an intermediate score as the final result.

Similarly, returning to the A/B test, if we have not reached the required time and we announce that the experiment is completed and stop it as soon as we see significant results, it is the same as declaring a winner based on which side is leading during a game before it ends.

Returning to our interview scenario, the multiple testing issue is actually quite common in work. Especially when business colleagues do not have a strong statistical background, they may only rely on p-values to make decisions without considering the prerequisite of sufficient sample size. Therefore, it is particularly important to explain these statistical principles in plain language.

Interview Question 2 #

A certain product now wants to change its trademark and wants to measure the impact of the new trademark on its business. How should this be done?

Key points:

  1. The scope of A/B testing and alternative methods

Solution:

If you are familiar enough with the knowledge discussed in Lesson 12, “When A/B Testing is Not Appropriate,” you will know that A/B testing cannot be used to measure the impact of a trademark change. As I have mentioned, “when a significant event is released,” A/B testing is not suitable, and changing a trademark is one of those events. After all, a trademark represents the image of a product and company. If a product has multiple trademarks circulating in the market simultaneously, it will confuse users and have a detrimental impact on the product’s image.

Do you remember the two alternative methods for A/B testing? They are non-experimental causal inference methods and user research. However, in this context, non-experimental causal inference methods will not work because this trademark is brand new and there is no historical data related to it. Therefore, user research is the method we ultimately choose.

In this case, we only need to collect users’ opinions on the new trademark; we do not need to dig into deeper issues such as user experience. What we do need is a relatively large sample so that the feedback is representative. A survey fits both requirements: it gives us directional guidance by telling us whether users react more positively or more negatively to the new trademark than to the existing one.
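How large is "relatively large"? One common rule of thumb, which the lesson itself does not prescribe, is the margin-of-error formula for a surveyed proportion. The sketch below assumes a simple random sample, a 95% confidence level (z = 1.96), and the most conservative proportion p = 0.5; the function name `survey_sample_size` is hypothetical.

```python
import math


def survey_sample_size(margin_of_error, z=1.96, p=0.5):
    """Respondents needed so a surveyed proportion is within +/- margin_of_error.

    Uses n = z^2 * p * (1 - p) / e^2, assuming a simple random sample.
    p = 0.5 is the worst case, so it yields the largest required n.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)


if __name__ == "__main__":
    for e in (0.05, 0.03):
        print(f"margin of error {e:.0%}: {survey_sample_size(e)} respondents")
```

Note that this formula only controls precision. Whether the feedback is representative also depends on how respondents are recruited, which no sample size can fix by itself.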

Suppose the survey feedback is positive overall and the team decides to retire the existing trademark and introduce the new one. At that point, we can measure the impact of the replacement by comparing the North Star metric before and after the new trademark is introduced. A more accurate method is to build a model.

We can establish a time series model for the North Star metric using historical data. We can use the data before the introduction of the new trademark to train this model, which can also predict the trend of the North Star metric without the new trademark. Then we can compare the model’s predicted data with the actual data after the introduction of the new trademark to infer the impact of the new trademark based on the difference between the two.
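The predict-then-compare idea can be sketched as follows. This is a deliberately simplified stand-in: it fits a plain linear trend by least squares instead of a real time series model, so it ignores seasonality and noise, and the function names and the numbers in the usage example are invented for illustration.

```python
def fit_linear_trend(values):
    """Ordinary least squares fit of values[t] = intercept + slope * t."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def estimate_impact(pre_launch, post_launch):
    """Average gap between actual post-launch values and the pre-launch trend.

    pre_launch:  daily North Star metric before the new trademark (training data)
    post_launch: daily North Star metric after the new trademark (actual data)
    """
    intercept, slope = fit_linear_trend(pre_launch)
    start = len(pre_launch)
    # Counterfactual: what the trend predicts had the trademark not changed.
    predicted = [intercept + slope * (start + i) for i in range(len(post_launch))]
    gaps = [actual - pred for actual, pred in zip(post_launch, predicted)]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Hypothetical data: steady growth of 2 per day, then a jump after launch.
    pre = [100 + 2 * t for t in range(30)]
    post = [100 + 2 * t + 10 for t in range(30, 37)]
    print(f"estimated impact per day: {estimate_impact(pre, post):.2f}")
```

In practice the quality of the estimate depends entirely on how well the model captures what would have happened without the change, which is why a proper time series model is preferable to a straight line.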

In summary, the answer strategy for this question is to explain why A/B testing is not applicable in the given scenario and then provide user research and modeling as alternative solutions.

Interview Question 3 #

A social networking website is preparing to recommend friends to its users. They want to introduce a new feature called “People You May Know” in the upper right corner of the homepage. How should they design an A/B test to truly measure the effectiveness of the underlying recommendation algorithm for this feature? Assume there is no network effect here.

Key points:

  1. A/B test group design

Solution:

When you see a social networking website in the question, you may immediately think of network effects. However, after reading the question, you find that there are no network effects assumed.

You might think that if you want to introduce a new feature without considering network effects, it must be a conventional A/B test design. So you randomly split the users into two groups: the control group without the “People You May Know” feature and the experimental group with this new feature. Finally, compare the metrics of the two groups to determine the effectiveness of the recommendation algorithm for the new feature.

You see, there’s nothing difficult about it! If you really think so, you have unknowingly fallen into the trap set by the interviewer.

If we carefully read the scenario described in the question, we will find that this new feature is in the upper right corner of the page, which means that adding this new feature also involves changes to the user interface.

If we design the experiment according to the experimental grouping we just mentioned, comparing the experimental group with the control group actually changes two factors: adding the recommendation algorithm and changing the user interface. In this case, even if the metrics of the experimental group improve relative to the control group, we cannot determine which factor is at work.

So the key point of this question is how to separate these two potential influencing factors. In practice, the usual solution is to design multiple experimental groups, each of which changes only one factor while sharing a control group, i.e., the state before the change.

Does this method sound familiar? Yes, that’s right. This is the A/B/n testing that I mentioned in Lesson 9. However, the situation in this case is special because if we want to add a recommendation algorithm, it will definitely change the user interface, which means that one factor needs to depend on another and cannot exist independently.

However, if we think about it the other way around, changing the user interface does not necessarily mean adding a recommendation algorithm, so we can design each group in a progressive relationship:

  • Control group: The original version before the change.
  • Experimental group A: Adding the “People You May Know” feature, with the recommended content randomly generated.
  • Experimental group B: Adding the “People You May Know” feature, with the recommended content generated by the recommendation algorithm.

We can see that experimental group A only changes the user interface compared to the control group because its recommended content is randomly generated. On the other hand, experimental group B only adds the recommendation algorithm compared to experimental group A, and the user interface is the same for both. This way, we can measure whether changing the interface has an impact by comparing the control group with experimental group A, and determine if the underlying recommendation algorithm for the new feature is effective by comparing experimental group A with experimental group B.
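The three-group design and its two comparisons can be sketched as below. Everything here is illustrative: the group labels, the hash-based assignment with a `salt`, and the sample metric values are assumptions, not details given in the question.

```python
import hashlib

GROUPS = ("control", "A_random_recs", "B_algorithm_recs")


def assign_group(user_id, salt="pymk_test"):
    """Deterministically place a user in one of the three groups.

    Hashing (salt, user_id) gives a stable, roughly uniform split, so the
    same user always sees the same variant.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return GROUPS[int(digest, 16) % len(GROUPS)]


def decompose_effects(group_means):
    """Split the total change into its interface part and its algorithm part."""
    return {
        # Control vs. A: only the user interface differs.
        "ui_effect": group_means["A_random_recs"] - group_means["control"],
        # A vs. B: same interface, only the recommendation source differs.
        "algorithm_effect": group_means["B_algorithm_recs"] - group_means["A_random_recs"],
    }


if __name__ == "__main__":
    print(assign_group("user_42"))
    # Hypothetical friend-request rates per group.
    means = {"control": 0.10, "A_random_recs": 0.12, "B_algorithm_recs": 0.18}
    print(decompose_effects(means))
```

Each difference would still need its own significance test before drawing conclusions; the sketch only shows which pair of groups answers which question.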

Interview Question 4 #

A social platform has developed a new interactive interface in the hope of increasing user likes. The team conducted an A/B test by randomly assigning a portion of users to the experimental group with the new interface. They found that the experimental group had an average like count 5% higher than the control group, and the result was significant. So, if the new interface is promoted to all users, do you think the average like count will increase by more than 5% or less than 5%? Why? In this case, we assume no learning effect.

Key points:

  1. Network effect

Solution:

When you see “social platform,” you should think of the network effect. After the previous lessons, this should be a reflex.

This question is not difficult: it tests the network effect and its causes. What I want you to take from it is a concrete scenario, namely how the improvement after a full launch of a new interactive interface relates to the experimental result when a network effect is present.

Learning effects are ruled out by the question, but a social platform by nature has a network effect, which means that random assignment cannot guarantee that the experimental and control groups are independent of each other.

Specifically, if user A in the experimental group likes a piece of content because of the new interface, that content may be seen by A’s friends, some of whom are in the control group. One of them, user B, may then like the content too. Therefore, the new interface affects not only the experimental group but also, through the network effect, the control group. The average like count of the experimental group increases, and so does the average like count of the control group.

From this perspective, the 5% improvement was measured against a control group that the network effect had already lifted. The real improvement should be measured against a baseline in which only the experimental group’s metrics improve while the control group’s remain unchanged, so it is greater than 5%.

Therefore, when we promote this new interactive interface to all users, which means there is no control group, the actual improvement effect compared to the old version should be greater than 5%.
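A small arithmetic sketch shows why the real lift exceeds the observed one. It rests on a simplifying assumption that is mine, not the lesson’s: the spillover raises the control group’s average by a fixed fraction over the untouched baseline, and the experimental group’s average is not otherwise distorted. The 2% spillover is an invented number.

```python
def true_lift(observed_lift, control_spillover):
    """Lift of the experimental group over an uncontaminated baseline.

    observed_lift:     experimental vs. contaminated control, e.g. 0.05
    control_spillover: how much the network effect raised the control
                       group over the untouched baseline (assumed value)
    """
    # experimental = contaminated_control * (1 + observed_lift)
    # contaminated_control = baseline * (1 + control_spillover)
    return (1 + observed_lift) * (1 + control_spillover) - 1


if __name__ == "__main__":
    lift = true_lift(observed_lift=0.05, control_spillover=0.02)
    print(f"lift over the untouched baseline: {lift:.1%}")
```

With any positive spillover, the result is above the observed 5%, which matches the direction of the argument above. The exact size cannot be known from the experiment alone.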

Conclusion #

Our journey through the A/B testing interview questions wraps up here. You may have noticed that these common topics have been covered in the previous lessons. As long as you study the content of this column carefully, you shouldn’t have major problems.

Lastly, I would like to emphasize one point. Most of the interview questions we discussed in these two lessons mention A/B testing directly. However, interviews test A/B testing in many forms. Sometimes the question doesn’t mention it explicitly, yet A/B testing is an integral part of the answer. For example, when asked to evaluate the effectiveness of a new product feature or whether to proceed with a product change, your answer will involve defining goals and metrics to measure the impact of the new feature, as well as designing A/B tests to validate its effectiveness. In conclusion, whenever you are asked to make causal inferences and quantify the impact of changes, A/B testing is your good companion!

Thought Question #

Here, let’s put our minds to work. If you were to explain the related terms of A/B testing in plain and simple language (without using statistical definitions or referencing other terms), how would you explain them? Choose one or two and try to explain them.

Feel free to share your explanations in the comments section. Let’s discuss and exchange ideas together. And if you found this interview lesson helpful, please feel free to share it with friends who might find it useful.