13 Integration of Winning Through A/B Testing: Interview Essentials Part 1

Hello, I’m Bowei.

In the next two lessons, let’s switch gears and talk about a relatively easy topic: interview applications related to A/B testing.

In recent years, with the widespread use of A/B testing in various industries such as the Internet, e-commerce, and advertising, it has become an important part of interviews for positions in data, product, and growth. So, based on my many years as an interviewer, I have summarized common interview points related to A/B testing. On one hand, I will explain the interview thought process through typical questions, and on the other hand, I will share some of my thoughts and reflections from interviews.

I also want to emphasize that although these two lessons are about interview questions, they are also another way of assessing your flexible application of the knowledge you have learned. Interviews not only test your knowledge mastery but also focus on how you can apply it in a work setting. Therefore, I hope that through these two interview lessons, you can learn to deconstruct questions, improve your interview skills, and also integrate and apply the knowledge we have learned.

There are endless interview questions, but the points they test are limited. I have summarized the relevant points into a diagram for your focused review. Now, let’s start working through the interview questions.

Interview Application 1 #

A shared travel company has improved the user interface of its driver app, hoping to provide drivers with a better user experience and increase their income. The question is: please design an A/B test to verify whether the new driver app provides a better user experience than the old one.

Key Points:

The process of A/B testing.
Independence of the experimental group and control group.

Solution:

When faced with this interview question, many students initially think about going through the process of A/B testing, so they start answering accordingly.

Determine Objectives and Hypotheses -> Determine Metrics (mention the evaluation metrics and potential guardrails) -> Determine Experimental Units -> Random Assignment (usually evenly divided) -> Determine Sample Size (note that it is important to know which statistical estimates are needed to determine the sample size) -> Implement the Test -> Validity Check (mention specific methods of validity check) -> Analyze the Results (note the criteria for judging using the p-value method and the confidence interval method)
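To make the "Determine Sample Size" step concrete, here is a minimal sketch of the usual normal-approximation formula for comparing two proportions. The statistical estimates it needs are the baseline rate, the minimum difference you want to detect, the significance level, and the power. The function name `sample_size_per_group`, the metric, and all input numbers are my own assumptions for illustration, not values from the interview question.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_baseline, min_detectable_diff, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided test on two proportions."""
    p1 = p_baseline
    p2 = p_baseline + min_detectable_diff
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # critical value for the desired power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return ceil(n)


# Hypothetical inputs: 30% of drivers are active on a given day, and we want
# to detect an absolute increase of 2 percentage points.
n = sample_size_per_group(p_baseline=0.30, min_detectable_diff=0.02)
print(f"about {n} drivers per group")
```

In an interview, what matters is being able to name these four inputs and explain that a smaller detectable difference or a higher power requires a larger sample.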

If you only describe the process of designing an A/B test, you may barely pass. The question hides at least one pitfall, and spotting it is what distinguishes you from other interviewees.

First of all, you need to pay attention to the specific content of the question when explaining the process. Otherwise, it will seem like you are reading from a script, giving the impression that you cannot apply your knowledge flexibly. If you are unsure of how to answer well, you can refer to the approach and method I used in the eighth lesson to analyze case studies.

However, the biggest pitfall of this question is not here. You need to be more attentive. Take a closer look at the specific context of “shared travel”. If you have a solid understanding of the concept of independence between two groups, as discussed in Lesson 11, you will discover that the interviewer doesn’t simply want you to go through the process of A/B testing.

The interview setting is also a real working scenario, so you need to analyze the specific context thoroughly and realize that it is necessary to maintain the independence of the experimental group and the control group when designing the experiment.

Let’s delve deeper by using an example to explain.

Assume we select drivers who use this shared travel service in Shanghai and randomly divide them into an experimental group and a control group, with each group accounting for 50%. In the experimental group, drivers use the new driver app, while in the control group, drivers use the old driver app.

Now, let’s focus on the experimental group: If the new app does improve the user experience for drivers, their usage frequency will increase, which means the number of orders taken by drivers in the experimental group will increase. Since the total number of orders (passenger demand) stays roughly constant in the short term, this will lead to a decrease in the number of orders taken by the control group.

On the other hand, if the new app worsens the user experience for drivers and reduces their usage frequency, the number of orders taken by the experimental group will decrease, while the number of orders taken by the control group will increase.

At this point, you will notice that the experimental group and the control group are not independent but rather affect each other. This violates the assumption in A/B testing that the experimental group and the control group must be independent, leading to inaccurate experimental results.
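The interference described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions of my own: the total number of orders is fixed, orders are shared among drivers in proportion to how active they are, and the new app makes a driver 10% more active. The function `orders_per_driver` and every number below are hypothetical.

```python
def orders_per_driver(total_orders, activity_by_group, drivers_by_group):
    """Split a fixed pool of orders across drivers in proportion to their activity."""
    total_activity = sum(
        activity_by_group[g] * drivers_by_group[g] for g in drivers_by_group
    )
    return {
        g: total_orders * activity_by_group[g] / total_activity
        for g in drivers_by_group
    }


TOTAL_ORDERS = 100_000   # fixed daily demand in one city (hypothetical)
ACTIVITY_LIFT = 1.10     # assumed: the new app makes a driver 10% more active

# Before the test: 10,000 drivers, all on the old app.
baseline = orders_per_driver(TOTAL_ORDERS, {"all": 1.0}, {"all": 10_000})["all"]

# During a 50/50 test in the same city, both groups compete for the same orders.
split = orders_per_driver(
    TOTAL_ORDERS,
    {"control": 1.0, "treatment": ACTIVITY_LIFT},
    {"control": 5_000, "treatment": 5_000},
)
measured_lift = split["treatment"] / split["control"] - 1

# After a full rollout, every driver is more active, but demand has not grown.
rollout = orders_per_driver(TOTAL_ORDERS, {"all": ACTIVITY_LIFT}, {"all": 10_000})["all"]
true_lift = rollout / baseline - 1

print(f"measured lift in the test: {measured_lift:.1%}")
print(f"lift after full rollout:   {true_lift:.1%}")
```

In this toy model the test reports a lift in orders per driver, yet after a full rollout orders per driver are unchanged, because the experimental group's gain came out of the control group's share.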

In the context of the question, a better solution is to conduct the test in different cities: we find two similar cities, A and B (similar in terms of the development of the business, economic development, travel habits, etc.), where drivers in City A use the new app as the experimental group, and drivers in City B use the old app as the control group. In this way, the two groups will not affect each other.

Therefore, for this question, a complete and correct answer should first point out the issue of the violated independence between the two groups, analyze it with examples, and then propose your solution. Finally, elaborate on the process based on the actual context.

In fact, if you are an expert, you should notice another hidden point in the question: learning effect.

Since the question is about testing a new user interface, there may be a learning effect for existing users: novelty effect or change aversion. Regarding this point, briefly mention its identification and resolution methods, but there is no need for a lengthy discussion. The core focus of this question is still the independence of the two groups and the process of designing A/B testing. However, if you can alert the interviewer to the potential pitfall of the learning effect, it will be like giving them a pleasant surprise, proving your ability to think outside the box in addition to providing an excellent answer.

Interview Application 2 #

In your past practice, have you ever experienced a situation where, despite obtaining significant results from an A/B test (e.g., p-value less than 5%), the decision was made not to implement the changes in the business/product? What were the reasons? Please provide an example.

Key Point: Factors to consider when deciding whether to implement a change after an A/B test

Approach:

This question is very short, and at first glance, it may seem easy, but you need to be careful. Think carefully about what knowledge points the interviewer wants to test and what capabilities they want to assess.

In terms of knowledge points, the interviewer mainly tests what factors need to be considered when deciding whether to implement a change after an A/B test.

This question is actually very direct, and you can easily identify what knowledge points the interviewer is testing. However, I want to emphasize that there are many variations of this type of question in interviews, and you need to recognize the essential questions among different variations.

Core question: Starting from the conclusion (the changes were not implemented), the interviewer asks what possible reasons there might be.
Variation 1: The interviewer will give you the test results, in which the p-value is below 5% but only just, for example 4%, and the observed difference between the two groups is very small. A p-value alone does not tell you how large the difference is, so the point to make is that the effect has limited impact on the actual business.
Variation 2: The interviewer will directly ask you about the cost of implementing a change after an A/B test.

Regardless of the variation, the bottom line is the same: the result is statistically significant but not practically significant for the business, so the change was not implemented.
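To show how a result can be statistically significant without being significant for the business, here is a sketch of a two-proportion z-test with a confidence interval. The data, the function name `two_proportion_test`, and the threshold `MIN_PRACTICAL_LIFT` are all hypothetical, and comparing the interval against a practical threshold is just one possible decision rule.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test and confidence interval for the difference of two proportions."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # The pooled standard error is used for the test statistic.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se_pooled)))
    # The unpooled standard error is used for the confidence interval.
    se_unpooled = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se_unpooled
    return diff, p_value, (diff - margin, diff + margin)


# Hypothetical data: 2 million users per group, 10.00% vs 10.06% conversion.
diff, p_value, (ci_low, ci_high) = two_proportion_test(
    200_000, 2_000_000, 201_200, 2_000_000
)

# Assumed business rule: a lift below 0.5 percentage points does not cover the costs.
MIN_PRACTICAL_LIFT = 0.005
statistically_significant = p_value < 0.05
practically_significant = ci_low >= MIN_PRACTICAL_LIFT

print(f"difference = {diff:.4%}, p-value = {p_value:.3f}")
print(f"95% CI = ({ci_low:.4%}, {ci_high:.4%})")
print(statistically_significant, practically_significant)
```

With a very large sample, even a tiny difference passes the 5% threshold, while the whole confidence interval sits far below the lift the business would need.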

In practice, statistical significance is only one of the factors in deciding whether to implement a change. The costs and benefits of implementing it also need to be taken into account. We can estimate the benefits from the difference the significant result shows, but for costs we need to consider the multiple factors I mentioned in Lesson 7 and then assess the business significance.

So when answering this type of question, explain the costs with examples, focusing on the factors above, and state that the result was statistically significant but not practically significant for the business, which is why the change was not implemented in the end.

Specifically, the implementation of changes in practice has several costs:

Labor costs

Refers to the time costs of the personnel involved in implementing the changes. For example, engineers need to spend time implementing the specific changes and writing relevant code. Product managers need to spend time collecting and organizing new requirements, organizing meetings, and writing documentation. If the changes will confuse the users, customer service personnel will also need to spend time answering questions and resolving doubts for the users.

Opportunity costs

In practice, time and resources are always limited in the continuous iteration of business/products. Let’s imagine a scenario: before a new version is launched, if both change A and change B have statistically significant results (p-values less than 5%), but we have limited time and resources, and can only implement one change before the launch, then we will definitely choose the change that has a greater impact on the business.

Now you may wonder how to compare which change has a greater impact on the business when both have p-values less than 5%.

There are two methods we can use.

The first method is to estimate the business impact brought by the changes. This method is suitable when different changes have different evaluation metrics or different target audiences.

For example, change A improves conversion rate by 2% and can bring in an additional 100,000 new users per year. Change B increases retention rate by 0.5% and can retain an additional 50,000 existing users per year. At this point, we need to measure the value of gaining 100,000 new users versus retaining 50,000 existing users (for example, we can determine the average value of new users and existing users through data analysis or modeling).

Of course, this also depends on the business goals at that stage. You need to see whether the focus is on user acquisition or retention at that time. Once we quantify the estimated business impact of the changes, we can decide which change to prioritize.
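The comparison between change A and change B can be sketched as a simple calculation. The user counts come from the example above, while the per-user values are placeholders I made up; in practice they would come from data analysis or modeling.

```python
# Extra users per year are taken from the example; value_per_user is hypothetical.
changes = {
    "A": {"extra_users": 100_000, "value_per_user": 20.0},  # new users acquired
    "B": {"extra_users": 50_000, "value_per_user": 50.0},   # existing users retained
}

# Estimated yearly business impact of each change.
impact = {
    name: c["extra_users"] * c["value_per_user"] for name, c in changes.items()
}
best = max(impact, key=impact.get)

for name, value in impact.items():
    print(f"change {name}: estimated yearly impact = {value:,.0f}")
print(f"prioritize change {best}")
```

With these assumed values, change B comes out ahead even though it affects fewer users. A different value per user, or a stage where acquisition matters more than retention, could reverse the ranking.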

The second method is to calculate the effect size. This method is suitable when the changes are similar and the evaluation metrics are the same.

For example, in an experiment to improve the recommendation algorithm, click-through rate is often used as the evaluation metric. Now we have new algorithm A and new algorithm B, both showing improvement compared to the old algorithm. In this case, we need to calculate the effect size for each experiment:

- The effect size is used in statistics to indicate the magnitude of the difference in a metric between two groups. The larger the effect size, the more the metric differs between the two groups.

If we calculate that the effect size of new algorithm A is larger than that of new algorithm B, it means that the improvement in A has a greater magnitude and impact, so it can be decided to prioritize implementing the change in A.

Calculating effect size is actually estimating the impact brought by the changes. However, because these changes have the same evaluation metric, we only need to calculate the effect size for comparison.
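As a sketch of this comparison, the code below uses Cohen's h, one common standardized effect size for two proportions. Choosing Cohen's h is my own assumption, since other effect size measures would also work, and the click-through rates are made-up numbers.

```python
from math import asin, sqrt


def cohens_h(p_new, p_old):
    """Cohen's h: difference between two proportions after an arcsine transform."""
    return 2 * asin(sqrt(p_new)) - 2 * asin(sqrt(p_old))


# Hypothetical click-through rates.
ctr_old = 0.050
ctr_algo_a = 0.056
ctr_algo_b = 0.053

h_a = cohens_h(ctr_algo_a, ctr_old)
h_b = cohens_h(ctr_algo_b, ctr_old)

print(f"effect size of A vs old: {h_a:.4f}")
print(f"effect size of B vs old: {h_b:.4f}")
print("prioritize", "A" if h_a > h_b else "B")
```

Because both experiments share the same metric and the same baseline, the larger effect size directly identifies the algorithm with the bigger improvement.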

Code costs

Implementing changes generally requires code modifications, which potentially increase the chance of code errors. As the codebase becomes more complex, the costs of future code changes will also increase.

Interview Application 3 #

We have made changes to our company website to improve user engagement. Through A/B testing, we found that the new version indeed significantly improved user engagement, so we subsequently displayed the new website version to all users. However, after a period of time, user engagement returned to the previous level. Assuming that the A/B testing itself had no technical or statistical issues, what do you think could be the cause of this situation? And how would you solve it?

Key Concept: Learning Effect

Solution Approach:

This question is not difficult in terms of the knowledge required. It mainly tests the learning effect. However, most interviewees can think of that one reason, so giving it alone would only earn a passing score.

I will first outline the recommended way to answer this question, and then analyze it carefully with you.

The more recommended answer is to first list the possible reasons for this situation and then systematically eliminate them based on the specific scenario described in the question. Finally, provide your own conclusion and propose a solution.

Why answer in this way? Mainly because compared to simply giving one reason or directly suggesting a solution, this way of answering can better demonstrate your comprehensive understanding of the question. I have emphasized in previous lessons that knowing why a problem occurs and identifying the problem is sometimes even more important than solving the problem. Therefore, what the interviewer is particularly interested in examining here is your exploration of the reasons for the problem.

There are many reasons why the actual effect of an implemented change can be inconsistent with the A/B test results. The two most common are:

Technical bugs in implementing the A/B test.
Errors in calculating the test results (e.g. calculating results before an adequate sample size has been reached).

Next, let’s eliminate them one by one.

First, the question clearly states that there are no technical or statistical issues in the A/B testing, so we can exclude the common pitfalls of A/B testing.

Second, since the scenario described in the question is not a social network or a two-sided market like a sharing economy, the experimental and control groups will not interfere with each other, so there is no violation of the independence of the experimental/control groups.

Next, based on the design of the test itself and the description of the results, the test involves no subgroup analysis and no multiple experimental groups, so the multiple testing problem and Simpson’s paradox do not apply.

Finally, for issues like different versions of a website, the most common problem is the learning effect, just like I mentioned in the analysis above. After ruling out other common reasons, the point being tested is the learning effect. There are many types of interview questions testing the learning effect. Some may directly ask you about the learning effect, while others, like this question, will give you a specific scenario to judge.

Based on the situation described in the question, the most likely reason is the novelty effect, one form of the learning effect: users are initially curious about the new version, so their engagement increases. But over time, it gradually returns to the normal average level.

As for how to identify and solve the learning effect, if you cannot answer smoothly, you may need to review the content of Lesson 10.
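One common way to spot a novelty effect, shown here as a rough sketch rather than as the exact method from Lesson 10, is to track the lift day by day and check whether it fades. The daily engagement rates are invented, and the "less than half of the early lift" rule is an arbitrary threshold chosen only for illustration.

```python
from statistics import mean

# Hypothetical daily engagement rates over a 14-day test.
control = [0.200, 0.201, 0.199, 0.200, 0.202, 0.198, 0.200,
           0.201, 0.199, 0.200, 0.200, 0.201, 0.199, 0.200]
treatment = [0.240, 0.236, 0.230, 0.226, 0.222, 0.216, 0.212,
             0.209, 0.205, 0.204, 0.202, 0.202, 0.200, 0.201]

# Relative lift of the new version over the old one, day by day.
daily_lift = [t / c - 1 for t, c in zip(treatment, control)]

early_lift = mean(daily_lift[:7])   # first week
late_lift = mean(daily_lift[7:])    # second week

# Arbitrary rule of thumb: flag the lift as fading if it has more than halved.
lift_is_fading = late_lift < 0.5 * early_lift

print(f"first-week lift:  {early_lift:.1%}")
print(f"second-week lift: {late_lift:.1%}")
print("possible novelty effect" if lift_is_fading else "lift looks stable")
```

If the lift shrinks steadily toward zero like this, running the test long enough for the lift to stabilize gives a more trustworthy estimate of the long-term effect.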

So you see, in an interview, the interviewer is testing not only your knowledge but, more importantly, how broadly you think about the question and how you approach solving the problem.

Summary #

In this lesson, I mainly discussed three interview questions. Through my detailed analysis, you can also see that the ability to break down the questions is essential.

Many people will practice coding questions before interviews, and while this is important, there may be moments of blanking out under the high pressure of an interview. In fact, interview questions have patterns, similar to the moves in a fight. You need to guess what moves the other party might make. If you can anticipate their next move before they make it, even if it’s just for a second, you have a chance to defeat them. Therefore, compared to practicing an extensive number of coding questions, it is more important to learn how to break down the questions.

I believe that through today’s lesson, you have gained a preliminary understanding of the format and focus of A/B testing related interview questions. You must still be eager for more, but don’t worry, in our next lesson, we will continue to analyze typical interview questions and their focus.

Thought-provoking questions #

Have you encountered any interesting A/B testing interview questions? Or do you have any good interview experiences to share? Feel free to share them and let’s discuss together.

You are also welcome to recommend this lesson to your friends so that we can progress and grow together.