15 Use R Shiny to Build a Sample Size Calculator

Hello, I’m Bowen.

Sample size calculation before A/B testing is an indispensable step in experimental design. In Section 6, when discussing sample size calculation, I mentioned the problem of inconsistent online sample size calculators. Most calculators online can only calculate probability indicators and cannot calculate mean indicators, which greatly limits their practical application in business.

In view of these problems, and because I also wanted to calculate sample sizes more quickly and accurately to improve work efficiency, I delved into the principles of A/B testing sample size calculation based on statistical theory. This allowed me to identify and master the correct calculation methods.

Later, I found that my colleagues and friends also had this need when conducting A/B tests. So I turned the sample size calculation method into a tool and created an app that could be used online.

Therefore, what I am going to teach you today is the detailed process of transforming the sample size calculation into a tool. I will guide you in creating a real-time A/B testing sample size calculator that can be published online.

Practical Guide #

Since we are making an app, we still need to do some simple programming, including front-end and back-end, using the R language and its front-end library Shiny. However, don’t worry, you don’t need to master front-end technologies like JavaScript, HTML, CSS, or know how to set up a database on the back-end. You only need to master the following three points.

  1. Principles of A/B test sample size calculation. For the principles, focus on studying the two sections of the “Statistics” part and Section 6 of the “Basics” part in our course.

  2. Basic programming knowledge. I’m referring to general programming concepts, not a specific language: variable assignment, basic data types (strings, numbers, etc.), and so on. These are fundamentals that most internet professionals already know, even if they are not professional programmers, so the difficulty is low.

  3. Basic syntax of R and Shiny. If you are familiar with R and Shiny, you can skip this point. If you haven’t used them before, don’t worry; you can pick up the syntax quickly. Here are some supplementary materials for your reference.

If you don’t have the time and energy to learn R and Shiny, don’t worry, I will share my code on GitHub, and you can learn and understand it by combining the code with the content of this course.

I believe that if you have been studying this course seriously from the beginning, you have already mastered the first point: the principles of A/B test sample size calculation. As for the second point, basic programming knowledge, I believe that as an internet professional you have already mastered it or at least have a conceptual understanding of it. So today, we will focus on how to combine these two points to create a simple and convenient sample size calculator. While teaching you how to create it, I will also explain the relevant parts of R and Shiny and give some practical examples.

Before the explanation, let me first provide the links to my code repository and the sample size calculator for your reference:

Firstly, if you open the code repository on GitHub, you will find that there are two main files: server.R and ui.R. This is the standard file structure of a Shiny app, and you can roughly see their functions from the file names:

  • server.R is responsible for the back-end logic, such as the logic of sample size calculation in our case.
  • ui.R is responsible for the front-end user interface; how good your app looks depends on it.

Next, if you open the link to the sample size calculator that I provided, you will see that it has been categorized into probability and mean types:

Today, I will explain each of these two types separately.

Production Process #

Probability indicators #

Following the logic for calculating the sample size of probability indicators (refer to Lesson 6), we need the function power.prop.test. The following code (L31-35) is the specific implementation in the server.R file:

number_prop_test <- reactive({
  ceiling(power.prop.test(
    p1 = input$avgRR_prop_test/100,
    p2 = input$avgRR_prop_test/100 * (1 + input$lift_prop_test/100),
    sig.level = 1 - numsif_prop_test(),
    power = 0.8)[[1]])
})

For the input parameters of the function, we need to input the following four pieces of information:

  • Indicators p1 and p2 of the two groups.
  • Significance level sig.level.
  • Power.
  • One-sided or two-sided test.

Let’s compare with the actual frontend interactive interface to see how to input them:

Indicators p1 and p2 of the two groups

Here, I will let the user input the original indicators, i.e., p1, and the minimum detectable lift. Note that the “lift” here refers to the relative lift = (p2-p1)/p1, not the absolute lift = p2-p1 (note the difference from the absolute lift in the mean indicator). With these two parameters, we can calculate p2.

I set it up this way based on my own practice: you usually know the original indicator and the lift you hope to detect. However, you can also let the user input p1 and p2 directly if that suits your needs better.
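As a minimal sketch of that conversion (the variable names here are illustrative, not taken from the repository), p2 follows from the original indicator and the relative lift, and the result can be passed to power.prop.test directly:

```r
# Illustrative inputs, in percent, as the user would type them in the UI
baseline_pct <- 5    # original indicator p1 = 5%
lift_pct     <- 10   # minimum detectable relative lift = 10%

p1 <- baseline_pct / 100
p2 <- p1 * (1 + lift_pct / 100)   # relative lift: (p2 - p1) / p1 = 10%

# n is left unspecified, so power.prop.test() solves for it
res <- power.prop.test(p1 = p1, p2 = p2, sig.level = 0.05, power = 0.8)
n_per_group <- ceiling(res$n)     # round up to whole users
print(n_per_group)
```

These are the same inputs as the download-rate example in the application scenarios later in this lesson, which quotes 31234 per group.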

Significance level sig.level

Here, I let the user input the confidence level (1-α) instead of the significance level α, because the confidence level is what people usually quote in practice. The code converts it with sig.level = 1 - confidence level.
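The conversion can be sketched in a couple of lines. The helper below is a hypothetical stand-in for what numsif_prop_test() might do (its body is not shown in this lesson), and it assumes the radio button returns a string such as "95%":

```r
# Hypothetical helper: turn a UI label such as "95%" into a number (0.95).
# This is an assumed stand-in, not the repository's actual implementation.
confidence_to_numeric <- function(label) {
  as.numeric(sub("%", "", label, fixed = TRUE)) / 100
}

confidence <- confidence_to_numeric("95%")
sig_level  <- 1 - confidence   # alpha, as expected by power.prop.test()
```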

Power and one-sided or two-sided test

I set Power and the one-sided or two-sided test as default values that do not need to be changed by the user. Because many users of the calculator I made do not have a strong statistical background, I hide Power and set it as the default 80% to reduce their confusion.

As for the one-sided or two-sided test, I mentioned in Lesson 2 that a two-sided test is recommended for A/B testing, so I also set it as the default two-sided test (from the code, it can be seen that this parameter is not involved because the default value of the function itself for this parameter is “two-sided”).

If you still remember the formula for calculating the sample size in Lesson 6, you would know that the factors that affect the sample size are the significance level α, Power, the difference δ between the two groups, and the pooled variance \(\sigma_{\text{pooled}}^{2}\).

You may wonder: why not let the user directly input the above 4 influencing factors instead of the parameters on the current interactive interface?

Actually, this is to save time for the user by using the most commonly occurring parameters in practice to help the user calculate the pooled variance.

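By comparing the input parameters of the function with these influencing factors, you will find that all four factors are fully determined by the function’s inputs: α and Power are given directly, δ = p2 - p1, and for a probability indicator the variance follows from p1 and p2 alone. That is why these inputs are enough to obtain the sample size.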
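To make that link concrete, the sketch below computes the sample size by hand from α, Power, δ, and a variance term derived only from p1 and p2. It uses a common textbook normal approximation, which is an assumption here: it may differ slightly from the exact form given in Lesson 6 and from what power.prop.test solves internally, so the two numbers are expected to be close rather than identical.

```r
p1 <- 0.05
p2 <- 0.055
alpha <- 0.05
power <- 0.8

delta   <- abs(p2 - p1)
var_sum <- p1 * (1 - p1) + p2 * (1 - p2)   # variance term derived from p1 and p2 only
z_alpha <- qnorm(1 - alpha / 2)            # two-sided test
z_beta  <- qnorm(power)

# Assumed textbook approximation for the per-group sample size
n_approx <- (z_alpha + z_beta)^2 * var_sum / delta^2

# Reference value from the function used in the app
n_exact <- power.prop.test(p1 = p1, p2 = p2, sig.level = alpha, power = power)$n

print(c(approx = ceiling(n_approx), power.prop.test = ceiling(n_exact)))
```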

After entering these four pieces of information, you can start running the calculation.

If you carefully compare the server.R and ui.R files, you will find that the two files in the entire app run in the following way:

  • The entire app receives the user’s input values through the ui.R file and stores them in the input object.
  • Then, the server.R file reads input, performs the calculations, and stores the results in the output object, which is passed back to ui.R.
  • Finally, ui.R displays these results in the user’s interactive interface.
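The three steps above can be sketched as a tiny single-file app. This is a simplified illustration, not the repository's code: the IDs baseline, lift, and n_per_group are made up for this sketch, the confidence level is fixed at 95%, and ui and server live in one script instead of the two files ui.R and server.R.

```r
library(shiny)

# ui: collects the user's inputs and reserves a slot for the result
ui <- fluidPage(
  numericInput("baseline", label = "Original Indicator (%)", value = 5, min = 0),
  numericInput("lift", label = "Minimum Detectable Relative Lift (%)", value = 10, min = 0),
  verbatimTextOutput("n_per_group")
)

# server: reads input$..., computes, and writes the result to output$...
server <- function(input, output, session) {
  output$n_per_group <- renderText({
    p1 <- input$baseline / 100
    p2 <- p1 * (1 + input$lift / 100)
    ceiling(power.prop.test(p1 = p1, p2 = p2, sig.level = 0.05, power = 0.8)$n)
  })
}

# Launch only in an interactive session (e.g. from the R console)
if (interactive()) shinyApp(ui = ui, server = server)
```

Whenever the user changes either input, Shiny re-runs the renderText block and the displayed number updates, which is what makes the calculator feel real-time.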

Here is another example to illustrate the entire process I just mentioned.

First, in ui.R, we need the user to input the original indicator avgRR_prop_test (L11-12):

numericInput("avgRR_prop_test", label = "Original Indicator", value = 5, min = 0, step = 1)

Minimum detectable relative lift lift_prop_test (L18-19):

numericInput("lift_prop_test", label = "Minimum Detectable Relative Lift", value = 5, min = 0, step = 1)

Confidence level sif_prop_test (L42-44):

radioButtons("sif_prop_test", label = "Confidence Level", choices = list("80%", "85%", "90%", "95%"), selected = "95%", inline = TRUE)

These user-entered parameters are then available through the input object in the server.R file, where the sample size is calculated (L31-35):

number_prop_test <- reactive({
  ceiling(power.prop.test(
    p1 = input$avgRR_prop_test/100,
    p2 = input$avgRR_prop_test/100 * (1 + input$lift_prop_test/100),
    sig.level = 1 - numsif_prop_test(),
    power = 0.8)[[1]])
})

After the calculation is completed, the result is stored in the output object (L44-51):

output$resulttext1_prop_test <- renderText({
  "Sample size per group: "
})

output$resultvalue1_prop_test <- renderText({
  tryIE(number_prop_test())
})

Finally, the result held in the output object is passed to the ui.R file for display on the front end (L57-63):

tabPanel("Result",
  br(),
  textOutput("resulttext1_prop_test"),
  verbatimTextOutput("resultvalue1_prop_test"),
  textOutput("resulttext2_prop_test"),
  verbatimTextOutput("resultvalue2_prop_test")
)

One thing to note here is that the sample size calculated using the power.prop.test function is for a single group. If the total sample size is required, it needs to be multiplied by the number of groups. The number of groups is also manually input by the user.
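As a small sketch of that last step (the function name is made up for illustration), the total is simply the per-group result times the number of groups:

```r
# Hypothetical helper: scale the per-group sample size to the whole experiment
total_sample_size <- function(n_per_group, n_groups) {
  n_per_group * n_groups
}

# Per-group size for a 5% original indicator and a 10% relative lift at 95% confidence
n_per_group <- ceiling(
  power.prop.test(p1 = 0.05, p2 = 0.055, sig.level = 0.05, power = 0.8)$n
)

# One control group plus one experimental group
n_total <- total_sample_size(n_per_group, n_groups = 2)
print(n_total)
```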

You may have also noticed that I have added a lot of explanatory language in this app, providing explanations and instructions for each parameter that requires user input. This approach is based on user feedback from practice. As I mentioned earlier, many users do not have a strong background in statistics and are not familiar with these statistical measures. Therefore, it is important to explain these input parameters as clearly as possible to reduce confusion for users.

Mean-based Metrics #

From the logic of sample size calculation for mean-based metrics (as discussed in Lesson 6), we need the function power.t.test. The following code (L105-109) is the specific implementation in the server.R file:

number_t_test <- reactive({
  ceiling(power.t.test(
    delta = input$lift_t_test,
    sd = input$sd_t_test,
    sig.level = 1 - numsif_t_test(),
    power = 0.8)[[1]])
})

From this code, we can see that, compared to probability-based metrics, the required input parameters of the function have changed to:

  • Standard deviation (sd).
  • Minimum detectable difference (delta) (this is the absolute difference between the two groups’ metrics).
  • Significance level (sig.level).
  • Power.
  • One- or two-tailed test.

This is because the standard deviation (or variance) of mean-based metrics cannot be calculated solely based on the values of the two groups’ metrics; it requires knowledge of each data point to calculate.

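Let’s start with the calculation formula for the (sample) standard deviation:

\(s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}\)

where \(x_{i}\) is each data point, \(\bar{x}\) is their mean, and n is the number of data points. Therefore, the standard deviation needs to be calculated by the user from the data using the above formula.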
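In R, this calculation does not need to be done by hand. A small sketch with made-up data (the values below are purely illustrative) shows that the built-in sd() function matches the sample standard deviation written out with an n - 1 denominator:

```r
# Hypothetical per-user values of a mean-based metric (e.g. minutes per session)
x <- c(12.1, 8.4, 15.0, 9.7, 11.3, 7.9, 13.6, 10.2)

n <- length(x)
sd_manual  <- sqrt(sum((x - mean(x))^2) / (n - 1))  # formula written out
sd_builtin <- sd(x)                                  # built-in, also uses n - 1

print(all.equal(sd_manual, sd_builtin))
```

The value of sd(x) computed on your own historical data is what goes into the standard deviation field of the calculator.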

The minimum detectable difference (delta) is determined by the user based on the specific business context, and the confidence level is usually set to 95% (that is, a significance level of 5%).

Power and the one- or two-tailed test use the same defaults as for probability-based metrics: Power is 80%, and the test is two-tailed.
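Outside the app, the same calculation can be tried directly in the R console. The numbers below are hypothetical placeholders; the second call only illustrates what changes if the two-tailed default is overridden.

```r
sd_metric <- 3.0   # hypothetical standard deviation estimated from historical data
delta     <- 0.2   # hypothetical minimum detectable absolute difference

# Two-sided test is the default of power.t.test()
res_two <- power.t.test(delta = delta, sd = sd_metric, sig.level = 0.05, power = 0.8)
n_two_sided <- ceiling(res_two$n)

# Explicitly requesting a one-sided test requires fewer samples per group
res_one <- power.t.test(delta = delta, sd = sd_metric, sig.level = 0.05, power = 0.8,
                        alternative = "one.sided")
n_one_sided <- ceiling(res_one$n)

print(c(two_sided = n_two_sided, one_sided = n_one_sided))
```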

The rest of the code for mean-based metrics is similar to probability-based metrics, so I won’t go into detail here. The code itself is well-documented.

Application Scenarios and Use Cases #

In practice, the application scenarios of sample size calculation can be divided into two categories:

  • Given the flow rate per unit time, calculate the testing time.
  • Given the testing time, calculate the flow rate per unit time.

Therefore, you will find options for testing time and flow rate in the results section of the App’s interface, located in the lower right corner.

Now let me give an example to illustrate these two cases separately.

Let’s assume we are conducting an A/B test with a metric of download rate. The baseline rate is 5%, the minimum detectable relative improvement is 10%, and the confidence level is 95%. There is one experimental group and one control group. By using our sample size calculator, we determine that the sample size for each group is 31234, and the total sample size is 62468.

Given a fixed flow rate per unit time, calculate the testing time

This scenario is quite common. Let’s assume that we have an available testing flow rate of approximately 10000 per week. After inputting the parameters, we calculate that it will take 6 to 7 weeks to reach a sufficient sample size:
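The arithmetic behind that estimate can be sketched as follows (variable names are illustrative):

```r
n_total        <- 62468   # total sample size from the example above
weekly_traffic <- 10000   # available testing flow rate per week

weeks_exact  <- n_total / weekly_traffic   # a little over 6 weeks
weeks_needed <- ceiling(weeks_exact)       # round up to full weeks
print(c(exact = weeks_exact, full_weeks = weeks_needed))
```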

Given a fixed testing time, calculate the flow rate per unit time #

This scenario is useful when time is limited, such as needing to obtain results within a week. Since there are 7 days in a week, after inputting the parameters, we calculate that we need a testing flow rate of at least 8924 per day (62468 / 7).

Once we know the required daily flow rate, we can adjust the proportion of the testing flow rate to the total flow rate accordingly.
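A short sketch of this second scenario, including the allocation step. The product's total daily flow rate of 50000 is a hypothetical figure used only for illustration; replace it with your own.

```r
n_total   <- 62468   # total sample size from the example above
test_days <- 7       # results are needed within one week

daily_needed <- ceiling(n_total / test_days)   # required testing flow rate per day

# Hypothetical total daily flow rate of the product
total_daily_traffic <- 50000
share_needed <- daily_needed / total_daily_traffic   # fraction to allocate to the test

print(c(daily_needed = daily_needed, share_pct = round(100 * share_needed, 1)))
```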

Finally, I want to emphasize that although I used an example related to probability metrics, these two application scenarios are also applicable to mean metrics.

Use Cases #

In this section, I will provide an example for both probability metrics and mean metrics, illustrating how to input different parameters in specific scenarios.

Let’s start with an example of a probability metric.

Now let’s look at an example of a mean metric.

How to Publish Shiny App Online #

Now that we have completed the server.R and ui.R files, you can open our app locally by clicking on “Run App” in the top right corner of the image below.

[Figure: app run]

However, if you want to publish the app online, you will need to use ShinyApps.io. ShinyApps.io is a platform specifically designed for publishing Shiny Apps. You will need to register for a free account on ShinyApps.io. The process is not difficult, and you can refer to this tutorial for detailed instructions.
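For orientation, deployment from the R console typically goes through the rsconnect package. The sketch below uses placeholder values throughout: the account name, token, secret, and app path are not real and must be replaced with the values shown on your own ShinyApps.io account page and your own project directory.

```r
# install.packages("rsconnect")   # once, if the package is not yet installed
library(rsconnect)

# Placeholders: copy the real values from your ShinyApps.io account page
rsconnect::setAccountInfo(
  name   = "<ACCOUNT_NAME>",
  token  = "<TOKEN>",
  secret = "<SECRET>"
)

# Deploy the directory that contains server.R and ui.R (path is a placeholder)
rsconnect::deployApp(appDir = "path/to/your/app")
```

After a successful deployment, the app is reachable at a URL under your ShinyApps.io account, and later calls to deployApp() on the same directory update it.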

Summary #

This concludes the explanation of how to create an A/B test sample size calculator. I believe that with this section, combined with the code and app I provided, you can successfully create your own sample size calculator.

However, one thing needs to be further explained. Although the logic of sample size calculation is fixed, for the user interface, while ensuring the basic functionality, you can design it according to your preferences. Here I provide a collection of Shiny front-end examples and common statements for your reference.

Thought-provoking Question #

In fact, sample size calculations can be implemented in various programming languages. The reason I chose R and Shiny is that they are relatively easy to use. If you were asked to implement sample size calculations using Python, which functions would you choose for probability and mean-related indicators?

If you encountered any problems or had any experiences while creating a sample size calculator, please feel free to share them in the comments section. You are also welcome to share this course with your friends and colleagues.