27 Data Source Never Trust Anything from Any Client #

Starting from today, I want to discuss several security topics with you. First of all, let me make it clear that I am not a security expert. But I have noticed that many developers who are involved in business development often lack any kind of security awareness. If a company does not have a dedicated security department or experts, the security issues can become very serious.

If we rely only on so-called penetration testing services for shallow scans, without deeper analysis at the code and logic level, we will discover only a small fraction of the security problems. Good security still depends on the day-to-day awareness of frontline programmers and product managers.

Therefore, in the following articles, I will talk to you from the perspective of business development about the security awareness that we should have the most.

For HTTP requests, one idea needs to be deeply rooted in our minds: we cannot directly trust any data sent by the client. Whatever the client sends to the server is merely collected input. It must be checked for validity, permissions, and so on before it is used. Furthermore, this data should be treated only as the user's intention; it does not represent the current state of the data.

Let me give you a simple example. When we play a game, what the client sends to the server are just the user's actions, such as how far the user has moved. The server computes the new position from the user's current state and returns it to the client. To prevent cheating, the client must never be allowed to tell the server the user's current position directly.

Therefore, the instructions sent by the client to the server only represent the operations, and cannot directly determine the user’s state. The calculation for state changes is done on the server side. When the network is poor, we often encounter situations where the client is pulled back by the server after taking 10 steps, due to lost instructions. The client corrects the player’s position based on the actual position calculated by the server.
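To make the game example concrete, here is a minimal sketch of a server that accepts only movement commands and owns the position itself. The class GameServer, the MAX_STEP rule, and the method names are hypothetical and exist only for illustration; a real game server would have far richer rules.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical server-side game state; all names here are illustrative.
class GameServer {
    // Assumed rule: a single move command may shift the player by at most one tile per axis.
    static final int MAX_STEP = 1;

    private final Map<String, int[]> positions = new HashMap<>();

    // The client sends only an intention (dx, dy); it never sends a position.
    int[] move(String playerId, int dx, int dy) {
        int[] pos = positions.computeIfAbsent(playerId, id -> new int[] {0, 0});
        // Clamp the requested step so a forged command cannot teleport the player.
        int stepX = Math.max(-MAX_STEP, Math.min(MAX_STEP, dx));
        int stepY = Math.max(-MAX_STEP, Math.min(MAX_STEP, dy));
        pos[0] += stepX;
        pos[1] += stepY;
        // Return a copy of the authoritative position so the client can correct itself.
        return new int[] {pos[0], pos[1]};
    }
}
```

Because the position lives only on the server, a client that sends an oversized step simply has it clamped, and the returned position is what pulls the player back on screen.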

Today, I will use four case studies to explain why “nothing in any client can be trusted”.

Client-Side Computations Cannot Be Trusted #

Let’s take a look at a case of placing an order in an e-commerce application.

In this scenario, the backend exposes a POST interface, “/order,” to the client, allowing the client to directly send the assembled order information (Order) to the server:

@PostMapping("/order")
public void wrong(@RequestBody Order order) {
    this.createOrder(order);
}

The order information (Order) may include the item ID, item price, quantity, and total price:

@Data
public class Order {
    private long itemId;             // Item ID
    private BigDecimal itemPrice;    // Item price
    private int quantity;            // Item quantity
    private BigDecimal itemTotalPrice;  // Item total price
}

Although the client certainly knows the item price when placing an order, and can calculate the total price to show to the user for confirmation, this information is only for display and verification. Even if the POJO sent by the client includes it, the server must still load the item's price from the database and recalculate the final order price. Otherwise, an attacker can simply tamper with the request and change the item's total price to a much lower value.

Therefore, the only trustworthy information we directly use is the item ID and quantity provided by the client. Based on this information, the server recalculates the final total price. If the calculated item price on the server side does not match the price received from the client, a friendly prompt can be given to the client to ask them to place the order again. The modified code is as follows:

@PostMapping("/orderRight")
public void right(@RequestBody Order order) {
    // Re-query the item based on its ID
    Item item = Db.getItem(order.getItemId());
    
    // If the price passed by the client does not match the item price queried by the server, give a friendly prompt
    if (order.getItemPrice().compareTo(item.getItemPrice()) != 0) {
        throw new RuntimeException("The price of the item you have chosen has changed, please place the order again");
    }
    
    // Update the item price
    order.setItemPrice(item.getItemPrice());
    
    // Recalculate the total price of the item
    BigDecimal totalPrice = item.getItemPrice().multiply(BigDecimal.valueOf(order.getQuantity()));
    
    // If the total price passed by the client does not match the calculated total price, give a friendly prompt
    if (order.getItemTotalPrice().compareTo(totalPrice)!=0) {
        throw new RuntimeException("The total price of the item you have chosen has changed, please place the order again");
    }
    
    // Update the item total price
    order.setItemTotalPrice(totalPrice);
    
    createOrder(order);
}
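One detail is worth noting when comparing a client-supplied price with the server's price: BigDecimal.equals also compares scale, so 9.9 and 9.90 are not equal under equals, whereas compareTo treats them as the same amount. The small sketch below illustrates the difference; the class and method names are made up for this example.

```java
import java.math.BigDecimal;

class PriceComparison {
    // Returns true when two prices are numerically equal, ignoring scale.
    static boolean samePrice(BigDecimal clientPrice, BigDecimal serverPrice) {
        return clientPrice != null && serverPrice != null
                && clientPrice.compareTo(serverPrice) == 0;
    }

    public static void main(String[] args) {
        BigDecimal fromClient = new BigDecimal("9.9");
        BigDecimal fromDb = new BigDecimal("9.90");
        System.out.println(fromClient.equals(fromDb));     // false: the scales differ
        System.out.println(samePrice(fromClient, fromDb)); // true: same numeric value
    }
}
```

Using compareTo avoids rejecting a legitimate order just because the client serialized the price with a different number of decimal places.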

Another approach is to only ask the client to provide the necessary data to the server. In this case, it is more reasonable to define a new POJO, CreateOrderRequest, as the interface parameter instead of directly using the domain model, Order. When designing the interface, we should consider which data needs to be provided by the client, instead of using a comprehensive object as the parameter on the server side, to avoid security issues caused by forgetting to reset the client data in the server.

After a successful order placement, the server returns information such as the item price and total price to the client. At this point, the client can perform a comparison. If the data does not match the previous client data, a prompt can be given to the user. After confirming there is no issue, the user can proceed to the payment stage:

@Data
public class CreateOrderRequest {
    private long itemId;              // Item ID
    private int quantity;             // Item quantity
}

@PostMapping("orderRight2")
public Order right2(@RequestBody CreateOrderRequest createOrderRequest) {
    // Only the item ID and quantity come from the client; everything else is calculated on the server side
    Item item = Db.getItem(createOrderRequest.getItemId());
    
    Order order = new Order();
    order.setItemId(createOrderRequest.getItemId());
    order.setQuantity(createOrderRequest.getQuantity());
    order.setItemPrice(item.getItemPrice());
    order.setItemTotalPrice(item.getItemPrice().multiply(BigDecimal.valueOf(order.getQuantity())));
    
    createOrder(order);
    
    return order;
}

From this case, we can see that when handling data submitted by the client, the server needs to clearly distinguish which data must be provided by the client and which data the client merely calculated from data it received from the server. The former can be accepted as input, while the latter cannot be trusted. The server needs to recalculate the latter, and if the calculations on the client and server sides do not match, a friendly prompt can be given.

Validation of parameters submitted by the client #

When it comes to client data, one common mistake we make is assuming that the data is coming from the server, and therefore, the client cannot submit abnormal data. Let’s look at an example.

There is a user registration page where users are required to select their country. The server provides the list of supported countries for the user to choose from. The following code snippet shows that our registration process supports only three countries (China, the United States, and the United Kingdom) and is not open to other countries. Therefore, we take all countries from the data source, keep only those with an ID less than 4, and return them to the page for display:

@Slf4j
@RequestMapping("trustclientdata")
@Controller
public class TrustClientDataController {
    private HashMap<Integer, Country> allCountries = new HashMap<>();
    
    public TrustClientDataController() {
        allCountries.put(1, new Country(1, "China"));
        allCountries.put(2, new Country(2, "US"));
        allCountries.put(3, new Country(3, "UK"));
        allCountries.put(4, new Country(4, "Japan"));
    }
    
    @GetMapping("/")
    public String index(ModelMap modelMap) {
        List<Country> countries = new ArrayList<>();
        countries.addAll(allCountries.values().stream().filter(country -> country.getId() < 4).collect(Collectors.toList()));
        modelMap.addAttribute("countries", countries);
        return "index";
    }
}

We use the data returned from the server to populate the template:

...
<form id="myForm" method="post" th:action="@{/trustclientdata/wrong}">
    <select id="countryId" name="countryId">
        <option value="0">Select country</option>
        <option th:each="country : ${countries}" th:text="${country.name}" th:value="${country.id}"></option>
    </select>
    <button th:text="Register" type="submit"></button>
</form>
...

On the webpage, only these three countries are available for selection.

However, we must keep in mind that webpages are meant for regular users, and hackers do not care about what is displayed on the page. They may attempt to submit other country IDs that are not displayed on the page. If we blindly trust the country ID provided by the client, it is highly likely that we would open the user registration functionality to people from other countries as well:

@PostMapping("/wrong")
@ResponseBody
public String wrong(@RequestParam("countryId") int countryId) {
    return allCountries.get(countryId).getName();
}

Even though we know the parameter range comes from the dropdown list and the dropdown list content comes from the server, we still need to validate the parameter. This is because the interface does not necessarily have to be requested through a browser; as long as the interface definition is known, it can be submitted using other tools:

curl http://localhost:45678/trustclientdata/wrong\?countryId=4 -X POST

The solution is to validate the parameter’s validity before using the parameter passed by the client:

@PostMapping("/right")
@ResponseBody
public String right(@RequestParam("countryId") int countryId) {
    if (countryId < 1 || countryId > 3)
        throw new RuntimeException("Invalid parameter");
    return allCountries.get(countryId).getName();
}

Alternatively, using Spring Validation and annotations to validate parameters is more elegant:

@Validated
public class TrustClientParameterController {
    @PostMapping("/better")
    @ResponseBody
    public String better(
            @RequestParam("countryId")
            @Min(value = 1, message = "Invalid parameter")
            @Max(value = 3, message = "Invalid parameter") int countryId) {
        return allCountries.get(countryId).getName();
    }
}
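A range check works here only because the supported IDs happen to be contiguous. As a hedged alternative, the sketch below derives the allowed set from the same rule that feeds the dropdown, so the page and the validation cannot drift apart. CountryRegistry and its methods are hypothetical names, and the Spring wiring is left out so the example runs on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper; it mirrors the data in the controller above.
class CountryRegistry {
    private final Map<Integer, String> allCountries = new LinkedHashMap<>();

    CountryRegistry() {
        allCountries.put(1, "China");
        allCountries.put(2, "US");
        allCountries.put(3, "UK");
        allCountries.put(4, "Japan");
    }

    // Single source of truth: the same rule feeds both the dropdown and the validation.
    Set<Integer> supportedIds() {
        return allCountries.keySet().stream()
                .filter(id -> id < 4)
                .collect(Collectors.toSet());
    }

    // Rejects any ID that the page would never have offered.
    String register(int countryId) {
        if (!supportedIds().contains(countryId)) {
            throw new IllegalArgumentException("Invalid parameter");
        }
        return allCountries.get(countryId);
    }
}
```

With this shape, opening registration to a new country means changing one rule rather than remembering to update both the page and the bounds of a range check.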

The issue of validating parameters submitted by the client leads to another easily overlooked point. We may store some server-side data temporarily in hidden fields on the webpage so that the data can be passed back to the server when the page is submitted again. Although users cannot modify this data through webpage operations, this data is just regular data for the HTTP request and can be modified to any value at any time. Therefore, when using such data on the server-side, we need to be equally cautious.
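The safest option is to keep such data on the server (for example, in the session) and not round-trip it at all. When a value really must travel through a hidden field, one possible way to detect tampering is to send a signature alongside it. The following is only a sketch using the JDK's HmacSHA256; the class name HiddenFieldSigner and the idea of a second hidden field carrying the signature are my assumptions, and a signature alone does not stop an old, validly signed value from being replayed.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper for signing values that are round-tripped through hidden fields.
class HiddenFieldSigner {
    private final byte[] secret; // assumed to be a server-only key, never sent to the client

    HiddenFieldSigner(byte[] secret) {
        this.secret = secret.clone();
    }

    // Computes the signature to place in a second hidden field next to the value.
    String sign(String value) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] tag = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(tag);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC unavailable", e);
        }
    }

    // Returns true only if the submitted value still matches its signature.
    boolean verify(String value, String signature) {
        if (value == null || signature == null) {
            return false;
        }
        byte[] expected = sign(value).getBytes(StandardCharsets.UTF_8);
        byte[] actual = signature.getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual avoids leaking the mismatch position through timing.
        return MessageDigest.isEqual(expected, actual);
    }
}
```

On submission the server would call verify before using the value and reject the request if it returns false.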

Do not trust any content in the request header #

Earlier, we mentioned that we should not directly trust the parameters passed by the client, which are the data passed through the GET or POST methods. In addition, the content in the request header should also not be trusted.

A common requirement is to determine the uniqueness of a user in order to prevent abuse. For example, when it comes to sending small prizes to new users who have not registered, we do not want the same user to receive the prizes multiple times. Considering that unregistered users do not have a user identifier because they have not logged in, we may think of using the IP address of the request to determine whether the user has already received the prize.

For example, consider the following test code. We use a HashSet to simulate a list of IP addresses to which prizes have been issued. After each prize is claimed, the IP address is added to this list. The IP address is obtained as follows: first, we try to get it from the X-Forwarded-For request header, and if that is not available, we use the getRemoteAddr method from HttpServletRequest.

@Slf4j
@RequestMapping("trustclientip")
@RestController
public class TrustClientIpController {

    HashSet<String> activityLimit = new HashSet<>();

    @GetMapping("test")
    public String test(HttpServletRequest request) {
        String ip = getClientIp(request);
        if (activityLimit.contains(ip)) {
            return "You have already claimed the prize";
        } else {
            activityLimit.add(ip);
            return "Prize claimed successfully";
        }
    }

    private String getClientIp(HttpServletRequest request) {
        String xff = request.getHeader("X-Forwarded-For");
        if (xff == null) {
            return request.getRemoteAddr();
        } else {
            return xff.contains(",") ? xff.split(",")[0] : xff;
        }
    }

}

The reason for doing this is that our applications are usually deployed behind reverse proxies or load balancers, so getRemoteAddr returns only the proxy's IP address rather than the user's actual IP address, which does not meet our requirements. Reverse proxies usually put the user's real IP address into the X-Forwarded-For request header when forwarding requests, which is why the code reads that header first.

Relying this heavily on the X-Forwarded-For request header to determine user uniqueness has two problems.

First, tools like cURL can be used to forge requests and set the header to any value:

curl http://localhost:45678/trustclientip/test -H "X-Forwarded-For:183.84.18.71, 10.253.15.1"

Second, in places such as internet cafes or schools, many users share the same outgoing IP address. In this case, only the user who opens the page first can claim the prize, while all other users are blocked.

Therefore, IP addresses or any information in the request header, including information in cookies or the Referer, can only be used as references and should not be used as the basis for important logical judgments. For unique identification requirements like this example, a better approach is to require users to log in or use third-party authentication (such as WeChat) to obtain a user identifier for unique identification.
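If the client IP is still wanted as a reference signal (for logging or coarse rate limiting), a common hardening step is to honor X-Forwarded-For only when the request arrives directly from one of our own proxies, and to read the header from right to left. The sketch below assumes that each trusted proxy appends the address of its direct peer to the header; ClientIpResolver and trustedProxies are hypothetical names. It does nothing about the shared-IP problem, so it is not a substitute for a real user identifier.

```java
import java.util.Set;

// Hypothetical resolver; trustedProxies holds the addresses of our own proxies.
class ClientIpResolver {
    private final Set<String> trustedProxies;

    ClientIpResolver(Set<String> trustedProxies) {
        this.trustedProxies = trustedProxies;
    }

    String resolve(String remoteAddr, String xForwardedFor) {
        // If the direct peer is not one of our proxies, the header is fully client-controlled.
        if (xForwardedFor == null || !trustedProxies.contains(remoteAddr)) {
            return remoteAddr;
        }
        String[] hops = xForwardedFor.split(",");
        // Walk from right to left: the rightmost entries were appended by our proxies,
        // so the first entry outside the trusted set is the closest peer we did not control.
        for (int i = hops.length - 1; i >= 0; i--) {
            String hop = hops[i].trim();
            if (!hop.isEmpty() && !trustedProxies.contains(hop)) {
                return hop;
            }
        }
        return remoteAddr;
    }
}
```

With this approach, a forged leftmost entry such as the one in the cURL command above is ignored, because everything to the left of the address recorded by our own proxy is treated as unverified.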

User identification cannot be obtained from the client #

When it comes to user login, one common mistake in the business code is using the user ID passed from the client to the server, like this:

@GetMapping("wrong")
public String wrong(@RequestParam("userId") Long userId) {
    return "Current user ID: " + userId;
}

You might think that nobody would do such a thing, but I have actually encountered a real case where a large project had a security issue because the server directly used the user identification passed from the client.

There are three reasons why people make such low-level mistakes:

Developers fail to correctly identify the users targeted by the interface or service. If the interface is for internal services and the user ID is passed in by the calling party, it may not be unreasonable. However, such interfaces should not be directly exposed to clients or H5 applications.
During the testing phase, for the convenience of testing and debugging, we often implement interfaces that can be used without logging in. We directly use the user identification passed from the client, but forget to remove similar super interfaces before going live.
A large website’s frontend may consist of different modules that are not necessarily part of one system, and the user login status may not be unified. Sometimes, in order to simplify matters, we may directly pass the user ID in the URL to establish the user login status through frontend parameter passing.

If your interface directly faces users (such as being called by a client or H5 page), then users must be logged in before using it. After logging in, the user identification is stored on the server and the interface needs to retrieve it from the server (e.g., from the session). The following code demonstrates the simplest login operation, which sets the current user’s identification in the session after logging in:

@GetMapping("login")
public long login(@RequestParam("username") String username, @RequestParam("password") String password, HttpSession session) {
    if (username.equals("admin") && password.equals("admin")) {
        session.setAttribute("currentUser", 1L);
        return 1L;
    }
    return 0L;
}

Here, I will share a little tip for Spring Web. If you want every method that requires login to obtain the current user identification from the session and perform some subsequent processing, you don’t need to copy and paste the same logic for getting the user identity into each method. Instead, you can define a custom annotation @LoginRequired on the userId parameter, and then use the HandlerMethodArgumentResolver to automatically assemble the parameter:

@GetMapping("right")
public String right(@LoginRequired Long userId) {
    return "Current user ID: " + userId;
}

@LoginRequired itself is nothing special, just a custom annotation:

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@Documented
public @interface LoginRequired {
    String sessionKey() default "currentUser";
}

The magic comes from the HandlerMethodArgumentResolver. We define an implementing class, LoginRequiredArgumentResolver, which implements the two methods of the HandlerMethodArgumentResolver interface:

The supportsParameter method checks whether the parameter carries the @LoginRequired annotation; only annotated parameters are handled by this custom resolver.
The resolveArgument method implements the resolution logic itself. Here, we try to retrieve the current user's identification from the session. If it is missing, we log the illegal call and throw an exception asking the user to log in. If it is present, we return it, and the userId parameter of the controller method is populated automatically.

@Slf4j
public class LoginRequiredArgumentResolver implements HandlerMethodArgumentResolver {
    @Override
    public boolean supportsParameter(MethodParameter methodParameter) {
        return methodParameter.hasParameterAnnotation(LoginRequired.class);
    }

    @Override
    public Object resolveArgument(MethodParameter methodParameter, ModelAndViewContainer modelAndViewContainer, NativeWebRequest nativeWebRequest, WebDataBinderFactory webDataBinderFactory) throws Exception {
        LoginRequired loginRequired = methodParameter.getParameterAnnotation(LoginRequired.class);
        Object object = nativeWebRequest.getAttribute(loginRequired.sessionKey(), NativeWebRequest.SCOPE_SESSION);
        if (object == null) {
            log.error("Invalid call to API {}!", methodParameter.getMethod().toString());
            throw new RuntimeException("Please log in first!");
        }
        return object;
    }
}

Of course, we need to implement the addArgumentResolvers method of the WebMvcConfigurer interface to add this custom resolver LoginRequiredArgumentResolver:

@SpringBootApplication
public class CommonMistakesApplication implements WebMvcConfigurer {
...
    @Override
    public void addArgumentResolvers(List<HandlerMethodArgumentResolver> resolvers) {
        resolvers.add(new LoginRequiredArgumentResolver());
    }
}

With this implementation, every method that requires login can obtain the user identification simply by adding the @LoginRequired annotation to its parameter, which is both convenient and secure.

Key Takeaways #

Today, I shared with you the conclusion that “nothing from the client can be trusted” and explained some representative errors.

Firstly, client-side computations are untrustworthy. Although many projects nowadays have rich front-ends that can perform a lot of logical calculations without accessing server-side APIs, the results of computations from the client cannot be directly trusted. When conducting business operations, the client can only play the role of collecting information. Although it can pass information such as prices to the server, it can only be used for comparison and verification. Ultimately, the server’s computation results should be relied upon.

Secondly, all parameters from the client need to be validated for their legality. Even if we know that the user is selecting data from a dropdown list, even if we know that the user cannot submit illegal values through normal web page operations, the server should still validate the parameters to prevent malicious users from bypassing the browser UI and directly submitting parameters to the server.

Thirdly, in addition to the request body, no information in the request headers should be trusted either. We must be aware that the IP, Referer, and Cookie values derived from the request headers can all be tampered with. Such data can be used only for reference and record keeping, not for important business logic.

Fourthly, if an interface is aimed at external users, parameters such as user identification should never be present. The user’s identification should come from the server, and only authenticated users will have an identifier left on the server. If your interface is currently aimed at internal services, you must also be extremely cautious. Such interfaces should only be used internally, and further consideration should be given to authorization issues for the server-side caller.

Security follows the barrel effect: just as a barrel holds only as much water as its shortest stave allows, the overall security level of a system depends on its weakest module. When writing business code, we need to start with ourselves and establish the most fundamental security awareness, so that basic security issues are eliminated at the source.

I have put all the code used today on GitHub, and you can click on this link to view it.

Reflection and Discussion #

When discussing the point that user identification cannot be obtained from the client, I mentioned that developers may pass the user ID through the front end because the user information is not integrated. So, what good solutions do we have to integrate user identification across different systems or even different websites?

There is another type of vulnerability related to client data that is very important, and that is the data in the URL address. When redirecting anonymous users to the login page, we usually include the redirectUrl so that users can quickly return to the previous page after logging in. Hackers may forge an activity link, consisting of a real website and a phishing redirectUrl, and send an email to induce users to log in. When users log in, they actually visit the real website, so it is not easy to detect that the redirectUrl is a phishing website. After logging in, users may unknowingly disclose important information on the phishing website. We call this type of security issue an open redirect problem. What do you think should be done at the code level to prevent open redirect problems?
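As one possible starting point for that discussion, the sketch below accepts a redirectUrl only if it is a local path or an https URL whose host is on an allow-list, and falls back to a default page otherwise. RedirectValidator, allowedHosts, and the fallback value are hypothetical, and this is a simplified illustration rather than a complete defense.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical validator; allowedHosts is an assumed allow-list of our own sites.
class RedirectValidator {
    private final Set<String> allowedHosts;
    private final String fallback;

    RedirectValidator(Set<String> allowedHosts, String fallback) {
        this.allowedHosts = allowedHosts;
        this.fallback = fallback;
    }

    String safeRedirect(String redirectUrl) {
        // "//host/..." is scheme-relative and would leave our site, so reject it up front.
        if (redirectUrl == null || redirectUrl.isEmpty() || redirectUrl.startsWith("//")) {
            return fallback;
        }
        URI uri;
        try {
            uri = new URI(redirectUrl); // rejects backslashes, spaces, and control characters
        } catch (URISyntaxException e) {
            return fallback;
        }
        if (uri.getScheme() == null) {
            // No scheme: accept only a path on the current site.
            boolean localPath = uri.getRawAuthority() == null
                    && uri.getRawPath() != null
                    && uri.getRawPath().startsWith("/");
            return localPath ? redirectUrl : fallback;
        }
        // Absolute URL: require https and a host that we explicitly allow.
        String host = uri.getHost();
        boolean allowed = "https".equalsIgnoreCase(uri.getScheme())
                && host != null
                && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
        return allowed ? redirectUrl : fallback;
    }
}
```

The design choice here is to allow-list known hosts rather than block known bad ones, since an attacker can always register a new phishing domain.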

Have you ever encountered security issues caused by trusting the information passed from the client to the server in HTTP requests? I am Zhu Ye, and I welcome you to leave a comment in the comment section to share your thoughts. You are also welcome to share today’s content with your friends or colleagues for further discussion.