37 E Commerce System Table Design Optimization Case Analysis

37 E-commerce system table design optimization case analysis #

Hello, I’m Liu Chao. Today, I will take you to understand the optimization of table design in e-commerce systems.

If the table structure is not well designed during the initial phase of business architecture design, the system is likely to encounter bottlenecks as the business and data volume increase. If the table structure lacks scalability, the business coupling will increase and the complexity of the system will also increase. In this lecture, I will use the table structure design in e-commerce systems as an example to explain in detail the factors we need to consider when designing tables and how to optimize system performance through table design.

Core Business #

To understand the table structure design of an e-commerce system, we must first become familiar with the core businesses of an e-commerce system. This part is easy to understand if you have experience with online shopping.

Generally, e-commerce systems can be divided into platform-based and self-operated e-commerce systems. Platform-based e-commerce systems refer to e-commerce platforms where third-party merchants can set up their own stores to manage product information, inventory information, promotions, customer service, etc. Representative examples include Taobao and Tmall. On the other hand, self-operated e-commerce systems are operated solely by the company and do not have third-party merchants. Common examples include JD.com and the Apple Store.

The main difference between these two types of e-commerce systems lies in whether the sellers are customers (C-end) or businesses (B-end). Clearly, the complexity of platform-based e-commerce systems is much higher than that of self-operated e-commerce systems. In order to better understand the business of an online store, we will discuss table structure design optimization based on a self-operated e-commerce system using the Apple Store as an example.

The core business of an e-commerce system is undoubtedly the sale of products. Around the sale of products, we can divide the core business into the following main modules:

1. Product Module #

The product module mainly includes product categorization and product information management. Product categorization refers to the commonly seen categories, and some people prefer to categorize them into multiple levels. For example, the first category could be mobile phones, televisions, accessories, etc., and the second category of accessories could be further divided into headphones, power banks, etc. In order to reduce the cost of users learning the system operations, we should try to minimize the number of levels.

After we have conducted a search based on the categories, we arrive at the product page. A product item consists of several product SKUs. A product item refers to a specific product, such as the iPhone 9, while a product SKU refers to a specific attribute of the product, such as a gold-colored iPhone 9 with 128GB of memory.

2. Shopping Cart Module #

The shopping cart is mainly used for users to temporarily store products they wish to purchase and allows them to check out all items in the shopping cart at once. Shopping carts are generally divided into offline shopping carts and online shopping carts. An offline shopping cart stores the selected products only in local cache, while an online shopping cart synchronizes the product information to the server.

Currently, most online stores support both types of shopping carts. When users are not logged into the store, the main focus is on the offline shopping cart, which records the user’s product information. Once the user logs into the store, the products placed in the shopping cart will be synchronized to the server, allowing users to view their shopping cart items on different platforms (such as mobile phones and computers) at different times.

3. Order Module #

The order module is the core functional module that drives the entire online store. Without orders, a platform will have difficulty maintaining its operations. The order module manages the transaction records of users on the platform and serves as the channel for users and merchants to communicate the status of their purchases. Users can change the status of an order at any time, and merchants must promptly update the order status according to the business process to inform users about the specific status of their purchased products.

Typically, an order can go through the following statuses: pending payment, pending shipment, pending receipt, pending review, completed, user cancellation, refund only, return and refund. The flow of an order is shown in the diagram below:

4. Inventory Module #

This module mainly records the specific inventory information of product SKUs and includes inventory transactions and inventory management. Inventory transactions refer to the real-time consumption of inventory when a user purchases a product, while inventory management mainly includes inventory replenishment or purchase, allocation by operating personnel.

Generally, inventory information includes product SKUs, warehouse zones, real-time inventory, locked inventory, pending return inventory, and promotional inventory.

Nowadays, most e-commerce companies have implemented inventory partitioning for different regions, such as South China and North China. This means that the same product SKU may have no inventory in one area but have inventory in another. Therefore, we need to have a field for warehouse zones to distinguish the same product SKU in different warehouses.

Real-time inventory refers to the real-time inventory of a product, while locked inventory indicates the inventory that has been locked during the time between the submission of an order and the actual deduction of inventory or the cancellation of the order. Pending return inventory and promotional inventory represent the quantity of inventory for refund processing and the quantity of inventory for each promotion, respectively.

In addition to this inventory information, we can also set inventory status for products, such as virtual inventory status and physical inventory status. If a product does not require inventory management and can be purchased by users as needed, there is no need to query or deduct inventory each time a user purchases the product. Instead, we can set the inventory status of the product to virtual inventory.

5. Promotion Module #

The promotion module refers to promotional functions such as coupons, vouchers, and discounts. It mainly includes activity management and transaction management. The former is responsible for managing the information related to the release of coupons and vouchers, including their validity period, amount, conditions, quantities, etc. The latter is responsible for managing the information related to users’ redemption of coupons and vouchers.

Original content

Business Challenges #

After understanding the specific business of the main modules mentioned above, we can further evaluate the potential challenges and performance bottlenecks in implementing the business into the system.

1. Different product categories have differences, how to design the structure of the product table? #

We know that the detailed information of a mobile phone product is very different from that of a piece of clothing. The SKU of a mobile phone includes features such as color, RAM, and storage, while a piece of clothing includes size and color.

If we need to store these products in the same table, we can either use the same fields to store different information or add fields to maintain their respective information. The former will result in complex program design, a wide table width, which reduces the number of rows that can be stored on a disk page, affects query performance, and has high maintenance costs. The latter will result in too many fields in one table, and if a new product type appears, it will require dynamically adding fields.

A better approach is to store some common fields in a common table field and create separate product type tables, such as a table for mobile phone products and a table for clothing products. However, this approach also has drawbacks, as it may result in a large number of tables, inflexibility when querying product information, and difficulties implementing full-text search.

In this case, we can store the common information of the products in a common table and store the detailed information of the products in a key-value database, such as ElasticSearch or Solr, combined with a search engine.

2. During the Double 11 shopping festival, the number of items in the shopping cart increases significantly, what if the shopping cart system encounters performance bottlenecks? #

In the scenario where users are not logged into the system, we store the shopping cart information in cookies. After users log in, the shopping cart information will be saved in the database.

During the Double 11 period, most users will add items to their shopping carts in advance. The process of adding items to the shopping cart is relatively long and the operations are distributed, so it does not exert much pressure on the database for writing. However, during the purchase process, due to the concentrated access to shopping carts by users, if all the accesses go to the database for reading, the database will be under immense pressure.

In this case, we should consider a hot-cold data storage solution for the shopping cart information. Generally, users usually prefer the most recently added items in their shopping carts, which are considered hot data, while items added to the shopping cart a long time ago are considered cold data. We need to store the hot data in a Redis cache in advance to improve the system’s concurrent performance during the event. For example, we can store the information of items added to the shopping carts in the past month in Redis, which should at least be a paginated information.

When the shopping cart information is not found in the cache, we can then query the database, which greatly reduces the burden on the database.

3. Orders table with massive data, how to design the structure of the orders table? #

Typically, the orders table is the one that accumulates system data the fastest. As long as an order is submitted, it will be created in the orders table regardless of whether the order is actually paid for. If a company’s business develops very rapidly, it is only a matter of time before sharding and sharding the orders table.

Before sharding, the primary keys of the orders are auto-incremented and associated with other business tables. Once sharding and sharding are to be done, there will be a situation where the primary keys are coupled with the business, and conflicts may occur between the newly generated auto-incremented IDs and the previous ones. This will result in a huge amount of work when doing table upgrades later. If we plan to do table upgrades later, it is recommended to use snowflake to generate primary keys in advance.

If we want to implement horizontal sharding for the orders table, which field should we use?

Usually, we implement sharding of orders by calculating the hash value of the user ID field. This method can optimize the performance of order operations on the user side. If we need to shard the orders table horizontally, we still base it on the user ID field.

After sharding and sharding, querying orders becomes a challenge for our backend order management system. Usually, orders are queried based on order status and creation time. The queries need to support pagination and JOIN operations on certain fields. If we need to perform these operations in a sharded and sharded environment, it is undoubtedly a huge challenge.

For JOIN queries, we can generally achieve them by redundantly storing some configuration tables that are rarely modified. For example, after entering basic information about a product, which is rarely modified, we can store this information redundantly in each sharded database. If there is not much field information, we can directly include these fields in the orders table.

As for pagination queries, we usually recommend redundant order information to be stored in big data. The backend management system queries order information from big data. After users submit and pay for an order, the backend will synchronize the order to big data. When users modify the order on the C-end or operators modify the order in the backend, big data will be notified asynchronously to update the order data. This approach can solve the problem of paginated queries caused by sharding and sharding.

4. Performance bottleneck in inventory table in flash sales business, how to solve it? #

During normal product purchases, we usually directly check and lock the inventory in the database. However, if it is a flash sale during a promotion, directly querying and updating the inventory in the database will cause performance bottlenecks in the face of high concurrency.

Usually, we update the inventory of promotional activities in the cache, and query the real-time inventory of products through the cache. We use distributed locks to deduct and lock inventory. The specific implementation of distributed locks will be explained in detail in Lesson 41.

5. There are also flash sale scenarios in promotional activities, how to design the tables? #

The trading of coupons and red packets in promotional activities often has some similarities with flash sale activities.

Before some major promotional activities, we usually distribute various coupons and red packets at regular intervals, and users need to click to claim them before they can use them. Therefore, when a certain quantity of coupons and red packets are released at the same time, there will also be a situation where users rush to claim these coupons and red packets at the same time, especially for popular products.

We can refer to the optimization design of inventory and use cache and distributed locks to query and update the quantity of coupons and red packets. After successfully obtaining the quantity from the cache, we can update the quantity of coupons and red packets in the database asynchronously.

Summary #

In this lecture, we practiced table design in the context of an e-commerce system. We can summarize the key points as follows:

In situations where fields are complex, prone to change, or difficult to standardize, it is recommended to use key-value pairs instead of relational database tables for storage.
For queries in high-concurrency scenarios, caching can be used instead of database operations to improve concurrency performance.
Tables with rapidly accumulating data should consider horizontal sharding or partitioning to avoid performance bottlenecks from single-table operations.
In addition, we should try to avoid complex JOIN queries through optimizations, such as duplicating some fields to reduce JOIN queries or creating intermediate tables to reduce JOIN queries.

Thought Question #

Have you ever used foreign keys to associate tables when designing a database? Nowadays, internet companies generally recommend implementing the logical association between tables instead of using foreign keys to achieve actual table association. Do you know why this is?