06 Domain Decomposition: How to Reasonably Decompose Systems #

Hello, I’m Xu Changlong.

Starting from this chapter, let’s take a look at how to perform high-concurrency transformation on systems that require high data consistency. In this chapter, I will use a representative e-commerce system as an example to explain the key technical points of the transformation.

Generally speaking, systems that require strong consistency often involve technical points such as “lock contention” and have significant performance bottlenecks. E-commerce systems often carry out flash sale activities, which place even higher demands on the system. When transforming e-commerce systems, the industry usually starts from three aspects: system decomposition, inventory contention optimization, and system isolation optimization.

In today’s lesson, let’s warm up by learning some techniques for system decomposition. We know that e-commerce systems have many functionalities that require maintaining strong data consistency. We usually use locks to ensure that only one thread can modify the data at the same time.

However, this approach significantly reduces the parallel processing efficiency of the business and easily affects the system’s performance. Furthermore, this type of system often has various personalized activity requirements, and the supporting functionalities need to be continuously updated and iterated. These changes often lead to the system deviating from its original design. Therefore, while developing new requirements, we need to regularly decompose and organize the system to avoid it going off track. At this point, it becomes crucial to split the system reasonably based on the business requirements.

Case Background #

To help you master the skills of system splitting, let’s look at a case. One time, I was invited by a friend to optimize their system.

They are a well-known supplier for a certain industry’s e-commerce platform, with a long supply chain and complex product categories and specifications. In order to ensure smooth production planning, the system needs to coordinate the production schedules of multiple subcontractors and material suppliers.

Originally, the coordination of orders required phone communication, but this method was too random. In order to ensure stable supply in the production chain and improve coordination efficiency, my friend added a scheduling negotiation function based on the order booking system. Specifically, “scheduling” was added as a step in the main order process, and the negotiated schedules were displayed in a calendar style, making it convenient for upstream suppliers and various factories to coordinate production cycles.

The entire negotiation process for the supply chain is shown in the following diagram:

Image

As shown in the diagram, the upstream project will first release a production plan (or procurement plan), and the supplier will split the procurement list (order splitting) according to the plan and contact different factories to coordinate pre-scheduling (scheduling appointments). Subsequently, the upstream purchaser will conduct quality audits on the factory’s products, and then place orders, make payments, and confirm the schedules.

The factory will develop a material procurement plan based on the confirmed schedule, and notify the material supplier to supply goods in batches, and start batch production. After each batch of products is manufactured, the factory will notify the logistics department to deliver them to the purchaser (i.e., the supplier) in batches, and update the batch order information in the supplier’s system. Then, the upstream purchaser will inspect the products and initiate the return or exchange process for any defective products.

However, after running the system for a while, my friend discovered that since the previous system was based on orders, even after adding the scheduling function, the main orders still served as the aggregate root (i.e., the main entity), which meant that the upstream party needed to create main orders when releasing their plans.

As the main orders remained in an open state while schedules were continuously adjusted and new ones added, the order data kept growing. Within a year, the volume of order data exceeded one hundred million records. Because of the large data volume, the long cooperation cycles, and the inclusion of after-sales processes, this data could not be archived by time, and the system became slower and slower.

Considering that this is a core business and the ongoing problems have a significant impact, my friend sought my advice on how to split and separate the data. However, based on my understanding, this is not a problem of splitting and maintaining tables and databases, but rather a problem of unreasonable system functional design leading to bloatedness. Therefore, after communication, we decided to perform domain splitting on the system’s order system.

Flow Analysis and Organization #

First, I organized the APIs and flows of the main order and drew a simple top-to-bottom diagram of the flows and the relationships within the order system, as shown in the following figure:

Image

As you can see, there are multiple roles using this “order scheduling system”. I communicated with the product and development teams through this diagram to confirm that there are no issues with the data flow and system data dependencies of the main process that I understood.

Next, we focused on the order table. It carried too many functions, which made it impossible to maintain the data for the multiple processes that depended on it. The order also held several states unrelated to the order business itself; for example, the long scheduling cycle kept orders from ever being closed. As we discussed in Lesson 1, a data entity should not take on too many functions, or it becomes hard to manage. Therefore, we needed to separate the main functions of the order and scheduling entities.

Through analysis, we also discovered another problem. The core of the current system is not the order, but the planned scheduling. The original order system, before the transformation, implemented the upstream and downstream order distribution through the automatic matching function, so the main modules of the system revolved around order circulation. However, after adding the scheduling function, the core of the system has shifted from revolving around the order to generating orders based on the schedule, which is more in line with business needs.

Scheduling and order are related but have different functional usages. Scheduling is only a plan, while orders are only used for factory production, transportation, and upstream result verification. This means that the design core of the system modules and tables has shifted. We need to split the modules to achieve better flexibility.

In summary, our overall splitting approach is to completely separate the scheduling process and the order delivery process. In a startup company, the initial design intention of our project often deviates from the original design due to changes in market demand. This requires us to constantly re-examine our system and continuously improve it to ensure its completeness.

Because I was worried that the development team would not be able to break free of the inertia of the original design, leaving the separation incomplete and causing the revision to fail, I sorted out the roles and processes and clarified the responsibilities of each role and the relationships between processes. I drew multiple boxes based on roles and their required actions, and interspersed their actions and data flows, as shown in the following figure:

Image

Based on this diagram, I communicated again with the development and product teams and identified the points of separation between orders and scheduling in terms of functionality and data. Specifically, the upstream functions are divided into: publishing procurement plans, receiving scheduling, placing orders, receiving/returning goods; the supplier mainly coordinates scheduling and order related services; and the factory is mainly responsible for production scheduling, production, and after-sales. In this way, the system process can be classified into several stages:

  1. Planning and scheduling coordination stage

  2. Production and supply based on the schedule, plus periodic logistics delivery stage

  3. After-sales service and exchange stage

As you can see, the first stage does not involve orders, it mainly focuses on upstream and coordination of multiple factories’ scheduling; the second and third stages are factory production, supply and after-sales, which require interaction with orders, and the perspectives of the upstream, factory, and logistics are completely different.

Based on this conclusion, we can completely split the system into two subsystems — a scheduling and dispatch system and an order delivery system — according to the main data entities and main business processes (splitting the process into two domains, scheduling and order, each with its own aggregate root).

In the planning and scheduling coordination stage, the upstream submits procurement plans and receiving schedules in the scheduling and dispatch system, and then the supplier coordinates distribution and negotiation with multiple cooperating factories based on the upstream’s scheduling situation and procurement needs. Once an agreement is reached, the upstream reserves the scheduling and production scheduling in advance.

After the upstream signs the agreement and pays the deposit for production batches, the scheduling system generates the corresponding orders in the order system based on the schedule and the factory taking the order. Meanwhile, once the upstream, supplier, and factory reach a cooperation agreement, additional schedules can be added continuously, rather than confining the cooperation period within a single order.

In the production and supply based on the scheduling stage, the scheduling system, while calling the order system, will pass the specific main order number and order details. The order details contain the categories and quantities of planned production as well as the delivery volume for each period. The factory can adjust the production schedule according to its own situation. After the products are produced, the factory sends them in batches through logistics for delivery, and records the delivery time, quantity, and logistics information in the order system. At the same time, the order system generates financial information and reconciles in batches with the upstream finance and warehouse.
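The call described above — the scheduling system passing a main order number plus order details (categories, quantities, and per-period delivery volumes) into the order system — can be sketched as follows. This is a minimal illustration; all field and function names are assumptions, not the real system's API:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DeliveryPeriod:
    period: str          # e.g. "2024-03"
    quantity: int        # planned delivery volume for this period

@dataclass
class OrderDetail:
    category: str        # category planned for production
    total_quantity: int
    deliveries: List[DeliveryPeriod] = field(default_factory=list)

@dataclass
class Order:
    main_order_no: str   # main order number passed in by the scheduling system
    schedule_id: str     # link back to the schedule: the data connection between domains
    details: List[OrderDetail] = field(default_factory=list)

def create_order_from_schedule(main_order_no: str, schedule_id: str,
                               details: List[OrderDetail]) -> Order:
    """Hypothetical entry point the scheduling system calls on the order system."""
    return Order(main_order_no=main_order_no, schedule_id=schedule_id, details=details)
```

Note that the order only keeps a reference (`schedule_id`) back to the schedule, so each domain remains the owner of its own aggregate.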

Image

After this split, the two systems use procurement schedules and delivery batches as their aggregate roots and remain connected through that data. This makes the overall order process much simpler.

In general, the previous business analysis was conducted from the perspectives of flow, roles, and key actions. Different flows were classified into different stages for analysis, and two business domains—scheduling and orders—were split based on the main entities. Through this bold splitting, we can then verify the feasibility with the product and development teams.

Starting with table splitting in system decomposition #

After going through the previous process, I believe you already have a sense of the method of splitting entity responsibilities according to the process and stages. Here, let’s review the process from the perspectives of code and database tables.

Generally, system functionalities start with table splitting, which is the easiest path to implement. This is because our business processes often revolve around a main entity table and involve multiple entities for interaction. In this case, we have separated the scheduling data and status from the order table. The code layering before the split is shown in the following diagram:

Image

After the split, the code layering becomes like this:

Image

As you can see, the biggest change is that the responsibilities of the order entity table have been split. Our system code has become simpler and the situation where the same order entity is called by multiple roles has completely disappeared. In the process of splitting, we have three criteria:

  1. Data entities should only do the most essential thing; for example, orders should only handle the life cycle of orders (including creation, status changes, returns, and order completion).

  2. Classify business processes based on the involved entities, see if they can be divided into multiple stages, such as “coordinating scheduling process in progress,” “production process,” “after-sales service stage.”

  3. Decide how many modules to split based on the frequency of data dependencies. If two modules interact tightly in the business process and share data relationships — for example, frequent Joins, or calling A always leads to calling B — then merge these two modules, provided no further split is planned in the short term.
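Criterion 1 above — an order entity that handles only its own life cycle — can be made concrete as a small state machine. The states and transitions below are purely illustrative; the point is that nothing scheduling-related appears in them:

```python
# Hypothetical order life cycle, limited to what criterion 1 names:
# creation, status changes, returns, and completion.
ALLOWED_TRANSITIONS = {
    "created":   {"paid", "cancelled"},
    "paid":      {"shipped", "refunding"},
    "shipped":   {"received", "refunding"},
    "received":  {"completed", "refunding"},
    "refunding": {"refunded"},
}

class OrderLifecycle:
    def __init__(self) -> None:
        self.state = "created"

    def transition(self, new_state: str) -> None:
        allowed = ALLOWED_TRANSITIONS.get(self.state, set())
        if new_state not in allowed:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
```

With scheduling states removed, an order can always reach a terminal state (`completed`, `cancelled`, or `refunded`) and be archived, which was exactly what the pre-split design prevented.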

Image

If a core system is split and organized according to the responsibilities of entity tables, both the process and the difficulty of modification will be greatly reduced.

Module splitting can also be viewed from the bottom up, as in Figure 6. If the data interaction between modules is not particularly frequent — for example, no frequent Joins — we divide the system into four modules. As shown in Figure 7, these four modules are relatively independent, each responsible for one core task. At the same time, there is no significant data coupling between the entities of any two modules, and each module maintains all the data its stage requires. This division is clear and easy to manage.

At this point, we only need to sort out the data and process relationships, ensuring that there is no frequent data Join in subsequent statistical analysis, in order to complete the table splitting.

However, if you want to divide modules according to business, I recommend looking at the business process from top to bottom to determine the domain scope of data entity splitting (Domain-Driven Design, DDD), as well as the responsibilities of each module.

The lower-level services need to be more abstract #

In addition to system decomposition, we also need to pay attention to the abstraction of services. Many services often need to be modified frequently due to changes in business details, and the lower-level services need to minimize changes. If the level of abstraction of the service is not enough, it will be difficult for us to determine the scope of the impact of the changes on the upstream systems.

Therefore, we need to clarify which services can be abstracted as lower-level services and how to better abstract these services.

Because e-commerce systems often split and abstract services, I will use this type of system as an example to explain to you. You may be wondering: why do e-commerce systems often split and abstract systems and services?

This is because the core and most complex part of an e-commerce system is the order system. E-commerce products span multiple categories (SKU+SPU), and different categories have different filtering dimensions, services, and units of measurement. As a result, the system must record a large number of redundant category fields in order to properly save the transaction snapshot when the user places an order. Therefore, we need to frequently split and organize the system to prevent these category-specific characteristics from affecting other products.

In addition, different business processes in e-commerce systems have different service flows. For example, ordering food is completely different from ordering a customized cabinet. When a user purchases food, the e-commerce platform only needs to notify the warehouse to pack, generate the shipping label, ship the goods, and sign for the delivery. On the other hand, when a user customizes a cabinet, it involves the manufacturer coming to measure the size, remeasuring, customizing, transporting, and subsequent adjustments. Therefore, we need to abstract the services, making the business processes more standardized and generic, to avoid too frequent changes.
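One common way to make such differing flows standardized and generic is to let the platform drive a single generic "run the steps in order" pipeline while each business registers its own steps. The sketch below is an assumption-laden illustration (all step and function names are invented), not a prescribed design:

```python
from typing import Callable, Dict, List

FlowStep = Callable[[dict], None]

# Each business category registers its own fulfillment steps.
FLOWS: Dict[str, List[FlowStep]] = {}

def register_flow(category: str, steps: List[FlowStep]) -> None:
    FLOWS[category] = steps

def fulfill(category: str, order: dict) -> List[str]:
    """Generic driver: run the registered steps in order, return the step log."""
    log = []
    for step in FLOWS[category]:
        step(order)
        log.append(step.__name__)
    return log

# Food only needs pack -> ship -> sign; a custom cabinet adds measuring and customizing.
def pack(o): o["packed"] = True
def ship(o): o["shipped"] = True
def sign(o): o["signed"] = True
def measure(o): o["measured"] = True
def customize(o): o["customized"] = True

register_flow("food", [pack, ship, sign])
register_flow("cabinet", [measure, customize, ship, sign])
```

The generic driver never changes when a new category appears; only a new step list is registered, which is the "standardized and generic" property the text asks for.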

It is precisely because there are differences in the forms of business services that the order system needs to control its functions within a “certain range”. In this regard, we should consider how to minimize the functional requirements of the order table while meeting the business needs.

In fact, there is no absolute answer to this because different industries and different companies have different business forms. Here, I will give you some common abstracting approaches for your reference.

Passive abstraction #

If two or more services use the same business logic, abstract this business logic into a common service. For example, if Business A updates logic A and Business B also needs to use the updated logic A, then abstract this logic A into a common service at the lower level for both services to invoke. This kind of abstraction is relatively passive and common, suitable for systems with small codebase and few maintainers.
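In code, passive abstraction is simply extracting the duplicated logic into one lower-level function that both businesses invoke. A minimal sketch, with all names invented for illustration:

```python
# Before: business A and business B each kept their own copy of "logic A".
# After passive abstraction, both call the shared lower-level service.

def apply_discount(price: float, rate: float) -> float:
    """The shared logic, extracted into a common service layer."""
    return round(price * (1 - rate), 2)

def business_a_checkout(price: float) -> float:
    # Business A invokes the common service instead of its own copy.
    return apply_discount(price, 0.10)

def business_b_checkout(price: float) -> float:
    # Business B reuses the same service with its own parameters.
    return apply_discount(price, 0.05)
```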

Using passive abstraction is relatively easy for systems in the early stage of startups with unclear main architecture. However, its disadvantage is that the level of abstraction is not high, and when there are a large number of business changes, it requires a certain scale of refactoring.

Generally speaking, although the code structure of this approach is closely related to the business, it is cumbersome and the code layers are not regular. Therefore, the passive abstraction method is suitable for the exploration stage of a new project.

Image

Here, let me digress a little: modules at the same level must not call each other. If such calls exist, the two services need to be abstracted into common services at a lower level, and the upper level should aggregate them, as shown by the red “X” in the diagram above. After the split, it looks like the following diagram:

Image

The purpose of doing so is to keep the system's dependency structure flowing strictly from top to bottom, ensuring there are no circular cross-references. Otherwise, the project becomes difficult to debug and maintain, and untangling the cycles later during system improvements consumes a great deal of effort.

Dynamic Auxiliary Table Method #

This method is suitable for slightly larger teams or systems. Its specific implementation is as follows: when the order system is used by several development teams together, different types of main orders created by different businesses will store business-specific data in different auxiliary tables. For example, ordinary products are stored in the tables order and order_product_extra, and the customization process status of custom products is stored in order_customize_extra.
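The table layout described above can be sketched with an in-memory SQLite database. The column names here are assumptions for illustration; only the table names `order`, `order_product_extra`, and `order_customize_extra` come from the text:

```python
import sqlite3

# Illustrative schemas: one shared main order table, plus per-business
# auxiliary tables, as in the dynamic auxiliary table method.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "order" (
    id INTEGER PRIMARY KEY,
    order_type TEXT,          -- e.g. 'product' or 'customize'
    status TEXT
);
CREATE TABLE order_product_extra (
    order_id INTEGER REFERENCES "order"(id),
    sku TEXT,
    quantity INTEGER
);
CREATE TABLE order_customize_extra (
    order_id INTEGER REFERENCES "order"(id),
    customize_status TEXT     -- measuring / producing / adjusting ...
);
""")

# Reading a custom order means joining its business-specific auxiliary table.
conn.execute("""INSERT INTO "order" VALUES (1, 'customize', 'open')""")
conn.execute("INSERT INTO order_customize_extra VALUES (1, 'measuring')")
row = conn.execute("""
    SELECT o.id, e.customize_status
    FROM "order" o JOIN order_customize_extra e ON e.order_id = o.id
""").fetchone()
```

The Join in the read path is exactly why this method is "closer to the business and facilitates queries", and also why business isolation is poor: every business's auxiliary table hangs off the same shared order table.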

The advantage of this approach is that it stays close to the business and facilitates queries. However, because business-specific data lives in the auxiliary tables, business isolation is relatively poor, and every business that depends on the order service is often affected. The order must also be updated continuously as the businesses change. Therefore, the order service abstracted this way is not truly generic, and generally only a company's core businesses warrant this kind of customization.

Image

Mandatory Standard Interface Method #

This method is more common in large enterprises, with the core point being that the underlying services only provide standard services, while the personalized aspects of the business are handled by the business itself. For example, the order system only provides functions for placing orders, waiting for payment, successful payment, delivery, and receipt. When displaying, the frontend aggregates personalized data and standard orders.
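A minimal sketch of this method, assuming invented names throughout: the lower-level order service exposes only the standard states named above, while the business keeps its personalized data in its own storage and merges it at display time:

```python
# The standard order states the lower-level service supports; anything
# personalized is the business's own problem.
STANDARD_STATES = ["placed", "awaiting_payment", "paid", "shipped", "received"]

class StandardOrderService:
    def __init__(self) -> None:
        self._orders = {}

    def place(self, order_id: str) -> None:
        self._orders[order_id] = "placed"

    def advance(self, order_id: str) -> str:
        """Move the order to the next standard state."""
        i = STANDARD_STATES.index(self._orders[order_id])
        self._orders[order_id] = STANDARD_STATES[min(i + 1, len(STANDARD_STATES) - 1)]
        return self._orders[order_id]

    def get(self, order_id: str) -> dict:
        return {"order_id": order_id, "state": self._orders[order_id]}

# The business stores its own personalized fields locally...
business_extras = {"o1": {"cabinet_measure_date": "2024-05-01"}}

def render_order(svc: StandardOrderService, order_id: str) -> dict:
    """...and the frontend aggregates personalized data with the standard order."""
    view = svc.get(order_id)
    view.update(business_extras.get(order_id, {}))
    return view
```

Note that `StandardOrderService` never learns about cabinets or measurements, which is what keeps it stable when the businesses above it are revised.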

With this approach, the coupling between the common order service and the businesses is minimal, and the order service is easier to maintain when a business is revised. However, the interaction can be cumbersome for the upper-level businesses, because they need to store a lot of additional information locally and implement some flows themselves. Overall, though, for a system serving multiple businesses, the modifications required by business changes are minimal.

Image

From the above three methods, it can be seen that the stability of the business depends on the level of abstraction of the service. If the underlying changes frequently, the entire business will need to be constantly modified, ultimately leading to confusion. Therefore, personally, I recommend using the Mandatory Standard Interface Method, which is also a common practice in many companies. Although it is difficult to use, it is still better than constantly refactoring the entire system.

You may wonder: why not adopt the most abstract design from the very beginning? This is because most startup businesses are not stable. Although designing ahead can keep the code structure consistent, when you look back after two years you will find that the initial design has changed completely. The design we were so confident about in the beginning eventually becomes a stumbling block for the business.

Therefore, this kind of decomposition and architectural design requires us to periodically review, reflect, and adjust. After all, technology serves the business, and the business comes first. No one can guarantee that a project's initial design will survive unchanged — a site that starts as a personal portal may well end up rebuilt into a social platform.

In summary, each method is not absolutely correct. We need to decide which approach to use based on business requirements.

Summary #

There are many methods for business decomposition, and the simplest and most convenient is this: first, lay out the business processes from top to bottom, then classify and aggregate them; next, within each domain aggregation, identify the main entities involved in the interactions and decide whether to split based on the data dependencies between those entities in the process (looking bottom-up); finally, after splitting the entities and actions into modules, classify them by business process into the final modules (the final synthesis).

In summary, the decomposition process can be summarized as follows: look at the process from top to bottom, look at the modules from bottom to top, and finally consider the results of both processes and modules. Using this method, module boundaries can be quickly identified, and the decomposed business will be very clear.

Image

In addition to decomposing the business, we also need to pay attention to how to abstract services. If the underlying services change frequently, the upper-level businesses must be modified frequently as well, and changes can even be missed. Therefore, we need to ensure that the underlying services are sufficiently abstract. There are many ways to achieve this, such as passive abstraction, the dynamic auxiliary table method, and the mandatory standard interface method. Each has its own advantages, and we need to choose based on the business.

Image

Usually, our business systems are initially designed based on a specific goal, but as market demand changes, the business systems undergo continuous revisions, often deviating from the original design.

Although we meet the established requirements every time we revise, it is also easy to bring many unreasonable problems. Therefore, after the requirements stabilize, we usually make more reasonable improvements to ensure system integrity and improve maintainability. Many times, the first version does not need to be too refined. After the market validation clarifies the next direction, we can use the reserved space for improvement, so that the designed system will have better scalability.

Reflection Questions #

Some of the concepts in this lesson overlap with DDD, but there are still some subtle differences. Please compare the differences between the three layers of MVC and the implementation of DDD.

Feel free to discuss and communicate with me in the comments section. See you in the next class!