36 Why Do People Always Think 50,000 RMB Is Enough to Build a Taobao?

Hello, I am Zheng Ye.

Today, let’s start with a joke from the software industry.

The client wants to build an e-commerce website, and as the programmer for the contractor, I ask, “What kind of website do you want?” The client says, “Something like Taobao would be good.” I ask, “And how much are you willing to spend?” The client thinks for a moment and says, “Around 50,000 yuan should be enough!”

This is obviously a joke mocking clients who don’t understand the complexity of their requirements, but have you ever wondered why a system that seems simple to the client seems extremely difficult to you?

It’s because you are thinking about completely different things.

In the client’s eyes, all they want is a website where they can sell products. As long as they can list and sell items, and users can view and purchase them, it’s good enough. 50,000 yuan should be sufficient.

But in your mind, you are thinking, “Taobao? Oh boy, that’s a huge technical challenge. Every year during ‘Double Eleven’ (Singles’ Day), they have to handle a massive number of concurrent purchase requests, and Taobao must have a large team of programmers. You want to build one for just 50,000 yuan? That’s unrealistic.”

If this were in the earlier section about “communication and feedback,” I might have talked about how both parties need to coordinate and align their thinking. But in the section about “automation,” I want to discuss this problem from a different perspective: How did the system become more complex?

Development History of Taobao #

Since we’re talking about Taobao, let’s look at how its technology evolved, based on publicly available information. In 2013, Ziliu published a book called Ten Years of Taobao Technology, which tells the story of how Taobao’s technology changed step by step.

According to the book, the first version of Taobao was “bought”. It was a system called PHPAuction, and even its top-tier edition cost only about $2,000. It used the LAMP architecture (Linux + Apache + MySQL + PHP), a typical open-source stack at the time.

The team’s main work was customizing the system. The biggest change was splitting the single database into one master database and two slave databases, so that writes went to the master and reads went to the slaves. Many teams still choose this structure today.
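The read/write split described above can be sketched as a small router that sends writes to the master and spreads reads across the two slaves. This is a minimal Python sketch under simplifying assumptions: database servers are represented by plain names, any statement that does not start with SELECT is treated as a write, and round-robin is just one possible way to balance reads. It is not the code Taobao actually used.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads across slaves round-robin."""

    def __init__(self, master, slaves):
        self.master = master
        # itertools.cycle repeats the slave list forever: a simple round-robin
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        # Simplification: only statements starting with SELECT count as reads
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._slaves)
        return self.master


router = ReadWriteRouter("master", ["slave-1", "slave-2"])
print(router.route("INSERT INTO items VALUES (1, 'phone')"))  # master
print(router.route("SELECT * FROM items"))                    # slave-1
print(router.route("select * from items"))                    # slave-2
```

One practical consequence of this design: slaves copy data from the master with some delay, so a read issued right after a write may not see the new data yet. Real systems often send such reads to the master instead.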

As traffic and data volume kept increasing, MySQL could no longer keep up. The default MySQL storage engine at the time was MyISAM, which locks the entire table while writing, so reads on that table are blocked too. And this was just one of many problems.
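To see why table-level locking hurts under load, here is a simplified lock-compatibility model in Python that compares table-level locking (the MyISAM behavior described above) with row-level locking. It is a hypothetical sketch: the `Op` type and `conflicts` function are illustrative, and the model ignores details of real storage engines such as multi-version reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str   # "read" or "write"
    table: str
    row: int


def conflicts(a: Op, b: Op, granularity: str) -> bool:
    """Return True if two concurrent operations would have to wait for each other."""
    if a.kind == "read" and b.kind == "read":
        return False          # read (shared) locks never block each other
    if a.table != b.table:
        return False          # operations on different tables never conflict
    if granularity == "table":
        return True           # a write locks the whole table, blocking all access to it
    return a.row == b.row     # row-level: only operations on the same row conflict


update_item_1 = Op("write", "items", 1)
view_item_2 = Op("read", "items", 2)
print(conflicts(update_item_1, view_item_2, "table"))  # True
print(conflicts(update_item_1, view_item_2, "row"))    # False
```

Under table-level locking, a single write to any item stalls every reader of the `items` table, which becomes a bottleneck as write traffic grows.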

By the end of 2003, the team had switched from MySQL to Oracle. Because Oracle performed better, they went back to a single-database structure. However, PHP’s standard way of accessing the database had no connection pool, so they turned to the open-source SQL Relay, a choice that set the stage for trouble later on.
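A connection pool keeps a fixed set of database connections open and lends them out, instead of paying the cost of opening a new connection for every request. The Python sketch below shows only the idea; it is not how SQL Relay works internally (SQL Relay is separate middleware sitting between the application and the database), and `make_fake_connection` is a hypothetical stand-in for a real database login.

```python
import queue


class ConnectionPool:
    """A minimal fixed-size pool: connections are created once and then reused."""

    def __init__(self, create_connection, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(create_connection())

    def acquire(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty if the timeout expires
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        # Return the connection so the next request can reuse it
        self._idle.put(conn)


created = []


def make_fake_connection():
    # Hypothetical stand-in for an expensive database login
    conn = f"conn-{len(created)}"
    created.append(conn)
    return conn


pool = ConnectionPool(make_fake_connection, size=2)
for _ in range(100):          # 100 "requests"
    conn = pool.acquire()
    # ... run a query with conn ...
    pool.release(conn)
print(len(created))           # 2: only two connections were ever opened
```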

As data volume kept growing, local storage was no longer sufficient, so they introduced network storage. Even after splitting the storage into more nodes, the problem remained, so Taobao started buying minicomputers.

IBM minicomputers, Oracle databases, and EMC storage marked the beginning of Taobao’s “IOE” stage, named after the initials of the three vendors.

In early 2004, SQL Relay became a pain point they could not solve, so they had to look for a more fundamental solution: changing the programming language. Java was the best choice, since it was the mainstream language at the time.

The replacement plan was to divide the business into modules and replace them piece by piece. The old modules received only maintenance and no new features, while new features were developed in new modules; old and new modules shared the same database. Whenever a feature went live in a new module, the corresponding feature in the old module was disabled, and once every feature had been replaced, the old module was taken offline.
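The piece-by-piece replacement described above (often called the strangler fig pattern) can be sketched as a router that sends each feature to the new module once it has been migrated and to the old module otherwise, with both modules sharing one data store. This is a hypothetical Python sketch: the feature names, the `FeatureRouter` class, and the dict standing in for the shared database are all illustrative.

```python
shared_db = {"item:1": "Old title"}  # both modules read and write the same store


class FeatureRouter:
    """Route each feature to the new module once migrated, else to the old one."""

    def __init__(self, old_module, new_module):
        self.old = old_module    # dict: feature name -> function
        self.new = new_module
        self.migrated = set()

    def mark_migrated(self, feature):
        # Called when the new implementation goes live; the old one stops being used
        if feature not in self.new:
            raise ValueError(f"{feature} has no new implementation yet")
        self.migrated.add(feature)

    def handle(self, feature, *args):
        module = self.new if feature in self.migrated else self.old
        return module[feature](*args)

    def old_module_can_retire(self):
        # The old module can go offline only after every feature has moved
        return set(self.old) <= self.migrated


def old_view(key):
    return "old: " + shared_db[key]


def old_edit(key, title):
    shared_db[key] = title


def new_view(key):
    return "new: " + shared_db[key]


router = FeatureRouter({"view_item": old_view, "edit_item": old_edit},
                       {"view_item": new_view})
print(router.handle("view_item", "item:1"))        # old: Old title
router.mark_migrated("view_item")
router.handle("edit_item", "item:1", "New title")  # still served by the old module
print(router.handle("view_item", "item:1"))        # new: New title
print(router.old_module_can_retire())              # False: edit_item not migrated yet
```

Because both modules share the data store, an edit made through the old module is immediately visible through the new one, which is what lets the two run side by side during the migration.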

As Taobao’s data volume kept growing, a single Oracle server quickly reached its limit, so the team adopted the common “sharding” pattern, splitting data across multiple databases. Sharding, however, raised a new problem: how to combine data spread across databases. To handle access to the sharded data, they built DBRoute.
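Sharding and the cross-database problem it creates can be illustrated with a small router: each order is stored on the shard chosen by its user ID, so one user’s orders are found on a single shard, while a query across all users has to ask every shard and merge the results. This is a hypothetical Python sketch that uses dicts as shards and modulo as the routing rule; it does not reflect DBRoute’s actual interface or algorithm.

```python
import heapq


class ShardRouter:
    """Store each order on the shard picked by user_id; merge results for cross-shard queries."""

    def __init__(self, shard_count):
        # Each shard is a dict: order_id -> (user_id, amount)
        self.shards = [{} for _ in range(shard_count)]

    def shard_for(self, user_id):
        return self.shards[user_id % len(self.shards)]

    def insert(self, order_id, user_id, amount):
        self.shard_for(user_id)[order_id] = (user_id, amount)

    def orders_of_user(self, user_id):
        # Single-shard query: all of one user's orders live on the same shard
        shard = self.shard_for(user_id)
        return sorted(oid for oid, (uid, _) in shard.items() if uid == user_id)

    def largest_orders(self, n):
        # Cross-shard query: take the top n from every shard, then merge
        def by_amount(item):
            return item[1][1]

        candidates = []
        for shard in self.shards:
            candidates.extend(heapq.nlargest(n, shard.items(), key=by_amount))
        return [oid for oid, _ in heapq.nlargest(n, candidates, key=by_amount)]


db = ShardRouter(shard_count=3)
db.insert(101, user_id=1, amount=30)
db.insert(102, user_id=2, amount=80)
db.insert(103, user_id=1, amount=50)
db.insert(104, user_id=3, amount=20)
print(db.orders_of_user(1))   # [101, 103]
print(db.largest_orders(2))   # [102, 103]
```

Taking the top n from each shard before merging keeps the result correct while limiting how much data each shard has to send back.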

This approach created yet another problem: because the application connected to multiple databases at the same time, a failure in any one of them could bring down the entire website.
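The arithmetic behind this risk is simple: if a page needs every one of N databases to be up, and failures are independent, the page’s availability is the product of the individual availabilities. The Python sketch below assumes a hypothetical 99.9% availability per database; the figure is illustrative, not Taobao’s.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(per_db, count):
    """Availability when all `count` databases must be up (independent failures assumed)."""
    return per_db ** count


def yearly_downtime_hours(per_db, count):
    return (1 - combined_availability(per_db, count)) * HOURS_PER_YEAR


for count in (1, 4, 10):
    hours = yearly_downtime_hours(0.999, count)
    print(f"{count:>2} database(s): about {hours:.1f} hours of downtime per year")
#  1 database(s): about 8.8 hours of downtime per year
#  4 database(s): about 35.0 hours of downtime per year
# 10 database(s): about 87.2 hours of downtime per year
```

Each added database is another point of failure for every page that depends on all of them.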

As Taobao’s data volume kept growing, the database struggled to handle every request. The solution was to introduce caching and a content delivery network (CDN) to take read traffic off the database.
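A cache commonly takes read pressure off a database through the cache-aside pattern: read from the cache first, fall back to the database on a miss, and store the result for next time; on a write, update the database and invalidate the cached copy. The Python sketch below uses plain dicts for both the cache and the database and omits expiry, eviction, and concurrency handling; it is not the design of Taobao’s own caching system.

```python
class CacheAside:
    """Serve reads from the cache; on a miss, read the database and fill the cache."""

    def __init__(self, db):
        self.db = db          # stand-in for the database (a dict here)
        self.cache = {}
        self.db_reads = 0     # how often the database actually had to answer

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db[key]
        self.cache[key] = value
        return value

    def update(self, key, value):
        # Write to the database, then invalidate so the next read reloads fresh data
        self.db[key] = value
        self.cache.pop(key, None)


store = CacheAside({"item:1": "phone"})
for _ in range(1000):
    store.get("item:1")
print(store.db_reads)        # 1: only the first read reached the database
store.update("item:1", "laptop")
print(store.get("item:1"))   # laptop
print(store.db_reads)        # 2: the invalidated entry was reloaded once
```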

At that time, caching systems were not as mature as they are today, so the team modified an open-source project to build their own. The CDN they first used was a commercial product, but it could not keep up with the growing traffic, so they had to build their own CDN as well.

Later, because the CDN consumed a large amount of server resources, Taobao started developing its own low-power servers in order to reduce costs.

As the business kept developing and more developers joined, the system became bloated and tightly coupled, making errors more likely. At this point, they had to decompose the system, starting by separating out highly reusable modules such as user information.

As the business kept growing, the decomposition expanded from individual pieces to the whole system: the underlying business was gradually separated from the upper-level processes, until all business operations were modularized.

With the business divided fairly clearly, more of the underlying business capabilities could be reused in different scenarios, forming a shared infrastructure. New businesses could be built on top of it, and upper-level applications sprang up like bamboo shoots after a spring rain.

Along the way, they ran into many technical problems for which there was no good solution at the time, or for which existing solutions did not fit their scenarios. Taobao’s engineers therefore had to build their own, such as the distributed file system TFS, the caching system Tair, and the distributed service framework HSF. There were also technical explorations aimed at cutting costs, such as moving away from IOE and developing low-power servers.

I have given only a quick overview of Taobao’s development here, as an example of how a system evolves. If you want more details, I recommend reading the book mentioned above. Of course, today’s Taobao is far more complete and complex than what I have described.

Different Systems for the Same Business #

Why do we need to understand how a system evolved? Because as programmers, we need to know exactly what kind of system we are dealing with.

Back to today’s topic: can 50,000 yuan build a Taobao? The answer is that it depends on what kind of system you want. The original “Taobao” that was bought didn’t even cost 50,000 yuan, and today’s Taobao is clearly not the same system it was back then.

In terms of business, today’s Taobao is indeed much richer, but the core business has not changed significantly: it is still mainly about sellers offering products and buyers purchasing them. So what is the essential difference between the Taobao of then and the Taobao of now?

Looking back at the history above, you can see that each time business volume grew, the existing technology stopped being sufficient, and new technology had to be brought in to solve the resulting problems. The key point here is the difference in business volume.

A system that serves only a few people can run on a single machine, and even a newly hired programmer can implement it well. Once business volume exceeds what a single machine can handle, however, the work has to be spread across multiple machines, which raises distributed-system issues and may require introducing middleware.

When a system has to serve business at massive scale, there is no ready-made middleware to help, and problems have to be solved at a lower level.

Although these systems may look the same from a business perspective, technically they face different problems at each stage, because they operate at different business scales. More precisely, systems of different scales are fundamentally not the same system.

As long as the business keeps growing, new problems will keep arising, and the system will need continuous renovation. I once heard a vivid analogy for this: turning an Alto (a small economy car) into an Audi.

Are you using the right technology? #

As programmers, we all know how important technology is, so we all strive to learn new technologies. When a technology comes with a big company’s reputation behind it, many people can’t wait to learn it.

I have attended many technical conferences, and whenever someone from a big company gives a talk, the room is usually packed with people eager to learn about those “advanced” technologies.

Okay, then what?

Many people are eager to apply these technologies to their own projects. I have interviewed many programmers who talked passionately about technology, describing how their designs accounted for various distributed scenarios and how they would handle heavy system load.

Out of curiosity, I asked, “How many users does your system have?” It turned out they were working on an internal system that was rarely used.

Many programmers pursue technology for its own sake, and overusing technology leads to unnecessary complexity. Worse, when they use a sledgehammer to crack a nut, the technology is never tested in a real scenario, so they get no genuine feedback and their understanding of it remains superficial.

In the Taobao example, what drove the engineers to improve the system was not technology itself but the growing complexity that came with increasing business volume.

Therefore, the most important question to consider is what stage your system is at, and which technical solutions suit that stage.

You may say, “The system I’m working on doesn’t have such a large business volume, but I still want to improve my technical skills. What should I do?” The answer is to find a place with good problems to solve. The IT industry nowadays provides many opportunities for programmers, so it is not difficult to find a place with good problems. Of course, the prerequisite is that you have the basic ability to solve problems.

Conclusion #

Today, I used Taobao as an example to show how a system becomes increasingly complex as it develops. I hope you can see that systems at different business scales are fundamentally different.

On the one hand, some people may underestimate the complexity of other people’s systems due to a lack of understanding of business scale. On the other hand, some people may blindly apply technology, introducing unnecessary complexity into the system and getting themselves stuck.

As programmers, we all care about improving our technical skills, but we may not think enough about which technology suits which situation. Using appropriate technology to solve the problems at hand is something every programmer should consider carefully.

If you can only remember one thing from today’s content, please remember: Use simple technology to solve problems until the problems become complex.

Finally, think back: have you ever run into problems caused by making technology unnecessarily complex? Feel free to share your thoughts in the comments.

Thank you for reading. If you find this article helpful, feel free to share it with your friends.