02 Theoretical Analysis: Discussing Common Entry Points in Performance Optimization #

This lesson primarily explains the guidelines for Java performance optimization.

In the previous lesson, we discussed the definition of performance in detail. With a clear understanding of performance, we can have specific optimization goals and measurement methods, rather than relying solely on intuition to evaluate optimization results.

Once we have defined our optimization goals, where should we start? This lesson mainly focuses on theoretical analysis, providing an overview of the guidelines for Java performance optimization. While this lesson is theoretical, subsequent lessons will use more examples to delve into the knowledge points covered in this lesson, allowing for deeper reflection and summarization.

The 7 Categories of Performance Optimization Techniques #

Performance optimization can be classified into business optimization and technical optimization, depending on where the effort is focused. Business optimization can have a significant impact, but it falls under the purview of product and management teams. As programmers, our optimization methods rely mainly on a series of technical approaches to achieve the defined optimization goals. I have summarized these technical approaches into the following 7 categories:

  • Reuse optimization
  • Compute optimization
  • Result set optimization
  • Resource conflict optimization
  • Algorithm optimization
  • Efficient implementation
  • JVM optimization

As you can see, these optimization approaches mainly focus on planning for computing resources and storage resources. While there are various ways to trade space for time in optimization methods, it is not desirable to only consider computational speed without considering complexity and space requirements. What we aim to achieve is the optimal utilization of resources while also taking performance into account.

Next, I will briefly introduce these 7 optimization approaches. If you find it a bit tedious, don’t worry. The purpose of this lesson is to give you a general concept and overall understanding of the theoretical basis.

1. Reuse Optimization #

When writing code, you will find that there are many repetitive parts that can be extracted and turned into common methods. This way, you don’t have to struggle to rewrite them the next time you use them.

This idea is called reuse. The description above concerns coding logic, and the same applies to data storage and retrieval. Whether in life or in code, repetitive tasks happen all the time; without reuse, both work and life are more tiring.

In software systems, when it comes to data reuse, the first things that come to mind are buffering and caching. Please note that these two terms have completely different meanings, which many students confuse. Here is a brief introduction (detailed explanations will be given in lessons 06 and 07).

  • Buffering is commonly used for temporarily storing data and then transferring or writing them in batches. It is mostly used for sequential operations to alleviate frequent and slow random writes between different devices. Buffering mainly targets write operations.
  • Caching is commonly used for reusing previously read data by caching them in a relatively high-speed area. Caching mainly targets read operations.
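The distinction can be shown in a minimal Java sketch (illustrative only; the class and method names are invented for this example): buffering batches small writes before they reach a slow device, while caching remembers the result of a slow read so it need not be repeated.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class BufferVsCache {
    // Buffering (write side): accumulate small writes in memory and flush
    // them in bulk, so the slow underlying device sees fewer, larger operations.
    static byte[] bufferedWrite(byte[] data) throws IOException {
        ByteArrayOutputStream device = new ByteArrayOutputStream(); // stands in for a slow device
        try (BufferedOutputStream out = new BufferedOutputStream(device, 8192)) {
            for (byte b : data) {
                out.write(b); // each call hits the in-memory buffer, not the device
            }
        } // closing the stream flushes the buffer to the device in one batch
        return device.toByteArray();
    }

    // Caching (read side): remember the result of a slow lookup so that
    // repeated reads of the same key are served from fast memory.
    private static final Map<String, String> cache = new HashMap<>();

    static String cachedRead(String key) {
        return cache.computeIfAbsent(key, BufferVsCache::slowLookup);
    }

    static String slowLookup(String key) {
        return "value-of-" + key; // stands in for a database or disk read
    }
}
```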

Similarly, object pooling operations, such as database connection pools and thread pools, are also common. They are frequently used in Java. Since the creation and destruction of these objects have relatively high costs, we temporarily store them after use so that we don’t have to go through the time-consuming initialization process again the next time we use them.
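As a small sketch of the pooling idea using the JDK's `ExecutorService` (the class name `PoolDemo` is invented for illustration): a fixed pool creates its worker threads once and reuses them for every submitted task, instead of paying the creation cost per task.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolDemo {
    public static int runTasks(int taskCount) {
        // Four worker threads are created once and reused for all tasks.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(done::incrementAndGet); // tasks share the 4 pooled threads
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```

Database connection pools follow the same pattern: the expensive handshake happens once per pooled connection, not once per query.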


2. Compute Optimization #

(1) Parallel Execution

CPUs are developing rapidly, and most hardware nowadays is multi-core. To accelerate the execution of a task, the most direct solution is to parallelize it. There are three modes of parallel execution:

The first mode is multi-machine, which uses load balancing to divide traffic or large computations into multiple parts for simultaneous processing. For example, Hadoop uses the MapReduce approach to distribute tasks and perform computations on multiple machines.

The second mode is multi-process. For instance, Nginx uses a non-blocking, event-driven I/O model in which the Master process manages Worker processes, and the actual request proxying is done by the Workers. This effectively utilizes the hardware's multiple CPUs.

The third mode is multi-threading, which Java programmers are most familiar with. For example, Netty adopts the Reactor programming model and also utilizes NIO, but it is thread-based. The Boss thread is responsible for receiving requests and then dispatching them to the corresponding Worker thread for actual business computation.

Languages like Golang have coroutines, which are even more lightweight than threads. However, coroutine support in Java is not yet mature, so it will not be discussed in detail here. In essence, though, coroutines are another means of running tasks in parallel on multiple cores.
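To make the thread-based mode concrete, here is a minimal sketch using a Java parallel stream, which splits one computation across CPU cores via the common ForkJoinPool (the class name `ParallelSum` is hypothetical):

```java
import java.util.stream.LongStream;

public class ParallelSum {
    public static long sum(long n) {
        return LongStream.rangeClosed(1, n)
                .parallel()   // divide the range across worker threads
                .sum();       // combine the per-thread partial sums
    }
}
```

The same divide-combine shape appears at every scale, from MapReduce across machines down to fork/join across threads.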

(2) Transforming Synchronous into Asynchronous

Another optimization technique for computations is to transform synchronous operations into asynchronous ones, which typically involves changing the programming model. With synchronous operations, the request blocks until a success or failure result is returned. Although this model is simple, it copes poorly with bursty or unevenly distributed traffic, and requests are prone to failure.

Asynchronous operations enable easy horizontal scalability and can alleviate instant pressure, resulting in smoother request handling. Synchronous requests are like punching a steel plate, while asynchronous requests are like punching a sponge. You can imagine the difference – the latter provides more elasticity and a friendlier experience.
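A minimal sketch of the synchronous-to-asynchronous transformation using the JDK's `CompletableFuture` (the class and method names are invented; `slowService` stands in for any blocking remote call):

```java
import java.util.concurrent.CompletableFuture;

public class AsyncDemo {
    static String slowService(String req) {
        return "result:" + req; // stands in for a slow remote call
    }

    public static CompletableFuture<String> callAsync(String req) {
        return CompletableFuture
                .supplyAsync(() -> slowService(req))  // the blocking work runs on a pool thread
                .thenApply(String::toUpperCase);      // a callback; the caller never blocks here
    }
}
```

The caller submits the request, attaches callbacks, and keeps working; back pressure is absorbed by the pool instead of blocking every caller.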

(3) Lazy Loading

The last technique involves optimizing business logic and improving user experience using common design patterns such as singleton and proxy patterns. For example, when drawing a Swing window that requires displaying multiple images, you can first load a placeholder and then gradually load the required resources in the background using a separate thread. This can prevent the window from becoming unresponsive.
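Lazy loading is often combined with the singleton pattern. Below is a sketch of the initialization-on-demand holder idiom in Java, where the expensive object is built only on first access (the `initCount` field exists only to make the laziness observable in this example):

```java
public class LazyConfig {
    static int initCount = 0; // only for demonstration: counts constructions

    private static class Holder {
        // The JVM loads this nested class, and thus runs the constructor,
        // only when getInstance() is first called; class loading also
        // guarantees thread safety without explicit locks.
        static final LazyConfig INSTANCE = new LazyConfig();
    }

    private LazyConfig() {
        initCount++; // expensive setup (images, connections, ...) would go here
    }

    public static LazyConfig getInstance() {
        return Holder.INSTANCE;
    }
}
```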


3. Result Set Optimization #

Next, let’s talk about optimizing the result set. To give you a more intuitive example, we all know that XML is a very good way to represent data. So why do we still have JSON? Besides being simpler to write, one important reason is that JSON has a smaller size, leading to higher transmission and parsing efficiency. For example, Google’s Protobuf has an even smaller size. Although the readability is reduced, it can significantly improve efficiency in high-concurrency scenarios like RPC, which is a typical optimization for result sets.

This is because our current web services are based on a client/server (C/S) model. When data is transmitted from the server to its clients, it is distributed many times over, so the cost of every extra byte is multiplied with each distribution. Every reduction in size can greatly improve transmission performance and reduce costs.

For example, Nginx usually enables GZIP compression to keep the transmitted content compact. The client needs only a small amount of computing power to decompress it, and since that work is spread across the clients, the performance cost on each is small and predictable.
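As a rough illustration of how much repetitive payloads shrink, here is a sketch using the JDK's `GZIPOutputStream` (the class name `GzipDemo` is invented for this example):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    public static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8)); // compress as we write
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen for in-memory streams
        }
        return bos.toByteArray();
    }
}
```

Text formats like JSON and XML are highly repetitive, which is exactly why on-the-wire compression pays off so well for them.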

With this understanding, we can see the general idea of result set optimization: you should try to keep the returned data concise. If some fields are not needed by the client, you can remove them in your code or directly in the SQL query.

For businesses that do not require real-time responses but must process data in bulk, we can borrow from the buffering idea: batch requests together to minimize network interactions and increase processing speed.

A result set may also be used many times. You may store it in a cache and still find access too slow; in that case, optimize the data set itself, using indexes or bitmap techniques to speed up data access.
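A bitmap can be sketched with the JDK's `BitSet`: one bit per id answers membership questions in constant time with minimal memory (the class name `BitmapIndex` is hypothetical):

```java
import java.util.BitSet;

public class BitmapIndex {
    // One bit per id: 1 million ids fit in roughly 128 KB,
    // versus one boxed object per id in a HashSet.
    private final BitSet seen = new BitSet();

    public void mark(int id) {
        seen.set(id);            // flip the id-th bit on
    }

    public boolean contains(int id) {
        return seen.get(id);     // O(1) membership test
    }
}
```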


4. Resource Conflict Optimization #

In our everyday development, we often encounter shared resources. These shared resources can be single-machine resources, such as a HashMap; external storage, such as a database row; individual resources, such as a SETNX operation on a Redis key; or the coordination of multiple resources, such as transactions and distributed transactions.

Performance issues in reality are often related to locks. We usually think of database row locks, table locks, various locks in Java, and so on. At a lower level, such as CPU instruction-level locks, JVM instruction-level locks, and operating system internal locks, locks are everywhere.

Resource conflicts arise only under concurrency, when multiple requests contend for the same shared resource at the same moment. Locking resolves the conflict by ensuring that only one request at a time can acquire the resource; a transaction, for example, is fundamentally a form of lock.

By their attitude toward conflict, locks can be categorized as optimistic or pessimistic, with optimistic locks being more efficient under low contention. By scheduling policy, locks can be further divided into fair and non-fair locks, which differ subtly in how waiting threads are granted the lock.

The contention for resources can cause serious performance issues, so there have been studies on lock-free queues and other techniques, which greatly improve performance.
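The core trick behind lock-free structures is compare-and-swap (CAS), the same mechanism optimistic locks rely on. A minimal sketch with `AtomicLong` (the retry loop is written out explicitly for clarity, although `AtomicLong.incrementAndGet` already does this internally):

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasCounter {
    private final AtomicLong value = new AtomicLong();

    public long increment() {
        long current;
        do {
            current = value.get();
            // If another thread changed the value meanwhile, the CAS fails
            // and we simply retry with the fresh value -- no thread blocks.
        } while (!value.compareAndSet(current, current + 1));
        return current + 1;
    }

    public long get() {
        return value.get();
    }
}
```

Under heavy contention threads burn cycles retrying, which is the trade-off lock-free designs accept in exchange for never blocking.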


5. Algorithm Optimization #

Algorithms can significantly improve the performance of complex businesses, but in practical situations, they often come in different variations. With the decreasing cost of storage, in some CPU-intensive businesses, the trade-off of space for time is often employed to speed up processing.

Algorithm optimization falls under code optimization, which involves many coding techniques and requires a thorough understanding of the language’s APIs by the user. Sometimes, flexible usage of algorithms and data structures is also an important aspect of code optimization. For example, commonly used methods to reduce time complexity include recursion, binary search, sorting, and dynamic programming.

An excellent implementation affects a system far more than a poor one. For instance, as implementations of List, LinkedList and ArrayList differ by orders of magnitude in random access performance. Another example is CopyOnWriteArrayList, which employs a copy-on-write approach and significantly reduces lock contention in scenarios with frequent reads and infrequent writes. Knowing when synchronization is needed and what is already thread-safe also requires a high level of coding proficiency.
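The copy-on-write behavior can be sketched as follows (the class and method names are invented for illustration): an iterator over a `CopyOnWriteArrayList` walks a snapshot of the backing array, so modifying the list during iteration neither throws nor becomes visible to that iterator, whereas an `ArrayList` would throw `ConcurrentModificationException`.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CowDemo {
    public static int iterateWhileAdding() {
        List<Integer> list = new CopyOnWriteArrayList<>();
        list.add(1);
        list.add(2);
        int seen = 0;
        for (Integer ignored : list) {
            // Each add() copies the backing array; an ArrayList would throw
            // ConcurrentModificationException on the next iteration step.
            list.add(99);
            seen++;
        }
        return seen; // the iterator walked only the original two-element snapshot
    }
}
```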

Accumulating knowledge in this area requires attention and practice in our daily work. In the following lessons, important knowledge points will be discussed intermittently.

6. Efficient Implementation #

In everyday programming, it is advisable to use components with sound design and superior performance. For example, with Netty available, there is no need to choose the older Mina component. When designing a system, performance considerations also argue against verbose protocols such as SOAP. Another example: a purpose-built parser (such as one generated with JavaCC) will be much more efficient than regular expressions.

In conclusion, if bottleneck points in the system are identified through testing and analysis, it is important to replace key components with more efficient ones. In this case, the adapter pattern is very important. This is why many companies like to abstract a layer of their own on top of existing components; and when switching underlying components, the upper-level applications remain unaffected.

7. JVM Optimization #

Because Java runs on top of the JVM (Java Virtual Machine), its many features are subject to the constraints of the JVM. Optimizing the JVM can also improve the performance of Java programs to some extent. Improper configuration of JVM parameters may even lead to serious consequences such as OutOfMemoryError.

Currently, the widely used garbage collector is G1, which can reclaim memory efficiently with very little parameter tuning. The CMS garbage collector was removed in Java 14 because its GC pause times are hard to control, and it should be avoided where possible.

JVM performance tuning involves various trade-offs and often has far-reaching implications. It requires considering the impact of various aspects comprehensively. Therefore, it is particularly important to have an understanding of the internal operation principles of the JVM. It helps us gain a deeper understanding of the code and enables us to write more efficient code.

Summary #

The above are the seven main areas of code optimization. This brief introduction gives an overall picture of performance optimization. Of course, performance optimization also includes database optimization, operating-system optimization, architecture optimization, and other content; these are not our focus, and later lessons will touch on them only briefly.

Next, we will learn about some performance evaluation tools, understand some resource limitations of the operating system, and then discuss these seven optimization points in detail. It is recommended to review this lesson after analyzing case studies, which will further deepen your understanding of Java performance optimization.