36 Discuss My SQL Supported Transaction Isolation Levels and the Principles and Application Scenarios of Pessimistic and Optimistic Locking

36 Discuss MySQL supported transaction isolation levels and the principles and application scenarios of pessimistic and optimistic locking #

In daily development, especially in business development, it is essential for Java engineers to use Java for basic data operations such as CRUD (create, read, update, delete) on databases. To perform data operations well, not only mastery of Java language-related frameworks is required, but also an understanding of various database system architectures. In today’s lecture, as a supplement to the completeness of knowledge points for Java interviews, further learning is needed in practice regarding the application and details of databases.

The question I want to ask today is, talk about the transaction isolation levels supported by MySQL, as well as the principles and application scenarios of pessimistic locking and optimistic locking?

Typical Answer #

The so-called isolation level is a definition proposed in database transactions to ensure the correctness of concurrent data reading and writing. It is not a concept exclusive to MySQL, but originates from the SQL-92 standard formulated by ANSI/ISO.

Each relational database provides its own implementation of isolation levels with their own characteristics. Although they are generally defined in terms of locks, the actual implementations vary greatly. Taking the most common MySQL InnoDB engine as an example, it is based on a composite implementation of MVCC (Multi-Versioning Concurrency Control) and locks. From low to high isolation levels, MySQL transaction isolation levels can be divided into four different levels:

Read uncommitted: A transaction can see modifications made by other transactions that have not been committed. This is the lowest isolation level that allows dirty reads.
Read committed: A transaction can only see data that other transactions have already committed, ensuring that no intermediate states are visible. Dirty reads are also avoided. Read committed is still a relatively low isolation level, as it does not guarantee the same data can be obtained when read again. It allows concurrent modification of data by other transactions and allows non-repeatable reads and phantom reads to occur.
Repeatable reads: Ensures that data read multiple times in the same transaction is consistent. This is the default isolation level for the MySQL InnoDB engine. However, unlike some other database implementations, it can be simply understood that MySQL does not produce phantom reads at repeatable read level.
Serializable: Concurrency between transactions is serialized. This usually means that shared read locks are acquired for reading, and exclusive write locks are acquired for updating. If the SQL statement uses a WHERE clause, a range lock will also be acquired (MySQL uses GAP locks for implementation, which are also used by default at repeatable read level). This is the highest isolation level.

As for pessimistic locking and optimistic locking, they are not exclusive concepts of MySQL or databases, but basic concepts in concurrent programming. The main difference is that when operating on shared data, pessimistic locking assumes a higher possibility of data conflicts, while optimistic locking assumes that conflicts are unlikely to occur in most cases and decides whether to take exclusive measures.

In the context of MySQL database application development, pessimistic locking is generally achieved by using statements like SELECT…FOR UPDATE to lock data, preventing other transactions from accidentally modifying it. Optimistic locking is similar to the AtomicFieldUpdater class in the Java concurrency package, but it does not lock the data. Instead, it uses the CAS mechanism and compares timestamps or version numbers of the data to determine the required version for optimistic locking.

I believe that MVCC, mentioned earlier, can essentially be seen as a form of optimistic locking mechanism, while exclusive read/write locks and two-phase locks are implementations of pessimistic locking.

As for their application scenarios, you can consider building a simplified train ticket query and booking system. Many people may query availability at the same time, and although a specific seat can only be sold to one person, there may be many available tickets. It is also unpredictable which query requester will proceed with the booking. In this case, optimistic locking is more suitable.

Topic Analysis #

Today’s question comes from an actual interview, and these two parts reflect the interviewer’s attempt to assess whether the interviewee has studied or thought about the internal mechanisms of databases and whether they understand basic concepts and practices related to concurrency in daily application development.

From the perspective of an ordinary database application developer, I provide a relatively simplified answer. The interviewer is likely to further expand on examples, such as designing a typical scenario to reproduce dirty reads or phantom reads, or from the perspective of database design, what means can be used to avoid similar situations. I suggest that you can experiment with typical databases to validate your own views when preparing for the interview.

There are many other points that can be assessed, and when preparing for this question, you can also compare the concurrency mechanisms in the Java language for a deeper understanding. For example, as the isolation level increases from low to high, contention gradually increases, and the resulting cost is a decrease in performance and scalability.

Databases have given rise to many different career paths:

Database Administrator (DBA), which is a separate professional field.
Database Application Engineer, many business developers fall into this category, using a combination of database and other programming language skills to develop business applications.
Database Engineer, which focuses more on developing basic software such as databases and database middleware.

The latter two are more relevant to Java development, but the knowledge and skills required are different, so the focus of the interview assessment is also different. Today, I will analyze my views on learning related knowledge and preparing for interviews.

In addition, in the field of database-related areas, Java engineers most commonly encounter O/R Mapping frameworks or similar database interaction libraries, and I will select the most widely used framework for comparison and analysis.

Knowledge Expansion #

First of all, let me talk about my views on learning in the field of database. From the perspective of application developers, at least the following aspects need to be mastered:

Database design basics, including several basic normal forms in database design, basic concepts of various databases such as tables, views, indexes, foreign keys, sequence generators, etc. Understand how to map real-world business entities and their dependencies to database structures, and know what types of database data types should be used for typical entity data.
Each database has its own design and implementation differences, so at least be proficient in the design points of the databases you have used. For example, the MySQL transaction isolation level mentioned in the beginning today is different from other databases. Further understanding of mechanisms such as MVCC and Locking is very helpful for dealing with advanced problems. You also need to understand the use of different index types, even underlying data structures and algorithms.
Common SQL statements, master basic SQL tuning skills, at least understand the basic ideas, such as how to write SQL to better utilize indexes, know how to analyze SQL execution plans, etc.
Furthermore, you need to understand solutions for specific scenarios such as high-concurrency, such as read-write separation, sharding, or how to use caching mechanisms, as current data storage is not limited to traditional relational databases.

The above diagram briefly summarizes my understanding of the database field, hoping to provide you with some reference when preparing. Of course, when preparing for interviews, it is not about burying your head in a bunch of books. I still recommend starting from the databases used in practical work, focusing on practice, and improving and deepening your knowledge system.

Next, let’s go back to Java itself. Currently, the most common Java technology for interacting with databases is JDBC. The most common open source frameworks are built on top of JDBC, including JPA/Hibernate, MyBatis, Spring JDBC Template, etc., each with unique design characteristics.

Hibernate is one of the most famous O/R Mapping frameworks and it is also a JPA Provider. As the name suggests, it is object-centered. Its strengths are reflected in the mapping from the database to Java objects. It can conveniently reflect complex relationships such as foreign key constraints at the Java object level and provides powerful persistence capabilities. It internally uses techniques such as Lazy-load to improve efficiency. In order to shield the differences of databases and reduce maintenance overhead, Hibernate provides HQL, which is similar to SQL and can automatically generate certain database-specific SQL statements.

Hibernate is widely used, but excessive emphasis on persistence and isolating the underlying details of the database also leads to many drawbacks. For example, learning HQL requires additional effort and may not be more efficient than deeply learning SQL. Weakening direct control over SQL by programmers may also lead to other costs. What could originally be done with a single SQL statement may be replaced by several statements generated by Hibernate, and the hidden internal details hinder further optimization.

On the other hand, although MyBatis still provides some mapping capabilities, it focuses more on SQL. Developers can focus on SQL and stored procedures, which is very simple and direct. If our application requires a large number of high-performance or complex SELECT statements, the “semi-automatic” MyBatis will be more practical than Hibernate.

Spring JDBC Template is also closer to the SQL level, and Spring itself can integrate with O/R Mapping frameworks such as Hibernate.

Regarding the learning of these specific open source frameworks, my suggestions are:

Grasp the architecture and design principles of mainstream frameworks as a whole, and understand the main processes, such as what happens during SQL parsing and generation, SQL execution, and result mapping.
Master the details of mapping definitions and principles. According to the interview questions I have compiled while preparing for this column, I found that many questions tend to focus on the details of mapping definitions.
In addition, comparing the designs and implementations of different frameworks is not only beneficial for deepening your understanding, but also a hot topic direction in interviews.

Today, from the perspective of database application developers, I have analyzed some internal mechanisms of the MySQL database, supplemented with my suggestions for preparing for database-related interviews and knowledge learning, and finally made a brief comparison of mainstream O/R Mapping frameworks.

Practice Session #

Have you grasped the topic we discussed today? Today’s question for contemplation is: From the perspective of architectural design, which layers can MyBatis be divided into? And what are the main modules in each layer?

Please write your thoughts on this question in the comment section. I will select the most well-thought-out comments and reward you with a learning voucher. Join me in the discussion.

Are your friends also preparing for interviews? You can “invite friends to read” by sharing today’s question with them. Perhaps you can be of help to them.