01 Numeric Types Avoid Auto Increment Pitfalls

01 Numeric Types - Avoid Auto_increment Pitfalls #

When designing table structures, numeric types are one of the most common types. However, using numeric types correctly is not as simple as it seems. For example:

  • How do you design an auto-increment primary key for a high-concurrency internet business? Is INT enough?
  • How do you design the balance for an account? Is using DECIMAL type foolproof?

All of the above are wrong!

Numeric types may seem simple, but in the design of table structures, it is easy to overlook the aforementioned “lack of comprehensive design thinking” issue (especially in high-concurrency internet scenarios). Therefore, I will take you on a deep dive into the use of numeric types from the perspective of business architecture design. I hope that after learning, you can truly make good use of MySQL’s numeric types (integer types, floating-point types, and high-precision types).

Numeric Types #

Integer Types #

MySQL supports the integer types supported by the SQL standard: INT and SMALLINT. In addition, MySQL also supports integer types such as TINYINT, MEDIUMINT, and BIGINT (Table 1 shows the storage space and value range occupied by various integer types):

image.png

Value range of INT types

In integer types, there are signed and unsigned attributes, which represent the value range of the integer. By default, it is signed. When designing, I do not recommend deliberately using the unsigned attribute, because when doing some data analysis, the SQL may not return the desired result.

Let’s take an example of a “sale” table, with its table structure and data as follows. It is worth noting that the column sale_count uses the unsigned attribute (i.e., when designing, it is expected that the column stores a value greater than or equal to 0):

mysql> SHOW CREATE TABLE sale\G

*************************** 1. row ***************************
       Table: sale
Create Table: CREATE TABLE `sale` (
  `sale_date` date NOT NULL,
  `sale_count` int unsigned DEFAULT NULL,
  PRIMARY KEY (`sale_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
1 row in set (0.00 sec)

mysql> SELECT * FROM sale;
+------------+------------+
| sale_date  | sale_count |
+------------+------------+
| 2020-01-01 |      10000 |
| 2020-02-01 |       8000 |
| 2020-03-01 |      12000 |
| 2020-04-01 |       9000 |
| 2020-05-01 |      10000 |
| 2020-06-01 |      18000 |
+------------+------------+
6 rows in set (0.00 sec)

Here, sale_date represents the date of sale, and sale_count represents the monthly sales quantity. Now there is a requirement that the boss wants to track the changes in sales quantity each month in order to make business decisions. This SQL statement needs to apply to non-equality joins, but it is not too difficult to write:

SELECT
    s1.sale_date, s2.sale_count - s1.sale_count AS diff
FROM
    sale s1
        LEFT JOIN
    sale s2 ON DATE_ADD(s2.sale_date, INTERVAL 1 MONTH) = s1.sale_date
ORDER BY sale_date;

However, during execution, because the column sale_count uses the unsigned attribute, it will throw this result:

ERROR 1690 (22003): BIGINT UNSIGNED value is out of range in '(`test`.`s2`.`sale_count` - `test`.`s1`.`sale_count`)'

As you can see, MySQL indicates that the calculated result is out of range. In fact, here MySQL requires that the unsigned value after subtraction remains unsigned, otherwise an error will occur.

To avoid this error, you need to set the database parameter sql_mode to NO_UNSIGNED_SUBTRACTION, allowing the result of subtraction to be signed so that you can get the desired result:

mysql> SET sql_mode='NO_UNSIGNED_SUBTRACTION';
Query OK, 0 rows affected (0.00 sec)
SELECT
    s1.sale_date,
    IFNULL(s2.sale_count - s1.sale_count,'') AS diff
FROM
    sale s1
    LEFT JOIN sale s2 
    ON DATE_ADD(s2.sale_date, INTERVAL 1 MONTH) = s1.sale_date
ORDER BY sale_date;
+------------+-------+
| sale_date  | diff  |
+------------+-------+
| 2020-01-01 |       |
| 2020-02-01 | 2000  |
| 2020-03-01 | -4000 |
| 2020-04-01 | 3000  |
| 2020-05-01 | -1000 |
| 2020-06-01 | -8000 |
+------------+-------+
6 rows in set (0.00 sec)

Floating-Point Types and High-Precision Types #

In addition to integer types, commonly used numeric types also include floating-point and high-precision types.

In previous versions of MySQL, there were floating-point types Float and Double. However, these types are not high precision and not SQL standard types, so they are not recommended for use in real production environments. Otherwise, due to precision issues, the final calculation results may be incorrect.

Even more importantly, starting from MySQL 8.0.17, when using the Float or Double types in table creation, the following warning will be thrown: MySQL reminds users not to use the above floating-point types and even warns that floating-point types will be deprecated in future versions.

Specifying number of digits for floating point data types is deprecated and will be removed in a future release

The high-precision DECIMAL type in numeric types can be used. When declaring a column of this type, you can (and usually must) specify the precision and scale, for example:

salary DECIMAL(8,2)

Here, 8 is the precision (precision represents the main number of digits to store the value), and 2 is the scale (scale represents the number of digits to store after the decimal point). Usually, in table structure design, the DECIMAL type can be used to represent business entities such as user salary and account balance, accurate to two decimal places.

However, when used in high-concurrency internet businesses, the design of the field for monetary values is not recommended to use the DECIMAL type. It is more recommended to use the INT integer type (the reason will be analyzed later in the text).

Practical Design of Business Table Structure #

Integer Type and Auto Increment Design #

In real-world business scenarios, the most common use of integer types is to represent the quantity of an item in the business. For example, the sales quantity in the table above, or the inventory quantity and purchase count in e-commerce. Another common and important usage of integer types in enterprise is as the primary key of a table, which is used to uniquely identify a row of data.

By combining the integer type with the attribute auto_increment, the auto-increment feature can be achieved. However, when using auto-increment as the primary key in table structure design, it is important to pay special attention to the following two points. Ignoring them may lead to catastrophic consequences for the business:

  • Use BIGINT as the primary key instead of INT.
  • Auto-increment values are not persistent and may have backtracking effects (prior to MySQL 8.0).

From Table 1, it can be observed that the maximum range of INT is around 4.2 billion. In real-world internet business applications, it is easy to reach this maximum value. For example, some transaction tables or log tables may have a daily data volume of 10 million, which can hit the limit of the INT type after 420 days.

Therefore, (pay attention to point 1) always use BIGINT as the primary key for auto-increment integers, not INT. Do not use INT just to save 4 bytes. When the limit is reached, changing the table structure later will be a huge burden and pain.

This brings us to an interesting question: what will happen to the database when it reaches the limit of the INT type? Will it reset to 1? Let’s verify this with the following SQL statements:

mysql> CREATE TABLE t (
    -> a INT AUTO_INCREMENT PRIMARY KEY
    -> );

mysql> INSERT INTO t VALUES (2147483647);

Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO t VALUES (NULL);

ERROR 1062 (23000): Duplicate entry '2147483647' for key 't.PRIMARY'

As we can see, when the INT limit is reached and another auto-increment insertion is attempted, a duplicate error is reported. MySQL does not automatically reset it to 1.

The second point to pay special attention to is that (pay attention to point 2) prior to MySQL 8.0, auto-increment is not persistent, and auto-increment values may have backtracking issues!

mysql> SELECT * FROM t;
+---+
| a |
+---+
| 1 |
| 2 |
| 3 |
+---+
3 rows in set (0.01 sec)

mysql> DELETE FROM t WHERE a = 3;
Query OK, 1 row affected (0.02 sec)

mysql> SHOW CREATE TABLE t\G
*************************** 1. row ***************************
Table: t
Create Table: CREATE TABLE `t` (
  `a` int NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
1 row in set (0.00 sec)

As we can see, after deleting the record with an auto-increment value of 3, the next auto-increment value remains as 4 (AUTO_INCREMENT=4). This is not an error, as auto-increment does not backtrack. However, if the database restarts at this point, the auto-increment starting value of table t will again become 3, resulting in auto-increment value backtracking. Here is an example:

mysql> SELECT * FROM t;
+---+
| a |
+---+
| 1 |
| 2 |
| 3 |
+---+
3 rows in set (0.00 sec)

Please note that this behavior is only applicable to versions prior to MySQL 8.0.

mysql> SHOW CREATE TABLE t\G

*************************** 1. row ***************************

       Table: t

Create Table: CREATE TABLE `t` (

  `a` int NOT NULL AUTO_INCREMENT,

  PRIMARY KEY (`a`)

) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci

1 row in set (0.00 s

To completely solve this problem, there are two methods:

  • Upgrade MySQL to version 8.0. The auto-increment value for each table will be persisted.
  • If upgrading the database is not possible, it is strongly not recommended to use the auto-increment data type as the primary key in core business tables.

In fact, in the process of designing mass internet architecture, for better scalability of distributed architecture in the future, it is not recommended to use integer types as primary keys, but it is more recommended to use string types (this part will be described in detail in section 05).

Designing Money Fields #

In the design of business entities such as user balance, fund account balance, digital wallet, and change, because these fields are all money fields, programmers usually use the DECIMAL type as the field type, because this allows for precision up to two decimal places, for example: DECIMAL(8,2).

CREATE TABLE User (

  userId BIGINT AUTO_INCREMENT,

  money DECIMAL(8,2) NOT NULL,

  ......

)

(Important Note 3) In the design standard of mass internet business, it is not recommended to use the DECIMAL type, but it is more recommended to convert DECIMAL to an integer type. In other words, it is more recommended to store money types in cents instead of dollars. For example, representing 1 dollar in the database as the integer value 100.

If a DECIMAL type is used to represent the range of values for an amount field, how should the length be defined? Because the DECIMAL type is a variable-length field, if an amount field needs to be defined, defining it as DECIMAL(8,2) is far from enough. This can only represent storing a maximum value of 999999.99, which is limited to storing millions of dollars.

User balances should at least store fields in the billions, while GDP values by the National Bureau of Statistics may reach trillions. Defining them with the DECIMAL type does not provide a good standard.

Another important point is that the DECIMAL type is implemented using a binary encoding method, and its calculation efficiency is far less efficient than using integer types. Therefore, it is recommended to use BIG INT to store amount-related fields.

Even if the fields are stored in cents, BIG INT can still store a trillion-level amount. Here, 1 trillion = 10 trillion.

The advantage of this is that all amount-related fields are fixed-length fields, occupying 8 bytes of storage space. Another benefit is that direct calculations using integer types are more efficient.

Note that in database design, we strongly emphasize storing fixed-length data, because fixed-length storage has better performance.

Let’s take a look at the storage format in the database, roughly as follows:

图片5.png

If an update occurs, the original space occupied by record 1 cannot accommodate the updated record 1, so the database will mark record 1 as deleted and find new space for record 1 to use, as shown below:

图片6.png

In the above figure, *Record 1 refers to the space originally occupied by record 1, and this space will become fragmented space in the future and cannot be reused unless the table space is manually defragmented.

So, when using BIG INT to store amount fields, how should decimal point data be represented? In fact, this part can be completely handled and displayed by the frontend. As the database itself, it only needs to store the amount in cents.

Summary #

Today, I took you deep into the practical aspects of MySQL numerical types in table structure design. Let me summarize today’s key points:

  • It is not recommended to use the Unsigned attribute for integer types. If it must be used, the parameter sql_mode must include the NO_UNSIGNED_SUBTRACTION option.
  • When using auto-increment integer types as primary keys, it is necessary to use the BIGINT type instead of INT. The cost of adjusting the table structure later can be huge.
  • Before MySQL version 8.0, there is a rollback problem with auto-increment integers. As a business developer, you must be aware of this problem.
  • When the upper limit value of the auto-increment integer is reached and an insert operation is performed again, MySQL will report a duplicate error.
  • Do not use floating-point types Float and Double. MySQL will no longer support these two types in future versions.
  • For account balance fields, it is better to design them using integer types instead of DECIMAL types. This provides better performance and more compact storage.