04 Non Structured Storage Make Good Use of JSON

04 Non-Structured Storage - Make Good Use of JSON #

In the previous few lectures, I have introduced you to the three common types in the MySQL database: numeric types, string types, and date types. However, they all belong to the category of traditional relational design.

Relational structured storage has certain drawbacks because it requires predefined columns and their corresponding types. However, during the development of a business, it may be necessary to expand the description function of a single column. In this case, if the JSON data type can be used well, it can bridge the gap between relational and non-relational data storage, providing better architectural choices for the business.

Of course, many students encounter various problems when using the JSON data type, and the most common misconception is to simply think of the JSON type as a string type. However, after learning today’s content, you will truly understand the power of JSON data types, and thereby store unstructured data better in practical work.

JSON Data Types #

JSON (JavaScript Object Notation) is mainly used for data exchange between internet application services. MySQL supports the JSON specification defined by RFC 7159, mainly includes two types: JSON object and JSON array. The following is a JSON object mainly used to store related information of an image:

{
    "Image": {
        "Width": 800,
        "Height": 600,
        "Title": "View from 15th Floor",
        "Thumbnail": {
            "Url": "http://www.example.com/image/481989943",
            "Height": 125,
            "Width": 100
        },
        "IDs": [116, 943, 234, 38793]
    }
}

From this example, you can see that the JSON type can well describe the relevant content of the data, such as the width, height, and title of this image (integer and string types are used here).

In addition to supporting string types, integer types, and date types, JSON objects can also support array types. For example, the IDs field in the above code.

Another type of JSON data is the array type, such as:

[
    {
        "precision": "zip",
        "Latitude": 37.7668,
        "Longitude": -122.3959,
        "Address": "",
        "City": "SAN FRANCISCO",
        "State": "CA",
        "Zip": "94107",
        "Country": "US"
    },
    {
        "precision": "zip",
        "Latitude": 37.371991,
        "Longitude": -122.026020,
        "Address": "",
        "City": "SUNNYVALE",
        "State": "CA",
        "Zip": "94085",
        "Country": "US"
    }
]

The above example demonstrates a JSON array with two JSON objects.

So far, many students may treat JSON as a large string type, which seems to be correct at first glance. But in essence, JSON is a new type with its own storage format. It can also create indexes on each corresponding field and do specific optimizations, which cannot be achieved by traditional strings. Another advantage of the JSON type is that there is no need to predefine fields, the fields can be expanded infinitely. On the other hand, traditional relational databases require predefined columns, and to expand, heavy operations such as ALTER TABLE … ADD COLUMN … are needed.

It should be noted that JSON types are supported starting from MySQL 5.7, and version 8.0 solves the performance bottleneck of updating JSON logs. If you want to use JSON data types in a production environment, it is strongly recommended to use MySQL 8.0.

By now, you already have a basic understanding of the JSON type. Next, let’s move on to the practical part: how to make good use of the JSON type in business?

Business Table Structure Design in Practice #

User Login Design #

In the database, the JSON type is more suitable for storing data that is modified less frequently and relatively static, such as storing user login information as follows:

DROP TABLE IF EXISTS UserLogin;

CREATE TABLE UserLogin (
    userId BIGINT NOT NULL,
    loginInfo JSON,
    PRIMARY KEY(userId)
);

Due to the increasing diversification of login methods in the current business, such as supporting login with phone numbers, WeChat, and QQ accounts under the same account, the JSON type can be used to store login information.

Next, insert the following data:

SET @a = '
{
    "cellphone": "13918888888",
    "wxchat": "破产码农",
    "QQ": "82946772"
}
';

INSERT INTO UserLogin VALUES (1,@a);

SET @b = '
        uk_idx_cellphone VARCHAR(255),
    
        CONSTRAINT uk_idx_cellphone UNIQUE (uk_idx_cellphone),
    
        INDEX(idx_cellphone) USING BTREE (cellphone)
    
    );
    UNIQUE KEY uk_idx_cellphone(cellphone)

);

User Profile Design #

Some businesses require user profiling (tagging users), and then recommend corresponding products based on user tags through data mining techniques. For example:

  • In the e-commerce industry, recommend relevant products based on the user’s fashion preferences.
  • In the music industry, recommend corresponding songs based on the user’s favorite music style and frequently listened to artists.
  • In the financial industry, recommend corresponding financial products based on the user’s risk preferences and investment experience.

Here, I strongly recommend using the JSON type to store user profile information in the database, and combining the characteristics of JSON array type and multi-value indexes for efficient querying. Suppose there is a profile definition table:

CREATE TABLE Tags (

    tagId bigint auto_increment,

    tagName varchar(255) NOT NULL,

    primary key(tagId)

);

SELECT * FROM Tags;

+-------+--------------+

| tagId | tagName      |

+-------+--------------+

|     1 | 70s          |

|     2 | 80s          |

|     3 | 90s          |

|     4 | 2000s        |

|     5 | Sports Lover |

|     6 | Highly Educated  |

|     7 | Petite Bourgeois  |

|     8 | Owns a House  |

|     9 | Owns a Car  |

|    10 | Movie Lover  |

|    11 | Love Online Shopping  |

|    12 | Love Order Takeout  |

+-------+--------------+

As you can see, the Tags table is a profile definition table that describes how many tags are currently defined. Then, each user is assigned tags. For example, David’s tags are ’80s’, ‘Highly Educated’, ‘Petite Bourgeois’, ‘Owns a House’, ‘Movie Lover’; Tom’s tags are ’90s’, ‘Movie Lover’, ‘Love Order Takeout’.

If you don’t use JSON data type for tag storage, you usually store all user tags in one field using a string and a delimiter:

+-------+---------------------------------------+

| User  | Tags                                  |

+-------+---------------------------------------+

| David | 80s;Highly Educated;Petite Bourgeois;Owns a House;Movie Lover |

| Tom   | 90s;Movie Lover;Love Order Takeout |

+-------+---------------------------------------+

The drawback of doing this is: it is not convenient to search for users with specific profiles. In addition, the delimiter is a self-convention, and other data can be stored in the database arbitrarily, which will create dirty data.

Using the JSON data type can solve this problem well:

DROP TABLE IF EXISTS UserTag;

CREATE TABLE UserTag (

    userId bigint NOT NULL,

    userTags JSON,

    PRIMARY KEY (userId)

);

INSERT INTO UserTag VALUES (1,'[2,6,8,10]');

INSERT INTO UserTag VALUES (2,'[3,10,12]');

In the above, the tags stored in userTags are the tag values defined in the Tags table, but they are stored using the JSON array type.

Starting from MySQL 8.0.17, Multi-Valued Indexes are supported for creating indexes on JSON arrays, and functions such as member of, json_contains, json_overlaps can be used to quickly retrieve indexed data. So you can create Multi-Valued Indexes on the UserTag table:

ALTER TABLE UserTag

ADD INDEX idx_user_tags ((cast((userTags->"$") as unsigned array)));

If you want to query users with the profile “Movie Lover”, you can use the MEMBER OF function:

EXPLAIN SELECT * FROM UserTag 

WHERE 10 MEMBER OF(userTags->"$")\G

*************************** 1. row ***************************

           id: 1

  select_type: SIMPLE

        table: UserTag

   partitions: NULL

         type: ref

possible_keys: idx_user_tags

          key: idx_user_tags

      key_len: 9

          ref: const

         rows: 1

     filtered: 100.00

        Extra: Using where

1 row in set, 1 warning (0.00 sec)

SELECT * FROM UserTag 

WHERE 10 MEMBER OF(userTags->"$");

+--------+---------------+

| userId | userTags      |

+--------+---------------+

|      1 | [2, 6, 8, 10] |

|      2 | [3, 10, 12]   |

+--------+---------------+

2 rows in set (0.00 sec)

If you want to query users with the profiles “80s” and “Movie Lover”, you can use the JSON_CONTAINS function:

EXPLAIN SELECT * FROM UserTag 

WHERE JSON_CONTAINS(userTags->"$", '[2,10]')\G

*************************** 1. row ***************************

           id: 1

  select_type: SIMPLE

        table: UserTag

   partitions: NULL

         type: range

possible_keys: idx_user_tags

          key: idx_user_tags

      key_len: 9

          ref: NULL

         rows: 3

     filtered: 100.00

        Extra: Using where

1 row in set, 1 warning (0.00 sec)

SELECT * FROM UserTag 

WHERE JSON_CONTAINS(userTags->"$", '[2,10]');

+--------+---------------+

| userId | userTags      |

+--------+---------------+

|      1 | [2, 6, 8, 10] |

+--------+---------------+

1 row in set (0.00 sec)

If you want to query users with the profiles “80s”, “90s”, and “Movie Lover”, you can use the JSON_OVERLAPS function:

EXPLAIN SELECT * FROM UserTag 

WHERE JSON_OVERLAPS(userTags->"$", '[2,3,10]')\G

*************************** 1. row ***************************

           id: 1

  select_type: SIMPLE

        table: UserTag

   partitions: NULL

         type: range

possible_keys: idx_user_tags

          key: idx_user_tags

      key_len: 9

          ref: NULL

         rows: 4

     filtered: 100.00

        Extra: Using where

1 row in set, 1 warning (0.00 sec)

SELECT * FROM UserTag 

WHERE JSON_OVERLAPS(userTags->"$", '[2,3,10]');

+--------+---------------+

| userId | userTags      |

+--------+---------------+

|      1 | [2, 6, 8, 10] |

|      2 | [3, 10, 12]   |

+--------+---------------+

2 rows in set (0.01 sec)

Summary #

The JSON type is a data type added in MySQL 5.7, and effectively solving many practical problems in business by using the JSON data type. Lastly, I summarize today’s key points:

  • Use the JSON data type, and it is recommended to use MySQL version 8.0.17 or above for better performance, which also supports Multi-Valued Indexes.
  • The advantages of the JSON data type include not having to define columns in advance, as the data itself has good descriptive properties.
  • Do not store obvious relational data using JSON, such as user balance, user name, user ID card, etc., as these are data that must be included for each user.
  • The JSON data type is recommended for storing static data that is not frequently updated.