30 How to Correctly Store and Transmit Sensitive Data

Today, let’s discuss how to store and transmit sensitive information such as usernames, passwords, and ID cards from a security perspective. At the same time, you can further review the hash function, symmetric encryption, asymmetric encryption algorithms, and related knowledge such as HTTPS.

How Should User Passwords be Stored? #

The most sensitive data is undoubtedly the user’s password. Once a hacker steals a user’s password, they may be able to log into the user’s account, deplete their assets, post malicious information, and more. What’s even more frightening is that some users consistently use the same password, so once the password is leaked, hackers can use it to log in to various accounts across the internet.

To prevent password leaks, the most important principle is not to store user passwords. You might find this amusing, but what I mean is not to store the raw password. This way, even if the database is compromised, user passwords will not be leaked.

I often hear people say not to store passwords in plaintext and that passwords should be stored by encrypting them using MD5. This is indeed a correct direction, but this statement is not entirely accurate.

Firstly, MD5 is not actually an encryption algorithm. What we call an encryption algorithm is one that can use a key to encrypt plaintext into ciphertext, and then use the same key to decrypt the ciphertext back into plaintext. In other words, it is a two-way process.

However, MD5 is a hashing or digest algorithm. No matter how long the data is, the result of applying the MD5 algorithm is always a fixed-length digest or fingerprint, which cannot be decrypted back into the original data. Therefore, MD5 is a one-way process. Most importantly, using MD5 alone to hash passwords is not safe.
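To make the fixed-length, one-way nature of a digest concrete, here is a minimal sketch that uses only the JDK's MessageDigest. The class name Md5DigestDemo and the helper md5Hex are illustrative names, not part of the original project (which uses DigestUtils from a library):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class: shows that MD5 output length does not depend on input length.
class Md5DigestDemo {
    // Returns the MD5 digest of the input as 32 lowercase hex characters.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            // Zero-pad to 32 hex characters (128 bits).
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        // A 1-character input and a 10,000-character input both yield 32 hex characters.
        System.out.println(md5Hex("a").length());
        System.out.println(md5Hex("a".repeat(10_000)).length());
    }
}
```

There is no corresponding "undo" call in the API: the only way back from a digest to its input is to guess inputs and compare digests.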

For example, the following code saves user information and calculates the MD5 hash of the password:

UserData userData = new UserData();
userData.setId(1L);
userData.setName(name);
// Save the password field using MD5 hash
userData.setPassword(DigestUtils.md5Hex(password));
return userRepository.save(userData);

The output shows that the password is a 32-character MD5 hash:

"password": "325a2cc052914ceeb8c19016c091d2ac"

By entering this MD5 hash into a certain MD5 cracking website, the original password is obtained in less than 1 second:

In fact, you can think about it: although MD5 cannot be decrypted, we can build a huge database by computing the MD5 hash of every combination of digits and letters up to a certain length and storing the results. Then, to reverse a hash, we simply look it up in the database and read off the original value. A rainbow table is a space-saving form of such a precomputed table.

Currently, some MD5 cracking websites use rainbow tables, a technique that trades time against space: using more storage shortens the cracking time, and accepting a longer cracking time reduces the storage needed.
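The lookup idea can be sketched on a tiny key space. The example below precomputes the MD5 hash of every 4-digit PIN and then reverses a "leaked" hash with a single map lookup. It is a plain lookup table, not a real rainbow table (which stores hash chains to save space), and the class and method names are made up for illustration:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: a precomputed digest -> plaintext table for 4-digit PINs.
class Md5LookupTableDemo {
    static String md5Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Precomputes the hash of every PIN from 0000 to 9999.
    static Map<String, String> buildTable() {
        Map<String, String> table = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            String pin = String.format("%04d", i);
            table.put(md5Hex(pin), pin);
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> table = buildTable();
        String leakedHash = md5Hex("0427"); // what an attacker would find in the database
        System.out.println(table.get(leakedHash)); // recovered with one lookup
    }
}
```

The table costs time and space once, and every later lookup is practically free, which is why unsalted fast hashes of short passwords fall so quickly.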

Additionally, you might think that using multiple rounds of MD5 is more secure, but it is not the case. For example, the following code uses two rounds of MD5 hashing:

userData.setPassword(DigestUtils.md5Hex(DigestUtils.md5Hex(password)));

This results in the following MD5 hash:

"password": "ebbca84993fe002bac3a54e90d677d09"

This password can also be cracked, and the cracking website even informs us that it is the result of two rounds of MD5 hashing:

Therefore, directly storing the MD5 hash of the password is not secure. Some people may say that salting is necessary. Yes, but if done improperly, salting can still be very insecure. There are two important points to consider.

Firstly, the salt should not be hardcoded in the code, and it should be reasonably long. The following code violates both rules:

userData.setPassword(DigestUtils.md5Hex("salt" + password));

This results in the following MD5 hash:

"password": "58b1d63ed8492f609993895d6ba6b93a"

For this MD5 hash, although it cannot be found on a cracking website, a hacker can register an account and use a simple password, such as “1”:

"password": "55f312f84e7785aa1efa552acbf251db"

Then, they can try this MD5 hash on a cracking website and obtain the original input, “salt1”. Since they know their own password is “1”, this reveals that the salt is “salt”:

In fact, knowing the salt value is not that important. The key issue is that we hardcoded the salt in the code and the salt is short, and all users share the same salt value. Doing this has three problems:

Since the salt is too short and simple, if the user’s original password is also simple, then the combined password is also short. In this case, most MD5 cracking websites can directly decrypt this MD5 hash by removing the salt, revealing the original password.
The same salt means that users who use the same password will have the same MD5 hash. Knowing one user’s password may reveal multiple other passwords.
We can also use this salt to construct a rainbow table. Although it may incur substantial costs, once it is constructed, everyone’s password can be cracked.

Therefore, it is best to have a unique and sufficiently long salt value for each password, for example, exceeding 20 characters.

Secondly, although it is recommended that each person’s salt be different, I do not recommend using a portion of the user’s data as the salt. For example, using the username as the salt:

userData.setPassword(DigestUtils.md5Hex(name + password));

If all systems in the world store passwords in this way, then users like “root” or “admin” will eventually have their complex passwords cracked, as hackers can create rainbow tables specifically for these commonly used usernames. Therefore, the salt should be a random value that is globally unique, meaning there is no pre-existing rainbow table available to use.

The correct approach is to use a globally unique, user-independent, and sufficiently long random value as the salt. For example, UUID can be used as the salt, and the salt is saved together with the password in the database:

userData.setSalt(UUID.randomUUID().toString());
userData.setPassword(DigestUtils.md5Hex(userData.getSalt() + password));

Every time the user changes the password, a new salt should be generated and saved together with the new password hash. You might ask: since the salt is stored in the database, wouldn’t it be exposed if the database is compromised? Shouldn’t it be encrypted?

In my opinion, it is not necessary to encrypt salt for storage. The purpose of salt is to prevent quick “decryption” of passwords through rainbow tables. If each user has a unique salt, then generating a rainbow table will only yield one user’s password, which reduces the motivation for hackers.
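The effect of a per-user random salt can be sketched with the JDK alone. In this illustration the salt comes from SecureRandom rather than a UUID, and SHA-256 stands in for the digest. Both are substitutions made for the example, and the class and method names are hypothetical:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical demo class: same password, different salts, different stored hashes.
class RandomSaltDemo {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates a fresh 16-byte random salt, Base64-encoded so it can be stored as text.
    static String newSalt() {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);
        return Base64.getEncoder().encodeToString(salt);
    }

    // Hashes salt + password; the salt is stored in plaintext next to the result.
    static String hash(String salt, String password) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest((salt + password).getBytes(StandardCharsets.UTF_8));
            return String.format("%064x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    public static void main(String[] args) {
        String saltA = newSalt();
        String saltB = newSalt();
        // Two users pick the same password but end up with unrelated hashes.
        System.out.println(hash(saltA, "Abcd1234"));
        System.out.println(hash(saltB, "Abcd1234"));
        // Verification recomputes the hash with the stored salt.
        System.out.println(hash(saltA, "Abcd1234").equals(hash(saltA, "Abcd1234")));
    }
}
```

Because verification needs the salt in order to recompute the hash, the salt has to be readable by the application anyway, which is one more reason encrypting it adds little.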

A better approach is to not use fast digest algorithms like MD5, but to use slower ones. For example, Spring Security has deprecated the use of MessageDigestPasswordEncoder and recommends using BCryptPasswordEncoder, which is based on BCrypt for password hashing. BCrypt is designed for password storage and is much slower than MD5.

Let’s write some code to test the performance of MD5 and BCrypt with different cost factors, and see how long it takes to hash a password.

private static BCryptPasswordEncoder passwordEncoder = new BCryptPasswordEncoder();

@GetMapping("performance")
public void performance() {
    StopWatch stopWatch = new StopWatch();
    String password = "Abcd1234";

    stopWatch.start("MD5");
    // MD5
    DigestUtils.md5Hex(password);
    stopWatch.stop();

    stopWatch.start("BCrypt(10)");
    // BCrypt with cost factor 10: gensalt returns the salt, hashpw returns the hash
    String salt1 = BCrypt.gensalt(10);
    System.out.println(BCrypt.hashpw(password, salt1));
    stopWatch.stop();

    stopWatch.start("BCrypt(12)");
    // BCrypt with cost factor 12
    String salt2 = BCrypt.gensalt(12);
    System.out.println(BCrypt.hashpw(password, salt2));
    stopWatch.stop();

    stopWatch.start("BCrypt(14)");
    // BCrypt with cost factor 14
    String salt3 = BCrypt.gensalt(14);
    System.out.println(BCrypt.hashpw(password, salt3));
    stopWatch.stop();

    log.info("{}", stopWatch.prettyPrint());
}

We can see that MD5 only takes 0.8 milliseconds, while three BCrypt hashes (with cost factors of 10, 12, and 14 respectively) take 82 milliseconds, 312 milliseconds, and 1.2 seconds:

This means that if it takes 5 months to create a rainbow table for 8-character passwords using MD5, it would take several decades for BCrypt, which most hackers do not have the patience for.
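If you want to reproduce the effect of a growing cost factor without Spring Security on the classpath, the sketch below uses PBKDF2 from the JDK as a stand-in for BCrypt. It is a different algorithm, chosen here only because the JDK ships it, and the mapping "cost means 2^cost iterations" is an assumption of this example that mirrors how BCrypt interprets its cost factor:

```java
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import java.security.GeneralSecurityException;

// Hypothetical demo class: work grows exponentially with the cost parameter.
class SlowHashTimingDemo {
    // Derives a 256-bit hash using 2^cost PBKDF2-HMAC-SHA256 iterations.
    static byte[] slowHash(char[] password, byte[] salt, int cost) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, 1 << cost, 256);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        char[] password = "Abcd1234".toCharArray();
        byte[] salt = new byte[16]; // fixed all-zero salt, acceptable only for timing
        for (int cost : new int[] {10, 12, 14}) {
            long start = System.nanoTime();
            slowHash(password, salt, cost);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("cost " + cost + ": " + micros + " us");
        }
    }
}
```

Each step of 2 in the cost quadruples the number of iterations, so the time per guess should grow by roughly the same factor. The absolute numbers depend on the machine.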

Let’s write some code to observe the pattern of password hashes generated by BCryptPasswordEncoder:

@GetMapping("better")
public UserData better(@RequestParam(value = "name", defaultValue = "zhuye") String name, @RequestParam(value = "password", defaultValue = "Abcd1234") String password) {
    UserData userData = new UserData();
    userData.setId(1L);
    userData.setName(name);
    //Save hashed password
    userData.setPassword(passwordEncoder.encode(password));
    userRepository.save(userData);
    //Check if password matches
    log.info("match ? {}", passwordEncoder.matches(password, userData.getPassword()));
    return userData;
}

We can observe three patterns.

First, when we call the encode and matches methods for hashing and password comparison, we don’t need to pass in the salt. BCrypt incorporates the salt as part of the algorithm and forces us to follow the best practice of securely storing passwords.

Second, the generated salt is concatenated with the hashed password: $ is the field delimiter, the value after the first $ is the algorithm version, the value after the second $ is the cost factor (the default is 10, meaning 2 to the power of 10 hash iterations), and after the third $ come the 22 characters of salt followed by the digest. Therefore, we don’t need a separate database field to store the salt.

"password": "$2a$10$wPWdQwfQO2lMxqSIb6iCROXv7lKnQq5XdMO96iCYCj7boK9pk6QPC"
//Format is: $<version>$<cost>$<salt><digest>

Third, the higher the cost factor value, the longer the time it takes to hash with BCrypt. Therefore, the recommended practice for the cost factor value is to set it as high as possible based on the user’s tolerance and hardware.
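The layout described in the second point can be checked by splitting the sample hash into its parts. This is only a string-parsing sketch for illustration (the class and method names are invented); real verification should always go through passwordEncoder.matches:

```java
// Hypothetical demo class: splits "$<version>$<cost>$<salt><digest>" into its fields.
class BcryptHashFormatDemo {
    // Returns {version, cost, salt, digest} for a BCrypt hash string.
    static String[] parse(String hash) {
        // The string starts with '$', so the first array element is empty.
        String[] fields = hash.split("\\$");
        if (fields.length != 4 || fields[3].length() != 53) {
            throw new IllegalArgumentException("not a BCrypt hash: " + hash);
        }
        String salt = fields[3].substring(0, 22);  // first 22 characters
        String digest = fields[3].substring(22);   // remaining 31 characters
        return new String[] {fields[1], fields[2], salt, digest};
    }

    public static void main(String[] args) {
        String[] parts = parse("$2a$10$wPWdQwfQO2lMxqSIb6iCROXv7lKnQq5XdMO96iCYCj7boK9pk6QPC");
        System.out.println("version = " + parts[0]);
        System.out.println("cost    = " + parts[1]);
        System.out.println("salt    = " + parts[2]);
        System.out.println("digest  = " + parts[3]);
    }
}
```

Because the version and cost travel with every hash, the cost factor can be raised later for new passwords without breaking verification of the old ones.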

Finally, it is important to note that while it is difficult for hackers to crack passwords using rainbow tables, it is still possible to use brute force attacks, where common passwords are systematically attempted for the same username. Therefore, in addition to properly hashing and storing passwords, we should also implement a comprehensive security defense mechanism that can detect and defend against brute force attacks, such as enabling SMS verification, CAPTCHA, and temporary account lockout.

How to save names and ID cards? #

We call the name and ID card the two elements.

Nowadays, many services can be handled online, and many websites rely only on the two elements to confirm who you are. Therefore, the two elements are sensitive data. If they are stored in plaintext, a hacker who breaches the database can obtain two-element information in bulk. If that information is then used to apply for loans, the consequences can be unimaginable.

As mentioned earlier, the one-way hash algorithm is obviously not suitable for encrypting and storing the two elements because the data cannot be decrypted. At this time, we need to choose a real encryption algorithm. The available algorithms include symmetric encryption and asymmetric encryption algorithms.

A symmetric encryption algorithm uses the same key for encryption and decryption. If two parties use symmetric encryption to protect their communication, they must first agree on a key so that the sender can encrypt and the receiver can decrypt. If the key is stolen during transmission, the encryption becomes meaningless. Symmetric encryption is therefore relatively fast, but the key can leak while it is being transmitted and distributed.

An asymmetric encryption algorithm, also called a public-key algorithm, uses a key pair. Encryption is done with the public key, and decryption is done with the private key. The public key can be freely disclosed, but the private key must be kept secret. With asymmetric encryption, the two parties only need to share the public key, and data encrypted with it cannot be decrypted without the private key. Asymmetric encryption is therefore relatively slow, but it solves the problem of distributing keys securely.
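As a rough illustration of the public-key and private-key roles, the sketch below generates an RSA key pair, encrypts with the public key, and decrypts with the private key. RSA with OAEP padding is one common choice and is an assumption of this example, as are the class and method names:

```java
import javax.crypto.Cipher;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;

// Hypothetical demo class: public key encrypts, private key decrypts.
class AsymmetricRoundTripDemo {
    private static final String TRANSFORMATION = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    // Encrypts the message with a fresh public key and returns what the private key recovers.
    static String roundTrip(String message) {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair keyPair = generator.generateKeyPair();

            // Anyone holding the public key can encrypt.
            Cipher encryptor = Cipher.getInstance(TRANSFORMATION);
            encryptor.init(Cipher.ENCRYPT_MODE, keyPair.getPublic());
            byte[] cipherText = encryptor.doFinal(message.getBytes(StandardCharsets.UTF_8));

            // Only the holder of the private key can decrypt.
            Cipher decryptor = Cipher.getInstance(TRANSFORMATION);
            decryptor.init(Cipher.DECRYPT_MODE, keyPair.getPrivate());
            return new String(decryptor.doFinal(cipherText), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("300000000000001234"));
    }
}
```

Note that RSA can only encrypt short inputs directly (well under the key size), which is another practical reason it is normally used to protect keys rather than bulk data.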

However, when storing sensitive information, both encryption and decryption are performed by our own server program, so secure key distribution is not a major concern and asymmetric encryption offers little benefit. Here, we use a symmetric encryption algorithm to encrypt the data.

Next, I will focus on symmetric encryption algorithms. Commonly used ones include DES, 3DES, and AES.

Although many old projects still use DES, I do not recommend it. In the DES Challenge III in 1999, a DES key was cracked in less than a day, and cracking one is even faster now. Using DES to encrypt data is therefore very unsafe and should be avoided in business code.

As for 3DES, it applies DES three times in a row with different keys. Although this fixes the insufficient security of DES, it is slower than AES and is not particularly recommended either.

AES is the currently recognized symmetric encryption algorithm that is considered secure and has good performance. Strictly speaking, AES is not an actual algorithm name, but an algorithm standard. In 2000, NIST selected the Rijndael algorithm as the standard for AES.

AES is a block cipher: it can only process 128 bits of plaintext at a time and produces 128 bits of ciphertext. To encrypt a longer plaintext, the cipher has to be applied block by block, and the way the blocks are iterated is called the mode. Much of the AES code found on the Internet uses the simplest mode, ECB (Electronic Codebook), whose basic structure is as follows:

As can be seen, this structure has two risks. First, plaintext and ciphertext blocks correspond one to one, so if the plaintext contains duplicate blocks, the same duplicates appear in the ciphertext and reveal patterns in the data. Second, because each block is encrypted and decrypted independently, an attacker who reorders the ciphertext blocks also reorders the plaintext, manipulating it without ever decrypting the ciphertext.

Let’s write some code to test it. In the code below, we test using the ECB mode:

Encrypt a 16-character string to obtain ciphertext A; then copy this string to form a 32-character string and encrypt it again to obtain ciphertext B. We verify whether ciphertext B is the same as ciphertext A repeated.

Simulate a bank transfer scenario, assuming the entire data consists of the sender’s account number, receiver’s account number, and amount fields. We attempt to change the order of the data in the ciphertext to manipulate the plaintext:

private static final String KEY = "secretkey1234567"; // the key

// Test ECB mode
@GetMapping("ecb")
public void ecb() throws Exception {
    Cipher cipher = Cipher.getInstance("AES/ECB/NoPadding");
    test(cipher, null);
}

// Method to obtain encryption key
private static SecretKeySpec setKey(String secret) {
    return new SecretKeySpec(secret.getBytes(), "AES");
}

// Test logic
private static void test(Cipher cipher, AlgorithmParameterSpec parameterSpec) throws Exception {
    // Initialize Cipher
    cipher.init(Cipher.ENCRYPT_MODE, setKey(KEY), parameterSpec);
    // Encrypt test text
    System.out.println("Once: " + Hex.encodeHexString(cipher.doFinal("abcdefghijklmnop".getBytes())));
    // Encrypt the test text repeated once
    System.out.println("Twice: " + Hex.encodeHexString(cipher.doFinal("abcdefghijklmnopabcdefghijklmnop".getBytes())));

    // Test manipulating plaintext by manipulating ciphertext
    // Sender's account
    byte[] sender = "1000000000012345".getBytes();
    // Receiver's account
    byte[] receiver = "1000000000034567".getBytes();
    // Transfer amount
    byte[] money = "0000000010000000".getBytes();
    // Encrypt sender's account
    System.out.println("Sender's account: " + Hex.encodeHexString(cipher.doFinal(sender)));
    // Encrypt receiver's account
    System.out.println("Receiver's account: " + Hex.encodeHexString(cipher.doFinal(receiver)));
    // Encrypt amount
    System.out.println("Amount: " + Hex.encodeHexString(cipher.doFinal(money)));
    // Encrypt the complete transfer information
    byte[] result = cipher.doFinal(ByteUtils.concatAll(sender, receiver, money));
    System.out.println("Complete data: " + Hex.encodeHexString(result));

    // Temporary byte array for manipulating ciphertext
    byte[] hack = new byte[result.length];
    // Swap the first two sections of the ciphertext
    System.arraycopy(result, 16, hack, 0, 16);
    System.arraycopy(result, 0, hack, 16, 16);
    System.arraycopy(result, 32, hack, 32, 16);
    cipher.init(Cipher.DECRYPT_MODE, setKey(KEY), parameterSpec);
    // Attempt decryption
    System.out.println("Original plaintext: " + new String(ByteUtils.concatAll(sender, receiver, money)));
    System.out.println("Manipulated ciphertext: " + new String(cipher.doFinal(hack)));
}

The output is as follows:

As you can see:

The ciphertext generated by two identical plaintext blocks is two identical ciphertext blocks stacked together.

Without knowing the key, we manipulated the ciphertext to modify the plaintext data by swapping the sender’s and receiver’s account.

Therefore, although ECB mode is simple, it is not secure and is not recommended. Let’s take a look at another commonly used mode, CBC. In CBC mode, each plaintext block is XORed with the previous ciphertext block before it is encrypted, and the XOR is undone after decryption. The first block uses an externally provided initialization vector (IV) in place of a previous ciphertext block. This ensures that identical plaintext blocks produce different ciphertext blocks, and that the blocks cannot be arbitrarily reordered. This solves the problems of ECB mode:

We modify the previous code to use CBC mode and test again:

private static final String initVector = "abcdefghijklmnop"; // initialization vector

@GetMapping("cbc")
public void cbc() throws Exception {
    Cipher cipher = Cipher.getInstance("AES/CBC/NoPadding");
    IvParameterSpec iv = new IvParameterSpec(initVector.getBytes("UTF-8"));
    test(cipher, iv);
}

As you can see, the same plaintext repeated twice no longer produces two repeated ciphertext blocks, and swapping ciphertext blocks now decrypts to garbled data instead of a plaintext with the fields swapped:
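The contrast between the two modes can also be reproduced with the JDK alone, without the Hex and ByteUtils helpers used above. The following sketch encrypts two identical 16-byte blocks and reports whether the first two ciphertext blocks come out identical. The class name, the method name, and the use of a random key and IV are choices made for this example:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical demo class: do repeated plaintext blocks show up in the ciphertext?
class EcbVsCbcDemo {
    // Encrypts two identical blocks; returns true if the two ciphertext blocks are equal.
    static boolean repeatsVisible(String transformation) {
        try {
            SecureRandom random = new SecureRandom();
            byte[] key = new byte[16];
            random.nextBytes(key);
            SecretKeySpec keySpec = new SecretKeySpec(key, "AES");

            Cipher cipher = Cipher.getInstance(transformation);
            if (transformation.contains("/CBC/")) {
                byte[] iv = new byte[16];
                random.nextBytes(iv); // CBC needs an IV; a random one is used here
                cipher.init(Cipher.ENCRYPT_MODE, keySpec, new IvParameterSpec(iv));
            } else {
                cipher.init(Cipher.ENCRYPT_MODE, keySpec);
            }

            byte[] plaintext = "abcdefghijklmnopabcdefghijklmnop".getBytes(StandardCharsets.US_ASCII);
            byte[] cipherText = cipher.doFinal(plaintext);
            return Arrays.equals(Arrays.copyOfRange(cipherText, 0, 16),
                                 Arrays.copyOfRange(cipherText, 16, 32));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ECB repeats visible: " + repeatsVisible("AES/ECB/NoPadding"));
        System.out.println("CBC repeats visible: " + repeatsVisible("AES/CBC/NoPadding"));
    }
}
```

In ECB the second block is encrypted exactly like the first, so the repetition shows through. In CBC the second block is first XORed with the first ciphertext block, so the two outputs differ.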

In addition to ECB and CBC, AES can also be used in CFB, OFB, and CTR modes, whose differences are worth studying separately. The book “Practical Cryptography” recommends CBC and CTR modes. It is also important to note that ECB and CBC modes need an appropriate padding scheme to handle data whose length is not a multiple of the block size.
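To see what a padding scheme does, the sketch below encrypts inputs of different lengths with AES/CBC/PKCS5Padding and reports the ciphertext length. The class and method names are illustrative, and the key and IV are random throwaway values:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical demo class: padding rounds the input up to whole 16-byte blocks.
class PaddingLengthDemo {
    // Encrypts plaintextLength zero bytes and returns the ciphertext length in bytes.
    static int ciphertextLength(int plaintextLength) {
        try {
            SecureRandom random = new SecureRandom();
            byte[] key = new byte[16];
            byte[] iv = new byte[16];
            random.nextBytes(key);
            random.nextBytes(iv);

            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(new byte[plaintextLength]).length;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (int length : new int[] {5, 16, 17}) {
            System.out.println(length + " bytes in, " + ciphertextLength(length) + " bytes out");
        }
    }
}
```

A 5-byte input is padded up to one block (16 bytes), while a 16-byte input grows to 32 bytes, because this padding scheme always adds at least one byte so that the padding can be removed unambiguously on decryption. With NoPadding, as in the tests above, the caller must supply data that is already a multiple of 16 bytes.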

In addition to choosing AES with the appropriate mode for encryption of sensitive data, I also recommend the following practices:

Do not hardcode a fixed key and initialization vector in the code. As with the salt discussed earlier, it is best to use a unique, independent value that changes for every encryption.
It is recommended to use a separate encryption service to manage keys and perform encryption operations. Do not store the keys and ciphertext in the same database. The encryption service should be under very strict access control.
Sensitive information should not be stored in plaintext in the database, but a desensitized (masked) copy can be stored. Regular queries should read the desensitized copy directly.

Next, let’s implement the relevant code according to this strategy.

Step 1: For user names and ID cards, we save three pieces of information for each - the desensitized plaintext, the ciphertext, and the encryption ID. The encryption service encrypts the data and returns the ciphertext and encryption ID, which is then used to request the encryption service for decryption:

@Data
@Entity
public class UserData {
    @Id
    private Long id;
    private String idcard; // desensitized ID card
    private Long idcardCipherId; // ID card encryption ID
    private String idcardCipherText; // ID card ciphertext
    private String name; // desensitized name
    private Long nameCipherId; // name encryption ID
    private String nameCipherText; //name ciphertext
}

Step 2: The encryption service data table saves the encryption ID, initialization vector, and key. The encryption service table does not store the ciphertext, separating the ciphertext and key storage:

@Data
@Entity
public class CipherData {
    @Id
    @GeneratedValue(strategy = AUTO)
    private Long id;
    private String iv; // initialization vector
    private String secureKey; // key
}

Step 3: The encryption service uses AES-256-GCM, that is, AES with a 256-bit key in GCM (Galois/Counter Mode). GCM is an AEAD (Authenticated Encryption with Associated Data) mode, which provides confidentiality as well as authentication and integrity verification for the ciphertext. It is currently the recommended AES mode.

When using an AEAD algorithm such as GCM for encryption and decryption, apart from the key and initialization vector, AAD (Additional Authenticated Data) can also be provided to authenticate extra information that is not included in the plaintext. If decryption does not use the same AAD that was used during encryption, decryption fails. Internally, GCM encrypts with CTR mode and uses the GMAC message authentication code to authenticate the ciphertext and the AAD, which provides integrity verification.

Next, let’s implement the encryption service based on AES-256-GCM. It includes the following main logic:

During encryption, an external AAD is allowed for authentication, and the encryption service generates a new random value as the key and initialization vector each time.
After encryption, the encryption service saves the key and initialization vector to the database, and returns the encryption ID as the identifier for this encryption.
During decryption, the encryption service uses the encryption ID to retrieve the key and initialization vector from the database, and requires the encryption ID, ciphertext, and AAD used during encryption to perform decryption.

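// Internal encryption method
public static byte[] doEncrypt(byte[] plaintext, SecretKey key, byte[] iv, byte[] aad) throws Exception {
    // Encryption algorithm
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    // Key specification
    SecretKeySpec keySpec = new SecretKeySpec(key.getEncoded(), "AES");
    // GCM parameter specification
    GCMParameterSpec gcmParameterSpec = new GCMParameterSpec(GCM_TAG_LENGTH * 8, iv);
    // Encryption mode
    cipher.init(Cipher.ENCRYPT_MODE, keySpec, gcmParameterSpec);
    // Set AAD
    if (aad != null)
        cipher.updateAAD(aad);
    // Encrypt
    byte[] cipherText = cipher.doFinal(plaintext);
    return cipherText;
}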

// Internal decryption method
public static String doDecrypt(byte[] cipherText, SecretKey key, byte[] iv, byte[] aad) throws Exception {
    // Encryption algorithm
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    // Key specification
    SecretKeySpec keySpec = new SecretKeySpec(key.getEncoded(), "AES");
    // GCM parameter specification
    GCMParameterSpec gcmParameterSpec = new GCMParameterSpec(GCM_TAG_LENGTH * 8, iv);
    // Decryption mode
    cipher.init(Cipher.DECRYPT_MODE, keySpec, gcmParameterSpec);
    // Set AAD
    if (aad != null)
        cipher.updateAAD(aad);
    // Decrypt
    byte[] decryptedText = cipher.doFinal(cipherText);
    return new String(decryptedText);
}

// Encryption entry
public CipherResult encrypt(String data, String aad) throws Exception {
    // Encryption result
    CipherResult encryptResult = new CipherResult();
    // Key generator
    KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
    // Generate key
    keyGenerator.init(AES_KEY_SIZE);
    SecretKey key = keyGenerator.generateKey();
    // IV data
    byte[] iv = new byte[GCM_IV_LENGTH];
    // Generate random IV
    SecureRandom random = new SecureRandom();
    random.nextBytes(iv);
    // Handle AAD
    byte[] aaddata = null;
    if (!StringUtils.isEmpty(aad))
        aaddata = aad.getBytes();
    // Get cipher text
    encryptResult.setCipherText(Base64.getEncoder().encodeToString(doEncrypt(data.getBytes(), key, iv, aaddata)));
    // Encrypted context data
    CipherData cipherData = new CipherData();
    // Save IV
    cipherData.setIv(Base64.getEncoder().encodeToString(iv));
    // Save key
    cipherData.setSecureKey(Base64.getEncoder().encodeToString(key.getEncoded()));
    cipherRepository.save(cipherData);
    // Return local encryption ID
    encryptResult.setId(cipherData.getId());
    return encryptResult;
}

// Decryption entry
public String decrypt(long cipherId, String cipherText, String aad) throws Exception {
    // Use encryption ID to find encryption context data
    CipherData cipherData = cipherRepository.findById(cipherId).orElseThrow(() -> new IllegalArgumentException("invalid cipherId"));
    // Load key
    byte[] decodedKey = Base64.getDecoder().decode(cipherData.getSecureKey());
    // Initialize key
    SecretKey originalKey = new SecretKeySpec(decodedKey, 0, decodedKey.length, "AES");
    // Load IV
    byte[] decodedIv = Base64.getDecoder().decode(cipherData.getIv());
    // Handle AAD
    byte[] aaddata = null;
    if (!StringUtils.isEmpty(aad))
        aaddata = aad.getBytes();
    // Decrypt
    return doDecrypt(Base64.getDecoder().decode(cipherText.getBytes()), originalKey, decodedIv, aaddata);
}

Step 4: Implement the encryption and decryption interfaces for testing.

We can let the user choose whether to protect the two elements. If they want to, they can enter a query password that is used as the AAD. When the system needs to read the user’s sensitive information, the user must provide this password again; otherwise the encryption service refuses to decrypt the data because the authentication tag check fails. Keep in mind that the AAD is authenticated but is not itself an encryption key, so the keys and IVs held by the encryption service still need to be protected strictly:

@Autowired
private CipherService cipherService;

// Encryption
@GetMapping("right")
public UserData right(@RequestParam(value = "name", defaultValue = "朱晔") String name,
                      @RequestParam(value = "idcard", defaultValue = "300000000000001234") String idCard,
                      @RequestParam(value = "aad", required = false)String aad) throws Exception {
    UserData userData = new UserData();
    userData.setId(1L);
    // Masked name
    userData.setName(chineseName(name));
    // Masked ID card
    userData.setIdcard(idCard(idCard));

    // Encrypt name
    CipherResult cipherResultName = cipherService.encrypt(name,aad);
    userData.setNameCipherId(cipherResultName.getId());
    userData.setNameCipherText(cipherResultName.getCipherText());

    // Encrypt ID card
    CipherResult cipherResultIdCard = cipherService.encrypt(idCard,aad);
    userData.setIdcardCipherId(cipherResultIdCard.getId());
    userData.setIdcardCipherText(cipherResultIdCard.getCipherText());

    return userRepository.save(userData);
}

// Decryption
@GetMapping("read")
public void read(@RequestParam(value = "aad", required = false)String aad) throws Exception {
    // Query user information
    UserData userData = userRepository.findById(1L).get();
    // Decrypt name and ID card using AAD
    log.info("name : {} idcard : {}",
            cipherService.decrypt(userData.getNameCipherId(), userData.getNameCipherText(),aad),
            cipherService.decrypt(userData.getIdcardCipherId(), userData.getIdcardCipherText(),aad));
}

// Mask ID card
private static String idCard(String idCard) {
    String num = StringUtils.right(idCard, 4);
    return StringUtils.leftPad(num, StringUtils.length(idCard), "*");
}

// Mask name
public static String chineseName(String chineseName) {
    String name = StringUtils.left(chineseName, 1);
    return StringUtils.rightPad(name, StringUtils.length(chineseName), "*");
}

Accessing the encryption interface gives the following result. You can see that the database stores the desensitized data and the ciphertext:

{"id":1,"name":"朱*","idcard":"**************1234","idcardCipherId":26346,"idcardCipherText":"t/wIh1XTj00wJP1Lt3aGzSvn9GcqQWEwthN58KKU4KZ4Tw==","nameCipherId":26347,"nameCipherText":"+gHrk1mWmveBMVUo+CYon8Zjj9QAtw=="}

Accessing the decryption interface will show that the decryption was successful:

[21:46:00.079] [http-nio-45678-exec-6] [INFO ] [o.g.t.c.s.s.StoreIdCardController:102 ] - name : 朱晔 idcard : 300000000000001234

If the AAD is not correct, an exception will be thrown:

javax.crypto.AEADBadTagException: Tag mismatch!
  at com.sun.crypto.provider.GaloisCounterMode.decryptFinal(GaloisCounterMode.java:578)
  at com.sun.crypto.provider.CipherCore.finalNoPadding(CipherCore.java:1116)
  at com.sun.crypto.provider.CipherCore.fillOutputBuffer(CipherCore.java:1053)
  at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:853)
  at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:446)
  at javax.crypto.Cipher.doFinal(Cipher.java:2164)
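The AAD behavior shown by this exception can be reproduced in isolation. The sketch below encrypts a value with AES-256-GCM under one AAD and then tries to decrypt it under another. It uses a 12-byte IV and a 128-bit tag, which are common GCM choices assumed for this example, and the class and method names are hypothetical:

```java
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical demo class: GCM decryption only succeeds with the AAD used for encryption.
class GcmAadDemo {
    // Encrypts with encryptAad, then reports whether decryption with decryptAad passes the tag check.
    static boolean canDecrypt(String encryptAad, String decryptAad) {
        try {
            KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
            keyGenerator.init(256);
            SecretKey key = keyGenerator.generateKey();
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);

            Cipher encryptor = Cipher.getInstance("AES/GCM/NoPadding");
            encryptor.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            encryptor.updateAAD(encryptAad.getBytes(StandardCharsets.UTF_8));
            byte[] cipherText = encryptor.doFinal("300000000000001234".getBytes(StandardCharsets.UTF_8));

            Cipher decryptor = Cipher.getInstance("AES/GCM/NoPadding");
            decryptor.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
            decryptor.updateAAD(decryptAad.getBytes(StandardCharsets.UTF_8));
            try {
                decryptor.doFinal(cipherText);
                return true;
            } catch (AEADBadTagException e) {
                // The tag covers the ciphertext and the AAD, so a different AAD is rejected.
                return false;
            }
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("same AAD:  " + canDecrypt("query-password", "query-password"));
        System.out.println("wrong AAD: " + canDecrypt("query-password", "wrong-password"));
    }
}
```

The same check also rejects a ciphertext that has been modified in storage or in transit, which is the integrity guarantee that ECB and CBC on their own do not provide.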

Explaining HTTPS with an Image #

We know that the HTTP protocol transmits data in plain text. In scenarios where sensitive information is being transmitted, if there is a hacker acting as a middleman intercepting requests between the client and the server, they can eavesdrop on this data and even modify data sent by the client. This poses a significant security risk.

To address this security risk, the HTTPS protocol was introduced. HTTPS = SSL/TLS + HTTP. It combines several cryptographic algorithms to secure information in transit and to achieve confidentiality, integrity, and authenticity.

Confidentiality: Asymmetric encryption is used to encrypt the key, and that key is then used to encrypt the data symmetrically. This approach is secure and also avoids the slowness of asymmetric encryption on large amounts of data. You can conduct an experiment to test the differences between the two encryption methods.

Integrity: Hash algorithms are used to create digests of the information, so that tampering by intermediaries can be detected.

Authenticity: Digital certificates are used to ensure that we are communicating with a legitimate server.

Understanding the process of HTTPS will help us understand the differences between various encryption algorithms and the significance of certificates. Furthermore, SSL/TLS is a typical example of a hybrid encryption system. If you need to develop your own application-level data encryption system, you can refer to its process.
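A hybrid scheme of this kind can be sketched in a few lines: the bulk data is encrypted with a fresh AES key, and only that small key is encrypted with the receiver's RSA public key. This is a simplified illustration of the idea, not the TLS protocol itself, and the algorithm parameters, class name, and method name are assumptions of the example:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;

// Hypothetical demo class: symmetric encryption for data, asymmetric encryption for the key.
class HybridEncryptionDemo {
    private static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES = "AES/GCM/NoPadding";

    static String roundTrip(String message) {
        try {
            // Receiver: long-term key pair (the role played by the server's certificate key).
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair receiver = generator.generateKeyPair();

            // Sender: fresh session key and IV, used to encrypt the data itself.
            KeyGenerator keyGenerator = KeyGenerator.getInstance("AES");
            keyGenerator.init(256);
            SecretKey sessionKey = keyGenerator.generateKey();
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher aesEncrypt = Cipher.getInstance(AES);
            aesEncrypt.init(Cipher.ENCRYPT_MODE, sessionKey, new GCMParameterSpec(128, iv));
            byte[] cipherText = aesEncrypt.doFinal(message.getBytes(StandardCharsets.UTF_8));

            // Sender: only the 32-byte session key goes through the slow asymmetric cipher.
            Cipher rsaEncrypt = Cipher.getInstance(RSA);
            rsaEncrypt.init(Cipher.ENCRYPT_MODE, receiver.getPublic());
            byte[] encryptedKey = rsaEncrypt.doFinal(sessionKey.getEncoded());

            // Receiver: recover the session key with the private key, then decrypt the data.
            Cipher rsaDecrypt = Cipher.getInstance(RSA);
            rsaDecrypt.init(Cipher.DECRYPT_MODE, receiver.getPrivate());
            SecretKey recovered = new SecretKeySpec(rsaDecrypt.doFinal(encryptedKey), "AES");
            Cipher aesDecrypt = Cipher.getInstance(AES);
            aesDecrypt.init(Cipher.DECRYPT_MODE, recovered, new GCMParameterSpec(128, iv));
            return new String(aesDecrypt.doFinal(cipherText), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("sensitive payload of any length"));
    }
}
```

The sender would transmit the encrypted session key, the IV, and the ciphertext together. The IV does not need to be secret, but it must not be reused with the same key.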

Now, let’s take a look at the entire process of establishing an HTTPS connection with TLS 1.2 (RSA handshake).

As preparatory work, the website administrator needs to apply for a CA certificate (a certificate issued by a certificate authority) and install it on the server. The certificate contains the public key for asymmetric encryption, the website’s domain information, and other details. The private key is kept confidential by the server and is never made public.

The process of establishing an HTTPS connection includes TCP handshake and a series of steps in the TLS handshake, which are as follows:

The client informs the server of the supported cipher suites (e.g., TLS_RSA_WITH_AES_256_GCM_SHA384, where RSA is the key exchange method, AES_256_GCM is the encryption algorithm, and SHA384 is the message digest algorithm). The client also provides a random number.
The server responds with the selected cipher suite and provides a server random number.
The server sends the CA certificate to the client, and the client verifies the CA certificate (more details will be explained later).
The client generates a PreMasterKey and encrypts it with the server's public key using asymmetric encryption.
The client sends the encrypted PreMasterKey to the server.
The server decrypts the PreMasterKey with its private key using asymmetric encryption. It then uses the PreMasterKey and the two random numbers to generate the MasterKey.
The client also generates the MasterKey using the PreMasterKey and the two random numbers.
The client informs the server that encrypted transmission will now be performed.
The client and server each verify the handshake by exchanging a test message encrypted with the MasterKey and the chosen symmetric encryption algorithm.
From this point on, all communication between the client and server is encrypted, and data integrity is protected by message authentication codes (for AES-GCM, the authentication tag built into the cipher).

You may wonder how the client verifies the CA certificate.

In fact, the CA certificate is a certificate chain. Take a look at the left side of the image:

The CA certificate obtained from the server is the user certificate. To verify its legitimacy, we use the issuer information in the user certificate to find the intermediate certificate, and then use the issuer information in the intermediate certificate to find the root certificate.

Root certificates are issued by a small number of authorized institutions and are usually preset in the operating system, which makes them very difficult to forge.

After obtaining the root certificate, extract its public key to verify the signature of the intermediate certificate and determine its authority.

Finally, obtain the public key of the intermediate certificate to verify the signature of the user certificate.

This process verifies the legitimacy of the user certificate. After that, the client also checks the certificate's validity period, domain name, and other information.
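
The chain walk described above can be sketched with the JDK's certificate classes. This is a simplified illustration under my own assumptions, not a replacement for the JDK's built-in TLS validation: the class `CertChainSketch` and its methods are hypothetical, the chain is assumed to be ordered from the user certificate to the root certificate, and the domain name and revocation checks are left out.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.PublicKey;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical demo class: walk a certificate chain from the user certificate up to a root.
final class CertChainSketch {

    /** Loads the root certificates preset in the default trust store of the JDK or OS. */
    static List<X509Certificate> trustedRoots() {
        try {
            TrustManagerFactory factory =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            factory.init((KeyStore) null); // null selects the default trust store
            List<X509Certificate> roots = new ArrayList<>();
            for (TrustManager manager : factory.getTrustManagers()) {
                if (manager instanceof X509TrustManager) {
                    Collections.addAll(roots, ((X509TrustManager) manager).getAcceptedIssuers());
                }
            }
            return roots;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Cannot load the default trust store", e);
        }
    }

    /** Returns true if the certificate's signature verifies under the issuer's public key. */
    static boolean isSignedBy(X509Certificate certificate, PublicKey issuerKey) {
        try {
            certificate.verify(issuerKey);
            return true;
        } catch (GeneralSecurityException e) {
            return false;
        }
    }

    /** The chain is ordered: user certificate, intermediate certificate(s), root certificate. */
    static boolean isChainValid(List<X509Certificate> chain, List<X509Certificate> roots) {
        if (chain.isEmpty()) {
            return false;
        }
        for (int i = 0; i < chain.size(); i++) {
            X509Certificate certificate = chain.get(i);
            try {
                certificate.checkValidity(); // validity period must include the current time
            } catch (CertificateException e) {
                return false;
            }
            // Each certificate is signed by the next one; the root signs itself.
            X509Certificate issuer = (i + 1 < chain.size()) ? chain.get(i + 1) : certificate;
            if (!isSignedBy(certificate, issuer.getPublicKey())) {
                return false;
            }
        }
        // The last certificate must be one of the preset, trusted root certificates.
        return roots.contains(chain.get(chain.size() - 1));
    }
}
```

In real code you would normally rely on the validation that `HttpsURLConnection` or `SSLSocket` already performs. The sketch only makes the order of the signature checks visible.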

In summary, TLS cleverly solves the transmission security problem by combining several algorithms in one process. Symmetric encryption protects the data. Asymmetric encryption protects the key exchange, so intermediaries cannot obtain the symmetric key. The CA certificate chain authenticates the server, so intermediaries cannot substitute their own forged certificates and public keys.
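
The key-agreement part of the handshake described earlier (the two random numbers, the encrypted PreMasterKey, and the MasterKey derived on both sides) can be sketched as follows. Please treat it as an illustration only: the class `TlsRsaKeyExchangeSketch` and its methods are hypothetical, the derivation is modeled on the TLS 1.2 pseudorandom function with HMAC-SHA256 (a suite ending in SHA384 would use HMAC-SHA384), the PreMasterKey here is purely random, and none of this is meant to interoperate with a real TLS stack.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class: both sides derive the same MasterKey after an RSA key exchange.
final class TlsRsaKeyExchangeSketch {

    static byte[] randomBytes(int length) {
        byte[] bytes = new byte[length];
        new SecureRandom().nextBytes(bytes);
        return bytes;
    }

    static KeyPair newServerKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RSA is not available", e);
        }
    }

    /** Encrypts or decrypts with RSA, depending on the Cipher mode constant passed in. */
    static byte[] rsa(int mode, Key key, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
            cipher.init(mode, key);
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RSA operation failed", e);
        }
    }

    static byte[] hmac(byte[] key, byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 is not available", e);
        }
    }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] result = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, result, a.length, b.length);
        return result;
    }

    /** Expands the PreMasterKey and both random numbers into a 48-byte MasterKey. */
    static byte[] masterKey(byte[] preMasterKey, byte[] clientRandom, byte[] serverRandom) {
        byte[] label = "master secret".getBytes(StandardCharsets.US_ASCII);
        byte[] seed = concat(label, concat(clientRandom, serverRandom));
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        byte[] a = seed;
        while (output.size() < 48) {
            a = hmac(preMasterKey, a); // A(i) = HMAC(secret, A(i-1))
            byte[] block = hmac(preMasterKey, concat(a, seed));
            output.write(block, 0, block.length);
        }
        return Arrays.copyOf(output.toByteArray(), 48);
    }

    /** Plays both roles in one process and reports whether the two MasterKeys are equal. */
    static boolean handshakeAgreesOnMasterKey() {
        KeyPair server = newServerKeyPair();
        byte[] clientRandom = randomBytes(32); // sent in the clear by the client
        byte[] serverRandom = randomBytes(32); // sent in the clear by the server
        byte[] preMasterKey = randomBytes(48); // generated by the client
        byte[] encrypted = rsa(Cipher.ENCRYPT_MODE, server.getPublic(), preMasterKey);
        byte[] decrypted = rsa(Cipher.DECRYPT_MODE, server.getPrivate(), encrypted);
        byte[] clientMaster = masterKey(preMasterKey, clientRandom, serverRandom);
        byte[] serverMaster = masterKey(decrypted, clientRandom, serverRandom);
        return Arrays.equals(clientMaster, serverMaster);
    }
}
```

An intermediary sees both random numbers and the encrypted PreMasterKey, but without the server's private key it cannot recover the PreMasterKey and therefore cannot compute the MasterKey.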

If a website involves the transmission of sensitive data, it must use the HTTPS protocol. As a user, if you encounter a website that is not HTTPS or see an invalid certificate warning, you should not continue using that website to prevent the leakage of sensitive information.

Key Review #

Today, we learned about how to store and transmit sensitive data. Let me recap the key points for you.

For data storage, you need to remember two things:

User passwords should not be stored in plain text or encrypted using weak methods. Instead, you should use a globally unique, sufficiently long, random salt combined with a one-way hashing algorithm for storage. Using the BCrypt algorithm is a good practice.
Sensitive information, such as names and ID cards, that requires reversible decryption should be stored using symmetric encryption algorithms. My recommendation is to store the de-identified data and ciphertext in the business database, and use independent encryption services for data encryption and decryption. The symmetric encryption keys and initialization vectors can be stored separately from the business database.

For data transmission, it is essential to use SSL/TLS. For HTTP communication between clients and servers, we should use HTTPS, which is based on SSL/TLS. For TCP-based RPC services, SSL/TLS can also be used to ensure secure transmission.

Lastly, I want to remind you that if you are unsure about how to implement encryption and decryption solutions or processes, you should consult the company’s internal security experts or refer to the solutions provided by major cloud vendors. Do not design processes or even create encryption algorithms based on your own assumptions.

I have uploaded the code we used today on GitHub. You can click on this link to view it.

Reflection and Discussion #

Although we store usernames and passwords in a desensitized and encrypted manner in the database, there may still be sensitive data in plaintext in the logs. Do you have any ideas for desensitizing logs at the framework or middleware level?

Do you know the purpose of HTTPS mutual authentication? What are the differences in the process?

Have you encountered any pitfalls regarding various encryption algorithms? How do you protect sensitive data? I’m Zhu Ye, and I welcome you to leave a comment in the comments section to share your thoughts. Feel free to share today’s content with your friends or colleagues and discuss it together.