30 Data Desensitization: How to Implement a Low-Intrusiveness Data Desensitization Solution Based on the Rewriting Engine #
Today, we will discuss the data desensitization module in ShardingSphere. As introduced in the lesson “10 | Data Desensitization: How to Ensure Secure Access to Sensitive Data?”, ShardingSphere provides a set of automatic data encryption and decryption mechanisms to achieve transparent data desensitization.
Overall Architecture of the Data Desensitization Module #
As in ordinary JDBC programming, the entry point for data desensitization is a DataSource. However, it is not an ordinary DataSource but an EncryptDataSource designed specifically for data desensitization. As in our earlier explanations of ShardingDataSource, ShardingConnection, ShardingStatement, and related classes, we will walk through the data desensitization module from the top down.
Let’s review with the help of the following diagram:
In the diagram, each class in the data desensitization module inherits from a corresponding abstract adapter class. We already covered these adapters when discussing ShardingDataSource, ShardingConnection, ShardingStatement, and so on. Therefore, in this lesson we will focus on a few key classes of the data desensitization module and only briefly review what has already been covered.
Based on the diagram above, let’s start with the EncryptDataSource. The creation of EncryptDataSource relies on the EncryptDataSourceFactory, which is implemented as follows:
public final class EncryptDataSourceFactory {
public static DataSource createDataSource(final DataSource dataSource, final EncryptRuleConfiguration encryptRuleConfiguration, final Properties props) throws SQLException {
return new EncryptDataSource(dataSource, new EncryptRule(encryptRuleConfiguration), props);
}
}
The factory directly creates an EncryptDataSource, passing in the original DataSource together with an EncryptRule built from the EncryptRuleConfiguration. Now, let’s clarify what the EncryptRule contains.
EncryptRule #
EncryptRule is a core object in the data desensitization module, which deserves a separate explanation. In the EncryptRule, the following three core variables are defined:
// Encryption and decryption engines
private final Map<String, ShardingEncryptor> encryptors = new LinkedHashMap<>();
// Desensitized tables
private final Map<String, EncryptTable> tables = new LinkedHashMap<>();
// Desensitization rule configuration
private EncryptRuleConfiguration ruleConfiguration;
We can divide these three variables into two parts: ShardingEncryptor performs the actual encryption and decryption, while EncryptTable and EncryptRuleConfiguration belong to the configuration system of data desensitization.
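To make the relationship between encryptors and table configuration more concrete, here is a deliberately simplified sketch of how a rule object could map a logic column to its cipher column and encryptor. All names below (EncryptRuleSketch, ColumnRuleSketch, addEncryptor, addColumn, findCipherColumn) are hypothetical and are not ShardingSphere's API; encryptors are modeled as plain functions, and the string-reversing encryptor in the test is only a toy.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical, simplified settings for one encrypted column
final class ColumnRuleSketch {
    final String cipherColumn;
    final String encryptorName;

    ColumnRuleSketch(String cipherColumn, String encryptorName) {
        this.cipherColumn = cipherColumn;
        this.encryptorName = encryptorName;
    }
}

// Hypothetical rule: encryptors by name, plus per-table column settings
final class EncryptRuleSketch {
    private final Map<String, Function<Object, String>> encryptors = new LinkedHashMap<>();
    private final Map<String, Map<String, ColumnRuleSketch>> tables = new LinkedHashMap<>();

    void addEncryptor(String name, Function<Object, String> encryptor) {
        encryptors.put(name, encryptor);
    }

    void addColumn(String table, String logicColumn, ColumnRuleSketch column) {
        tables.computeIfAbsent(table, key -> new LinkedHashMap<>()).put(logicColumn, column);
    }

    // Returns the physical cipher column for a logic column, if that column is encrypted
    Optional<String> findCipherColumn(String table, String logicColumn) {
        return findColumn(table, logicColumn).map(column -> column.cipherColumn);
    }

    // Encrypts the value only when the column is configured and its encryptor exists
    Optional<String> encrypt(String table, String logicColumn, Object plaintext) {
        return findColumn(table, logicColumn)
                .map(column -> encryptors.get(column.encryptorName))
                .map(encryptor -> encryptor.apply(plaintext));
    }

    private Optional<ColumnRuleSketch> findColumn(String table, String logicColumn) {
        Map<String, ColumnRuleSketch> columns = tables.get(table);
        return null == columns ? Optional.empty() : Optional.ofNullable(columns.get(logicColumn));
    }
}
```

Keeping encryptors in their own map, referenced by name from the column settings, lets several columns share one encryptor instance.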
Next, I will explain these two parts separately.
1. ShardingEncryptor #
In EncryptRule, ShardingEncryptor is the interface that every concrete encryptor class implements. Its definition is as follows:
public interface ShardingEncryptor extends TypeBasedSPI {
// Initialization
void init();
// Encryption
String encrypt(Object plaintext);
// Decryption
Object decrypt(String ciphertext);
}
The ShardingEncryptor interface declares a pair of methods for encryption and decryption. It also extends the TypeBasedSPI interface, which means its implementations are loaded dynamically through the Service Provider Interface (SPI) mechanism.
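The idea behind a type-based SPI, picking an implementation by the type string it reports, can be sketched as follows. This is an illustration under assumptions, not ShardingSphere's loader: TypedService, NamedService, and TypedServiceLoader are hypothetical names, and the candidates are passed in as an Iterable so the example runs without a META-INF/services file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical stand-in for a type-based SPI contract
interface TypedService {
    String getType();

    Properties getProperties();

    void setProperties(Properties properties);
}

// Toy provider whose type is fixed at construction time
final class NamedService implements TypedService {
    private final String type;
    private Properties properties = new Properties();

    NamedService(String type) {
        this.type = type;
    }

    @Override
    public String getType() {
        return type;
    }

    @Override
    public Properties getProperties() {
        return properties;
    }

    @Override
    public void setProperties(Properties properties) {
        this.properties = properties;
    }
}

// Selects a provider by its declared type and hands it the configured properties
final class TypedServiceLoader<T extends TypedService> {
    private final Iterable<T> candidates;

    TypedServiceLoader(Iterable<T> candidates) {
        this.candidates = candidates;
    }

    // Types of all available providers, used in the error message below
    List<String> supportedTypes() {
        List<String> result = new ArrayList<>();
        for (T each : candidates) {
            result.add(each.getType());
        }
        return result;
    }

    // Returns the first provider whose type matches, ignoring case
    T newService(String type, Properties props) {
        for (T each : candidates) {
            if (each.getType().equalsIgnoreCase(type)) {
                each.setProperties(props);
                return each;
            }
        }
        throw new IllegalArgumentException("Unknown type " + type + ", supported: " + supportedTypes());
    }
}
```

In a real setup the Iterable could come from `java.util.ServiceLoader.load(SomeService.class)`, which implements Iterable and discovers providers listed in `META-INF/services/<fully qualified interface name>`. Keeping discovery separate from the type match lets the same lookup serve any SPI.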
ShardingEncryptorServiceLoader carries out this loading, and the corresponding SPI configuration file can be found in the sharding-core-common project:
(Figure: SPI configuration file for ShardingEncryptor)
This file registers two implementation classes: MD5ShardingEncryptor and AESShardingEncryptor. MD5 is a one-way hash algorithm, which means the plaintext cannot be recovered from the digest. MD5ShardingEncryptor is implemented as follows:
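import java.util.Properties;
import lombok.Getter;
import lombok.Setter;
import org.apache.commons.codec.digest.DigestUtils;
public final class MD5ShardingEncryptor implements ShardingEncryptor {
    // Lombok generates getProperties()/setProperties(), which TypeBasedSPI requires
    @Getter
    @Setter
    private Properties properties = new Properties();
    @Override
    public String getType() {
        return "MD5";
    }
    @Override
    public void init() {
    }
    @Override
    public String encrypt(final Object plaintext) {
        // Hex-encoded MD5 digest of the value's string form
        return DigestUtils.md5Hex(String.valueOf(plaintext));
    }
    @Override
    public Object decrypt(final String ciphertext) {
        // MD5 cannot be reversed, so the stored digest is returned unchanged
        return ciphertext;
    }
}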
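If you want the same digest without commons-codec, the JDK's MessageDigest produces it directly. This is a minimal sketch (Md5Sketch is a hypothetical name) that assumes, like `DigestUtils.md5Hex(String)`, that the value is encoded as UTF-8 before hashing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Sketch {
    private Md5Sketch() {
    }

    // Lowercase hex MD5 of the value's string form, encoded as UTF-8
    static String md5Hex(Object plaintext) {
        try {
            byte[] hash = MessageDigest.getInstance("MD5")
                    .digest(String.valueOf(plaintext).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            // Every Java platform is required to support MD5, so this should not happen
            throw new IllegalStateException("MD5 not available", ex);
        }
    }
}
```

Because the digest is deterministic, equal plaintexts always produce equal values, so equality lookups against an MD5 column still work even though decrypt cannot restore the original value.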
AES, by contrast, is a symmetric encryption algorithm: the same key is used for encryption and decryption, so the ciphertext can be decrypted back to plaintext. The corresponding AESShardingEncryptor is as follows:
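import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Properties;
import javax.crypto.Cipher;
import javax.crypto.NoSuchPaddingException;
import javax.crypto.spec.SecretKeySpec;
import com.google.common.base.Preconditions;
import lombok.Getter;
import lombok.Setter;
import lombok.SneakyThrows;
import org.apache.commons.codec.binary.Base64;
import org.apache.commons.codec.binary.StringUtils;
import org.apache.commons.codec.digest.DigestUtils;
public final class AESShardingEncryptor implements ShardingEncryptor {
    private static final String AES_KEY = "aes.key.value";
    // Lombok generates getProperties()/setProperties(), which TypeBasedSPI requires
    @Getter
    @Setter
    private Properties properties = new Properties();
    @Override
    public String getType() {
        return "AES";
    }
    @Override
    public void init() {
    }
    @Override
    @SneakyThrows
    public String encrypt(final Object plaintext) {
        byte[] result = getCipher(Cipher.ENCRYPT_MODE).doFinal(StringUtils.getBytesUtf8(String.valueOf(plaintext)));
        // Encode the encrypted bytes as a Base64 string
        return Base64.encodeBase64String(result);
    }
    @Override
    @SneakyThrows
    public Object decrypt(final String ciphertext) {
        if (null == ciphertext) {
            return null;
        }
        // Decode the Base64 string back to bytes, then decrypt them
        byte[] result = getCipher(Cipher.DECRYPT_MODE).doFinal(Base64.decodeBase64(String.valueOf(ciphertext)));
        return new String(result, StandardCharsets.UTF_8);
    }
    private Cipher getCipher(final int decryptMode) throws NoSuchPaddingException, NoSuchAlgorithmException, InvalidKeyException {
        Preconditions.checkArgument(properties.containsKey(AES_KEY), "No available secret key for `%s`.", AESShardingEncryptor.class.getName());
        Cipher result = Cipher.getInstance(getType());
        // Initialize the cipher in the requested mode with a key derived from aes.key.value
        result.init(decryptMode, new SecretKeySpec(createSecretKey(), getType()));
        return result;
    }
    private byte[] createSecretKey() {
        // 128-bit key: the first 16 bytes of the SHA-1 digest of aes.key.value
        return Arrays.copyOf(DigestUtils.sha1(properties.get(AES_KEY).toString()), 16);
    }
}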
DefaultSQLRewriteEngine rewriter = new DefaultSQLRewriteEngine();
return rewriter.rewrite(sqlRewriteContext).getSql();
Object cipherValue = getEncryptRule().getEncryptValues(tableName, columnName, Collections.singletonList(originalValue)).iterator().next();
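Note that getEncryptValues takes a list of original values and returns a list of encrypted values, one per input. Because only a single value is passed in here (via Collections.singletonList), we simply take the first element of the result.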
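Next, the addReplacedParameters method of parameterBuilder replaces the original parameter with its ciphertext, and the values for the assisted query column and the plaintext column are added to parameterBuilder as well.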
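From the analysis above, we have seen how a concrete ParameterRewriter is implemented. This mechanism is a typical application of the decorator pattern: it enhances functionality on top of the existing methods.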
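It is worth noting that EncryptParameterRewriterBuilder uses the Builder pattern to construct the collection of EncryptParameterRewriter objects, which are then invoked in order through SQLRewriteContext so that functionality is enhanced step by step. Thanks to this mechanism, once the original classes provide the basic functionality, we only need to add decorators to implement the encryption and decryption of specific parameters.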
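The step-by-step enhancement described above can be sketched without any ShardingSphere classes. In this hypothetical example (ParameterBuilderSketch, ParameterRewriterSketch, CipherParameterRewriter, PlainColumnParameterRewriter, and ParameterRewriteChain are invented names), each rewriter contributes one change to a shared parameter builder: one swaps a plaintext parameter for its ciphertext, another appends the plaintext for an extra column. A toy string wrapper stands in for a real encryptor in the test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical parameter builder: collects replaced values and appended extras
final class ParameterBuilderSketch {
    private final List<Object> parameters;
    private final List<Object> addedParameters = new ArrayList<>();

    ParameterBuilderSketch(List<Object> originalParameters) {
        this.parameters = new ArrayList<>(originalParameters);
    }

    void replace(int index, Object value) {
        parameters.set(index, value);
    }

    void add(Object value) {
        addedParameters.add(value);
    }

    // Replaced parameters first, then appended ones (e.g. for extra columns)
    List<Object> build() {
        List<Object> result = new ArrayList<>(parameters);
        result.addAll(addedParameters);
        return result;
    }
}

// Hypothetical rewriter contract: each implementation adds one enhancement
interface ParameterRewriterSketch {
    void rewrite(ParameterBuilderSketch builder, List<Object> originalParameters);
}

// Swaps the plaintext parameter at a given index for its ciphertext
final class CipherParameterRewriter implements ParameterRewriterSketch {
    private final int index;
    private final Function<Object, String> encryptor;

    CipherParameterRewriter(int index, Function<Object, String> encryptor) {
        this.index = index;
        this.encryptor = encryptor;
    }

    @Override
    public void rewrite(ParameterBuilderSketch builder, List<Object> originalParameters) {
        builder.replace(index, encryptor.apply(originalParameters.get(index)));
    }
}

// Appends the original plaintext so it can also be written to a plaintext column
final class PlainColumnParameterRewriter implements ParameterRewriterSketch {
    private final int index;

    PlainColumnParameterRewriter(int index) {
        this.index = index;
    }

    @Override
    public void rewrite(ParameterBuilderSketch builder, List<Object> originalParameters) {
        builder.add(originalParameters.get(index));
    }
}

final class ParameterRewriteChain {
    private ParameterRewriteChain() {
    }

    // Applies every rewriter in order to one shared builder; the input list is not modified
    static List<Object> rewrite(List<Object> parameters, List<ParameterRewriterSketch> rewriters) {
        ParameterBuilderSketch builder = new ParameterBuilderSketch(parameters);
        for (ParameterRewriterSketch each : rewriters) {
            each.rewrite(builder, parameters);
        }
        return builder.build();
    }
}
```

Adding a new behavior, such as an assisted query value, means adding one more rewriter to the list rather than changing the existing ones.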
public String getRewriteSQL(
final String dataSource,
final ShardingRule shardingRule, final ConfigurationProperties props, final List<Object> parameters, final boolean showSQL) throws SQLException {
SQLRewriteContext sqlRewriteContext = new SQLRewriteEngine(shardingRule, dataSource, getShardingTableMetaData(dataSource), props, parameters, parameters).rewrite();
return sqlRewriteContext.generateSQL(showSQL);
}
In the getRewriteSQL method, an SQLRewriteEngine object is created and its rewrite() method is called to rewrite the SQL.
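The sketch below is not SQLRewriteEngine; it only illustrates one common way to implement such rewriting for encryption, under stated assumptions: a parser (not shown) has already located each occurrence of a logic column as an inclusive [startIndex, stopIndex] range, and the rewriter splices in the cipher column name while copying all other text unchanged. SqlTokenSketch, TokenRewriteSketch, and the column name pwd_cipher are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical token: replace sql[startIndex..stopIndex] (inclusive) with text
final class SqlTokenSketch {
    final int startIndex;
    final int stopIndex;
    final String text;

    SqlTokenSketch(int startIndex, int stopIndex, String text) {
        this.startIndex = startIndex;
        this.stopIndex = stopIndex;
        this.text = text;
    }
}

final class TokenRewriteSketch {
    private TokenRewriteSketch() {
    }

    // Copies the text between tokens unchanged and substitutes each token's text.
    // Assumes tokens do not overlap; they may arrive in any order.
    static String rewrite(String sql, List<SqlTokenSketch> tokens) {
        List<SqlTokenSketch> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator.comparingInt(token -> token.startIndex));
        StringBuilder result = new StringBuilder();
        int cursor = 0;
        for (SqlTokenSketch each : sorted) {
            result.append(sql, cursor, each.startIndex);
            result.append(each.text);
            cursor = each.stopIndex + 1;
        }
        result.append(sql, cursor, sql.length());
        return result.toString();
    }
}
```

Working from parser-reported positions rather than searching the SQL text avoids accidentally rewriting a string literal or another identifier that happens to contain the column name.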
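In the rewrite method of the SQLRewriteEngine class, the call to SQLBuilder.buildToSQLNode creates a different SQLBuilder object depending on the statement type. SQLBuilder is an abstract class with a separate implementation for each statement type, such as SelectSQLBuilder, InsertSQLBuilder, and UpdateSQLBuilder.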
Taking SelectSQLBuilder as an example, let’s look at the implementation of its buildToSQLNode method:
@Override
public SQLBuilderResult buildToSQLNode(final List<Object> parameters, final boolean isEncryptParameter) {
queryResult = new SQLBuilderResult();
queryResult.removeOrderBy();
if (sqlStatement.getOrderBy().isGenerated() && sqlStatement.getOrderByItems().isEmpty()) {
appendGeneratedOrderBy(queryResult, sqlStatementGroup.getSqlContexts(), parameters, isEncryptParameter);
}
queryResult.appendLiterals(sqlStatement.getSql());
if (!sqlStatement.getOrderByItems().isEmpty()) {
appendOrderBy(queryResult, parameters, isEncryptParameter);
}
appendQueryRouters(parameters, isEncryptParameter);
appendUnion();
return queryResult;
}
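The buildToSQLNode method first removes any existing ORDER BY from the result. If the statement’s ORDER BY was generated and has no explicit items, it calls appendGeneratedOrderBy to generate the ordering clause. It then calls appendLiterals to add the original SQL to the result, appends the explicit ORDER BY items (if any) through appendOrderBy, and finally adds the query routing information and the UNION clause.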
The SQLRewriteContext class has a ToStringSQLVisitor class that converts an SQLNode into an SQL string: the generateSQL method calls the visit method of ToStringSQLVisitor to perform this conversion.
Inside ToStringSQLVisitor, the appropriate visit method is called according to the statement type, such as visitSelect, visitInsert, or visitUpdate. These visit methods traverse the SQLNode and convert it into an SQL string.
Next, let’s look at the visit method of ToStringSQLVisitor:
@Override
public String visit(final SQLNode sqlNode) {
    // Build the method name from the node's simple class name, e.g. SelectStatement -> visitSelect
    String visitorName = sqlNode.getClass().getSimpleName().replaceFirst("Statement$", "");
    String visitMethodName = String.format("visit%s", visitorName);
    try {
        // getDeclaredMethod (unlike getMethod) also finds the private visitXxx methods
        Method method = getClass().getDeclaredMethod(visitMethodName, sqlNode.getClass());
        method.setAccessible(true);
        return (String) method.invoke(this, sqlNode);
    } catch (final ReflectiveOperationException ex) {
        throw new UnsupportedOperationException(String.format("Cannot support visitor for class [%s]", sqlNode.getClass()), ex);
    }
}
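The visit method of ToStringSQLVisitor is implemented with reflection. It first obtains the class name of the SQLNode, builds the name of the corresponding visit method from it, and finally uses reflection to invoke that method, passing the SQLNode as the argument.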
In this way, by traversing the SQLNode and calling the matching visit methods, the SQLNode is eventually converted into an SQL string.
ToStringSQLVisitor has a different visit method for each SQLNode type, for example:
private String visitSelect(final SelectStatement selectStatement) {
append(selectStatement.getWithClause());
if (null != selectStatement.getTable()) {
appendVisit(selectStatement.getTable());
}
append(selectStatement.getProjections());
append(selectStatement.getFrom());
append(selectStatement.getWhere());
append(selectStatement.getGroupBy());
append(selectStatement.getWindow());
append(selectStatement.getOrderBy());
append(selectStatement.getLimit());
return getSQLBuilder().toString();
}
private String visitInsert(final InsertStatement insertStatement) {
append(insertStatement.getWithClause());
append(insertStatement.getTable());
append(insertStatement.getColumns());
append(insertStatement.getValues());
append(insertStatement.getSetAssignment());
append(insertStatement.getQuery());
return getSQLBuilder().toString();
}
private String visitUpdate(final UpdateStatement updateStatement) {
append(updateStatement.getWithClause());
append(updateStatement.getTable());
for (AssignmentSegment each : updateStatement.getSetAssignment().getAssignments()) {
appendVisit(each);
}
append(updateStatement.getWhere());
return getSQLBuilder().toString();
}
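The above are some of the visit methods in the ToStringSQLVisitor class. Each SQLNode type has its own handling logic. For a SelectStatement node, for example, append first adds the WithClause to the result, then the SQL string of each remaining part is appended in turn, and finally getSQLBuilder().toString() produces the SQL string that is returned.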
Through this flow, the SQLNode is finally converted into an SQL string and returned to the caller.
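The reflective dispatch described above can be reproduced in a few lines. This self-contained sketch uses hypothetical node classes (SqlNodeSketch, SelectNode, DeleteNode, UnknownNode) and a hypothetical ReflectiveSqlVisitor; it derives the method name from the node’s simple class name and looks the method up with getDeclaredMethod so that private visit methods are found.

```java
import java.lang.reflect.Method;

// Hypothetical node hierarchy
abstract class SqlNodeSketch {
}

final class SelectNode extends SqlNodeSketch {
    final String table;

    SelectNode(String table) {
        this.table = table;
    }
}

final class DeleteNode extends SqlNodeSketch {
    final String table;

    DeleteNode(String table) {
        this.table = table;
    }
}

// A node type with no matching visit method, to show the failure path
final class UnknownNode extends SqlNodeSketch {
}

final class ReflectiveSqlVisitor {
    // Dispatches to visit<SimpleClassName>(node), e.g. visitSelectNode(SelectNode)
    String visit(SqlNodeSketch node) {
        String methodName = "visit" + node.getClass().getSimpleName();
        try {
            // getDeclaredMethod also finds private methods, unlike getMethod
            Method method = getClass().getDeclaredMethod(methodName, node.getClass());
            method.setAccessible(true);
            return (String) method.invoke(this, node);
        } catch (ReflectiveOperationException ex) {
            throw new UnsupportedOperationException("Cannot support visitor for class " + node.getClass().getName(), ex);
        }
    }

    private String visitSelectNode(SelectNode node) {
        return "SELECT * FROM " + node.table;
    }

    private String visitDeleteNode(DeleteNode node) {
        return "DELETE FROM " + node.table;
    }
}
```

The trade-off is that a missing visit method only surfaces at runtime as an UnsupportedOperationException. A classic double-dispatch visitor, where each node calls back a typed method on the visitor, catches such gaps at compile time at the cost of more boilerplate.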
Let’s go back to the executeQuery method of EncryptStatement, which contains the following statement:
ResultSet resultSet = statement.executeQuery(getRewriteSQL(sql));
After executeQuery runs, we obtain a ResultSet. However, this resultSet object is not returned directly; instead, it is wrapped in an EncryptResultSet object, as shown below:
this.resultSet = new EncryptResultSet(connection.getRuntimeContext(), sqlStatementContext, this, resultSet);
EncryptResultSet inherits from the AbstractUnsupportedOperationResultSet class, which inherits from AbstractUnsupportedUpdateOperationResultSet, which in turn inherits from the WrapperAdapter class and implements the ResultSet interface. In essence, therefore, EncryptResultSet is an adapter, just like EncryptDataSource and EncryptConnection.
EncryptResultSet has many get methods that need no detailed introduction. The key point is the following call in its constructor:
mergedResult = createMergedResult(queryWithCipherColumn, resultSet);
As we know, in ShardingSphere the execution engine is followed by the merge engine. EncryptResultSet likewise uses a merge engine to generate a MergedResult.
EncryptResultSet first checks whether the SQLStatement passed in is a DALStatement. If it is, DALEncryptMergeEngine performs the result merging; otherwise, DQLEncryptMergeEngine is used. Let’s focus on DQLEncryptMergeEngine.
public final class DQLEncryptMergeEngine implements MergeEngine {
private final EncryptorMetaData metaData;
private final MergedResult mergedResult;
private final boolean queryWithCipherColumn;
@Override
public MergedResult merge() {
return new EncryptMergedResult(metaData, mergedResult, queryWithCipherColumn);
}
}
DQLEncryptMergeEngine is very simple: its merge method only constructs an EncryptMergedResult object and returns it. The core method of EncryptMergedResult, getValue, is shown below:
@Override
public Object getValue(final int columnIndex, final Class<?> type) throws SQLException {
Object value = mergedResult.getValue(columnIndex, type);
if (null == value || !queryWithCipherColumn) {
return value;
}
Optional<ShardingEncryptor> encryptor = metaData.findEncryptor(columnIndex);
return encryptor.isPresent() ? encryptor.get().decrypt(value.toString()) : value;
}
From the process above, we can see that result merging in the data desensitization module essentially calls the decrypt method of ShardingEncryptor to turn the ciphertext of an encrypted column back into plaintext, provided that queryWithCipherColumn is enabled and the value is not null.
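The decrypt-on-read behavior of EncryptMergedResult amounts to wrapping another result and transforming values as they are read. The minimal sketch below uses hypothetical types (RowCursor, ListRowCursor, DecryptingRowCursor) and, in the test, a toy string-reversing "decryptor" to show the same logic outside ShardingSphere.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical minimal cursor standing in for MergedResult
interface RowCursor {
    boolean next();

    // columnIndex is 1-based, as in JDBC
    Object getValue(int columnIndex);
}

// Cursor over in-memory rows
final class ListRowCursor implements RowCursor {
    private final Iterator<List<Object>> rows;
    private List<Object> current;

    ListRowCursor(List<List<Object>> rows) {
        this.rows = rows.iterator();
    }

    @Override
    public boolean next() {
        if (!rows.hasNext()) {
            return false;
        }
        current = rows.next();
        return true;
    }

    @Override
    public Object getValue(int columnIndex) {
        return current.get(columnIndex - 1);
    }
}

// Wraps another cursor and decrypts values of configured cipher columns on read
final class DecryptingRowCursor implements RowCursor {
    private final RowCursor delegate;
    private final Map<Integer, Function<String, Object>> decryptors; // column index -> decryptor
    private final boolean queryWithCipherColumn;

    DecryptingRowCursor(RowCursor delegate, Map<Integer, Function<String, Object>> decryptors, boolean queryWithCipherColumn) {
        this.delegate = delegate;
        this.decryptors = decryptors;
        this.queryWithCipherColumn = queryWithCipherColumn;
    }

    @Override
    public boolean next() {
        return delegate.next();
    }

    @Override
    public Object getValue(int columnIndex) {
        Object value = delegate.getValue(columnIndex);
        // Pass through nulls, and everything when cipher columns are not being queried
        if (null == value || !queryWithCipherColumn) {
            return value;
        }
        Function<String, Object> decryptor = decryptors.get(columnIndex);
        return null == decryptor ? value : decryptor.apply(value.toString());
    }
}
```

Because the wrapper implements the same interface as what it wraps, callers iterate over it exactly as they would over the undecorated result.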
That completes our walk-through of the overall flow of the executeQuery method in EncryptStatement. Once you understand how this method works, the other methods in EncryptStatement and EncryptPreparedStatement become much easier to follow.
From Source Code Analysis to Daily Development #
From today’s topic, what can be applied directly to daily development is the way ShardingEncryptor abstracts encryption and decryption, together with how the concrete algorithms are implemented internally. ShardingSphere uses the DigestUtils utility class to compute MD5 digests, and implements AES with javax.crypto.Cipher, using the Base64 utility class to encode the encrypted bytes as a string.
These utility classes can be adopted directly in our own systems, giving us mature, ready-made implementations of encryption and decryption.
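As a starting point for your own system, the sketch below implements both steps with the JDK alone: javax.crypto.Cipher for AES and java.util.Base64 for the text encoding. AesSketch is a hypothetical name, and deriving the key from a truncated SHA-1 digest of a configured string is an assumption made for brevity; a production system would use a randomly generated key or a proper key-derivation function.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

final class AesSketch {
    // Explicit mode and padding; ECB is deterministic, so equal plaintexts give equal ciphertexts
    private static final String TRANSFORMATION = "AES/ECB/PKCS5Padding";

    private final SecretKeySpec key;

    AesSketch(String keyValue) {
        this.key = deriveKey(keyValue);
    }

    // Assumption for brevity: first 16 bytes of SHA-1(keyValue) as a 128-bit AES key
    private static SecretKeySpec deriveKey(String keyValue) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(keyValue.getBytes(StandardCharsets.UTF_8));
            return new SecretKeySpec(Arrays.copyOf(digest, 16), "AES");
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException("SHA-1 not available", ex);
        }
    }

    // Encrypts the UTF-8 bytes of the value, then Base64-encodes the result as text
    String encrypt(Object plaintext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] encrypted = cipher.doFinal(String.valueOf(plaintext).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(encrypted);
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException("AES encryption failed", ex);
        }
    }

    // Base64-decodes the text, then decrypts it back to a UTF-8 string
    String decrypt(String ciphertext) {
        if (null == ciphertext) {
            return null;
        }
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, key);
            return new String(cipher.doFinal(Base64.getDecoder().decode(ciphertext)), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException("AES decryption failed", ex);
        }
    }
}
```

ECB mode is chosen here because it is deterministic, which keeps equality queries on the cipher column working; it also reveals which rows share a value. An authenticated mode such as AES/GCM/NoPadding with a random IV hides that pattern but produces different ciphertexts for the same plaintext, which is where a separate assisted query column becomes useful.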
Summary and Preview #
Today, we discussed the underlying principles of the data desensitization mechanism in ShardingSphere. We found that the data desensitization module relies on the rewriting engine and the merge engine of the sharding engine, with the rewriting engine playing the crucial role. Through column supplementation and transparent SQL conversion, it automatically encrypts plaintext on the way in and decrypts ciphertext on the way out.
Here’s a question for you to think about: how does the data desensitization module collaborate with the rewriting engine and the merge engine in ShardingSphere? Feel free to discuss it in the comment section, and I will respond to each answer.
Having introduced the data desensitization mechanism today, in the next lesson we will discuss another useful feature: orchestration and governance. We will explore the underlying principles of parsing configuration information and managing it dynamically through a configuration center.