18 Spring Data Common Errors #

Hello, I’m Fu Jian.

In the previous chapter, we looked at common errors in Spring Web development. From this lesson onward, we will turn to common errors in other Spring toolkits.

In addition to Spring Web, Spring provides many other useful toolkits, one of which is Spring Data. Almost every project involves a database, so Spring offers thorough support for the mainstream data stores. Let’s quickly browse the list:

  • Spring Data Commons
  • Spring Data JPA
  • Spring Data KeyValue
  • Spring Data LDAP
  • Spring Data MongoDB
  • Spring Data Redis
  • Spring Data REST
  • Spring Data for Apache Cassandra
  • Spring Data for Apache Geode
  • Spring Data for Apache Solr
  • Spring Data for Pivotal GemFire
  • Spring Data Couchbase (community module)
  • Spring Data Elasticsearch (community module)
  • Spring Data Neo4j (community module)

When working with these data stores, you will inevitably run into problems. Below, I have selected three typical cases to summarize the most common issues.

Case 1: Consistency between Read and Write #

When using Spring Data Redis, we sometimes find that data stored earlier cannot be retrieved after a project upgrade, or that parsing errors occur on read. Let’s write a test case to simulate this issue:

@SpringBootApplication
public class SpringdataApplication {

    SpringdataApplication(RedisTemplate redisTemplate,
            StringRedisTemplate stringRedisTemplate){
        String key = "mykey";
        // Store the value through StringRedisTemplate...
        stringRedisTemplate.opsForValue().set(key, "myvalue");

        Object valueGotFromStringRedisTemplate = stringRedisTemplate.opsForValue().get(key);
        System.out.println(valueGotFromStringRedisTemplate);

        // ...then try to read the same key back through RedisTemplate.
        Object valueGotFromRedisTemplate = redisTemplate.opsForValue().get(key);
        System.out.println(valueGotFromRedisTemplate);
    }

    public static void main(String[] args) {
        SpringApplication.run(SpringdataApplication.class, args);
    }

}

In the code above, we use the two templates provided by Spring Data Redis: RedisTemplate and StringRedisTemplate. When we store a value with the latter, the former cannot retrieve it. The output is as follows:

myvalue
null

At this point, you may think the problem is trivial: obviously the two templates behave differently.

True, this is an extremely simplified case, but the goal is to draw a general lesson from a single example. Imagine different teams on different projects: one project writes the data, another reads it, and a lack of communication and coordination between the two is not uncommon in real work. Next, let’s understand the underlying reason behind this problem.

Case Analysis #

To understand this problem, we need to have a basic understanding of the operation process in Spring Data Redis.

First, we need to acknowledge a fact: Redis stores raw bytes, so object types such as String, or even custom objects, cannot be stored or retrieved directly. The data must be serialized on write and deserialized on read.

Returning to our case: when we store or retrieve data by key, AbstractOperations#rawKey is executed, which serializes the key before the key-value pair is written to or read from Redis:

byte[] rawKey(Object key) {

   Assert.notNull(key, "non null key required");

   if (keySerializer() == null && key instanceof byte[]) {
      return (byte[]) key;
   }

   return keySerializer().serialize(key);
}

As the code above shows, if a keySerializer is set, it is used to serialize the key. StringRedisTemplate uses StringRedisSerializer, whose implementation is as follows:

public class StringRedisSerializer implements RedisSerializer<String> {

   private final Charset charset;

   @Override
   public byte[] serialize(@Nullable String string) {
      return (string == null ? null : string.getBytes(charset));
   }

}

RedisTemplate, however, defaults to JDK serialization. The implementation is as follows:

public class JdkSerializationRedisSerializer implements RedisSerializer<Object> {

   // 'serializer' is an internal converter backed by standard Java serialization
   @Override
   public byte[] serialize(@Nullable Object object) {
      if (object == null) {
         return SerializationUtils.EMPTY_ARRAY;
      }
      try {
         return serializer.convert(object);
      } catch (Exception ex) {
         throw new SerializationException("Cannot serialize", ex);
      }
   }

}

Clearly, the key is processed with JDK serialization here, which ultimately calls the following method:

public interface Serializer<T> {

    void serialize(T object, OutputStream outputStream) throws IOException;

    default byte[] serializeToByteArray(T object) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(1024);
        this.serialize(object, out);
        return out.toByteArray();
    }
}

You can use these two serializers directly to serialize the string “mykey”, and you will find that their results differ, as the sketch below shows. This explains why the value stored under one template’s “mykey” cannot be found by the other.
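To see this concretely, here is a minimal standalone sketch (the class name is mine; it only assumes the two serializer classes from Spring Data Redis) that serializes the same key with both serializers and compares the results:

import org.springframework.data.redis.serializer.JdkSerializationRedisSerializer;
import org.springframework.data.redis.serializer.StringRedisSerializer;

import java.util.Arrays;

public class SerializerComparison {

    public static void main(String[] args) {
        String key = "mykey";

        // StringRedisSerializer: just the UTF-8 bytes of "mykey" (5 bytes)
        byte[] stringBytes = new StringRedisSerializer().serialize(key);

        // JdkSerializationRedisSerializer: a Java serialization stream with
        // its own header and type information, so the bytes cannot match
        byte[] jdkBytes = new JdkSerializationRedisSerializer().serialize(key);

        System.out.println(Arrays.toString(stringBytes)); // [109, 121, 107, 101, 121]
        System.out.println(jdkBytes.length > stringBytes.length); // true
    }
}

Since the two byte arrays differ, the two templates are effectively operating on two different keys in Redis, which is why the lookup returns null rather than an error.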

As for how the templates specify their RedisSerializer, let’s take StringRedisTemplate as an example. The following code is its constructor, which directly sets the key serializer to RedisSerializer.string():

public class StringRedisTemplate extends RedisTemplate<String, String> {

   public StringRedisTemplate() {
      setKeySerializer(RedisSerializer.string());
      setValueSerializer(RedisSerializer.string());
      setHashKeySerializer(RedisSerializer.string());
      setHashValueSerializer(RedisSerializer.string());
   }
}

RedisSerializer.string() ultimately returns the following instance:

public static final StringRedisSerializer UTF_8 = new StringRedisSerializer(StandardCharsets.UTF_8);

Case Fix #

The fix is simple: check that all data operations go through the same kind of RedisTemplate, and even then, verify that the configured RedisSerializers are exactly consistent on both the read and write paths. Otherwise, errors of various kinds will occur.
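For example, if some code must keep using RedisTemplate, one option is to align its serializers with those of StringRedisTemplate. Below is a minimal sketch under that assumption (the bean definition and generics are illustrative, not the only way to do it):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.serializer.RedisSerializer;

@Configuration
public class RedisConfig {

    @Bean
    public RedisTemplate<String, String> redisTemplate(RedisConnectionFactory factory) {
        RedisTemplate<String, String> template = new RedisTemplate<>();
        template.setConnectionFactory(factory);
        // Use the same String serializers as StringRedisTemplate so that
        // data written by either template can be read by the other.
        template.setKeySerializer(RedisSerializer.string());
        template.setValueSerializer(RedisSerializer.string());
        template.setHashKeySerializer(RedisSerializer.string());
        template.setHashValueSerializer(RedisSerializer.string());
        return template;
    }
}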

Case 2: Error in Default Values #

Like other Spring modules, Spring Data ships with many default values chosen for common scenarios and user convenience. However, not every default is the most appropriate choice.

For example, in a project backed by Cassandra, we sometimes cannot read data immediately after writing it. What could be the reason? There is no error message of any kind; everything appears normal, except that the data cannot be read back.

Case Analysis #

When we configure nothing and use Spring Data Cassandra directly, we actually rely on the Cassandra driver’s built-in configuration file, located at:

.m2\repository\com\datastax\oss\java-driver-core\4.6.1\java-driver-core-4.6.1.jar!\reference.conf

It contains many default settings, and one important one is the consistency level, which the driver defaults to LOCAL_ONE. Specifically:

basic.request {

  # The consistency level.
  #
  # Required: yes
  # Modifiable at runtime: yes, the new value will be used for requests issued after the change.
  # Overridable in a profile: yes
  consistency = LOCAL_ONE

  # omitted other non-critical configurations
}

So whenever we perform read and write operations, LOCAL_ONE is always used. You can confirm this in a debugger:

[Screenshot: debug view showing the effective request consistency resolved to LOCAL_ONE]

If you have a basic understanding of Cassandra, you will know its core consistency principle: ensure R + W > N, that is, the number of replicas consulted on a read (R) plus the number of replicas acknowledging a write (W) must exceed the replication factor (N).

For example, suppose the replication factor is 3, so the data is stored on nodes A, B, and C. The common practice is to set the consistency level for both reads and writes to LOCAL_QUORUM, which guarantees that written data can be read back immediately. If instead we read and write with LOCAL_ONE, the following can happen: one user writes to node A and the write returns immediately, while another user reads from node C. Because the consistency level is LOCAL_ONE, the read also returns immediately after consulting node C alone, before the write has propagated there, so the data is not found.

[Diagram: the write is acknowledged by node A while the read goes to node C, which has not yet received the data]
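A quick check of the R + W > N arithmetic with N = 3 makes the difference clear:

LOCAL_QUORUM: R = W = floor(3/2) + 1 = 2, so R + W = 4 > 3   (every read overlaps the latest write)
LOCAL_ONE:    R = W = 1,                  so R + W = 2 <= 3  (a read can miss the replica that was written)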

So the question is, why does the Cassandra driver default to using LOCAL_ONE?

In fact, when you first learn Cassandra, you will probably start with a single machine. In that case LOCAL_ONE is actually the most appropriate setting: there is only one node, so every read and write hits it, and no inconsistency can arise. Production deployments, however, are usually multi-datacenter and multi-node with a replication factor greater than 1, and there using LOCAL_ONE for both reads and writes leads to the problem described above.

Case Fix #

From this analysis, we know that Spring Data Cassandra’s defaults are not suitable for every situation, and may not be suitable for production at all. So let’s change the default, taking the consistency level as an example.

Let’s see how we can modify it:

@Override
protected SessionBuilderConfigurer getSessionBuilderConfigurer() {
    return cqlSessionBuilder -> {
        // Build a programmatic config loader and let our customizer adjust it
        DefaultProgrammaticDriverConfigLoaderBuilder configLoaderBuilder =
                new DefaultProgrammaticDriverConfigLoaderBuilder();
        driverConfigLoaderBuilderCustomizer().customize(configLoaderBuilder);
        cqlSessionBuilder.withConfigLoader(configLoaderBuilder.build());
        return cqlSessionBuilder;
    };
}

@Bean
public DriverConfigLoaderBuilderCustomizer driverConfigLoaderBuilderCustomizer() {
    // REQUEST_CONSISTENCY is statically imported from DefaultDriverOption
    return loaderBuilder -> loaderBuilder
            .withString(REQUEST_CONSISTENCY, ConsistencyLevel.LOCAL_QUORUM.name());
}

Here, we change the consistency level from LOCAL_ONE to LOCAL_QUORUM, which is better aligned with our actual production deployment and usage scenarios.

Case 3: Redundant Session #

Sometimes, when using Spring Data to manage connections, we need to keep an eye on memory usage. For example, when operating on Cassandra with Spring Data Cassandra, we may encounter a problem like this:

[Screenshot: heap dump showing four metadata instances of more than 40 MB each]

After connecting, Spring Data Cassandra retrieves Cassandra’s metadata, which occupies a relatively large amount of memory because it stores information such as the token ranges of the data. As the screenshot shows, more than 40 MB per instance is already significant in our application. But why are there 4 instances of over 40 MB each? Shouldn’t there be only one connection?

Case Analysis #

Locating this problem is not particularly difficult: set a breakpoint where the metadata is retrieved and trace what triggers the retrieval. However, because Spring Data drives Cassandra indirectly, and both the Cassandra driver and Spring Data are complex in their own right, it is not easy to pinpoint the root cause quickly.

So let’s first write an example that demonstrates the cause, and then see where it occurs in the real code.

Now let’s define a class called MyService, which prints its name when constructed:

public class MyService {

    public MyService(String name){
        System.err.println(name);
    }
}

Then we define two Configuration classes with a parent-child relationship. The parent Configuration is defined as follows:

@Configuration
public class BaseConfig {

    @Bean
    public MyService service(){
        return new MyService("myservice defined from base config");
    }
}

The child Configuration is defined as follows:

@Configuration
public class Config extends BaseConfig {

    @Bean
    public MyService service(){
        return new MyService("myservice defined from config");
    }
}

The implementation of the service() method in the child class overrides the corresponding method in the parent class. Finally, we write a startup program:

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}

For the program to start, we must not include both BaseConfig and Config in the Application’s component-scan scope; we can organize the code so that only Config falls inside the scan path (for example, by placing BaseConfig in a package that Application does not scan).

When the program starts, we find that only one MyService bean is generated, with the following output:

myservice defined from config

As you can see, if the child class’s @Bean method overrides the corresponding method in the parent class, only the child’s method produces a bean.
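To double-check, here is a hypothetical verification (my own addition, assuming it sits in the same package as Application and MyService) that counts the MyService beans actually registered:

import org.springframework.boot.SpringApplication;
import org.springframework.context.ConfigurableApplicationContext;

public class BeanCountCheck {

    public static void main(String[] args) {
        // Start the application and count how many MyService beans exist.
        ConfigurableApplicationContext ctx = SpringApplication.run(Application.class, args);
        System.out.println(ctx.getBeansOfType(MyService.class).size()); // prints 1
    }
}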

But what if, when implementing the child class, we accidentally forget that the parent method exists and use a different method name, like this:

@Configuration
public class Config extends BaseConfig {

    @Bean
    public MyService service2(){
        return new MyService("myservice defined from config");
    }
}

After this accidental modification, running the program again shows that 2 MyService beans are generated:

myservice defined from config
myservice defined from base config

At this point, you can probably see one reason for the memory doubling. Looking at the code of our real program, we find exactly this problem:

@Configuration
@EnableCassandraRepositories
public class CassandraConfig extends AbstractCassandraConfiguration {

    @Bean
    @Primary
    public CqlSessionFactoryBean session() {
        log.info("init session");
        CqlSessionFactoryBean cqlSessionFactoryBean = new CqlSessionFactoryBean();
        // omit other non-critical code
        return cqlSessionFactoryBean;
    }
    // omit other non-critical code
}

CassandraConfig extends AbstractCassandraConfiguration, whose parent class AbstractSessionConfiguration already defines a CqlSessionFactoryBean:

@Configuration
public abstract class AbstractSessionConfiguration implements BeanFactoryAware {

    @Bean
    public CqlSessionFactoryBean cassandraSession() {
        CqlSessionFactoryBean bean = new CqlSessionFactoryBean();
        bean.setContactPoints(getContactPoints());
        // omit other non-critical code
        return bean;
    }
    // omit other non-critical code
}

Comparing these two definitions of CqlSessionFactoryBean, you will find that their method names differ:

cassandraSession() vs. session()

Combined with the simple example above, you can now see where the problem lies: because the method names differ, the child’s session() does not override the inherited cassandraSession(), so both @Bean methods run and two CqlSessionFactoryBeans are created.

Case Fix #

The fix takes only seconds. We modify the original code as follows:

@Configuration
@EnableCassandraRepositories
public class CassandraConfig extends AbstractCassandraConfiguration {

    @Bean
    @Primary
    public CqlSessionFactoryBean cassandraSession() {
        // omit other non-critical code
    }
    // omit other non-critical code
}

Here we rename the method from session to cassandraSession, so it overrides the inherited definition. But you may still wonder: the duplicate bean only explains a doubling, so why four times the memory usage?

This is because each CqlSessionFactoryBean creates two sessions, and both retrieve metadata. For details, see CqlSessionFactoryBean’s afterPropertiesSet method:

@Override
public void afterPropertiesSet() {

   CqlSessionBuilder sessionBuilder = buildBuilder();
   // creation of system session
   this.systemSession = buildSystemSession(sessionBuilder);

   initializeCluster(this.systemSession);
   // creation of normal session
   this.session = buildSession(sessionBuilder);

   executeCql(getStartupScripts().stream(), this.session);
   performSchemaAction();

   this.systemSession.refreshSchema();
   this.session.refreshSchema();
}

The systemSession and session in the code above are those two sessions: two bean definitions times two sessions each gives four sessions, each holding its own copy of the metadata, which explains the four 40 MB instances.

Key Takeaways #

From these three cases, we can see that some errors have direct and obvious consequences, making them easy to locate and fix quickly, while others are very subtle, as in case 2, because they cannot be reproduced 100% of the time.

Based on these cases, we can summarize some key points to keep in mind when using Spring Data:

  1. Pay attention to consistency, such as ensuring that the serialization methods for reading and writing are consistent.
  2. Double-check all default configurations to ensure they meet your current requirements. For example, Spring Data Cassandra’s default consistency level (LOCAL_ONE) is usually unsuitable for production deployments.
  3. If you customize your own session, make sure to avoid generating redundant sessions.

Remember these three points, and you will be able to avoid many problems when using Spring Data.

Reflection Question #

In Case 1, we used Spring Data Redis’s StringRedisTemplate and RedisTemplate. How are these two beans created?

Looking forward to your thoughts in the comments section.