32 Practical High-Performance Full-text Search Engine RediSearch #

RediSearch is a high-performance full-text search engine that runs as a Redis module on a Redis server.

The main features of RediSearch are as follows:

  • Full-text indexing of multiple fields based on documents
  • High-performance incremental indexing
  • Document ranking (the score is provided manually by the user at indexing time)
  • Complex Boolean queries with AND, OR, and NOT operators between subqueries
  • Optional query clauses
  • Prefix-based search
  • Support for field weight setting
  • Auto-complete suggestions (with fuzzy prefix suggestions)
  • Exact phrase search
  • Query expansion based on stemming analysis in many languages
  • Support for custom functions for query expansion and scoring
  • Restricting search to specific document fields
  • Numeric filters and ranges
  • Geographic filters using Redis’s own geospatial commands
  • Unicode support (requires UTF-8 character set)
  • Retrieving complete document content or only the ID
  • Support for document deletion, updating, and index garbage collection
  • Support for partial updates and conditional document updates

Installation #

As with the Bloom filter introduced earlier, we can install and start RediSearch with Docker, which is the method RediSearch recommends. The command is as follows:

docker run -p 6379:6379 redislabs/redisearch:latest

After a successful installation and startup, the output looks like the following figure:

[Figure: RediSearch installation success]

After installation is complete, use redis-cli to check if the RediSearch module is loaded successfully. Use Docker to start redis-cli with the following command:

docker exec -it myredis redis-cli

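Where “myredis” is the name of the running RediSearch container. The docker run command above does not assign a name, so either add --name myredis to it or substitute the container name or ID reported by docker ps. The execution result is as follows: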

127.0.0.1:6379> module list
1) 1) "name"
   2) "ft"
   3) "ver"
   4) (integer) 10610

The presence of the module named “ft” in the returned array indicates that the RediSearch module has been loaded successfully.
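The same check can be done in code once a client has returned the MODULE LIST reply. The following is a minimal sketch that only inspects a reply already held in memory: the reply is modeled by hand as nested lists mirroring the output above, and the class and method names are hypothetical, not part of any RediSearch client.

```java
import java.util.Arrays;
import java.util.List;

public class ModuleListCheck {
    /**
     * Returns true if the MODULE LIST reply contains a module with the given name.
     * Each module is modeled as a flat list: ["name", <name>, "ver", <version>].
     */
    public static boolean isModuleLoaded(List<List<Object>> reply, String moduleName) {
        for (List<Object> module : reply) {
            // Walk the key/value pairs of one module entry
            for (int i = 0; i + 1 < module.size(); i += 2) {
                if ("name".equals(module.get(i)) && moduleName.equals(module.get(i + 1))) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The reply shown above, written out by hand (assumed shape)
        List<List<Object>> reply = Arrays.asList(
                Arrays.<Object>asList("name", "ft", "ver", 10610L));
        System.out.println(isModuleLoaded(reply, "ft"));
        System.out.println(isModuleLoaded(reply, "bf"));
    }
}
```

With the reply above, the first call reports that “ft” is present and the second reports that a module named “bf” is not.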

Installation with Source Code #

If you do not want to use Docker, you can also install RediSearch using the source code. The installation command is as follows:

git clone https://github.com/RedisLabsModules/RediSearch.git
cd RediSearch # Enter the module directory
make all

After the build completes, you can start Redis from the Redis source directory and load the RediSearch module with the following command (the path assumes RediSearch was cloned next to the Redis directory):

src/redis-server redis.conf --loadmodule ../RediSearch/src/redisearch.so

Usage #

We will first use redis-cli to perform relevant operations on RediSearch.

Create Index and Fields #

127.0.0.1:6379> ft.create myidx schema title text weight 5.0 desc text
OK

Where “myidx” is the name of the index, which contains two text fields, “title” and “desc”. “weight” sets the weight of a field: here “title” is given 5.0, while “desc” uses the default value of 1.0.
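To make the shape of the command explicit, the following sketch assembles the same argument list in code. It is only an illustration of the syntax used above: it covers text fields with an optional weight and nothing else, it does not talk to Redis, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CreateIndexArgs {
    /** Builds the FT.CREATE argument list for an index whose fields are all TEXT. */
    public static List<String> build(String indexName, Map<String, Double> textFieldWeights) {
        List<String> args = new ArrayList<>();
        args.add("FT.CREATE");
        args.add(indexName);
        args.add("SCHEMA");
        for (Map.Entry<String, Double> field : textFieldWeights.entrySet()) {
            args.add(field.getKey());
            args.add("TEXT");
            // A null weight means "use the default", so WEIGHT is omitted
            if (field.getValue() != null) {
                args.add("WEIGHT");
                args.add(String.valueOf(field.getValue()));
            }
        }
        return args;
    }

    public static void main(String[] args) {
        Map<String, Double> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("title", 5.0);
        fields.put("desc", null);
        System.out.println(String.join(" ", build("myidx", fields)));
    }
}
```

For the two fields above, the joined arguments match the command entered in redis-cli, apart from letter case.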

Add Content to the Index #

127.0.0.1:6379> ft.add myidx doc1 1.0 fields title "He urged her to study English" desc "good idea"
OK

Where “doc1” is the document ID (docid), and “1.0” is the score.

Query by Keyword #

127.0.0.1:6379> ft.search myidx "english" limit 0 10
1) (integer) 1
2) "doc1"
3) 1) "title"
   2) "He urged her to study English"
   3) "desc"
   4) "good idea"

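We can see that searching for the keyword “english” returns the document whose title field contains “English”; the match is not case-sensitive. The first element of the reply is the number of matching documents, followed by each document ID and its fields.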
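The reply layout (a count, then document ID and field list pairs) is easy to unpack in code. The following is a minimal sketch that parses a reply of that shape into objects. It assumes the reply has already been read into nested lists with the layout shown above; the class, the Hit type, and the method names are hypothetical and not part of any RediSearch client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SearchReplyParser {
    /** One matched document: its ID plus its field name/value pairs. */
    public static class Hit {
        public final String id;
        public final Map<String, String> fields = new LinkedHashMap<>();
        Hit(String id) { this.id = id; }
    }

    /** Reads the match count from the first element of the reply. */
    public static long total(List<Object> reply) {
        return (Long) reply.get(0);
    }

    /** Reads the (document ID, flat field list) pairs that follow the count. */
    public static List<Hit> hits(List<Object> reply) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 1; i + 1 < reply.size(); i += 2) {
            Hit hit = new Hit((String) reply.get(i));
            List<?> flat = (List<?>) reply.get(i + 1);
            // Field names and values alternate: name, value, name, value, ...
            for (int j = 0; j + 1 < flat.size(); j += 2) {
                hit.fields.put((String) flat.get(j), (String) flat.get(j + 1));
            }
            hits.add(hit);
        }
        return hits;
    }

    public static void main(String[] args) {
        // The reply shown above, written out by hand (assumed shape)
        List<Object> reply = Arrays.<Object>asList(1L, "doc1",
                Arrays.asList("title", "He urged her to study English", "desc", "good idea"));
        System.out.println(total(reply));
        for (Hit hit : hits(reply)) {
            System.out.println(hit.id + " -> " + hit.fields);
        }
    }
}
```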

Chinese content can be searched as well. First, we need to add a Chinese document to the index. Execute the following command:

127.0.0.1:6379> ft.add myidx doc2 1.0 language "chinese" fields title "Java 14 发布了!新功能速览" desc "Java 14 在 2020.3.17 日发布正式版了,但现在很多公司还在使用 Java 7 或 Java 8"
OK

Note: The language must be set to Chinese here, that is, language "chinese", so that the text is tokenized as Chinese. The default language is English; without this setting, Chinese queries are not supported and no results can be found.

We can use the previous query method with the following command:

127.0.0.1:6379> ft.search myidx "正式版"
1) (integer) 0

We find that no results are returned. This is because we did not specify the language for the search. The language must be specified not only when the document is added but also when querying. The query command is as follows:

127.0.0.1:6379> ft.search myidx "发布了" language "chinese"
1) (integer) 1
2) "doc2"
3) 1) "desc"
   2) "Java 14 \xe5\x9c\xa8 2020.3.17 \xe6\x97\xa5\xe5\x8f\x91\xe5\xb8\x83\xe6\xad\xa3\xe5\xbc\x8f\xe7\x89\x88\xe4\xba\x86\xef\xbc\x8c\xe4\xbd\x86\xe7\x8e\xb0\xe5\x9c\xa8\xe5\xbe\x88\xe5\xa4\x9a\xe5\x85\xac\xe5\x8f\xb8\xe8\xbf\x98\xe5\x9c\xa8\xe4\xbd\xbf\xe7\x94\xa8 Java 7 \xe6\x88\x96 Java 8"
   3) "title"
   4) "Java 14 \xe5\x8f\x91\xe5\xb8\x83\xe4\xba\x86\xef\xbc\x81\xe6\x96\xb0\xe5\x8a\x9f\xe8\x83\xbd\xe9\x80\x9f\xe8\xa7\x88"

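From the results, we can see that the Chinese document has been found. The \x sequences are the UTF-8 bytes of the Chinese text, which redis-cli prints in escaped form by default; starting redis-cli with the --raw option displays the characters directly.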
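To confirm that the escaped output really is the original text, the \x sequences can be turned back into a string. The following is a minimal sketch under two assumptions: only \xNN escapes are handled (other escapes such as \" or \n are not), and every unescaped character is plain ASCII. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class CliEscapeDecoder {
    /** Converts \xNN escapes, as printed by redis-cli, back into a UTF-8 string. */
    public static String decode(String escaped) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < escaped.length()) {
            boolean hexEscape = escaped.startsWith("\\x", i) && i + 4 <= escaped.length();
            if (hexEscape) {
                // Two hex digits after \x form one raw byte
                bytes.write(Integer.parseInt(escaped.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                // Assumed to be an ASCII character, copied through as a single byte
                bytes.write(escaped.charAt(i));
                i++;
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The beginning of the "title" value from the reply above
        String escaped = "Java 14 \\xe5\\x8f\\x91\\xe5\\xb8\\x83\\xe4\\xba\\x86";
        System.out.println(decode(escaped));
    }
}
```

The nine escaped bytes in this example decode to the three characters 发布了, which is the keyword used in the query.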

Delete Index Data #

127.0.0.1:6379> ft.del myidx doc1
(integer) 1

A document is deleted by specifying the index name and the document ID. The reply (integer) 1 indicates that one document was deleted.

Delete Index #

We can use the “ft.drop” command to delete the entire index. Execute the following command:

127.0.0.1:6379> ft.drop myidx
OK

Querying Index Details #

We can use the “ft.info” command to query index-related information. Execute the following command:

127.0.0.1:6379> ft.info myidx
 1) index_name
 2) myidx
 3) index_options
 4) (empty list or set)
 5) fields
 6) 1) 1) title
       2) type
       3) TEXT
       4) WEIGHT
       5) "5"
    2) 1) desc
       2) type
       3) TEXT
       4) WEIGHT
       5) "1"
 7) num_docs
 8) "2"
 9) max_doc_id
10) "2"
11) num_terms
12) "9"
13) num_records
14) "18"
15) inverted_sz_mb
16) "0.000102996826171875"
17) total_inverted_index_blocks
18) "29"
19) offset_vectors_sz_mb
20) "1.71661376953125e-05"
21) doc_table_size_mb
22) "0.000164031982421875"
23) sortable_values_size_mb
24) "0"
25) key_table_size_mb
26) "8.0108642578125e-05"
27) records_per_doc_avg
28) "9"
29) bytes_per_record_avg
30) "6"
31) offsets_per_term_avg
32) "1"
33) offset_bits_per_record_avg
34) "8"
35) gc_stats
36)  1) bytes_collected
     2) "0"
     3) total_ms_run
     4) "16"
     5) total_cycles
     6) "14"
     7) avarage_cycle_time_ms
     8) "1.1428571428571428"
     9) last_run_time_ms
    10) "2"
    11) gc_numeric_trees_missed
    12) "0"
    13) gc_blocks_denied
    14) "0"
37) cursor_stats
38) 1) global_idle
    2) (integer) 0
    3) global_total
    4) (integer) 0
    5) index_capacity
    6) (integer) 128
    7) index_total
    8) (integer) 0

The “num_docs” field indicates the number of documents stored in the index.
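Because the reply alternates between field names and values, it converts naturally into a map. The following is a minimal sketch that does this for a reply already held in memory and then reads “num_docs”. The flat list layout is taken from the output above, the sample reply is abbreviated by hand, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InfoReplyParser {
    /** Turns the flat [name, value, name, value, ...] reply into an ordered map. */
    public static Map<String, Object> toMap(List<Object> flat) {
        Map<String, Object> info = new LinkedHashMap<>();
        for (int i = 0; i + 1 < flat.size(); i += 2) {
            info.put(String.valueOf(flat.get(i)), flat.get(i + 1));
        }
        return info;
    }

    /** Reads num_docs, which the output above shows as a string such as "2". */
    public static long numDocs(Map<String, Object> info) {
        return Long.parseLong(String.valueOf(info.get("num_docs")));
    }

    public static void main(String[] args) {
        // An abbreviated, hand-written version of the reply shown above
        List<Object> reply = Arrays.<Object>asList(
                "index_name", "myidx",
                "num_docs", "2",
                "max_doc_id", "2");
        Map<String, Object> info = toMap(reply);
        System.out.println(info.get("index_name") + " holds " + numDocs(info) + " documents");
    }
}
```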

Code Example #

The supported RediSearch clients include the following:

[Figure: supported RediSearch clients]

In this tutorial, we will use JRediSearch to implement full-text search functionality. First, add the JRediSearch dependency in the pom.xml file:

<!-- https://mvnrepository.com/artifact/com.redislabs/jredisearch -->
<dependency>
  <groupId>com.redislabs</groupId>
  <artifactId>jredisearch</artifactId>
  <version>1.3.0</version>
</dependency>

The complete code is as follows:

import io.redisearch.client.AddOptions;
import io.redisearch.client.Client;
import io.redisearch.Document;
import io.redisearch.SearchResult;
import io.redisearch.Query;
import io.redisearch.Schema;

public class RediSearchExample {
    public static void main(String[] args) {
        // Connect to Redis server and specify the index
        Client client = new Client("myidx", "127.0.0.1", 6379);
        // Define the schema
        Schema schema = new Schema().addTextField("title", 5.0)
                .addTextField("desc", 1.0);
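        // Drop any existing index with the same name; passing true (missingOk)
        // avoids an exception when the index does not exist yet
        client.dropIndex(true);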
        // Create the index
        client.createIndex(schema, Client.IndexOptions.Default());
        // Set the document language to Chinese
        AddOptions addOptions = new AddOptions();
        addOptions.setLanguage("chinese");
        // Add data
        Document document = new Document("doc1");
        document.set("title", "Weather forecast");
        document.set("desc", "Today's weather is very good, with sunny skies, blue sky, and white clouds.");
        // Add the document to the index
        client.addDocument(document,addOptions);
        // Query
        Query q = new Query("weather") // Set the search criteria
                .setLanguage("chinese") // Search with the language set to Chinese
                .limit(0,5);
        // Get the search result
        SearchResult res = client.search(q);
        // Output the search result
        System.out.println(res.docs);
    }
}

The execution result of the above program is as follows:

[{"id":"doc1","score":1.0,"properties":{"title":"Weather forecast","desc":"Today's weather is very good, with sunny skies, blue sky, and white clouds."}}]

It can be seen that the added document was found by the query.

Summary #

In this tutorial, we started RediSearch in two ways: with Docker and by compiling the source code. To use the full-text search function of RediSearch, we must first create an index, then add data to the index, and finally use the ft.search command to search. To query Chinese content, the language must be set to Chinese both when adding data and when querying, using language "chinese".

References & Acknowledgments

Official website:

http://redisearch.io

Project repository:

https://github.com/RediSearch/RediSearch