02 Understand the Elastic Stack Ecosystem and Scenario Solutions #

Elastic Stack is a powerful combination of different open-source tools that work together seamlessly to help users solve various data-related challenges. It comprises several components, including Elasticsearch, Logstash, Kibana, Beats, and other complementary tools.

Elasticsearch: It serves as the core component, providing distributed search and analytics capabilities. Elasticsearch enables users to store, search, and analyze large volumes of data in real-time, making it ideal for applications such as log analytics, observability, and full-text search.

Logstash: This tool allows users to collect, process, and send logs and other data from multiple sources to Elasticsearch or other destinations. It supports various input plugins for data ingestion and can transform and filter data before sending it to the desired location.

Kibana: As a data visualization platform, Kibana helps users explore, visualize, and share data stored in Elasticsearch. With its easy-to-use interface, users can create and customize real-time dashboards, perform ad-hoc queries, and generate meaningful visual representations of the data.

Beats: Beats act as lightweight agents that can be installed on servers or other devices to collect and send data to Elasticsearch or Logstash. There are different types of Beats available, such as Filebeat for collecting log files, Metricbeat for system and service metrics, and Auditbeat for tracking security-related events.

The Elastic Stack ecosystem offers a wide range of solutions for various use cases. Some common scenarios include:

  • Log analysis: Elastic Stack allows users to ingest and analyze log data from different sources, enabling efficient log management and troubleshooting.

  • Security analytics: With Auditbeat and other security-related features, Elastic Stack can help users monitor and detect security threats in real-time, allowing proactive response and threat mitigation.

  • Metrics monitoring: By using Metricbeat and other monitoring tools, users can collect and analyze metrics from different systems and services, enabling performance monitoring and troubleshooting.

  • Search and analytics: Elasticsearch’s powerful search capabilities and Kibana’s visualization features enable users to build complex queries, explore data patterns, and gain valuable insights from their datasets.

  • Business intelligence: Elastic Stack can be used to create interactive dashboards and reports that provide business insights by combining data from different sources and visualizing it in a meaningful way.

The flexibility and scalability of Elastic Stack make it suitable for a wide range of use cases, from IT operations and DevOps to security analytics and business intelligence.

Elastic Stack Ecosystem #

Beats + Logstash + Elasticsearch + Kibana

Below is a diagram from the official blog that shows the ELK ecosystem, with the scenarios built on top of ELK shown at the top.

img

Elastic X-Pack is a paid extension, so it is worth including X-Pack in the picture to see what it adds. This will also help you identify the key points when reading the official documentation:

img

Beats #

Beats is a platform of lightweight data shippers that send data from edge machines to Logstash and Elasticsearch. Beats are written in Go, which keeps them fast and resource-efficient. As the following diagram shows, different Beats are designed for different data sources.

img

Logstash #

Logstash is a dynamic data collection pipeline with an extensible plugin ecosystem. It collects data from a variety of sources, transforms it, and ships it to different destinations. It works seamlessly with Elasticsearch and joined Elastic in 2013.

It has the following features:

  1. Real-time parsing and transformation of data.

  2. Scalability with over 200 plugins available.

  3. Reliability and security. Logstash guarantees at-least-once delivery of in-flight events through persistent queues, and encrypts data in transit.

  4. Monitoring.
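Putting these features together, a minimal Logstash pipeline reads events from Beats, parses them, and indexes them into Elasticsearch. The following is only a sketch: the port, grok pattern, and index name are illustrative placeholders to adapt to your own data.

```conf
# Minimal Logstash pipeline sketch: Beats in, parse, Elasticsearch out.
input {
  beats {
    port => 5044                                       # standard Beats port
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }   # example: Apache access logs
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]                        # placeholder endpoint
    index => "weblogs-%{+YYYY.MM.dd}"                  # daily indices
  }
}
```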

Elasticsearch #

Elasticsearch is used to search, analyze, and store data. It is a distributed, JSON-based search and analytics engine designed for scalability, reliability, and ease of management.

The indexing and search flow in Elasticsearch works roughly as follows:

  1. The user submits documents to Elasticsearch.

  2. Text fields are broken into terms by an analyzer (tokenizer).

  3. The terms and their weights are stored in an inverted index. When users search, matching documents are scored and ranked by those weights, and the results are returned in order of relevance.
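The three steps above can be illustrated with a toy inverted index. This is only a conceptual sketch in Python, not Elasticsearch's actual implementation (Elasticsearch builds on Lucene and scores with BM25); the documents and the IDF-style weighting are invented for demonstration.

```python
from collections import defaultdict
import math

# Toy illustration of the tokenize -> inverted index -> ranked search
# flow. Elasticsearch (via Lucene) does this at scale with BM25 scoring;
# the documents and weighting here are made up for demonstration.
docs = {
    1: "elastic stack log analytics",
    2: "kibana visualizes elastic data",
    3: "logstash ships log data",
}

index = defaultdict(set)           # term -> ids of docs containing it
for doc_id, text in docs.items():
    for term in text.split():      # trivial whitespace "tokenizer"
        index[term].add(doc_id)

def search(query):
    """Score documents by summing an IDF-style weight per matching term."""
    scores = defaultdict(float)
    for term in query.split():
        postings = index.get(term, set())
        if not postings:
            continue
        idf = math.log(len(docs) / len(postings))  # rarer terms weigh more
        for doc_id in postings:
            scores[doc_id] += idf
    return sorted(scores, key=scores.get, reverse=True)  # best match first

print(search("log data"))  # doc 3 matches both terms, so it ranks first
```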

Kibana #

Kibana is used for data visualization. Its main role is to provide a user interface for configuring and managing Elasticsearch and for displaying the data stored in it. Kibana originated as a community tool for visualizing Logstash events stored in Elasticsearch and joined Elastic in 2013.

Here are some of Kibana’s features:

  1. Kibana can provide various visualizations in the form of charts.

  2. With its machine learning features (part of the commercial tier), it can detect anomalies and flag suspicious issues early.

Evolution of the ES Stack from Log Collection Systems #

Let’s take a look at the evolution of the ELK technology stack, which is typically reflected in log collection systems.

A typical log system includes:

(1) Collection: the ability to collect log data from various sources.

(2) Transmission: the ability to parse, filter, and transmit log data to a storage system reliably.

(3) Storage: the storage of log data.

(4) Analysis: support for UI analysis.

(5) Alerting: the ability to provide error reporting and monitoring mechanisms.

Beats + Elasticsearch + Kibana #

After Beats collects the data, it is stored in Elasticsearch and visualized with Kibana.

img
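A minimal Filebeat configuration for this architecture might look like the sketch below; the log paths, Elasticsearch host, and Kibana host are placeholders to adapt to your environment.

```yaml
# filebeat.yml sketch: ship log files straight to Elasticsearch,
# with Kibana configured so `filebeat setup` can load its dashboards.
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log          # example path, adjust to your logs

output.elasticsearch:
  hosts: ["localhost:9200"]         # placeholder Elasticsearch endpoint

setup.kibana:
  host: "localhost:5601"            # placeholder Kibana endpoint
```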

Beats + Logstash + Elasticsearch + Kibana #

img

This framework introduces Logstash to the previous framework, bringing the following benefits:

(1) Logstash has a disk-based adaptive buffering system that absorbs incoming throughput, reducing backpressure.

(2) Extraction from other data sources such as databases, S3, or message queues.

(3) Sending data to multiple destinations such as S3, HDFS, or writing to a file.

(4) Composing more complex processing pipelines using conditional data flow logic.

Advantages of using Beats in combination with Logstash:

(1) Horizontal scalability, high availability, and handling of variable workloads: Beats and Logstash can load-balance across nodes, and running multiple Logstash instances provides high availability.

(2) Message durability and an at-least-once delivery guarantee: when Filebeat or Winlogbeat is used for log collection, at-least-once delivery can be guaranteed. Both communication paths, from Filebeat or Winlogbeat to Logstash and from Logstash to Elasticsearch, are synchronous and support acknowledgments. The Logstash persistent queue protects against node failures; for disk-level resilience, the queue should sit on redundant storage.

(3) End-to-end secure transmission with authentication and wire encryption: traffic from Beats to Logstash and from Logstash to Elasticsearch can both be encrypted. When communicating with Elasticsearch there are many security options, including basic authentication, TLS, PKI, LDAP, Active Directory, and custom security realms.
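The delivery guarantee mentioned above relies on Logstash's persistent queue, which is enabled in `logstash.yml`. A sketch, with illustrative sizes and paths:

```yaml
# logstash.yml sketch: persist in-flight events to disk so they
# survive a process crash. Size and path are illustrative.
queue.type: persisted
queue.max_bytes: 4gb                    # disk budget for buffered events
path.queue: /var/lib/logstash/queue     # should sit on redundant storage
```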

Adding more data sources over protocols such as TCP, UDP, and HTTP is a common way to feed data into Logstash.
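Such protocol inputs might look like the following Logstash sketch; the ports are illustrative and must be opened in your environment.

```conf
# Sketch of additional Logstash inputs alongside Beats.
input {
  tcp  { port => 5000  codec => json_lines }   # e.g. apps writing JSON over TCP
  udp  { port => 5001 }                        # e.g. syslog-style datagrams
  http { port => 8080 }                        # accepts events via HTTP POST
}
```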

img

Beats + MQ + Logstash + Elasticsearch + Kibana #

img

On top of the previous architecture, we can add a message queue such as Redis, Kafka, or RabbitMQ between Beats and Logstash. Adding this middleware brings the following benefits:

(1) Reducing the load on the machines where logs originate. These machines typically run reverse proxies or application services and already bear heavy workloads, so the less done on them, the better.

(2) If many machines need log collection, having every machine write directly to Elasticsearch puts continuous pressure on Elasticsearch. A data buffer is therefore needed, and it also protects against data loss to some extent.

(3) Formatting and processing of log data can be centralized in the indexer tier, so code changes and deployments happen in one place instead of modifying configurations across many machines.
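On the Logstash "indexer" side, consuming from the buffer might look like the following sketch, assuming Kafka as the message queue; the broker address, topic, and consumer group are placeholders.

```conf
# Logstash indexer sketch: read buffered events from Kafka, index into
# Elasticsearch. Beats would write to the same topic via `output.kafka`.
input {
  kafka {
    bootstrap_servers => "kafka:9092"           # placeholder broker
    topics            => ["logs"]               # placeholder topic
    group_id          => "logstash-indexers"    # scale out by adding consumers
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]                 # placeholder endpoint
  }
}
```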

Best Practices for Elastic Stack #

Let’s take a look at the best practices shared by the official development team.

Log Collection System #

(PS: It’s what we discussed above)

Basic log system

img

Adding data sources and using MQ (Message Queue)

img

Metric Collection and APM Performance Monitoring #

img

Multi-datacenter Solution #

Achieving data high availability through redundancy

img

Two data collection centers (for example, collecting data from two factories) and data aggregation after collection

img

Data dispersion and cross-cluster search

img
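Cross-cluster search like the setup above is configured by registering each remote cluster's seed nodes. A sketch in Kibana Dev Tools syntax, with hypothetical cluster names and hosts:

```json
PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "dc1": { "seeds": ["es-dc1.example.com:9300"] },
        "dc2": { "seeds": ["es-dc2.example.com:9300"] }
      }
    }
  }
}
```

A search against `dc1:logs-*,dc2:logs-*` would then fan out to both registered clusters.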
