
20 Log Processing: How to Design Logging Packages and Record Logs #

Hello, I’m Kong Lingfei. In the next two lectures, we will discuss how to design and develop a logging package.

When developing Go projects, besides handling errors, another important task is logging. By recording logs, we can accomplish various functions, such as debugging during development and testing, troubleshooting, data analysis, monitoring and alerts, as well as recording events that occur.

To achieve these functionalities, we first need an excellent logging package. Additionally, I have noticed that many Go developers log in a casual manner, resulting in logs that cannot effectively pinpoint the problem. Therefore, we also need to know how to log better, which requires a logging specification.

With an excellent logging package and logging specification, we can quickly locate problems, obtain sufficient information, and carry out data analysis and monitoring alerts in later stages. We can also conveniently debug. In this lecture, I will provide a detailed introduction on how to design a logging package and logging specification.

First, let’s take a look at how to design a logging package.

How to Design a Logging Package #

Currently, although there are many excellent open-source logging packages to choose from, in a large system, these open-source logging packages may not meet our customized requirements. In this case, we need to develop our own logging package.

These logging packages may be based on one or several open-source logging packages, or they may be completely new logging packages. So when developing a logging package, what functions do we need to implement and how do we implement them? Let’s discuss in detail.

Let’s first take a look at the functions that a logging package needs to have. Based on their importance, I divide these functions into basic functions, advanced functions, and optional functions. Basic functions are the ones a logging package must have; advanced and optional functions can be added in specific scenarios. Let’s start with the basic functions.

Basic Functions #

Basic functions are essential for an excellent logging package and can meet the needs of the majority of use cases, suitable for some small and medium-sized projects. A logging package should have the following four basic functions.

  1. Support basic log information

A logging package needs to support basic log information, including timestamps, file names, line numbers, log levels, and log messages.

Timestamps can record the time when a log occurs. When troubleshooting, we often need to use timestamps to reconstruct the request process, check the context of the same timestamp, and locate the problem.

File names and line numbers can help us quickly locate the position where the log is printed and find the problematic code. If a logging library does not support file names and line numbers, troubleshooting will become very difficult, and we can only rely on grep and memory to locate the code. For enterprise-level services, it is necessary to ensure that the service can quickly recover after a failure. The longer the recovery time, the greater the loss and impact. This requires developers to be able to quickly locate and resolve issues. With file names and line numbers, we can accurately locate the problematic code, fix the problem as soon as possible, and restore the service.

By using log levels, we can know the error type of a log. The most common usage is to directly filter out logs of the Error level, so that we can directly locate the problem and then combine other logs to pinpoint the cause of the error. Without log levels, when troubleshooting, we may need to look through a large number of useless logs. In a large system, the amount of logs for a single request is large, which significantly prolongs the time it takes to locate issues.

And with log messages, we can know the specific cause of an error.
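To make this concrete, here is a minimal sketch (not from any particular library) of how a logging function can attach this basic information: it uses runtime.Caller to capture the file name and line number, and prefixes the message with a timestamp and level. The logf helper and its output format are illustrative assumptions.

package main

import (
    "fmt"
    "path/filepath"
    "runtime"
    "time"
)

// logf is a hypothetical helper that prints the basic log information:
// timestamp, level, file name, line number, and the log message.
func logf(level, format string, args ...interface{}) {
    // runtime.Caller(1) reports the file and line of logf's caller.
    _, file, line, ok := runtime.Caller(1)
    if !ok {
        file, line = "???", 0
    }
    fmt.Printf("%s %s %s:%d %s\n",
        time.Now().Format(time.RFC3339), level, filepath.Base(file), line,
        fmt.Sprintf(format, args...))
}

func main() {
    logf("INFO", "user %s logged in", "colin")
}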

  2. Support different log levels

Different log levels represent different log types. For example, logs of the Error level indicate error types. When troubleshooting, we usually start by looking at logs of the error level. Logs of the Warn level indicate that an exception has occurred but it does not affect the program’s operation. If the program’s execution result does not meet expectations, we can refer to logs of the Warn level to locate the exception. Logs of the Info level can assist us in debugging and record useful information for later analysis.

Usually, a logging package should implement at least six levels. Ranked from low to high priority, they are: Debug, Info, Warn, Error, Panic, and Fatal.

Some logging packages, such as logrus, also support the Trace log level. The Trace level is even lower than the Debug level and can print more fine-grained log information. In my opinion, the Trace level is not necessary, and you can choose it according to your needs.

When printing logs, a log call actually involves two levels:

  • Output level: the level at which a specific log message is printed. For example, calling glog.Info("This is info message") prints a log whose output level is Info.
  • Switch level: the level configured when the application starts, which determines which output levels are actually printed. For example, starting glog with -v=4 means that only V-logs with a verbosity level of 4 or lower are printed.

If the switch level is set to L, a log is printed only if its output level is >= L. For example, if the switch level is Warn, only logs at the Warn, Error, Panic, and Fatal levels will be recorded.
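The following is a minimal sketch of this output-level/switch-level relationship; the Level constants and Logger type are illustrative, not taken from a specific library.

package main

import "fmt"

// Level is the log level; a higher value means a higher priority.
type Level int

const (
    DebugLevel Level = iota
    InfoLevel
    WarnLevel
    ErrorLevel
    PanicLevel
    FatalLevel
)

// Logger holds the switch level configured when the application starts.
type Logger struct {
    level Level
}

// log prints the message only if the output level is >= the switch level.
func (l *Logger) log(level Level, msg string) {
    if level < l.level {
        return // e.g. Debug and Info are dropped when the switch level is Warn
    }
    fmt.Println(msg)
}

func main() {
    l := &Logger{level: WarnLevel}
    l.log(InfoLevel, "this Info log is filtered out")
    l.log(ErrorLevel, "this Error log is printed")
}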

  3. Support custom configuration

Different runtime environments require different logging output configurations. For example, in the development and testing environment, the logging level needs to be set to Debug to facilitate debugging. In the production environment, the logging level needs to be set to Info to improve the performance of the application. For example, in the production environment, JSON-formatted logs are usually output to facilitate log collection, while in the development and testing environment, text-formatted logs are output for easy viewing. So, our log package needs to be configurable and have different configurations for different environments. Through configuration, we can change the logging behavior without recompiling the code.
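As a sketch of what such configuration might look like, the Options struct and Init function below are hypothetical; the field names are illustrative, and different environments would simply pass different values.

package main

import "fmt"

// Options is a hypothetical configuration struct for a logging package.
type Options struct {
    Level       string   // e.g. "debug" in development, "info" in production
    Format      string   // "text" for development, "json" for production
    OutputPaths []string // e.g. ["stdout", "/var/log/app.log"]
}

var opts Options

// Init applies the configuration once, when the application starts.
func Init(o *Options) {
    opts = *o
    fmt.Printf("logger initialized: %+v\n", opts)
}

func main() {
    // Production-style configuration: JSON logs at the Info level.
    Init(&Options{
        Level:       "info",
        Format:      "json",
        OutputPaths: []string{"stdout", "/var/log/app.log"},
    })
}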

  4. Support output to standard output and file

Logs are always meant to be read, either being output to the standard output for developers to read in real-time or saved to a file for future reference. Outputting to the standard output and saving to a file are the basic functionalities of a log package.
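For example, even with the standard library alone we can write each log line to both standard output and a file by combining them with io.MultiWriter; a minimal sketch, where the app.log path is an assumption:

package main

import (
    "io"
    "log"
    "os"
)

func main() {
    // Open (or create) the log file; error handling is simplified for the sketch.
    f, err := os.OpenFile("app.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
    if err != nil {
        log.Fatalf("open log file: %v", err)
    }
    defer f.Close()

    // Every log line now goes to both standard output and app.log.
    log.SetOutput(io.MultiWriter(os.Stdout, f))
    log.Println("this line goes to stdout and app.log")
}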

Advanced Functions #

Apart from the basic functions mentioned above, in some large-scale systems, logging packages are often required to have advanced functions. These advanced functions help us record logs better and enable richer capabilities, such as log alerting. So, what advanced functions can a logging package have?

  1. Support multiple log formats

Log format is another point we should consider. A good log format not only makes logs easier to read, it also makes it easier for log collection components to collect the logs and feed them into log search engines such as Elasticsearch.

A log package should provide at least the following two formats:

  • TEXT format: TEXT format logs have good readability and make it convenient for us to view logs during the development and debugging phase. For example:
2020-12-02T01:16:18+08:00 INFO example.go:11 std log
2020-12-02T01:16:18+08:00 DEBUG example.go:13 change std log to debug level
  • JSON format: JSON format logs can record more detailed information. They include some common or custom fields that can be used for future queries and analysis. Also, they can be conveniently collected and reported by log collection tools like Filebeat and Logstash. Below is an example of a JSON format log:
{"level":"DEBUG","time":"2020-12-02T01:16:18+08:00","file":"example.go:15","func":"main.main","message":"log in json format"}
{"level":"INFO","time":"2020-12-02T01:16:18+08:00","file":"example.go:16","func":"main.main","message":"another log in json format"}

I recommend using the TEXT format logs in the development and debugging phase and the JSON format logs in the production environment. An excellent log library, such as logrus, should not only provide basic output formats but also allow developers to customize the log output format.
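For example, with logrus the output format can be switched simply by setting a different formatter; a minimal sketch:

package main

import (
    log "github.com/sirupsen/logrus"
)

func main() {
    // TEXT format: easy to read during development and debugging.
    log.SetFormatter(&log.TextFormatter{FullTimestamp: true})
    log.Info("std log in text format")

    // JSON format: easy for Filebeat/Logstash to collect in production.
    log.SetFormatter(&log.JSONFormatter{})
    log.Info("log in json format")
}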

  2. Ability to output logs by level

To quickly locate the desired logs, it is a good practice to categorize logs by level, at least outputting error-level logs to a separate file. This way, when problems occur, we can go straight to the error logs to locate the issue. For example, glog supports categorized output and writes logs of each severity to separate INFO, WARNING, ERROR, and FATAL files.

  3. Support structured logging

Structured logging means using JSON or other encoding methods to structure logs. This facilitates using various tools such as Filebeat and Logstash Shipper to collect, filter, analyze, and search logs. For example, using zap for log printing:

package main

import (
    "time"

    "go.uber.org/zap"
)

func main() {
    logger, _ := zap.NewProduction()
    defer logger.Sync() // flushes buffer, if any
    url := "http://marmotedu.com"
    // Structured logging
    logger.Sugar().Infow("failed to fetch URL", "url", url, "attempt", 3, "backoff", time.Second)

    // Unstructured logging
    logger.Sugar().Infof("failed to fetch URL: %s", url)
}

The code above will output:

{"level":"info","ts":1607303966.9903321,"caller":"zap/structured_log.go:14","msg":"failed to fetch URL","url":"http://marmotedu.com","attempt":3,"backoff":1}
{"level":"info","ts":1607303966.9904354,"caller":"zap/structured_log.go:17","msg":"failed to fetch URL: http://marmotedu.com"}

  4. Support for log rotation

In a large project, several gigabytes of logs may be generated in a day. In order to prevent logs from filling up the disk space, leading to server or program exceptions, it is necessary to ensure that when the log size reaches a certain level, it is split, compressed, and saved.

How do we split it? You can split the log based on its size or based on the date. Log splitting, compression, and storage functions can be encapsulated based on some excellent open source packages on GitHub, such as lumberjack, which supports archiving logs based on size and date, and file-rotatelogs, which supports log splitting based on hours.
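For example, a logging package (or the application itself) can use lumberjack as the log writer to get size-based rotation; a minimal sketch, with the file path and limits chosen arbitrarily:

package main

import (
    "log"

    "gopkg.in/natefinch/lumberjack.v2"
)

func main() {
    // lumberjack rotates the file when it reaches MaxSize (in MB), keeps at most
    // MaxBackups old files for MaxAge days, and compresses the rotated files.
    log.SetOutput(&lumberjack.Logger{
        Filename:   "/var/log/app/app.log",
        MaxSize:    100, // megabytes
        MaxBackups: 3,
        MaxAge:     7, // days
        Compress:   true,
    })
    log.Println("logs are now rotated automatically")
}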

For log rotation, I actually do not recommend adding it in the log package, as it would increase the complexity of the log package. Instead, I suggest using other tools to achieve log rotation. For example, Logrotate can be used for log rotation in Linux systems. Logrotate is powerful and is a professional log rotation tool.

  5. Hook capability

The hook capability allows us to customize the processing of log content. For example, when a log of a certain level is generated, we can send an email or call an alert API for alerting. Many excellent open source log packages provide hook capabilities, such as logrus and zap.
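For example, logrus lets us register a hook that fires only for certain levels; the sketch below sends an alert whenever an Error (or more severe) log is written. The sendAlert function is hypothetical and stands in for an email or alert API call.

package main

import (
    log "github.com/sirupsen/logrus"
)

// alertHook fires for Error, Fatal, and Panic level logs.
type alertHook struct{}

func (h *alertHook) Levels() []log.Level {
    return []log.Level{log.ErrorLevel, log.FatalLevel, log.PanicLevel}
}

func (h *alertHook) Fire(entry *log.Entry) error {
    sendAlert(entry.Message) // hypothetical alerting call
    return nil
}

func sendAlert(msg string) {
    // A real implementation would call an email or alerting API here.
    log.Infof("alert sent: %s", msg)
}

func main() {
    log.AddHook(&alertHook{})
    log.Error("something went wrong")
}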

In a large system, log alerting is a very important feature. However, a better approach is to implement alerting as a separate, out-of-band capability rather than building it into the logging package, which keeps the logging package focused and concise. For example, logs can be collected into Elasticsearch and log alerting can be handled by ElastAlert.

Optional Functions #

In addition to the basic and advanced features, there are some features that do not affect the core functionality of the log package, but having these features would make the log package more user-friendly. Here are three examples of such features.

  1. Support for colored output

Enabling colored output during development and testing makes it easier for us to identify and debug errors and warnings by color-coding logs of different levels. When deploying to production, colored output can be disabled to improve performance.
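For example, logrus can be asked to force colored, human-friendly output; a minimal sketch (in production you would switch back to a plain or JSON formatter):

package main

import (
    log "github.com/sirupsen/logrus"
)

func main() {
    // Force colored output during development and testing.
    log.SetFormatter(&log.TextFormatter{
        ForceColors:   true,
        FullTimestamp: true,
    })
    log.Warn("this warning is highlighted in the terminal")
}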

  2. Compatibility with the standard library log package

Some early Go projects heavily relied on the standard library log package. If our log library can be compatible with the standard library log package, we can easily replace the standard library log package. For example, logrus is compatible with the standard library log package. Here is an example code using the standard library log package:

    package main
    
    import (
        "log"
    )
    
    func main() {
        log.Print("call Print: line1")
        log.Println("call Println: line2")
    }

By simply replacing the import "log" with log "github.com/sirupsen/logrus", we can switch from the standard library log package to logrus, as follows:

    package main
    
    import (
        log "github.com/sirupsen/logrus"
    )
    
    func main() {
        log.Print("call Print: line1")
        log.Println("call Println: line2")
    }

  3. Support for logging to different destinations

In a distributed system, a service may be deployed on multiple machines. In this case, if we want to view logs, we need to log in to different machines separately, which is very cumbersome. We would prefer to have the logs delivered to Elasticsearch for centralized log viewing.

We may also need to analyze information such as the number of calls to a certain API or the number of requests from a certain user from logs. This requires us to be able to process logs. The general practice is to deliver logs to Kafka, and a data processing service consumes the logs stored in Kafka to analyze information such as the number of calls.

For the two scenarios mentioned above, logs need to be delivered to components such as Elasticsearch and Kafka. If our logging package can deliver logs directly to different downstream components, that is a very welcome feature.

If the logging package does not support delivering logs to downstream components such as Elasticsearch, Kafka, Fluentd, or Logstash, the logs can instead be collected from the disk files by Filebeat and then delivered to the downstream components.
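One simple way to support this in the logging package is to let it write to any io.Writer and plug in a writer that forwards each log line downstream. The sketch below is illustrative: kafkaWriter and publish are hypothetical stand-ins for a real Kafka (or Elasticsearch/Fluentd) client.

package main

import (
    "io"
    "log"
    "os"
)

// kafkaWriter forwards every log line to a downstream component.
type kafkaWriter struct {
    topic string
}

func (w *kafkaWriter) Write(p []byte) (int, error) {
    publish(w.topic, p) // hypothetical producer call
    return len(p), nil
}

func publish(topic string, msg []byte) {
    // A real implementation would use a Kafka producer client here.
    _ = topic
    _ = msg
}

func main() {
    // Keep writing to stdout while also delivering logs downstream.
    log.SetOutput(io.MultiWriter(os.Stdout, &kafkaWriter{topic: "app-logs"}))
    log.Println("this log line is printed locally and delivered to Kafka")
}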

Points to Consider When Designing a Logging Package #

Above, we have discussed the features that a log package should have, which can guide us in designing the log package. Now, let’s take a look at a few aspects we still need to consider when designing a log package:

  • High performance: Since we need to frequently call the log package in our code to record logs, the performance of the log package is the first thing to consider. A log package with poor performance will inevitably lead to poor overall application performance.
  • Concurrency safety: Go applications heavily utilize Go’s concurrency features, which means logs need to be recorded concurrently, so the log package needs to be concurrency-safe.
  • Plugin capability: The log package should provide some plugin capabilities, such as allowing developers to customize the output format, storage location, and behavior when errors occur (such as alerts, sending emails, etc.). Plugin capabilities are not necessary, as the inherent features of logs can meet the majority of usage requirements, for example, supporting JSON and TEXT output formats, supporting standard output and file storage locations, and log monitoring can be achieved through some side systems.
  • Log parameter control: The log package should be configurable in a flexible manner, either through initialization configuration or runtime configuration. For example, initialization configuration can be done through the Init function, and runtime configuration can be done through functions such as SetOptions/SetLevel.
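As an illustration of runtime log parameter control, the sketch below uses zap's AtomicLevel to change the switch level while the program is running; the same idea applies to a self-developed SetLevel function.

package main

import (
    "go.uber.org/zap"
    "go.uber.org/zap/zapcore"
)

func main() {
    // Initialization configuration: start with the Info switch level.
    level := zap.NewAtomicLevelAt(zapcore.InfoLevel)
    cfg := zap.NewProductionConfig()
    cfg.Level = level
    logger, _ := cfg.Build()
    defer logger.Sync()

    logger.Debug("not printed: the switch level is Info")

    // Runtime configuration: lower the switch level to Debug,
    // e.g. triggered from an admin API or a signal handler.
    level.SetLevel(zapcore.DebugLevel)
    logger.Debug("printed: the switch level is now Debug")
}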

How to Log? #

Earlier, we discussed the functionalities, implementation methods, and considerations when designing a logging package. However, there is another important aspect to consider: how to actually record logs.

More logs are not necessarily better. In actual development, we often encounter a large number of useless logs while the logs we actually need are missing, or valuable logs are drowned in a sea of useless ones, making the relevant information hard to find.

An excellent logging package can assist us in better logging, viewing, and analyzing logs. However, how we log determines whether we can obtain useful information. The logging package is a tool, but logging itself is the soul. Here, I will explain in detail how to log.

To log better, we need to address the following questions:

  • Where should we print logs?
  • At which log levels should we print logs?
  • How should we log the content?

Where to Print Logs? #

Logs are mainly used to locate problems. Therefore, in general, we should print logs where required. So, specifically, where should we print logs? Here are a few suggestions.

  • Print logs at branching statements. Printing logs at branching statements allows us to determine which branch the code takes, helping us identify the next step of a request and continue troubleshooting.
  • Writing operations must print logs. Writing operations are most likely to cause serious business failures. Printing logs during writing operations helps us find key information when problems occur.
  • Be cautious when printing logs in loops. If the number of iterations is too high, printing a large number of logs can seriously affect code performance. The recommended approach is to record key points within the loop and summarize them outside the loop for printing.
  • Print logs at the original location where errors occur. For nested errors, it is advisable to print error logs at the original location where the error occurred. If the upper layer does not need to add necessary information, it can directly return the error from the lower layer. Here’s an example:
package main

import (
    "flag"
    "fmt"

    "github.com/golang/glog"
)

func main() {
    flag.Parse()
    defer glog.Flush()

    if err := loadConfig(); err != nil {
        glog.Error(err)
    }
}

func loadConfig() error {
    return decodeConfig() // return directly
}

func decodeConfig() error {
    if err := readConfig(); err != nil {
        return fmt.Errorf("could not decode configuration data for user %s: %v", "colin", err) // Add necessary information, such as the user's name
    }

    return nil
}

func readConfig() error {
    glog.Errorf("read: end of input.")
    return fmt.Errorf("read: end of input")
}

By printing logs at the location where an error first occurs, we can easily trace the root cause of the error and add the necessary information at the upper layers. This lets us understand the impact of the error and helps with troubleshooting. Additionally, having the lower layers return the error directly reduces duplicate log prints.

When a call to a function from a third-party package returns an error, print the error message at the call site. For example:

if err := os.Chdir("/root"); err != nil {
    log.Errorf("change dir failed: %v", err)
}

At Which Log Levels Should We Print Logs? #

Different levels of logs have different meanings and can achieve different functions. In development, we should record logs at appropriate levels according to the purpose. Here are some suggestions.

  1. Debug Level

In order to obtain enough information for debugging, a lot of logs are usually printed at the Debug level. For example, the entire HTTP request body or response body can be printed.

The Debug level requires printing a large number of logs, which can severely impact program performance. In addition, Debug level logs are mainly meant to aid debugging during the development and testing phase, and mostly contain information that is irrelevant to production business. Therefore, Debug level logs must be disabled when the service goes live; otherwise, the large volume of logs can quickly exhaust disk space, bring the service down, and degrade performance and user experience.

Debug level logs can be printed as desired. You can print any logs you think are helpful for development and debugging during the testing phase at this level.

  2. Info Level

Info level logs can record useful information for future operational analysis. Therefore, the number of Info level logs should neither be too high nor too low; the main goal is to meet the requirements. Some key logs can be recorded at the Info level, but if the log volume is too large or the output frequency is too high, consider recording them at the Debug level.

The log level in production is generally set to Info level. In order to avoid filling up the entire disk space with log files, it is important to avoid generating too many Info level logs. For example, be cautious when using Info level logs in for loops.

  3. Warn Level

Some warning logs can be recorded at the Warn level. Warn level logs often indicate program abnormalities or unexpected behavior that does not impact the continuous operation of the program, or it may only be a temporary impact that will recover later. Such logs require your attention. Warn level logs are more business-level warning logs.

  4. Error Level

Error level logs inform us that the program execution has encountered errors. These errors will definitely impact the program’s execution results, such as failed requests or resource creation failures. It is important to record every log for these errors to avoid ignoring them during troubleshooting in the future. Most errors can be classified as Error level.

  5. Panic Level

Panic level logs are rarely used in actual development. They are usually used only when an error stack trace is needed, or when we do not want a serious error to exit the program and instead recover from the resulting panic in a defer.

  6. Fatal Level

Fatal is the highest level of logging. It indicates that the problem is extremely serious, to the point where the program cannot continue running. Usually, it refers to system-level errors. In development, this level is rarely used unless we believe that a certain error would make the entire program unable to continue running.

The guidelines above summarize how to choose between the Debug, Info, Warn, Error, Panic, and Fatal log levels.

How to Log Content? #

I have a few suggestions on how to log content:

  • When logging, avoid outputting sensitive information such as passwords and keys.
  • For debugging convenience, it is common to use Debug level to log temporary messages. These log messages can be prefixed with special characters, such as log.Debugf("XXXXXXXXXXXX-1:Input key was: %s", setKeyName). This way, after completing the debugging, you can easily find and remove these temporary log messages by searching for the XXXXXXXXXXXX string before committing.
  • Log messages should start with lowercase letters and end with an English period (.). For example, log.Info("update user function called.").
  • To improve performance, use specific types whenever possible. For example, use log.Warnf("init datastore: %s", err.Error()) instead of log.Warnf("init datastore: %v", err).
  • Log messages should ideally contain two pieces of information. One is the Request ID, a unique ID for each request, which helps filter out the logs of a specific request from a massive log collection. The other is the user and action, used to identify who did what (see the sketch after this list).
  • Do not log messages at the wrong log level. For example, in my project development, I often find colleagues who print normal log messages at the Error level and print error log messages at the Info level.
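As an example of the Request ID and user/action suggestion above, here is a minimal sketch using zap's structured fields; the field values are made up for illustration.

package main

import "go.uber.org/zap"

func main() {
    logger, _ := zap.NewProduction()
    defer logger.Sync()

    // requestID lets us filter all the logs of one request; username and the
    // message identify who did what. The values here are illustrative only.
    logger.Sugar().Infow("update user",
        "requestID", "3a7c2f90-example-request-id",
        "username", "colin",
        "action", "UpdateUser",
    )
}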

Summary of “Best” Practices for Logging #

Above, I have explained logging from three aspects: where to print logs, at which levels, and what content to record. Here are the main points summarizing the “best” practices for logging, which you can refer to in the future:

  • When developing and debugging or troubleshooting in a live environment, do not forget to optimize log printing during the troubleshooting process. Good logs may not be achieved in one go, so continuous optimization during actual development testing and problem locating in the live environment is necessary. However, this requires you to value logging, rather than treating it as just a way to record information without knowing why you are printing an Info log.
  • Print a suitable amount of logs, avoiding printing useless logs while not missing key log information. The best result is to be able to locate problems solely based on these key logs.
  • Support dynamic log output for easier troubleshooting of online issues.
  • Always save logs in local files. By logging locally, you can decouple from the centralized log platform. This way, even when the network is unavailable or the log center platform fails, you can still record logs normally.
  • Centralized log storage and processing: Because an application may contain multiple services, and a service may have multiple instances, it is best to store these logs in the same log platform, such as Elasticsearch, for centralized management and log viewing.
  • Structured log recording: Add some default common fields to each log entry for easy log querying and analysis.
  • Support Request ID: Use Request ID to link all the logs for a single request, which may be distributed across different components and machines. Supporting Request ID can greatly improve troubleshooting efficiency and reduce difficulty. In some large-scale distributed systems, troubleshooting without Request ID can be a disaster.
  • Support dynamic toggle for Debug logs: For locating deeply hidden issues, it may be necessary to print Debug logs for more information. However, the log level in the live environment is usually set to Info. To obtain Debug logs, we may consider modifying the log level to Debug and restarting the service. After locating the problem, we can modify the log level back to Info and restart the service. This approach is not only cumbersome but also may impact the live business. The best solution is to dynamically control whether to enable Debug logs for a specific request by using parameters like debug=true passed in the request.
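As a rough illustration of the last point, the sketch below builds two zap loggers at startup and, inside an HTTP handler, uses the Debug-level logger only when the request carries debug=true; the /user route and the query parameter name are assumptions for the example.

package main

import (
    "net/http"

    "go.uber.org/zap"
    "go.uber.org/zap/zapcore"
)

func main() {
    // Normal Info-level logger for regular production traffic.
    infoLogger, _ := zap.NewProductionConfig().Build()

    // Debug-level logger used only for requests that explicitly ask for it.
    debugCfg := zap.NewProductionConfig()
    debugCfg.Level = zap.NewAtomicLevelAt(zapcore.DebugLevel)
    debugLogger, _ := debugCfg.Build()

    http.HandleFunc("/user", func(w http.ResponseWriter, r *http.Request) {
        logger := infoLogger
        if r.URL.Query().Get("debug") == "true" {
            logger = debugLogger
        }
        logger.Debug("detailed debug information for this request only")
        logger.Info("handled /user request")
        w.Write([]byte("ok"))
    })

    http.ListenAndServe(":8080", nil)
}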

Additional Content: Distributed Log Solution (EFK/ELK) #

In the previous section, we introduced the design of log packages and the specification for logging. In addition to that, there is another issue that you should be aware of: how do we collect, process, and display the logs we record?

In actual Go project development, in order to achieve high availability, at least two instances of the same service need to be deployed, and requests are forwarded through a load balancing strategy such as round-robin. Furthermore, an application may consist of multiple services. Let’s assume our application has two services, each with two instances. If the application malfunctions, we may need to log into four (2 x 2) servers to view the local log files and locate the problem, which is very inconvenient and increases the time for fault recovery. Therefore, in real enterprise scenarios, we collect and display these logs in a unified manner.

In the industry, there is already a very popular solution for log collection, processing, and display: EFK (Elasticsearch + Filebeat + Kibana) or ELK (Elasticsearch + Logstash + Kibana). EFK can be seen as an evolution of ELK, replacing the log collection component Logstash with Filebeat, mainly because Filebeat is lighter and uses fewer resources. The overall log processing flow is described below.

The logs are recorded in local files (*.log files) through the log package and then collected into Kafka by the Shipper. The Shipper can be flexibly selected according to requirements. Common Shippers include Logstash Shipper, Flume, Fluentd, and Filebeat. Among them, Filebeat and Logstash Shipper are used most frequently. The reason why the Shipper does not deliver the logs directly to Logstash indexer or Elasticsearch is that Kafka can support larger throughput and can smooth out peaks.

The log messages in Kafka are consumed by the Logstash indexer, which processes them and delivers them to Elasticsearch for storage. Elasticsearch is a real-time full-text search and analytics engine that provides the three major functions of data collection, analysis, and storage. The logs stored in Elasticsearch can be displayed through the graphical interface provided by Kibana. Kibana is a web-based graphical interface used for searching, analyzing, and visualizing log data stored in Elasticsearch.

Logstash is responsible for log collection, transformation, and filtering. It supports almost any type of log, including system logs, error logs, and custom application logs. Logstash can be further divided into Logstash Shipper and Logstash indexer. Logstash Shipper monitors and collects logs, sends the log contents to Logstash indexer, and then Logstash indexer filters the logs and submits them to Elasticsearch.

Summary #

Logging is an essential feature for applications. The main purpose of logging is troubleshooting, and to achieve better troubleshooting, we need an excellent tool: a logging package. So how do we design a logging package? In my opinion, a logging package should have the following features:

  • Basic features: support basic log information, support different log levels, support custom configuration, and support output to standard output and a file.
  • Advanced features: support multiple log formats, be able to categorize logs by level, support structured logging, support log rotation, and have hook capabilities.
  • Optional features: support color output, be compatible with the standard library log package, and support output to different locations.

In addition, a logging package needs to support different log levels. The log levels, from low to high priority, are: Trace < Debug < Info < Warn/Warning < Error < Panic < Fatal. Among them, Debug, Info, Warn, Error, and Fatal are the basic levels, and it is recommended to include these levels when developing a logging package. Trace and Panic are optional levels.

Once we have understood the features of a logging package, we can proceed to design and develop the package. However, during the development process, we also need to ensure that our logging package has good performance, thread safety, plugin support, and log parameter control.

With a logging package in place, we also need to know how to use it effectively, that is, how to log. In the article, I have provided some logging recommendations, which include detailed content. You can refer back to the article for more information.

Finally, I have also mentioned a distributed logging solution: EFK/ELK. EFK is an upgraded version of ELK, and in actual project development, we can directly choose EFK. In the EFK solution, Filebeat uploads the logs to Kafka, Logstash indexer consumes the logs from Kafka and delivers them to Elasticsearch for storage, and finally, Kibana’s graphical interface is used to view the logs.

Exercise #

Think about what additional functions your project’s logging package needs and how to design it. If you have good logging conventions in your daily development, please feel free to share and discuss them in the comments section.

Looking forward to your thoughts and opinions. See you in the next lesson.