27 Learn to Reasonably Decompose Code to Improve Code Readability

Hello, I’m Jingxiao. Today, we won’t discuss any technical knowledge, but continue to chat about the philosophy of code.

As the saying goes, good code itself is a document. Even for programs, components, or systems with the same functionality, the code written by different people can be vastly different.

Some people have a design and code style that is as smooth as a hot knife cutting through butter. From top-level to bottom-level code, it is a joy to read, with detailed yet concise comments. When delving into the details of the code, it can be understood clearly even without comments.

However, there are others whose code barely works and may encounter bugs in slightly complex situations. When debugging the code, magic numbers and functions are scattered everywhere. With files consisting of thousands of lines and a confusing mix of design patterns, it becomes incredibly difficult to read, not to mention making modifications and iterative development.

Guido van Rossum, the creator of Python, once said that the frequency of reading code is much higher than the frequency of writing code. After all, even when writing code, you need to repeatedly read and debug it to ensure that it runs as expected.

Without further ado, let’s get to the point.

PEP 8 Specification #

In the previous class, we briefly mentioned PEP 8. Today, we will continue to interpret it in detail.

PEP stands for Python Enhancement Proposal. PEP 8 is the proposal that defines the style guide for Python code, much like the standards we have for sentence structure, punctuation, paragraph formatting, and indentation in writing. Its purpose is to make Python code more readable and therefore easier to understand.

In fact, PyCharm already includes a built-in PEP 8 code checker. It automatically checks for code that does not adhere to the specified guidelines, points out the errors, and recommends ways to fix them. The following image is a screenshot of its interface.

Therefore, while learning today’s content, I recommend using the PyCharm IDE to check your code and identify any formatting issues. Especially for beginners, code style may be even more important than code accuracy, because in actual work, code readability is much more important than you might imagine.

Indentation Guidelines #

First, let’s look at indentation within code blocks.

The most significant difference between Python and C++/Java is that the latter uses curly brackets to distinguish code blocks, while Python relies on line breaks and indentation to delimit code blocks. There is a famous coding contest called the International Obfuscated C Code Contest, where you can find amazing creations with code formed into various shapes, such as drawings or cartoon avatars, but still producing astonishing results.

However, such tricks cannot be applied in Python. Nevertheless, we can have Python code that is as clear as reading English.

With that being said, Python indentation can be written in many ways, such as using tabs, two spaces, or four spaces, or even a mix of spaces and tabs. However, PEP 8 tells us to use four spaces for indentation, avoiding tabs, and never mix tabs and spaces.

The second thing to note is that the maximum line length should be limited to 79 characters.

This principle has two main advantages. The first is easy to understand: many engineers like to view several source files side by side on the screen. If a line of code is too long, you have to scroll horizontally or let the editor soft-wrap it onto several lines, which greatly hurts coding and reading efficiency.

As for the second advantage, it is easier to grasp with more programming experience. When the nesting level of code becomes too deep, such as exceeding three levels, a line of code is likely to exceed 79 characters. Therefore, this rule also forces programmers to not write code with excessive levels of nesting, but instead think about breaking down the code into functions or logical blocks to optimize their code structure.

Blank Line Guidelines #

Next, let’s consider blank lines between code blocks.

We know that blank lines in Python have no effect on the execution by the Python interpreter, but they have a profound impact on readability.

PEP 8 specifies two blank lines around top-level classes and functions, and one blank line between methods within a class. Inside a function, you can also use blank lines to separate groups of related code, similar to paragraphs in English, but use at most one blank line at a time and do not overuse them.

In addition, although Python allows merging multiple lines into a single line using semicolons as separators, this is not recommended by PEP 8. So, even if you have control statements like if/while/for, if your execution statement only consists of one line of code, it is best to start a new line. This will significantly improve reading efficiency.

As for the end of the file, every source file should end with exactly one newline: no missing final newline, and no extra blank lines after it.
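The blank-line rules above can be sketched in a few lines. The `ClickCounter` class and `describe` function are hypothetical names made up for this illustration; they do not come from any library.

```python
"""Hypothetical module used only to illustrate the blank-line rules."""


class ClickCounter:
    """Count how many times `increment` has been called."""

    def __init__(self):
        self.value = 0

    def increment(self):
        # One blank line separates methods inside a class.
        self.value += 1
        return self.value


def describe(counter):
    # Two blank lines separate top-level definitions.
    if counter.value == 0:
        # The body gets its own line even though it is one statement.
        return 'empty'

    # A single blank line separates logical groups inside a function.
    return 'count=' + str(counter.value)
```

Calling `describe(ClickCounter())` returns 'empty'; after one `increment()` it returns 'count=1'.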

Space Guidelines #

Let’s now consider the usage of spaces within lines of code.

When you define or call a function, commas separate the items in the parameter list. Follow each comma with a space. This matches common practice in written English and makes each parameter easier to read on its own.

Similarly, colons are often used to initialize dictionaries, and a space should be placed after the colon.

In addition, in Python, we can use # to add comments. Remember to add a space after # and before the comment.

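As for binary operators such as +, -, *, /, &, |, =, ==, and !=, put a single space on each side. The one exception is =, which PEP 8 writes without spaces when it marks a keyword argument or a default parameter value, for example epochs=1. In contrast, no spaces are needed immediately inside parentheses.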
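As a small sketch of these spacing rules, consider the snippet below. All names (`scale`, `prices`, and so on) are hypothetical and chosen only for illustration.

```python
# Hypothetical names, chosen only to illustrate spacing.
def scale(value, factor=2):  # No spaces around `=` in a default value.
    return value * factor


prices = {'apple': 3, 'pear': 5}  # A space after each colon and comma.
total = prices['apple'] + prices['pear']  # Spaces around binary operators.
doubled = scale(total, factor=2)  # No space just inside the parentheses.
is_expensive = doubled >= 16
```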

Line Break Guidelines #

Now, let’s revisit the indentation guidelines. Reminding you of the second point mentioned earlier, each line should not exceed the maximum length of 79 characters. However, what should you do when the logic for a function call becomes too long and has to exceed this limit?

Please take a look at the following code snippet, and try to analyze its characteristics:

def solve1(this_is_the_first_parameter, this_is_the_second_parameter,
           this_is_the_third_parameter, this_is_the_fourth_parameter,
           this_is_the_fifth_parameter, this_is_the_sixth_parameter):
    return (this_is_the_first_parameter + this_is_the_second_parameter +
            this_is_the_third_parameter + this_is_the_fourth_parameter +
            this_is_the_fifth_parameter + this_is_the_sixth_parameter)


def solve2(this_is_the_first_parameter, this_is_the_second_parameter,
           this_is_the_third_parameter, this_is_the_fourth_parameter,
           this_is_the_fifth_parameter, this_is_the_sixth_parameter):
    return this_is_the_first_parameter + this_is_the_second_parameter + \
           this_is_the_third_parameter + this_is_the_fourth_parameter + \
           this_is_the_fifth_parameter + this_is_the_sixth_parameter


(top_secret_func(param1=12345678, param2=12345678, param3=12345678,
                 param4=12345678, param5=12345678).check()
    .launch_nuclear_missile().wait())


top_secret_func(param1=12345678, param2=12345678, param3=12345678,
                param4=12345678, param5=12345678).check() \
    .launch_nuclear_missile().wait()

In fact, there are two classic methods here.

The first one is to wrap a long expression in parentheses. Even though it spans multiple physical lines, Python still treats everything inside the parentheses as one logical line. The solve1 function has too many parameters to fit on one line, so the parameter list is wrapped directly. Note that the continuation lines are aligned with the first parameter on the first line, which makes the function look tidy and easier to read. A chained function call can be formatted the same way by wrapping it in a pair of parentheses.

The second method is to end the line with a backslash, which tells Python that the statement continues on the next line. This method is more straightforward, as you can see from the solve2 function and the second function call.

Regarding code formatting, I focus on these four aspects. Habits are not developed in a day, but you need to pay special attention and practice deliberately. What I can do is to tell you these points that you need to pay attention to and let you experience the actual code style in real projects.

The following code is taken from Keras as shipped in Google’s open-source TensorFlow library. To highlight the key points more intuitively, I have removed the comments and most of the code. I hope that by reading this code, you can get a more realistic understanding of how cutting-edge projects focus on enhancing readability.

class Model(network.Network):
    def fit(self,
            x=None,
            y=None,
            batch_size=None,
            epochs=1,
            verbose=1,
            callbacks=None,
            validation_split=0.,
            validation_data=None,
            shuffle=True,
            class_weight=None,
            sample_weight=None,
            initial_epoch=0,
            steps_per_epoch=None,
            validation_steps=None,
            validation_freq=1,
            max_queue_size=10,
            workers=1,
            use_multiprocessing=False,
            **kwargs):
        # Legacy support
        if 'nb_epoch' in kwargs:
            logging.warning(
                'The `nb_epoch` argument in `fit` has been renamed `epochs`.')
            epochs = kwargs.pop('nb_epoch')

        if kwargs:
            raise TypeError('Unrecognized keyword arguments: ' + str(kwargs))
        self._assert_compile_was_called()

        func = self._select_training_loop(x)
        return func.fit(
            self,
            x=x,
            y=y,
            batch_size=batch_size,
            epochs=epochs,
            verbose=verbose,
            callbacks=callbacks,
            validation_split=validation_split,
            validation_data=validation_data,
            shuffle=shuffle,
            class_weight=class_weight,
            sample_weight=sample_weight,
            initial_epoch=initial_epoch,
            steps_per_epoch=steps_per_epoch,
            validation_steps=validation_steps,
            validation_freq=validation_freq,
            max_queue_size=max_queue_size,
            workers=workers,
            use_multiprocessing=use_multiprocessing)

Document Specification #

Next, let’s talk about file-level conventions. Let’s start with the most commonly used statement, import.

First, all imports should be placed at the beginning of the file. There is not much to say about this, as importing everywhere can make it difficult to see the dependencies between files, and importing at runtime may lead to potential efficiency issues and other risks.

Second, do not use import to import multiple modules at once. Although we can import multiple modules in one line, separated by commas, please do not do this. import time, os is not recommended according to PEP 8.

If you use the statement from module import func, make sure that func does not conflict with any existing names in the current file. However, you can actually rename it using from module import func as new_func to avoid conflicts.
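A minimal sketch of these import conventions, using only standard-library modules. The local `join` helper is a hypothetical function, added just to create a name that `os.path.join` would otherwise collide with.

```python
# One import per line, all at the top of the file.
import os
import time

# Renaming on import avoids clashing with the local `join` defined below.
from os.path import join as join_path


def join(words):
    """Local helper whose name would otherwise collide with os.path.join."""
    return ' '.join(words)


label = join(['log', 'file'])
path = join_path('logs', 'app.txt')
started_at = time.time()
```

Without the `as join_path` rename, the later `def join` would silently replace the imported function, and the call that builds `path` would no longer do what the reader expects.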

Commenting Convention #

There is a saying: “Bad comments are worse than no comments.” So when you modify the code, be sure to check if the surrounding comments need to be updated.

For large blocks of logic, we can write block comments at the same indentation level as the code, with each line starting with #. Even though it’s a comment, write it in complete sentences, as if it were part of an article. If it’s an English comment, pay attention to capitalization and punctuation at the beginning and end, and avoid grammar and logic errors. The same requirements apply to Chinese comments. Good code cannot do without good comments.

As for inline comments, leave at least two spaces after the statement, then write # followed by a space and the comment. However, please note that inline comments should be used sparingly.

# This is an example to demonstrate how to comment.
# Please note this function must be used carefully.
def solve(x):
    if x == 1:  # This is the only exception.
        return False
    return True

Document Description #

Now let’s talk about document descriptions. Let’s continue using TensorFlow code as an example.

class SpatialDropout2D(Dropout):
    """Spatial 2D version of Dropout.

    This version performs the same function as Dropout, however it drops
    entire 2D feature maps instead of individual elements. If adjacent pixels
    within feature maps are strongly correlated (as is normally the case in
    early convolution layers) then regular dropout will not regularize the
    activations and will otherwise just result in an effective learning rate
    decrease. In this case, SpatialDropout2D will help promote independence
    between feature maps and should be used instead.
    Arguments:
        rate: float between 0 and 1. Fraction of the input units to drop.
        data_format: 'channels_first' or 'channels_last'.
            In 'channels_first' mode, the channels dimension
            (the depth) is at index 1,
            in 'channels_last' mode, it is at index 3.
            It defaults to the `image_data_format` value found in your
            Keras config file at `~/.keras/keras.json`.
            If you never set it, then it will be "channels_last".
    Input shape:
        4D tensor with shape:
        `(samples, channels, rows, cols)` if data_format='channels_first'
        or 4D tensor with shape:
        `(samples, rows, cols, channels)` if data_format='channels_last'.
    Output shape:
        Same as input
    References:
        - [Efficient Object Localization Using Convolutional
          Networks](https://arxiv.org/abs/1411.4280)
    """
    def __init__(self, rate, data_format=None, **kwargs):
        super(SpatialDropout2D, self).__init__(rate, **kwargs)
        if data_format is None:
            data_format = K.image_data_format()
        if data_format not in {'channels_last', 'channels_first'}:
            raise ValueError('data_format must be in '
                             '{"channels_last", "channels_first"}')
        self.data_format = data_format
        self.input_spec = InputSpec(ndim=4)

You may notice that the comments for classes and functions are there to help the reader quickly understand what the function does, its input parameters and formats, output return values and formats, and other things to note.

As for the docstring, it starts and ends with three double quotes. We first use one sentence to briefly describe what the function does, followed by a paragraph to explain it in detail. After that, we have the parameter list, parameter formats, and return value formats.
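To see the same structure on a smaller scale, here is a sketch of a docstring for a hypothetical `clip` function (not part of TensorFlow). The section names mirror the example above, with a Raises section added as an assumption about what a reader would want to know.

```python
def clip(value, lower, upper):
    """Limit a number to a closed interval.

    Values below `lower` are raised to `lower` and values above `upper`
    are lowered to `upper`; anything in between is returned unchanged.

    Arguments:
        value: The number to limit.
        lower: Smallest allowed result.
        upper: Largest allowed result. Must not be less than `lower`.

    Returns:
        `value` limited to the range [lower, upper].

    Raises:
        ValueError: If `lower` is greater than `upper`.
    """
    if lower > upper:
        raise ValueError('lower must not exceed upper')
    return max(lower, min(value, upper))
```

The docstring is available at runtime as `clip.__doc__`, which is what `help(clip)` displays.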

Naming Convention #

Next, let’s talk about naming conventions. You may have heard the saying, “There are only two hard things in computer science: cache invalidation and naming things.” Naming is not an easy task for programmers. A misleading name can potentially introduce bugs into a project. I won’t go into the method of classifying names here; I will only discuss some of the most practical conventions.

Let’s start with variable names. Avoid meaningless single characters like a, b, c, d as variable names. Instead, use names that convey what the variable represents. Generally, variable names are lowercase with words joined by underscores, for example: data_format, input_spec, image_data_set. The only place where single characters are acceptable is a loop index, for example: for i in range(n). For a non-public attribute of a class, PEP 8’s convention is a single leading underscore; two leading underscores additionally trigger Python’s name mangling and are mainly useful for avoiding name clashes with subclasses.

For constants, the best practice is to use uppercase letters and connect them with underscores, for example: WAIT_TIME, SERVER_ADDRESS, PORT_NUMBER.

For function names, also use lowercase and connect with underscores, for example: launch_nuclear_missile(), check_input_validation().

For class names, capitalize the first letter of each word and join the words without underscores, for example: class SpatialDropout2D(), class FeatureSet().

In conclusion, as I mentioned before, don’t be too stingy with the length of a variable name. Once the name reasonably describes what the variable represents, though, keep it as concise as you can.
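The naming conventions for constants, classes, functions, and loop indices can be seen together in one short sketch. The names reuse examples from the text where possible; `total_wait_time` and `image_features` are hypothetical.

```python
WAIT_TIME = 0.5  # Constant: upper case with underscores.


class FeatureSet:  # Class name: CapWords.
    def __init__(self, feature_names):
        self.feature_names = list(feature_names)

    def count_features(self):  # Function name: lower case with underscores.
        return len(self.feature_names)


def total_wait_time(feature_set):
    total = 0.0
    for i in range(feature_set.count_features()):  # `i` is fine as an index.
        total += WAIT_TIME
    return total


image_features = FeatureSet(['width', 'height', 'depth'])
```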

Code Decomposition Techniques #

Finally, let’s talk about some practical code optimization techniques.

One core principle in programming is to avoid writing repetitive code. Repetition can often be eliminated with conditionals, loops, constructors, and classes. Another core principle is to reduce the depth of nesting and keep Python code as flat as possible. After all, the human brain cannot keep track of too many stack levels at once.

Therefore, in many places where the business logic is complex, we need to include a lot of conditions and loops. However, if these are not properly written, the program will look like a mess.

Let’s take a look at a few examples to discuss the details of writing good conditionals and loops. First, let’s look at the first piece of code:

if i_am_rich:
    money = 100
    send(money)
else:
    money = 10
    send(money)

In this code snippet, the send statement appears twice. So we can simply merge it and refactor the code like this:

if i_am_rich:
    money = 100
else:
    money = 10
send(money)
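
For a two-way choice this small, a conditional expression is one more option, sketched below. The `choose_amount` wrapper is a hypothetical helper added so that the snippet runs on its own; whether this reads better than the if/else form is a matter of taste.

```python
def choose_amount(i_am_rich):
    # Same logic as the refactored if/else, written as one expression.
    return 100 if i_am_rich else 10
```

The caller would then write `send(choose_amount(i_am_rich))`, with `send` being the function from the example above.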

Let’s look at another example:

def send(money):
    if is_server_dead:
        LOG('server dead')
        return
    else:
        if is_server_timed_out:
            LOG('server timed out')
            return
        else:
            result = get_result_from_server()
            if result == MONEY_IS_NOT_ENOUGH:
                LOG('you do not have enough money')
                return
            else:
                if result == TRANSACTION_SUCCEED:
                    LOG('OK')
                    return
                else:
                    LOG('something wrong')
                    return

This code has multiple layers of indentation, which looks ugly. Let’s refactor it:

def send(money):
    if is_server_dead:
        LOG('server dead')
        return

    if is_server_timed_out:
        LOG('server timed out')
        return

    result = get_result_from_server()

    if result == MONEY_IS_NOT_ENOUGH:
        LOG('you do not have enough money')
        return

    if result == TRANSACTION_SUCCEED:
        LOG('OK')
        return

    LOG('something wrong')

Isn’t the new code much clearer?
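
To try the early-return style without a real server, here is a self-contained variant. It is only a sketch: passing the server state in as parameters, returning the message instead of logging it, and the two constant values are all assumptions made so the function can be run and tested.

```python
# Hypothetical constant values; the original does not define them.
MONEY_IS_NOT_ENOUGH = 'not_enough'
TRANSACTION_SUCCEED = 'succeed'


def describe_transfer(is_server_dead, is_server_timed_out, result):
    """Return the message that the `send` example would log."""
    # Each guard handles one failure case and returns immediately,
    # so the rest of the function never needs an `else` branch.
    if is_server_dead:
        return 'server dead'
    if is_server_timed_out:
        return 'server timed out'
    if result == MONEY_IS_NOT_ENOUGH:
        return 'you do not have enough money'
    if result == TRANSACTION_SUCCEED:
        return 'OK'
    return 'something wrong'
```

Every branch sits at the same indentation level, so the order of the checks is the only thing a reader has to keep in mind.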

In addition, we know that a function should be as fine-grained as possible and should not do too many things. So, when dealing with a complex function, we need to split it into several simple functions and then combine them. So, how should we split a function?

Here, let me take a simple binary search as an example. You are given a non-decreasing array of integers and a target, and you need to find the smallest number x in the array that satisfies x * x > target. If no such number exists, return -1. (The binary search below relies on x * x growing along with x, so assume the values are non-negative.)

This functionality shouldn’t be too difficult to implement. You can try writing it yourself first, and then compare it with the code below to identify any issues in your code.

def solve(arr, target):
    l, r = 0, len(arr) - 1
    ret = -1
    while l <= r:
        m = (l + r) // 2
        if arr[m] * arr[m] > target:
            ret = m
            r = m - 1
        else:
            l = m + 1
    if ret == -1:
        return -1
    else:
        return arr[ret]


print(solve([1, 2, 3, 4, 5, 6], 8))
print(solve([1, 2, 3, 4, 5, 6], 9))
print(solve([1, 2, 3, 4, 5, 6], 0))
print(solve([1, 2, 3, 4, 5, 6], 40))

The first code snippet I provided is acceptable in algorithm contests and interviews. However, from an engineering perspective, we can further optimize it:

def comp(x, target):
    return x * x > target


def binary_search(arr, target):
    l, r = 0, len(arr) - 1
    ret = -1
    while l <= r:
        m = (l + r) // 2
        if comp(arr[m], target):
            ret = m
            r = m - 1
        else:
            l = m + 1
    return ret


def solve(arr, target):
    # Named `index` rather than `id` to avoid shadowing the built-in id().
    index = binary_search(arr, target)

    if index != -1:
        return arr[index]
    return -1


print(solve([1, 2, 3, 4, 5, 6], 8))
print(solve([1, 2, 3, 4, 5, 6], 9))
print(solve([1, 2, 3, 4, 5, 6], 0))
print(solve([1, 2, 3, 4, 5, 6], 40))

As you can see in the second code snippet, I extracted different functionality into separate functions. The comp() function serves as the core condition, and taking it out makes the entire program clearer. At the same time, I also separated the main program of the binary search, which is only responsible for the search itself. The solve() function takes the result and decides whether to return that value or -1. This way, each function has its own responsibility, and readability is improved.
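
One possible further step, sketched here as an assumption rather than as part of the original solution, is to pass the condition into the search as a function. The names `find_first` and `smallest_with_square_above` are hypothetical. The search then knows nothing about squares and can be reused with any condition that is False for a prefix of the array and True for the rest.

```python
def find_first(arr, predicate):
    """Return the index of the first element satisfying `predicate`, or -1.

    Assumes `predicate` is False for a prefix of `arr` and True afterwards.
    """
    left, right = 0, len(arr) - 1
    answer = -1
    while left <= right:
        middle = (left + right) // 2
        if predicate(arr[middle]):
            # Remember this hit, then keep looking further left.
            answer = middle
            right = middle - 1
        else:
            left = middle + 1
    return answer


def smallest_with_square_above(arr, target):
    index = find_first(arr, lambda x: x * x > target)
    return arr[index] if index != -1 else -1
```

With the array [1, 2, 3, 4, 5, 6], the targets 8, 9, 0, and 40 give 3, 4, 1, and -1 respectively.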

Finally, let’s take a look at how to decompose classes. As usual, let’s start with the code:

class Person:
    def __init__(self, name, sex, age, job_title, job_description,
                 company_name):
        self.name = name
        self.sex = sex
        self.age = age
        self.job_title = job_title
        self.job_description = job_description
        self.company_name = company_name

You should be able to see that the word “job” appears many times and represents a meaningful entity. In this case, we can consider extracting this part as a separate class.

class Person:
    def __init__(self, name, sex, age, job_title, job_description,
                 company_name):
        self.name = name
        self.sex = sex
        self.age = age
        self.job = Job(job_title, job_description, company_name)


class Job:
    def __init__(self, job_title, job_description, company_name):
        self.job_title = job_title
        self.job_description = job_description
        self.company_name = company_name

As you can see, after refactoring the code, it becomes much clearer instantly.
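
A possible variation, shown only as a sketch, is to let the caller build the Job and hand it to Person, so that the Person constructor no longer needs to know which fields a job has. The shorter attribute names and the sample values below are assumptions made for this illustration.

```python
class Job:
    def __init__(self, title, description, company_name):
        self.title = title
        self.description = description
        self.company_name = company_name


class Person:
    def __init__(self, name, sex, age, job):
        self.name = name
        self.sex = sex
        self.age = age
        self.job = job  # A ready-made Job instance.


# Hypothetical sample data.
engineer = Job('Engineer', 'Builds search systems', 'Example Corp')
person = Person('Alex', 'F', 30, engineer)
```

The job details are then reached through the nested object, for example `person.job.title`.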

Summary #

In today’s class, we briefly talked about how to improve the readability of Python code. We mainly introduced the PEP 8 conventions and worked through examples, improving them step by step, to show you how to make Python programs more readable.

Thought-provoking Question #

Finally, I would like to pose a thought-provoking question. This question is open-ended, and I hope you can share in the comments section about the mistakes you made when you first started learning programming without paying attention to code conventions. What were the consequences of these mistakes? For example, did they mislead people who later read your code or introduce potential bugs, and so on.

I encourage you to share your experiences in the comment section. You can also share this article to promote mutual exchange of insights and experiences among more people, and to foster true growth and progress through these experiences.