26 Live Busy but Also Pay Attention to Code Style

Hello, I’m Cai Yuannan, the author of “Practical Large-Scale Data Processing” on Geek Time. This is the second time I’ve been invited to contribute to this column, and I’m glad to see you again. Today’s topic: when you are already swamped with work, should you still spend time on code style?!

Many visitors to Google ask, surprised and slightly embarrassed, “Is the Python style guide posted on the restroom stall doors meant as a joke?”

This is not a joke. Google has extremely strict requirements for coding standards. Today, let’s talk about coding standards.

For many people, understanding of coding standards (style guides) stops at the first stage: they know that coding standards are useful and that the company requires, say, camel case naming. The later stages, why the rules exist and how to apply them, are much less well understood.

But at Google, the belief in coding standards may exceed the imagination of many people. Let me briefly introduce a few points to you.

  1. Each language has a dedicated committee (Style Committee) that sets the company-wide mandatory coding standards, along with Style Arbiters who settle disputes over coding style.
  2. Each language’s style group holds daily discussions and debates. Newly reached consensus is printed on posters and put up in the restrooms, so that everyone, even visitors, can read it during those spare five minutes in the stall.
  3. Each code submission, similar to the concept of “diff” in Git, requires at least two code reviews: one for business logic and one for readability. The so-called readability review focuses on code style standards. Only those who have passed the assessment can become readability reviewers.
  4. There are a large number of development automation tools to ensure the enforcement of the above guidelines. For example, there will be a linter performing static rule checks before code submission, and the code cannot be submitted if it does not pass.

I don’t know how this strikes you. Personally, I agree with this kind of engineering culture, so today I want to explain two points clearly:

  • Why is coding style important for Python, and does it really help with business development?
  • What processes and tools can be integrated into the existing development process to enforce your coding standards automatically?

Along the way, I will refer to two style guides as examples where appropriate, namely:

  • “PEP 8 – Style Guide for Python Code”, hereinafter referred to as PEP 8;
  • “Google Python Style Guide”, hereinafter referred to as Google Style. This is a style guide derived from Google’s internal rules. The publicly released community version is intended to unify the coding style of all Python open-source projects under Google. (http://google.github.io/styleguide/pyguide.html)

Compared with PEP 8, Google Style is the more stringent standard: PEP 8 is aimed at individuals and small teams, while Google Style is designed to hold up in large teams and enterprise-level codebases with millions of lines. I will briefly explain their content later.

Why is a unified programming standard important? #

In summary, a unified programming standard improves development efficiency, and development efficiency involves three parties: readers, programmers, and machines. Their priority is reader experience >> programmer experience >> machine experience.

Reader Experience >> Programmer Experience #

Anyone who has written code knows that, in real work, the time spent typing code is far less than the time spent reading or debugging it. A frequently cited estimate puts the share of software engineering time spent reading code at around 80%. So, to improve development efficiency, what we need to optimize is not your typing time but the team’s reading experience.

In fact, many programming standards exist to optimize the reading experience. Take naming conventions: both PEP 8 and Google Style (section 3.16, Naming) devote a section to them, and Google Style in particular asks for descriptive names rather than meaningless single letters.

Some people may say, “Ah, programming standards are so annoying! I have to write complete variable names, and it’s so tiring to type them out.” However, when you are the reader, you can definitely distinguish the difference in readability between the following two code examples:

# Incorrect example
if a <= 0:
    return
elif a > b:
    return
else:
    b -= a

# Correct example
if transfer_amount <= 0:
    raise Exception('...')
elif transfer_amount > balance:
    raise Exception('...')
else:
    balance -= transfer_amount
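
To see the descriptive names at work in runnable form, here is a minimal sketch that wraps the same checks in a function. The function name `withdraw`, the use of `ValueError`, and the error messages are assumptions made for illustration; they are not part of the original example.

```python
def withdraw(balance, transfer_amount):
    """Return the new balance after moving transfer_amount out of the account."""
    # Hypothetical helper: the name, exception type, and messages are illustrative.
    if transfer_amount <= 0:
        raise ValueError("transfer_amount must be positive")
    if transfer_amount > balance:
        raise ValueError("transfer_amount exceeds balance")
    return balance - transfer_amount


print(withdraw(balance=100, transfer_amount=30))  # prints 70
```

Even without comments, the two guard conditions read almost like sentences, which is the point of the naming rule.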

Here’s another example: Google Style section 2.2 states that import statements should be used only for packages and modules, not for individual classes or functions.

# Incorrect example
from mypkg import Obj
from mypkg import my_func

my_func([1, 2, 3])

# Correct example
import numpy as np
import mypkg

np.array([6, 7, 8])
mypkg.my_func([1, 2, 3])

The incorrect example above is syntactically valid (there are no name collisions), but it reads poorly. Without a package name for context, a reader can hardly guess what a name like my_func does, and when debugging it is harder to trace a problem back to the package it came from.

In the correct example, by contrast, array is a very generic name, yet the numpy prefix tells the reader immediately, “Oh, this is a numpy array.” Note that this rule is orthogonal to symbol collisions: even when no names collide, we should still follow it.
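
The same contrast can be reproduced with the standard library alone. The sketch below uses `os.path` in place of the hypothetical `mypkg`; the path components are made-up values.

```python
import os.path
from os.path import join  # discouraged by Google Style 2.2: imports a function, not a module

# With the module prefix, the reader sees at once where the function comes from.
qualified = os.path.join("logs", "app.txt")

# Without it, `join` could be str.join, os.path.join, or some local helper.
bare = join("logs", "app.txt")

print(qualified == bare)  # prints True
```

Both calls do the same thing; only the first tells the reader so without scrolling back to the import block.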

Programmer Experience >> Machine Experience #

Having discussed reader experience, let’s turn to programmer experience. One tendency I often see, and am guilty of myself, is compressing code too aggressively. A typical example is the indiscriminate use of Python’s list comprehensions.

# Incorrect example
result = [(x, y) for x in range(10) for y in range(5) if x * y > 10]

I bet very few people can write such a complex list comprehension correctly in one go. It strains the writer and tires the reader. A plain for loop is longer, but it is easier to understand and easier to write.

# Correct example
result = []
for x in range(10):
    for y in range(5):
        if x * y > 10:
            result.append((x, y))
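
As a quick check that the rewrite changes only readability and not behavior, the following sketch runs both forms side by side. The variable names `comprehension_result` and `loop_result` are introduced here for the comparison.

```python
comprehension_result = [(x, y) for x in range(10) for y in range(5) if x * y > 10]

loop_result = []
for x in range(10):
    for y in range(5):
        if x * y > 10:
            loop_result.append((x, y))

# Same pairs, in the same order.
print(comprehension_result == loop_result)  # prints True
```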

The Machine Experience is also Important #

After discussing the importance of the programmer and reader experiences, we can’t overlook the machine experience. Ultimately, we want the code to execute correctly and efficiently on the computer. However, certain dangerous programming styles can not only affect program correctness but also become bottlenecks for code efficiency.

Let’s take a look at the difference between is and ==. Can you predict the output of the following code when it is typed line by line into a CPython interactive session?

# Incorrect example
x = 27
y = 27
print(x is y)

x = 721
y = 721
print(x is y)

Since is compares object identity (in CPython, effectively the memory address), you might expect both comparisons to give the same result. In a CPython interactive session, however, the output is True and then False!

The reason is that CPython (the C implementation of Python) caches the integers from -5 to 256 as singletons: every occurrence of a number in this range refers to the same object. So x and y both point to the one cached 27, and x is y is True.

Integers outside that range are not cached, so each statement that evaluates 721 can create a new object, and the two names end up at different memory addresses, giving False. Even this is not guaranteed: when the same lines run as a script, the compiler may store the two equal constants only once, and the second comparison prints True as well.

So, even if you know that is compares object identity, your code style should forbid using is to compare two Python integers, because the result depends on implementation details.
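
The following sketch makes the difference visible in a script as well as in an interactive session. Building the integers with `int("...")` is a device chosen here so that they are created at run time; the variable names are illustrative.

```python
# int("...") builds the integers at run time, so the compiler cannot
# merge two equal literals into one shared constant.
small_a, small_b = int("27"), int("27")
large_a, large_b = int("721"), int("721")

# Value comparison is always reliable.
print(small_a == small_b, large_a == large_b)  # prints True True

# Identity is an implementation detail: CPython typically prints
# "True False" here, but no program should rely on either value.
print(small_a is small_b, large_a is large_b)
```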

# Correct Example
x = 27
y = 27
print(x == y)

x = 721
y = 721
print(x == y)

After looking at this example, let’s consider whether == always behaves as expected when comparing values. Again, you can first predict the output.

# Incorrect Example
x = MyObject()
print(x == None)

Is the output False? Not necessarily. For classes, the result of == depends on the specific implementation of its __eq__() method. The author of MyObject could have implemented it as follows:

class MyObject(object):
    def __eq__(self, other):
        if other:
            return self.field == other.field
        return True

The correct approach, which your code style should require, is to always use is when comparing with None.

# Correct Example
x = MyObject()
print(x is None)
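
Putting the pieces together gives a runnable demonstration. The constructor and the value given to `field` are assumptions added so that the class can be instantiated; the `__eq__` body follows the article's example.

```python
class MyObject:
    def __init__(self, field):
        # Assumed constructor, added only so the example can run.
        self.field = field

    def __eq__(self, other):
        # Questionable design, mirroring the article: any falsy `other`
        # (None included) is treated as equal.
        if other:
            return self.field == other.field
        return True


x = MyObject(field=1)
print(x == None)  # prints True: the overridden __eq__ decides
print(x is None)  # prints False: identity cannot be overridden
```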

In the above two examples, I briefly introduced how to use code style restrictions to ensure safer use of is and ==. However, don’t forget about implicit boolean conversion in Python. For example:

# Incorrect Example
def pay(name, salary=None):
    if not salary:
        salary = 11
    print(name, "is compensated", salary, "dollars")

What will be printed if someone calls pay("Andrew", 0)? It will print “Andrew is compensated 11 dollars”, because 0 is falsy and not salary cannot tell it apart from None. When you mean to check whether an object is None, write is None (or is not None) explicitly.

# Correct Example
def pay(name, salary=None):
    # Fall back to the default only when no salary was passed at all.
    if salary is None:
        salary = 11
    print(name, "is compensated", salary, "dollars")

This is why PEP8 and Google Style emphasize when to use is, when to use ==, and when to use implicit boolean conversion.
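
The explicit None check can be verified with a small variant of pay. Returning the message instead of printing it is a change made here so that the result is easy to inspect.

```python
def pay(name, salary=None):
    # Only substitute the default when the caller passed nothing at all;
    # a legitimate salary of 0 must survive the check.
    if salary is None:
        salary = 11
    return "{} is compensated {} dollars".format(name, salary)


print(pay("Andrew", 0))  # prints "Andrew is compensated 0 dollars"
print(pay("Andrew"))     # prints "Andrew is compensated 11 dollars"
```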

Unconventional coding practices can also lead to efficiency issues. Let’s see what’s wrong with the following code:

# Incorrect Example
adict = {i: i * 2 for i in xrange(10000000)}

for key in adict.keys():
   print("{0} = {1}".format(key, adict[key]))

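In Python 2, which this example targets (note the xrange), keys() builds a temporary list of all keys before the loop starts, so the code above consumes a lot of memory and runs slowly. The correct way is to use the default iterator, which yields keys one at a time without allocating a new list. (In Python 3, keys() returns a lightweight view rather than a list, so the memory cost is gone, but iterating over the dict directly is still the idiomatic form.)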

# Correct Example
for key in adict:
    print("{0} = {1}".format(key, adict[key]))

This is why Google Style 2.8 places restrictions on the choice of iteration method.
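
For readers on Python 3, a minimal sketch of the same idea follows. The dictionary is shrunk to five entries so the output stays short, and the `pairs` list is introduced only to show the `items()` variant.

```python
adict = {i: i * 2 for i in range(5)}

# Iterating over the dict itself yields its keys one at a time.
for key in adict:
    print("{0} = {1}".format(key, adict[key]))

# When both key and value are needed, items() avoids the second lookup.
pairs = ["{0} = {1}".format(key, value) for key, value in adict.items()]
```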

By now, I believe you have a better understanding of the importance of code style conventions. If you can take the next step and fully integrate them into your development process, it will be a transformative experience for you and your team.

Automated Tools Integrated into the Development Workflow #

As mentioned earlier, the ultimate goal of coding standards is to improve development efficiency. Obviously, if you have to spend a lot of extra time on code formatting every time you write code, it defeats our original intention.

First, you need to choose or establish standards that are suitable for your specific work environment, based on your company/team. The two common standards mentioned earlier, PEP8 and Google Style, can be used as references.

There is no one-size-fits-all standard, so adapt to your own situation. At Google, for example, C++ exceptions are not used for historical reasons: across the existing codebase, the risks of introducing exceptions outweigh their benefits. Google’s C++ style guide therefore prohibits them.

Secondly, once the team agrees on the coding standards, they must be enforced. Verbal agreement alone is not enough. How do you enforce them? With mandatory code reviews and mandatory static or dynamic linters.

Of course, when I say “mandatory”, I don’t mean fines for non-compliance. That would be too crude and not in the geek spirit. By “mandatory,” I mean encoding the consensus into code and automating these processes. For example:

  • Adding the necessary code formatting checks in the code review tool.
  • Including the team’s agreed coding standards in Pylint (https://www.pylint.org/), which can automatically check the code before every commit. Code that does not pass the checks cannot be submitted.
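
One way the pre-commit check could be wired up is sketched below as a Git hook written in Python. It assumes that `git` and `pylint` are installed and on the PATH; the function names are hypothetical, and this is an illustration rather than the setup used at Google.

```python
import subprocess


def select_python_files(paths):
    """Keep only the paths that look like Python source files."""
    return [path for path in paths if path.endswith(".py")]


def staged_files():
    """Ask Git for files added, copied, or modified in the index."""
    output = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True,
    ).stdout
    return output.splitlines()


def main():
    """Return 0 when Pylint is satisfied, non-zero to block the commit."""
    python_files = select_python_files(staged_files())
    if not python_files:
        return 0
    # Pylint exits with a non-zero status when it reports any message.
    return subprocess.run(["pylint", *python_files]).returncode
```

To act as a hook, the file would be saved as .git/hooks/pre-commit with a Python shebang line, made executable, and ended with `raise SystemExit(main())`. The team's agreed rules would typically live in a Pylint configuration file at the repository root.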

Once integrated, your team’s workflow will look like this:

[Figure: workflow diagram]

Summary #

By now, I believe you have gained a fresh understanding of the importance of code style. Code style is important because it affects the experience of readers, programmers, and the execution of code by machines.

Of course, it is not enough to simply be aware of the importance of code style. I have also shared some practical methods for automated code style checking, such as enforcing code reviews and enforcing static or dynamic linters. In short, our emphasis on coding standards is ultimately aimed at improving development efficiency, rather than doing extra work.

Thought question #

In your personal or team projects, have you ever hit pitfalls or had arguments over programming conventions? You are welcome to leave a comment and share your experience with me. Feel free to share this article as well.