06 Directory Structure Design How to Organize a Maintainable and Extensible Code Directory

Hello, I’m Kong Lingfei. Today, let’s talk about how to design the directory structure for code.

The directory structure is the façade of a project. A developer's command of a language often shows in how they organize directories. Therefore, in my opinion, following a good directory convention and designing a maintainable and extensible directory structure is even more important than documentation conventions or commit conventions.

So, how do we organize a good code directory? In today’s lesson, I will answer this question from two perspectives.

First, I will introduce some basic principles for organizing directories, which can guide you in organizing a good code directory. Then, I will introduce some concrete, well-designed directory structures. You can study them and distill your own design methods, or you can adopt them directly as your directory convention; in that case, the structure itself is the convention.

How to Standardize Directory Structure? #

To design a good directory structure, we first need to understand what a good directory looks like, that is, what content should be included in a standardized directory.

A directory structure typically refers to the composition of directories in our project, the files stored in each directory, the functionality they implement, and the dependencies between directories. In my opinion, a good directory structure should meet the following requirements:

Clear Naming: Directory names should be clear, concise, neither too long nor too short. The name should clearly express the functionality of the directory, and it’s better to use singular nouns. On one hand, using singular nouns is sufficient to describe the functionality of the directory. On the other hand, it can promote standardization and avoid the mixing of singular and plural nouns.
Clear Functionality: The functionality of a directory should be clear and highly recognizable throughout the project’s directory structure. In other words, when a new functionality needs to be added, we should have a clear understanding of which directory it belongs to.
Comprehensiveness: The directory structure should include the functionalities needed in the development process as much as possible, such as documentation, scripts, source code management, API implementation, tools, third-party packages, testing, build artifacts, etc.
Observability: Project sizes always grow, so a good directory structure should maintain its integrity even when the project expands.
Scalability: Each directory should store similar functionalities. When the project grows, these directories should be able to accommodate more functionalities of the same type. For example, consider the following directory structure:

$ ls internal/
app  pkg  README.md

The “internal” directory holds internal code, and everything under the “app” and “pkg” directories is internal code. If “internal” contains only these two subdirectories (app and pkg) no matter how large the project becomes, then the “internal” directory is not scalable.

On the contrary, if the “internal” directory directly stores the source code directories of each component (a project can consist of one or multiple components), when the project grows and more components are added, the newly added component code can be stored in the “internal” directory. In this case, the “internal” directory is scalable. For example:

$ ls internal/
apiserver  authzserver  iamctl  pkg  pump  watcher

I have discussed the general standards for directory structure. Now let’s look at two specific directory structures that can be used as directory standards.

Typically, based on functionality, we can divide directory structures into structured directory structures and flat directory structures. Structured directory structures are mainly used in Go applications and are relatively complex, while flat directory structures are mainly used in Go packages and are relatively simple.

Since the flat directory structure is relatively simple, we will introduce it first.

Flat Directory Structure #

A Go project can be an application or a code framework/library. When the project is a code framework/library, a flat directory structure is more suitable.

A flat directory structure means that the project’s code is stored directly in the project’s root directory, making the entire directory structure appear as if it were a single layer. Many frameworks/libraries adopt this approach because it reduces the length of import paths. For example, github.com/marmotedu/log/pkg/options can be shortened to github.com/marmotedu/log/options. The log package from github.com/golang/glog is an example of a flat directory structure, with the following directory:

$ ls glog/
glog_file.go  glog.go  glog_test.go  LICENSE  README

Next, let’s learn about the structured directory structure, which is more suitable for Go applications and is more complex.

Structured Directory Structure #

The currently recommended structured directory structure in the Go community is project-layout. Although it is not an official or community specification, it is widely accepted by many Go developers because of its reasonable organization. Therefore, we can treat it as a de facto standard.

First, let’s take a look at the common features that should be included when developing a Go project. These features are listed in the GitHub document Common Features in Go Projects. The directory structure we design should be able to accommodate these features.

Combining the project-layout and the common features mentioned above, I have summarized a set of Go code organization methods, which is the directory structure used by the IAM project. This approach retains the advantages of project-layout while adding some of my personal understanding, hoping to provide you with a ready-to-use directory structure specification.

Next, let’s take a look at the directory structure of the practical project for this course. Because there are many directories in the practical project, here are only some important directories and files for you to quickly browse and deepen your understanding.

├── api
│   ├── openapi
│   └── swagger
├── build
│   ├── ci
│   ├── docker
│   │   ├── iam-apiserver
│   │   ├── iam-authz-server
│   │   └── iam-pump
│   ├── package
├── CHANGELOG
├── cmd
│   ├── iam-apiserver
│   │   └── apiserver.go
│   ├── iam-authz-server
│   │   └── authzserver.go
│   ├── iamctl
│   │   └── iamctl.go
│   └── iam-pump
│       └── pump.go
├── configs
├── CONTRIBUTING.md
├── deployments
├── docs
│   ├── devel
│   │   ├── en-US
│   │   └── zh-CN
│   ├── guide
│   │   ├── en-US
│   │   └── zh-CN
│   ├── images
│   └── README.md
├── examples
├── githooks
├── go.mod
├── go.sum
├── init
├── internal
│   ├── apiserver
│   │   ├── api
│   │   │   └── v1
│   │   │       └── user
│   │   ├── apiserver.go
│   │   ├── options
│   │   ├── service
│   │   ├── store
│   │   │   ├── mysql
│   │   │   ├── fake
│   │   └── testing
│   ├── authzserver
│   │   ├── api
│   │   │   └── v1
│   │   │       └── authorize
│   │   ├── options
│   │   ├── store
│   │   └── testing
│   ├── iamctl
│   │   ├── cmd
│   │   │   ├── completion
│   │   │   ├── user
│   │   └── util
│   ├── pkg
│   │   ├── code
│   │   ├── options
│   │   ├── server
│   │   ├── util
│   │   └── validation
├── LICENSE
├── Makefile
├── _output
│   ├── platforms
│   │   └── linux
│   │       └── amd64
├── pkg
│   ├── util
│   │   └── genutil
├── README.md
├── scripts
│   ├── lib
│   ├── make-rules
├── test
│   ├── testdata
├── third_party
│   └── forked
└── tools

Are you feeling a bit dizzy looking at this long list of directories? Don’t worry, let’s classify this large directory into categories first, and then take a closer look at the purpose of each category. Then everything will become clear.

In my opinion, a Go project consists of three main parts: Go applications, project management, and documentation. Therefore, our project directory can also be classified into these three categories. At the same time, Go applications run through the development stage, testing stage, and deployment stage, so the application directories can be further divided into smaller subcategories according to the development process. Of course, these are my suggested directories, and there are also some directories that are not recommended in Go project directories. Therefore, overall, our directory structure can be classified as shown in the following diagram:

Now let’s go through each directory and file together. When you organize your code project next time, you can come back and refer to this to deepen your understanding.

Go Applications: Mainly stores front-end and back-end code. #

First, let’s talk about the directories involved in the development phase. The code we develop includes front-end and back-end code, which can be stored in the front-end directory and back-end directory, respectively.

/web

This is the directory for storing front-end code, primarily used for storing web static resources, server templates, and single-page applications (SPAs).

/cmd

In a project with multiple components, you can put the folders containing the main functions of these components under the /cmd directory, for example:

$ ls cmd/
gendocs  geniamdocs  genman  genswaggertypedocs  genyaml  iam-apiserver  iam-authz-server  iamctl  iam-pump

$ ls cmd/iam-apiserver/
apiserver.go

The name of each component’s directory should be consistent with the expected executable file name. Make sure not to put too much code in the /cmd/<component name> directory. If you think the code can be imported and used in other projects, it should be placed in the /pkg directory. If the code is not reusable or you don’t want others to reuse it, please place the code in the /internal directory.
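To make the advice about keeping /cmd thin more concrete, here is a minimal sketch of what a file such as cmd/iam-apiserver/apiserver.go might look like. It is not the IAM project's actual code: the parseAddr function and the -addr flag are hypothetical, and in a real project parseAddr would live in a package under /internal instead of sitting next to main.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseAddr is a hypothetical stand-in for logic that would normally live
// under /internal (for example in an options package). It is defined in this
// file only so that the sketch is self-contained.
func parseAddr(args []string) (string, error) {
	fs := flag.NewFlagSet("iam-apiserver", flag.ContinueOnError)
	addr := fs.String("addr", ":8080", "listen address (assumed flag)")
	if err := fs.Parse(args); err != nil {
		// With ContinueOnError, Parse has already reported the problem
		// on standard error; the caller only needs the error value.
		return "", err
	}
	return *addr, nil
}

// main only wires things together: it reads the input, calls into the
// real implementation, and turns an error into an exit code.
func main() {
	addr, err := parseAddr(os.Args[1:])
	if err != nil {
		os.Exit(1)
	}
	fmt.Println("would start the server on", addr)
}
```

Because main contains almost no logic, everything worth testing or reusing sits behind a function that can be moved to /internal or /pkg without touching the entry point.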

/internal

This directory is used to store private applications and library code. If there’s some code that you don’t want to be imported into other applications and libraries, you can put it under the /internal directory.

When code tries to import a package from the /internal directory of another project, compilation fails, because Go enforces the following rule:

An import of a path containing the element "internal" is disallowed
if the importing code is outside the tree rooted at the parent of the
"internal" directory.
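The rule is easier to internalize when written as a predicate. The following is a simplified sketch, not the go command's real implementation: the function name internalImportAllowed is made up, the import paths are assumed for illustration, and the sketch keys on the last "internal" element of the imported path.

```go
package main

import (
	"fmt"
	"strings"
)

// internalImportAllowed reports whether code in package importer may import
// the package imported, following the rule quoted above. This is a
// simplified illustration; the real check is performed by the go command.
func internalImportAllowed(importer, imported string) bool {
	elems := strings.Split(imported, "/")
	last := -1
	for i, e := range elems {
		if e == "internal" {
			last = i // remember the last "internal" element
		}
	}
	if last < 0 {
		return true // no "internal" element: the rule does not apply
	}
	// parent is the directory that contains the "internal" directory.
	parent := strings.Join(elems[:last], "/")
	return importer == parent || strings.HasPrefix(importer, parent+"/")
}

func main() {
	// Assumed import paths, for illustration only.
	const pkg = "github.com/marmotedu/iam/internal/apiserver"
	fmt.Println(internalImportAllowed("github.com/marmotedu/iam/cmd/iam-apiserver", pkg)) // true
	fmt.Println(internalImportAllowed("github.com/example/other/cmd/tool", pkg))          // false
}
```

The first importer lives inside the tree rooted at the parent of internal, so the import is allowed; the second lives in a different project, so it is rejected.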

In other words, the Go toolchain itself lets you prevent other projects from importing packages that are internal to your project. The /internal directory is recommended to contain the following directories:

/internal/apiserver: This directory stores the actual application code. The shared code for these applications is stored in the /internal/pkg directory.
/internal/pkg: This directory stores packages that are shareable within the project but not outside of it. These packages provide foundational and common functionalities, such as tools, error codes, user authentication, and more.

My suggestion is to initially store all shared code in the /internal/pkg directory. When the shared code is ready to be used by external projects, move it to the /pkg directory.

Now, let me explain in detail the internal directory structure of the IAM project to deepen your understanding of internal. The directory structure is as follows:

├── apiserver
│   ├── api
│   │   └── v1
│   │       └── user
│   ├── options
│   ├── config
│   ├── service
│   │   └── user.go
│   ├── store
│   │   ├── mysql
│   │   │   └── user.go
│   │   ├── fake
│   └── testing
├── authzserver
│   ├── api
│   │   └── v1
│   ├── options
│   ├── store
│   └── testing
├── iamctl
│   ├── cmd
│   │   ├── cmd.go
│   │   ├── info
└── pkg
    ├── code
    ├── middleware
    ├── options
    └── validation

The /internal directory is roughly divided into 3 types of subdirectories:

/internal/pkg: This is the directory for storing internal shared packages.
/internal/authzserver, /internal/apiserver, /internal/pump, /internal/iamctl: These are the application directories that contain the implementation code of the applications.
/internal/iamctl: The client tool. Some larger projects also need a client tool, and its code gets a directory of its own.

In each application, there are also directory structures based on functionality:

/internal/apiserver/api/v1: This is where the specific implementation of the HTTP API resides. It is mainly used for unpacking HTTP requests, validating parameters, handling business logic, and returning responses. Note that the business logic here should be lightweight. If the business logic is complex and involves a large amount of code, it is recommended to place it under the /internal/apiserver/service directory. The code here mainly chains the processing steps together.
/internal/apiserver/options: This is where the command flags of the application are stored.
/internal/apiserver/config: This is where the application configuration is created based on command-line parameters.
/internal/apiserver/service: This directory stores the code for handling complex business logic in the application.
/internal/apiserver/store/mysql: If an application needs to persistently store some data, this is where the code for interacting with the database, such as Create, Update, Delete, Get, List, etc., is located.
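The division of labor between api, service, and store can be sketched in a single file. This is an illustration under stated assumptions, not the IAM project's code: the User type, UserStore interface, UserService, userHandler, and the name query parameter are all hypothetical, and in a real project each layer would live in its own package under /internal/apiserver.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// User is a hypothetical domain object.
type User struct{ Name, Email string }

// ErrNotFound is returned by the store when a user does not exist.
var ErrNotFound = errors.New("user not found")

// UserStore is the storage contract (it would live under store/).
type UserStore interface {
	Get(name string) (User, error)
}

// fakeStore is an in-memory implementation, in the spirit of store/fake.
type fakeStore map[string]User

func (s fakeStore) Get(name string) (User, error) {
	u, ok := s[name]
	if !ok {
		return User{}, ErrNotFound
	}
	return u, nil
}

// UserService holds the business logic (it would live under service/).
type UserService struct{ store UserStore }

// Describe returns a printable description of the named user.
func (s UserService) Describe(name string) (string, error) {
	u, err := s.store.Get(name)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s <%s>", u.Name, u.Email), nil
}

// userHandler is the API layer (it would live under api/v1/user): it unpacks
// the request, validates parameters, calls the service, and writes a response.
func userHandler(svc UserService) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		name := r.URL.Query().Get("name")
		if name == "" {
			http.Error(w, "missing name", http.StatusBadRequest)
			return
		}
		desc, err := svc.Describe(name)
		switch {
		case errors.Is(err, ErrNotFound):
			http.Error(w, err.Error(), http.StatusNotFound)
		case err != nil:
			http.Error(w, "internal error", http.StatusInternalServerError)
		default:
			fmt.Fprintln(w, desc)
		}
	}
}

func main() {
	svc := UserService{store: fakeStore{"colin": {Name: "colin", Email: "colin@example.com"}}}
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/v1/users?name=colin", nil)
	userHandler(svc)(rec, req)
	fmt.Printf("%d %s", rec.Code, rec.Body.String())
}
```

Because the service depends only on the UserStore interface, a MySQL-backed store and a fake store can be swapped without changing the API or service layers, which is one reason to keep store/mysql and store/fake side by side.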

The /internal/pkg directory stores packages that are shareable within the project and often contains the following directories:

/internal/pkg/code: Project-specific business codes, such as error codes.
/internal/pkg/validation: Some common validation functions.
/internal/pkg/middleware: HTTP processing chain.

/pkg

The /pkg directory is very common in Go projects. It appears in almost all well-known open-source projects (non-frameworks), such as Kubernetes, Prometheus, Moby, Knative, etc.

This directory stores code libraries that can be used by external applications. Other projects can directly import the code from here using the import statement. Therefore, we must be careful when putting code libraries into this directory.

/vendor

Project dependencies, which can be created with go mod vendor. Note that if the project is a Go library, you should not commit vendored dependencies.

/third_party

External helper tools, forked code, or other third-party applications (such as Swagger UI). For example, if we forked a third-party Go package and made some small changes, we can put it in the directory /third_party/forked. This way, we can clearly know that the package is forked from a third party and easily keep it in sync with the upstream.

Next, let’s take a look at the directories related to the testing phase. They can store files related to testing.

/test

Used to store other external testing applications and test data. The layout of the /test directory is quite flexible. For larger projects, it makes sense to have a data subdirectory. For example, if you want the go tool to ignore the contents of that subdirectory, you can name it /test/testdata, because directories named testdata are ignored.

It is important to note that the go tool also ignores directories or files starting with “.” or “_”. This provides greater flexibility in naming test data directories.
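The ignore rules can be written down as a small predicate. This is only a sketch for checking directory names against the documented behavior of the go tool (names starting with "." or "_" are ignored, as are directories named testdata); the function name ignoredDir is made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// ignoredDir reports whether the go tool skips a directory with the given
// base name when it expands package patterns such as ./...
func ignoredDir(name string) bool {
	return strings.HasPrefix(name, ".") ||
		strings.HasPrefix(name, "_") ||
		name == "testdata"
}

func main() {
	for _, name := range []string{"_output", ".git", "testdata", "data", "test"} {
		fmt.Printf("%-9s ignored=%v\n", name, ignoredDir(name))
	}
}
```

This also explains why a build output directory named _output can sit in the repository root without being picked up as a Go package.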

Next, let’s take a look at the directories related to the deployment phase. These directories can store files related to deployment.

/configs

This directory is used for configuration file templates or default configurations. For example, you can store confd or consul-template template files here. It is important to note that sensitive information should not be included in the configuration. Instead, we can use placeholders. For example:

apiVersion: v1    
user:    
  username: ${CONFIG_USER_USERNAME} # iam user name    
  password: ${CONFIG_USER_PASSWORD} # iam password
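One way to fill in such placeholders at deployment time is the standard library's os.Expand, which replaces ${NAME} references using a lookup function. The sketch below is an assumption about how the template might be rendered, not a description of the IAM project's tooling; the render function is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// configTemplate mirrors the placeholder-based configuration shown above.
const configTemplate = `apiVersion: v1
user:
  username: ${CONFIG_USER_USERNAME} # iam user name
  password: ${CONFIG_USER_PASSWORD} # iam password
`

// render replaces every ${NAME} placeholder in tmpl with lookup(NAME).
// Passing os.Getenv as lookup reads the values from the environment.
func render(tmpl string, lookup func(string) string) string {
	return os.Expand(tmpl, lookup)
}

func main() {
	// The secrets stay out of the repository; they are supplied through
	// environment variables when the configuration file is generated.
	fmt.Print(render(configTemplate, os.Getenv))
}
```

Taking the lookup function as a parameter keeps render easy to test with a plain map instead of real environment variables.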

/deployments

Used to store IaaS, PaaS systems, and container orchestration deployment configurations and templates (Docker-Compose, Kubernetes/Helm, Mesos, Terraform, Bosh). In some projects, especially those deployed with Kubernetes, this directory may be named “deploy”.

Why should directories related to Kubernetes be included in the directory structure? This is mainly because the current software deployment is moving towards containerized deployment.

/init

Stores initialization system (systemd, upstart, sysv) and process management configuration files (runit, supervisord). For example, the unit file of systemd. These files are useful in non-containerized deployments.

Project Management: Stores various files used to manage Go projects #

When developing a project, there are also some directories used to store files related to project management. Let’s take a look together.

/Makefile

Although make is an old project management tool, it is still one of the best. Therefore, a Go project should have a Makefile in its root directory to manage the project. A Makefile is usually used to perform tasks such as static code checks, unit tests, and compilation. For other common functions, you can refer to this link: Common Contents of Makefile.

I also have one suggestion: when executing make, execute the following steps directly: format -> lint -> test -> build. If there are code generation operations, you may also need to generate code first: gen -> format -> lint -> test -> build.

In actual development, we can automate some repetitive work and add them to the Makefile for unified management.

/scripts

This directory is mainly used to store script files that implement different functionalities such as building, installing, and analyzing. Different projects may include different files, but it usually contains the following 3 directories:

/scripts/make-rules: Used to store makefile files that implement various functions in the Makefile. Makefile has many functions, so to keep it concise, I suggest that you put the specific implementations of each function in the /scripts/make-rules folder.
/scripts/lib: Shell library, used to store shell scripts. A large project has many automation tasks, such as release, documentation update, code generation, etc. So you need to write many shell scripts, and these shell scripts may have some common functions. You can abstract these common functions into libraries and store them in the /scripts/lib directory, such as logging.sh, util.sh, etc.
/scripts/install: If the project supports automated deployment, you can put the deployment scripts in this directory. If the deployment script is simple, you can also put it directly in the /scripts directory.

In addition, the function names in shell scripts are recommended to be named semantically, such as iam::log::info. This semantic naming convention allows callers to easily identify the functional category of the function, which facilitates function management and referencing. This naming convention is widely used in Kubernetes scripts.

/build

This directory stores installation packages and files related to continuous integration. There are three directories that are highly likely to be used under this directory. Consider including them in the directory structure:

/build/package: Stores package configurations and scripts for containers (Docker) and systems (deb, rpm, pkg).
/build/ci: Stores configuration files and scripts for CI (travis, circle, drone).
/build/docker: Stores Dockerfile files for components of sub-projects.

/tools

Stores support tools for this project. These tools can import code from the /pkg and /internal directories.

/githooks

Git hooks. For example, we can place commit-msg in this directory.

/assets

Other resources used by the project (images, CSS, JavaScript, etc.).

/website

If you are not using GitHub Pages, you can place project website-related data here.

Documentation: Mainly store various types of documents for the project #

A project also includes some documents, which have many categories and need some directories to store them. Let’s take a look at them together.

/README.md

The project’s README file generally includes the project’s introduction, features, quick installation and usage instructions, detailed documentation links, and development guidelines, etc. Sometimes the README document can be quite long, and in order to quickly locate the desired content, markdown TOC (Table of Contents) index needs to be added. You can use the tool tocenize to add the index.

Here’s a suggestion, as mentioned earlier, README can be standardized, so this README document can be generated automatically using scripts or tools.

/docs

Stores design documents, development documents, and user documents (except for documents generated by godoc). It is recommended to include the following subdirectories:

/docs/devel/{en-US,zh-CN}: Store development documents, hack documents, etc.
/docs/guide/{en-US,zh-CN}: Store user manuals, installation, quickstart, product documentation, etc., divided into Chinese and English documents.
/docs/images: Store image files.

/CONTRIBUTING.md

If it is an open-source project, it is better to have a CONTRIBUTING.md file to explain how to contribute code, how to collaborate on open-source, etc. CONTRIBUTING.md can not only standardize collaboration processes but also reduce the difficulty for third-party developers to contribute code.

/api

The /api directory stores the various API definition files provided by the current project. It may include directories like /api/protobuf-spec, /api/thrift-spec, /api/http-spec, openapi, swagger, etc. These directories contain all the API files that the current project provides or depends on. For example, here is the /api folder of the IAM project:

├── openapi/
│   └── README.md
└── swagger/
    ├── docs/
    ├── README.md
    └── swagger.yaml

The main purpose of the subdirectory is to classify storage when a project provides multiple access methods at the same time. This approach can avoid potential conflicts and make the project structure clearer.

/LICENSE

The license file. The license can be proprietary or open source. Common open source licenses include: Apache 2.0, MIT, BSD, GPL, Mozilla, LGPL. Sometimes, public cloud products may release an open-source version of the product to build brand influence, so it is best to plan the future direction of the product and choose the appropriate license during the project planning stage.

To declare copyright, you may need to add the LICENSE header to source code files or other files, and this work can be automated through tools. The recommended tool is addlicense.

When the project uses other open-source code, the LICENSE needs to acknowledge it. This requires knowing which third-party code the project uses and which open source licenses that code is released under. You can use tools to check this, and the recommended tool is glice. As for how to acknowledge third-party code, you can refer to the LICENSE file of the IAM project.

/CHANGELOG

When the project is updated, the update records need to be stored in the CHANGELOG directory so that readers can understand what changed in the current version and in earlier ones. Writing a CHANGELOG is complex and tedious work. We can use the Angular commit message convention together with git-chglog to generate the CHANGELOG automatically.

/examples

Store example code for applications or public packages. These example codes can lower the barrier for users to get started.

Not recommended directories #

In addition to the recommended directories mentioned above, there are some directories that are not recommended to be included in Go projects because they do not comply with Go’s design philosophy.

/src/

Some languages have a src directory by convention; in Java projects, for example, it is a common pattern. In Go projects, however, using a src directory is not recommended.

One important reason is that, in the traditional GOPATH mode, Go projects are placed under the $GOPATH/src directory, which contains all the code. If we also use a /src directory in our own project, there will be two src elements in the path, for example:

$GOPATH/src/github.com/marmotedu/project/src/main.go

This directory structure looks very strange.

xxs/

In Go projects, it is recommended to avoid using directories or packages with plural names. It is recommended to use singular names uniformly.

Some Suggestions #

The directory structure mentioned above includes many directories, but a small project does not require so many directories. For small projects, you can consider starting with the cmd, pkg, and internal directories, and create other directories as needed later. For example:

$ tree --noreport -L 2 tms
tms
├── cmd
├── internal
├── pkg
└── README.md

Additionally, Git does not track empty directories, so empty directories in the designed structure can’t be added to the Git repository as they are. If we want to keep an empty directory in the repository to preserve the directory structure, we can add a .keep file to it. For example:

$ ls -A build/ci/ 
.keep

Summary #

Today we mainly learned about how to design the directory structure of code. We first discussed the design principles for directory structures: when designing directory structures, it is important to ensure that the directory names are clear, the functionalities are explicit, and the designed directory structure is scalable.

Then, we learned about two specific types of directory structures: structured directory structure and flat directory structure. The structured directory structure is more suitable for Go applications, while the flat directory structure is more suitable for frameworks/libraries. These two types of directory structures are well-organized and can be used as directory standards.

You can also deepen your understanding of these two directory structures by studying examples from practical projects. For the structured directory structure, you can refer to the directory structure of the IAM practical project. For the flat directory structure, you can refer to the design of the log package in the practical section of this course.

Exercise #

Try refactoring your current project using the directory conventions described in this lesson and see the advantages and disadvantages.
Think about the good directory structures you have encountered in your work and discuss their advantages and areas for improvement in the comments section.

Feel free to discuss and exchange ideas with me in the comments. See you in the next lesson.