40 Summary of Data Structures and Algorithms in Python

40 Summary of Data Structures and Algorithms in Python #

Hello, I am Jingxiao.

Unknowingly, we have completed the study of the practical part of quantitative trading together again. I am very happy to see that many students have been actively learning and leaving many high-quality comments, which is worthy of our mutual thinking and communication. Some students have also repeatedly scrutinized and pointed out some inaccurate or inappropriate expressions in the articles, for which I am very grateful.

The main purpose of the practical part is to explain how Python plays a role in a complete technical field. Therefore, in each lesson, we will summarize a small piece of knowledge; at the same time, in Lesson 36, I explained strategies and backtesting systems in great detail, as they are the most important aspects of quantitative trading.

As for the Q&A session of this chapter, because many students have been asking questions about data structures and algorithms in Python, I will briefly address them here.

First of all, I hope you understand that the positioning of our Python column is an advanced course with a certain foundation in computer knowledge, focusing on the core knowledge points of Python, assuming that you have a certain understanding of basic algorithms and data structures. Therefore, in the process of explaining syntax and technical knowledge points, I will incorporate some basic knowledge of data structures in a comprehensive manner, but I will not delve into it. Key terms and difficulties in data structures will naturally be mentioned, but I still hope that you have the ability to self-study and master them.

However, in order to further facilitate your understanding of Python’s data structures and algorithms and deepen your grasp of the basic content of Python, I have summarized a comprehensive outline here. If you are lacking in this area, you can refer to it for learning and reference. Of course, if you have time and energy, I encourage you to implement all the data structures and algorithms in Python.

Basic Data Structures: Arrays, Heaps, Stacks, Queues, Linked Lists #

Arrays need no introduction. In Python, the basic array provides O(1) random access lookup and O(n) random insertion.

Heap, strictly speaking, is a special type of binary tree that allows random insertion and deletion in O(nlogn) time complexity. It also enables retrieving the maximum or minimum value in constant time complexity, O(1). Heaps can be used to implement priority queues and are widely used for task scheduling in various projects.

Stack is a last-in, first-out data structure, where both push and pop operations have a time complexity of O(1).

Queue, on the other hand, is the opposite of a stack. It is a first-in, first-out data structure, where the one who enters the queue first is the first to be served. Enqueuing and dequeuing operations also have a time complexity of O(1). Both stacks and queues can be implemented using arrays, but attention needs to be paid to space management, especially when implementing a queue using an array, where a circular queue is commonly used.

Linked list is another type of linear structure, different from arrays in that it does not support random access. You cannot access elements in a linked list using indexes. Elements in a linked list are connected through pointers, where in a singly linked list, each element points to the next one, while in a doubly linked list, adjacent elements are connected to each other.

All these fundamental data structures have good library and package support in Python, making them easy to use. However, it is still important for you to have a certain understanding of the principles behind them, so that you can handle complex problems with confidence and ease.

Advanced Data Structures: Undirected Graph, Directed Graph, Tree, DAG Graph, Trie, Hash Table #

An undirected graph is a data structure composed of vertices and edges. An edge connects two vertices (if the two vertices are the same, the edge is called a self-loop). As the name suggests, “undirected”, its edges have no direction.

A directed graph, like an undirected graph, is a “graph” data structure. The difference is that the edges of a directed graph have a direction, from one vertex to another.

The tree data structure can be divided into rooted trees and unrooted trees. In the former, the most common one is our binary tree, starting from a vertex and descending level by level, with each parent node having at most two child nodes. As for unrooted trees, they are a special type of undirected graph. An undirected graph without cycles is called an unrooted tree. It has many special properties and advantages, and is widely used in discrete mathematics.

A DAG (Directed Acyclic Graph), is a data structure used in special applications and frequently encountered in dynamic programming problems involving graphs. The way to traverse a DAG, which is commonly referred to as topological sorting, is a kind of graph algorithm. A DAG can be considered as a graph version of a linked list. If a blockchain is a linked list, then the blockchain 3.0 era could be a DAG graph.

A trie, also known as a Trie tree, is a directed graph with edges represented by characters. It has powerful applications in string processing. The well-known AC Automaton is used Trie tree to solve the problem of multiple pattern matching. Trie tree is also commonly used in the industry for search suggestions. For example, when you search for “极客时” in Baidu, it will automatically suggest “极客时间”.

A hash table is undoubtedly the most widely used and simplest data structure for programmers. For example, Python’s dict() can be used off the shelf, simple and natural. However, hash tables have profound connotations. Collision algorithms, hash algorithms, and capacity expansion algorithms are all worth exploring in depth.

Algorithm: Sorting #

It can be quite challenging to start with sorting when learning algorithms, as it requires an understanding of concepts like time complexity, basic binary thinking, and rigorous mathematical proofs. However, regardless of the difficulty, I want to emphasize that you should not skip these essential scientific training steps in the learning process. If you ignore the fundamentals and simply rely on list.sort(), you will struggle when faced with slightly more complex problems in the future, and you will end up spending more time revisiting the basics, which is not worth it.

We can start by understanding sorting with the basic concept of Bubble Sort, which is an algorithm that is easy to understand in terms of correctness and code. Then comes Selection Sort and Insertion Sort, which, like Bubble Sort, both have a time complexity of O(n^2).

From Merge Sort onwards, the algorithm complexity drops to the theoretical lower bound of O(nlogn). Here, we also start to encounter a classic idea in algorithms—Divide and Conquer. After that, there are algorithms like Quick Sort and Heap Sort, which also have a time complexity of O(nlogn).

In addition to these, there are some optimization sorting algorithms that can achieve a time complexity of O(n) under specific conditions, such as Counting Sort, Bucket Sort, and Radix Sort.

For more information on various algorithms, I recommend checking out this Bilibili video: https://www.bilibili.com/video/av685670

Algorithm: Binary Search #

Binary search is a concept that has wide applications, even in daily life (haha). For example, the design of page flipping in books is a form of binary search. You don’t need to search many times to find the page you are looking for. Another well-known example is the joke about girls finding books in the library¹.

When studying in the library, a girl carrying a pile of books entered the reading room. As a result, the alarm went off. The librarian asked the girl to see which book triggered the alarm. The girl poured out all the books and tested them one by one. Seeing this, the librarian became impatient and divided the books into two parts. She tested the first part, and the alarm rang. Then she divided this part into two and continued the test. After three repetitions, she found the book. The librarian looked at the girl with a disdainful look as if to say, “Can’t you tell the difference between O(n) and O(log2n)?”

When it comes to the binary search algorithm, you must not just rely on APIs and simple code. You must understand the essence of the binary search concept and apply it flexibly.

Algorithms: Depth First Search (DFS) and Breadth First Search (BFS) #

DFS and BFS are the basics in graph theory algorithms. You need to master these two fundamental concepts first, and then learn several classic algorithms such as shortest path algorithm, union-find, memoization in DFS, topological sorting, DP on DAG graphs, etc.

Here, it is important to note that our focus is on learning the ideas. For business logic, the importance of graph algorithms may not be that significant. However, when you start diving into the deeper layers of the technology stack, dealing with big data (Hadoop, Spark), and exploring neural networks and artificial intelligence, you will discover that the basic ideas of graphs have already permeated into design patterns, and DFS and BFS are the two fundamental keys to manipulating graphs.

Algorithms: Greedy and Dynamic Programming #

These two algorithms are still two important ways of thinking. Although in the vast majority of programmers’ work, these two algorithms may not be used more than a few times in a year, but at the same time, they are essential basic skills for upgrading to higher technical abilities. You don’t need to master them to the level of participating in the ACM World Finals, but even if we can have some understanding of the basic methodology, we will benefit a great deal.

A friend who has participated in the ACM competition once told me that after he learned dynamic programming, his entire worldview and methodology changed. After that, when he thinks about certain decisions in real life, he can understand which ones are short-sighted greedy approaches, and which ones are long-term considerations of dynamic programming (laughs).

Summary #

As a Python language column, it is indeed impossible for me to explain every data structure and algorithm in detail. However, as I’ve said before, basic data structures and algorithms are essential for every programmer.

Here, I recommend that you can learn from the column Data Structures and Algorithms: The Beauty of It by Wang Zheng and the video course Algorithm Interview Pass 40 Lectures by Qin Chao on Geek Time. These two teachers, who have worked at Google and Facebook, have solid foundations and rich practical experience and will provide you with more comprehensive algorithm lectures from different perspectives.

In today’s internet era of data explosion, learning materials are within easy reach, so time is more valuable. I have outlined these frameworks here with the purpose of helping you save time, organizing basic knowledge points suitable for entry-level learning and mastery, and enabling you to learn more purposefully with a global perspective.

Of course, successful learning requires our own efforts. Only in this way, can you, who have mastered data structures and algorithms, further enhance your understanding of Python based on mathematical foundations. At the same time, these ways of thinking will invisibly help you design higher-quality systems and architectures in future project designs. It can be said that it is a lifelong investment in learning.

I hope you can learn and gain something practical. If you have any confusion or questions, feel free to communicate and discuss with me in the comments section. Let’s improve and progress together!

Source: Twitter - @xueshudi ↩︎