User Story - Little Prince Maintaining a Beginner’s Mind and Not Becoming a Frog at the Bottom of the Well #

Hello, I am Xiaowang, a big data developer. I currently work in development and operations at a telecommunications carrier and have over four years of experience.

In my experience, if an engineer focuses on business development rather than low-level big data internals or top-level architecture, the frameworks they use fall mainly into three categories: data collection, data storage, and data processing.

Among these frameworks, Apache Spark is undoubtedly one of the most frequently used, has one of the most mature ecosystems, and comes up most often in both work and interviews.

How did I learn Spark before? #

As an ordinary developer, my approach to learning something new usually follows three steps. First, I ask senior colleagues about the framework’s position and core functions in the overall architecture. Second, I search for introductory videos or blog posts online and go through them quickly to get a general picture. Finally, I read the official documentation and write code against it, to grasp the development process and pin down the details.

I used the same approach to learn Spark before. As mentioned in the [Opening Words] of the column, “after just three months of intensive practice, I have been able to handle various business requirements proficiently, thanks to the exceptionally high development efficiency of the Spark framework.”

At this point, I considered myself a junior Spark engineer. After thoroughly reading the Spark official documentation and implementing the code, I even regarded myself as an intermediate-level Spark engineer.

Recently, I passed an internal assessment to become a trainer at my company. To build experience and share knowledge, trainers need to record hands-on courses related to the company’s business.

To ensure the quality of my courses, I realized I needed to consolidate my Spark knowledge. Here I have to mention the second step of the approach above, searching for learning materials: although there is a huge amount of material online, much of it is duplicated, and some authors write poorly, making the content tedious and sometimes even passing incorrect knowledge on to readers.

Finding high-quality materials this way is therefore difficult and time-consuming. Fortunately, I came across this column, which exceeded my expectations and gave me many insights.

What did I gain from studying this column? #

After carefully studying the column “Spark from Scratch for Beginners,” I realized how far off my previous self-assessment was; I may have a long way to go before I can even call myself a “junior” Spark engineer. While reading, the phrase “this exposes a blind spot in my knowledge” kept popping into my mind.

Oh my, realizing these blind spots made me uneasy; I couldn’t help exclaiming, “when the foundation is not solid, the earth and mountains shake.”

For example, I had never thought about the similarities and differences between an RDD and an array, even though comparing new knowledge with familiar knowledge is often a shortcut to understanding new concepts. I had also never classified operators into categories; in fact, organizing them well lets you write code with much less effort. Nor had I tried to associate important RDD properties, such as the DAG computation graph, with everyday life, a technique that helps counter the forgetting curve, especially for conceptual knowledge points.
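To make that RDD-versus-array comparison concrete, here is a minimal sketch of my own in Spark’s Scala API (it is not code from the column, and it runs Spark in local mode purely for illustration). The array’s map executes eagerly inside one JVM, while the RDD’s map is a lazy transformation over partitioned data; nothing runs until an action such as collect is called, which is also the simplest way to see the operator categories in action.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddVsArray {
  def main(args: Array[String]): Unit = {
    // Local-mode SparkContext, for illustration only.
    val sc = new SparkContext(
      new SparkConf().setAppName("RddVsArray").setMaster("local[*]"))

    // An array lives in a single JVM, and map runs eagerly.
    val doubledArray = Array(1, 2, 3, 4).map(_ * 2)

    // An RDD looks similar, but it is partitioned across executors,
    // and map is a lazy transformation: no job has run yet.
    val doubledRdd = sc.parallelize(Seq(1, 2, 3, 4)).map(_ * 2)

    // collect is an action; only now does Spark schedule and run a job.
    println(doubledArray.mkString(","))          // 2,4,6,8
    println(doubledRdd.collect().mkString(","))  // 2,4,6,8

    sc.stop()
  }
}
```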

Through this column, these points I had never deeply considered were illuminated one by one. Beyond absorbing new knowledge, understanding Spark’s major core systems is also crucial.

For example, understanding the flow of the scheduling system and the responsibilities of its three main components (the DAGScheduler, the TaskScheduler, and the SchedulerBackend) lets us grasp the essence of distributed computing. Another example is the memory and storage systems: fully comprehending these components lays an important foundation for writing high-performance code.
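As a small illustration of that scheduling flow, here is a word-count sketch of my own (assuming the SparkContext sc from the previous snippet; the input path is hypothetical). The reduceByKey step introduces a shuffle, which is where the DAGScheduler cuts the job into stages, and persist marks where the memory and storage systems take over.

```scala
import org.apache.spark.storage.StorageLevel

// Assumes an existing SparkContext `sc`; "input.txt" is a hypothetical file.
val counts = sc.textFile("input.txt")
  .flatMap(_.split("\\s+"))
  .map(word => (word, 1))
  .reduceByKey(_ + _)                      // shuffle: stage boundary here
  .persist(StorageLevel.MEMORY_AND_DISK)   // managed by memory/storage systems

// collect is the action that submits the job: the DAGScheduler splits it
// into stages, the TaskScheduler dispatches each stage's tasks, and the
// SchedulerBackend hands them to executors.
counts.collect()
```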

To fully understand these important points on my own, the most direct method would be to read the source code. For someone of average aptitude like me, however, reading source code can be incredibly challenging. In the face of that difficulty, this column provides effective threads to follow. It feels like a lamp lighting the way up Wudang Mountain, opening up a much broader view.

In this column, instead of following the traditional approach of introducing Spark module by module, the instructor, Mr. Wu, connects the underlying knowledge through an introductory case and presents a blueprint of the underlying architecture from a higher vantage point. Don’t worry about not understanding it: Mr. Wu ties these knowledge points to everyday life, such as factory assembly lines and construction sites, which makes learning enjoyable. And don’t underestimate the power of analogy and association; compared with dry terminology, relatable associations help you avoid rote memorization and talk about the material fluently. The key is to deepen understanding and achieve the idea of “seeing both the trees and the forest.”

With the foundation of “seeing the forest,” when I read the source code now, I have a relatively clear thread in mind: I know where to focus and which parts are the milestones. I am no longer afraid of reading source code.

I don’t know whether you have heard of Richard Feynman’s learning theory, famously known as the “Feynman Technique.” One of its steps is to “explain a concept in the simplest language possible, as if you were teaching it to a child.” Learning through relatable associations aligns neatly with this philosophy.

While studying this column, I had a small realization: compared with students starting from zero, I think those with prior experience may actually find it harder to learn. They may have formed fixed impressions of, or misconceptions about, certain concepts, and when conflicts arise they first need to empty their minds of preconceived knowledge. It’s like emptying a cup of old tea before accepting fresh tea poured by an old Zen master.

How did I learn from this column? #

In my opinion, learning methods are just means, and the goal is to acquire knowledge. Here, I will share my personal experience for reference.

Even if the instructor explains things clearly, the knowledge still belongs to the instructor. So during the learning process, I have always insisted on taking notes and internalizing the content. Due to my mediocre aptitude, I have to read an article three or four times, or even more, to grasp the ideas.

Here is my specific approach:

First, I read the article carefully, word by word. When I run into problems, I resist the bad habit of immediately searching online or asking someone; I push through to finish the first reading and build an outline.

Second, with the outline and my questions in mind, I read the article and the comments again. Perhaps the answer is hidden in a detail of the context I overlooked, or perhaps I will find similar questions and discussions from other students in the comments section. (By the way, the comments section is a good place to expand your cognitive boundaries; truly, stones from other mountains can polish jade.)

Third, I copy the titles, close the article, and see if I can list the relevant knowledge points and notes for each point based on the titles.

The diagrams at the end of this article come from my study notes and are provided for your reference. The three diagrams illustrate the knowledge networks of the scheduling system, parallelism and parallel tasks, and Spark storage, respectively.

Whether it’s drawing diagrams or taking notes, the key is to help establish the logical relationships between knowledge. If you encounter difficulties during the organization process, you can read the course materials and official documentation to fill in the gaps.

[Study-note diagrams: knowledge networks for the scheduling system, parallelism and parallel tasks, and Spark storage]

After this kind of studying and organizing, I also apply the knowledge by writing code or reading the source code, so as not to merely fight battles on paper. After three or four such iterations, I reach the small goal of “seeing the trees” within “seeing both the trees and the forest.”

For ordinary people, more than 99% of career success comes from working with excellent people and excellent things. Mr. Wu is one of those excellent people, and this column is one of those excellent things; I am fortunate to have studied it. Do you know the feeling of holding something precious tightly? A resource this good must not go to waste.

As the saying goes, “the best way to be lazy is to not be lazy”. Countless experiences have taught us that laziness always comes back to bite us. Since mastering Spark is unavoidable for a big data engineer, let’s study steadily and take action.

Knowledge gained from books alone is shallow; true understanding requires practice. Only through practice will the knowledge in this column slowly flow into your mind, and when you work diligently at the keyboard, it will flow back out through your dancing fingertips, weaving itself into elegant and beautiful code.

Maintain an open-minded attitude and don’t be a frog at the bottom of a well. I hope we can progress together, apply what we learn, and keep up the good work!