02 Why Jupyter Notebook Is an Essential Skill for Modern Python

Hello, I’m Jingxiao.

At the end of 2017, Stack Overflow published question-traffic data for the programming languages on its site. Python had overtaken JavaScript to become the language with the most traffic, and the projections showed its lead widening considerably by 2020.

You may already know that Python’s “rise” in popularity in recent years is largely due to the rise of machine learning and statistical applications. But why is Python so suitable for mathematics, statistics, and machine learning? As an experienced programmer, I can tell you for sure that Jupyter Notebook (https://jupyter.org/) plays an indispensable role.

It is no exaggeration to say, based on what I know of top Silicon Valley companies such as Facebook, that a Python engineer who still can’t use Jupyter Notebook is probably falling behind.

As the Chinese saying goes, sharpening the axe doesn’t delay the woodcutting: good tools let us finish programming tasks with half the effort. In this lesson, I will teach you how to use Jupyter Notebook and lay the groundwork for the rest of your Python learning.

What is Jupyter Notebook?

With all that said, what exactly is Jupyter Notebook? According to Fernando Pérez, the founder of Jupyter, his original vision was a computing platform that brought together three scientific computing languages, Julia, Python, and R, hence the name Ju-Py-te-R. Over time, Jupyter has grown into a general-purpose scientific computing platform that supports a wide range of languages and combines code, computational output, explanatory text, and multimedia resources in one place.

As they say, a picture is worth a thousand words. Take a look at the image below, and you will understand what Jupyter Notebook is.

[Figure: Jupyter Notebook]

You type code into a cell, and its output appears immediately below. Cool, isn’t it? You might wonder whether something this seemingly “all style and no substance” could really disrupt the Python community. To be honest, a few years ago I wouldn’t have believed it either. So how big is Jupyter Notebook’s impact?

The Impact of Jupyter Notebook

When we measure a technology’s influence, or hope to change the world with a technology of our own, we cannot ignore its impact on education.

Take Microsoft Word as an example. From a purely technical perspective, Word’s offline design has been outdated for 20 years. Yet online document systems, led by Google Docs, have not eroded Word’s influence nearly as much as expected.

The obvious explanation is user habit: people are used to revising documents in Word, and it still works fine for them. Look deeper, though, and the habit itself comes from the education system. From primary school through secondary school and university, students spend more than ten years being trained to use Word. In the workplace, experienced employees keep using Word with the new hires they work alongside, creating a positive feedback loop that perpetuates the technology’s influence.

Now back to today’s topic, Jupyter Notebook. Since 2017, many top computer science courses in North America have adopted Jupyter Notebook as their primary tool. For example, in Fei-Fei Li’s Stanford course CS231n, “Convolutional Neural Networks for Visual Recognition,” assignments were done in command-line Python in 2016, but in 2017 every assignment was completed in Jupyter Notebook. Likewise, UC Berkeley’s “Foundations of Data Science” has used Jupyter Notebook for all of its assignments since 2017.

Jupyter Notebook’s impact on industry is even greater. At Facebook, large-scale backend development still relies on full-featured IDEs, but almost all small and medium-sized programs, such as internal offline analysis tools and machine learning model training, are written in Jupyter Notebook. As far as I know, other leading Silicon Valley teams, such as Google Brain, Google’s AI research group, also use Jupyter Notebook exclusively, in their case through Google Colab, Google’s own customized version.

By now, you should have a sense of Jupyter Notebook’s standing in the field. When it comes to choosing technologies, however, some people argue for a technology simply because it is popular, while others assume that if Alibaba uses something, it must be the future and they should adopt it too. These views are one-sided. The fact that Alibaba or Facebook uses a technology doesn’t necessarily mean it fits your use case.

I often encourage colleagues in the tech industry to think independently about technology choices rather than follow others blindly. At the very least, ask why Facebook chose a particular technology, what problems it solves, why Facebook passed over the alternatives, and what limitations it has. Looking only at the final choice misses this context: Facebook may have picked a technology precisely because it has hundreds of product lines and tens of thousands of engineers, whereas for a ten-person team the same technology might become a burden.

I’m not trying to sell you on any particular technology. What I want to teach you is a dialectical way of analyzing technology. With that in mind, let’s look at which problems Jupyter Notebook solves that other tools haven’t.

Advantages of Jupyter

Integration of all resources

In real-world software development, context switching eats up a lot of time. Here is what that means in practice: you switch windows to read some documentation, switch again to draw a diagram in another tool, and so on. Every one of these switches costs productivity.

As mentioned earlier, Jupyter solves this problem by putting all the resources related to software development in one place. When you open a Jupyter Notebook, you can already see the corresponding documents, charts, videos, and the code. This way, you don’t need to switch windows to find information. Just by looking at one file, you can access all the information about the project.
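Part of what makes this “one file” possible is the notebook file format itself. A `.ipynb` file is JSON in the nbformat layout: a list of cells, where markdown cells hold the explanatory text and code cells carry both their source and their saved outputs (a chart, for instance, is typically stored as image data such as `image/png` inside an output). The sketch below builds a minimal notebook by hand with Python’s standard `json` module. The cell contents and numbers are made-up illustrations, and in practice you would let Jupyter or the `nbformat` package write the file rather than assembling the dictionary yourself.

```python
import json

# A minimal notebook in the nbformat v4 JSON layout: prose (markdown) and code
# live side by side in one "cells" list, and each code cell stores its own outputs.
# The title and numbers below are hypothetical example content.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "# Sales analysis\nThis cell holds the explanatory notes.",
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": "sum([120, 95, 143])",
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "data": {"text/plain": "358"},
                    "metadata": {},
                }
            ],
        },
    ],
}

# This string is what a .ipynb file on disk would contain.
text = json.dumps(notebook, indent=1)

# Reading it back: walk the cells and show what each one carries.
for cell in json.loads(text)["cells"]:
    output_types = [out["output_type"] for out in cell.get("outputs", [])]
    print(cell["cell_type"], repr(cell["source"][:20]), output_types)
```

Because outputs are saved next to the code that produced them, anyone opening the file sees the notes, the code, and the results together without rerunning anything.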

Interactive programming experience

In machine learning and mathematical statistics, Python programming is highly experimental. A small piece of code often has to be rewritten 100 times, for example to try out 100 different methods, while the rest of the code stays unchanged. This is quite different from traditional Python development, where every experiment means rerunning all of the code, which costs developers a great deal of time. The problem is especially acute in codebases with millions of lines, such as Facebook’s: even with a well-optimized company-wide infrastructure, rerunning the code still takes several minutes.

Jupyter Notebook introduces the concept of cells: each experiment can rerun just the code in one small cell. It is also what-you-see-is-what-you-get: the results appear immediately below the code. This strong interactivity lets Python researchers focus on the problem itself instead of being bogged down by complex toolchains or constantly switching to the command line. All of their research work can be done in Jupyter.
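To make the cell workflow concrete, here is a minimal sketch written as a plain Python script with `# %%` markers, the “percent” cell convention that editors such as VS Code and tools such as Jupytext can treat as notebook cells. Cell 1 stands in for slow setup (here it only generates synthetic random data, an assumption for illustration); Cell 2 is the experiment you would edit and rerun on its own while the data stays in memory. The `trimmed_mean` function and its `trim` values are hypothetical examples, not something from the original text.

```python
# %% Cell 1: expensive setup. Run once; the kernel keeps `data` in memory afterwards.
import random
import statistics

random.seed(0)
# Synthetic stand-in for a slow data load (assumption for illustration).
data = [random.gauss(mu=10.0, sigma=2.0) for _ in range(100_000)]

# %% Cell 2: the experiment. Edit and rerun only this cell as often as you like.
def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)  # number of values to drop at each end
    return statistics.fmean(ordered[k:len(ordered) - k])

for trim in (0.0, 0.05, 0.1, 0.2):
    print(f"trim={trim:.2f}  mean={trimmed_mean(data, trim):.3f}")
```

Run top to bottom, the script behaves like any other Python file; the benefit shows up in a notebook or cell-aware editor, where rerunning Cell 2 never repeats Cell 1.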

Zero-cost result reproduction

Again, in machine learning and mathematical statistics, Python is highly versatile, and projects draw on many different libraries. A common scenario: I read a paper whose method works very well, but when I try to reproduce it, I find I first have to install a pile of dependencies with pip. This preparation can eat up 80% of your time without producing anything useful.

How does Jupyter Notebook solve this problem?

In fact, early versions of Jupyter Notebook were quite troublesome to use, since you had to install the IPython engine and various dependencies on your local machine. The current trend, however, is to move everything to the cloud. Examples include Jupyter’s official Binder platform (documentation: https://mybinder.readthedocs.io/en/latest/index.html) and Google Colab (introduction: https://colab.research.google.com/notebooks/welcome.ipynb). These services make Jupyter Notebook as accessible as online document tools such as Shimo Docs and Google Docs: you simply open a link in your browser and run the code in the cloud.

So today, when you open a Jupyter Notebook from GitHub with Binder, you don’t need to install any software: the code opens directly in your browser and runs in the cloud.
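Binder can do this because it builds the environment from configuration files in the repository itself, for example a `requirements.txt` at the repository root, and its launch links follow the pattern `https://mybinder.org/v2/gh/<owner>/<repo>/<ref>`. The sketch below, under those assumptions, builds such a link and produces pinned `name==version` lines from the packages installed locally, using the standard `importlib.metadata` module. The helper names, owner, repository, branch, and package list are all hypothetical placeholders; `pip freeze` is the more common shortcut when you want to pin everything.

```python
from importlib import metadata


def binder_url(owner, repo, ref):
    """Hypothetical helper: a mybinder.org launch link for a public GitHub repo.

    Assumes plain owner/repo/branch names that need no URL escaping.
    """
    return f"https://mybinder.org/v2/gh/{owner}/{repo}/{ref}"


def pinned_requirements(packages):
    """Hypothetical helper: return ('name==version' lines, missing names)
    for the packages installed in the current environment."""
    lines, missing = [], []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            missing.append(name)
    return lines, missing


# Placeholder repository and package names, for illustration only.
print(binder_url("your-name", "your-analysis", "main"))
lines, missing = pinned_requirements(["numpy", "pandas", "matplotlib"])
print("\n".join(lines))  # content you could save as requirements.txt
if missing:
    print("not installed here:", ", ".join(missing))
```

Pinning exact versions is what lets the cloud environment match the one the notebook’s author used, which is the whole point of zero-cost reproduction.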

First Experience with Jupyter Notebook

The best way to learn a technology is to use it. I can’t cover every Jupyter Notebook trick in one article, so instead I want to give you some direct, hands-on experience with it.

For example, take this GitHub file. In Binder, all you need to do is enter the name or URL of the GitHub repository, and the entire repository opens in the cloud. Then you can choose the notebook you need. This is what the interface looks like:

Each execution unit in Jupyter has an In part (the code) and an Out part (the result). As shown in the picture, you can use the Run button to run a single cell. You can also modify the existing code, or create a new notebook and write your own program. Open the link and give it a try!
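The In/Out pairing can be imitated in a few lines of Python. The sketch below is a toy model, not how Jupyter or IPython is actually implemented: a hypothetical `ToyKernel` class keeps one shared namespace (so variables defined in one cell are visible in the next), numbers each run, and shows the value of a cell’s final expression as its Out, much like the notebook does. It relies only on the standard `ast` module plus the built-in `exec` and `eval`.

```python
import ast


class ToyKernel:
    """Toy model of a notebook kernel: one shared namespace plus numbered In/Out.

    It only mimics what you see in the notebook UI; it is not how Jupyter
    or IPython is implemented.
    """

    def __init__(self):
        self.namespace = {}  # variables survive from one cell to the next
        self.In = []         # source text of every executed cell
        self.Out = {}        # execution count -> value of the cell's last expression

    def run_cell(self, source):
        self.In.append(source)
        count = len(self.In)
        print(f"In [{count}]: {source}")
        tree = ast.parse(source, mode="exec")
        # If the cell ends with a bare expression, split it off so its value can be shown.
        last_expr = None
        if tree.body and isinstance(tree.body[-1], ast.Expr):
            last_expr = ast.Expression(tree.body.pop().value)
        exec(compile(tree, f"<cell {count}>", "exec"), self.namespace)
        if last_expr is None:
            return None
        result = eval(compile(last_expr, f"<cell {count}>", "eval"), self.namespace)
        if result is not None:
            self.Out[count] = result
            print(f"Out[{count}]: {result!r}")
        return result


kernel = ToyKernel()
kernel.run_cell("prices = [3, 5, 8]")            # assignment only: no Out
kernel.run_cell("total = sum(prices)\ntotal * 2")  # last expression becomes Out[2]
```

Running a cell again gives it a new number in this toy, which is also why the counters in a real notebook can skip or appear out of order once cells are rerun.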

In addition, I recommend the following Jupyter Notebooks as your first practice:

If you want to install Jupyter Notebook on your local or remote machine, you can refer to the following two documents:

Installation: https://jupyter.org/install.html

Running: https://jupyter.readthedocs.io/en/latest/running.html#running

Summary

In this lesson, I introduced Jupyter Notebook and explained why it has become a must-learn technology in the Python community. This is mainly due to three major features: integration of all resources, an interactive programming experience, and zero-cost result reproduction. But as the saying goes, learning a technology takes hands-on practice, so after this lesson I hope you will try Jupyter Notebook yourself. In upcoming lessons, I will also share code examples with you in the form of Jupyter Notebooks.

Thought-provoking Question

Have you tried Jupyter Notebook? Feel free to share your experience using it in the comments section.