04 JVM Basic Knowledge: Without Accumulation, There’s No Reaching a Thousand Miles

In the previous chapters, we introduced the relationship between JDK and JVM, as well as environment preparation. In this section, we will discuss some basic knowledge about JVM, including the following topics:

  • Common programming language types
  • Cross-platform, runtime, and virtual machine
  • Memory management and garbage collection (GC)

3.1 Common Programming Language Types

We all know that Java is a statically-typed compiled language based on a virtual machine. So how can we classify common programming languages?

1) Classification of programming languages

First of all, we can divide the various programming languages from the bottom up into three main categories: machine language, assembly language, and high-level language.

66340662.png

According to the article “Development and Application of Computer Programming Languages”: Computer programming languages enable communication and interaction between humans and machines. Computer programming languages mainly include assembly language, machine language, and high-level language, which are described as follows:

  • Machine language: Instructions are encoded directly in binary. The computer can recognize and execute them quickly, and the language is relatively flexible. Machine language is closely related to assembly language, but its inherent limitations restrict how it can be used in practice.
  • Assembly language: Programs are written with abbreviated English symbols (mnemonics). Individual instructions are concise and convenient to execute, but complete programs are usually lengthy, which makes them prone to errors.
  • High-level language: This is a general term covering many programming languages. A high-level language can fold multiple instructions into a single statement and hide operational details and intermediate steps, so programs are more convenient to write and easier to work with. This simplification also lowers the level of professional expertise that programming requires.

In short, machine language consists of binary instructions directly executed by the machine, and each CPU platform has its own machine language.

Assembly language represents machine instructions with mnemonics that humans can read. Assembly code is very long, but it performs well.

High-level languages are designed for human understanding, which allows program code to be designed and implemented quickly. Their statements generally have no direct correspondence to machine language or assembly instructions. After the code is written, it is converted to assembly code or machine code through compilation or interpretation and then handed to the computer for execution.

Machine language and assembly language are tied directly to the CPU architecture of the target machine, whereas high-level languages usually are not. This is the advantage of a high-level language: it can target different CPU architectures. Whether the CPU is x86 or something else, and however the supported instruction sets differ, the code becomes target code for the actual platform after compilation or interpretation. As a result, developers often do not need to care about differences between target platforms. This matters because in modern software development the developers, testers, and deployment and operations personnel are usually not the same group of people. With the rapid growth of public clouds, we may not even know the physical architecture on which our containerized software runs.

2) Classification of high-level languages

If we classify high-level programming languages based on whether they have a virtual machine or not, they can be divided into two categories:

  • With a virtual machine: Java, Lua, Ruby, some implementations of JavaScript, etc.
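  • Without a virtual machine: C, C++, Golang, and most common programming languages

It may seem strange that a language such as Golang has garbage collection (GC) and a runtime but no virtual machine (VM). C# is sometimes grouped with it, although C# code normally runs on the .NET Common Language Runtime, which is usually regarded as a virtual machine. Why design a language this way? We will discuss this in detail later.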

If we classify high-level programming languages based on whether their variables have determined types or can change freely, they can be divided into:

  • Static typing: Java, C, C++, etc.
  • Dynamic typing: Scripting languages
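To make the static typing category concrete, here is a minimal Java sketch. The class and method names are hypothetical and exist only for illustration. It shows that a variable's declared type is fixed at compile time, even though a reference of type Object may point at values of different runtime types.

```java
// Illustrative sketch only; StaticTypingDemo and its methods are hypothetical names.
class StaticTypingDemo {

    // The parameter type is fixed at compile time: only an int is accepted.
    static int addOne(int count) {
        return count + 1;
    }

    // The declared type is Object, but the runtime type of the value can vary.
    static String runtimeTypeOf(Object value) {
        return value.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        int count = 41;
        // count = "forty-one";  // would not compile: incompatible types
        System.out.println(addOne(count));

        Object anything = 42;            // autoboxed to an Integer
        System.out.println(runtimeTypeOf(anything));
        anything = "forty-two";          // same variable, different runtime type
        System.out.println(runtimeTypeOf(anything));
    }
}
```

In a dynamically typed scripting language, the commented-out assignment would be accepted and the variable would simply change type at run time.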

If we classify high-level programming languages based on whether they are compiled or interpreted, they can be divided into:

  • Compiled: C, C++, Golang, Rust, C#, Java, Scala, Clojure, Kotlin, Swift, etc.
  • Interpreted: Some implementations of JavaScript and NodeJS, Python, Perl, Ruby, etc.

Although JavaScript is generally considered an interpreted language, many implementation engines now support compilation, such as Google V8 and Oracle Nashorn.

In addition, we can also classify them based on their language features:

  • Procedural programming: C, Basic, Pascal, Fortran, etc.
  • Object-oriented programming: C++, Java, Ruby, Smalltalk, etc.
  • Functional programming: LISP, Haskell, Erlang, OCaml, Clojure, F#, etc.

Some of them can even be classified as pure object-oriented languages, such as Ruby, where everything is an object. In Java, not everything is an object: primitive types such as int and long are not objects, although their wrapper classes Integer and Long are. There are also languages that can be used both as compiled languages and as scripting languages, such as Groovy.
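The difference between primitives and their wrapper classes can be checked with a short Java sketch. The class and method names below are hypothetical; the calls used (Class.isPrimitive, Class.getName, and autoboxing) are standard Java.

```java
// Illustrative sketch only; PrimitiveVsWrapperDemo is a hypothetical name.
class PrimitiveVsWrapperDemo {

    // Reports whether a type is a primitive or a class.
    static String describe(Class<?> type) {
        return type.getName()
                + (type.isPrimitive() ? " is a primitive type" : " is a class");
    }

    public static void main(String[] args) {
        System.out.println(describe(int.class));
        System.out.println(describe(Integer.class));

        int primitive = 7;          // a plain value with no methods
        Integer boxed = primitive;  // autoboxing wraps the value in an object
        System.out.println(boxed.equals(7));
    }
}
```

Only the boxed value can be used where an object is required, for example as an element of a collection.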

3.2 Cross-platform

Now let’s talk about cross-platform development. We want our code and programs to run on multiple system platforms, either at the source code level or after compilation, without implementing a separate set of code for each platform. For example, if we write a web application, we naturally want to be able to deploy it on Windows, Linux, and even macOS.

This cross-platform capability greatly reduces development and maintenance costs and has been widely welcomed in the commercial market.

In general, scripting languages are cross-platform because the same script can be interpreted and executed by interpreters on different platforms. However, for compiled languages, there are two levels of cross-platform: source code level and binary level.

  1. Typical source code level cross-platform (C++): 71212109.png

  2. Typical binary level cross-platform (Java bytecode): 71237637.png

As can be seen, in C++, we need to compile the source code on different platforms to generate platform-specific binary executable files before they can be run on the corresponding platforms. This requires development tools and compilers for each platform, and the development libraries required by each platform need to be consistent or compatible. This was very painful in the past and was jokingly called “dependency hell”.

C++ is jokingly said to follow the slogan “Write once, compile everywhere”, but in reality this often becomes “Write once, debug everywhere, search for dependencies and modify configurations everywhere”. You can imagine the frustration of compiling a piece of code and finding that dozens of dependencies are missing, or that the ones you do find are incompatible with the local version.

Java, on the other hand, took the lead in solving this problem through virtual machine technology. The source code only needs to be compiled once; the resulting class files or jar packages can then be deployed to different platforms and executed directly on the JVM installed there. Dependencies (jar files) can simply be copied to the target machine. Over time, the Maven Central Repository emerged, offering libraries that can be used directly on any platform. This is similar to the yum or apt-get sources on Linux and homebrew on macOS, and most modern programming languages have a comparable package dependency management mechanism: pip for Python, NuGet for .NET, npm for Node.js, dep (since superseded by Go modules) for Golang, cargo for Rust, and so on. As a result, the same application can run directly on different platforms.
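As a small illustration of binary-level portability, the following Java sketch can be compiled once and the resulting class file copied unchanged to another operating system. The class and method names are hypothetical; the system property keys os.name, os.arch, and java.vm.name are standard.

```java
// Illustrative sketch only; PlatformReportDemo is a hypothetical name.
class PlatformReportDemo {

    // Builds a one-line summary from standard JVM system properties.
    static String platformSummary() {
        return "os=" + System.getProperty("os.name")
                + ", arch=" + System.getProperty("os.arch")
                + ", vm=" + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        // The same PlatformReportDemo.class runs on any platform with a JVM;
        // only the printed values differ from machine to machine.
        System.out.println(platformSummary());
    }
}
```

The bytecode stays the same on every platform, and the JVM installed on each machine absorbs the platform differences.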

In summary, cross-platform development is as follows:

  • Scripting languages can be executed by interpreters on different platforms, which is called script-level cross-platform development. The differences between platforms are resolved by different interpreters. This makes the code very portable, but it requires interpretation and translation, resulting in lower efficiency.
  • In cross-platform development of compiled languages at the source code level, the same code must be compiled into platform-specific binary files by the compiler on each platform before distribution and execution. The differences between platforms are resolved by the compilers. The compiled files contain executable instructions specific to each platform, so execution efficiency is high. However, compiling complex software on different platforms can run into many environment issues related to dependencies and configuration, which raises development and maintenance costs.
  • In cross-platform development of compiled languages at the binary level, the same code is first compiled into a universal binary file, which is then distributed to different platforms for execution by the runtime. This combines the advantages of the other two cross-platform language types, allowing convenient and quick execution on various platforms, although the execution efficiency may be slightly lower than that of natively compiled languages. These advantages and disadvantages are also the advantages and disadvantages of the Java Virtual Machine.

Time and manpower are the most valuable resources for modern commercial applications, and machines are relatively less valuable in most cases.

3.3 About Runtime and Virtual Machine

We have mentioned Java Runtime and JVM many times before. In simple terms, JRE refers to the Java runtime, which includes the virtual machine and related libraries and resources.

The runtime provides the basic environment for program execution. When the JVM starts, it loads the core libraries and resources of the runtime and then loads our application bytecode, so that the application can run inside the JVM, which acts as its container.
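One way to observe the split between runtime core libraries and application bytecode is to look at which class loader loaded a class. The sketch below is a minimal illustration with hypothetical class and method names. It relies on the documented behavior that Class.getClassLoader() returns null for classes loaded by the bootstrap class loader, such as java.lang.String.

```java
// Illustrative sketch only; RuntimeLoadingDemo is a hypothetical name.
class RuntimeLoadingDemo {

    // Returns "bootstrap" for core runtime classes, otherwise the loader's class name.
    static String loaderOf(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        // A core library class that ships with the runtime.
        System.out.println("String: " + loaderOf(String.class));
        // Our own application class, loaded after the runtime is ready.
        System.out.println("RuntimeLoadingDemo: " + loaderOf(RuntimeLoadingDemo.class));
    }
}
```

The exact name printed for the application class loader depends on the JDK version and on how the program is launched.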

However, some languages do not have a virtual machine. Instead, the required core libraries and other supporting features are statically packaged or dynamically linked into the program during compilation and packaging. Golang and Rust work this way, as does C# when it is compiled ahead of time to native code.

In this way, the runtime is combined with the program instructions to form a complete application. The advantage is that no virtual machine environment is needed; the disadvantage is that the compiled binary file is not directly portable across platforms.

3.4 About Memory Management and Garbage Collection (GC)

Memory management has been an important topic ever since programming languages were born. Memory is always limited and precious: if it is not released after being occupied, it is quickly used up, and a program that cannot obtain available memory will crash (think of the memory leaks and wild pointers that often appear in C++ programs).

Memory management is the management of the memory lifecycle, including allocation, compaction, and reclamation. In Java, memory management is handled by the garbage collector (GC). The GC module of the JVM is responsible not only for reclaiming memory but also for allocating and compacting it.

As we know from the previous content, Java program instructions run on the JVM, and our program code does not need to allocate and release memory (such as the malloc/free required in C/C++), so these operations are naturally taken care of by the JVM.
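The following Java sketch illustrates this point under simple assumptions. The class and method names are hypothetical. The code allocates short-lived arrays without ever freeing them and then lists the garbage collectors that the running JVM reports through the standard java.lang.management API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch only; GcObservationDemo is a hypothetical name.
class GcObservationDemo {

    // Allocates short-lived arrays. There is no free(): each array becomes
    // unreachable after its iteration and is left for the GC to reclaim.
    static long allocateGarbage(int rounds) {
        long totalBytes = 0;
        for (int i = 0; i < rounds; i++) {
            byte[] chunk = new byte[1024];
            totalBytes += chunk.length;
        }
        return totalBytes;
    }

    // Lists the collectors of the running JVM and how often each has run so far.
    static String collectorReport() {
        StringBuilder report = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            report.append(gc.getName())
                  .append(": collections=").append(gc.getCollectionCount())
                  .append('\n');
        }
        return report.toString();
    }

    public static void main(String[] args) {
        System.out.println("allocated bytes: " + allocateGarbage(100_000));
        System.out.print(collectorReport());
    }
}
```

The collector names and counts depend on the JVM version and on the GC selected at startup, so no particular output should be expected.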

Garbage collection also exists in languages like Golang, which have no virtual machine. So how is it implemented?

The key lies in the runtime. When compiling and packaging, the module for memory usage analysis can be packaged together with the application program. During runtime, there is a dedicated thread to analyze the memory usage and determine when to perform garbage collection and reclaim memory that is no longer in use. This way, even without a virtual machine, garbage collection can be achieved.

Rust goes a step further and constrains the lifetime of every variable at the language specification level: once a variable leaves its well-defined scope, it can no longer be used. The compiler therefore knows at compile time when memory for each object is allocated and when it is destroyed and reclaimed, which achieves precise and safe memory management.

  • C/C++ completely trusts and indulges programmers, letting them manage memory themselves. This allows very flexible code, but a careless mistake can cause problems such as memory leaks and lead to program crashes.
  • Java/Golang completely distrusts programmers, but also indulges them. The entire memory lifecycle is managed uniformly by the runtime (the JVM, in Java’s case). In most scenarios you can write code freely without worrying about the state of memory. When memory usage does become a problem, we can use the JVM to analyze, diagnose, and tune it. This is also the goal of this course.
  • Rust chooses to neither trust nor indulge programmers. It requires you to manage your variables according to Rust’s rules as you write code, so that the machine can analyze and manage memory efficiently. The price is that the code is harder for humans to understand, writing it is less free, and the learning cost is high.

Finally, let’s end with an assessment of these languages by a Zhihu user named “Zuozhi Le”:

First of all, Rust is a bit anti-human, otherwise it wouldn’t have been unpopular for so long. Then, the reason why Rust is anti-human is because humans are both foolish and arrogant, and there are too many damn things. Look at C++, it really trusts humans, it requires humans to delete what they new. C++: “I believe you can do this little thing!” Humans: “No problem! I got it!” Then, memory leaks, double free, wild pointers floating all over the world… C++: “…”

Java chooses not to trust humans, but to do things for humans. Java: “Don’t move, let me do it, I have a GC!” Humans: “Why are you doing things so slowly? Why did you stop the world? Are you not loving me anymore?” Java: “…”

Rust realizes that the only way is to neither trust humans nor indulge humans. Rust: “Do as I say, if you don’t, it won’t compile!” Humans: “You’re anti-human!” Rust: “Get lost!”

Reference materials

  1. Development and application of computer programming languages: http://g.wanfangdata.com.cn/details/detail.do?_type=perio&id=dnbcjqywh201904012
  2. JavaScript engines: https://hllvm-group.iteye.com/group/topic/37596
  3. Are GC and virtual machines concepts that must always be mentioned together?: https://www.zhihu.com/question/45910460/answer/100056649
  4. Is Rust language anti-human?: https://www.zhihu.com/question/328066906/answer/708085473