07 What's the Difference Between int and Integer

In Java, int is a primitive data type, while Integer is the wrapper class for int. The differences between them are as follows:

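Data type: int is a primitive data type that represents integer values and always occupies 32 bits. Integer is the wrapper class for int; it provides extra functionality, such as representing an int value as an object and offering operations on those objects.
Null and default values: int can never be null because it is a primitive type, whereas Integer can be null because it is an object. An uninitialized Integer field defaults to null, and an uninitialized int field defaults to 0. Local variables have no default value and must be assigned before use.
Performance: Arithmetic on int is more efficient because it needs no boxing (converting a primitive value to a wrapper object) or unboxing (converting a wrapper object back to a primitive value). Working with Integer objects involves both, which can cost some performance.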
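The null and default-value behavior described above can be checked with a small sketch. The class and member names (`DefaultsDemo`, `primitiveField`, `wrapperField`) are made up for illustration; the point is only that fields get defaults and that unboxing a null Integer fails at runtime.

```java
class DefaultsDemo {
    // Fields receive default values; local variables do not.
    static int primitiveField;     // defaults to 0
    static Integer wrapperField;   // defaults to null

    /** Returns true if unboxing the (null) wrapper field throws NullPointerException. */
    static boolean unboxingNullThrows() {
        try {
            int value = wrapperField;  // compiled as wrapperField.intValue()
            return false;              // only reached if wrapperField is non-null
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(primitiveField);        // 0
        System.out.println(wrapperField);          // null
        System.out.println(unboxingNullThrows());  // true
    }
}
```

The NullPointerException here comes from the implicit intValue() call, which is why a null Integer that is silently unboxed is a common source of bugs.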

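As for Integer's value cache, Java caches the Integer objects for values from -128 to 127 so that they can be reused. Values in this range obtained through autoboxing or Integer.valueOf therefore share the same object reference, while values outside the range generally produce new Integer instances. The cache exists to save memory and improve performance.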


int is the integer type we use most often. It is one of Java's 8 primitive data types (boolean, byte, short, char, int, float, double, long). Although Java is often described as a language in which everything is an object, the primitive data types are the exception.

Integer is the wrapper class for int. It has a field of type int that stores the data, and it provides basic operations such as arithmetic helpers and conversions between int and String. Java 5 introduced autoboxing and unboxing, which let Java convert between the two automatically based on context and greatly simplify the related code.

The value cache of Integer is another improvement introduced in Java 5. The traditional way to obtain an Integer object was to call the constructor, which always creates a new object. In practice, however, most operations turned out to be concentrated in a limited range of small values. Java 5 therefore added the static factory method valueOf, which uses a cache and brings a significant performance improvement. According to the Javadoc, the cache covers -128 to 127 by default.
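A minimal sketch of what the cache means in practice, assuming nothing beyond the documented behavior of Integer.valueOf. The helper name `sameInstance` is made up for this example.

```java
class ValueOfCacheDemo {
    /** True if two valueOf calls for the same value return the same object. */
    static boolean sameInstance(int value) {
        Integer first = Integer.valueOf(value);
        Integer second = Integer.valueOf(value);
        return first == second;  // reference comparison on purpose, not equals()
    }

    public static void main(String[] args) {
        System.out.println(sameInstance(127));   // true: inside the guaranteed cache range
        System.out.println(sameInstance(-128));  // true
        // Outside [-128, 127] the result depends on the JVM and its settings,
        // so values should be compared with equals() rather than ==.
        System.out.println(sameInstance(1000));
        System.out.println(Integer.valueOf(1000).equals(Integer.valueOf(1000)));  // true
    }
}
```

The practical takeaway is that == on Integer objects only happens to work for small values; equals() is the safe comparison.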

Analysis of the Exam Points

Today’s question covers two fundamental elements in Java: primitive data types and wrapper classes. When discussing this, it is natural to extend the topic to cover autoboxing and unboxing mechanisms, and further examine the design and implementation of wrapper classes. To be honest, understanding the basic principles and usage is already sufficient for daily work requirements, but when it comes to specific scenarios, there are still many issues that need careful consideration to determine the best approach.

The interviewer can combine other aspects to assess the candidate’s level of understanding and logical thinking, such as:

In the first lesson of the column, I mentioned the different stages of Java usage: compilation phase and runtime. At which stage does autoboxing / unboxing occur?
I mentioned earlier that the static factory method valueOf uses a caching mechanism. Does the cache also apply when autoboxing is performed?
Why do we need primitive data types? Java objects also seem to be efficient. What are the specific differences that may arise in an application?
Have you read the source code of the Integer class? Analyze the design points of the class or certain methods.

That is a lot of ground to cover. Let's go through it together.

Knowledge Expansion

Understanding autoboxing and unboxing

Autoboxing is really a form of syntactic sugar. What is syntactic sugar? Put simply, the Java platform performs some conversions for us automatically so that different ways of writing the code are equivalent at runtime. These conversions happen at compile time, which means the generated bytecode is the same as if the conversion calls had been written by hand.

As mentioned earlier, for integers javac automatically turns boxing into Integer.valueOf() and unboxing into Integer.intValue(). This also answers the other question: since Integer.valueOf is called, autoboxing naturally benefits from the cache.

How can we programmatically verify the above conclusion?

You can write a simple program that includes the following two lines of code and then decompile it. Of course, this is a method of inferring from behavior. In most cases, it is more reliable to directly refer to the specification document, because software promises to follow the specifications, not maintain current behavior.

Integer integer = 1;
int unboxing = integer++;

Decompilation output:

1: invokestatic  #2 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
8: invokevirtual #3 // Method java/lang/Integer.intValue:()I

This caching mechanism is not unique to Integer. It also exists in other wrapper classes, such as:

Boolean, which caches the instances for true and false; more precisely, it only ever returns the two constant instances Boolean.TRUE and Boolean.FALSE.
Short, which also caches values between -128 and 127.
Byte, whose values are so limited that all of them are cached.
Character, which caches the range from '\u0000' to '\u007F'.
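The caches listed above can be observed through reference comparisons on the results of each class's valueOf method. This is only a sketch; the method names are invented for the example, and the sample values were chosen to sit inside the documented cache ranges.

```java
class WrapperCacheDemo {
    static boolean booleanCached() {
        return Boolean.valueOf(true) == Boolean.TRUE;
    }

    static boolean byteCached() {
        return Byte.valueOf((byte) 100) == Byte.valueOf((byte) 100);
    }

    static boolean shortCached() {
        return Short.valueOf((short) 127) == Short.valueOf((short) 127);
    }

    static boolean characterCached() {
        return Character.valueOf('A') == Character.valueOf('A');
    }

    public static void main(String[] args) {
        // Each comparison is by reference, so true means the same cached object came back.
        System.out.println(booleanCached());    // true
        System.out.println(byteCached());       // true
        System.out.println(shortCached());      // true
        System.out.println(characterCached());  // true
    }
}
```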

Autoboxing/unboxing seems cool. Is there anything to be aware of in programming practice?

In principle, avoid unintentional boxing and unboxing, especially in performance-sensitive code. Creating 100,000 Java objects and creating 100,000 int values are not comparable in cost. In both memory usage and processing speed, the object header alone already makes a large difference, before the value itself is even counted.
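A classic way to see unintentional boxing is a loop whose accumulator was declared with the wrapper type by accident. The sketch below compares it with the primitive version; the class and method names are made up, and the timing in main is a rough illustration rather than a proper benchmark.

```java
class BoxingCostDemo {
    /** Accumulates into a Long: every += unboxes, adds, and boxes a new result. */
    static long sumBoxed(int n) {
        Long sum = 0L;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    /** Same computation with a primitive accumulator: no objects are created. */
    static long sumPrimitive(int n) {
        long sum = 0L;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000_000;

        long start = System.nanoTime();
        long boxed = sumBoxed(n);
        long boxedNanos = System.nanoTime() - start;

        start = System.nanoTime();
        long primitive = sumPrimitive(n);
        long primitiveNanos = System.nanoTime() - start;

        // Both sums are equal; only the elapsed times differ.
        System.out.println("boxed:     " + boxed + " in " + boxedNanos / 1_000_000 + " ms");
        System.out.println("primitive: " + primitive + " in " + primitiveNanos / 1_000_000 + " ms");
    }
}
```

For trustworthy numbers a harness such as JMH would be the better tool, since JIT warm-up and escape analysis can distort a hand-rolled measurement like this one.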

In fact, we can extend this point of view to use primitive data types, arrays, and even native code implementations, which often have significant advantages in performance-sensitive scenarios. Replacing wrapper classes and dynamic arrays (such as ArrayList) with these can be considered as a performance optimization option. Some products or libraries that pursue extreme performance will try their best to avoid creating too many objects. Of course, in most product code, there is no need to do this, and development efficiency is the top priority. Taking the counter implementation that we often use as an example, the following is a common thread-safe counter implementation.

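import java.util.concurrent.atomic.AtomicLong;

class Counter {
    // AtomicLong is a separate heap object that wraps the long value
    private final AtomicLong counter = new AtomicLong();
    public void increase() {
        counter.incrementAndGet();
    }
}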

If a primitive data type is used, it can be modified to:

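import java.util.concurrent.atomic.AtomicLongFieldUpdater;

class CompactCounter {
    // The value lives directly in this object; the updater requires a volatile field
    private volatile long counter;
    private static final AtomicLongFieldUpdater<CompactCounter> updater =
            AtomicLongFieldUpdater.newUpdater(CompactCounter.class, "counter");
    public void increase() {
        updater.incrementAndGet(this);
    }
}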

Source code analysis

Some interviewers focus on whether you have read and understood the JDK source code. This is not an unreasonable requirement: reading and practicing with high-quality code is a necessary part of a programmer's growth. Now, let's analyze the source code of Integer.

Looking at the responsibilities of Integer as a whole, it mainly contains: basic constants such as the maximum value, minimum value, and number of bits; the static factory method valueOf() mentioned earlier; methods for reading numeric values from system properties; and conversion methods, such as converting to strings in different radixes (octal, for example) and the reverse parsing methods. Let's look at some of the interesting parts. First, let's keep digging into the cache. The Integer cache covers -128 to 127 by default, but what should we do in special scenarios where we know that larger values will be used frequently?

The upper bound of the cache can be adjusted as needed, and the JVM provides an option for it:

-XX:AutoBoxCacheMax=N

This is reflected in the source code of java.lang.Integer, in the static initializer block of IntegerCache.

private static class IntegerCache {
    static final int low = -128;
    static final int high;
    static final Integer cache[];
    static {
        // high value may be configured by property
        int h = 127;
        String integerCacheHighPropValue = VM.getSavedProperty("java.lang.Integer.IntegerCache.high");
        ...
        // range [-128, 127] must be interned (JLS7 5.1.7)
        assert IntegerCache.high >= 127;
    }
    ...
}
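One way to see the effect of the configurable upper bound is to probe it from the outside. The sketch below is an illustration under assumptions: `CacheBoundProbe` and `probeCacheHigh` are made-up names, and the probe relies on reference comparison of valueOf results, which is observed behavior rather than something the specification promises outside [-128, 127].

```java
class CacheBoundProbe {
    /**
     * Returns the largest value in [127, limit] up to which Integer.valueOf
     * keeps handing back one shared instance per value.
     */
    static int probeCacheHigh(int limit) {
        int high = 127;  // the range [-128, 127] is always cached
        for (int i = 128; i <= limit; i++) {
            if (Integer.valueOf(i) != Integer.valueOf(i)) {
                break;   // two distinct objects: i is outside the cache
            }
            high = i;
        }
        return high;
    }

    public static void main(String[] args) {
        // Prints the detected upper bound, capped at the probe limit.
        System.out.println("cache upper bound: " + probeCacheHigh(1_000_000));
    }
}
```

With default settings this is expected to report 127. Started with an option such as -XX:AutoBoxCacheMax=1000 on a HotSpot JVM that honors it, it should report the larger bound instead.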

Secondly, when we analyzed the design and implementation of String, we mentioned that strings are immutable, which ensures basic information security and thread safety in concurrent programming. If you look at the member variable “value” storing the numerical value in wrapper classes such as Integer and Boolean, you will see that they are declared as “private final”, so they are also immutable types!

This design choice is understandable, or rather necessary. Consider this scenario: Integer provides a getInteger() method for conveniently reading system properties, and we might use such a property to set the port of a server. If the Integer object we obtained could easily be changed to another value, it would cause serious reliability problems for the product.
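As a sketch of the scenario just described, the following reads a port from a system property with Integer.getInteger. The property name `demo.server.port` and the class `PortConfigDemo` are hypothetical and exist only for this example.

```java
class PortConfigDemo {
    // Hypothetical property name, chosen only for this example.
    static final String PORT_PROPERTY = "demo.server.port";

    /** Reads the port from the system property, falling back to defaultPort. */
    static int resolvePort(int defaultPort) {
        // getInteger returns the default when the property is missing or not a valid number.
        return Integer.getInteger(PORT_PROPERTY, defaultPort);
    }

    public static void main(String[] args) {
        System.clearProperty(PORT_PROPERTY);
        System.out.println(resolvePort(8080));  // 8080

        System.setProperty(PORT_PROPERTY, "9090");
        System.out.println(resolvePort(8080));  // 9090
    }
}
```

Because Integer is immutable, the value returned here cannot be altered by any other code that holds the same reference, which is exactly the reliability property the paragraph above argues for.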

Thirdly, wrapper classes such as Integer define constants like SIZE and BYTES. What design consideration does this reflect? If you have used other languages such as C or C++, you may know that the number of bits in integer-like types is not fixed and can differ between platforms, such as 32-bit and 64-bit platforms. So, does the number of bits differ between a 32-bit JDK and a 64-bit JDK? In other words, if I develop and compile a program with a 32-bit JDK, what special porting work do I need in order to run it on a 64-bit JDK?

In fact, this kind of porting is relatively simple for Java, because the primitive data types do not differ between platforms: their sizes are clearly defined in the Java Language Specification. Developers do not need to worry about the number of bits changing between 32-bit and 64-bit environments.
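The fixed sizes can be read directly from the SIZE and BYTES constants of the wrapper classes. A minimal sketch (the class name `PrimitiveSizes` is made up):

```java
class PrimitiveSizes {
    public static void main(String[] args) {
        // These constants are the same on every platform and JVM bitness.
        System.out.println("byte:  " + Byte.SIZE + " bits");       // 8
        System.out.println("short: " + Short.SIZE + " bits");      // 16
        System.out.println("char:  " + Character.SIZE + " bits");  // 16
        System.out.println("int:   " + Integer.SIZE + " bits, " + Integer.BYTES + " bytes");  // 32, 4
        System.out.println("long:  " + Long.SIZE + " bits, " + Long.BYTES + " bytes");        // 64, 8
    }
}
```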

Regarding application porting, there are some differences in the underlying implementation; for example, objects in a 64-bit HotSpot JVM are larger than in a 32-bit HotSpot JVM (the exact difference depends on the JVM implementation chosen). Overall, however, there are no behavioral differences, and it can still be said that "write once, run anywhere" holds. What application developers mainly need to consider are differences in capacity and capability.

Thread safety of primitive types

Earlier, I mentioned thread safety design. Have you ever wondered whether operations on primitive data types are thread-safe?

There may be different levels of problems here:

Variables of primitive data types obviously need concurrency techniques to be thread-safe. I will discuss this in more detail in the concurrency topics later in this column. If you need thread-safe calculations, consider thread-safe classes such as AtomicInteger and AtomicLong.
In particular, for the wider 64-bit types long and double, even the atomicity of a single update is not guaranteed unless the variable is declared volatile. A program may read a value in which only half of the bits have been updated!
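The first point can be illustrated with a small sketch that increments a plain int field and an AtomicInteger from several threads. The class name `CounterRaceDemo` and the thread counts are arbitrary choices for this example; how many updates the plain counter loses, if any, varies from run to run.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterRaceDemo {
    // Plain field: ++ is a read-modify-write sequence, so concurrent updates can be lost.
    static int plainCounter;
    static final AtomicInteger atomicCounter = new AtomicInteger();

    /** Runs the workers and returns {plain total, atomic total}. */
    static int[] run(int threads, int incrementsPerThread) {
        plainCounter = 0;
        atomicCounter.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    plainCounter++;                   // not thread-safe
                    atomicCounter.incrementAndGet();  // thread-safe
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return new int[] { plainCounter, atomicCounter.get() };
    }

    public static void main(String[] args) {
        int[] totals = run(4, 100_000);
        System.out.println("plain:  " + totals[0]);  // may be less than 400000
        System.out.println("atomic: " + totals[1]);  // 400000
    }
}
```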

Limitations of Java primitive data types and reference types

I have discussed a lot of technical details so far. Finally, let’s take a look at the limitations and evolution of primitive data types and objects from the perspective of the Java platform development.

For Java application developers, designing a complex and flexible type system seems to have become a common practice. However, to be honest, this type system design is based on technical decisions made many years ago, and now it has gradually revealed some side effects, such as:

Primitive data types cannot be used in combination with Java generics.

This is because Java generics are, to some extent, pseudo-generics. They are purely a compile-time technique: the compiler erases the type parameters and automatically inserts casts to the corresponding concrete types. This means that any type used with generics must be convertible to Object.
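A short sketch of both consequences: type parameters are not visible at runtime, and primitive values have to be boxed before they can go into a generic collection. The names `ErasureDemo`, `sameRuntimeClass`, and `sumBoxed` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    /** Both lists share one runtime class because the type arguments are erased. */
    static boolean sameRuntimeClass() {
        List<Integer> numbers = new ArrayList<>();
        List<String> names = new ArrayList<>();
        return numbers.getClass() == names.getClass();
    }

    /** Sums a list of boxed values; each element is unboxed on the way out. */
    static int sumBoxed(List<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        // List<int> raw = new ArrayList<>();  // does not compile: primitives are not allowed here
        List<Integer> values = new ArrayList<>();
        values.add(1);  // autoboxed to Integer
        values.add(2);
        values.add(3);

        System.out.println(sameRuntimeClass());  // true
        System.out.println(sumBoxed(values));    // 6
    }
}
```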

Data cannot be expressed efficiently, and complex data structures such as vectors and tuples are inconvenient to express.

We know that Java's objects are all reference types. An array of a primitive data type is stored as one contiguous block of memory, while an object array is not: it stores references, and the objects themselves are often scattered across different locations in the heap. This design brings great flexibility, but it also makes data operations less efficient, in particular because it cannot take full advantage of modern CPU caches.
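The difference between storing values and storing references shows up even in trivial code. The sketch below does not measure memory layout; it only demonstrates the reference semantics that make the layout different. All names in it are made up for the example.

```java
class ArrayLayoutDemo {
    /** A fresh int[] slot holds the value 0 directly. */
    static int defaultPrimitiveSlot() {
        int[] primitives = new int[3];
        return primitives[0];
    }

    /** A fresh Integer[] slot holds a null reference, not a value. */
    static Integer defaultBoxedSlot() {
        Integer[] boxed = new Integer[3];
        return boxed[0];
    }

    /** Two slots of an object array can point at one and the same heap object. */
    static boolean slotsShareOneObject() {
        Integer shared = Integer.valueOf(1000);
        Integer[] boxed = { shared, shared };
        return boxed[0] == boxed[1];
    }

    public static void main(String[] args) {
        System.out.println(defaultPrimitiveSlot());  // 0
        System.out.println(defaultBoxedSlot());      // null
        System.out.println(slotsShareOneObject());   // true
    }
}
```

Each access to an element of the Integer[] therefore involves following a reference to wherever the object lives, whereas the int[] elements sit next to each other in the array itself.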

Java has provided various support for object-oriented polymorphism, thread safety, and other aspects, but these are not necessary for every scenario. Especially now that the importance of data processing is increasing, it is very practical to have higher density value types.

In order to enhance these aspects, active development is currently taking place in the OpenJDK community. If you are interested, you can follow the related project: http://openjdk.java.net/projects/valhalla/.

Today, I have summarized the primitive data types and their wrapper classes, analyzed the design and implementation details such as cache mechanisms from the source code level, and analyzed some practices that can be learned for building extreme performance.

Practice Exercise

Have you grasped the topic we discussed today? I have a thinking question for you. It was mentioned earlier that from a spatial perspective, Java objects consume much more memory compared to primitive data types. Do you know what the memory structure of an object looks like? For example, the structure of the object header. How can you calculate or obtain the size of a specific Java object?

Please write down your thoughts on this question in the comment section. I will select the comments that show careful thinking and reward you with a study encouragement prize. Feel free to discuss with me.
