09 Numeric Calculation Precision Loss and Overflow Issues

Today, I want to talk about precision, rounding, and overflow in numerical calculations.

I am covering numerical calculation separately because computations we take for granted often do not behave the same way on a calculator or a computer. For example, a recent news story reported that the calculator on a mobile phone computed 10% + 10% as 0.11 instead of 0.2.

The cause is that calculator programs commonly used abroad evaluate expressions step by step, and in that scheme a+b% means a*(1+b%). So when the calculator evaluates 10% + 10%, it actually computes 10%*(1+10%), which gives 0.11 instead of 0.2.
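To make the two readings of 10% + 10% concrete, here is a minimal sketch. The class and method names (PercentKeyDemo, plainSum, stepByStep) are made up for illustration and are not taken from any real calculator's code.

```java
import java.math.BigDecimal;

class PercentKeyDemo {
    // Plain arithmetic reading: a + b, with both operands already fractions.
    static BigDecimal plainSum(BigDecimal a, BigDecimal b) {
        return a.add(b);
    }

    // Step-by-step calculator reading: "a + b%" is treated as a * (1 + b).
    static BigDecimal stepByStep(BigDecimal a, BigDecimal bAsFraction) {
        return a.multiply(BigDecimal.ONE.add(bAsFraction));
    }

    public static void main(String[] args) {
        BigDecimal tenPercent = new BigDecimal("0.10");
        System.out.println(plainSum(tenPercent, tenPercent));   // 0.20
        System.out.println(stepByStep(tenPercent, tenPercent)); // 0.1100
    }
}
```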

In my opinion, counterintuitive results from calculators and computers come down to two things:

To humans, floating-point numbers are just numbers with a decimal point, and 0.1 and 1 are equally precise. A computer, however, stores floating-point numbers in binary with a fixed number of bits, so many decimal fractions can only be approximated, and calculations on them inherit that error.

To humans, an extremely large number is just a number with more digits, and writing a few more “1”s will not crash our brains. Computers, however, store values in variables, and each numeric type can hold only a limited range of values. When a value exceeds the upper limit its type can express, an overflow occurs.

Next, let’s take a closer look at these issues.

The Double of “Danger” #

Let’s start with some simple, counterintuitive arithmetic operations with floating-point numbers:

    System.out.println(0.1+0.2);
    System.out.println(1.0-0.8);
    System.out.println(4.015*100);
    System.out.println(123.3/100);
    
    double amount1 = 2.15;
    double amount2 = 1.10;
    
    if (amount1 - amount2 == 1.05)
        System.out.println("OK");

The output is as follows:

    0.30000000000000004
    0.19999999999999996
    401.49999999999994
    1.2329999999999999

As you can see, the output is quite different from what we expected. For example, instead of 0.3, the sum of 0.1 and 0.2 is 0.30000000000000004. Also, the equality comparison between 2.15-1.10 and 1.05 does not hold.

The main reason for this issue is that computers store numerical values in binary, and floating-point numbers are no exception. Java uses the IEEE 754 standard to represent and perform arithmetic operations on floating-point numbers. You can check the binary representation of a number by following this link.

For example, the binary representation of 0.1 is 0.0 0011 0011 0011… (0011 repeating), which, when converted back to decimal, becomes 0.1000000000000000055511151231257827021181583404541015625. For a computer, 0.1 cannot be represented accurately, and this is the source of precision loss in floating-point calculations.
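If you want to inspect a stored value yourself, the following is a minimal sketch. It relies on new BigDecimal(double), which exposes the exact binary value, and on Double.doubleToLongBits for the raw IEEE 754 bits. The class and method names are hypothetical.

```java
import java.math.BigDecimal;

class InspectDouble {
    // Exact decimal expansion of the binary value actually stored in the double.
    static String exactDecimal(double d) {
        return new BigDecimal(d).toString();
    }

    // Raw IEEE 754 bit pattern (sign, exponent, fraction) as 16 hex digits.
    static String rawBitsHex(double d) {
        return String.format("%016x", Double.doubleToLongBits(d));
    }

    public static void main(String[] args) {
        System.out.println(exactDecimal(0.1)); // 0.1000000000000000055511151231257827021181583404541015625
        System.out.println(rawBitsHex(0.1));   // 3fb999999999999a
        System.out.println(exactDecimal(0.5)); // 0.5 (a power of two is stored exactly)
    }
}
```

The repeating 9s in the hex output are the 0011 pattern from the binary expansion, cut off and rounded at the last bit.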

You might argue that in the case of 0.1, the difference between its decimal and binary representations is very small and should not have any impact on calculations. However, as they say, small leaks can sink great ships. If you use doubles extensively for financial calculations, even small precision losses can result in significant financial discrepancies. For example, if there are one million transactions per day, each differing by one cent, you would end up with a $300,000 discrepancy in a month. This is not a trivial matter. So, how can we solve this problem?
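A small sketch of how the error accumulates: adding 0.1 ten times with double does not give exactly 1.0, while the same loop with String-initialized BigDecimal values does. The class and method names are illustrative only.

```java
import java.math.BigDecimal;

class AccumulatedError {
    // Adds 0.1 repeatedly using double arithmetic; each addition can round.
    static double sumWithDouble(int times) {
        double total = 0.0;
        for (int i = 0; i < times; i++) {
            total += 0.1;
        }
        return total;
    }

    // Same loop with BigDecimal built from a String, so every step is exact.
    static BigDecimal sumWithBigDecimal(int times) {
        BigDecimal total = BigDecimal.ZERO;
        BigDecimal step = new BigDecimal("0.1");
        for (int i = 0; i < times; i++) {
            total = total.add(step);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumWithDouble(10));     // 0.9999999999999999
        System.out.println(sumWithBigDecimal(10)); // 1.0
    }
}
```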

Most of us have heard of the BigDecimal type, which is used to represent and perform precise arithmetic operations with floating-point numbers. However, there are a few pitfalls to avoid when using BigDecimal. Let’s modify the previous arithmetic operations using BigDecimal:

    System.out.println(new BigDecimal(0.1).add(new BigDecimal(0.2)));
    System.out.println(new BigDecimal(1.0).subtract(new BigDecimal(0.8)));
    System.out.println(new BigDecimal(4.015).multiply(new BigDecimal(100)));
    System.out.println(new BigDecimal(123.3).divide(new BigDecimal(100)));

The output is as follows:

    0.3000000000000000166533453693773481063544750213623046875
    0.1999999999999999555910790149937383830547332763671875
    401.49999999999996802557689079549163579940795898437500
    1.232999999999999971578290569595992565155029296875

As you can see, the results are still not accurate; they just have a higher precision. Here’s the first principle to avoid pitfalls in floating-point arithmetic: use BigDecimal to represent and perform calculations with floating-point numbers, and always initialize BigDecimal using the constructor that takes a String argument:

    System.out.println(new BigDecimal("0.1").add(new BigDecimal("0.2")));
    System.out.println(new BigDecimal("1.0").subtract(new BigDecimal("0.8")));
    System.out.println(new BigDecimal("4.015").multiply(new BigDecimal("100")));
    System.out.println(new BigDecimal("123.3").divide(new BigDecimal("100")));

With this improvement, we get the desired output:

    0.3
    0.2
    401.500
    1.233
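The amount comparison from the first example can be redone the same way. This is a minimal sketch with hypothetical names. It compares with compareTo so that only the numeric value matters.

```java
import java.math.BigDecimal;

class AmountComparison {
    // The double version from the first example: 2.15 - 1.10 is not exactly 1.05.
    static boolean equalWithDouble() {
        double amount1 = 2.15;
        double amount2 = 1.10;
        return amount1 - amount2 == 1.05;
    }

    // The same check with String-initialized BigDecimal values.
    static boolean equalWithBigDecimal() {
        BigDecimal amount1 = new BigDecimal("2.15");
        BigDecimal amount2 = new BigDecimal("1.10");
        return amount1.subtract(amount2).compareTo(new BigDecimal("1.05")) == 0;
    }

    public static void main(String[] args) {
        System.out.println(equalWithDouble());     // false
        System.out.println(equalWithBigDecimal()); // true
    }
}
```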

At this point, you might wonder how to get a precise BigDecimal when all you have is a double, given that you should avoid the BigDecimal constructor that takes a double argument.

Let’s try using Double.toString to convert the double to a string and see if it works:

    System.out.println(new BigDecimal("4.015").multiply(new BigDecimal(Double.toString(100))));

The output is 401.5000. Why does it have an extra 0 compared to the result obtained by multiplying 100 and 4.015 using the string initialization method? This is because BigDecimal has the concepts of scale and precision. Scale represents the number of digits to the right of the decimal point, while precision represents the number of significant digits.

By debugging, we can see that the BigDecimal obtained from new BigDecimal(Double.toString(100)) has scale 1 and precision 4, while the one obtained from new BigDecimal("100") has scale 0 and precision 3. When BigDecimal values are multiplied, the scale of the result is the sum of the scales of the two operands. The two ways of initializing 100 therefore produce results with scales of 4 and 3 respectively:

    private static void testScale() {
    
        BigDecimal bigDecimal1 = new BigDecimal("100");
        BigDecimal bigDecimal2 = new BigDecimal(String.valueOf(100d));
        BigDecimal bigDecimal3 = new BigDecimal(String.valueOf(100));
        BigDecimal bigDecimal4 = BigDecimal.valueOf(100d);
        BigDecimal bigDecimal5 = new BigDecimal(Double.toString(100));
    
        print(bigDecimal1); //scale 0 precision 3 result 401.500
        print(bigDecimal2); //scale 1 precision 4 result 401.5000
        print(bigDecimal3); //scale 0 precision 3 result 401.500
        print(bigDecimal4); //scale 1 precision 4 result 401.5000
        print(bigDecimal5); //scale 1 precision 4 result 401.5000
    }
    
    private static void print(BigDecimal bigDecimal) {
    
        log.info("scale {} precision {} result {}", bigDecimal.scale(), bigDecimal.precision(), bigDecimal.multiply(new BigDecimal("4.015")));
    }
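When only a double is available, one option is BigDecimal.valueOf(double), which goes through Double.toString rather than the raw binary value. The sketch below also pins the scale with setScale so the printed form no longer depends on how the operands were built. The helper names fromDouble and toMoney, and the choice of two decimal places with HALF_UP, are assumptions for illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class DoubleToBigDecimal {
    // BigDecimal.valueOf(double) uses Double.toString, so 0.1 becomes "0.1".
    static BigDecimal fromDouble(double d) {
        return BigDecimal.valueOf(d);
    }

    // Hypothetical helper: fix the scale at 2 so output is independent of operand scales.
    static BigDecimal toMoney(BigDecimal value) {
        return value.setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println(new BigDecimal(0.1).equals(fromDouble(0.1))); // false
        System.out.println(fromDouble(0.1));                             // 0.1

        BigDecimal product = new BigDecimal("4.015").multiply(fromDouble(100d));
        System.out.println(product);          // 401.5000
        System.out.println(toMoney(product)); // 401.50
    }
}
```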

The BigDecimal’s toString method produces a string representation that depends on the scale, leading to another issue: when it comes to representing and formatting floating-point numbers as strings, explicit consideration of the number of decimal places and rounding mode should be made. Next, we will discuss rounding and formatting of floating-point numbers.

Considerations for Floating Point Rounding and Formatting #

Besides the precision problems of storing floating-point numbers in a double, the rounding performed by String.format adds further confusion and can produce unexpected results.

Let’s take an example. First, initialize the value 3.35 as both a double and a float, then format each with String.format using the format %.1f:

    double num1 = 3.35;
    float num2 = 3.35f;

    // %.1f rounds each value to one decimal place
    System.out.println(String.format("%.1f", num1));
    System.out.println(String.format("%.1f", num2));

The output is 3.4 and 3.3.

This comes from the combination of precision loss and rounding. The double and float versions of 3.35 are actually stored as 3.350xxx and 3.349xxx:

    3.350000000000000088817841970012523233890533447265625
    3.349999904632568359375

String.format rounds to the nearest value with one decimal place. The double value 3.350xxx rounds to 3.4, while the float value 3.349xxx rounds to 3.3.
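The stored values and the formatted results can be reproduced with a minimal sketch like the one below. The helper names are hypothetical, and Locale.ROOT is passed only so that the decimal separator does not depend on the machine's locale.

```java
import java.math.BigDecimal;
import java.util.Locale;

class FormatPitfall {
    // Exact decimal expansion of the stored binary value (a float argument is widened to double).
    static String exact(double d) {
        return new BigDecimal(d).toString();
    }

    // Format with one decimal place, independent of the default locale.
    static String oneDecimal(double d) {
        return String.format(Locale.ROOT, "%.1f", d);
    }

    public static void main(String[] args) {
        double num1 = 3.35;
        float num2 = 3.35f;
        System.out.println(exact(num1));      // 3.350000000000000088817841970012523233890533447265625
        System.out.println(exact(num2));      // 3.349999904632568359375
        System.out.println(oneDecimal(num1)); // 3.4
        System.out.println(oneDecimal(num2)); // 3.3
    }
}
```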

Let’s take a look at the relevant source code of the Formatter class. The rounding mode it uses is HALF_UP (see the setScale call in the snippet):

    else if (c == Conversion.DECIMAL_FLOAT) {

        // Create a new BigDecimal with the desired precision.
        int prec = (precision == -1 ? 6 : precision);
        int scale = value.scale();

        if (scale > prec) {
            // more "scale" digits than the requested "precision"
            int compPrec = value.precision();
            if (compPrec <= scale) {
                // case of 0.xxxxxx
                value = value.setScale(prec, RoundingMode.HALF_UP);
            } else {
                compPrec -= (scale - prec);
                value = new BigDecimal(value.unscaledValue(),
                                       scale,
                                       new MathContext(compPrec));
            }
        }
    }

If we want to format with a different rounding mode, we can use DecimalFormat and set the mode explicitly, as shown in the following code:

    double num1 = 3.35;
    float num2 = 3.35f;
    DecimalFormat format = new DecimalFormat("#.##");

    // The rounding mode only needs to be set once on the formatter
    format.setRoundingMode(RoundingMode.DOWN);
    System.out.println(format.format(num1));
    System.out.println(format.format(num2));

When we round these two floating-point numbers down to two decimal places, we get 3.35 and 3.34. The result is still affected by the fact that floating-point numbers are not stored exactly.

Therefore, even if we use DecimalFormat to accurately control the rounding method, problems with double and float can still lead to unexpected results. Thus, the second principle to avoid pitfalls with floating point numbers is to format the string representation of floating point numbers using BigDecimal.

For example, the following snippet uses BigDecimal to round 3.35 to one decimal place, first rounding down and then rounding half up:

    BigDecimal num1 = new BigDecimal("3.35");
    // RoundingMode replaces the BigDecimal.ROUND_* int constants, deprecated since Java 9
    BigDecimal num2 = num1.setScale(1, RoundingMode.DOWN);
    System.out.println(num2);

    BigDecimal num3 = num1.setScale(1, RoundingMode.HALF_UP);
    System.out.println(num3);

This time, the result obtained is 3.3 and 3.4, which is as expected.
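As a further sketch of how the rounding mode changes the outcome, the hypothetical helper below rounds a decimal string to a given scale. The values 3.35 and 3.25 are chosen because both sit exactly halfway between two one-decimal neighbours, which is where HALF_UP and HALF_EVEN can disagree.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class RoundingSketch {
    // Hypothetical helper: round a decimal string to the given scale, plain notation.
    static String round(String value, int scale, RoundingMode mode) {
        return new BigDecimal(value).setScale(scale, mode).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(round("3.35", 1, RoundingMode.DOWN));      // 3.3
        System.out.println(round("3.35", 1, RoundingMode.HALF_UP));   // 3.4
        System.out.println(round("3.25", 1, RoundingMode.HALF_UP));   // 3.3
        System.out.println(round("3.25", 1, RoundingMode.HALF_EVEN)); // 3.2 (tie goes to the even digit)
    }
}
```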

Is it always correct to use “equals” for equality comparison? #

We now know that we should use BigDecimal to represent, calculate, and format floating-point numbers. In the previous lecture on equality comparisons, I mentioned a principle: wrapper classes should be compared with equals rather than ==. So, does comparing two BigDecimal objects with equals always give the desired result?

Let’s look at the following example, which uses equals to compare the BigDecimal values 1.0 and 1:

    System.out.println(new BigDecimal("1.0").equals(new BigDecimal("1")));

You may have guessed it already: the result is false. The reason is explained in the Javadoc of the BigDecimal equals method, which compares both the value and the scale. The scale of 1.0 is 1, whereas the scale of 1 is 0, so the result is false:

    /**
     * Compares this {@code BigDecimal} with the specified
     * {@code Object} for equality.  Unlike {@link
     * #compareTo(BigDecimal) compareTo}, this method considers two
     * {@code BigDecimal} objects equal only if they are equal in
     * value and scale (thus 2.0 is not equal to 2.00 when compared by
     * this method).
     *
     * @param  x {@code Object} to which this {@code BigDecimal} is
     *         to be compared.
     * @return {@code true} if and only if the specified {@code Object} is a
     *         {@code BigDecimal} whose value and scale are equal to this
     *         {@code BigDecimal}'s.
     * @see    #compareTo(java.math.BigDecimal)
     * @see    #hashCode
     */
    @Override
    public boolean equals(Object x)

If we want to compare only the numeric value of two BigDecimal objects, we can use the compareTo method. The modified code is:

    System.out.println(new BigDecimal("1.0").compareTo(new BigDecimal("1")) == 0);

Recalling the previous lecture, you might realize that because the equals and hashCode methods of BigDecimal take both value and scale into account, BigDecimal can cause problems in a HashSet or HashMap. For example, if we add a BigDecimal with a value of 1.0 to a HashSet and then check whether it contains a BigDecimal with a value of 1, the result is false:

    Set<BigDecimal> hashSet1 = new HashSet<>();
    hashSet1.add(new BigDecimal("1.0"));
    System.out.println(hashSet1.contains(new BigDecimal("1"))); // returns false

There are two ways to solve this problem:

The first method is to use TreeSet instead of HashSet. TreeSet does not use the hashCode method or the equals comparison for elements. Instead, it uses the compareTo method, so there won’t be any problems.

    Set<BigDecimal> treeSet = new TreeSet<>();
    treeSet.add(new BigDecimal("1.0"));
    System.out.println(treeSet.contains(new BigDecimal("1"))); // returns true

The second method is to use the stripTrailingZeros method to remove trailing zeros from the BigDecimal before storing it in the HashSet or HashMap, and also remove trailing zeros when comparing. This ensures that BigDecimal objects with the same value will also have the same scale:

    Set<BigDecimal> hashSet2 = new HashSet<>();
    hashSet2.add(new BigDecimal("1.0").stripTrailingZeros());
    System.out.println(hashSet2.contains(new BigDecimal("1.000").stripTrailingZeros())); // returns true
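The same normalization works for HashMap keys. The sketch below wraps it in a hypothetical key helper and also shows one side effect worth knowing: for values such as 100, stripTrailingZeros yields a negative scale, so toString prints scientific notation, while toPlainString prints the plain form.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

class NormalizedKeys {
    // Hypothetical helper: one canonical form per numeric value, for use as a hash key.
    static BigDecimal key(String value) {
        return new BigDecimal(value).stripTrailingZeros();
    }

    public static void main(String[] args) {
        Map<BigDecimal, String> labels = new HashMap<>();
        labels.put(key("100.00"), "hundred");
        System.out.println(labels.get(key("100")));   // hundred

        BigDecimal stripped = key("100");
        System.out.println(stripped.scale());         // -2
        System.out.println(stripped);                 // 1E+2
        System.out.println(stripped.toPlainString()); // 100
    }
}
```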

Be cautious of numeric overflow issues #

Another thing to watch out for in numerical computation is overflow. Whether it is an int or a long, every primitive integer type can exceed the range it is able to represent.

For example, adding 1 to the maximum value of long:

    long l = Long.MAX_VALUE;
    System.out.println(l + 1);
    System.out.println(l + 1 == Long.MIN_VALUE);

The output is a negative number, because adding 1 to the maximum value of long wraps around to the minimum value of long:

    -9223372036854775808
    true

Clearly, an overflow has occurred, and it has silently happened without raising any exceptions. This type of problem is very easy to overlook, and there are two ways to address it.

The first method is to use the xxxExact methods of the Math class, such as addExact and subtractExact, for numerical operations. These methods throw an exception when an overflow occurs. Let’s test this by using Math.addExact to add 1 to the maximum value of long:

    try {
        long l = Long.MAX_VALUE;
        System.out.println(Math.addExact(l, 1));
    } catch (Exception ex) {
        ex.printStackTrace();
    }

After execution, we get an ArithmeticException, which is a RuntimeException:

    java.lang.ArithmeticException: long overflow
      at java.lang.Math.addExact(Math.java:809)
      at org.geekbang.time.commonmistakes.numeralcalculations.demo3.CommonMistakesApplication.right2(CommonMistakesApplication.java:25)
      at org.geekbang.time.commonmistakes.numeralcalculations.demo3.CommonMistakesApplication.main(CommonMistakesApplication.java:13)
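The same silent wrap-around happens with int, for example when converting days to milliseconds. The following is a minimal sketch with hypothetical method names. It contrasts unchecked int arithmetic, Math.multiplyExact, and widening to long before multiplying.

```java
class IntOverflowSketch {
    // Milliseconds in the given number of days, computed entirely in int: wraps silently.
    static int millisUnchecked(int days) {
        return days * 24 * 60 * 60 * 1000;
    }

    // Same computation with Math.multiplyExact: throws ArithmeticException instead of wrapping.
    static int millisChecked(int days) {
        int hours = Math.multiplyExact(days, 24);
        int seconds = Math.multiplyExact(hours, 60 * 60);
        return Math.multiplyExact(seconds, 1000);
    }

    // Widening to long before multiplying avoids the overflow for this range of inputs.
    static long millisAsLong(int days) {
        return days * 24L * 60 * 60 * 1000;
    }

    public static void main(String[] args) {
        System.out.println(millisUnchecked(30)); // -1702967296
        System.out.println(millisAsLong(30));    // 2592000000
        try {
            millisChecked(30);
        } catch (ArithmeticException ex) {
            System.out.println(ex.getMessage()); // integer overflow
        }
    }
}
```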

The second method is to use the BigInteger class for large numbers. Just as BigDecimal is the tool for exact decimal arithmetic, BigInteger is the tool for arbitrary-precision integer arithmetic.

In the following code, BigInteger is used to add 1 to the maximum value of long. If you want to convert the result back to a long, you can use the longValueExact method of BigInteger, which also throws an ArithmeticException if the value does not fit:

    BigInteger i = BigInteger.valueOf(Long.MAX_VALUE);
    System.out.println(i.add(BigInteger.ONE).toString());
    try {
        long l = i.add(BigInteger.ONE).longValueExact();
    } catch (Exception ex) {
        ex.printStackTrace();
    }

The output is as follows:

    9223372036854775808
    java.lang.ArithmeticException: BigInteger out of long range
      at java.math.BigInteger.longValueExact(BigInteger.java:4632)
      at org.geekbang.time.commonmistakes.numeralcalculations.demo3.CommonMistakesApplication.right1(CommonMistakesApplication.java:37)
      at org.geekbang.time.commonmistakes.numeralcalculations.demo3.CommonMistakesApplication.main(CommonMistakesApplication.java:11)

As you can see, adding 1 to the maximum value of long with BigInteger works without any problem. However, converting the result back to a long fails with the message BigInteger out of long range.
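Factorials are a common case where long runs out quickly: 20! still fits, 21! does not. The sketch below computes them with BigInteger and checks whether a value fits in a long before converting. The names factorial and fitsInLong are illustrative, and the bitLength check is one possible guard, not the only one.

```java
import java.math.BigInteger;

class FactorialSketch {
    // n! computed with arbitrary-precision integers, so it cannot overflow.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // Hypothetical guard: a value fits in a long when it needs at most 63 bits plus sign.
    static boolean fitsInLong(BigInteger value) {
        return value.bitLength() <= 63;
    }

    public static void main(String[] args) {
        System.out.println(factorial(20));             // 2432902008176640000
        System.out.println(fitsInLong(factorial(20))); // true
        System.out.println(factorial(21));             // 51090942171709440000
        System.out.println(fitsInLong(factorial(21))); // false
    }
}
```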

Key Takeaways #

Today, I shared some pitfalls in representing, calculating, rounding, and formatting floating-point numbers, as well as numeric overflow.

First and foremost, remember that if you want to represent decimal numbers exactly, you should use BigDecimal. The BigDecimal constructor that takes a double parameter also suffers from precision loss, so initialize BigDecimal with the constructor that takes a String parameter or with the BigDecimal.valueOf method.

Secondly, when performing precise calculations with floating-point numbers, every value involved should be represented as a BigDecimal, and every operation should be done with the methods BigDecimal provides. Avoid using BigDecimal merely as a placeholder. If precision is lost at any step, the final result may be inaccurate.

Thirdly, when formatting floating-point numbers with String.format, be aware that it rounds half up. You may consider using DecimalFormat to specify the rounding mode explicitly. However, given the precision issue, I recommend representing the number as a BigDecimal and using its setScale method to specify the number of decimal places and the rounding mode.

Lastly, be cautious of overflow issues when performing numerical operations. Although no exception will be thrown when an overflow occurs, the resulting calculation will be completely wrong. Consider using the Math.xxxExact methods for calculations, as they will throw an exception when an overflow occurs. It is also highly recommended to use the BigInteger class for large number calculations where an overflow may occur.

In summary, for scenarios involving finance, scientific calculations, and so on, it is recommended to use BigDecimal and BigInteger as much as possible to avoid bugs that may be difficult to detect but have significant consequences, caused by precision and overflow issues.

I have uploaded the code used today on GitHub. You can click on this link to view it.

Reflection and Discussion #

BigDecimal provides 8 rounding modes. Could you explain their differences through some examples?

Do you know how to define floating-point numbers and integers in databases (such as MySQL)? And how to achieve accurate calculations for floating-point numbers?

Have you encountered any pitfalls in numerical computation? I’m Zhu Ye. Feel free to leave a comment in the comment section to share your thoughts. You are also welcome to share this article with your friends or colleagues for further discussion.