Floating Point

Objectives

We would like to:

Gain a basic understanding of the floating point type.
Discuss floating point operations.
Understand common problems with floating point.
Discuss the Math library.

Notes

I like making long notes, but short videos.
- So I will probably turn this into 4-5 videos.
Sources
- Again, we will use section 2.5 of the book.
- And the Oracle documentation.
Floating Point type.
- reference.
  - This is quite technical.
- Floating point numbers are numbers with
  - A decimal point (3.14)
  - Or written in scientific notation
    - Very small: 2.9834 x 10⁻¹²
    - Very large: 9.856 x 10³⁴
  - There are two floating point types in Java
    - float, a 32 bit number.
    - double, a 64 bit number.
  - Both conform to IEEE 754, standard number 754 of the Institute of Electrical and Electronics Engineers (IEEE).
    - This is an old technical standard, but works well.
    - And we will NOT discuss it here.
- Literals
  - Floating point literals are double unless they end with an f (or F).
  - 0.1f, 1.0e-1f or 1.0E-1f are all acceptable.
  - So is 3e23f.
  - 3f is acceptable as well.
  - I can place a d at the end to mark a double explicitly.
  - There are more rules for literals, but we will skip them.
- Printing
  - Remember, %3.2f prints two digits after the decimal point.
  - And the last digit shown is rounded.
  - %E and %e are used for scientific notation.
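The literal and printing rules above can be tried out with a small sketch like this one. The class name FloatLiterals and the helper twoPlaces are made up for illustration, and Locale.ROOT is used only so the decimal separator is always a period.

```java
import java.util.Locale;

class FloatLiterals {
    // Hypothetical helper: formats a value with two digits after the decimal point.
    static String twoPlaces(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    public static void main(String[] args) {
        double a = 0.1;      // no suffix, so this literal is a double
        float b = 0.1f;      // the f suffix makes a float literal
        float c = 1.0e-1f;   // scientific notation for the same value as 0.1f
        float d = 3f;        // a whole number with an f suffix is still a float
        double e = 3e23d;    // the optional d suffix marks a double

        System.out.println(b == c);               // true
        System.out.println(twoPlaces(3.14159));   // 3.14
        System.out.println(twoPlaces(2.71828));   // 2.72, the last digit is rounded
        // %e prints the value in scientific notation.
        System.out.println(String.format(Locale.ROOT, "%e", 12345.678));
    }
}
```

Try removing the f from 0.1f: the compiler should refuse to store a double literal in a float variable.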
Floating point operations.
- +,-,*,/ as you would expect.
  - Don't divide by 0.
  - Dividing a nonzero floating point number by 0 produces Infinity rather than an error.
- % exists but I would not use it.
- ++ and -- as with integers.
- +=, -=, ...
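A short sketch of the operators listed above. The class name FloatOps and the helper divide are hypothetical, chosen just for this example.

```java
class FloatOps {
    // Hypothetical helper: plain floating point division.
    static double divide(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        double x = 7.5;
        x += 2.5;   // x is now 10.0
        x++;        // x is now 11.0
        System.out.println(x);                  // 11.0

        System.out.println(divide(9.0, 2.0));   // 4.5, no truncation as with integers
        System.out.println(7.5 % 2.0);          // 1.5, the remainder operator works on doubles

        // Dividing by zero does not throw an exception for floating point values.
        System.out.println(divide(1.0, 0.0));   // Infinity
        System.out.println(divide(-1.0, 0.0));  // -Infinity
    }
}
```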
The Math Library
- There is a mathematics library for Java.
- Reference
- Some constant definitions (Math.PI, Math.E)
- MANY functions.
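To give a feel for the library, here is a small sketch that uses the two constants and a handful of the functions. The class name MathDemo and the helpers circleArea and hypotenuse are invented for this example; the Math calls themselves are standard.

```java
class MathDemo {
    // Hypothetical helper: area of a circle with radius r.
    static double circleArea(double r) {
        return Math.PI * r * r;
    }

    // Hypothetical helper: length of the hypotenuse of a right triangle.
    static double hypotenuse(double a, double b) {
        return Math.sqrt(a * a + b * b);
    }

    public static void main(String[] args) {
        System.out.println(Math.PI);               // the constant pi
        System.out.println(Math.E);                // the constant e
        System.out.println(circleArea(2.0));       // roughly 12.566
        System.out.println(hypotenuse(3.0, 4.0));  // 5.0
        System.out.println(Math.pow(2.0, 10.0));   // 1024.0
        System.out.println(Math.abs(-3.5));        // 3.5
        System.out.println(Math.floor(2.7));       // 2.0
        System.out.println(Math.ceil(2.1));        // 3.0
    }
}
```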
Problems:
- We have seen that 1.0/0.0 produces Infinity
  - There is also negative infinity (-1.0/0.0)
  - And 0.0f/0.0f produces NaN (not a number)
- We can't represent every real number between any two floats.
- Look at Math.pow(2.0, 63.0);
  - ```
  // Compute 2^63 as a double and print it with no digits after the decimal point.
  double y;
  y = Math.pow(2.0, 63.0);
  System.out.printf("%.0f\n", y);
```
- Exact value: 9223372036854775808
- Computed output: 9223372036854776000
- Why
  - This is the scoreboard problem all over again.
  - We just ran out of digits (a double keeps only about 15 to 17 significant decimal digits), but this time we "put the error on the right."
  - This is a precision error: the difference between a computed approximation and the exact result.
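The problems above can be observed directly with a sketch like the following. The class name FloatProblems and the helper absorbsOne are hypothetical. Math.ulp is a standard method that reports the gap between a double and the next larger representable double.

```java
class FloatProblems {
    // Hypothetical helper: true when adding 1.0 does not change y at all.
    static boolean absorbsOne(double y) {
        return y + 1.0 == y;
    }

    public static void main(String[] args) {
        double nan = 0.0 / 0.0;
        System.out.println(Double.isNaN(nan));   // true
        System.out.println(nan == nan);          // false, NaN is not equal to anything

        double y = Math.pow(2.0, 63.0);
        // Near 2^63 the representable doubles are 2048 apart.
        System.out.println(Math.ulp(y));         // 2048.0
        // So adding 1.0 falls into the gap and is lost.
        System.out.println(absorbsOne(y));       // true

        // Small numbers have gaps too: 0.1, 0.2 and 0.3 are all approximations.
        System.out.println(0.1 + 0.2 == 0.3);    // false
    }
}
```

This is why comparing floating point results with == is risky; checking that two values are within a small tolerance of each other is usually safer.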
In conclusion:
- Choose double.
  - It reduces the chance and impact of overflow and precision errors.
  - floats might be slightly faster than doubles.
  - But you are working in Java, so use a double.
- Always be careful, however, especially when dealing with very precise quantities.
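One way to see why double is the safer default is to find where each type runs out of digits. This is only a sketch; the class name FloatVsDouble and both helpers are hypothetical.

```java
class FloatVsDouble {
    // Hypothetical helper: true when adding 1 to a float changes nothing.
    static boolean floatAbsorbsOne(float f) {
        return f + 1f == f;
    }

    // Hypothetical helper: the same check for a double.
    static boolean doubleAbsorbsOne(double d) {
        return d + 1.0 == d;
    }

    public static void main(String[] args) {
        // 16777216 is 2^24, the point where a float can no longer count by ones.
        System.out.println(floatAbsorbsOne(16777216f));     // true
        System.out.println(doubleAbsorbsOne(16777216.0));   // false

        // The float closest to 0.1 is a coarser approximation than the double closest to 0.1.
        System.out.println((double) 0.1f == 0.1);           // false
    }
}
```

A double hits the same wall eventually, just much later, which is the point of the last bullet.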