$\require{cancel}$

Floating Point Numbers

Floating point numbers are the other basic number type.
- Numbers with digits to the right of the decimal point, such as 3.15
- Really small numbers $1.0 \times 10^{-20}$
- Really big numbers $1.0 \times 10^{30}$
Floating point types
- float is the basic floating point type; use it unless you can justify needing more range or precision.
- double is at least as precise as a float, and is usually larger (commonly 64 bits versus 32).
- long double is at least as precise as a double.
- There are no unsigned floating point types.
Floating point literals
- reference.
- 1.0: if you leave off the .0 and write just 1, the compiler makes the literal an integer type.
- 1.0e12, 1.0e-12
- These are all doubles by default.
- 12.032l is a long double.
- -3.3e-11f is a float.
Output
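- By default, the stream picks a format based on the magnitude of the number: ordinary decimal notation for moderate values, scientific notation for very large or very small ones.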
- See the first part of floatDemo.cpp
There are a few useful I/O manipulators
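- showpoint: forces floating point output to include a decimal point, even when the value is a whole number.
- fixed, scientific, defaultfloat: set the notation used for the number (reference).
- setprecision(n): sets the number of digits to display; this means significant digits in the default format, and digits after the decimal point in fixed and scientific (reference).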
Input
- We can read in many formats:
  - 1 // as an int followed by a space
  - 1.02
  - .023
  - 1.0e2
  - 1.0e-4
- We might encounter roundoff error.
A quick look at the limits:
- Remember #include <limits>
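- Remember: numeric_limits<type>::min() and numeric_limits<type>::max(). For floating point types, min() is the smallest positive normalized value, not the most negative one; the most negative value is lowest().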
Weird floating point stuff
- Overflow and Underflow.
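- The gap: the space between two adjacent representable numbers. The gap gets wider as the numbers get larger.
- Roundoff error occurs when the exact result falls into the gap: it is rounded to a neighboring representable number, so the stored answer is not exactly correct.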