language: Added approximate number literals (#185)
This turned out to be a much more involved change than I expected. It
started by adding syntax support for scientific notation (e.g. 32e5) but
ended up overhauling how numbers work in vsql, and solving some major
current and future problems.

All literals in scientific notation are considered `DOUBLE PRECISION`;
that part is pretty straightforward. Furthermore, all approximate
numbers are always formatted in scientific notation, even if that means
smaller numbers are given an "e0" suffix.

Since this means that exact and approximate numbers now have different
forms, we can take a lot of hacky and unreliable guesswork out of
literals and implicit casting.

`REAL` and `DOUBLE PRECISION` no longer need to be treated as supertypes
for all other exact types, which in retrospect made no sense. The docs
now make much more sense on how numbers work, and this makes the larger
change of DECIMAL and NUMERIC types coming later much easier to explain
and implement.
elliotchance authored Dec 24, 2023
1 parent 1a22339 commit 2a1989a
Showing 31 changed files with 691 additions and 446 deletions.
327 changes: 242 additions & 85 deletions docs/numbers.rst
@@ -3,58 +3,102 @@ Numbers

.. contents::

There are two categories of numbers in vsql: **approximate** and **exact**. Each
category has its own distinct SQL types, literals, functions and pros/cons.

Numbers are never implicitly cast from one category to another. Operations
containing mixed categories (such as arithmetic) are not allowed because the
result category would be ambiguous. This means that sometimes you will need to
explicitly cast between categories.

Approximate Numbers
-------------------

Approximate (inexact) numbers are by definition approximations and are stored in
a 32-bit or 64-bit native floating-point type. While these representations can
give very close approximations for most numbers we use day-to-day, the accuracy
cannot be guaranteed in storage or string representation.

Advantages and Disadvantages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Advantages** over exact numbers:

1. Generally, they are very space efficient (4 or 8 bytes) regardless of
magnitude.

2. Can represent extremely large and extremely tiny values while maintaining a
certain amount of significant figures.

3. Operations (arithmetic, etc) are extremely fast because floating-point values
are implemented directly in the CPU.

**Disadvantages** over exact numbers:

1. They cannot be reliably compared for equality with exact values. For example,
   ``0.1 + 0.2 = 0.3`` is an operation that might return ``FALSE`` on some systems
   where the left-hand side is computed as ``0.30000000000000004``. **This point
   cannot be stressed enough, especially since floating-point values are rendered
   with a maximum of 6 places for ``REAL`` or 12 places for ``DOUBLE PRECISION``
   after the decimal.** See formatting notes below.

2. They are subject to rounding errors when a value cannot be represented
   exactly, or as the result of operations between approximate numbers. This
   happens for the same reason described in the previous point, but it can also
   occur for individual numbers that are converted from base 10 (decimal).

3. The string representation may truncate part of the value. For example,
``0.30000000000000004`` may be shown as ``3e-1`` which is very close but not
exactly equal.
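
These pitfalls are not specific to vsql; they come from binary floating point
itself. The following is a minimal sketch in Python (not vsql), where ``float``
is a 64-bit IEEE 754 value. The 12-place formatting only mimics the display
limit described for ``DOUBLE PRECISION``; it is an assumption for illustration
and not vsql's actual formatter:

.. code-block:: python

   from decimal import Decimal
   import math

   total = 0.1 + 0.2

   # Direct equality fails because neither side is stored exactly.
   is_equal = total == 0.3                      # False
   full_repr = repr(total)                      # '0.30000000000000004'

   # Limiting the output to 12 places hides the error in the displayed string.
   shown = format(total, ".12f")                # '0.300000000000'

   # Even a single literal is inexact once converted from base 10.
   single_is_exact = Decimal(0.1) == Decimal("0.1")    # False

   # Comparing with a tolerance is the usual workaround for approximate values.
   is_close = math.isclose(total, 0.3)          # True

   # Exact (decimal) arithmetic does not have this problem.
   exact_equal = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")   # True

The displayed string and the stored value can disagree, which is why equality
checks on approximate numbers should not be trusted.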

Literals and Formatting
^^^^^^^^^^^^^^^^^^^^^^^

Literals for approximate numbers will always be ``DOUBLE PRECISION`` and must be
provided in scientific notation, in the form:

.. code-block:: text

  [ + | - ] digit... { e | E } [ + | - ] digit...

Examples:

.. code-block:: sql

  1e2        -- ~= 100.0
  1.23456e4  -- ~= 12345.6
  7E-5       -- ~= 0.00007

Since scientific notation isn't a very friendly format to use, you can append
``e0`` to any number to have it represented as scientific notation:

.. code-block:: sql

  1e0         -- ~= 1.0
  -1.23456e0  -- ~= -1.23456
  0.000123e0  -- ~= 0.000123

When approximate numbers are displayed (formatted as strings) they are always
represented as scientific notation **with a maximum of 6 places for ``REAL`` or
12 places for ``DOUBLE PRECISION`` after the decimal.** This means that the
displayed value may not be the fully approximated value. This is partially to
combat encoding and rounding errors (such as ``0.30000000000000004``) but also
to reduce the string length as 6 or 12 places after the decimal is more than
enough for general use.

Approximate numbers that are whole numbers will have the ``.0`` trimmed for
readability and, if the number isn't large or small enough to have an exponent,
``e0`` will be appended to ensure the formatted value is always in scientific
notation:

.. code-block:: sql

  VALUES 100.0e0;
  -- COL1: 100e0

Approximate Types
^^^^^^^^^^^^^^^^^

.. list-table::
:header-rows: 1
@@ -71,68 +115,181 @@ as currency) needs to be retained.
- -1.7e+308 to +1.7e+308
- 8 or 9 bytes [2]_

Exact Numbers
-------------

Exact numbers retain all precision of a number. SQL types for exact numbers that
do not have predefined ranges need to explicitly specify the scale (the maximum
size) and the precision (the accuracy) that an exact number must conform to.

Advantages and Disadvantages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Advantages** over approximate numbers:

1. The value is always guaranteed to contain the scale and precision specified.

2. They can be any arbitrary size of precision desired.

3. Values can be bound to a maximum and minimum size (based on the scale).
Literals and operations that would result in an overflow will raise an error,
rather than implicitly truncating.

**Disadvantages** over approximate numbers:

1. Storage costs are higher, based on the scale of the number, even if that
   scale is not entirely used.

2. Operations (arithmetic, etc) are significantly slower than approximate
   numbers because the operations are not natively supported by the CPU.

3. Can only represent numbers in the given precision; any extra precision will
   be truncated by operations.

Literals and Formatting
^^^^^^^^^^^^^^^^^^^^^^^

The SQL type of an exact number depends on its form and size:

.. code-block:: text

  [ + | - ] [ . ] digit...
  [ + | - ] digit... [ . [ digit... ] ]

Any number that contains a ``.`` will be treated as a ``NUMERIC``, even in the
case of whole numbers such as ``123.``. Otherwise, the smallest integer type
will be chosen that can contain the value. So ``100`` would be a ``SMALLINT``,
``-1000000`` would be an ``INTEGER``, etc. If the integer does not fit into the
range of ``BIGINT`` then it is treated as a ``NUMERIC`` with zero precision.

The precision of a ``NUMERIC`` is taken directly from the literal, so ``1.0``
and ``1.00`` are equal in value but have different types.
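
The selection rules described above can be sketched in Python (not vsql) as
follows. The function name ``literal_type`` is hypothetical, the precision of a
``NUMERIC`` is not modelled, and the ``e``/``E`` check assumes the rule stated
earlier that literals in scientific notation are always ``DOUBLE PRECISION``:

.. code-block:: python

   def literal_type(literal: str) -> str:
       """Guess the SQL type of a numeric literal from its textual form."""
       text = literal.strip()

       # Scientific notation always means an approximate number.
       if "e" in text.lower():
           return "DOUBLE PRECISION"

       # Any decimal point makes it a NUMERIC, even for whole numbers ("123.").
       if "." in text:
           return "NUMERIC"

       # Otherwise pick the smallest integer type that can hold the value.
       value = int(text)
       for name, bits in (("SMALLINT", 16), ("INTEGER", 32), ("BIGINT", 64)):
           if -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1:
               return name

       # Too large for BIGINT: falls back to NUMERIC.
       return "NUMERIC"

For instance, ``literal_type("100")`` gives ``SMALLINT`` and
``literal_type("-1000000")`` gives ``INTEGER``, matching the examples in the
text.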

When formatted as strings, integers are always shown as integers (of any size)
and ``NUMERIC`` values are always shown with the precision specified, even if
that requires padding with zeros.

Exact Types
^^^^^^^^^^^

Exact numeric types will contain any value as long as it's within the permitted
range. If a value or an expression that produced a value is beyond the possible
range a ``SQLSTATE 22003 numeric value out of range`` is raised.

.. list-table::
  :header-rows: 1

  * - Type
    - Range (inclusive)
    - Size

  * - ``SMALLINT``
    - -32768 to 32767
    - 2 or 3 bytes [2]_

  * - ``INTEGER`` or ``INT`` [1]_
    - -2147483648 to 2147483647
    - 4 or 5 bytes [2]_

  * - ``BIGINT``
    - -9223372036854775808 to 9223372036854775807
    - 8 or 9 bytes [2]_

Casting
-------

Implicit Casting
^^^^^^^^^^^^^^^^

Implicit casting is when the value can be safely converted from one type to
another to satisfy an expression. Consider the example:

.. code-block:: sql

  VALUES 123 + 456789;
  -- 456912

This operation seems very straightforward, but the parser will read this as
``SMALLINT + INTEGER`` due to the size of the literals. However, arithmetic
operations must take in and produce the same type. Rather than forcing the user
to explicitly cast one type to another, we can always safely convert a
``SMALLINT`` to an ``INTEGER`` (this is called a supertype in SQL terms). The
implicit cast results in an actual expression of ``INTEGER + INTEGER`` that also
produces an ``INTEGER``.

It's important to know that the actual result is not taken into consideration,
so it's still possible to overflow:

.. code-block:: sql

  VALUES 30000 + 30000;
  -- error 22003: numeric value out of range

This is because ``SMALLINT + SMALLINT`` results in a ``SMALLINT``. If you think
it will be possible for the value to overflow, you should explicitly cast any of
the values to a larger type:

.. code-block:: sql

  VALUES CAST(30000 AS INTEGER) + 30000;
  -- COL1: 60000

Implicit casting only happens in supertypes of the same category:

* Approximate: ``REAL`` -> ``DOUBLE PRECISION``

* Exact: ``SMALLINT`` -> ``INTEGER`` -> ``BIGINT``
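
The supertype and overflow rules can be modelled with a minimal sketch in
Python (not vsql). The names ``supertype`` and ``checked_add`` are hypothetical,
and the range check covers only the exact integer types listed in this
document:

.. code-block:: python

   EXACT = ["SMALLINT", "INTEGER", "BIGINT"]
   APPROXIMATE = ["REAL", "DOUBLE PRECISION"]
   BITS = {"SMALLINT": 16, "INTEGER": 32, "BIGINT": 64}


   def supertype(left: str, right: str) -> str:
       """Return the wider of two types, or raise if the categories differ."""
       for category in (EXACT, APPROXIMATE):
           if left in category and right in category:
               widest = max(category.index(left), category.index(right))
               return category[widest]
       raise TypeError(f"no supertype for {left} and {right}")


   def checked_add(left_type, left, right_type, right):
       """Add two values after implicit casting, checking the result range."""
       result_type = supertype(left_type, right_type)
       result = left + right
       if result_type in BITS:
           bits = BITS[result_type]
           # The result type is fixed before the value is computed, so an
           # out-of-range result is an error rather than a wider type.
           if not -(2 ** (bits - 1)) <= result <= 2 ** (bits - 1) - 1:
               raise OverflowError("numeric value out of range")
       return result_type, result

In this model ``checked_add("SMALLINT", 30000, "SMALLINT", 30000)`` raises,
while widening either operand to ``INTEGER`` first succeeds, mirroring the
``CAST`` example above.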

Explicit Casting
^^^^^^^^^^^^^^^^

Explicit casting is when you want to convert a value to a specific type. This is
done with the ``CAST`` function. The ``CAST`` function works for a variety of
types outside of numeric types but if a cast happens between numeric types the
value must be valid for the result or an error is returned:

.. code-block:: sql

  VALUES CAST(30000 AS INTEGER);
  -- Safe: 30000

  VALUES CAST(60000 AS SMALLINT);
  -- Error 22003: numeric value out of range

  VALUES CAST(12345 AS VARCHAR(10));
  -- Safe: "12345"

  VALUES CAST(12345 AS VARCHAR(3));
  -- Error 22001: string data right truncation for CHARACTER VARYING(3)

  VALUES CAST(123456789 AS DOUBLE PRECISION);
  -- COL1: 1.23456789e+08

Arithmetic
----------

Arithmetic operations (sometimes called binary operations) require the same type
for both operands and return this same type. For example ``INTEGER + INTEGER``
will result in an ``INTEGER``.

When the types of the operands are different, they will be implicitly cast to
the supertype of both. See *Implicit Casting*.

For example, ``12 * 10.5e0`` will result in an error because it is evaluated as
``SMALLINT * DOUBLE PRECISION`` and there is no supertype that satisfies both
operands (since they belong to different categories). Depending on what
category of result type you're looking for, explicitly cast one of the operands:

.. code-block:: sql

  VALUES 12 * 10.5e0;
  -- error 42883: operator does not exist: SMALLINT * DOUBLE PRECISION

  VALUES CAST(12 AS DOUBLE PRECISION) * 10.5e0;
  -- COL1: 126e0

  VALUES 12 * CAST(10.5e0 AS INTEGER);
  -- COL1: 120

Notes
-----