R20 update (#314)
Major update for commodity definitions, and the addition of Recommendation 20 units support for international trade, including recommendation 21 for packaging commodities.
phlptp authored Oct 25, 2023
1 parent 153aefd commit a2fdacd
Showing 33 changed files with 8,135 additions and 3,746 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1 +1,2 @@
build*/
~$.xlsx
13 changes: 10 additions & 3 deletions .pre-commit-config.yaml
@@ -24,12 +24,19 @@ repos:
hooks:
- id: remove-tabs
- repo: https://github.com/codespell-project/codespell
-rev: v2.2.5
+rev: v2.2.6
hooks:
- id: codespell
exclude: ^(test/|units/|docs/reference/)
args:
[
"-w",
"--skip=*.csv",
"--ignore-words=./config/spelling_whitelist.txt",
"--exclude-file=./config/spelling_ignorelines.txt",
]
- repo: https://github.com/pre-commit/pre-commit-hooks
-rev: v4.4.0
+rev: v4.5.0
hooks:
- id: mixed-line-ending
- id: trailing-whitespace
@@ -40,7 +47,7 @@ repos:
- id: end-of-file-fixer
- id: check-shebang-scripts-are-executable
- repo: https://github.com/pre-commit/mirrors-clang-format
-rev: v16.0.6
+rev: v17.0.2
hooks:
- id: clang-format
types:
5 changes: 2 additions & 3 deletions config/cppcheck_suppressions.txt
@@ -1,6 +1,5 @@
unusedFunction:units/x12_conv.cpp:1024
unusedFunction:units/r20_conv.cpp:2738
unusedFunction:units/x12_conv.cpp:1009
passedByValue:units/units.cpp:224
passedByValue:units/units.cpp:1156
passedByValue:units/units.cpp:1173
passedByValue:units/units.cpp:1160
passedByValue:units/units.cpp:1177
Empty file added config/spelling_ignorelines.txt
Empty file.
1 change: 1 addition & 0 deletions config/spelling_whitelist.txt
@@ -0,0 +1 @@
smoot
97 changes: 96 additions & 1 deletion docs/details/commodities.rst
@@ -2,4 +2,99 @@
Commodity Details
==================

-The `precise_unit` class includes an unsigned 32 bit field that represents a commodity of some kind.
+The `precise_unit` class includes an unsigned 32-bit integer field that represents a commodity of some kind.

This 32-bit code represents a commodity and, optionally, its container or form factor.

While the commodities have some predefined structure, users are free to treat the field as a plain 32-bit code and use it however they wish. The conversion to and from strings is governed by the following rules.

The high-order bit (31) is a power, either 1 or -1. A 1 in the high bit represents an inverse commodity. For example, a unit of `$/oz` of gold would have an inverse power of gold, while the `$/oz` would be in the `precise_unit`. Upon division all bits in the commodity are inverted.


Control code
----------------

Bits 29 and 30 are control codes:

- `00` is a normal commodity
- `01` is a normal commodity with a form factor code
- `10` is a direct definition
- `11` is a custom commodity defined in map storage
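
The control bits can be read with a shift and a mask. The following is a minimal sketch rather than the library's implementation: the names `CommodityKind`, `isInverse`, `invert`, and `commodityKind` are hypothetical, and it assumes that an inverse commodity (bit 31 set) is un-inverted before its control bits are read, which follows from all bits being inverted on division.

```cpp
#include <cstdint>

// Hypothetical names for illustration; they are not the library's identifiers.
enum class CommodityKind : std::uint8_t {
    normal = 0,            // `00`
    normalFormFactor = 1,  // `01`
    direct = 2,            // `10`
    custom = 3             // `11`
};

// Bit 31 set means the commodity appears with power -1.
constexpr bool isInverse(std::uint32_t code)
{
    return (code >> 31U) != 0U;
}

// Division inverts every bit, so inverting again recovers the base code.
constexpr std::uint32_t invert(std::uint32_t code)
{
    return static_cast<std::uint32_t>(~code);
}

// Assumption: an inverse commodity is un-inverted before bits 29-30 are read.
constexpr CommodityKind commodityKind(std::uint32_t code)
{
    return static_cast<CommodityKind>(
        ((isInverse(code) ? invert(code) : code) >> 29U) & 0x3U);
}

static_assert(commodityKind(0x40000000U) == CommodityKind::direct,
              "control bits `10`");
static_assert(commodityKind(invert(0x40000000U)) == CommodityKind::direct,
              "same kind after inversion");
```

Keeping the helpers `constexpr` lets the layout be checked at compile time with `static_assert`.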

Direct definitions
============================
Direct definitions encode a commodity directly in the code, using one of several methods.

The next 3 bits (26-28) select the method:

- `000` short strings, 5 characters from `_`, backtick, `a-z`, and `{|}~` (ASCII codes 95-126)
- `001` 3 byte alphanumeric code
- `010` 6 character hex code
- `011` 4 character code, ASCII codes 32-95 (numbers, upper case, and punctuation)
- `100` short strings, 5 characters from `@`, `A-Z`, and `[\]^_` (ASCII codes 64-95)
- `101` UNUSED
- `110` UNUSED
- `111` pure common commodity codes

The unused codes may be defined later.

Short Strings
++++++++++++++++

To avoid always having to do a map lookup, many commodities or commodity codes can be represented by a short string of 5 or fewer characters. These strings cannot be case sensitive. The `_` character acts as a space or null character and is removed for display purposes if it appears at the end of the string. The very limited character set includes `_`, backtick, `a-z`, and `{|}~`. This is meant to simplify a chunk of the use cases. Custom commodity strings that are not captured in this mode fall into the custom commodity bin. The bits for this kind of commodity definition are 010000U[AAAAA][BBBBB][CCCCC][DDDDD][EEEEE], with A, B, C, D, and E representing the bits of the code letters.
There are 2 codes: one representing the lower case character set, and one representing the upper case character set with different punctuation marks (method code `100` in place of `000`).
For the upper case set, setting the `U` bit to 1 indicates a stock symbol.

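
As a sketch of how a lower case short string could be packed into this layout (requires C++14 or later): the helpers `packShortLower` and `shortLowerChar` are hypothetical, and the sketch assumes that each character is stored as its offset from `_` (ASCII 95), that the first character occupies the highest 5-bit slot, and that unused slots are padded with `_`. The library's actual ordering may differ.

```cpp
#include <cstdint>

// Hypothetical helpers, not the library's API. Assumptions: each character is
// stored as (c - '_'), the first character occupies the highest 5-bit slot,
// and unused slots are padded with '_' (value 0).
// Precondition: every character lies in ASCII 95-126; characters past the
// fifth are ignored.
constexpr std::uint32_t packShortLower(const char* text)
{
    // bit 31 = 0, control code `10`, method `000`, U = 0
    std::uint32_t code = 0x40000000U;
    for (int slot = 0; slot < 5; ++slot) {
        const char c = (*text != '\0') ? *text++ : '_';
        code |= static_cast<std::uint32_t>(c - '_') << (20 - 5 * slot);
    }
    return code;
}

// Recover the character stored in slot 0-4 (slot 0 is the first character).
constexpr char shortLowerChar(std::uint32_t code, int slot)
{
    return static_cast<char>('_' + ((code >> (20 - 5 * slot)) & 0x1FU));
}

static_assert(packShortLower("gold") == 0x408834A0U,
              "g=8, o=16, l=13, d=5, padding=0");
static_assert(shortLowerChar(packShortLower("gold"), 0) == 'g', "first slot");
```

A trailing `_` read back from an unused slot would be stripped for display, matching the rule above.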
3 byte code
++++++++++++++++

For short alphanumeric codes of 3 bytes or fewer, the byte code can be captured in the lower 24 bits of the commodity code.
The bits for this kind of commodity definition are 010001[UU][AAAAAAAA][BBBBBBBB][CCCCCCCC], with A, B, and C representing the bits of the code letters.
The `UU` bits define the type of code:

- `00` user defined
- `01` UNDEFINED
- `10` ISO currency codes defined in ISO 4217
- `11` UNDEFINED
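
For illustration, a three letter currency code could be packed as follows. This is a sketch under assumptions, not the library's code: `ThreeByteType` and `packThreeByte` are hypothetical names, and the first character is assumed to sit in the highest of the three bytes.

```cpp
#include <cstdint>

// Hypothetical names. Only the two defined `UU` values are listed.
enum class ThreeByteType : std::uint32_t {
    userDefined = 0,  // `00`
    isoCurrency = 2   // `10`, ISO 4217
};

// Assumption: the first character is stored in the highest of the three bytes.
constexpr std::uint32_t packThreeByte(ThreeByteType type, char a, char b, char c)
{
    return 0x44000000U  // bit 31 = 0, control code `10`, method `001`
        | (static_cast<std::uint32_t>(type) << 24U)
        | (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 16U)
        | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8U)
        | static_cast<std::uint32_t>(static_cast<unsigned char>(c));
}

// 'U' = 0x55, 'S' = 0x53, 'D' = 0x44
static_assert(packThreeByte(ThreeByteType::isoCurrency, 'U', 'S', 'D') == 0x46555344U,
              "prefix 010001, UU = 10, then the three bytes");
```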

6 character hex code
++++++++++++++++++++++

Similar to the 3 byte code, some commodities can be represented by a 6 character hex code.

The bits for this kind of commodity definition are 010010XX[AAAA][BBBB][CCCC][DDDD][EEEE][FFFF], with A, B, C, D, E, and F representing the bits of the hex digits.

4 character codes
++++++++++++++++++++++

Similar to the 3 byte code, some commodities can be represented by a 4 character code.

The bits for this kind of commodity definition are 010011[UU][AAAAAA][BBBBBB][CCCCCC][DDDDDD], with A, B, C, and D representing the characters. The `UU` bits define the type of code:

- `00` user defined
- `01` chemical formula
- `10` UNDEFINED
- `11` UNDEFINED

Known Definitions
+++++++++++++++++++

A set of known commodities is defined in the library header files and stored using method code `111`.
The first 6 bits are 010111, leaving 26 bits available for user defined commodity codes.


Custom Commodity
=======================
Strings that cannot be represented by the simple short string mode are placed into a hash table for lookup and assigned a hash code generated from the string. The string is converted to a 29-bit hash that is placed in the lower 29 bits of the commodity code.

Normal Commodity with Form Factor
=================================
Frequently commodities come in a specific form factor. A form factor code represents the form factor independently of the actual commodity material, for example a drum of oil vs. a drum of gasoline.
The container is stored in an 8-bit code in bits 21-28. The commodity itself is contained in bits 0-20.
The bit layout for packaging is 001[FFFFFFFF][CCCCCCC][CCCCCCC][CCCCCCC]. To the extent possible, the form factor codes in use are those of recommendation 21 for international trade, which is intended for use in conjunction with the harmonized code. This covers the trade of goods but is in general insufficient to cover all the packaging modes needed for general description, so it is not used exactly. The codes 0-99, if used, correspond to codes used in recommendation 21. The lowest 7 bits correspond to the recommendation 21 code if their value is 99 or less, since that is a 2 digit decimal numerical code. Numbers 100-127 and 228-255 are local user definitions defined as required for other purposes. Numbers 128 to 227 correspond to alternate names for recommendation 21 codes; this disambiguates strings when converting to and from string representations. In Rec 21, codes 70-79 are reserved for future use but may be used in the units library as needed.
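
The packaging layout and the form factor ranges described above can be sketched as follows. The function names are hypothetical and the mapping of alternate names (128-227) to Rec 21 codes by subtracting 128 is an assumption drawn from the statement that the lowest 7 bits carry the recommendation code.

```cpp
#include <cstdint>

// Hypothetical helpers, not the library's API.
constexpr std::uint32_t packWithFormFactor(std::uint32_t formFactor,
                                           std::uint32_t commodity)
{
    return 0x20000000U                   // bit 31 = 0, control code `01`
        | ((formFactor & 0xFFU) << 21U)  // container in bits 21-28
        | (commodity & 0x1FFFFFU);       // commodity in bits 0-20
}

constexpr std::uint32_t formFactorOf(std::uint32_t code)
{
    return (code >> 21U) & 0xFFU;
}

constexpr std::uint32_t commodityOf(std::uint32_t code)
{
    return code & 0x1FFFFFU;
}

// Returns the Rec 21 numeric code (0-99) for a form factor value, or -1 for
// the local user definition ranges 100-127 and 228-255.
// Assumption: alternate names 128-227 map to Rec 21 codes 0-99 via value - 128.
constexpr int rec21Code(std::uint32_t formFactor)
{
    return (formFactor <= 99U) ? static_cast<int>(formFactor)
        : (formFactor >= 128U && formFactor <= 227U)
            ? static_cast<int>(formFactor - 128U)
            : -1;
}

static_assert(formFactorOf(packWithFormFactor(5U, 1234U)) == 5U, "round trip");
static_assert(rec21Code(130U) == 2, "alternate name for Rec 21 code 2");
```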

Normal Commodity
============================

The codes used for a normal commodity are the same as those used with a container, except that the additional 8 bits can be used for more specific codes of that commodity used for international trade. The codes are based on the `harmonized system for international trade <https://www.trade.gov/harmonized-system-hs-codes>`_. Bits 0-20 contain the harmonized system 6 digit code. The chapter is contained in bits 14-20, the section in bits 7-13, and the subsection in bits 0-6. This structure can act as a mask on specific types of commodities. Common commodities are mostly mapped to chapter and section, though some exceptions go to the subsection for commodity to string translation. The 6 digit harmonized commodity code is the same with and without a container. If no container is used, the additional 8 bits can represent country specific codes.
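
The three 7-bit fields make masking straightforward. The sketch below uses hypothetical names (`packHarmonized`, `chapterMask`, `sameChapter`) and assumes each two digit group of the 6 digit code is stored as a plain binary value in its field, with the section in bits 7-13; the digits 71 08 12 are only an example input.

```cpp
#include <cstdint>

// Hypothetical helpers, not the library's API. Assumption: each two digit
// group of the 6 digit code is stored as a binary value in its 7-bit field.
constexpr std::uint32_t packHarmonized(std::uint32_t chapter,
                                       std::uint32_t section,
                                       std::uint32_t subsection)
{
    return ((chapter & 0x7FU) << 14U)  // bits 14-20
        | ((section & 0x7FU) << 7U)    // bits 7-13
        | (subsection & 0x7FU);        // bits 0-6
}

// Mask selecting only the chapter field.
constexpr std::uint32_t chapterMask = 0x7FU << 14U;

constexpr bool sameChapter(std::uint32_t a, std::uint32_t b)
{
    return (a & chapterMask) == (b & chapterMask);
}

// 71 << 14 = 0x11C000, 8 << 7 = 0x400, 12 = 0xC
static_assert(packHarmonized(71U, 8U, 12U) == 0x11C40CU, "digits 71 08 12");
static_assert(sameChapter(packHarmonized(71U, 8U, 12U), packHarmonized(71U, 6U, 0U)),
              "both codes are in chapter 71");
```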

In the normalized code 7 bit sections, the codes for 100-127 represent other commodities that cannot be represented in the allowable 8 bits of space. These are stored in a hash map when used for later reference as needed. This allows representation of a large percentage of codes with no additional overhead and an additional 5.6 million codes through the hash structure. This is far more codes than are presently in use.
1 change: 1 addition & 0 deletions docs/details/index.rst
@@ -7,3 +7,4 @@ The Low Level Details of the Units library

unit_base
commodities
string_parsing_squared
13 changes: 13 additions & 0 deletions docs/details/string_parsing_squared.rst
@@ -0,0 +1,13 @@
==============================
Parsing of squared and cubic
==============================

When units are written there are a few terms that modify the powers of a unit.
The two primary terms are `square` and `cubic`

These are rules the library follows when parsing terms such as this

- `square` or `sq` or `sq.` will apply to the unit immediately following the term
- `cubic` or `cu` or `cu.` will apply to the unit immediately following the term
- `squared` will apply to the unit immediately preceding the term
- `cubed` will apply to the unit immediately preceding the term
1 change: 1 addition & 0 deletions docs/installation/cmake_variables.rst
@@ -25,6 +25,7 @@ CMake variables
- `UNITS_DOMAIN`: Specify a default domain to use for string conversions. Can be either a name from the domains namespace such as `domains::surveying` or one of 'COOKING', 'ASTRONOMY', 'NUCLEAR', 'SURVEYING', 'USE_CUSTOMARY', 'CLIMATE', or 'UCUM'.
- `UNITS_DEFAULT_MATCH_FLAGS`: Specify an integer value for the default match flags to be used for conversion
- `UNITS_DISABLE_NON_ENGLISH_UNITS`: the library includes a number of non-english units that can be converted from strings, these can be disabled by setting `UNITS_DISABLE_NON_ENGLISH_UNITS` to ON or setting the definition in the C++ code.
- `UNITS_DISABLE_EXTRA_UNIT_STANDARDS`: If set to `ON`, excludes UN recommendation 20, X12 (not implemented yet), and DOD (not implemented yet) from the compilation and from string conversion.

- `UNITS_NAMESPACE`: The top level namespace of the library, defaults to `units`.
When compiling with C++17 (or higher), this can be set to, e.g., `mynamespace::units` to avoid name clashes with other libraries defining `units`.
3 changes: 2 additions & 1 deletion docs/user-guide/custom_units.rst
@@ -27,7 +27,8 @@ there are a few custom count units in use for specific clinical units Many of th
So there is no translation to other units and cannot be converted except to multiple of the same unit. There are often well established tests for these units but no good way to convert them to other units. Many of these units come from `UCUM <https://unitsofmeasure.org/ucum.html>`_.

- custom_unit(37): is `hounsfield units <https://radiopaedia.org/articles/hounsfield-unit?lang=us>`_ used in CT and radiology
-- many units in UCUM are defined like `[MPL'U]` or `[mclg'U]` for this context they define some unit which doesn't interact with other units in any known fashion. The notion used in the units library for string translations is that these define custom units. Rather than individually define the library takes a hash of the part of the unit coming before the `'U]'` and generates a 10 bit hash. That 10 bit hash is used as the custom code for the units.
+- custom_unit(49): is `erlang <https://en.wikipedia.org/wiki/Erlang_(unit)>`_ used in telephone carrying capacity
+- many units in UCUM are defined like `[MPL'U]` or `[mclg'U]`; in this context they define some unit which doesn't interact with other units in any known fashion. The notion used in the units library for string translations is that these define custom units. Rather than individually defining them, the library takes a hash of the part of the unit coming before the `'U]'` and generates a 10 bit hash. That 10 bit hash is used as the custom code for the units.
- custom_unit(77): is global warming potential related to climate operations
- custom_unit(78): is global temperature change potential

7 changes: 7 additions & 0 deletions test/CMakeLists.txt
@@ -27,6 +27,10 @@ set(UNITS_TESTS
test_google_units
)

if(NOT UNITS_DISABLE_EXTRA_UNIT_STANDARDS)
list(APPEND UNITS_TESTS test_r20)
endif()

set(TEST_FILE_FOLDER ${CMAKE_CURRENT_SOURCE_DIR}/files)

# /wd4459 is for a warning of a global m in google test. They won't interfere so ignore
@@ -68,6 +72,9 @@ else()
target_compile_definitions(
test_unit_strings PUBLIC -DENABLE_UNIT_TESTING=1 -DENABLE_UNIT_MAP_ACCESS=1
)
if(NOT UNITS_DISABLE_EXTRA_UNIT_STANDARDS)
target_compile_definitions(test_r20 PUBLIC -DENABLE_UNIT_MAP_ACCESS=1)
endif()

add_unit_test(test_leadingNumbers.cpp)
target_link_libraries(
4 changes: 2 additions & 2 deletions test/examples_test.cpp
@@ -42,9 +42,9 @@ int main(int argc, char* argv[])
return -1;
}

-units::precise_unit prec1(units::precise::L, 1.25);
+units::precise_unit prec1(1.25, units::precise::L);

-if (prec1 != units::precise_unit(units::precise::m.pow(3), 0.00125)) {
+if (prec1 != units::precise_unit(0.00125, units::precise::m.pow(3))) {
return -1;
}
