R20 update #314

Merged
merged 65 commits into from
Oct 25, 2023
d51c131
add tests for r20 units and start work on getting the database for r2…
phlptp Mar 22, 2020
23c485f
add more output for the test
phlptp Mar 22, 2020
e1fda20
more additions to r20
phlptp Mar 26, 2020
e0d6233
add a bunch more r20 translations
phlptp Apr 21, 2020
d7ebc63
more r20 conversions
phlptp Apr 23, 2020
f10a690
more updates for r20
phlptp Apr 25, 2020
bafa13e
added some more r20 conversions
phlptp Nov 26, 2020
5717809
add more units and conversions
phlptp Dec 1, 2020
e98e13e
start modifying the commodities to match up with the harmonized system
phlptp Dec 2, 2020
7dac3a9
more commodities updates
phlptp Feb 5, 2021
6932cbb
more commodity definitions.
phlptp May 16, 2022
982aeb5
update commodity docs
phlptp May 16, 2022
05cc4b7
more commodity definitions to merge in main
phlptp Dec 29, 2022
8a0254d
update some more units
phlptp May 27, 2023
c7bdf7f
more work on updating commodities and packaging
phlptp Jun 3, 2023
d5ec57c
Merge branch 'main' into r20_update
phlptp Aug 28, 2023
df2dcf6
Merge remote-tracking branch 'remotes/origin/main' into r20_update
phlptp Sep 2, 2023
aca9ac3
all r20 commodities off default
phlptp Sep 14, 2023
a17fc20
fix more conversions and tests, fix issues with parsing square/square…
phlptp Sep 19, 2023
b6df882
more units to match
phlptp Sep 21, 2023
f138d3e
rework unit constructors to be more unambiguous
phlptp Sep 23, 2023
26cd9a1
more test fixes
phlptp Sep 24, 2023
09359ba
rearrange the bracketmodifiers section
phlptp Sep 27, 2023
c1f4e0f
more work on the unit string conversions
phlptp Oct 1, 2023
017056c
clear more mismatches
phlptp Oct 2, 2023
750512a
fix a few more units
phlptp Oct 2, 2023
f0b2901
fix more unit and conversions
phlptp Oct 5, 2023
f77bb9c
fix a few more conversions
phlptp Oct 16, 2023
ac682f3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 16, 2023
e2eb613
update pre-commit
phlptp Oct 16, 2023
23b70e1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 16, 2023
a57d439
fix a few code checks
phlptp Oct 16, 2023
76fc084
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 16, 2023
4b4f6a2
fix a few of the tests
phlptp Oct 17, 2023
2e86472
Merge branch 'r20_update' of https://github.com/LLNL/units into r20_u…
phlptp Oct 17, 2023
ed6e646
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 17, 2023
78055f4
fix a few more unit conversions
phlptp Oct 18, 2023
81d8245
rework pressure units
phlptp Oct 20, 2023
0f9e38c
Merge branch 'r20_update' of https://github.com/LLNL/units into r20_u…
phlptp Oct 20, 2023
3ce619b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 20, 2023
5d9e9b9
better handing of US and repeated modifiers
phlptp Oct 20, 2023
871c5eb
fix remaining tests
phlptp Oct 22, 2023
6c60d21
Merge branch 'r20_update' of https://github.com/LLNL/units into r20_u…
phlptp Oct 22, 2023
3a17e7e
fix some compilation issues with the recommendation 20 unit inclusion…
phlptp Oct 23, 2023
78031ed
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 23, 2023
6c12255
clang tidy and gcc 4.8 compilation
phlptp Oct 23, 2023
6ed3bdd
Merge branch 'r20_update' of https://github.com/LLNL/units into r20_u…
phlptp Oct 23, 2023
6b23813
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 23, 2023
d10ee52
line length fixes
phlptp Oct 23, 2023
c857f1c
Merge branch 'r20_update' of https://github.com/LLNL/units into r20_u…
phlptp Oct 23, 2023
1118dd8
add fuzzing test
phlptp Oct 23, 2023
8bf1cff
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 23, 2023
ea5b4f2
add additional check on the r20 unit conversions
phlptp Oct 23, 2023
0387ff2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 23, 2023
40cf626
fix a fuzzing issue
phlptp Oct 24, 2023
c4cf996
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 24, 2023
0f31648
clang-tidy fixes
phlptp Oct 24, 2023
71b63dc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 24, 2023
f2bdc2c
more clang-tidy fixes
phlptp Oct 24, 2023
83d96e8
Merge branch 'r20_update' of https://github.com/LLNL/units into r20_u…
phlptp Oct 24, 2023
d5bc696
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 24, 2023
ad7a507
fix fuzz issue
phlptp Oct 25, 2023
3af298a
Merge branch 'r20_update' of https://github.com/LLNL/units into r20_u…
phlptp Oct 25, 2023
7303e26
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 25, 2023
de48700
more clang-tidy fixes
phlptp Oct 25, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -1 +1,2 @@
build*/
~$.xlsx
13 changes: 10 additions & 3 deletions .pre-commit-config.yaml
@@ -24,12 +24,19 @@ repos:
hooks:
- id: remove-tabs
- repo: https://github.com/codespell-project/codespell
rev: v2.2.5
rev: v2.2.6
hooks:
- id: codespell
exclude: ^(test/|units/|docs/reference/)
args:
[
"-w",
"--skip=*.csv",
"--ignore-words=./config/spelling_whitelist.txt",
"--exclude-file=./config/spelling_ignorelines.txt",
]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
rev: v4.5.0
hooks:
- id: mixed-line-ending
- id: trailing-whitespace
@@ -40,7 +47,7 @@ repos:
- id: end-of-file-fixer
- id: check-shebang-scripts-are-executable
- repo: https://github.com/pre-commit/mirrors-clang-format
rev: v16.0.6
rev: v17.0.2
hooks:
- id: clang-format
types:
5 changes: 2 additions & 3 deletions config/cppcheck_suppressions.txt
@@ -1,6 +1,5 @@
unusedFunction:units/x12_conv.cpp:1024
unusedFunction:units/r20_conv.cpp:2738
unusedFunction:units/x12_conv.cpp:1009
passedByValue:units/units.cpp:224
passedByValue:units/units.cpp:1156
passedByValue:units/units.cpp:1173
passedByValue:units/units.cpp:1160
passedByValue:units/units.cpp:1177
Empty file added config/spelling_ignorelines.txt
Empty file.
1 change: 1 addition & 0 deletions config/spelling_whitelist.txt
@@ -0,0 +1 @@
smoot
97 changes: 96 additions & 1 deletion docs/details/commodities.rst
@@ -2,4 +2,99 @@
Commodity Details
==================

The `precise_unit` class includes an unsigned 32 bit field that represents a commodity of some kind.
The `precise_unit` class includes an unsigned 32-bit integer that represents a commodity of some kind.

This 32-bit code represents a commodity and, optionally, its container or form factor.

While there is some predefined structure to the commodity code, users are free to use it however they like, since it can be manipulated as a plain 32-bit code. The conversion to and from strings is governed by the following rules.

The high-order bit (bit 31) is a power, either 1 or -1. A 1 in the high bit represents an inverse commodity. For example, a unit of `$/oz` of gold would have an inverse power of gold in the commodity, while the `$/oz` would be in the `precise_unit`. Upon division, all bits in the commodity are inverted.
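
The power bit and the inversion rule can be sketched in a few lines. This is a minimal illustration only: the names `kInverseBit`, `isInverseCommodity`, and `invertCommodity` are hypothetical and are not taken from the library, and any special handling the library applies to an empty commodity is not modeled.

```cpp
#include <cstdint>

// Hypothetical helper names for illustration; not the library's API.
constexpr std::uint32_t kInverseBit = 0x80000000U;  // bit 31, the power bit

// True when the high bit is set, i.e. the code describes an inverse commodity.
constexpr bool isInverseCommodity(std::uint32_t code)
{
    return (code & kInverseBit) != 0U;
}

// Division inverts every bit, so inverting twice restores the original code.
constexpr std::uint32_t invertCommodity(std::uint32_t code)
{
    return static_cast<std::uint32_t>(~code);
}
```

Because the whole word is flipped rather than only bit 31, the remaining fields of an inverse commodity have to be inverted back before they are decoded.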


Control code
----------------

Bits 29 and 30 are control codes:

- `00` is a normal commodity
- `01` is a normal commodity with a form factor code
- `10` is a direct definition
- `11` is a custom commodity defined in map storage
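
As a sketch of how the control code could be read back out of a commodity word, the snippet below shifts bits 29 and 30 down and maps them onto an enumeration. The `CommodityKind` enumeration and the `commodityKind` function are assumed names for illustration, and the code assumes the power bit is not set (an inverse commodity would need to be inverted first).

```cpp
#include <cstdint>

// Hypothetical enumeration mirroring the four control codes listed above.
enum class CommodityKind : std::uint32_t {
    normal = 0U,                // 00
    normalWithFormFactor = 1U,  // 01
    directDefinition = 2U,      // 10
    custom = 3U                 // 11
};

// Extract bits 29 and 30 (bit 30 is the more significant of the two).
constexpr CommodityKind commodityKind(std::uint32_t code)
{
    return static_cast<CommodityKind>((code >> 29U) & 0x3U);
}
```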

Direct definitions
============================
The direct definitions cover a set of codes that can be encoded by several different methods.

The next 3 bits define which method is used:

- `000` short strings, 5 lower case characters plus '_', the backtick character, and '{|}~' (ASCII codes 95-126)
- `001` 3 byte alphanumeric code
- `010` 6 character hex code
- `011` 4 character code, ASCII codes 32-95 (numbers, upper case, and punctuation)
- `100` short strings, 5 upper case characters plus '@[\]^_' (ASCII codes 64-95)
- `101` UNUSED
- `110` UNUSED
- `111` pure common commodity codes

Other methods may be defined later.

Short Strings
++++++++++++++++

To avoid always having to do a map lookup, many commodities or commodity codes can be represented by a short string of 5 or fewer characters. These strings cannot be case sensitive. The '_' character acts as a space or null character and is removed for display purposes if it appears at the end of the string. The very limited character set includes '_', 'a-z', the backtick character, and '{|}~'. This is meant to simplify a chunk of the use cases. Custom commodity strings which cannot be captured in this mode fall into the custom commodity bin. The bits for this kind of commodity definition are 010000U[AAAAA][BBBBB][CCCCC][DDDDD][EEEEE], with A, B, C, D, and E representing the bits of the code letters.
There are 2 codes: one representing the lower case character set, and one representing the upper case character set with different punctuation marks (method `100` in place of `000`).
For the upper case set, setting the `U` bit to 1 indicates a stock symbol.
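
The lower case short-string layout can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the functions `packShortString` and `unpackShortString` are hypothetical, each character is assumed to be stored as its ASCII code minus 95 (so '_' maps to 0), the first character is assumed to occupy the highest 5-bit group, and short strings are assumed to be padded with '_'.

```cpp
#include <cstdint>
#include <string>

// Pack up to 5 characters from the ASCII range 95-126 into a commodity code.
// Returns 0 when the string does not fit this mode (assumed convention).
inline std::uint32_t packShortString(const std::string& str)
{
    if (str.empty() || str.size() > 5U) {
        return 0U;
    }
    std::uint32_t code = 0x40000000U;  // bits 010000: direct definition, method 000
    for (unsigned ii = 0; ii < 5U; ++ii) {
        std::uint32_t value = 0U;  // pad with '_' (value 0)
        if (ii < str.size()) {
            const unsigned char ch = static_cast<unsigned char>(str[ii]);
            if (ch < 95U || ch > 126U) {
                return 0U;  // outside the limited character set
            }
            value = static_cast<std::uint32_t>(ch) - 95U;
        }
        code |= value << (5U * (4U - ii));
    }
    return code;
}

// Recover the string, dropping trailing '_' padding for display.
inline std::string unpackShortString(std::uint32_t code)
{
    std::string str;
    for (unsigned ii = 0; ii < 5U; ++ii) {
        const std::uint32_t value = (code >> (5U * (4U - ii))) & 0x1FU;
        str.push_back(static_cast<char>(value + 95U));
    }
    while (!str.empty() && str.back() == '_') {
        str.pop_back();
    }
    return str;
}
```

A string with an upper case letter or more than 5 characters is rejected here, which corresponds to the case where the text says the string falls into the custom commodity bin.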

3 byte code
++++++++++++++++

For short alphanumeric codes of 3 bytes or fewer, the byte code can be captured in the lower 24 bits of the commodity code.
The bits for this kind of commodity definition are 010001[UU][AAAAAAAA][BBBBBBBB][CCCCCCCC], with A, B, and C representing the bits of the code letters.
The `UU` bits define the type of code:

- `00` user defined
- `01` UNDEFINED
- `10` ISO currency codes defined in ISO 4217
- `11` UNDEFINED
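
A minimal sketch of the 3 byte layout is shown below. The `CodeType` enumeration and `packThreeByteCode` function are hypothetical names, and placing the first character in the highest of the three bytes is an assumption that the text does not confirm.

```cpp
#include <cstdint>

// Hypothetical names for the two defined values of the UU field.
enum class CodeType : std::uint32_t { userDefined = 0U, isoCurrency = 2U };

// Widen a character to an unsigned 32-bit value without sign extension.
constexpr std::uint32_t byteValue(char ch)
{
    return static_cast<std::uint32_t>(static_cast<unsigned char>(ch));
}

// Build 010001[UU][AAAAAAAA][BBBBBBBB][CCCCCCCC].
constexpr std::uint32_t packThreeByteCode(char c1, char c2, char c3, CodeType type)
{
    return 0x44000000U | (static_cast<std::uint32_t>(type) << 24U) |
        (byteValue(c1) << 16U) | (byteValue(c2) << 8U) | byteValue(c3);
}
```

Under these assumptions, an ISO 4217 code such as `USD` would be packed with `packThreeByteCode('U', 'S', 'D', CodeType::isoCurrency)`.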

6 character hex code
++++++++++++++++++++++

Similar to the 3 byte code, some commodities can be represented by a 6 character hex code.

The bits for this kind of commodity definition are 010010XX[AAAA][BBBB][CCCC][DDDD][EEEE][FFFF], with A, B, C, D, E, and F representing the bits of the hex digits.

4 character codes
++++++++++++++++++++++

Similar to the 3 byte code, some commodities can be represented by a 4 character code, with each character stored in 6 bits.

The bits for this kind of commodity definition are 010011[UU][AAAAAA][BBBBBB][CCCCCC][DDDDDD], with A, B, C, and D representing the characters.
The `UU` bits define the type of code:

- `00` user defined
- `01` Chemical Formula
- `10` UNDEFINED
- `11` UNDEFINED

Known Definitions
+++++++++++++++++++

A set of known commodities is defined in the library header files using method code `111`.
The first 6 bits are 010111, leaving 26 bits available for user defined commodity codes.


Custom Commodity
=======================
Strings which cannot be represented by the simple short string mode are placed into a hash table for lookup and assigned a hash code generated from the string. The string is converted to a 29-bit hash placed in the lower 29 bits of the commodity code.

Normal Commodity with Form Factor
=================================
Frequently commodities come in a specific form factor. With a form factor code in place, the code can represent a form factor independent of the actual commodity material, for example a drum of oil vs. a drum of gasoline.
The container is stored in an 8-bit code in bits 21-28. The commodity itself is contained in bits 0-20.
The bit layout for packaging is 001[FFFFFFFF][CCCCCCC][CCCCCCC][CCCCCCC]. To the extent possible, the form factor codes in use are those from recommendation 21 for international trade, which is used in conjunction with the harmonized code. This covers the trade of goods but is in general insufficient to cover all the packaging modes necessary for general description, so it is not used exactly. The codes 0-99, if used, correspond to codes used in recommendation 21. The lowest 7 bits correspond to the recommendation 21 code if their value is 99 or less, since that is a 2 digit decimal numerical code. Numbers 100-127 and 228-255 are local user definitions defined as required for other purposes. Numbers 128 to 227 correspond to alternate names for recommendation 21 codes; this is to disambiguate strings when converting to and from string representations. In Rec 21, codes 70-79 are reserved for future use but may be used in the units library as needed.
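
The packaging layout described above can be sketched with a few shift-and-mask helpers. All function names here are hypothetical, and treating a low 7-bit value of exactly 99 as a recommendation 21 code is an assumption based on the stated ranges (0-99 and 128-227).

```cpp
#include <cstdint>

// Build 001[FFFFFFFF][21 commodity bits]; names are illustrative only.
constexpr std::uint32_t packWithFormFactor(std::uint32_t formFactor, std::uint32_t commodity)
{
    return 0x20000000U | ((formFactor & 0xFFU) << 21U) | (commodity & 0x1FFFFFU);
}

// Form factor lives in bits 21-28.
constexpr std::uint32_t formFactorOf(std::uint32_t code)
{
    return (code >> 21U) & 0xFFU;
}

// Commodity lives in bits 0-20.
constexpr std::uint32_t commodityOf(std::uint32_t code)
{
    return code & 0x1FFFFFU;
}

// Returns the recommendation 21 code (0-99) for a form factor, covering both
// the primary range 0-99 and the alternate-name range 128-227, or -1 for the
// local user definitions (100-127 and 228-255).
constexpr int rec21Code(std::uint32_t formFactor)
{
    return ((formFactor & 0x7FU) < 100U) ? static_cast<int>(formFactor & 0x7FU) : -1;
}
```

Keeping the commodity in the low 21 bits means the same material code can be compared across containers by masking off the form factor field.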

Normal Commodity
============================

The codes used for a normal commodity are the same as those used with a container, with the exception that the additional 8 bits can be used for more specific codes of that commodity used for international trade. The codes used are based on the `harmonized system for international trade <https://www.trade.gov/harmonized-system-hs-codes>`_. Bits 0-20 contain the harmonized system 6 digit code. The chapter is contained in bits 14-20, the section in bits 7-13, and the subsection in bits 0-6. This allows a structure that can act as a mask on specific types of commodities. Common commodities are mostly mapped to chapter and section, though some exceptions go to the subsection for commodity to string translation. The 6 digit harmonized commodity code is the same with or without a container. If no container is used, the additional 8 bits can represent the country specific codes.

In the normalized code 7 bit sections, the codes 100-127 represent other commodities that cannot be represented in the allowable 8 bits of space. These are stored in a hash map when used, for later reference as needed. This allows representation of a large percentage of codes with no additional overhead and an additional 5.6 million codes through the hash structure. This is far more codes than are presently in use.
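
To illustrate how the harmonized system fields can act as a mask, the sketch below assumes three non-overlapping 7-bit fields (chapter in bits 14-20, section in bits 7-13, subsection in bits 0-6), each holding a 2 digit decimal value. The function names and the digit values used in any call are hypothetical.

```cpp
#include <cstdint>

// Mask selecting only the chapter field (bits 14-20).
constexpr std::uint32_t kChapterMask = 0x7FU << 14U;

// Pack the three 2 digit parts of a 6 digit harmonized code into 21 bits.
constexpr std::uint32_t packHarmonized(std::uint32_t chapter,
                                       std::uint32_t section,
                                       std::uint32_t subsection)
{
    return ((chapter & 0x7FU) << 14U) | ((section & 0x7FU) << 7U) | (subsection & 0x7FU);
}

constexpr std::uint32_t chapterOf(std::uint32_t code)
{
    return (code >> 14U) & 0x7FU;
}

constexpr std::uint32_t sectionOf(std::uint32_t code)
{
    return (code >> 7U) & 0x7FU;
}

constexpr std::uint32_t subsectionOf(std::uint32_t code)
{
    return code & 0x7FU;
}

// Two codes belong to the same chapter when their chapter fields match.
constexpr bool sameChapter(std::uint32_t code1, std::uint32_t code2)
{
    return ((code1 ^ code2) & kChapterMask) == 0U;
}
```

Since each 2 digit part only uses values 0-99 of its 7-bit field, the values 100-127 remain free, which matches the note that those values are reserved for hashed commodities.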
1 change: 1 addition & 0 deletions docs/details/index.rst
@@ -7,3 +7,4 @@ The Low Level Details of the Units library

unit_base
commodities
string_parsing_squared
13 changes: 13 additions & 0 deletions docs/details/string_parsing_squared.rst
@@ -0,0 +1,13 @@
==============================
Parsing of squared and cubic
==============================

When units are written, there are a few terms that modify the powers of a unit.
The two primary terms are `square` and `cubic`.

These are the rules the library follows when parsing such terms:

- `square` or `sq` or `sq.` will apply to the unit immediately following the term
- `cubic` or `cu` or `cu.` will apply to the unit immediately following the term
- `squared` will apply to the unit immediately preceding the term
- `cubed` will apply to the unit immediately preceding the term
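
The four rules above can be sketched with a toy word-level parser. This is not the library's string parser: `parsePowers` is a hypothetical function that splits on whitespace, treats every other word as a unit name, and does not handle repeated modifiers or operators such as `per`.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy illustration: returns (unit name, power) pairs for a whitespace
// separated string. Prefix terms apply to the next unit, suffix terms
// apply to the previous unit.
inline std::vector<std::pair<std::string, int>> parsePowers(const std::string& text)
{
    std::vector<std::pair<std::string, int>> result;
    std::istringstream stream(text);
    std::string word;
    int pendingPower = 1;  // power waiting for the next unit word
    while (stream >> word) {
        if (word == "square" || word == "sq" || word == "sq.") {
            pendingPower = 2;
        } else if (word == "cubic" || word == "cu" || word == "cu.") {
            pendingPower = 3;
        } else if (word == "squared") {
            if (!result.empty()) {
                result.back().second *= 2;
            }
        } else if (word == "cubed") {
            if (!result.empty()) {
                result.back().second *= 3;
            }
        } else {
            result.emplace_back(word, pendingPower);
            pendingPower = 1;
        }
    }
    return result;
}
```

With this sketch, `square meter` and `meter squared` both yield the unit `meter` with power 2, while `cu. ft` yields `ft` with power 3.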
1 change: 1 addition & 0 deletions docs/installation/cmake_variables.rst
@@ -25,6 +25,7 @@ CMake variables
- `UNITS_DOMAIN`: Specify a default domain to use for string conversions. Can be either a name from the domains namespace such as `domains::surveying` or one of 'COOKING', 'ASTRONOMY', 'NUCLEAR', 'SURVEYING', 'USE_CUSTOMARY', 'CLIMATE', or 'UCUM'.
- `UNITS_DEFAULT_MATCH_FLAGS`: Specify an integer value for the default match flags to be used for conversion
- `UNITS_DISABLE_NON_ENGLISH_UNITS`: the library includes a number of non-english units that can be converted from strings, these can be disabled by setting `UNITS_DISABLE_NON_ENGLISH_UNITS` to ON or setting the definition in the C++ code.
- `UNITS_DISABLE_EXTRA_UNIT_STANDARDS`: If set to `ON`, disables UN recommendation 20, X12 (not implemented yet), and DOD (not implemented yet) from being included in the compilation and generated from strings.

- `UNITS_NAMESPACE`: The top level namespace of the library, defaults to `units`.
When compiling with C++17 (or higher), this can be set to, e.g., `mynamespace::units` to avoid name clashes with other libraries defining `units`.
3 changes: 2 additions & 1 deletion docs/user-guide/custom_units.rst
@@ -27,7 +27,8 @@ there are a few custom count units in use for specific clinical units Many of th
So there is no translation to other units and cannot be converted except to multiple of the same unit. There are often well established tests for these units but no good way to convert them to other units. Many of these units come from `UCUM <https://unitsofmeasure.org/ucum.html>`_.

- custom_unit(37): is `hounsfield units <https://radiopaedia.org/articles/hounsfield-unit?lang=us>`_ used in CT and radiology
- many units in UCUM are defined like `[MPL'U]` or `[mclg'U]` for this context they define some unit which doesn't interact with other units in any known fashion. The notion used in the units library for string translations is that these define custom units. Rather than individually define the library takes a hash of the part of the unit coming before the `'U]'` and generates a 10 bit hash. That 10 bit hash is used as the custom code for the units.
- custom_unit(49): is `erlang <https://en.wikipedia.org/wiki/Erlang_(unit)>`_ used in telephone carrying capacity
- many units in UCUM are defined like `[MPL'U]` or `[mclg'U]` for this context they define some unit which doesn't interact with other units in any known fashion. The notion used in the units library for string translations is that these define custom units. Rather than individually defining them, the library takes a hash of the part of the unit coming before the `'U]'` and generates a 10 bit hash. That 10 bit hash is used as the custom code for the units.
- custom_unit(77): is global warming potential related to climate operations
- custom_unit(78): is global temperature change potential

7 changes: 7 additions & 0 deletions test/CMakeLists.txt
@@ -27,6 +27,10 @@ set(UNITS_TESTS
test_google_units
)

if(NOT UNITS_DISABLE_EXTRA_UNIT_STANDARDS)
list(APPEND UNITS_TESTS test_r20)
endif()

set(TEST_FILE_FOLDER ${CMAKE_CURRENT_SOURCE_DIR}/files)

# /wd4459 is for a warning of a global m in google test. They won't interfere so ignore
@@ -68,6 +72,9 @@ else()
target_compile_definitions(
test_unit_strings PUBLIC -DENABLE_UNIT_TESTING=1 -DENABLE_UNIT_MAP_ACCESS=1
)
if(NOT UNITS_DISABLE_EXTRA_UNIT_STANDARDS)
target_compile_definitions(test_r20 PUBLIC -DENABLE_UNIT_MAP_ACCESS=1)
endif()

add_unit_test(test_leadingNumbers.cpp)
target_link_libraries(
4 changes: 2 additions & 2 deletions test/examples_test.cpp
@@ -42,9 +42,9 @@ int main(int argc, char* argv[])
return -1;
}

units::precise_unit prec1(units::precise::L, 1.25);
units::precise_unit prec1(1.25, units::precise::L);

if (prec1 != units::precise_unit(units::precise::m.pow(3), 0.00125)) {
if (prec1 != units::precise_unit(0.00125, units::precise::m.pow(3))) {
return -1;
}
