
Migrate magnitude to vector space representation #300

Closed
chiphogg opened this issue Oct 14, 2021 · 18 comments · Fixed by #368

Comments

@chiphogg
Collaborator

The magnitude of a unit is a positive real number, and we need to pick a representation for it. Units are closed under products and rational powers, so the representation should support these operations well. Here's my proposal for the requirements for this representation.

  1. It must support exact symbolic math under the supported operations (i.e., products and rational powers).
  2. Representations must be unique for each real number (we don't want a proliferation of types for the same magnitude).
  3. The representation must be computable (we need to be able to get a suitable approximation of the real number: definitely floating point, and perhaps also with a rational part).

The most common solution is a "ratio", or some variant of it. A ratio can never meet these requirements, because ratios are not closed under rational powers.

Boost units satisfies the first requirement, but fails the second: it's easy to create multiple distinct representations for the same number.

The only solution I know of which satisfies all three requirements is the vector space representation. Each magnitude is represented as a product of rational powers of basis vectors. (If we take this to log space, we have sums of rational multiples of basis vectors, and our vector space over the rationals becomes apparent.)

This is actually the same approach we use for dimensions already, so we automatically gain the ability to support everything that dimensions can!

So: what are the basis vectors? We start with the prime numbers, because the product of rational powers of any finite number of primes is unique. For each magnitude we want to represent, we can follow a simple algorithm.

  1. Is it representable using only the basis vectors we already have? (If so, we're done.)
  2. If not, then it must be linearly independent, and can itself be added as a new basis vector.

This is something of an "existence proof" that this approach can meet our needs. In practice, we'll want to be judicious about which basis vectors we add. (When we go to add angular degrees, we'll be better off choosing pi as a basis vector, rather than (pi / 180)!)

This will bring at least three significant benefits:

We know this works well in practice, because we've been using it in Aurora's units library from the start and haven't seen any bugs. Compile time is not a downside: we measured the impact using "adversarial" magnitudes with dozens of basis vectors, and the slowdown was measurable but not subjectively noticeable. The only drawback is that the type name for a magnitude is more verbose (because everything is represented in its prime factorization). This shouldn't be an issue for mp-units, because downcasting will prevent end users from seeing the magnitude type names anyway.

I initially thought we were the first to invent this, but I later found that the benri library had already done so before us (nice work @jansende! 😎).

As discussed on #71, we're loosely planning to start work on this sometime in early November.

@chiphogg
Collaborator Author

As discussion points in the meantime:

  • I expect these requirements to be pretty uncontroversial, but I haven't checked whether others agree. Are there more requirements I've missed? Do any of them feel like they don't belong?
  • Assuming we agree on these requirements, does anyone know any other solution that meets them? Or is the vector space representation really the only game in town?

@kwikius
Contributor

kwikius commented Oct 14, 2021

A major requirement for me is that it is comprehensible to people like myself who don't have a strong mathematical background. It needs to be simple and intuitive to extend the library with other units.

Ideally, it should be expressible concisely in code, to avoid very long error messages.

I would suggest an example: how, for instance, would you create a unit representing the BTU in the SI system, and how would that be converted to joules in your scheme?

@chiphogg
Collaborator Author

The page you linked gives a table with 6 different definitions for "BTU". I'm just going to pick the ISO one for concreteness, as it's the only one with a precise definition.

I haven't yet worked in depth with mp-units code, but based on the docs, I assume it would look something like this:

struct btu : named_scaled_unit<btu, "BTU", no_prefix, ratio(1'055'060, 1'000), si::joule> {};

Would syntax like this meet your needs?

@kwikius
Contributor

kwikius commented Oct 14, 2021

> The page you linked gives a table with 6 different definitions for "BTU". I'm just going to pick the ISO one for concreteness, as it's the only one with a precise definition.
>
> I haven't yet worked in depth with mp-units code, but based on the docs, I assume it would look something like this:
>
> struct btu : named_scaled_unit<btu, "BTU", no_prefix, ratio(1'055'060, 1'000), si::joule> {};
>
> Would syntax like this meet your needs?

Sure, but you don't seem to have introduced your own magnitude representation here? I picked this example because I assumed you would use pound or Fahrenheit basis-vector magnitudes; otherwise, I don't understand what you mean by basis vectors.

@chiphogg
Collaborator Author

For a dimension, "basis vectors" are the base dimensions of the system (mass, length, angle, etc.). They're the things we raise to rational powers. We choose them such that every product of distinct rational powers of basis vectors gives a unique dimension. (So if we have length and time as basis vectors, we'd better not also include speed as one!)

For a magnitude, "basis vectors" are a set of real numbers. We choose them such that every product of distinct rational powers of basis vectors gives a unique real number. (So if we have 2 and 5 as basis vectors, we'd better not also include 10 as one!)

Of course, we don't want to ask users to do the work of decomposing their "magnitude vector" onto our basis vectors. As you rightly point out, that would be a terrible interface! So we let them provide something more intuitive, like ratio(1'055'060, 1'000), and let the library do the heavy lifting under the hood.

Will this exact syntax work? I think so, but I'm not 100% confident. At worst, we might end up with something like ratio_scale<1'055'060, 1'000>. I'd rather avoid that, because I don't want to cause churn, but it wouldn't be the end of the world.

Anyway, the good news is that the vector space representation gives us a strict superset of the magnitudes we can represent with a ratio (at least, the physically meaningful ones, i.e., the positive ones). That means we can present either the same interface, or a very similar interface, to our end users.

@kwikius
Contributor

kwikius commented Oct 14, 2021

OK, so as a user I don't need to know about the maths. But I would still like to know how it works :)

So we have our rational number 1'055'060 / 1'000.

How do we go about converting it to/from the "vector space representation"?

@chiphogg
Collaborator Author

This is notional only, for exposition purposes, but let's say our template for a basis vector raised to a power is Power<Base, N=1, D=1>. And let's say that we represent our integer basis vectors (prime numbers) by Int<P>. (This notation is sloppy, but I hope the ideas are clear!)

The first thing we would do is compute the prime factorization of the numerator and denominator; naturally, this can be done at compile time.

  • 1'055'060 simplifies to Magnitude<Power<Int<2>, 2>, Power<Int<5>>, Power<Int<71>>, Power<Int<743>>>; call this Num.
  • 1'000 simplifies to Magnitude<Power<Int<2>, 3>, Power<Int<5>, 3>>; call this Denom.

The final result is then quotient_t<Num, Denom>, likely implemented as product_t<Num, inverse_t<Denom>>. We negate the exponents of the contributions to the denominator, and then combine them with the numerator. In this case, that would give us: Magnitude<Power<Int<2>, -1>, Power<Int<5>, -2>, Power<Int<71>>, Power<Int<743>>>.

The basis vectors are in their canonical order (increasing by value), and if any exponent had become zero, we would omit the corresponding basis vector. This preserves a unique representation for each magnitude. Here, we have given the ratio 1'055'060 / 1'000 its unique representation by a product of prime powers, 2^(-1) * 5^(-2) * 71 * 743.

Note how it's automatically "in lowest terms". If you form a numerator by grouping positive powers, and likewise group negative powers to form the denominator, you get 52'753 / 50, or 1'055.06. When it comes time to convert a quantity between BTU and Joules, you could associate a constexpr double with this latter value.

Alternatively, you could have separate constexpr value traits for the numerator and denominator. This would preserve your ability to return exact integer results when, say, your value is evenly divisible by 50.

@kwikius
Contributor

kwikius commented Oct 14, 2021

OK, I am starting to see the point. To understand what you are explaining, it works better for me to rewrite it as simple runtime C++ code:

#include <cmath>
#include <iostream>

int main()
{

   // "compute the prime factorization of the numerator and denominator"
  // https://en.wikipedia.org/wiki/Integer_factorization

   int nume = std::pow(2,2) * 5 * 71 * 743;

   std::cout << "nume == 1'055'060 == " << std::boolalpha << (nume == 1'055'060) <<'\n';

   int denom = std::pow(2,3) * std::pow(5,3) ;

   std::cout << "denom == 1000 == " << std::boolalpha << (denom == 1000) <<'\n';

   auto v0 = static_cast<double> (nume) / denom;

   // expand and simplify the expression
   auto v1 = ( std::pow(2,2) * 5 * 71 * 743 )  /  ( std::pow(2,3) * std::pow(5,3) );

   auto v2 = ( std::pow(2,2) * 5 * 71 * 743 )  *  std::pow(2,-3) * std::pow(5,-3);

   auto v3 =  std::pow(2,-1) * std::pow(5,-2) * 71 * 743;

   // verify these results  should be same within fp rounding anyway
   std::cout << v0 <<'\n';
   std::cout << v1 <<'\n';
   std::cout << v2 <<'\n';
   std::cout << v3 <<'\n';

}

output :
nume == 1'055'060 == true
denom == 1000 == true
1055.06
1055.06
1055.06
1055.06

@chiphogg
Collaborator Author

Yes---nothing communicates like code! Nice illustration. 🙂

@kwikius
Contributor

kwikius commented Oct 15, 2021

Next question (from reading Wikipedia): what factorization algorithm will you use?

Also, this would be useful as a general-purpose entity representing a compile-time number, rather than as an mp-units-only entity, in the same way as std::ratio (which was incorporated into the C++ standard because it was used by std::chrono::duration, but is generally useful).

@chiphogg
Collaborator Author

Sorry for the delay---truth is, I'm rather behind on my CppCon slides, so I need to back-burner this until after my talk. Let me answer your questions before I do!

> Next question (from reading Wikipedia): what factorization algorithm will you use?

We use a recursive strategy which goes something like this.

  • Find the first factor f which divides N evenly.
  • Determine the multiplicity, m, of f.
  • Using the above notation, the magnitude is product_t<Magnitude<Power<Int<f>, m>>, make_magnitude_t<N / pow(f, m)>>, where make_magnitude_t is a helper to perform the prime factorization and return a Magnitude, and pow is not std::pow, but rather some constexpr function that works on integer inputs.

For example, for N = 24 we get f = 2 and m = 3, because 2 divides 24 three times (24 = 2 * 2 * 2 * 3).

Strictly speaking, one doesn't need the multiplicity, but it is an easy way to cut down on the number of template instantiations.

To find the first factor of N, we start at 2, then test successive odd numbers until we find one that divides N evenly, stopping early (and returning N itself) when the square of the candidate exceeds N. I'm sure this whole strategy could be made more efficient, but it doesn't seem to have any noticeable effect on compile times, so we haven't been motivated to speed it up.

> Also, this would be useful as a general-purpose entity representing a compile-time number, rather than as an mp-units-only entity, in the same way as std::ratio (which was incorporated into the C++ standard because it was used by std::chrono::duration, but is generally useful).

Great idea---I hadn't thought of this! I would love to see what other people do with this once we get a reference implementation into the library.

@kwikius
Contributor

kwikius commented Oct 17, 2021

OK, I'm out of time today. FWIW, here is where I got with it, based on https://en.wikipedia.org/wiki/Trial_division:

#include <cmath>
#include <iostream>
#include <vector>
#include <algorithm>

// https://en.wikipedia.org/wiki/Trial_division
std::vector<int> TrialDivision(int n)
{
    std::vector<int> v; 
    int f = 2;
    while (n % 2 == 0) { v.push_back(f); n /= 2; }
    f = 3;
    while (n % 3 == 0) { v.push_back(f); n /= 3; }
    f = 5;
    int ac = 9;
    int temp = 16;
    do {
        ac += temp; 
        if (ac > n) break; 
        if (n % f == 0) {
            v.push_back(f);
            n /= f;
            ac -= temp;
        }else { 
            f += 2;
            temp += 8;
        }
    } while (1);
    if (n != 1) v.push_back(n);
    return v;
}

// reduce to pairs of {factor , power}
std::vector<std::pair<int,int> >  
get_prime_factors(int n)
{
   auto raw = TrialDivision(n);

   std::vector <std::pair<int,int> > res;
   while (raw.size()){
      auto const v = raw[0];
      // std::count returns ptrdiff_t; cast to int so the pair's braced
      // initialization below doesn't trigger a narrowing error
      auto const p = static_cast<int>(std::count(raw.begin(), raw.end(), v));
      raw.erase(raw.begin(), raw.begin() + p);
      res.push_back({v, p});
   }
   return res;
}

// output
void show_factors(int n)
{
   std::cout << "prime factors of "<< n <<  " = {";
   auto vect = get_prime_factors(n);
   bool start = true; 
   for( auto  v : vect){
      if (start ) {
         start = false;  // no commas at start of seq
      }else{ 
         std::cout << ',';
      }
      if ( v.second != 1){
         std::cout << v.first << "^"  << v.second ;
      }else{
         std::cout << v.first  ;
      }
   }
   std::cout << "}\n";
}

int raw_int_from_factors(std::vector <std::pair<int,int> > const & factors)
{
   int accum = 1;
   for ( auto p : factors){
      accum *= std::pow(p.first,p.second);
   }
   return accum;
}

bool test(int n);

int main()
{
   show_factors( 1'055'060);

   show_factors (1'000);

   test( 1'055'060);
   test( 1'000);
}

bool test(int n)
{
   auto factors = get_prime_factors(n);

   int const n1 = raw_int_from_factors(factors);

   if ( n1 != n){
      std::cout << "Fail\n";
      return false;
   }
  return true;
}

@chiphogg
Collaborator Author

#323 lays the foundations! I don't seem to be able to select a reviewer or run the workflows, so I think I need @mpusz to approve moving it forward.

Sorry about the delay... it was humbling to realize just how dependent I am on my already-configured toolchains and environments at work. 😬 😆 I'm also used to Bazel rather than CMake. But, eventually, I got things turning over!

@chiphogg
Collaborator Author

I'm still working on this and making progress (although I don't tend to do much during the work week). Here's my rough plan right now, just to give people an idea of what to expect.

  • Get magnitude to parity with ratio.
    • Add a value<T> template to get the value in a given type T (as long as it's representable in T).
    • Add hidden friends num(Magnitude), denom(Magnitude), irrational_part(Magnitude), such that num(m) / denom(m) * irrational_part(m) == m.
  • Replace ratio template arguments with Magnitude auto ones.
    • The preceding work items should make this easier, because for purely rational magnitudes, we can freely and losslessly interconvert between ratio and Magnitude auto representations. And, since we're using ratio right now, that's the only kind we have!
    • This will probably be a single PR... but, in any case, the point of doing it this way is to let me take relatively small steps, keeping the build green at every stage.

After that, it should be easier to take stock of the uses of ratio::exp, and see if we can replace and get rid of them!

@chiphogg
Collaborator Author

chiphogg commented Mar 1, 2022

Status update: I got the ratio -> Magnitude migration fully working in my client---or, at least, near enough that I can see the full scope of the work that's required! I did have to comment out a (very) few unit tests, and there were a couple more that failed. But the important part is that I got over the hump, and got the first real proof of concept that this will actually work.

Here's what's left.

Paving the way

A lot of the (build or test) failures have to do with specific kinds of unit or quantity. This includes units whose Magnitude is hard for our current implementation to handle (mostly because of big prime numbers). It also includes quantities with non-standard Rep (i.e., not is_arithmetic).

These PRs will anticipate and solve problems that we would run into later. It will help keep the "real" PRs cleaner.

  • Use intmax for base of integral powers #333. This was stuck due to an apparently spurious failure. But it looks like 5668257 may have fixed that, so it's ready for review now!
  • Prime numbers. We need a more efficient way to compute them, or else we will hit constexpr loop limits and operation limits. I learned about wheel factorization, and got an implementation working where we can tune the basis size with a template parameter. This is ready for review, and I'll post the PR as soon as Use intmax for base of integral powers #333 lands.
  • The min_expl class causes problems for Magnitude, because all of our constraints so far have assumed is_arithmetic. A naive solution (switch to Representation) won't work, because it will cause a cycle. (basic_concepts.h, which houses Representation, doesn't yet depend on magnitude.h, but it needs to.)
    • This is the last "hard work" left to do IMO.

New features

  • We need a common_magnitude function, which generalizes common_ratio to work with Magnitudes. I have a working implementation of this which I can turn into a PR.
  • We also need to be able to convert a Magnitude into a ratio. This PR will add the numerator() and denominator() functions, and use them to create an implicit conversion operator to ratio (which hard-errors if we try to use it with an irrational Magnitude). Later on we might remove the implicit conversion operator.

The big enchilada

  • The final PR cuts over from ratio to Magnitude! It's necessarily pretty big, but it's single-purpose and not so bad to review.
    • Note that several format tests fail, because we produce a different representation of the same number. (For example, 1 / 6 * 10^2 became 5 / 3 * 10^1.) I think it would be OK to just update the tests, as there are only 3--4 of them and the current representation is not obviously better. However, if we really need to, I'm sure we can figure out how to reproduce the old results.

I hope this status update gives some much-needed big picture context as I prepare the PRs for review. I am trying to keep each one single-purpose to reduce cognitive load for the reviewer. The end is in sight! And then the cleanup will be unblocked. 🙂

@JohelEGP
Collaborator

JohelEGP commented Mar 4, 2022

> Note that several format tests fail, because we produce a different representation of the same number. (For example, 1 / 6 * 10^2 became 5 / 3 * 10^1.) I think it would be OK to just update the tests, as there are only 3--4 of them and the current representation is not obviously better. However, if we really need to, I'm sure we can figure out how to reproduce the old results.

Changing it should be OK. The documentation repeats this a few times:

> Important
> Always remember to explicitly cast the quantity to the destination unit with quantity_cast before calling quantity::number()!
> -- https://mpusz.github.io/units/framework/conversions_and_casting.html?highlight=cast

@JohelEGP
Collaborator

@chiphogg
Collaborator Author

chiphogg commented Aug 6, 2022

> I also found
>
> https://github.com/mpusz/units/blob/3710b249f3e478537f200903d6f16aed94eca77e/test/unit_test/runtime/fmt_units_test.cpp#L319-L332
>
> as a follow-up clean-up.

Nice catch. That'll be non-trivial, because there's separate work in first developing a policy (i.e., deciding what the "right answer" should be). But at least with #386 now there's a home for that discussion.

chiphogg added a commit to aurora-opensource/au that referenced this issue Jul 28, 2023
We'll provide 3 tiers: full support, best effort, and assumed support.
The new doc explains what these 3 tiers mean.

The motivation behind this division is in part due to my experience
working on mpusz/mp-units#300.  That project got stalled for multiple
months because I couldn't iterate locally on an issue we encountered
with the MSVC compiler (which turned out to be a compiler bug!).
Therefore, the compilers which _do_ let us iterate locally get the
highest level of support.  Next up is "anything we can make a CI job
for", including several Windows/MSVC configurations.  We plan to keep
these passing always, although a time might come when something in this
tier gets dropped because it's too painful to maintain.  The final tier
is "assumed support", which is just what it sounds like.

I also added the new badges to the README.md.  I thought about adding a
section for compiler support, but it didn't flow well, and besides,
everybody understands more or less what the badges mean --- we can
revisit this if and when any of them ever start failing.

Fixes #144.
chiphogg added a commit to aurora-opensource/au that referenced this issue Aug 19, 2023
Co-authored-by: johnzoppina <[email protected]>