Migrate magnitude to vector space representation #300
As discussion points in the meantime:
A major requirement for me is that it is comprehensible to people like myself who don't have a strong mathematical background. It needs to be simple and intuitive to extend the library with other units. Ideally it should be expressible concisely in code, to avoid very long error messages. I would suggest an example: how, for instance, would you create a unit representing the BTU in the SI system, and how would that be converted to joules in your scheme?
The page you linked gives a table with 6 different definitions for "BTU". I'm just going to pick the ISO one for concreteness, as it's the only one with a precise definition. I haven't yet worked in depth with mp-units code, but based on the docs, I assume it would look something like this:

```cpp
struct btu : named_scaled_unit<btu, "BTU", no_prefix, ratio(1'055'060, 1'000), si::joule> {};
```

Would syntax like this meet your needs?
Sure, but you don't seem to have innovated your own magnitude representation here? I picked this example since I assumed you would use pound and fahrenheit basis-vector magnitudes; otherwise I don't understand what you mean by basis vectors.
For a dimension, "basis vectors" are the base dimensions of the system (mass, length, angle, etc.). They're the things we raise to rational powers. We choose them such that every product of distinct rational powers of basis vectors gives a unique dimension. (So if we have length and time as basis vectors, we'd better not also include speed as one!)

For a magnitude, "basis vectors" are a set of real numbers. We choose them such that every product of distinct rational powers of basis vectors gives a unique real number. (So if we have 2 and 5 as basis vectors, we'd better not also include 10 as one!)

Of course, we don't want to ask users to do the work of decomposing their "magnitude vector" onto our basis vectors. As you rightly point out, that would be a terrible interface! So we let them provide something more intuitive. Will this exact syntax work? I think so, but I'm not 100% confident; at worst, we might end up with something slightly different.

Anyway, the good news is that the vector space representation gives us a strict superset of the magnitudes we can represent with a ratio (at least, the physically meaningful ones, i.e., the positive ones). That means we can present either the same interface, or a very similar interface, to our end users.
OK, so as a user I don't need to know about the maths. But I would still like to know how it works. :) So we have our rational number 1'055'060 / 1'000. How do we go about converting it to/from the "vector space representation"?
This is notional only, for exposition purposes, but let's say we have a template for a basis vector raised to a power. The first thing we would do is compute the prime factorization of the numerator and denominator; naturally, this can be done at compile time.
In the final result, the basis vectors are in their canonical order (increasing by value), and if any exponent had become zero, we would omit the corresponding basis vector. This preserves a unique representation for each magnitude. Note how the result is automatically "in lowest terms": if you form a numerator by grouping the positive powers, and likewise group the negative powers to form the denominator, you recover the original ratio reduced to lowest terms.
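The shape of that final result can be sketched notionally in code. The type and member names below are my own invention for exposition, not the actual mp-units types:

```cpp
#include <cmath>
#include <cstdint>

// Notional only: one basis vector (a prime) raised to a rational power Num/Den.
template<std::intmax_t Base, std::intmax_t Num, std::intmax_t Den = 1>
struct base_power {
  static double value() { return std::pow(double(Base), double(Num) / double(Den)); }
};

// A magnitude is an ordered product of base powers (canonical order: increasing base).
template<typename... BPs>
struct magnitude {
  static double value() { return (BPs::value() * ... * 1.0); }
};

// 1'055'060 / 1'000, decomposed and automatically in lowest terms:
// 2^-1 * 5^-2 * 71 * 743
using btu_mag = magnitude<base_power<2, -1>, base_power<5, -2>,
                          base_power<71, 1>, base_power<743, 1>>;
```

Every distinct magnitude gets exactly one spelling of this type, which is what makes equality comparison trivial.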
OK, I am starting to see the point. To try to understand what you are explaining, it works better for me to rewrite it in simple runtime C++ code:

```cpp
#include <cmath>
#include <iostream>

int main()
{
    // "compute the prime factorization of the numerator and denominator"
    // https://en.wikipedia.org/wiki/Integer_factorization
    int nume = std::pow(2, 2) * 5 * 71 * 743;
    std::cout << "nume == 1'055'060 == " << std::boolalpha << (nume == 1'055'060) << '\n';
    int denom = std::pow(2, 3) * std::pow(5, 3);
    std::cout << "denom == 1000 == " << std::boolalpha << (denom == 1000) << '\n';

    auto v0 = static_cast<double>(nume) / denom;
    // expand and simplify the expression
    auto v1 = (std::pow(2, 2) * 5 * 71 * 743) / (std::pow(2, 3) * std::pow(5, 3));
    auto v2 = (std::pow(2, 2) * 5 * 71 * 743) * std::pow(2, -3) * std::pow(5, -3);
    auto v3 = std::pow(2, -1) * std::pow(5, -2) * 71 * 743;

    // verify: these results should all be the same, within fp rounding
    std::cout << v0 << '\n';
    std::cout << v1 << '\n';
    std::cout << v2 << '\n';
    std::cout << v3 << '\n';
}
```

output:
Yes---nothing communicates like code! Nice illustration. 🙂
Next question (from reading Wikipedia): what factorization algorithms will you use? Also, this would be useful as a general-purpose entity representing a compile-time number, rather than as an mp-units-only entity --- in the same way as std::ratio (which was incorporated into the C++ standard because it was used by std::chrono::duration, but is generally useful).
Sorry for the delay---truth is, I'm rather behind on my CppCon slides, so I need to back-burner this until after my talk. Let me answer your questions before I do!
We use a recursive strategy which goes something like this.
For example, for N = 24 we get first factor f = 2 with multiplicity m = 3, because 2 divides 24 three times (24 = 2 * 2 * 2 * 3). Strictly speaking, one doesn't need the multiplicity, but it is an easy way to cut down on the number of template instantiations. To find the first factor of N, we start at 2, and then increment through the odd numbers until we find one that divides N evenly, stopping early (and returning N itself) once the square of the trial factor exceeds N. I'm sure this whole strategy could be made more efficient, but it doesn't seem to have any noticeable effect on compile times, so we haven't been motivated to speed it up.
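A runnable sketch of that first-factor search and multiplicity count (my own code, not the actual library implementation):

```cpp
#include <cstdint>

// Sketch of the strategy described above: try 2, then odd numbers, and stop
// early (n must be prime) once the trial factor's square exceeds n.
constexpr std::intmax_t find_first_factor(std::intmax_t n) {
  if (n % 2 == 0) return 2;
  for (std::intmax_t f = 3; f * f <= n; f += 2)
    if (n % f == 0) return f;
  return n;  // no factor up to sqrt(n): n is prime
}

// Multiplicity: how many times f divides n.
constexpr int multiplicity(std::intmax_t f, std::intmax_t n) {
  int m = 0;
  while (n % f == 0) { n /= f; ++m; }
  return m;
}

static_assert(find_first_factor(24) == 2);
static_assert(multiplicity(2, 24) == 3);  // 24 = 2 * 2 * 2 * 3
```

Because both functions are `constexpr`, the whole factorization can run at compile time, matching the recursive template strategy described above.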
Great idea---I hadn't thought of this! I would love to see what other people do with it once we get a reference implementation into the library.
OK, I'm out of time today. FWIW, here is where I got with it, based on https://en.wikipedia.org/wiki/Trial_division:

```cpp
#include <algorithm>
#include <cmath>
#include <iostream>
#include <utility>
#include <vector>

// https://en.wikipedia.org/wiki/Trial_division
std::vector<int> TrialDivision(int n)
{
    std::vector<int> v;
    while (n % 2 == 0) { v.push_back(2); n /= 2; }
    while (n % 3 == 0) { v.push_back(3); n /= 3; }
    int f = 5;
    int ac = 9;    // tracks f^2, maintained incrementally
    int temp = 16;
    do {
        ac += temp;
        if (ac > n) break;
        if (n % f == 0) {
            v.push_back(f);
            n /= f;
            ac -= temp;
        } else {
            f += 2;
            temp += 8;
        }
    } while (true);
    if (n != 1) v.push_back(n);
    return v;
}

// reduce to pairs of {factor, power}
std::vector<std::pair<int, int>> get_prime_factors(int n)
{
    auto raw = TrialDivision(n);
    std::vector<std::pair<int, int>> res;
    while (!raw.empty()) {
        auto const v = raw[0];
        auto const p = static_cast<int>(std::count(raw.begin(), raw.end(), v));
        raw.erase(raw.begin(), raw.begin() + p);
        res.push_back({v, p});
    }
    return res;
}

// output
void show_factors(int n)
{
    std::cout << "prime factors of " << n << " = {";
    auto vect = get_prime_factors(n);
    bool start = true;
    for (auto v : vect) {
        if (start) {
            start = false;  // no comma at start of sequence
        } else {
            std::cout << ',';
        }
        if (v.second != 1) {
            std::cout << v.first << "^" << v.second;
        } else {
            std::cout << v.first;
        }
    }
    std::cout << "}\n";
}

int raw_int_from_factors(std::vector<std::pair<int, int>> const& factors)
{
    int accum = 1;
    for (auto p : factors) {
        accum *= static_cast<int>(std::pow(p.first, p.second));
    }
    return accum;
}

bool test(int n)
{
    auto factors = get_prime_factors(n);
    int const n1 = raw_int_from_factors(factors);
    if (n1 != n) {
        std::cout << "Fail\n";
        return false;
    }
    return true;
}

int main()
{
    show_factors(1'055'060);
    show_factors(1'000);
    test(1'055'060);
    test(1'000);
}
```
#323 lays the foundations! I don't seem to be able to select a reviewer or run the workflows, so I think I need @mpusz to approve moving it forward. Sorry about the delay... it was humbling to realize just how dependent I am on my already-configured toolchains and environments at work. 😬 😆 I'm also used to Bazel rather than CMake. But eventually I got things turning over!
I'm still working on this and making progress (although I don't tend to do much during the work week). Here's my rough plan right now, just to give people an idea of what to expect.
After that, it should be easier to take stock of the remaining uses.
Status update: here's what's left.

Paving the way

A lot of the build or test failures have to do with specific kinds of unit or quantity. This includes units whose Magnitude is hard for our current implementation to handle (mostly because of big prime numbers). It also includes quantities with a non-standard Rep. These PRs will anticipate and solve problems that we would run into later; they will help keep the "real" PRs cleaner.
New features
The big enchilada
I hope this status update gives some much-needed big-picture context as I prepare the PRs for review. I am trying to keep each one single-purpose to reduce cognitive load for the reviewer. The end is in sight! And then the cleanup will be unblocked. 🙂
Changing it should be OK. The documentation repeats this a few times:
I also found https://github.com/mpusz/units/blob/3710b249f3e478537f200903d6f16aed94eca77e/test/unit_test/runtime/fmt_units_test.cpp#L319-L332 as a follow-up clean-up.
Nice catch. That'll be non-trivial, because there's separate work in first developing a policy (i.e., deciding what the "right answer" should be). But at least with #386 now there's a home for that discussion. |
We'll provide 3 tiers: full support, best effort, and assumed support. The new doc explains what these 3 tiers mean. The motivation behind this division is in part due to my experience working on mpusz/mp-units#300. That project got stalled for multiple months because I couldn't iterate locally on an issue we encountered with the MSVC compiler (which turned out to be a compiler bug!). Therefore, the compilers which _do_ let us iterate locally get the highest level of support. Next up is "anything we can make a CI job for", including several Windows/MSVC configurations. We plan to keep these passing always, although a time might come when something in this tier gets dropped because it's too painful to maintain. The final tier is "assumed support", which is just what it sounds like. I also added the new badges to the README.md. I thought about adding a section for compiler support, but it didn't flow well, and besides, everybody understands more or less what the badges mean --- we can revisit this if and when any of them ever start failing. Fixes #144.
The magnitude of a unit is a positive real number, and we need to pick a representation for it. Units are closed under products and rational powers, so the representation should support these operations well. Here's my proposal for the requirements for this representation.
The most common solution is a "ratio", or some variant of this. This could never meet the requirements, because ratios are not closed under rational powers.
Boost units satisfies the first requirement, but fails the second: it's easy to create multiple distinct representations for the same number.
The only solution I know which satisfies these requirements is the vector space representation. Each magnitude is represented as the product of rational powers of basis vectors. (If we take this to log space, we have sums of rational scalings of basis vectors, and our vector space over the rationals becomes apparent.)
This is actually the same approach we use for dimensions already, so we automatically gain the ability to support everything that dimensions can!
So: what are the basis vectors? We start with the prime numbers, because the product of rational powers of any finite number of primes is unique. For each magnitude we want to represent, we can follow a simple algorithm.
This is something of an "existence proof" that this approach can meet our needs. In practice, we'll want to be judicious about which basis vectors we add. (When we go to add angular degrees, we'll be better off choosing pi as a basis vector, rather than (pi / 180)!)
This will bring at least three significant benefits:
We know this works well in practice, because we've been using it in Aurora's units library from the start and haven't seen any bugs. Compile time is not a downside: we measured the impact using "adversarial" magnitudes with dozens of basis vectors, and the slowdown was measurable but not subjectively noticeable. The only drawback is that the type name for a magnitude is more verbose (because everything is represented in its prime factorization). This shouldn't be an issue for mp-units, because downcasting will prevent end users from seeing the magnitude type names anyway.
I initially thought we were the first to invent this, but I later found that the benri library had already done so before us (nice work @jansende! 😎).
As discussed on #71, we're loosely planning to start work on this sometime in early November.