-
Notifications
You must be signed in to change notification settings - Fork 5k
BigInteger deficiencies and solutions #29193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@verelpode. Thanks for taking the time to log this. It would be nice if the original issue could be broken up into a few concrete proposals that follow the "good example" template given here: https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/api-review-process.md#steps. This will help us prioritize and review the individual API proposals as necessary. It's probably worth mentioning that I've given some thought to the issues frequently brought up around
I want to think about this one some more as there have also been some discussions around exposing proper
We already have internal methods that do this and public versions that take/return
I'm generally okay with exposing methods that are
This shouldn't need an API review and it should be fine to submit a PR (bonus points for including benchmark numbers).
I think being able to get the length of a
I think this will largely be resolved by the
I want to think about this more as this is also something that comes up for things like The root problem is that when performing a operations over an array, it can be very inefficient to walk the array twice when the operations could instead be merged into the same "loop". This can be partially solved by exposing specialized operations, but this can quickly bloat the exposed API surface area.... It would be nice if we had a centralized place that handled these or if their was some way to tell the runtime that two operations could be "merged" together... |
Hi Tanner, thanks for the quick reply. Here's my thoughts on those issues:
Indeed BigInteger.Builder would make BigInteger much more powerful, especially when an algorithm runs in a loop and must perform many iterations and each iteration modifies/mutates the same local variables of type BigInteger.Builder.
Would be useful. One useful ability of Int128 is that when you multiply, add, or subtract two Int64 operands to produce a Int128 result, it is guaranteed not to overflow. Sometimes it is desirable to check if the result no longer fits in Int64 and then perform a different action using the Int128 value, instead of aborting the program or method with OverflowException. Catching OverflowException is not a good solution because exceptions are relatively expensive to throw and catch, and OverflowException still doesn't provide the result of the multiply/add/subtract. ToArray / ToByteArray / ToUInt32Array
Currently, it is problematic when you want to move data into or out of BigInteger. Although public BigInteger.ToByteArray() already exists, unfortunately it creates a new instance of the array. I'd prefer to be able to do it like this example:
Thus the following methods and properties would be created in BigInteger:
In that case, in order to improve cross-platform portability, I suggest that the sign be output as a boolean "is negative" parameter instead of outputting two's complement representation. New BigInteger from ArraySegmentI suggest creating the following constructors that would copy/convert an ArraySegment into a new BigInteger instance:
The input ArraySegment would contain simply the absolute/unsigned representation of the integer, not two's complement, and the sign would be provided separately via the boolean "isNegative" parameter. In my experience, a boolean "isNegative" parameter is easier to understand, easier to use, more reliable, and more portable (cross-platform) than using two's complement in a BigInteger or BigNumber or BigRational. TryConvertToX and ConvertToX
That's a good point, but here is another good point: That API pattern was designed before In other words, one can say that ideally/preferably a method should be designed as either a pure "function" or a "procedure", not a mix of both. I'm trying to adopt this modern principle, but admittedly in practice it is sometimes convenient to define return value and out params in the same method, even if it does kind of violate functional programming principles and C# async methods.
I agree.
ByteLength and BitLength
I think both are useful. ByteLength is especially useful in combination with ToByteArray/ToArray methods. And BitLength is useful in certain algorithms, for example:
Example implementation of BitLength:
Resulting Sign of Bitwise OperatorsCurrently the documentation says: That surprised me because I presumed that bitwise AND, OR, and XOR in BigInteger should not use two's complement, and instead should calculate the result sign by applying the same bitwise operator to the sign bits, meaning:
The same technique is applicable to BAND, BOR, and BXOR:
Correct me if I'm wrong, but I think this technique will produce the same sign as if two's complement is used, therefore you could switch over to this technique even without breaking compatibility with the existing behavior of the bitwise operators in BigInteger and in regular System.Int32. Admittedly I haven't fully tested this, but off the top of my head, I think I'm correct in saying the above. The existing version of BigInteger is already an excellent help and looks well-designed and well-implemented -- I like it a lot -- but a few extra methods as described above would increase the usability and performance of BigInteger, especially if we could eliminate the difficulty of efficiently moving data into and out of BigInteger, via the addition of constructors that take |
Exposing methods that fill an existing buffer, rather than creating a new one, sound fine and is something we provide in other types. Directly exposing the underlying array (to allow access without any copying) via a
Most APIs aren't taking or using
CoreFX isn't only designed for C#, it needs to work for multiple languages. I believe this has come up in API review in the past and the guidance is to still follow the current pattern which returns a
I agree both can be useful, but if you want the true bit length, then it is likely better exposed as the integral |
Here's one more issue that might interest you or other readers. Firstly, note that every new PC sold today (and every PC sold since the year 2000-and-something) has a 64-bit processor. Even "small" processors such as Intel Atom are 64-bit (some people think that Intel Atom processors are 32-bit but they're actually 64-bit and run 64-bit Windows well). Currently the BigInteger struct contains these 2 fields:
Thus the struct occupies a minimum of 32 + 64 = 96 bits but actually it occupies 128 bits because the array reference |
I've put some thought into modifying However, I also need to follow up on whether breaking binary serialization compatibility here is OK. |
That's a different issue than what I mentioned. I was talking about changing the
True true, I forgot that
I think such direct access should be permitted even if it is unsafe, because you can name the method "UnsafeGetInternalSpan" or property "UnsafeDirectSpan", thus the caller of "UnsafeDirectSpan" is clearly warned that it is their own responsibility to be careful when using direct access. If a programmer doesn't want to take the risk, then he can simply choose not to use the "UnsafeDirectSpan" property. Whereas if direct access is completely forbidden in the public API, then it is quite frustrating when the only thing stopping us from solving a special-case problem is "access denied". Thus in my opinion access shouldn't be denied, rather it should say "This is unsafe, not recommended, be careful, but it is allowed just in case you need it to solve a problem in a special case." However, considering that BigInteger is immutable, if the API returns ROSpan not Span, then it's safe anyway. Both could be permitted:
OK. In order to solve the problem that people don't know or don't remember that Log2 == bit length, the Log2 method could have a doc-comment that mentions in BigInteger currently contains an implementation of BitLengthOfUInt that can be replaced:
Unfortunately CoreFX or the CLR does not yet support use of the x86 BitScanForward and BitScanReverse instructions (see #32269), so in the meantime, a less-bad solution is to use the algorithm from "Hacker's Delight" by H.Warren:
|
Yes, and I think it would only be considered if we kept the The optimal solution would be to make both
I wouldn't consider exposing this on the readonly
We now have |
Sounds good, but does it cause a problem for you if you need to internally produce a 128bit result from the multiplication (or addition) of two 64bit array elements in _bits? With the current implementation where _bits is UInt32[], you can simply use UInt64 to multiply (or add) two elements with guaranteed success (without overflow). Would you lose this ability if you change the array to native int? Would you need the CLR to give you access to the CLMUL instruction? The CLMUL instruction computes the 128-bit carry-less product of two 64-bit values. If I recall correctly, x86 also has another instruction for 128-bit math but I forget the name. In other words, would
If you'd like to permit the internal
Thus the returned span would always be
Oh, that's new, great! Damn, apparently the reason I can't find it in the Object Browser is that it's not public (#35606). |
We will be exposing that via the hardware intrinsics feature in .NET Core 3.0: But otherwise, it is all fully implementable via software emulation (and will need to be for the non-x86 codepath). Same as 64-bit arithmetic on a 32-bit machine is done.
Not required, it would just help in some cases for computing the carry per iteration.
Right, it will be shipped first in .NET Core 3.0 and was only recently exposed. |
There is already a constructor that accepts If the goal is also convenience, then they could still be useful. |
Great! I didn't know that BigInteger has already advanced beyond the version that shipped with VS 2019 this week. What if the source array is
Should the "isBigEndian" parameter only exist for I would be satisfied if |
I've logged https://github.com/dotnet/corefx/issues/37204 proposing a |
@tannergooding |
There has been some discussion about BigDecimal at https://github.com/dotnet/corefx/issues/17267. Also, I suspect this comment from 2015 about the related BigRational type might be relevant:
|
dotnet/corefxlab#2635 tracks prototyping some of these types, like a BigDecimal type. There is a I would just ask you split off the type you plan on working on into it's own issue on corefxlab and try to capture a rough API shape. That issue can eventually be moved back to CoreFX for the API review when the type is done. |
I believe all points except the Also the |
@verelpode commented on Sun Apr 07 2019
System.Numerics.BigInteger is really great, except for a few issues that could be easily fixed.
Nasty conversion from 128-bit integer to BigInteger
Currently, in order to convert a 128-bit integer to BigInteger, it requires the following ugly workaround that creates 4 instances of BigInteger including the 4 internal array objects on the heap! When this conversion is performed many times in a loop, it creates a bunch of unnecessary garbage for GC.
I suggest adding the following optimized constructors to BigInteger:
Construct from array of UInt32
I'd also enjoy an efficient constructor that uses an array of UInt32 like the internal "_bits" field inside BigInteger:
The "data" parameter represents the words of an unsigned integer and the sign is supplied separately via the "isNegative" parameter.
By default, it would copy the "data" parameter to a new array instance, but if "useDirectly" is true, then it'd directly store the "data" reference to the internal "_bits" field inside BigInteger. This is quite helpful to enable various high-performance optimizations for situations where many math calculations are performed using BigInteger -- big number-crunching situations.
("useDirectly" would be defined as a preference to respect "if possible", not strict requirement, thus the implementation of BigInteger would still be permitted to copy "data" regardless of useDirectly, thus the BigInteger implementation is not required to forever keep "_bits" as UInt32 array in future versions.)
Copy to array of UInt32
For the opposite direction, it'd be great to have a method that copies "_bits" to the specified array:
Typecast but with failure reported as null instead of OverflowException
It'd be very useful to have the following "TryConvertToXXXX" methods that would attempt to typecast/convert the BigInteger without throwing System.OverflowException, unlike the existing typecast operators in BigInteger. The following methods should return null if conversion is unsucccessful (meaning return null if the BigInteger does not fit within the specified smaller integer type).
In the case of converting to unsigned UInt64, if the BigInteger is less than zero (UInt64.MinValue) or greater than UInt64.MaxValue, then TryConvertToUInt64 would return null.
For 128-bit integers, the following TryConvertToInt128 method would return true if the conversion is successful. The sign would be output separately via the boolean "outNegative" parameter (I've found that separated sign is quite useful), thus the "outLowPart" and "outHighPart" parameters are in-effect set to what would be the result of
BigInteger.Abs
.If desired, TryConvertToXXXX could also be supported for floating-point as follows.
Note TryConvertToFloat64 etc should only return exactly converted values. If the BigInteger would be inexactly/approx represented by float64 (System.Double), then null should be returned. If you want an inexact conversion to float64, then use the preexisting
public static explicit operator Double(BigInteger value)
that is already present in BigInteger. Alternatively, if desired, TryConvertToFloat64 might have a parameter to control exact versus inexact conversion, as follows:Tiny mistake in BigInteger.Abs, easily improved
The existing
BigInteger.Abs
method is useful but seems to be mistakenly implemented inefficiently:I think it should be changed to:
GetBitLength
A "GetBitLength" method would also be useful. It would return the number of binary bits consumed by an absolute/unsigned/positive representation of the binary integer (excluding any leading zero bits). The result is the same regardless of whether the BigInteger is positive or negative. If BigInteger.IsZero, then GetBitLength returns 0.
BigInteger result from 64-bit operands
It'd be good to have math operations that produce a BigInteger result from two 64-bit operands, as follows:
Two operations in one
Also worth considering:
@verelpode commented on Sun Apr 07 2019
The last one, MultiplyAdd, is intended to be like: https://en.wikipedia.org/wiki/Multiply–accumulate_operation
The text was updated successfully, but these errors were encountered: