Skip to content

BigInteger deficiencies and solutions #29193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
ahsonkhan opened this issue Apr 8, 2019 · 16 comments
Open

BigInteger deficiencies and solutions #29193

ahsonkhan opened this issue Apr 8, 2019 · 16 comments

Comments

@ahsonkhan
Copy link
Contributor

@verelpode commented on Sun Apr 07 2019

System.Numerics.BigInteger is really great, except for a few issues that could be easily fixed.

Nasty conversion from 128-bit integer to BigInteger

Currently, in order to convert a 128-bit integer to BigInteger, it requires the following ugly workaround that creates 4 instances of BigInteger including the 4 internal array objects on the heap! When this conversion is performed many times in a loop, it creates a bunch of unnecessary garbage for GC.

public static BigInteger UInt128ToBigInteger(UInt64 lowPart, UInt64 highPart)
{
	if (highPart == 0) return new BigInteger(lowPart);
	return (new BigInteger(highPart) << 64) | new BigInteger(lowPart);
}

public static BigInteger SInt128ToBigInteger(UInt64 lowPart, Int64 highPart)
{
	if (highPart == 0) return new BigInteger(lowPart);
	if (highPart == -1) return BigInteger.Negate(new BigInteger(unchecked((~lowPart) + 1)));
	return (new BigInteger(highPart) << 64) | new BigInteger(lowPart);
}

I suggest adding the following optimized constructors to BigInteger:

// For unsigned 128-bit integer:
public BigInteger(UInt64 lowPart, UInt64 highPart) { ... }

// For signed 128-bit integer:
public BigInteger(UInt64 lowPart, Int64 highPart) { ... }

// For 128-bit integer with sign supplied separately (this is useful):
public BigInteger(UInt64 lowPart, UInt64 highPart, bool isNegative) { ... }

Construct from array of UInt32

I'd also enjoy an efficient constructor that uses an array of UInt32 like the internal "_bits" field inside BigInteger:

public BigInteger(UInt32[] data, bool isNegative, bool useDirectly = false) { ... }

The "data" parameter represents the words of an unsigned integer and the sign is supplied separately via the "isNegative" parameter.
By default, it would copy the "data" parameter to a new array instance, but if "useDirectly" is true, then it'd directly store the "data" reference to the internal "_bits" field inside BigInteger. This is quite helpful to enable various high-performance optimizations for situations where many math calculations are performed using BigInteger -- big number-crunching situations.
("useDirectly" would be defined as a preference to respect "if possible", not strict requirement, thus the implementation of BigInteger would still be permitted to copy "data" regardless of useDirectly, thus the BigInteger implementation is not required to forever keep "_bits" as UInt32 array in future versions.)

Copy to array of UInt32

For the opposite direction, it'd be great to have a method that copies "_bits" to the specified array:

// "destinationOffset" is offset/ordinal/index within "destinationArray".
/// <returns>Length copied to "destinationArray".  Returns 0 if BigInteger.IsZero.</returns>
public int ToArray(uint[] destinationArray, int destinationOffset, out bool outNegative)
{
	outNegative = (_sign < 0);
	...
	if (_bits is null)
	{
		if (_sign == 0) return 0;
		destinationArray[destinationOffset] = unchecked((uint)Math.Abs(_sign));
		return 1;
	}
	...
}

Typecast but with failure reported as null instead of OverflowException

It'd be very useful to have the following "TryConvertToXXXX" methods that would attempt to typecast/convert the BigInteger without throwing System.OverflowException, unlike the existing typecast operators in BigInteger. The following methods should return null if conversion is unsucccessful (meaning return null if the BigInteger does not fit within the specified smaller integer type).

public Int32? TryConvertToInt32() { ... }
public Int64? TryConvertToInt64() { ... }
public UInt32? TryConvertToUInt32() { ... }
public UInt64? TryConvertToUInt64() { ... }

In the case of converting to unsigned UInt64, if the BigInteger is less than zero (UInt64.MinValue) or greater than UInt64.MaxValue, then TryConvertToUInt64 would return null.

For 128-bit integers, the following TryConvertToInt128 method would return true if the conversion is successful. The sign would be output separately via the boolean "outNegative" parameter (I've found that separated sign is quite useful), thus the "outLowPart" and "outHighPart" parameters are in-effect set to what would be the result of BigInteger.Abs.

public bool TryConvertToInt128(out UInt64 outLowPart, out UInt64 outHighPart, out bool outNegative)
{
	outNegative = (_sign < 0);
	...
}

// Conversion to unsigned 128-bit integer:
public bool TryConvertToUInt128(out UInt64 outLowPart, out UInt64 outHighPart)
{
	bool neg;
	return (this.TryConvertToInt128(out outLowPart, out outHighPart, out neg) && !neg);
}

// Conversion to signed 128-bit integer with high part represented as signed 64-bit integer:
public bool TryConvertToInt128(out UInt64 outLowPart, out Int64 outHighPart)
{
	UInt64 unsignedHighPart;
	bool neg;
	if (this.TryConvertToUInt128(out outLowPart, out unsignedHighPart, out neg))
	{
		if (neg)
		{
			if (unsignedHighPart <= unchecked((UInt64)Int64.MinValue))
			{
				outHighPart = unchecked(-(Int64)unsignedHighPart);
				return true; // success
			}
		}
		else if ((unsignedHighPart >> 64) == 0)
		{
			outHighPart = unchecked((Int64)unsignedHighPart);
			return true; // success
		}
	}
	outLowPart = 0;
	outHighPart = 0;
	return false; // unsuccessful
}

If desired, TryConvertToXXXX could also be supported for floating-point as follows.

public double? TryConvertToFloat64() { ... }
public single? TryConvertToFloat32() { ... }
public decimal? TryConvertToDecimal() { ... }

Note TryConvertToFloat64 etc should only return exactly converted values. If the BigInteger would be inexactly/approx represented by float64 (System.Double), then null should be returned. If you want an inexact conversion to float64, then use the preexisting public static explicit operator Double(BigInteger value) that is already present in BigInteger. Alternatively, if desired, TryConvertToFloat64 might have a parameter to control exact versus inexact conversion, as follows:

public double? TryConvertToFloat64(bool approximation = false) { ... }
public single? TryConvertToFloat32(bool approximation = false) { ... }
public decimal? TryConvertToDecimal(bool approximation = false) { ... }

Tiny mistake in BigInteger.Abs, easily improved

The existing BigInteger.Abs method is useful but seems to be mistakenly implemented inefficiently:

public static BigInteger Abs(BigInteger value)
{
    return (value >= BigInteger.Zero) ? value : -value;
}

I think it should be changed to:

public static BigInteger Abs(BigInteger value)
{
    return (value._sign < 0) ? -value : value;
}

GetBitLength

A "GetBitLength" method would also be useful. It would return the number of binary bits consumed by an absolute/unsigned/positive representation of the binary integer (excluding any leading zero bits). The result is the same regardless of whether the BigInteger is positive or negative. If BigInteger.IsZero, then GetBitLength returns 0.

public uint GetBitLength() { ... }

BigInteger result from 64-bit operands

It'd be good to have math operations that produce a BigInteger result from two 64-bit operands, as follows:

public static BigInteger Add(Int64 left, Int64 right) { ... }
public static BigInteger Add(UInt64 left, UInt64 right) { ... }
public static BigInteger Subtract(Int64 left, Int64 right) { ... }
public static BigInteger Subtract(UInt64 left, UInt64 right) { ... }
public static BigInteger Multiply(Int64 left, Int64 right) { ... }
public static BigInteger Multiply(UInt64 left, UInt64 right) { ... }
// Divide(Int64,Int64) is not applicable.

// Already exists:
public static BigInteger Add(BigInteger left, BigInteger right) { ... }
public static BigInteger Subtract(BigInteger left, BigInteger right) { ... }
public static BigInteger Multiply(BigInteger left, BigInteger right) { ... }

Two operations in one

Also worth considering:

public static BigInteger MultiplyAdd(BigInteger inOperandA, BigInteger inOperandB, BigInteger inOperandC)
{
	// To be optimized now or in future with use of System.Numerics.BigIntegerBuilder.
	return (inOperandA * inOperandB) + inOperandC;
}

public static BigInteger MultiplyAdd(BigInteger inOperandA, sint64 inOperandB, sint64 inOperandC)
{
	// To be optimized now or in future with use of System.Numerics.BigIntegerBuilder.
	return (inOperandA * inOperandB) + inOperandC;
}


@verelpode commented on Sun Apr 07 2019

The last one, MultiplyAdd, is intended to be like: https://en.wikipedia.org/wiki/Multiply–accumulate_operation

@tannergooding
Copy link
Member

@verelpode. Thanks for taking the time to log this.

It would be nice if the original issue could be broken up into a few concrete proposals that follow the "good example" template given here: https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/api-review-process.md#steps. This will help us prioritize and review the individual API proposals as necessary.

It's probably worth mentioning that I've given some thought to the issues frequently brought up around BigInteger and I've been meaning to create a proposal around it, but haven't managed to find the time to do so yet. The gist behind that proposal would be extending BigInteger to have a BigInteger.Builder type. This type would be mutable, growable, and follow the existing pattern we have for other immutable structs. It should help resolve many of the allocation problems people have when performing multiple operations with BigInteger.

Nasty conversion from 128-bit integer to BigInteger

I want to think about this one some more as there have also been some discussions around exposing proper Int128/UInt128 types (an issue tracking us prototyping them can be seen here: dotnet/corefxlab#2635). We might also be able to largely solve this issue using the builder pattern.

Construct from array of UInt32
Copy to array of UInt32

We already have internal methods that do this and public versions that take/return byte. I think these are probably fine to make public provided we can do this in a safe cross-platform/architecture manner.

Typecast but with failure reported as null instead of OverflowException

I'm generally okay with exposing methods that are TryConvertToX, but they should follow our existing API pattern for this; which is to return a bool and have the result in an out parameter. We should likely also consider exposing plain ConvertToX methods for contrast (as I believe the non-throwing methods would be the explicit casts otherwise).

Tiny mistake in BigInteger.Abs, easily improved

This shouldn't need an API review and it should be fine to submit a PR (bonus points for including benchmark numbers).

GetBitLength

I think being able to get the length of a BigInteger is ok. However, it should probably be in bytes for consistency with other APIs and since that is the smallest size we can deal with anyways.

BigInteger result from 64-bit operands

I think this will largely be resolved by the Builder pattern I mentioned at the top.

Two operations in one

I want to think about this more as this is also something that comes up for things like Tensor, vectorization, or machine learning. We might also be able to handle this in the Builder pattern by exposing the underlying data in some manner...

The root problem is that when performing a operations over an array, it can be very inefficient to walk the array twice when the operations could instead be merged into the same "loop". This can be partially solved by exposing specialized operations, but this can quickly bloat the exposed API surface area....

It would be nice if we had a centralized place that handled these or if their was some way to tell the runtime that two operations could be "merged" together...

@verelpode
Copy link

Hi Tanner, thanks for the quick reply. Here's my thoughts on those issues:

The gist behind that proposal would be extending BigInteger to have a BigInteger.Builder type. This type would be mutable, growable,

Indeed BigInteger.Builder would make BigInteger much more powerful, especially when an algorithm runs in a loop and must perform many iterations and each iteration modifies/mutates the same local variables of type BigInteger.Builder.

exposing proper Int128/UInt128 types

Would be useful. One useful ability of Int128 is that when you multiply, add, or subtract two Int64 operands to produce a Int128 result, it is guaranteed not to overflow. Sometimes it is desirable to check if the result no longer fits in Int64 and then perform a different action using the Int128 value, instead of aborting the program or method with OverflowException. Catching OverflowException is not a good solution because exceptions are relatively expensive to throw and catch, and OverflowException still doesn't provide the result of the multiply/add/subtract.

ToArray / ToByteArray / ToUInt32Array

We already have internal methods that do this and public versions that take/return byte.

Currently, it is problematic when you want to move data into or out of BigInteger. Although public BigInteger.ToByteArray() already exists, unfortunately it creates a new instance of the array. I'd prefer to be able to do it like this example:

BigInteger a = ...;
BigInteger b = ...;
BigInteger c = ...;
int bufferLen = a.ByteLength + b.ByteLength + c.ByteLength;
byte[] buffer = new byte[bufferLen];
int currentOffset = 0;
int integerLen;
a.ToArray(buffer, currentOffset, out integerLen);
currentOffset += integerLen;
b.ToArray(buffer, currentOffset, out integerLen);
currentOffset += integerLen;
c.ToArray(buffer, currentOffset, out integerLen);
currentOffset += integerLen;

Thus the following methods and properties would be created in BigInteger:

public int ByteLength { get; }
public int BitLength { get; }
public void ToArray(byte[] destinationArray, int destinationOffset, out int outLengthWritten, out bool outNegative);
public void ToArray(UInt32[] destinationArray, int destinationOffset, out int outLengthWritten, out bool outNegative);
// And if desired, could also support UInt64 array:
public void ToArray(UInt64[] destinationArray, int destinationOffset, out int outLengthWritten, out bool outNegative);

provided we can do this in a safe cross-platform/architecture manner.

In that case, in order to improve cross-platform portability, I suggest that the sign be output as a boolean "is negative" parameter instead of outputting two's complement representation.

New BigInteger from ArraySegment

I suggest creating the following constructors that would copy/convert an ArraySegment into a new BigInteger instance:

public BigInteger(System.ArraySegment<byte> data, bool isNegative) { ... }

// Maybe also support both little-endian and big-endian via an "isBigEndian" parameter:
public BigInteger(System.ArraySegment<byte> data, bool isNegative, bool isBigEndian = false) { ... }

// More efficient than byte array is to use UInt32 array:
public BigInteger(System.ArraySegment<UInt32> data, bool isNegative) { ... }

// If desired, could also support UInt64 array:
public BigInteger(System.ArraySegment<UInt64> data, bool isNegative) { ... }

The input ArraySegment would contain simply the absolute/unsigned representation of the integer, not two's complement, and the sign would be provided separately via the boolean "isNegative" parameter. In my experience, a boolean "isNegative" parameter is easier to understand, easier to use, more reliable, and more portable (cross-platform) than using two's complement in a BigInteger or BigNumber or BigRational.

TryConvertToX and ConvertToX

I'm generally okay with exposing methods that are TryConvertToX, but they should follow our existing API pattern for this; which is to return a bool and have the result in an out parameter.

That's a good point, but here is another good point: That API pattern was designed before System.Nullable<T> existed, and also before C# supported the ability to make a function return multiple result values with the help of ValueTuple. If the API pattern was designed today, then I believe people would use either Nullable<T> or return ValueTuple, instead of the old style of return bool + out parameter. It is also interesting to note that when any method has both a return value and one or more output parameters, then it is kind of a violation of functional programming principles and C# async methods that return Task<T>. Generally speaking, at least in theory, an API is more modern or more "functional" when it doesn't define a return value AND out parameters in the same method: It should use EITHER out parameters or a return value or multiple return values via ValueTuple.

In other words, one can say that ideally/preferably a method should be designed as either a pure "function" or a "procedure", not a mix of both. I'm trying to adopt this modern principle, but admittedly in practice it is sometimes convenient to define return value and out params in the same method, even if it does kind of violate functional programming principles and C# async methods.

We should likely also consider exposing plain ConvertToX methods for contrast

I agree.

public Int64 ConvertToInt64()
{
    Int64? nv = this.TryConvertToInt64();
    if (nv.HasValue) return nv.GetValueOrDefault();
    throw new System.OverflowException();
}

ByteLength and BitLength

I think being able to get the length of a BigInteger is ok. However, it should probably be in bytes for consistency with other APIs and since that is the smallest size we can deal with anyways.

I think both are useful. ByteLength is especially useful in combination with ToByteArray/ToArray methods. And BitLength is useful in certain algorithms, for example:

/// <summary>Calculates the integer part of logarithm of inNumber or Floor(Log(inBase,inNumber)).</summary>
public static uint64 IntegerLogarithm(uint32 inBase, BigInteger inNumber)
{
    ...
    if (inBase == 2)
        return inNumber.BitLength - 1; // Floor(Log(2,number)) == number.BitLength-1 == IntegerLength(number,2)-1
    ...
}

Example implementation of BitLength:

public int BitLength
{
    get {
        int sourceSign = this._sign;
        if (sourceSign == 0) return 0;
        uint[] sourceArray = this._bits;
        if (sourceArray != null)
        {
            int lastOrd = GetMinimumWordCount(sourceArray) - 1; // excluding leading zero words.
            if (lastOrd < 0) return 0;
            int lastWordBitCount = 32 - CountLeadingZeroBits32(sourceArray[lastOrd]);
            return checked( (lastOrd * 32) + lastWordBitCount );
        }
        uint absSmallValue;
        if (sourceSign < 0) absSmallValue = unchecked((uint)(-sourceSign));
        else absSmallValue = unchecked((uint)sourceSign);
        if (absSmallValue == 1) return 1;
        return unchecked(32 - CountLeadingZeroBits32(absSmallValue));
    }
}

Resulting Sign of Bitwise Operators

Currently the documentation says:
"The BitwiseAnd method performs the bitwise And operation on two BigInteger values as if they were both in two's complement representation with virtual sign extension."

That surprised me because I presumed that bitwise AND, OR, and XOR in BigInteger should not use two's complement, and instead should calculate the result sign by applying the same bitwise operator to the sign bits, meaning:

  1. Calculate the boolean "is negative" status of operand A.
  2. Calculate the boolean "is negative" status of operand B.
  3. bool resultIsNegative = aIsNegative & bIsNegative;

The same technique is applicable to BAND, BOR, and BXOR:

bool bandResultIsNegative = aIsNegative & bIsNegative;
bool borResultIsNegative  = aIsNegative | bIsNegative;
bool bxorResultIsNegative = aIsNegative ^ bIsNegative;

Correct me if I'm wrong, but I think this technique will produce the same sign as if two's complement is used, therefore you could switch over to this technique even without breaking compatibility with the existing behavior of the bitwise operators in BigInteger and in regular System.Int32. Admittedly I haven't fully tested this, but off the top of my head, I think I'm correct in saying the above.

The existing version of BigInteger is already an excellent help and looks well-designed and well-implemented -- I like it a lot -- but a few extra methods as described above would increase the usability and performance of BigInteger, especially if we could eliminate the difficulty of efficiently moving data into and out of BigInteger, via the addition of constructors that take ArraySegment<UInt32> and a ToArray method that doesn't create a new array instance, and use of efficient UInt32 array instead of unnecessary extra cost incurred for conversion between byte array and internal UInt32 array.

@tannergooding
Copy link
Member

ToArray / ToByteArray / ToUInt32Array

Exposing methods that fill an existing buffer, rather than creating a new one, sound fine and is something we provide in other types. Directly exposing the underlying array (to allow access without any copying) via a ROSpan<byte> might also be possible...

New BigInteger from ArraySegment

Most APIs aren't taking or using ArraySegment, they are using Span<T> instead. We would likely do the same here.

TryConvertToX and ConvertToX

CoreFX isn't only designed for C#, it needs to work for multiple languages. I believe this has come up in API review in the past and the guidance is to still follow the current pattern which returns a bool and has an out T. This also avoids some other problems that Nullable<T> can have, such as no longer allowing atomic access to the underlying value.

ByteLength and BitLength

I agree both can be useful, but if you want the true bit length, then it is likely better exposed as the integral Log2.

@verelpode
Copy link

Here's one more issue that might interest you or other readers. Firstly, note that every new PC sold today (and every PC sold since the year 2000-and-something) has a 64-bit processor. Even "small" processors such as Intel Atom are 64-bit (some people think that Intel Atom processors are 32-bit but they're actually 64-bit and run 64-bit Windows well).

Currently the BigInteger struct contains these 2 fields:

public struct BigInteger
{
    internal int _sign;
    internal uint[] _bits;
}

Thus the struct occupies a minimum of 32 + 64 = 96 bits but actually it occupies 128 bits because the array reference _bits needs to be aligned to a 64-bit boundary in RAM. Considering that the struct normally occupies 128 bits already, this means that 32 bits are currently unused/wasted, so why not take advantage of these unused 32 bits and change the _sign field from Int32 to Int64? The struct would continue to consume the same size, 128 bits, but with the advantage that BigInteger would become capable of representing larger integers inline before resorting to creating an array object. What do you think?

@tannergooding
Copy link
Member

I've put some thought into modifying BigInteger to operate on native int (so UInt64 on 64-bits and UInt32 on 32-bits) and believe it would allow some measurable perf gains.

However, I also need to follow up on whether breaking binary serialization compatibility here is OK.

@verelpode
Copy link

I've put some thought into modifying BigInteger to operate on native int

That's a different issue than what I mentioned. I was talking about changing the _sign field to Int64 regardless of whether the _bits field is UInt32[] or UInt64[], because the _bits field is a reference that causes 64-bit alignment regardless of whether it is a reference to UInt32[] or UInt64[].

Most APIs aren't taking or using ArraySegment, they are using Span instead.

True true, I forgot that Span<T> replaces ArraySegment<T>.

Directly exposing the underlying array (to allow access without any copying) via a ROSpan might also be possible...

I think such direct access should be permitted even if it is unsafe, because you can name the method "UnsafeGetInternalSpan" or property "UnsafeDirectSpan", thus the caller of "UnsafeDirectSpan" is clearly warned that it is their own responsibility to be careful when using direct access. If a programmer doesn't want to take the risk, then he can simply choose not to use the "UnsafeDirectSpan" property. Whereas if direct access is completely forbidden in the public API, then it is quite frustrating when the only thing stopping us from solving a special-case problem is "access denied".

Thus in my opinion access shouldn't be denied, rather it should say "This is unsafe, not recommended, be careful, but it is allowed just in case you need it to solve a problem in a special case."

However, considering that BigInteger is immutable, if the API returns ROSpan not Span, then it's safe anyway. Both could be permitted:

public ROSpan<uint> InternalBits { get; }
public Span<uint> UnsafeInternalBits { get; }

I agree both can be useful, but if you want the true bit length, then it is likely better exposed as the integral Log2.

OK. In order to solve the problem that people don't know or don't remember that Log2 == bit length, the Log2 method could have a doc-comment that mentions in <summary> or <remarks> that Log2 == bit length.

BigInteger currently contains an implementation of BitLengthOfUInt that can be replaced:

internal static int BitLengthOfUInt(uint x)
{
    int numBits = 0;
    while (x > 0)
    {
        x >>= 1;
        numBits++;
    }
    return numBits;
}

Unfortunately CoreFX or the CLR does not yet support use of the x86 BitScanForward and BitScanReverse instructions (see #32269), so in the meantime, a less-bad solution is to use the algorithm from "Hacker's Delight" by H.Warren:

public static int CountLeadingZeroBits32(uint32 inNumber)
{
    unchecked {
        int x, y, m, n;
        x = (int)inNumber;
        y = -(x >> 16);        // If left half of x is 0,
        m = (y >> 16) & 16;    // set n = 16.  If left half
        n = 16 - m;            // is nonzero, set n = 0 and
        x = x >> m;            // shift x right 16.
                               // Now x is of the form 0000xxxx.
        y = x - 0x100;         // If positions 8-15 are 0,
        m = (y >> 16) & 8;     // add 8 to n and shift x left 8.
        n = n + m;
        x = x << m;

        y = x - 0x1000;        // If positions 12-15 are 0,
        m = (y >> 16) & 4;     // add 4 to n and shift x left 4.
        n = n + m;
        x = x << m;

        y = x - 0x4000;        // If positions 14-15 are 0,
        m = (y >> 16) & 2;     // add 2 to n and shift x left 2.
        n = n + m;
        x = x << m;

        y = x >> 14;           // Set y = 0, 1, 2, or 3.
        m = y & ~(y >> 1);     // Set m = 0, 1, 2, or 2 resp.
        return (int)(n + 2 - m);
    }
}

@tannergooding
Copy link
Member

That's a different issue than what I mentioned. I was talking about changing the _sign field to Int64 regardless of whether the _bits field is UInt32[] or UInt64[], because the _bits field is a reference that causes 64-bit alignment regardless of whether it is a reference to UInt32[] or UInt64[].

Yes, and I think it would only be considered if we kept the _bits field the same size. Even if most modern computers support 64-bit, not every app runs in 64-bit and we don't want to regress perf for the 32-bit scenario.

The optimal solution would be to make both _sign and _bits be native int so that they can do the most efficient thing for the current platform/architecture.

public Span UnsafeInternalBits { get; }

I wouldn't consider exposing this on the readonly BigInteger struct as it breaks the semantics of the type. There would, instead, likely be a direct way to get a writeable Span from the Builder struct.

Unfortunately CoreFX or the CLR does not yet support use of the x86 BitScanForward and BitScanReverse instructions (see #27382), so in the meantime, a less-bad solution is to use the algorithm from "Hacker's Delight" by H.Warren:

We now have System.Numerics.BitOperations.Log2 which will do the efficient thing on modern hardware (it is implemented using lzcnt, where available). Supporting BitScanForward and BitScanReverse might happen in the future as part of future HWIntrinsic expansion, but that is to be proposed/reviewed/determined.

@verelpode
Copy link

The optimal solution would be to make both _sign and _bits be native int so that they can do the most efficient thing for the current platform/architecture.

Sounds good, but does it cause a problem for you if you need to internally produce a 128bit result from the multiplication (or addition) of two 64bit array elements in _bits? With the current implementation where _bits is UInt32[], you can simply use UInt64 to multiply (or add) two elements with guaranteed success (without overflow). Would you lose this ability if you change the array to native int? Would you need the CLR to give you access to the CLMUL instruction? The CLMUL instruction computes the 128-bit carry-less product of two 64-bit values. If I recall correctly, x86 also has another instruction for 128-bit math but I forget the name.

In other words, would native int[] _bits mean that you would require a hardware-optimized built-in Int128 type?

Directly exposing the underlying array (to allow access without any copying) via a ROSpan might also be possible...

If you'd like to permit the internal _bits field to be changed to a different array type in a future version of CoreFX, then the public API for direct access could be like this:

public int InternalElementSize    // returns either 4 or 8.
{
    get { return sizeof(_bits[0]); }
}

public ROSpan<byte> InternalBits { get { return new ROSpan<byte>(_bits); } }

//public Span<byte> UnsafeInternalBits { get { return new Span<byte>(_bits); } }

Thus the returned span would always be ROSpan<byte> regardless of whether internal _bits is an array of byte, UInt32, UInt64, or native int, and it is the caller's responsibility to read and respect the InternalElementSize property when interpreting the ROSpan<byte>.

We now have System.Numerics.BitOperations.Log2

Oh, that's new, great! Damn, apparently the reason I can't find it in the Object Browser is that it's not public (#35606).

@tannergooding
Copy link
Member

Would you need the CLR to give you access to the CLMUL instruction?

We will be exposing that via the hardware intrinsics feature in .NET Core 3.0: System.Runtime.Intrinsics.X86.Pclmulqdq.CarrylessMultiply.

But otherwise, it is all fully implementable via software emulation (and will need to be for the non-x86 codepath). Same as 64-bit arithmetic on a 32-bit machine is done.

In other words, would native int[] _bits mean that you would require a hardware-optimized built-in Int128 type?

Not required, it would just help in some cases for computing the carry per iteration.

Oh, that's new, great! Damn, apparently the reason I can't find it in the Object Browser is that it's not public (#35606).

Right, it will be shipped first in .NET Core 3.0 and was only recently exposed.

@svick
Copy link
Contributor

svick commented Apr 8, 2019

There is already a constructor that accepts Span<byte>. If the goal of the proposals for new ways of creating BigInteger is avoiding allocations, then I think they're mostly not necessary, since you should be able to implement them yourself using that.

If the goal is also convenience, then they could still be useful.

@verelpode
Copy link

There is already a constructor that accepts Span.

Great! I didn't know that BigInteger has already advanced beyond the version that shipped with VS 2019 this week.

What if the source array is UInt32[]? Can it still be passed to the constructor that accepts ROSpan<byte>? Thinking about the "isBigEndian" parameter and the cross-platform interaction with System.BitConverter.IsLittleEndian is giving me a headache. Even if it's possible to safely typecast UInt32[] to ROSpan<byte>, people are likely to make mistakes with this, therefore I suggest the addition of at least a ROSpan<UInt32> constructor. Potentially all of the following could be supported:

public BigInteger (ReadOnlySpan<byte> value, bool isUnsigned = false, bool isBigEndian = false);
public BigInteger (ReadOnlySpan<UInt32> value, bool isUnsigned = false);
public BigInteger (ReadOnlySpan<UInt64> value, bool isUnsigned = false);
public BigInteger (ReadOnlySpan<native int> value, bool isUnsigned = false);

Should the "isBigEndian" parameter only exist for ROSpan<byte>, not for the others? If ROSpan<UInt32> should have a "isBigEndian" parameter, then I believe it should only affect the order of the elements in the UInt32 array, not the order of the bytes within each UInt32.

I would be satisfied if ROSpan<UInt32> only supports little-end-first because I think I'm correct in saying that regardless of architecture nobody places uint32 array elements in big-endian order. The first element in a uint32 or uint64 array is always the little end of the integer AFAIK, even if the CPU is big-endian. Thus it seems that the "isBigEndian" parameter is only relevant for ROSpan<byte>.

@tannergooding
Copy link
Member

I've logged https://github.com/dotnet/corefx/issues/37204 proposing a BigInteger.Builder type.

@ygc369
Copy link

ygc369 commented Apr 26, 2019

@tannergooding
Java has BigInteger and BigDecimal, C# only has BigInteger. That's a pity. I hope C# to have BigDecimal API too.

@svick
Copy link
Contributor

svick commented Apr 26, 2019

@ygc369

There has been some discussion about BigDecimal at https://github.com/dotnet/corefx/issues/17267.

Also, I suspect this comment from 2015 about the related BigRational type might be relevant:

Right now I think it's fair to say that we don't have any concrete plans to port [BigRational] over to corefx. If you feel that there's a scenario you'd like to have this functionality for, please open a new issue and we can talk about the necessary shape of the library there, use cases, etc, perhaps using this old BigRational type as a starting point.

@tannergooding
Copy link
Member

dotnet/corefxlab#2635 tracks prototyping some of these types, like a BigDecimal type.

There is a System.Numerics.Experimental project in corefxlab and if someone wants to work on the initial prototype, I'd be happy to take it (for any of the types listed in the meta issue).

I would just ask you split off the type you plan on working on into it's own issue on corefxlab and try to capture a rough API shape. That issue can eventually be moved back to CoreFX for the API review when the type is done.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the Future milestone Feb 1, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label Jul 6, 2020
@PaulusParssinen
Copy link
Contributor

PaulusParssinen commented Jun 25, 2022

I believe all points except the uint[] constructor & CopyTo are covered by UInt128/Int128 and Generic Math

Also the uint representation of BigInteger is implementation detail and there's plans to use nuint instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants