Still having large bit precision calculation issues #522

wjamesjr · 2024-10-14T17:01:33Z

When using mpfrs at high precision (> 2 ** 30 bits) algorithms fail. I have verified that mpfr works as expected using a C++ wrapper and test pi agm code. Using very similar code with gmpy, the code fails (agm fails to converge). I am attaching some test code to demonstrate the problem. At 250 million digits the agm routine works, at 400 million digits the agm routine fails to converge. The c++ wrapper had a similar issue that was determined to be an argument in a call that needed to be a long that was an int.
PiAGM_GMPY.py.zip

skirpichev · 2024-10-16T05:19:26Z

Using very similar code with gmpy

Attached archive has only Python code. We can't guess how "similar" it is to your C/C++ code.

wjamesjr · 2024-10-16T05:35:03Z

C++ code is attached.
main.cpp.zip

skirpichev · 2024-10-16T06:06:22Z

Well, maybe I miss something else (versions use different variable name conventions, etc; it's not trivial to compare literally), but so far you code examples aren't "identical". For example:

to set precision you are using math.log(10) / math.log(2) coefficient, instead of more precise math.log2(10). Either way it's not an equivalent of mpfr::digits2bits().
you are using different initial values for y0/g. sqrt(0.5) != 1/sqrt(mpfr("2.0").
finally, mpf("10.E-digits") != mpfr("10.0")**-digits.

So, if you C++ code works correctly, please double check that Python code match it.

wjamesjr · 2024-10-16T13:21:19Z

Made changes to Python source as suggested. Still exhibits same issue with large bit precision.
PiAGM_GMPY.py.zip

skirpichev · 2024-10-16T13:30:40Z

So far you just changed math.log(10) / math.log(2) to math.log2(10)

wjamesjr · 2024-10-16T14:25:03Z

bcf = math.log2(10)

g, z, half = sqrt(mpfr("0.5")), mpfr("0.25"), mpfr("0.5")

epsilon = mpfr("10.E-" + str(needed_digits))

The above three lines were changes.

casevh · 2024-10-20T04:10:13Z

I have identified a bug in gmpy2. The new exponent limits assigned to a context are not being used to configure the MPFR library. The are currently only used when normalizing a final result.

The easy fix is to configure the MPFR library to used the maximum exponent on startup. The MPFR exponent range is thread local setting. So unilaterally change the value may impact other CPython applications (SageMATH) that may be running in the same thread.

The better option is saving the current exponent range, setting the new exponent range, calling MPFR, restoring the old exponent range.

I prefer the second option but it will be more complicated.

wjamesjr · 2024-10-20T06:24:55Z

Glad to hear that you found a problem. I am in no hurry for a solution, so take whatever time is needed to get it right.

skirpichev · 2024-10-20T06:56:16Z

@wjamesjr, just to confirm, could you try the "easy fix"?

diff --git a/src/gmpy2_context.c b/src/gmpy2_context.c
index fecd79af..e4c776da 100644
--- a/src/gmpy2_context.c
+++ b/src/gmpy2_context.c
@@ -1034,6 +1034,7 @@ GMPy_CTXT_Set_emin(CTXT_Object *self, PyObject *value, void *closure)
         return -1;
     }
     self->ctx.emin = exp;
+    mpfr_set_emin(exp);
     return 0;
 }

@@ -1067,6 +1068,7 @@ GMPy_CTXT_Set_emax(CTXT_Object *self, PyObject *value, void *closure)
         return -1;
     }
     self->ctx.emax = exp;
+    mpfr_set_emax(exp);
     return 0;
 }

wjamesjr · 2024-10-20T07:37:09Z

Sorry I have never built gmpy from source. Always used pip.

skirpichev · 2024-10-20T07:59:09Z

It's not too hard and documented:
https://gmpy2.readthedocs.io/en/latest/install.html#from-sources

Just apply the patch before doing pip install ..

wjamesjr · 2024-10-20T08:55:50Z

I tried to follow instructions and it timed out on my Mac. See below:

Last login: Sun Oct 20 03:49:44 on ttys000
williamjames@Mac gmpy % git clone git://github.com/aleaxit/gmpy.git
Cloning into 'gmpy'...
fatal: unable to connect to github.com:
github.com[0: 140.82.112.3]: errno=Operation timed out

skirpichev · 2024-10-20T09:23:10Z

Ah, perhaps it's some firewall settings on your side. git clone https://github.com/aleaxit/gmpy.git - probably should work as you comment on the github.com.

wjamesjr · 2024-10-20T11:22:15Z

Got it cloned and source change made now have an error on pip install:

williamjames@Mac gmpy % pip install -e .[tests]
zsh: no matches found: .[tests]

directory looks like:

COPYING MANIFEST.in demo pyproject.toml src
COPYING.LESSER README.rst docs scripts test
INSTALL TODO gmpy2 setup.py test_cython

skirpichev · 2024-10-20T11:43:49Z

On Sun, Oct 20, 2024 at 04:22:37AM -0700, Bill James wrote: Got it cloned and source change made now have an error on pip install: ***@***.*** gmpy % pip install -e .[tests] zsh: no matches found: .[tests]

You can do just ``pip install .``, this shouldn't trigger some zsh-specific behaviour.

wjamesjr · 2024-10-20T12:14:04Z

OK just pip install . worked but now 'gmp.h' can't be found . gmp, mpfr and mpc are installed but since they were installed by home-brew they are installed in non-standard locations. I ran into this before that stopped me from compiling open source packages.

wjamesjr · 2024-10-20T13:23:14Z

Installed gmp, mpfr and libmpc with macports, pip install . still can't find gmp.h.

skirpichev · 2024-10-21T02:52:03Z

Hmm, probably you can set correct CFLAGS and LDFLAGS like this:

PGMP="$(brew --prefix gmp)"
PMFR="$(brew --prefix mpfr)"
PMPC="$(brew --prefix libmpc)"
CFLAGS="-I$PGMP/include -I$PMFR/include -I$PMPC/include" LDFLAGS="-L$PGMP/lib -L$PMFR/lib -L$PMPC/lib" pip install .

casevh · 2024-10-21T03:07:41Z

The current behavior of mpfr and mpc calculations involves two steps. The first step uses the MPFR/MPC libraries to calculate the result to the desired precision and the default exponent range. The default exponent range is a thread-local (or global) setting in the MPFR library. The second step extracts the exception flags and normalizes the result. During the normalization step, gmpy2 temporarily changes the exponent range and then restores the previous exponent range. To keep the same behavior with 64 bit exponent range, we need to set the exponent range used by the MFPR library to MPFR_EMAX_MAX and MPFR_EMIN_MIN. This can be done by adding the following two lines to the startup code in gmpy2.c: /* Set precision to maximum and minimum values. */ mpfr_set_emax(MPFR_EMAX_MAX); mpfr_set_emin(MPFR_EMIN_MIN); The default values in a context will still show the 30 bit default but that is only used for normalization. Larger exponent ranges values will return the expected values since the calculations use the maximum. It is trivial to change GMPy_CTXT_New to use the expanded range but it is a user-visible change. I think gmpy2 shouldn't assume that the exponent values used by the MPFR library don't change. This will require setting and restoring the exponent setting for all calls into MFR/MPC. This will be a complex change. For a bugfix release, I think setting the maximum exponent range is fine. There should be no visible change and code that requests a larger exponent range will "just work". I don't have a strong opinion about changing the context defaults. Eliminating the assumption that the underlying exponent range does not change should require a new version. casevh

…

On Sun, Oct 20, 2024 at 6:23 AM Bill James ***@***.***> wrote: Installed gmp, mpfr and libmpc with macports, pip install . still can't find gmp.h. — Reply to this email directly, view it on GitHub <#522 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAMR236CFQUDTM3O742E67DZ4OVFRAVCNFSM6AAAAABP5NDHM2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMRUHE2TSNBVGY> . You are receiving this because you commented.Message ID: ***@***.***>

wjamesjr · 2024-10-21T08:14:41Z

@wjamesjr, just to confirm, could you try the "easy fix"?

diff --git a/src/gmpy2_context.c b/src/gmpy2_context.c
index fecd79af..e4c776da 100644
--- a/src/gmpy2_context.c
+++ b/src/gmpy2_context.c
@@ -1034,6 +1034,7 @@ GMPy_CTXT_Set_emin(CTXT_Object *self, PyObject *value, void *closure)
         return -1;
     }
     self->ctx.emin = exp;
+    mpfr_set_emin(exp);
     return 0;
 }

@@ -1067,6 +1068,7 @@ GMPy_CTXT_Set_emax(CTXT_Object *self, PyObject *value, void *closure)
         return -1;
     }
     self->ctx.emax = exp;
+    mpfr_set_emax(exp);
     return 0;
 }

I finally got the "easy fix" to build and tested it against the test case that I provided and it did not work for me. That being said I am not confident that I have done everything right. Skirpichev, thank you for your patience, I can now build the latest version in the future if I need to.

Case, thank you for your explanation of a fix. I wasn't expecting it to be this complicated.

wjamesjr · 2024-10-22T01:06:14Z

Figured out what I was doing wrong. I was running Python 3.13 interpreter and never using the modified code. I had already backed out the "quick fix" change and changed to the two line addition to gmpy2.c when I figured this out. Ran a test against the test code I submitted and using Python 3.11 it worked perfectly. How can you modify the downloaded code to use 3.13 instead of 3.11?

skirpichev · 2024-10-22T03:27:06Z

Usually it's a good idea to use virtual environments, especially if you are using different interpreter versions, work on different projects, etc.

E.g. after git clone ...gmpy.git && cd gmpy && patch the gmpy2_context.c, do:

python3.13 -m venv .venv/3.13/  # create virtual environment
. .venv/3.13/bin/activate  # activate this virtual environment
pip install .  # install gmpy2 into this environment
python /path/to/PiAGM_GMPY.py  # run your script

casevh · 2024-10-22T04:51:34Z

I created a branch called issue522. The resulting binaries can be found at https://github.com/aleaxit/gmpy/actions/runs/11452424995

Download the wheels.zip file at the bottom of the page and extract the appropriate version for you system. It can be installed with the command "python3.13 -m pip install gmpy2-2.2.1-cp313-cp313-macosx_11_0_arm64.whl". (I think that is correct version for you system.)

I've already tested the Windows version and it fails. The MPFR exponent type is defined as a C long so it is only 32 bits on Windows platform.

wjamesjr · 2024-10-22T05:38:34Z

Tested against Python 3.11, 3.12, and 3.13 on arm64 Mac. All three worked.

casevh · 2024-10-22T14:19:31Z

Thanks for checking. I completed a successful test for 1_000_000_000 digits in slightly less than 2 hours. Now testing 2_000_000_000.

wjamesjr · 2024-10-22T17:24:42Z

I am running into a problem now converting large mpfrs to a string. It actually causes a Python crash. I am now trying to find the hard cutoff. Suspect it is about 2**31 decimal digits (2147483648). Working to verify it now. Are you aware of a hard limit for max digits for mpfr to string conversions?

skirpichev · 2024-10-22T17:46:17Z

What kind of "crash", have you seen some error message? How you do conversion? E.g. str/repr/format.

wjamesjr · 2024-10-22T17:53:19Z

I am running it within PyCharm. I get a popup window that says that Python quit unexpectedly. I am attaching a simple test case that will demonstrate the problem. It causes Python to quit on my Mac.
test.py.zip

wjamesjr · 2024-10-22T18:04:40Z

Error message is:

Process finished with exit code 139 (interrupted by signal 11:SIGSEGV)

Using str(mpfr) to convert.

wjamesjr · 2024-10-23T00:18:50Z

Exit code 139 is a seg fault. The conversion to a string is taking some time so it isn't happening immediately after start of conversion. My guess is some code is writing beyond a memory buffer.

skirpichev · 2024-10-23T03:31:55Z

My guess is some code is writing beyond a memory buffer.

Can you try to reproduce this in C? str(mpfr) internally just calls "{0:.pg}".format(mpfr), where

p = (long)(log10(2) * (double)mpfr_get_prec(MPFR(self))) + 2;

which in turn calls mpfr_asprintf(ppbuf, "%.pRg", mpfr).

Looking on the code, I don't think problem is on our side:

gmpy/src/gmpy2_convert_mpfr.c

Line 881 in 17fb42d

GMPy_MPFR_Str_Slot(MPFR_Object *self)

gmpy/src/gmpy2_format.c

Line 202 in 17fb42d

GMPy_MPFR_Format(PyObject *self, PyObject *args)

wjamesjr · 2024-10-23T09:46:02Z

Looking on the code, I don't think problem is on our side:

I agree. The mpfr_asprintf call returns an int with the number of characters placed in the buffer. int on my system is an int32. So the call is incapable of returning a value greater than 2,147,483,647. This is why 2,147,483,648 is failing for my test case. I am guessing that the call is returning a negative value for number of characters written in this case. The C++ wrapper that I was using checks for a negative return.

inline std::string mpreal::toString(const std::string& format) const
{
char *s = NULL;
std::string out;

if( !format.empty() )
{
    if(!(mpfr_asprintf(&s, format.c_str(), mpfr_srcptr()) < 0))
    {
        out = std::string(s);

        mpfr_free_str(s);
    }
}

return out;

}

So it looks like mpfr is not capable of converting mpfrs whose length is greater than 2,147,483,647. It looks like the gmpy2 code does not check for a negative value and presses forward. Is that an issue to fix.

See aleaxit#522 (comment)

skirpichev · 2024-10-23T10:40:47Z

Good catch. See #526, I would appreciate testing (wheels could be downloaded at https://github.com/aleaxit/gmpy/actions/runs/11478047408?pr=526).

On another hand, e.g. on my system - INT_MAX is 2147483647, while gmpy2.get_max_precision() reports 9223372036854775551. Perhaps, we should use the mpfr_out_str() (returns size_t) in str/repr instead.

wjamesjr · 2024-10-23T13:58:31Z

Good catch. See #526, I would appreciate testing (wheels could be downloaded at https://github.com/aleaxit/gmpy/actions/runs/11478047408?pr=526).

Tested against 2147483648 digit precision on Python 3.11, 3.12, 3.13 and received the message below:

Traceback (most recent call last):
File "/Users/williamjames/PycharmProjects/stringtest/test.py", line 16, in
s1 = str(z)
RuntimeError: Number of characters written exceeds the INT_MAX

Tested against 2147483600 and got normal conversion.

mpfr_out_str() looks promising!

wjamesjr · 2024-10-23T21:03:02Z

I did some testing to see if there is a precision at the boundary from returning a string and throwing an exception. Setting a precision of 2147483644 in my test script is the boundary and it still seg faults. Maybe it is returning a zero versus a negative value??

wjamesjr · 2024-10-24T00:41:49Z

mpfr_get_str writes directly to a string but separates the exponent into a separate variable. It has size_t character size which makes it viable. But I don't know how it can be a replacement for str() since the formatting is different.

wjamesjr · 2024-10-24T03:02:01Z

The C++ wrapper http://www.holoborodko.com/pavel/mpfr/ that I have been testing has a toString() call. It has a conditional if construct that uses mpfr_asprintf() for mpfr versions 2.4.0+ and mpfr_get_str for versions less than 2.4.0. I am modifying the code to force the use of mpfr_get_str and see how well it works.

casevh · 2024-10-24T05:56:30Z

I have some good news. I accidentally solved this issue years ago.

The digits() method of an mpfr returns a tuple containing the digits in the mantissa, the exponent, and the precision. It calls mpfr_get_str which doesn't appear to have a size limit.

Maybe we just update some documentation and the error message in #526?

>>> from gmpy2 import *
>>> get_context().precision=8_500_000_000
>>> a=mpfr(1)/7
>>> b=a.digits()
>>> len(b[0])
2558754965
>>>

skirpichev · 2024-10-24T07:59:53Z

Setting a precision of 2147483644 in my test script is the boundary and it still seg faults.

Did you check error message, it has SIGSEGV?

Perhaps, you could try following C program and check it's behaviour near the limit:

/* compile with cc a.c -lmpfr */
#include <stdio.h>
#include <stdlib.h>
#include <mpfr.h>
#include <math.h>
int main(int argc, char* argv[])
{
    mpfr_t x;
    mpfr_init_set_ui(x, 3, MPFR_RNDN);
    char *buf, fmt[20];
    long dps = INT_MAX + atoi(argv[1]);
    sprintf(fmt, "%%#.%ldRg", dps);

    int sz = mpfr_asprintf(&buf, fmt, x);
    if (sz != -1) {
//      printf("%s\n", buf);
        mpfr_free_str(buf);
    }

    mpfr_clear(x);
    return 0;
}

?

The digits() method of an mpfr returns a tuple containing the digits in the mantissa, the exponent, and the precision. It calls mpfr_get_str which doesn't appear to have a size limit.

This doesn't fix str/repr. On another hand, now I'm not sure that current str/repr is helpful for big precision.

Maybe we just update some documentation and the error message in #526?

What you suggest?

wjamesjr · 2024-10-24T09:46:59Z

Did you check error message, it has SIGSEGV?

I ran it again and the error message is different:

GNU MP: Cannot reallocate memory (old_size=2147487744 new_size=18446744071562067968)

Process finished with exit code 134 (interrupted by signal 6:SIGABRT)

wjamesjr · 2024-10-24T09:54:53Z

I have some good news. I accidentally solved this issue years ago.

The digits() method of an mpfr returns a tuple containing the digits in the mantissa, the exponent, and the precision. It calls mpfr_get_str which doesn't appear to have a size limit.

The digits() method meets my requirement.

skirpichev · 2024-10-24T09:59:27Z

I ran it again and the error message is different: GNU MP: Cannot reallocate memory

That's ok. It means you run out of memory. The GMP in such cases just calls abort() and we can't recover from this. So, #526 - seems to be working.

wjamesjr · 2024-10-24T10:13:15Z

Maybe we just update some documentation and the error message in #526?

What you suggest?

What would the documentation change be? Maybe something like the below:

There are two methods of displaying mpfrs, str() and digits(). str() is most useful for smaller values and digits() for larger ones. str() will fail for string lengths greater than about 2147483647.

wjamesjr · 2024-10-24T10:15:52Z

I am now good with the changes made. I can calculate with high precision and display using digits().

skirpichev · 2024-11-13T07:00:07Z

There are two methods of displaying mpfrs, str() and digits(). str() is most useful for smaller values and digits() for larger ones. str() will fail for string lengths greater than about 2147483647.

I don't think this belongs to the error message.

it's too verbose
the exact bound depends on system

@casevh, I think #526 may be merged as is.

Perhaps, we should include some hints to use the digits function/method in the __str__/__repr__ docstring? But I think it's better just fix those dunders to remove that limitation.

skirpichev added a commit to skirpichev/gmpy that referenced this issue Oct 23, 2024

Prevent crash in GMPy_MPFR_Format()

c57fd1e

See aleaxit#522 (comment)

skirpichev mentioned this issue Oct 23, 2024

Prevent crash in GMPy_MPFR_Format() #526

Open

Still having large bit precision calculation issues #522

Still having large bit precision calculation issues #522

Comments

wjamesjr commented Oct 14, 2024

skirpichev commented Oct 16, 2024

wjamesjr commented Oct 16, 2024

skirpichev commented Oct 16, 2024

wjamesjr commented Oct 16, 2024

skirpichev commented Oct 16, 2024

wjamesjr commented Oct 16, 2024

casevh commented Oct 20, 2024

wjamesjr commented Oct 20, 2024

skirpichev commented Oct 20, 2024

wjamesjr commented Oct 20, 2024

skirpichev commented Oct 20, 2024

wjamesjr commented Oct 20, 2024

skirpichev commented Oct 20, 2024

wjamesjr commented Oct 20, 2024

skirpichev commented Oct 20, 2024 via email

wjamesjr commented Oct 20, 2024

wjamesjr commented Oct 20, 2024

skirpichev commented Oct 21, 2024

casevh commented Oct 21, 2024 via email

wjamesjr commented Oct 21, 2024

wjamesjr commented Oct 22, 2024

skirpichev commented Oct 22, 2024

casevh commented Oct 22, 2024

wjamesjr commented Oct 22, 2024

casevh commented Oct 22, 2024

wjamesjr commented Oct 22, 2024

skirpichev commented Oct 22, 2024

wjamesjr commented Oct 22, 2024

wjamesjr commented Oct 22, 2024

wjamesjr commented Oct 23, 2024

skirpichev commented Oct 23, 2024

wjamesjr commented Oct 23, 2024

skirpichev commented Oct 23, 2024

wjamesjr commented Oct 23, 2024

wjamesjr commented Oct 23, 2024

wjamesjr commented Oct 24, 2024

wjamesjr commented Oct 24, 2024

casevh commented Oct 24, 2024

skirpichev commented Oct 24, 2024

wjamesjr commented Oct 24, 2024

wjamesjr commented Oct 24, 2024

skirpichev commented Oct 24, 2024

wjamesjr commented Oct 24, 2024

wjamesjr commented Oct 24, 2024

skirpichev commented Nov 13, 2024