Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-102471, PEP 757: Add PyLong import and export API #121339

Draft
wants to merge 36 commits into
base: main
Choose a base branch
from

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Jul 3, 2024

Add PyLong_Export() and PyLong_Import() functions and PyLong_LAYOUT structure.


📚 Documentation preview 📚: https://cpython-previews--121339.org.readthedocs.build/

@vstinner
Copy link
Member Author

vstinner commented Jul 3, 2024

cc @skirpichev @casevh

Include/cpython/longintrepr.h Outdated Show resolved Hide resolved
@vstinner
Copy link
Member Author

vstinner commented Jul 3, 2024

See also issue #111415

Copy link
Member

@skirpichev skirpichev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my 2c.

The gmpy2 code, used for benchmarking, can be found in my fork:
https://github.com/skirpichev/gmpy/tree/trying-py-import-export

Doc/c-api/long.rst Outdated Show resolved Hide resolved
Doc/c-api/long.rst Outdated Show resolved Hide resolved
Doc/c-api/long.rst Outdated Show resolved Hide resolved
Comment on lines 6705 to 6763
PyUnstable_Long_Export(PyObject *obj, PyUnstable_LongExport *long_export)
{
if (!PyLong_Check(obj)) {
PyErr_Format(PyExc_TypeError, "expect int, got %T", obj);
return -1;
}
PyLongObject *self = (PyLongObject*)obj;

long_export->obj = (PyLongObject*)Py_NewRef(obj);
long_export->negative = _PyLong_IsNegative(self);
long_export->ndigits = _PyLong_DigitCount(self);
if (long_export->ndigits == 0) {
long_export->ndigits = 1;
}
long_export->digits = self->long_value.ob_digit;
return 0;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this mostly give a direct access to the PyLongObject - it almost as fast as using private stuff before.

Old code:

$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**2' 'mpz(x)'
1000000 loops, best of 20: 232 nsec per loop
$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=10**100' 'mpz(x)'
500000 loops, best of 11: 500 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**1000' 'mpz(x)'
100000 loops, best of 20: 2.53 usec per loop

With proposed API:

$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**2' 'mpz(x)'
1000000 loops, best of 20: 258 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**100' 'mpz(x)'
500000 loops, best of 20: 528 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**1000' 'mpz(x)'
100000 loops, best of 20: 2.56 usec per loop

Comment on lines 6697 to 6742
PyObject*
PyUnstable_Long_Import(int negative, size_t ndigits, Py_digit *digits)
{
return (PyObject*)_PyLong_FromDigits(negative, ndigits, digits);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is something I would like to avoid. This requires allocation of a temporary buffer and using memcpy. Can we offer a writable layout to use it's digits in the mpz_export directly?

Benchmarks for old code:

$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)'
2000000 loops, best of 11: 111 nsec per loop
$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)'
500000 loops, best of 11: 475 nsec per loop
$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)'
100000 loops, best of 11: 2.39 usec per loop

With new API:

$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)'
2000000 loops, best of 20: 111 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)'
500000 loops, best of 20: 578 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)'
100000 loops, best of 20: 2.53 usec per loop

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires allocation of a temporary buffer and using memcpy.

Right, PyLongObject has to manage its own memory.

Can we offer a writable layout to use it's digits in the mpz_export directly?

That sounds strange from the Python point of view and make the internals "less opaque". I would prefer to leak less implementation details.

Copy link
Member

@skirpichev skirpichev Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, PyLongObject has to manage its own memory.

I'm not trying to change that. More complete proposal: vstinner#4

gmpy2 patch: https://github.com/skirpichev/gmpy/tree/trying-py-import-export-v2

New benchmarks:

$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)'
2000000 loops, best of 20: 111 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)'
500000 loops, best of 20: 509 nsec per loop
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)'
100000 loops, best of 20: 2.44 usec per loop

I would prefer to leak less implementation details.

I don't think this leak anything. It doesn't leak memory management details. PyLong_Import will just do allocation memory. Writting digits will be job for mpz_export, as before.

Without this, it seems - there are noticeable performance regression for integers of intermediate range. Something up to 20% vs 7% on my branch.

Edit: currently, proposed PyUnstable_Long_ReleaseImport() match PyUnstable_Long_ReleaseExport(). Perhaps, it could be one function (say, PyUnstable_Long_ReleaseDigitArray()), but I unsure - maybe it puts some constraints on internals of the PyLongObject.

Objects/longobject.c Outdated Show resolved Hide resolved
Doc/c-api/long.rst Show resolved Hide resolved
Doc/c-api/long.rst Outdated Show resolved Hide resolved
@skirpichev
Copy link
Member

CC @tornaria, as Sage people might be interested in this feature.

@skirpichev
Copy link
Member

CC @oscarbenjamin, you may want this for python-flint

@vstinner
Copy link
Member Author

vstinner commented Jul 4, 2024

I updated my PR:

  • Use -1 for little endian and +1 for big endian.
  • Rename PyUnstable_LongExport to PyUnstable_Long_DigitArray.
  • Add "always succeed" mention in the doc.

Lib/test/test_capi/test_long.py Outdated Show resolved Hide resolved
Lib/test/test_capi/test_long.py Show resolved Hide resolved
Lib/test/test_capi/test_long.py Show resolved Hide resolved
Doc/whatsnew/3.14.rst Outdated Show resolved Hide resolved
@oscarbenjamin
Copy link
Contributor

CC @oscarbenjamin, you may want this for python-flint

Absolutely. Currently python-flint uses a hex-string as an intermediate format when converting between large int and fmpz so anything more direct is an improvement. I expect that python-flint would use mpz_import/mpz_export just like gmpy2.

Objects/longobject.c Outdated Show resolved Hide resolved
@vstinner
Copy link
Member Author

vstinner commented Jul 4, 2024

@skirpichev: I added a PyLongWriter API similar to what @encukou proposed.

Example:

PyLongObject *
_PyLong_FromDigits(int negative, Py_ssize_t digit_count, digit *digits)
{
    PyLongWriter *writer = PyLongWriter_Create();
    if (writer == NULL) {
        return NULL;
    }

    if (negative) {
        PyLongWriter_SetSign(writer, -1);
    }

    Py_digit *writer_digits = PyLongWriter_AllocDigits(writer, digit_count);
    if (writer_digits == NULL) {
        goto error;
    }
    memcpy(writer_digits, digits, digit_count * sizeof(digit));

    return (PyLongObject*)PyLongWriter_Finish(writer);

error:
    PyLongWriter_Discard(writer);
    return NULL;
}

The PyLongWriter_Finish() function normalizes the number and gets a small number if needed. Example:

>>> import _testcapi; _testcapi.pylong_import(0, [100, 0, 0]) is 100
True

@vstinner vstinner marked this pull request as draft July 4, 2024 13:00
@vstinner
Copy link
Member Author

vstinner commented Jul 4, 2024

I mark the PR as a draft until we agree on the API.

@skirpichev
Copy link
Member

I added a PyLongWriter API similar to what @encukou proposed.

Yes, that looks better and should fix speed regression. I'll try to benchmark that, perhaps tomorrow.

But cost is 5 (!) public functions and one new struct, additionally to PyUnstable_Long_Import(), which will be a more slow API. Correct? C.f. just one function in above proposal.

@vstinner
Copy link
Member Author

vstinner commented Jul 4, 2024

I updated the PR to remove the PyUnstable_ prefix, replace it with the PyLong prefix.

@vstinner
Copy link
Member Author

vstinner commented Jul 4, 2024

But cost is 5 (!) public functions and one new struct, additionally to PyUnstable_Long_Import(), which will be a more slow API. Correct? C.f. just one function in #121339 (comment) proposal.

My concern is to avoid the problem capi-workgroup/api-evolution#36 : avoid exposing _PyLong_New() object until it's fully initialized. The "writer" API hides the implementation details but also makes sure that the object is not "leaked" to Python before it's fully initialized and valid. By the way, the implementation uses functions which are only safe if the object cannot be seen in Python: if Py_REFCNT(obj) is 1.

@vstinner
Copy link
Member Author

vstinner commented Jul 4, 2024

@skirpichev: Would it be useful to add a PyLongWriter_SetValue(PyLongWriter *writer, long value) function? It would be similar to PyLong_FromLong(long value) (but may be less efficient), so I'm not sure if it's relevant.

Doc/c-api/long.rst Outdated Show resolved Hide resolved
@skirpichev
Copy link
Member

skirpichev commented Sep 17, 2024 via email

@vstinner vstinner changed the title gh-102471: Add PyLong import and export API gh-102471, PEP 757: Add PyLong import and export API Sep 17, 2024
@oscarbenjamin
Copy link
Contributor

Do you have concerns about Windows where long is only 32-bit?
Yes, we support Windows (

There is always a way to make this work though like:

#if sizeof(int64_t) == sizeof(long)

@skirpichev
Copy link
Member

Here new benchmarks for PyLong_Export with updated gmpy2 implementation (follows PEP).

Current implementation:

Benchmark master new-api
1<<7 265 ns 283 ns: 1.07x slower
1<<38 321 ns 376 ns: 1.17x slower
1<<300 511 ns 602 ns: 1.18x slower
1<<3000 2.35 us 2.44 us: 1.04x slower
Geometric mean (ref) 1.11x slower

With PyUnstable_Long_CompactValue:

int
PyLong_Export(PyObject *obj, PyLongExport *export_long)
{   
    if (!PyLong_Check(obj)) {
        PyErr_Format(PyExc_TypeError, "expect int, got %T", obj);
        return -1;
    }
    PyLongObject *self = (PyLongObject*)obj;

    if (PyUnstable_Long_IsCompact(self)) {
        export_long->value = (int64_t)PyUnstable_Long_CompactValue(self);
        export_long->negative = 0;
        export_long->ndigits = 0;
        export_long->digits = 0;
        export_long->_reserved = 0;
    }
    else {
        export_long->value = 0;
        export_long->negative = _PyLong_IsNegative(self);
        export_long->ndigits = _PyLong_DigitCount(self);
        if (export_long->ndigits == 0) {
            export_long->ndigits = 1;
        }
        export_long->digits = self->long_value.ob_digit;
        export_long->_reserved = (Py_uintptr_t)Py_NewRef(obj);
    }
    return 0;
}
Benchmark master new-api
1<<7 265 ns 275 ns: 1.04x slower
1<<38 321 ns 346 ns: 1.08x slower
1<<300 511 ns 539 ns: 1.06x slower
1<<3000 2.35 us 2.37 us: 1.01x slower
Geometric mean (ref) 1.04x slower

With PyLong_AsLongAndOverflow:

int
PyLong_Export(PyObject *obj, PyLongExport *export_long)
{
    if (!PyLong_Check(obj)) {
        PyErr_Format(PyExc_TypeError, "expect int, got %T", obj);
        return -1;
    }
    PyLongObject *self = (PyLongObject*)obj;

    int overflow;
    long value = PyLong_AsLongAndOverflow(obj, &overflow);

    if (!overflow) { 
        export_long->value = (int64_t)value;
        export_long->negative = 0;
        export_long->ndigits = 0;
        export_long->digits = 0;
        export_long->_reserved = 0;
    }   
    else {
        export_long->value = 0;
        export_long->negative = _PyLong_IsNegative(self);
        export_long->ndigits = _PyLong_DigitCount(self);
        if (export_long->ndigits == 0) {
            export_long->ndigits = 1;
        }
        export_long->digits = self->long_value.ob_digit;
        export_long->_reserved = (Py_uintptr_t)Py_NewRef(obj);
    }
    return 0;
}
Benchmark master new-api
1<<7 265 ns 276 ns: 1.04x slower
1<<38 321 ns 284 ns: 1.13x faster
1<<300 511 ns 554 ns: 1.08x slower
1<<3000 2.35 us 2.38 us: 1.02x slower
Geometric mean (ref) 1.00x slower

Probably, last one seems to be fastest. Current implementation for PyLong_Export() also hardest to port.

Objects/longobject.c Outdated Show resolved Hide resolved
Doc/c-api/long.rst Outdated Show resolved Hide resolved
Co-authored-by: Sergey B Kirpichev <[email protected]>
@skirpichev skirpichev self-requested a review November 13, 2024 04:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants