Commit c6dd78b

ARROW-180 Support PyArrow 13 (#168)
- ARROW-180 Support PyArrow 13
- fix prerelease handling
- clean up pytest invocation
- remove Py3.12 support
- switch back to ubuntu 20
- fix manifest
- Use a manylinux wheel for linux builds
- fix handling of env variable
- fix handling of env variable
- revert wheel building on linux
- try building pyarrow from src
- fixup
- fix
- fix and try py312
- try without gcc override
- update to match pyarrow wheel target
- fixups
- cleanup
- fixups
- fix min version
- undo changes to benchmarks file
- add debug print
- clean up asv
- skip failing asv methods
- more skips
- try without setup_cache
- fixups
- more skips
- undo changes
- undo change to workflow
- undo changes to dev guide
- fix gcc version
- address review
1 parent 7cabba7 commit c6dd78b

File tree

15 files changed: +131 additions, -109 deletions


.github/workflows/benchmark.yml

Lines changed: 2 additions & 2 deletions

@@ -42,15 +42,15 @@ jobs:
       - name: Run benchmarks
         run: |
-          set -eu
+          set -eux
           run_asv () {
             if [ ! -e "asv.conf.json" ] ; then
               git checkout refs/bm/pr asv.conf.json
               git checkout refs/bm/pr benchmarks/__init__.py
               git checkout refs/bm/pr benchmarks/benchmarks.py
             fi
             git show --no-patch --format="%H (%s)"
-            asv run --python=`which python` --set-commit-hash $(git rev-parse HEAD)
+            asv run -e --python=`which python` --set-commit-hash $(git rev-parse HEAD)
           }

           asv machine --yes
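The two workflow tweaks above make CI failures easier to diagnose: `set -eux` adds the `x` flag, which traces each command to the log before running it, and `asv run -e` tells asv to display benchmark stderr. A minimal sketch of what the shell flags do (not code from the repo):

```shell
#!/usr/bin/env bash
# -e: exit on the first failing command
# -u: treat expansion of unset variables as an error
# -x: echo each command (with expanded arguments) before running it
set -eux

greet () {
    echo "hello $1"
}

# The trace of this call appears on stderr as: + greet world
greet world
```

With `-x`, a CI log shows exactly which command was executing when a job failed, at the cost of noisier output.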

.github/workflows/release-python.yml

Lines changed: 4 additions & 1 deletion

@@ -48,6 +48,9 @@ jobs:
       - uses: actions/setup-python@v4
         with:
           python-version: ${{env.PYTHON_VERSION}}
+          cache: 'pip'
+          cache-dependency-path: 'bindings/python/pyproject.toml'
+          allow-prereleases: true

       - name: Set up QEMU
         if: runner.os == 'Linux'
@@ -84,7 +87,7 @@ jobs:

   make_sdist:
     name: Make SDist
-    runs-on: ubuntu-latest
+    runs-on: macos-latest
     steps:
       - uses: actions/checkout@v3
         with:

.github/workflows/test-python.yml

Lines changed: 5 additions & 4 deletions

@@ -21,7 +21,7 @@ jobs:
     runs-on: ubuntu-latest

     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
       - uses: actions/setup-python@v4
       - uses: pre-commit/[email protected]
         with:
@@ -32,12 +32,12 @@ jobs:
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
-        os: ["ubuntu-20.04", "macos-latest", "windows-latest"]
+        os: ["ubuntu-latest", "macos-latest", "windows-latest"]
         python-version: [3.8, 3.9, "3.10", "3.11"]
       fail-fast: false
     name: CPython ${{ matrix.python-version }}-${{ matrix.os }}
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
       - name: Setup Python
         uses: actions/setup-python@v4
         with:
@@ -71,6 +71,7 @@ jobs:
         net start MongoDB
       - name: Install libbson
         run: |
+          pip install packaging  # needed for mongo-c-driver-1.24.4/build/calc_release_version.py
           ./build-libbson.sh
       - name: Install Python dependencies
         run: |
@@ -96,7 +97,7 @@ jobs:
   docs:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
       - name: Cache conda
         uses: actions/cache@v3
         env:

bindings/python/MANIFEST.in

Lines changed: 1 addition & 0 deletions

@@ -15,6 +15,7 @@ graft pymongoarrow

 recursive-include test *
 recursive-exclude docs *
+recursive-exclude benchmarks *

 global-exclude *.cpp
 global-exclude *.dylib

bindings/python/asv.conf.json

Lines changed: 3 additions & 10 deletions

@@ -6,16 +6,9 @@
     "repo_subdir": "bindings/python",
     "branches": ["main"],
     "matrix": {
-        "req": {
-            "pyarrow": ["7.0.0"],
-            "pymongo": ["3.11", "4.1.1"],
-            "pandas": [],
-            "Cython": [],
-            "numpy": []
-        },
         "env": {
-            "N_DOCS": ["20000", "1000"],
-        },
+            "N_DOCS": ["20000", "1000"]
+        }
     },
-    "environment_type": "virtualenv",
+    "environment_type": "virtualenv"
 }
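With the pinned `req` dependency matrix removed (and the trailing commas dropped so the file is valid strict JSON), asv sweeps only the `N_DOCS` environment variable and picks up dependencies from the project itself. Reconstructed from the diff, the resulting config is roughly:

```json
{
    "repo_subdir": "bindings/python",
    "branches": ["main"],
    "matrix": {
        "env": {
            "N_DOCS": ["20000", "1000"]
        }
    },
    "environment_type": "virtualenv"
}
```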

bindings/python/benchmarks/benchmarks.py

Lines changed: 30 additions & 16 deletions

@@ -53,7 +53,7 @@ class Insert(ABC):
     rounds = 1

     @abc.abstractmethod
-    def setup_cache(self):
+    def setup(self):
         raise NotImplementedError

     def time_insert_arrow(self):
@@ -94,7 +94,7 @@ class Read(ABC):
     rounds = 1

     @abc.abstractmethod
-    def setup_cache(self):
+    def setup(self):
         raise NotImplementedError

     # We need this because the naive methods don't always convert nested objects.
@@ -160,7 +160,7 @@ class ProfileReadArray(Read):
         }
     )

-    def setup_cache(self):
+    def setup(self):
         coll = db.benchmark
         coll.drop()
         base_dict = dict(
@@ -205,7 +205,7 @@ class ProfileReadDocument(Read):
         }
     )

-    def setup_cache(self):
+    def setup(self):
         coll = db.benchmark
         coll.drop()
         base_dict = dict(
@@ -247,7 +247,7 @@ class ProfileReadSmall(Read):
     schema = Schema({"x": pyarrow.int64(), "y": pyarrow.float64()})
     dtypes = np.dtype(np.dtype([("x", np.int64), ("y", np.float64)]))

-    def setup_cache(self):
+    def setup(self):
         coll = db.benchmark
         coll.drop()
         base_dict = dict(
@@ -268,7 +268,7 @@ class ProfileReadLarge(Read):
     schema = Schema({k: pyarrow.float64() for k in large_doc_keys})
     dtypes = np.dtype([(k, np.float64) for k in large_doc_keys])

-    def setup_cache(self):
+    def setup(self):
         coll = db.benchmark
         coll.drop()

@@ -284,7 +284,7 @@ class ProfileReadExtensionSmall(Read):
     schema = Schema({"x": Decimal128Type(), "y": BinaryType(10)})
     dtypes = np.dtype(np.dtype([("x", np.object_), ("y", np.object_)]))

-    def setup_cache(self):
+    def setup(self):
         coll = db.benchmark
         coll.drop()
         base_dict = dict(
@@ -299,13 +299,20 @@ def setup_cache(self):
             % (N_DOCS, len(BSON.encode(base_dict)) // 1024, len(base_dict))
         )

+    # This must be skipped because arrow can't read the Decimal128Type
+    def time_conventional_arrow(self):
+        pass
+
+    def time_insert_conventional(self):
+        pass
+

 class ProfileReadExtensionLarge(Read):
     large_doc_keys = [f"{i}" for i in range(LARGE_DOC_SIZE)]
     schema = Schema({k: Decimal128Type() for k in large_doc_keys})
     dtypes = np.dtype([(k, np.object_) for k in large_doc_keys])

-    def setup_cache(self):
+    def setup(self):
         coll = db.benchmark
         coll.drop()

@@ -316,16 +323,20 @@ def setup_cache(self):
             % (N_DOCS, len(BSON.encode(base_dict)) // 1024, len(base_dict))
         )

+    # This must be skipped because arrow can't read the Decimal128Type
+    def time_conventional_arrow(self):
+        pass
+
+    def time_insert_conventional(self):
+        pass
+

 class ProfileInsertSmall(Insert):
     large_doc_keys = [f"a{i}" for i in range(LARGE_DOC_SIZE)]
     schema = Schema({"x": pyarrow.int64(), "y": pyarrow.float64()})
-    arrow_table = find_arrow_all(db.benchmark, {}, schema=schema)
-    pandas_table = find_pandas_all(db.benchmark, {}, schema=schema)
-    numpy_arrays = find_numpy_all(db.benchmark, {}, schema=schema)
     dtypes = np.dtype([("x", np.int64), ("y", np.float64)])

-    def setup_cache(self):
+    def setup(self):
         coll = db.benchmark
         coll.drop()
         base_dict = dict([("x", 1), ("y", math.pi)])
@@ -334,17 +345,17 @@ def setup_cache(self):
             "%d docs, %dk each with %d keys"
             % (N_DOCS, len(BSON.encode(base_dict)) // 1024, len(base_dict))
         )
+        self.arrow_table = find_arrow_all(db.benchmark, {}, schema=self.schema)
+        self.pandas_table = find_pandas_all(db.benchmark, {}, schema=self.schema)
+        self.numpy_arrays = find_numpy_all(db.benchmark, {}, schema=self.schema)


 class ProfileInsertLarge(Insert):
     large_doc_keys = [f"a{i}" for i in range(LARGE_DOC_SIZE)]
     schema = Schema({k: pyarrow.float64() for k in large_doc_keys})
-    arrow_table = find_arrow_all(db.benchmark, {}, schema=schema)
-    pandas_table = find_pandas_all(db.benchmark, {}, schema=schema)
-    numpy_arrays = find_numpy_all(db.benchmark, {}, schema=schema)
     dtypes = np.dtype([(k, np.float64) for k in large_doc_keys])

-    def setup_cache(self):
+    def setup(self):
         coll = db.benchmark
         coll.drop()
         base_dict = dict([(k, math.pi) for k in self.large_doc_keys])
@@ -353,3 +364,6 @@ def setup_cache(self):
             "%d docs, %dk each with %d keys"
             % (N_DOCS, len(BSON.encode(base_dict)) // 1024, len(base_dict))
         )
+        self.arrow_table = find_arrow_all(db.benchmark, {}, schema=self.schema)
+        self.pandas_table = find_pandas_all(db.benchmark, {}, schema=self.schema)
+        self.numpy_arrays = find_numpy_all(db.benchmark, {}, schema=self.schema)
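The renames above move the benchmarks from asv's `setup_cache` (run once, with its return value pickled and shared across processes) to `setup` (run before each timed method on the benchmark instance), which is why per-run state such as `self.arrow_table` now lives in `setup` rather than at class level. A minimal sketch of the pattern, using a hypothetical class with stand-in data instead of the repo's MongoDB queries:

```python
# Sketch of the asv benchmark pattern used after this commit:
# `setup` runs before each timed method and may stash state on `self`,
# whereas `setup_cache` runs once and must return picklable data.

class ProfileInsertSketch:
    # Cheap class-level configuration, evaluated at import time.
    doc_keys = ["x", "y"]

    def setup(self):
        # Stand-in for fetching documents via find_arrow_all and friends.
        self.docs = [{k: float(i) for k in self.doc_keys} for i in range(3)]

    def time_insert(self):
        # The timed body only reads state prepared by setup().
        return sum(d["x"] for d in self.docs)


bench = ProfileInsertSketch()
bench.setup()                # asv would call this before each timing run
result = bench.time_insert() # -> 3.0 (0.0 + 1.0 + 2.0)
```

The trade-off: `setup` re-runs the data preparation for every benchmark, but it avoids pickling non-serializable handles (database clients, Arrow tables) and avoids querying the database at import time, as the deleted class-level `find_*_all` calls did.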

bindings/python/build-libbson.sh

Lines changed: 1 addition & 1 deletion

@@ -5,7 +5,7 @@ set -o errexit

 # Version of libbson to build
 # Keep in sync with pymongoarrow.version._MIN_LIBBSON_VERSION
-LIBBSON_VERSION=${LIBBSON_VERSION:-"1.21.1"}
+LIBBSON_VERSION=${LIBBSON_VERSION:-"1.23.1"}
 if [ -z "$LIBBSON_VERSION" ]
 then
     echo "Did not provide a libbson revision ID to build"
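The `${VAR:-default}` expansion used here keeps the pinned libbson version overridable from the environment, which is how the CI can test against other releases without editing the script. A short illustration of the expansion (not code from the repo):

```shell
# ${VAR:-default} expands to the environment value when VAR is set
# and non-empty, otherwise to the default.
unset LIBBSON_VERSION
LIBBSON_VERSION=${LIBBSON_VERSION:-"1.23.1"}
echo "$LIBBSON_VERSION"   # -> 1.23.1 (the pinned default)

LIBBSON_VERSION="1.24.0"
LIBBSON_VERSION=${LIBBSON_VERSION:-"1.23.1"}
echo "$LIBBSON_VERSION"   # -> 1.24.0 (the caller's override wins)
```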

bindings/python/docs/source/developer/installation.rst

Lines changed: 1 addition & 1 deletion

@@ -12,7 +12,7 @@ Command Line Tools. Additionally, you need CMake and pkg-config::

     $ brew install cmake
     $ brew install pkg-config

-On Linux, you require gcc 4.8, CMake and pkg-config.
+On Linux, installation requires gcc 12, CMake and pkg-config.

 Windows is not yet supported.
bindings/python/docs/source/installation.rst

Lines changed: 6 additions & 6 deletions

@@ -11,7 +11,7 @@ PyMongoArrow is regularly built and tested on macOS and Linux
 Python Compatibility
 --------------------

-PyMongoArrow is currently compatible with CPython 3.8, 3.9, 3.10 and 3.11.
+PyMongoArrow is currently compatible with CPython 3.8, 3.9, 3.10, and 3.11.

 Using Pip
 ---------
@@ -56,20 +56,20 @@ Dependencies

 PyMongoArrow requires:

-- PyMongo>=3.11 (PyMongo 4.0 is supported from 0.2)
-- PyArrow>=7,<7.1
+- PyMongo>=4.4
+- PyArrow>=13,<13.1

 To use PyMongoArrow with a PyMongo feature that requires an optional
 dependency, users must install PyMongo with the given dependency manually.

 .. note:: PyMongo's optional dependencies are detailed
    `here <https://pymongo.readthedocs.io/en/stable/installation.html#dependencies>`_.

-For example, to use PyMongoArrow with MongoDB Atlas' ``mongodb+srv://`` URIs
-users must install PyMongo with the ``srv`` extra in addition to installing
+For example, to use PyMongoArrow with Client-Side Field Level Encryption
+users must install PyMongo with the ``encryption`` extra in addition to installing
 PyMongoArrow::

-    $ python -m pip install 'pymongo[srv]' pymongoarrow
+    $ python -m pip install 'pymongo[encryption]' pymongoarrow

 Applications intending to use PyMongoArrow APIs that return query result sets
 as :class:`pandas.DataFrame` instances (e.g. :meth:`~pymongoarrow.api.find_pandas_all`)

bindings/python/pymongoarrow/version.py

Lines changed: 1 addition & 1 deletion

@@ -14,4 +14,4 @@

 __version__ = "1.1.0.dev0"

-_MIN_LIBBSON_VERSION = "1.21.0"
+_MIN_LIBBSON_VERSION = "1.23.1"
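The comment in `build-libbson.sh` asks that its pinned `LIBBSON_VERSION` stay in sync with `_MIN_LIBBSON_VERSION`, and this commit bumps both to 1.23.1. A sketch of the consistency check implied by that comment, using a minimal hand-rolled version parse (the helper name is hypothetical; the values are from this commit's diff):

```python
def parse_version(v: str) -> tuple:
    # Minimal parse, sufficient for "1.23.1"-style dotted versions.
    return tuple(int(part) for part in v.split("."))

_MIN_LIBBSON_VERSION = "1.23.1"   # pymongoarrow/version.py after this commit
LIBBSON_BUILD_VERSION = "1.23.1"  # default in build-libbson.sh after this commit

# The libbson we build must satisfy the minimum the bindings require.
ok = parse_version(LIBBSON_BUILD_VERSION) >= parse_version(_MIN_LIBBSON_VERSION)
```

Tuple comparison orders versions component by component, so `(1, 23, 1) > (1, 21, 0)` holds even though the string `"1.23.1" > "1.21.0"` would only coincidentally agree.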
