Python Polars 0.20.7
⚠️ Deprecations
- Rename threadpool_size to thread_pool_size (#14236)
🚀 Performance improvements
- prune parquet row groups when is_not_null is used (#14260)
- Avoid unnecessary copies in Series.to_numpy for boolean/temporal types (#14261)
- use is_between to skip parquet row groups (#14244)
- Use a compression API that is designed for this use case (#11699) (#14194)
- Use UnitVec in polars-plan traversal (#14199)
- use UnitVec in streaming joins (#14197)
- improve ChunkId (#14175)
- improve iteration performance (#14126)
- elide unneeded work in window? (#14108)
- run window functions more in parallel (#14095)
- improve skip row group using statistics condition (#14056)
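Several of the items above concern skipping parquet row groups based on column statistics. The general idea can be sketched as follows. This is a minimal illustration, not Polars' implementation: the `RowGroupStats` class and both helper functions are hypothetical names invented for this example, and it assumes each row group records a min, a max, and a null count per column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowGroupStats:
    """Hypothetical per-column statistics for a single row group."""
    min_value: Optional[int]  # None when no min statistic is available
    max_value: Optional[int]  # None when no max statistic is available
    null_count: int
    num_rows: int


def can_skip_is_not_null(stats: RowGroupStats) -> bool:
    """A row group with only nulls cannot satisfy an is_not_null filter."""
    return stats.null_count == stats.num_rows


def can_skip_is_between(stats: RowGroupStats, lower: int, upper: int) -> bool:
    """Skip when no value can fall inside the closed range [lower, upper]."""
    if stats.null_count == stats.num_rows:
        return True  # nulls never satisfy a range predicate
    if stats.min_value is None or stats.max_value is None:
        return False  # statistics missing, so the group must be read
    return stats.max_value < lower or stats.min_value > upper


row_groups = [
    RowGroupStats(min_value=0, max_value=99, null_count=0, num_rows=100),
    RowGroupStats(min_value=100, max_value=199, null_count=10, num_rows=100),
    RowGroupStats(min_value=None, max_value=None, null_count=100, num_rows=100),
]

# Indices of the row groups that still have to be read for each predicate.
read_for_between = [
    i for i, s in enumerate(row_groups) if not can_skip_is_between(s, 120, 150)
]
read_for_not_null = [
    i for i, s in enumerate(row_groups) if not can_skip_is_not_null(s)
]
print(read_for_between)
print(read_for_not_null)
```

Skipping is only safe when the statistics prove that no row can match, which is why a group with missing min/max values is always read.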
✨ Enhancements
- add u8/i8/u16/i16 parsers to CSV reader (#14241)
- move F-order data in and out of numpy to polars zero copy (#14259)
- read arrow-c-interface without requiring pyarrow (#14254)
- Implements list.gather_every (#14253)
- Implements prefix/suffix_fields (#14251)
- Change Series.to_numpy to return f64 for Int32/UInt32 Series with nulls instead of f32 (#14240)
- Polish decimal arithmetic (#14172)
- improved read_excel format detection, and support for excel 97-2004 workbooks (#14234)
- Introduce arr.to_struct (#14202)
- Supports map fields name of struct (#14203)
- make IdxVec generic as UnitVec (#14196)
- add new arithmetic kernels (#14026)
- Supports unique and hash_rows for null column (#14111)
- Implement arithmetic operations for Null columns (#14107)
- support pd.Index in from_pandas and elsewhere (#14087)
- Allow renaming expressions with keyword syntax in group_by (#14071)
- raise more informative error message if someone lands on Expr.__bool__ (#14067)
- Adapt extend_constant to function expr architecture and expressify it (#14058)
- add integer negation (#14049)
- list & array measures of dispersion (#13245)
- gc binview when writing ipc (#14035)
- When calling convert_time_zone on time-zone-naive datetime, convert as if converting from UTC (#13960)
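The convert_time_zone change (#13960) can be illustrated with the standard library alone. This is a sketch of the semantics described in the note, not Polars code: `convert_naive_as_utc` is a hypothetical helper, and the fixed +05:45 offset is a stand-in for a named time zone so that the example does not depend on a time zone database being installed.

```python
from datetime import datetime, timedelta, timezone


def convert_naive_as_utc(naive: datetime, target: timezone) -> datetime:
    """Treat a naive datetime as UTC, then convert it to the target zone."""
    if naive.tzinfo is not None:
        raise ValueError("expected a time-zone-naive datetime")
    # Attach UTC without shifting the wall-clock time, then convert.
    return naive.replace(tzinfo=timezone.utc).astimezone(target)


# Fixed-offset stand-in for a named zone (assumption made for this example).
target_zone = timezone(timedelta(hours=5, minutes=45))

naive = datetime(2024, 1, 1, 12, 0)
converted = convert_naive_as_utc(naive, target_zone)
print(converted.isoformat())  # 2024-01-01T17:45:00+05:45
```

The wall-clock time moves by the target offset because the naive value is read as 12:00 UTC, not as 12:00 in the target zone.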
🐞 Bug fixes
- deduplicate recursive growables (#14264)
- Fix glimpse overload signature (#14258)
- allow set operations on list of categoricals (#14110)
- any/all_horizontal with single input has incorrect type (#14256)
- load numpy array with np array values #14237 (#14238)
- Make Series.to_numpy on booleans without nulls return bool type (#14239)
- fix ufunc in agg (change __array_ufunc__ so it uses is_elementwise=True parameter) (#14135)
- Fix join validation for String types (#14229)
- enable windows test coverage for read_excel "calamine" (fastexcel) engine (#14171)
- make csv parser more robust to edge cases (#14210)
- Fix for set_operations of binary dtype (#14152)
- fix read_csv date/datetime inference and parsing (#14113)
- don't see files as hive partitions (#14128)
- allow eval on list of categoricals (#14132)
- Forbid casting from Date to Time and vice versa (#14127)
- preserve old naming convention for multi-value pivot (this will change in 1.0 to no longer redundantly have the column name in the middle) (#14120)
- Implements gt/lt cmp for null dtype (#14119)
- ignore comments at beginning of csv if schema provided (#14115)
- fix pivot when multiple columns are passed. Output is now aligned with what tidyverse / pandas.pivot_table would do (#14048)
- multiple read_excel updates (#14039)
- some temporal conversion errors for datetimes earlier than 1970-01-01 (#14050)
- Preserve name when casting from categorical (#14085)
- respect Object dtype designation (#14072)
- fix cse bug when window function is nested (#14070)
- Fix melt panic when there are no value vars (#14057)
- json_encode should respect the logical type (#14063)
- improve skip row group using statistics condition (#14056)
- Raise for .dt.epoch and .dt.timestamp for Duration dtype (#13962)
- handle SliceSink with empty data (#14025)
- Allow Series.to_pandas for categorical types (#14028)
- correct field type schema inference (using read_csv) (#14042)
- Use int formatter for unsigned ints (#14043)
📖 Documentation
- fix code block in user-guide/lazy/schemas (#14228)
- Add visualization page to user guide (#13052)
- Fix typo in contributing guide (#14181)
- Small improvements Ecosystem page (#14176)
- fix code blocks in user-guide/concepts/data-structures (#14146)
- Document that Kleene logic is followed in any_horizontal and all_horizontal (#14148)
- Fix description of return_dtype parameter for map_elements and map_batches (#14114)
- Fix bullet point formatting in CI contributing guide (#14117)
- Add documentation on replacement strings to str.replace and str.replace_all (#13382)
- Replace alternatives page with more objective comparison (#13784)
- Note that only one name operation is allowed per expression (#14075)
- Improve deprecation message of dtype_if_empty param (#14068)
- fix more docstring bullet points (#14065)
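For readers unfamiliar with the Kleene logic mentioned for any_horizontal and all_horizontal (#14148), the three-valued rules can be sketched in plain Python, with None standing in for a null. The helpers `kleene_any` and `kleene_all` are hypothetical names for this illustration and are not part of the Polars API.

```python
from typing import Iterable, Optional


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued OR: a single True decides; otherwise a null leaves it unknown."""
    saw_null = False
    for value in values:
        if value is True:
            return True
        if value is None:
            saw_null = True
    return None if saw_null else False


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued AND: a single False decides; otherwise a null leaves it unknown."""
    saw_null = False
    for value in values:
        if value is False:
            return False
        if value is None:
            saw_null = True
    return None if saw_null else True


# One row of boolean columns, evaluated horizontally.
print(kleene_any([True, None]))   # True
print(kleene_any([False, None]))  # None
print(kleene_all([False, None]))  # False
print(kleene_all([True, None]))   # None
```

A null only propagates when the known values cannot settle the result on their own.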
🛠️ Other improvements
- Reorganize NumPy interop tests (#14257)
- additional dataframe test coverage (#14243)
- Remove *args in Series.to_numpy (#14248)
- Move metadata utils to meta module (#14230)
- remove unused method DataFrame._from_dicts (#14212)
- make gather_chunked completely generic (#14195)
- Add .cargo directory to .gitignore (#14191)
- take_chunked to polars-ops (#14185)
- Issue a warning when running doctests on Python 3.11 or lower (#14187)
- Run cargo update (#14160)
- merge take kernels (#14137)
- improve From<Ca> -> Vec (#14123)
- hoist boolean -> string cast (#14122)
- remove unused argument (#14014)
Thank you to all our contributors for making this release possible!
@JulianCologne, @MarcoGorelli, @Vincenthays, @Wainberg, @alexander-beedie, @apcamargo, @braaannigan, @c-peters, @deanm0000, @dependabot, @dependabot[bot], @dpinol, @edavisau, @eitsupi, @flisky, @grinya007, @ion-elgreco, @itamarst, @lukemanley, @mcrumiller, @orlp, @r-brink, @reswqa, @ritchie46, @stinodego and @taki-mekhalfa