Warning for polars lazy frame. (#11126)
trivialfis authored Dec 21, 2024
1 parent 95f5776 commit 027eb7b
Showing 3 changed files with 22 additions and 5 deletions.
14 changes: 13 additions & 1 deletion doc/python/python_intro.rst
@@ -118,6 +118,7 @@ Markers
- F: Not supported.
- NE: Invalid type for the use case. For instance, `pd.Series` cannot be used as a multi-target label.
- NPA: Support with the help of numpy array.
- AT: Support with the help of arrow table.
- CPA: Support with the help of cupy array.
- SciCSR: Support with the help of scipy sparse CSR. The conversion to scipy CSR may or may not be possible. Raise a type error if conversion fails.
- FF: We can look forward to having its support in the near future if requested.
@@ -170,13 +171,24 @@ Support Matrix
+-------------------------+-----------+-------------------+-----------+-----------+--------------------+-------------+
| modin.Series | NPA | FF | NPA | NPA | FF | |
+-------------------------+-----------+-------------------+-----------+-----------+--------------------+-------------+
| pyarrow.Table | NPA | NPA | NPA | NPA | NPA | NPA |
| pyarrow.Table | T | T | T | T | T | T |
+-------------------------+-----------+-------------------+-----------+-----------+--------------------+-------------+
| polars.DataFrame | AT | AT | AT | AT | AT | AT |
+-------------------------+-----------+-------------------+-----------+-----------+--------------------+-------------+
| polars.LazyFrame (WARN) | AT | AT | AT | AT | AT | AT |
+-------------------------+-----------+-------------------+-----------+-----------+--------------------+-------------+
| polars.Series | AT | AT | AT | AT | AT | NE |
+-------------------------+-----------+-------------------+-----------+-----------+--------------------+-------------+
| _\_array\_\_ | NPA | F | NPA | NPA | H | |
+-------------------------+-----------+-------------------+-----------+-----------+--------------------+-------------+
| Others | SciCSR | F | | F | F | |
+-------------------------+-----------+-------------------+-----------+-----------+--------------------+-------------+

The polars ``LazyFrame.collect`` method supports many configurations, ranging from the
choice of query engine to type coercion. XGBoost simply calls it with the default
parameters. For finer control over the behaviour, run ``collect`` yourself and pass the
resulting ``DataFrame`` to XGBoost.
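
A minimal sketch of the advice in the paragraph above: materialize lazy input yourself so that you, not XGBoost, choose the ``collect`` options. ``materialize`` and ``DemoLazy`` are hypothetical names introduced for illustration; they are not part of XGBoost or polars. The helper duck-types on a ``collect`` attribute, which is an assumption made so the sketch runs without polars installed.

```python
from typing import Any


def materialize(data: Any, **collect_kwargs: Any) -> Any:
    """Return eager data, calling ``collect`` explicitly on lazy inputs.

    Hypothetical helper, not an XGBoost or polars API. ``collect_kwargs`` is
    forwarded unchanged, so the caller decides how the query is executed.
    """
    collect = getattr(data, "collect", None)
    if callable(collect):
        return collect(**collect_kwargs)
    return data


class DemoLazy:
    """Stand-in for a lazy frame (assumption: it only needs ``collect``)."""

    def __init__(self, rows):
        self._rows = rows
        self.seen_kwargs = None

    def collect(self, **kwargs):
        # Record the options so the demo can show what was forwarded.
        self.seen_kwargs = kwargs
        return list(self._rows)


lazy = DemoLazy([[1.0, 2.0], [3.0, 4.0]])
eager = materialize(lazy)           # lazy input: ``collect`` is called here
already_eager = materialize(eager)  # a list has no ``collect``; returned as-is
```

With a real ``polars.LazyFrame``, the eager result would then be passed to XGBoost in place of the lazy frame, which also avoids the warning introduced by this commit.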

Setting Parameters
------------------
XGBoost can use either a list of pairs or a dictionary to set :doc:`parameters </parameter>`. For instance:
5 changes: 5 additions & 0 deletions python-package/xgboost/data.py
@@ -961,6 +961,11 @@ def _transform_polars_df(
) -> Tuple[ArrowTransformed, Optional[FeatureNames], Optional[FeatureTypes]]:
    if _is_polars_lazyframe(data):
        df = data.collect()
        warnings.warn(
            "Using the default parameters for the polars `LazyFrame.collect`. Consider"
            " passing a realized `DataFrame` or `Series` instead.",
            UserWarning,
        )
    else:
        df = data
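
A small sketch of how the new warning behaves from the caller's side, using only the standard ``warnings`` module. ``FakeLazyFrame`` and ``transform`` are hypothetical stand-ins that mirror the branch in the hunk above; they are not XGBoost's actual implementation, and the ``isinstance`` check here replaces the real ``_is_polars_lazyframe`` helper.

```python
import warnings


class FakeLazyFrame:
    """Hypothetical stand-in for polars.LazyFrame (assumption, for illustration)."""

    def __init__(self, rows):
        self._rows = rows

    def collect(self):
        return list(self._rows)


def transform(data):
    """Simplified mirror of the lazy/eager branch shown in the hunk."""
    if isinstance(data, FakeLazyFrame):
        df = data.collect()
        warnings.warn(
            "Using the default parameters for the polars `LazyFrame.collect`. Consider"
            " passing a realized `DataFrame` or `Series` instead.",
            UserWarning,
        )
    else:
        df = data
    return df


lazy = FakeLazyFrame([[1.0, 2.0], [3.0, 4.0]])

# Lazy input: one UserWarning is emitted for the call.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from_lazy = transform(lazy)

# Eager input: the warning branch is skipped entirely.
with warnings.catch_warnings(record=True) as caught_eager:
    warnings.simplefilter("always")
    from_eager = transform(lazy.collect())

# A test suite can escalate the warning to an error to catch accidental lazy input.
with warnings.catch_warnings():
    warnings.filterwarnings(
        "error", message="Using the default parameters", category=UserWarning
    )
    try:
        transform(lazy)
        escalated = False
    except UserWarning:
        escalated = True
```

Because the category is the generic ``UserWarning``, filtering on the message prefix is one way to target this warning without silencing unrelated ones.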

8 changes: 4 additions & 4 deletions python-package/xgboost/sklearn.py
@@ -1143,7 +1143,7 @@ def fit(
Parameters
----------
X :
Feature matrix. See :ref:`py-data` for a list of supported types.
Input feature matrix. See :ref:`py-data` for a list of supported types.
When the ``tree_method`` is set to ``hist``, internally, the
:py:class:`QuantileDMatrix` will be used instead of the :py:class:`DMatrix`
@@ -1267,7 +1267,7 @@ def predict(
Parameters
----------
X :
Data to predict with.
Data to predict with. See :ref:`py-data` for a list of supported types.
output_margin :
Whether to output the raw untransformed margin value.
validate_features :
@@ -1334,8 +1334,8 @@ def apply(
Parameters
----------
X : array_like, shape=[n_samples, n_features]
Input features matrix.
X :
Input features matrix. See :ref:`py-data` for a list of supported types.
iteration_range :
See :py:meth:`predict`.
