[ISSUE-10463] Add missing import in learning-to-rank tutorial #10464

Merged: 2 commits, Jun 21, 2024

jpizagno
Contributor

@jpizagno jpizagno commented Jun 20, 2024

Introduction

As mentioned in ISSUE-10463, the query id needs to be sorted. Following the Learning to Rank tutorial as written produces this error: Check failed: non_dec: `qid` must be sorted in non-decreasing order along with data. The full stack trace is below. This happens with xgboost 2.1.0 (xgboost.__version__ = 2.1.0).
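The failing check requires every query id to be greater than or equal to the one before it. A minimal sketch of that condition in plain Python follows; `is_non_decreasing` is a hypothetical helper written for illustration and is not part of xgboost, and the sample ids are made up.

```python
def is_non_decreasing(qid):
    """Return True if every element is >= the element before it."""
    qid = list(qid)
    return all(a <= b for a, b in zip(qid, qid[1:]))


# Hypothetical query ids, shaped like the start of the tutorial's qid array.
unsorted_qid = [1, 1, 2, 0, 1]
sorted_qid = sorted(unsorted_qid)

print(is_non_decreasing(unsorted_qid))  # False: 2 is followed by 0
print(is_non_decreasing(sorted_qid))    # True
```

Running a check like this on qid before calling fit can surface the problem earlier than the stack trace does.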

Fix

The tutorial currently has:
qid = rng.integers(0, n_query_groups, size=X.shape[0])
and it should be:
qid = sorted(rng.integers(0, n_query_groups, size=X.shape[0]))

This PR also adds the missing import pandas as pd.

Tests

# python
Python 3.10.14 (main, Jun 13 2024, 06:43:06) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sklearn
>>> sklearn.__version__
'1.5.0'
>>> import numpy as np
>>> np.__version__
'2.0.0'
>>> import xgboost as xgb
>>> xgb.__version__
'2.1.0'

# follow tutorial

>>> from sklearn.datasets import make_classification
>>> import numpy as np
>>> import xgboost as xgb
>>> # Make a synthetic ranking dataset for demonstration
>>> seed = 1994
>>> X, y = make_classification(random_state=seed)
>>> rng = np.random.default_rng(seed)
>>> n_query_groups = 3
>>> qid = rng.integers(0, n_query_groups, size=X.shape[0])
>>> # Sort the inputs based on query index
>>> sorted_idx = np.argsort(qid)
>>> X = X[sorted_idx, :]
>>> y = y[sorted_idx]
>>> ranker = xgb.XGBRanker(tree_method="hist", lambdarank_num_pair_per_sample=8, objective="rank:ndcg", lambdarank_pair_method="topk")
>>> ranker.fit(X, y, qid=qid)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/xgboost/core.py", line 726, in inner_f
    return func(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/xgboost/sklearn.py", line 1997, in fit
    train_dmatrix, evals = _wrap_evaluation_matrices(
  File "/usr/local/lib/python3.10/site-packages/xgboost/sklearn.py", line 596, in _wrap_evaluation_matrices
    train_dmatrix = create_dmatrix(
  File "/usr/local/lib/python3.10/site-packages/xgboost/sklearn.py", line 1879, in _create_ltr_dmatrix
    return super()._create_dmatrix(ref=ref, data=data, qid=qid, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/xgboost/sklearn.py", line 1003, in _create_dmatrix
    return QuantileDMatrix(
  File "/usr/local/lib/python3.10/site-packages/xgboost/core.py", line 726, in inner_f
    return func(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/xgboost/core.py", line 1573, in __init__
    self._init(
  File "/usr/local/lib/python3.10/site-packages/xgboost/core.py", line 1632, in _init
    it.reraise()
  File "/usr/local/lib/python3.10/site-packages/xgboost/core.py", line 569, in reraise
    raise exc  # pylint: disable=raising-bad-type
  File "/usr/local/lib/python3.10/site-packages/xgboost/core.py", line 550, in _handle_exception
    return fn()
  File "/usr/local/lib/python3.10/site-packages/xgboost/core.py", line 637, in <lambda>
    return self._handle_exception(lambda: self.next(input_data), 0)
  File "/usr/local/lib/python3.10/site-packages/xgboost/data.py", line 1416, in next
    input_data(**self.kwargs)
  File "/usr/local/lib/python3.10/site-packages/xgboost/core.py", line 726, in inner_f
    return func(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/xgboost/core.py", line 626, in input_data
    self.proxy.set_info(
  File "/usr/local/lib/python3.10/site-packages/xgboost/core.py", line 726, in inner_f
    return func(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/xgboost/core.py", line 962, in set_info
    self.set_uint_info("qid", qid)
  File "/usr/local/lib/python3.10/site-packages/xgboost/core.py", line 1064, in set_uint_info
    dispatch_meta_backend(self, data, field, "uint32")
  File "/usr/local/lib/python3.10/site-packages/xgboost/data.py", line 1354, in dispatch_meta_backend
    _meta_from_numpy(data, name, dtype, handle)
  File "/usr/local/lib/python3.10/site-packages/xgboost/data.py", line 1295, in _meta_from_numpy
    _check_call(_LIB.XGDMatrixSetInfoFromInterface(handle, c_str(field), interface_str))
  File "/usr/local/lib/python3.10/site-packages/xgboost/core.py", line 284, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [16:12:53] /workspace/src/data/data.cc:539: Check failed: non_dec: `qid` must be sorted in non-decreasing order along with data.
Stack trace:
  [bt] (0) /usr/local/lib/python3.10/site-packages/xgboost/lib/libxgboost.so(+0x22d7cc) [0x7f3eb922d7cc]
  [bt] (1) /usr/local/lib/python3.10/site-packages/xgboost/lib/libxgboost.so(+0x4b4468) [0x7f3eb94b4468]
  [bt] (2) /usr/local/lib/python3.10/site-packages/xgboost/lib/libxgboost.so(+0x4b4850) [0x7f3eb94b4850]
  [bt] (3) /usr/local/lib/python3.10/site-packages/xgboost/lib/libxgboost.so(XGDMatrixSetInfoFromInterface+0xb2) [0x7f3eb9134b42]
  [bt] (4) /lib/x86_64-linux-gnu/libffi.so.8(+0x6f7a) [0x7f3ef4093f7a]
  [bt] (5) /lib/x86_64-linux-gnu/libffi.so.8(+0x640e) [0x7f3ef409340e]
  [bt] (6) /lib/x86_64-linux-gnu/libffi.so.8(ffi_call+0xcd) [0x7f3ef4093b0d]
  [bt] (7) /usr/local/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0xd66d) [0x7f3ef40a666d]
  [bt] (8) /usr/local/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0xde1c) [0x7f3ef40a6e1c]

# qid is not sorted
>>> qid
array([1, 1, 2, 0, 1, 1, 2, 2, 2, 0, 1, 1, 2, 0, 1, 1, 2, 1, 0, 2, 1, 1,
       2, 0, 1, 2, 2, 2, 1, 0, 0, 1, 0, 1, 0, 0, 2, 1, 2, 2, 0, 1, 1, 1,
       2, 1, 1, 1, 0, 2, 2, 2, 2, 1, 0, 0, 2, 2, 0, 2, 0, 1, 2, 0, 2, 2,
       0, 0, 1, 2, 0, 2, 2, 1, 2, 1, 0, 1, 1, 2, 1, 1, 2, 1, 2, 0, 1, 2,
       1, 2, 1, 0, 2, 2, 0, 1, 2, 0, 2, 1])

###########
# try sorting qid with sorted()
###########
>>> qid = sorted(rng.integers(0, n_query_groups, size=X.shape[0]))
>>> qid
[np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2), np.int64(2)]
>>> sorted_idx = np.argsort(qid)
>>> X = X[sorted_idx, :]
>>> y = y[sorted_idx]
>>> ranker = xgb.XGBRanker(tree_method="hist", lambdarank_num_pair_per_sample=8, objective="rank:ndcg", lambdarank_pair_method="topk")
>>> ranker.fit(X, y, qid=qid)
XGBRanker(base_score=None, booster=None, callbacks=None, colsample_bylevel=None,
          colsample_bynode=None, colsample_bytree=None, device=None,
          early_stopping_rounds=None, enable_categorical=False,
          eval_metric=None, feature_types=None, gamma=None, grow_policy=None,
          importance_type=None, interaction_constraints=None,
          lambdarank_num_pair_per_sample=8, lambdarank_pair_method='topk',
          learning_rate=None, max_bin=None, max_cat_threshold=None,
          max_cat_to_onehot=None, max_delta_step=None, max_depth=None,
          max_leaves=None, min_child_weight=None, missing=nan,
          monotone_constraints=None, multi_strategy=None, n_estimators=None,
          n_jobs=None, ...)
>>> 

@trivialfis
Member

Hi, the qid is sorted by the sorted_idx

@jpizagno
Contributor Author

> Hi, the qid is sorted by the sorted_idx

Thank you for commenting. Why does the stacktrace occur then?

(It does look like X and y are sorted)

@trivialfis
Member

@jpizagno Did you include

qid = qid[sorted_idx]
in the example?
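For reference, here is a sketch of the sorting step with that line included. It uses only NumPy and a random stand-in for the tutorial's feature matrix; the shapes and the stand-in data are assumptions made for illustration. Reordering qid with the same index as X and y keeps each row paired with its original query id.

```python
import numpy as np

seed = 1994
rng = np.random.default_rng(seed)

# Stand-in data: 100 rows, 20 features (illustrative shapes only).
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=X.shape[0])
n_query_groups = 3
qid = rng.integers(0, n_query_groups, size=X.shape[0])

# Keep copies so the row/qid pairing can be verified afterwards.
X_before, qid_before = X.copy(), qid.copy()

# Sort the inputs based on query index.
# A stable sort keeps the original row order within each query group.
sorted_idx = np.argsort(qid, kind="stable")
X = X[sorted_idx, :]
y = y[sorted_idx]
qid = qid[sorted_idx]  # the line missing from the transcript above

# qid is now non-decreasing, which is what the ranker requires.
print(bool(np.all(np.diff(qid) >= 0)))
```

With qid reordered this way, the ranker.fit(X, y, qid=qid) call from the transcript should no longer hit the non-decreasing check, since qid is sorted together with the data.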

@trivialfis
Member

I see, this has been fixed in #9673. We are still trying to get the document build for 2.1 working (#10470).

@jpizagno
Contributor Author

> I see, this has been fixed in #9673. We are still trying to get the document build for 2.1 working (#10470).

@trivialfis "aw shucks", I was hoping to contribute. 😄 Yes, that would fix it. Once the build is working, the docs will be correct.

This PR also includes the import pandas as pd line. Maybe one can assume that people who want to learn xgboost already know pandas, but it can't hurt to add it. Thoughts?

@hcho3 hcho3 changed the title from "[ISSUE-10463] Fixed sorting of qid in read-the-docs tutorial" to "[ISSUE-10463] Add missing import in learning-to-rank tutorial" Jun 21, 2024
@hcho3 hcho3 merged commit 124bc57 into dmlc:master Jun 21, 2024
24 of 29 checks passed
@hcho3
Collaborator

hcho3 commented Jun 21, 2024

The stable version now correctly shows the line qid = qid[sorted_idx]: https://xgboost.readthedocs.io/en/stable/tutorials/learning_to_rank.html
