Search response returns incorrect number of documents #32424

Open
akolhun opened this issue Sep 18, 2024 · 1 comment
Contributor

akolhun commented Sep 18, 2024

Describe the bug
The search response reports root.fields.totalCount = X, but fewer than X documents are actually returned.

To Reproduce
Steps to reproduce the behavior:

  1. Launch a Vespa cluster with 12 pods (Helm chart attached): helm install vespa -n vespa-privatemedia --create-namespace .
  2. Load the Vespa application package (attached)
  3. Load the attached data.json via the vespa-feeder CLI tool: vespa-feeder data.json
  4. Execute the query:
curl 'http://localhost:8080/search/' \
--header 'Content-Type: application/json' \
--data '{
    "yql": "select * from mp_private_media where site_id contains '\''c5402062-bedf-4e3e-80ad-d668993ed9b2'\'' and state contains '\''trash'\''",
    "hits": 100,
    "offset": 0
}'

The response contains root.fields.totalCount=54, but only 38 documents are returned.
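
The mismatch can also be checked programmatically. The following is a minimal Python sketch, assuming the response uses Vespa's default JSON result format with the returned hits listed under root.children. The helper name count_mismatch is hypothetical and not part of any Vespa client, and the sample data is synthetic, shaped only to mirror the numbers in this report.

```python
import json


def count_mismatch(response_text):
    """Return (total_count, returned_hits) for a search response body."""
    root = json.loads(response_text).get("root", {})
    # totalCount is the number of matching documents reported by the query.
    total_count = root.get("fields", {}).get("totalCount", 0)
    # Assumption: the returned hits are listed under root.children.
    returned_hits = len(root.get("children", []))
    return total_count, returned_hits


# Synthetic response mirroring the numbers in this report (54 vs. 38).
sample = json.dumps({
    "root": {
        "fields": {"totalCount": 54},
        "children": [{"id": f"doc-{i}"} for i in range(38)],
    }
})

total, returned = count_mismatch(sample)
print(f"totalCount={total}, returned={returned}, missing={total - returned}")
```

With "hits": 100 and "offset": 0, as in the query above, the two numbers would be expected to match whenever totalCount is at most 100.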

Expected behavior
The response should contain all 54 documents, as root.fields.totalCount claims (the query requests up to 100 hits).

Environment:

  • OS: Amazon Linux
  • Infrastructure: Kubernetes
  • Versions: v1.24.17-eks

Vespa version
8.408.12

Additional context
Note: the problem varies with the number of nodes defined in the content cluster. It looks like a distribution-key-related issue.

vap_privatemedia.zip
vespa_privatemedia_helm.zip
data.json.zip

Member

hmusum commented Sep 23, 2024

This could be due to a timeout; see the documentation on timeout and the further documentation it points to, e.g. soft timeout.
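
One way to look for signs of a timeout is to inspect the response itself. The sketch below is illustrative only: it assumes the default JSON result format exposes a root.coverage object (with coverage, full and an optional degraded.timeout flag) and a root.errors list. The function name timeout_indicators and the sample response are hypothetical.

```python
def timeout_indicators(response):
    """Collect signs of a degraded or timed-out query from a parsed response."""
    root = response.get("root", {})
    # Assumption: coverage details live under root.coverage.
    coverage = root.get("coverage", {})
    degraded = coverage.get("degraded", {})
    return {
        "coverage_percent": coverage.get("coverage"),
        "full": coverage.get("full"),
        "degraded_by_timeout": bool(degraded.get("timeout", False)),
        # Assumption: errors, if any, are listed under root.errors.
        "errors": [error.get("message") for error in root.get("errors", [])],
    }


# Synthetic response; the values are made up for illustration.
sample_response = {
    "root": {
        "fields": {"totalCount": 54},
        "coverage": {"coverage": 80, "full": False, "degraded": {"timeout": True}},
        "errors": [{"code": 12, "message": "Timed out"}],
    }
}

print(timeout_indicators(sample_response))
```

If degraded_by_timeout is false and no errors are listed, a timeout becomes a less likely explanation, though this check alone does not rule it out.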

See also the documentation about summaries, especially the section on performance.

An actual query response could also be helpful. In that case, please include "trace.level": 4 in the query.
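
As a rough sketch of what such a request could look like, the Python snippet below rebuilds the query body from this issue with "trace.level": 4 added. The helper names build_traced_query and post_query are hypothetical, the endpoint is the one used in the reproduction steps, and the optional timeout key is included on the assumption that it is passed as an ordinary query parameter in the JSON body.

```python
import json
import urllib.request


def build_traced_query(yql, hits=100, offset=0, trace_level=4, timeout=None):
    """Build the JSON body for the search request, with tracing enabled."""
    body = {"yql": yql, "hits": hits, "offset": offset, "trace.level": trace_level}
    if timeout is not None:
        # Assumption: a value such as "5s" is accepted for the timeout parameter.
        body["timeout"] = timeout
    return body


def post_query(body, endpoint="http://localhost:8080/search/"):
    """POST the body to the search endpoint and return the parsed JSON reply."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


YQL = (
    "select * from mp_private_media where site_id contains "
    "'c5402062-bedf-4e3e-80ad-d668993ed9b2' and state contains 'trash'"
)

body = build_traced_query(YQL)
print(json.dumps(body, indent=4))
```

Calling post_query(body) against a running cluster would return the response, including the trace, which can then be attached to this issue.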
