Adding float64 support, document level boosting, and facet collector #52

Open
wants to merge 5 commits into
base: master

Conversation

@AliFlux AliFlux commented Jul 13, 2022

This PR adds several features that are present in the Tantivy core repo but not yet exposed in tantivy-py:

Conjunction by default parameter

By default, tantivy combines the terms of a query with the OR operator rather than AND. To change this behavior, we can now pass a flag when parsing a query:

parse_query(text, fields, conjunction_by_default=True)
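To make the difference concrete, here is a toy model in plain Python. It is not tantivy and does not call tantivy-py; the `matches` helper and the sample documents are made up for illustration, and only the `conjunction_by_default` name comes from this PR.

```python
def matches(doc_terms, query_terms, conjunction_by_default=False):
    """Toy matcher: AND requires every term, OR requires at least one."""
    if conjunction_by_default:
        return all(term in doc_terms for term in query_terms)
    return any(term in doc_terms for term in query_terms)


# Hypothetical documents, represented as sets of indexed terms.
docs = {
    "doc1": {"old", "man", "sea"},
    "doc2": {"old", "house"},
    "doc3": {"young", "man"},
}
query = ["old", "man"]

# Default (OR) semantics: a document matches if it contains any query term.
or_hits = sorted(name for name, terms in docs.items() if matches(terms, query))

# Conjunction (AND) semantics: a document must contain every query term.
and_hits = sorted(
    name
    for name, terms in docs.items()
    if matches(terms, query, conjunction_by_default=True)
)

print(or_hits)
print(and_hits)
```

With OR semantics all three sample documents match; with AND semantics only the document containing both terms does.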

Floating point support

A new add_float_field function is available so that we can add f64 fields.

Document level boosting #51

We can now give priority to certain documents using the new weight_by_field parameter:

searcher.search(query, limit, weight_by_field='popularity')

Internally this uses TopDocs tweak_score. The scoring callback is implemented on the Rust side rather than exposed to Python code, for performance reasons.
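As a rough illustration of what score tweaking by a field could look like, here is a pure-Python sketch. It does not call tantivy-py. The `search_with_weight` helper, the sample scores, and above all the assumption that the field value multiplies the original score are mine; the PR description does not state the exact formula.

```python
def search_with_weight(scored_docs, limit, weight_by_field=None):
    """Rank (score, fields) pairs, optionally scaling each score by a field value."""

    def final_score(item):
        score, fields = item
        if weight_by_field is None:
            return score
        # Assumption: the weight multiplies the relevance score.
        return score * fields[weight_by_field]

    ranked = sorted(scored_docs, key=final_score, reverse=True)
    return [(final_score(item), item[1]["title"]) for item in ranked[:limit]]


# Hypothetical relevance scores and a float field named "popularity".
candidates = [
    (2.0, {"title": "A", "popularity": 1.0}),
    (1.5, {"title": "B", "popularity": 3.0}),
    (1.0, {"title": "C", "popularity": 0.5}),
]

plain = search_with_weight(candidates, limit=2)
boosted = search_with_weight(candidates, limit=2, weight_by_field="popularity")

print(plain)
print(boosted)
```

In this sketch document B overtakes A once popularity is taken into account, which is the kind of reordering the new parameter is meant to produce.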

Facet collector

We can now get counts of facets available by specifying the count_facets_by_field parameter:

data = searcher.search(query, limit, count_facets_by_field='genre')

print(data.facet_counts)
# prints a dictionary where keys are facets, and values are counts
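The shape of that dictionary can be sketched in plain Python. This is a toy model, not the tantivy facet collector: the `count_facets` helper, the sample documents, and the facet paths are hypothetical, and the real collector may group or nest facets differently.

```python
from collections import Counter


def count_facets(matching_docs, count_facets_by_field):
    """Count how many matching documents carry each facet value of a field."""
    counts = Counter()
    for doc in matching_docs:
        # A document may carry several facets in the same field, or none.
        for facet in doc.get(count_facets_by_field, []):
            counts[facet] += 1
    return dict(counts)


# Hypothetical search hits with a facet field named "genre".
hits = [
    {"title": "Dune", "genre": ["/fiction/scifi"]},
    {"title": "Neuromancer", "genre": ["/fiction/scifi"]},
    {"title": "Emma", "genre": ["/fiction/romance"]},
]

facet_counts = count_facets(hits, "genre")
print(facet_counts)
```

Keys are facet paths and values are the number of matching documents that carry them.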

Collaborator

@cjrh cjrh left a comment

I'm excited for f64 support to get in. I'm not a maintainer of tantivy-py, but I left several comments hoping that it helps to move forward on the PR.

Perhaps f64 support would get merged faster if you made a separate PR for just the f64 support?

pyproject.toml (outdated, resolved)
.gitignore (outdated, resolved)
@@ -48,7 +48,7 @@ impl Facet {
     #[classmethod]
     fn from_string(_cls: &PyType, facet_string: &str) -> Facet {
         Facet {
-            inner: schema::Facet::from(facet_string),
+            inner: schema::Facet::from_text(facet_string).unwrap(),
Collaborator

I was going to comment about the unwrap but I see that the LHS schema::Facet::from is just implemented using from_text with an unwrap anyway: https://docs.rs/tantivy/0.18.0/src/tantivy/schema/facet.rs.html#181-183

I'm curious what was the reason for this change?

Author

I don't remember exactly why I made this change. Maybe I couldn't get facets to work without it.

src/schemabuilder.rs (outdated, resolved)
Author

AliFlux commented Oct 25, 2022

@poljar let me know if there's anything I can do to expedite the merge

4.0
@@ -0,0 +1,26 @@
+Collecting pytest
Collaborator

This file seems like it was added in error...?

3 participants