Fix search facet from mirepresenting the amount of ebook results #10160

sbwhitt · 2024-12-17T21:31:06Z

Hides printdisabled results for users not explicitly searching for them who do not have printdisabled settings stored in cookies.

Technical

Testing

Ensure that ebook facet counts match the amount of actual ebook results, with or without 'pd' set to true in cookies.

Screenshot

Stakeholders

@cdrini @RayBB

cdrini

This won't quite work at this layer, since it'll filter out printdisabled books all the time, even if the mode isn't specified. We want the way the facets are computed to be modified, so that it correctly handles the has_fulltext facet.

We currently have a rewrite rule for the has_fulltext facet, which is what's causing the discrepancy. The facet is computed using the has_fulltext solr field, but once it's selected, it's translated by this code to use the ebook_access field:

openlibrary/openlibrary/plugins/worksearch/schemes/works.py

Lines 203 to 210 in 6a117fa

    
               ): lambda: f'ebook_access:[{get_fulltext_min()} TO *]', 
        
               ( 
        
                   'has_fulltext', 
        
                   'false', 
        
               ): lambda: f'ebook_access:[* TO {get_fulltext_min()}}}', 
        
           } 
        
           def is_search_field(self, field: str):

This is the discrepancy.

Solr does have options to specify more custom facets, I believe they're called facet queries (the docs here are useful ; sorry couldn't find the specific section! https://solr.apache.org/guide/solr/latest/query-guide/faceting.html ).

If we update the the Work search scheme's facet_fields to instead of having the plain has_fulltext field, instead have a dict with options to specify a facet query, and then update the code that actually builds the solr query to understand/serialize this dict, that might do the trick?

Work Search Scheme facet fields:

openlibrary/openlibrary/plugins/worksearch/schemes/works.py

Lines 102 to 113 in 6a117fa

    
           facet_fields = { 
        
               "has_fulltext", 
        
               "author_facet", 
        
               "language", 
        
               "first_publish_year", 
        
               "publisher_facet", 
        
               "subject_facet", 
        
               "person_facet", 
        
               "place_facet", 
        
               "time_facet", 
        
               "public_scan_b", 
        
           }

Code that does the actual query:

openlibrary/openlibrary/plugins/worksearch/code.py

Lines 191 to 205 in 9a4e45e

    
           facet_fields = scheme.facet_fields if isinstance(facet, bool) else facet 
        
           if facet and facet_fields: 
        
               params.append(('facet', 'true')) 
        
               for facet in facet_fields:  # noqa: PLR1704 
        
                   if isinstance(facet, str): 
        
                       params.append(('facet.field', facet)) 
        
                   elif isinstance(facet, dict): 
        
                       params.append(('facet.field', facet['name'])) 
        
                       if 'sort' in facet: 
        
                           params.append((f'f.{facet["name"]}.facet.sort', facet['sort'])) 
        
                       if 'limit' in facet: 
        
                           params.append((f'f.{facet["name"]}.facet.limit', facet['limit'])) 
        
                   else: 
        
                       # Should never get here 
        
                       raise ValueError(f'Invalid facet type: {facet}')

And then there might be some awkward stitching logic. Apologies this a bit more of a meatier change than expected! Let us know if you have any questions!

sbwhitt · 2025-01-25T13:32:13Z

@cdrini Thanks for taking a look. The current change should only exclude printdisabled books when the mode is not specified and the user does not have the pd cookie, but it does remove those books from the results entirely. Is the issue that we want to avoid hiding printdisabled books to users without the pd cookie?

hornc · 2025-02-13T21:00:25Z

@sbwhitt , @cdrini #10451 was just merged which removed the printdisabled search mode (because it relied on a no-longer used subject facet to filter, and hasn't worked for some time) That might simplify the logic here?

sbwhitt · 2025-02-15T17:46:19Z

@hornc Thank you for letting me know!

update query to hide printdisabled results without pd cookie

b952dfe

github-actions bot assigned cdrini Dec 17, 2024

cdrini requested changes Jan 22, 2025

View reviewed changes

github-actions bot added the Needs: Response Issues which require feedback from lead label Feb 16, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix search facet from mirepresenting the amount of ebook results #10160

Fix search facet from mirepresenting the amount of ebook results #10160

sbwhitt commented Dec 17, 2024

cdrini left a comment

sbwhitt commented Jan 25, 2025

hornc commented Feb 13, 2025

sbwhitt commented Feb 15, 2025

	): lambda: f'ebook_access:[{get_fulltext_min()} TO *]',
	(
	'has_fulltext',
	'false',
	): lambda: f'ebook_access:[* TO {get_fulltext_min()}}}',
	}

	def is_search_field(self, field: str):

	facet_fields = {
	"has_fulltext",
	"author_facet",
	"language",
	"first_publish_year",
	"publisher_facet",
	"subject_facet",
	"person_facet",
	"place_facet",
	"time_facet",
	"public_scan_b",
	}

	facet_fields = scheme.facet_fields if isinstance(facet, bool) else facet
	if facet and facet_fields:
	params.append(('facet', 'true'))
	for facet in facet_fields: # noqa: PLR1704
	if isinstance(facet, str):
	params.append(('facet.field', facet))
	elif isinstance(facet, dict):
	params.append(('facet.field', facet['name']))
	if 'sort' in facet:
	params.append((f'f.{facet["name"]}.facet.sort', facet['sort']))
	if 'limit' in facet:
	params.append((f'f.{facet["name"]}.facet.limit', facet['limit']))
	else:
	# Should never get here
	raise ValueError(f'Invalid facet type: {facet}')

Fix search facet from mirepresenting the amount of ebook results #10160

Are you sure you want to change the base?

Fix search facet from mirepresenting the amount of ebook results #10160

Conversation

sbwhitt commented Dec 17, 2024

Technical

Testing

Screenshot

Stakeholders

cdrini left a comment

Choose a reason for hiding this comment

sbwhitt commented Jan 25, 2025

hornc commented Feb 13, 2025

sbwhitt commented Feb 15, 2025