Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: meilisearch backend #150

Closed
wants to merge 14 commits into from
20 changes: 19 additions & 1 deletion edxsearch/settings.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@
# This is just a container for running tests
DEBUG = True

ALLOWED_HOSTS = []
ALLOWED_HOSTS = ['*']

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure when/if this settings file is used, but I don't think we should just set ALLOWED_HOSTS = ['*']

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is mostly used during development


TEMPLATES = [
{
Expand Down Expand Up @@ -99,3 +99,21 @@
# https://docs.djangoproject.com/en/1.6/howto/static-files/

STATIC_URL = '/static/'
################### Using ElasticSearch ###################

SEARCH_ENGINE = os.getenv('SEARCH_ENGINE', 'search.elastic.ElasticSearchEngine')

################### Using Meilisearch (Beta) ###################

# Enable Studio search features (powered by Meilisearch) (beta, off by default)
MEILISEARCH_ENABLED = False

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to have a setting called MEILISEARCH_ENABLED. Instead, to enable Meilisearch, one would just set SEARCH_ENGINE to the Meilisearch backend.

# Meilisearch URL that the python backend can use. Often points to another docker container or k8s service.
MEILISEARCH_URL = os.getenv('MEILISEARCH_URL', 'http://localhost:7700')
# URL that browsers (end users) can use to reach Meilisearch. Should be HTTPS in production.
MEILISEARCH_PUBLIC_URL = os.getenv('MEILISEARCH_PUBLIC_URL', 'http://localhost:7700')
# To support multi-tenancy, you can prefix all indexes with a common key like "sandbox7-"
# and use a restricted tenant token in place of an API key, so that this Open edX instance
# can only use the index(es) that start with this prefix.
# See https://www.meilisearch.com/docs/learn/security/tenant_tokens
MEILISEARCH_INDEX_PREFIX = os.getenv('MEILISEARCH_INDEX_PREFIX', '')
MEILISEARCH_API_KEY = os.getenv('MEILISEARCH_API_KEY', 'masterKey')
1 change: 1 addition & 0 deletions requirements/base.in
Original file line number Diff line number Diff line change
Expand Up @@ -15,3 +15,4 @@ Django # Web application framework
elasticsearch>=7.8.0,<8.0.0
edx-toggles
event-tracking
meilisearch==0.31.1
41 changes: 27 additions & 14 deletions requirements/base.txt
Original file line number Diff line number Diff line change
@@ -1,28 +1,31 @@
#
# This file is autogenerated by pip-compile with Python 3.8
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
# make upgrade
# pip-compile --output-file=requirements/base.txt requirements/base.in
#
amqp==5.2.0
# via kombu
annotated-types==0.7.0
# via pydantic
asgiref==3.7.2
# via django
attrs==23.2.0
# via openedx-events
backports-zoneinfo[tzdata]==0.2.1 ; python_version < "3.9"
# via
# -c requirements/constraints.txt
# celery
# kombu
billiard==4.2.0
# via celery
camel-converter[pydantic]==3.1.2
# via meilisearch
celery==5.3.6
# via event-tracking
certifi==2024.2.2
# via elasticsearch
# via
# elasticsearch
# requests
cffi==1.16.0
# via pynacl
charset-normalizer==3.3.2
# via requests
click==8.1.7
# via
# celery
Expand Down Expand Up @@ -76,12 +79,16 @@ event-tracking==2.3.0
# via -r requirements/base.in
fastavro==1.9.4
# via openedx-events
idna==3.7
# via requests
jinja2==3.1.3
# via code-annotations
kombu==5.3.5
# via celery
markupsafe==2.1.5
# via jinja2
meilisearch==0.31.1
# via -r requirements/base.in
newrelic==9.6.0
# via edx-django-utils
openedx-events==9.5.2
Expand All @@ -94,6 +101,10 @@ psutil==5.9.8
# via edx-django-utils
pycparser==2.21
# via cffi
pydantic==2.7.4
# via camel-converter
pydantic-core==2.18.4
# via pydantic
pymongo==3.13.0
# via
# edx-opaque-keys
Expand All @@ -110,6 +121,8 @@ pytz==2024.1
# event-tracking
pyyaml==6.0.1
# via code-annotations
requests==2.32.3
# via meilisearch
six==1.16.0
# via
# event-tracking
Expand All @@ -125,15 +138,15 @@ text-unidecode==1.3
# via python-slugify
typing-extensions==4.9.0
# via
# asgiref
# edx-opaque-keys
# kombu
# pydantic
# pydantic-core
tzdata==2024.1
# via
# backports-zoneinfo
# celery
# via celery
urllib3==1.26.18
# via elasticsearch
# via
# elasticsearch
# requests
vine==5.1.0
# via
# amqp
Expand Down
52 changes: 36 additions & 16 deletions requirements/testing.txt
Original file line number Diff line number Diff line change
@@ -1,13 +1,17 @@
#
# This file is autogenerated by pip-compile with Python 3.8
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
# make upgrade
# pip-compile --output-file=requirements/testing.txt requirements/testing.in
#
amqp==5.2.0
# via
# -r requirements/base.txt
# kombu
annotated-types==0.7.0
# via
# -r requirements/base.txt
# pydantic
asgiref==3.7.2
# via
# -r requirements/base.txt
Expand All @@ -16,16 +20,14 @@ attrs==23.2.0
# via
# -r requirements/base.txt
# openedx-events
backports-zoneinfo[tzdata]==0.2.1 ; python_version < "3.9"
billiard==4.2.0
# via
# -c requirements/constraints.txt
# -r requirements/base.txt
# celery
# kombu
billiard==4.2.0
camel-converter[pydantic]==3.1.2
# via
# -r requirements/base.txt
# celery
# meilisearch
celery==5.3.6
# via
# -r requirements/base.txt
Expand All @@ -34,10 +36,15 @@ certifi==2024.2.2
# via
# -r requirements/base.txt
# elasticsearch
# requests
cffi==1.16.0
# via
# -r requirements/base.txt
# pynacl
charset-normalizer==3.3.2
# via
# -r requirements/base.txt
# requests
click==8.1.7
# via
# -r requirements/base.txt
Expand Down Expand Up @@ -71,6 +78,7 @@ ddt==1.3.1
# via
# -c requirements/constraints.txt
# -r requirements/testing.in
django==3.2.24
# via
# -c https://raw.githubusercontent.com/edx/edx-lint/master/edx_lint/files/common_constraints.txt
# -r requirements/base.txt
Expand Down Expand Up @@ -110,12 +118,14 @@ elasticsearch==7.13.4
# -r requirements/base.txt
event-tracking==2.3.0
# via -r requirements/base.txt
exceptiongroup==1.2.0
# via pytest
fastavro==1.9.4
# via
# -r requirements/base.txt
# openedx-events
idna==3.7
# via
# -r requirements/base.txt
# requests
iniconfig==2.0.0
# via pytest
jinja2==3.1.3
Expand All @@ -130,6 +140,8 @@ markupsafe==2.1.5
# via
# -r requirements/base.txt
# jinja2
meilisearch==0.31.1
# via -r requirements/base.txt
mock==5.1.0
# via -r requirements/testing.in
newrelic==9.6.0
Expand Down Expand Up @@ -160,6 +172,14 @@ pycparser==2.21
# via
# -r requirements/base.txt
# cffi
pydantic==2.7.4
# via
# -r requirements/base.txt
# camel-converter
pydantic-core==2.18.4
# via
# -r requirements/base.txt
# pydantic
pymongo==3.13.0
# via
# -r requirements/base.txt
Expand Down Expand Up @@ -190,6 +210,10 @@ pyyaml==6.0.1
# via
# -r requirements/base.txt
# code-annotations
requests==2.32.3
# via
# -r requirements/base.txt
# meilisearch
six==1.16.0
# via
# -r requirements/base.txt
Expand All @@ -209,25 +233,21 @@ text-unidecode==1.3
# via
# -r requirements/base.txt
# python-slugify
tomli==2.0.1
# via
# coverage
# pytest
typing-extensions==4.9.0
# via
# -r requirements/base.txt
# asgiref
# edx-opaque-keys
# kombu
# pydantic
# pydantic-core
tzdata==2024.1
# via
# -r requirements/base.txt
# backports-zoneinfo
# celery
urllib3==1.26.18
# via
# -r requirements/base.txt
# elasticsearch
# requests
vine==5.1.0
# via
# -r requirements/base.txt
Expand Down
101 changes: 101 additions & 0 deletions search/api.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
""" search business logic implementations """

from datetime import datetime

import meilisearch
from django.conf import settings

from eventtracking import tracker as track
Expand Down Expand Up @@ -158,3 +160,102 @@ def course_discovery_search(search_term=None, size=20, from_=0, field_dictionary
)

return results


def _meilisearch_auto_suggest_search_api(term, course_id, limit=30):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a method on MeiliSearchEngine ? Why is it in this file?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing this endpoint

"""
Perform an auto-suggest search using the Elasticsearch search engine.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Perform an auto-suggest search using the Elasticsearch search engine.
Perform an auto-suggest search using the Meilisearch search engine.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is an "auto-suggest search"? How is it different than a regular search?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a POC for implementation I am going to remove it


Args:
term (str): The search term.
course_id (str): The ID of the course to filter the search results.
limit (int, optional): The maximum number of results to return. Defaults to 30.

Returns:
list: A list of dictionaries containing the search results with 'id', 'display_name', and 'usage_key'.
"""
# Create a client instance for MeiliSearch
client = meilisearch.Client(settings.MEILISEARCH_URL, settings.MEILISEARCH_API_KEY)

# Define the index name
index_name = settings.MEILISEARCH_INDEX_PREFIX + "studio_content"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is searching the studio index. It doesn't make sense. It should get the index name from the subclass, and it definitely shouldn't be searching the studio content. For courseware search, it should be COURSEWARE_CONTENT_INDEX_NAME


# Perform the search with specified facets and filters
results = client.index(index_name).search(term, {
"facets": ["block_type", "tags"],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any code using edx-search is aware of tags, so we don't need to include tags as a facet.

"filter": [f"context_key='{course_id}'"],
"limit": limit
})

# Process the search hits to extract relevant fields
results = list(map(lambda it: {
"id": it["id"],
"display_name": it["display_name"],
"usage_key": it["usage_key"],
}, results["hits"]))

return results


def _elasticsearch_auto_suggest_search_api(term, course_id, limit=30):
"""
Perform an auto-suggest search using either Elasticsearch or MeiliSearch based on configuration.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Perform an auto-suggest search using either Elasticsearch or MeiliSearch based on configuration.
Perform an auto-suggest search using Elasticsearch.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is an "auto-suggest search" ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be moved to the ElasticSearchEngine class? I don't think it belongs in this file.


Args:
term (str): The search term.
course_id (str): The ID of the course to filter the search results.
limit (int, optional): The maximum number of results to return. Defaults to 30.

Returns:
list: A list of dictionaries containing the search results with 'id', 'display_name' and 'usage_key'.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please define this using type hints rather than just describing it.

"""

# Get the search engine instance
searcher = SearchEngine.get_search_engine(
getattr(settings, "COURSEWARE_CONTENT_INDEX_NAME", "courseware_content")
)

# Perform the search with the specified query string, size, and field dictionary
results = searcher.search(
query_string=term,
size=limit,
field_dictionary={"course": course_id}
)

# Process the search results to extract relevant fields
results = list(map(lambda it: {
"id": it["_id"],
"display_name": it["data"]["content"]["display_name"],
"usage_key": it["_id"],
}, results["results"]))

return results


def auto_suggest_search_api(term, course_id, limit=30):
"""
Perform an auto-suggest search using the MeiliSearch search engine.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Perform an auto-suggest search using the MeiliSearch search engine.
Perform an auto-suggest search using the configured search backend.


Args:
term (str): The search term.
course_id (str): The ID of the course to filter the search results.
limit (int, optional): The maximum number of results to return. Defaults to 30.

Returns:
list: A list of dictionaries containing the search results with 'id', 'display_name' and 'usage_key'.
"""
# Initialize response dictionary
response = {"results": []}

# Check which search engine to use based on settings
if getattr(settings, "MEILISEARCH_ENABLED", False):
# Use MeiliSearch otherwise
results = _meilisearch_auto_suggest_search_api(term, course_id, limit)
else:
# Use Elasticsearch if MEILISEARCH_ENABLED is set to True
results = _elasticsearch_auto_suggest_search_api(term, course_id, limit)

# Update response with the search results
response.update(results=results)

return response
Loading