-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: auto import course structure on course publish #43
Merged
DawoudSheraz
merged 10 commits into
overhangio:master
from
edly-io:danyalfaheem/import-course-data-on-course-publish
Sep 2, 2024
Merged
Changes from all commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
94b5011
feat: auto import course structure on course publish
Danyal-Faheem 30f36cb
docs: add changelog entry
Danyal-Faheem 302b640
fix: make requested changes
Danyal-Faheem 53c8c31
fix: remove unnecessary requirements.txt file
Danyal-Faheem 241f0ae
fix: update k8s command and port
Danyal-Faheem c7a8327
feat: add batch processing to course_id sinks
Danyal-Faheem 23cee68
fix: update k8s deployment and services
Danyal-Faheem 379149a
docs: update refreshing course block data guide in readme
Danyal-Faheem 4bbe2ea
fix: remove unnecessary comments
Danyal-Faheem c818836
refactor: sort imports and add file docstring
Danyal-Faheem File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
1 change: 1 addition & 0 deletions
1
changelog.d/20240805_183125_danyal.faheem_import_course_data_on_course_publish.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
- [Improvement] Auto import course structure to clickhouse on course publish by parsing CMS logs. (by @Danyal-Faheem) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
""" | ||
This module provides an HTTP service for importing course data into ClickHouse. | ||
|
||
It defines a single HTTP endpoint that allows for the submission of course IDs, | ||
which are then processed and used to trigger a subprocess for data import. | ||
|
||
Functions: | ||
- import_courses_to_clickhouse(request): Handles POST requests to '/courses/published/'. | ||
Validates the input data, verifies course IDs, and triggers an external Python script | ||
to import the data into ClickHouse. | ||
|
||
Usage: | ||
- python server.py | ||
- Run this module to start the HTTP server. It listens on port 9282 and processes | ||
requests sent to the '/courses/published/' endpoint. | ||
""" | ||
import logging | ||
import subprocess | ||
|
||
from aiohttp import web | ||
from opaque_keys.edx.locator import CourseLocator | ||
|
||
logging.basicConfig(level=logging.INFO, | ||
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s') | ||
|
||
log = logging.getLogger(__name__) | ||
|
||
async def import_courses_to_clickhouse(request): | ||
|
||
data = await request.json() | ||
if not isinstance(data, list): | ||
return web.json_response({"error": f"Incorrect data format. Expected list, got {data.__class__}."}, status=400) | ||
course_ids = [] | ||
for course in data: | ||
course_id = course.get("course_id") | ||
if not isinstance(course_id, str): | ||
return web.json_response({"error": f"Incorrect course_id format. Expected str, got {course_id.__class__}."}, status=400) | ||
|
||
# Get the list of unique course_ids | ||
unique_courses = list({course['course_id']: course for course in data}.values()) | ||
|
||
course_ids = [] | ||
|
||
for course in unique_courses: | ||
course_id = course['course_id'] | ||
# Verify course_id is a valid course_id | ||
try: | ||
CourseLocator.from_string(course_id) | ||
except Exception as e: | ||
log.exception(f"An error occured: {str(e)}") | ||
return web.json_response({"error": f"Incorrect arguments. Expected valid course_id, got {course_id}."}, status=400) | ||
|
||
course_ids.append(course_id) | ||
|
||
subprocess.run(["python", "/openedx/scripts/importcoursedata.py", "-c", *course_ids], check=True) | ||
return web.Response(status=204) | ||
|
||
|
||
app = web.Application() | ||
app.router.add_post('/courses/published/', import_courses_to_clickhouse) | ||
|
||
web.run_app(app, host='0.0.0.0', port=9282) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the additional CPU/memory resources that are needed by this container? (see
docker stats
) Given that it's a very thin wrapper, I expect that the requirements are low. But if they are not, then we'll have to gatekeep this service behind a feature flag.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My environment is the following:
My machine specs are the following:
Resources utilized:
When the container is sitting idle, listening to requests:
When the container is executing the importcoursedata script:
Then it returns back to idle resources once it is completed.
These results would definitely vary on a linux based machine.