Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: auto import course structure on course publish #43

Conversation

Danyal-Faheem
Copy link
Collaborator

@Danyal-Faheem Danyal-Faheem commented Aug 5, 2024

Changes

  • Parse openedx_logs in vector to look for the Updating course overview for <course_id> logs
  • Send this course_id to a service
  • Create a new service cairn-watchcourses that listens for incoming events from vector
  • Trigger importcoursedata script from cairn-watchcourses for the specific course whenever a course is published

Caveats

  • We utilize batch processing on the vector sinks with a timeout of 300 secs or batch size of 10 events. This means that changes to the course structure may take up to 5 minutes to show up in the superset dashboard.

@Danyal-Faheem Danyal-Faheem self-assigned this Aug 5, 2024
@Danyal-Faheem Danyal-Faheem added the enhancement New feature or request label Aug 5, 2024
Copy link
Collaborator

@regisb regisb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent work. I have many comments, because this is an important change, but I expect we'll get there.

tutorcairn/plugin.py Outdated Show resolved Hide resolved
tutorcairn/patches/local-docker-compose-services Outdated Show resolved Hide resolved
tutorcairn/templates/cairn/apps/openedx/scripts/main.py Outdated Show resolved Hide resolved
tutorcairn/templates/cairn/apps/openedx/scripts/main.py Outdated Show resolved Hide resolved
@@ -85,3 +85,17 @@ cairn-postgresql:
depends_on:
- permissions
{% endif %}
cairn-watchcourses:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the additional CPU/memory resources that are needed by this container? (see docker stats) Given that it's a very thin wrapper, I expect that the requirements are low. But if they are not, then we'll have to gatekeep this service behind a feature flag.

Copy link
Collaborator Author

@Danyal-Faheem Danyal-Faheem Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My environment is the following:

  • Docker 4.28.0
  • Tutor 18.1.1
  • Plugins enabled: Cairn

My machine specs are the following:

  • MacOS Sonoma
  • Macbook Pro M1

Resources utilized:

When the container is sitting idle, listening to requests:

  • CPU: 0.00 - 0.03%
  • Memory: ~58 MiB

When the container is executing the importcoursedata script:

  • CPU: 100% (on MacOS, this usually signifies one core is being completely utilized)
  • Memory: 250 MiB - 300 MiB

Then it returns back to idle resources once it is completed.

These results would definitely vary on a linux based machine.

&& uvicorn --app-dir /openedx/scripts/ main:app --host 0.0.0.0 --port {{ CAIRN_WATCHCOURSES_PORT }}"
restart: unless-stopped
environment:
SETTINGS: ${TUTOR_EDX_PLATFORM_SETTINGS:-tutor.production}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid defining this environment variable here? After all, the fastapi process only needs it for its subprocess, not its main process. I suggest removing it from the service declaration and use the env option in subprocess.call(... env=...) (docs).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just works without defining this environment variable. Checking the value of os.environ, it reveals that the container is already using the production settings.

This is what I found in os.environ.

'DJANGO_SETTINGS_MODULE': 'lms.envs.tutor.production'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to work in production because the container inherits the settings from the image definition. But you should make sure that it's correct in dev and in kubernetes.

inputs = ["course_published"]
batch.timeout_secs = 300
batch.max_events = 10
uri = "http://cairn-watchcourses:9282/import_course/"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name does not seem very adequate anymore, now that we can pass several course IDs at once. Maybe /courses/published?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds much better, I'll update it.


data = await request.json()
if not isinstance(data, list) or 'course_id' not in data[0]:
return web.json_response({"error":"Value course_id is required."}, status=400)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's return two different errors based on the data format:

if not isinstance(data, list):
    return web.json_response({"error": f"Incorrect data format. Expected list, got {data.__class__}."}, status=400)
course_ids = []
for course in data:
    course_id = data.get("course_id")
    if not isinstance(course_id, str):
        return web.json_response({"error": f"Incorrect course_id format: expected str, got {course_id.__class__}."}, status=400)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this looks great but according to this, even if one course_id is invalid, we return an error. Shouldn't we still try to run import for the valid course_ids instead?

# Verify course_id is a valid course_id
try:
CourseLocator.from_string(course_id)
except:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should almost never silence an arbitrary error. At the very least, we should log.exception(e).


# If none of the course_ids are valid, return an error
if not course_ids:
return web.json_response({"error": f"Invalid course_id"}, status=400)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "invalid course_id"? The message should explain that we expect course_id as arguments, and we didn't get any.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was to signify that the format of the course_id itself is invalid. But I will update this to a better sounding message.

tutorcairn/templates/cairn/apps/openedx/scripts/server.py Outdated Show resolved Hide resolved
tutorcairn/templates/cairn/apps/openedx/scripts/server.py Outdated Show resolved Hide resolved
&& uvicorn --app-dir /openedx/scripts/ main:app --host 0.0.0.0 --port {{ CAIRN_WATCHCOURSES_PORT }}"
restart: unless-stopped
environment:
SETTINGS: ${TUTOR_EDX_PLATFORM_SETTINGS:-tutor.production}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to work in production because the container inherits the settings from the image definition. But you should make sure that it's correct in dev and in kubernetes.

tutorcairn/templates/cairn/apps/openedx/scripts/server.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@regisb regisb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work!

Comment on lines 6 to 10
# Configure logging
logging.basicConfig(level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

# Get a logger instance

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for the logging comments here, the code is self-explanatory.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -0,0 +1,45 @@
from aiohttp import web

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • nit: a file docstring will be gelpful to understand the context.
  • Please check that imports are ordered correctly (first party \n third party \n local code)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've sorted the imports and added a file docstring as well.

@DawoudSheraz DawoudSheraz merged commit e1fd754 into overhangio:master Sep 2, 2024
2 checks passed
@Danyal-Faheem Danyal-Faheem deleted the danyalfaheem/import-course-data-on-course-publish branch September 5, 2024 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Development

Successfully merging this pull request may close these issues.

3 participants