This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

Commit

Merge pull request #68 from openedx/bmtcril/remove_block_relationships
Remove block relationships / add email to profile sink
bmtcril authored Jan 4, 2024
2 parents 8156417 + 704088e commit dc70783
Showing 18 changed files with 20 additions and 319 deletions.
5 changes: 0 additions & 5 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -16,15 +16,10 @@ instructions.
3. Expect C to happen
4. If D happened instead - check failed.

**Reviewers:**
- [ ] tag reviewer
- [ ] tag reviewer

**Merge checklist:**
- [ ] All reviewers approved
- [ ] CI build is green
- [ ] Version bumped
- [ ] Changelog record added
- [ ] Documentation updated (not only docstrings)
- [ ] Commits are squashed

25 changes: 0 additions & 25 deletions CHANGELOG.rst

This file was deleted.

43 changes: 4 additions & 39 deletions README.rst
@@ -1,16 +1,13 @@
Event Sink ClickHouse
#####################

|pypi-badge| |ci-badge| |codecov-badge| |doc-badge| |pyversions-badge|
|license-badge| |status-badge|

Purpose
*******

This project acts as a plugin to the `Edx Platform`_, listens for
configured `Open edX events`_, and sends them to a `ClickHouse`_ database for
analytics or other processing. This is being maintained as part of the Open
Analytics Reference System (`OARS`_) project.
analytics or other processing. This is being maintained as part of the
`Aspects`_ project.

OARS consumes the data sent to ClickHouse by this plugin as part of data
enrichment for reporting, or capturing data that otherwise does not fit in
@@ -21,9 +18,7 @@ Sinks

Currently the only sink is in the CMS. It listens for the ``COURSE_PUBLISHED``
signal and serializes a subset of the published course blocks into one table
and the relationships between blocks into another table. With those we are
able to recreate the "graph" of the course and get relevant data, such as
block names, for reporting.
in ClickHouse.

Commands
********
@@ -44,7 +39,7 @@ Please see the command help for details:
.. _Open edX events: https://github.com/openedx/openedx-events
.. _Edx Platform: https://github.com/openedx/edx-platform
.. _ClickHouse: https://clickhouse.com
.. _OARS: https://docs.openedx.org/projects/openedx-oars/en/latest/index.html
.. _Aspects: https://docs.openedx.org/projects/openedx-aspects/en/latest/index.html

Getting Started
***************
@@ -212,33 +207,3 @@ Reporting Security Issues
*************************

Please do not report security issues in public. Please email [email protected].

.. |pypi-badge| image:: https://img.shields.io/pypi/v/openedx-event-sink-clickhouse.svg
:target: https://pypi.python.org/pypi/openedx-event-sink-clickhouse/
:alt: PyPI

.. |ci-badge| image:: https://github.com/openedx/openedx-event-sink-clickhouse/workflows/Python%20CI/badge.svg?branch=main
:target: https://github.com/openedx/openedx-event-sink-clickhouse/actions
:alt: CI

.. |codecov-badge| image:: https://codecov.io/github/openedx/openedx-event-sink-clickhouse/coverage.svg?branch=main
:target: https://codecov.io/github/openedx/openedx-event-sink-clickhouse?branch=main
:alt: Codecov

.. |doc-badge| image:: https://readthedocs.org/projects/openedx-event-sink-clickhouse/badge/?version=latest
:target: https://openedx-event-sink-clickhouse.readthedocs.io/en/latest/
:alt: Documentation

.. |pyversions-badge| image:: https://img.shields.io/pypi/pyversions/openedx-event-sink-clickhouse.svg
:target: https://pypi.python.org/pypi/openedx-event-sink-clickhouse/
:alt: Supported Python versions

.. |license-badge| image:: https://img.shields.io/github/license/openedx/openedx-event-sink-clickhouse.svg
:target: https://github.com/openedx/openedx-event-sink-clickhouse/blob/main/LICENSE.txt
:alt: License

.. TODO: Choose one of the statuses below and remove the other status-badge lines.
.. |status-badge| image:: https://img.shields.io/badge/Status-Experimental-yellow
.. .. |status-badge| image:: https://img.shields.io/badge/Status-Maintained-brightgreen
.. .. |status-badge| image:: https://img.shields.io/badge/Status-Deprecated-orange
.. .. |status-badge| image:: https://img.shields.io/badge/Status-Unsupported-red
1 change: 0 additions & 1 deletion docs/changelog.rst

This file was deleted.

1 change: 0 additions & 1 deletion docs/index.rst
@@ -21,7 +21,6 @@ Contents:
testing
internationalization
modules
changelog
decisions
references/index

2 changes: 1 addition & 1 deletion event_sink_clickhouse/__init__.py
@@ -2,4 +2,4 @@
A sink for Open edX events to send them to ClickHouse.
"""

__version__ = "0.5.0"
__version__ = "1.0.0"
@@ -97,7 +97,7 @@ def dump_target_courses_to_clickhouse(

class Command(BaseCommand):
"""
Dump course block and relationship data to a ClickHouse instance.
Dump course block data to a ClickHouse instance.
"""

help = dedent(__doc__).strip()
3 changes: 0 additions & 3 deletions event_sink_clickhouse/models.py

This file was deleted.

4 changes: 4 additions & 0 deletions event_sink_clickhouse/serializers.py
@@ -34,14 +34,18 @@ def get_time_last_dumped(self, instance): # pylint: disable=unused-argument
class UserProfileSerializer(BaseSinkSerializer, serializers.ModelSerializer):
"""Serializer for user profile events."""

email = serializers.CharField(source="user.email")

class Meta:
"""Meta class for user profile serializer."""

model = get_model("user_profile")

fields = [
"id",
"user_id",
"name",
"email",
"meta",
"courseware",
"language",
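The `email` field added in this hunk uses `source="user.email"`, which asks Django REST Framework to read the value from the related `user` object rather than from the profile itself. The dotted lookup can be illustrated in plain Python. This is only a sketch: `FakeUser`, `FakeProfile`, and `resolve_source` are hypothetical stand-ins and are not part of this repository or of DRF.

```python
from dataclasses import dataclass


@dataclass
class FakeUser:
    """Hypothetical stand-in for the Django auth user."""
    email: str


@dataclass
class FakeProfile:
    """Hypothetical stand-in for the user profile model."""
    name: str
    user: FakeUser


def resolve_source(instance, source):
    """Follow a dotted path such as "user.email" one attribute at a time."""
    value = instance
    for attr in source.split("."):
        value = getattr(value, attr)
    return value


profile = FakeProfile(name="Ada", user=FakeUser(email="ada@example.com"))
serialized = {
    "name": resolve_source(profile, "name"),
    "email": resolve_source(profile, "user.email"),
}
```

The practical effect for the sink is that each serialized profile row now carries the email stored on the user record.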
22 changes: 2 additions & 20 deletions event_sink_clickhouse/sinks/base_sink.py
@@ -4,7 +4,6 @@
import csv
import datetime
import io
import json
from collections import namedtuple

import requests
@@ -52,38 +51,23 @@ def __init__(self, connection_overrides, log):
"timeout_secs", self.ch_timeout_secs
)

def _send_clickhouse_request(self, request, expected_insert_rows=None):
def _send_clickhouse_request(self, request):
"""
Perform the actual HTTP requests to ClickHouse.
"""
session = requests.Session()
prepared_request = request.prepare()
response = None

try:
response = session.send(prepared_request, timeout=self.ch_timeout_secs)
response.raise_for_status()

if expected_insert_rows:
summary = response.headers["X-ClickHouse-Summary"]
written_rows = json.loads(summary)["written_rows"]
if expected_insert_rows != int(written_rows):
self.log.error(
f"Clickhouse query {prepared_request.url} expected {expected_insert_rows} "
f"rows to be inserted, but only got {written_rows}!"
)

return response
except requests.exceptions.HTTPError as e:
self.log.error(str(e))
self.log.error(e.response.headers)
self.log.error(e.response)
self.log.error(e.response.text)
raise
except (requests.exceptions.InvalidJSONError, KeyError):
# ClickHouse can be configured not to return the metadata / summary we check above for
# performance reasons. It's not critical, so we eat those here.
return response


class ModelBaseSink(BaseSink):
@@ -286,9 +270,7 @@ def send_item(self, serialized_item, many=False):
auth=self.ch_auth,
)

self._send_clickhouse_request(
request, expected_insert_rows=len(serialized_item) if many else 1
)
self._send_clickhouse_request(request)

def fetch_target_items(self, ids=None, skip_ids=None, force_dump=False):
"""
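For reviewers who want to see what the removed `expected_insert_rows` check did, the logic can be isolated from the HTTP call. The sketch below assumes a plain dict of response headers, and `written_rows_mismatch` is a hypothetical helper name, not a function in this repository. Unlike the removed code, it also runs the comparison when the expected count is zero.

```python
import json


def written_rows_mismatch(headers, expected_insert_rows):
    """Return True when the summary header reports a different row count.

    Returns False when the header is missing or cannot be parsed, because
    ClickHouse can be configured not to send the summary at all.
    """
    try:
        summary = json.loads(headers["X-ClickHouse-Summary"])
        written_rows = int(summary["written_rows"])
    except (KeyError, TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    return written_rows != expected_insert_rows
```

With this commit the sink no longer performs any such comparison; it relies on `raise_for_status()` alone to surface failed requests.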
63 changes: 2 additions & 61 deletions event_sink_clickhouse/sinks/course_published.py
@@ -3,7 +3,7 @@
Does the following:
- Pulls the course structure from modulestore
- Serialize the xblocks and their parent/child relationships
- Serialize the xblocks
- Sends them to ClickHouse in CSV format
Note that the serialization format does not include all fields as there may be things like
@@ -25,27 +25,6 @@
}


class XBlockRelationshipSink(ModelBaseSink):
"""
Sink for XBlock relationships
"""

clickhouse_table_name = "course_relationships"
name = "XBlock Relationships"
timestamp_field = "time_last_dumped"
unique_key = "parent_location"

def dump_related(self, serialized_item, dump_id, time_last_dumped):
self.dump(
serialized_item,
many=True,
initial={"dump_id": dump_id, "time_last_dumped": time_last_dumped},
)

def serialize_item(self, item, many=False, initial=None):
return item


class XBlockSink(ModelBaseSink):
"""
Sink for XBlock model
@@ -112,45 +91,7 @@ def serialize_item(self, item, many=False, initial=None):
XBlockSink.strip_branch_and_version(block.location)
] = fields

nodes = list(location_to_node.values())

self.serialize_relationships(
items,
location_to_node,
course_key,
initial["dump_id"],
initial["time_last_dumped"],
)

return nodes

def serialize_relationships(
self, items, location_to_node, course_id, dump_id, dump_timestamp
):
"""Serialize the relationships between XBlocks"""
relationships = []
for item in items:
for index, child in enumerate(item.get_children()):
parent_node = location_to_node.get(
XBlockSink.strip_branch_and_version(item.location)
)
child_node = location_to_node.get(
XBlockSink.strip_branch_and_version(child.location)
)

if parent_node is not None and child_node is not None: # pragma: no cover
relationship = {
"course_key": str(course_id),
"parent_location": str(parent_node["location"]),
"child_location": str(child_node["location"]),
"order": index,
"dump_id": dump_id,
"time_last_dumped": dump_timestamp,
}
relationships.append(relationship)
XBlockRelationshipSink(self.connection_overrides, self.log).dump_related(
relationships, dump_id, dump_timestamp
)
return list(location_to_node.values())

def serialize_xblock(
self, item, index, detached_xblock_types, dump_id, time_last_dumped
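The removed `serialize_relationships` method produced one row per parent/child edge, with the child's position stored in `order`. A simplified standalone sketch of that idea follows. `FakeBlock` and `build_relationships` are hypothetical names used only for illustration, and the `dump_id` and `time_last_dumped` columns from the removed code are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBlock:
    """Hypothetical stand-in for an XBlock; only what the sketch needs."""
    location: str
    children: list = field(default_factory=list)

    def get_children(self):
        return self.children


def build_relationships(items, course_key):
    """Emit one row per parent/child edge, keeping the child's position."""
    known = {item.location for item in items}
    rows = []
    for item in items:
        for index, child in enumerate(item.get_children()):
            # Skip children that were not serialized as blocks themselves.
            if child.location in known:
                rows.append({
                    "course_key": course_key,
                    "parent_location": item.location,
                    "child_location": child.location,
                    "order": index,
                })
    return rows


unit = FakeBlock("unit-1")
section = FakeBlock("section-1", [unit])
rows = build_relationships([section, unit], "course-v1:Org+Demo+2024")
```

After this commit, `serialize_item` returns only the block rows, so nothing is written to the `course_relationships` table.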
5 changes: 1 addition & 4 deletions event_sink_clickhouse/sinks/user_retire.py
@@ -46,7 +46,4 @@ def send_item(self, serialized_item, many=False):
params=params,
auth=self.ch_auth,
)
self._send_clickhouse_request(
request,
expected_insert_rows=0, # DELETE requests don't return a row count
)
self._send_clickhouse_request(request)
22 changes: 0 additions & 22 deletions event_sink_clickhouse/templates/event_sink_clickhouse/base.html

This file was deleted.

3 changes: 1 addition & 2 deletions setup.py
@@ -130,14 +130,13 @@ def is_requirement(line):
sys.exit()

README = open(os.path.join(os.path.dirname(__file__), 'README.rst'), encoding="utf8").read()
CHANGELOG = open(os.path.join(os.path.dirname(__file__), 'CHANGELOG.rst'), encoding="utf8").read()

setup(
name='openedx_event_sink_clickhouse',
version=VERSION,
description="""A sink for Open edX events to send them to ClickHouse""",
long_description_content_type="text/x-rst",
long_description=README + '\n\n' + CHANGELOG,
long_description=README,
author='edX',
author_email='[email protected]',
url='https://github.com/openedx/openedx_event_sink_clickhouse',