
Commit

docs: update Readme to reflect Python 3.13 / Poetry 1.8.4 requirements
Criamos committed Nov 29, 2024
1 parent 0f351cc commit 42d85e3
Showing 1 changed file with 9 additions and 6 deletions.
Readme.md (15 changes: 9 additions & 6 deletions)
```diff
@@ -1,9 +1,9 @@
 # Open Edu Hub Search ETL
 
-## Step 1: Project Setup - Python 3.12 (manual approach)
+## Step 1: Project Setup Python 3.13 (manual approach)
 
 - make sure you have python3 installed (<https://docs.python-guide.org/starting/installation/>)
-  - (Python 3.12 or newer is required)
+  - (Python 3.13 is required)
 - go to project root
 - Run the following commands:
 
```
```diff
@@ -22,7 +22,7 @@ python3 -m venv .venv
 
 ## Step 1 (alternative): Project Setup - Python (automated, via `poetry`)
 
-- Step 1: Make sure that you have [Poetry](https://python-poetry.org) v1.5.0+ installed
+- Step 1: Make sure that you have [Poetry](https://python-poetry.org) [v1.8.4](https://github.com/python-poetry/poetry/releases/tag/1.8.4)+ installed
   - for detailed instructions, please consult the [Poetry Installation Guide](https://python-poetry.org/docs/#installation)
 - Step 2: Open your terminal **in the project root directory**:
   - Step 2.1: If you want to have your `.venv` to be created inside the project root directory:
```
```diff
@@ -31,6 +31,7 @@ python3 -m venv .venv
 - Step 3: **Install dependencies** (according to `pyproject.toml`) by running: `poetry install`
 
 ## Step 2: Project Setup - required Docker Containers
+
 If you have Docker installed, use `docker-compose up` to start up the multi-container for `Splash` and `Playwright`-integration.
 
 As a last step, set up your config variables by copying the `.env.example`-file and modifying it if necessary:
```
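A note on what the copied `.env` is for: the crawlers read their configuration from environment variables. The snippet below is a generic, hedged illustration of that mechanism using `python-dotenv`; the variable name and default are placeholders, not this repository's actual settings code.

```python
# Generic illustration only: the key below is a placeholder; the real
# variable names live in .env.example and are consumed by the project's
# settings module.
import os

from dotenv import load_dotenv  # provided by the python-dotenv package

load_dotenv()  # merges key=value pairs from .env into os.environ

# e.g. where a crawler should deliver its items (placeholder key and default)
edu_sharing_url = os.environ.get("EDU_SHARING_BASE_URL", "http://localhost:8080/edu-sharing/")
```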
```diff
@@ -40,7 +41,7 @@ As a last step, set up your config variables by copying the `.env.example`-file
 # Running crawlers
 
 - A crawler can be run with `scrapy crawl <spider-name>`.
-  - (It assumes that you have an edu-sharing v6.0+ instance in your `.env` settings configured which can accept the data.)
+  - (It assumes that you have an edu-sharing v8.1+ instance in your `.env` settings configured which can accept the data.)
 - If a crawler has [Scrapy Spider Contracts](https://docs.scrapy.org/en/latest/topics/contracts.html#spiders-contracts) implemented, you can test those by running `scrapy check <spider-name>`
 
 
```
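Since Spider Contracts come up above: a contract is a set of annotations in a callback's docstring that `scrapy check` evaluates against a live request. Here is a minimal, self-contained sketch of that standard Scrapy feature; the URL and the `title` field are placeholders, not taken from this repository.

```python
import scrapy


class ContractDemoSpider(scrapy.Spider):
    """Illustrates Scrapy Spider Contracts; run with `scrapy check contract_demo`."""

    name = "contract_demo"

    def parse(self, response):
        """This docstring is the contract: `scrapy check` fetches @url and
        verifies the callback's output against the assertions below.

        @url https://example.org/
        @returns items 1 1
        @scrapes title
        """
        # Placeholder extraction; a real spider would map scraped fields
        # onto the project's metadata items instead of a bare dict.
        yield {"title": response.css("title::text").get()}
```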
```diff
@@ -60,8 +61,10 @@ docker compose up
 
 - We use Scrapy as a framework. Please check out the guides for Scrapy spider (https://docs.scrapy.org/en/latest/intro/tutorial.html)
 - To create a new spider, create a file inside `converter/spiders/<myname>_spider.py`
-- We recommend inheriting the `LomBase` class in order to get out-of-the-box support for our metadata model
-- You may also Inherit a Base Class for crawling data, if your site provides LRMI metadata, the `LrmiBase` is a good start. If your system provides an OAI interface, you may use the `OAIBase`
+- We recommend inheriting the `LomBase` class to get out-of-the-box support for our metadata model
+- You may also inherit a base class (see: `converter/spiders/base_classes/`) for crawling data.
+  - If your site provides LRMI metadata, the `LrmiBase` is a good start.
+  - If your system provides an OAI interface, you may use the `OAIBase`
 - As a sample/template, please take a look at the `sample_spider.py` or `sample_spider_alternative.py`
 - To learn more about the LOM standard we're using, you'll find useful information at https://en.wikipedia.org/wiki/Learning_object_metadata
 
```
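To make the inheritance advice above concrete, here is a hedged sketch of a new spider file. It assumes `LomBase` is importable from `converter/spiders/base_classes/` and that combining `scrapy.Spider` with `LomBase` follows the pattern of the sample spiders; `sample_spider.py` remains the authoritative template, and every name below other than `scrapy` and `LomBase` is a placeholder.

```python
# Sketch of converter/spiders/<myname>_spider.py -- assumptions: the import
# path and the Spider/LomBase multiple-inheritance pattern mirror the
# repository's sample spiders; verify both against sample_spider.py.
import scrapy

from converter.spiders.base_classes import LomBase


class MyNameSpider(scrapy.Spider, LomBase):
    name = "myname_spider"       # invoked via `scrapy crawl myname_spider`
    friendlyName = "My Source"   # human-readable label (assumed attribute)
    version = "0.0.1"            # bump whenever the crawler's output changes

    start_urls = ["https://example.org/"]  # placeholder

    def parse(self, response):
        # Delegate to LomBase, which drives the metadata-model mapping;
        # the hooks you would override are demonstrated in sample_spider.py.
        return LomBase.parse(self, response)
```

The multiple inheritance keeps Scrapy's crawling machinery and the project's metadata mapping in a single class, which appears to be the design the sample spiders demonstrate.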
