Merge pull request #111 from openeduhub/develop
merge develop into master (2024-09-04)
Criamos authored Sep 4, 2024
2 parents 79fae39 + cb3b053 commit a8013ff
Showing 1,226 changed files with 190,480 additions and 69,785 deletions.
21 changes: 13 additions & 8 deletions .github/workflows/python.yaml
@@ -15,7 +15,7 @@ jobs:
  runs-on: ubuntu-latest
  strategy:
  matrix:
- python-version: ["3.10"]
+ python-version: ["3.12"]

  steps:
  - uses: actions/checkout@v3
@@ -33,17 +33,22 @@ jobs:
  restore-keys: |
  ${{ runner.os }}-pip-
  ${{ runner.os }}-
- - name: Install dependencies
+ - name: Install Poetry via pip
  run: |
  python -m pip install --upgrade pip
- if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- if [ -f requirements-dev.txt ]; then pip install -r requirements-dev.txt; fi
- - name: Lint with flake8
+ python -m pip install poetry
+ - name: Configure Poetry to use in-project .venv
+ run: |
+ python -m poetry config virtualenvs.in-project true
+ - name: Install Dependencies with Poetry
+ run: |
+ python -m poetry install
+ - name: Lint with flake8 (via Poetry)
  run: |
  # stop the build if there are Python syntax errors or undefined names
- flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
+ poetry run flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics --exclude=.venv/,edu_sharing_openapi/
  # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
- flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+ poetry run flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --exclude=.venv/,edu_sharing_openapi/
  - name: Test with pytest
  run: |
- pytest
+ poetry run pytest
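To reproduce the updated workflow's steps on a local machine, the commands can be collected in a small script. This is a minimal sketch, not part of the commit: the helper names `ci_commands` and `run_all` are hypothetical, it assumes it is run from the project root, and, like the workflow, it upgrades pip and installs Poetry into whichever interpreter executes it.

```python
import subprocess
import sys

# Directories excluded from linting, copied from the workflow's --exclude flag.
FLAKE8_EXCLUDE = ".venv/,edu_sharing_openapi/"


def ci_commands(python: str = sys.executable) -> list[list[str]]:
    """Return the workflow's shell steps as argv lists, in execution order (hypothetical helper)."""
    return [
        [python, "-m", "pip", "install", "--upgrade", "pip"],
        [python, "-m", "pip", "install", "poetry"],
        [python, "-m", "poetry", "config", "virtualenvs.in-project", "true"],
        [python, "-m", "poetry", "install"],
        # Hard failure on syntax errors and undefined names.
        ["poetry", "run", "flake8", ".", "--count", "--select=E9,F63,F7,F82",
         "--show-source", "--statistics", f"--exclude={FLAKE8_EXCLUDE}"],
        # Style report only: --exit-zero never fails the build.
        ["poetry", "run", "flake8", ".", "--count", "--exit-zero", "--max-complexity=10",
         "--max-line-length=127", "--statistics", f"--exclude={FLAKE8_EXCLUDE}"],
        ["poetry", "run", "pytest"],
    ]


def run_all() -> None:
    for argv in ci_commands():
        print("$", " ".join(argv))
        # check=True raises CalledProcessError, so the script stops at the first failing step.
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all()
```

As in CI, only the first flake8 call can fail the run; the second one mirrors the workflow's `--exit-zero` report.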
26 changes: 26 additions & 0 deletions .run/bne_portal_spider.run.xml
@@ -0,0 +1,26 @@
+ <component name="ProjectRunConfigurationManager">
+ <configuration default="false" name="bne_portal_spider" type="PythonConfigurationType" factoryName="Python">
+ <output_file path="$PROJECT_DIR$/logs/bne_portal_spider_console.log" is_save="true" />
+ <module name="oeh-search-etl" />
+ <option name="ENV_FILES" value="" />
+ <option name="INTERPRETER_OPTIONS" value="" />
+ <option name="PARENT_ENVS" value="true" />
+ <envs>
+ <env name="PYTHONUNBUFFERED" value="1" />
+ </envs>
+ <option name="SDK_HOME" value="" />
+ <option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/" />
+ <option name="IS_MODULE_SDK" value="true" />
+ <option name="ADD_CONTENT_ROOTS" value="true" />
+ <option name="ADD_SOURCE_ROOTS" value="true" />
+ <EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
+ <option name="SCRIPT_NAME" value="./.venv/bin/scrapy" />
+ <option name="PARAMETERS" value="crawl bne_portal_spider -O &quot;../../logs/bne_portal_spider.json&quot;" />
+ <option name="SHOW_COMMAND_LINE" value="false" />
+ <option name="EMULATE_TERMINAL" value="false" />
+ <option name="MODULE_MODE" value="false" />
+ <option name="REDIRECT_INPUT" value="false" />
+ <option name="INPUT_FILE" value="" />
+ <method v="2" />
+ </configuration>
+ </component>
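The run configuration starts `./.venv/bin/scrapy crawl bne_portal_spider -O ...` from the project root with `PYTHONUNBUFFERED=1`. The following sketch, which is not part of the commit, reads such a `.run.xml` file with the standard library and rebuilds that command so it can be launched outside PyCharm. `command_from_run_config` and `SAMPLE_XML` are hypothetical names, and the parser only looks at the `SCRIPT_NAME`, `PARAMETERS` and `WORKING_DIRECTORY` options and the `<envs>` block.

```python
import os
import shlex
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path

# Trimmed copy of .run/bne_portal_spider.run.xml, used for illustration and testing.
SAMPLE_XML = """<component name="ProjectRunConfigurationManager">
  <configuration default="false" name="bne_portal_spider" type="PythonConfigurationType" factoryName="Python">
    <envs>
      <env name="PYTHONUNBUFFERED" value="1" />
    </envs>
    <option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/" />
    <option name="SCRIPT_NAME" value="./.venv/bin/scrapy" />
    <option name="PARAMETERS" value="crawl bne_portal_spider -O &quot;../../logs/bne_portal_spider.json&quot;" />
  </configuration>
</component>"""


def command_from_run_config(xml_text: str, project_dir: str) -> tuple[list[str], str, dict[str, str]]:
    """Return (argv, working directory, extra environment) for a PyCharm Python run config."""
    config = ET.fromstring(xml_text).find("configuration")
    if config is None:
        raise ValueError("no <configuration> element found")
    # Only <option> elements that are direct children of <configuration>.
    options = {opt.get("name"): opt.get("value", "") for opt in config.findall("option")}
    env = {e.get("name"): e.get("value", "") for e in config.findall("envs/env")}
    # ElementTree has already decoded &quot; into '"', so shlex strips the quotes.
    argv = [options["SCRIPT_NAME"], *shlex.split(options["PARAMETERS"])]
    cwd = options["WORKING_DIRECTORY"].replace("$PROJECT_DIR$", project_dir)
    return argv, cwd, env


if __name__ == "__main__":
    # Assumes this script is started from the project root.
    root = os.getcwd()
    xml_text = Path(root, ".run", "bne_portal_spider.run.xml").read_text(encoding="utf-8")
    argv, cwd, env = command_from_run_config(xml_text, root)
    subprocess.run(argv, cwd=cwd, env={**os.environ, **env}, check=True)
```

The `-O` target is resolved relative to the working directory, so with `WORKING_DIRECTORY` set to `$PROJECT_DIR$/` the JSON file is written to `../../logs/` relative to the project root, separately from the console log configured in `<output_file>`.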
9 changes: 5 additions & 4 deletions Dockerfile
@@ -1,17 +1,18 @@
- FROM python:3.11.6-slim-bookworm
+ FROM python:3.12.5-slim-bookworm

  # ENV CRAWLER wirlernenonline_spider

  WORKDIR /

  COPY entrypoint.sh entrypoint.sh
- COPY requirements.txt requirements.txt
- RUN pip3 install -r requirements.txt
+ COPY edu_sharing_openapi/ edu_sharing_openapi/
+ COPY pyproject.toml poetry.lock ./
+ RUN pip3 install poetry
+ RUN poetry install
  COPY scrapy.cfg scrapy.cfg
  COPY setup.cfg setup.cfg
  COPY converter/ converter/
  COPY csv/ csv/
- COPY edu_sharing_client/ edu_sharing_client/
  COPY valuespace_converter/ valuespace_converter/


18 changes: 11 additions & 7 deletions Readme.md
@@ -1,9 +1,9 @@
  # Open Edu Hub Search ETL

- ## Step 1: Project Setup - Python (manual approach)
+ ## Step 1: Project Setup - Python 3.12 (manual approach)

  - make sure you have python3 installed (<https://docs.python-guide.org/starting/installation/>)
- - (Python 3.10 or newer is required)
+ - (Python 3.12 or newer is required)
  - go to project root
  - Run the following commands:

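The README now requires Python 3.12 or newer, matching the bumped CI matrix and Docker base image. A small guard like the following, a hypothetical helper that is not shipped with the project, can fail early with a clear message when an older interpreter is used to create the `.venv`.

```python
import sys

# Minimum (major, minor) version, taken from the README's "Python 3.12 or newer is required".
REQUIRED = (3, 12)


def python_is_supported(version_info=None, required=REQUIRED) -> bool:
    """Return True if the (major, minor) version meets `required` (hypothetical helper)."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= required


if __name__ == "__main__":
    if not python_is_supported():
        found = ".".join(map(str, sys.version_info[:3]))
        needed = ".".join(map(str, REQUIRED))
        sys.exit(f"Python {needed}+ is required, found {found}")
    print("Python version OK")
```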
@@ -16,15 +16,19 @@ python3 -m venv .venv

  `.venv\Scripts\activate.bat` (on Windows)

- `pip3 install -r requirements.txt`
+ `pip3 install poetry`
+
+ `poetry install`

  ## Step 1 (alternative): Project Setup - Python (automated, via `poetry`)

  - Step 1: Make sure that you have [Poetry](https://python-poetry.org) v1.5.0+ installed
- - Step 2: Open your terminal in the project root directory:
- - Step 2.1: (this is an optional, strictly personal preference) If you want to have your `.venv` to be created in the project root directory:
+ - for detailed instructions, please consult the [Poetry Installation Guide](https://python-poetry.org/docs/#installation)
+ - Step 2: Open your terminal **in the project root directory**:
+ - Step 2.1: If you want to have your `.venv` to be created inside the project root directory:
  - `poetry config virtualenvs.in-project true`
- - Step 3: Install dependencies (according to `pyproject.toml`) by running: `poetry install`
+ - *(this is an optional, strictly personal preference)*
+ - Step 3: **Install dependencies** (according to `pyproject.toml`) by running: `poetry install`

  ## Step 2: Project Setup - required Docker Containers
  If you have Docker installed, use `docker-compose up` to start up the multi-container for `Splash` and `Playwright`-integration.
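With `virtualenvs.in-project` enabled, Poetry creates the environment as `.venv` in the project root, which is also where the `.run` configuration expects `./.venv/bin/scrapy`. As a sketch (the helper name `venv_executable` is hypothetical), the location of a console script inside that environment differs between POSIX systems and Windows:

```python
import os
from pathlib import Path


def venv_executable(project_root: Path, name: str, os_name: str = os.name) -> Path:
    """Path of a console script inside an in-project .venv (hypothetical helper)."""
    venv = Path(project_root) / ".venv"
    if os_name == "nt":
        # Windows virtual environments keep executables in Scripts\ with an .exe launcher.
        return venv / "Scripts" / f"{name}.exe"
    # POSIX virtual environments keep them in bin/.
    return venv / "bin" / name


if __name__ == "__main__":
    scrapy_path = venv_executable(Path.cwd(), "scrapy")
    print(scrapy_path, "exists" if scrapy_path.exists() else "missing")
```

Checking `.exists()` on the returned path is a quick way to confirm that `poetry install` populated the in-project environment.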
@@ -58,7 +62,7 @@ docker compose up
  - To create a new spider, create a file inside `converter/spiders/<myname>_spider.py`
  - We recommend inheriting the `LomBase` class in order to get out-of-the-box support for our metadata model
  - You may also Inherit a Base Class for crawling data, if your site provides LRMI metadata, the `LrmiBase` is a good start. If your system provides an OAI interface, you may use the `OAIBase`
- - As a sample/template, please take a look at the `sample_spider.py`
+ - As a sample/template, please take a look at the `sample_spider.py` or `sample_spider_alternative.py`
  - To learn more about the LOM standard we're using, you'll find useful information at https://en.wikipedia.org/wiki/Learning_object_metadata

  # Still have questions? Check out our GitHub-Wiki!
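The README's naming convention, `converter/spiders/<myname>_spider.py`, can be turned into a small scaffolding helper. This is a hypothetical sketch: only the path pattern comes from the README. The assumption that the Scrapy crawl name matches the module stem is suggested by the crawl name `bne_portal_spider` in the run configuration, and the CamelCase class name is purely illustrative.

```python
from pathlib import PurePosixPath


def spider_names(myname: str) -> dict[str, str]:
    """Derive module path, crawl name and class name for a new spider (hypothetical helper).

    Only the path pattern comes from the README; the crawl-name and
    class-name conventions are assumptions.
    """
    stem = f"{myname}_spider"
    # e.g. "bne_portal_spider" -> "BnePortalSpider"
    class_name = "".join(part.capitalize() for part in stem.split("_"))
    return {
        "path": str(PurePosixPath("converter", "spiders", f"{stem}.py")),
        "crawl_name": stem,
        "class_name": class_name,
    }


if __name__ == "__main__":
    print(spider_names("bne_portal"))
```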
