fix mgcd bug
tzt-star committed Dec 5, 2024
2 parents 8f6ae5d + c4c7413 commit f4ebcaa
Showing 283 changed files with 22,564 additions and 1 deletion.
38 changes: 38 additions & 0 deletions .github/workflows/python-publish.yml
@@ -0,0 +1,38 @@
name: Upload Python Package

on:
  release:
    types: [published]

permissions:
  contents: read

jobs:
  deploy:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3
    - name: Set up Python
      uses: actions/setup-python@v3
      with:
        python-version: '3.9'
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install build
        pip install pytest
        pip install torch --index-url https://download.pytorch.org/whl/cpu
        pip install -e . --verbose
        pip install -r requirements.txt
    - name: Test
      run: |
        cd tests && pytest && cd ..
    - name: Build package
      run: python -m build
    - name: Publish package
      uses: pypa/[email protected]
      with:
        user: __token__
        password: ${{ secrets.PYPI_API_TOKEN }}
21 changes: 21 additions & 0 deletions .gitignore
@@ -0,0 +1,21 @@
.idea/
.ipynb_checkpoints/
__pycache__/
/data/
/temp/
/archive/
/test/
/conf/
/.vscode/
/examples/**/archive/
/examples/**/temp/
/examples/**/data/
/build/
/dist/
*.egg-info/
docs/build/
/.pytest_cache/
/tests/archive/
/tests/conf/
/tests/temp/
/tests/data/
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 HFUT-LEC

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -0,0 +1 @@
recursive-include edustudio/assets/ *.yaml
77 changes: 77 additions & 0 deletions README.md
@@ -0,0 +1,77 @@
![logo](./assets/logo.png)
---

<p float="left">
<img src="https://img.shields.io/badge/python-v3.8+-blue">
<img src="https://img.shields.io/badge/pytorch-v1.10+-blue">
<img src="https://img.shields.io/badge/License-MIT-blue">
<img src="https://img.shields.io/github/issues/HFUT-LEC/EduStudio.svg">
<a href="https://journal.hep.com.cn/fcs/EN/10.1007/s11704-024-40372-3">
<img src="https://img.shields.io/badge/Paper-EduStudio-blue" alt="Paper EduStudio Badge">
</a>
</p>

EduStudio is a unified library for student cognitive modeling, including Cognitive Diagnosis (CD) and Knowledge Tracing (KT), built on PyTorch.

## :loudspeaker: Announcement

- Our paper has been accepted by Frontiers of Computer Science. 📃: [Paper](https://journal.hep.com.cn/fcs/EN/10.1007/s11704-024-40372-3)

## Navigation


| Resource Name | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [Eco-Repository](https://github.com/HFUT-LEC/awesome-student-cognitive-modeling) | A repository containing resources about student cognitive modeling: [papers](https://github.com/HFUT-LEC/awesome-student-cognitive-modeling/tree/main/papers), [datasets](https://github.com/HFUT-LEC/awesome-student-cognitive-modeling/tree/main/datasets), [conferences&journals](https://github.com/HFUT-LEC/awesome-student-cognitive-modeling/tree/main/conferences%26journals) |
| [Eco-Leaderboard](https://leaderboard.edustudio.ai) | A leaderboard showing the performance of the implemented models |
| [EduStudio Documentation](https://edustudio.readthedocs.io/) | Documentation for using EduStudio |
| [Reference Table](https://edustudio.readthedocs.io/en/latest/user_guide/reference_table.html) | A reference table listing the templates that correspond to each model |

## Description

EduStudio first decomposes the general algorithmic workflow into six steps: `configuration reading`, `data preparation`, `model implementation`, `training control`, `model evaluation`, and `log storage`. Then, to improve the `reusability` and `scalability` of each step, we extract what the algorithms have in common at that step into individual templates.

<p align="center">
<img src="assets/framework.png" alt="EduStudio Architecture" width="600">
<br>
<b>Figure</b>: Overall Architecture of EduStudio
</p>


## Quick Start

Install `EduStudio`:

```bash
pip install -U edustudio
```

Example: run the `NCDM` model:

```python
from edustudio.quickstart import run_edustudio

run_edustudio(
    dataset='FrcSub',
    cfg_file_name=None,
    traintpl_cfg_dict={
        'cls': 'GeneralTrainTPL',
    },
    datatpl_cfg_dict={
        'cls': 'CDInterExtendsQDataTPL'
    },
    modeltpl_cfg_dict={
        'cls': 'NCDM',
    },
    evaltpl_cfg_dict={
        'clses': ['PredictionEvalTPL', 'InterpretabilityEvalTPL'],
    }
)

```

To find out which templates a model uses, see the [Reference Table](https://edustudio.readthedocs.io/en/latest/user_guide/reference_table.html).

## License

EduStudio is released under the [MIT License](https://github.com/HFUT-LEC/EduStudio/blob/main/LICENSE).
Binary file added assets/framework.png
Binary file added assets/logo.png
23 changes: 23 additions & 0 deletions docs/.readthedocs.yaml
@@ -0,0 +1,23 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.11"

# Build documentation in the docs/ directory with Sphinx
sphinx:
  configuration: docs/source/conf.py

# We recommend specifying your dependencies to enable reproducible builds:
# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
    - requirements: docs/requirements.txt

20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.https://www.sphinx-doc.org/
	exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
6 changes: 6 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,6 @@
sphinx==6.2.1
sphinx_rtd_theme==1.2.2
myst-parser==2.0.0
esbonio==0.16.1
sphinx_copybutton
sphinxcontrib-napoleon
Binary file added docs/source/_static/favicon.ico
Binary file not shown.
Binary file added docs/source/assets/dataflow.jpg
Binary file added docs/source/assets/framework.png
Binary file added docs/source/assets/logo.png
41 changes: 41 additions & 0 deletions docs/source/conf.py
@@ -0,0 +1,41 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = 'EduStudio'
copyright = '2023, HFUT-LEC'
author = 'HFUT-LEC'
release = 'v1.1.3'

import sphinx_rtd_theme
import os
import sys

sys.path.insert(0, os.path.abspath("../.."))

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
    'myst_parser',
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
    "sphinx.ext.viewcode",
    "sphinx_copybutton",
]
source_suffix = ['.rst', '.md']
templates_path = ['_templates']
exclude_patterns = []


# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'sphinx_rtd_theme'
# html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
html_static_path = ['_static']
html_favicon = '_static/favicon.ico'
95 changes: 95 additions & 0 deletions docs/source/developer_guide/customize_atomic_op.md
@@ -0,0 +1,95 @@
# Customize Atomic Operation

In EduStudio, we treat the whole data processing pipeline as a series of atomic operations, called an atomic operation sequence.
The first atomic operation, which inherits the protocol class `BaseRaw2Mid`, converts raw data into middle data.
The following atomic operations, which inherit the protocol class `BaseMid2Cache`, convert middle data into cache data.

## BaseRaw2Mid

Atomic operations that inherit `BaseRaw2Mid` preprocess the raw dataset into the middle dataset (standardized data files).

### Protocols

The protocols in `BaseRaw2Mid` are listed as follows:

| name | description | type | note |
| ------------ | ---------------------------------------------- | ------------------ | ----------------------- |
| self.dt | current dataset name | instance variable | given in BaseRaw2Mid |
| self.rawpath | raw data path of current dataset | instance variable | given in BaseRaw2Mid |
| self.midpath | middle data path of current dataset | instance variable | given in BaseRaw2Mid |
| self.logger | logger object | instance variable | given in BaseRaw2Mid |
| process | preprocess the raw dataset into middle dataset | function interface | implemented by subclass |

### Example

The following example shows how the `ASSISTments 2009-2010` dataset is processed from raw data into middle data.

```python
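import pandas as pd

# BaseRaw2Mid is the protocol class provided by EduStudio
class R2M_ASSIST_0910(BaseRaw2Mid):
    def process(self):
        # Read the raw file from the raw data folder
        df = pd.read_csv(f"{self.rawpath}/skill_builder_data.csv", encoding='ISO-8859-1')

        # ...... (processing steps that build df_inter, df_user and df_exer are omitted here)

        # Save the standardized middle files into the middle data folder
        df_inter.to_csv(f"{self.midpath}/{self.dt}.inter.csv", index=False, encoding='utf-8')
        df_user.to_csv(f"{self.midpath}/{self.dt}.stu.csv", index=False, encoding='utf-8')
        df_exer.to_csv(f"{self.midpath}/{self.dt}.exer.csv", index=False, encoding='utf-8')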
```

The function above first reads the raw data file from the raw data folder (i.e., `self.rawpath`). After processing, it saves the resulting middle data files in the middle data folder (i.e., `self.midpath`).
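
The read-then-save pattern described above can be sketched without installing EduStudio. This is a minimal illustration under stated assumptions, not the library's implementation: `SketchRaw2Mid` is a hypothetical stand-in that only mimics the four instance variables from the protocol table, and the raw file name `raw.csv`, its columns, and the middle-file column names other than `stu_id:token` are invented for the example.

```python
import csv
import logging
import os
import tempfile


class SketchRaw2Mid:
    """Hypothetical stand-in for BaseRaw2Mid; NOT the real EduStudio class."""

    def __init__(self, dt, rawpath, midpath):
        self.dt = dt              # current dataset name
        self.rawpath = rawpath    # folder holding the raw files
        self.midpath = midpath    # folder that receives the middle files
        self.logger = logging.getLogger(dt)

    def process(self):
        raise NotImplementedError("implemented by subclass")


class R2M_Toy(SketchRaw2Mid):
    """Toy operation: raw.csv (user,item,correct) -> <dt>.inter.csv."""

    # Assumed middle-format column names, chosen for illustration only.
    FIELDS = ["stu_id:token", "exer_id:token", "label:float"]

    def process(self):
        # 1. Read the raw file from self.rawpath.
        raw_file = os.path.join(self.rawpath, "raw.csv")
        with open(raw_file, newline="", encoding="utf-8") as f:
            raw_rows = list(csv.DictReader(f))

        # 2. Rename the raw columns to the middle-format column names.
        inter = [
            dict(zip(self.FIELDS, (r["user"], r["item"], r["correct"])))
            for r in raw_rows
        ]

        # 3. Save the middle file into self.midpath.
        os.makedirs(self.midpath, exist_ok=True)
        out_path = os.path.join(self.midpath, f"{self.dt}.inter.csv")
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=self.FIELDS)
            writer.writeheader()
            writer.writerows(inter)
        self.logger.info("saved %d interactions to %s", len(inter), out_path)
        return out_path


def run_demo():
    """Build a tiny raw dataset in a temp folder, process it, return the result."""
    with tempfile.TemporaryDirectory() as root:
        rawpath = os.path.join(root, "raw")
        midpath = os.path.join(root, "mid")
        os.makedirs(rawpath)
        with open(os.path.join(rawpath, "raw.csv"), "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows([["user", "item", "correct"], [0, 10, 1], [1, 11, 0]])
        out_path = R2M_Toy("toy", rawpath, midpath).process()
        with open(out_path, newline="", encoding="utf-8") as f:
            return os.path.basename(out_path), list(csv.DictReader(f))
```

Calling `run_demo()` returns the middle file name together with the rows it contains, which makes it easy to check that the operation only touches `self.rawpath` for reading and `self.midpath` for writing.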

## BaseMid2Cache

Atomic operations that inherit `BaseMid2Cache` process the middle dataset into the cache dataset (standardized data files). Within one atomic operation sequence, the operation inheriting `BaseRaw2Mid` must be unique and must come first, whereas operations inheriting `BaseMid2Cache` may appear multiple times and make up all the subsequent positions.
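
The ordering rule for an atomic operation sequence can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `SketchRaw2Mid`, `SketchMid2Cache`, `check_sequence`, and `run_sequence` are stand-ins written for this example, and the way data is passed between operations as keyword arguments is only modeled on the `process(self, **kwargs)` signature used in this guide, not taken from EduStudio's internals.

```python
class SketchRaw2Mid:
    """Hypothetical stand-in for BaseRaw2Mid."""

    def process(self):
        pass  # a real operation would write the middle files here


class SketchMid2Cache:
    """Hypothetical stand-in for BaseMid2Cache."""

    def process(self, **kwargs):
        return kwargs


class DropEmptyRows(SketchMid2Cache):
    def process(self, **kwargs):
        # Remove rows that carry no data
        kwargs["rows"] = [row for row in kwargs["rows"] if row]
        return kwargs


class CountRows(SketchMid2Cache):
    def process(self, **kwargs):
        # Record a summary statistic for later operations
        kwargs["n_rows"] = len(kwargs["rows"])
        return kwargs


def check_sequence(ops):
    """Enforce: exactly one Raw2Mid operation, in first position."""
    if not ops or not isinstance(ops[0], SketchRaw2Mid):
        raise ValueError("the first operation must inherit the Raw2Mid base class")
    for op in ops[1:]:
        if not isinstance(op, SketchMid2Cache):
            raise ValueError("every later operation must inherit the Mid2Cache base class")


def run_sequence(ops, **middle_data):
    """Run the Raw2Mid step, then thread the data through each Mid2Cache step."""
    check_sequence(ops)
    ops[0].process()
    kwargs = dict(middle_data)
    for op in ops[1:]:
        kwargs = op.process(**kwargs)
    return kwargs


result = run_sequence(
    [SketchRaw2Mid(), DropEmptyRows(), CountRows()],
    rows=[[1], [], [2, 3]],
)
```

Because each `Mid2Cache`-style operation receives and returns the same keyword dictionary, later operations can rely on whatever earlier ones added, which is why the order of the sequence matters.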

### Protocols

The protocols in `BaseMid2Cache` are listed as follows:

| name | description | type | note |
| ------------- | ----------------------------------------------------------------- | ------------------ | ----------------------- |
| default_cfg | the default configuration of operation | class variable | |
| self.logger | logger object | instance variable | given in BaseMid2Cache |
| self.m2c_cfg | actual configuration in running process | instance variable | given in BaseMid2Cache |
| _check_params | check rationality of configuration | function interface | implemented by subclass |
| process       | process the middle dataset into cache dataset                     | function interface | implemented by subclass |
| set_dt_info | store dataset information in the process (such as student number) | function interface | implemented by subclass |
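
To show how `default_cfg`, `self.m2c_cfg`, and `_check_params` might fit together, here is a minimal sketch. It rests on an assumption: the source does not show how EduStudio builds `self.m2c_cfg`, so the merge below (user-supplied values override the class defaults) is just one plausible scheme, and `SketchMid2Cache` and `SketchSplit` are hypothetical names.

```python
class SketchMid2Cache:
    """Hypothetical stand-in for BaseMid2Cache."""

    default_cfg = {}

    def __init__(self, m2c_cfg=None):
        # Assumed merge rule: user-supplied values override class defaults.
        self.m2c_cfg = {**self.default_cfg, **(m2c_cfg or {})}
        self._check_params()

    def _check_params(self):
        pass  # subclasses add their own sanity checks


class SketchSplit(SketchMid2Cache):
    default_cfg = {"seed": 2023, "divide_scale_list": [7, 1, 2]}

    def _check_params(self):
        super()._check_params()
        scales = self.m2c_cfg["divide_scale_list"]
        if not 2 <= len(scales) <= 3:
            raise ValueError("divide_scale_list needs 2 or 3 entries")
        if sum(scales) != 10:
            raise ValueError("divide_scale_list must sum to 10")


defaults_only = SketchSplit()
overridden = SketchSplit({"divide_scale_list": [8, 2]})
```

In this sketch `overridden.m2c_cfg` keeps the default `seed` while using the new split ratios, and an invalid configuration fails at construction time rather than in the middle of processing.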

### Example

The following example shows part of the code of the `M2C_RandomDataSplit4CD` atomic operation, which splits datasets for cognitive diagnosis.

```python
class M2C_RandomDataSplit4CD(BaseMid2Cache):
    # Default configuration; the effective values are read from self.m2c_cfg
    default_cfg = {
        'seed': 2023,
        "divide_scale_list": [7,1,2],
    }

    def _check_params(self):
        super()._check_params()
        # Either train/test or train/valid/test, with ratios summing to 10
        assert 2 <= len(self.m2c_cfg['divide_scale_list']) <= 3
        assert sum(self.m2c_cfg['divide_scale_list']) == 10

    def process(self, **kwargs):
        df = kwargs['df']

        # self.n_folds, one_fold_split and multi_fold_split belong to the
        # part of the class that is not shown in this excerpt
        if self.n_folds == 1:
            assert kwargs.get("df_train", None) is None
            assert kwargs.get("df_valid", None) is None
            assert kwargs.get("df_test", None) is None
            df_train, df_valid, df_test = self.one_fold_split(df)
            kwargs['df_train_folds'] = [df_train]
            kwargs['df_valid_folds'] = [df_valid] if df_valid is not None else []
            kwargs['df_test_folds'] = [df_test]
        else:
            df_train_list, df_test_list = self.multi_fold_split(df)
            kwargs['df_train_folds'] = df_train_list
            kwargs['df_test_folds'] = df_test_list

        return kwargs

    def set_dt_info(self, dt_info, **kwargs):
        # Student count is the largest student ID plus one
        if 'stu_id:token' in kwargs['df'].columns:
            dt_info['stu_count'] = int(kwargs['df']['stu_id:token'].max() + 1)
```
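
The excerpt calls `one_fold_split` without showing it. As a rough illustration of what a ratio-based split driven by `seed` and `divide_scale_list` could look like, here is a standalone sketch. It is an assumption, not EduStudio's implementation: it works on a plain list instead of a DataFrame, uses the standard library's `random` module, and rounds split sizes down.

```python
import random


def one_fold_split(rows, divide_scale_list, seed=2023):
    """Shuffle rows and cut them into train / valid / test by tenths.

    With two ratios the validation part is None, mirroring the
    `df_valid is not None` check in the example above.
    """
    if not 2 <= len(divide_scale_list) <= 3 or sum(divide_scale_list) != 10:
        raise ValueError("divide_scale_list needs 2 or 3 entries summing to 10")

    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable

    n = len(shuffled)
    n_train = n * divide_scale_list[0] // 10
    if len(divide_scale_list) == 3:
        n_valid = n * divide_scale_list[1] // 10
        valid = shuffled[n_train:n_train + n_valid]
    else:
        n_valid = 0
        valid = None

    train = shuffled[:n_train]
    test = shuffled[n_train + n_valid:]  # the test part takes the remainder
    return train, valid, test


data = list(range(20))
train, valid, test = one_fold_split(data, [7, 1, 2])
train2, valid2, test2 = one_fold_split(data, [8, 2])
```

Giving the remainder to the test part guarantees that every row lands in exactly one split even when the dataset size is not a multiple of ten.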