fix mgcd bug

HFUT-LEC · Dec 5, 2024 · f4ebcaa
2 parents 8f6ae5d + c4c7413
Showing 283 changed files with 22,564 additions and 1 deletion.
diff --git a/.github/workflows/python-publish.yml b/.github/workflows/python-publish.yml
@@ -0,0 +1,38 @@
+name: Upload Python Package
+
+on:
+  release:
+    types: [published]
+
+permissions:
+  contents: read
+
+jobs:
+  deploy:
+
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v3
+    - name: Set up Python
+      uses: actions/setup-python@v3
+      with:
+        python-version: '3.9'
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install build
+        pip install pytest
+        pip install torch --index-url https://download.pytorch.org/whl/cpu
+        pip install -e . --verbose
+        pip install -r requirements.txt
+    - name: Test
+      run: |
+        cd tests && pytest && cd ..
+    - name: Build package
+      run: python -m build
+    - name: Publish package
+      uses: pypa/[email protected]
+      with:
+        user: __token__
+        password: ${{ secrets.PYPI_API_TOKEN }}
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,21 @@
+.idea/
+.ipynb_checkpoints/
+__pycache__/
+/data/
+/temp/
+/archive/
+/test/
+/conf/
+/.vscode/
+/examples/**/archive/
+/examples/**/temp/
+/examples/**/data/
+/build/
+/dist/
+*.egg-info/
+docs/build/
+/.pytest_cache/
+/tests/archive/
+/tests/conf/
+/tests/temp/
+/tests/data/
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 HFUT-LEC
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -0,0 +1 @@
+recursive-include edustudio/assets/ *.yaml
diff --git a/README.md b/README.md
@@ -0,0 +1,77 @@
+![logo](./assets/logo.png)
+---
+
+<p float="left">
+<img src="https://img.shields.io/badge/python-v3.8+-blue">
+<img src="https://img.shields.io/badge/pytorch-v1.10+-blue">
+<img src="https://img.shields.io/badge/License-MIT-blue">
+<img src="https://img.shields.io/github/issues/HFUT-LEC/EduStudio.svg">
+<a href="https://journal.hep.com.cn/fcs/EN/10.1007/s11704-024-40372-3">
+  <img src="https://img.shields.io/badge/Paper-EduStudio-blue" alt="Paper EduStudio Badge">
+</a>
+</p>
+
+EduStudio is a unified library for student cognitive modeling, including Cognitive Diagnosis (CD) and Knowledge Tracing (KT), based on PyTorch.
+
+## :loudspeaker: Announcement
+
+- Our paper has been accepted by Frontiers of Computer Science. 📃: [Paper](https://journal.hep.com.cn/fcs/EN/10.1007/s11704-024-40372-3)
+
+## Navigation
+
+
+| Resource Name                                                | Description                                                  |
+| ------------------------------------------------------------ | ------------------------------------------------------------ |
+| [Eco-Repository](https://github.com/HFUT-LEC/awesome-student-cognitive-modeling) | A repository of resources on student cognitive modeling: [papers](https://github.com/HFUT-LEC/awesome-student-cognitive-modeling/tree/main/papers), [datasets](https://github.com/HFUT-LEC/awesome-student-cognitive-modeling/tree/main/datasets), [conferences&journals](https://github.com/HFUT-LEC/awesome-student-cognitive-modeling/tree/main/conferences%26journals) |
+| [Eco-Leaderboard](https://leaderboard.edustudio.ai)          | A leaderboard showing the performance of the implemented models |
+| [EduStudio Documentation](https://edustudio.readthedocs.io/) | Documentation on how to use EduStudio                        |
+| [Reference Table](https://edustudio.readthedocs.io/en/latest/user_guide/reference_table.html) | A reference table listing the templates used by each model |
+
+## Description
+
+EduStudio first decomposes the general algorithmic workflow into six steps: `configuration reading`, `data preparation`, `model implementation`, `training control`, `model evaluation`, and `log storage`. Then, to enhance the `reusability` and `scalability` of each step, we extract what the algorithms have in common at that step into individual templates.
+
+<p align="center">
+  <img src="assets/framework.png" alt="EduStudio Architecture" width="600">
+  <br>
+  <b>Figure</b>: Overall Architecture of EduStudio
+</p>
+
+
+## Quick Start
+
+Install `EduStudio`:
+
+```bash
+pip install -U edustudio
+```
+
+Example: run the `NCDM` model:
+
+```python
+from edustudio.quickstart import run_edustudio
+
+run_edustudio(
+    dataset='FrcSub',
+    cfg_file_name=None,
+    traintpl_cfg_dict={
+        'cls': 'GeneralTrainTPL',
+    },
+    datatpl_cfg_dict={
+        'cls': 'CDInterExtendsQDataTPL'
+    },
+    modeltpl_cfg_dict={
+        'cls': 'NCDM',
+    },
+    evaltpl_cfg_dict={
+        'clses': ['PredictionEvalTPL', 'InterpretabilityEvalTPL'],
+    }
+)
+
+```
+
+To find out which templates a model uses, see the [Reference Table](https://edustudio.readthedocs.io/en/latest/user_guide/reference_table.html).
+
+## License
+
+EduStudio is released under the [MIT License](https://github.com/HFUT-LEC/EduStudio/blob/main/LICENSE).
diff --git a/assets/framework.png b/assets/framework.png
diff --git a/assets/logo.png b/assets/logo.png
diff --git a/docs/.readthedocs.yaml b/docs/.readthedocs.yaml
@@ -0,0 +1,23 @@
+# .readthedocs.yaml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the version of Python and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.11"
+
+# Build documentation in the docs/ directory with Sphinx
+sphinx:
+  configuration: docs/source/conf.py
+
+# We recommend specifying your dependencies to enable reproducible builds:
+# https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+python:
+  install:
+  - requirements: docs/requirements.txt
+
diff --git a/docs/Makefile b/docs/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = source
+BUILDDIR      = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/docs/make.bat b/docs/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=source
+set BUILDDIR=build
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.https://www.sphinx-doc.org/
+	exit /b 1
+)
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -0,0 +1,6 @@
+sphinx==6.2.1
+sphinx_rtd_theme==1.2.2
+myst-parser==2.0.0
+esbonio==0.16.1
+sphinx_copybutton
+sphinxcontrib-napoleon
diff --git a/docs/source/_static/favicon.ico b/docs/source/_static/favicon.ico
diff --git a/docs/source/assets/dataflow.jpg b/docs/source/assets/dataflow.jpg
diff --git a/docs/source/assets/framework.png b/docs/source/assets/framework.png
diff --git a/docs/source/assets/logo.png b/docs/source/assets/logo.png
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -0,0 +1,41 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# For the full list of built-in configuration values, see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Project information -----------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
+
+project = 'EduStudio'
+copyright = '2023, HFUT-LEC'
+author = 'HFUT-LEC'
+release = 'v1.1.3'
+
+import sphinx_rtd_theme
+import os
+import sys
+
+sys.path.insert(0, os.path.abspath("../.."))
+
+# -- General configuration ---------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
+
+extensions = [
+    'myst_parser',
+    "sphinx.ext.autodoc",
+    "sphinx.ext.napoleon",
+    "sphinx.ext.viewcode",
+    "sphinx_copybutton",
+]
+source_suffix = ['.rst', '.md']
+templates_path = ['_templates']
+exclude_patterns = []
+
+
+# -- Options for HTML output -------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+
+html_theme = 'sphinx_rtd_theme'
+# html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
+html_static_path = ['_static']
+html_favicon = '_static/favicon.ico'
diff --git a/docs/source/developer_guide/customize_atomic_op.md b/docs/source/developer_guide/customize_atomic_op.md
@@ -0,0 +1,95 @@
+# Customize Atomic Operation
+
+In EduStudio, the whole data-processing pipeline is treated as a series of atomic operations, called the atomic operation sequence.
+The first atomic operation, which inherits from the protocol class `BaseRaw2Mid`, converts raw data into middle data.
+The subsequent atomic operations, which inherit from the protocol class `BaseMid2Cache`, convert middle data into cache data.
+
+## BaseRaw2Mid
+
+Atomic operations inheriting from `BaseRaw2Mid` preprocess the raw dataset into the middle dataset (standardized data files).
+
+### Protocols
+
+The protocols in `BaseRaw2Mid` are listed as follows:
+
+| name         | description                                    | type               | note                    |
+| ------------ | ---------------------------------------------- | ------------------ | ----------------------- |
+| self.dt      | current dataset name                           | instance variable  | given in BaseRaw2Mid    |
+| self.rawpath | raw data path of current dataset               | instance variable  | given in BaseRaw2Mid    |
+| self.midpath | middle data path of current dataset            | instance variable  | given in BaseRaw2Mid    |
+| self.logger  | logger object                                  | instance variable  | given in BaseRaw2Mid    |
+| process      | preprocess the raw dataset into middle dataset | function interface | implemented by subclass |
+
+### Example
+
+The following example shows how the `ASSISTments 2009-2010` dataset is converted from raw data to middle data.
+
+```python
+class R2M_ASSIST_0910(BaseRaw2Mid):
+    def process(self):
+        df = pd.read_csv(f"{self.rawpath}/skill_builder_data.csv", encoding='ISO-8859-1')
+
+        # ... (dataset-specific processing that builds df_inter, df_user, df_exer) ...
+
+        df_inter.to_csv(f"{self.midpath}/{self.dt}.inter.csv", index=False, encoding='utf-8')
+        df_user.to_csv(f"{self.midpath}/{self.dt}.stu.csv", index=False, encoding='utf-8')
+        df_exer.to_csv(f"{self.midpath}/{self.dt}.exer.csv", index=False, encoding='utf-8')
+```
+
+The function above first reads the raw data file from the raw-data folder (i.e., `self.rawpath`). After processing, it saves the middle data to the middle-data folder (i.e., `self.midpath`).
+
+## BaseMid2Cache
+
+Atomic operations inheriting from `BaseMid2Cache` preprocess the middle dataset into the cache dataset (standardized data files). Within one atomic operation sequence, the operation inheriting from `BaseRaw2Mid` must be unique and must come first, whereas there may be several operations inheriting from `BaseMid2Cache`, and they occupy all the positions after it.
+
+### Protocols
+
+The protocols in `BaseMid2Cache` are listed as follows:
+
+| name          | description                                                       | type               | note                    |
+| ------------- | ----------------------------------------------------------------- | ------------------ | ----------------------- |
+| default_cfg   | the default configuration of operation                            | class variable     |                         |
+| self.logger   | logger object                                                     | instance variable  | given in BaseMid2Cache  |
+| self.m2c_cfg  | actual configuration in running process                           | instance variable  | given in BaseMid2Cache  |
+| _check_params | check rationality of configuration                                | function interface | implemented by subclass |
+| process       | preprocess the middle dataset into cache dataset                  | function interface | implemented by subclass |
+| set_dt_info   | store dataset information in the process (such as student number) | function interface | implemented by subclass |
+
+### Example
+
+The following example shows part of the code of the `M2C_RandomDataSplit4CD` atomic operation, which splits datasets for cognitive diagnosis.
+
+```python
+class M2C_RandomDataSplit4CD(BaseMid2Cache):
+    default_cfg = {
+        'seed': 2023,
+        "divide_scale_list": [7,1,2],
+    }
+
+    def _check_params(self):
+        super()._check_params()
+        assert 2 <= len(self.m2c_cfg['divide_scale_list']) <= 3
+        assert sum(self.m2c_cfg['divide_scale_list']) == 10
+
+    def process(self, **kwargs):
+        df = kwargs['df']
+
+        if self.n_folds == 1:
+            assert kwargs.get("df_train", None) is None
+            assert kwargs.get("df_valid", None) is None
+            assert kwargs.get("df_test", None) is None
+            df_train, df_valid, df_test = self.one_fold_split(df)
+            kwargs['df_train_folds'] = [df_train]
+            kwargs['df_valid_folds'] = [df_valid] if df_valid is not None else []
+            kwargs['df_test_folds'] = [df_test]
+        else:
+            df_train_list, df_test_list = self.multi_fold_split(df)
+            kwargs['df_train_folds'] = df_train_list
+            kwargs['df_test_folds'] = df_test_list
+
+        return kwargs
+
+    def set_dt_info(self, dt_info, **kwargs):
+        if 'stu_id:token' in kwargs['df'].columns:
+            dt_info['stu_count'] = int(kwargs['df']['stu_id:token'].max() + 1)
+```
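
The `BaseMid2Cache` protocol documented in this file (a class-level `default_cfg`, `_check_params`, a `process` method that passes `kwargs` along, and `set_dt_info`) can be tried out without EduStudio installed. What follows is only a minimal sketch under stated assumptions: `Mid2CacheSketch`, `RandomSplitSketch`, and `run_sequence` are hypothetical names, the rows are plain tuples rather than DataFrames, and the way the runner calls each operation is assumed rather than taken from the library.

```python
import random


class Mid2CacheSketch:
    """Hypothetical stand-in for BaseMid2Cache; not EduStudio's real class."""

    default_cfg = {}

    def __init__(self, **overrides):
        # Actual configuration = class defaults overridden by user values.
        self.m2c_cfg = {**self.default_cfg, **overrides}
        self._check_params()

    def _check_params(self):
        """Validate self.m2c_cfg; subclasses add their own checks."""

    def process(self, **kwargs):
        """Transform the data carried in kwargs and hand it on."""
        return kwargs

    def set_dt_info(self, dt_info, **kwargs):
        """Record dataset statistics in dt_info."""


class RandomSplitSketch(Mid2CacheSketch):
    """Splits rows of (stu_id, exer_id, label) by a ratio that sums to 10."""

    default_cfg = {"seed": 2023, "divide_scale_list": [7, 1, 2]}

    def _check_params(self):
        super()._check_params()
        scales = self.m2c_cfg["divide_scale_list"]
        assert 2 <= len(scales) <= 3
        assert sum(scales) == 10

    def process(self, **kwargs):
        rows = list(kwargs["rows"])
        # A private Random instance keeps the shuffle reproducible.
        random.Random(self.m2c_cfg["seed"]).shuffle(rows)
        scales = self.m2c_cfg["divide_scale_list"]
        n_test = len(rows) * scales[-1] // 10
        # Two scales mean train/test only, so the validation part is empty.
        n_valid = len(rows) * scales[1] // 10 if len(scales) == 3 else 0
        n_train = len(rows) - n_valid - n_test
        kwargs["train"] = rows[:n_train]
        kwargs["valid"] = rows[n_train:n_train + n_valid]
        kwargs["test"] = rows[n_train + n_valid:]
        return kwargs

    def set_dt_info(self, dt_info, **kwargs):
        # IDs are assumed to be 0-based, so the count is max + 1.
        dt_info["stu_count"] = max(r[0] for r in kwargs["rows"]) + 1


def run_sequence(ops, **kwargs):
    """Assumed runner: feed each operation's output into the next one."""
    dt_info = {}
    for op in ops:
        kwargs = op.process(**kwargs)
        op.set_dt_info(dt_info, **kwargs)
    return kwargs, dt_info


rows = [(i % 5, i, i % 2) for i in range(20)]
out, dt_info = run_sequence([RandomSplitSketch()], rows=rows)
print(len(out["train"]), len(out["valid"]), len(out["test"]), dt_info)
# prints: 14 2 4 {'stu_count': 5}
```

Passing `divide_scale_list=[8, 2]` to `RandomSplitSketch` yields an empty validation part, which mirrors how the documented example leaves `df_valid_folds` empty when there is no validation split.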