update docs
kervias committed Feb 4, 2024
1 parent 2158f36 commit 34c13e0
Showing 8 changed files with 107 additions and 179 deletions.
8 changes: 2 additions & 6 deletions docs/source/features/atomic_files.md
@@ -1,12 +1,8 @@
# Atomic File Protocol
# Middle Data Format Protocol

In `EduStudio`, we adopt a flexible CSV (Comma-Separated Values) file format following [RecBole](https://recbole.io/atomic_files.html). The flexible CSV format is defined in the `middata` stage of a dataset (see Dataset Stage Protocol for details).

The atomic file protocol includes two parts: `Columns Name Format` and `Filename Format`.

**Note**: The atomic file protocol is defined in `Inherited Architecture`. In fact, users can abandon the atomic file protocol by inheriting the data template protocol class in `Basic Architecture` (i.e. `BaseDataTPL`).


The Middle Data Format Protocol includes two parts: `Columns Name Format` and `Filename Format`.

## Columns Name Format

31 changes: 15 additions & 16 deletions docs/source/features/atomic_operations.md
@@ -1,31 +1,30 @@
# Atomic Operations
# Atomic Data Operations

In `EduStudio`, we view the dataset from three stages: `rawdata`, `middata`, `cachedata`.

we treat the whole data processing as multiple atomic operations called atomic operation sequence.
We treat the whole data processing as multiple atomic operations called atomic operation sequence.
The first atomic operation, inheriting the protocol class `BaseRaw2Mid`, is the process from raw data to middle data.
The following atomic operations, inheriting the protocol class `BaseMid2Cache`, construct the process from middle data to cache data.

The atomic operation protocol can be seen at `Atomic Operation Protocol`.
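
The idea of an atomic operation sequence can be sketched roughly as follows. This is only an illustration, not EduStudio's implementation: the two base classes are stand-ins named after the protocol classes mentioned above, and the `process` method, the dict-based data, the column names (`stu_id`, `label`), and the two concrete operations (`R2M_Toy`, `M2C_CountStudents`) are all assumptions made for the example.

```python
from typing import Any, Dict, List

Data = Dict[str, Any]


class BaseRaw2Mid:
    """Stand-in for the first operation: raw data -> middle data."""

    def process(self, raw: Data) -> Data:
        raise NotImplementedError


class BaseMid2Cache:
    """Stand-in for every later operation: middle data -> cache data."""

    def process(self, data: Data) -> Data:
        raise NotImplementedError


class R2M_Toy(BaseRaw2Mid):
    # Hypothetical operation: rename publisher-specific columns
    # to standardized (assumed) column names.
    def process(self, raw: Data) -> Data:
        return {"stu_id": raw["student"], "label": raw["correct"]}


class M2C_CountStudents(BaseMid2Cache):
    # Hypothetical operation: derive a statistic a model might need.
    def process(self, data: Data) -> Data:
        return {**data, "stu_count": len(set(data["stu_id"]))}


def run_sequence(raw: Data, r2m: BaseRaw2Mid, m2c_ops: List[BaseMid2Cache]) -> Data:
    data = r2m.process(raw)  # rawdata -> middata
    for op in m2c_ops:       # middata -> cachedata, one operation at a time
        data = op.process(data)
    return data


raw = {"student": [7, 7, 9], "correct": [1, 0, 1]}
cache = run_sequence(raw, R2M_Toy(), [M2C_CountStudents()])
print(cache["stu_count"])
```

Keeping each step small and single-purpose is what makes the sequence composable: a new dataset only needs its own first operation, while the later operations can be reused.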

## Partial Atomic Operation Table


## Atomic Operation Table

In the following, we give a table to display existing atomic operations.
In the following, we give a table to display some existing atomic operations. For a more detailed atomic operation table, please see `user_guide/Atomic Data Operation List`.

### Raw2Mid

| name | description |
For the conversion from rawdata to middata, we implement a specific atomic data operation prefixed with `R2M` for each dataset.

| name | Corresponding dataset |
| --------------- | ------------------------------------------------------------ |
| R2M_ASSIST_0910 | The atomic operation that processes the Assistment_0910 dataset from rawdata into middata |
| R2M_FrcSub | The atomic operation that processes the FrcSub dataset from rawdata into middata |
| R2M_ASSIST_1213 | The atomic operation that processes the Assistment_1213 dataset from rawdata into middata |
| R2M_Math1 | The atomic operation that processes the Math1 dataset from rawdata into middata |
| R2M_Math2 | The atomic operation that processes the Math2 dataset from rawdata into middata |
| R2M_AAAI_2023 | The atomic operation that processes the AAAI 2023 challenge dataset from rawdata into middata |
| R2M_Algebra_0506 | The atomic operation that processes the Algebra 2005-2006 dataset from rawdata into middata |
| R2M_ASSIST_1516 | The atomic operation that processes the Assistment 2015-2016 dataset from rawdata into middata |
| R2M_ASSIST_0910 | ASSISTment 2009-2010 |
| R2M_FrcSub | FrcSub |
| R2M_ASSIST_1213 | ASSISTment 2012-2013 |
| R2M_Math1 | Math1 |
| R2M_Math2 | Math2 |
| R2M_AAAI_2023 | AAAI 2023 Global Knowledge Tracing Challenge |
| R2M_Algebra_0506 | Algebra 2005-2006 |
| R2M_ASSIST_1516 | ASSISTment 2015-2016 |

### Mid2Cache

3 changes: 3 additions & 0 deletions docs/source/features/dataset_folder_protocol.md
@@ -1,6 +1,9 @@
# Dataset Stage Protocol

In `EduStudio`, we view the dataset as three stages: `rawdata`, `middata`, `cachedata`.
- inconsistent rawdata: the original data format provided by the dataset publisher.
- standardized middata: the standardized middle data format (see Middle Data Format Protocol) defined by EduStudio.
- model-friendly cachedata: the data format that is convenient for model usage.


## Dataset Folder Format Example
12 changes: 5 additions & 7 deletions docs/source/features/global_cfg_obj.md
@@ -16,14 +16,12 @@ The description of five config objects is illustrated in Table below.



## Four Entry Points of Configuration
## Four Configuration Portals

There are four entry points of configuration:
There are four configuration portals:

- default_cfg: inheritable class variable
- config file
- parameter dict
- default_cfg: inheritable Python class variable
- configuration file
- parameter dictionary
- command line



4 changes: 2 additions & 2 deletions docs/source/features/inheritable_config.md
@@ -1,12 +1,12 @@
# Inheritable Configuration
# Inheritable Default Configuration

The management of default configuration in EduStudio is implemented by a class variable, i.e. a dictionary object called default_config.

Templates usually introduce new features through inheritance, and these new features may require corresponding configurations, so the default configuration we provide is inheritable.

## Example

The inheritance example of data template is illustrated as follows:
The inheritance of data templates is illustrated as follows, using the data preparation procedure as an example. There are three data template classes (DataTPLs) that inherit from one another: BaseDataTPL, GeneralDataTPL, and EduDataTPL. If users specify EduDataTPL as the current DataTPL, the eventual default\_config of the data preparation procedure is a merger of the default\_cfg of the three templates. When a configuration conflict is encountered, the default\_config of the subclass template takes precedence over that of its parent class templates. As a result, the other configuration portals (i.e., configuration file, parameter dictionary, and command line) can only specify parameters that are confined within the default configuration. The advantage of the inheritable design is that it makes it easier for readers to locate the numerous hyperparameters.
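
Before the actual template code, the merging rule described above can be sketched in isolation. This is a minimal sketch under stated assumptions, not EduStudio's own merging code: the three classes are toy stand-ins that only borrow the template names, every configuration key and value is invented for illustration, and `merged_default_cfg` and `apply_overrides` are hypothetical helpers.

```python
from typing import Any, Dict


class BaseDataTPL:
    # Toy stand-in; keys and values are invented for this sketch.
    default_cfg: Dict[str, Any] = {"seed": 2023, "load_data_from": "middata"}


class GeneralDataTPL(BaseDataTPL):
    default_cfg = {"load_data_from": "cachedata", "is_save_cache": False}


class EduDataTPL(GeneralDataTPL):
    default_cfg = {"is_save_cache": True}


def merged_default_cfg(cls: type) -> Dict[str, Any]:
    """Merge default_cfg along the inheritance chain of cls."""
    merged: Dict[str, Any] = {}
    # Walk from the most basic class to the most derived one, so that
    # subclasses overwrite values set by their parents.
    for klass in reversed(cls.__mro__):
        merged.update(vars(klass).get("default_cfg", {}))
    return merged


def apply_overrides(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Accept overrides only for keys that already exist in the defaults."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**defaults, **overrides}


cfg = merged_default_cfg(EduDataTPL)
print(cfg)
print(apply_overrides(cfg, {"seed": 1}))
```

Because every accepted parameter must already appear in some template's default dictionary, reading the templates along the inheritance chain is enough to discover all available hyperparameters.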

```python
class BaseDataTPL(Dataset):
    ...  # the rest of the class is collapsed in this diff view
```
45 changes: 22 additions & 23 deletions docs/source/user_guide/datasets.md
@@ -1,30 +1,29 @@
# Dataset List

We collect the commonly used datasets and list them here. The meaning of the fields in the table below is as follows:
- Exercise Text: whether the dataset contains textual information of exercises
- Concept Relation: whether the dataset contains relations among knowledge concepts (tree or prerequisite)
- Time: whether the dataset contains the time at which students start answering questions
We showcase here the datasets that EduStudio can preprocess (i.e., those for which a raw2mid atomic data operation is provided). The meaning of the fields in the table below is as follows:

- Auto Download: whether EduStudio supports downloading the `middata` of the dataset
- R2M Script Name: name of the script that processes the rawdata into middata in EduStudio



| Dataset Name | Exercise Text | Concept Relation | Time | Auto Download | R2M Script Name | Note |
| :----------------------------------------------------------- | :-----------: | :--------------: | :--: | :-----------: | :----------------------- | :----------------------------------------------------------- |
| [FrcSub](http://staff.ustc.edu.cn/~qiliuql/data/math2015.rar) | ✖️ | ✖️ | ✖️ | ✔️ | R2M_FrcSub | |
| [Math1](http://staff.ustc.edu.cn/~qiliuql/data/math2015.rar) | ✖️ | ✖️ | ✖️ | ✔️ | R2M_Math1 | |
| [Math2](http://staff.ustc.edu.cn/~qiliuql/data/math2015.rar) | ✖️ | ✖️ | ✖️ | ✔️ | R2M_Math2 | |
| [AAAI_2023](https://docs.google.com/forms/d/e/1FAIpQLScWjxiXdSMAKBtlPJZm9MsudUG9CQS16lT0GVfajpVj-mWReA/viewform?pli=1) | ✔️ | ✔️(tree) | ✔️ | ✔️ | R2M_AAAI_2023 | [AAAI2023 Global Knowledge Tracing Challenge](https://ai4ed.cc/competitions/aaai2023competition) |
| [ASSISTment_2009-2010](https://drive.google.com/file/d/0B2X0QD6q79ZJUFU1cjYtdGhVNjg/view?resourcekey=0-OyI8ZWxtGSAzhodUIcMf_g) | ✖️ | ✖️ | ✔️ | ✔️ | R2M_ASSIST_0910 | |
| [ASSISTment_2012-2013](https://sites.google.com/site/assistmentsdata/datasets/2012-13-school-data-with-affect) | ✖️ | ✖️ | ✔️ | ✖️ | R2M_ASSIST_1213 | |
| [ASSISTment_2015-2016](https://sites.google.com/site/assistmentsdata/datasets/2015-assistments-skill-builder-data) | ✖️ | ✖️ | ✔️ | ✖️ | R2M_ASSIST_1516 | |
| [ASSISTment_2017](https://sites.google.com/view/assistmentsdatamining/dataset) | ✖️ | ✖️ | ✔️ | ✖️ | R2M_ASSIST_17 | |
| [Algebera_2005-2006](https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp) | ✖️ | ✖️ | ✔️ | ✖️ | R2M_Algebera_0506 | [KDD Cup 2010](https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp) |
| [Algebera_2006-2007](https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp) | ✖️ | ✖️ | ✔️ | ✖️ | R2M_Algebera_0607 | [KDD Cup 2010](https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp) |
| [Bridge2Algebra_2006-2007](https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp) | ✖️ | ✖️ | ✔️ | ✖️ | R2M_Bridge2Algebra_0607 | [KDD Cup 2010](https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp) |
| [Junyi_AreaTopicAsCpt](https://pslcdatashop.web.cmu.edu/Project?id=244) | ✖️ | ✔️(tree) | ✔️ | ✖️ | R2M_Junyi_AreaTopicAsCpt | Area&Topic field as concept |
| [Junyi_ExerAsCpt](https://pslcdatashop.web.cmu.edu/Project?id=244) | ✖️ | ✔️(prerequisite) | ✔️ | ✖️ | R2M_Junyi_ExerAsCpt | Exercise as concept |
| EdNet_KT1 | ✖️ | ✖️ | ✔️ | ✖️ | R2M_EdNet_KT1 | [download1](http://bit.ly/ednet-content), [download2](http://bit.ly/ednet-content) |
| [Eedi_2020_Task1&2](https://dqanonymousdata.blob.core.windows.net/neurips-public/data.zip) | ✖️ | ✔️(tree) | ✔️ | ✖️ | R2M_Eedi_20_T12 | [NeurIPS 2020 Education Challenge: Task1&2](https://eedi.com/projects/neurips-education-challenge) |
| [Eedi_2020_Task3&4](https://dqanonymousdata.blob.core.windows.net/neurips-public/data.zip) | ✔️(images) | ✔️(tree) | ✔️ | ✖️ | R2M_Eedi_20_T34 | [NeurIPS 2020 Education Challenge: Task3&4](https://eedi.com/projects/neurips-education-challenge) |

| Dataset Name | R2M Script Name | Auto Download | Note |
| :----------------------------------------------------------- | :----------------------- | ------------- | :----------------------------------------------------------: |
| [FrcSub](http://staff.ustc.edu.cn/~qiliuql/data/math2015.rar) | R2M_FrcSub | ✔️ | |
| [Math1](http://staff.ustc.edu.cn/~qiliuql/data/math2015.rar) | R2M_Math1 | ✔️ | |
| [Math2](http://staff.ustc.edu.cn/~qiliuql/data/math2015.rar) | R2M_Math2 | ✔️ | |
| [AAAI_2023](https://docs.google.com/forms/d/e/1FAIpQLScWjxiXdSMAKBtlPJZm9MsudUG9CQS16lT0GVfajpVj-mWReA/viewform?pli=1) | R2M_AAAI_2023 | ✔️ | [AAAI2023 Global Knowledge Tracing Challenge](https://ai4ed.cc/competitions/aaai2023competition) |
| [ASSISTment_2009-2010](https://drive.google.com/file/d/0B2X0QD6q79ZJUFU1cjYtdGhVNjg/view?resourcekey=0-OyI8ZWxtGSAzhodUIcMf_g) | R2M_ASSIST_0910 | ✔️ | |
| [ASSISTment_2012-2013](https://sites.google.com/site/assistmentsdata/datasets/2012-13-school-data-with-affect) | R2M_ASSIST_1213 | ✖️ | |
| [ASSISTment_2015-2016](https://sites.google.com/site/assistmentsdata/datasets/2015-assistments-skill-builder-data) | R2M_ASSIST_1516 | ✖️ | |
| [ASSISTment_2017](https://sites.google.com/view/assistmentsdatamining/dataset) | R2M_ASSIST_17 | ✖️ | |
| [Algebera_2005-2006](https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp) | R2M_Algebera_0506 | ✖️ | [KDD Cup 2010](https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp) |
| [Algebera_2006-2007](https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp) | R2M_Algebera_0607 | ✖️ | [KDD Cup 2010](https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp) |
| [Bridge2Algebra_2006-2007](https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp) | R2M_Bridge2Algebra_0607 | ✖️ | [KDD Cup 2010](https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp) |
| [Junyi_AreaTopicAsCpt](https://pslcdatashop.web.cmu.edu/Project?id=244) | R2M_Junyi_AreaTopicAsCpt | ✖️ | Area&Topic field as concept |
| [Junyi_ExerAsCpt](https://pslcdatashop.web.cmu.edu/Project?id=244) | R2M_Junyi_ExerAsCpt | ✖️ | Exercise as concept |
| EdNet_KT1 | R2M_EdNet_KT1 | ✖️ | [download1](http://bit.ly/ednet-content), [download2](http://bit.ly/ednet-content) |
| [Eedi_2020_Task1&2](https://dqanonymousdata.blob.core.windows.net/neurips-public/data.zip) | R2M_Eedi_20_T12 | ✖️ | [NeurIPS 2020 Education Challenge: Task1&2](https://eedi.com/projects/neurips-education-challenge) |
| [Eedi_2020_Task3&4](https://dqanonymousdata.blob.core.windows.net/neurips-public/data.zip) | R2M_Eedi_20_T34 | ✖️ | [NeurIPS 2020 Education Challenge: Task3&4](https://eedi.com/projects/neurips-education-challenge) |
| [SLP_English](https://aic-fe.bnu.edu.cn/en/data/index.html) | R2M_SLP_English | ✖️ | \[[paper](https://aic-fe.bnu.edu.cn/fj/2021-ICCE-SLP.pdf)\], Smart Learning Partner |
| [SLP_Math](https://aic-fe.bnu.edu.cn/en/data/index.html) | R2M_SLP_Math | ✖️ | \[[paper](https://aic-fe.bnu.edu.cn/fj/2021-ICCE-SLP.pdf)\], Smart Learning Partner |