
Comparing changes

base repository: HFUT-LEC/EduStudio
base: v1.0.0-beta2.1
head repository: HFUT-LEC/EduStudio
compare: main

Commits on Aug 30, 2023

  1. update version
     kervias committed Aug 30, 2023 (d4b2163)

Commits on Sep 20, 2023

  1. fcea303

Commits on Oct 18, 2023

  1. 5da8420
  2. update version
     kervias committed Oct 18, 2023 (2821297)

Commits on Nov 14, 2023

  1. 3009093

Commits on Nov 15, 2023

  1. Merge pull request #12 from badranX/dev
     Fix argparse related error in Jupyter
     kervias authored Nov 15, 2023 (e92b7d1)

Commits on Nov 24, 2023

  1. c5460c3

Commits on Nov 25, 2023

  1. Merge pull request #13 from badranX/qikt
     fix QIKT response data leakage
     kervias authored Nov 25, 2023 (ea1f688)

Commits on Dec 2, 2023

  1. 8395d57

Commits on Dec 3, 2023

  1. Merge pull request #14 from badranX/qdkt
     optemize QDKT laplacian matrix generation
     kervias authored Dec 3, 2023 (04f2622)
  2. revise DKT+ regularization
     badranX committed Dec 3, 2023 (5a4919f)
  3. Merge pull request #15 from badranX/dkt_plus
     revise DKT+ regularization
     kervias authored Dec 3, 2023 (b01e624)

Commits on Dec 4, 2023

  1. 4368f53
  2. ba3cbb6

Commits on Dec 6, 2023

  1. d0f771c
  2. 7c11254

Commits on Dec 7, 2023

  1. Merge pull request #16 from badranX/dkvmn_based
     Fix DKVMN based loss, labels-predictions mismatch
     kervias authored Dec 7, 2023 (18d7312)

Commits on Dec 18, 2023

  1. ba6a481
  2. 68dc147
  3. update version to v1.0.0
     kervias committed Dec 18, 2023 (eea15ba)

Commits on Dec 26, 2023

  1. update README
     kervias committed Dec 26, 2023 (8be11a7)

Commits on Dec 29, 2023

  1. Update models.md
     tzt-star authored Dec 29, 2023 (27a7d49)
  2. Update models.md
     tzt-star authored Dec 29, 2023 (908cccc)

Commits on Jan 31, 2024

  1. c7b1280

Commits on Feb 3, 2024

  1. 85536e0
  2. rename evaluate templates
     kervias committed Feb 3, 2024 (6eec637)
  3. 79e6efb
  4. add [NeurIPS 2023] DCD model
     kervias committed Feb 3, 2024 (2158f36)

Commits on Feb 4, 2024

  1. update docs
     kervias committed Feb 4, 2024 (34c13e0)
  2. add AdversarialTrainTPL
     kervias committed Feb 4, 2024 (b55f2c6)

Commits on Feb 6, 2024

  1. Update models.md
     add FairCD
     Update reference_table.md
     add faircd
     Update adversarial_traintpl.py
     Update __init__.py
     add faircd
     Create FAIRCDDataTPL.py
     Update __init__.py
     add faircd
     Create faircd_irt.py
     Create faircd_mirt.py
     Create faircd_ncdm.py
     Create run_faircd_irt_demo.py
     Create run_faircd_mirt_demo.py
     Create run_faircd_ncdm_demo.py
     tzt-star authored and kervias committed Feb 6, 2024 (5e8e1e3)
  2. fix FairCD Bugs
     kervias committed Feb 6, 2024 (de7ae62)

Commits on Feb 9, 2024

  1. update README and docs
     kervias committed Feb 9, 2024 (63ec33b)

Commits on Feb 11, 2024

  1. update README and version
     kervias committed Feb 11, 2024 (09c0693)

Commits on Feb 25, 2024

  1. reset default eval_batch_size
     kervias committed Feb 25, 2024 (3f659ef)

Commits on Feb 28, 2024

  1. e935469

Commits on Mar 5, 2024

  1. d4814e5
  2. 43d9bd8
  3. update version to v1.1.1
     kervias committed Mar 5, 2024 (6aa7649)

Commits on Mar 12, 2024

  1. 68611db

Commits on Apr 23, 2024

  1. [add] IdentifiabilityEvalTPL
     kervias committed Apr 23, 2024 (eb33395)

Commits on Jul 29, 2024

  1. add IDS metric
     kervias committed Jul 29, 2024 (071d9e9)
  2. add mf model
     kervias committed Jul 29, 2024 (6541b7c)
  3. update version
     kervias committed Jul 29, 2024 (72d6a32)
  4. 7d723a5
  5. update setup.py
     kervias committed Jul 29, 2024 (789d23f)

Commits on Sep 15, 2024

  1. update README
     kervias committed Sep 15, 2024 (c4c7413)

Commits on Dec 5, 2024

  1. fix mgcd bug
     tzt-star committed Dec 5, 2024 (8f6ae5d)
  2. fix mgcd bug
     tzt-star committed Dec 5, 2024 (f4ebcaa)
  3. 115da3d
Showing with 3,381 additions and 2,021 deletions.
  1. +1 −1 .github/workflows/python-publish.yml
  2. +36 −12 README.md
  3. BIN assets/framework.png
  4. +0 −1,202 assets/framework.svg
  5. BIN docs/source/assets/dataflow.jpg
  6. BIN docs/source/assets/framework.png
  7. +1 −1 docs/source/conf.py
  8. +4 −4 docs/source/developer_guide/customize_evaltpl.md
  9. +7 −7 docs/source/developer_guide/customize_traintpl.md
  10. +2 −6 docs/source/features/atomic_files.md
  11. +18 −19 docs/source/features/atomic_operations.md
  12. +19 −15 docs/source/features/dataset_folder_protocol.md
  13. +5 −7 docs/source/features/global_cfg_obj.md
  14. +2 −2 docs/source/features/inheritable_config.md
  15. +15 −0 docs/source/features/standard_datamodule.md
  16. +2 −2 docs/source/get_started/quick_start.md
  17. +14 −9 docs/source/index.rst
  18. +21 −0 docs/source/user_guide/atom_op.md
  19. +22 −23 docs/source/user_guide/datasets.md
  20. +54 −78 docs/source/user_guide/models.md
  21. +47 −44 docs/source/user_guide/reference_table.md
  22. +25 −11 docs/source/user_guide/usage/aht.md
  23. +4 −4 docs/source/user_guide/usage/run_edustudio.md
  24. +7 −7 docs/source/user_guide/usage/use_case_of_config.md
  25. +1 −1 edustudio/__init__.py
  26. +9 −6 edustudio/assets/datasets.yaml
  27. +0 −2 edustudio/atom_op/mid2cache/CD/data_split4cd.py
  28. +4 −3 edustudio/atom_op/mid2cache/KT/__init__.py
  29. +14 −102 edustudio/atom_op/mid2cache/KT/build_seq_inter_feats.py
  30. +3 −1 edustudio/atom_op/mid2cache/KT/cpt_as_exer.py
  31. +112 −0 edustudio/atom_op/mid2cache/KT/data_split4kt.py
  32. +3 −1 edustudio/atom_op/mid2cache/KT/gen_cpt_seq.py
  33. +1 −1 edustudio/atom_op/mid2cache/KT/gen_unfold_cpt_seq.py
  34. +4 −1 edustudio/atom_op/mid2cache/common/__init__.py
  35. +1 −1 edustudio/atom_op/mid2cache/common/build_cpt_relation.py
  36. +0 −10 edustudio/atom_op/mid2cache/common/build_dtinfo.py
  37. +68 −0 edustudio/atom_op/mid2cache/common/build_missing_Q.py
  38. +112 −0 edustudio/atom_op/mid2cache/common/fill_missing_Q.py
  39. +30 −0 edustudio/atom_op/mid2cache/common/filtering_records_by_attr.py
  40. +2 −2 edustudio/atom_op/mid2cache/single/M2C_CL4KT_OP.py
  41. +12 −2 edustudio/atom_op/mid2cache/single/M2C_QDKT_OP.py
  42. +4 −1 edustudio/atom_op/raw2mid/__init__.py
  43. +1 −1 edustudio/atom_op/raw2mid/nips12.py
  44. +91 −0 edustudio/atom_op/raw2mid/slp_english.py
  45. +86 −0 edustudio/atom_op/raw2mid/slp_math.py
  46. +70 −0 edustudio/datatpl/CD/DCDDataTPL.py
  47. +7 −0 edustudio/datatpl/CD/FAIRDataTPL.py
  48. +2 −2 edustudio/datatpl/CD/RCDDataTPL.py
  49. +3 −1 edustudio/datatpl/CD/__init__.py
  50. +1 −1 edustudio/datatpl/KT/CL4KTDataTPL.py
  51. +1 −1 edustudio/datatpl/KT/DIMKTDataTPL.py
  52. +1 −1 edustudio/datatpl/KT/DKTDSCDataTPL.py
  53. +1 −1 edustudio/datatpl/KT/DKTForgetDataTPL.py
  54. +1 −1 edustudio/datatpl/KT/EERNNDataTPL.py
  55. +1 −1 edustudio/datatpl/KT/EKTDataTPL.py
  56. +1 −1 edustudio/datatpl/KT/GKTDataTPL.py
  57. +1 −1 edustudio/datatpl/KT/KTInterCptAsExerDataTPL.py
  58. +1 −1 edustudio/datatpl/KT/KTInterCptUnfoldDataTPL.py
  59. +1 −1 edustudio/datatpl/KT/KTInterDataTPL.py
  60. +1 −1 edustudio/datatpl/KT/KTInterExtendsQDataTPL.py
  61. +1 −1 edustudio/datatpl/KT/LPKTDataTPL.py
  62. +1 −1 edustudio/datatpl/KT/QDKTDataTPL.py
  63. +1 −1 edustudio/datatpl/KT/RKTDataTPL.py
  64. +16 −4 edustudio/datatpl/common/base_datatpl.py
  65. +31 −13 edustudio/datatpl/common/general_datatpl.py
  66. +1 −1 edustudio/datatpl/utils/common.py
  67. +9 −4 edustudio/datatpl/utils/pad_seq_util.py
  68. +1 −1 edustudio/datatpl/utils/spliter_util.py
  69. +4 −2 edustudio/evaltpl/__init__.py
  70. +2 −0 edustudio/evaltpl/base_evaltpl.py
  71. +0 −198 edustudio/evaltpl/cd_evaltpl.py
  72. +84 −0 edustudio/evaltpl/fairness_evaltpl.py
  73. +104 −0 edustudio/evaltpl/identifiability_evaltpl.py
  74. +390 −0 edustudio/evaltpl/interpretability_evaltpl.py
  75. +3 −1 edustudio/evaltpl/{bc_evaltpl.py → prediction_evaltpl.py}
  76. +4 −1 edustudio/model/CD/__init__.py
  77. +491 −0 edustudio/model/CD/dcd.py
  78. +318 −0 edustudio/model/CD/faircd.py
  79. +2 −2 edustudio/model/CD/irt.py
  80. +52 −0 edustudio/model/CD/mf.py
  81. +1 −1 edustudio/model/CD/mirt.py
  82. +149 −33 edustudio/model/KT/ct_ncm.py
  83. +2 −2 edustudio/model/KT/deep_irt.py
  84. +6 −2 edustudio/model/KT/dkt_plus.py
  85. +2 −2 edustudio/model/KT/dkvmn.py
  86. +3 −6 edustudio/model/KT/qikt.py
  87. +4 −3 edustudio/model/KT/sakt.py
  88. +8 −2 edustudio/quickstart/parse_cfg.py
  89. +3 −1 edustudio/settings.py
  90. +4 −1 edustudio/traintpl/__init__.py
  91. +98 −0 edustudio/traintpl/adversarial_traintpl.py
  92. +22 −3 edustudio/traintpl/atkt_traintpl.py
  93. +154 −0 edustudio/traintpl/dcd_traintpl.py
  94. +4 −6 edustudio/traintpl/gd_traintpl.py
  95. +33 −5 edustudio/traintpl/{edu_traintpl.py → general_traintpl.py}
  96. +89 −0 edustudio/traintpl/group_cd_traintpl.py
  97. +1 −6 edustudio/utils/callback/callbacks/history.py
  98. +1 −1 edustudio/utils/common/__init__.py
  99. +39 −4 edustudio/utils/common/configUtil.py
  100. +13 −2 examples/1.run_cd_demo.py
  101. +2 −2 examples/2.run_kt_demo.py
  102. +2 −2 examples/3.run_with_customized_tpl.py
  103. +8 −5 examples/5.run_with_hyperopt.py
  104. +12 −3 examples/6.run_with_ray.tune.py
  105. +2 −2 examples/single_model/run_akt_demo.py
  106. +1 −1 examples/single_model/run_atkt_demo.py
  107. +4 −3 examples/single_model/run_cdgk_demo.py
  108. +2 −2 examples/single_model/run_cdmfkc_demo.py
  109. +2 −2 examples/single_model/run_ckt_demo.py
  110. +2 −2 examples/single_model/run_cl4kt_demo.py
  111. +2 −2 examples/single_model/run_cncd_f_demo.py
  112. +2 −2 examples/single_model/run_cncdq_demo.py
  113. +2 −2 examples/single_model/run_ctncm_demo.py
  114. +54 −0 examples/single_model/run_dcd_demo.py
  115. +2 −2 examples/single_model/run_deepirt_demo.py
  116. +2 −3 examples/single_model/run_dimkt_demo.py
  117. +2 −2 examples/single_model/run_dina_demo.py
  118. +2 −2 examples/single_model/run_dkt_demo.py
  119. +2 −2 examples/single_model/run_dkt_dsc_demo.py
  120. +2 −2 examples/single_model/run_dkt_plus_demo.py
  121. +2 −2 examples/single_model/run_dktforget_demo.py
  122. +2 −2 examples/single_model/run_dkvmn_demo.py
  123. +2 −2 examples/single_model/run_dtransformer_demo.py
  124. +2 −2 examples/single_model/run_ecd_demo.py
  125. +2 −2 examples/single_model/run_eernn_demo.py
  126. +2 −2 examples/single_model/run_ekt_demo.py
  127. +25 −0 examples/single_model/run_faircd_irt_demo.py
  128. +25 −0 examples/single_model/run_faircd_mirt_demo.py
  129. +25 −0 examples/single_model/run_faircd_ncdm_demo.py
  130. +2 −2 examples/single_model/run_gkt_demo.py
  131. +2 −2 examples/single_model/run_hawkeskt_demo.py
  132. +2 −2 examples/single_model/run_hiercdf_demo.py
  133. +2 −2 examples/single_model/run_iekt_demo.py
  134. +2 −2 examples/single_model/run_irr_demo.py
  135. +2 −2 examples/single_model/run_irt_demo.py
  136. +2 −2 examples/single_model/run_kancd_demo.py
  137. +2 −2 examples/single_model/run_kqn_demo.py
  138. +2 −2 examples/single_model/run_kscd_demo.py
  139. +2 −2 examples/single_model/run_lpkt_demo.py
  140. +2 −2 examples/single_model/run_lpkt_s_demo.py
  141. +24 −0 examples/single_model/run_mf_demo.py
  142. +5 −5 examples/single_model/run_mgcd_demo.py
  143. +2 −2 examples/single_model/run_mirt_demo.py
  144. +3 −3 examples/single_model/run_ncdm_demo.py
  145. +2 −2 examples/single_model/run_qdkt_demo.py
  146. +2 −2 examples/single_model/run_qikt_demo.py
  147. +2 −2 examples/single_model/run_rcd_demo.py
  148. +2 −2 examples/single_model/run_rkt_demo.py
  149. +2 −2 examples/single_model/run_saint_demo.py
  150. +2 −2 examples/single_model/run_saint_plus_demo.py
  151. +2 −2 examples/single_model/run_sakt_demo.py
  152. +2 −2 examples/single_model/run_simplekt_demo.py
  153. +2 −2 examples/single_model/run_skvmn_demo.py
  154. +1 −1 setup.py
  155. +2 −2 tests/test_run.py
2 changes: 1 addition & 1 deletion .github/workflows/python-publish.yml
@@ -23,7 +23,7 @@ jobs:
python -m pip install --upgrade pip
pip install build
pip install pytest
pip install torch==1.12.1 --index-url https://download.pytorch.org/whl/cpu
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install -e . --verbose
pip install -r requirements.txt
- name: Test
48 changes: 36 additions & 12 deletions README.md
@@ -6,24 +6,29 @@
<img src="https://img.shields.io/badge/pytorch-v1.10+-blue">
<img src="https://img.shields.io/badge/License-MIT-blue">
<img src="https://img.shields.io/github/issues/HFUT-LEC/EduStudio.svg">
<a href="https://journal.hep.com.cn/fcs/EN/10.1007/s11704-024-40372-3">
<img src="https://img.shields.io/badge/Paper-EduStudio-blue" alt="Paper EduStudio Badge">
</a>
</p>

EduStudio is a unified and templatized framework for student assessment models, including Cognitive Diagnosis (CD) and Knowledge Tracing (KT), based on PyTorch.
EduStudio is a unified library for student cognitive modeling, including Cognitive Diagnosis (CD) and Knowledge Tracing (KT), based on PyTorch.

## Announcement
## Navigation

- We are working hard to reproduce, for all models, the results presented in their original papers. These results will be published later at https://edustudio.ai/.
- We are organizing more comprehensive resources related to student assessment models to build a complete ecosystem for EduStudio.

## Description
EduStudio first decomposes the general algorithmic workflow into five steps: `configuration reading`, `data processing`, `model implementation`, `training control`, and `result evaluation`. Subsequently, to enhance the `reusability` of each step, we extract the commonalities of each algorithm at each step into individual templates for templatization.
| Resource Name | Description |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| [Eco-Repository](https://github.com/HFUT-LEC/awesome-student-cognitive-modeling) | A repository containing resources about student cognitive modeling: [papers](https://github.com/HFUT-LEC/awesome-student-cognitive-modeling/tree/main/papers), [datasets](https://github.com/HFUT-LEC/awesome-student-cognitive-modeling/tree/main/datasets), [conferences&journals](https://github.com/HFUT-LEC/awesome-student-cognitive-modeling/tree/main/conferences%26journals) |
| [Eco-Leaderboard](https://leaderboard.edustudio.ai) | A leaderboard demonstrating performance of implemented models |
| [EduStudio Documentation](https://edustudio.readthedocs.io/) | The document for EduStudio usage |
| [Reference Table](https://edustudio.readthedocs.io/en/latest/user_guide/reference_table.html) | The reference table demonstrating the corresponding templates of each model |

As illustrated in the Figure below, to better implement a templatized framework, we implement an `inheritance-style` EduStudio that contains a basic architecture and an inherited architecture with different responsibilities. The **basic architecture emphasizes domain-irrelevant content and strives to build templatized protocols**. The **inherited architecture obeys the protocol in the basic architecture and focuses on domain-relevant content**. The inheritance style separates domain-relevant and domain-irrelevant content, greatly simplifying the framework structure and enhancing `readability`.
## Description

The documentation is available [here](https://edustudio.readthedocs.io).
EduStudio first decomposes the general algorithmic workflow into six steps: `configuration reading`, `data preparation`, `model implementation`, `training control`, `model evaluation`, and `log storage`. Subsequently, to enhance the `reusability` and `scalability` of each step, we extract the commonalities of each algorithm at each step into individual templates for templatization.

<p align="center">
<img src="assets/framework.svg" alt="EduStudio Architecture" width="600">
<img src="assets/framework.png" alt="EduStudio Architecture" width="600">
<br>
<b>Figure</b>: Overall Architecture of EduStudio
</p>
@@ -46,7 +51,7 @@ run_edustudio(
dataset='FrcSub',
cfg_file_name=None,
traintpl_cfg_dict={
'cls': 'EduTrainTPL',
'cls': 'GeneralTrainTPL',
},
datatpl_cfg_dict={
'cls': 'CDInterExtendsQDataTPL'
@@ -55,15 +60,34 @@ run_edustudio(
'cls': 'NCDM',
},
evaltpl_cfg_dict={
'clses': ['BinaryClassificationEvalTPL', 'CognitiveDiagnosisEvalTPL'],
'clses': ['PredictionEvalTPL', 'InterpretabilityEvalTPL'],
}
)

```

To find out which templates are used for a model, see the [Reference Table](https://edustudio.readthedocs.io/en/latest/user_guide/reference_table.html).

## Citation
```
@article{wu2025edustudio,
  author = {Le Wu and Xiangzhi Chen and Fei Liu and Junsong Xie and Chenao Xia and Zhengtao Tan and Mi Tian and Jinglong Li and Kun Zhang and Defu Lian and Richang Hong and Meng Wang},
  title = {EduStudio: towards a unified library for student cognitive modeling},
  journal = {Frontiers of Computer Science},
  year = {2025},
  volume = {19},
  number = {8},
  pages = {198342},
  keywords = {open-source library; student cognitive modeling; intelligence education},
  url = {https://journal.hep.com.cn/fcs/EN/abstract/article_47994.shtml},
  doi = {10.1007/s11704-024-40372-3}
}
```


## License

EduStudio uses [MIT License](https://github.com/HFUT-LEC/EduStudio/blob/main/LICENSE).

Binary file added assets/framework.png
1,202 changes: 0 additions & 1,202 deletions assets/framework.svg

This file was deleted.

Binary file added docs/source/assets/dataflow.jpg
Binary file modified docs/source/assets/framework.png
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -9,7 +9,7 @@
project = 'EduStudio'
copyright = '2023, HFUT-LEC'
author = 'HFUT-LEC'
release = 'v1.0.0-beta2.1'
release = 'v1.1.4'

import sphinx_rtd_theme
import os
8 changes: 4 additions & 4 deletions docs/source/developer_guide/customize_evaltpl.md
@@ -23,14 +23,14 @@ The protocols in ``BaseEvalTPL`` are listed as follows.
EvalTPLs
----------------------

EduStudio provides ``BinaryClassificationEvalTPL`` and ``CognitiveDiagnosisEvalTPL``, which inherit ``BaseEvalTPL``.
EduStudio provides ``PredictionEvalTPL`` and ``InterpretabilityEvalTPL``, which inherit ``BaseEvalTPL``.

### BinaryClassificationEvalTPL
### PredictionEvalTPL
This EvalTPL is for model evaluation using binary classification metrics.
The protocols in ``BinaryClassificationEvalTPL`` are listed as follows.
The protocols in ``PredictionEvalTPL`` are listed as follows.


### CognitiveDiagnosisEvalTPL
### InterpretabilityEvalTPL
This EvalTPL evaluates model interpretability. It uses the states of students and the Q matrix for ``eval``, which are domain-specific in student assessment.

## Develop a New EvalTPL in EduStudio
14 changes: 7 additions & 7 deletions docs/source/developer_guide/customize_traintpl.md
@@ -8,19 +8,19 @@ The TrainTPL Protocol is detailed in ``BaseTrainTPL``. The function to start the

## TrainTPLs

By inheriting the TrainTPL Protocol, EduStudio provides the classes ``EduStudio.edustudio.traintpl.traintpl.gd_traintpl.GDTrainTPL`` (``GDTrainTPL``) and ``EduStudio.edustudio.traintpl.edu_traintpl.EduTrainTPL`` (``EduTrainTPL``), which are suitable for most gradient descent optimization-based models and most student evaluation models. ``GDTrainTPL`` inherits ``BaseTrainTPL`` and rewrites ``start()``. The function that gets the optimizer according to the parameter ``default_cfg.optim`` is ``GDTrainTPL._get_optim()``. The function that obtains the loaders of the train, val, and test datasets is ``GDTrainTPL.build_loaders()``. ``EduTrainTPL`` inherits ``GDTrainTPL`` and rewrites ``start()``. In ``EduTrainTPL.start()``, the function applied to each dataloader is ``EduTrainTPL.fit()``.
By inheriting the TrainTPL Protocol, EduStudio provides the classes ``EduStudio.edustudio.traintpl.traintpl.gd_traintpl.GDTrainTPL`` (``GDTrainTPL``) and ``EduStudio.edustudio.traintpl.edu_traintpl.GeneralTrainTPL`` (``GeneralTrainTPL``), which are suitable for most gradient descent optimization-based models and most student evaluation models. ``GDTrainTPL`` inherits ``BaseTrainTPL`` and rewrites ``start()``. The function that gets the optimizer according to the parameter ``default_cfg.optim`` is ``GDTrainTPL._get_optim()``. The function that obtains the loaders of the train, val, and test datasets is ``GDTrainTPL.build_loaders()``. ``GeneralTrainTPL`` inherits ``GDTrainTPL`` and rewrites ``start()``. In ``GeneralTrainTPL.start()``, the function applied to each dataloader is ``GeneralTrainTPL.fit()``.

## Develop a New TrainTPL in EduStudio

If the developed model needs a more complex training method, one can inherit ``BaseTrainTPL`` and revise the function ``start()``. One can also define the configuration of the new training template in the dictionary ``default_cfg``. Similarly, one can inherit ``GDTrainTPL`` and ``EduTrainTPL`` and revise the ``start`` function and the ``default_cfg`` dictionary.
If the developed model needs a more complex training method, one can inherit ``BaseTrainTPL`` and revise the function ``start()``. One can also define the configuration of the new training template in the dictionary ``default_cfg``. Similarly, one can inherit ``GDTrainTPL`` and ``GeneralTrainTPL`` and revise the ``start`` function and the ``default_cfg`` dictionary.
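The inheritance pattern itself can be sketched in a few lines. This is a self-contained sketch: the stub ``GDTrainTPL`` below only stands in for the real class in ``edustudio.traintpl``, and the ``'finished'`` return value is invented for illustration.

```python
# Stub standing in for edustudio's GDTrainTPL, to keep the sketch self-contained.
class GDTrainTPL:
    default_cfg = {'optim': 'adam'}

    def start(self):
        # the real start() runs the training loop; here it just reports completion
        return 'finished'


class NewTrainTPL(GDTrainTPL):
    # extra options for the new template go into the inheritable default_cfg
    default_cfg = {'epoch_to_change': 10}

    def start(self):
        # customize the training entry point, then delegate to the parent
        self.epoch_to_change = self.default_cfg['epoch_to_change']
        return super().start()


tpl = NewTrainTPL()
print(tpl.start())  # → finished
```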

Example
-------------------------
If you need to modify a TrainTPL in a student assessment model so that only ``main_loss`` is used after a certain epoch, you just need to inherit ``EduTrainTPL`` and set the ``epoch_to_change`` parameter in ``default_cfg``.
If you need to modify a TrainTPL in a student assessment model so that only ``main_loss`` is used after a certain epoch, you just need to inherit ``GeneralTrainTPL`` and set the ``epoch_to_change`` parameter in ``default_cfg``.

```python
from .edu_traintpl import EduTrainTPL
class NewTrainTPL(EduTrainTPL):
from .edu_traintpl import GeneralTrainTPL
class NewTrainTPL(GeneralTrainTPL):
default_cfg = {
'epoch_to_change': 10,
}
@@ -59,8 +59,8 @@ def fit(self, train_loader, val_loader):
The complete code of example is detailed as follows.

```python
from .edu_traintpl import EduTrainTPL
class NewTrainTPL(EduTrainTPL):
from .edu_traintpl import GeneralTrainTPL
class NewTrainTPL(GeneralTrainTPL):
default_cfg = {
'epoch_to_change': 10,
}
8 changes: 2 additions & 6 deletions docs/source/features/atomic_files.md
@@ -1,12 +1,8 @@
# Atomic File Protocol
# Middle Data Format Protocol

In `EduStudio`, we adopt a flexible CSV (Comma-Separated Values) file format following [Recbole](https://recbole.io/atomic_files.html). The flexible CSV format is defined in the `middata` stage of a dataset (see the dataset stage protocol for details).

The atomic file protocol includes two parts: `Columns name Format` and `Filename Format`.

**Note**: The atomic files protocol is defined in `Inherited Architecture`. In fact, users can abandon the atomic files protocol by inheriting the data template protocol class in `Basic Architecture`(i.e. `BaseDataTPL`).


The Middle Data Format Protocol includes two parts: `Columns name Format` and `Filename Format`.

## Columns Name Format
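The body of this section is truncated in this diff view; as an illustrative sketch, the convention joins a feature name and a feature type with a colon in each CSV header cell (RecBole-style). The column names below are assumptions, not the protocol's mandated fields.

```python
import csv
import io

# Hypothetical middata CSV: each header cell is "<feature_name>:<feature_type>"
middata_csv = """stu_id:token,exer_id:token,label:float
0,3,1.0
1,7,0.0
"""

rows = list(csv.DictReader(io.StringIO(middata_csv)))
# split each header cell into a (name, type) pair
schema = dict(col.split(':') for col in rows[0])
print(schema)  # → {'stu_id': 'token', 'exer_id': 'token', 'label': 'float'}
```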

37 changes: 18 additions & 19 deletions docs/source/features/atomic_operations.md
@@ -1,31 +1,30 @@
# Atomic Operations
# Atomic Data Operation Protocol

In `Edustudio`, we view the dataset from three stages: `rawdata`, `middata`, `cachedata`.

we treat the whole data processing as multiple atomic operations called atomic operation sequence.
We treat the whole data processing as multiple atomic operations called atomic operation sequence.
The first atomic operation, inheriting the protocol class `BaseRaw2Mid`, is the process from raw data to middle data.
The following atomic operations, inheriting the protocol class `BaseMid2Cache`, construct the process from middle data to cache data.

The atomic operation protocol can be seen at `Atomic Operation Protocol`.

## Partial Atomic Operation Table


## Atomic Operation Table

In the following, we give a table to display existing atomic operations.
In the following, we give a table to display some existing atomic operations. For a more detailed table, please see `user_guide/Atomic Data Operation List`.

### Raw2Mid

| name | description |
For the conversion from rawdata to middata, we implement a specific atomic data operation prefixed with `R2M` for each dataset.

| name | Corresponding dataset |
| --------------- | ------------------------------------------------------------ |
| R2M_ASSIST_0910 | The atomic operation that process the Assistment_0910 dataset from rawdata into midata |
| R2M_FrcSub | The atomic operation that process the FrcSub dataset from rawdata into midata |
| R2M_ASSIST_1213 | The atomic operation that process the Assistment_1213 dataset from rawdata into midata |
| R2M_Math1 | The atomic operation that process the Math1dataset from rawdata into midata |
| R2M_Math2 | The atomic operation that process the Math2 dataset from rawdata into midata |
| R2M_AAAI_2023 | The atomic operation that process the AAAI 2023 challenge dataset from rawdata into midata |
| R2M_Algebra_0506 | The atomic operation that process the Algebra 2005-2006 dataset from rawdata into midata |
| R2M_ASSIST_1516 | The atomic operation that process the Assistment 2015-2016 dataset from rawdata into midata |
| R2M_ASSIST_0910 | ASSISTment 2009-2010 |
| R2M_FrcSub | Frcsub |
| R2M_ASSIST_1213 | ASSISTment 2012-2013 |
| R2M_Math1 | Math1 |
| R2M_Math2 | Math2 |
| R2M_AAAI_2023 | AAAI 2023 Global Knowledge Tracing Challenge |
| R2M_Algebra_0506 | Algebra 2005-2006 |
| R2M_ASSIST_1516 | ASSISTment 2015-2016 |

### Mid2Cache

@@ -50,7 +49,7 @@ In the following, we give a table to display existing atomic operations.
| name | description |
| ---------------------- | ------------------------------------------- |
| M2C_BuildSeqInterFeats | Build Sequential Features and Split dataset |
| M2C_CptAsExer | Treat knowledge concept as exercise |
| M2C_GenCptSeq | Generate knowledge concept seq |
| M2C_GenUnFoldCptSeq | Unfold knowledge concepts |
| M2C_KCAsExer | Treat knowledge concept as exercise |
| M2C_GenKCSeq | Generate knowledge concept seq |
| M2C_GenUnFoldKCSeq | Unfold knowledge concepts |
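To make the shape of these operations concrete, here is a minimal self-contained sketch of a Mid2Cache-style transform chained in an atomic operation sequence. The class name, the `process` signature, and the record layout are hypothetical; real operations inherit `BaseMid2Cache`.

```python
class M2C_Label2IntSketch:
    """Hypothetical Mid2Cache-style op: binarize a fractional label field."""

    def process(self, records):
        # records: list of dicts in the middata column-name format
        return [{**r, 'label:float': int(r['label:float'] >= 0.5)} for r in records]


middata = [
    {'stu_id:token': 0, 'exer_id:token': 3, 'label:float': 0.25},
    {'stu_id:token': 1, 'exer_id:token': 7, 'label:float': 1.0},
]

# an atomic operation sequence: each op consumes the previous op's output
ops = [M2C_Label2IntSketch()]
for op in ops:
    middata = op.process(middata)

print([r['label:float'] for r in middata])  # → [0, 1]
```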

34 changes: 19 additions & 15 deletions docs/source/features/dataset_folder_protocol.md
@@ -1,6 +1,9 @@
# Dataset Stage Protocol
# Dataset Status Protocol

In `Edustudio`, we view the dataset as three stages: `rawdata`, `middata`, `cachedata`.
In `Edustudio`, we view the dataset as three statuses: `rawdata`, `middata`, `cachedata`.
- inconsistent rawdata: the original data format provided by the dataset publisher.
- standardized middata: the standardized middle data format (see the Middle Data Format Protocol) defined by EduStudio.
- model-friendly cachedata: the data format that is convenient for model usage.


## Dataset Folder Format Example
@@ -51,18 +54,18 @@ run_edustudio(
dataset='FrcSub',
cfg_file_name=None,
traintpl_cfg_dict={
'cls': 'EduTrainTPL',
'cls': 'GeneralTrainTPL',
},
datatpl_cfg_dict={
'cls': 'CDInterExtendsQDataTPL',
'load_data_from": "rawdata", # specify the loading stage of the dataset
'load_data_from': "rawdata", # specify the loading stage of the dataset
'raw2mid_op': 'R2M_FrcSub' # specify the R2M atomic operation
},
modeltpl_cfg_dict={
'cls': 'KaNCD',
},
evaltpl_cfg_dict={
'clses': ['BinaryClassificationEvalTPL', 'CognitiveDiagnosisEvalTPL'],
'clses': ['PredictionEvalTPL', 'InterpretabilityEvalTPL'],
}
)
```
@@ -78,19 +81,19 @@ run_edustudio(
dataset='FrcSub',
cfg_file_name=None,
traintpl_cfg_dict={
'cls': 'EduTrainTPL',
'cls': 'GeneralTrainTPL',
},
datatpl_cfg_dict={
'cls': 'CDInterExtendsQDataTPL',
'load_data_from": "middata", # specify the loading stage of the dataset
'is_save_cache': True # whether to save cache data
'load_data_from': "middata", # specify the loading stage of the dataset
'is_save_cache': True, # whether to save cache data
'cache_id': 'cache_default', # cache id, valid when is_save_cache=True
},
modeltpl_cfg_dict={
'cls': 'KaNCD',
},
evaltpl_cfg_dict={
'clses': ['BinaryClassificationEvalTPL', 'CognitiveDiagnosisEvalTPL'],
'clses': ['PredictionEvalTPL', 'InterpretabilityEvalTPL'],
}
)
```
@@ -107,18 +110,19 @@ run_edustudio(
dataset='FrcSub',
cfg_file_name=None,
traintpl_cfg_dict={
'cls': 'EduTrainTPL',
'cls': 'GeneralTrainTPL',
},
datatpl_cfg_dict={
'cls': 'CDInterExtendsQDataTPL',
'load_data_from": "cachedata", # specify the loading stage of the dataset
'load_data_from': "cachedata", # specify the loading stage of the dataset
'is_save_cache': False,
'cache_id': 'cache_default', # cache id, valid when is_save_cache=True
},
modeltpl_cfg_dict={
'cls': 'KaNCD',
},
evaltpl_cfg_dict={
'clses': ['BinaryClassificationEvalTPL', 'CognitiveDiagnosisEvalTPL'],
'clses': ['PredictionEvalTPL', 'InterpretabilityEvalTPL'],
}
)
```
@@ -141,11 +145,11 @@ run_edustudio(
dataset='FrcSub',
cfg_file_name=None,
traintpl_cfg_dict={
'cls': 'EduTrainTPL',
'cls': 'GeneralTrainTPL',
},
datatpl_cfg_dict={
'cls': 'CDInterExtendsQDataTPL',
'load_data_from": "rawdata", # specify the loading stage of the dataset
'load_data_from': "rawdata", # specify the loading stage of the dataset
'raw2mid_op': 'R2M_FrcSub',
# the 'mid2cache_op_seq' option specify the atomic operation sequence
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_FilterRecords4CD', 'M2C_ReMapId', 'M2C_RandomDataSplit4CD', 'M2C_GenQMat'],
@@ -154,7 +158,7 @@ run_edustudio(
'cls': 'KaNCD',
},
evaltpl_cfg_dict={
'clses': ['BinaryClassificationEvalTPL', 'CognitiveDiagnosisEvalTPL'],
'clses': ['PredictionEvalTPL', 'InterpretabilityEvalTPL'],
}
)
```
12 changes: 5 additions & 7 deletions docs/source/features/global_cfg_obj.md
Original file line number Diff line number Diff line change
@@ -16,14 +16,12 @@ The description of five config objects is illustrated in Table below.



## Four Entry Points of Configuration
## Four Configuration Portals

There are four entry points of configuration:
There are four configuration portals:

- default_cfg: inheritable class varible
- config file
- parameter dict
- default_cfg: inheritable Python class variable
- configuration file
- parameter dictionary
- command line
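The merge of the four portals can be sketched as follows. This is a minimal illustration only: the priority order shown (command line highest, default_cfg lowest) is an assumption, and the keys are invented, not actual EduStudio parameters.

```python
# Hypothetical sketch of merging the four configuration portals.
# Assumption: later portals override earlier ones; the real EduStudio
# merge logic may differ.
default_cfg = {'batch_size': 256, 'device': 'cpu', 'epoch_num': 100}
file_cfg = {'batch_size': 512}       # from a YAML configuration file
dict_cfg = {'device': 'cuda:0'}      # parameter dictionary passed in code
cli_cfg = {'epoch_num': 20}          # parsed from command-line arguments

# dict unpacking keeps the last occurrence of each key
final_cfg = {**default_cfg, **file_cfg, **dict_cfg, **cli_cfg}
print(final_cfg)  # {'batch_size': 512, 'device': 'cuda:0', 'epoch_num': 20}
```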



4 changes: 2 additions & 2 deletions docs/source/features/inheritable_config.md
@@ -1,12 +1,12 @@
# Inheritable Configuration
# Inheritable Default Configuration

The management of default configuration in Edustudio is implemented by a class variable, i.e., a dictionary object called default_config.

Templates usually introduce new features through inheritance, and these new features may require corresponding configurations, so the default configuration we provide is inheritable.

## Example

The inheritance example of data template is illustrated as follows:
The inheritance example of the data template is illustrated as follows. We present an example from the data preparation procedure. There are three data template classes (DataTPLs) that inherit from each other: BaseDataTPL, GeneralDataTPL, and EduDataTPL. If users specify EduDataTPL as the current DataTPL, the eventual default\_config of the data preparation procedure is a merger of the default\_cfg of the three templates. When a configuration conflict is encountered, the default\_config of the subclass template takes precedence over that of its parent class templates. As a result, the other configuration portals (i.e., configuration file, parameter dictionary, and command line) can only specify parameters that are confined within the default configuration. The advantage of this inheritable design is that it helps users locate the numerous hyperparameters.
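The merge described above can be sketched as follows. The class names come from the documentation, but the helper function and the concrete configuration keys are hypothetical, chosen only to show subclass precedence.

```python
# Hypothetical sketch of an MRO-based default_cfg merge; the keys shown
# are illustrative, only the class names appear in the documentation.
class BaseDataTPL:
    default_cfg = {'seed': 2023}

class GeneralDataTPL(BaseDataTPL):
    default_cfg = {'seed': 2024, 'load_data_from': 'middata'}  # overrides seed

class EduDataTPL(GeneralDataTPL):
    default_cfg = {'n_folds': 1}

def merged_default_cfg(cls):
    cfg = {}
    # walk the MRO from base class to subclass, so subclass values win
    for klass in reversed(cls.__mro__):
        cfg.update(getattr(klass, 'default_cfg', {}))
    return cfg

print(merged_default_cfg(EduDataTPL))
# {'seed': 2024, 'load_data_from': 'middata', 'n_folds': 1}
```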

```python
class BaseDataTPL(Dataset):
15 changes: 15 additions & 0 deletions docs/source/features/standard_datamodule.md
@@ -0,0 +1,15 @@
# Standardized Data Module

For the data module, we provide a standardized design with three protocols (see the following sections for details):
- Data Status Protocol
- Middle Data Format Protocol
- Atomic Operation Protocol

![](../assets/dataflow.jpg)

The first step of Data Templates is to load the raw data from the hard disk. Then, a series of processing steps are performed to obtain model-friendly data objects. Finally, these data objects are passed on to other modules.
We simplify the data preparation into three stages:

- Data loading: Loading necessary data from the hard disk.
- Data processing: Converting the raw data into model-friendly data objects through a range of data processing operations.
- Data delivery: Delivering model-friendly data objects to the training, model, and evaluation templates.
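The three stages above can be sketched as a tiny pipeline. All function names and the data layout here are invented for illustration and are not actual EduStudio APIs.

```python
# Hypothetical sketch of the three-stage data preparation flow.
import csv
import io

def load_raw(text):
    # Data loading: read raw interaction records (here, from a CSV string)
    return list(csv.DictReader(io.StringIO(text)))

def process(rows):
    # Data processing: convert raw rows into model-friendly tuples
    return [(r['stu_id'], r['exer_id'], int(r['label'])) for r in rows]

def deliver(samples):
    # Data delivery: hand the data objects over to other templates
    return {'train': samples}

raw = "stu_id,exer_id,label\ns0,e1,1\ns0,e2,0\n"
data = deliver(process(load_raw(raw)))
print(data)  # {'train': [('s0', 'e1', 1), ('s0', 'e2', 0)]}
```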
4 changes: 2 additions & 2 deletions docs/source/get_started/quick_start.md
@@ -13,7 +13,7 @@ run_edustudio(
dataset='FrcSub',
cfg_file_name=None,
traintpl_cfg_dict={
'cls': 'EduTrainTPL',
'cls': 'GeneralTrainTPL',
},
datatpl_cfg_dict={
'cls': 'CDInterExtendsQDataTPL'
@@ -22,7 +22,7 @@ run_edustudio(
'cls': 'KaNCD',
},
evaltpl_cfg_dict={
'clses': ['BinaryClassificationEvalTPL', 'CognitiveDiagnosisEvalTPL'],
'clses': ['PredictionEvalTPL', 'InterpretabilityEvalTPL'],
}
)
```
23 changes: 14 additions & 9 deletions docs/source/index.rst
@@ -1,22 +1,25 @@
.. EduStudio documentation master file.
.. title:: EduStudio v1.0.0-beta2.1
.. title:: EduStudio v1.1.4
.. image:: assets/logo.png

=========================================================

`HomePage <https://edustudio.ai/>`_ | `Docs <https://edustudio.ai/docs/>`_ | `GitHub <https://github.com/HFUT-LEC/EduStudio>`_
`HomePage <https://edustudio.ai/>`_ | `Docs <https://edustudio.ai/docs/>`_ | `GitHub <https://github.com/HFUT-LEC/EduStudio>`_ | `Paper <https://journal.hep.com.cn/fcs/EN/10.1007/s11704-024-40372-3>`_

Introduction
-------------------------
EduStudio is a Unified and Templatized Framework for Student Assessment Models including Cognitive Diagnosis(CD) and Knowledge Tracing(KT) based on Pytorch.
EduStudio is a unified library for student assessment models, including Cognitive Diagnosis (CD) and Knowledge Tracing (KT), based on PyTorch.

EduStudio first decomposes the general algorithmic workflow into five steps: ``configuration reading``, ``data processing``, ``model implementation``, ``training control``, and ``result evaluation``. Subsequently, to enhance the ``reusability`` of each step, we extract the commonalities of each algorithm at each step into individual templates for templatization.
EduStudio first decomposes the general algorithmic workflow into six steps: `configuration reading`, `data preparation`, `model implementation`, `training control`, `model evaluation`, and `log storage`. Subsequently, to enhance the `reusability` and `scalability` of each step, we extract the commonalities of each algorithm at each step into individual templates for templatization.

As illustrated in the Figure below, to better implement a templatized framework, we implement an ``inheritance-style`` EduStudio that contains basic architecture and inherited architecture with different responsibilities.
- Configuration Reading (Step 1) aims to collect, categorize and deliver configurations from different configuration portals.
- Data Preparation (Step 2) aims to convert raw data from the hard disk into model-friendly data objects.
- Model Implementation (Step 3) refers to the process of implementing the structure of each model and facilitating the reuse of model components.
- Training Control (Step 4) focuses primarily on the training methods of various models.
- Model Evaluation (Step 5) primarily focuses on the implementation of various evaluation metrics.
- Log Storage (Step 6) aims to implement the storage specification when storing generated data.

The **basic architecture emphasizes domain-irrelevant content and strives to build templatized protocols**.
The **inherited architecture obeys the protocol in the basic architecture and focuses on domain-relevant content**.
The inheritance style separates domain-relevant and domain-irrelevant content, greatly simplifying the framework structure and enhancing ``readability``.
The modularization establishes clear boundaries between various programs in the algorithm pipeline, facilitating the introduction of new content to individual modules and enhancing scalability.

The overall structure is illustrated as follows:

@@ -42,14 +45,16 @@ The overall structure is illustrated as follows:

features/global_cfg_obj
features/inheritable_config
features/atomic_files
features/standard_datamodule
features/dataset_folder_protocol
features/atomic_files
features/atomic_operations

.. toctree::
:maxdepth: 1
:caption: User Guide

user_guide/atom_op
user_guide/datasets
user_guide/models
user_guide/reference_table
21 changes: 21 additions & 0 deletions docs/source/user_guide/atom_op.md
@@ -0,0 +1,21 @@
# M2C Atomic Data Operation List


| M2C Atomic operation | M2C Atomic Type | Description |
| :------------------------: | --------------- | ------------------------------------------------------------ |
| M2C_Label2Int | Data Cleaning | Binarization for answering response |
| M2C_FilterRecords4CD | Data Cleaning | Filter students or exercises according to specific conditions |
| M2C_FilteringRecordsByAttr | Data Cleaning | Filter students without attribute values, commonly used by fairness models |
| M2C_ReMapId | Data Conversion | ReMap Column ID |
| M2C_BuildMissingQ | Data Conversion | Build Missing Q-matrix |
| M2C_BuildSeqInterFeats | Data Conversion | Build sample format for Question-based KT |
| M2C_CKCAsExer | Data Conversion | Build sample format for KC-based KT |
| M2C_MergeDividedSplits | Data Conversion | Merge train/valid/test set into one dataframe |
| M2C_RandomDataSplit4CD | Data Partition | Data partitioning for Cognitive Diagnosis |
| M2C_RandomDataSplit4KT | Data Partition | Data partitioning for Knowledge Tracing |
| M2C_GenKCSeq | Data Generation | Generate Knowledge Component Sequence |
| M2C_GenQMat | Data Generation | Generate Q-matrix (i.e., exercise-KC relation) |
| M2C_BuildKCRelation | Data Generation | Build Knowledge Component Relation Graph |
| M2C_GenUnFoldKCSeq | Data Generation | Generate Unfolded Knowledge Component Sequence |
| M2C_FillMissingQ | Data Generation | Fill Missing Q-matrix |
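Each M2C operation above follows a common shape: a `default_cfg` class attribute plus a `process` method over keyword data objects (visible in the `BaseMid2Cache` fragments later in this diff). A hypothetical operation in that shape might look like the following; the class name, config key, and data layout are all invented for illustration.

```python
# Hypothetical M2C atomic operation sketch; M2C_DropShortSeqs and its
# 'min_len' option are invented and do not exist in EduStudio.
class M2C_DropShortSeqs:
    default_cfg = {'min_len': 3}

    def __init__(self, m2c_cfg):
        self.m2c_cfg = m2c_cfg

    def process(self, **kwargs):
        # keep only student response sequences long enough to be informative
        seqs = kwargs['seqs']
        kept = {s: v for s, v in seqs.items()
                if len(v) >= self.m2c_cfg['min_len']}
        kwargs['seqs'] = kept
        return kwargs

op = M2C_DropShortSeqs({'min_len': 3})
out = op.process(seqs={'s0': [1, 0, 1], 's1': [1]})
print(sorted(out['seqs']))  # ['s0']
```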

45 changes: 22 additions & 23 deletions docs/source/user_guide/datasets.md
@@ -1,30 +1,29 @@
# Dataset List

We collect the commonly used datasets and list them here. The meaning of the fields in the table below is as follows:
- Exercise Text: contains textual information of exercises or not
- Concept Relation: contains relations among knowledge concepts or not (tree or prerequisite)
- Time: contains the time at which students start answering questions or not
We showcase here the datasets preprocessed by EduStudio (i.e., those for which a raw2mid atomic data operation is provided). The meaning of the fields in the table below is as follows:

- Auto Download: whether EduStudio supports automatically downloading the `middata` of the dataset
- R2M Script: name of the script that processes the rawdata into middata in EduStudio



| Dataset Name | Exercise Text | Concept Relation | Time | Auto Download | R2M Script Name | Note |
| :----------------------------------------------------------- | :-----------: | :--------------: | :--: | :-----------: | :----------------------- | :----------------------------------------------------------- |
| [FrcSub](http://staff.ustc.edu.cn/~qiliuql/data/math2015.rar) | ✖️ | ✖️ | ✖️ | ✔️ | R2M_FrcSub | |
| [Math1](http://staff.ustc.edu.cn/~qiliuql/data/math2015.rar) | ✖️ | ✖️ | ✖️ | ✔️ | R2M_Math1 | |
| [Math2](http://staff.ustc.edu.cn/~qiliuql/data/math2015.rar) | ✖️ | ✖️ | ✖️ | ✔️ | R2M_Math2 | |
| [AAAI_2023](https://docs.google.com/forms/d/e/1FAIpQLScWjxiXdSMAKBtlPJZm9MsudUG9CQS16lT0GVfajpVj-mWReA/viewform?pli=1) | ✔️ | ✔️(tree) | ✔️ | ✔️ | R2M_AAAI_2023 | [AAAI2023 Global Knowledge Tracing Challenge](https://ai4ed.cc/competitions/aaai2023competition) |
| [ASSISTment_2009-2010](https://drive.google.com/file/d/0B2X0QD6q79ZJUFU1cjYtdGhVNjg/view?resourcekey=0-OyI8ZWxtGSAzhodUIcMf_g) | ✖️ | ✖️ | ✔️ | ✔️ | R2M_ASSIST_0910 | |
| [ASSISTment_2012-2013](https://sites.google.com/site/assistmentsdata/datasets/2012-13-school-data-with-affect) | ✖️ | ✖️ | ✔️ | ✖️ | R2M_ASSIST_1213 | |
| [ASSISTment_2015-2016](https://sites.google.com/site/assistmentsdata/datasets/2015-assistments-skill-builder-data) | ✖️ | ✖️ | ✔️ | ✖️ | R2M_ASSIST_1516 | |
| [ASSISTment_2017](https://sites.google.com/view/assistmentsdatamining/dataset) | ✖️ | ✖️ | ✔️ | ✖️ | R2M_ASSIST_17 | |
| [Algebera_2005-2006](https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp) | ✖️ | ✖️ | ✔️ | ✖️ | R2M_Algebera_0506 | [KDD Cup 2010](https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp) |
| [Algebera_2006-2007](https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp) | ✖️ | ✖️ | ✔️ | ✖️ | R2M_Algebera_0607 | [KDD Cup 2010](https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp) |
| [Bridge2Algebra_2006-2007](https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp) | ✖️ | ✖️ | ✔️ | ✖️ | R2M_Bridge2Algebra_0607 | [KDD Cup 2010](https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp) |
| [Junyi_AreaTopicAsCpt](https://pslcdatashop.web.cmu.edu/Project?id=244) | ✖️ | ✔️(tree) | ✔️ | ✖️ | R2M_Junyi_AreaTopicAsCpt | Area&Topic field as concept |
| [Junyi_ExerAsCpt](https://pslcdatashop.web.cmu.edu/Project?id=244) | ✖️ | ✔️(prerequisite) | ✔️ | ✖️ | R2M_Junyi_ExerAsCpt | Exercise as concept |
| EdNet_KT1 | ✖️ | ✖️ | ✔️ | ✖️ | R2M_EdNet_KT1 | [download1](http://bit.ly/ednet-content), [download2](http://bit.ly/ednet-content) |
| [Eedi_2020_Task1&2](https://dqanonymousdata.blob.core.windows.net/neurips-public/data.zip) | ✖️ | ✔️(tree) | ✔️ | ✖️ | R2M_Eedi_20_T12 | [NeurIPS 2020 Education Challenge: Task1&2](https://eedi.com/projects/neurips-education-challenge) |
| [Eedi_2020_Task3&4](https://dqanonymousdata.blob.core.windows.net/neurips-public/data.zip) | ✔️(images) | ✔️(tree) | ✔️ | ✖️ | R2M_Eedi_20_T34 | [NeurIPS 2020 Education Challenge: Task3&4](https://eedi.com/projects/neurips-education-challenge) |

| Dataset Name | R2M Script Name | Auto Download | Note |
| :----------------------------------------------------------- | :----------------------- | ------------- | :----------------------------------------------------------: |
| [FrcSub](http://staff.ustc.edu.cn/~qiliuql/data/math2015.rar) | R2M_FrcSub | ✔️ | |
| [Math1](http://staff.ustc.edu.cn/~qiliuql/data/math2015.rar) | R2M_Math1 | ✔️ | |
| [Math2](http://staff.ustc.edu.cn/~qiliuql/data/math2015.rar) | R2M_Math2 | ✔️ | |
| [AAAI_2023](https://docs.google.com/forms/d/e/1FAIpQLScWjxiXdSMAKBtlPJZm9MsudUG9CQS16lT0GVfajpVj-mWReA/viewform?pli=1) | R2M_AAAI_2023 | ✔️ | [AAAI2023 Global Knowledge Tracing Challenge](https://ai4ed.cc/competitions/aaai2023competition) |
| [ASSISTment_2009-2010](https://drive.google.com/file/d/0B2X0QD6q79ZJUFU1cjYtdGhVNjg/view?resourcekey=0-OyI8ZWxtGSAzhodUIcMf_g) | R2M_ASSIST_0910 | ✔️ | |
| [ASSISTment_2012-2013](https://sites.google.com/site/assistmentsdata/datasets/2012-13-school-data-with-affect) | R2M_ASSIST_1213 | ✖️ | |
| [ASSISTment_2015-2016](https://sites.google.com/site/assistmentsdata/datasets/2015-assistments-skill-builder-data) | R2M_ASSIST_1516 | ✖️ | |
| [ASSISTment_2017](https://sites.google.com/view/assistmentsdatamining/dataset) | R2M_ASSIST_17 | ✖️ | |
| [Algebera_2005-2006](https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp) | R2M_Algebera_0506 | ✖️ | [KDD Cup 2010](https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp) |
| [Algebera_2006-2007](https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp) | R2M_Algebera_0607 | ✖️ | [KDD Cup 2010](https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp) |
| [Bridge2Algebra_2006-2007](https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp) | R2M_Bridge2Algebra_0607 | ✖️ | [KDD Cup 2010](https://pslcdatashop.web.cmu.edu/KDDCup/rules_data_format.jsp) |
| [Junyi_AreaTopicAsCpt](https://pslcdatashop.web.cmu.edu/Project?id=244) | R2M_Junyi_AreaTopicAsCpt | ✖️ | Area&Topic field as concept |
| [Junyi_ExerAsCpt](https://pslcdatashop.web.cmu.edu/Project?id=244) | R2M_Junyi_ExerAsCpt | ✖️ | Exercise as concept |
| EdNet_KT1 | R2M_EdNet_KT1 | ✖️ | [download1](http://bit.ly/ednet-content), [download2](http://bit.ly/ednet-content) |
| [Eedi_2020_Task1&2](https://dqanonymousdata.blob.core.windows.net/neurips-public/data.zip) | R2M_Eedi_20_T12 | ✖️ | [NeurIPS 2020 Education Challenge: Task1&2](https://eedi.com/projects/neurips-education-challenge) |
| [Eedi_2020_Task3&4](https://dqanonymousdata.blob.core.windows.net/neurips-public/data.zip) | R2M_Eedi_20_T34 | ✖️ | [NeurIPS 2020 Education Challenge: Task3&4](https://eedi.com/projects/neurips-education-challenge) |
| [SLP-English](https://aic-fe.bnu.edu.cn/en/data/index.html) | R2M_SLP_English | ✔️ | [[paper](https://aic-fe.bnu.edu.cn/fj/2021-ICCE-SLP.pdf)\], Smart Learning Partner |
| [SLP-Math](https://aic-fe.bnu.edu.cn/en/data/index.html) | R2M_SLP_Math | ✔️ | [[paper](https://aic-fe.bnu.edu.cn/fj/2021-ICCE-SLP.pdf)\], Smart Learning Partner |
132 changes: 54 additions & 78 deletions docs/source/user_guide/models.md

Large diffs are not rendered by default.

91 changes: 47 additions & 44 deletions docs/source/user_guide/reference_table.md
@@ -4,52 +4,55 @@

| Model | DataTPL | TrainTPL | EvalTPL |
| :------ | ---------------------: | :-------------: | ------------------------------------------------------ |
| IRT | CDInterDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| MIRT | CDInterDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| NCDM | CDInterExtendsQDataTPL | EduTrainTPL | BinaryClassificationEvalTPL、CognitiveDiagnosisEvalTPL |
| CNCD_Q | CNCDQDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| CNCD_F | CNCDFDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| DINA | CDInterExtendsQDataTPL | EduTrainTPL | BinaryClassificationEvalTPL、CognitiveDiagnosisEvalTPL |
| HierCDF | HierCDFDataTPL | EduTrainTPL | BinaryClassificationEvalTPL、CognitiveDiagnosisEvalTPL |
| CDGK | CDGKDataTPL | EduTrainTPL | BinaryClassificationEvalTPL、CognitiveDiagnosisEvalTPL |
| CDMFKC | CDInterExtendsQDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| ECD | ECDDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| IRR | IRRDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| KaNCD | CDInterExtendsQDataTPL | EduTrainTPL | BinaryClassificationEvalTPL、CognitiveDiagnosisEvalTPL |
| KSCD | CDInterExtendsQDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| MGCD | MGCDDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| RCD | RCDDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| IRT | CDInterDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| MIRT | CDInterDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| MF | CDInterDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| NCDM | CDInterExtendsQDataTPL | GeneralTrainTPL | PredictionEvalTPL, InterpretabilityEvalTPL, IdentifiabilityEvalTPL |
| CNCD_Q | CNCDQDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| CNCD_F | CNCDFDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| DINA | CDInterExtendsQDataTPL | GeneralTrainTPL | PredictionEvalTPL, InterpretabilityEvalTPL, IdentifiabilityEvalTPL |
| HierCDF | HierCDFDataTPL | GeneralTrainTPL | PredictionEvalTPL, InterpretabilityEvalTPL, IdentifiabilityEvalTPL |
| CDGK | CDGKDataTPL | GeneralTrainTPL | PredictionEvalTPL, InterpretabilityEvalTPL, IdentifiabilityEvalTPL |
| CDMFKC | CDInterExtendsQDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| ECD | ECDDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| IRR | IRRDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| KaNCD | CDInterExtendsQDataTPL | GeneralTrainTPL | PredictionEvalTPL, InterpretabilityEvalTPL, IdentifiabilityEvalTPL |
| KSCD | CDInterExtendsQDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| MGCD | MGCDDataTPL | GroupCDTrainTPL | PredictionEvalTPL |
| RCD | RCDDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| DCD | DCDDataTPL | DCDTrainTPL | PredictionEvalTPL, InterpretabilityEvalTPL, IdentifiabilityEvalTPL |
| FairCD | FAIRDataTPL | AdversarialTrainTPL | PredictionEvalTPL, FairnessEvalTPL |

## KT models

| Model | DataTPL | TrainTPL | EvalTPL |
| :----------- | ----------------------: | :-------------: | --------------------------- |
| AKT | KTInterDataTPLCptUnfold | EduTrainTPL | BinaryClassificationEvalTPL |
| ATKT | KTInterDataTPLCptUnfold | AtktTrainTPL | BinaryClassificationEvalTPL |
| CKT | KTInterExtendsQDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| CL4KT | CL4KTDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| CT_NCM | KTInterCptUnfoldDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| DeepIRT | KTInterExtendsQDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| DIMKT | DIMKTDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| DKT | KTInterDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| DKTDSC | DKTDSCDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| DKTForget | DKTForgetDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| DKT_plus | KTInterDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| DKVMN | KTInterExtendsQDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| DTransformer | KTInterCptUnfoldDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| EERNN | EERNNDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| EKT | EKTDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| GKT | GKTDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| HawkesKT | KTInterCptUnfoldDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| IEKT | KTInterExtendsQDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| KQN | KTInterDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| LPKT | LPKTDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| LPKT_S | LPKTDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| QDKT | QDKTDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| QIKT | KTInterExtendsQDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| RKT | RKTDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| SAINT | KTInterCptUnfoldDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| SAINT_plus | KTInterCptUnfoldDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| SAKT | KTInterDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| SimpleKT | KTInterCptUnfoldDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| SKVMN | KTInterDataTPL | EduTrainTPL | BinaryClassificationEvalTPL |
| AKT | KTInterCptUnfoldDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| ATKT | KTInterCptUnfoldDataTPL | AtktTrainTPL | PredictionEvalTPL |
| CKT | KTInterExtendsQDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| CL4KT | CL4KTDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| CT_NCM | KTInterCptUnfoldDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| DeepIRT | KTInterExtendsQDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| DIMKT | DIMKTDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| DKT | KTInterDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| DKTDSC | DKTDSCDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| DKTForget | DKTForgetDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| DKT_plus | KTInterDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| DKVMN | KTInterExtendsQDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| DTransformer | KTInterCptUnfoldDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| EERNN | EERNNDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| EKT | EKTDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| GKT | GKTDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| HawkesKT | KTInterCptUnfoldDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| IEKT | KTInterExtendsQDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| KQN | KTInterDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| LPKT | LPKTDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| LPKT_S | LPKTDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| QDKT | QDKTDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| QIKT | KTInterExtendsQDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| RKT | RKTDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| SAINT | KTInterCptUnfoldDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| SAINT_plus | KTInterCptUnfoldDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| SAKT | KTInterDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| SimpleKT | KTInterCptUnfoldDataTPL | GeneralTrainTPL | PredictionEvalTPL |
| SKVMN | KTInterDataTPL | GeneralTrainTPL | PredictionEvalTPL |
36 changes: 25 additions & 11 deletions docs/source/user_guide/usage/aht.md
@@ -15,11 +15,14 @@ Here we list two demos for `Ray.Tune` and `HyperOpt`.
## Ray.Tune

```python
# run following after installed edustudio

from edustudio.quickstart import run_edustudio
from ray import tune
import ray
ray.init(num_cpus=4, num_gpus=1)

from edustudio.utils.common import IDUtil as idUtil
import uuid

def deliver_cfg(args):
g_args = {
@@ -33,6 +36,9 @@ def deliver_cfg(args):
g, k = k.split(".")
assert g in g_args
g_args[g][k] = v
g_args['frame_cfg'] = {
'ID': idUtil.get_random_id_bytime() + str(uuid.uuid4()).split("-")[-1]
}
return g_args


@@ -54,20 +60,21 @@ def objective_function(args):


search_space= {
'traintpl_cfg.cls': tune.grid_search(['EduTrainTPL']),
'traintpl_cfg.cls': tune.grid_search(['GeneralTrainTPL']),
'datatpl_cfg.cls': tune.grid_search(['CDInterExtendsQDataTPL']),
'modeltpl_cfg.cls': tune.grid_search(['KaNCD']),
'evaltpl_cfg.clses': tune.grid_search([['BinaryClassificationEvalTPL', 'CognitiveDiagnosisEvalTPL']]),
'evaltpl_cfg.clses': tune.grid_search([['PredictionEvalTPL', 'InterpretabilityEvalTPL']]),


'traintpl_cfg.batch_size': tune.grid_search([256,]),
'traintpl_cfg.epoch_num': tune.grid_search([2]),
'traintpl_cfg.device': tune.grid_search(["cpu"]),
'modeltpl_cfg.emb_dim': tune.grid_search([20,40])
'traintpl_cfg.device': tune.grid_search(["cuda:0"]),
'modeltpl_cfg.emb_dim': tune.grid_search([20,40]),
'frame_cfg.DISABLE_LOG_STDOUT': tune.grid_search([False]),
}

tuner = tune.Tuner(
objective_function, param_space=search_space, tune_config=tune.TuneConfig(max_concurrent_trials=1)
tune.with_resources(objective_function, {"gpu": 1}), param_space=search_space, tune_config=tune.TuneConfig(max_concurrent_trials=1),
)
results = tuner.fit()

@@ -78,23 +85,29 @@ print(results.get_best_result(metric="auc", mode="max").config)
## HyperOpt

```python
import sys
import os

from edustudio.quickstart import run_edustudio
from hyperopt import hp
from hyperopt import fmin, tpe, space_eval

from edustudio.utils.common import IDUtil as idUtil
import uuid

def deliver_cfg(args):
g_args = {
'traintpl_cfg': {},
'datatpl_cfg': {},
'modeltpl_cfg': {},
'evaltpl_cfg': {},
'frame_cfg': {},
}
for k,v in args.items():
g, k = k.split(".")
assert g in g_args
g_args[g][k] = v
g_args['frame_cfg'] = {
'ID': idUtil.get_random_id_bytime() + str(uuid.uuid4()).split("-")[-1]
}
return g_args


@@ -109,16 +122,16 @@ def objective_function(args):
modeltpl_cfg_dict=g_args['modeltpl_cfg'],
evaltpl_cfg_dict=g_args['evaltpl_cfg'],
frame_cfg_dict=g_args['frame_cfg'],
return_cfg_and_result=True
return_cfg_and_result=True,
)
return res['auc']


space = {
'traintpl_cfg.cls': hp.choice('traintpl_cfg.cls', ['EduTrainTPL']),
'traintpl_cfg.cls': hp.choice('traintpl_cfg.cls', ['GeneralTrainTPL']),
'datatpl_cfg.cls': hp.choice('datapl_cfg.cls', ['CDInterExtendsQDataTPL']),
'modeltpl_cfg.cls': hp.choice('modeltpl_cfg.cls', ['KaNCD']),
'evaltpl_cfg.clses': hp.choice('evaltpl_cfg.clses', [['BinaryClassificationEvalTPL', 'CognitiveDiagnosisEvalTPL']]),
'evaltpl_cfg.clses': hp.choice('evaltpl_cfg.clses', [['PredictionEvalTPL', 'InterpretabilityEvalTPL']]),


'traintpl_cfg.batch_size': hp.choice('traintpl_cfg.batch_size', [256,]),
@@ -131,4 +144,5 @@ best = fmin(objective_function, space, algo=tpe.suggest, max_evals=10, verbose=F
print("=="*10)
print(best)
print(space_eval(space, best))

```
8 changes: 4 additions & 4 deletions docs/source/user_guide/usage/run_edustudio.md
@@ -11,7 +11,7 @@ run_edustudio(
dataset='FrcSub',
cfg_file_name=None,
traintpl_cfg_dict={
'cls': 'EduTrainTPL',
'cls': 'GeneralTrainTPL',
},
datatpl_cfg_dict={
'cls': 'CDInterExtendsQDataTPL'
@@ -20,7 +20,7 @@ run_edustudio(
'cls': 'KaNCD',
},
evaltpl_cfg_dict={
'clses': ['BinaryClassificationEvalTPL', 'CognitiveDiagnosisEvalTPL'],
'clses': ['PredictionEvalTPL', 'InterpretabilityEvalTPL'],
}
)
```
@@ -48,14 +48,14 @@ datatpl_cfg:
cls: CDInterDataTPL

traintpl_cfg:
cls: EduTrainTPL
cls: GeneralTrainTPL
batch_size: 512

modeltpl_cfg:
cls: NCDM

evaltpl_cfg:
clses: [BinaryClassificationEvalTPL, CognitiveDiagnosisEvalTPL]
clses: [PredictionEvalTPL, InterpretabilityEvalTPL]
```
then, run command:
14 changes: 7 additions & 7 deletions docs/source/user_guide/usage/use_case_of_config.md
@@ -22,7 +22,7 @@ run_edustudio(
dataset='FrcSub',
cfg_file_name=None,
traintpl_cfg_dict={
'cls': 'EduTrainTPL',
'cls': 'GeneralTrainTPL',
},
datatpl_cfg_dict={
'cls': 'CDInterExtendsQDataTPL',
@@ -34,15 +34,15 @@ run_edustudio(
'cls': 'KaNCD',
},
evaltpl_cfg_dict={
'clses': ['BinaryClassificationEvalTPL', 'CognitiveDiagnosisEvalTPL'],
'clses': ['PredictionEvalTPL', 'InterpretabilityEvalTPL'],
}
)
```

## Q2: How to specify the config of the evaluation template
The default_cfg of `BinaryClassificationEvalTPL` is as follows:
The default_cfg of `PredictionEvalTPL` is as follows:
```python
class BinaryClassificationEvalTPL(BaseEvalTPL):
class PredictionEvalTPL(BaseEvalTPL):
default_cfg = {
'use_metrics': ['auc', 'acc', 'rmse']
}
@@ -58,7 +58,7 @@ run_edustudio(
dataset='FrcSub',
cfg_file_name=None,
traintpl_cfg_dict={
'cls': 'EduTrainTPL',
'cls': 'GeneralTrainTPL',
},
datatpl_cfg_dict={
'cls': 'CDInterExtendsQDataTPL',
@@ -70,8 +70,8 @@ run_edustudio(
'cls': 'KaNCD',
},
evaltpl_cfg_dict={
'clses': ['BinaryClassificationEvalTPL', 'CognitiveDiagnosisEvalTPL'],
'CognitiveDiagnosisEvalTPL': {
'clses': ['PredictionEvalTPL', 'InterpretabilityEvalTPL'],
'InterpretabilityEvalTPL': {
'use_metrics': {"auc"} # look here
}
}
2 changes: 1 addition & 1 deletion edustudio/__init__.py
@@ -2,4 +2,4 @@
from __future__ import print_function
from __future__ import division

__version__ = 'v1.0.0-beta2.1'
__version__ = 'v1.1.4'
15 changes: 9 additions & 6 deletions edustudio/assets/datasets.yaml
@@ -1,12 +1,15 @@
# 1. all datasets are stored in https://huggingface.co/datasets/lmcRS/edustudio-datasets
# 2. some datasets may not be listed here but can still be downloaded, as edustudio will look them up from an external yaml file: https://huggingface.co/datasets/lmcRS/edustudio-datasets/raw/main/datasets.yaml

ASSIST_0910:
middata_url: https://gitlab.com/hfut-lec/edudatafiles/-/raw/main/ASSIST_0910/ASSIST_0910-middata.zip
middata_url: https://huggingface.co/datasets/lmcRS/edustudio-datasets/resolve/main/ASSIST_0910/ASSIST_0910-middata.zip
FrcSub:
middata_url: https://gitlab.com/hfut-lec/edudatafiles/-/raw/main/FrcSub/FrcSub-middata.zip
middata_url: https://huggingface.co/datasets/lmcRS/edustudio-datasets/resolve/main/FrcSub/FrcSub-middata.zip
Math1:
middata_url: https://gitlab.com/hfut-lec/edudatafiles/-/raw/main/Math1/Math1-middata.zip
middata_url: https://huggingface.co/datasets/lmcRS/edustudio-datasets/resolve/main/Math1/Math1-middata.zip
Math2:
middata_url: https://gitlab.com/hfut-lec/edudatafiles/-/raw/main/Math2/Math2-middata.zip
middata_url: https://huggingface.co/datasets/lmcRS/edustudio-datasets/resolve/main/Math2/Math2-middata.zip
AAAI_2023:
middata_url: https://gitlab.com/hfut-lec/edudatafiles/-/raw/main/AAAI_2023/AAAI_2023-middata.zip
middata_url: https://huggingface.co/datasets/lmcRS/edustudio-datasets/resolve/main/AAAI_2023/AAAI_2023-middata.zip
PISA_2015_ECD:
middata_url: https://gitlab.com/hfut-lec/edudatafiles/-/raw/main/PISA_2015_ECD/PISA_2015_ECD-middata.zip
middata_url: https://huggingface.co/datasets/lmcRS/edustudio-datasets/resolve/main/PISA_2015_ECD/PISA_2015_ECD-middata.zip
2 changes: 0 additions & 2 deletions edustudio/atom_op/mid2cache/CD/data_split4cd.py
@@ -104,5 +104,3 @@ def set_dt_info(self, dt_info, **kwargs):
dt_info['cpt_count'] = max(dt_info.get('cpt_count', -1), df[col].max() + 1)
else:
dt_info['cpt_count'] = max(dt_info.get('cpt_count', -1), np.max(list(chain(*df[col].to_list()))) + 1)

a = 1
7 changes: 4 additions & 3 deletions edustudio/atom_op/mid2cache/KT/__init__.py
@@ -1,4 +1,5 @@
from .build_seq_inter_feats import M2C_BuildSeqInterFeats
from .cpt_as_exer import M2C_CptAsExer
from .gen_cpt_seq import M2C_GenCptSeq
from .gen_unfold_cpt_seq import M2C_GenUnFoldCptSeq
from .cpt_as_exer import M2C_KCAsExer
from .gen_cpt_seq import M2C_GenKCSeq
from .gen_unfold_cpt_seq import M2C_GenUnFoldKCSeq
from .data_split4kt import M2C_RandomDataSplit4KT
116 changes: 14 additions & 102 deletions edustudio/atom_op/mid2cache/KT/build_seq_inter_feats.py
@@ -7,13 +7,10 @@

class M2C_BuildSeqInterFeats(BaseMid2Cache):
default_cfg = {
'seed': 2023,
'divide_by': 'stu',
'window_size': 100,
"divide_scale_list": [7,1,2],
"extra_inter_feats": []
}

def __init__(self, m2c_cfg, n_folds, is_dataset_divided) -> None:
super().__init__(m2c_cfg)
self.n_folds = n_folds
@@ -25,11 +22,7 @@ def from_cfg(cls, cfg):
n_folds = cfg.datatpl_cfg.n_folds
is_dataset_divided = cfg.datatpl_cfg.is_dataset_divided
return cls(m2c_cfg, n_folds, is_dataset_divided)

def _check_params(self):
super()._check_params()
assert self.m2c_cfg['divide_by'] in {'stu', 'time'}


def process(self, **kwargs):
df = kwargs['df']
df_train, df_valid, df_test = kwargs['df_train'], kwargs['df_valid'], kwargs['df_test']
@@ -40,96 +33,36 @@ def process(self, **kwargs):

if not self.is_dataset_divided:
assert df_train is None and df_valid is None and df_test is None
if self.m2c_cfg['divide_by'] == 'stu':
if self.n_folds == 1:
train_dict, valid_dict, test_dict = self._divide_data_df_by_stu_one_fold(df)
kwargs['df_train_folds'] = [train_dict]
kwargs['df_valid_folds'] = [valid_dict]
kwargs['df_test_folds'] = [test_dict]
else:
kwargs['df_train_folds'], kwargs['df_valid_folds'], kwargs['df_test_folds'] = self._divide_data_df_by_stu_multi_fold(df)
elif self.m2c_cfg['divide_by'] == 'time':
raise NotImplementedError
else:
raise ValueError(f"unknown divide_by: {self.m2c_cfg['divide_by']}")
self.window_size = self.m2c_cfg['window_size']
if self.m2c_cfg['window_size'] is None or self.m2c_cfg['window_size'] <= 0:
self.window_size = df[['stu_id:token', 'exer_id:token']].groupby('stu_id:token').agg('count')['exer_id:token'].max()
self.logger.info(f"actual window size: {self.window_size}")
kwargs['df_seq'] = self.construct_df2dict(df)

else: # dataset is divided
assert df_train is not None and df_test is not None
if self.m2c_cfg['window_size'] is None or self.m2c_cfg['window_size'] <= 0:
self.window_size = np.max([
df_train[['stu_id:token', 'exer_id:token']].groupby('stu_id:token').agg('count')['exer_id:token'].max(),
df_valid[['stu_id:token', 'exer_id:token']].groupby('stu_id:token').agg('count')['exer_id:token'].max() if df_valid is not None else 0,
df_valid[['stu_id:token', 'exer_id:token']].groupby('stu_id:token').agg('count')['exer_id:token'].max()
df_test[['stu_id:token', 'exer_id:token']].groupby('stu_id:token').agg('count')['exer_id:token'].max()
])
self.logger.info(f"actual window size: {self.window_size}")
else:
self.window_size = self.m2c_cfg['window_size']
self.logger.info(f"actual window size: {self.window_size}")

train_dict = self.construct_df2dict(df_train)
valid_dict = self.construct_df2dict(df_valid)
test_dict = self.construct_df2dict(df_test)
kwargs['df_train_folds'] = [train_dict]
kwargs['df_valid_folds'] = [valid_dict]
kwargs['df_test_folds'] = [test_dict]
kwargs['df_train_seq'] = train_dict
kwargs['df_valid_seq'] = valid_dict
kwargs['df_test_seq'] = test_dict
return kwargs

@staticmethod
def sort_records(df, col='order_id:token'):
if df is not None:
return df.sort_values(by=col, ascending=True).reset_index(drop=True)

def _divide_data_df_by_stu_one_fold(self, df: pd.DataFrame):
train_stu_id, val_stu_id, test_stu_id = SpliterUtil.divide_data_df_one_fold(
df['stu_id:token'].drop_duplicates(), seed=self.m2c_cfg['seed'], shuffle=True,
divide_scale_list=self.m2c_cfg['divide_scale_list']
)
train_df = df[df['stu_id:token'].isin(train_stu_id)]
val_df = df[df['stu_id:token'].isin(val_stu_id)] if val_stu_id is not None else None
test_df = df[df['stu_id:token'].isin(test_stu_id)]

if self.m2c_cfg['window_size'] <= 0 or self.m2c_cfg['window_size'] is None:
self.window_size = np.max([
train_df[['stu_id:token', 'exer_id:token']].groupby('stu_id:token').agg('count')['exer_id:token'].max(),
val_df[['stu_id:token', 'exer_id:token']].groupby('stu_id:token').agg('count')['exer_id:token'].max() if val_df is not None else 0,
test_df[['stu_id:token', 'exer_id:token']].groupby('stu_id:token').agg('count')['exer_id:token'].max()
])
self.logger.info(f"actual window size: {self.window_size}")
else:
self.window_size = self.m2c_cfg['window_size']

train_dict = self.construct_df2dict(train_df)
val_dict = self.construct_df2dict(val_df)
test_dict = self.construct_df2dict(test_df)
return train_dict, val_dict, test_dict

def _divide_data_df_by_stu_multi_fold(self, df: pd.DataFrame):
res = SpliterUtil.divide_data_df_one_fold(
df['stu_id:token'].drop_duplicates(), seed=self.m2c_cfg['seed'], shuffle=True,
divide_scale_list=self.m2c_cfg['divide_scale_list']
)

train_list, valid_list, test_list = [], [], []
for train_stu_id, val_stu_id, test_stu_id in zip(res):
train_df = df[df['stu_id:token'].isin(train_stu_id)]
val_df = df[df['stu_id:token'].isin(val_stu_id)] if val_stu_id is not None else None
test_df = df[df['stu_id:token'].isin(test_stu_id)]

if self.m2c_cfg['window_size'] <= 0 or self.m2c_cfg['window_size'] is None:
self.window_size = np.max([
train_df[['stu_id:token', 'exer_id:token']].groupby('stu_id:token').agg('count')['exer_id:token'].max(),
val_df[['stu_id:token', 'exer_id:token']].groupby('stu_id:token').agg('count')['exer_id:token'].max() if val_df is not None else 0,
test_df[['stu_id:token', 'exer_id:token']].groupby('stu_id:token').agg('count')['exer_id:token'].max()
])
self.logger.info(f"actual window size: {self.window_size}")
else:
self.window_size = self.m2c_cfg['window_size']

train_dict = self.construct_df2dict(train_df)
valid_dict = self.construct_df2dict(val_df)
test_dict = self.construct_df2dict(test_df)
train_list.append(train_dict)
valid_list.append(valid_dict)
test_list.append(test_dict)

return train_list, valid_list, test_list

def construct_df2dict(self, df: pd.DataFrame):
if df is None: return None
@@ -170,24 +103,3 @@ def construct_df2dict(self, df: pd.DataFrame):
raise NotImplementedError

return ret_dict

def set_dt_info(self, dt_info, **kwargs):
dt_info['real_window_size'] = self.window_size
if not self.is_dataset_divided:
if 'stu_id:token' in kwargs['df'].columns:
dt_info['stu_count'] = int(kwargs['df']['stu_id:token'].max() + 1)
if 'exer_id:token' in kwargs['df'].columns:
dt_info['exer_count'] = int(kwargs['df']['exer_id:token'].max() + 1)
else:
stu_count = max(kwargs['df_train']['stu_id:token'].max() + 1, kwargs['df_test']['stu_id:token'].max() + 1)
stu_count = max(kwargs['df_valid']['stu_id:token'].max() + 1, stu_count) if 'df_valid' in kwargs else stu_count

exer_count = max(kwargs['df_train']['exer_id:token'].max() + 1, kwargs['df_test']['exer_id:token'].max() + 1)
exer_count = max(kwargs['df_valid']['exer_id:token'].max() + 1, exer_count) if 'df_valid' in kwargs else exer_count

dt_info['stu_count'] = stu_count
dt_info['exer_count'] = exer_count

if kwargs.get('df_exer', None) is not None:
if 'cpt_seq:token_seq' in kwargs['df_exer']:
dt_info['cpt_count'] = len(set(list(chain(*kwargs['df_exer']['cpt_seq:token_seq'].to_list()))))
4 changes: 3 additions & 1 deletion edustudio/atom_op/mid2cache/KT/cpt_as_exer.py
@@ -3,7 +3,9 @@
from itertools import chain


class M2C_CptAsExer(BaseMid2Cache):
class M2C_KCAsExer(BaseMid2Cache):
"""Knowledge Concept As Exercise
"""
default_cfg = {}

def process(self, **kwargs):
112 changes: 112 additions & 0 deletions edustudio/atom_op/mid2cache/KT/data_split4kt.py
@@ -0,0 +1,112 @@
from ..common.base_mid2cache import BaseMid2Cache
import pandas as pd
import numpy as np
from edustudio.datatpl.utils import SpliterUtil, PadSeqUtil
from itertools import chain


class M2C_RandomDataSplit4KT(BaseMid2Cache):
default_cfg = {
'seed': 2023,
'divide_by': 'stu',
"divide_scale_list": [7,1,2],
}

def __init__(self, m2c_cfg, n_folds, is_dataset_divided) -> None:
super().__init__(m2c_cfg)
self.n_folds = n_folds
self.is_dataset_divided = is_dataset_divided

@classmethod
def from_cfg(cls, cfg):
m2c_cfg = cfg.datatpl_cfg.get(cls.__name__)
n_folds = cfg.datatpl_cfg.n_folds
is_dataset_divided = cfg.datatpl_cfg.is_dataset_divided
return cls(m2c_cfg, n_folds, is_dataset_divided)

def _check_params(self):
super()._check_params()
assert self.m2c_cfg['divide_by'] in {'stu', 'time'}

def process(self, **kwargs):
df_seq = kwargs['df_seq']
df_train_seq = kwargs.get('df_train_seq', None)
df_valid_seq = kwargs.get('df_valid_seq', None)
df_test_seq = kwargs.get('df_test_seq', None)

if not self.is_dataset_divided:
assert df_train_seq is None and df_valid_seq is None and df_test_seq is None
self.window_size = df_seq['exer_seq:token_seq'].shape[1]
if self.m2c_cfg['divide_by'] == 'stu':
if self.n_folds == 1:
train_dict, valid_dict, test_dict = self._divide_data_df_by_stu_one_fold(df_seq)
kwargs['df_train_folds'] = [train_dict]
kwargs['df_valid_folds'] = [valid_dict]
kwargs['df_test_folds'] = [test_dict]
else:
kwargs['df_train_folds'], kwargs['df_valid_folds'], kwargs['df_test_folds'] = self._divide_data_df_by_stu_multi_fold(df_seq)
elif self.m2c_cfg['divide_by'] == 'time':
raise NotImplementedError
else:
raise ValueError(f"unknown divide_by: {self.m2c_cfg['divide_by']}")
else:
assert df_train_seq is not None and df_test_seq is not None
self.window_size = df_train_seq['exer_seq:token_seq'].shape[1]
kwargs['df_train_folds'] = [df_train_seq]
kwargs['df_valid_folds'] = [df_valid_seq]
kwargs['df_test_folds'] = [df_test_seq]
return kwargs

def _dict_index_flag(self, df_seq:dict, flag: np.array):
return {
k: df_seq[k][flag] for k in df_seq
}

def _divide_data_df_by_stu_one_fold(self, df_seq: dict):
train_stu_id, valid_stu_id, test_stu_id = SpliterUtil.divide_data_df_one_fold(
pd.DataFrame({"stu_id:token": np.unique(df_seq['stu_id:token'])}), seed=self.m2c_cfg['seed'], shuffle=True,
divide_scale_list=self.m2c_cfg['divide_scale_list']
)

df_train_seq = self._dict_index_flag(df_seq, np.isin(df_seq['stu_id:token'], train_stu_id.to_numpy().flatten()))
df_test_seq = self._dict_index_flag(df_seq, np.isin(df_seq['stu_id:token'], test_stu_id.to_numpy().flatten()))
df_valid_seq = None
if valid_stu_id is not None:
df_valid_seq = self._dict_index_flag(df_seq, np.isin(df_seq['stu_id:token'], valid_stu_id.to_numpy().flatten()))

return df_train_seq, df_valid_seq, df_test_seq

def _divide_data_df_by_stu_multi_fold(self, df_seq: pd.DataFrame):
res = SpliterUtil.divide_data_df_multi_folds(
pd.DataFrame({"stu_id:token": np.unique(df_seq['stu_id:token'])}), seed=self.m2c_cfg['seed'], shuffle=True, n_folds=self.n_folds
)

train_list, test_list = [], []
for (train_stu_id, test_stu_id) in zip(*res):
df_train_seq = self._dict_index_flag(df_seq, np.isin(df_seq['stu_id:token'], train_stu_id.to_numpy().flatten()))
df_test_seq = self._dict_index_flag(df_seq, np.isin(df_seq['stu_id:token'], test_stu_id.to_numpy().flatten()))
train_list.append(df_train_seq)
test_list.append(df_test_seq)

return train_list, [], test_list

def set_dt_info(self, dt_info, **kwargs):
dt_info['real_window_size'] = self.window_size
if not self.is_dataset_divided:
if 'stu_id:token' in kwargs['df'].columns:
dt_info['stu_count'] = int(kwargs['df']['stu_id:token'].max() + 1)
if 'exer_id:token' in kwargs['df'].columns:
dt_info['exer_count'] = int(kwargs['df']['exer_id:token'].max() + 1)
else:
stu_count = max(kwargs['df_train']['stu_id:token'].max() + 1, kwargs['df_test']['stu_id:token'].max() + 1)
stu_count = max(kwargs['df_valid']['stu_id:token'].max() + 1, stu_count) if 'df_valid' in kwargs else stu_count

exer_count = max(kwargs['df_train']['exer_id:token'].max() + 1, kwargs['df_test']['exer_id:token'].max() + 1)
exer_count = max(kwargs['df_valid']['exer_id:token'].max() + 1, exer_count) if 'df_valid' in kwargs else exer_count

dt_info['stu_count'] = stu_count
dt_info['exer_count'] = exer_count

if kwargs.get('df_exer', None) is not None:
if 'cpt_seq:token_seq' in kwargs['df_exer']:
dt_info['cpt_count'] = len(set(list(chain(*kwargs['df_exer']['cpt_seq:token_seq'].to_list()))))
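The student-level split that `M2C_RandomDataSplit4KT` delegates to `SpliterUtil` amounts to shuffling the unique student ids and cutting them by the `[7, 1, 2]` scale. The helper below is an illustrative sketch under that assumption, not `SpliterUtil`'s real signature:

```python
import numpy as np

def split_students(stu_ids, scale=(7, 1, 2), seed=2023):
    """Shuffle unique student ids and cut them by the given ratio."""
    rng = np.random.default_rng(seed)
    ids = np.unique(np.asarray(stu_ids))
    rng.shuffle(ids)
    total = sum(scale)
    n_train = len(ids) * scale[0] // total
    n_valid = len(ids) * scale[1] // total
    return ids[:n_train], ids[n_train:n_train + n_valid], ids[n_train + n_valid:]

train_ids, valid_ids, test_ids = split_students(np.arange(100))
```

Splitting ids rather than rows keeps every student's full interaction sequence inside a single partition, which is what the `divide_by: 'stu'` option guarantees.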
4 changes: 3 additions & 1 deletion edustudio/atom_op/mid2cache/KT/gen_cpt_seq.py
@@ -3,7 +3,9 @@
from edustudio.datatpl.utils import PadSeqUtil


class M2C_GenCptSeq(BaseMid2Cache):
class M2C_GenKCSeq(BaseMid2Cache):
"""Generate Knowledge Concept Sequence
"""
default_cfg = {
'cpt_seq_window_size': -1,
}
2 changes: 1 addition & 1 deletion edustudio/atom_op/mid2cache/KT/gen_unfold_cpt_seq.py
@@ -4,7 +4,7 @@
import pandas as pd


class M2C_GenUnFoldCptSeq(BaseMid2Cache):
class M2C_GenUnFoldKCSeq(BaseMid2Cache):
default_cfg = {}

def __init__(self, m2c_cfg, n_folds, is_dataset_divided) -> None:
5 changes: 4 additions & 1 deletion edustudio/atom_op/mid2cache/common/__init__.py
@@ -3,4 +3,7 @@
from .label2int import M2C_Label2Int
from .merge_divided_splits import M2C_MergeDividedSplits
from .remapid import M2C_ReMapId
from .build_cpt_relation import M2C_BuildCptRelation
from .build_cpt_relation import M2C_BuildKCRelation
from .build_missing_Q import M2C_BuildMissingQ
from .fill_missing_Q import M2C_FillMissingQ
from .filtering_records_by_attr import M2C_FilteringRecordsByAttr
2 changes: 1 addition & 1 deletion edustudio/atom_op/mid2cache/common/build_cpt_relation.py
@@ -4,7 +4,7 @@
from itertools import chain


class M2C_BuildCptRelation(BaseMid2Cache):
class M2C_BuildKCRelation(BaseMid2Cache):
default_cfg = {
'relation_type': 'rcd_transition',
'threshold': None
10 changes: 0 additions & 10 deletions edustudio/atom_op/mid2cache/common/build_dtinfo.py

This file was deleted.

68 changes: 68 additions & 0 deletions edustudio/atom_op/mid2cache/common/build_missing_Q.py
@@ -0,0 +1,68 @@
from .base_mid2cache import BaseMid2Cache
import numpy as np
import pandas as pd
from itertools import chain
import torch
from edustudio.utils.common import set_same_seeds


class M2C_BuildMissingQ(BaseMid2Cache):
default_cfg = {
'seed': 20230518,
'Q_delete_ratio': 0.0,
}

def process(self, **kwargs):
dt_info = kwargs['dt_info']
self.item_count = dt_info['exer_count']
self.cpt_count = dt_info['cpt_count']
self.df_Q = kwargs['df_exer'][['exer_id:token', 'cpt_seq:token_seq']]

self.missing_df_Q = self.get_missing_df_Q()
self.missing_Q_mat = self.get_Q_mat_from_df_arr(self.missing_df_Q, self.item_count, self.cpt_count)

kwargs['missing_df_Q'] = self.missing_df_Q
kwargs['missing_Q_mat'] = self.missing_Q_mat

return kwargs

def get_missing_df_Q(self):
set_same_seeds(seed=self.m2c_cfg['seed'])
ratio = self.m2c_cfg['Q_delete_ratio']
iid2cptlist = self.df_Q.set_index('exer_id:token')['cpt_seq:token_seq'].to_dict()
iid_lis = np.array(list(chain(*[[i]*len(iid2cptlist[i]) for i in iid2cptlist])))
cpt_lis = np.array(list(chain(*list(iid2cptlist.values()))))
entry_arr = np.vstack([iid_lis, cpt_lis]).T

np.random.shuffle(entry_arr)

# reference: https://stackoverflow.com/questions/64834655/python-how-to-find-first-duplicated-items-in-an-numpy-array
_, idx = np.unique(entry_arr[:, 1], return_index=True) # first pick one exercise per knowledge concept
bool_idx = np.zeros_like(entry_arr[:, 1], dtype=bool)
bool_idx[idx] = True
preserved_exers = np.unique(entry_arr[bool_idx, 0]) # keep the qualifying exercises

delete_num = int(ratio * self.item_count)
preserved_num = self.item_count - delete_num

if len(preserved_exers) >= preserved_num:
self.logger.warning(
f"Cannot satisfy delete requirement: {len(preserved_exers)=}, {preserved_num=}"
)
else:
need_preserved_num = preserved_num - len(preserved_exers)

left_iids = np.arange(self.item_count)
left_iids = left_iids[~np.isin(left_iids, preserved_exers)]
np.random.shuffle(left_iids)
choose_iids = left_iids[0:need_preserved_num]

preserved_exers = np.hstack([preserved_exers, choose_iids])

return self.df_Q.copy()[self.df_Q['exer_id:token'].isin(preserved_exers)].reset_index(drop=True)


def get_Q_mat_from_df_arr(self, df_Q_arr, item_count, cpt_count):
Q_mat = torch.zeros((item_count, cpt_count), dtype=torch.int64)
for _, item in df_Q_arr.iterrows(): Q_mat[item['exer_id:token'], item['cpt_seq:token_seq']] = 1
return Q_mat
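The "preserve one exercise per knowledge concept" step above hinges on `np.unique(..., return_index=True)` returning the index of the first occurrence of each value in the shuffled array. A standalone reproduction of that trick on a small `(exer_id, cpt_id)` table:

```python
import numpy as np

# Shuffled (exer_id, cpt_id) rows, standing in for entry_arr above.
entry_arr = np.array([[3, 0], [1, 1], [4, 0], [2, 1], [5, 2]])

# Index of the first row seen for each concept id.
_, idx = np.unique(entry_arr[:, 1], return_index=True)
bool_idx = np.zeros(len(entry_arr), dtype=bool)
bool_idx[idx] = True

# Exercises covering those first-seen rows are guaranteed to be preserved,
# so every concept keeps at least one exercise in the reduced Q-matrix.
preserved_exers = np.unique(entry_arr[bool_idx, 0])
```

Because `entry_arr` is shuffled beforehand, which exercise represents each concept varies with the seed, while coverage of all concepts is always retained.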
112 changes: 112 additions & 0 deletions edustudio/atom_op/mid2cache/common/fill_missing_Q.py
@@ -0,0 +1,112 @@
from .base_mid2cache import BaseMid2Cache
import numpy as np
import pandas as pd
from itertools import chain
import torch
from edustudio.utils.common import set_same_seeds, tensor2npy
from tqdm import tqdm

class M2C_FillMissingQ(BaseMid2Cache):
default_cfg = {
'Q_fill_type': "None",
'params_topk': 5,
'params_votek': 2,
}

def __init__(self, m2c_cfg, cfg) -> None:
self.logger = cfg.logger
self.m2c_cfg = m2c_cfg
self.cfg = cfg

@classmethod
def from_cfg(cls, cfg):
return cls(cfg.datatpl_cfg.get(cls.__name__), cfg)

def process(self, **kwargs):
dt_info = kwargs['dt_info']
self.user_count = dt_info['stu_count']
self.item_count = dt_info['exer_count']
self.cpt_count = dt_info['cpt_count']
self.df_Q = kwargs['df_exer'][['exer_id:token', 'cpt_seq:token_seq']]

Q_mat = kwargs['Q_mat']
missing_Q_mat = kwargs['missing_Q_mat']

self.filling_Q_mat_list = []
for df_train in kwargs['df_train_folds']:
if (missing_Q_mat.sum(dim=1) == 0).sum() > 0:
if self.m2c_cfg['Q_fill_type'] == "sim_dist_for_by_exer":
fill_df_Q = self.fill_df_Q_by_sim_dist(
df_train, kwargs['missing_df_Q'],
params_topk=self.m2c_cfg['params_topk'],
params_votek=self.m2c_cfg['params_votek']
)
fill_Q_mat = self.get_Q_mat_from_df_arr(fill_df_Q, self.item_count, self.cpt_count)
self.filling_Q_mat_list.append(fill_Q_mat)
elif self.m2c_cfg['Q_fill_type'] == "None":
self.filling_Q_mat_list.append(missing_Q_mat)
else:
raise ValueError(f"unknown Q_fill_type: {self.m2c_cfg['Q_fill_type']}")
else:
self.filling_Q_mat_list.append(Q_mat)

kwargs['filling_Q_mat_list'] = self.filling_Q_mat_list
return kwargs

def get_Q_mat_from_df_arr(self, df_Q_arr, item_count, cpt_count):
Q_mat = np.zeros((item_count, cpt_count), dtype=np.int64)
for _, item in df_Q_arr.iterrows(): Q_mat[item['exer_id:token'], item['cpt_seq:token_seq']] = 1
return Q_mat

def fill_df_Q_by_sim_dist(self, df_interaction, df_Q_left, params_topk=5, params_votek=2):
preserved_exers = df_Q_left['exer_id:token'].to_numpy()
interact_mat = torch.zeros((self.user_count, self.item_count), dtype=torch.int8).to(self.cfg.traintpl_cfg['device'])
idx = df_interaction[df_interaction['label:float'] == 1][['stu_id:token','exer_id:token']].to_numpy()
interact_mat[idx[:,0], idx[:,1]] = 1
idx = df_interaction[df_interaction['label:float'] != 1][['stu_id:token','exer_id:token']].to_numpy()
interact_mat[idx[:,0], idx[:,1]] = -1

interact_mat = interact_mat.T

sim_mat = torch.zeros((self.item_count, self.item_count))
missing_iids = np.array(list(set(np.arange(self.item_count)) - set(preserved_exers)))
for iid in tqdm(missing_iids, desc="[FILL_Q_MAT] compute sim_mat", ncols=self.cfg.frame_cfg['TQDM_NCOLS']):
temp = interact_mat[iid] != 0
same_mat = interact_mat[iid] == interact_mat
bool_mat = (temp) & (interact_mat != 0)
same_mat[~bool_mat] = False
sim_mat[iid] = same_mat.sum(dim=1) / (temp).sum()
sim_mat[iid, bool_mat.sum(dim=1) == 0] = 0.0
sim_mat[iid, iid] = -1.0
sim_mat[iid, missing_iids] = -1.0

assert torch.isnan(sim_mat).sum() == 0

_, topk_mat_idx = torch.topk(sim_mat, dim=1, k=params_topk, largest=True, sorted=True)
topk_mat_idx = tensor2npy(topk_mat_idx)

index_df_Q = df_Q_left.set_index('exer_id:token')
missing_iid_fill_cpts = {}
for iid in tqdm(missing_iids, desc="[FILL_Q_MAT] fill process", ncols=self.cfg.frame_cfg['TQDM_NCOLS']):
count_dict = dict(zip(*np.unique(
list(chain(*[index_df_Q.loc[iid2]['cpt_seq:token_seq'] for iid2 in topk_mat_idx[iid] if iid2 in preserved_exers])),
return_counts=True,
)))
count_dict = sorted(count_dict.items(), key=lambda x: x[1], reverse=True)
missing_iid_fill_cpts[iid] = [i[0] for i in count_dict[0:params_votek]]

missing_fill_df_Q = pd.DataFrame(
{'exer_id:token': list(missing_iid_fill_cpts.keys()),'cpt_seq:token_seq':list(missing_iid_fill_cpts.values())}
)
final_df_Q = pd.concat([df_Q_left, missing_fill_df_Q], axis=0, ignore_index=True)

hit_ratio = 0
t_Q = self.df_Q.set_index('exer_id:token')
for iid in missing_iid_fill_cpts:
if len(set(t_Q.loc[iid]['cpt_seq:token_seq']) & set(missing_iid_fill_cpts[iid])) > 0:
hit_ratio += 1
hit_ratio = hit_ratio / len(missing_iid_fill_cpts)

self.logger.info(f"[FILL_Q] Hit_ratio={hit_ratio}")

return final_df_Q
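The neighbour-voting at the end of `fill_df_Q_by_sim_dist` (collect the concepts of the top-k most similar exercises, keep the `params_votek` most frequent) can be condensed as below. `vote_concepts` is an illustrative helper, not part of the class's API:

```python
import numpy as np
from itertools import chain

def vote_concepts(neighbour_cpt_lists, votek=2):
    """Return the `votek` most frequent concepts among the neighbours' lists."""
    cpts, counts = np.unique(list(chain(*neighbour_cpt_lists)), return_counts=True)
    order = np.argsort(-counts, kind="stable")  # most frequent first
    return [int(c) for c in cpts[order][:votek]]

# Concept lists of 4 hypothetical nearest-neighbour exercises.
filled = vote_concepts([[0, 1], [1, 2], [1, 3], [2]])
```

Ties are broken toward the smaller concept id here (stable sort over ascending `np.unique` output); the class's `sorted(..., reverse=True)` call leaves tie order unspecified.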
30 changes: 30 additions & 0 deletions edustudio/atom_op/mid2cache/common/filtering_records_by_attr.py
@@ -0,0 +1,30 @@
from .base_mid2cache import BaseMid2Cache
import pandas as pd
import numpy as np
from itertools import chain


class M2C_FilteringRecordsByAttr(BaseMid2Cache):
"""Commonly used by fairness-aware models: filters out students that lack the given attribute values
"""
default_cfg = {
'filter_stu_attrs': ['gender:token']
}

def process(self, **kwargs):
df_stu = kwargs['df_stu']
df = kwargs['df']
df_stu = df_stu[df_stu[self.m2c_cfg['filter_stu_attrs']].notna().all(axis=1)].reset_index(drop=True)
df = df[df['stu_id:token'].isin(df_stu['stu_id:token'])].reset_index(drop=True)

kwargs['df'] = df
kwargs['df_stu'] = df_stu

return kwargs
4 changes: 2 additions & 2 deletions edustudio/atom_op/mid2cache/single/M2C_CL4KT_OP.py
@@ -27,8 +27,8 @@ def process(self, **kwargs):
def compute_cpt2difflevel(self, **kwargs):
cpt_correct = defaultdict(int)
cpt_count = defaultdict(int)
for i, (c_list, r_list) in enumerate(zip(kwargs['df_train_folds'][0]['cpt_unfold_seq:token_seq'], kwargs['df_train_folds'][0]['label_seq:float_seq'])):
for c, r in zip(c_list[kwargs['df_train_folds'][0]['mask_seq:token_seq'][i] == 1], r_list[kwargs['df_train_folds'][0]['mask_seq:token_seq'][i] == 1]):
for i, (c_list, r_list) in enumerate(zip(kwargs['df_seq']['cpt_unfold_seq:token_seq'], kwargs['df_seq']['label_seq:float_seq'])):
for c, r in zip(c_list[kwargs['df_seq']['mask_seq:token_seq'][i] == 1], r_list[kwargs['df_seq']['mask_seq:token_seq'][i] == 1]):
cpt_correct[c] += r
cpt_count[c] += 1
cpt_diff = {c: cpt_correct[c] / float(cpt_count[c]) for c in cpt_correct} # cpt difficult
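The per-concept difficulty statistic computed above (the correct-rate of each concept over masked positions) in isolation, with toy arrays standing in for `df_seq`'s padded sequences:

```python
import numpy as np
from collections import defaultdict

# Toy padded sequences: concept ids, response labels, and validity masks.
cpt_seqs = np.array([[0, 1, 1], [1, 2, 0]])
label_seqs = np.array([[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
mask_seqs = np.array([[1, 1, 1], [1, 1, 0]])  # last step of seq 2 is padding

cpt_correct, cpt_count = defaultdict(float), defaultdict(int)
for c_list, r_list, m in zip(cpt_seqs, label_seqs, mask_seqs):
    for c, r in zip(c_list[m == 1], r_list[m == 1]):
        cpt_correct[c] += r
        cpt_count[c] += 1

# Per-concept correct rate; padding positions never contribute.
cpt_diff = {int(c): cpt_correct[c] / float(cpt_count[c]) for c in cpt_correct}
```

Note that the masked-out `(cpt 0, label 0.0)` pair in the second sequence is excluded, which is exactly why the mask filter above matters for padded batches.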
14 changes: 12 additions & 2 deletions edustudio/atom_op/mid2cache/single/M2C_QDKT_OP.py
@@ -1,5 +1,7 @@
import networkx as nx
from ..common import BaseMid2Cache
import torch
from torch.nn import functional as F
import numpy as np


@@ -12,11 +14,19 @@ def process(self, **kwargs):
self.num_q = dt_info['exer_count']
self.num_c = dt_info['cpt_count']
self.Q_mat = kwargs['Q_mat']
graph = self.generate_graph()
laplacian_matrix = self.laplacian_matrix(graph)
laplacian_matrix = self.laplacian_matrix_by_vectorization()
kwargs['laplacian_matrix'] = laplacian_matrix
return kwargs

def laplacian_matrix_by_vectorization(self):
normQ = F.normalize(self.Q_mat.float(), p=2, dim=-1)
A = torch.mm(normQ, normQ.T) > (1 - 1/len(normQ))
A = A.int()  # adjacency matrix
D = A.sum(-1, dtype=torch.int32)  # node degrees
diag_idx = torch.arange(len(A))
A[diag_idx, diag_idx] = D - A[diag_idx, diag_idx]
return A

def generate_graph(self):

graph = nx.Graph()
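As a sanity check on the vectorized rule above (two exercises are adjacent when the cosine similarity of their Q-matrix rows exceeds `1 - 1/num_exercises`), the sketch below reproduces it with NumPy and compares against a naive pairwise loop; the diff's own implementation uses torch, so this is only a dependency-light illustration:

```python
import numpy as np

Q = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 1]], dtype=float)  # toy Q-matrix
thresh = 1 - 1 / len(Q)

# Vectorized: row-normalize, then one matrix product gives all cosine sims.
norms = np.linalg.norm(Q, axis=1, keepdims=True)
normQ = Q / np.where(norms == 0, 1.0, norms)
A_vec = (normQ @ normQ.T > thresh).astype(int)

# Naive pairwise loop for comparison.
A_loop = np.zeros_like(A_vec)
for i in range(len(Q)):
    for j in range(len(Q)):
        denom = np.linalg.norm(Q[i]) * np.linalg.norm(Q[j])
        sim = Q[i] @ Q[j] / denom if denom else 0.0
        A_loop[i, j] = int(sim > thresh)
```

The vectorized form replaces the O(n²) Python loop over `nx.Graph` edges with a single matrix product, which is the point of this PR's optimization.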
5 changes: 4 additions & 1 deletion edustudio/atom_op/raw2mid/__init__.py
@@ -15,7 +15,8 @@
from .nips12 import R2M_Eedi_20_T12
from .nips34 import R2M_Eedi_20_T34
from .simulated5 import R2M_Simulated5

from .slp_english import R2M_SLP_English
from .slp_math import R2M_SLP_Math

# look up api dict
_cli_api_dict_ = {}
@@ -35,3 +36,5 @@
_cli_api_dict_['R2M_Eedi_20_T12'] = R2M_Eedi_20_T12.from_cli
_cli_api_dict_['R2M_Eedi_20_T34'] = R2M_Eedi_20_T34.from_cli
_cli_api_dict_['R2M_Simulated5'] = R2M_Simulated5.from_cli
_cli_api_dict_['R2M_SLP_Math'] = R2M_SLP_Math.from_cli
_cli_api_dict_['R2M_SLP_English'] = R2M_SLP_English.from_cli
2 changes: 1 addition & 1 deletion edustudio/atom_op/raw2mid/nips12.py
@@ -8,7 +8,7 @@


class R2M_Eedi_20_T12(BaseRaw2Mid):
"""R2M_NIPS12 is to preprocess NIPS 2020 challenge Task 1&2 dataset"""
"""R2M_Eedi_20_T12 is to preprocess NIPS 2020 challenge Task 1&2 dataset"""
def process(self):
super().process()
# load and inspect the raw data
91 changes: 91 additions & 0 deletions edustudio/atom_op/raw2mid/slp_english.py
@@ -0,0 +1,91 @@
from edustudio.atom_op.raw2mid import BaseRaw2Mid
import pandas as pd
import numpy as np
import time

"""
SLP Dataset: https://aic-fe.bnu.edu.cn/en/data/index.html
"""


class R2M_SLP_English(BaseRaw2Mid):
"""
rawdata: https://aic-fe.bnu.edu.cn/en/data/index.html
"""
def process(self):
super().process()

# for stu
df_stu = pd.read_csv(f"{self.rawpath}/student.csv")
df_stu.dropna(subset=['school_id'], inplace=True, how='any', axis=0)
df_stu = df_stu[df_stu['school_id'] != 'n.a.']

df_stu = df_stu.merge(
pd.read_csv(f"{self.rawpath}/family.csv", index_col=False),
on=['student_id'], how='inner'
)

df_stu = df_stu.merge(
pd.read_csv(f"{self.rawpath}/school.csv"),
on=['school_id'], how='inner'
)

df_stu.drop([
'rate_of_higher_educated_teachers',
"rate_of_teachers_with_master's_degree_and_above"
], inplace=True, axis=1)
df_stu.rename(columns={
'student_id': 'stu_id:token', 'gender': 'gender:token',
'school_id': 'sch_id:token', 'class_id': 'class_id:token',
'age_father': 'age_father:float', 'age_mother': 'age_mother:token',
'edubg_father': 'edubg_father:token', 'edubg_mother':'edubg_mother:token',
'affiliation_father':'affiliation_father:token',
'affiliation_mother': 'affiliation_mother:token',
'family_income': 'family_income:token', 'is_only_child':'is_only_child:token',
'live_on_campus': 'live_on_campus:token',
'gathering_frequency_father':'gathering_frequency_father:token',
'gathering_frequency_mother':'gathering_frequency_mother:token',
'family_traveling_times': "family_traveling_times:token",
'school_type': 'school_type:token',
'dist_to_downtown': 'dist_to_downtown:float',
#'rate_of_higher_educated_teachers': 'rate_of_higher_educated_teachers:float',
#"rate_of_teachers_with_master's_degree_and_above": "rate_of_teachers_with_master's_degree_and_above:float",
}, inplace=True)

# for inter
df_inter = pd.read_csv(f"{self.rawpath}/term-eng.csv", index_col=False, low_memory=False)
df_inter = df_inter[(df_inter == 'n.a.').sum(axis=1) == 0].reset_index(drop=True)
df_inter = df_inter[df_inter['concept'] != 'n.a.']
df_inter['label'] = df_inter['score'].astype(float)/df_inter['full_score'].astype(float)

df_exer = df_inter[['question_id', 'exam_id', 'subject_abbr', 'concept']]
df_inter = df_inter[['student_id', 'question_id', 'score', 'full_score', 'time_access', 'label']]
df_exer.drop_duplicates(subset=['question_id'], inplace=True)
df_exer['concept'] = df_exer['concept'].apply(lambda x: x.split(";"))
df_inter['time_access'] = df_inter['time_access'].apply(lambda x: self.convert2timestamp(x))

df_inter.rename(columns={
'student_id': 'stu_id:token', 'question_id': 'exer_id:token',
'score': 'score:float', 'full_score':'full_score:float',
'time_access': 'start_timestamp:float', 'label':'label:float'
}, inplace=True)

df_exer.rename(columns={
'question_id': 'exer_id:token',
'exam_id': 'exam_id:token',
'subject_abbr': 'subject_abbr:token',
'concept': 'cpt_seq:token_seq'
}, inplace=True)

df_inter['order_id:token'] = df_inter['start_timestamp:float'].astype(int)

# save
df_inter.to_csv(f"{self.midpath}/{self.dt}.inter.csv", index=False, encoding='utf-8')
df_stu.to_csv(f"{self.midpath}/{self.dt}.stu.csv", index=False, encoding='utf-8')
df_exer.to_csv(f"{self.midpath}/{self.dt}.exer.csv", index=False, encoding='utf-8')

@staticmethod
def convert2timestamp(dt):
timeArray = time.strptime(dt, "%Y-%m-%d %H:%M:%S")
timestamp = time.mktime(timeArray)
return timestamp
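`convert2timestamp` boils down to two stdlib calls, shown standalone below. Note that `time.mktime` interprets the parsed `struct_time` in local time, so the resulting `start_timestamp:float` (and hence `order_id:token`) values are timezone-dependent:

```python
import time

def convert2timestamp(dt):
    """Parse 'YYYY-mm-dd HH:MM:SS' into a local-time Unix timestamp."""
    time_array = time.strptime(dt, "%Y-%m-%d %H:%M:%S")
    return time.mktime(time_array)

ts = convert2timestamp("2020-01-01 00:00:00")
```

Within a single dataset processed on one machine this is harmless, since only the relative ordering of `order_id:token` matters downstream.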
86 changes: 86 additions & 0 deletions edustudio/atom_op/raw2mid/slp_math.py
@@ -0,0 +1,86 @@
from edustudio.atom_op.raw2mid import BaseRaw2Mid
import pandas as pd
import numpy as np
import time

"""
SLP Dataset: https://aic-fe.bnu.edu.cn/en/data/index.html
"""

class R2M_SLP_Math(BaseRaw2Mid):
def process(self):
super().process()

# for stu
df_stu = pd.read_csv(f"{self.rawpath}/student.csv")
df_stu.dropna(subset=['school_id'], inplace=True, how='any', axis=0)
df_stu = df_stu[df_stu['school_id'] != 'n.a.']

df_stu = df_stu.merge(
pd.read_csv(f"{self.rawpath}/family.csv", index_col=False),
on=['student_id'], how='inner'
)

df_stu = df_stu.merge(
pd.read_csv(f"{self.rawpath}/school.csv"),
on=['school_id'], how='inner'
)

df_stu.drop([
'rate_of_higher_educated_teachers',
"rate_of_teachers_with_master's_degree_and_above"
], inplace=True, axis=1)
df_stu.rename(columns={
'student_id': 'stu_id:token', 'gender': 'gender:token',
'school_id': 'sch_id:token', 'class_id': 'class_id:token',
'age_father': 'age_father:float', 'age_mother': 'age_mother:token',
'edubg_father': 'edubg_father:token', 'edubg_mother':'edubg_mother:token',
'affiliation_father':'affiliation_father:token',
'affiliation_mother': 'affiliation_mother:token',
'family_income': 'family_income:token', 'is_only_child':'is_only_child:token',
'live_on_campus': 'live_on_campus:token',
'gathering_frequency_father':'gathering_frequency_father:token',
'gathering_frequency_mother':'gathering_frequency_mother:token',
'family_traveling_times': "family_traveling_times:token",
'school_type': 'school_type:token',
'dist_to_downtown': 'dist_to_downtown:float',
#'rate_of_higher_educated_teachers': 'rate_of_higher_educated_teachers:float',
#"rate_of_teachers_with_master's_degree_and_above": "rate_of_teachers_with_master's_degree_and_above:float",
}, inplace=True)

# for inter
df_inter = pd.read_csv(f"{self.rawpath}/term-mat.csv", index_col=False)
df_inter = df_inter[df_inter['concept'] != 'n.a.']
df_inter['label'] = df_inter['score']/df_inter['full_score']

df_exer = df_inter[['question_id', 'exam_id', 'subject_abbr', 'concept']]
df_inter = df_inter[['student_id', 'question_id', 'score', 'full_score', 'time_access', 'label']]
df_exer.drop_duplicates(subset=['question_id'], inplace=True)
df_exer['concept'] = df_exer['concept'].apply(lambda x: x.split(";"))
df_inter['time_access'] = df_inter['time_access'].apply(lambda x: self.convert2timestamp(x))

df_inter.rename(columns={
'student_id': 'stu_id:token', 'question_id': 'exer_id:token',
'score': 'score:float', 'full_score':'full_score:float',
'time_access': 'start_timestamp:float', 'label':'label:float'
}, inplace=True)

df_exer.rename(columns={
'question_id': 'exer_id:token',
'exam_id': 'exam_id:token',
'subject_abbr': 'subject_abbr:token',
'concept': 'cpt_seq:token_seq'
}, inplace=True)

df_inter['order_id:token'] = df_inter['start_timestamp:float'].astype(int)

# save
df_inter.to_csv(f"{self.midpath}/{self.dt}.inter.csv", index=False, encoding='utf-8')
df_stu.to_csv(f"{self.midpath}/{self.dt}.stu.csv", index=False, encoding='utf-8')
df_exer.to_csv(f"{self.midpath}/{self.dt}.exer.csv", index=False, encoding='utf-8')

@staticmethod
def convert2timestamp(dt):
timeArray = time.strptime(dt, "%Y-%m-%d %H:%M:%S")
timestamp = time.mktime(timeArray)
return timestamp
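The helper above parses a local-time string and converts it via `time.mktime`, which interprets the parsed fields in the machine's local timezone. A minimal sketch of its behavior (the absolute timestamp is timezone-dependent, but intervals between two converted values are stable):

```python
import time

def convert2timestamp(dt):
    # Parse "YYYY-mm-dd HH:MM:SS" and convert to a POSIX timestamp.
    # time.mktime interprets the struct_time in the *local* timezone,
    # so the absolute value differs across machines.
    time_array = time.strptime(dt, "%Y-%m-%d %H:%M:%S")
    return time.mktime(time_array)

t1 = convert2timestamp("2023-01-01 10:00:00")
t2 = convert2timestamp("2023-01-01 11:00:00")
delta = t2 - t1  # one hour apart regardless of timezone
```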
70 changes: 70 additions & 0 deletions edustudio/datatpl/CD/DCDDataTPL.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
import os
from ..common.edu_datatpl import EduDataTPL
import json
from edustudio.datatpl.common.general_datatpl import DataTPLStatus
import torch


class DCDDataTPL(EduDataTPL):
default_cfg = {
'n_folds': 5,
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_FilterRecords4CD', 'M2C_ReMapId', 'M2C_RandomDataSplit4CD', 'M2C_GenQMat', 'M2C_BuildMissingQ', 'M2C_FillMissingQ'],
'cpt_relation_file_name': 'cpt_relation',
}

def __init__(self, cfg, df, df_train=None, df_valid=None, df_test=None, dict_cpt_relation=None, status=DataTPLStatus(), df_stu=None, df_exer=None):
self.dict_cpt_relation = dict_cpt_relation
super().__init__(cfg, df, df_train, df_valid, df_test, df_stu, df_exer, status)

def _check_params(self):
super()._check_params()
assert 0 <= self.datatpl_cfg['Q_delete_ratio'] < 1

@property
def common_str2df(self):
dic = super().common_str2df
dic['dict_cpt_relation'] = self.dict_cpt_relation
return dic


def process_data(self):
super().process_data()
dt_info = self.final_kwargs['dt_info']
user_count = dt_info['stu_count']
item_count = dt_info['exer_count']
self.interact_mat_list = []
for interact_df in self.final_kwargs['df_train_folds']:
interact_mat = torch.zeros((user_count, item_count), dtype=torch.int8)
idx = interact_df[interact_df['label:float'] == 1][['stu_id:token','exer_id:token']].to_numpy()
interact_mat[idx[:,0], idx[:,1]] = 1
idx = interact_df[interact_df['label:float'] != 1][['stu_id:token','exer_id:token']].to_numpy()
interact_mat[idx[:,0], idx[:,1]] = -1
self.interact_mat_list.append(interact_mat)

self.final_kwargs['interact_mat_list'] = self.interact_mat_list

if self.final_kwargs['dict_cpt_relation'] is None:
self.final_kwargs['dict_cpt_relation'] = {i: [i] for i in range(self.final_kwargs['dt_info']['cpt_count'])}
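The loop above encodes each training fold as a dense student-by-exercise matrix: +1 for a correct response, -1 for an incorrect one, 0 for unobserved. A toy sketch of the same indexing trick (NumPy stands in for `torch.int8` here to keep it self-contained; the interaction data is made up):

```python
import numpy as np
import pandas as pd

# Hypothetical interaction log: 3 students, 4 exercises.
interact_df = pd.DataFrame({
    'stu_id:token':  [0, 0, 1, 2],
    'exer_id:token': [0, 2, 1, 3],
    'label:float':   [1.0, 0.0, 1.0, 0.0],
})

interact_mat = np.zeros((3, 4), dtype=np.int8)
# Correct responses -> +1
idx = interact_df[interact_df['label:float'] == 1][['stu_id:token', 'exer_id:token']].to_numpy()
interact_mat[idx[:, 0], idx[:, 1]] = 1
# Incorrect responses -> -1
idx = interact_df[interact_df['label:float'] != 1][['stu_id:token', 'exer_id:token']].to_numpy()
interact_mat[idx[:, 0], idx[:, 1]] = -1
```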

@classmethod
def load_data(cls, cfg):
kwargs = super().load_data(cfg)
fph = f"{cfg.frame_cfg.data_folder_path}/middata/{cfg.datatpl_cfg['cpt_relation_file_name']}.json"
if os.path.exists(fph):
with open(fph, 'r', encoding='utf-8') as f:
kwargs['dict_cpt_relation'] = json.load(f)
else:
cfg.logger.warning(f"Can't find concept-relation file: {fph}")
kwargs['dict_cpt_relation'] = None
return kwargs

def get_extra_data(self):
extra_dict = super().get_extra_data()
extra_dict['filling_Q_mat'] = self.filling_Q_mat
extra_dict['interact_mat'] = self.interact_mat
return extra_dict

def set_info_for_fold(self, fold_id):
super().set_info_for_fold(fold_id)
self.filling_Q_mat = self.final_kwargs['filling_Q_mat_list'][fold_id]
self.interact_mat = self.final_kwargs['interact_mat_list'][fold_id]
7 changes: 7 additions & 0 deletions edustudio/datatpl/CD/FAIRDataTPL.py
@@ -0,0 +1,7 @@
from ..common import EduDataTPL

class FAIRDataTPL(EduDataTPL):
default_cfg = {
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_FilteringRecordsByAttr', 'M2C_FilterRecords4CD', 'M2C_ReMapId', 'M2C_RandomDataSplit4CD', 'M2C_GenQMat'],
}

4 changes: 2 additions & 2 deletions edustudio/datatpl/CD/RCDDataTPL.py
@@ -7,10 +7,10 @@ class RCDDataTPL(EduDataTPL):
default_cfg = {
'mid2cache_op_seq': [
'M2C_Label2Int', 'M2C_FilterRecords4CD', 'M2C_ReMapId',
'M2C_RandomDataSplit4CD', 'M2C_BuildCptRelation',
'M2C_RandomDataSplit4CD', 'M2C_BuildKCRelation',
'M2C_GenQMat', 'M2C_RCD_OP'
],
'M2C_BuildCptRelation': {
'M2C_BuildKCRelation': {
'relation_type': 'rcd_transition',
'threshold': None
}
4 changes: 3 additions & 1 deletion edustudio/datatpl/CD/__init__.py
@@ -7,4 +7,6 @@
from .CNCDQDataTPL import CNCDQDataTPL
from .RCDDataTPL import RCDDataTPL
from .CDGKDataTPL import CDGKDataTPL
from.ECDDataTPL import ECDDataTPL
from .ECDDataTPL import ECDDataTPL
from .DCDDataTPL import DCDDataTPL
from .FAIRDataTPL import FAIRDataTPL
2 changes: 1 addition & 1 deletion edustudio/datatpl/KT/CL4KTDataTPL.py
@@ -4,7 +4,7 @@

class CL4KTDataTPL(EduDataTPL):
default_cfg = {
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_GenUnFoldCptSeq', 'M2C_CL4KT_OP'],
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_GenUnFoldKCSeq', 'M2C_CL4KT_OP', 'M2C_RandomDataSplit4KT'],
'M2C_CL4KT_OP': {
'sequence_truncation': 'recent',
}
2 changes: 1 addition & 1 deletion edustudio/datatpl/KT/DIMKTDataTPL.py
@@ -6,7 +6,7 @@

class DIMKTDataTPL(KTInterExtendsQDataTPL):
default_cfg = {
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_GenUnFoldCptSeq', 'M2C_BuildSeqInterFeats', 'M2C_GenCptSeq', "M2C_DIMKT_OP"],
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_GenUnFoldKCSeq', 'M2C_BuildSeqInterFeats', 'M2C_RandomDataSplit4KT', 'M2C_GenKCSeq', "M2C_DIMKT_OP"],
'M2C_BuildSeqInterFeats': {
# 'window_size': 200,
"extra_inter_feats": ['start_timestamp:float', 'cpt_unfold:token']
2 changes: 1 addition & 1 deletion edustudio/datatpl/KT/DKTDSCDataTPL.py
@@ -6,7 +6,7 @@

class DKTDSCDataTPL(EduDataTPL):
default_cfg = {
'mid2cache_op_seq': ["M2C_CptAsExer", 'M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats', "M2C_DKTDSC_OP"],
'mid2cache_op_seq': ["M2C_KCAsExer", 'M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats','M2C_RandomDataSplit4KT', "M2C_DKTDSC_OP"],
}

def __getitem__(self, index):
2 changes: 1 addition & 1 deletion edustudio/datatpl/KT/DKTForgetDataTPL.py
@@ -3,7 +3,7 @@

class DKTForgetDataTPL(EduDataTPL):
default_cfg = {
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats', "M2C_DKTForget_OP"],
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats','M2C_RandomDataSplit4KT', "M2C_DKTForget_OP"],
'M2C_BuildSeqInterFeats': {
"extra_inter_feats": ['start_timestamp:float']
}
2 changes: 1 addition & 1 deletion edustudio/datatpl/KT/EERNNDataTPL.py
@@ -5,7 +5,7 @@

class EERNNDataTPL(EduDataTPL):
default_cfg = {
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats', 'M2C_EERNN_OP'],
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats','M2C_RandomDataSplit4KT', 'M2C_EERNN_OP'],
}

def get_extra_data(self, **kwargs):
2 changes: 1 addition & 1 deletion edustudio/datatpl/KT/EKTDataTPL.py
@@ -5,7 +5,7 @@

class EKTDataTPL(EERNNDataTPL):
default_cfg = {
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats', 'M2C_GenCptSeq', 'M2C_EERNN_OP'],
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats', 'M2C_RandomDataSplit4KT', 'M2C_GenKCSeq', 'M2C_EERNN_OP'],
}

def __getitem__(self, index):
2 changes: 1 addition & 1 deletion edustudio/datatpl/KT/GKTDataTPL.py
@@ -5,7 +5,7 @@

class GKTDataTPL(EduDataTPL):
default_cfg = {
'mid2cache_op_seq': ["M2C_CptAsExer", 'M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats'],
'mid2cache_op_seq': ["M2C_KCAsExer", 'M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats', 'M2C_RandomDataSplit4KT'],
}

def process_load_data_from_middata(self):
2 changes: 1 addition & 1 deletion edustudio/datatpl/KT/KTInterCptAsExerDataTPL.py
@@ -2,6 +2,6 @@

class KTInterCptAsExerDataTPL(EduDataTPL):
default_cfg = {
'mid2cache_op_seq': ["M2C_CptAsExer", 'M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats'],
'mid2cache_op_seq': ["M2C_KCAsExer", 'M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats', 'M2C_RandomDataSplit4KT'],
}

2 changes: 1 addition & 1 deletion edustudio/datatpl/KT/KTInterCptUnfoldDataTPL.py
@@ -4,7 +4,7 @@

class KTInterCptUnfoldDataTPL(EduDataTPL):
default_cfg = {
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_GenUnFoldCptSeq', 'M2C_BuildSeqInterFeats'],
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_GenUnFoldKCSeq', 'M2C_BuildSeqInterFeats', 'M2C_RandomDataSplit4KT'],
'M2C_BuildSeqInterFeats': {
"extra_inter_feats": ['start_timestamp:float', 'cpt_unfold:token']
}
2 changes: 1 addition & 1 deletion edustudio/datatpl/KT/KTInterDataTPL.py
@@ -2,6 +2,6 @@

class KTInterDataTPL(GeneralDataTPL):
default_cfg = {
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats'],
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats', 'M2C_RandomDataSplit4KT'],
}

2 changes: 1 addition & 1 deletion edustudio/datatpl/KT/KTInterExtendsQDataTPL.py
@@ -4,7 +4,7 @@

class KTInterExtendsQDataTPL(EduDataTPL):
default_cfg = {
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats', 'M2C_GenCptSeq'],
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats', 'M2C_RandomDataSplit4KT', 'M2C_GenKCSeq'],
}

def __getitem__(self, index):
2 changes: 1 addition & 1 deletion edustudio/datatpl/KT/LPKTDataTPL.py
@@ -3,7 +3,7 @@

class LPKTDataTPL(EduDataTPL):
default_cfg = {
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats', 'M2C_LPKT_OP', "M2C_GenQMat"],
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats', 'M2C_RandomDataSplit4KT', 'M2C_LPKT_OP', "M2C_GenQMat"],
'M2C_BuildSeqInterFeats': {
"extra_inter_feats": ['start_timestamp:float', 'answer_time:float']
}
2 changes: 1 addition & 1 deletion edustudio/datatpl/KT/QDKTDataTPL.py
@@ -7,7 +7,7 @@

class QDKTDataTPL(KTInterExtendsQDataTPL):
default_cfg = {
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats', 'M2C_GenCptSeq','M2C_GenQMat','M2C_QDKT_OP'],
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId', 'M2C_BuildSeqInterFeats', 'M2C_RandomDataSplit4KT', 'M2C_GenKCSeq','M2C_GenQMat','M2C_QDKT_OP'],
}

def get_extra_data(self, **kwargs):
2 changes: 1 addition & 1 deletion edustudio/datatpl/KT/RKTDataTPL.py
@@ -7,7 +7,7 @@

class RKTDataTPL(EduDataTPL):
default_cfg = {
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId','M2C_GenQMat', 'M2C_BuildSeqInterFeats'],
'mid2cache_op_seq': ['M2C_Label2Int', 'M2C_ReMapId','M2C_GenQMat', 'M2C_BuildSeqInterFeats', 'M2C_RandomDataSplit4KT'],
'M2C_BuildSeqInterFeats': {
"extra_inter_feats": ['start_timestamp:float']
}
20 changes: 16 additions & 4 deletions edustudio/datatpl/common/base_datatpl.py
@@ -6,6 +6,7 @@
import yaml
import re
import os
import requests


class BaseDataTPL(Dataset):
@@ -73,15 +74,26 @@ def download_dataset(cls, cfg):
cfg (UnifyConfig):the global config object
"""
dt_name = cfg.dataset
cfg.logger.warning(f"Can't find dataset files of {dt_name} in local environment!")
cfg.logger.info(f"Prepare to download {dt_name} from Internet.")
cfg.logger.warning(f"Can't find dataset files of {dt_name} in local disk")

fph = cfg.frame_cfg['DT_INFO_FILE_PATH']
dataset_info = cls.read_yml_file(fph)
dataset_info_from_cfg: dict = cfg['frame_cfg']['DT_INFO_DICT']
dataset_info_from_cfg.update(dataset_info)
dataset_info.update(dataset_info_from_cfg)

if dt_name not in dataset_info:
raise Exception("Can't find dataset files from Local and Internet!")
cfg.logger.info(f"Preparing to download external datasets.yaml to find dataset: {dt_name}")
url = "https://huggingface.co/datasets/lmcRS/edustudio-datasets/raw/main/datasets.yaml"
cfg.logger.info(f"External datasets.yaml url: {url}")
resp = requests.get(url)
dataset_info_external = yaml.load(resp.text, Loader=cls._build_yaml_loader())
if dt_name not in dataset_info_external:
raise Exception("Can't find dataset files on local disk or online")
else:
dataset_info.update(dataset_info_external)

cfg.logger.info(f"Preparing to download {dt_name} dataset from the Internet")
cfg.logger.info(f"Download_url: {dataset_info[dt_name]['middata_url']}")

if not os.path.exists(cfg.frame_cfg.data_folder_path):
os.makedirs(cfg.frame_cfg.data_folder_path)
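The hunk above also flips the merge direction (`dataset_info.update(dataset_info_from_cfg)`) so that user-supplied entries override the bundled `datasets.yaml`. A toy illustration of why the `dict.update` order matters (the keys and URLs are hypothetical):

```python
# Bundled dataset registry (e.g. parsed from datasets.yaml).
dataset_info = {
    'ASSIST_0910': {'middata_url': 'https://example.com/bundled.zip'},
    'FrcSub': {'middata_url': 'https://example.com/frcsub.zip'},
}
# User overrides from the runtime config.
dataset_info_from_cfg = {
    'ASSIST_0910': {'middata_url': 'https://example.com/mirror.zip'},
}

# dict.update keeps the *argument's* value on key collision,
# so updating dataset_info with the config dict lets the config win
# while untouched entries survive.
dataset_info.update(dataset_info_from_cfg)
```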
44 changes: 31 additions & 13 deletions edustudio/datatpl/common/general_datatpl.py
Original file line number Diff line number Diff line change
@@ -36,7 +36,7 @@ class GeneralDataTPL(BaseDataTPL):
'cache_id': 'cache_default',
'load_data_from': 'middata', # ['rawdata', 'middata', 'cachedata']
'inter_exclude_feat_names': (),
'raw2mid_op': None,
'raw2mid_op': "None",
'mid2cache_op_seq': []
}

@@ -70,6 +70,7 @@ def __init__(
if self.datatpl_cfg['load_data_from'] == 'cachedata':
self.load_cache()
self.check_cache()
self.process_data()
self.logger.info(f"Load from cache successfully: {self.datatpl_cfg['cache_id']}")
self.logger.info(self.datatpl_cfg['dt_info'])
else:
@@ -90,8 +91,7 @@ def from_cfg(cls, cfg):
Returns:
BaseDataTPL
"""
if not os.path.exists(f'{cfg.frame_cfg.data_folder_path}'):
print(cfg.frame_cfg.data_folder_path)
if not os.path.exists(cfg.frame_cfg.data_folder_path) or len(os.listdir(cfg.frame_cfg.data_folder_path)) == 0:
cls.download_dataset(cfg)

load_data_from = cfg.datatpl_cfg['load_data_from']
@@ -141,8 +141,6 @@ def process_data(self):
load_data_from = self.datatpl_cfg['load_data_from']
if load_data_from != 'cachedata':
self.process_load_data_from_middata()
else:
raise ValueError(f"load_data_from={load_data_from} is not expected to appear here")

@classmethod
def load_data(cls, cfg): # 只在middata存在时调用
@@ -240,7 +238,7 @@ def save_cache(self):
self.save_pickle(final_kwargs_fph, self.final_kwargs)

with open(f"{self.cache_folder_path}/datatpl_cfg.json", 'w', encoding='utf-8') as f:
json.dump(json.loads(self.datatpl_cfg.dump_tpl()), fp=f, indent=2, ensure_ascii=False)
json.dump(json.loads(self.datatpl_cfg.dump_fmt()), fp=f, indent=2, ensure_ascii=False)

def check_cache(self):
"""check whether the cache data is consistent with current config
@@ -251,11 +249,15 @@ def check_cache(self):
temp_cache_datatpl_cfg = copy.deepcopy(cache_datatpl_cfg)
del temp_cache_datatpl_cfg['dt_info']
del temp_cache_datatpl_cfg['load_data_from']
if 'is_save_cache' in temp_cache_datatpl_cfg:
del temp_cache_datatpl_cfg['is_save_cache']
# del temp_cache_datatpl_cfg['raw2mid_op']
# del temp_cache_datatpl_cfg['mid2cache_op_seq']
curr_datatpl_cfg = copy.deepcopy(json.loads(self.datatpl_cfg.dump_tpl()))
curr_datatpl_cfg = copy.deepcopy(json.loads(self.datatpl_cfg.dump_fmt()))
del curr_datatpl_cfg['dt_info']
del curr_datatpl_cfg['load_data_from']
if 'is_save_cache' in curr_datatpl_cfg:
del curr_datatpl_cfg['is_save_cache']
# del curr_datatpl_cfg['raw2mid_op']
# del curr_datatpl_cfg['mid2cache_op_seq']
diff = DeepDiff(temp_cache_datatpl_cfg, curr_datatpl_cfg)
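`check_cache` strips keys that legitimately vary between runs (`dt_info`, `load_data_from`, `is_save_cache`) before comparing the cached and current configs. A plain-equality sketch of that idea (the real template uses `DeepDiff`, which additionally produces a readable diff report):

```python
import copy

def configs_match(cache_cfg: dict, curr_cfg: dict,
                  volatile=('dt_info', 'load_data_from', 'is_save_cache')) -> bool:
    # Drop run-specific keys, then compare what remains.
    a, b = copy.deepcopy(cache_cfg), copy.deepcopy(curr_cfg)
    for key in volatile:
        a.pop(key, None)
        b.pop(key, None)
    return a == b

same = configs_match(
    {'cache_id': 'cache_default', 'dt_info': {'stu_count': 10}},
    {'cache_id': 'cache_default', 'dt_info': {'stu_count': 99}},
)
```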
@@ -283,6 +285,13 @@ def load_cache(self):
self.dict_test_folds = self.load_pickle(test_folds_fph)
self.final_kwargs = self.load_pickle(final_kwargs_fph)

for k,v in self.final_kwargs.items():
if not hasattr(self, k):
setattr(self, k, v)
self.logger.info(f"[load cache] set {k} from final_kwargs to current data template")
else:
self.logger.info(f"[load cache] duplicated attribute in final_kwargs: {k}")

def build_datasets(self):
"""build datasets
"""
@@ -457,7 +466,7 @@ def _get_r2m_op(cls, cfg):
"""
from edustudio.atom_op.raw2mid import BaseRaw2Mid
r2m_op = cfg.datatpl_cfg['raw2mid_op']
assert r2m_op is not None
assert r2m_op is not None and r2m_op != "None"
if isinstance(r2m_op, str):
r2m_op = importlib.import_module('edustudio.atom_op.raw2mid').__getattribute__(r2m_op)
elif issubclass(r2m_op, BaseRaw2Mid):
@@ -542,16 +551,25 @@ def _preprocess_feat(df):
for col in df.columns:
col_name, col_type = col.split(":")
if col_type == 'token':
df[col] = df[col].astype('int64')
try:
df[col] = df[col].astype('int64')
except (ValueError, TypeError):
pass
elif col_type == 'float':
df[col] = df[col].astype('float32')
elif col_type == 'token_seq':
df[col] = df[col].astype(str).apply(lambda x: [int(i) for i in x.split(",")])
try:
df[col] = df[col].astype(str).apply(lambda x: [int(i) for i in x.split(",")])
except ValueError:  # fallback for list-literal strings like "[1, 2, 3]"
df[col] = df[col].astype(str).apply(lambda x: eval(x))
elif col_type == 'float_seq':
df[col] = df[col].astype(str).apply(lambda x: [float(i) for i in x.split(",")])
try:
df[col] = df[col].astype(str).apply(lambda x: [float(i) for i in x.split(",")])
except ValueError:  # fallback for list-literal strings like "[1.0, 2.0]"
df[col] = df[col].astype(str).apply(lambda x: eval(x))
else:
raise ValueError(f"unknown field type of {col_type}")

pass
@staticmethod
def _unwrap_feat(df:pd.DataFrame):
"""unwrap the type of field
2 changes: 1 addition & 1 deletion edustudio/datatpl/utils/common.py
@@ -7,7 +7,7 @@
class BigfileDownloader(object):
@staticmethod
def download(url, title, filepath, chunk_size=10240):
with closing(requests.get(url, stream=True)) as resp:
with closing(requests.get(url, stream=True, allow_redirects=True)) as resp:
if resp.status_code != 200:
raise Exception("[ERROR]: {} - {} -{}".format(str(resp.status_code), title, url))
chunk_size = chunk_size
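The `stream=True` / `allow_redirects=True` download above writes the response in fixed-size chunks instead of buffering the whole file in memory. The chunking loop can be sketched against any file-like pair, so no network is needed here:

```python
import io

def write_in_chunks(src, dst, chunk_size=10240):
    # Same pattern as iterating resp.iter_content(chunk_size=...):
    # copy fixed-size pieces so large files never sit fully in memory.
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)

src = io.BytesIO(b'x' * 25_000)   # stand-in for the HTTP response body
dst = io.BytesIO()                # stand-in for the output file
write_in_chunks(src, dst)
```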
13 changes: 9 additions & 4 deletions edustudio/datatpl/utils/pad_seq_util.py
@@ -53,10 +53,15 @@ def pad_sequence(

if return_idx:
return_idx = np.concatenate(return_idx_list).astype(np.int64)

is_dtype_str = np.issubdtype(dtype, np.str_) or np.issubdtype(
dtype, np.unicode_
)

version = np.__version__

if version.startswith('2.'):
is_dtype_str = np.issubdtype(dtype, np.str_)
else:
is_dtype_str = np.issubdtype(dtype, np.str_) or np.issubdtype(
dtype, np.unicode_
)
if isinstance(value, str) and dtype != object and not is_dtype_str:
raise ValueError(
f"`dtype` {dtype} is not compatible with `value`'s type: "
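The version guard above exists because `np.unicode_` (historically an alias of `np.str_`) was removed in NumPy 2.0, so referencing it there raises an `AttributeError`. A standalone sketch of the check:

```python
import numpy as np

def is_string_dtype(dtype) -> bool:
    # np.unicode_ is gone in NumPy >= 2.0, so only consult it on 1.x.
    if np.__version__.startswith('2.'):
        return np.issubdtype(dtype, np.str_)
    return np.issubdtype(dtype, np.str_) or np.issubdtype(dtype, np.unicode_)
```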