Converts individual patient data into time interval feature vectors, suitable for filtering and concatenation into a data matrix D for binary classification machine learning tasks.
Example use case 1: I aim to compute the mean of n variables for each unique patient, resulting in a single row representing each patient.
Example use case 2: I intend to generate a monthly time series comprising patient data encompassing biochemistry, demographic details, and textual annotations (MedCat annotations) spanning the last 25 years. Each patient's data begins from a distinct start date (diagnosis date), providing a retrospective view.
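As a rough illustration of use case 1, the sketch below collapses repeated measurements into one row of means per patient using only the standard library. It is not pat2vec's own API: the function `mean_per_patient`, the variable names `hb` and `crp`, and the patient identifiers are hypothetical, and only the `client_idcode` column name is taken from this README.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: one row per patient measurement.
rows = [
    {"client_idcode": "P001", "hb": 120.0, "crp": 5.0},
    {"client_idcode": "P001", "hb": 130.0, "crp": 7.0},
    {"client_idcode": "P002", "hb": 110.0, "crp": 12.0},
]


def mean_per_patient(rows, id_col="client_idcode"):
    """Collapse repeated measurements into one row of means per patient."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for name, value in row.items():
            # Skip the identifier column and missing values.
            if name != id_col and value is not None:
                grouped[row[id_col]][name].append(value)
    return {
        patient: {name: mean(values) for name, values in variables.items()}
        for patient, variables in grouped.items()
    }


patient_means = mean_per_patient(rows)
print(patient_means)
```

Each key of `patient_means` corresponds to one row of the resulting data matrix, with one column per variable.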
- CogStack (cogstack_v8_lite) (cogstack_search_methods)
- Elasticsearch
- MedCat https://github.com/CogStack/MedCAT
See requirements.txt
- Single patient
- Batch patient
- Cohort search and creation
- Automated random controls
- Modular feature space selection
- Look back
- Look forward
- Individual patient time windows.
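The look back, look forward, and individual time window features can be pictured with a small standard-library sketch. It assumes windows are aligned to calendar months and that each patient has their own start date; `shift_months`, `monthly_windows`, and the example dates are hypothetical and do not reflect how pat2vec computes its intervals.

```python
from datetime import date


def shift_months(d, months):
    """Return the first day of the month `months` away from d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def monthly_windows(start, n_months, lookback=True):
    """Return (window_start, window_end) pairs, one per calendar month.

    window_end is exclusive. With lookback=True the windows run backwards
    from the month containing `start`; otherwise they run forwards from it.
    """
    anchor = date(start.year, start.month, 1)
    windows = []
    for i in range(n_months):
        if lookback:
            begin = shift_months(anchor, -(i + 1))
        else:
            begin = shift_months(anchor, i)
        windows.append((begin, shift_months(begin, 1)))
    return windows


# Hypothetical per-patient start dates (for example, diagnosis dates).
diagnosis_dates = {"P001": date(2020, 3, 15), "P002": date(2018, 11, 2)}
windows = {
    pid: monthly_windows(start, 3, lookback=True)
    for pid, start in diagnosis_dates.items()
}
print(windows["P001"])
```

Because every patient gets windows relative to their own start date, the resulting feature vectors line up by offset from diagnosis rather than by calendar date.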
- Clone the repository (cd to gloabl_files first):
  git clone https://github.com/SamoraHunter/pat2vec.git
  cd pat2vec
  Run the installation script:
  install.bat
- Add the pat2vec directory to the Python path. Before importing pat2vec in your Python script, add the following lines to the script, replacing /path/to/pat2vec with the actual path to the pat2vec directory inside your project:
  import sys
  sys.path.append('/path/to/pat2vec')
- Import pat2vec in your Python script:
  import pat2vec
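The path step above can be made slightly more defensive. The sketch below is one possible approach, not part of pat2vec: the helper `add_to_python_path` is hypothetical, and `/path/to/pat2vec` is the same placeholder used in the instructions. It only appends the directory if it exists, then reports whether Python can locate a module named `pat2vec`.

```python
import importlib.util
import sys
from pathlib import Path


def add_to_python_path(directory):
    """Append `directory` to sys.path once; return True if it exists on disk."""
    path = Path(directory).expanduser()
    if not path.is_dir():
        return False
    if str(path) not in sys.path:
        sys.path.append(str(path))
    return True


# Placeholder path from the instructions; replace it with your own.
added = add_to_python_path("/path/to/pat2vec")

# find_spec returns None when a top-level module cannot be located.
found = importlib.util.find_spec("pat2vec") is not None
print(f"path added: {added}, pat2vec importable: {found}")
```

If the second value is False after the path was added, the placeholder most likely points at the wrong directory level.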
This option installs pat2vec along with its dependencies, including:
- pat2vec_env (virtual environment)
- snomed_methods
- cogstack_search_methods
- clinical_note_splitter
Before running the installation, ensure you:
- Place the model pack in the appropriate directory: gloabl_files/medcat_models/%modelpack%.zip
- Populate the credentials file: gloabl_files/credentials.py
- (Optional) Add a SNOMED file if needed: gloabl_files/.. 'snomed', 'SnomedCT_InternationalRF2_PRODUCTION_20231101T120000Z', 'SnomedCT_InternationalRF2_PRODUCTION_20231101T120000Z', 'Full', 'Terminology', 'sct2_StatedRelationship_Full_INT_20231101.txt'
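The checklist above can be verified before launching the installer. The following is a minimal sketch, assuming the layout named in the list (a .zip model pack under medcat_models and a credentials.py file, both inside gloabl_files); the function `check_prerequisites` is hypothetical and is not shipped with pat2vec. The optional SNOMED file is not checked because its full path is not given here.

```python
from pathlib import Path


def check_prerequisites(global_dir):
    """Report which expected files are present under `global_dir`."""
    root = Path(global_dir)
    model_dir = root / "medcat_models"
    return {
        # Any .zip file in medcat_models counts as a model pack.
        "model_pack": model_dir.is_dir() and any(model_dir.glob("*.zip")),
        "credentials": (root / "credentials.py").is_file(),
    }


status = check_prerequisites("gloabl_files")
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

Run it from the directory that contains gloabl_files, or pass an absolute path instead.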
- Copy the install_pat2vec.sh file to your installation directory.
- Grant execution permissions:
  chmod +x install_pat2vec.sh
- Run the installation using one of the following options:
  - Standard installation:
    ./install_pat2vec.sh
  - Installation with proxy mirror support:
    ./install_pat2vec.sh --proxy
  - Install to a specific directory:
    ./install_pat2vec.sh --directory /path/to/install
  - Skip cloning repositories (if already cloned manually):
    ./install_pat2vec.sh --no-clone
The script will clone the following repositories:
- Clone the repository:
  git clone https://github.com/SamoraHunter/pat2vec.git
  cd pat2vec
  Run the installation script (requires python3 on the path and venv):
  chmod +x install.sh
  ./install.sh
- Add the pat2vec directory to the Python path. Before importing pat2vec in your Python script, add the following lines to the script, replacing /path/to/pat2vec with the actual path to the pat2vec directory inside your project:
  import sys
  sys.path.append('/path/to/pat2vec')
- Import pat2vec in your Python script:
  import pat2vec
- Set paths: gloabl_files/medcat_models/modelpack.zip, gloabl_files/snomed_methods, gloabl_files/..
- gloabl_files/
  - medcat_models/
    - modelpack.zip
  - snomed_methods/snomed_methods_v1.py**
  - pat2vec/
  - pat2vec_projects/
    - project_01/
      - example_usage.ipynb
      - treatment_docs.csv*

  *treatment_docs.csv should contain a column 'client_idcode' with your UUIDs.
  **https://github.com/SamoraHunter/SNOMED_methods.git
- Configure options.
- Run all.
- Examine example_usage.ipynb for additional functionality and use cases.
- Open example_usage.ipynb and hit run all.
- If testing in a live environment, ensure the testing flag is set to False in the config_obj.
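Since treatment_docs.csv must contain a 'client_idcode' column, it can help to validate the file before running the notebook. The sketch below is one way to do that with the standard csv module; `read_patient_ids` is a hypothetical helper, not a pat2vec function, and the identifiers are made up. An in-memory string stands in for the real file so the example runs anywhere.

```python
import csv
import io


def read_patient_ids(handle, id_col="client_idcode"):
    """Return the unique, non-empty patient IDs from a CSV file handle, in order."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or id_col not in reader.fieldnames:
        raise ValueError(f"expected a '{id_col}' column, found {reader.fieldnames}")
    seen = {}
    for row in reader:
        value = (row[id_col] or "").strip()
        if value:
            # A dict keeps first-seen order while dropping duplicates.
            seen.setdefault(value, None)
    return list(seen)


# In-memory stand-in for treatment_docs.csv with made-up identifiers.
sample = io.StringIO("client_idcode\nP001\nP002\nP001\n")
patient_ids = read_patient_ids(sample)
print(patient_ids)
```

For a real file, pass `open("treatment_docs.csv", newline="")` in place of the in-memory sample.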
Contributions are welcome! Please see the contributing guidelines for more information.
This project is licensed under the MIT License. See the LICENSE file for details.