XpertAI harnesses the power of explainable AI (XAI) and large language models (LLMs) to uncover structure-property relationships and present them in natural language. Check out the Streamlit app!
XpertAI's natural language explanations (NLEs) are specific to a given dataset or task, provide scientific explanations, and are accessible to non-technical experts. Currently, the GPT-4 model is used to generate them.
To facilitate the extraction of natural language explanations from raw data, XpertAI makes use of the following functionalities in the backend:
- Generating natural language XAI explanations from SHAP and LIME.
- Automatic generation of citations by reading the first page of a publication (consistent with scientific publications).
- Automatic refinement of feature labels in the raw dataset to increase human interpretability.
- Chain-of-thought prompting to generate scientific explanations.
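As a rough illustration of the first item, the sketch below ranks features by their mean absolute attribution and phrases the result as text. It is not XpertAI's actual implementation: the function names `top_k_features` and `describe`, the feature names, and the attribution values are all made up for the example, and the real backend works from SHAP or LIME output and passes the result to an LLM.

```python
from statistics import fmean


def top_k_features(attributions, feature_names, k=3):
    """Rank features by mean absolute attribution and return the top k.

    attributions: list of rows, one per sample, each holding one
    attribution value per feature (the layout SHAP-style tools produce).
    """
    scores = []
    for j, name in enumerate(feature_names):
        column = [abs(row[j]) for row in attributions]
        scores.append((name, fmean(column)))
    # Largest mean |attribution| first.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


def describe(ranked):
    """Turn a ranked (feature, score) list into a plain-text summary."""
    parts = [f"{name} (mean |attribution| = {score:.2f})" for name, score in ranked]
    return "Most influential features: " + "; ".join(parts) + "."


# Hypothetical attribution values for three samples and four features.
names = ["aromatic ring", "hydroxyl group", "halogen", "nitro group"]
values = [
    [0.10, -0.40, 0.05, 0.30],
    [0.20, -0.50, 0.00, 0.20],
    [-0.30, 0.60, 0.05, -0.40],
]
ranked = top_k_features(values, names, k=3)
summary = describe(ranked)
print(summary)
```

Using the mean absolute value keeps features with strong but sign-varying effects from cancelling out, which is one common way to rank global importance.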
- Go to the Streamlit app.
- Add your OpenAI API key. Helpful resource: API reference
- Upload your featurized raw dataset. Currently, XpertAI requires your dataset to be a CSV file containing inputs and outputs. The input dataset should be featurized before upload, e.g. SMILES converted to MACCS keys. Note: Your target labels must be in the last column of the dataset! XpertAI automatically selects the last column as the label column. Also make sure you use human-interpretable descriptor/feature headers in your dataset; XpertAI will use these headers to extract structure-property relationships. A sample dataset for a toxicity prediction task can be found in the paper/datasets folder.
- Select the surrogate model type: Regressor or Classifier.
- Select your favorite XAI tool: SHAP, LIME, or Both.
- Select how many features you would like in your final explanation, e.g. if you select 3, XpertAI will use the top 3 features from the XAI analysis to draw relationships.
- Provide literature to help XpertAI make scientific explanations. You can either upload multiple publications (as PDFs) or ask XpertAI to scrape arxiv.org for relevant papers. In the latter case, you can add keywords to search arXiv and set the maximum number of references to be downloaded. Note: We have seen better performance with curated literature datasets as they are more specific.
- Finally, tell XpertAI which property you'd like explained, e.g. "Solubility of small molecules". Click the Generate Explanation button to begin! Once the explanation is generated, you can download it along with the train-test error plot and the XAI plots.
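Before uploading, it can help to confirm that your file follows the layout described in the dataset step: feature columns first, target label in the last column. The following is a minimal sketch using only the Python standard library; the function name `split_features_and_label` and the column headers are hypothetical, and it assumes every value is numeric, which may not hold for your data.

```python
import csv
import io


def split_features_and_label(csv_text):
    """Split a featurized CSV into feature rows and a label column.

    Follows the convention from the upload step: every column except
    the last is a feature, and the last column is the target label.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    if len(header) < 2:
        raise ValueError("Need at least one feature column and one label column.")
    feature_names, label_name = header[:-1], header[-1]
    features, labels = [], []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"Row {line_no} has {len(row)} values, expected {len(header)}."
            )
        # Assumption: all entries are numeric (e.g. binary fingerprint bits).
        features.append([float(v) for v in row[:-1]])
        labels.append(float(row[-1]))
    return feature_names, label_name, features, labels


# Tiny made-up dataset; the headers are illustrative only.
sample = """has_aromatic_ring,has_hydroxyl_group,toxic
1,0,1
0,1,0
"""
feature_names, label_name, X, y = split_features_and_label(sample)
print(feature_names, label_name, X, y)
```

To check a file on disk, read it with `open(path, newline="").read()` and pass the text to the function; a `ValueError` points at the first malformed row.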
This video is sped up for demonstration purposes. Typical run time in the Streamlit app is ~3 minutes, but this may vary depending on the size of the dataframe and the literature dataset.
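For the arXiv option in the literature step, the keywords and the maximum number of references map naturally onto a query against the public arXiv API. The sketch below only builds the query URL and makes no network request. How XpertAI itself queries arXiv is not described here, so treat the helper `build_arxiv_query` and its choice to search all fields as assumptions made for illustration.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_query(keywords, max_results=5):
    """Build an arXiv API query URL that searches all fields for the keywords."""
    # Quote multi-word keywords so they are matched as phrases.
    terms = [f'all:"{kw}"' if " " in kw else f"all:{kw}" for kw in keywords]
    params = {
        "search_query": " AND ".join(terms),
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_query(["solubility", "small molecules"], max_results=5)
print(url)
```

Combining keywords with AND narrows the results, which fits the note above that more specific literature tends to work better; switching to OR would broaden the search.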
- Clone the XpertAI GitHub repository
git clone https://github.com/geemi725/XpertAI.git
- From the repository's root directory, install the required packages listed in requirements.txt. You can do so with
pip install -r requirements.txt
- Run the Streamlit app locally with
streamlit run app.py
Wellawatte, Geemi P., and Philippe Schwaller. "Extracting human interpretable structure-property relationships in chemistry using XAI and large language models." arXiv preprint arXiv:2311.04047 (2023).
@article{wellawatte2023extracting,
title={Extracting human interpretable structure-property relationships in chemistry using XAI and large language models},
author={Wellawatte, Geemi P and Schwaller, Philippe},
journal={arXiv preprint arXiv:2311.04047},
year={2023}
}