-
Notifications
You must be signed in to change notification settings - Fork 15
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
21 changed files
with
7,349 additions
and
0 deletions.
There are no files selected for viewing
591 changes: 591 additions & 0 deletions
591
notebooks/experiments/experiment_k1/claude-3-opus/exp_1.ipynb
Large diffs are not rendered by default.
Oops, something went wrong.
644 changes: 644 additions & 0 deletions
644
notebooks/experiments/experiment_k1/claude-3-opus/exp_10.ipynb
Large diffs are not rendered by default.
Oops, something went wrong.
294 changes: 294 additions & 0 deletions
294
notebooks/experiments/experiment_k1/claude-3-opus/exp_11.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,294 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"id": "8ff706fd-c267-4d05-af58-9a3848cce8ff", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from mdagent import MDAgent\n", | ||
"import os" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"id": "069a7bb4", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"claude-3-opus-20240229\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"prompt11 = \"Download the PDB files for 8PFK and 8PFQ. Then, compare the secondary structures of the two proteins, including the number of atoms, secondary structures, number of chains, etc.\"\n", | ||
"llm_var = \"claude-3-opus-20240229\"\n", | ||
"tools = \"all\"\n", | ||
"agent = MDAgent(agent_type=\"Structured\", model=llm_var, top_k_tools=tools, use_memory=False)\n", | ||
"print(llm_var)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"id": "ad2f6f66-e6b8-4244-b908-68fbebbf6e8b", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Thought: To compare the secondary structures of 8PFK and 8PFQ, I first need to download the PDB files for each protein. I can use the PDBFileDownloader tool to do this.\n", | ||
"\n", | ||
"Action:\n", | ||
"```\n", | ||
"{\n", | ||
" \"action\": \"PDBFileDownloader\",\n", | ||
" \"action_input\": \"8PFK\"\n", | ||
"}\n", | ||
"```\n", | ||
"\n", | ||
"PDB file found with this ID: 8PFK\n", | ||
"Thought: Now that I have downloaded the PDB file for 8PFK, I need to download the PDB file for 8PFQ as well using the PDBFileDownloader tool.\n", | ||
"\n", | ||
"Action:\n", | ||
"```\n", | ||
"{\n", | ||
" \"action\": \"PDBFileDownloader\",\n", | ||
" \"action_input\": \"8PFQ\"\n", | ||
"}\n", | ||
"```PDB file found with this ID: 8PFQ\n", | ||
"Thought: Now that I have downloaded the PDB files for both 8PFK and 8PFQ, I can use the SummarizeProteinStructure tool to get information about the number of atoms, residues, chains, frames, and bonds in each protein. I'll start with 8PFK.\n", | ||
"\n", | ||
"Action:\n", | ||
"```\n", | ||
"{\n", | ||
" \"action\": \"SummarizeProteinStructure\",\n", | ||
" \"action_input\": {\n", | ||
" \"traj_file\": \"8PFK_232021\",\n", | ||
" \"top_file\": \"8PFK_232021\"\n", | ||
" }\n", | ||
"}\n", | ||
"```\n", | ||
"\n", | ||
"Thought: Now that I have summarized the structure of 8PFK, I should do the same for 8PFQ to be able to compare them.\n", | ||
"\n", | ||
"Action:\n", | ||
"```\n", | ||
"{\n", | ||
" \"action\": \"SummarizeProteinStructure\",\n", | ||
" \"action_input\": {\n", | ||
" \"traj_file\": \"8PFQ_232027\",\n", | ||
" \"top_file\": \"8PFQ_232027\" \n", | ||
" }\n", | ||
"}\n", | ||
"```Thought: Now that I have summarized the structures of both 8PFK and 8PFQ, I can compare their secondary structure elements using the ComputeDSSP tool. I'll start with 8PFK.\n", | ||
"\n", | ||
"Action:\n", | ||
"```\n", | ||
"{\n", | ||
" \"action\": \"ComputeDSSP\",\n", | ||
" \"action_input\": {\n", | ||
" \"traj_file\": \"8PFK_232021\",\n", | ||
" \"top_file\": \"8PFK_232021\",\n", | ||
" \"target_frames\": \"all\"\n", | ||
" }\n", | ||
"}\n", | ||
"```\n", | ||
"\n", | ||
"Thought: The ComputeDSSP results for 8PFK show that all 109 residues are not assigned and not protein residues. This suggests there may be an issue with the PDB file or it may not contain a protein structure. To check, I should try computing the DSSP for 8PFQ and see if I get a similar result before drawing any conclusions.\n", | ||
"\n", | ||
"Action:\n", | ||
"```\n", | ||
"{\n", | ||
" \"action\": \"ComputeDSSP\",\n", | ||
" \"action_input\": {\n", | ||
" \"traj_file\": \"8PFQ_232027\",\n", | ||
" \"top_file\": \"8PFQ_232027\",\n", | ||
" \"target_frames\": \"all\"\n", | ||
" }\n", | ||
"}\n", | ||
"```\n", | ||
"\n", | ||
"Based on the analysis, it appears that the PDB files downloaded for 8PFK and 8PFQ do not contain valid protein structures that can be analyzed for secondary structure elements. The key observations are:\n", | ||
"\n", | ||
"1. The SummarizeProteinStructure tool showed that 8PFK has 487 atoms, 109 residues, 2 chains, 1 frame and 420 bonds. 8PFQ has 950 atoms, 194 residues, 6 chains, 1 frame and 833 bonds. \n", | ||
"\n", | ||
"2. However, when running the ComputeDSSP tool to assign secondary structure, it found that all 109 residues in 8PFK and all 194 residues in 8PFQ were \"not assigned, not a protein residue\". This means the DSSP algorithm did not identify any of the residues as being part of an α-helix, β-strand or coil, which are the typical secondary structure elements of proteins.\n", | ||
"\n", | ||
"3. Getting this \"not assigned, not a protein residue\" result for all residues in both structures strongly suggests there is an issue with the PDB files themselves. They may not contain properly formatted protein coordinate data that can be interpreted.\n", | ||
"\n", | ||
"In summary, while I was able to download PDB files with the IDs 8PFK and 8PFQ, the files do not appear to contain analyzable protein structures based on the DSSP results. It's not possible to compare the secondary structure composition without valid protein coordinate data. I would recommend double checking the PDB IDs and downloading the files again, or trying alternative PDB entries for the proteins of interest to troubleshoot the issue.\n", | ||
"\n", | ||
"Action:\n", | ||
"```\n", | ||
"{\n", | ||
" \"action\": \"Final Answer\",\n", | ||
" \"action_input\": \"I downloaded the PDB files for 8PFK and 8PFQ and attempted to compare their secondary structures. However, analysis with the DSSP algorithm found that none of the residues in either structure were identified as protein residues or assigned to typical secondary structure elements like helices, strands or coils. \n", | ||
"\n", | ||
"8PFK contains 487 atoms, 109 residues, 2 chains, and 420 bonds based on the PDB file. 8PFQ contains 950 atoms, 194 residues, 6 chains and 833 bonds. But DSSP assigned all residues in both as 'not a protein residue'.\n", | ||
"\n", | ||
"This strongly suggests there are issues with the PDB files themselves and they may not contain properly formatted protein structures that can be analyzed. Without valid coordinate data, it's not possible to assess and compare the secondary structure composition.\n", | ||
"\n", | ||
"I recommend double checking the PDB IDs to make sure 8PFK and 8PFQ are the correct entries for the proteins of interest. If so, try downloading the PDB files again or using alternative PDB entries to obtain structures that can be interpreted by the DSSP algorithm and other protein analysis tools. Let me know if you have any other questions!\"\n", | ||
"}\n", | ||
"```" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"answer = agent.run(prompt11)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"id": "5d1f0c85-88d6-4d2a-aae1-6a2d763f280d", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"ckpt_39\n", | ||
"Files found in registry: 8PFK_232021: PDB file downloaded from RSCB\n", | ||
" PDBFile ID: 8PFK_232021\n", | ||
" 8PFQ_232027: PDB file downloaded from RSCB\n", | ||
" PDBFile ID: 8PFQ_232027\n", | ||
" rec0_232050: dssp values for trajectory with id: 8PFK_232021\n", | ||
" rec0_232058: dssp values for trajectory with id: 8PFQ_232027\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"registry = agent.path_registry\n", | ||
"print(os.path.basename(registry.ckpt_dir))\n", | ||
"paths_and_descriptions = registry.list_path_names_and_descriptions()\n", | ||
"print(\"\\n\".join(paths_and_descriptions.split(\",\")))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 11, | ||
"id": "044e9ae5", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"{'n_atoms': 487, 'n_residues': 109, 'n_chains': 2, 'n_frames': 1, 'n_bonds': 420}\n", | ||
"{'n_atoms': 950, 'n_residues': 194, 'n_chains': 6, 'n_frames': 1, 'n_bonds': 833}\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"from mdagent.tools.base_tools.analysis_tools.secondary_structure import SummarizeProteinStructure\n", | ||
"\n", | ||
"summarizer = SummarizeProteinStructure(registry)\n", | ||
"\n", | ||
"print (summarizer._run(\"8PFK_232021\"))\n", | ||
"print (summarizer._run(\"8PFQ_232027\"))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"id": "2ab4e124-2086-46ab-b747-28f6aebb850e", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import re\n", | ||
"import os\n", | ||
"match = re.search(r\"8PFK_\\d+\", paths_and_descriptions)\n", | ||
"file_id1 = match.group(0)\n", | ||
"path1 = registry.get_mapped_path(file_id1)\n", | ||
"assert os.path.exists(path1)\n", | ||
"\n", | ||
"match = re.search(r\"8PFQ_\\d+\", paths_and_descriptions)\n", | ||
"file_id2 = match.group(0)\n", | ||
"path2 = registry.get_mapped_path(file_id2)\n", | ||
"assert os.path.exists(path2)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"id": "5c9a6ace-69e8-4042-9d35-ca598f4d00c2", | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"protein 8PFK\n", | ||
"Number of chains: 2\n", | ||
"Number of atoms: 487\n", | ||
"Number of residues in sheets: 0\n", | ||
"Number of residues in helices: 0\n", | ||
"Number of residues in coils: 0\n", | ||
"\n", | ||
"protein 8PFQ\n", | ||
"Number of chains: 6\n", | ||
"Number of atoms: 950\n", | ||
"Number of residues in sheets: 0\n", | ||
"Number of residues in helices: 0\n", | ||
"Number of residues in coils: 0\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"import mdtraj as md\n", | ||
"traj = md.load(path1)\n", | ||
"top = traj.topology\n", | ||
"number_of_chains = top.n_chains\n", | ||
"number_of_atoms = top.n_atoms\n", | ||
"print('protein 8PFK')\n", | ||
"print('Number of chains: ', number_of_chains)\n", | ||
"print('Number of atoms: ', number_of_atoms)\n", | ||
"secondary_structure = md.compute_dssp(traj,simplified=True)\n", | ||
"print(\"Number of residues in sheets: \",len([i for i in secondary_structure[0] if i == 'E']))\n", | ||
"print(\"Number of residues in helices: \",len([i for i in secondary_structure[0] if i == 'H']))\n", | ||
"print(\"Number of residues in coils: \",len([i for i in secondary_structure[0] if i == 'C']))\n", | ||
"\n", | ||
"traj = md.load(path2)\n", | ||
"top = traj.topology\n", | ||
"number_of_chains = top.n_chains\n", | ||
"number_of_atoms = top.n_atoms\n", | ||
"print('\\nprotein 8PFQ')\n", | ||
"print('Number of chains: ', number_of_chains)\n", | ||
"print('Number of atoms: ', number_of_atoms)\n", | ||
"secondary_structure = md.compute_dssp(traj,simplified=True)\n", | ||
"print(\"Number of residues in sheets: \",len([i for i in secondary_structure[0] if i == 'E']))\n", | ||
"print(\"Number of residues in helices: \",len([i for i in secondary_structure[0] if i == 'H']))\n", | ||
"print(\"Number of residues in coils: \",len([i for i in secondary_structure[0] if i == 'C']))" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "mda-aug20", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.12.5" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
Oops, something went wrong.