claude sonnet & opus experiments

ur-whitelab · Sep 16, 2024 · cf30476
1 parent 1f90b77
commit cf30476

Showing 50 changed files with 15,567 additions and 0 deletions.
diff --git a/notebooks/experiments/experiment_k1/claude-3-5-sonnet/exp_1.ipynb b/notebooks/experiments/experiment_k1/claude-3-5-sonnet/exp_1.ipynb
diff --git a/notebooks/experiments/experiment_k1/claude-3-5-sonnet/exp_10.ipynb b/notebooks/experiments/experiment_k1/claude-3-5-sonnet/exp_10.ipynb
diff --git a/notebooks/experiments/experiment_k1/claude-3-5-sonnet/exp_11.ipynb b/notebooks/experiments/experiment_k1/claude-3-5-sonnet/exp_11.ipynb
@@ -0,0 +1,276 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "8ff706fd-c267-4d05-af58-9a3848cce8ff",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from mdagent import MDAgent\n",
+    "import os"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "069a7bb4",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "claude-3-5-sonnet-20240620\n"
+     ]
+    }
+   ],
+   "source": [
+    "prompt11 = \"Download the PDB files for 8PFK and 8PFQ. Then, compare the secondary structures of the two proteins, including the number of atoms, secondary structures, number of chains, etc.\"\n",
+    "llm_var = \"claude-3-5-sonnet-20240620\"\n",
+    "tools = \"all\"\n",
+    "agent = MDAgent(agent_type=\"Structured\", model=llm_var, top_k_tools=tools, use_memory=False)\n",
+    "print(llm_var)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "ad2f6f66-e6b8-4244-b908-68fbebbf6e8b",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Thought: To answer this question, I need to first download the PDB files for 8PFK and 8PFQ, and then analyze their structures. Let's start by downloading the PDB file for 8PFK.\n",
+      "\n",
+      "Action:\n",
+      "```\n",
+      "{\n",
+      "    \"action\": \"PDBFileDownloader\",\n",
+      "    \"action_input\": \"8PFK\"\n",
+      "}\n",
+      "```PDB file found with this ID: 8PFK\n",
+      "Thought: Great, we've successfully downloaded the PDB file for 8PFK. Now, let's download the PDB file for 8PFQ.\n",
+      "\n",
+      "Action:\n",
+      "```\n",
+      "{\n",
+      "    \"action\": \"PDBFileDownloader\",\n",
+      "    \"action_input\": \"8PFQ\"\n",
+      "}\n",
+      "```PDB file found with this ID: 8PFQ\n",
+      "Thought: Now that we have both PDB files downloaded, we can analyze their structures. Let's start by summarizing the protein structure for 8PFK.\n",
+      "\n",
+      "Action:\n",
+      "```\n",
+      "{\n",
+      "    \"action\": \"SummarizeProteinStructure\",\n",
+      "    \"action_input\": {\n",
+      "        \"traj_file\": \"8PFK_214903\",\n",
+      "        \"top_file\": \"8PFK_214903\"\n",
+      "    }\n",
+      "}\n",
+      "```Thought: We've successfully summarized the protein structure for 8PFK. Now, let's do the same for 8PFQ to compare them.\n",
+      "\n",
+      "Action:\n",
+      "```\n",
+      "{\n",
+      "    \"action\": \"SummarizeProteinStructure\",\n",
+      "    \"action_input\": {\n",
+      "        \"traj_file\": \"8PFQ_214905\",\n",
+      "        \"top_file\": \"8PFQ_214905\"\n",
+      "    }\n",
+      "}\n",
+      "```Thought: Now that we have the basic structural information for both proteins, let's analyze their secondary structures using the DSSP tool. We'll start with 8PFK.\n",
+      "\n",
+      "Action:\n",
+      "```\n",
+      "{\n",
+      "    \"action\": \"ComputeDSSP\",\n",
+      "    \"action_input\": {\n",
+      "        \"traj_file\": \"8PFK_214903\",\n",
+      "        \"top_file\": \"8PFK_214903\",\n",
+      "        \"target_frames\": \"all\"\n",
+      "    }\n",
+      "}\n",
+      "```Thought: I apologize for the confusion in my previous analysis. It seems that the DSSP tool couldn't properly assign secondary structures to 8PFK. This could be due to various reasons, such as the file format or content. Let's try to analyze the secondary structure of 8PFQ to see if we encounter the same issue.\n",
+      "\n",
+      "Action:\n",
+      "```\n",
+      "{\n",
+      "    \"action\": \"ComputeDSSP\",\n",
+      "    \"action_input\": {\n",
+      "        \"traj_file\": \"8PFQ_214905\",\n",
+      "        \"top_file\": \"8PFQ_214905\",\n",
+      "        \"target_frames\": \"all\"\n",
+      "    }\n",
+      "}\n",
+      "```Thought: It appears that we're encountering issues with the DSSP analysis for both proteins. This could be due to the file format or content not being compatible with the DSSP tool. Given this limitation, we should focus on the information we have successfully obtained and provide a comparison based on that. Let's summarize our findings and provide a final answer.\n",
+      "\n",
+      "Action:\n",
+      "```\n",
+      "{\n",
+      "  \"action\": \"Final Answer\",\n",
+      "  \"action_input\": \"After analyzing the PDB files for 8PFK and 8PFQ, we can provide a comparison of their structures based on the available information:\n",
+      "\n",
+      "1. Number of atoms:\n",
+      "   - 8PFK: 487 atoms\n",
+      "   - 8PFQ: 950 atoms\n",
+      "\n",
+      "2. Number of residues:\n",
+      "   - 8PFK: 109 residues\n",
+      "   - 8PFQ: 194 residues\n",
+      "\n",
+      "3. Number of chains:\n",
+      "   - 8PFK: 2 chains\n",
+      "   - 8PFQ: 6 chains\n",
+      "\n",
+      "4. Number of frames:\n",
+      "   - Both proteins have 1 frame, indicating they are static structures\n",
+      "\n",
+      "5. Number of bonds:\n",
+      "   - 8PFK: 420 bonds\n",
+      "   - 8PFQ: 833 bonds\n",
+      "\n",
+      "Unfortunately, we were unable to obtain detailed information about the secondary structures (helices, strands, and coils) for either protein using the DSSP tool. This could be due to issues with the file format or content.\n",
+      "\n",
+      "Comparing the two proteins:\n",
+      "1. 8PFQ is significantly larger than 8PFK, with almost twice the number of atoms and residues.\n",
+      "2. 8PFQ has more chains (6) compared to 8PFK (2), suggesting a more complex quaternary structure.\n",
+      "3. The number of bonds in 8PFQ (833) is proportionally higher than in 8PFK (420), which is consistent with its larger size.\n",
+      "\n",
+      "To get more detailed information about the secondary structures, we might need to use alternative tools or methods, or investigate why the DSSP tool was unable to process these particular PDB files. It's possible that the files may need some preprocessing or that they contain non-standard residues or other structural features that the tool couldn't interpret correctly.\"\n",
+      "}\n",
+      "```"
+     ]
+    }
+   ],
+   "source": [
+    "answer = agent.run(prompt11)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "5d1f0c85-88d6-4d2a-aae1-6a2d763f280d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "ckpt_26\n",
+      "Files found in registry: 8PFK_214903: PDB file downloaded from RSCB\n",
+      " PDBFile ID: 8PFK_214903\n",
+      " 8PFQ_214905: PDB file downloaded from RSCB\n",
+      " PDBFile ID: 8PFQ_214905\n",
+      " rec0_214912: dssp values for trajectory with id: 8PFK_214903\n",
+      " rec0_214914: dssp values for trajectory with id: 8PFQ_214905\n"
+     ]
+    }
+   ],
+   "source": [
+    "registry = agent.path_registry\n",
+    "print(os.path.basename(registry.ckpt_dir))\n",
+    "paths_and_descriptions = registry.list_path_names_and_descriptions()\n",
+    "print(\"\\n\".join(paths_and_descriptions.split(\",\")))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "2ab4e124-2086-46ab-b747-28f6aebb850e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import re\n",
+    "import os\n",
+    "match = re.search(r\"8PFK_\\d+\", paths_and_descriptions)\n",
+    "file_id1 = match.group(0)\n",
+    "path1 = registry.get_mapped_path(file_id1)\n",
+    "assert os.path.exists(path1)\n",
+    "\n",
+    "match = re.search(r\"8PFQ_\\d+\", paths_and_descriptions)\n",
+    "file_id2 = match.group(0)\n",
+    "path2 = registry.get_mapped_path(file_id2)\n",
+    "assert os.path.exists(path2)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "5c9a6ace-69e8-4042-9d35-ca598f4d00c2",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "protein 8PFK\n",
+      "Number of chains:  2\n",
+      "Number of atoms:  487\n",
+      "Number of residues in sheets:  0\n",
+      "Number of residues in helices:  0\n",
+      "Number of residues in coils:  0\n",
+      "\n",
+      "protein 8PFQ\n",
+      "Number of chains:  6\n",
+      "Number of atoms:  950\n",
+      "Number of residues in sheets:  0\n",
+      "Number of residues in helices:  0\n",
+      "Number of residues in coils:  0\n"
+     ]
+    }
+   ],
+   "source": [
+    "import mdtraj as md\n",
+    "traj = md.load(path1)\n",
+    "top = traj.topology\n",
+    "number_of_chains = top.n_chains\n",
+    "number_of_atoms = top.n_atoms\n",
+    "print('protein 8PFK')\n",
+    "print('Number of chains: ', number_of_chains)\n",
+    "print('Number of atoms: ', number_of_atoms)\n",
+    "secondary_structure = md.compute_dssp(traj,simplified=True)\n",
+    "print(\"Number of residues in sheets: \",len([i for i in secondary_structure[0] if i == 'E']))\n",
+    "print(\"Number of residues in helices: \",len([i for i in secondary_structure[0] if i == 'H']))\n",
+    "print(\"Number of residues in coils: \",len([i for i in secondary_structure[0] if i == 'C']))\n",
+    "\n",
+    "traj = md.load(path2)\n",
+    "top = traj.topology\n",
+    "number_of_chains = top.n_chains\n",
+    "number_of_atoms = top.n_atoms\n",
+    "print('\\nprotein 8PFQ')\n",
+    "print('Number of chains: ', number_of_chains)\n",
+    "print('Number of atoms: ', number_of_atoms)\n",
+    "secondary_structure = md.compute_dssp(traj,simplified=True)\n",
+    "print(\"Number of residues in sheets: \",len([i for i in secondary_structure[0] if i == 'E']))\n",
+    "print(\"Number of residues in helices: \",len([i for i in secondary_structure[0] if i == 'H']))\n",
+    "print(\"Number of residues in coils: \",len([i for i in secondary_structure[0] if i == 'C']))"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "mda-aug20",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
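
Note on the last cell of `exp_11.ipynb`: the sheet, helix, and coil counts are all zero for both 8PFK and 8PFQ, while the cell only tallies the codes `'E'`, `'H'`, and `'C'`. With `simplified=True`, `md.compute_dssp` also emits the code `'NA'` for residues that are not protein residues, so one possible explanation (not confirmed by anything in this notebook) is that every residue landed in that fourth bucket. The sketch below counts all four codes so that such a case is visible instead of showing up as three zeros. The helper name `tally_dssp`, the `LABELS` mapping, and the sample frame are hypothetical and for illustration only; the sample is not data from either PDB entry.

```python
from collections import Counter

# Simplified DSSP codes as emitted by mdtraj's compute_dssp(..., simplified=True).
# 'NA' marks residues that are not protein residues.
LABELS = {"H": "helix", "E": "strand", "C": "coil", "NA": "not protein"}


def tally_dssp(codes):
    """Count every simplified DSSP code in a single frame, including 'NA'."""
    counts = Counter(str(code) for code in codes)
    return {label: counts.get(code, 0) for code, label in LABELS.items()}


# Hypothetical frame for illustration; NOT taken from 8PFK or 8PFQ.
example_frame = ["H", "H", "E", "C", "NA", "NA", "NA"]
print(tally_dssp(example_frame))
```

In the notebook this would be called as `tally_dssp(md.compute_dssp(traj, simplified=True)[0])`, assuming `traj` is loaded as in the last cell. A large "not protein" count would point to the structure's composition rather than to a file-format problem.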