feat: Implement user input handling for codebase and output directories

- Add command-line interface (CLI) to prompt user for codebase and output directory paths
- Validate user input and handle directory creation if necessary
- Update .gitignore to exclude .venv and __pycache__ directories
- Optimize prompt in docs/Prompts.md for concise information extraction
- Remove outdated execution flow details from docs/ExecutionFlow.md
PriNova · May 2, 2024 · 6c0ff61
1 parent 94508b2
commit 6c0ff61

Showing 4 changed files with 50 additions and 47 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,2 @@
+.venv
+**/__pycache__
diff --git a/docs/ExecutionFlow.md b/docs/ExecutionFlow.md
@@ -1,48 +1,3 @@
-1. User Input:
-   - Prompt the user to provide the directory path of the codebase they want to analyze.
-
-2. Codebase Traversal:
-   - Recursively traverse the provided codebase directory and its subdirectories.
-   - Identify and collect all relevant source code files based on their file extensions.
-
-3. Code Analysis:
-   - For each source code file:
-     - Read the file contents.
-     - Extract relevant information such as module names, class names, function names, and their respective descriptions or comments.
-     - Identify dependencies, libraries, and frameworks used in the file.
-     - Store the extracted information in a structured format (e.g., a dictionary or a custom data structure).
-
-4. LLM Interaction:
-   - For each module, class, or function:
-     - Prepare a prompt for the LLM, including the extracted information and any relevant context.
-     - Send the prompt to the LLM via the OpenAI API.
-     - Receive the LLM's response, which should provide a concise description or summary of the module, class, or function.
-     - Store the LLM-generated descriptions alongside the corresponding code elements.
-
-5. Embedding Generation:
-   - For each module, class, or function:
-     - Generate a vector embedding using the extracted information and the LLM-generated description.
-     - Store the vector embedding in ChromaDB along with relevant metadata (e.g., file path, module name, class name, function name).
-
-6. Report Generation:
-   - High-level Overview:
-     - Retrieve the stored information and embeddings from ChromaDB.
-     - Generate a high-level overview of the codebase's architecture and structure based on the collected information.
-     - Include a summary of the modules, classes, and functions, along with their descriptions.
-   - Detailed Reports:
-     - Generate separate reports for each module, class, or function, providing more detailed information and descriptions.
-     - Include information about dependencies, libraries, and frameworks used.
-   - Save the generated reports in Markdown format within the codebase directory or a specified output directory.
-
-7. Error Handling:
-   - Implement error handling mechanisms to gracefully handle any exceptions or errors that may occur during the execution of the program.
-   - Display appropriate error messages to the user and terminate the program if necessary.
-
-8. CLI Interface:
-   - Implement a command-line interface (CLI) that allows users to specify the codebase directory they want to analyze.
-   - Provide options for generating different types of reports (e.g., high-level overview, detailed reports) based on user preferences.
-
-
 **Hierarchical Execution Flow**
 
 1. User Input:

diff --git a/docs/Prompts.md b/docs/Prompts.md
@@ -57,9 +57,9 @@
     - "deployment_instructions"
     - "additional_resources"
 
-    2. For each key, provide the corresponding information extracted from the documentation.
+    2. For each key, provide the corresponding information extracted from the documentation very briefly.
 
-    3. If any information is missing or couldn't be extracted, set the value of the corresponding key to "UNKNOWN".
+    3. If any information is missing, couldn't be extracted or is not known, set the value of the corresponding key to "UNKNOWN".
 
     4. Ensure that the JSON object is well-formatted, with proper indentation and syntax.
 

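The prompt rules in the hunk above (one value per key, `"UNKNOWN"` for anything missing, well-formed JSON) could also be enforced on the receiving side. The following is a minimal sketch, not part of this commit: `normalize_response` and `EXPECTED_KEYS` are hypothetical names, and only the two keys visible in the hunk are listed, since the rest of the key list is not shown here.

```python
import json

# Assumption: only these two keys are visible in the hunk above;
# the full key list lives in docs/Prompts.md.
EXPECTED_KEYS = ["deployment_instructions", "additional_resources"]


def normalize_response(raw_json, expected_keys=EXPECTED_KEYS):
    """Hypothetical helper: parse the model's JSON and fill gaps with "UNKNOWN"."""
    data = json.loads(raw_json)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    normalized = {}
    for key in expected_keys:
        value = data.get(key)
        # Treat absent keys, null values and blank strings alike.
        if value is None or (isinstance(value, str) and not value.strip()):
            value = "UNKNOWN"
        normalized[key] = value
    return normalized


example = normalize_response('{"deployment_instructions": "Run docker compose up."}')
print(json.dumps(example, indent=4))
```

Normalizing after parsing means a model that silently omits a key still yields the same shape downstream as one that follows rule 3.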
diff --git a/main.py b/main.py
@@ -0,0 +1,46 @@
+""" 1. The directory path of the codebase to be analyzed
+2. The output directory for the generated reports
+
+To implement this step, we'll break it down into smaller tasks:
+
+1. Create a command-line interface (CLI) for the program
+2. Prompt the user to enter the codebase directory path
+3. Validate the provided codebase directory path
+4. Prompt the user to enter the output directory path
+5. Validate the provided output directory path
+6. Store the user input for later use in the program """
+
+import argparse
+import os
+
+def validate_codebase_dir(codebase_dir):
+    if not os.path.exists(codebase_dir):
+        raise argparse.ArgumentTypeError(f"Codebase directory '{codebase_dir}' does not exist.")
+
+    git_dir = os.path.join(codebase_dir, '.git')
+    if not os.path.isdir(git_dir):
+        raise argparse.ArgumentTypeError(f"Codebase directory '{codebase_dir}' is not a local Git repository.")
+
+    return codebase_dir
+
+def main():
+    # Create a command-line interface (CLI) for the program
+    parser = argparse.ArgumentParser(description='CodyArchitect')
+    parser.add_argument('codebase_dir', type=validate_codebase_dir, help='Path to the codebase directory')
+    parser.add_argument('--output_dir', '-o', help='Path to the output directory for generated reports')
+
+    # Parse the command-line arguments (argparse reports invalid paths and exits)
+    args = parser.parse_args()
+
+    # Store the user input for later use in the program
+    codebase_dir = args.codebase_dir
+    output_dir = args.output_dir
+
+    # If the user did not provide an output directory path, default to one inside the codebase directory
+    if not output_dir:
+        output_dir = os.path.join(codebase_dir, '.codyarchitect')
+    # Create the output directory if it does not exist yet
+    os.makedirs(output_dir, exist_ok=True)
+
+if __name__ == '__main__':
+    main()
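
The argument handling in `main.py` can be exercised without a shell by passing an explicit argument list to `parse_args`. The sketch below is a standalone illustration rather than the committed code: `build_parser`, `resolve_dirs` and `demo` are hypothetical helpers, and the sketch creates the output directory whether or not `--output_dir` was supplied, which is an assumption about the intent of "handle directory creation if necessary" in the commit message.

```python
import argparse
import os
import tempfile


def validate_codebase_dir(codebase_dir):
    """Mirror the checks in main.py: the path must exist and contain a .git directory."""
    if not os.path.exists(codebase_dir):
        raise argparse.ArgumentTypeError(
            f"Codebase directory '{codebase_dir}' does not exist.")
    if not os.path.isdir(os.path.join(codebase_dir, ".git")):
        raise argparse.ArgumentTypeError(
            f"Codebase directory '{codebase_dir}' is not a local Git repository.")
    return codebase_dir


def build_parser():
    """Hypothetical helper: build the same two-argument parser as main()."""
    parser = argparse.ArgumentParser(description="CodyArchitect")
    parser.add_argument("codebase_dir", type=validate_codebase_dir,
                        help="Path to the codebase directory")
    parser.add_argument("--output_dir", "-o",
                        help="Path to the output directory for generated reports")
    return parser


def resolve_dirs(argv):
    """Hypothetical helper: parse argv and return (codebase_dir, output_dir)."""
    args = build_parser().parse_args(argv)
    # Fall back to <codebase_dir>/.codyarchitect when no output directory is given.
    output_dir = args.output_dir or os.path.join(args.codebase_dir, ".codyarchitect")
    # Assumption: create the directory in both cases, not only for the default.
    os.makedirs(output_dir, exist_ok=True)
    return args.codebase_dir, output_dir


def demo():
    """Run the parser against a throwaway directory that looks like a Git checkout."""
    with tempfile.TemporaryDirectory() as repo:
        os.mkdir(os.path.join(repo, ".git"))
        codebase_dir, output_dir = resolve_dirs([repo])
        return (codebase_dir == repo,
                os.path.isdir(output_dir),
                os.path.basename(output_dir))
```

Because the validator raises `argparse.ArgumentTypeError`, argparse turns a bad path into a usage message and a `SystemExit` instead of a traceback.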