This is a Retrieval-Augmented Generation (RAG)-based multi-modal AI assistant that uses several underlying models to provide intelligent, context-aware responses to text, code, image, and voice input. It offers the following capabilities:
- Text Assistance: Handles general text-based queries.
- Code Assistance: Helps with coding and code-related queries.
- Image Analysis: Analyzes and describes images.
- Voice Recognition: Converts spoken language into text.
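As a rough illustration of how these four modalities could be dispatched, here is a hypothetical sketch. The real routing lives in `agent/llm_agent.py` and the `chains/` modules; the handler names returned below are illustrative labels only, not the project's actual API.

```python
# Hypothetical modality router; handler names are illustrative only.
def route_query(query):
    """Pick an assistant based on which modality the input carries."""
    if query.get("image") is not None:
        return "vision_assistant"      # Image Analysis
    if query.get("audio") is not None:
        return "whisper_asr"           # Voice Recognition
    if "```" in query.get("text", "") or query.get("is_code"):
        return "code_assistant"        # Code Assistance
    return "language_assistant"        # Text Assistance
```

A text-only query falls through to the language assistant, while attached image or audio payloads take precedence over the text channel.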
```
Generative agent/
├── models/
│   ├── llama.py
│   ├── phi_vision.py
│   ├── granite.py
│   └── whisper_asr.py
├── chains/
│   ├── language_assistant.py
│   ├── code_assistant.py
│   └── vision_assistant.py
├── utils/
│   └── image_processor.py
├── agent/
│   ├── tools/
│   │   └── uml_to_code.py
│   ├── prompt_templates.py
│   └── llm_agent.py
└── app.py
```
- Python 3.8 or higher
- Streamlit
- Required Python packages listed in `requirements.txt`
- Clone the repository:

  ```bash
  git clone https://github.com/ganeshnehru/RAG-Multi-Modal-Generative-AI-Agent.git
  cd RAG-Multi-Modal-Generative-AI-Agent
  ```
- Create a virtual environment and activate it:

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  ```
- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
- Set up environment variables:
  - Create a `.env` file in the root directory.
  - Add your `NVIDIA_API_KEY` and `OPENAI_API_KEY`.
- Run the Streamlit application:

  ```bash
  streamlit run app.py
  ```
- Open your browser and navigate to the provided URL to interact with Agent-Nesh.
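At startup, the application needs the API keys from the `.env` file created above. The following is a minimal, dependency-free sketch of that loading step; it is a simplified stand-in for a library such as python-dotenv, and `app.py` may load its keys differently.

```python
import os
from pathlib import Path

def load_dotenv(path=".env"):
    """Minimal stand-in for python-dotenv: parse KEY=VALUE lines into os.environ."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        # Skip blanks, comments, and malformed lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Real environment variables take precedence over .env entries.
        os.environ.setdefault(key.strip(), value.strip())

def require_key(name):
    """Fail fast with a clear message if a required API key is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value
```

With this in place, `require_key("NVIDIA_API_KEY")` and `require_key("OPENAI_API_KEY")` would surface a configuration mistake immediately instead of failing later on an API call.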
- Text Queries: Type your text queries in the provided input box and get responses from the language model.
- Code Assistance: Enter your coding queries to receive code assistance.
- Image Analysis: Upload images for analysis and description.
- Voice Input: Use the voice input feature to transcribe spoken language into text.
- E-Mail: [email protected]
- Telegram: @bettyjk_0915