Senior Data Scientist | GenAI & ML Systems | Team Lead
*Designing AI systems that scale and teams that ship.*
- Lead development of GenAI systems using LLMs, RAG, and agentic pipelines
- Architect ML solutions from experimentation to scalable production
- Drive data product strategy and cross-functional execution in e-commerce
- Mentor data teams and implement best practices for ML/AI delivery
- Translate complex business problems into real-world AI products
Two-Stage RAG for Document QA
🟢 Production Ready - Backend + Frontend (App)
A scalable Retrieval-Augmented Generation (RAG) pipeline built on two consecutive retrieval stages: keyword search and semantic search.
This approach improves precision and reduces computational cost, cutting retrieval overhead by over 75% for enterprise-scale QA.
Read More: Two-Stage Consecutive RAG for Document QA on Medium
▶️ Access the Application (if in sleep mode, click the "Wake Up" button)
Tech: RAG, Sentence Transformers, Cross-Encoder Reranker, LangChain, ChromaDB, Docker, Poetry, Streamlit
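To make the two-stage idea concrete, here is a minimal, dependency-free sketch. It assumes stage 1 ranks documents by simple query-term overlap and stage 2 re-scores only the surviving candidates. A toy bag-of-words cosine stands in for the Sentence Transformers embeddings, and the cross-encoder reranker is omitted. The function names and the `keyword_k`/`final_k` parameters are hypothetical, not the project's actual code.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_filter(query: str, docs: list[str], top_k: int) -> list[int]:
    """Stage 1: cheap lexical pass keeping up to top_k docs that share query terms."""
    q_terms = set(tokenize(query))
    scores = [(len(q_terms & set(tokenize(d))), i) for i, d in enumerate(docs)]
    # Highest overlap first; ties broken by document index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for score, i in scores[:top_k] if score > 0]


def embed(text: str) -> Counter:
    """Assumption: a bag-of-words vector stands in for a dense sentence embedding."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def two_stage_retrieve(query: str, docs: list[str], keyword_k: int = 3, final_k: int = 1) -> list[int]:
    """Stage 2 scores only the stage-1 candidates, never the full corpus."""
    candidates = keyword_filter(query, docs, keyword_k)
    q_vec = embed(query)
    ranked = sorted(candidates, key=lambda i: cosine(q_vec, embed(docs[i])), reverse=True)
    return ranked[:final_k]
```

The saving comes from the second stage touching only the keyword-filtered candidates instead of every document; in a real pipeline that second stage is the expensive embedding or cross-encoder call, so shrinking its input is where the overhead drops.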
LLM Agents for Clinical Trials
🔵 Development Notebooks
Agentic LLM pipeline automating clinical trial eligibility and patient matching.
Includes data analysis, compliance verification, hallucination grading, and human-in-the-loop workflows.
Tech: LangGraph, OpenAI, Agentic, Tool-calling, Pydantic, Gradio
LLM Tutorials & Applications
🔵 Development Notebooks - Educational
A collection of practical LLM architectures and end-to-end notebooks featuring carefully selected case studies across domains such as healthcare, customer support, and product search.
Includes RAG, tool-using agents, clinical trial retrieval, chatbot workflows, and document QA with real-world data sources.
Tech: OpenAI, RAG, LangChain, ChromaDB, Pinecone, Streamlit
Social Sphere: Student Social-Media Analytics
🟡 Ongoing
As the team lead for this SuperDataScience community project, I am spearheading efforts to predict relationship conflicts and self-reported addiction levels from digital behavior, segment students into behavior-based clusters, and visualize trends in usage intensity, platform preference, and mental well-being.
🗂️ Data Source: A recent Kaggle dataset of nearly 700 students aged 16–25, spanning high school to graduate programs across multiple countries, collected in Q1 2025.
🖥️ MLflow Dashboard on Dagshub
Tech: Python, Scikit-Learn, XGBoost, Regression, Clustering, Data Visualization, MLflow, SHAP
Energy Forecasting with SARIMAX
🟢 Production Ready - Backend + Frontend
Community project with SuperDataScience to forecast building energy consumption using a synthetic Kaggle dataset.
Built a SARIMAX pipeline with uncertainty-injected exogenous inputs (random walks) and time-series cross-validation.
App delivers EDA, forecast accuracy, and feature relevance visualizations.
▶️ Live App (if in sleep mode, click the "Wake Up" button)
Tech: Python, pandas, statsmodels, SARIMAX, Streamlit
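One possible reading of "uncertainty-injected exogenous inputs" is sketched below, as an assumption rather than the project's implementation: known exogenous drivers are perturbed with random-walk noise so errors accumulate over the forecast horizon, and evaluation uses expanding-window splits so training data always precedes test data. The helper names and parameters are hypothetical, and the SARIMAX fit itself is left out to keep the sketch dependency-free.

```python
import random
from collections.abc import Iterator


def random_walk_noise(n: int, step_sd: float, seed: int = 0) -> list[float]:
    """Cumulative sum of Gaussian steps: noise that drifts and grows with the horizon."""
    rng = random.Random(seed)
    walk, level = [], 0.0
    for _ in range(n):
        level += rng.gauss(0.0, step_sd)
        walk.append(level)
    return walk


def perturb_exog(exog: list[float], step_sd: float, seed: int = 0) -> list[float]:
    """Assumption: add random-walk noise to an exogenous series to mimic forecast error."""
    noise = random_walk_noise(len(exog), step_sd, seed)
    return [x + e for x, e in zip(exog, noise)]


def expanding_window_splits(
    n_obs: int, initial: int, horizon: int, step: int
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) pairs; the training window grows, test follows it."""
    start = initial
    while start + horizon <= n_obs:
        yield list(range(start)), list(range(start, start + horizon))
        start += step
```

In each fold, a model such as statsmodels' `SARIMAX` would then be fit on the training slice and asked to forecast the test horizon using the perturbed exogenous values, so the reported accuracy reflects imperfect knowledge of future drivers rather than perfect foresight.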
Data Science & ML Mini Tasks
🔵 Development Notebooks
Single-notebook projects showcasing applied machine learning and data science, including:
- Ad Response Prediction
- Material Strength Prediction
- Recipe Recommender
- Customer Satisfaction Classification
- Hotel Staff Size Prediction
Tech: Python, scikit-learn, XGBoost, Streamlit, pandas, matplotlib
- Guardrails in LLM Apps – Strategies for implementing ethical safeguards, ensuring compliance, and enhancing security in Large Language Model applications.
- LLM Model Selection and Updates – Guidelines for selecting appropriate Large Language Models and managing their updates to balance quality, cost, and scalability in AI applications.
- Two-Stage RAG for Document QA – A document question-answering approach that uses a two-stage retrieval strategy to improve precision and scalability in Retrieval-Augmented Generation systems.
- Data Engineers: The Unsung Heroes Behind AI – An exploration of the pivotal role data engineers play in AI development, emphasizing their contributions to data quality, infrastructure, and the overall success of data science teams.
Feel free to reach out for collaboration, leadership opportunities, or just to swap ideas on building better GenAI systems.
“Build AI that works — and teams that last.”