Senior Data Scientist | GenAI & ML Systems | Team Lead
*Designing AI systems that scale and teams that ship.*
Feel free to reach out for collaboration, professional opportunities, or to swap ideas on building better GenAI systems.
📫 Email • LinkedIn • Website
- Lead development of GenAI systems using LLMs, RAG, and agentic pipelines
- Architect ML solutions from experimentation to scalable production
- Drive data product strategy and cross-functional execution in e-commerce
- Mentor data teams and implement best practices for ML/AI delivery
- Translate complex business problems into real-world AI products
Problem: Manual call logging was time-consuming and error-prone.
Solution: Developed a generative AI system summarizing 2,000+ calls/month using LLMs and audio transcripts.
Impact: Increased productivity and customer satisfaction.
Backend Tech: OpenAI, LangChain, AWS Lambda, AWS S3, AWS CloudWatch, AWS SAM, Audio Preprocessing, Prompt Engineering, SQL
DevOps: CLI, Bash
Problem: Business teams lacked real-time insights from complex data.
Solution: Built an LLM-powered dashboard enabling natural-language querying of business metrics.
Impact: Accelerated insight generation and decision-making.
Backend Tech: LangChain, Google Cloud, LangGraph, SQL, Pandas, Prompt Engineering
Frontend Tech: Data Visualization, Streamlit
Problem: Retention teams struggled to identify at-risk customers.
Solution: Built predictive models using behavior and transaction data.
Impact: Reduced churn by 20% via proactive outreach.
Backend Tech: XGBoost, Scikit-learn, SHAP
Frontend Tech: Data Visualization, Google Cloud, MLflow
Problem: Users struggled to find relevant products in a large portfolio.
Solution: Developed a collaborative and content-based recommendation system for a 55K-product catalog.
Impact: Boosted sales and improved user engagement.
Backend Tech: FastAPI, Matrix Factorization, Pandas, PySpark
DevOps: Docker, MLflow
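
The idea of blending collaborative and content-based signals can be sketched in a few lines of Python. This is an illustrative toy, not the project's code: it uses item-to-item similarity rather than matrix factorization, and the data, the function names (`collaborative_sim`, `content_sim`, `recommend`), and the `weight` parameter are all assumptions made for the example.

```python
import math

# Hypothetical toy data: which users bought each product, and each product's tags.
purchases = {
    "tent": {"ana", "ben", "cy"},
    "sleeping_bag": {"ana", "ben"},
    "stove": {"cy"},
    "novel": {"dee"},
}
tags = {
    "tent": {"outdoor", "camping"},
    "sleeping_bag": {"outdoor", "camping", "bedding"},
    "stove": {"outdoor", "cooking"},
    "novel": {"books"},
}


def collaborative_sim(a, b):
    """Cosine similarity between the sets of users who bought items a and b."""
    overlap = len(purchases[a] & purchases[b])
    return overlap / math.sqrt(len(purchases[a]) * len(purchases[b]))


def content_sim(a, b):
    """Jaccard similarity between the tag sets of items a and b."""
    return len(tags[a] & tags[b]) / len(tags[a] | tags[b])


def recommend(item, weight=0.5, top_k=2):
    """Rank other items by a weighted blend of both similarity signals."""
    scores = {
        other: weight * collaborative_sim(item, other)
        + (1 - weight) * content_sim(item, other)
        for other in purchases
        if other != item
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


similar = recommend("tent")  # ['sleeping_bag', 'stove']
```

Setting `weight` closer to 0 leans on content similarity, which helps for new products with few purchases; values closer to 1 lean on purchase behavior.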
🟢 Production Ready - Backend + Frontend
Problem: Companies using AI for document search often suffer from poor precision or high compute costs. Traditional RAG systems either retrieve irrelevant content or waste resources.
Data: Enterprise documents used in QA systems, requiring accurate and scalable semantic search.
Solution: Designed a two-stage retrieval pipeline combining fast keyword search with precise semantic reranking, using Sentence Transformers and Cross-Encoders. Built with LangChain, ChromaDB, and Docker.
Impact:
- Cut retrieval overhead by 75%
- Boosted precision and reduced latency for real-world document QA use cases
- Live demo deployed with a frontend for stakeholders and clients
Backend Tech: RAG, Sentence Transformers, Cross-Encoder Reranker, LangChain, ChromaDB, Poetry
Frontend Tech: Streamlit
DevOps: Docker, Modal
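
The two-stage pattern described above (a cheap shortlist, then a more expensive rerank of only that shortlist) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: `keyword_score` and `rerank_score` are hypothetical stand-ins for the keyword retriever and the Cross-Encoder reranker, and the documents are made up.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


def keyword_score(query, doc):
    """Stage 1 (cheap): count distinct query tokens that appear in the document."""
    doc_tokens = set(tokenize(doc))
    return sum(1 for tok in set(tokenize(query)) if tok in doc_tokens)


def rerank_score(query, doc):
    """Stage 2 (costlier): cosine similarity over term counts.

    Hypothetical stand-in for a Cross-Encoder, which would score the
    (query, document) pair jointly with a neural model.
    """
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def two_stage_retrieve(query, docs, shortlist_size=3, top_k=1):
    """Shortlist with the cheap scorer, then rerank only the shortlist."""
    shortlist = sorted(docs, key=lambda d: keyword_score(query, d), reverse=True)
    shortlist = shortlist[:shortlist_size]
    return sorted(shortlist, key=lambda d: rerank_score(query, d), reverse=True)[:top_k]


docs = [
    "refund policy for damaged items",
    "shipping times for international orders",
    "how to request a refund for a late order",
    "careers and open positions",
]
best = two_stage_retrieve("refund for late order", docs)
```

The cost saving comes from `shortlist_size`: the expensive scorer runs on a handful of candidates instead of the whole corpus.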
🟢 Production Ready - Backend + Frontend
Problem: Matching patients to clinical trials is complex, time-consuming, and prone to regulatory risks, often requiring human review of hundreds of eligibility criteria.
Data: Clinical trial criteria and patient profiles, with structured and unstructured medical data.
Solution: Built an agentic LLM workflow using LangGraph to automate trial eligibility screening, hallucination detection, and compliance checks. Integrated human-in-the-loop review and tool calling.
Impact:
- Improved review speed and consistency
- Reduced manual effort in preliminary trial screening
- Framework extensible to other regulated domains like insurance or finance
Backend Tech: LangGraph, OpenAI, Agentic, Tool-calling, Pydantic
Frontend Tech: Gradio
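
The human-in-the-loop routing mentioned above can be illustrated with a small rule-based sketch. It is a simplified assumption for illustration only: the real workflow uses an LLM agent, whereas here `Criterion`, `screen`, the thresholds, and the patient records are all hypothetical, and the values carry no clinical meaning.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """A numeric eligibility rule: the patient's value must lie in [minimum, maximum]."""
    name: str
    key: str
    minimum: float
    maximum: float


def screen(patient, criteria):
    """Return (decision, criterion_names) for one patient record.

    A failed criterion rejects outright; missing data is never guessed
    and is routed to a human reviewer instead.
    """
    failed, missing = [], []
    for c in criteria:
        value = patient.get(c.key)
        if value is None:
            missing.append(c.name)
        elif not c.minimum <= value <= c.maximum:
            failed.append(c.name)
    if failed:
        return "ineligible", failed
    if missing:
        return "needs_human_review", missing
    return "eligible", []


# Made-up criteria and patients, for illustration only.
criteria = [
    Criterion("adult", "age", 18, 75),
    Criterion("kidney function", "egfr", 60, 200),
]
decisions = [
    screen({"age": 54, "egfr": 88}, criteria),
    screen({"age": 16, "egfr": 90}, criteria),
    screen({"age": 40}, criteria),
]
```

Returning the names of the failed or missing criteria alongside the decision gives the reviewer an audit trail, which matters in regulated settings.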
🔵 Development Notebooks - Educational
A collection of practical LLM architectures and end-to-end notebooks featuring carefully selected case studies across domains like healthcare, customer support, and product search.
Includes RAG, tool-using agents, clinical trial retrieval, chatbot workflows, and document QA with real-world data sources.
GitHub Repo
Backend Tech: OpenAI, RAG, LangChain, ChromaDB, Pinecone
Frontend Tech: Streamlit
Social Sphere: Student Behavior Analytics (SuperDataScience community project)
🟢 Production Ready - Backend + Frontend
Problem: Educators and psychologists seek to understand how digital behavior affects students' mental health, relationships, and academic performance.
Data: Survey of ~700 students aged 16–25 across multiple countries (Kaggle Q1 2025 dataset). Features include screen time, platform usage, conflicts, sleep, and well-being.
Solution: Led this SuperDataScience community project to predict social-media addiction scores and relationship conflicts. Conducted comprehensive data exploration to uncover key behavioral patterns and insights.
Impact:
- Accurately predicted student addiction levels with 1% error
- Flagged at-risk relationship conflicts with 99% sensitivity (Recall)
- Revealed critical insights into student behavior, such as the impact of daily screen time and platform usage on mental health and relationships
Backend Tech: Python, Scikit-Learn, XGBoost, Regression, Data Visualization, MLflow, SHAP
Frontend Tech: Streamlit
DevOps: Modal, MLflow, Dagshub
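
For readers unfamiliar with the sensitivity (recall) metric quoted above: it is the share of truly at-risk cases that the model flags. A minimal sketch with made-up labels follows; the `recall` helper and the data are assumptions for illustration and do not reproduce the project's results.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up labels: 1 = at-risk, 0 = not at-risk.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
score = recall(y_true, y_pred)  # 3 of 4 at-risk cases flagged -> 0.75
```

High recall alone does not rule out many false alarms, so it is usually read together with precision.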
Energy Forecasting with SARIMAX (SuperDataScience community project)
🟢 Production Ready - Backend + Frontend
Problem: Facility managers and sustainability teams need reliable short-term energy forecasts to optimize planning and reduce costs. However, real-world energy consumption is volatile and only weakly correlated with exogenous drivers like temperature.
Data: Synthetic building energy dataset (Kaggle) with hourly and daily electricity usage, temperature, humidity, and occupancy variables.
Solution: Built a time-series forecasting pipeline using SARIMAX, including:
- Time-series CV, ADF tests, and outlier detection for robust preprocessing
- Simulation of noisy exogenous inputs via random walks to mimic real-world uncertainty
- SARIMAX trained and benchmarked against ARIMA, achieving R² ≈ 0.33 despite injected noise
Backend Tech: Python, pandas, statsmodels, SARIMAX
Frontend Tech: Streamlit
DevOps: Modal
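
The random-walk noise injection mentioned above can be sketched with the standard library alone. This is an assumption about how such noise might be simulated, not the project's code: `random_walk_noise`, `add_noise`, the `step_std` value, and the temperature readings are all hypothetical.

```python
import random


def random_walk_noise(n_steps, step_std, seed=0):
    """Return a random walk: the running sum of independent Gaussian steps."""
    rng = random.Random(seed)
    level, walk = 0.0, []
    for _ in range(n_steps):
        level += rng.gauss(0.0, step_std)
        walk.append(level)
    return walk


def add_noise(series, step_std, seed=0):
    """Add random-walk noise to an exogenous series such as temperature."""
    noise = random_walk_noise(len(series), step_std, seed)
    return [value + drift for value, drift in zip(series, noise)]


# Made-up hourly temperatures (degrees Celsius), for illustration only.
temperature = [18.0, 18.5, 19.2, 20.1, 21.0, 21.4]
noisy_temperature = add_noise(temperature, step_std=0.3, seed=42)
```

Unlike independent noise, a random walk drifts over time, so the error grows with the horizon, much like a real weather forecast. The noisy series could then be passed as the `exog` argument of statsmodels' `SARIMAX` in place of the clean one.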
🔵 Development Notebooks
Single-notebook projects showcasing applied machine learning and data science, including:
- Ad Response Prediction
- Material Strength Prediction
- Recipe Recommender
- Customer Satisfaction Classification
- Hotel Staff Size Prediction
Backend Tech: Python, scikit-learn, XGBoost, pandas, matplotlib, Jupyter
Frontend Tech: Streamlit
- Guardrails in LLM Apps: Strategies for implementing ethical safeguards, ensuring compliance, and enhancing security in Large Language Model applications.
- LLM Model Selection and Updates: Guidelines for selecting appropriate Large Language Models and managing their updates to balance quality, cost, and scalability in AI applications.
- Two-Stage RAG for Document QA: An innovative approach to document-based question answering using a two-stage retrieval strategy to enhance precision and scalability in Retrieval-Augmented Generation systems.
- Data Engineers: The Unsung Heroes Behind AI. An exploration of the pivotal role data engineers play in AI development, emphasizing their contributions to data quality, infrastructure, and the overall success of data science teams.
“Build AI that works, and teams that last.”