bab-git/README.md

Bob Hosseini

Senior Data Scientist | GenAI & ML Systems | Team Lead
*Designing AI systems that scale and teams that ship.*

💬 Let’s Connect

Feel free to reach out for collaboration, professional opportunities, or to swap ideas on building better GenAI systems.
📫 Email • 🔗 LinkedIn • 🌐 Website


🔧 What I Do

  • Lead development of GenAI systems using LLMs, RAG, and agentic pipelines
  • Architect ML solutions from experimentation to scalable production
  • Drive data product strategy and cross-functional execution in e-commerce
  • Mentor data teams and implement best practices for ML/AI delivery
  • Translate complex business problems into real-world AI products


🏢 Real-World AI Systems (Professional)

📞 LLM-Based Call Summarization

Problem: Manual call logging was time-consuming and error-prone.
Solution: Developed a generative AI system summarizing 2,000+ calls/month using LLMs and audio transcripts.
Impact: Increased productivity and customer satisfaction.
Backend Tech: OpenAI, LangChain, AWS Lambda, AWS S3, AWS CloudWatch, AWS SAM, Audio Preprocessing, Prompt Engineering, SQL
DevOps: CLI, Bash


📊 Data Analytics Assistant

Problem: Business teams lacked real-time insights from complex data.
Solution: Built an LLM-powered dashboard enabling natural-language querying of business metrics.
Impact: Accelerated insight generation and decision-making.
Backend Tech: LangChain, Google Cloud, LangGraph, SQL, Pandas, Prompt Engineering
Frontend Tech: Data Visualization, Streamlit


🔄 Customer Churn Prediction

Problem: Retention teams struggled to identify at-risk customers.
Solution: Built predictive models using behavior and transaction data.
Impact: Reduced churn by 20% via proactive outreach.
Backend Tech: XGBoost, Scikit-learn, SHAP
Frontend Tech: Data Visualization, Google Cloud, MLflow


🛒 Product Recommendation Engine

Problem: Users struggled to find relevant products in a large portfolio.
Solution: Developed a recommendation system combining collaborative and content-based filtering for a 55K-product catalog.
Impact: Boosted sales and improved user engagement.
Backend Tech: FastAPI, Matrix Factorization, Pandas, PySpark
DevOps: Docker, MLflow
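
As a rough illustration of the matrix-factorization idea listed in the stack above, here is a minimal pure-Python sketch. It covers only the collaborative half of the approach. The function names (`train_mf`, `predict`, `mse`, `recommend`), the hyperparameters, and the toy ratings are all assumptions made for illustration and are not taken from the production system, which is not public.

```python
import random


def predict(users, items, u, i):
    """Predicted score: dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(users[u], items[i]))


def train_mf(ratings, n_users, n_items, n_factors=2, lr=0.05, reg=0.02,
             epochs=200, seed=0):
    """Learn user and item factors with plain SGD on the observed ratings."""
    rng = random.Random(seed)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(users, items, u, i)
            for f in range(n_factors):
                pu, qi = users[u][f], items[i][f]
                # Gradient step with L2 regularization on both factor vectors.
                users[u][f] += lr * (err * qi - reg * pu)
                items[i][f] += lr * (err * pu - reg * qi)
    return users, items


def mse(users, items, ratings):
    """Mean squared error over the observed ratings."""
    return sum((r - predict(users, items, u, i)) ** 2 for u, i, r in ratings) / len(ratings)


def recommend(users, items, u, seen, top_n=1):
    """Rank the items user `u` has not interacted with by predicted score."""
    unseen = [i for i in range(len(items)) if i not in seen]
    return sorted(unseen, key=lambda i: -predict(users, items, u, i))[:top_n]


# Made-up (user, item, rating) triples, purely for illustration.
RATINGS = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 5.0), (2, 2, 2.0), (3, 0, 1.0), (3, 2, 5.0)]

untrained = train_mf(RATINGS, n_users=4, n_items=3, epochs=0)
trained = train_mf(RATINGS, n_users=4, n_items=3)
initial_error = mse(*untrained, RATINGS)
final_error = mse(*trained, RATINGS)
top_for_user_0 = recommend(*trained, u=0, seen={0, 1})
```

At catalog scale this loop would normally be replaced by a distributed or vectorized implementation; the sketch only shows the shape of the computation.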



🧠 Generative AI Projects (Personal)

πŸ”Two-Stage RAG for Document QA

🟢 Production Ready - Backend + Frontend

Problem: Companies using AI for document search often suffer from poor precision or high compute costs. Traditional RAG systems either retrieve irrelevant content or waste resources.

Data: Enterprise documents used in QA systems, requiring accurate and scalable semantic search.

Solution: Designed a two-stage retrieval pipeline combining fast keyword search with precise semantic reranking, using Sentence Transformers and Cross-Encoders. Built with LangChain, ChromaDB, and Docker.
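
The two-stage idea can be sketched in a few lines of standard-library Python. This is a simplified illustration, not the project's code: the helper names (`keyword_filter`, `rerank`, `two_stage_search`) and the sample documents are hypothetical, and a bag-of-words cosine score stands in for the Cross-Encoder reranker so the example runs without any model downloads.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_filter(query, docs, keep):
    """Stage 1: cheap keyword overlap; keeps at most `keep` candidate indices."""
    q_terms = set(tokenize(query))
    scored = [(len(q_terms & set(tokenize(d))), i) for i, d in enumerate(docs)]
    # Drop documents with no shared terms, then sort by overlap (ties keep input order).
    scored = [(s, i) for s, i in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:keep]]


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(query, docs, candidates, top_k):
    """Stage 2: costlier scorer, applied only to the stage-1 survivors.

    Assumption: bag-of-words cosine is a stand-in for a cross-encoder here.
    """
    q_vec = Counter(tokenize(query))
    scored = [(cosine(q_vec, Counter(tokenize(docs[i]))), i) for i in candidates]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:top_k]]


def two_stage_search(query, docs, keep=3, top_k=1):
    """Run the cheap filter first, then rerank only what it kept."""
    candidates = keyword_filter(query, docs, keep)
    return [docs[i] for i in rerank(query, docs, candidates, top_k)]


# Hypothetical documents, purely for illustration.
DOCS = [
    "Refund policy: customers may return items within 30 days.",
    "Shipping times vary by region and carrier.",
    "Refund requests for digital items are reviewed by support staff within five business days of purchase.",
    "Our office is closed on public holidays.",
]

result = two_stage_search("refund policy for returned items", DOCS)
```

The design point is that the expensive scorer only ever sees the handful of candidates the cheap filter lets through, which is where the compute saving comes from.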

Impact:

  • Cut retrieval overhead by 75%
  • Boosted precision and reduced latency for real-world document QA use cases
  • Live demo deployed with a frontend for stakeholders and clients

▶️ Live MVP • Medium Writeup • GitHub
Backend Tech: RAG, Sentence Transformers, Cross-Encoder Reranker, LangChain, ChromaDB, Poetry
Frontend Tech: Streamlit
DevOps: Docker, Modal


🧬 LLM Agents for Clinical Trials

🟢 Production Ready - Backend + Frontend

Problem: Matching patients to clinical trials is complex, time-consuming, and prone to regulatory risks, often requiring human review of hundreds of eligibility criteria.

Data: Clinical trial criteria and patient profiles, with structured and unstructured medical data.

Solution: Built an agentic LLM workflow using LangGraph to automate trial eligibility screening, hallucination detection, and compliance checks. Integrated human-in-the-loop review and tool calling.

Impact:

  • Improved review speed and consistency
  • Reduced manual effort in preliminary trial screening
  • Framework extensible to other regulated domains like insurance or finance

▶️ Live MVP • 🔗 GitHub Repo
Backend Tech: LangGraph, OpenAI, Agentic, Tool-calling, Pydantic
Frontend Tech: Gradio


LLM Tutorials & Applications

🔵 Development Notebooks - Educational

A collection of practical LLM architectures and end-to-end notebooks, with case studies from domains such as healthcare, customer support, and product search.
Includes RAG, tool-using agents, clinical trial retrieval, chatbot workflows, and document QA with real-world data sources.
πŸ”— GitHub Repo
Backend Tech: OpenAI, RAG, LangChain, ChromaDB, Pinecone
Frontend Tech: Streamlit



📈 Machine Learning & Data Science Projects (Personal)

📊 Social Sphere: Student Behavior Analytics (SuperDataScience community project)

🟢 Production Ready - Backend + Frontend

Problem: Educators and psychologists seek to understand how digital behavior affects students' mental health, relationships, and academic performance.

Data: Survey of ~700 students aged 16–25 across multiple countries (Kaggle Q1 2025 dataset). Features include screen time, platform usage, conflicts, sleep, and well-being.

Solution: Led this SuperDataScience community project to predict social-media addiction scores and relationship conflicts. Conducted comprehensive data exploration to uncover key behavioral patterns and insights.

Impact:

  • Accurately predicted student addiction levels with 1% error
  • Flagged at-risk relationship conflicts with 99% sensitivity (recall)
  • Revealed critical insights into student behavior, such as the impact of daily screen time and platform usage on mental health and relationships

▶️ Live MVP • 🔗 GitHub Repo • Live MLflow Dashboard
Backend Tech: Python, Scikit-Learn, XGBoost, Regression, Data Visualization, MLflow, SHAP
Frontend Tech: Streamlit
DevOps: Modal, MLflow, Dagshub


🔋 Energy Forecasting with SARIMAX (SuperDataScience community project)

🟢 Production Ready - Backend + Frontend

Problem: Facility managers and sustainability teams need reliable short-term energy forecasts to optimize planning and reduce costs. However, real-world energy consumption is volatile and only weakly correlated with exogenous drivers like temperature.

Data: Synthetic building energy dataset (Kaggle) with hourly and daily electricity usage, temperature, humidity, and occupancy variables.

Solution: Built a time-series forecasting pipeline using SARIMAX, including:

  • Time-series CV, ADF tests, and outlier detection for robust preprocessing
  • Simulation of noisy exogenous inputs via random walks to mimic real-world uncertainty
  • SARIMAX trained and benchmarked against ARIMA, achieving R² ≈ 0.33 despite injected noise
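
Two of the steps above, random-walk noise on exogenous inputs and time-series cross-validation, can be sketched with the standard library alone. This is an illustrative sketch under assumptions: the function names, the noise scale, the fold sizes, and the synthetic temperature values are made up here, and the SARIMAX fit itself (done with statsmodels in the project) is left out.

```python
import random


def random_walk(n_steps, step_std, seed=0):
    """Cumulative sum of Gaussian steps: noise that drifts over time."""
    rng = random.Random(seed)
    walk, level = [], 0.0
    for _ in range(n_steps):
        level += rng.gauss(0.0, step_std)
        walk.append(level)
    return walk


def add_random_walk_noise(series, step_std, seed=0):
    """Return a perturbed copy of an exogenous series (e.g. temperature)."""
    noise = random_walk(len(series), step_std, seed)
    return [x + e for x, e in zip(series, noise)]


def expanding_window_splits(n_obs, n_splits, horizon):
    """Yield (train_end, test_end) index pairs for time-series CV.

    Each fold trains on [0, train_end) and tests on [train_end, test_end),
    so no fold is evaluated on observations that precede its training data.
    """
    first_train_end = n_obs - n_splits * horizon
    if first_train_end <= 0:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_splits):
        train_end = first_train_end + k * horizon
        yield train_end, train_end + horizon


# Made-up hourly temperatures with a simple daily pattern, for illustration only.
temperature = [20.0 + 0.5 * (h % 24) for h in range(72)]
noisy_temperature = add_random_walk_noise(temperature, step_std=0.3, seed=42)
splits = list(expanding_window_splits(len(temperature), n_splits=3, horizon=12))
```

Unlike independent noise, a random walk accumulates error over the horizon, which is closer to how a weather forecast degrades the further ahead it looks.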

▶️ Live MVP • 🔗 GitHub Repo
Backend Tech: Python, pandas, statsmodels, SARIMAX
Frontend Tech: Streamlit
DevOps: Modal


Data Science & ML Mini Tasks

🔵 Development Notebooks

Single-notebook projects showcasing applied machine learning and data science.

Backend Tech: Python, scikit-learn, XGBoost, pandas, matplotlib, Jupyter
Frontend Tech: Streamlit



✍️ Writing

  • Guardrails in LLM Apps – Strategies for implementing ethical safeguards, ensuring compliance, and enhancing security in Large Language Model applications.
  • LLM Model Selection and Updates – Guidelines for selecting appropriate Large Language Models and managing their updates to balance quality, cost, and scalability in AI applications.
  • Two-Stage RAG for Document QA – An innovative approach to document-based question answering using a two-stage retrieval strategy to enhance precision and scalability in Retrieval-Augmented Generation systems.
  • Data Engineers: The Unsung Heroes Behind AI – An exploration of the pivotal role data engineers play in AI development, emphasizing their contributions to data quality, infrastructure, and the overall success of data science teams.

“Build AI that works, and teams that last.”
