Empowering lives through AI


Shreyas Dixit

AI Researcher, Computer Vision & Machine Learning Engineer

Actively Seeking Full-Time Opportunities

Twitter
LinkedIn


About Me

I am a passionate researcher and engineer specializing in Deep Learning, with a focus on Computer Vision. I currently work as a Computer Vision Engineer at Techolution and serve as President of the Machine Learning Society (MLSC) at VIIT Pune. My research has focused on multimodal AI systems, and I hold two patents on assistive technologies for the visually impaired.

  • 🔭 Focused on developing advanced Deep Learning architectures and exploring cutting-edge Computer Vision techniques.
  • 🤗 Actively publishing and sharing machine learning models on Hugging Face.
  • 🌍 You can explore my research and projects at shreyasdixit.me.
  • Motto: Lead, Learn, Inspire

Actively Seeking Full-Time Opportunities

I am currently looking for full-time roles in AI Research, Machine Learning Engineering, or Computer Vision Engineering at organizations that value cutting-edge research and innovation. I’m passionate about working on impactful projects that push the boundaries of AI and technology, particularly in areas like multimodal AI, AI for accessibility, and AI in robotics. If you have opportunities that align with my expertise, I would love to connect and explore how we can work together.


Latest Updates

  • WaveFormer: Published in a Quartile 1 (Q1) journal for long-term ocean wave forecasting (2024).
  • Joined Techolution as a Computer Vision Engineering Intern.
  • Patent #2 Published: "Real-Time MultiModal Video Narration Platform for Visually Impaired People" (2023).
  • Indian Patent Published: "Assistance Platform for Visually Impaired Person Using Image Captioning".

Projects

Here are some of my latest projects in Deep Learning and Computer Vision. Click on the project names for their GitHub repositories and live demos.

  • EchoSense
    A multimodal model that generates audio descriptions from images. This model combines computer vision and audio generation to assist the visually impaired.
    Live Demo on Hugging Face.

  • HingFlow
    A neural machine translation model that translates English sentences into Hindi using transformer-based architectures.
    Live Demo on Hugging Face.

  • MaskedLM
    A PyTorch implementation of the BART architecture for Masked Language Modeling on English and Hinglish datasets.
    Model on Hugging Face.

  • Neural Translation
    A transformer-based Neural Machine Translation model built from scratch for English-to-Hinglish translation. Built in 24 hours during the Neuro Hackathon, where it achieved 94% accuracy.

  • Image Captioning
    A multimodal model for generating text descriptions from images. This system combines vision and natural language processing to create accurate image captions.


Current Focus

  • AI-Generated Content Safety: Researching methods to detect, mitigate, and prevent harmful, biased, or unsafe outputs in AI systems.
  • AI in Robotics: Investigating the intersection of AI, computer vision, and robotics to create smarter, autonomous systems.
  • Multimodal AI: Advancing the integration of vision, language, and audio for more intelligent and human-like systems.
  • AI for Accessibility: Designing and building AI-driven tools to improve accessibility for the visually impaired, enhancing inclusivity and quality of life.

Notion & Newsletter

Productivity Pro Newsletter

I write about productivity, time management, and life organization in my weekly newsletter, Productivity Pro. Join over 900 readers to stay productive and informed.
Subscribe Now.


Notion Templates for Productivity

I use Notion as my "second brain" to organize research, personal projects, and day-to-day tasks. I also create and share Notion templates to help others stay organized.


Let's Connect

Feel free to reach out to me through LinkedIn or Twitter to collaborate, discuss research ideas, or just chat about AI and technology.


The best way to predict the future is to invent it. – Alan Kay

Pinned

  1. PaliGemma

    Building PaliGemma from scratch, a Vision Language Model by Google DeepMind designed to address a broad range of vision-language tasks. It combines the SigLIP-So400m vision encoder and the Gemma-2B …

    Python

  2. YouTube-Llama

    A question-answering chatbot for any YouTube video, using a local Llama 2 model and Retrieval-Augmented Generation (RAG).

    Python 3 3

  3. Multi-Head-Yolov9

    This repository contains the implementation of a multi-head YOLOv9 model for clothing detection and instance segmentation. The model is trained on the DeepFashion dataset and evaluated using MSCOCO …

    Jupyter Notebook 2

  4. OpenAI-CLIP

    A simple, educational implementation of OpenAI's CLIP in PyTorch.

    Jupyter Notebook 4 1

  5. SwinTransformer

    This project aims to replicate the architecture proposed in the Swin Transformer paper for medical image semantic segmentation.

    Jupyter Notebook

  6. PaLM-RLHF

    Forked from lucidrains/PaLM-rlhf-pytorch

    Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the PaLM architecture. Basically ChatGPT, but with PaLM.

    Python 30 4