Actively Seeking Full-Time Opportunities
I am a passionate researcher and engineer specializing in Deep Learning, with a focus on Computer Vision. Currently, I work as a Computer Vision Engineer at Techolution and lead the Machine Learning Society (MLSC) at VIIT Pune as its President. My research centers on multimodal AI systems, and I hold two patents related to assistive technologies for the visually impaired.
- 🔭 Focused on developing advanced Deep Learning architectures and exploring cutting-edge Computer Vision techniques.
- 🤗 Actively publishing and sharing machine learning models on Huggingface.
- 🌍 You can explore my research and projects at shreyasdixit.me.
- Motto: Lead, Learn, Inspire
I am currently looking for full-time roles in AI Research, Machine Learning Engineering, or Computer Vision Engineering at organizations that value cutting-edge research and innovation. I’m passionate about working on impactful projects that push the boundaries of AI and technology, particularly in areas like multimodal AI, AI for accessibility, and AI in robotics. If you have opportunities that align with my expertise, I would love to connect and explore how we can work together.
- WaveFormer: Published in a Quartile 1 (Q1) journal on long-term ocean wave forecasting (2024).
- Joined Techolution as a Computer Vision Engineering Intern.
- Patent #2 Published: "Real-Time MultiModal Video Narration Platform for Visually Impaired People" (2023).
- Indian Patent Published: "Assistance Platform for Visually Impaired Person Using Image Captioning".
Here are some of my latest projects in Deep Learning and Computer Vision. Click on the project names for their GitHub repositories and live demos.
- EchoSense: A multimodal model that generates audio descriptions from images, combining computer vision and audio generation to assist the visually impaired. Live Demo on Huggingface.
- HingFlow: A neural machine translation model that translates English sentences into Hindi using transformer-based architectures. Live Demo on Huggingface.
- MaskedLM: A PyTorch implementation of the BART architecture for Masked Language Modeling on English and Hinglish datasets. Model on Huggingface.
- Neural Translation: A transformer-based Neural Machine Translation model built from scratch for English-to-Hinglish translation; achieved 94% accuracy in 24 hours during the Neuro Hackathon.
- Image Captioning: A multimodal model that generates text descriptions from images, combining vision and natural language processing to produce accurate captions.
- AI-Generated Content Safety: Researching methods to detect, mitigate, and prevent harmful, biased, or unsafe outputs in AI systems.
- AI in Robotics: Investigating the intersection of AI, computer vision, and robotics to create smarter, autonomous systems.
- Multimodal AI: Advancing the integration of vision, language, and audio for more intelligent and human-like systems.
- AI for Accessibility: Designing and building AI-driven tools to improve accessibility for the visually impaired, enhancing inclusivity and quality of life.
I write about productivity, time management, and life organization in my weekly newsletter, Productivity Pro. Join over 900 readers to stay productive and informed. Subscribe Now.
I use Notion as my "second brain" to organize research, personal projects, and day-to-day tasks. I also create and share Notion templates to help others stay organized.
Feel free to reach out to me through LinkedIn or Twitter to collaborate, discuss research ideas, or just chat about AI and technology.
The best way to predict the future is to invent it. – Alan Kay