- My CV Link
- I have 3+ years of experience in Data Science & Engineering, proficient in Python, SQL, Big Data Analytics, ML modeling and Cloud computing.
- Master's in Computer Science from Arizona State University, graduated May 2024.
- Open to full-time opportunities in Data Engineering, Data Science, and Machine Learning Engineering roles.
A detail-oriented Data Engineer with 3+ years of experience in Data Science & Engineering, including designing and developing analytical dashboards, building ML models, and implementing efficient data pipelines. Proficient in Python, SQL, Power BI, and cloud technologies; skilled in big data tools such as Apache Spark, Airflow, and Databricks, and in cloud platforms such as AWS and GCP, to deliver scalable, high-performance data solutions.
- Programming languages:
- Database:
- Frameworks:
- Backend Technologies:
- Cloud technologies:
- Other technologies:
Volunteer Data Science Engineer at EPICS ASU (Part-time Volunteer) [June 2024 - Present]
- Collaborated on ETL (Extract, Transform, Load) tasks, maintaining data integrity and verifying pipeline stability.
- Redesigned legacy SQL relational database, implementing normalized schema and efficient query structures to enhance data integrity and retrieval speed.
- Prepared written summaries to accompany results and maintain documentation.
- Generated detailed studies on potential third-party data handling solutions, verifying compliance with internal needs and stakeholder requirements.
Data Science Engineer at Piramal Finance [April 2021 - July 2022]
- Improved customer service efficiency by 15%, measured by pre- and post-implementation CSAT scores and response-time metrics, by implementing a scalable ETL batch data processing pipeline for the CSAT dashboard. Used Python, SQL, AWS Glue, PostgreSQL, EC2, CodeCommit, and Apache Airflow.
- Reduced dashboard data loading latency by 50%, cutting average load times from 10 minutes to 5 minutes, by implementing an ELT pipeline with AWS Glue and Airflow and designing a data mart optimized with Python/PySpark and SQL queries.
- Boosted employee retention by 15% through a predictive churn model. Integrated a Random Forest model into sales operations to proactively identify at-risk employees, enabling targeted retention strategies.
- Optimized data processing workflows and sales analysis using Python API scripts for automated scheduling, making timely and accurate sales reporting 50% faster.
Junior Data Scientist at Radome Technology [June 2019 - March 2021]
- Achieved 85% accuracy in inventory and sales forecasting by developing and deploying modules using ARIMA, ARMA, Random Forest, and Support Vector Machine models. Leveraged GCP services such as VMs, BigQuery, and Cloud Storage, along with PostgreSQL and Apache Airflow.
- Improved model performance and prediction accuracy by 10% through pre-processing techniques such as feature engineering and dimensionality reduction.
- Achieved 83% accuracy in real-time aircraft detection at 30 frames per second by developing an end-to-end object detection application using a Region-based CNN (R-CNN) model. Used TensorFlow, CodeCommit, and Python Flask.
- Enhanced R&D efforts by researching machine learning papers on forecasting and object detection, developing proof-of-concepts, and presenting demos to senior team members and clients.
Credit Card Fraud Analytics & Machine Learning Modeling - Link
- A case study building a credit card fraud detection model from a highly variable, imbalanced real-world dataset, using classification models (Logistic Regression, Decision Tree, K-Nearest Neighbors, SVC).
- Plotted a correlation matrix to check the influence of variables on the target label, and box plots to examine data distributions and outlier patterns.
- Performed PCA dimensionality reduction, robust scaling to limit the influence of outliers, and sampling to get an equal number of Fraud/Not Fraud cases.
- The key metric used to assess model performance is the false negative rate. Specificity scores of the models: Logistic Regression 0.98, Support Vector Classifier 0.99.
- Plotted a correlation matrix to check the influence of variables on the target Price variable, and box plots to examine data distributions and outlier patterns.
- Used the NLTK (Natural Language Toolkit) package to extract key amenities from the given data column and form a word cloud.
- After completing the EDA, we found that type of room, property, and number of bedrooms strongly influenced pricing. Essential amenities like Workspace, Parking, Laptop Friendly, Hair Dryer, and Wi-Fi are most common in expensive listings.
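For the credit card fraud case study, the balancing step and the two metrics it names (false negative rate and specificity) can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the choice of random undersampling as the sampling method, and the toy labels are all assumptions made for the example.

```python
import random


def undersample(rows, labels, seed=0):
    """Randomly drop rows until fraud (1) and non-fraud (0) counts are equal.

    Assumption: the project's "sampling" step is shown here as random
    undersampling; the original write-up does not name the exact method.
    """
    rng = random.Random(seed)
    fraud = [r for r, y in zip(rows, labels) if y == 1]
    legit = [r for r, y in zip(rows, labels) if y == 0]
    n = min(len(fraud), len(legit))
    balanced = [(r, 1) for r in rng.sample(fraud, n)]
    balanced += [(r, 0) for r in rng.sample(legit, n)]
    rng.shuffle(balanced)
    return [r for r, _ in balanced], [y for _, y in balanced]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating label 1 (fraud) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def false_negative_rate(y_true, y_pred):
    """Share of actual fraud cases the model missed: FN / (FN + TP)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return fn / (fn + tp) if (fn + tp) else 0.0


def specificity(y_true, y_pred):
    """Share of legitimate cases correctly left unflagged: TN / (TN + FP)."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp) if (tn + fp) else 0.0


# Toy labels, made up for illustration only (not results from the project).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

print("False negative rate:", false_negative_rate(y_true, y_pred))
print("Specificity:", round(specificity(y_true, y_pred), 3))
```

The two metrics answer different questions: the false negative rate tracks missed fraud, while specificity tracks how often legitimate transactions are left alone, so a model can score well on one and poorly on the other.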
Like my work and want to connect?
GitHub: https://github.com/imnischaygowda (you are currently here!)
LinkedIn: https://www.linkedin.com/in/nischayggowda/
Blog: https://imnischaygowda.hashnode.dev/