A passionate data scientist specializing in Machine Learning and Artificial Intelligence. With extensive experience in Python, SQL, and deep learning, I thrive on solving complex problems and turning data into actionable insights.
I have a robust background in data science, honed through hands-on projects and active participation on data science platforms such as Kaggle and Hugging Face. My journey has led me to explore various domains, from automating ML workflows and task-specific object detection to building RAG chatbots. I am always eager to tackle new challenges and contribute to the data science community.
Resume
Developed an advanced resume parsing tool using NLP for text extraction and LLMs for context-aware information extraction, enhancing candidate profile accuracy by 30%.
Implemented grammar and spelling error detection with intelligent suggestions, improving the quality and professionalism of resumes by 40%.
Designed and developed a CPQ (Configure Price Quote) tool tailored to specific customer requirements, enabling efficient and customized pricing strategies.
Created an automated ETL pipeline using PySpark on Azure Data Factory (ADF), reducing delivery times by 30%.
Designed a RAG AI chatbot with secure user authentication via Firebase and text extraction from handwritten images and PDFs using Google Document AI, achieving 95% text recognition accuracy. Stored embeddings of the extracted text in ChromaDB, improving search efficiency and reducing query response time by 40%.
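The retrieval step of a RAG chatbot like this one can be sketched in a few lines. This is a minimal illustration, not the project's code: the bag-of-words `embed` function is a toy stand-in for a real text-embedding model, the in-memory list stands in for the vector database, and the sample chunks are invented.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (an assumption, standing in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k stored chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical text chunks, as if extracted from uploaded documents.
chunks = [
    "invoice total is due within thirty days",
    "the patient was prescribed antibiotics",
    "payment of the invoice can be made by bank transfer",
]
top = retrieve("when is the invoice payment due", chunks, k=2)
print(top)
```

In a full pipeline, the retrieved chunks would be passed to the LLM as context alongside the user's question.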
Developed a face mask detection application for public safety systems during pandemics like COVID-19, using a custom CNN model trained on a dataset of 12,000 images to achieve 99% accuracy. Enhanced the model's performance and robustness with data augmentation, and deployed it with a user-friendly interface via Streamlit.
An interactive Decision Tree Visualizer app that lets you customize the parameters of a Decision Tree model and observe its performance on the Iris dataset by visualizing the decision boundary and the decision tree graph.
DataFlow Prp is a Python application designed to automate the process of building, tuning, and evaluating machine learning models based on a JSON configuration provided in RTF, JSON, or TXT file format. The application follows a structured flow: it reads the JSON file, extracts dataset information, transforms features, splits the data, builds and tunes models, and evaluates their performance.
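A configuration-driven flow of this kind might look roughly like the sketch below. The JSON keys (`target`, `features`, `train_ratio`, `model`) and the helper names are hypothetical, chosen only for illustration, and the single-feature least-squares fit stands in for real model building; hyperparameter tuning is left out.

```python
import json
import random

# Hypothetical configuration; the real application's schema is not shown in this document.
CONFIG = """
{
  "target": "y",
  "features": ["x"],
  "train_ratio": 0.75,
  "model": {"type": "linear_regression"}
}
"""


def split(rows, train_ratio, seed=0):
    """Shuffle a copy of the rows and split it into train and test parts."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


def fit_linear(rows, feature, target):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(rows)
    mean_x = sum(r[feature] for r in rows) / n
    mean_y = sum(r[target] for r in rows) / n
    var_x = sum((r[feature] - mean_x) ** 2 for r in rows)
    cov = sum((r[feature] - mean_x) * (r[target] - mean_y) for r in rows)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def run(config_text, rows):
    """Read the config, split the data, fit the model, and evaluate it."""
    cfg = json.loads(config_text)
    if cfg["model"]["type"] != "linear_regression":
        raise ValueError("only linear_regression is supported in this sketch")
    feature, target = cfg["features"][0], cfg["target"]
    train, test = split(rows, cfg["train_ratio"])
    slope, intercept = fit_linear(train, feature, target)
    errors = [(r[target] - (slope * r[feature] + intercept)) ** 2 for r in test]
    return {"slope": slope, "intercept": intercept, "test_mse": sum(errors) / len(errors)}


# Synthetic data following y = 2x + 1, so the fit is easy to check.
rows = [{"x": float(i), "y": 2.0 * i + 1.0} for i in range(20)]
report = run(CONFIG, rows)
print(report)
```

Keeping the dataset and model choices in the configuration means a new experiment only needs a new JSON file, not a code change.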
Grouping customers by purchasing behavior using K-means clustering to build a clearer understanding of customer preferences.
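For readers unfamiliar with the technique, the core K-means loop alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members. The sketch below is a plain-Python illustration under assumed inputs: the two features and the six customer records are invented, and a real project would more likely use a library implementation.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical customers as (orders per month, average basket size) pairs.
customers = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(customers, k=2)
print(labels, centroids)
```

The cluster numbering depends on the random initialization, so only the grouping itself is meaningful, not which group is labeled 0 or 1.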
An interactive Tableau dashboard that analyzes sales, profits, and total units sold, giving an in-depth view of the sales performance of adidas products in the US.
This dashboard visualizes the latest data on the number of confirmed cases, the recovery rate, and the number of deaths. Users can explore the data by filtering by province, time range, and other parameters.
A Streamlit web app for stroke prediction, where users enter patient data such as age, gender, blood sugar level, body mass index (BMI), and other risk factors to obtain a stroke risk prediction.