Reddit Data Pipeline | Airflow, AWS Glue, Redshift
- Designed and deployed a fault-tolerant data pipeline that ingests Reddit data using Apache Airflow workflows. Used AWS Glue and Athena for data cataloging and querying, and loaded the transformed data into Amazon Redshift for large-scale analytics and reporting.
Movie Recommendation Engine | Collaborative Filtering, Python
- Implemented a recommendation system that combines collaborative filtering with weighted ratings, informed by exploratory data analysis (EDA), to improve personalization and prediction accuracy. Demonstrates the ability to build scalable, data-driven product features.
Food Object Detection | CNNs (ResNet, VGG16), Transfer Learning
- Built and fine-tuned multi-class CNN models (ResNet, VGG16) using transfer learning to classify food items on a custom dataset. The approach applies directly to image-based product categorization in e-commerce.
Smart Article Summarizer | Flask, Hugging Face, AWS, RAG
- Developed a scalable article summarization system using Retrieval-Augmented Generation (RAG) with Hugging Face Transformers. Automated web content extraction from Common Crawl and arXiv data with BeautifulSoup. Deployed the system as a Flask web service on AWS and used ROUGE metrics to evaluate and tune summary quality.