Data Science, AI Engineer, Data Analysis, Data Engineer
Updated on 05.02.2026
Profile
Freelancer / Self-employed
Available from: 16.02.2026
Availability: 100%
of which on-site: 100%
SQL
Python
RAG
English: C2
German: B2

Work locations

Germany
not possible

Projects

5 months
2025-11 - now

Designed reproducible experimentation pipelines

AI Research Intern
  • Designed reproducible experimentation pipelines for in-cabin driver activity recognition, enabling reliable benchmarking across datasets and domains.
  • Built scalable synthetic-data generation workflows (diffusion models) and evaluation loops to measure domain shift; improved generalization by 20–30%.
  • Developed structured evaluation artifacts (metrics, ablations, dataset versions) to support technical decision-making and stakeholder alignment.
  • Applied knowledge distillation and domain adaptation strategies to improve robustness across heterogeneous data distributions.
  • Containerized training and inference workflows using Docker and GPU-based infrastructure for scalable experimentation.
Aumovio GmbH, Berlin, DE
5 months
2025-06 - 2025-10

Designed end-to-end ML data pipelines

Data / ML Engineer - Working Student
  • Designed end-to-end ML data pipelines with automated labeling, validation, and quality checks using vision-language models, reducing noisy data in production workflows and improving downstream model performance.
  • Implemented governance-like checks (schema validation, consistency rules, QA gates), reducing noisy and invalid labels by 30% and increasing training data reliability.
  • Built an object-level SKU data store (lake-style structured summaries) to standardize features consumed by downstream vision-language action workflows, improving data reuse across teams.
  • Designed low-latency retrieval on PostgreSQL (indexing and query patterns) to support real-time inference and analytics use-cases in production systems.
  • Integrated model outputs (DETR masks and bounding boxes) into downstream robotic learning pipelines, enabling reliable handoff between perception and action modules.
  • Deployed production pipelines using Docker and CI/CD (GitLab), achieving 98% SKU-level accuracy and enabling repeatable, production-safe releases.
Sereact GmbH, Stuttgart, DE
6 months
2025-04 - 2025-09

Designed an end-to-end ETL workflow

Master's Thesis - Time Series Forecasting & Interpretability
  • Designed an end-to-end ETL workflow for large-scale retail time-series data, including exogenous signals (holidays, promotions, seasonality, trend).
  • Implemented scalable preprocessing and feature engineering (SQL/Python) with standardized data contracts to support repeatable training and evaluation.
  • Trained and evaluated Autoformer for multi-horizon forecasting, producing decision-ready metrics and model comparison artifacts.
  • Performed model interpretability using SHAP and probing (SVM/XGBoost) to validate learned representations and communicate drivers to stakeholders.
  • Delivered forecasting results through dashboards and reports to support business planning and demand forecasting decisions.
Schaeffler GmbH & FAU, Erlangen, DE
7 months
2024-09 - 2025-03

Built GDPR-compliant data anonymization pipelines

Computer Vision Intern
  • Built GDPR-compliant data anonymization pipelines with automated privacy and data-quality checks for scalable processing.
  • Improved pipeline performance by 30% through data-quality filtering and controlled experimentation.
  • Designed and managed large-scale dataset storage on Amazon S3 using partitioning and lifecycle policies to optimize cost and access.
  • Deployed containerized services using Docker with GitLab CI/CD on Kubernetes/Kubeflow for reliable releases.
IAV GmbH, Ingolstadt, DE
7 months
2024-01 - 2024-07

Built an OCR + retrieval pipeline to transform scanned real-estate PDFs

Data Engineer - Working Student
  • Built an OCR + retrieval pipeline to transform scanned real-estate PDFs into structured tables; achieved 70% extraction accuracy.
  • Defined SQL-based data cleaning/validation rules and implemented incremental loads into a warehouse to support reliable reporting.
  • Developed Power BI dashboards with automated refresh for self-service analytics and stakeholder consumption.
  • Prototyped a FastAPI-served, LangChain-based assistant to enable natural-language querying over listings and reduce manual lookup effort.
Cotinga GmbH, Frankfurt, DE
1 year 1 month
2021-08 - 2022-08

Designed and implemented an end-to-end ML-driven quality analytics pipeline

Project Engineer (Data & ML)
  • Designed and implemented an end-to-end ML-driven quality analytics pipeline, covering data ingestion, preprocessing, model training, and batch inference for defect identification.
  • Built structured datasets and labeling schemas for supervised learning, enabling reliable train/validation splits and consistent model evaluation.
  • Defined data-quality checks and evaluation metrics to monitor model outputs, improving operational throughput by 18% and system reliability by 25%.
  • Packaged data processing and ML workflows using Docker, ensuring reproducible experiments and consistent deployment across environments.
Bharat Electronics Limited, Bengaluru, India

Education and training

3 years 6 months
2022-10 - now

Master of Science in Computational Engineering (Specialization in Data Science and AI)

GPA: 2 (1.0 being the best), Friedrich-Alexander-Universität, Erlangen, Germany
4 years
2016-08 - 2020-07

Bachelor of Technology in Mechanical Engineering

GPA: 1.7 (1.0 being the best), C. K. Pithawala College of Engineering, India

Skills

Top skills

SQL, Python, RAG

Products / Standards / Experience / Methods

PROFESSIONAL SUMMARY

Data Scientist | GenAI

Data Scientist (ML & Applied AI) with 3 years of experience delivering production-grade machine learning systems that create measurable business impact. Strong background in Python and SQL, owning the full ML lifecycle from problem framing and experimentation to deployment and monitoring. Hands-on experience with LLMs (RAG, fine-tuning, evaluation), reinforcement learning, and forecasting, combined with solid software engineering practices (Docker, CI/CD, MLflow). Proven ability to work in fast production cycles, collaborate with product and business stakeholders, and translate complex data problems into scalable, reliable solutions.


TECHNICAL SKILLS

Machine Learning & AI:

Classical ML, Deep Learning (CNN, RNN), Natural Language Processing (NER, Sentiment Analysis), Reinforcement Learning, Computer Vision (Object Detection, Segmentation), Predictive Analytics, Time-Series Forecasting


LLMs & Generative AI:

Fine-tuning (LoRA/PEFT), RAG, Agentic AI, Diffusion Models, GANs, VAEs, Prompt Engineering, Evaluation


Libraries & Frameworks:

Pandas, NumPy, PySpark, Scikit-learn, XGBoost, PyTorch, TensorFlow, HuggingFace Transformers, Tokenizers, Ollama, LangChain, LangGraph, AutoGen, Semantic Kernel, CrewAI, OpenAI, Azure AI, OpenCV, NLTK, spaCy, Gym, Streamlit, Flask, FastAPI, Matplotlib, Plotly, Power BI


MLOps:

Git, ArgoCD, MLflow, TensorBoard, GitHub Actions (CI/CD), Azure DevOps, Docker (Containerization), Kubernetes


Cloud Platforms:

Microsoft Azure, AWS (Basics)


Data Engineering:

ETL/ELT, Data Lake / Lakehouse, Databricks, MySQL, PostgreSQL, Vector Databases (Azure AI Search, FAISS, ChromaDB, Pinecone)


Soft Skills:

Stakeholder Communication, Product Thinking, Analytical Reasoning, Cross-functional Collaboration


PROJECTS

LLM-Driven AI Copilot

  • Developed a multi-modal AI co-pilot using a fine-tuned LLaMA model with PEFT (LoRA), Whisper ASR, and vision-language models for real-time ATC communication and in-flight decision-making.
  • Deployed a RAG pipeline (LangChain, FAISS, HuggingFace) to retrieve and contextualize emergency protocols, enabling accurate and timely autonomous responses.
  • Integrated RL-based control agents with LLM agents in LangGraph to enable context-aware autonomy, supporting human-in-the-loop overrides and enhancing overall flight safety.


RAG Pipeline for Semi-Structured PDF Querying

  • Developed a RAG pipeline using LangChain, ChromaDB, and HuggingFace Transformers to extract, summarize, and query text and tabular data from semi-structured PDFs via a Streamlit-based UI.
  • Implemented chunking, summarization, vector embedding, and multi-vector retrieval, enabling accurate semantic search and real-time LLM responses, with local caching to eliminate redundant computation.
  • Integrated OCR (Tesseract) and table parsing (Poppler, Unstructured library) for handling scanned PDFs and complex tables, significantly improving data accessibility and reducing query latency.


Scalable Movie Recommendation Platform (MERN Stack)

  • Designed and developed a production-ready movie recommendation website using the MERN stack with secure authentication and RESTful APIs.
  • Implemented personalized movie recommendations using collaborative filtering and content-based algorithms, improving user retention and engagement.
  • Integrated Google Ads and analytics for monetization and performance tracking, supporting scalable deployment.

Programming languages

Python
C/C++
C#
SQL
TypeScript
Shell Scripting
