I am a Data Engineer with additional expertise as a Process Mining Analyst and Applied AI Engineer. I would prefer any one of these roles or a combination of them.
Updated on 02.01.2026
Profile
Freelancer / Self-employed
Remote work
Available from: 02.01.2026
Availability: 100%
of which on-site: 100%
Python
ETL
Azure
Data Modeling
Data Engineer
RAG
Big Data
Apache Spark
pandas
numpy
English
C1
German
B1 (actively learning)
Tamil
Native

Locations

Germany: possible

Projects

Fault Tolerance Analysis - Assembly of Hang-on Parts


Tasks:

Data Engineering

  • Designed and implemented ETL pipelines using Azure Databricks and PySpark to analyze assembly gaps and flush values for each hang-on part, integrating threshold checks for automated quality monitoring.
  • Reduced downtime by 75% and saved 8 engineer-hours daily by replacing manual inspection with automated issue detection and a reporting dashboard, developed in collaboration with the BI team using Power BI.
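The threshold-check step can be sketched as follows; this is a minimal pandas illustration with invented part names and tolerance values, standing in for the production PySpark job on Azure Databricks:

```python
import pandas as pd

# Hypothetical gap/flush measurements per hang-on part (illustrative values)
df = pd.DataFrame({
    "part": ["door_fl", "door_fr", "tailgate"],
    "gap_mm": [3.2, 4.9, 3.5],
    "flush_mm": [0.4, 1.8, 0.2],
})

# Tolerance thresholds -- placeholder numbers, not the real quality spec
GAP_MAX, FLUSH_MAX = 4.5, 1.5

# Flag any part whose gap or flush value exceeds its threshold
df["out_of_tolerance"] = (df["gap_mm"] > GAP_MAX) | (df["flush_mm"] > FLUSH_MAX)
alerts = df[df["out_of_tolerance"]]
```

Rows flagged this way would then drive the automated issue detection and the Power BI dashboard.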


Skills:

Azure Databricks, Pandas, Power BI


Report


Role: Chief Operating Officer (COO)


Tasks:

Data Engineering

  • Built a complete data pipeline integrating five heterogeneous sources to support COO-level reporting on cost drivers, production-stage efficiency, and operational risk. Used the Medallion architecture in Azure Data Lake to structure raw data for real-time dashboards in Power BI.
  • Reduced processing time by 90%, replaced separate reports with a unified analytics view, and improved strategic decision-making through reliable, timely data.
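The Medallion (bronze/silver/gold) layering can be sketched like this; a pandas stand-in with made-up records, since the real implementation used PySpark on Azure Data Lake:

```python
import pandas as pd

# Bronze: raw ingested records (hypothetical cost-driver data, with noise)
bronze = pd.DataFrame({
    "stage": ["press", "press", "paint", None],
    "cost": [100.0, 100.0, 250.0, 50.0],
})

# Silver: cleaned layer -- drop null keys and exact duplicates
silver = bronze.dropna(subset=["stage"]).drop_duplicates()

# Gold: aggregated, dashboard-ready view per production stage
gold = silver.groupby("stage", as_index=False)["cost"].sum()
```

The gold layer is what the Power BI dashboards read from.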


Skills:

Azure Databricks, Azure Data Lake, PySpark, Power BI


Comprehensive Dashboard for Material Procurement Department


Tasks:

Data Engineering

  • Orchestrated an end-to-end ETL pipeline using PySpark in Databricks to consolidate procurement data from 5+ departments.
  • Applied snowflake-schema data modeling to structure and unify the data for real-time dashboarding and supplier evaluation. Reduced processing time, enhanced visibility into procurement performance, and enabled faster, more accurate supplier and licensing decisions.
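Snowflake-schema modeling normalizes dimensions into sub-dimensions; a minimal sketch with hypothetical procurement columns (the real pipeline ran in PySpark on Databricks):

```python
import pandas as pd

# Flat procurement extract (hypothetical columns)
flat = pd.DataFrame({
    "order_id": [1, 2, 3],
    "supplier": ["Acme", "Bolt", "Acme"],
    "region": ["EU", "US", "EU"],
    "amount": [500, 800, 300],
})

# Snowflake schema: normalize the supplier dimension against a region dimension
dim_region = flat[["region"]].drop_duplicates().reset_index(drop=True)
dim_region.insert(0, "region_id", dim_region.index)
dim_supplier = (flat[["supplier", "region"]].drop_duplicates()
                .merge(dim_region, on="region")[["supplier", "region_id"]])

# The fact table keeps only keys and measures
fact_orders = flat[["order_id", "supplier", "amount"]]
```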


Skills:

Azure Databricks, PySpark


MAX Proplan Data - Global Planner Dashboard


Tasks:

Data Engineering

  • Translated business requirements into SQL queries to create reports with Python and Streamlit, integrating 3 of 5 global planning reports.
  • Improved issue resolution and reduced redundancy across more than five plants.
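The pattern of turning a business question into a SQL-backed report can be sketched with stdlib sqlite3 as a stand-in for the actual planning database (table and column names invented):

```python
import sqlite3

# In-memory SQLite stands in for the real planning database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plan_items (plant TEXT, status TEXT)")
conn.executemany("INSERT INTO plan_items VALUES (?, ?)",
                 [("P1", "open"), ("P1", "done"), ("P2", "open")])

# Business question: how many open planning items per plant?
rows = conn.execute(
    "SELECT plant, COUNT(*) FROM plan_items "
    "WHERE status = 'open' GROUP BY plant ORDER BY plant"
).fetchall()
# Rows like these would feed a Streamlit table or chart in the dashboard
```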


Skills:

SQL, Python, Streamlit


Rework Part Ordering Analysis


Tasks:

Process Mining & Data Engineering

  • Designed and built a data pipeline using data engineering best practices to transform raw production data into structured event logs with PySpark.
  • Connected the logs to SAP Signavio via Azure SQL Database and developed KPIs linked to BPMN diagrams for dynamic process mining.
  • Reduced rework-order cycle times by identifying process bottlenecks and enabled greater transparency in critical workflows.
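Transforming raw records into a process-mining event log (case id, activity, timestamp) can be sketched in pandas; field names are hypothetical, and the production version ran in PySpark before landing in Azure SQL for Signavio:

```python
import pandas as pd

# Raw production records (hypothetical shape)
raw = pd.DataFrame({
    "order_no": [4711, 4711, 4712],
    "step": ["ordered", "reworked", "ordered"],
    "ts": ["2024-01-02 08:00", "2024-01-03 09:30", "2024-01-02 10:00"],
})

# Event log: one row per case/activity/timestamp, ordered within each case
log = (raw.rename(columns={"order_no": "case_id", "step": "activity", "ts": "timestamp"})
          .assign(timestamp=lambda d: pd.to_datetime(d["timestamp"]))
          .sort_values(["case_id", "timestamp"])
          .reset_index(drop=True))
```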


Skills:

Azure Databricks, PySpark, Signavio


Ident Tracking in Production Line


Tasks:

Process Mining & Data Engineering

  • Built and optimized a data pipeline to track production line throughput and compliance, initially deployed at one plant and scaled to others with improved code efficiency.
  • Migrated legacy dashboards from PAFnow/Power BI to Power Automate Process Mining (formerly Minit) for modern process mining.
  • Increased transparency across production workflows and standardized tracking systems across global teams, resulting in improved process consistency and reduced manual oversight.


Skills:

Power BI, PAFnow, Power Automate, Python


Gen AI chatbot for Equipment & Supplier Planners


Tasks:

Gen AI Projects

  • Developed a Retrieval-Augmented Generation (RAG) chatbot using Azure AI Search to support equipment and supplier planners with instant access to critical guidelines and documentation (362+ files in PDF, DOCX, PPT, Excel). Integrated Cosmos DB to capture user feedback for continuous improvement.
  • Deployed to 200+ users via Azure App Service, significantly reducing time spent searching documents and enhancing decision-making efficiency through on-demand Q&A support.
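The core RAG loop (retrieve, then ground the answer in the retrieved document) can be illustrated with a toy keyword retriever standing in for Azure AI Search; document names and contents below are invented:

```python
# Toy document store -- the real system indexed 362+ files in Azure AI Search
docs = {
    "guideline_a.pdf": "supplier audits must be renewed every two years",
    "guideline_b.pdf": "equipment planners approve tooling budgets quarterly",
}

def retrieve(question: str) -> str:
    # Score each document by word overlap with the question, return the best
    q = set(question.lower().split())
    return max(docs, key=lambda name: len(q & set(docs[name].split())))

def build_prompt(question: str) -> str:
    # Ground the LLM call in the retrieved document (the "A" and "G" of RAG)
    source = retrieve(question)
    return f"Context ({source}): {docs[source]}\n\nQuestion: {question}"
```

In production, the assembled prompt goes to the chat model, and user feedback on the answer is written to Cosmos DB.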


Skills:

Azure AI Search, Cosmos DB, Azure App Service


SQL Agent for Tabular Data


Tasks:

Gen AI

  • Built a Gen AI agent using the LangChain framework that queries structured data in Azure SQL via ODBC.
  • Created an interactive Streamlit frontend for real-time querying and deployed on Azure App Service.


Skills:

Azure SQL, Streamlit, LangChain

Education and Training

4 years 10 months
2016-06 - 2021-03

Software Engineering

Master of Technology (CGPA 8.62/10.00)
Vellore Institute of Technology, Vellore, India
Integrated course

Competencies

Top-Skills

Python, ETL, Azure, Data Modeling, Data Engineering, RAG, Big Data, Apache Spark, pandas, NumPy

Products / Standards / Experience / Methods

Skills

Data Warehousing:

  • Snowflake


Cloud Services:

  • Azure Databricks
  • Azure Data Lake Storage Gen2
  • Cosmos DB
  • Azure SQL DB
  • Azure AI Services
  • AWS S3
  • AWS Redshift
  • AWS Lambda


Process Mining Tools:

  • SAP Signavio
  • Microsoft Power Automate
  • PAFnow (Power BI Plugin)
  • Celonis


Data Visualization:

  • Power BI
  • Microsoft Fabric


Workflow & Orchestration:

  • Apache Airflow


Version Control:

  • Git
  • GitHub


Work Experience

08/2021 - 07/2025

Role: Data Engineer

Customer: Mercedes-Benz Research and Development, India - Bangalore, India


Tasks:

  • Developed and optimized scalable ETL pipelines in Azure Databricks and PySpark, transforming manufacturing vehicle data into high-quality, analysis-ready datasets stored in Azure Data Lake Storage (ADLS) Gen2.
  • Integrated data from ERP and sensor-based production systems into a unified data lakehouse architecture using Medallion or Data Vault 2.0 modeling, supporting real-time operational dashboards and planning tools.
  • Implemented API integration with RESTful endpoints, including authentication mechanisms to retrieve and process data, ensuring secure and reliable data ingestion, and optimized PySpark jobs using persistence, cluster tuning, fault tolerance, and caching.
  • Designed dimensional data models (star and snowflake schema) and optimized SQL queries for BI and analytics.
  • Built Power BI dashboards supporting COO-level decision-making and operational efficiency monitoring.
  • Created process mining pipelines using SAP Signavio to identify production bottlenecks and optimize factory processes.
  • Developed Gen AI solutions, deploying a RAG chatbot and agents for tabular data.
  • Implemented data quality checks, performance tuning, CI/CD pipelines, and version control (GitHub) for automated data workflows.
  • Supported data architecture standardization and data governance initiatives across the manufacturing department.
  • Collaborated cross-functionally with production planning, supply chain, drivetrain, and quality management teams, as well as Process Experts, Product Owners, Data Scientists, and BI Developers, to align on data requirements and design robust data models and transformation logic for BI and AI use cases.
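The authenticated REST ingestion step reduces to a pattern like this stdlib-only sketch; the endpoint URL and token are placeholders, not the real system:

```python
import urllib.request

def build_request(url: str, token: str) -> urllib.request.Request:
    # Attach a bearer token so the ingestion call is authenticated
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})

# Placeholder endpoint and token for illustration only
req = build_request("https://example.com/api/v1/measurements", "PLACEHOLDER_TOKEN")
# urllib.request.urlopen(req) would then fetch the payload for landing in ADLS Gen2
```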

Programming Languages

Python
Pandas, PySpark, NumPy, Matplotlib, SciPy
Spark SQL
Hugging Face transformers
AutoGen
LangChain


Databases

SQL
MongoDB
NoSQL

