Data Engineer & Machine Learning Consultant; Python, CI/CD & DevOps Expert, Python Data Science, Team & Project Lead, Pandas, Docker, Apache Airflow
Updated on 10.04.2024
Profile
Employee of a service provider
Remote work
Available from: 10.04.2024
Availability: 70%
of which on-site: 80%
Skill profile of a permanently employed staff member of the service provider
German
Native language
English
Business fluent

Work locations

Germany
possible

Projects

1 year 7 months
2022-12 - present

Data Warehouse infrastructure and data pipeline development

Data Warehouse Architect Python Pandas Azure ...
Data Warehouse Architect
Project:
Data Warehouse infrastructure and data pipeline development
Role: Lead Architect, Lead Engineer
Team size: 4
Project language: German, English


Project Skills/Methodology:

  • Design of the target Data Warehouse architecture

  • Comparison and selection of DWH technologies and frameworks

  • Agile refactoring of existing Data Warehouse components into a sustainable, scalable warehouse for company-wide data-driven reporting

  • Maintenance and support for a custom legacy Python data pipeline framework


Technology:

  • Microsoft SQL Server

  • Python/Pandas for the legacy data processing framework

  • Introduction of Apache Airflow as the next-generation workflow orchestration platform

  • Azure DevOps and Azure Pipelines for code version control and CI/CD


MS SQL Server Apache Airflow
Python Pandas Azure Azure DevOps Azure Pipelines CI/CD
1 year 8 months
2022-11 - present

Data Pipeline and Cloud Data Warehouse infrastructure development

Lead Data Engineer Azure Databricks Apache Airflow Python ...
Lead Data Engineer
Project:
Data Pipeline and Cloud Data Warehouse infrastructure development
Role: Lead Data Engineer
Team size: 4
Project language: German, English


Project Skills/Methodology:

  • Data pipeline development lead for three other data engineers

  • Planning, implementing, and maintaining end-to-end data pipelines based on customer specifications

  • Development of a shared Python data engineering utility module

  • Software and data pipeline quality control in an agile development process

  • Development of new infrastructure components to create a Data Mesh self-service architecture

    • Requirements analysis and system architecture design


Technology:

  • Azure Databricks with Unity Catalog

  • PostgreSQL database cluster

  • Apache Airflow workflow orchestration platform

  • Python/PySpark and SQL-based ETL/ELT data engineering stack

  • Azure DevOps and Azure Pipelines for code version control and CI/CD

  • Different data source types such as Azure Blob Storage, REST APIs, and Kafka topics

  • Azure DevOps for source code management (Git) and CI/CD pipelines


Azure PostgreSQL database cluster Azure Blob Storage REST API
Azure Databricks Apache Airflow Python PySpark ETL ELT Azure DevOps Azure Data Pipelines Kafka Git CI/CD Pipelines
Remote
2 years 3 months
2022-04 - present

Apache Airflow technical support & Data integration development services

Lead Consultant Python Flask Pandas ...
Lead Consultant
Project:
Apache Airflow technical support & Data integration development services
Role: Lead consultant, Project Manager
Team size: 4
Project language: German, English


Project Skills/Methodology:

  • Technical and architectural analysis and continuous improvement of an existing private cloud data engineering platform

  • Development and integration of new system components for Apache Airflow

  • Technical support for data pipeline orchestration using CI/CD infrastructure


Technology:

  • Apache Airflow

  • Python: Flask, Pandas, SQLAlchemy ORM

  • Data pipelines collecting data from various source systems into an SQL-based data warehouse

  • CI/CD pipelines with GitLab

  • Docker, Docker-Compose for service orchestration

  • GitLab for source code management (Git) and CI/CD pipelines




Apache Airflow CI/CD GitLab Private Cloud Data Engineering Platform
Python Flask Pandas SQLAlchemy SQL SQL databases Docker
2 months
2024-01 - 2024-02

Architecture review of existing cloud Business Intelligence platform (Azure Synapse Analytics & Microsoft Power BI) and presentation of potential roadmap towards more efficient alternatives (Azure Databricks)

Data Platform Architect Azure Databricks Architect Data Platform
Data Platform Architect
Project:
Architecture review of existing cloud Business Intelligence platform (Azure Synapse Analytics & Microsoft Power BI) and presentation of potential roadmap towards more efficient alternatives (Azure Databricks)
Role: Data Platform Architect
Team size: 3
Project language: German, English


Project Skills/Methodology:

  • In-depth review of the existing system architecture based on available technical documentation and a series of stakeholder interviews

  • High-level requirements analysis for the client's Business Intelligence platform needs

  • Presentation to the client of a solution strategy for the weaknesses and gaps of the current architecture, including

    • specific technological implementation patterns for ad-hoc improvements, and

    • a plan for a gradual mid-term shift from Azure Synapse Analytics towards Azure Databricks for more efficiency



Azure Synapse Analytics Microsoft Power BI Power BI Azure Databricks
Azure Databricks Architect Data Platform
Remote
1 year
2022-12 - 2023-11

Sales Prediction with AzureML for re-supply planning

Senior Data Engineer Jupyter Python Pandas ...
Senior Data Engineer
Project (1):
Implementation of a proof-of-concept delta-loading data pipeline using Azure Data Factory SAP CDC connector
Role: Senior Data Engineer, Project Lead
Team size: 3
Project language: German


Project Skills/Methodology:

  • Exploration of efficient delta-loading mechanisms to transfer live sales data from SAP S/4HANA to an Azure SQL Data Warehouse

  • Optimization of existing data pipelines

  • Optimization of database transactions for ELT operations (extract, load, transform)

  • Implementation and documentation of a proof-of-concept data pipeline


Technology:

  • Azure Data Factory as the data integration service

  • Azure SQL Server as the cloud data warehouse

  • SAP S/4HANA as data source, using the SAP Operational Data Provisioning (ODP) framework for delta provisioning

  • Azure Data Factory "SAP CDC" connector modules (change data capture) for data extraction

  • T-SQL statements to create and optimize database indexes


Project (2):

Sales Prediction with AzureML for re-supply planning


Role: Machine Learning Engineer, Project Lead
Team size: 3
Project language: German, English


Project Skills/Methodology:

  • Implementation and evaluation of a minimum-viable-product system for future sales prediction to automate existing planning and re-supply processes

  • Analysis of the existing manual planning methodology and development of an automated approach

  • Creation of evaluation metrics for meaningful prediction quality

  • Visualization of results and quality criteria for inspection and monitoring


Technology:
  • Azure Machine Learning Studio / AzureML as the processing infrastructure
  • Jupyter Notebooks in AzureML as primary programming interface
  • Azure Blob Storage and AzureML Tables for versioned persistent data storage
  • Azure DevOps Git repository for code versioning
  • Python as primary programming language with AzureML SDK v2
  • Python Pandas, NumPy and Scikit-Learn as the main processing libraries
  • Various time series prediction and regression models from the Scikit-Learn and Facebook Prophet libraries

Azure Machine Learning Studio Azure
Jupyter Python Pandas NumPy Scikit-Learn Facebook Prophet Libraries
Remote
2 months
2022-08 - 2022-09

Implementation of a scalable data analysis framework

Lead Architect Pandas SQLAlchemy PyTest ...
Lead Architect
Project:
Implementation of a scalable data analysis framework


Role: Lead architect

Team size: 3
Project language: German


Project Skills/Methodology:

  • Requirements analysis and system architecture design
  • Creation of a DevOps process
  • Coaching of customer staff


Technology:

  • Apache Airflow as a horizontally scalable compute framework

  • Pandas, SQLAlchemy, PyTest as Python libraries for analysis task implementation

  • GitLab CI/CD as a process automation framework

  • VMware vCenter as server virtualization framework

  • Docker, Docker-Compose for service orchestration


Apache Airflow GitLab
Pandas SQLAlchemy PyTest CI/CD Docker
Remote
2 years 3 months
2019-10 - 2021-12

Developing a data management system for biomedical research project

Lead Developer, Project Maintainer JavaScript/jQuery HTML CSS ...
Lead Developer, Project Maintainer

Project Skills/Methodology:

  • Development and integration of new system components in an agile interdisciplinary team

  • Maintenance, migration, and consolidation of legacy system components

  • Open source release of the project code

Technology:

  • PHP as main programming language, with the Drupal framework

  • JavaScript/jQuery, HTML, CSS as supporting languages/tools

  • MariaDB database, RESTful Object Storage service

  • CentOS 7 and Ubuntu servers, Docker, Docker-Compose for system operation

  • CI/CD pipelines with GitLab

  • LDAP, OAuth2/OIDC for integrated identity management

Team size: 6

Project language: German, English

MariaDB RESTful Object Storage Service
JavaScript/jQuery HTML CSS Ubuntu
University Medical Center
Göttingen
2 years 9 months
2017-01 - 2019-09

Establishing a research data warehouse for healthcare information

Team lead, system architect Python Pandas Angular ...
Team lead, system architect

Project Skills/Methodology:

  • Designing and implementing a data warehouse ecosystem to meet contemporary and future requirements of data-driven medical research

  • Developing a highly flexible framework for automated, scalable, and reproducible data processing tasks (ETL)

  • Co-developing a secure data sharing interface for cross-site collaboration

  • Growing a new team of software and data engineers

  • Defining and establishing an agile team collaboration model suited to a heterogeneous DevOps environment

  • Internal and external project management

Technology:

  • Python as main programming language

    • FastAPI framework for REST webservice implementation

    • Pandas as primary data science library

  • Angular, HTML, CSS as supporting languages/tools

  • PostgreSQL, MariaDB, CouchDB, Object Storage services

  • CentOS 7 and Ubuntu servers, Docker, Docker-Compose for system operation

  • Task automation with ActiveWorkflow, Apache Airflow, Celery, GitLab CI/CD

  • LDAP, OAuth2/OIDC, Keycloak for integrated identity management

Team size: 8

Project language: German, English

Apache Airflow Git Docker Celery
Python Pandas Angular HTML CSS PostgreSQL MariaDB CouchDB CentOS Ubuntu
University Medical Center
Göttingen

Education and training

Position

Data Platform Architect

Lead Data Engineer

Lead Architect

Data Engineering Consultant,

CI/CD & DevOps Expert,

Agile Consultant

Industries

Healthcare

