Freelancer: Senior Data Engineer with cloud experience and a background as a software developer. Special focus on the Apache tool stack.

Freelancer / self-employed

Remote work

Available from: 2024-07-01

Availability: 100%

of which on-site: 5%

Top-Skills

Airflow

Python

Kubernetes

Flink

Docker

Terraform

AWS

GCP

TDD

Apache Spark

Java

SQL

Spring

Spring MVC

Machine Learning

Data Warehouse

Streaming

Redis

Languages

German

English

French

Work locations

Countries

Germany, Switzerland, Austria

Remote work

possible

Projects

7 months

2022-06 - 2022-12

Cluster Migration of Internal Data Warehouse

Role

Senior Data Engineer

Project description

As data volumes continue to grow for eCommerce companies and the number of data consumers within the organization increases, old infrastructure is sometimes unable to keep up. In this particular case, the computing and warehousing cluster additionally has to be on-premise for data security reasons. After an external provider had supplied new cluster infrastructure, all data warehouse and computing logic had to be migrated from the old infrastructure to the new one. Additional challenges were maintaining backwards compatibility of the migrated processes at all times and adhering to strict security standards.

Migration and deployment of 30+ Airflow DAGs with 20-50 tasks each on the new infrastructure
Co-development of a Python client library for Apache Livy that is used by 100+ Airflow tasks
Deployment of 20+ Apache Hive databases with 10-50 tables each in three data warehouse layers via Ansible
Code review of 5-10 merge requests per week

Products

Apache Airflow, Python, Apache Hive, Apache Spark, PySpark, Apache Livy, Apache Hadoop, Ansible, GitLab CI, Jenkins, Apache Knox, Scrum, Jira, Confluence

2 years 5 months

2020-02 - 2022-06

Platform for Real Time Fraud Detection

Role

Lead Developer, Architect

Project description

To prevent financial and reputational loss on eCommerce platforms, an automated security system is needed that can detect fraud patterns in online shops. The software, written in Java with Apache Flink, should be able to scale out over multiple shop systems and data sources. Further requirements are monitoring traffic in real time and incorporating expert knowledge alongside machine learning and artificial intelligence (AI) models. The software is deployed and operated in the customer's cloud environment using modern Continuous Integration (CI) and DevOps principles.

Lead design of the platform, Technical Lead for a team of 5 developers
Implementation of a proof of concept in Java, 80% of whose code made it into the first product iteration
Prototyping of two end-to-end MLOps workflows with MLflow and AWS SageMaker
Successful deployment and zero-downtime operations on customer premises at around 15 million events per day
Design of a cloud-based testing environment that can be brought up in less than 15 minutes (Infrastructure as Code) and handle up to 10 times the production workload

Products

Java, JUnit, Apache Maven, Apache Flink, Apache Kafka, Redis, Terraform, AWS EKS, AWS CloudFormation, Kubernetes, Helm, Docker, Datadog, GitLab CI, MLflow, AWS SageMaker, scikit-learn, AWS S3, AWS RDS, Trello

Customer

IT Consultancy, Internal Product Development

4 years 10 months

2017-03 - 2021-12

ETL Pipeline Architecture with Apache Airflow and Kubernetes

Role

Data Engineer

Project description

A data-driven company needs reliable and scalable infrastructure as a key component of corporate decision making. Engineers as well as analysts need to be able to create ETL processes, artificial intelligence (AI) jobs, and ad-hoc reports without consulting a data engineer. The company's data architecture needs to provide scalability, clear separation between testing and production, and ease of use. Modern DevOps practices like Continuous Integration (CI) and Infrastructure as Code need to be employed across the whole infrastructure.

Leading the conception of a cloud-based infrastructure based on the above requirements
Initial training of 5 developers and onboarding of more than 10 developers since then
Initial setup and operation of Apache Airflow, starting with ca. 10 jobs and scaling up to more than 100 regularly scheduled jobs at present

Products

Apache Airflow, Kubernetes, Docker, AWS EKS, AWS EC2, AWS IAM, AWS S3, AWS EMR, AWS RDS, GitLab CI, Scrum, Jira, Confluence

Customer

Multichannel Retailer, Furniture

4 years 11 months

2017-02 - 2021-12

A/B Testing Platform

Role

Data Scientist, Lead Developer

Project description

To enable an eCommerce organization to become data-driven, there must be (among other things) a framework for comparing different versions of the website against each other. Many members and departments of the organization need to be able to create and conduct experiments without the assistance of a data engineer. Another important requirement for the framework was the use of Bayesian statistics.

Leading the conception of the testing framework, including randomization logic, statistical modelling, and graphical presentation in the frontend
Implementation of a proof of concept for the statistical engine
Implementation of production code for frontend, backend, and statistical engine
Training of stakeholders from 3 different departments in the methodology and statistical background of A/B testing

Products

Python, PyMC3, SciPy, Apache Spark, PySpark, Apache Airflow, Docker, Jenkins, Kubernetes, VueJS, Redshift, Scrum, Jira, Confluence

Customer

Multichannel Retailer, Furniture

3 years 1 month

2018-06 - 2021-06

Web Tracking Event Pipeline with Snowplow in AWS

Role

Senior Data Engineer

Project description

For an eCommerce platform it is crucial to have a detailed picture of customer behaviour on which business decisions can be based, either in real time or from the data warehouse. This requires a flexible, scalable, and field-tested solution that can run in the cloud. Additionally, all browser events need custom enrichment with business information from the backend to provide the necessary context, e.g. for "Add to Cart" events. The web tracking pipeline is managed using modern DevOps principles: Continuous Integration (CI), zero-downtime deployments, and Infrastructure as Code.

Integration of the Snowplow event pipeline into the cloud-based shop architecture
Day-to-day operations of the event pipeline at ca. 4 million events per day
Co-engineering of a custom enrichment in the webshop backend (1000+ lines of code) and handover of ownership to the backend team
Setup of custom real-time event monitoring (< 1s latency) with Elasticsearch and Kibana
Setup of custom scheduling and deployment processes for 5 components of the Snowplow event pipeline

Products

Snowplow, Kubernetes, AWS EMR, AWS EKS, AWS EC2, AWS Kinesis, AWS Redshift, Apache Airflow, Kibana, Elasticsearch, NodeJS, GitLab CI, AWS RDS, Scala, Scrum, Jira, Confluence

Customer

Multichannel Retailer, Furniture

1 year 3 months

2018-02 - 2019-04

Product Recommendation Engines: Collaborative Filtering and Item Similarity with Neural Nets

Role

Data Scientist, Data Engineer

Project description

To enrich the customer's shopping experience and to drive additional sales, the eCommerce platform should be able to recommend additional products to customers. Two orthogonal strategies are employed: product similarity based on neural network embeddings and collaborative filtering based on user behaviour. Additionally, performance monitoring for the recommendations is needed.

Productionization of both models, based on proofs of concept by an ML engineer, including data acquisition, model execution, and data output
Scheduling and operation of the productionized models, including 3 different code bases and more than 5 regularly scheduled jobs
Operationalization of 10+ performance metrics across 5 dashboards for stakeholders

Products

TensorFlow, Keras, scikit-learn, Python, pandas, AWS EMR, Java, Ant, Spring, hybris, Apache Mahout, AWS Redshift, Apache Airflow, Apache Superset, Scrum, Confluence, Jira

Customer

Multichannel Retailer, Furniture

Education and training

2003 - 2008

Christian-Albrechts-Universität zu Kiel, Germany

Degree: Magister / Master of Arts

Focus:

Major: Philosophy
Minors: Musicology, Computer Science

2002

Gymnasium Winsen/Luhe, Germany

Abitur

Online Courses

2019

Coursera

DeepLearning.AI Deep Learning Specialization
DeepLearning.AI TensorFlow Developer
Probabilistic Graphical Models: Representation

2014

Coursera

Machine Learning

Certificates

2022

Coursera

Machine Learning Engineering for Production (MLOps)

2019

Coursera

DeepLearning.AI TensorFlow Developer
DeepLearning.AI Deep Learning

2013

IT agile - Scrum Master

Position

Data Engineering
ML Ops

Skills

Products / Standards / Experience / Methods

Frameworks

Python:

pandas
Jupyter
numpy
matplotlib
flask
scikit-learn
keras
Tensorflow
Apache airflow
pySpark
PyMC3

Java:

Spring
JUnit
Mockito
maven
ant
hybris

JavaScript:

NodeJS
ExpressJS
VueJS
ChartJS

Cloud DevOps

AWS
Kubernetes
Helm
Docker
Terraform
GitLab CI
Jenkins
Apache airflow
Datadog
Hadoop (HDFS)
AWS EKS
AWS EMR
AWS EC2
AWS Cloudformation
AWS Secrets Manager
AWS RDS
AWS S3
GCP

Machine Learning

Tensorflow
PyTorch
keras
scikit-learn
pyMC3
MLflow
AWS Sagemaker
LakeFS
Pinecone

Streaming

Apache Spark
Apache Flink
Apache Kafka
Amazon Kinesis
Snowplow

Engineering Concept

Object Oriented Programming
Test Driven Development (TDD)
Functional Programming
Domain Driven Design (DDD)
Clean Code

Security

ssh
Snyk
kerberos
Apache Knox
AWS IAM
VPN

Agile Concepts and Tools

Scrum (Certified Scrum Master)
Kanban
Jira
Confluence
Trello
Redmine

Software

IntelliJ Idea
PyCharm
vim
tmux
bash
fish

Platforms

Linux
Mac/OSX

Experience

2020 - today

Role: Team Lead Data Engineering / Data Science

Customer: Neuland – Büro für Informatik

2017 - 2020

Role: Data Engineer / Data Scientist

Customer: Neuland – Büro für Informatik

2015 - 2017

Role: Back End Developer

Customer: Neuland – Büro für Informatik

2012 - 2015

Role: Project Manager

Customer: Neuland – Büro für Informatik

2012 - 2012

Role: Management Assistant to the CTO

Customer: OXID eSales

2010 - 2012

Role: Public Relations Consultant

Customer: rheinfaktor

Programming languages

Python

Java

SQL

JavaScript

bash

Lisp

Haskell

Octave

Databases

MySQL

PostgreSQL

Redis

AWS Redshift

Cassandra

AWS Athena

Apache Hive

Apache Solr

Elasticsearch
