Principal Data Engineer: Postgres, Spark, Python, Scala, AWS, schema and ETL fine-tuning
Updated on 18.07.2024
Profile
Freelancer / self-employed
Remote work
Available from: 23.06.2024
Availability: 100%
of which on-site: 100%
Python
Apache Spark
PostgreSQL
Scala
C++
Elasticsearch
Apache Cassandra
Redis
Kafka
AWS
Terraform
Kubernetes
Helm
CI/CD
Data Mart
Data Warehouse
Application Architecture

Work locations

Germany, Switzerland, Austria
possible

Projects

2 months
2024-06 - 2024-07

Performance tuning - 2k reads/writes per second - Postgres, Django, Python

PostgreSQL Django Python ...

Analyse Postgres data access patterns: reads vs. writes, hot tables, redundant and missing indexes, table and index sizes.
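Access-pattern analysis like this is typically driven by Postgres's statistics views; a sketch of the kind of catalog queries involved (the views and columns are standard Postgres, the exact selection is illustrative):

```python
# Sketch: catalog queries behind the access-pattern analysis; printed here,
# in practice run via psql or a DB driver against the live instance.
QUERIES = {
    "reads vs writes per table":
        "SELECT relname, seq_scan + idx_scan AS reads, "
        "n_tup_ins + n_tup_upd + n_tup_del AS writes "
        "FROM pg_stat_user_tables ORDER BY writes DESC;",
    "unused indexes":
        "SELECT indexrelname, idx_scan FROM pg_stat_user_indexes "
        "WHERE idx_scan = 0;",
    "table and index sizes":
        "SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) "
        "FROM pg_stat_user_tables "
        "ORDER BY pg_total_relation_size(relid) DESC;",
}

for label, sql in QUERIES.items():
    print(f"-- {label}\n{sql}\n")
```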

Check ANALYZE and VACUUM statistics.

Collect query statistics: number of calls, mean and max execution time.
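Statistics of this shape usually come from pg_stat_statements; a minimal sketch of the usual first triage step, ranking queries by total time (calls x mean time) — the queries and numbers below are invented for illustration:

```python
# Sketch: rank queries by total time from pg_stat_statements-style rows.
# The rows below are illustrative, not real measurements.
from operator import itemgetter

rows = [
    # (query, calls, mean_time_ms, max_time_ms)
    ("SELECT * FROM orders WHERE user_id = $1", 120_000, 4.2, 310.0),
    ("UPDATE jobs SET state = $1 WHERE id = $2", 45_000, 1.1, 12.0),
    ("SELECT count(*) FROM events", 600, 950.0, 2_100.0),
]

def rank_by_total_time(rows):
    """Total time = calls * mean time; the usual 'where to look first' metric."""
    scored = [(q, calls * mean, mx) for q, calls, mean, mx in rows]
    return sorted(scored, key=itemgetter(1), reverse=True)

for query, total_ms, max_ms in rank_by_total_time(rows):
    print(f"{total_ms / 1000:10.1f} s total  (max {max_ms:.0f} ms)  {query}")
```

Note that a cheap query called very often can outrank an expensive one called rarely, which is exactly why mean time alone is misleading.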

Review app architecture: job queue, webhook- and user-triggered endpoints, plus cron jobs in Kubernetes.

Collect metrics in Prometheus and Grafana related to DB usage and code execution to correlate CPU spikes with suspicious code.
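The spike-to-code correlation can be sketched in miniature: given a CPU series and per-endpoint activity windows (both invented here — in practice they come from Prometheus range queries), flag the endpoints active during the spikes.

```python
# Sketch: which endpoints were active while CPU was spiking?
# Both series are made up for illustration.
cpu = {0: 20, 1: 25, 2: 95, 3: 90, 4: 30}  # minute -> CPU %
endpoint_calls = {
    "/webhook": [2, 3],      # minutes in which the endpoint was hit
    "/health": [0, 1, 4],
}

def suspects(cpu, calls, threshold=80):
    """Endpoints whose activity overlaps with over-threshold CPU samples."""
    hot = {t for t, pct in cpu.items() if pct >= threshold}
    return sorted(ep for ep, times in calls.items() if hot & set(times))

print(suspects(cpu, endpoint_calls))
```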

Together with the engineering team, run several code-optimization sessions that improved overall performance threefold.

Document runbooks and a roadmap to prepare the codebase for further growth.

PostgreSQL Django Python Redis
Rakt technologies
2 years 6 months
2022-01 - 2024-06

Build an Uber-like match-making platform for cleaners and home-owners

CTO PostgreSQL Python Elastic Search
CTO

  • Define initial platform architecture and support its evolution over time
  • Design and implement ETLs for analytics dashboards
  • Adjust REST/SQL schemas for new requirements
  • Design a feature store and training/inference pipelines for ML components - Recommendation Engine, Cleaning Duration Estimation Engine, Image Moderation
  • Implement business critical components in backend, infra and frontend
AWS Google Cloud Azure
PostgreSQL Python Elastic Search
Syzygy AI LLC
Berlin
2 years 8 months
2019-06 - 2022-01

Data infra cloud migration

senior data engineer Apache Spark Scala Python ...
senior data engineer
Help the company implement a data-driven strategy to migrate from a data warehouse to a data mesh architecture:
  • Migrate a number of data pipelines from a privately hosted Cloudera environment into the AWS cloud to support exponential growth in demand during Covid (Terraform, Kubernetes, AWS stack: EMR, ECR, IAM)
  • Create a data pipeline for A/B-testing analysis, using Optimizely and Google Analytics events as sources (Spark, Airflow on K8s)
  • Develop a framework for data quality assurance across the whole data warehouse
  • Implement new and extend existing ETLs according to stakeholders' needs
  • Participate in the on-call rotation to support SLAs for business-critical datasets
  • Performance tuning of stream and batch data pipelines: Spark, Python, Scala, Impala, Hive, Airflow, Parquet, HDFS; AWS infrastructure: S3, RDS, EC2, Lambda, DMS
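A data-quality framework like the one mentioned above boils down to small, composable checks run against each table; a toy sketch where plain dicts stand in for warehouse rows (the check names and return shape are illustrative):

```python
# Sketch: minimal data-quality checks; each returns
# (check name, column, passed?, offending row indices).
def check_not_null(rows, column):
    bad = [i for i, r in enumerate(rows) if r.get(column) is None]
    return ("not_null", column, len(bad) == 0, bad)

def check_unique(rows, column):
    seen, dupes = set(), []
    for i, r in enumerate(rows):
        v = r.get(column)
        if v in seen:
            dupes.append(i)
        seen.add(v)
    return ("unique", column, len(dupes) == 0, dupes)

rows = [{"id": 1, "email": "a@x"}, {"id": 1, "email": None}]
for name, col, ok, bad in (check_not_null(rows, "email"),
                           check_unique(rows, "id")):
    print(f"{name}({col}): {'OK' if ok else f'FAILED at rows {bad}'}")
```

In a real pipeline the same checks would run over query results or DataFrames and feed alerting, but the pass/fail-per-check contract is the core of the idea.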
AWS
Apache Spark Scala Python AWS Terraform Kubernetes
Hellofresh
Berlin
2 years 8 months
2015-11 - 2018-06

Build analytical project portfolio

Head of analytics department
Head of analytics department
Build the analytics department from the ground up:
  • identify possible value propositions for existing clients
  • build POCs and gather feedback
  • team up with other teams to prepare solutions


Design, build, and optimize high-throughput, highly available, multi-component data processing pipelines (up to 80k events per second, depending on cluster configuration).
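Micro-batching is one of the basic levers behind throughput numbers like these: amortizing per-write overhead by flushing events in groups. A minimal sketch (the event source and batch size are stand-ins):

```python
# Sketch: micro-batching a stream of events so the sink sees
# one write per batch instead of one per event.
def batched(events, size):
    batch = []
    for e in events:
        batch.append(e)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:            # flush the final, possibly short, batch
        yield batch

batches = list(batched(range(10), 4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```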

Cassandra, Kafka, and Elasticsearch performance tuning: schema adjustments and implementation of high-performance multithreaded shooters (load generators) in C++; JVM and Linux kernel optimization.
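A multithreaded shooter in miniature: the originals were C++ against Cassandra/Kafka; here a no-op callable stands in for the real write path, and the structure (worker threads hammering a target, a shared counter, throughput report) is the point.

```python
# Sketch: minimal multithreaded load generator ("shooter").
import threading
import time
from collections import Counter

done = Counter()
lock = threading.Lock()

def target():
    pass  # placeholder for a real write to Cassandra/Kafka

def shoot(target, shots):
    for _ in range(shots):
        target()
        with lock:                 # Counter updates are not atomic
            done["requests"] += 1

threads = [threading.Thread(target=shoot, args=(target, 1000))
           for _ in range(8)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"{done['requests']} requests in {elapsed:.3f}s")
```

In a real shooter each thread would hold its own connection, and the tuning work is in batch sizes, consistency levels, and kernel/JVM settings rather than the loop itself.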

Containerization of applications: maintaining an internal Docker registry, automating orchestration of Docker VMs, Docker performance tuning.

Introduce the company to modern SDLC practices and help integrate them in a way coherent with business goals: continuous integration, DevOps, integration tests and auto-tests.

Participate in business-critical activities: sizing and hardware planning, developing POCs for pre-sales events, requirements assessments, investigation of disaster incidents, etc.


Head of analytics department:

Add several key functionalities to the company's product portfolio.

Implement various analytical modules on top of Apache Spark.

Build a toolbox for algorithm assessment, data discovery, and quick hypothesis checks.

Assess and adapt AI/ML methods for tasks such as prediction and clustering.

Apply Natural Language Processing (NLP) methods according to business goals.


Technologies/tools used:

Spark, Cassandra, Zeppelin, Postgres, Redis, Kafka, Akka, Scala, Java, SOA, microservices, reverse engineering, applied statistics, data mining, predictive analysis & DSP, C++, Boost, Python, scikit-learn, Docker

