Data and ML Engineering
Updated on 23.08.2024
Profil
Freelancer / self-employed
Remote work
Available from: 01.10.2024
Availability: 100%
of which on-site: 40%
Python
AWS
Data Engineering
GCP
Kubernetes
Kubeflow
Scala
Seldon
SQL
BigQuery
Postgres
scikit-learn
Apache Spark
pandas
Albanian
Mother tongue
English
C2
German
C1

Work Locations

Germany, Switzerland, Austria possible

Projects

4 years 5 months
2020-04 - present

Sales Gap Analysis based on publicly available datasets and transactional data

Tech Lead

Using machine learning and NLP techniques, the product infers which products customers are not yet buying from the supplier and targets these gaps with personalized sales promotions. It assists sales managers in increasing their customers' basket sizes.

  • Moved a locally running MVP to a production solution on Kubernetes in the cloud.
  • Led incremental performance improvements to meet the demands of production load.
  • Spearheaded the adoption of modern ML deployment technologies and frameworks in the project.
  • Automated data ingestion, model training, evaluation, and deployment.
  • Automated unit, integration, and end-to-end tests.
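The gap-analysis idea can be sketched in a few lines of plain Python. This is a minimal illustration with hypothetical customers and products, not the production logic, which used NLP features and trained models:

```python
from collections import Counter

def assortment_gaps(purchases, segments, top_n=3):
    """Rank, per customer, the products that peers in the same
    segment buy but the customer does not (a 'sales gap')."""
    # Product popularity within each customer segment.
    seg_counts = {}
    for cust, items in purchases.items():
        seg_counts.setdefault(segments[cust], Counter()).update(items)

    gaps = {}
    for cust, items in purchases.items():
        owned = set(items)
        popularity = seg_counts[segments[cust]]
        missing = [p for p, _ in popularity.most_common() if p not in owned]
        gaps[cust] = missing[:top_n]
    return gaps

# Hypothetical transactional data: three customers in one segment.
purchases = {
    "c1": ["flour", "sugar", "oil"],
    "c2": ["flour", "sugar", "yeast"],
    "c3": ["flour", "yeast", "oil"],
}
segments = {"c1": "bakery", "c2": "bakery", "c3": "bakery"}
print(assortment_gaps(purchases, segments))
# → {'c1': ['yeast'], 'c2': ['oil'], 'c3': ['sugar']}
```

In production such a ranking would come from trained models and run as a pipeline step (e.g. on Kubeflow) rather than in-process.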
Kubernetes Kubeflow Apache Spark Python BigQuery Helm scikit-learn Docker
Multinational wholesale chain
Düsseldorf
5 months
2019-11 - 2020-03

Precalculation Layer in Analytics Platform

Senior Data Engineer

At stock exchange companies, the volume of collected data increases daily, and so does its market value. Given these dynamics, we created an architecture that provides users with historical data and lets them enrich it within their own scope.

  • Created an architecture for fast, cost-efficient access to historical data.
  • Developed a data transformation strategy to persist extensions of the pre-calculation layer.
  • Developed a schema evolution strategy.
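A schema evolution strategy can be illustrated with an additive-only merge, a common choice for append-heavy historical data. This is a sketch with hypothetical column names, not the project's actual implementation:

```python
def evolve_schema(current, incoming):
    """Additive schema evolution: unknown columns are appended
    (readers see NULLs for old data); a type change on an existing
    column is rejected instead of silently rewriting history."""
    merged = dict(current)
    for col, dtype in incoming.items():
        if col not in merged:
            merged[col] = dtype
        elif merged[col] != dtype:
            raise ValueError(f"type conflict on {col!r}: {merged[col]} vs {dtype}")
    return merged

v1 = {"isin": "string", "price": "double"}
v2 = {"isin": "string", "price": "double", "venue": "string"}
print(evolve_schema(v1, v2))
# → {'isin': 'string', 'price': 'double', 'venue': 'string'}
```

Rejecting type changes keeps every historical partition readable with the latest schema, at the cost of requiring an explicit migration for incompatible changes.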
Kubernetes SQL Python Presto Scala Apache Spark AWS S3 Hive Docker
Stock Exchange Broker
Frankfurt am Main
4 months
2019-07 - 2019-10

Processing of Cash Register Data

Senior Data Engineer

The project created a decoupled architecture to process and persist data collected into GCP on a daily basis. Within its scope, we also defined the CI/CD strategy for all components of the data pipeline.

  • Created the ingestion layer that makes data available in BigQuery daily.
  • Built the CI/CD pipeline for all components.
  • End-to-end integration testing (design, data generation, execution).
  • Documentation.
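Daily ingestion of this kind is typically kept idempotent by writing into date partitions, so a rerun overwrites one partition instead of duplicating rows. A sketch using BigQuery's `table$YYYYMMDD` partition decorator (the table name and loaded-set tracking are hypothetical):

```python
from datetime import date

def partition_target(table, day):
    """BigQuery partition decorator: loading into 'table$YYYYMMDD'
    replaces exactly that day's partition, making reruns idempotent."""
    return f"{table}${day.strftime('%Y%m%d')}"

def plan_loads(table, days, already_loaded):
    """Return the partition targets still missing for the given days."""
    return [partition_target(table, d) for d in days if d not in already_loaded]

days = [date(2019, 9, 1), date(2019, 9, 2)]
print(plan_loads("sales.receipts", days, already_loaded={date(2019, 9, 1)}))
# → ['sales.receipts$20190902']
```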
GCP Python BigQuery Docker Gitlab
Retailer
Salzburg
1 year 3 months
2018-05 - 2019-07

Metadata and data management application on top of a data lake environment

Senior Data Engineer

Created a self-service framework for ingesting and preprocessing data from different sources, designed around the requirements of the data's end consumers. The framework targets a cloud-native environment; a set of HTTP REST APIs gives users the necessary functionality around on-demand AWS EMR clusters.

  • Developed data transformation and aggregation algorithms (Scala) to support new load types.
  • Developed the metadata management and data transformation orchestration application (Python) to support new destination environments.
  • Developed unit, integration, and end-to-end tests for all Scala and Python components of the application.
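The self-service surface around on-demand clusters reduces to a small lifecycle state machine behind the REST endpoints. A pure-Python sketch; the states and method names are hypothetical, not the framework's real API:

```python
import uuid

class ClusterManager:
    """Bookkeeping behind a request/release REST API for on-demand
    compute clusters (e.g. AWS EMR)."""

    def __init__(self):
        self.clusters = {}

    def request(self, load_type):
        # In the real framework this would call the cloud provider;
        # here we only record the desired state.
        cid = uuid.uuid4().hex[:8]
        self.clusters[cid] = {"load_type": load_type, "state": "RUNNING"}
        return cid

    def release(self, cid):
        self.clusters[cid]["state"] = "TERMINATED"

mgr = ClusterManager()
cid = mgr.request("daily_sales_load")
mgr.release(cid)
print(mgr.clusters[cid]["state"])
# → TERMINATED
```

Keeping the desired state separate from the provider call is what lets such an API stay responsive while clusters spin up asynchronously.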
Scala Python Apache Spark Jenkins Bitbucket AWS S3 AWS EMR Docker Hive
Global Sports Brand
Herzogenaurach
1 year 5 months
2017-01 - 2018-05

Information Marketplace

Data Engineer

Implemented a big data platform for ingesting, storing, and processing batch and real-time data from a variety of sources within the customer's organization. As part of this, a centralized search engine was designed and implemented for company-wide knowledge and documentation of data sources. In a follow-up project, we migrated the persistence layer of an analytics application from Oracle to Impala. Data arrived on a regular basis, and the Spark/Scala processing produced a set of tables with more than 10,000 columns; these jobs were scheduled with Apache Airflow.
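Scheduling such jobs in Airflow boils down to running them in dependency order; the same ordering can be sketched with the standard library (the job names and dependencies here are hypothetical):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependencies of a nightly Oracle-to-Impala load:
# extract feeds the wide-table build, which feeds the Impala publish.
dag = {
    "build_wide_table": {"extract_oracle"},
    "publish_impala": {"build_wide_table"},
    "refresh_stats": {"publish_impala"},
}
order = list(TopologicalSorter(dag).static_order())
print(order)
# → ['extract_oracle', 'build_wide_table', 'publish_impala', 'refresh_stats']
```

Airflow does the same topological resolution at runtime, plus retries, backfills, and calendar-based scheduling on top.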

Apache Spark Docker Python Scala Airflow Hive Kafka
Truck Manufacturer
Munich

Education and Training

2 years 8 months
2013-10 - 2016-05

Software Systems Engineering

Master of Science, RWTH Aachen

Data and Information Management

2 years 10 months
2010-10 - 2013-07

Computer Science, Bachelor of Science

Graduated with Honors, University of Tirana, Tirana, Albania

Position

Data Engineer or Machine Learning Engineer building production-grade data pipelines or model training and deployment pipelines.

Competencies

Top-Skills

Python AWS Data Engineering GCP Kubernetes Kubeflow Scala Seldon SQL BigQuery Postgres scikit-learn Apache Spark pandas

Products / Standards / Experience / Methods

WORK EXPERIENCE:
03/2021 – present

Role: Senior Consultant, ML/Data Engineer
Client: Freelance
Location: Munich, Germany


12/2016 – 03/2021
Role: Senior Consultant, ML/Data Engineer
Client: Data Reply
Location: Munich, Germany

EARLIER WORK EXPERIENCES:

01/2014 – 07/2015

Role: Student Helper

Client: RWTH Aachen University, Informatik Zentrum, Lehrstuhl für Informatik 5

Tasks:

Developed web services for storing and annotating multimedia data, using relational and non-relational databases in a microservice architecture.

09/2010 – 08/2013

Role: Software Developer

Client: Helius Systems, Tirana, Albania

Tasks:

Full-stack software developer on several projects used by up to 1,500 users. Languages and tools used: VB, SQL, C#, ASP.NET, Crystal Reports.

Cloud Providers:

  • AWS
  • GCP
  • Azure

Programming Languages

Java
Python
Scala
SQL

Tools & Frameworks

Docker
Helm
Jenkins
Kubeflow
Kubernetes
scikit-learn
Spark

Databases & Storage

AWS S3
BigQuery
Hive
MongoDB
MySQL
Presto
Redshift
