Data & AI Software Engineer: Java/Python/Scala, Spring Boot, AWS, Docker, Airflow, AI/Machine Learning, Algorithms, Apache Spark
Updated on 17.05.2024
Freelancer / Self-employed
Available from: 03.06.2024
Availability: 90%
of which on-site: 5%
Machine Learning
Artificial Intelligence
Design Pattern
Apache Spark
Clean Code
Agile Softwareentwicklung
Spring Boot



Germany, Austria, Switzerland



Selected project list - the complete list will be made available upon request.

Senior Machine Learning Engineer / Consultant, N.N.

07/2022 - 09/2022, Karlsruhe, Germany

o In-depth analysis, evaluation and improvement of the current onboarding process for remote workers

o Needs analysis for automated testing in real-time, large-scale machine learning data pipelines

o Evaluation of the level and quality of key internal technical documentation

o Devising efficient knowledge-transfer strategies between the Data Science and Software Engineering teams

o Gradient-boosted trees (XGBoost), K-means clustering, Pandas, SQLAlchemy, GCP, BigQuery, Kubeflow, Apache Spark, Scala, Java 15, Python, Podman, GitLab, git, Slack, Jira, Confluence, IntelliJ, macOS
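K-means clustering, listed in the stack above, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points (Lloyd's algorithm). The Python sketch below illustrates only that textbook loop; it is not project code, and the function names, random initialization and convergence test are illustrative choices.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's algorithm: alternate assignment and centroid-update steps."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Illustrative initialization: k distinct data points chosen at random.
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = [_nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    # Final labels always match the returned centroids.
    return centroids, [_nearest(p, centroids) for p in points]
```

In practice a library implementation such as scikit-learn's `KMeans` (k-means++ initialization) would normally be preferred over a hand-rolled loop.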

Scientific Full Stack Developer/Consultant, Covestro Digital R&D

07/2020 - 11/2020 Leverkusen, Germany

  • Porting of existing R&D high-performance computing (HPC) scripts from Bash/Slurm to Python 3

  • Containerization of this R&D high-performance Python code using Docker/Podman for use in the AWS Cloud

  • Compilation of HPC code packages such as LAMMPS and Gaussian on various compute platforms

  • Exploration of Apache Airflow as an orchestration tool for a complex, automated multi-scale quantum chemistry workflow

  • Consulting on professional software development practices in HPC environments: git, GitLab, coding conventions, testing, DevOps (CI/CD) concepts, etc.

  • Python 3.x, Java, AWS Cloud, AWS Batch, Anaconda 3, Apache Airflow, Jupyter, Docker, Podman, Kubernetes (minikube, MicroK8s), git, GitLab, CentOS
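Apache Airflow models a workflow as a directed acyclic graph (DAG) of tasks. The dependency-free Python sketch below illustrates how such a graph determines which steps of a multi-scale workflow may run in parallel. The task names are hypothetical placeholders, not the actual Covestro workflow, and the standard-library `graphlib` module (Python 3.9+) merely stands in for Airflow's scheduler.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical workflow: each task maps to the set of tasks it depends on.
WORKFLOW: Dict[str, Set[str]] = {
    "geometry_optimization": set(),
    "dft_single_point": {"geometry_optimization"},
    "frequency_analysis": {"geometry_optimization"},
    "force_field_fit": {"dft_single_point"},
    "md_simulation": {"force_field_fit"},
    "report": {"md_simulation", "frequency_analysis"},
}


def execution_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; all tasks within one wave are independent."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the workflow contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking its dependants
    return waves
```

In an Airflow DAG file the same edges would be declared between operators with `>>`, e.g. `geometry_optimization >> dft_single_point`.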

Big Data Full Stack Developer, N.N.
03/2019 - 09/2019 München, Germany

  • Evaluation of migrating existing machine-learning/ETL pipelines to an Apache Airflow-based system

  • Exploration of the Python packages Dask and Numba for parallel machine learning on Big Data sets

  • Evaluation of Apache Arrow as a fast in-memory Big Data processing layer in heterogeneous Hadoop/Spark analytics pipelines

  • Quality and performance evaluation of various novel machine learning algorithms such as LightGBM (gradient-boosted decision trees) and genetic programming with gplearn (symbolic regression)

  • Exploration/setup of Docker-based PySpark Jupyter notebook containers for use in Hadoop/Spark clusters

  • Python 3.7, Java, PySpark, C++, Anaconda 3, Airflow, Apache Arrow, Jupyter, Spyder, Docker/Podman, Kubernetes (minikube), git, GitHub, Ubuntu 18.04 LTS
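Chunked libraries such as Dask process a large data set as independent partitions and then merge the partial results. The minimal sketch below shows that partition-and-combine idea for mean and sample variance; the pairwise merge is the well-known parallel variance update of Chan et al., and the thread pool is only a stand-in for Dask's schedulers, not a description of the project setup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence, Tuple

# Partial result per partition: (count, mean, M2), M2 = sum of squared deviations.
Moments = Tuple[int, float, float]


def chunk_moments(chunk: Sequence[float]) -> Moments:
    """Statistics of one partition, computed independently of all others."""
    n = len(chunk)
    mean = sum(chunk) / n
    return n, mean, sum((x - mean) ** 2 for x in chunk)


def combine(a: Moments, b: Moments) -> Moments:
    """Merge two partial results (pairwise update of Chan et al.)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


def parallel_mean_variance(data: Sequence[float], n_chunks: int = 4) -> Tuple[float, float]:
    """Split `data` into chunks, process them concurrently, then reduce."""
    if len(data) < 2:
        raise ValueError("need at least two values for a sample variance")
    size = -(-len(data) // n_chunks)  # ceiling division, so no chunk is empty
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Threads keep the sketch portable; CPU-heavy work would use processes or a cluster.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(chunk_moments, chunks))
    total = partials[0]
    for part in partials[1:]:
        total = combine(total, part)
    n, mean, m2 = total
    return mean, m2 / (n - 1)  # sample variance
```

Because `combine` is associative, the partial results can be reduced in any tree shape, which is what allows a scheduler to merge them as partitions finish.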

Big Data Science Consultant, N.N. AG
01/2018 - 12/2018 München, Germany

  • Consulting/evaluation in the area of Big Data engineering, search and analytics of heterogeneous data sets in a combined Hadoop/Spark + Elasticsearch cluster environment

  • Elasticsearch, Hadoop + Spark 2.2.0, Cloudera 5.14, Elasticsearch-Hadoop connector, SparkR, sparklyr, Apache Zeppelin, Jupyter, Java, Scala, R, Python, JSON, git, JIRA, Confluence, Scrum

  • Extensive benchmarking and performance optimization of various Big Data ETL, data engineering, data analysis and data visualization use cases

  • Exploration and performance benchmarking of the Elasticsearch-Hadoop (EH) connector for use in a combined EH Big Data analysis platform

  • Exploration of submitting generated Scala code to an Apache Spark cluster via the REST API of Apache Livy

  • Consulting on the use of Artificial Intelligence/Machine Learning (AI/ML) algorithms within existing R&D projects
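Apache Livy exposes a REST API for running code on a Spark cluster: `POST /sessions` with `"kind": "spark"` opens an interactive Scala session, and `POST /sessions/{sessionId}/statements` executes a code string inside it. The sketch below only builds those requests with Python's standard library (nothing is sent). The host name and the small Scala generator are hypothetical, and the generated snippet assumes a Spark 2.x+ session in which `spark` is predefined.

```python
import json
from urllib.request import Request

LIVY_URL = "http://livy.example.com:8998"  # hypothetical host; 8998 is Livy's default port
JSON_HEADERS = {"Content-Type": "application/json"}


def create_session_request(base_url: str = LIVY_URL) -> Request:
    """POST /sessions with kind 'spark' asks Livy for an interactive Scala session."""
    body = json.dumps({"kind": "spark"}).encode("utf-8")
    return Request(f"{base_url}/sessions", data=body, method="POST", headers=JSON_HEADERS)


def submit_statement_request(session_id: int, scala_code: str,
                             base_url: str = LIVY_URL) -> Request:
    """POST /sessions/{id}/statements runs a code string inside that session."""
    body = json.dumps({"code": scala_code}).encode("utf-8")
    return Request(f"{base_url}/sessions/{session_id}/statements",
                   data=body, method="POST", headers=JSON_HEADERS)


def generate_scala(table_path: str, column: str) -> str:
    """Hypothetical code generator: emits a Scala count-per-key query."""
    return (f'val df = spark.read.parquet("{table_path}")\n'
            f'df.groupBy("{column}").count().show()')
```

Each request would then be sent with `urllib.request.urlopen`, the session polled until its state is `idle`, and the statement output read from `GET /sessions/{sessionId}/statements/{statementId}`.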

Big Data Scientist & Machine Learning Software Engineer, Voith Digital Solutions
10/2016 - 04/2017 Heidenheim, Germany

  • Development of a large-scale Internet-of-Things (IoT, Industrie 4.0) platform using the Hadoop stack: Cloudera 5.9, HDFS, Apache Spark Streaming & MLlib, HBase, Impala, Python, Pandas, Apache Kafka, Hue, Java 8, Spring Boot, Scala, JAXP, git, IntelliJ, JIRA, Scrum, etc.

  • Porting a complex real-time outlier detection algorithm for sensor-based time series data from Python scripts to object-oriented Java 8 within the Lambda architecture paradigm.

  • Performance optimization of the Java-based machine-learning algorithm on the Hadoop cluster

  • Consulting in various in-house Hadoop/Data Science/Machine-learning projects
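The profile does not describe the outlier detection algorithm that was ported to Java, so the following Python sketch substitutes a generic, well-known technique for streaming sensor data: a rolling z-score over a fixed window of recent readings. The window size, threshold and all names are illustrative assumptions.

```python
import math
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


class RollingZScoreDetector:
    """Flags a reading as an outlier when it lies more than `threshold` sample
    standard deviations from the mean of the last `window` accepted readings."""

    def __init__(self, window: int = 20, threshold: float = 3.0) -> None:
        if window < 2:
            raise ValueError("window must hold at least two readings")
        self.history: Deque[float] = deque(maxlen=window)
        self.threshold = threshold

    def update(self, value: float) -> bool:
        """Score one reading against the current window; return True for an outlier."""
        is_outlier = False
        if len(self.history) == self.history.maxlen:  # judge only once the window is full
            n = len(self.history)
            mean = sum(self.history) / n
            std = math.sqrt(sum((x - mean) ** 2 for x in self.history) / (n - 1))
            # A constant window (std == 0) flags nothing instead of dividing by zero.
            is_outlier = std > 0 and abs(value - mean) > self.threshold * std
        if not is_outlier:
            self.history.append(value)  # outliers are kept out of the reference window
        return is_outlier


def detect(stream: Iterable[float], window: int = 20,
           threshold: float = 3.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for each outlier in a possibly unbounded stream."""
    detector = RollingZScoreDetector(window, threshold)
    for i, value in enumerate(stream):
        if detector.update(value):
            yield i, value
```

Keeping flagged values out of the reference window prevents a burst of anomalies from inflating the standard deviation and masking later ones.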
