Data & AI Software Engineer: Java/Python/Scala, Spring Boot, AWS, Docker, Airflow, AI/Machine Learning, Algorithms, Apache Spark
Updated on 17.05.2024
Profile
Freelancer / Self-employed
Remote work
Available from: 03.06.2024
Availability: 90%
of which on-site: 5%
Java
Machine Learning
Python
Scala
Docker
Artificial Intelligence
Algorithms
Design Patterns
Apache Spark
AWS
OOP
Clean Code
Agile Software Development
Spring Boot
German
English

Locations

Germany, Austria, Switzerland
possible

Projects

Selected project list - the complete list will be made available upon request.


Senior Machine Learning Engineer / Consultant, N.N.

07/2022 - 09/2022, Karlsruhe, Germany

o In-depth analysis, evaluation and improvement of the current onboarding process for remote workers

o Needs analysis for automated testing in real-time, large-scale machine learning data pipelines

o Evaluation of the level and quality of important internal technical documentation

o Devising efficient knowledge transfer strategies between the Data Science and Software Engineering teams

o Gradient-boosted trees (XGBoost), K-means clustering, Pandas, SQLAlchemy, GCP, BigQuery, Kubeflow, Apache Spark, Scala, Java 15, Python, Podman, GitLab, git, Slack, Jira, Confluence, IntelliJ, macOS




Scientific Full Stack Developer/Consultant, Covestro Digital R&D

07/2020 - 11/2020 Leverkusen, Germany

  • Porting of existing R&D high-performance compute scripts from Bash/Slurm to Python 3

  • Containerization of these R&D high-performance Python codes using Docker/Podman for use in the AWS Cloud

  • Compilation of HPC code packages such as LAMMPS and Gaussian on various compute platforms

  • Exploration of Apache Airflow as an orchestration tool in a complex automated multi-scale quantum chemistry workflow

  • Consulting on professional software development practices in HPC environments: git, GitLab, coding conventions, testing, DevOps (CI/CD) concepts etc.

  • Python 3.x, Java, AWS Cloud, AWS Batch, Anaconda 3, Apache Airflow, Jupyter, Docker, Podman, Kubernetes (minikube, MicroK8s), git, GitLab, CentOS




Big Data Full Stack Developer, N.N.
03/2019 - 09/2019 München, Germany

  • Evaluation of migrating existing machine-learning/ETL pipelines to an Apache Airflow-based system

  • Exploration of the Python packages Dask and Numba for parallel machine learning on Big Data sets

  • Evaluation of Apache Arrow as a fast in-memory Big Data processing layer in heterogeneous Hadoop/Spark analytics pipelines

  • Quality and performance evaluation of various novel machine learning algorithms such as LightGBM (decision trees) and Genetic Programming with gplearn (Symbolic Regression)

  • Exploration/setup of Docker-based PySpark Jupyter notebook containers for use in Hadoop/Spark clusters

  • Python 3.7, Java, PySpark, C++, Anaconda 3, Airflow, Apache Arrow, Jupyter, Spyder, Docker/Podman, Kubernetes (minikube), git, GitHub, Ubuntu 18.04 LTS




Big Data Science Consultant, N.N. AG
01/2018 - 12/2018 München, Germany

  • Consulting/evaluation in the area of Big Data engineering, search and analytics of heterogeneous data sets in a combined Hadoop-Spark + Elasticsearch cluster environment

  • Elasticsearch, Hadoop + Spark 2.2.0, Cloudera 5.14, Elasticsearch-Hadoop connector, SparkR, sparklyr, Apache Zeppelin, Jupyter, Java, Scala, R, Python, JSON, git, JIRA, Confluence, SCRUM

  • Extensive benchmarking and performance optimization of various Big Data ETL, data engineering, data analysis and data visualization use cases

  • Exploration and performance benchmarking of the Elasticsearch-Hadoop (EH) connector for use in a combined EH Big Data analysis platform

  • Exploration of submitting generated Scala code to an Apache Spark cluster using a programmatic API (Apache Livy)

  • Consulting on the use of Artificial Intelligence/Machine Learning (AI/ML) algorithms within existing R&D projects


Big Data Scientist & Machine Learning Software Engineer, Voith Digital Solutions
10/2016 - 04/2017 Heidenheim, Germany

  • Development of a large-scale Internet-of-Things (IoT, Industrie 4.0) platform using the Hadoop stack: Cloudera 5.9, HDFS, Apache Spark Streaming & MLlib, HBase, Impala, Python, Pandas, Apache Kafka, Hue, Java 8, Spring Boot, Scala, JAXP, git, IntelliJ, JIRA, SCRUM etc.

  • Porting a complex real-time outlier detection algorithm for sensor-based time series data from Python scripts to object-oriented Java 8 within the Lambda architecture paradigm

  • Performance optimization of the Java-based machine-learning algorithm on the Hadoop cluster

  • Consulting in various in-house Hadoop/Data Science/Machine-learning projects
