Selected project list - the complete list will be made available upon request.
Senior Machine Learning Engineer / Consultant, N.N.
07/2022 - 09/2022, Karlsruhe, Germany
o In-depth analysis, evaluation and improvement of the current onboarding process for remote workers
o Needs analysis for automated testing in real-time, large-scale machine learning data pipelines
o Evaluation of the level and quality of important internal technical documentation
o Devising efficient knowledge transfer strategies between the Data Science and Software Engineering teams
o Gradient-boosted trees (XGBoost), k-means clustering, Pandas, SQLAlchemy, GCP, BigQuery, Kubeflow, Apache Spark, Scala, Java 15, Python, Podman, GitLab, git, Slack, Jira, Confluence, IntelliJ, macOS
Scientific Full Stack Developer/Consultant, Covestro Digital R&D
07/2020 - 11/2020, Leverkusen, Germany
Porting of existing R&D high-performance computing scripts from Bash/Slurm to Python 3
Containerization of these R&D high-performance Python codes using Docker/Podman for use in the AWS Cloud
Compilation of HPC code packages such as LAMMPS and Gaussian on various compute platforms
Exploration of Apache Airflow as an orchestration tool in a complex automated multi-scale quantum chemistry workflow
Consulting on professional software development practices in HPC environments: git, GitLab, coding conventions, testing, DevOps (CI/CD) concepts, etc.
Python 3.x, Java, AWS Cloud, AWS Batch, Anaconda 3, Apache Airflow, Jupyter, Docker, Podman, Kubernetes (minikube, MicroK8s), git, GitLab, CentOS
Big Data Full Stack Developer, N.N.
03/2019 - 09/2019, München, Germany
Evaluation of the migration of existing machine-learning/ETL pipelines to an Apache Airflow-based system
Exploration of the Python packages Dask and Numba for parallel machine learning on Big Data sets
Evaluation of Apache Arrow as a fast in-memory Big Data processing layer in heterogeneous Hadoop/Spark analytics pipelines
Quality and performance evaluation of various novel machine learning algorithms such as LightGBM (decision trees) and genetic programming with gplearn (symbolic regression)
Exploration/setup of Docker-based PySpark Jupyter notebook containers for use in Hadoop/Spark clusters
Python 3.7, Java, PySpark, C++, Anaconda 3, Airflow, Apache Arrow, Jupyter, Spyder, Docker/Podman, Kubernetes (minikube), git, GitHub, Ubuntu 18.04 LTS
Big Data Science Consultant, N.N. AG
01/2018 - 12/2018, München, Germany
Consulting/evaluation in the area of Big Data engineering, search and analytics of heterogeneous data sets in a combined Hadoop-Spark + Elasticsearch cluster environment
Elasticsearch, Hadoop + Spark 2.2.0, Cloudera 5.14, Elasticsearch-Hadoop connector, SparkR, sparklyr, Apache Zeppelin, Jupyter, Java, Scala, R, Python, JSON, git, JIRA, Confluence, SCRUM
Extensive benchmarking and performance optimization of various Big Data ETL, data engineering, data analysis and data visualization use cases
Exploration and performance benchmarking of the Elasticsearch-Hadoop (EH) connector for use in a combined EH Big Data analysis platform
Exploration of submitting generated Scala code to an Apache Spark cluster using a programmatic API (Apache Livy)
Consulting on the use of Artificial Intelligence/Machine Learning (AI/ML) algorithms within existing R&D projects
Big Data Scientist & Machine Learning Software Engineer, Voith Digital Solutions
10/2016 - 04/2017, Heidenheim, Germany
Development of a large-scale Internet-of-Things (IoT, Industrie 4.0) platform using the Hadoop stack: Cloudera 5.9, HDFS, Apache Spark Streaming & MLlib, HBase, Impala, Python, Pandas, Apache Kafka, Hue, Java 8, Spring Boot, Scala, JAXP, git, IntelliJ, JIRA, SCRUM etc
Porting a complex real-time outlier detection algorithm for sensor-based time series data from Python scripts to object-oriented Java 8 within the Lambda architecture paradigm
Performance optimization of the Java-based machine-learning algorithm on the Hadoop cluster
Consulting in various in-house Hadoop/Data Science/Machine-learning projects