Senior Bioinformatics Data Scientist | 9+ years in applied research & healthcare industry | ML/AI in precision medicine and (epi)genomics
Updated on 07.10.2025
Profile
Freelancer / Self-employed
Remote work
Available from: 20.10.2025
Availability: 100%
of which on-site: 100%
Machine Learning
Deep Learning
Pharma
Python
scikit-learn
PyTorch
R
Bioconductor
Nextflow
Snakemake
Regression testing
Classification
LLM
Cross-functional collaboration
Clustering
English
Business fluent

Work locations

Germany, Switzerland, Austria
possible

Projects

4 months
2025-04 - 2025-07

various

Bioinformatician | Research Assistant II
  • Gene regulation analysis (ML) and data visualization to uncover insights into:
    • muscle aging (RNA-seq), chimera organogenesis (snRNA-seq)
    • assembly of Y chromosome of Mus minutoides (WGS)
    • yo-yo effect (RNA-seq, CUT&TAG)
  • Set up ETL with Nextflow Tower for NGS (MLOps):
    • Set up reproducible end-to-end ETL Nextflow pipelines (RNA-seq, CUT&TAG) with Nextflow Tower on SLURM (ETH Zürich cluster), using containerization (Docker/Singularity) and Spack
    • Maintained in-house ETL pipelines in Ruby on Rails, R, and Bash on FGCZ servers
  • Interactive web-based visualizations:
    • Interactive R Shiny apps via ShinyProxy for scRNA-seq data
    • shiny-public-fgcz-uzh-ch (snRNA-seq: MHUO, Mouse epiAT memory, Human AT memory)
  • Cross-disciplinary collaboration with wet-lab scientists to interpret models and experiments
ETH Zürich & Functional Genomics Center Zürich (FGCZ) of the University of Zurich (UZH)
2 years
2023-02 - 2025-01

various

Bioinformatics Data Scientist
Engaged in B2B contracts with bioinformatics, data science and software consultancy company Ardigen (CRO department) and with Selvita S.A. Key highlights include:
  • Healthcare AI and oncology (ML/AI):
    • Applied ML/AI to nominate tumor-associated antigens (TAAs) and stratify patients for clinical trials
    • Performed survival analysis and classification (PU learning), and applied variational autoencoders (VAEs)
    • Integrated cancer gene expression, mutation and drug response data: scRNA-seq, TCGA, TEMPUS, GenomOncology
    • Stakeholder: Oncology Data Science Department at Merck
  • Biomarker discovery in Alzheimer's disease (ML/AI):
    • Target gene and enhancer discovery in glia-to-neuron reprogramming
    • Applied ML/AI analysis, incl. Enformer, a transformer model by Avsec et al., for enhancer detection, and the LASSO-based method by Machlab et al. for DNA motif detection
    • Integrated multi-omics datasets: RNA-seq, ATAC-seq, Bisulfite-seq, and H3K27ac ChIP-seq
    • Stakeholder: start-up Stardustries
  • Interactive web-based visualization:
    • Interactive visualizations of gene isoforms, integrating in-house and publicly available (epi)genomic sequencing data
    • Implemented in the IGV app browser and CLI in JavaScript and Python
    • Stakeholder: Novo Nordisk
  • Cross-disciplinary collaboration:
    • With software engineers, data scientists, UX/UI designers, and wet-lab scientists to translate user needs into robust, scalable technical solutions
    • Set priorities and made context-aware decisions.
  • Stakeholder communication & presentation:
    • Agile (Scrum) project management
    • Project leading, planning and mentoring, presentations of results
  • MLOps and quality:
    • Containerized workflows (Docker) on Kubernetes and AWS
    • Version control (Git, GitLab/GitHub)
    • Automated tests and documentation (Diátaxis)
Remote - Berlin (Germany)
6 years 2 months
2015-10 - 2021-11

Bioinformatics and Omics Data Science Platform

PhD Student - Computational Molecular Biology
  • Led and contributed to end-to-end research projects applying advanced statistical and ML techniques, resulting in 4 publications in top-tier journals (350+ citations).
  • Co-first author of a peer-reviewed review on statistical methods for Bisulfite-seq data analysis.
  • Genomic artifact detection (ML):
    • Led ChIP-seq bias reduction project, linking spurious ChIP-seq signals to RNA:DNA hybrids, and open-chromatin and hypomethylated regions
    • Performed ML analysis incl. elastic net, PCA
    • Integrated large-scale omics data: ENCODE, Roadmap Epigenomics, publicly available ChIP-seq, RNA-seq, ATAC-seq, Bisulfite-seq, DRIP-seq
    • Collaborated with Prof. Dr. Tursun's wet-lab to improve ChIP-seq protocol
  • Snakemake ETL pipeline for Bisulfite-seq (MLOps):
    • ETL pipeline implemented in Snakemake, Python and R/Bioconductor
    • Peer-reviewed publication as part of the PiGX: Pipelines in Genomics project
    • Ran it on SLURM (Berlin Institute of Health at Charité) and Grid Engine (MDC-BIMSB)
  • cfDNA biomarker discovery for acute coronary syndrome (ML):
    • Applied ML incl. logistic regression with overdispersion correction
    • Analyzed liquid biopsies (blood cell-free DNA) from patient-derived Bisulfite-seq data
    • In close collaboration with clinician Prof. Dr. med. Ulf Landmesser at the Charité hospital
Berlin Institute of Medical Systems Biology, Max Delbrück Center (MDC-BIMSB)
7 months
2015-03 - 2015-09

Bioinformatics and Omics Data Science Platform

Visiting Predoctoral Researcher
  • Awarded a €7,000 scholarship by the Institute of Computer Science, Polish Academy of Sciences, co-financed by the European Union
  • Developed Bioconductor R package genomation for visualization and annotation of genomic data
Max Delbrück Center (MDC-BIMSB)

Education and training

2015 - 2021
Study - Computational Molecular Biology
Humboldt University of Berlin (Germany)
Degree: Ph.D.

2012 - 2014
Study - Computer Science and Bioinformatics
University of Warsaw (Poland)
Degree: M.Sc.

2010 - 2012
Study - Computer Science and Bioinformatics
University of Warsaw (Poland)
Degree: B.Sc.

Competencies

Top skills

Machine Learning, Deep Learning, Pharma, Python, scikit-learn, PyTorch, R, Bioconductor, Nextflow, Snakemake, Regression testing, Classification, LLM, Cross-functional collaboration, Clustering

Products / Standards / Experience / Methods

Profile
Bioinformatics Data Scientist with 9+ years of experience in applied research and the healthcare industry, focused on precision medicine and clinical genomics. Skilled in building and evaluating ML models (regression, classification, generative AI such as VAEs, and transformers) and in developing workflows for biomarker discovery and translational research. Thrives in international, cross-functional, and creative environments.

Skills
Technical skills
  • ML & AI:
    • Statistical tests: incl. t-tests, Wilcoxon tests
    • Regression: linear, logistic, Cox/survival analysis, elastic net/lasso/ridge
    • Classification: random forest, XGBoost, SVM, LDA
    • Clustering: K-means, EM
    • Probabilistic models: Hidden Markov Models (HMMs), linear Gaussian state-space models
    • Dimensionality reduction & factorization: PCA, t-SNE, MOFA, NMF
    • Sampling/optimization: replica exchange Monte Carlo
    • Deep learning: variational autoencoders (VAEs), CNNs, transformers, retrieval-augmented generation (RAG), federated learning
  • MLOps: workflow languages - Nextflow, Snakemake; workflow engines - Galaxy; schedulers - SLURM, Grid Engine; containers and cloud - Docker/Singularity, Kubernetes, AWS, DigitalOcean
  • Version Control and Software Management: Linux/Unix, Git, SVN, Conda, GNU Guix
  • Bioinformatic Tools: samtools, BEDtools, GATK, IGV, Bowtie2, BWA, Bismark, BLAST
  • Omics Databases: TCGA, Roadmap Epigenomics, TEMPUS, GTEx, Ensembl, NCBI, GenomOncology
Soft Skills
  • Cross-functional collaboration and independent project leadership across cross-disciplinary teams, clear communication of complex concepts to technical and non-technical audiences, adaptability and agility in evolving scientific and technical environments, analytical thinking and problem solving, project planning and mentoring, scientific writing and international presentation

Programming languages

Python
R - CRAN
JavaScript
NumPy
pandas
scikit-learn
seaborn
PyTorch
Matplotlib
Bioconductor
incl. caret, tidyverse
unit test
bash
pytest
testthat

Databases

PostgreSQL
SQLite
MySQL
