DevOps Engineer focused on Kubernetes, CI/CD, and cloud automation with IaC tools, experienced in Docker, Prometheus, and Grafana.
Updated on 05.05.2026
Profile
Freelancer / Self-employed
Available from: 10.05.2026
Availability: 100%
of which on-site: 100%
English: Fluent
German: Proficient

Work locations

not possible

Projects

1 year 7 months
2024-10 - 2026-04

AI/ML Platform Engineering

AI/ML Platform Engineering
Built a scalable AI/ML platform for enterprise workloads, with support for RAG pipelines, model routing/orchestration, fine-tuning, and content-moderation workflows. Delivered cross-language SDKs, internal CLI tools, and fully automated CI/CD pipelines to streamline AI adoption for product teams while optimizing for cost efficiency, security, and operational reliability.
Responsibilities:
  • Designed and implemented AI-first CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, Argo Workflows/ArgoCD) that integrate model training, validation, and deployment into release automation.
  • Built fine-tuning, embedding, and moderation pipelines using Azure OpenAI and Azure Cognitive Services, enhancing domain relevance, safety compliance, and multilingual support across enterprise applications.
  • Implemented document processing pipelines integrating Azure Document Intelligence and OCR to extract, structure, and route enterprise content into downstream RAG and analytics workflows.
  • Automated deployment, scaling, and lifecycle management of GenAI workloads using Argo Workflows, ArgoCD, Jenkins, and GitOps-based configuration management. Achieved faster release cycles and significantly reduced configuration drift and deployment failures.
  • Integrated AI tooling into Git workflows and code quality processes (pre-merge LLM checks, automated code review bots, Copilot evaluation and safe-use playbooks) for dev teams.
  • Authored operational runbooks, playbooks, and hands-on workshop curricula (prompt engineering, LLMOps, secure Copilot usage, CI/CD integration) and coached engineering/SRE teams.
  • Enforced GDPR-compliant data handling, EU AI Act governance controls, and IAM-based access policies across all model endpoints and platform components.
  • Designed and implemented multi-tenant model-routing services across AWS Bedrock, Azure OpenAI, and Google AI with dynamic LLM selection based on latency, token cost, throughput, and task-specific performance.
  • Built FastAPI microservices as the backend foundation for SDK APIs and internal platform services, exposing REST endpoints and supporting WebSocket-based streaming for real-time inference responses.
  • Delivered cross-language SDK packages in Python, TypeScript, and Go, enabling product teams to integrate LLM capabilities with minimal boilerplate.
LLM as Backend, System Architecture, Back-End, Python, Prompt Engineering, Azure OpenAI, RAG, ArgoCD, Jenkins, GitOps, Go, KServe, Knative, Kubernetes, Inference, Azure DevOps, Observability, AWS Bedrock, TypeScript, Helm
SAP
1 year 9 months
2023-02 - 2024-10

Observability Engineering (AI/ML)

Observability Engineer (AI/ML)

Designed and implemented a fully instrumented, cloud-native observability and telemetry framework for hosted, fine-tuned, and proxied AI/ML models in enterprise-grade production environments. Delivered end-to-end visibility into AI/ML training pipelines, inference workloads, and model serving infrastructure.

Responsibilities:

  • Architected end-to-end observability pipelines for ML APIs and model-serving runtimes using OpenTelemetry SDKs/collectors, Prometheus exporters, and Kubernetes operators, instrumenting the full lifecycle of model training, inference, and system-level resource utilization.
  • Instrumented model endpoints, batch/stream training jobs, and inference gateways to capture high-resolution metrics such as tail latency, throughput (RPS/QPS), tokens-per-second performance, GPU memory fragmentation, multi-node utilization, error budgets, and anomaly detection signals.
  • Profiled and monitored inference optimization for vLLM and TensorRT-LLM deployments, tracking CUDA kernel performance, NCCL communication overhead, and memory bandwidth utilization to identify and resolve latency regressions.
  • Monitored distributed multi-GPU training runs across nodes connected via InfiniBand, capturing per-GPU utilization, gradient synchronization bottlenecks, and HPC cluster health signals for large-scale model training workloads.
  • Implemented automated alerting and SLO/SLA monitoring using Prometheus Alertmanager and custom anomaly-detection pipelines to identify inference latency regressions, GPU/CPU saturation events, memory leaks, container restarts, and failed model-training runs.
  • Collaborated with MLOps, SRE, and platform engineering teams to integrate telemetry into CI/CD pipelines, automate environment drift detection, and enable data-driven scaling policies for training and inference clusters.

Prometheus, Grafana, Kubernetes, Helm, Python, OpenTelemetry, Promitor, Dynatrace, Go, Loki, Jaeger, Tempo, KEDA, ArgoCD, MLflow, vLLM, TensorRT-LLM, CUDA / NCCL, Multi-GPU / HPC
SAP
11 months
2022-05 - 2023-03

DevOps / Platform Engineer

DevOps / Platform Engineer

Designed and built a scalable, event-driven container orchestration and control-plane platform inspired by Kubernetes, leveraging asynchronous processing, message queues, and streaming architectures to automate cloud resource provisioning across hundreds of services and thousands of tenants. Implemented infrastructure-as-code using AWS CDK, managing multi-environment deployments across AWS Lambda, Amazon ECS, and Amazon S3 to ensure high availability, scalability, and cost efficiency.


Responsibilities:

  • Architected and operated a cloud-native control plane with core components such as API server, RBAC, reconciliation loops, and namespace isolation, applying Kubernetes-style operational patterns for scalable infrastructure management.
  • Built and maintained infrastructure automation pipelines using AWS CDK, enabling repeatable, version-controlled provisioning and deployment across environments.
  • Designed event-driven systems using distributed messaging, WebSockets, and caching layers (Redis) to support real-time infrastructure state synchronization and high-throughput workloads.
  • Implemented robust state management and tenant isolation using SQL/ORM layers with support for concurrency control and consistent reads at scale.
  • Developed and maintained CI/CD pipelines with local cloud emulation using LocalStack, improving developer productivity and deployment reliability.
  • Established observability practices with Prometheus, Grafana, and Alertmanager, enabling proactive monitoring, alerting, and performance optimization.
  • Applied Kubernetes-native constructs (Deployments, StatefulSets, DaemonSets, Operators) to manage containerized workloads and ensure resilient service orchestration.
  • Enabled internal platform adoption by providing reusable tooling, SDKs, and abstractions for service teams, reducing operational overhead and improving deployment consistency.

Go (Golang), WebSocket, OpenSearch, LocalStack, Redis, Prometheus, Grafana, Kubernetes, RBAC, Distributed Systems, Webhook, TypeScript, SQL, AWS (Lambda, ECS, S3, CDK)
SAP
Walldorf
1 year 8 months
2020-09 - 2022-04

DevOps Engineer (Kubernetes | GitOps | Observability)

DevOps Engineer

Built and operated a cloud-native, event-driven control-plane platform inspired by Kubernetes, enabling automated provisioning and lifecycle management of distributed workloads across thousands of tenants.

Strong focus on Kubernetes ecosystem tooling, GitOps workflows, observability, and secure, production-grade infrastructure, with hands-on experience designing scalable platforms using infrastructure-as-code and container-native patterns.


Responsibilities:

  • Designed and operated Kubernetes-style control-plane components (API server, reconciliation loops, RBAC, resource isolation), aligning closely with operator-based architectures and extensibility patterns.
  • Managed containerized workloads across Kubernetes and Docker, applying Helm-based deployments and declarative configuration practices.
  • Implemented Git-driven deployment workflows and CI/CD pipelines (GitLab CI/CD, trunk-based development), enabling automated build, test, and release cycles.
  • Built event-driven infrastructure using message queues, Redis, and WebSockets to support real-time orchestration and high-throughput distributed systems.
  • Automated infrastructure provisioning using AWS CDK and configuration management concepts aligned with tools like Ansible.
  • Integrated an observability stack with Prometheus, Grafana, and alerting pipelines for metrics, logging, and anomaly detection.
  • Implemented persistent storage and state management strategies using SQL databases (PostgreSQL-compatible patterns) with strong consistency and concurrency controls.
  • Developed and maintained internal platform tooling and SDKs to standardize deployment workflows and reduce operational complexity for engineering teams.
  • Utilized local cloud emulation via LocalStack to improve CI reliability and developer feedback loops.
Prometheus, API Gateway, Redis, PostgreSQL, Grafana, Kubernetes, Helm, Docker, CI/CD, GitOps, ArgoCD, Terraform, RBAC
Walldorf
1 year 5 months
2019-04 - 2020-08

Cloud Infrastructure for SAP HANA-as-a-Service

DevOps Engineer
Engineered a scalable, multi-region cloud infrastructure platform for SAP HANA-as-a-Service on AWS, enabling fully automated provisioning, upgrades, and lifecycle management of enterprise database systems. Designed the system to meet high availability, security, and compliance requirements while significantly accelerating deployment speed.
Responsibilities:
  • Architected and implemented infrastructure-as-code using Terraform and configuration management with Ansible to automate SAP HANA installation, upgrades, and system lifecycle operations across environments.
  • Designed and developed an event-driven automation layer, including a lightweight agent integrated with Consul for service discovery and change propagation, triggering dynamic infrastructure workflows.
  • Built and exposed APIs to handle customer HANA system provisioning and lifecycle operations, enabling self-service and programmatic access.
  • Engineered secure, production-grade AWS network architecture (VPC, subnets, routing, IAM), ensuring isolation, compliance, and high availability across regions.
  • Implemented automated backup and disaster recovery strategies leveraging S3 and Glacier, ensuring data durability and rapid restoration.
  • Reduced end-to-end deployment time by 40% through automation and system optimization while maintaining strict enterprise security and compliance standards.
HashiCorp (Terraform, Vault, Consul), Ansible, AWS (VPC, EC2, S3, Glacier, CloudWatch, API Gateway), Cloud Foundry, Python, Go (Golang), Bash
SAP

Education and Training

2014 - 2017
Distributed Software Systems
TU Darmstadt (Germany)
Degree: Master of Science

Position

Software and DevOps engineer with focus on cloud-native development and LLM application development.

Skills

Focus areas

AI/ML Platform Engineering & LLMOps: Expert
Cloud-Native Observability & SRE: Expert
Distributed Systems & Platform Engineering: Advanced

AI/ML Platform Engineering & LLMOps

Deep expertise in building production-grade GenAI platforms and agentic AI systems, with comprehensive experience in LLM deployment, fine-tuning, RAG pipelines, and model orchestration. Specialized in architecting multi-tenant AI infrastructure that balances performance, cost optimization, and enterprise security requirements.


Cloud-Native Observability and SRE

Expert in designing end-to-end observability solutions for distributed systems and AI/ML workloads using OpenTelemetry, Prometheus, and Grafana. Proven ability to instrument complex environments from token-level metrics to infrastructure telemetry, enabling proactive incident management, anomaly detection, and data-driven optimization of high-throughput systems.


Distributed Systems and Platform Engineering

Strong foundation in building scalable, cloud-native platforms, with expertise in the Kubernetes ecosystem, control-plane architecture, and microservices orchestration. Skilled in implementing GitOps workflows, CI/CD automation, and infrastructure-as-code practices to deliver reliable, self-service platforms for enterprise-scale deployments.

Areas of responsibility

System Architecture: Expert
Software Engineering: Expert
DevOps: Advanced
  • Architecture and implementation of enterprise GenAI platforms supporting RAG, fine-tuning, model routing, and content moderation workflows
  • Design and deployment of observability frameworks for AI/ML systems, including distributed tracing, metrics pipelines, and SLO/SLA monitoring
  • Development of autonomous agent systems with tool integration, memory modules, and safety/governance controls
  • Building cloud-native microservices and APIs for multi-tenant SaaS offerings with a focus on scalability and reliability
  • Infrastructure automation using GitOps, CI/CD pipelines, and infrastructure-as-code across AWS and Azure environments
  • Implementation of control-plane architectures for container orchestration and resource provisioning at scale
  • Establishment of LLMOps practices including experiment tracking, model versioning, drift detection, and compliance enforcement
  • Performance optimization through caching strategies, autoscaling policies, and resource utilization monitoring
  • Cross-functional collaboration with MLOps, SRE, and platform engineering teams to accelerate AI adoption
  • Security implementation including RBAC, multi-tenant isolation, and content moderation pipelines

Products / Standards / Experience / Methods

DevOps: Expert
Software: Expert
AWS: Advanced
OpenAI: Expert
Kubernetes: Expert
Observability: Advanced
GenAI: Expert
Development: Expert
Profile
As a freelance software engineer, I deliver tailored, scalable software solutions for enterprise systems. My focus spans software architecture and development, DevOps and LLM-powered AI applications, with strong expertise in building reliable, observable and cost-efficient distributed systems on AWS and Azure.

AI/ML & GenAI
OpenAI, Azure OpenAI, LangChain, LlamaIndex, Semantic Kernel, RAG (Retrieval-Augmented Generation), Prompt Engineering, LLM Fine-tuning, Model Inference, Vector Databases, MLflow, KServe, Knative, MCP (Model Context Protocol), Chatbot Development, Agentic AI Systems


Cloud Platforms & Services

AWS (VPC, EC2, S3, Glacier, CloudWatch, API Gateway), Azure DevOps, Azure Cognitive Services, Cloud Foundry, Multi-cloud Architecture


Container Orchestration & Infrastructure

Kubernetes, Helm, ArgoCD, Argo Workflows, Docker, StatefulSets, DaemonSets, Custom Operators, Control Plane Architecture


Observability & Monitoring

OpenTelemetry, Prometheus, Grafana, Dynatrace, Promitor, Loki, Jaeger, Tempo, Alertmanager, Distributed Tracing, Metrics Engineering, SLO/SLA Monitoring


Programming Languages

Python, Go (Golang), Node.js, Java, Bash


Data Storage & Caching

Redis, MongoDB, PostgreSQL, Firebase Firestore, Vector Databases, OpenSearch, AWS S3


DevOps & Automation

GitOps, Jenkins, Terraform, Ansible, HashiCorp (Vault, Consul, Terraform), LocalStack, CI/CD Pipelines, GitHub Actions, Infrastructure-as-Code (IaC)


Networking & Communication

REST APIs, WebSocket, API Gateway, RBAC, Service Mesh


Development Practices & Patterns

Microservices Architecture, 12-Factor App Principles, SRE Practices, LLMOps, MLOps, Multi-tenant Design, Distributed Systems, Event-Driven Architecture, KEDA (Kubernetes Event-Driven Autoscaling)


Security & Compliance

RBAC (Role-Based Access Control), Content Moderation, Prompt Injection Defense, Multi-tenant Isolation, Policy Enforcement, Adversarial Testing


Data & ML Tools

DVC (Data Version Control), Weights & Biases, Model Registries, Experiment Tracking, Dataset Versioning

Operating systems

Linux

Programming languages

Go (Golang)
Python
Java
Node.js

Databases

PostgreSQL
MongoDB
Redis
Firestore
Elasticsearch

