Freelancer: DevOps Engineer focused on Kubernetes, CI/CD, and cloud automation with IaC tools, experienced in Docker, Prometheus, and Grafana.

Freelancer / Self-employed

Available from: 01.06.2026

Availability: 100%

of which on-site: 100%

Languages

English

German

Work locations

Remote work

not possible

Projects

1 year 7 months

2024-10 - 2026-04

AI/ML Platform Engineering

Role

AI/ML Platform Engineering

Project description

Built a scalable AI/ML platform for enterprise workloads, with support for RAG pipelines, model routing/orchestration, fine-tuning, and content-moderation workflows. Delivered cross-language SDKs, internal CLI tools, and fully automated CI/CD pipelines to streamline AI adoption for product teams while optimizing for cost efficiency, security, and operational reliability.

Responsibilities:

- Designed and implemented AI-first CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, Argo Workflows/ArgoCD) that integrate model training, validation, and deployment into release automation.
- Built fine-tuning, embedding, and moderation pipelines using Azure OpenAI and Azure Cognitive Services, enhancing domain relevance, safety compliance, and multilingual support across enterprise applications.
- Implemented document processing pipelines integrating Azure Document Intelligence and OCR to extract, structure, and route enterprise content into downstream RAG and analytics workflows.
- Automated deployment, scaling, and lifecycle management of GenAI workloads using Argo Workflows, ArgoCD, Jenkins, and GitOps-based configuration management. Achieved faster release cycles and significantly reduced configuration drift and deployment failures.
- Integrated AI tooling into Git workflows and code quality processes (pre-merge LLM checks, automated code review bots, Copilot evaluation & safe-use playbooks) for dev teams.
- Authored operational runbooks, playbooks and hands-on workshop curricula (prompt engineering, LLMOps, secure Copilot usage, CI/CD integration) and coached engineering/SRE teams.
- Enforced GDPR-compliant data handling, EU AI Act governance controls, and IAM-based access policies across all model endpoints and platform components.
- Designed and implemented multi-tenant model-routing services across AWS Bedrock, Azure OpenAI and Google AI with dynamic LLM selection based on latency, token cost, throughput, and task-specific performance.
- Built FastAPI microservices as the backend foundation for SDK APIs and internal platform services, exposing REST endpoints and supporting WebSocket-based streaming for real-time inference responses.
- Delivered cross-language SDK packages in Python, TypeScript and Go, enabling product teams to integrate LLM capabilities with minimal boilerplate.

Skills

LLM as Backend, System Architecture, Back-End, Python, Prompt Engineering, Azure OpenAI, RAG, ArgoCD, Jenkins, GitOps, Go, KServe, Knative, Kubernetes, Inference, Azure DevOps, Observability, AWS Bedrock, TypeScript, Helm

Client

SAP

1 year 9 months

2023-02 - 2024-10

Observability Engineering (AI/ML)

Role

Observability Engineer (AI/ML)

Project description

Designed and implemented a fully instrumented, cloud-native observability and telemetry framework for hosted, fine-tuned, and proxied AI/ML models in enterprise-grade production environments. Delivered end-to-end visibility into AI/ML training pipelines, inference workloads, and model serving infrastructure.

Responsibilities:

- Architected end-to-end observability pipelines for ML APIs and model-serving runtimes using OpenTelemetry SDKs/collectors, Prometheus exporters, and Kubernetes operators, instrumenting the full lifecycle of model training, inference, and system-level resource utilization.
- Instrumented model endpoints, batch/stream training jobs, and inference gateways to capture high-resolution metrics such as tail latency, throughput (RPS/QPS), tokens-per-second performance, GPU memory fragmentation, multi-node utilization, error budgets, and anomaly detection signals.
- Profiled and monitored inference optimization for vLLM and TensorRT-LLM deployments, tracking CUDA kernel performance, NCCL communication overhead, and memory bandwidth utilization to identify and resolve latency regressions.
- Monitored distributed multi-GPU training runs across nodes connected via InfiniBand, capturing per-GPU utilization, gradient synchronization bottlenecks, and HPC cluster health signals for large-scale model training workloads.
- Implemented automated alerting and SLO/SLA monitoring using Prometheus Alertmanager and custom anomaly-detection pipelines to identify inference latency regressions, GPU/CPU saturation events, memory leaks, container restarts, and failed model-training runs.
- Collaborated with MLOps, SRE, and platform engineering teams to integrate telemetry into CI/CD pipelines, automate environment drift detection, and enable data-driven scaling policies for training and inference clusters.

Skills

Prometheus, Grafana, Kubernetes, Helm, Python, OpenTelemetry, Promitor, Dynatrace, Go, Loki, Jaeger, Tempo, KEDA, ArgoCD, MLflow, vLLM, TensorRT-LLM, CUDA / NCCL, Multi-GPU / HPC

Client

SAP

11 months

2022-05 - 2023-03

DevOps / Platform Engineer

Role

DevOps / Platform Engineer

Project description

Designed and built a scalable, event-driven container orchestration and control-plane platform inspired by Kubernetes, leveraging asynchronous processing, message queues, and streaming architectures to automate cloud resource provisioning across hundreds of services and thousands of tenants. Implemented infrastructure-as-code using AWS CDK, managing multi-environment deployments across AWS Lambda, Amazon ECS, and Amazon S3 to ensure high availability, scalability, and cost efficiency.

Responsibilities:

Architected and operated a cloud-native control plane with core components such as API server, RBAC, reconciliation loops, and namespace isolation, applying Kubernetes-style operational patterns for scalable infrastructure management.
Built and maintained infrastructure automation pipelines using AWS CDK, enabling repeatable, version-controlled provisioning and deployment across environments.
Designed event-driven systems using distributed messaging, WebSockets, and caching layers (Redis) to support real-time infrastructure state synchronization and high-throughput workloads.
Implemented robust state management and tenant isolation using SQL/ORM layers with support for concurrency control and consistent reads at scale.
Developed and maintained CI/CD pipelines with local cloud emulation using LocalStack, improving developer productivity and deployment reliability.
Established observability practices with Prometheus, Grafana, and Alertmanager, enabling proactive monitoring, alerting, and performance optimization.
Applied Kubernetes-native constructs (Deployments, StatefulSets, DaemonSets, Operators) to manage containerized workloads and ensure resilient service orchestration.
Enabled internal platform adoption by providing reusable tooling, SDKs, and abstractions for service teams, reducing operational overhead and improving deployment consistency.

Skills

Go (Golang), WebSocket, OpenSearch, LocalStack, Redis, Prometheus, Grafana, AWS (Lambda, ECS, S3, CDK), Kubernetes, RBAC, Distributed Systems, Webhook, TypeScript, SQL

Client

SAP

Location

Walldorf

1 year 8 months

2020-09 - 2022-04

DevOps Engineer (Kubernetes | GitOps | Observability)

Role

DevOps Engineer

Project description

Built and operated a cloud-native, event-driven control-plane platform inspired by Kubernetes, enabling automated provisioning and lifecycle management of distributed workloads across thousands of tenants.

Strong focus on Kubernetes ecosystem tooling, GitOps workflows, observability, and secure, production-grade infrastructure, with hands-on experience designing scalable platforms using infrastructure-as-code and container-native patterns.

Responsibilities:

Designed and operated Kubernetes-style control-plane components (API server, reconciliation loops, RBAC, resource isolation), aligning closely with operator-based architectures and extensibility patterns.
Managed containerized workloads across Kubernetes and Docker, applying Helm-based deployments and declarative configuration practices.
Implemented Git-driven deployment workflows and CI/CD pipelines (GitLab CI/CD, trunk-based development), enabling automated build, test, and release cycles.
Built event-driven infrastructure using message queues, Redis, and WebSockets to support real-time orchestration and high-throughput distributed systems.
Automated infrastructure provisioning using AWS CDK and configuration management concepts aligned with tools like Ansible.
Integrated observability stack with Prometheus, Grafana, and alerting pipelines for metrics, logging, and anomaly detection.
Implemented persistent storage and state management strategies using SQL databases (PostgreSQL-compatible patterns) with strong consistency and concurrency controls.
Developed and maintained internal platform tooling and SDKs to standardize deployment workflows and reduce operational complexity for engineering teams.
Utilized local cloud emulation via LocalStack to improve CI reliability and developer feedback loops.

Skills

Prometheus, API Gateway, Redis, PostgreSQL, Grafana, Kubernetes, Helm, Docker, CI/CD, GitOps, ArgoCD, Terraform, RBAC

Location

Walldorf

1 year 5 months

2019-04 - 2020-08

Cloud Infrastructure for SAP HANA-as-a-Service

Role

DevOps Engineer

Project description

Engineered a scalable, multi-region cloud infrastructure platform for SAP HANA-as-a-Service on AWS, enabling fully automated provisioning, upgrades, and lifecycle management of enterprise database systems. Designed the system to meet high availability, security, and compliance requirements while significantly accelerating deployment speed.

Responsibilities:

- Architected and implemented infrastructure-as-code using Terraform and configuration management with Ansible to automate SAP HANA installation, upgrades, and system lifecycle operations across environments.
- Designed and developed an event-driven automation layer, including a lightweight agent integrated with Consul for service discovery and change propagation, triggering dynamic infrastructure workflows.
- Built and exposed APIs to handle customer HANA system provisioning and lifecycle operations, enabling self-service and programmatic access.
- Engineered secure, production-grade AWS network architecture (VPC, subnets, routing, IAM), ensuring isolation, compliance, and high availability across regions.
- Implemented automated backup and disaster recovery strategies leveraging S3 and Glacier, ensuring data durability and rapid restoration.
- Reduced end-to-end deployment time by 40% through automation and system optimization while maintaining strict enterprise security and compliance standards.

Skills

HashiCorp (Terraform, Vault, Consul), Ansible, AWS (VPC, EC2, S3, Glacier, CloudWatch, API Gateway), Cloud Foundry, Python, Go (Golang), Bash

Client

SAP

Education and Training

2014 - 2017
Distributed Software Systems
TU Darmstadt (Germany)
Degree: Master of Science

Position

Software and DevOps engineer with focus on cloud-native development and LLM application development.

Competencies

Focus areas

AI/ML Platform Engineering & LLMOps

Cloud-Native Observability & SRE

Distributed Systems & Platform Engineering

AI/ML Platform Engineering & LLMOps

Deep expertise in building production-grade GenAI platforms and agentic AI systems, with comprehensive experience in LLM deployment, fine-tuning, RAG pipelines, and model orchestration. Specialized in architecting multi-tenant AI infrastructure that balances performance, cost optimization, and enterprise security requirements.

Cloud-Native Observability and SRE

Expert in designing end-to-end observability solutions for distributed systems and AI/ML workloads using OpenTelemetry, Prometheus, and Grafana. Proven ability to instrument complex environments from token-level metrics to infrastructure telemetry, enabling proactive incident management, anomaly detection, and data-driven optimization of high-throughput systems.

Distributed Systems and Platform Engineering

Strong foundation in building scalable, cloud-native platforms with expertise in Kubernetes ecosystem, control-plane architecture, and microservices orchestration. Skilled in implementing GitOps workflows, CI/CD automation, and infrastructure-as-code practices to deliver reliable, self-service platforms for enterprise-scale deployments.

Areas of responsibility

System Architecture

Software Engineering

DevOps

Architecture and implementation of enterprise GenAI platforms supporting RAG, fine-tuning, model routing, and content moderation workflows
Design and deployment of observability frameworks for AI/ML systems, including distributed tracing, metrics pipelines, and SLO/SLA monitoring
Development of autonomous agent systems with tool integration, memory modules, and safety/governance controls
Building cloud-native microservices and APIs for multi-tenant SaaS offerings with focus on scalability and reliability
Infrastructure automation using GitOps, CI/CD pipelines, and infrastructure-as-code across AWS and Azure environments
Implementation of control-plane architectures for container orchestration and resource provisioning at scale
Establishment of LLMOps practices including experiment tracking, model versioning, drift detection, and compliance enforcement
Performance optimization through caching strategies, autoscaling policies, and resource utilization monitoring
Cross-functional collaboration with MLOps, SRE, and platform engineering teams to accelerate AI adoption
Security implementation including RBAC, multi-tenant isolation, and content moderation pipelines

Products / Standards / Experience / Methods

DevOps

Software

AWS

OpenAI

Kubernetes

Observability

GenAI

Development

Profile
As a freelance software engineer, I deliver tailored, scalable software solutions for enterprise systems. My focus spans software architecture and development, DevOps and LLM-powered AI applications, with strong expertise in building reliable, observable and cost-efficient distributed systems on AWS and Azure.

AI/ML & GenAI

OpenAI, Azure OpenAI, LangChain, LlamaIndex, Semantic Kernel, RAG (Retrieval-Augmented Generation), Prompt Engineering, LLM Fine-tuning, Model Inference, Vector Databases, MLflow, KServe, Knative, MCP (Model Context Protocol), Chatbot Development, Agentic AI Systems

Cloud Platforms & Services

AWS (VPC, EC2, S3, Glacier, CloudWatch, API Gateway), Azure DevOps, Azure Cognitive Services, Cloud Foundry, Multi-cloud Architecture

Container Orchestration & Infrastructure

Kubernetes, Helm, ArgoCD, Argo Workflows, Docker, StatefulSets, DaemonSets, Custom Operators, Control Plane Architecture

Observability & Monitoring

OpenTelemetry, Prometheus, Grafana, Dynatrace, Promitor, Loki, Jaeger, Tempo, Alertmanager, Distributed Tracing, Metrics Engineering, SLO/SLA Monitoring

Programming Languages

Python, Go (Golang), Node.js, Java, Bash

Data Storage & Caching

Redis, MongoDB, PostgreSQL, Firebase Firestore, Vector Databases, OpenSearch, AWS S3

DevOps & Automation

GitOps, Jenkins, Terraform, Ansible, HashiCorp (Vault, Consul, Terraform), LocalStack, CI/CD Pipelines, GitHub Actions, Infrastructure-as-Code (IaC)

Networking & Communication

REST APIs, WebSocket, API Gateway, RBAC, Service Mesh

Development Practices & Patterns

Microservices Architecture, 12-Factor App Principles, SRE Practices, LLMOps, MLOps, Multi-tenant Design, Distributed Systems, Event-Driven Architecture, KEDA (Kubernetes Event-Driven Autoscaling)

Security & Compliance

RBAC (Role-Based Access Control), Content Moderation, Prompt Injection Defense, Multi-tenant Isolation, Policy Enforcement, Adversarial Testing

Data & ML Tools

DVC (Data Version Control), Weights & Biases, Model Registries, Experiment Tracking, Dataset Versioning

Operating systems

Linux

Programming languages

Go (Golang)

Python

Java

Node.js

Postgres

MongoDB

Firebase

Databases

PostgreSQL

MongoDB

Redis

Firestore

Elasticsearch

