Designed and implemented a fully instrumented, cloud-native observability and telemetry framework for hosted, fine-tuned, and proxied LLMs operating in enterprise-grade production environments. Delivered end-to-end visibility into AI/ML training pipelines, inference workloads, and model serving infrastructure. Developed unified tracing, logging, and metrics pipelines leveraging distributed-systems observability standards to surface granular insights across token-level usage, GPU/CPU resource saturation, container/node performance, latency distributions (P50–P99), error propagation, and emergent model-behavior "flares." Integrated cross-cloud observability stacks to support proactive SRE practices, automated incident triage, drift detection, and optimization of high-throughput AI workloads.
Responsibilities:
Designed and implemented a routing layer that dynamically selected LLM backends based on latency, token cost, throughput, and task-specific quality, including fallback and failover strategies across regions/providers to meet strict SLAs.
Built GenAI services using Azure OpenAI and Azure Cognitive Services, tuning concurrency, batching, and rate-limit handling to improve domain relevance, safety compliance, and multilingual support in enterprise applications.
Automated deployment and lifecycle management of GenAI workloads using Argo Workflows, ArgoCD, Jenkins, and GitOps-based configuration management, enabling repeatable infrastructure changes, reducing configuration drift, and achieving ~40% faster release cycles.
Delivered developer tooling, including Python and Go SDKs, internal CLI/automation scripts, and out-of-the-box observability integrations (Prometheus/Grafana), that reduced integration effort, standardized access patterns to LLM services, and accelerated AI adoption across product teams.
Ensured performance and cost efficiency by monitoring token utilization, CPU/GPU consumption (where applicable), autoscaling behavior, and model-specific SLAs; implemented per-tenant quotas, rate limiting, and governance controls to provide fair resource allocation, security, and full auditability in a multi-tenant environment.
Collaborated with platform and infrastructure teams to tune GPU requests/limits, pod placement, and batching strategies, maximizing GPU utilization while meeting p95/p99 latency and uptime objectives.
Deployed NVIDIA GPU Operator in production Kubernetes clusters to deliver reliable, automated GPU driver/tooling management and expose robust Prometheus-ready GPU telemetry.
Designed and delivered a production-grade agentic AI platform supporting full-stack development, orchestration, deployment, and observability of LLM-driven autonomous agents across multiple enterprise business units. Implemented standardized blueprints for agent topologies, tool invocation layers, RAG/Indexing pipelines, and LLMOps workflows, ensuring horizontal scalability, fault tolerance, auditability, and regulatory alignment within a highly controlled financial-services environment. Built comprehensive evaluation, governance, and safety frameworks that accelerated organizational adoption of AI copilots and significantly reduced time-to-market for new intelligent-automation workloads.
Responsibilities:
2014 - 2017
Distributed Software Systems
TU Darmstadt (Germany)
Degree: Master of Science
AI/ML Platform Engineering & LLMOps
Deep expertise in building production-grade GenAI platforms and agentic AI systems, with comprehensive experience in LLM deployment, fine-tuning, RAG pipelines, and model orchestration. Specialized in architecting multi-tenant AI infrastructure that balances performance, cost optimization, and enterprise security requirements.
Cloud-Native Observability and SRE
Expert in designing end-to-end observability solutions for distributed systems and AI/ML workloads using OpenTelemetry, Prometheus, and Grafana. Proven ability to instrument complex environments from token-level metrics to infrastructure telemetry, enabling proactive incident management, anomaly detection, and data-driven optimization of high-throughput systems.
Distributed Systems and Platform Engineering
Strong foundation in building scalable, cloud-native platforms with expertise in Kubernetes ecosystem, control-plane architecture, and microservices orchestration. Skilled in implementing GitOps workflows, CI/CD automation, and infrastructure-as-code practices to deliver reliable, self-service platforms for enterprise-scale deployments.
Cloud Platforms & Services
Container Orchestration & Infrastructure
Observability & Monitoring
Programming Languages
Data Storage & Caching
DevOps & Automation
Networking & Communication
Development Practices & Patterns
Security & Compliance
Data & ML Tools