Designed and implemented an observability and telemetry framework for production LLM workloads (hosted and proxied) in enterprise environments. Focused on backend architecture, metrics pipelines, and standardized telemetry to give engineering and SRE teams insight into performance, reliability, and cost of AI services across multi-cloud deployments.
Responsibilities:
Designed and delivered a production-grade agentic AI platform supporting full-stack development, orchestration, deployment, and observability of LLM-driven autonomous agents across multiple enterprise business units. Implemented standardized blueprints for agent topologies, tool invocation layers, RAG/indexing pipelines, and LLMOps workflows, ensuring horizontal scalability, fault tolerance, auditability, and regulatory alignment within a highly controlled financial-services environment. Built evaluation, governance, and safety frameworks that accelerated organizational adoption of AI copilots and reduced time-to-market for new intelligent-automation workloads.
Responsibilities:
Developed backend microservices to offer SAP Analytics Cloud as a SaaS product via the Cloud Foundry marketplace. Work focused on service broker APIs, metering, and billing services that integrate with downstream capacity management and provisioning systems.
2014 - 2017
Distributed Software Systems
TU Darmstadt (Germany)
Degree: Master of Science
AI/ML Platform Engineering & LLMOps
Deep expertise in building production-grade GenAI platforms and agentic AI systems, with comprehensive experience in LLM deployment, fine-tuning, RAG pipelines, and model orchestration. Specialized in architecting multi-tenant AI infrastructure that balances performance, cost optimization, and enterprise security requirements.
Cloud-Native Observability and SRE
Expert in designing end-to-end observability solutions for distributed systems and AI/ML workloads using OpenTelemetry, Prometheus, and Grafana. Proven ability to instrument complex environments from token-level metrics to infrastructure telemetry, enabling proactive incident management, anomaly detection, and data-driven optimization of high-throughput systems.
Distributed Systems and Platform Engineering
Strong foundation in building scalable, cloud-native platforms, with expertise in the Kubernetes ecosystem, control-plane architecture, and microservices orchestration. Skilled in implementing GitOps workflows, CI/CD automation, and infrastructure-as-code practices to deliver reliable, self-service platforms for enterprise-scale deployments.
Cloud Platforms & Services
Container Orchestration & Infrastructure
Observability & Monitoring
Programming Languages
Data Storage & Caching
DevOps & Automation
Networking & Communication
Development Practices & Patterns
Security & Compliance
Data & ML Tools