AI / Cloud / Platform Engineer

Building systems that
think, heal, and
scale autonomously.

I design production-grade AI-native infrastructure — from self-healing Kubernetes platforms to agentic RAG pipelines and multi-model orchestration systems.

AIOps · Agentic AI · GitOps · AWS Lambda · Terraform · LangGraph · Kubernetes · RAG · Step Functions

I am a senior AI/Cloud engineer specializing in building autonomous, production-grade systems at the intersection of cloud infrastructure and artificial intelligence.

My work focuses on multi-agent architectures, confidence-gated automation, and agentic retrieval systems — engineering software that can reason, decide, and act with minimal human intervention.

Every project I build is production-first: Terraform-managed infrastructure, proper observability, security boundaries, and documented runbooks. These are deployable systems, not proofs of concept.

3 Production Platforms
12+ Terraform Modules
7 AI Agents Deployed
100% Test Pass Rate

Three platforms.
One through-line.

01 — AIOps Platform
GitOps Sentinel

A production-grade AIOps platform for Kubernetes that intercepts infrastructure anomalies, routes them through a multi-agent reasoning pipeline, and takes autonomous action calibrated to confidence. Every remediation becomes a Git commit — the cluster never mutates outside a reviewed PR or a high-confidence auto-apply.
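A minimal sketch of the confidence gate as a pure function, assuming the Confidence Scorer emits a float in [0, 1] and using the documented bands (≥80% auto-apply, 40–79% PR, <40% escalate). The `Route` enum and `route_by_confidence` name are hypothetical, not the project's actual code.

```python
from enum import Enum


class Route(str, Enum):
    """Hypothetical route names for the three confidence bands."""
    AUTO_APPLY = "auto_apply"  # commit directly to the GitOps repo
    OPEN_PR = "open_pr"        # open a pull request for human review
    ESCALATE = "escalate"      # hand off to an on-call engineer


# Band boundaries from the documented policy:
# >=80% auto-apply, 40-79% PR, <40% escalate.
AUTO_APPLY_MIN = 0.80
PR_MIN = 0.40


def route_by_confidence(score: float) -> Route:
    """Map a confidence score in [0, 1] to a remediation route."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {score!r}")
    if score >= AUTO_APPLY_MIN:
        return Route.AUTO_APPLY
    if score >= PR_MIN:
        return Route.OPEN_PR
    return Route.ESCALATE
```

Keeping the gate a pure function makes the thresholds easy to unit-test independently of the Step Functions definition that calls it.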

Alertmanager → API Gateway (HMAC) → Signal Collector → EventBridge
  └─ Step Functions: Classifier → Root Cause → Action Planner → Confidence Scorer
     └─ RouteByConfidence → GitHub PR → Argo CD sync → Outcome Validator
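The HMAC check at the API Gateway boundary could look roughly like this. The signing scheme is an assumption: the sketch presumes whatever sits in front of Alertmanager attaches a hex-encoded HMAC-SHA256 of the raw request body under a shared secret. `sign_payload` and `verify_signature` are hypothetical names.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body (hypothetical scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_sig: str) -> bool:
    """Recompute the signature and compare in constant time.

    Verify against the raw bytes, not re-serialized JSON: any whitespace
    or key-order change would alter the digest.
    """
    expected = sign_payload(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), received_sig.encode())
```

In a deployed setup the secret would typically be fetched from a secrets store rather than hard-coded.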
Confidence Routing
≥80% auto-apply · 40–79% PR · <40% escalate
Pipeline
AWS Step Functions (Standard Workflow)
Dedup Strategy
DynamoDB conditional write + 7-day EventBridge archive
Self-Healing
Auto-revert PR on OutcomeFailed signal
Python · AWS Lambda · Step Functions · EventBridge · DynamoDB · Terraform · Argo CD · Kubernetes · Prometheus · Grafana · OPA Gatekeeper · Amazon Bedrock / OpenAI
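Deduplication hinges on a conditional write: the first writer of a given alert key wins, and repeats are dropped. The sketch below models that with an in-memory table so it runs anywhere. Against DynamoDB the same idea is usually a `put_item` with `ConditionExpression="attribute_not_exists(pk)"`, where a duplicate surfaces as a `ConditionalCheckFailedException` error. The key derivation (alert name plus labels plus a one-hour time bucket) is an assumption, not taken from the project.

```python
import hashlib
import json
from typing import Dict, Mapping


class ConditionalWriteFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


class InMemoryTable:
    """Toy table that mimics put-if-absent semantics on a partition key."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def put_if_absent(self, pk: str, item: dict) -> None:
        # Same intent as ConditionExpression="attribute_not_exists(pk)".
        if pk in self._items:
            raise ConditionalWriteFailed(pk)
        self._items[pk] = item


def dedup_key(alertname: str, labels: Mapping[str, str], ts: float,
              window_s: int = 3600) -> str:
    """Hypothetical key: alert name + labels + fixed time bucket."""
    bucket = int(ts // window_s)
    # sort_keys makes the key independent of label ordering.
    canonical = json.dumps({"a": alertname, "l": dict(labels), "b": bucket},
                           sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


def accept_signal(table: InMemoryTable, alertname: str,
                  labels: Mapping[str, str], ts: float) -> bool:
    """Return True if this signal is new and should start a workflow run."""
    pk = dedup_key(alertname, labels, ts)
    try:
        table.put_if_absent(pk, {"alertname": alertname, "ts": ts})
    except ConditionalWriteFailed:
        return False  # duplicate within the same bucket: drop it
    return True
```

Fixed buckets are the simplest choice but split alerts that straddle a boundary; a TTL attribute on the item is a common alternative for sliding windows.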
02 — Agentic RAG System
Medical Agentic RAG

A production-grade agentic retrieval system for medical knowledge, powered by LangGraph workflows, pgvector semantic search, and streaming SSE responses. The system routes queries through a confidence-scored pipeline — iterating on relevance up to 3 times before falling back to live web search.
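The retry-then-fallback behaviour reduces to a bounded loop. This is a plain-Python sketch, not LangGraph code: `retrieve`, `is_relevant`, `rewrite_query`, and `web_search` are hypothetical callables standing in for graph nodes, and rewriting the query between attempts is an assumption about how each iteration differs.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_ITERATIONS = 3  # relevance attempts before falling back to web search


@dataclass
class RetrievalResult:
    docs: List[str]
    source: str  # "vector_store" or "web"
    attempts: int
    queries: List[str] = field(default_factory=list)


def retrieve_with_fallback(
    query: str,
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, List[str]], bool],
    rewrite_query: Callable[[str, List[str]], str],
    web_search: Callable[[str], List[str]],
) -> RetrievalResult:
    """Try vector retrieval up to MAX_ITERATIONS times, then fall back to the web."""
    queries: List[str] = []
    current = query
    for attempt in range(1, MAX_ITERATIONS + 1):
        queries.append(current)
        docs = retrieve(current)
        # Relevance is judged against the user's original question.
        if is_relevant(query, docs):
            return RetrievalResult(docs, "vector_store", attempt, queries)
        # Hypothetical: reformulate the query before the next attempt.
        current = rewrite_query(current, docs)
    return RetrievalResult(web_search(query), "web", MAX_ITERATIONS, queries)
```

In LangGraph the same logic would typically sit in a conditional edge that reads an iteration counter from the graph state.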

React SSE UI → FastAPI → LangGraph Workflow
  └─ Router → [Medical QnA | Device Manual | Web Search (DuckDuckGo)]
     └─ Relevance Check (max 3 iterations) → Augment → Generate (GPT-4o-mini)
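Streaming to the React UI comes down to framing each chunk as a Server-Sent Event. A minimal sketch; the event names (`token`, `done`) and the JSON payload shape are assumptions. In FastAPI, a generator like `stream_answer` could be handed to `StreamingResponse(..., media_type="text/event-stream")`.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Frame one Server-Sent Event in the text/event-stream format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads need one "data:" field per line.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event


def stream_answer(tokens: Iterable[str]) -> Iterator[str]:
    """Hypothetical generator: emit each token, then a final 'done' event."""
    for tok in tokens:
        yield sse_event(json.dumps({"token": tok}), event="token")
    yield sse_event(json.dumps({"status": "complete"}), event="done")
```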
Retrieval Sources
QnA corpus · Device manuals · Live web fallback
Vector Store
PostgreSQL + pgvector (1,000 docs / collection)
Confidence Scoring
Heuristic: from 90% (QnA corpus match) down to 40% (failed web search)
Infra
ECS Fargate + RDS + ALB + CloudFront (Terraform)
Python · LangGraph · FastAPI · PostgreSQL / pgvector · OpenAI GPT-4o-mini · React · SSE Streaming · Docker · Terraform · AWS ECS Fargate · CloudFront + S3
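pgvector exposes cosine distance through the `<=>` operator, so a semantic lookup is essentially `ORDER BY embedding <=> query_vector LIMIT k`. The pure-Python sketch below computes the same ranking in memory to show what that query does. The toy 3-dimensional vectors and document ids are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def top_k(query: Sequence[float], docs: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """In-memory analogue of: ORDER BY embedding <=> query LIMIT k."""
    scored = [(doc_id, cosine_distance(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance = most similar
    return scored[:k]
```

At a few thousand documents per collection an exact scan like this is cheap; approximate indexes matter mainly at larger scales.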
03 — LLM Orchestration
Multi-LLM Platform

A unified orchestration platform for running workloads across multiple large language model providers. It is designed to abstract provider-specific APIs behind a common interface, enabling cost-, latency-, and capability-based model routing, cost tracking, and fallback strategies across OpenAI, Anthropic, Amazon Bedrock, and open-source model backends.

Client Request → Unified API Gateway → Router / Planner
  └─ Provider Adapters: OpenAI | Anthropic | Bedrock | Open-Source
     └─ Response Aggregation → Cost Tracking → Structured Output
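The adapter layer and automatic fallback can be sketched with a structural interface. Everything here is hypothetical: `ChatProvider`, `ProviderError`, `FallbackRouter`, and `FakeProvider` stand in for real SDK wrappers, and the priority order would in practice come from the cost, latency, and capability routing rules.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence, Tuple


class ProviderError(Exception):
    """Raised by an adapter when its upstream provider call fails."""


@dataclass
class Completion:
    provider: str
    text: str
    input_tokens: int
    output_tokens: int


class ChatProvider(Protocol):
    """Common interface every provider adapter implements (hypothetical)."""
    name: str

    def complete(self, prompt: str) -> Completion: ...


class FallbackRouter:
    """Try providers in priority order; move to the next one on ProviderError."""

    def __init__(self, providers: Sequence[ChatProvider]) -> None:
        if not providers:
            raise ValueError("at least one provider is required")
        self.providers = list(providers)

    def complete(self, prompt: str) -> Tuple[Completion, List[str]]:
        failures: List[str] = []
        for provider in self.providers:
            try:
                return provider.complete(prompt), failures
            except ProviderError as exc:
                failures.append(f"{provider.name}: {exc}")
        raise ProviderError("all providers failed: " + "; ".join(failures))


@dataclass
class FakeProvider:
    """Test double: either fails or echoes the prompt."""
    name: str
    fail: bool = False

    def complete(self, prompt: str) -> Completion:
        if self.fail:
            raise ProviderError("simulated outage")
        words = prompt.split()
        return Completion(self.name, f"[{self.name}] {prompt}", len(words), len(words) + 1)
```

Returning the failure list alongside the completion keeps the fallback path visible to logging without raising on partial outages.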
Design Pattern
Provider-agnostic adapter layer
Routing
Cost-, latency-, and capability-based selection
Observability
Per-provider token cost breakdown
Resilience
Automatic fallback on provider failure
Python · Multi-Provider SDK · OpenAI · Anthropic · AWS Bedrock · LangChain / LangGraph · FastAPI · Docker · Terraform
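A per-provider token cost breakdown can be as simple as multiplying token counts by a price table. In this sketch the provider keys and prices are placeholders, not real rates, and `CostTracker` is a hypothetical name.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Tuple

# Placeholder prices in USD per 1M tokens as (input, output).
# Illustrative only: real prices vary by model and change over time.
PRICE_PER_MTOK: Dict[str, Tuple[float, float]] = {
    "provider_a": (0.50, 1.50),
    "provider_b": (3.00, 15.00),
}


@dataclass
class Usage:
    provider: str
    input_tokens: int
    output_tokens: int


class CostTracker:
    """Accumulate token counts and spend per provider."""

    def __init__(self, prices: Dict[str, Tuple[float, float]]) -> None:
        self.prices = prices
        self.tokens: Dict[str, Dict[str, int]] = defaultdict(lambda: {"input": 0, "output": 0})
        self.cost_usd: Dict[str, float] = defaultdict(float)

    def record(self, usage: Usage) -> float:
        """Add one call's usage and return its cost in USD."""
        in_price, out_price = self.prices[usage.provider]  # KeyError if unpriced
        cost = (usage.input_tokens * in_price + usage.output_tokens * out_price) / 1_000_000
        totals = self.tokens[usage.provider]
        totals["input"] += usage.input_tokens
        totals["output"] += usage.output_tokens
        self.cost_usd[usage.provider] += cost
        return cost

    def breakdown(self) -> Dict[str, float]:
        """Total spend per provider, rounded to micro-dollars."""
        return {p: round(c, 6) for p, c in sorted(self.cost_usd.items())}
```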

Full-stack, cloud-native,
AI-first engineering.

AI / Agentic Systems
  • Multi-agent pipeline orchestration
  • LangGraph stateful workflows
  • Retrieval-Augmented Generation (RAG)
  • Confidence-gated decision engines
  • LLM routing & provider abstraction
  • Semantic vector search (pgvector)
Cloud Infrastructure
  • AWS Lambda, Step Functions, EventBridge
  • ECS Fargate, RDS, ALB, CloudFront
  • Amazon Bedrock & SageMaker
  • DynamoDB, S3, Secrets Manager
  • Terraform (12+ custom modules)
  • Scoped IAM roles & X-Ray tracing
Kubernetes / GitOps
  • EKS cluster management
  • Argo CD GitOps controller
  • OPA Gatekeeper policy enforcement
  • Prometheus + Grafana observability
  • Helm chart deployment
  • Alertmanager integration
Backend & APIs
  • FastAPI (production-grade)
  • SSE streaming responses
  • PostgreSQL + pgvector
  • Docker & docker-compose
  • Rate limiting & API key auth
  • Structured logging (JSON)
Languages
  • Python (primary)
  • HCL / Terraform
  • JavaScript / React
  • Bash / Makefile
  • Open Policy Agent (Rego)
  • YAML / JSON
Practices
  • Test-driven development (pytest)
  • Infrastructure as Code
  • Event-driven architecture
  • CI/CD (GitHub Actions)
  • Security-first IAM design
  • Cost-aware architecture
Let's build something
genuinely ambitious.