Isokan Dev — AI/Cloud Engineer

About

I am a senior AI/Cloud engineer specializing in building autonomous, production-grade systems at the intersection of cloud infrastructure and artificial intelligence.

My work focuses on multi-agent architectures, confidence-gated automation, and agentic retrieval systems — engineering software that can reason, decide, and act with minimal human intervention.

Every project I build is production-first: Terraform-managed infrastructure, proper observability, security boundaries, and documented runbooks. These are deployable systems, not proofs of concept.

Production Platforms

12+

Terraform Modules

AI Agents Deployed

100%

Test Pass Rate

Selected Work

Three platforms.
One through-line.

01 — AIOps Platform

GitOps Sentinel


A production-grade AIOps platform for Kubernetes that intercepts infrastructure anomalies, routes them through a multi-agent reasoning pipeline, and takes autonomous action calibrated to confidence. Every remediation becomes a Git commit — the cluster never mutates outside a reviewed PR or a high-confidence auto-apply.

Alertmanager → API Gateway (HMAC) → Signal Collector → EventBridge
└─ Step Functions: Classifier → Root Cause → Action Planner → Confidence Scorer
└─ RouteByConfidence → GitHub PR → Argo CD sync → Outcome Validator
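
The HMAC check at the API Gateway hop can be illustrated with a minimal sketch. It assumes the webhook sender signs the raw request body with a shared secret using HMAC-SHA256 and sends the hex digest alongside the request. The function names, the hash algorithm, and the hex encoding are assumptions for illustration, not details taken from the platform.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_authentic(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Check a received signature against the expected one.

    hmac.compare_digest runs in constant time, so the comparison does not
    reveal how many leading characters of the signature were correct.
    """
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"),
                               received_signature.encode("utf-8"))
```

Signing the raw bytes rather than a re-serialized payload avoids mismatches caused by key ordering or whitespace differences.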

Confidence Routing
≥80% auto-apply · 40–79% PR · <40% escalate

Pipeline
AWS Step Functions (Standard Workflow)

Dedup Strategy
DynamoDB conditional write + 7-day EventBridge archive

Self-Healing
Auto-revert PR on OutcomeFailed signal
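
The confidence routing thresholds listed above (≥80% auto-apply, 40–79% PR, <40% escalate) can be sketched as a small pure function. This is an illustrative sketch, not the platform's RouteByConfidence state: the `Route` enum and function name are hypothetical, the score is assumed to be a float in [0, 1], and fractional scores between 79% and 80% are assumed to fall into the PR band.

```python
from enum import Enum


class Route(Enum):
    AUTO_APPLY = "auto-apply"
    PULL_REQUEST = "pr"
    ESCALATE = "escalate"


def route_by_confidence(confidence: float) -> Route:
    """Map a confidence score in [0, 1] to a remediation route."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= 0.80:
        return Route.AUTO_APPLY      # high confidence: apply without review
    if confidence >= 0.40:
        return Route.PULL_REQUEST    # medium confidence: open a PR for review
    return Route.ESCALATE            # low confidence: hand off to a human
```

Keeping the routing decision free of side effects makes the thresholds easy to unit test independently of the rest of the pipeline.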

Python AWS Lambda Step Functions EventBridge DynamoDB Terraform Argo CD Kubernetes Prometheus Grafana OPA Gatekeeper Amazon Bedrock / OpenAI
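
The DynamoDB conditional-write dedup strategy listed above can be sketched as follows. The sketch only builds the arguments for a PutItem call, so it runs without AWS access. The attribute name `fingerprint`, the choice to hash only the alert labels, and both function names are assumptions for illustration.

```python
import hashlib
import json


def alert_fingerprint(alert: dict) -> str:
    """Stable hash of the fields assumed to identify an anomaly (its labels)."""
    canonical = json.dumps(alert["labels"], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedup_put_request(table_name: str, alert: dict) -> dict:
    """Build PutItem arguments that succeed only for a first-seen fingerprint.

    attribute_not_exists on the key attribute makes DynamoDB reject the
    write when an item with the same fingerprint is already stored.
    """
    return {
        "TableName": table_name,
        "Item": {"fingerprint": {"S": alert_fingerprint(alert)}},
        "ConditionExpression": "attribute_not_exists(fingerprint)",
    }
```

With a boto3 DynamoDB client, the request would be sent as `client.put_item(**request)`; a rejected duplicate surfaces as a `ConditionalCheckFailedException`, which the caller can treat as "already seen, skip".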

02 — Agentic RAG System

Medical Agentic RAG


A production-grade agentic retrieval system for medical knowledge, powered by LangGraph workflows, pgvector semantic search, and streaming SSE responses. The system routes queries through a confidence-scored pipeline — iterating on relevance up to 3 times before falling back to live web search.

React SSE UI → FastAPI → LangGraph Workflow
└─ Router → [Medical QnA | Device Manual | Web Search (DuckDuckGo)]
└─ Relevance Check (max 3 iterations) → Augment → Generate (GPT-4o-mini)
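
The relevance loop in the diagram above (at most 3 iterations, then web search) can be sketched in plain Python without LangGraph. All callables are hypothetical stand-ins for the workflow nodes, and rewriting the query between attempts is an assumption about what "iterating on relevance" involves.

```python
from typing import Callable, List, Tuple

MAX_ITERATIONS = 3  # iteration cap stated in the pipeline description


def retrieve_with_fallback(
    query: str,
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, List[str]], bool],
    rewrite: Callable[[str], str],
    web_search: Callable[[str], List[str]],
) -> Tuple[List[str], str]:
    """Return (documents, source) after at most MAX_ITERATIONS retrieval attempts."""
    current = query
    for _ in range(MAX_ITERATIONS):
        docs = retrieve(current)
        # Relevance is always judged against the user's original question.
        if is_relevant(query, docs):
            return docs, "knowledge_base"
        current = rewrite(current)  # assumed: reformulate before retrying
    # Every attempt failed the relevance check: fall back to live web search.
    return web_search(query), "web_search"
```

Returning the source alongside the documents lets a downstream step attach a different confidence to knowledge-base answers and web-search answers.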

Retrieval Sources
QnA corpus · Device manuals · Live web fallback

Vector Store
PostgreSQL + pgvector (1,000 docs / collection)

Confidence Scoring
Heuristic: from 90% for a QnA corpus match down to 40% for a failed web search

Infra
ECS Fargate + RDS + ALB + CloudFront (Terraform)

Python LangGraph FastAPI PostgreSQL / pgvector OpenAI GPT-4o-mini React SSE Streaming Docker Terraform AWS ECS Fargate CloudFront + S3
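
The SSE streaming listed in the stack above can be illustrated with a minimal frame formatter. Server-Sent Events are plain text frames made of `data:` lines terminated by a blank line. The JSON payload shape and the closing `done` event are assumptions for illustration.

```python
import json
from typing import Iterable, Iterator


def sse_frames(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one Server-Sent Events frame per generated token."""
    for token in tokens:
        # JSON-encoding escapes newlines, so each payload stays on one data line.
        payload = json.dumps({"token": token})
        yield f"data: {payload}\n\n"
    # Hypothetical terminal event so the client knows the stream is complete.
    yield "event: done\ndata: {}\n\n"
```

In FastAPI, a generator like this could be wrapped in `StreamingResponse(sse_frames(tokens), media_type="text/event-stream")`, and the React client would read it with `EventSource` or a streaming `fetch`.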

03 — LLM Orchestration

Multi-LLM Platform


A unified orchestration platform for running workloads across multiple large language model providers. It abstracts provider-specific APIs behind a common interface, enabling intelligent model routing, cost tracking, and fallback strategies across OpenAI, Anthropic, Bedrock, and open-source model backends.

Client Request → Unified API Gateway → Router / Planner
└─ Provider Adapters: OpenAI | Anthropic | Bedrock | Open-Source
└─ Response Aggregation → Cost Tracking → Structured Output

Design Pattern
Provider-agnostic adapter layer

Routing
Cost, latency, and capability-based selection

Observability
Per-provider token cost breakdown

Resilience
Automatic fallback on provider failure
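
The adapter layer, routing, cost breakdown, and fallback listed above can be sketched together in a few lines. This is an illustrative sketch under stated assumptions, not the platform's code: every class name is hypothetical, routing is reduced to cheapest-first, and pricing is a flat per-1k-token rate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class ProviderError(Exception):
    """Raised by an adapter when its backend fails."""


@dataclass
class Completion:
    text: str
    tokens: int


@dataclass
class Adapter:
    name: str
    cost_per_1k_tokens: float               # hypothetical flat pricing
    complete: Callable[[str], Completion]   # wraps one provider's API


@dataclass
class Orchestrator:
    adapters: List[Adapter]
    spend: Dict[str, float] = field(default_factory=dict)

    def run(self, prompt: str) -> Completion:
        """Try providers cheapest-first, falling back when one fails."""
        errors = []
        for adapter in sorted(self.adapters, key=lambda a: a.cost_per_1k_tokens):
            try:
                result = adapter.complete(prompt)
            except ProviderError as exc:
                errors.append(f"{adapter.name}: {exc}")
                continue  # automatic fallback to the next provider
            # Record spend per provider for the cost breakdown.
            cost = result.tokens / 1000 * adapter.cost_per_1k_tokens
            self.spend[adapter.name] = self.spend.get(adapter.name, 0.0) + cost
            return result
        raise ProviderError("all providers failed: " + "; ".join(errors))
```

Because each provider is reduced to a callable behind the same `Adapter` shape, adding a backend means writing one wrapper function rather than touching the routing logic.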

Python Multi-Provider SDK OpenAI Anthropic AWS Bedrock LangChain / LangGraph FastAPI Docker Terraform

Technical Expertise

Full-stack, cloud-native,
AI-first engineering.

AI / Agentic Systems

Multi-agent pipeline orchestration
LangGraph stateful workflows
Retrieval-Augmented Generation (RAG)
Confidence-gated decision engines
LLM routing & provider abstraction
Semantic vector search (pgvector)

Cloud Infrastructure

AWS Lambda, Step Functions, EventBridge
ECS Fargate, RDS, ALB, CloudFront
Amazon Bedrock & SageMaker
DynamoDB, S3, Secrets Manager
Terraform (12+ custom modules)
IAM scoped roles & X-Ray tracing

Kubernetes / GitOps

EKS cluster management
Argo CD GitOps controller
OPA Gatekeeper policy enforcement
Prometheus + Grafana observability
Helm chart deployment
Alertmanager integration

Backend & APIs

FastAPI (production-grade)
SSE streaming responses
PostgreSQL + pgvector
Docker & docker-compose
Rate limiting & API key auth
Structured logging (JSON)

Languages

Python (primary)
HCL / Terraform
JavaScript / React
Bash / Makefile
Open Policy Agent (Rego)
YAML / JSON

Practices

Test-driven development (pytest)
Infrastructure as Code
Event-driven architecture
CI/CD (GitHub Actions)
Security-first IAM design
Cost-aware architecture

Building systems that think, heal, and scale autonomously.