§ 01 Now

Based: San Francisco Bay Area

Role: SWE, 2K Games

Focus: AI agents · platforms

I build AI agents you can actually trust in production.

Software engineer with a decade of curiosity and a taste for systems that refuse to fall over. I spend my days making agentic software useful, measurable, and reliable — not just impressive in a demo.

me@haoyangz.com


§ 02 Philosophy

The gap between a flashy demo and a production system is measured in observability, evaluations, and a thousand small decisions nobody tweets about.

Agents are products.

They deserve SLOs, regression suites, and a paging rotation — same as any other service with users.

Context is the feature.

Retrieval strategy, conversation state, and prompt shape determine whether a user trusts you twice.

Numbers over vibes.

Offline evals gate the deploy. Online evals gate the rollout. Vibes are for the launch post.

Boring where it counts.

Postgres, Kafka, Redis, a circuit breaker. The exciting part sits on top of a foundation that never surprises you.

§ 03 The stack, built as you scroll

05 Agent layer: planning · tools · summaries

04 Retrieval: semantic chunking · citations

03 Gateway & services: Spring Boot · Gin · Kubernetes

02 Event bus: Kafka · OAuth 2.0

01 Data plane: Postgres · Redis · replicas

01 — Start with data that doesn't lie.

Postgres with read replicas and a deliberate indexing strategy. Redis out front as a distributed cache. Results: 60% fewer DB queries, 50% faster APIs, reads 60% snappier.

02 — Make services talk without shouting.

Kafka for event-driven flow. +30% throughput, −10% inter-service latency. OAuth 2.0 for 5M+ daily logins with 20% fewer failures.

03 — A gateway that survives the weekend.

High-throughput microservices in Spring Boot + Gin on Kubernetes. 2M+ concurrent users. 99.99% uptime. A gateway translation layer in front of legacy services: 50M requests/day at 25% lower cost.

04 — Retrieval you can argue with.

LlamaIndex over internal tools, code, and knowledge bases. Ingestion, semantic chunking, embeddings, vector search. Answers arrive with citations — or they don't arrive at all.
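The "citations or nothing" rule can be sketched as a gate in front of generation. This assumes retrieval returns chunks with a source and a similarity score; `Chunk`, `answer_with_citations`, the 0.75 threshold, and the string-joining stand-in for the LLM call are all hypothetical, not LlamaIndex APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    source: str   # e.g. a doc URL or file path (hypothetical field)
    text: str
    score: float  # similarity from vector search; higher is better


@dataclass
class Answer:
    text: str
    citations: List[str]


def answer_with_citations(question: str, chunks: List[Chunk],
                          min_score: float = 0.75, max_sources: int = 3) -> Optional[Answer]:
    """Return an answer only if it can be grounded in retrieved sources; otherwise None."""
    grounded = sorted((c for c in chunks if c.score >= min_score),
                      key=lambda c: c.score, reverse=True)[:max_sources]
    if not grounded:
        return None  # no citation, no answer
    # Placeholder for the LLM call: a real system would prompt the model
    # with the question and these chunks, then keep the source list attached.
    draft = " ".join(c.text for c in grounded)
    return Answer(text=draft, citations=[c.source for c in grounded])
```

Returning `None` rather than an unsourced guess lets the caller reply with an explicit "I couldn't find this" instead.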

05 — Agents that know when to stop.

Prototyped in Dify, hardened with ADK. Specialized agents plan steps, call tools, and aggregate. ~70% less time finding internal info. 17% fewer tokens. Engineers & PMs actually use it.
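One way to make an agent "know when to stop" is to give the plan/act loop explicit exits. This is a hypothetical sketch, not ADK's API: `plan_next` stands in for the planner model, `tools` for the tool registry, and the step and token limits are made-up defaults.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    tool: Optional[str]  # None means the planner decided it is done
    arg: str = ""
    tokens: int = 0      # tokens the planner spent producing this step


@dataclass
class RunResult:
    stopped_because: str
    observations: List[str] = field(default_factory=list)


def run_agent(plan_next: Callable[[List[str]], Step],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 6, token_budget: int = 4000) -> RunResult:
    """Plan, call tools, and stop on 'done', an unknown tool, or an exhausted budget."""
    observations: List[str] = []
    spent = 0
    for _ in range(max_steps):
        step = plan_next(observations)
        spent += step.tokens
        if spent > token_budget:
            return RunResult("token_budget", observations)
        if step.tool is None:
            return RunResult("done", observations)
        if step.tool not in tools:
            return RunResult(f"unknown_tool:{step.tool}", observations)
        observations.append(tools[step.tool](step.arg))
    return RunResult("max_steps", observations)
```

Recording why the run stopped makes runaway loops visible in traces instead of silently burning tokens.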

§ 04 Where I've been useful

2023 — present · San Francisco Bay Area

Software Engineer, 2K Games

Team: Platform & Internal AI
Stack: Java · Go · Spring Boot · ADK · Dify · Gin · K8s · Kafka · Postgres · Redis · LlamaIndex
Ownership: Epics · tech design · contractor leads

Shipped an internal AI Slack assistant — agentic RAG over tools, code, and knowledge. −70% time to find info, −17% token cost per conversation.
Designed & shipped a commerce microservice at 2M+ CCU with 99.99% uptime on Spring Boot + Gin + K8s.
Tuned the data plane — HikariCP, PG read replicas, Redis cache: −35% DB latency, −60% queries, 20K+ TPS.
Built a gateway translation layer handling 50M req/day, phased out legacy services, −25% cost.
Reliability work — Resilience4j circuit breakers, Kafka event bus (+30% throughput), OAuth 2.0 for 5M daily logins.
Leading beyond my ticket — owning epics, writing tech designs, and coordinating contractors on delivery.
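For readers unfamiliar with circuit breakers, here is a simplified, single-threaded sketch of the closed/open/half-open state machine. It is written in Python for illustration and is not Resilience4j's API; the class name, thresholds, and consecutive-failure counting are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is short-circuited instead of hitting the failing dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0  # consecutive failures since the last success
        self._opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            self.state = "HALF_OPEN"  # cooldown elapsed: let a trial call through
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        # Any success closes the circuit and clears the failure count.
        self.state = "CLOSED"
        self._failures = 0
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failed trial call, or too many failures in a row, opens the circuit.
        if self.state == "HALF_OPEN" or self._failures >= self.failure_threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()
```

Resilience4j builds more on this idea, for example a sliding window with a failure-rate threshold (`slidingWindowSize`, `failureRateThreshold`), a configurable open-state wait (`waitDurationInOpenState`), and a cap on trial calls while half-open (`permittedNumberOfCallsInHalfOpenState`).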

§ 05 Signature work

Project 01 · Internal Platform

The Slack assistant that actually reads the docs.

An agentic RAG assistant over our private tools, code, and knowledge. Engineers ask, answers come back with citations — and the bill stays reasonable.

Input

Slack: @ai · natural language

Conversation state: multi-turn context window

Reasoning

Planner agent: ADK · decides which tools to call

Docs agent: Confluence · Notion

Code agent: repo search · PR lookup

Ops agent: incident triage

Retrieval

Semantic chunker: LlamaIndex

Vector store: embeddings · similarity

Citation weaver: every claim, sourced

Guardrails

Context optimizer: −17% tokens

Observability: traces · token ledger

Eval harness: offline + online

−70% time finding internal info

−17% token cost per conversation

Eng + PM: daily-active adoption

0 hallucinated citations in prod*

* Hallucinated citations were caught by the eval harness before rollout. That's the whole point.

Project 02 · Platform Infrastructure

A commerce platform that doesn't flinch at launch day.

High-throughput microservices powering in-game commerce for 2M+ concurrent players with four nines of uptime. Boring on purpose.

Edge

API Gateway: 50M req/day

OAuth 2.0: 5M daily logins

Resilience4j: circuit breakers

Services

Commerce service: Spring Boot · Gin

Kubernetes: auto-scaling

Kafka: event-driven

Data

PostgreSQL: read replicas · indexes

Redis: distributed cache

HikariCP: connection pool

Ops

Jenkins · CI/CD: auto deploys

Monitoring: SLOs · alerting

Cost controls: −25% gateway cost

2M+ concurrent users

99.99% uptime

20K+ TPS query capacity

−50% API response time

§ 06 Making agents honest · side work

Harness engineering, observability, and evaluations — the parts nobody films a demo about.

Outside of 2K, I've been working with a Gen-AI startup on the infrastructure that turns an agent from a party trick into a product:

Harness engineering — the scaffolding an agent runs inside: tool registries, retries, budgets, timeouts, replay.
Observability — tracing every step, token, tool call and latency hop so failures are legible.
Offline evaluations — regression suites with curated cases; a PR doesn't merge if the scoreboard drops.
Online evaluations — shadow traffic, LLM-as-judge, user-signal telemetry, all feeding the next dataset.

eval-suite/main: PASS

faithfulness		0.92
citation_recall		0.88
task_completion		0.81
tool_accuracy		0.95
p95_latency		1.8s
cost/query		$0.014

+ scenarios regenerate on every commit · drift gates the merge
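The rule that a PR doesn't merge if the scoreboard drops can be sketched as a comparison against a stored baseline. Assumptions: the baseline values are copied from the scoreboard, the latency and cost metrics are renamed with units and treated as lower-is-better, and the 2% relative tolerance and function names are hypothetical.

```python
from typing import Dict, List

# Baseline copied from the eval-suite/main scoreboard (metric names given units here).
BASELINE: Dict[str, float] = {
    "faithfulness": 0.92,
    "citation_recall": 0.88,
    "task_completion": 0.81,
    "tool_accuracy": 0.95,
    "p95_latency_s": 1.8,
    "cost_per_query_usd": 0.014,
}

# Metrics where lower is better; everything else is treated as higher-is-better.
LOWER_IS_BETTER = {"p95_latency_s", "cost_per_query_usd"}


def regressions(baseline: Dict[str, float], candidate: Dict[str, float],
                tolerance: float = 0.02) -> List[str]:
    """List metrics where the candidate is worse than baseline by more than `tolerance` (relative)."""
    failed: List[str] = []
    for name, base in baseline.items():
        new = candidate.get(name)
        if new is None:
            failed.append(f"{name}: missing")
            continue
        if name in LOWER_IS_BETTER:
            worse = new > base * (1 + tolerance)
        else:
            worse = new < base * (1 - tolerance)
        if worse:
            failed.append(f"{name}: {base} -> {new}")
    return failed


def merge_gate(baseline: Dict[str, float], candidate: Dict[str, float]) -> bool:
    """True if no tracked metric regressed beyond tolerance."""
    return not regressions(baseline, candidate)
```

The relative tolerance absorbs run-to-run noise from sampling and LLM-as-judge scoring; a missing metric fails the gate rather than passing silently.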

If an agent can't be measured, it can't be shipped.
If it can't be shipped, it isn't real.

§ 07 A quiet ledger of impact

2M+ CCU supported

99.99% uptime on commerce

50M gateway req/day

20K+ TPS

5M daily OAuth logins

−70% time to find info

−60% DB queries

−50% API latency

−35% DB latency

−25% gateway cost

−17% token cost

1000× pipeline speedup (Georgetown University)

§ 08 Arrangements, after hours

Engineering and arranging share the same bones — voice-leading is just dependency resolution in a nicer font. A few pieces I've been carrying around.

Beyond the Twilight

original composition · 2020 · 1:20

Le Château des Elfes

original composition · refined with Suno · 3:22

If You Say

piano arrangement · covered with Ella · 4:15

Become to Love You

piano cover · reharmonized · 1:18

Rain

piano improvisation · 1:17

Remember Me

piano cover · from Coco · 1:10

Wandering Earth — trailer, recomposed

recomposition · trailer score for The Wandering Earth

video · 2:09


§ 09 Education

Georgetown University: M.S. Computer Science · GPA 3.96 · 2021–2023

Beijing Jiaotong University: B.Eng. Computer Science & Technology · 2016–2020

§ 10 Let's talk

If you're building agents that need to be trusted, or platforms that can't afford to flinch, I'd like to hear from you.

Email: me@haoyangz.com
GitHub: @zhy3213
LinkedIn: haoyang--zhang