Haoyang Zhang · Software Engineer · 2K Games
§ 01 Now
Based: San Francisco Bay Area
Role: SWE, 2K Games
Focus: AI agents · platforms

I build AI agents you can actually trust in production.

Software engineer with a decade of curiosity and a taste for systems that refuse to fall over. I spend my days making agentic software useful, measurable, and reliable — not just impressive in a demo.

me@haoyangz.com
§ 02 Philosophy

The gap between a flashy demo and a production system is measured in observability, evaluations, and a thousand small decisions nobody tweets about.

Agents are products.

They deserve SLOs, regression suites, and a paging rotation — same as any other service with users.

Context is the feature.

Retrieval strategy, conversation state, and prompt shape determine whether a user trusts you twice.

Numbers over vibes.

Offline evals gate the deploy. Online evals gate the rollout. Vibes are for the launch post.

Boring where it counts.

Postgres, Kafka, Redis, a circuit breaker. The exciting part sits on top of a foundation that never surprises you.

§ 03 The stack, built as you scroll
05 Agent layer: planning · tools · summaries
04 Retrieval: semantic chunking · citations
03 Gateway & services: Spring Boot · Gin · Kubernetes
02 Event bus: Kafka · OAuth 2.0
01 Data plane: Postgres · Redis · replicas

01 — Start with data that doesn't lie.

Postgres with read replicas and an indexing strategy. Redis out front as a distributed cache. Baseline: 60% fewer DB queries, 50% faster APIs, 60% faster reads.
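One common way to put a cache "out front" of a database is the cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache. The sketch below is illustrative only and not the production code. The names `cached_read` and `load` are hypothetical, and a plain dict stands in for Redis (a real deployment would also set a TTL and handle invalidation).

```python
from typing import Callable, MutableMapping


def cached_read(
    key: str,
    cache: MutableMapping[str, str],
    load: Callable[[str], str],
) -> str:
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    if key in cache:
        return cache[key]      # cache hit: no database query
    value = load(key)          # cache miss: one database query
    cache[key] = value         # populate so the next read is a hit
    return value


# Usage: a dict stands in for Redis, a counter stands in for the database.
db_hits = {"count": 0}


def load_from_db(key: str) -> str:
    db_hits["count"] += 1
    return f"row-for-{key}"


cache: dict[str, str] = {}
first = cached_read("player:42", cache, load_from_db)
second = cached_read("player:42", cache, load_from_db)
```

Repeated reads of the same key reach the database once; every later read is served from the cache, which is where a reduction in query count would come from.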

02 — Make services talk without shouting.

Kafka for event-driven flow. +30% throughput, −10% inter-service latency. OAuth 2.0 for 5M+ daily logins with 20% fewer failures.

03 — A gateway that survives the weekend.

High-throughput microservices in Spring Boot + Gin on Kubernetes. 2M+ concurrent users. 99.99% uptime. Legacy translation for 50M requests/day at 25% lower cost.

04 — Retrieval you can argue with.

LlamaIndex over internal tools, code, and knowledge bases. Ingestion, semantic chunking, embeddings, vector search. Answers arrive with citations — or they don't arrive at all.

05 — Agents that know when to stop.

Prototyped in Dify, hardened with ADK. Specialized agents plan steps, call tools, and aggregate. ~70% less time finding internal info. 17% fewer tokens. Engineers & PMs actually use it.

§ 04 Where I've been useful
2023 — present · San Francisco Bay Area

Software Engineer, 2K Games

Team: Platform & Internal AI
Stack: Java · Go · Spring Boot · ADK · Dify · Gin · K8s · Kafka · Postgres · Redis · LlamaIndex
Ownership: Epics · tech design · contractor leads
  • Shipped an internal AI Slack assistant — agentic RAG over tools, code, and knowledge. −70% time to find info, −17% token cost per conversation.
  • Designed & shipped a commerce microservice at 2M+ CCU with 99.99% uptime on Spring Boot + Gin + K8s.
  • Tuned the data plane — HikariCP, PG read replicas, Redis cache: −35% DB latency, −60% queries, 20K+ TPS.
  • Built a gateway translation layer handling 50M req/day, phased out legacy services, −25% cost.
  • Reliability work — Resilience4j circuit breakers, Kafka event bus (+30% throughput), OAuth 2.0 for 5M daily logins.
  • Leading beyond my ticket — owning epics, writing tech designs, and coordinating contractors on delivery.
§ 05 Signature work
Project 01 · Internal Platform

The Slack assistant that actually reads the docs.

An agentic RAG assistant over our private tools, code, and knowledge. Engineers ask, answers come back with citations — and the bill stays reasonable.

Input
  • Slack: @ai · natural language
  • Conversation state: multi-turn context window
Reasoning
  • Planner agent: ADK · decides which tools to call
  • Docs agent: Confluence · Notion
  • Code agent: repo search · PR lookup
  • Ops agent: incident triage
Retrieval
  • Semantic chunker: LlamaIndex
  • Vector store: embeddings · similarity
  • Citation weaver: every claim, sourced
Guardrails
  • Context optimizer: −17% tokens
  • Observability: traces · token ledger
  • Eval harness: offline + online

−70% time finding internal info
−17% token cost per conversation
Eng + PM daily-active adoption
0 hallucinated citations in prod*

* caught by the eval harness before rollout. That's the whole point.

Project 02 · Platform Infrastructure

A commerce platform that doesn't flinch at launch day.

High-throughput microservices powering in-game commerce for 2M+ concurrent players with four nines of uptime. Boring on purpose.

Edge
  • API Gateway: 50M req / day
  • OAuth 2.0: 5M daily logins
  • Resilience4j: circuit breakers
Services
  • Commerce service: Spring Boot · Gin
  • Kubernetes: auto-scaling
  • Kafka: event-driven
Data
  • PostgreSQL: read replicas · indexes
  • Redis: distributed cache
  • HikariCP: connection pool
Ops
  • Jenkins · CI/CD: auto deploys
  • Monitoring: SLOs · alerting
  • Cost controls: −25% gateway

2M+ concurrent users
99.99% uptime
20K+ TPS query capacity
−50% API response time
§ 06 Making agents honest · side work

Harness engineering, observability, and evaluations — the parts nobody films a demo about.

Outside of 2K, I've been working with a Gen-AI startup on the infrastructure that turns an agent from a party trick into a product:

  • Harness engineering: the scaffolding an agent runs inside: tool registries, retries, budgets, timeouts, replay.
  • Observability: tracing every step, token, tool call, and latency hop so failures are legible.
  • Offline evaluations: regression suites with curated cases; a PR doesn't merge if the scoreboard drops.
  • Online evaluations: shadow traffic, LLM-as-judge, user-signal telemetry, all feeding the next dataset.
eval-suite/main: PASS
faithfulness: 0.92
citation_recall: 0.88
task_completion: 0.81
tool_accuracy: 0.95
p95_latency: 1.8s
cost/query: $0.014
+ scenarios regenerate on every commit · drift gates the merge
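A merge gate over a scoreboard like this can be sketched as a comparison between a baseline run and a candidate run. Everything below is an assumption made for illustration: the function name `regressions`, the 2% relative tolerance, the metric keys `p95_latency_s` and `cost_per_query_usd`, and the rule that quality scores must not drop while latency and cost must not rise.

```python
# Metrics where higher is better; any other metric is treated as lower-is-better.
HIGHER_IS_BETTER = {"faithfulness", "citation_recall", "task_completion", "tool_accuracy"}


def regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.02,
) -> list[str]:
    """Return the names of metrics that got worse by more than `tolerance` (relative)."""
    failed = []
    for name, base in baseline.items():
        new = candidate.get(name)
        if new is None:
            failed.append(name)  # a metric that disappeared counts as a regression
        elif name in HIGHER_IS_BETTER and new < base * (1 - tolerance):
            failed.append(name)
        elif name not in HIGHER_IS_BETTER and new > base * (1 + tolerance):
            failed.append(name)
    return failed


# Baseline values copied from the scoreboard above; the key names are hypothetical.
baseline = {
    "faithfulness": 0.92,
    "citation_recall": 0.88,
    "task_completion": 0.81,
    "tool_accuracy": 0.95,
    "p95_latency_s": 1.8,
    "cost_per_query_usd": 0.014,
}
candidate = dict(baseline, faithfulness=0.85, p95_latency_s=2.5)
blocked_by = regressions(baseline, candidate)
```

In a CI job, a non-empty result would fail the check and block the merge; an empty list lets it through.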
If an agent can't be measured, it can't be shipped.
If it can't be shipped, it isn't real.
§ 07 A quiet ledger of impact
2M+ CCU supported
99.99% uptime on commerce
50M gateway req / day
20K+ TPS
5M daily OAuth logins
−70% time to find info
−60% DB queries
−50% API latency
−35% DB latency
−25% gateway cost
−17% token cost
1000× pipeline speedup (GU)
§ 08 Arrangements, after hours

Engineering and arranging share the same bones — voice-leading is just dependency resolution in a nicer font. A few pieces I've been carrying around.

01

Beyond the Twilight

original composition · 2020 · 1:20
02

Le Château des Elfes

original composition · refined with Suno · 3:22
03

If You Say

piano arrangement · covered with Ella · 4:15
04

Become to Love You

piano cover · reharmonized · 1:18
05

Rain

piano improvisation · 1:17
06

Remember Me

piano cover · from Coco · 1:10
07

Wandering Earth — trailer, recomposed

recomposition · trailer score for The Wandering Earth · video · 2:09

(Hit ▶. R2 ships the bytes; the rest was shipped after hours.)

§ 09 Education
Georgetown University: M.S. Computer Science · GPA 3.96 · 2021–2023
Beijing Jiaotong University: B.Eng. Computer Science & Technology · 2016–2020
§ 10 Let's talk

If you're building agents that need to be trusted, or platforms that can't afford to flinch —