Production AI systems, founding-team execution, real users.

Tibor Thompson

I build LLM applications, agents, retrieval/RAG, evals, voice workflows, and the backend/product infrastructure that makes AI work in real products.

  • Building a pre-launch sports-tech product across mobile, web, backend, data, and release infrastructure
  • Built production AI systems spanning voice, retrieval/RAG, SQL actions, and evals
  • Stanford CS AI track; co-authored NeurIPS 2024 workflow automation research
LLM Applications · Agents · RAG · LLM Evaluation · Product Infrastructure

Now

What I'm working on

Current build

A pre-launch sports-tech product spanning mobile, web, data ingestion, auth, social features, and test infrastructure.

Current tooling

Workhorse, a terminal-based AI coding agent with multi-model routing, workspace tools, evals, and session persistence.

Useful conversations

Production AI systems, founding-engineer work, LLM evaluation, RAG/retrieval, voice agents, and backend infrastructure.

Selected Work

Systems and artifacts

A few systems I've built or contributed to, with the constraints, technical decisions, and outcomes that mattered.

Municipal AI Research Platform

ProphecyGov · 2025

Municipal teams need trustworthy AI answers with source attribution across large government document repositories.

Built citation, retrieval, and evaluation systems for a local-government AI product.

Trust was the constraint: generated answers needed visible source trails, not just plausible prose.

  • Matched answers back to source material with confidence scoring and fuzzy matching
  • Used LLM-as-judge flows to check orchestrator quality
  • Added SQL-backed actions for natural-language database work

Production AI Receptionist

Vanilla AI · 2024-2025

Real estate and home-service businesses needed automated customer intake, voice handling, and proposal workflows.

Sole founding engineer, responsible for taking the product from a blank repo to production.

The product only worked if voice, payments, proposals, and uptime all held together for live customers.

  • Built voice-agent workflows, payment handling, and proposal automation
  • Set up AWS infrastructure, CI/CD, tests, and deployment workflows
  • Explained the system architecture to investors as technical lead

WONDERBREAD Workflow Automation Research

Stanford Hazy Lab · 2023-2024

Digital business workflows are difficult for multimodal agents to evaluate and automate reliably.

Contributed to dataset generation, testing, and model validation for VLM-based workflow automation.

The benchmark work depended on turning messy browser workflows into examples models could be evaluated against.

  • Validated multimodal foundation models on business-process tasks
  • Contributed testing workflows and data-quality checks
  • Co-authored paper published at NeurIPS 2024

Generative Email Client

Stanford senior project · 2023-2024

Users need email drafts that reflect personal context and communication preferences.

Worked with Stanford AI researchers to build a RAG email client on OpenAI and Meta Llama models.

Retrieval had to improve the draft without flattening the user into a generic email voice.

  • Implemented retrieval over user context for personalized draft generation
  • Compared OpenAI and Llama model behavior for generation quality
  • Prepared the client's browser extension for beta publication on the Chrome Web Store

Court-Of-Agents

Personal project · 2024

Multi-agent debates are hard to inspect when agents only produce isolated text responses.

Built a voice-enabled multi-agent simulation for courtroom-style debate and resolution.

The agent roles had to be legible enough that a debate could be followed, repeated, and tested.

  • Designed agent roles for collaborative argumentation and adjudication
  • Integrated voice interaction patterns for live simulation flow
  • Added repeatable test coverage around agent configuration

Experience

Where I've built

Startup and research work across AI products, backend systems, evaluation, retrieval, and user-facing tools.

2026-Present

Stealth Sports-Tech Startup

Co-Founder & Lead Engineer

  • Building a pre-launch consumer sports-tech product across mobile, web, backend, data ingestion, caching, authentication, social features, and test infrastructure
  • Architected React Native/Expo mobile and Next.js web surfaces backed by PostgreSQL, row-level security, serverless edge functions, and Redis-backed caching
  • Built live sports-data workflows, personalized feeds, social interactions, moderation/privacy foundations, integration tests, and iOS/Android release infrastructure

2025

ProphecyGov

Software Engineer, AI Infrastructure

  • Built a citation engine that matched AI answers back to source material with confidence scores and fuzzy matching
  • Added LLM-as-judge checks for orchestrator quality using Anthropic-style evaluation patterns
  • Built a SQL action layer so agents could turn natural-language requests into database operations
  • Improved vector retrieval with metadata filters across large municipal document repositories
  • Demoed the product with municipal buyers and conference audiences

2024-2025

Vanilla AI

Software Engineer (Founding Team)

  • Joined as the sole founding engineer and built the AI receptionist product from zero to production
  • Set up AWS infrastructure, CI/CD, tests, and operational workflows for live customer-facing agents
  • Built voice-call flows, payment handling, proposal automation, and a contractor-matching MVP
  • Explained the technical architecture to investors as the founding technical lead

2023-2024

Stanford University

AI Research Contributor

  • Co-authored WONDERBREAD, a benchmark for evaluating how vision-language models handle digital workflows
  • Worked on dataset generation, testing, and model validation for business-process tasks
  • Named co-author on the paper published at NeurIPS 2024 (Datasets & Benchmarks track)

2023

Kaladin Inc.

Software Engineer Intern

  • Built supervised-learning trading algorithms for a cross-chain crypto exchange
  • Ran historical backtests to refine strategies before testnet deployment
  • Integrated market-data REST APIs and built front-end visualization tools
  • Represented the company at three industry conferences and brought user feedback back to the roadmap

Tools

Regular stack

Backend & Infrastructure

Python · Flask · FastAPI · PostgreSQL · Redis · REST APIs · AWS EC2 · Docker · GitHub Actions · Deno Edge Functions

Frontend

Next.js · TypeScript · React.js · React Native · Expo · JavaScript

AI & Machine Learning

OpenAI · Anthropic · LangGraph · Google Gemini · Pinecone · PyTorch · TensorFlow · RAG · Multi-agent orchestration · LLM-as-judge evaluation · NLP · Supervised Learning · Reinforcement Learning · Graph Neural Networks

Cloud & DevOps

AWS · Google Cloud · Automated testing · Observability · Production deployment · CI/CD

Additional Languages

C++ · C · Solidity · Flutter/Dart · SwiftUI

Contact

Useful reasons to reach out

Recruiting conversations, technical diligence, founder-to-founder notes, or questions about the systems on this site.