AI Systems Engineer
LLM Reliability & AI Workflow Analysis

Over 10 years building production systems, now focused on AI reliability: LLM evaluation, multi-turn reasoning analysis, failure-mode detection, drift detection, and agentic-workflow reliability. Irvine, CA. Open to remote, hybrid, and relocation.

About

AI Systems Engineer with over 10 years building production systems, now focused on LLM reliability and AI workflow analysis.

Recently built a complete AI platform — a typed knowledge graph on Postgres, custom APIs, and agent orchestration — plus a model-agnostic evaluation framework that measures how consistently a model performs across long conversations and flags where its behavior breaks down.

I design and build the reliability and evaluation tooling rather than only running tools built by others — drift and regression checks, observability, failure-mode libraries, and validation gates.

Strong at identifying hidden gaps, clarifying decision logic, documenting system behavior, and turning messy inputs into repeatable analysis.

Experience

AI Systems Engineer

FireStrike Productions — Irvine, CA

Jan 2025 — Present

Built a PostgreSQL/Supabase system for LLM reliability and AI workflow analysis: 5,500+ typed nodes, 14,000+ relationship edges, an 18-cell diagnostic matrix, a failure-mode library with triggers and counters, validation gates, and TypeScript/Deno APIs that retrieve structured context for agentic workflows.

Built a model-agnostic LLM evaluation harness using a 20-prompt protocol to test reasoning stability across seven dimensions, turning long-conversation behavior into reliability scores, violation logs, and stability-band ratings for repeatable model comparison.

Implemented runtime reliability controls for agentic AI workflows, including drift and regression checks, observability hooks, and structured-output validation, to catch unstable or malformed operations before they propagate.

Directed autonomous code-generation agents to build, compile, and revise backend database schemas, creating human-in-the-loop workflows for supervising AI-generated engineering work.

LLM Evaluation · Reliability · Knowledge Graphs · PostgreSQL · Agent Orchestration

Systems Architect

HoloShape LLC — Austin, TX

Mar 2020 — Dec 2024

Built a Unity-based mixed-reality proof of concept for movement measurement, mapping live body-tracking data into a 3D spatial interface for visualizing motion, position, and interaction over time.

Designed the system architecture across body tracking, spatial visualization, and reporting logic.

Modeled human movement as temporal and spatial data, connecting sensors, interaction state, and user interface into a coherent system design.

System Architecture · Mixed Reality · Spatial Data · Temporal Modeling · Unity

Front-End Web Developer

Independent Contract Development — New York, NY / San Diego, CA

Jan 2005 — Jan 2020

Delivered production web applications for healthcare, consumer, and agency clients on a contract basis.

Translated client requirements into structured, production-ready front-end systems using HTML, CSS, JavaScript, and CMS platforms.

HTML / CSS / JS · Front-End · CMS Platforms · Requirements Analysis · Client Delivery

Technical & Analytical Toolkit

AI Reliability & Evaluation

LLM evaluation · Benchmarking · Multi-turn reasoning analysis · Failure-mode detection · Drift & regression detection · Structured-output validation · Observability

AI Systems & Agents

Knowledge graphs · Retrieval (RAG) · Autonomous agent orchestration · Prompt design · Model behavior analysis

Engineering & Data

Python · PostgreSQL / Supabase · SQL · TypeScript / Deno · REST & RPC APIs · System architecture · Requirements analysis

Currently Developing

LangChain · LangGraph · Langfuse · Weights & Biases

Work Samples

Case 01

HoloSystem — AI Knowledge System & Operator Platform

AI Systems Engineering // Knowledge Infrastructure

Self-directed AI knowledge platform built to capture, structure, analyze, and retrieve complex system knowledge over time.

Highlights: 5,500+ typed knowledge nodes · 14,000+ typed relationships · 100+ table PostgreSQL/Supabase schema · 200+ stored procedures / RPC functions · Custom TypeScript/Deno API surface · Large archived AI conversation corpus · Scheduled background agents · Interactive operator surfaces for search, inspection, and plan navigation.

Demonstrates AI systems engineering, knowledge graph design, data structuring, workflow automation, and long-horizon independent system development.

Case 02

Alpha Stability Test — LLM Reliability Evaluation

AI Model Evaluation // Reliability Testing

A structured LLM reliability evaluation method for testing how model behavior changes across multi-turn interactions, constraint shifts, evidence updates, and ambiguous instructions.

Focus areas: multi-turn stability · hallucination risk · constraint adherence · boundary drift · confidence calibration · output consistency · self-correction behavior · failure-mode detection.

Demonstrates LLM reliability analysis, AI evaluation design, structural stability testing, failure-mode classification, and repeatable evaluation methodology.

View full protocol

Case 03

Recommendation System Gap Analysis

AI / System Behavior Analysis

A compact portfolio sample analyzing a product recommendation system that produced plausible but poorly coordinated outputs.

The analysis moves from: visible system behavior → hidden decision-logic mismatch → missing structure → practical improvement recommendation.

Demonstrates AI/system behavior analysis, gap and affordance detection, decision-logic diagnosis, practical system improvement design, and clear communication of complex analysis.

View step-through analysis

Case 04

AI Job-Titles Map

AI Hiring Landscape // Structural Analysis

A structural read of 30+ real AI job postings and profiles, sorting roles by how deep the work goes rather than by title.

The finding: depth (build it / run it / rate it), not the title, drives responsibility and pay — about a 10x spread top to bottom.

Demonstrates structural analysis, taxonomy design, labor-market research, and turning inconsistent information into a clear, decision-useful map.

View the map

Case 05

The Odyssey Companion Map

Information Design // Structural Mapping

Interactive structural map designed to help modern readers re-orient to a complex narrative through identity handles, power relationships, episode structure, and guided explanation.

Demonstrates information design, structural mapping, explanatory systems, audience-centered complexity reduction, and interactive portfolio design.

Open interactive map

Get in touch

Let's look at your problem.

A paragraph is enough to start. Tell me what you're working on and I'll tell you whether I can help — and what that looks like.

Email DavidHKosky@gmail.com

Phone 718.404.6333

LinkedIn linkedin.com/in/davidkosky

Resume DK_RESUME.DOCX

Company FireStrike Productions

Alpha Stability Test v1.5c

This is what appeared

This is what was happening

This is what was missing

This is what I changed

This is what improved

What this demonstrates

What AI Job Titles Actually Mean

Build it

Run it

Rate it

How to read any AI title
