
How I Built a Local AI Agent With Computer Vision

A technical overview of the OnyxKraken pattern: local models, screen perception, memory, API tools, and the difference between a chatbot and an agent that can respond to real state.


Mark "Vizion" Barnes / MarkVizion / May 2026

## The Problem

Most AI assistants are built around conversation. They receive text, return text, and occasionally call a tool. That is useful, but it is not enough for desktop work.

Real desktop work depends on state: what is open, where the user is stuck, what changed after the last action, what the system can see, and which tool should be used next. That is the problem OnyxKraken is designed around.

## The Core Loop

The agent needs four layers:

- Perception: screen state, visual context, and observable feedback
- Memory: persistent notes, prior mistakes, project context, and retrieval
- Tools: APIs, file operations, browser actions, code execution, and creative software hooks
- Evaluation: a way to ask whether the last action moved the system forward

Without perception, the agent guesses. Without memory, it repeats. Without tools, it only talks. Without evaluation, it drifts.

## Why Local-First Matters

Local-first architecture changes the economics and the feel of the system. A desktop agent runs constantly. It cannot treat every small reasoning step like a premium cloud event.

Local models let the system stay near the work. Cloud models can still be useful, but they should be escalation paths, not the whole foundation.

## The Design Lesson

The goal is not to make the assistant sound smarter. The goal is to make it more situated.

An agent that can see state, remember context, call tools, and evaluate outcomes becomes a workflow system. That is the difference between chat as an interface and AI as infrastructure.

## What This Proves

OnyxKraken is the flagship proof system because it shows the architecture behind the portfolio:

- local AI orchestration
- computer vision context
- persistent memory
- API surfaces
- tool routing
- creative software automation
- shipped interfaces around the agent

That is the core of my AI product work: not prompts in isolation, but systems that can respond to the world around them.
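The four layers from The Core Loop, combined with the local-first escalation rule from Why Local-First Matters, can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not OnyxKraken's actual code: `Observation`, `Plan`, `Memory`, `route`, `run_step`, and the 0.6 confidence threshold are all hypothetical. Perception is reduced to a text summary of the screen, retrieval to keyword overlap, and escalation to a confidence score from the local planner.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Observation:
    """Perception layer (hypothetical): a text summary of the current screen state."""
    screen_text: str


@dataclass
class Plan:
    """A proposed next action: which tool to call, with what argument, and how sure the planner is."""
    tool: str
    argument: str
    confidence: float


@dataclass
class Memory:
    """Memory layer (hypothetical): persistent notes plus a log of prior mistakes."""
    notes: List[str] = field(default_factory=list)
    mistakes: List[str] = field(default_factory=list)

    def recall(self, query: str) -> List[str]:
        # Naive keyword-overlap retrieval; a real system might use embeddings instead.
        words = set(query.lower().split())
        return [entry for entry in self.notes + self.mistakes
                if words & set(entry.lower().split())]


# A planner maps (what the agent sees, what it remembers) to a proposed action.
Planner = Callable[[Observation, List[str]], Plan]


def route(obs: Observation, context: List[str], local: Planner, cloud: Planner,
          threshold: float = 0.6) -> Tuple[Plan, str]:
    """Local-first routing: use the local planner, escalate to the cloud only when unsure."""
    plan = local(obs, context)
    if plan.confidence >= threshold:
        return plan, "local"
    return cloud(obs, context), "cloud"


def run_step(perceive: Callable[[], Observation], memory: Memory,
             tools: Dict[str, Callable[[str], None]], local: Planner, cloud: Planner,
             evaluate: Callable[[Observation, Observation], bool]) -> str:
    """One pass through perceive -> recall -> plan -> act -> evaluate."""
    before = perceive()                                   # Perception
    context = memory.recall(before.screen_text)           # Memory
    plan, source = route(before, context, local, cloud)   # Local-first reasoning
    tools[plan.tool](plan.argument)                       # Tools
    after = perceive()
    if evaluate(before, after):                           # Evaluation
        memory.notes.append(f"{plan.tool}({plan.argument}) helped on: {before.screen_text}")
        return f"progress via {source}"
    memory.mistakes.append(f"{plan.tool}({plan.argument}) did not help on: {before.screen_text}")
    return f"no progress via {source}"


if __name__ == "__main__":
    screen = {"text": "save dialog open"}

    def click(target: str) -> None:
        # Simulated tool: clicking "Save" dismisses the dialog.
        if target == "Save":
            screen["text"] = "document saved"

    memory = Memory()
    result = run_step(
        perceive=lambda: Observation(screen["text"]),
        memory=memory,
        tools={"click": click},
        local=lambda obs, ctx: Plan("click", "Save", 0.9),
        cloud=lambda obs, ctx: Plan("click", "Save", 0.95),
        evaluate=lambda before, after: before.screen_text != after.screen_text,
    )
    print(result)        # progress via local
    print(memory.notes)  # ['click(Save) helped on: save dialog open']
```

Keeping escalation in its own `route` function means the policy can change (for example, escalating after repeated evaluation failures rather than on a confidence score) without touching the loop. Recording failures in `mistakes` is what lets a later `recall` surface them, which is the "without memory, it repeats" point in code form.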