AI & Multi-Agent Systems · Julian Contreras

— What it is

This is the center of gravity of the current work. Everything else on this site is shipped through it or coordinated by it.

A multi-agent operations infrastructure runs continuously on a dedicated server. It is bidirectional and always-on: agents hold persistent memory and a persistent runtime, so they do not start cold each time I reach for them. The system was designed and built from scratch — not assembled out of open-source agent frameworks — because the requirements were specific and the coordination had to be mine to control end to end.

There are distinct agent classes, each with a defined role. A central orchestration brain interprets intent and routes work. Ambassadors translate that intent into each provider's and each system's own protocol. Auditors verify what other agents produce. Messengers move information between layers without loss. Researchers gather context and feed it back. Senior executors do full-capability work, up to sixteen at once; below them, narrower sub-agents fan out, more than three hundred in parallel when the workload demands it.

I command the fleet from several surfaces: the workstation, through VS Code and the Linux CLI; mobile devices, in real time; and real-time voice calls with running instances. Each of them rides socket-level bidirectional connections that stay open rather than polling. The point of the breadth is not novelty; it is that the same infrastructure is reachable from wherever the work happens to be.

16

senior executor instances, full-capability, running concurrently

300+

narrower sub-agents in parallel when the workload calls for it

5

LLM providers orchestrated at once — Anthropic, OpenAI, Google, xAI, Moonshot

24/7

always-on, on dedicated hardware, with persistent memory and runtime

— Operating parameters

The numbers it actually runs at.

Infrastructure / steady state

Senior executor instances: up to 16 concurrent
Sub-agents: 300+ in parallel
LLM providers: 5, orchestrated simultaneously
Transport: socket-level, bidirectional, low-latency
Command surfaces: workstation · mobile · voice
Runtime host: dedicated Dell server, always-on
Operational state: persistent — agents never start cold
Agent-to-agent: protocol-mediated, audited
Context interfaces: function calling · MCP · RAG
Origin: built from scratch — not a framework assembly
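
To make a cap such as "up to 16 concurrent" concrete, here is a minimal Python sketch of bounded concurrency using a semaphore. It is an illustration under stated assumptions, not the system's scheduler: `run_capped`, `SENIOR_CAP`, and `make_job` are hypothetical names, and the jobs are dummy sleeps standing in for real work.

```python
import asyncio

SENIOR_CAP = 16  # "up to 16 concurrent", taken from the parameters above


async def run_capped(jobs, cap=SENIOR_CAP):
    """Run zero-argument coroutine functions with at most `cap` in flight.

    Returns (results in submission order, peak number running at once).
    """
    gate = asyncio.Semaphore(cap)
    running = 0
    peak = 0

    async def guarded(job):
        nonlocal running, peak
        async with gate:  # waits here once `cap` jobs are already running
            running += 1
            peak = max(peak, running)
            try:
                return await job()
            finally:
                running -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


def make_job(i):
    """Hypothetical stand-in for real work: sleep briefly, return an id."""
    async def job():
        await asyncio.sleep(0.01)
        return i
    return job


if __name__ == "__main__":
    results, peak = asyncio.run(run_capped([make_job(i) for i in range(40)]))
    print(len(results), "jobs finished; peak concurrency", peak)
```

The same pattern would apply to the sub-agent tier with a larger cap; only the number changes, not the mechanism.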

— Topology

One brain, several classes, many workers.

Intent enters at the center. It is decomposed, routed to the class fit for the task, executed, and verified — then the result returns to the center, which holds the only complete picture.

Schematic. Class roles are fixed; instance counts scale with workload.

— The classes, in depth

What each layer is responsible for.

The central brain holds the whole problem.

One orchestration brain sits at the center. It interprets intent, decomposes a mission into tasks, routes those tasks to the agents best suited to them, and holds the global context that no single worker can see on its own.

It also resolves conflicts between agents and maintains operational state across the run. When two paths disagree, the decision is made here, with the full picture — not at the edge by a worker that only knows its own slice.

Interprets intent and decomposes it into discrete tasks
Routes work to the agent class fit for each task
Holds global context and resolves conflicts
Maintains operational state across the whole run

— Idle time

Metallic icosahedron — abstract representation of the agent fleet.

Idle capacity becomes preparation capacity

When there is no task, the fleet studies.

When no task is assigned, agents do not sit idle. They enter a self-directed study and cross-training loop: working through material, comparing approaches, and teaching each other across classes.

That learning is not left as transcript. Implementer agents take what was studied, fine-tune the studied model on it, and inject the resulting knowledge back into the system for future operation. The fleet that handles tomorrow's work is measurably more prepared than the one that finished today's.

Self-directed study and cross-training during idle windows
Implementer agents fine-tune the studied model on what was learned
Knowledge injected back for future operation, not just logged

— Models

Provider-neutral, and able to modify weights directly.

Provider routing

Anthropic
OpenAI
Google
xAI
Moonshot

One task, the model that fits it. No vendor lock.

Fine-tuning · LoRA · a research direction

Not vendor-locked — and not limited to prompting.

The infrastructure orchestrates multiple LLM providers at once — Anthropic, OpenAI, Google, xAI, Moonshot — and routes each task to the model that suits it. It is not tied to a single vendor. Where it helps, I go below the prompt: fine-tuning, LoRA, and advanced weight and bias modification on open-weight models.

There is also active research I will describe only at a high level. It uses small open-weight models in the four-to-twenty-billion-parameter range, with a proprietary process that activates only a minimal part — in memory — of much larger models in the hundred-to-six-hundred-billion range, and integrates that into the small models. The method is withheld; the direction has large potential, and that is as much as I will say here.

Five providers orchestrated simultaneously, routed per task
Fine-tuning, LoRA, weight/bias modification on open-weight models
Small-model research (~4–20B) integrating selective activation of larger ones (~100–600B) — method withheld
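
Per-task routing across the five providers can be pictured as an ordered rule table with a fallback. The sketch below is a hedged illustration only: the `Task` fields, the `route` function, and every rule in `EXAMPLE_RULES` are hypothetical placeholders, and the pairings say nothing about which provider actually suits which kind of work.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Provider(Enum):
    ANTHROPIC = "Anthropic"
    OPENAI = "OpenAI"
    GOOGLE = "Google"
    XAI = "xAI"
    MOONSHOT = "Moonshot"


@dataclass(frozen=True)
class Task:
    kind: str    # hypothetical label such as "code" or "summary"
    tokens: int  # hypothetical size estimate for the task's context


# A rule pairs a predicate over the task with the provider to use when it matches.
Rule = tuple[Callable[[Task], bool], Provider]


def route(task: Task, rules: list[Rule], fallback: Provider) -> Provider:
    """Return the provider of the first matching rule, else the fallback."""
    for matches, provider in rules:
        if matches(task):
            return provider
    return fallback


# Placeholder rules, purely to exercise the function (assumptions, not policy).
EXAMPLE_RULES: list[Rule] = [
    (lambda t: t.kind == "code", Provider.ANTHROPIC),
    (lambda t: t.tokens > 100_000, Provider.GOOGLE),
]

if __name__ == "__main__":
    print(route(Task("code", 2_000), EXAMPLE_RULES, Provider.OPENAI).value)
```

Keeping the rules as data rather than branching logic is one way to stay provider-neutral: swapping a vendor is an edit to the table, not to the router.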

— Capabilities

What it can reach, and how it stays sound.

01

Function calling

Agents act through typed tool interfaces — read, write, query, execute — rather than emitting prose I have to interpret by hand.

02

MCP

Model Context Protocol connects agents to tools and data sources through a common contract, so a capability added once is available across the fleet.

03

RAG

Retrieval feeds agents the specific context a task needs from a working memory store, instead of relying on what a model happened to be trained on.

04

Persistent memory

State, prior decisions, and accumulated context survive between sessions. The infrastructure remembers; nothing has to be re-explained each morning.

05

Observability

What each agent did, why, and against which inputs is recorded — so behaviour can be inspected after the fact, not just trusted in the moment.

06

Security self-audit

Continuous, authorized self-audit with BlackArch Linux across local and cloud servers. To date: no open breaches and no information leakage.
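
The typed tool interfaces described under function calling can be sketched as a registry that validates arguments before anything runs. This is a minimal Python illustration, assuming a simple name-to-type parameter schema; `Tool`, `ToolRegistry`, and the example `query` tool are hypothetical and are not the system's actual contract or the MCP wire format.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    params: dict[str, type]  # parameter name -> required Python type
    run: Callable[..., Any]


class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, tool):
        self._tools[tool.name] = tool

    def call(self, name, args):
        """Validate `args` against the tool's declared parameters, then run it."""
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        tool = self._tools[name]
        if set(args) != set(tool.params):
            raise TypeError(f"{name} expects parameters {sorted(tool.params)}")
        for key, expected in tool.params.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{name}.{key} must be {expected.__name__}")
        return tool.run(**args)


# Hypothetical example: a `query` tool over an in-memory table.
_TABLE = {"status": "green", "owner": "ops"}
registry = ToolRegistry()
registry.register(Tool("query", {"key": str}, lambda key: _TABLE.get(key)))

if __name__ == "__main__":
    print(registry.call("query", {"key": "status"}))
```

A malformed call fails at the boundary with a `TypeError` instead of producing prose that someone has to interpret by hand, which is the property the typed interface is there to provide.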

— Mission decomposition

How one objective becomes coordinated work.

A request does not go straight to a worker. It is read as a single objective, broken into ordered tasks, routed by class, executed, verified, and returned to the center. Each stage has a defined owner.

Intake to return

01 Intake: Intent arrives from a command surface and is read as a single objective, not yet a plan.
02 Decompose: The orchestration brain breaks the objective into discrete, ordered tasks with explicit dependencies.
03 Route: Each task is assigned to the agent class fit for it: a senior executor, or a fan-out of sub-agents.
04 Execute: Work runs in parallel where the dependency graph allows, against typed tool interfaces.
05 Audit: A separate agent verifies each result for correctness, security, and coherence before it counts.
06 Return: Verified output flows back to the center, which alone holds the complete picture of the run.
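
The decompose, execute, and audit stages above can be sketched with a dependency graph: tasks whose dependencies are all verified form a batch that may run in parallel. This is a minimal Python sketch using the standard library's `graphlib`; `plan_batches`, `run_mission`, and the example task names are hypothetical, and the loop runs each batch sequentially where a real runner would presumably fan it out.

```python
from graphlib import TopologicalSorter


def plan_batches(deps):
    """Group tasks into batches; every task in a batch has all its dependencies met.

    `deps` maps a task name to the set of task names it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


def run_mission(deps, execute, audit):
    """Execute tasks in dependency order; a result counts only once audited."""
    verified = {}
    for batch in plan_batches(deps):
        for task in batch:  # sequential here; tasks in a batch are independent
            inputs = {d: verified[d] for d in deps.get(task, ())}
            result = execute(task, inputs)
            if not audit(task, result):
                raise ValueError(f"audit rejected {task!r}")
            verified[task] = result
    return verified  # the complete picture, returned to the caller


# Hypothetical mission used only to exercise the functions.
EXAMPLE_DEPS = {
    "build": {"fetch"},
    "lint": {"fetch"},
    "test": {"build"},
    "report": {"test", "lint"},
}

if __name__ == "__main__":
    print(plan_batches(EXAMPLE_DEPS))
```

For `EXAMPLE_DEPS` the batches come out as `fetch`, then `build` and `lint` together, then `test`, then `report`. Routing by agent class is left out of the sketch; it would slot in as the choice of `execute` per task.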

— Transport

Connections stay open in both directions.

The link between a command surface and a running instance is socket-level and bidirectional. It stays open rather than polling, so I can push to an instance and it can push back — mid-task, not only on completion.

Bidirectional socket

One line, held open, used both ways.

Socket-level · bidirectional · low-latency

Not request-and-wait. A held line.

Most automation talks to a model by asking a question and waiting for an answer. This does not. The transport is a persistent socket in both directions: I send instruction or context into a live instance, and the instance sends progress, questions, and results back on the same open line.

Because the connection is held rather than re-established per message, latency stays low and an exchange can be continuous. An agent can ask me something mid-task and act on the reply without tearing down and rebuilding the channel.

Persistent socket — no per-message reconnect overhead
Both ends can initiate — push in, push back
Continuous mid-task exchange, not request-and-wait
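
The held-line behaviour described above can be illustrated with one connection that carries messages in both directions, including a question asked mid-task. This is a minimal Python sketch, not the actual transport: the newline-delimited JSON framing, the message kinds, and the `operator` and `instance` roles are assumptions for illustration, and a local `socket.socketpair()` stands in for a real network connection.

```python
import asyncio
import json
import socket


async def send(writer, kind, body):
    """Write one newline-delimited JSON message (framing is an assumption)."""
    writer.write((json.dumps({"kind": kind, "body": body}) + "\n").encode())
    await writer.drain()


async def recv(reader):
    """Read one newline-delimited JSON message."""
    return json.loads(await reader.readline())


async def instance(reader, writer):
    """Running agent: reports progress, asks a question mid-task, then finishes."""
    task = await recv(reader)
    await send(writer, "progress", f"started: {task['body']}")
    await send(writer, "question", "which branch?")
    answer = await recv(reader)  # same open line, no reconnect
    await send(writer, "result", f"done on {answer['body']}")


async def operator(reader, writer):
    """Command surface: pushes an instruction, answers questions as they arrive."""
    log = []
    await send(writer, "instruction", "run the build")
    while True:
        msg = await recv(reader)
        log.append(msg["kind"])
        if msg["kind"] == "question":
            await send(writer, "answer", "main")
        elif msg["kind"] == "result":
            log.append(msg["body"])
            return log


async def demo():
    a, b = socket.socketpair()  # local stand-in for a network connection
    op_reader, op_writer = await asyncio.open_connection(sock=a)
    in_reader, in_writer = await asyncio.open_connection(sock=b)
    _, log = await asyncio.gather(
        instance(in_reader, in_writer), operator(op_reader, op_writer)
    )
    op_writer.close()
    in_writer.close()
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The whole exchange, instruction in, progress and a question back, answer in, result back, happens on a single connection that is opened once and closed once.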

— Command surfaces

Reachable from wherever the work is.

The same infrastructure answers to three surfaces. The breadth is not for novelty — it means the fleet is reachable from the workstation, from a phone, and by voice, without a separate system for each.

All three surfaces ride the same bidirectional transport into one core.

— The learning loop

Idle, study, fine-tune, inject — then repeat.

The fleet's preparedness is not fixed at deployment. Between active tasks it studies, consolidates what it studied into a model change, and injects the result back. The loop is what makes tomorrow's fleet more prepared than today's.

Active task: The fleet executes against a mission. Intent is decomposed, routed, executed in parallel, audited, and returned. Persistent state is updated as the run proceeds.
Idle window: Self-directed study and cross-training. With no task assigned, agents work through material, compare approaches, and teach each other across classes rather than sitting idle.
Consolidation: Studied material becomes a model change. Implementer agents fine-tune the studied open-weight model on what was learned, using LoRA and direct weight and bias modification where it helps.
Injection: Knowledge is fed back for future operation. The result is injected into the system, not left as a transcript, so the next run starts more prepared than the last one finished.
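
Read as a state machine, the four phases above form a cycle. The Python sketch below encodes only that ordering, under the simplifying assumption that the phases always follow one another strictly; `Phase`, `next_phase`, and `walk` are hypothetical names, and a real scheduler would presumably return to an active task as soon as one arrives.

```python
from enum import Enum


class Phase(Enum):
    ACTIVE = "active task"
    IDLE = "idle window"
    CONSOLIDATION = "consolidation"
    INJECTION = "injection"


# Order taken from the loop described above; wrapping around is the "repeat".
_ORDER = [Phase.ACTIVE, Phase.IDLE, Phase.CONSOLIDATION, Phase.INJECTION]


def next_phase(phase):
    """Return the phase that follows `phase` in the cycle."""
    return _ORDER[(_ORDER.index(phase) + 1) % len(_ORDER)]


def walk(start, steps):
    """List the phases visited from `start` over `steps` transitions."""
    phases = [start]
    for _ in range(steps):
        phases.append(next_phase(phases[-1]))
    return phases


if __name__ == "__main__":
    print(" -> ".join(p.value for p in walk(Phase.ACTIVE, 4)))
```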

— Memory & context

What the fleet knows, and how it reaches it.

An agent does not rely on what a model happened to be trained on. Context is retrieved for the task at hand, state persists between sessions, and every action runs through a typed contract. Nothing has to be re-explained each morning.

Context interfaces

Working memory store: task-scoped context, retrieved on demand
Retrieval: RAG — the specific context a task needs, not a model's training guess
Persistent state: decisions and accumulated context survive between sessions
Tool contract: MCP — a capability added once is reachable fleet-wide
Action interface: function calling — typed read / write / query / execute
Provenance: what each agent did, why, and against which inputs is recorded
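
Two of the interfaces above, retrieval and provenance, can be sketched together: fetch the context a task needs, then record what was done and against which inputs. This is a toy Python illustration under loud assumptions: plain word overlap stands in for whatever similarity measure the real store uses (retrieval systems typically use embeddings), and `MemoryStore`, `ProvenanceRecord`, and their fields are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    docs: dict[str, str] = field(default_factory=dict)  # doc id -> text

    def retrieve(self, query, k=2):
        """Return up to `k` doc ids ranked by word overlap with the query.

        Word overlap is a stand-in for a real similarity measure.
        """
        wanted = set(query.lower().split())
        scored = []
        for doc_id, text in self.docs.items():
            overlap = len(wanted & set(text.lower().split()))
            if overlap:
                scored.append((overlap, doc_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by id
        return [doc_id for _, doc_id in scored[:k]]


@dataclass(frozen=True)
class ProvenanceRecord:
    agent: str               # who acted
    action: str              # what was done
    reason: str              # why
    inputs: tuple[str, ...]  # which retrieved documents the action ran against


def act_with_context(store, agent, action, reason, query):
    """Retrieve context for an action and return the record describing it."""
    return ProvenanceRecord(agent, action, reason, tuple(store.retrieve(query)))


if __name__ == "__main__":
    store = MemoryStore({
        "transport": "the transport is a persistent socket with low latency",
        "audit": "auditors verify each result before it counts",
        "memory": "state persists between sessions",
    })
    print(act_with_context(store, "researcher-1", "summarize", "operator asked",
                           "socket latency"))
```

Storing the retrieved ids on the record is what lets a later reader inspect which inputs an action actually ran against, rather than inferring them.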

— Self-audit

The infrastructure checks itself.

The same fleet that does the work is held to an inward-facing security audit. It runs continuously, with authorization, against my own servers — local and cloud — using the BlackArch toolset.

01

Authorized scope

The self-audit runs against my own local and cloud servers, continuously and with explicit authorization. It is an inward-facing exercise, not an outward one.

02

BlackArch toolset

BlackArch Linux supplies the tooling the audit runs with — the same instruments used to probe a system are turned on the infrastructure that holds the fleet.

03

Standing result

To date the audit reports no open breaches and no information leakage. It is a recurring check, so the result is a current state rather than a one-time certificate.

The infrastructure that makes the rest of the work possible.