Releases

Changelog

What shipped and when.

v1.2.0

Agents get a persistent terminal

The code-execution skill goes from a per-call calculator to a real, persistent workspace. Until now each tool call spun up a throwaway sandbox, ran one snippet, and threw it away — nothing carried over, so an agent could not clone a repo in one step and run its tests in the next. This release binds one secure cloud sandbox to the whole conversation and adds full shell access. An agent can now git clone a project, cd into it, install dependencies, and run a build or test suite across several turns, with the filesystem, working directory, and installed packages all persisting in between — the same way you'd use a terminal. It runs in an isolated cloud microVM, never on your server, with network access off by default.

  • ·Persistent per-conversation sandboxes. The first code or shell call in a thread provisions one sandbox; every later call reconnects to it. Files you write, packages you install, and the current directory all survive from one call to the next, so multi-step workflows finally work. The sandbox auto-reaps after about ten minutes of inactivity, so idle conversations cost nothing.
  • ·Full shell access via a new run_shell tool. Anything a terminal can do — git, psql, ffmpeg, duckdb, curl, pandoc, pnpm, ripgrep — the agent can now run. A non-zero exit code comes back as data (exit code plus stdout and stderr) rather than a hard error, so the agent can read the failure and decide what to do next instead of giving up.
  • ·One shared workspace across the toolkit. run_python, run_node, run_shell, install_package, and upload-attachment all operate on the same live session, so a file written by Python is immediately visible to a shell command, and a package installed once is available everywhere for the rest of the conversation.
  • ·Secure by construction. Untrusted code runs in an isolated E2B cloud microVM, never on the Platos host. Network egress is denied by default — turn it on per environment only when a workflow genuinely needs to reach the internet (for git clone or package installs). Each tenant brings their own sandbox key, scoped per project and environment from the dashboard.
  • ·Runtime upgrade. Unlike the previous release, this ships in the agent runtime — the reference deploy at play.platos.dev is already updated. Self-hosters get it on the next agent image pull; enable the Code Execution skill on an agent and set an E2B key for the environment to switch it on.
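The per-conversation lifecycle described above (provision on first call, reconnect afterwards, reap after roughly ten minutes idle) can be pictured with a small in-memory model. This is a sketch only: `SandboxRegistry`, `acquire`, `reapIdle`, and the `Sandbox` shape are hypothetical names for illustration, not the Platos or E2B API, and real sandboxes are cloud microVMs rather than objects in a map.

```typescript
// Illustrative in-memory model only; not the Platos or E2B API.
interface Sandbox {
  id: string;
  cwd: string;
  lastUsedMs: number;
}

// "About ten minutes" per the release note; the exact value is an assumption.
const IDLE_LIMIT_MS = 10 * 60 * 1000;

class SandboxRegistry {
  private byConversation = new Map<string, Sandbox>();
  private nextId = 1;

  // The first call for a conversation provisions a sandbox; later calls reuse it.
  acquire(conversationId: string, nowMs: number): Sandbox {
    this.reapIdle(nowMs);
    let sandbox = this.byConversation.get(conversationId);
    if (!sandbox) {
      sandbox = { id: "sbx-" + this.nextId++, cwd: "/", lastUsedMs: nowMs };
      this.byConversation.set(conversationId, sandbox);
    }
    sandbox.lastUsedMs = nowMs;
    return sandbox;
  }

  // Drop sandboxes that have been idle longer than the limit.
  reapIdle(nowMs: number): void {
    this.byConversation.forEach((sandbox, conversationId) => {
      if (nowMs - sandbox.lastUsedMs > IDLE_LIMIT_MS) {
        this.byConversation.delete(conversationId);
      }
    });
  }
}
```

In this model, state such as `cwd` survives between calls on the same conversation, and a call that arrives after the idle limit gets a fresh sandbox.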

v1.1.0

Message ratings in the SDKs

Thumbs up/down on assistant messages is now a first-class SDK feature. The rating backend (storage, per-version satisfaction analytics, and a memory-quality loop) already existed, but you could not reach it from the client libraries, so embedded chat had no way to collect feedback. This release adds it to the JavaScript client and the React widget, so end users can rate replies and that signal flows into your dashboards and the agent's own memory quality.

  • ·@platosdev/client gains a messages namespace: rate(messageId, 'up' | 'down', { comment }), unrate(messageId), and getForMessage(messageId) which returns the current user's vote plus anonymized up/down counts. Rate against the server message id, which the streaming API now surfaces on a typed message_persisted event.
  • ·@platosdev/react-widget renders thumbs up/down on each assistant reply out of the box. The headless usePlatosChat hook exposes a rate(messageId, direction) callback and tags each message with its server id and current vote; rating is optimistic and toggles off if you click the same direction twice.
  • ·Feedback does more than count. A thumbs-down quarantines the memories the agent extracted from that message so they stop surfacing in future retrieval; a thumbs-up boosts their confidence. Votes also roll up into the per-agent-version satisfaction view on the Evals and Monitoring dashboards, so you can compare versions on real user sentiment.
  • ·Published as @platosdev/client 0.3.0 and @platosdev/react-widget 0.2.0. No runtime upgrade required — the agent-side rating API has been live; this is purely the client surface that was missing.
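The optimistic toggle behavior described above can be sketched as a pure state function plus a thin wrapper. The method names `rate` and `unrate` come from the release note; the `MessagesApi` interface, `nextVote`, `applyVote`, and the rollback-on-failure step are assumptions made for illustration, not the widget's actual implementation.

```typescript
type Direction = "up" | "down";
type Vote = Direction | null;

// Clicking the direction that is already selected clears the vote;
// clicking the other direction switches to it.
function nextVote(current: Vote, clicked: Direction): Vote {
  return current === clicked ? null : clicked;
}

// Hypothetical minimal shape of the rating calls (the optional comment is omitted).
interface MessagesApi {
  rate(messageId: string, direction: Direction): Promise<void>;
  unrate(messageId: string): Promise<void>;
}

async function applyVote(
  api: MessagesApi,
  messageId: string,
  current: Vote,
  clicked: Direction,
  setVote: (vote: Vote) => void,
): Promise<void> {
  const next = nextVote(current, clicked);
  setVote(next); // optimistic: update the UI before the server confirms
  try {
    if (next === null) await api.unrate(messageId);
    else await api.rate(messageId, next);
  } catch (err) {
    setVote(current); // assumed behavior: roll back if the request fails
    throw err;
  }
}
```

Keeping the toggle rule in a pure function makes it easy to test separately from the network call.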

v1.0.5

Latency + reliability — throttle, slow-turn, and model-display fixes

A cluster of latency and correctness fixes. The reference deploy hit a host-level CPU throttle (ClickHouse self-telemetry runaway) that made the dashboard slow for hours; that is root-caused and permanently fixed. Separately, agent turns driven through the SDKs could stall for up to a minute before the model was even called, because the per-turn memory step embedded the message with no timeout; that is now bounded so a slow embedding provider can never hold up a response. Plus a fix so the model you pick in the agent builder is the model the list shows.

  • ·Root cause of the throttle: ClickHouse's built-in self-telemetry, not Platos data. ClickHouse writes its own diagnostics (metric_log, asynchronous_metric_log, latency_log, and friends) every few seconds. On a small VPS this produced ~1,015,000 rows in ~193 storage fragments and a backlog of background merges that could not complete under the resulting CPU pressure — a self-reinforcing spiral that held the host at ~95% CPU steal for hours. Platos itself queries none of those tables; it only uses the spans and task-telemetry tables.
  • ·Permanent fix: disable the high-churn self-logs entirely. These internal log tables are now turned off in the ClickHouse config (the same treatment already applied to trace and text logs), so the fragment churn that fed the merge pile-up never starts. An earlier size cap was not enough because it bounded total rows, not the per-flush fragment creation that actually drives the merges. The query and part logs stay, kept small by a 7-day retention policy.
  • ·Production recovered and verified. The bloated internal tables were cleared (a metadata-only operation safe to run even under throttle, touching zero Platos or customer data); the throttle lifted from ~95% CPU steal to ~0%, system load fell from ~40 to under 4, active merges dropped to zero, and dashboard login went from ~32 seconds back to ~1 second. Confirmed the internal logs stay flat (no new writes) after the fix.
  • ·Slow-turn fix: bound the per-turn memory embedding so a slow provider can't stall a response. Every turn with a user attached injects a "things I remember about this user" block, which embeds the message via an external embedding API before calling the model. That call had no timeout, so a cold or rate-limited embedding key could hang the whole turn — observed as a ~64-second wait before the model was even invoked, while the model itself answered in ~5 seconds. The embedding call now times out (default 8s) and the memory step races a budget (default 5s); on timeout the agent simply answers without the memory enrichment instead of blocking. This lived in the shared agent runtime, so it affected every SDK consumer, not just one app.
  • ·The agent builder's model picker now stays in sync with the displayed model. Picking a model wrote the routing table but left a separate legacy model field stale, so the agent list could show an old model name even though the new one was actually running. The two are now kept in lock-step on every create and update across the dashboard, the platform MCP, and the SDK.
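The slow-turn fix above amounts to racing the memory step against a time budget and continuing without enrichment when the budget runs out. A minimal sketch of that pattern follows; `withBudget` and `memoryBlock` are hypothetical names, the 5000 ms default mirrors the budget quoted in the note, and treating a failed recall the same as a timeout is an assumption.

```typescript
// Resolves with the task's value, or with `fallback` if the budget elapses first.
function withBudget<T>(task: Promise<T>, budgetMs: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => resolve(fallback), budgetMs);
    task.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Hypothetical memory step: returns remembered facts, or an empty list when
// the recall is too slow or fails, so the turn is never blocked on it.
function memoryBlock(
  recall: () => Promise<string[]>,
  budgetMs: number = 5000,
): Promise<string[]> {
  const safeRecall: Promise<string[]> = recall().catch(() => []);
  return withBudget<string[]>(safeRecall, budgetMs, []);
}
```

The caller then builds the prompt with whatever `memoryBlock` returns; an empty list simply means the turn proceeds without the "things I remember" block.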

v1.0.4

Deploy hardening — off-box builds

Operational release for self-hosters. Building the agent and webapp images on the same box that serves traffic can spike CPU and degrade the live service on a small VPS. This release moves image builds off the box: CI builds and publishes them, and the box only ever pulls. Plus a safety-gated deploy script and a type-check gate that blocks non-compiling code from merging.

  • ·Off-box image builds. A CI workflow now builds the agent and webapp images on a runner and publishes them to the container registry, tagged latest and per-commit. The production box never runs a compile step again — the old flow ran the full TypeScript + bundle build on the 4-core reference VPS while it was serving requests, which pushed load past 40 and tripped the host CPU throttle. New docker-compose.deploy.yml override swaps the build steps for registry image refs so a stray build on the box is a no-op.
  • ·Safety-gated deploy script. scripts/deploy-platos.sh pulls the pre-built images, runs migrations, recreates only the app services, and waits for health. It refuses to start if the box has no real CPU headroom — it samples actual idle from /proc/stat with steal time counted against availability, so it correctly aborts under a host-level CPU throttle instead of tipping a slow box into a down box. Rollback is just deploying a previous per-commit tag — no rebuild.
  • ·Type-check gate on every change. Continuous integration now type-checks the agent runtime on every pull request and is a required check before merge, so code that does not compile can no longer reach the main branch or a deploy.
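The headroom check in the deploy script is a shell script; the idea behind it can be sketched in TypeScript by comparing two /proc/stat samples. This is an illustration under stated assumptions: the function names and the `MIN_IDLE` threshold are hypothetical, and the real script may weigh fields differently. The key point is that steal time is counted in the total but not in idle, so a throttled host reports little headroom.

```typescript
interface CpuSample {
  idle: number;
  total: number;
}

// Parses the aggregate "cpu" line of /proc/stat.
// Fields: user nice system idle iowait irq softirq steal [guest guest_nice]
function parseCpuLine(procStat: string): CpuSample {
  const line = procStat.split("\n").find((l) => l.startsWith("cpu "));
  if (!line) throw new Error("no aggregate cpu line found");
  const fields = line.trim().split(/\s+/).slice(1).map(Number);
  // Sum the first eight fields; guest time is already included in user/nice.
  const total = fields.slice(0, 8).reduce((a, b) => a + b, 0);
  return { idle: fields[3], total };
}

// Fraction of CPU time that was truly idle between two samples.
// Steal is part of the total but not of idle, so it counts against headroom.
function idleFraction(before: CpuSample, after: CpuSample): number {
  const deltaTotal = after.total - before.total;
  if (deltaTotal <= 0) return 0;
  return (after.idle - before.idle) / deltaTotal;
}

const MIN_IDLE = 0.3; // hypothetical threshold, not taken from the release notes

function hasHeadroom(beforeStat: string, afterStat: string): boolean {
  return idleFraction(parseCpuLine(beforeStat), parseCpuLine(afterStat)) >= MIN_IDLE;
}
```

A deploy gate built this way would take two samples a short interval apart and abort when `hasHeadroom` returns false.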

v1.0.3

Robustness sweep

Hardening release from a full-codebase audit pass. Completes the block-shape class started in v1.0.2 by closing the last two readers that could choke on a string-shaped JSON column, after confirming the live service is healthy end to end.

  • ·Block-shape hardening sweep. An audit over every reader of the array-shaped promptBlocks / dynamicBlocks columns found two stragglers the v1.0.2 fix did not cover. First, the per-turn dynamicBlocks log line ran .map on the raw value behind only a truthiness check, so a string scalar would crash the turn before the guarded dispatch loop on the next line ran (this logs on every turn, so it was the hottest path still exposed). Second, the MCP agent_diff tool coalesced promptBlocks with ?? [], which lets a string through to be iterated character-by-character into a garbage diff. Both are now guarded with Array.isArray. The class is now closed at every write boundary and every read site.
  • ·Audit triage, no busywork. The same pass flagged candidates that turned out to be non-issues on inspection — a WebSocket close handler whose only await is already in try/catch, and an OAuth state timestamp check that runs after HMAC verification so it cannot be forged. Those were deliberately left alone rather than churned. Two genuine items were filed for a design call rather than a silent patch: multi-tenant uniqueness of entity slugs on the public OAuth discovery routes, and the skipped cross-scope isolation regression cases.
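The difference between the `?? []` coalesce and an Array.isArray guard is easy to see on a double-encoded value. The helper names below (`coalesceOnly`, `asArray`) are hypothetical and only illustrate the failure mode described above.

```typescript
type MaybeBlocks = string | unknown[] | null | undefined;

// `?? []` only replaces null and undefined, so a string passes straight
// through and is then iterated one character at a time.
function coalesceOnly(value: MaybeBlocks): unknown[] {
  return Array.from<unknown>(value ?? []);
}

// An Array.isArray guard rejects every non-array shape instead.
function asArray(value: unknown): unknown[] {
  return Array.isArray(value) ? value : [];
}

// A double-encoded block list: a JSON string where an array was expected.
const doubleEncoded = JSON.stringify([{ id: "a" }]);

const charByChar = coalesceOnly(doubleEncoded); // one entry per character
const guarded = asArray(doubleEncoded); // empty array
```

With the coalesce, a diff tool would compare individual characters; with the guard, the malformed value is treated as having no blocks.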

v1.0.2

Config-shape hardening

Bug-fix release. An agent whose promptBlocks or dynamicBlocks were double-encoded by a client (sent as a JSON string instead of an array) could land a string in an array-shaped column, which crashed the agent detail page and silently dropped the agent's dynamic context blocks. Hardened at the write boundary so it cannot recur, with read-side defenses on the dashboard as backup. Also folds in the sane-defaults work from the prior deploy.

  • ·promptBlocks / dynamicBlocks double-encode guard. PlatosAgent.promptBlocks and .dynamicBlocks are array-shaped Json columns, but Postgres jsonb also accepts a bare string scalar. A client that sent JSON.stringify(blocks) instead of blocks corrupted the row permanently: the dashboard called .map on a string (z.map is not a function, crashing the agent page) and the runtime's Array.isArray guard silently dropped every dynamic block, so the agent answered ungrounded with its {{context}} variables unresolved. New coerceBlockList() normalizes at all four persist sites (create, update, restore-version, snapshot) — parse-if-string, require-array, drop otherwise. Idempotent on correct data.
  • ·Dashboard read-side defense. The agent detail editor and Context tab now coerce block lists on read (asBlockArray) so a malformed row can never crash the page, even if bad data arrives by some other path. Belt to the write-path braces.
  • ·Sane defaults (carried from the prior deploy). The PIFSP-11 entity_ids mandate is now explicit opt-in rather than auto-enforced whenever an agent sees 2+ entities in scope — single-purpose agents no longer break the moment a second entity joins the project. And the public-docs API rate limit (/api/v1/public/*) is env-driven with a 600/60s default, enough headroom for a marketing-site SSG build that fans out ~250 requests from one IP.
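The "parse-if-string, require-array, drop otherwise" rule from the double-encode guard can be sketched as follows. This is not the actual coerceBlockList() source: the function name carries a `Sketch` suffix to make that explicit, and returning an empty array for dropped input is an assumption.

```typescript
// Sketch of the normalization rule; the real coerceBlockList() may differ.
function coerceBlockListSketch(value: unknown): unknown[] {
  let candidate = value;
  if (typeof candidate === "string") {
    try {
      candidate = JSON.parse(candidate); // parse-if-string
    } catch {
      return []; // not valid JSON: drop (assumed to mean "empty list")
    }
  }
  // require-array: anything else (objects, numbers, bare strings) is dropped
  return Array.isArray(candidate) ? candidate : [];
}
```

Because an array input is returned unchanged, applying the function twice gives the same result as applying it once, which matches the "idempotent on correct data" property in the note.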

v1.0.1

Reliability + governance — post-mortem release

Shipped in response to the 2026-05-19 play.platos.dev outage. Four-layer compounding failure root-caused and fixed end-to-end: a misbehaving connected entity's reconnect storm, ClickHouse's own observability tables OOM-looping inside an undersized memory cap, and a transport-layer SDK bug that corrupted query strings. Plus the per-tool approval policy gate landed as a real pause/resume flow, and the client SDK gained a typed `tools` namespace.

  • ·Per-tool approval gate now enforces in the agent-runtime dispatcher. MCPPermissionGatewayService.resolve() runs on every dispatch when PLATOS_TOOL_DISPATCH_PERMISSION_GATE=1 is set. On require_approval the dispatcher persists a PlatosAgentApproval row, publishes approval_needed over Redis (dashboard Socket.IO subscribes), and BLPOP-waits on a scoped Redis key for resolution. Edited-args supported end-to-end — operator approves with editedArgs and the entity sees the edited shape. Fail-closed on persistence + wait failures, fail-open on resolver errors. Default off; opt in per deployment.
  • ·`client.tools` namespace in @platosdev/client. Typed wrappers for the agent's tool-catalog REST surface: list({ category }), search(q, { limit, entity }), stats(), matrix() (with health data), setEnabled(entityId, toolName, enabled), test(toolId, params). Exports PlatosTool, PlatosToolHealth, PlatosToolMatrixRow, PlatosToolStats. Backs the dashboard's Tools tab without consumers needing to hand-roll fetch + scope headers.
  • ·Tool-bridge transport bugs fixed in both SDKs. PlatoolsClient.websocketUrl() was concatenating /ws/sdk to the end of the URL, so when PLATOS_URL carried query params (?source=…&env=prod) the suffix corrupted the last query value and the server failed env resolution. The fix splits the URL at ?, appends the suffix to the path, and re-attaches the query. Same fix in @platosdev/platools-sdk (TS) and platools (PyPI), with parity tests in both.
  • ·Welcome-frame dual-shape decode. SDK now accepts both the canonical welcome.organization_id (what the server actually emits) and the legacy welcome.org_id. Silences the noisy 'platools received malformed message' warn that older SDKs emitted on every connect, and exposes entity_id, environment_id, project_id when present.
  • ·ClickHouse hardened against the OOM-loop class. Default CLICKHOUSE_MEM_LIMIT raised from 1200m to 4g (the 1200m default OOM-killed the server 523 times in 12 days on the reference deploy once system.metric_log grew large enough that a single background merge exceeded the cgroup limit). Default POSTGRES_MEM_LIMIT raised from 512m to 1g. New clickhouse-ttl-apply one-shot sidecar applies 7-day TTL via SQL ALTER TABLE … MODIFY TTL to every system log table. Retroactive on existing tables, idempotent, portable across ClickHouse versions.
  • ·Agent-side tool_register rate-limit (defense-in-depth). The WS bridge now caps re-registrations per (entity, env) at PLATOS_TOOL_REGISTER_MAX_PER_MIN (default 6). Above that the server replies with register_throttled + retry_after_ms and skips the expensive 196 KB parse + per-tool UPSERT + BM25 rebuild. Bug-free clients touch this once at startup; the cap exists so a future buggy client cannot DoS the agent regardless of what it does wrong.
  • ·Run-engine optional disable. RUN_ENGINE_WORKER_ENABLED=0 documented as the recommended setting for any deployment that isn't running trigger.dev — disables the chronic empty-queue polling that otherwise burns CPU and blocks the event loop on throttled hosts.
  • ·AI SDK v6 allowSystemInMessages warning suppressed at the three intentional call sites in agent.service.ts where the system prompt rides inside messages[] (sub-agent dispatch, main turn streamText, structured-output generateObject retry path). The per-message shape is load-bearing — required to attach providerOptions.anthropic.cacheControl — so the warning is silenced explicitly rather than refactored away.
  • ·Quarterly maintenance section in docs/self-hosting.md documenting docker builder prune -af (the reference deploy had accumulated 149.7 GB of stale build cache across 566 entries on a 193 GB disk; pruning recovered 66 GB) and noting that the ClickHouse TTL sidecar handles system-table sanity automatically going forward.
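The websocketUrl() fix in this release follows a split, append, re-attach pattern, sketched below. `appendPath` is a hypothetical helper, not the SDK's code; trimming a trailing slash before appending is an added assumption, and the host and query values in the usage note are made up.

```typescript
// Appends a path suffix before the query string rather than after it.
function appendPath(baseUrl: string, suffix: string): string {
  const q = baseUrl.indexOf("?");
  const path = q === -1 ? baseUrl : baseUrl.slice(0, q);
  const query = q === -1 ? "" : baseUrl.slice(q); // includes the leading "?"
  // Assumption: drop trailing slashes so the result has no "//" in the path.
  return path.replace(/\/+$/, "") + suffix + query;
}

// Naive concatenation, shown for contrast: the suffix lands inside the
// last query value.
function naiveAppend(baseUrl: string, suffix: string): string {
  return baseUrl + suffix;
}
```

For a made-up URL such as `wss://example.test?source=demo&env=prod`, the naive version ends in `env=prod/ws/sdk`, while `appendPath` keeps `env=prod` intact and places `/ws/sdk` on the path.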

v1

Platos v1 launch

The first public release. Apache 2.0, self-hostable in five minutes, production-tested. Everything below ships in the v1 baseline. Subsequent releases stack on top with deltas.

  • ·Agent runtime: durable, versioned agent configs with system prompt, tools, memory, model, and budget caps composed into one record. Versioning, canary rollouts, and rollback first-class on the agent detail page.
  • ·Conversations and threads: chat surface with streaming token output, tool-call traces, attachments (images, PDFs), structured output, citations, and reasoning-token display per provider.
  • ·Postman mode: side-by-side conversation pipeline view for debugging tool calls, prompt assembly, and provider routing.
  • ·Sub-agents and clusters: agents can spawn agents and share memory and context across a cluster topology.
  • ·Skills: reusable agent behaviors with manifest-declared `required_env`. Claude-skills-format compatible. Import from URL.
  • ·Connected entities: external systems wired in via the platools SDK (TypeScript and Python). 200+ MCP tools across the federated surface.
  • ·Tool gateway: scope-aware routing, schema injection, per-tool ACL, audit log, retry waterfall.
  • ·Approvals: human-in-the-loop with edit-first flow (approved, approved-with-edits, denied), durable approvals via wait.forToken for hours-to-days SLA, JSON arg editor on the detail page, structured rate-limit response.
  • ·Background ops (BGOs): durable long-running operations spawned via `spawn_bgo`, schedulable via `schedule_bgo`. Built on trigger.dev.
  • ·Memory: three-tier system (short-term thread context, long-term per-user, knowledge graph). Async extraction. Per-user drill-down with breach detection. Memory ratings loop.
  • ·Observability: full trace per turn with cache, reasoning, and cost columns. ClickHouse spans store with first-class token columns. Prometheus metrics with kind labels.
  • ·Cost correctness: provider-aware cache discount factors (Anthropic 90 percent, OpenAI 50 percent, Google 75 percent). Reasoning-token billing. Idempotent retries.
  • ·Budget caps: per-agent, per-user, scope-wide, skill-tier. TOCTOU-safe via reservation pattern. Threshold alerts at 50, 80, and 100 percent.
  • ·Governance: rate limits, safety events, breach detection. Per-user consumption summary on the Users monitoring tab. Currently-breached users panel on Governance.
  • ·Encryption: AES-256-GCM at-rest for conversations behind `PLATOS_MESSAGE_ENCRYPTION_KEY`. Encrypted secret store for provider keys.
  • ·MCP gateway: federates four tool families (entity tools mirrored, trigger meta-tools, Platos skills, control plane). OAuth 2.1 with Dynamic Client Registration for scoped tools. PAT bearer tokens for service accounts.
  • ·Public docs MCP: unauthenticated read-only catalog at `mcp.platos.dev/mcp`. Install it in Claude Code, Cursor, or any MCP client with one command.
  • ·Multi-key provider support: N keys per provider via numbered env-var variants. Anthropic, OpenAI, Google, Vertex, plus OpenAI-compatible providers (Groq, DeepSeek, Together, Fireworks, others).
  • ·BYOK across providers: provider keys live in env vars; the Providers page shows linked status and runs health checks.
  • ·Webhooks: subscribe to conversation events from any external system.
  • ·OpenAPI: public REST surface documented and machine-readable.
  • ·Engine layer: durable run engine V2, schedules, queues, deployments, batches, waitpoints. Surfaced in the dashboard for visibility; full reference at trigger.dev/docs.
  • ·Self-host: one docker compose file. Postgres, ClickHouse, Redis, MinIO, agent, webapp. Five-minute install on a 4 GB VPS.
  • ·Reference entity backend at `references/entity-hello-world` for the OSS onboarding sample.
  • ·Docs and guides: 47 docs across platform, engine, governance, observability, and DX tiers. 28 guides across getting-started, integrations, recipes, and troubleshooting.
  • ·Public REST API at `/api/v1/public` for docs, guides, and search. ISR-cached for 10-minute propagation.
  • ·Talk to Platos: in-product chat agent backed by the docs MCP, indexes 75 markdown surfaces with question-first search.
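The provider-aware cache discount factors listed under cost correctness can be made concrete with a small calculation. The three percentages come from the list above; everything else is an assumption for illustration, in particular that cached input tokens are billed at (1 - discount) of the normal input price, and the per-token price used in the usage note is invented.

```typescript
type Provider = "anthropic" | "openai" | "google";

// Discount factors as listed in the release note.
const CACHE_DISCOUNT: Record<Provider, number> = {
  anthropic: 0.9,
  openai: 0.5,
  google: 0.75,
};

// Assumption: cached input tokens cost (1 - discount) of the normal price.
function inputCost(
  provider: Provider,
  freshTokens: number,
  cachedTokens: number,
  pricePerToken: number,
): number {
  const cachedRate = pricePerToken * (1 - CACHE_DISCOUNT[provider]);
  return freshTokens * pricePerToken + cachedTokens * cachedRate;
}
```

Under these assumptions, 1000 fresh and 1000 cached tokens at an invented price of 0.002 per token cost about 3.0 with the 50 percent factor and about 2.2 with the 90 percent factor, which is why the factor has to be chosen per provider.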
