Experiments as Deployments

Run experiments on your own hardware — dispatched through Actions, results read back. The agent never holds the keys.

You already ship two things through GitHub Actions: a frontend website and a backend API. This adds a third target, local experiment runs, so an agent can launch experiments on a machine you control and read the results back without ever holding the runner's keys. It is an extension of the Warp 2 safety spine, not a new mechanism.

Three kinds of deployment

Everything ships through GitHub Actions. The agent triggers work but never holds the keys — the pipeline does. There are three deploy targets:

Frontend website — triggered by merge/Actions, runs on a CDN or hosting (Cloudflare, AWS), and produces a deployed, response-only site.

Backend API — triggered by merge/Actions, runs on a trusted backend service, and produces a running service with DB access.

Local experiment run — triggered by workflow_dispatch (agent or human), runs on a runner you choose (GitHub-hosted, or self-hosted on your own hardware), and produces result files the agent reads back.

An experiment run is the backend-API-shaped one: it is invoked, runs to completion on the chosen host, and emits artifacts rather than serving live traffic. The key property across all three: the agent stays one step removed from the credentials. It opens PRs and dispatches workflows; Actions holds the secrets and the runner registration.

What this fits — the dispatchable slice

This path is for dispatchable, non-interactive experiment runs: work you can express as a command with inputs, fire off, and read the artifacts from afterward. That's exactly the slice that benefits from batched async delegation: queue a set of runs, approve them once, let the agent dispatch them, and come back to results.

This is not a replacement for hands-on remote control of interactive hardware work — live GPU debugging, GUI-driven EDA tools, or any run where a human has to watch and interpret as it goes. Those stay on remote control. The two coexist: use this for the dispatchable slice, remote control for the interactive slice.

Triggering an experiment run

Use a GitHub Actions workflow_dispatch trigger: a manual or programmatic dispatch that takes inputs for the things you'll vary between runs (hyperparameters, dataset path, config). The agent (or a human) dispatches it; the run lands on whichever host you've pointed it at. A minimal, illustrative template (provider-agnostic; adapt it, don't ship it as-is):

# .github/workflows/experiment.yml
name: experiment
on:
  workflow_dispatch:
    inputs:
      run_name:    { description: "Label for this run", type: string, required: true }
      lr:          { description: "Learning rate",       type: string, default: "3e-4" }
      dataset:     { description: "Dataset id or path",  type: string, required: true }

jobs:
  run:
    # GitHub-hosted: `runs-on: ubuntu-latest`
    # Self-hosted on YOUR hardware: target your runner's labels, e.g.:
    runs-on: [self-hosted, gpu, lab-box]
    steps:
      - uses: actions/checkout@v4
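      - name: Run experiment
        # Pass inputs through env so their values are never spliced into the script text.
        env:
          RUN_NAME: ${{ inputs.run_name }}
          LR: ${{ inputs.lr }}
          DATASET: ${{ inputs.dataset }}
        run: |
          ./run_experiment.sh \
            --name "$RUN_NAME" \
            --lr "$LR" \
            --dataset "$DATASET" \
            --out ./results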
      # …then an upload step — see "Where results go" below.
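
One way to fire the template above from a terminal, or from an agent's tool call, is the GitHub CLI. The sketch below assumes `gh` is installed and authenticated against the repo that owns the workflow. The function name `dispatch_experiment`, the `DRY_RUN` switch, and the example values are illustrative only.

```shell
#!/bin/sh
# Sketch: dispatch the "experiment" workflow through the GitHub CLI.
# Assumes `gh` is installed and logged in for the repo that holds experiment.yml.
# Set DRY_RUN=1 to print the command instead of running it.

dispatch_experiment() {
  run_name=$1
  dataset=$2
  lr=${3:-3e-4}   # same default as the workflow's `lr` input

  # Each -f flag sets one workflow_dispatch input as a raw string.
  set -- gh workflow run experiment.yml \
    -f "run_name=$run_name" \
    -f "lr=$lr" \
    -f "dataset=$dataset"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Called as `DRY_RUN=1 dispatch_experiment baseline-01 data/train.csv`, this prints `gh workflow run experiment.yml -f run_name=baseline-01 -f lr=3e-4 -f dataset=data/train.csv`. Drop `DRY_RUN` to dispatch for real, and add `--ref <branch>` to the command if the workflow should run from a branch other than the default.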

GitHub-hosted vs. self-hosted runners

The host choice is yours:

GitHub-hosted runner: simplest. Good for CPU-bound or light experiments where you don't need special hardware. Zero machines to maintain.

Self-hosted runner on your own/specific hardware: register your GPU rig, lab box, or licensed-software machine as a self-hosted runner so dispatched runs execute on that machine. This is the key to "run on my own hardware": the job lands on your box, reads your local dataset, and uses your GPU, while the agent still only ever dispatches the workflow.
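
Registration is a one-time step that a human performs on the machine itself. As a rough sketch, assuming the GitHub Actions runner package is already unpacked in the current directory and you have copied a registration token from the repo's runner settings page, it might look like this. The function name, the `DRY_RUN` switch, and the default labels are illustrative; the labels should match whatever your workflow lists under `runs-on`.

```shell
#!/bin/sh
# Sketch: register this machine as a self-hosted runner, then start it.
# Run from the directory where the Actions runner package was unpacked.
# The repo URL and registration token are supplied by you; nothing here is agent-facing.

register_runner() {
  repo_url=$1
  reg_token=$2
  labels=${3:-gpu,lab-box}   # the "self-hosted" label is added automatically

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Mask the token so a dry run never echoes it.
    printf '%s\n' "./config.sh --unattended --url $repo_url --token *** --labels $labels"
    return 0
  fi

  ./config.sh --unattended \
    --url "$repo_url" \
    --token "$reg_token" \
    --labels "$labels" &&
    ./run.sh   # starts the runner in the foreground
}
```

The token is used once at registration and stays on the box. The agent never sees it, which is the point of routing runs through dispatch.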

Where results go

The run produces files; where they land is up to you. There is no single mandated default — pick by how tightly you want the agent integrated vs. how much extra infra you're willing to stand up.

CloudClawer file storage — push the result files into your CloudClawer namespace so the agent reads them back via its MCP file tools. Tightest integration: results land exactly where the agent already looks.

Google Drive — add a Drive upload step. Easy for humans and agents to browse, and convenient if your team already lives in Drive. Slightly looser coupling.

GitHub Actions artifacts (actions/upload-artifact) — zero extra infra, results attach straight to the run. The catch: artifacts are ephemeral / retention-limited (they expire) — good for quick iteration, not a durable record.

Many teams mix these — artifacts for fast inner-loop iteration, plus a durable copy to CloudClawer storage or Drive for the runs worth keeping.
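
For the artifacts option, reading results back can be sketched with the GitHub CLI. This assumes the workflow's upload step named its artifact `results`, which is a hypothetical name, and that `gh` is authenticated for the repo. The function names and the `DRY_RUN` switch are illustrative.

```shell
#!/bin/sh
# Sketch: download a finished run's artifact so its files can be read.
# Assumes the upload step used the artifact name "results" (hypothetical).

latest_run_id() {
  # Id of the most recent run of the experiment workflow, whatever its status.
  gh run list --workflow experiment.yml --limit 1 \
    --json databaseId --jq '.[0].databaseId'
}

fetch_results() {
  run_id=$1
  dest=${2:-./results/$run_id}   # one directory per run keeps results separate

  set -- gh run download "$run_id" --name results --dir "$dest"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

A typical call would be `fetch_results "$(latest_run_id)"`. Because artifacts expire, copy anything worth keeping to durable storage once it has been downloaded.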

The public-upstream + private-mirror pattern

If your experiment code lives in a public repo, you must not put your secrets, environments, or self-hosted-runner registration in it. The pattern that solves this is a public upstream + per-customer private mirror:

Public upstream — the shared/open repo. Holds the experiment code and workflow definitions. No secrets, ever.

Private mirror — one per customer. A private copy that holds your own Actions secrets / environments and your self-hosted-runner registration. Dispatched runs and secret-bearing jobs happen here, on your hardware. Secrets live only in the private mirror.

The two streams stay in sync in two directions. Sync DOWN — pull public → private (routine):

git remote add upstream <public-repo-url>   # one-time
git fetch upstream
git merge upstream/main                       # or: git rebase upstream/main
git push origin main                          # update your private mirror

Sync UP — push private → public (occasional): contribute non-secret improvements back upstream via a PR from a clean branch or a cherry-pick of the specific non-secret commits — deliberately excluding anything secret. Never push a branch that contains customer config or credentials to the public remote.
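
The clean-branch route for syncing up can be sketched as follows. It assumes the `upstream` remote from the sync-down commands and a public branch named `main`. The function name and the branch name in the usage note are illustrative.

```shell
#!/bin/sh
# Sketch: build a clean branch for an upstream PR from hand-picked commits.
# Assumes a remote named `upstream` whose public branch is `main`.

prepare_upstream_branch() {
  branch=$1
  shift   # remaining arguments: the non-secret commits to contribute

  git fetch upstream || return 1

  # Start from public main so no private history is on the branch.
  git checkout --no-track -b "$branch" upstream/main || return 1

  # Bring over only the named commits.
  git cherry-pick "$@" || return 1

  # Show exactly which files would become public; review before pushing.
  git diff --stat upstream/main...HEAD
}
```

For example, `prepare_upstream_branch contrib/fix-loader <commit-sha>` leaves you on a branch that contains public `main` plus that one commit. Pushing it and opening the PR is left as a deliberate manual step.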

Guardrails — do these before the first dispatch:

Add every local secret file to .gitignore so it can't be committed.

Store secrets only as GitHub Actions secrets / environments in the private mirror — not in tracked files, not in the public repo.

Enable secret scanning and branch protection on both repos.

Merge down frequently (small, regular pulls beat one giant catch-up merge), and keep customer-specific config in clearly separated files (e.g. a private/ directory or *.local.* files) so it never rides an upstream PR.
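
The first guardrail can be checked mechanically. The sketch below, meant to be run from the root of the private mirror, appends ignore rules and then fails if any tracked file matches them. The three patterns and both function names are examples, not a complete or recommended list.

```shell
#!/bin/sh
# Sketch: two pre-dispatch checks for a repo that must not commit secret files.
# The patterns are examples only; list whatever secret files your setup produces.

ignore_secret_files() {
  for pattern in '.env' '*.pem' '*.key'; do
    # Append each pattern only once, so the function is safe to re-run.
    grep -qxF -- "$pattern" .gitignore 2>/dev/null ||
      printf '%s\n' "$pattern" >> .gitignore
  done
}

check_no_tracked_secrets() {
  # Lists files that are tracked even though an ignore rule matches them.
  tracked=$(git ls-files -c -i --exclude-standard)
  if [ -n "$tracked" ]; then
    printf 'tracked files match ignore rules:\n%s\n' "$tracked" >&2
    return 1
  fi
}
```

Running `ignore_secret_files && check_no_tracked_secrets` before the first dispatch catches a secret file that was committed before its ignore rule existed. It complements secret scanning and does not replace it.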

Safety-spine alignment

This is the same Warp 2 contract as production deploys — just a third target:

GitHub Actions is the vault. The runner's secrets and self-hosted registration live in Actions (in the private mirror), not in the agent's hands.

Access is granted indirectly. The agent dispatches a workflow; it never gets SSH onto your hardware. Revoke access any time by rotating the Actions secret or de-registering the runner — no key ever leaks to the agent.

The agent reads results read-only. It consumes the result files after the fact; it never holds the runner's credentials and never executes on the box directly.

If you've already built the Warp 2 safety spine for prod, you've already done the hard part — experiment runs slot straight into it. Same vault, same indirect access, same read-only consumption; only the deploy target is new.

See it in context

Warp 2 — Assisted & Auditable →

The level where the safety spine these runs ride is built

Credentials & the Safety Spine →

GitHub Actions as the vault, indirect access, and data-loss guards
