burrow/services/forgejo-nsc/README.md at 283209d3649ceb625e12edb19a7606c822066c18

hackclub/burrow

Fork 0

Conrad Kramer 283209d364

Build Apple / Build App (macOS) (push) Waiting to run

Details

Build Apple / Build App (iOS Simulator) (push) Has started running

Details

Build Site / Next.js Build (push) Successful in 2m1s

Details

Build Rust / Cargo Test (push) Has been cancelled

Details

Harden macos runner cleanup

2026-03-19 14:01:37 -07:00

8.5 KiB

Raw Blame History

forgejo-nsc-dispatcher

This service exposes a simple HTTP API that tells Namespace Cloud to start ephemeral Forgejo Actions runners on demand. It glues together three pieces:

Forgejo Actions – the service requests a scoped registration token for the repository/organization/instance where you want to run jobs.
Namespace (nsc) – the dispatcher shells out to the nsc CLI to create a short‑lived environment, runs the forgejo-runner container inside it, and exits after a single job (forgejo-runner one-job). The Namespace TTL is the hard cap, not the typical lifetime.
Your automation – you call the service via HTTP (directly, through Caddy, via Forgejo webhooks, etc.) whenever a new runner is needed.

Directory layout

.
├── cmd/forgejo-nsc-dispatcher   # main entry point
├── internal/                    # service packages (config, forgejo client, nsc dispatcher, HTTP server)
├── config.example.yaml          # starter config referenced by README
├── flake.nix / flake.lock       # reproducible builds (Go binary + container image)
└── .forgejo/workflows           # CI that runs go test/build and publishes manifests

Configuration

Copy config.example.yaml and update it for your Forgejo instance and Namespace profile. The important knobs are:

forgejo.base_url – HTTPS endpoint of your Forgejo server. A PAT with actions:runner scope is required in forgejo.token.
forgejo.instance_url – URL that spawned runners use to register back to Forgejo. This must be reachable from the runner (typically the public URL like https://git.burrow.net). On the forge host it commonly differs from base_url (which may be http://127.0.0.1:3000).
forgejo.default_scope – where new runners register (instance, organization, or repository).
forgejo.default_labels – labels applied to every spawned runner. GateForge workflows via runs-on: ["namespace-profile-linux-medium"] (or other namespace-profile-linux-* labels).
namespace.nsc_binary – path to the nsc binary (the Nix container ships one compiled from namespacelabs/foundation so /app/bin/nsc works out of the box).
namespace.image – OCI image containing forgejo-runner.
namespace.machine_type / namespace.duration – shape + TTL for the ephemeral Namespace environment. The dispatcher destroys the instance after a job so the TTL acts as a hard cap, not an idle timeout.
macOS fallback launches still use nsc create, but bootstrap runs over the Compute SSH config endpoint instead of nsc ssh so the dispatcher can always destroy the instance itself instead of relying on a websocket SSH proxy handoff.
namespace.linux_cache_* / namespace.macos_cache_* – persistent cache volumes mounted into runners so Linux can keep /nix plus shared build caches warm and macOS can reuse Rust toolchains, Xcode package caches, and lane-local derived data. If Namespace keeps reusing an older undersized cache volume, bump the cache tag name to force a fresh allocation at the new size.

Running locally

# Ensure nsc is available (e.g. `go build ./foundation/cmd/nsc`)
cp config.example.yaml config.yaml
nix develop   # optional dev shell with Go toolchain
go run ./cmd/forgejo-nsc-dispatcher --config config.yaml

API example:

curl -X POST http://localhost:8080/api/v1/dispatch \
  -H 'Content-Type: application/json' \
  -d '{
    "count": 1,
    "ttl": "20m",
    "labels": ["namespace-profile-linux-medium"],
    "scope": {"level": "repository", "owner": "example", "name": "app"}
  }'

Deploying with Nix + GHCR

nix build .#packages.x86_64-linux.container-amd64 produces a deterministic tarball containing the service, the nsc binary, BusyBox, and forgejo-runner.
The included Build Container workflow builds both amd64 and arm64 images on Namespace runners and pushes them to ghcr.io/<owner>/<repo>. No Fly.io manifests are emitted – the multi‑arch manifest points only at GHCR.

How this fits behind Caddy (last-mile networking)

The dispatcher is just an HTTP server. You can:

Run it anywhere that can reach Forgejo and Namespace: bare metal, Namespace cluster, Kubernetes, Fly, etc.

Put Caddy (or any reverse proxy) in front to terminate TLS, do auth, or rewrite URLs. For example:

forgejo-dispatcher.example.com {
  reverse_proxy 127.0.0.1:8080
  basicauth /api/* {
    user JDJhJDE...
  }
}

The service doesn’t assume Caddy, nor does it manipulate HTTP clients directly – it simply waits for POST requests. As long as the dispatcher can reach Forgejo’s REST API and run the nsc binary, you can drop it anywhere.

Autoscaling (webhook + poller)

If you don’t want to call /api/v1/dispatch manually, there’s a companion autoscaler (cmd/forgejo-nsc-autoscaler) that watches Forgejo job queues and triggers the dispatcher for you. It operates in two modes simultaneously:

Polling – every instance polls GET /api/v1/.../actions/runners to keep a minimum number of idle Namespace runners per label. This continues until a webhook is successfully processed, so the system is self-bootstrapping.
Webhooks – once Forgejo reaches the autoscaler via the /webhook/{name} endpoint, the autoscaler stops polling and reacts to workflow_job events in real time. Each payload is mapped to a target label set and results in a dispatch call.

You can manage multiple Forgejo instances by listing them under instances in autoscaler.example.yaml:

listen: ":8090"
dispatcher:
  url: "http://dispatcher:8080"

instances:
- name: burrow
  forgejo:
    base_url: "https://git.burrow.net"
    token: "PENDING-FORGEJO-PAT"
  scope:
    level: "repository"
    owner: "hackclub"
    name: "burrow"
    disable_polling: true   # webhook-only mode
    poll_interval: "30s"
    webhook_secret: "supersecret"
    webhook:
      url: "https://nsc-autoscaler.burrow.net/webhook/burrow"
      content_type: "json"
      events: ["workflow_job"]
      active: true
    targets:
      - labels: ["namespace-profile-linux-medium"]
        min_idle: 0  # set to 0 to scale-to-zero between jobs
        ttl: "20m"
      - labels: ["namespace-profile-macos-large"]
        min_idle: 0
        ttl: "90m"
        machine_type: "6x14"
      - labels: ["namespace-profile-windows-large"]
        min_idle: 0
        ttl: "45m"
        machine_type: "windows/amd64:8x16"

For Burrow, use Scripts/provision-forgejo-nsc.sh to mint the Forgejo PAT, generate a Namespace token from the logged-in Namespace account, and refresh secrets/forgejo/{nsc-token,nsc-dispatcher-config,nsc-autoscaler-config}.age. The token file is emitted as JSON with a long-lived session_token plus the current bearer_token. The nsc CLI paths use the session-backed login flow, while the Compute API path can consume the bearer token directly. The forge host consumes the encrypted secrets through agenix; avoid keeping local plaintext intake/ copies around.

Long-lived runtime state is now sourced from age-encrypted files:

secrets/forgejo/admin-password.age
secrets/forgejo/agent-ssh-key.age
secrets/forgejo/nsc-token.age
secrets/forgejo/nsc-dispatcher-config.age
secrets/forgejo/nsc-autoscaler-config.age

After refreshing the encrypted secrets, deploy the forge host so config.age.secrets.* updates the live paths for services.burrow.forge, services.burrow.forgeRunner, and services.burrow.forgejoNsc. The Nix host module also installs a periodic forgejo-prune-runners timer that marks stale offline runners deleted in Forgejo's database so wedged instances do not leave the queue polluted indefinitely.

Run it next to the dispatcher:

go run ./cmd/forgejo-nsc-autoscaler --config autoscaler.yaml
# or build the binary/container via `nix build .#forgejo-nsc-autoscaler`

If your Forgejo build doesn’t expose the runner listing API, set disable_polling: true and rely on webhook entries. The autoscaler will auto-create/update the webhook (using the PAT) so that new workflow_job events immediately call the dispatcher even if the service isn’t publicly reachable yet.

In Forgejo add a webhook pointing to https://nsc-autoscaler.burrow.net/webhook/burrow with the shared secret (or let the autoscaler create it by specifying webhook.url in config). The autoscaler continues polling until it receives the first valid webhook (unless disabled), so you get capacity immediately even if outbound webhooks from Forgejo aren’t yet configured.

8.5 KiB Raw Blame History Unescape Escape

forgejo-nsc-dispatcher

Directory layout

Configuration

Running locally

Deploying with Nix + GHCR

How this fits behind Caddy (last-mile networking)

Autoscaling (webhook + poller)

8.5 KiB

Raw Blame History