Designing "Explore" Mode: Define by Entry, Not by Fences

I run a small multi-agent dev setup (my-ai-team) with three session modes: team (planner/developer/reviewer relay), adhoc (one agent owns the whole delivery cycle), and explore. The explore mode is still an RFC — we haven't shipped it yet. But the design already taught us something useful about how to scope agent autonomy.

The first draft was all negatives

The obvious way to define an investigation-only mode is to list what it can't do: no issues, no branches, no PRs, report findings and stop. That works — until it doesn't.

The problem: there's always a case the fence didn't anticipate. If poking at a bug turns up a one-line fix, the rules force an awkward choice between "break protocol and just fix it" and "hand off to a separate session for a trivial change." Neither is right.

Define by entry condition instead

The fix was to redefine explore by how it starts, not what it's forbidden from doing. Team and adhoc carry a delivery obligation from the first message — they begin with a predefined task. Explore enters from an open topic with an uncertain goal. That's the only difference.

That single shift dissolves the fence. The mode now has several legitimate exits — adjourn with findings, park a ticket for later, or hand work off to delivery — and none of them needs a special-case rule, because none contradicts the definition.

The gate is the ticket, not the PR

The hard part of any "investigation that might turn into work" mode is the autonomy boundary: when is the agent allowed to commit to real consequences?

The answer turned out to be a primitive we already had. Opening a ticket is a serious act. A ticket without a blocked label is a clear signal that delivery may run all the way to merge. A ticket with blocked means "real, but the timing isn't ripe." So the gate isn't a new checkpoint bolted on before the PR — it's the ticket's own blocked/non-blocked status. No new machinery, and the decision lives where the seriousness already lives.
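As a rough illustration of how small that gate is, here is a minimal sketch. The function name delivery_may_proceed and the comma-separated label format are assumptions made for the example, not part of my-ai-team; only the blocked label comes from the design above.

```bash
# Hypothetical helper: may delivery pick up this ticket?
# $1 = the ticket's labels as a comma-separated string (assumed format).
delivery_may_proceed() {
  case ",$1," in
    *,blocked,*) return 1 ;;  # real work, but the timing isn't ripe
    *)           return 0 ;;  # no blocked label: delivery may run to merge
  esac
}

if delivery_may_proceed "bug,parser"; then
  echo "hand off to delivery"
fi
if ! delivery_may_proceed "bug,blocked"; then
  echo "parked for later"
fi
```

The check reads nothing except the ticket itself, which is the point: no extra state has to be kept in sync with it.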

The ticket as a membrane

We also cut the scope so explore never writes delivery code. Its output ceiling is a (possibly newly created) repo plus one or more detailed tickets. Implementation always belongs to adhoc or team.

The ticket becomes a membrane: explore writes it, delivery consumes it, and no single prompt is asked to be both an investigation prompt and a delivery prompt. The detail written into the ticket carries the discussion's context across the handoff.

A nice side effect: no repo → no ticket → no PR. The repo is a precondition for a ticket, the ticket is the delivery gate, so a discussion that never produces a repo simply can't produce delivery. That's a valid ending, not an error to guard against.
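The chain above (repo, then ticket, then delivery) can be sketched as a single decision. The function name explore_exit and its argument encoding are hypothetical, chosen only for the example; the three outcomes are the exits described earlier.

```bash
# Hypothetical: map the end state of an explore session to one of its exits.
# $1 = "yes" or "no"                  (does a repo exist?)
# $2 = "none", "blocked" or "open"    (state of the ticket, if any)
explore_exit() {
  if [ "$1" != "yes" ] || [ "$2" = "none" ]; then
    echo "adjourn"   # findings only; nothing for delivery to pick up
  elif [ "$2" = "blocked" ]; then
    echo "park"      # ticket exists, but the timing isn't ripe
  else
    echo "handoff"   # unblocked ticket: delivery may take it to merge
  fi
}

explore_exit no none       # adjourn
explore_exit yes blocked   # park
explore_exit yes open      # handoff
```

Note that there is no error branch: a session without a repo falls through to adjourn, which matches treating that ending as valid rather than as a failure.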

Two smaller things worth keeping

Gate autonomy on reversibility and blast radius, not on who's human. The instinct was "two agents shouldn't land work without a person in the loop." But the honest axis is actor-neutral: an irreversible, wide-blast-radius action deserves a second check regardless of whether a human or an agent is driving it. Publishing a ticket, by contrast, is reversible and low-blast — it needs no special gate.

We designed explore by spending an hour inside it. The session had actually booted under the adhoc prompt, but because we kept the conversation open-ended and goal-uncertain, it behaved exactly like the mode we were inventing. It ended, fittingly, by publishing a single ticket and nothing more.

"Might could" sounds natural to you but it's a double modal in English

Non-native English speakers — especially those whose native language stacks modals freely — sometimes write things like:

"One thing we might could improve..."

This is a double modal. It occurs in some regional American dialects (notably in the Southern US) but is nonstandard, and in professional writing it reads as an error. Pick one modal: might or could.

The right choice depends on intent:

  • "One thing we could improve" — confident suggestion, implies you see a concrete area to fix. This is usually what you mean.
  • "One thing we might improve" — tentative, sounds like you're unsure whether improvement will happen at all. Weaker than you probably intend.
  • "One thing we might want to improve" — works, but adds unnecessary words. "Could" is cleaner.

The other trap in the same sentence: "the pipeline I re-run" vs "the pipeline I reran". "Re-run" is the present tense, the past participle ("I have re-run it"), or a noun; for a completed action in the simple past, it's reran.

The corrected sentence:

"One thing we could improve: even though the deployment failed, the pipeline I reran still reported 'All Good'."

Garbled text before PS1 over SSH? Your OSC sequences are leaking

You SSH into a machine and see garbage like ile://host/home/userurrentDir=/home/user before your prompt. Locally everything is fine. What happened?

Your PROMPT_COMMAND includes an OSC 7 / OSC 1337 sequence that reports the current directory to the terminal emulator. Locally, WezTerm (or iTerm2, etc.) intercepts these invisible escape sequences. Over SSH with TERM=linux, nothing on the receiving end interprets them, so the raw bytes print as text.
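For reference, a function of this kind might look like the sketch below. The body is an assumption (the real __report_cwd in your dotfiles may differ, and a careful version would percent-encode the path for OSC 7); it is here only to show which bytes end up on screen when nobody consumes them.

```bash
# Assumed shape of __report_cwd; not taken from any particular dotfiles repo.
__report_cwd() {
  # OSC 7:    ESC ] 7 ; file://HOST/PATH ESC \
  printf '\033]7;file://%s%s\033\\' "${HOSTNAME:-localhost}" "$PWD"
  # OSC 1337: ESC ] 1337 ; CurrentDir=PATH BEL   (iTerm2-style)
  printf '\033]1337;CurrentDir=%s\007' "$PWD"
}

# Make the normally invisible bytes visible: cat -v shows ESC as ^[ and BEL as ^G
__report_cwd | cat -v
echo
```

A terminal that understands these sequences swallows everything between the introducer and the terminator. One that doesn't prints some or all of the payload, which is where the file://... and CurrentDir=... fragments come from.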

The fix is a guard condition before wiring __report_cwd into PROMPT_COMMAND:

if [[ -n "${TERM_PROGRAM:-}" || "${TERM:-}" =~ xterm|screen|tmux|wezterm|alacritty ]]; then
  [[ ":$PROMPT_COMMAND:" != *__report_cwd* ]] && \
    PROMPT_COMMAND="__report_cwd${PROMPT_COMMAND:+; $PROMPT_COMMAND}"
fi

When TERM=linux (plain SSH) and TERM_PROGRAM is empty, the function is skipped entirely. No escape sequences, no garbage.
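To check the guard against a few TERM values without opening new sessions, the condition can be pulled into a predicate. This is a sketch: the name __term_handles_osc is made up for the example, and it uses case instead of [[ =~ ]] so that it also runs under plain sh. The substring patterns mirror the unanchored regex in the guard.

```bash
# Hypothetical predicate: same test as the guard, usable on its own.
__term_handles_osc() {
  [ -n "${TERM_PROGRAM:-}" ] && return 0
  case "${TERM:-}" in
    *xterm*|*screen*|*tmux*|*wezterm*|*alacritty*) return 0 ;;
  esac
  return 1
}

# Try it in subshells so the current session's TERM is left alone
for t in linux xterm-256color tmux-256color dumb; do
  if (TERM=$t; unset TERM_PROGRAM; __term_handles_osc); then
    echo "$t: report cwd"
  else
    echo "$t: skip"
  fi
done
```

With TERM_PROGRAM unset, linux and dumb are skipped while the xterm and tmux variants keep the hook.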

After deploying the fix, start a fresh shell — PROMPT_COMMAND is already set in the running session and won't be cleared by re-sourcing.

Short-lived tokens don't belong in the same file as long-lived credentials

You protect ~/.bashrc.secret with chmod 444 so nothing accidentally overwrites your API keys and passwords (chmod 400 would be tighter, since 444 leaves the file readable by every user on the machine). Then your token-refresh script needs to write a new session cookie somewhere, and it hits Permission denied.

The fix isn't adding a second writable secrets file. It's to stop persisting short-lived tokens entirely.

Refactor the refresh script to output TOKEN HASH on stdout and let the caller capture it:

# refresh_token reads the stored credentials and prints "TOKEN HASH" on stdout
CREDENTIALS=$(bash ~/bin/refresh_token 2>/dev/null)
# Split the two whitespace-separated fields into shell variables
read -r TOKEN HASH _ <<< "$CREDENTIALS"

No file writes. No second secrets layer. The token lives in the process environment for the duration of the operation and disappears when it's done.

This works because session tokens are cheap to re-obtain — one HTTP call with stored credentials. The only reason to persist them is avoiding that call, which is almost never worth the complexity of managing a writable token store alongside a read-only credential store.

The rule: if you can regenerate it from long-lived credentials in under a second, don't persist it.
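One way to make "disappears when it's done" literal is to scope the token to a single command. The sketch below is an assumption-laden illustration: refresh_token is a stand-in function with made-up values (the real script performs the HTTP call), and with_session is a hypothetical wrapper name.

```bash
# Stand-in for ~/bin/refresh_token: prints "TOKEN HASH" on stdout.
# The values are placeholders; the real script would call the auth endpoint.
refresh_token() {
  printf '%s %s\n' "tok_example" "hash_example"
}

# Hypothetical wrapper: fetch a fresh token, run one command with it, forget it.
# The body runs in a subshell ( ... ), so nothing leaks into the calling shell.
with_session() (
  creds=$(refresh_token 2>/dev/null) || exit 1
  read -r token hash _ <<EOF
$creds
EOF
  [ -n "$token" ] && [ -n "$hash" ] || exit 1
  # Export the pair only into the environment of the command being run
  TOKEN=$token HASH=$hash "$@"
)

with_session sh -c 'echo "token has ${#TOKEN} characters"'
echo "TOKEN in the calling shell: '${TOKEN:-}'"
```

Every call pays for one refresh, which is the trade the rule above accepts in exchange for never having a token store to protect.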

One Telegram Bot + Two Machines = Silent Message Loss

Running the same Telegram bot relay as a systemd service on multiple machines means both instances poll the same bot token. Telegram delivers each update to one consumer only — so messages vanish at random, or land on the wrong host. Both sides think they're healthy; neither is.

The fix: one bot per machine, one channel per bot. Each channel becomes a dedicated console for exactly one host. No coordination needed, no distributed locks, no split-brain.

To manage multiple bot tokens in a shared dotfiles repo, key them by hostname:

# dotfiles — all tokens in one file, encrypted as usual
export TG_TOKEN_homeserver="token_aaa"
export TG_TOKEN_vps="token_bbb"

# relay picks its own token at runtime (bash indirect expansion, no eval)
token_var="TG_TOKEN_$(hostname)"
export TG_TOKEN="${!token_var}"

Every machine runs the same service file unchanged — hostname resolves to the right token automatically. Adding a third machine is just a new bot, a new channel, and one more TG_TOKEN_<hostname> line.

TG Channel: homeserver  →  bot-A  →  machine 1
TG Channel: vps         →  bot-B  →  machine 2
TG Channel: pi          →  bot-C  →  machine 3
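Two things the TG_TOKEN_<hostname> scheme leaves open are hostnames that are not valid variable names (hyphens, dots) and hosts with no token at all. The sketch below shows one possible way to handle both. The helper name tg_token_for and the rule of mapping other characters to underscores are assumptions for the example, and printenv is used because the variables are exported.

```bash
# Hypothetical helper: print the token for a host, or fail loudly.
tg_token_for() {
  # Map anything outside [A-Za-z0-9_] to "_" (assumed naming convention)
  key="TG_TOKEN_$(printf '%s' "$1" | tr -c 'A-Za-z0-9_' '_')"
  printenv "$key" || {
    echo "no $key defined; refusing to start the relay" >&2
    return 1
  }
}

export TG_TOKEN_homeserver="token_aaa"
export TG_TOKEN_my_pi="token_ccc"

tg_token_for homeserver    # token_aaa
tg_token_for my-pi         # token_ccc
tg_token_for laptop || echo "relay not started"
```

Failing at startup matters here: a relay that starts with an empty or wrong token is exactly the "thinks it's healthy" state this setup is meant to avoid.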