Validator Troubleshooting
Common issues and solutions for running the Minotaur validator — Anvil, subtensor, leader election, consensus, peer discovery, scoring, weights, and Docker.
Each section starts with the symptom you’ll see, then walks through the checks that resolve it.
Anvil Connection Issues
Symptom: Simulation failures, connection refused errors, or plans that never score.
Verify ANVIL_RPC_URL is set and reachable:
curl -X POST -H "Content-Type: application/json" \
--data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
$ANVIL_RPC_URL
If using Alchemy or Infura on the upstream (ETH_UPSTREAM_RPC_URL / BASE_UPSTREAM_RPC_URL), confirm your API key is valid and has not exceeded rate limits.
For local testnet, ensure the Anvil container is healthy:
docker compose ps anvil
Check that each Anvil fork is on the expected chain (ETH = 1, Base = 8453, BT EVM = 964, local Anvil = 31337).
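One way to confirm the chain IDs listed above is to send the standard eth_chainId JSON-RPC call to each fork and compare the hex result with the expected value. The sketch below only covers the comparison step; the fork names used as dictionary keys ("eth", "base", "btevm", "local") are hypothetical labels chosen for illustration, not names from the validator's configuration.

```python
# Expected chain IDs, taken from the list above. The keys are illustrative labels.
EXPECTED_CHAIN_IDS = {"eth": 1, "base": 8453, "btevm": 964, "local": 31337}


def chain_id_from_response(response):
    """Decode the hex string returned by eth_chainId into an integer."""
    return int(response["result"], 16)


def check_fork(name, response):
    """Return (ok, message) for one fork's eth_chainId response."""
    expected = EXPECTED_CHAIN_IDS[name]
    actual = chain_id_from_response(response)
    if actual == expected:
        return True, f"{name}: chain id {actual} as expected"
    return False, f"{name}: expected chain id {expected}, got {actual}"


if __name__ == "__main__":
    # Example payload in the shape an eth_chainId call returns.
    sample = {"jsonrpc": "2.0", "id": 1, "result": "0x2105"}
    print(check_fork("base", sample))
```

To use it against a live fork, swap eth_blockNumber for eth_chainId in the curl command above and pass the parsed JSON response to check_fork.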
Symptom: Simulations hang or time out.
Anvil has a 2-second block time by default. The simulator calls evm_mine after sending transactions. If the upstream RPC is slow, simulations may time out. Increase logging to DEBUG to see simulation details:
export LOG_LEVEL=DEBUG
Subtensor Sync Issues
Symptom: Validator cannot read metagraph, weight emission fails, or connection refused to subtensor.
Verify SUBTENSOR_URL is correct:
- Mainnet: wss://entrypoint-finney.opentensor.ai:443
- Local testnet: ws://localhost:9944
Test the connection:
# Using wscat
wscat -c "$SUBTENSOR_URL" -x '{"id":1,"jsonrpc":"2.0","method":"system_health","params":[]}'
For local testnet, confirm the subtensor container is running and healthy:
docker compose ps subtensor
Archive subtensor nodes can lag 20+ hours behind. For chain-head queries (block hashes, current block), finney is preferred.
Leader Election
Symptom: Validator is running but not processing orders (BlockLoop idle).
Only the leader runs the BlockLoop. Check if this validator is the leader:
curl http://localhost:9100/leader
The leader is the validator with the highest TAO stake on subnet 112 (ties broken by hotkey lexicographic order). If you have less stake than other validators, you will be a follower. For local testnet or development, set FORCE_LEADER=1 to bypass stake-based election.
On leader change, all in-flight work is dropped and the new leader reprocesses from scratch. This is expected behavior — don’t try to “fix” it.
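The election rule described above (highest stake wins, ties broken by hotkey order) can be sketched as a pure function. This is an illustration, not the validator's actual code: the function name and input shape are hypothetical, and it assumes that the lexicographically smallest hotkey wins a tie, which the text above does not specify.

```python
def elect_leader(stakes):
    """Pick a leader from a mapping of hotkey -> TAO stake.

    Highest stake wins. On a tie, the lexicographically smallest hotkey
    wins (assumption: the tie-break direction is not stated in the docs).
    Returns None when there are no candidates.
    """
    if not stakes:
        return None
    # Sort key: larger stake first (negated), then hotkey ascending.
    return min(stakes, key=lambda hotkey: (-stakes[hotkey], hotkey))


if __name__ == "__main__":
    candidates = {"5Bob": 120.0, "5Alice": 120.0, "5Carol": 40.0}
    print(elect_leader(candidates))
```

Because every validator evaluates the same rule over the same metagraph data, each one can decide locally whether it is the leader without exchanging messages.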
Consensus Failures
Symptom: Plans are scored but never relayed on-chain. Quorum not reached in logs.
Check consensus configuration and discovered peers:
curl http://localhost:9100/consensus/info
The peers field should be non-empty; if it’s empty, see No peers discovered below. Then verify:
- Discovered peer axon URLs are reachable from this validator (network / firewall rules)
- Check the live on-chain quorum: make get-quorum-base (or cast call $VALIDATOR_REGISTRY 'quorumBps()(uint256)' --rpc-url $BASE_RPC_URL). The default 6666 is 2-of-3 BFT. See Quorum Management for changing it.
- VALIDATOR_PRIVATE_KEY is set and valid (it is used to sign EIP-712 consensus messages)
- Followers independently re-simulate and re-score; if a follower’s scores don’t both exceed threshold, it won’t sign
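To see why a quorumBps value of 6666 corresponds to 2-of-3, it helps to work through the basis-point arithmetic. The sketch below assumes the required signature count is the validator count times quorumBps divided by 10,000, rounded up; the rounding rule is an assumption that matches the 2-of-3 figure above, and the on-chain registry may compute it differently.

```python
def required_signatures(validator_count, quorum_bps=6666):
    """Signatures needed for quorum, assuming ceil(count * bps / 10000).

    The rounding rule is an assumption; check the ValidatorRegistry
    contract for the authoritative calculation.
    """
    if validator_count <= 0:
        return 0
    # Integer ceiling division without floating point.
    return -(-validator_count * quorum_bps // 10_000)


if __name__ == "__main__":
    for count in (1, 3, 4, 7):
        print(count, "validators ->", required_signatures(count), "signatures")
```

With three validators this gives 3 * 6666 / 10000 = 1.9998, which rounds up to 2. If fewer than that many peers are reachable and willing to sign, the logs will show quorum not reached.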
Subnet-team operators only: if ORDER_CONSENSUS_PEERS or CHAMPION_CONSENSUS_PEERS is set (the named manual override, used in our prod where metagraph axons aren’t published yet), confirm the addr@url format is correct. Third-party validators should leave both unset and let discovery handle it.
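A quick format check for the manual override can catch typos before the daemon starts. The sketch below is hypothetical: it assumes entries are comma-separated, that addr is a 0x-prefixed 20-byte EVM address, and that url is an http or https URL. Only the addr@url shape comes from the text above; the separator and the other constraints are assumptions.

```python
import re

# Assumption: addr is a 0x-prefixed, 40-hex-digit EVM address.
ADDRESS_RE = re.compile(r"^0x[0-9a-fA-F]{40}$")


def parse_peers(value):
    """Parse 'addr@url,addr@url' into a list of (addr, url) tuples.

    Raises ValueError on the first malformed entry. The comma separator
    is an assumption, not something the docs confirm.
    """
    peers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        addr, sep, url = entry.partition("@")
        if not sep:
            raise ValueError(f"missing '@' in peer entry: {entry!r}")
        if not ADDRESS_RE.match(addr):
            raise ValueError(f"not an EVM address: {addr!r}")
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an http(s) URL: {url!r}")
        peers.append((addr, url))
    return peers


if __name__ == "__main__":
    sample = "0x" + "ab" * 20 + "@https://validator.example:9100"
    print(parse_peers(sample))
```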
For local testnet only: set QUORUM_BPS_OVERRIDE to force a local value without going through the registry. Production deployments should leave it unset.
No peers discovered
Symptom: curl http://localhost:9100/consensus/info returns an empty peers list, or the daemon log says ProtocolConfig: peer discovery: probed 0 candidates → 0 verified.
Discovery requires four things to line up. Check each in order:
- Your VALIDATOR_AXON_URL is set and reachable
  From any other host:
  curl $VALIDATOR_AXON_URL/identity
  Should return a JSON payload, not a 503. If 503, the daemon is missing one of: bittensor wallet (no my_hotkey), VALIDATOR_AXON_URL env, or a signing key.
- Your hotkey is on the metagraph with the correct axon URL
  btcli subnet metagraph --netuid 112 --subtensor.network finney
  Find your hotkey. The axon column must match VALIDATOR_AXON_URL. If wrong, re-run btcli to update it (or your bittensor wallet’s serve_axon runner).
- Your EVM signing address is in the on-chain ValidatorRegistry
  cast call $VALIDATOR_REGISTRY 'isValidator(address)(bool)' 0xYourEvmAddress --rpc-url $RPC_URL
  Must return true on every chain you operate on. If false, see validator quickstart Step 4 for the handshake with the registry owner.
- Other validators' /identity endpoints are reachable from your host
  curl <their-axon-url>/identity
  If unreachable, it’s a network issue (firewall, NAT).
Symptom: identity probe returns valid JSON but discovery still rejects it. Check the daemon logs for Identity probe ... recovered EVM X but it is not in ValidatorRegistry.getValidators() — rejecting. That’s the on-chain handshake (step 3) for the other validator — they need to be added to the registry too.
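The /identity checks above (your own endpoint and your peers' endpoints) can be scripted so that each URL gets a one-line diagnosis. This is a minimal sketch using only the standard library. The function names are hypothetical, and it assumes a healthy endpoint answers with HTTP 200 and a JSON body; the text above only states that a 503 indicates a missing wallet, VALIDATOR_AXON_URL, or signing key.

```python
import json
import urllib.error
import urllib.request


def classify_identity_response(status, body):
    """Map an HTTP status and body from /identity to a short diagnosis."""
    if status == 503:
        return "daemon not ready: check wallet, VALIDATOR_AXON_URL, signing key"
    if status != 200:  # assumption: success is a plain 200
        return f"unexpected HTTP status {status}"
    try:
        json.loads(body)
    except ValueError:
        return "response is not JSON"
    return "ok"


def probe_identity(axon_url, timeout=5.0):
    """Fetch <axon_url>/identity and classify the result."""
    url = axon_url.rstrip("/") + "/identity"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return classify_identity_response(resp.status, body)
    except urllib.error.HTTPError as err:
        # Non-2xx responses arrive as exceptions; classify by status code.
        return classify_identity_response(err.code, "")
    except (urllib.error.URLError, OSError) as err:
        return f"unreachable (firewall, NAT, or DNS?): {err}"


if __name__ == "__main__":
    import sys
    for axon in sys.argv[1:]:
        print(axon, "->", probe_identity(axon))
```

Run it from a host outside your own network when checking your own VALIDATOR_AXON_URL, since reachability from localhost says little about what peers see.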
JS Scoring Engine Issues
Symptom: JS scores are always 0.0, NaN, or scoring errors in logs.
Verify Node.js 20.x is installed:
node --version # Should be v20.x
The JS engine runs app scoring code in a Node.js sandbox. Check that the app’s JS code exports the required functions:
module.exports = { config, manifest, score };
The score(plan, state, context) function receives:
- plan: ExecutionPlan dict (with metadata, interactions, etc.)
- state: Structured IntentState export with raw_params, control, and typed_context (plus compatibility extra / rawParams aliases)
- context: Full context with context.simulation (token transfers, gas, state changes), context.state, and context.oracle
Common mistake: Writing score(plan, simulation, state) — the second parameter is state, not simulation. Simulation data is in context.simulation.
Check the validator logs for JS execution errors. Enable debug logging:
export LOG_LEVEL=DEBUG
BlockLoop Not Processing Orders
Symptom: Orders are submitted but never processed.
- Confirm you are the leader
  See Leader Election above.
- Check BlockLoop status
  curl http://localhost:9100/blockloop/status
- Verify orders exist and are OPEN
  curl http://localhost:9100/orders
- Check that app definitions are loaded
  curl http://localhost:9100/health
  The response should show the number of loaded intents.
- Verify store path
  If using --store-path, verify the file exists and is readable.
- Review logs
  Look for errors during plan generation, simulation, or scoring.
Weight Emission Not Working
Symptom: Validator runs but never emits weights on-chain.
Weights are emitted once per epoch (default: 60 seconds, production: 1200s, configurable via --epoch-seconds). A champion miner must exist — weights are only emitted when a solver has been submitted and accepted:
curl http://localhost:9100/weights
curl http://localhost:9100/weights/history
Verify Bittensor wallet configuration:
- WALLET_NAME and HOTKEY_NAME must match a wallet with a registered hotkey on subnet 112.
- The wallet directory must be accessible (default: ~/.bittensor/wallets/).
On Bittensor 10.x, set_weights uses commit-reveal. The 60s --epoch-seconds default is too aggressive for mainnet — Bittensor’s weights_set_rate_limit (~100 blocks) will reject most attempts. Production deployments should use --epoch-seconds 1200.
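The 1200-second recommendation follows from the rate limit: about 100 blocks at Bittensor's nominal 12-second block time is 1200 seconds. The sketch below works through that arithmetic under a simplified model in which one attempt per rate-limit window succeeds and every other attempt in that window is rejected. The 12-second block time and the model itself are assumptions for illustration, not values read from the chain.

```python
BLOCK_SECONDS = 12  # assumption: Bittensor's nominal block time


def min_epoch_seconds(rate_limit_blocks, block_seconds=BLOCK_SECONDS):
    """Shortest epoch that does not attempt faster than the rate limit."""
    return rate_limit_blocks * block_seconds


def rejected_fraction(epoch_seconds, rate_limit_blocks, block_seconds=BLOCK_SECONDS):
    """Rough share of set_weights attempts rejected by the rate limit.

    Simplified model: one attempt per rate-limit window succeeds and
    every other attempt inside that window is rejected.
    """
    window = min_epoch_seconds(rate_limit_blocks, block_seconds)
    if epoch_seconds >= window:
        return 0.0
    attempts_per_window = window / epoch_seconds
    return 1.0 - 1.0 / attempts_per_window


if __name__ == "__main__":
    print("minimum epoch:", min_epoch_seconds(100), "seconds")
    print("rejected at 60s epochs:", rejected_fraction(60, 100))
    print("rejected at 1200s epochs:", rejected_fraction(1200, 100))
```

Under this model a 60-second epoch makes roughly twenty attempts per window and loses all but one of them, which matches the "will reject most attempts" behavior described above.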
Miner Submissions Rejected (leader-only)
Symptom: Miner solver submissions fail or are not adopted.
This section only applies if you are running the optional API service and are currently the leader (highest-stake validator). Third-party validators running only the canonical platform/validator/ stack don’t accept submissions — the leader does. If you’re not the leader, miners shouldn’t be hitting your box for submissions.
If you are the leader:
curl http://localhost:8080/v1/submissions
Then verify:
- Solver code goes through three screening stages before adoption. Check logs for rejection reasons.
- The miner is registered on the subnet and its hotkey appears in the metagraph.
- The miner is pointing at the correct API URL for /v1/submissions*.
Port Conflicts
Symptom: Address already in use on startup.
Check what is using the port:
lsof -i :9100
Change the port with --port:
python -m minotaur_subnet.validator.main --port 9101
For local testnet, port 9100 is used internally by Docker networking and is not exposed to the host by default.
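If lsof is not available, a short script can test whether a port is free by trying to bind to it, and can scan upward for the next free one to pass to --port. This is a sketch with hypothetical helper names; it checks the loopback interface only, so adjust the host if the validator binds elsewhere.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "Address already in use".
            return False
    return True


def first_free_port(start=9100, attempts=20, host="127.0.0.1"):
    """Return the first free port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if is_port_free(port, host):
            return port
    return None


if __name__ == "__main__":
    port = first_free_port()
    print("first free port from 9100:", port)
```

The check is only a snapshot: another process can take the port between the check and the validator's startup, so treat the result as a hint rather than a reservation.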
Docker / Local Testnet Issues
Symptom: Containers fail to start or are unhealthy.
Check container status and logs:
docker compose ps
docker compose logs validator
docker compose logs anvil
The validator daemon waits for its three Anvil forks to report healthy before starting (see depends_on in platform/validator/docker-compose.yml). On a first cold start this can take 60–90 seconds; anvil-btevm in particular waits on a public RPC. If you see “dependency failed to start”, wait a bit and run docker compose up -d again. The start_period on each anvil healthcheck gives them grace time on subsequent retries.
The validator daemon and the optional API service are independent processes — neither depends on the other for startup.
For the local-testnet path, ensure the .env file in platform/local_testnet/ has valid ALCHEMY_RPC_URL and BASE_ALCHEMY_RPC_URL values. The canonical validator stack reads from platform/validator/.env.
If containers are stuck, do a clean restart:
make testnet-down
make testnet-up
The init container runs once on startup (registers subnet, deploys contracts). Check its logs if other services fail:
docker compose logs init
Insufficient TAO Balance
Symptom: Registration or weight emission fails with balance errors.
Check your balance:
btcli wallet balance --wallet.name my-validator --subtensor.network finney
Subnet 112 registration requires a burn fee. Ensure you have enough TAO. For local testnet, the init container handles registration and funding automatically.