Hardware CI
Hardware tests under test/hw/ are driven by GitHub Actions through
the Hardware Tests workflow (.github/workflows/hardware-test.yml).
This page is the reference for setting up the runners, secrets, and
per-node configuration the workflow depends on — plus the recipe for
onboarding a new hardware node.
How it works
The workflow has three jobs:
- preflight — runs on the hw-coordinator self-hosted runner (the only host with a route to the private-lab labgrid coordinator), reads the node manifest (.github/hw-nodes.json), and queries the coordinator for advertised places. It emits a JSON array of available nodes as the available_nodes job output. If the coordinator is unreachable, the TCP probe fails fast and available_nodes is [] — all downstream matrix legs then skip (green) instead of hanging on hardware that is not powered up.
- hw-direct — a matrix job, one leg per available node, running on the self-hosted runner attached to that node (runs-on: [self-hosted, <matrix.node.runner_label>]). Tests run without the coordinator, against a node-local labgrid YAML whose path is in the runner-level env var LG_DIRECT_ENV. When LG_DIRECT_ENV is unset on a runner, the job skips all work and exits green — add the env entry in ~/actions-runner/.env to light up direct mode on that node.
- hw-coord — a matrix job, one leg per available node, running on the same per-node runner as hw-direct (label hw-<place>). Each leg uses the committed test/hw/env/<place>.yaml from the manifest and talks to the labgrid coordinator via LG_COORDINATOR — so pytest exercises the coordinator code path while the XSA toolchain (Vivado’s sdtgen, xsct) and kernel build artifacts already present on the per-node runner are used in place. The hw-coordinator runner itself carries only the preflight job.
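For orientation, a trimmed sketch of how the pieces wire together; the step bodies below are placeholders, and the committed hardware-test.yml is the authoritative version:

name: Hardware Tests (trimmed sketch)
on: workflow_dispatch

jobs:
  preflight:
    runs-on: [self-hosted, hw-coordinator]
    outputs:
      available_nodes: ${{ steps.probe.outputs.available_nodes }}
    steps:
      - id: probe
        # The real step reads .github/hw-nodes.json and TCP-probes the
        # coordinator; shown here as a placeholder emitting an empty list.
        run: echo 'available_nodes=[]' >> "$GITHUB_OUTPUT"

  hw-direct:   # hw-coord is wired the same way, against the same matrix
    needs: preflight
    if: needs.preflight.outputs.available_nodes != '[]'
    strategy:
      fail-fast: false
      matrix:
        node: ${{ fromJSON(needs.preflight.outputs.available_nodes) }}
    runs-on: [self-hosted, "${{ matrix.node.runner_label }}"]
    steps:
      - run: echo "would run ${{ join(matrix.node.tests, ' ') }}"   # placeholder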
Fork-PR protection is handled by GitHub Actions’ built-in workflow
approval gate rather than a custom environment: key. Under Settings
→ Actions → General → Approval for outside collaborators, workflows
triggered by fork PRs require a maintainer to click “Approve and run”
before any self-hosted runner is scheduled. Same-repo PRs, pushes to
main, and workflow_dispatch are always trusted and run
directly.
Boot reliability
The BootFPGASoC strategy (used on mini2 / ZCU102 and similar
SD-mux targets) does two things to tolerate the common “silent first
power-on” flake in CI:
- Pre-emptive cold-cycle. After muxing the SD card back to the DUT and before the kernel-banner expect, the strategy does an explicit power.off() → sleep 5 → power.on(). The prior Status.powered_off transition did already toggle power, but the intervening SD-mux operations (which briefly energise the SD slot from the host side) can leave the board in a latched state where the first on() looks applied but the board stays silent on UART. A clean cycle right before the boot window sidesteps this.
- One-shot retry. If the banner expect still times out with zero bytes captured (the board genuinely never emitted anything), the strategy tears down the shell driver, power-cycles once more, and re-runs the expect. The retry only kicks in on 0-byte silence — if any bytes made it through, the failure is a real boot problem (wrong DTB, bad BOOT.BIN, etc.) and is raised immediately.
Both behaviours are tuned by attributes on the strategy:
- wait_for_kernel_banner_timeout (default 120 s) — time to wait for Linux in the UART stream on each attempt.
- kernel_banner_retries (default 1) — number of additional boot attempts on zero-byte silence.
- debug_write_boot_log (true in every env yaml shipped here) — dump the pexpect buffer at attempt timeout to uart_log_kernel_banner_attempt<N>_<ts>.txt at the workspace root. The CI workflow’s artifact step captures these so flaky boots can be post-mortem’d from the Actions run page.
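For orientation, a sketch of how these knobs could sit in a committed env YAML; the exact placement of the strategy entry is an assumption here, and the test/hw/env/<place>.yaml files in the repo are authoritative:

targets:
  main:
    drivers:
      # Surrounding resources and other drivers elided; only the strategy
      # entry carrying the boot-reliability knobs described above is shown.
      BootFPGASoC:
        wait_for_kernel_banner_timeout: 120   # seconds per banner attempt
        kernel_banner_retries: 1              # extra attempts on 0-byte silence
        debug_write_boot_log: true            # dump the pexpect buffer on timeout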
Board power-off on teardown
The board fixture in test/hw/conftest.py wraps its yield
in a try/finally so that every hw test-module exits with the board
transitioned back to powered_off. Lab hardware is never left
energised between runs, and the fixture’s fallback path calls
power.off() directly on the bound PowerProtocol driver if the
strategy is in a broken state from a prior failure.
Node manifest
.github/hw-nodes.json is the single source of truth for per-node
CI configuration:
[
{
"place": "bq",
"runner_label": "hw-bq",
"env_remote": "test/hw/env/bq.yaml",
"tests": ["test/hw/test_adrv9371_zc706_hw.py"]
},
{
"place": "mini2",
"runner_label": "hw-mini2",
"env_remote": "test/hw/env/mini2.yaml",
"tests": [
"test/hw/test_ad9081_zcu102_xsa_hw.py",
"test/hw/test_ad9081_zcu102_system_hw.py"
]
}
]
Field reference:
- place — the labgrid coordinator place name. Must match what the exporter registers (labgrid-client places on the coordinator host).
- runner_label — the extra label on the self-hosted runner physically attached to the board. Convention: hw-<place>.
- env_remote — repo-relative path to the committed env YAML (conventionally test/hw/env/<place>.yaml) used by the hw-coord matrix leg. Must use RemotePlace only — no local paths, no credentials, no serial device names.
- tests — list of pytest targets (test files) to run for this node, used verbatim in both hw-direct and hw-coord.
Adding a new hardware node
One-time setup:
1. Stand up an exporter that registers a new place <name> with the coordinator at 10.0.0.41:20408 — install it as a systemd unit via scripts/labgrid-exporter/install.sh (see Exporter systemd service). Verify with labgrid-client -x 10.0.0.41:20408 places.
2. Register a new self-hosted runner with labels self-hosted,hw-<name> on the host physically attached to the board (see Self-hosted runner registration below).
3. Author a node-local labgrid YAML on the runner host — typically ~/ci/lg_direct.yaml — that describes the exporter resources directly (no RemotePlace). Reference lg_adrv9371_zc706_tftp.yaml on bq for a worked example.
4. Put LG_DIRECT_ENV=/home/<user>/ci/lg_direct.yaml in ~/actions-runner/.env on the runner host. The runner picks this up automatically on the next job.
5. Author a test/hw/env/<name>.yaml using RemotePlace only and commit it (a minimal sketch follows below).
6. Append one entry to .github/hw-nodes.json with the new place, runner label, env_remote path, and test list. No workflow edits are needed.
That’s it — the next workflow run picks up the new node automatically
via fromJSON(needs.preflight.outputs.available_nodes).
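For step 5, a minimal sketch of the committed env YAML; the driver entry is illustrative only, so copy the drivers and strategy from an existing test/hw/env/*.yaml for a real board:

targets:
  main:
    resources:
      RemotePlace:
        name: <name>        # must match the place the exporter registers
    drivers:
      # Illustrative only; the real files carry the full driver/strategy
      # list the board's tests need.
      SerialDriver: {}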
Self-hosted runner registration
Prerequisites (one-time, repo-admin only):
GitHub CLI (gh) installed locally, authenticated with a repo admin account for analogdevicesinc/pyadi-dt.
Generate a one-use registration token:
gh api -X POST \
/repos/analogdevicesinc/pyadi-dt/actions/runners/registration-token
Copy the token value from the response.
On the target host:
mkdir -p ~/actions-runner && cd ~/actions-runner
# Pick the latest from https://github.com/actions/runner/releases
curl -O -L https://github.com/actions/runner/releases/download/v2.319.1/actions-runner-linux-x64-2.319.1.tar.gz
tar xzf actions-runner-linux-x64-2.319.1.tar.gz
./config.sh --url https://github.com/analogdevicesinc/pyadi-dt \
--token <TOKEN> \
--labels self-hosted,hw-<place>
sudo ./svc.sh install && sudo ./svc.sh start
For the runner on the coordinator host (10.0.0.41), use
--labels self-hosted,hw-coordinator. That is the only label the
workflow hardcodes; every other hw-* label flows from the
manifest.
Populate the runner-level env file so the direct-mode job can find the node-local labgrid YAML:
echo "LG_DIRECT_ENV=/home/$USER/ci/lg_direct.yaml" >> ~/actions-runner/.env
sudo ./svc.sh stop && sudo ./svc.sh start
(The hw-coordinator runner does not need LG_DIRECT_ENV — it
only runs hw-coord legs, which use the committed
test/hw/env/*.yaml files.)
Direct-mode YAML templates
A node-local direct-mode YAML has the same shape as the coordinator
env YAML except it declares resources + drivers in full rather than
pointing at a RemotePlace. Because it contains host-specific
paths (serial by-id symlinks, sdmux ID_PATH, MassStorageDevice
partition path) and credentials (VeSync account for the smart plug,
Home Assistant tokens for ZC706’s HAS outlet), it must not be
checked into the repo.
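The shape of such a file, with hypothetical placeholder values (the shipped .example templates below are the real starting point):

targets:
  main:
    resources:
      RawSerialPort:
        port: /dev/serial/by-id/<FILL>   # host-specific serial symlink
      # ...plus the sd-mux, mass-storage, and power resources for this board,
      # with the same values as the exporter YAML already on the host.
    drivers:
      SerialDriver: {}
      # ...plus the drivers and strategy the board's tests actually use.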
Two redacted templates ship in
doc/source/developer/samples/:
- lg_direct_mini2.yaml.example — AD9081 + ZCU102 on mini2. Fill in the serial port symlink, sdmux ID_PATH, MassStorageDevice partition, and VeSync credentials.
- lg_direct_nuc.yaml.example — FMCDAQ3 + VCU118 on nuc. Fill in the serial port symlink and VeSync credentials; the JTAG root_target / microblaze_target follow the existing exporter config at ~/dev/lg-coordinator/lg_fmcdaq3_vcu118_exporter.yaml on nuc.
Per-node bring-up:
1. Copy the template onto the runner host, renaming it to strip the .example suffix:

   scp doc/source/developer/samples/lg_direct_mini2.yaml.example \
       mini2:~/ci/lg_direct.yaml

2. SSH in and replace every <FILL> placeholder with the host-specific value. Cross-reference the exporter YAML already on the host for serial symlinks, USB device paths, and VeSync credentials.

3. Point the runner at it and restart the runner service:

   echo "LG_DIRECT_ENV=/home/$USER/ci/lg_direct.yaml" >> ~/actions-runner/.env
   sudo ./svc.sh stop && sudo ./svc.sh start

4. Trigger the workflow (gh workflow run hardware-test.yml) and check that hw-direct (<place>) now runs the full test path instead of emitting "LG_DIRECT_ENV is not set on this runner — skipping direct-mode tests."
The templates are intentionally minimal — only the drivers and resources each board’s test actually uses — so the same bring-up works for both ZCU102 (SD-mux boot) and VCU118 (JTAG boot) after swapping the strategy / driver section.
System-tool prerequisites on each hw-node runner:
sudo apt-get install -y device-tree-compiler cpp u-boot-tools
# For ZynqMP nodes:
pip install --user xilinx-sdt-gen
# For ZCU102/AD9081 (USB SD-mux mode):
sudo apt-get install -y usbsdmux
On every hw-node runner (including the coordinator host), the
workflow uses uv to build two
persistent venvs under ~/.cache/adidt-ci/: one for
labgrid-client on the coordinator host, and one holding an
editable pip install -e ".[dev]" of adidt on every runner.
.github/scripts/bootstrap-uv.sh curl-installs uv into
~/.local/bin on first use, so no distro Python packaging is
required — only curl and a working python3 interpreter (both
present on stock Debian/Ubuntu).
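As a rough sketch of what that amounts to on a runner host (the venv path and exact commands are illustrative; .github/scripts/bootstrap-uv.sh and the workflow itself are authoritative):

- name: Bootstrap uv and the adidt venv (illustrative)
  run: |
    ./.github/scripts/bootstrap-uv.sh        # installs uv into ~/.local/bin
    export PATH="$HOME/.local/bin:$PATH"
    uv venv ~/.cache/adidt-ci/adidt          # hypothetical venv name
    . ~/.cache/adidt-ci/adidt/bin/activate
    uv pip install -e ".[dev]"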
Coordinator-mode tests additionally require SSH key-auth from the
hw-coordinator runner to the exporter host (for the
MassStorageDriver SSH-proxy path) and write access to
~/.cache/adidt/kernel/ for the kernel image cache.
Exporter systemd service
Each hw-node runs a labgrid-exporter process that publishes its
local hardware to the coordinator. Production hosts run it as a
non-templated systemd service reading
/etc/labgrid/exporter.yaml, installed by
scripts/labgrid-exporter/install.sh.
See Labgrid exporter systemd service for the convention, installer options, and day-to-day operation.
Fork-PR approval gate
Fork-PR workflows are held by GitHub’s built-in approval gate
configured at Settings → Actions → General → Approval for outside
collaborators. Pick Require approval for first-time contributors
(or stricter) so fork-PR runs pause until a maintainer clicks
Approve and run in the PR’s Actions tab. No custom
environment: is needed on the workflow side.
Private-repo dependency access
The [dev] extras in pyproject.toml include
pyadi-build @ git+https://github.com/tfcollins/pyadi-build.git,
which currently points at a private repo. For CI to uv pip
install it, scope a fine-grained PAT to the private repo and store
it as a repository secret.
Create a fine-grained PAT at https://github.com/settings/tokens?type=beta:
- Token name: pyadi-dt-ci-pyadi-build-read
- Resource owner: the org/user that owns the private dependency (here: tfcollins)
- Repository access: Only select repositories → pick tfcollins/pyadi-build (and any other private deps).
- Repository permissions → Contents: Read-only.
Store it at repository scope. Fork PRs don’t see repo secrets until a maintainer approves via the built-in workflow approval gate, so the PAT stays protected without any environment indirection.
# Repo-scoped secret — visible to trusted (non-fork) runs,
# and to fork-PR runs only after maintainer "Approve and run".
gh secret set PYADI_BUILD_TOKEN \
  --repo analogdevicesinc/pyadi-dt \
  --body 'github_pat_...'
(Or via the GitHub UI: Settings → Secrets and variables → Actions → New repository secret.)
The install step reads secrets.PYADI_BUILD_TOKEN and exports it
as a process-local GIT_CONFIG_COUNT / GIT_CONFIG_KEY_0 /
GIT_CONFIG_VALUE_0 triple that rewrites https://github.com/
to https://x-access-token:<token>@github.com/ for the duration
of that step. The secret is never written to the runner’s
~/.gitconfig so it doesn’t leak to later jobs.
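A sketch of that plumbing as a workflow step, assuming the secret name above; the committed install step is authoritative:

- name: Install dev extras (private dependency via PAT)
  env:
    # Scoped to this step's process environment only; nothing is written
    # to the runner's ~/.gitconfig.
    GIT_CONFIG_COUNT: "1"
    GIT_CONFIG_KEY_0: "url.https://x-access-token:${{ secrets.PYADI_BUILD_TOKEN }}@github.com/.insteadOf"
    GIT_CONFIG_VALUE_0: "https://github.com/"
  run: uv pip install -e ".[dev]"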
Debug artifacts
Every hw-direct and hw-coord matrix leg uploads a workflow artifact
named hw-<mode>-<place>-output containing, per run:
- Generated .dts / pre-cpp .pp.dts / compiled .dtb from the test’s output directory (test/hw/output/).
- dmesg_*.log snapshots taken by collect_dmesg.
- Per-attempt uart_log_kernel_banner_attempt<N>_<ts>.txt dumps from failed boot attempts.
Artifacts stay for 14 days. Fast local diff against a known-good reference:
gh run download <RUN_ID> -n hw-coord-mini2-output -D /tmp/artifact
python3 -m adidt.tools.dts_compare_cli \
test/devices/fixtures/ad9081_zcu102_xsa_reference.dts \
/tmp/artifact/ad9081_zcu102.dts
See Local DT-emission parity test (below) for the property-level inspection flow.
Local DT-emission parity test
Most driver-probe failures on the declarative System API path today
are visible in the generated DTS before anything is flashed — they
show up as missing / wrong properties compared to the XSA pipeline’s
emission (which is known to probe on real hardware). The parity
test at test/devices/test_system_ad9081_dts_parity.py pins the
System API’s emitted DTS against a committed XSA reference fixture
(test/devices/fixtures/ad9081_zcu102_xsa_reference.dts) over the
full list of kernel-critical properties defined in
adidt.tools.dts_inspect.KERNEL_CRITICAL_KEYS. It runs in under
1 s and gives a focused per-property failure when the two paths
diverge.
Regenerate the fixture when the XSA path evolves:
gh run download <PASSING_RUN_ID> -n hw-coord-mini2-output -D /tmp/ref
cp /tmp/ref/ad9081_zcu102.dts test/devices/fixtures/ad9081_zcu102_xsa_reference.dts
Troubleshooting
- All hw jobs skip on every run.
  Preflight is marking everything unavailable. Check the preflight job logs for “Coordinator … unreachable” or the places listing. Try labgrid-client -x 10.0.0.41:20408 places from any host.
- One node’s jobs skip while others run.
  The corresponding exporter is not advertising its place to the coordinator. SSH to that node and restart the unit (sudo systemctl restart labgrid-exporter@<place> — see Exporter systemd service), then verify the place shows up in labgrid-client places. Check journalctl -u labgrid-exporter@<place> -n 50 if the restart alone doesn’t recover it.
- ``hw-direct`` fails with “LG_DIRECT_ENV is not set”.
  The runner’s ~/actions-runner/.env does not define LG_DIRECT_ENV, or the file path it points at does not exist. Edit the file and restart the runner service.
- ``hw-coord`` fails with “No such file” on the env yaml.
  The manifest entry’s env_remote path points at a file that is not committed at the listed repo-relative path (conventionally test/hw/env/<place>.yaml). Either commit the YAML or fix the manifest entry.
- PR from a fork never runs hw jobs.
  Fork PRs pause on GitHub’s built-in workflow approval gate. Open the PR’s Actions tab and click Approve and run to release the jobs.
- ``hw-coord (<place>)`` takes ~120 s longer than usual.
  The BootFPGASoC retry path fired on a zero-byte serial timeout. Check the uart_log_kernel_banner_attempt1_*.txt artifact — if it’s empty, the first power-on was silent (the pre-emptive cold-cycle should have handled this, so getting here usually means the serial exporter on the hw node is misbehaving). Restart the exporter — sudo systemctl restart labgrid-exporter@<place> — which also respawns ser2net.
- Board left powered on after a local test run.
  The board fixture in test/hw/conftest.py powers the board off at teardown, but if the fixture itself errored out before the yield (e.g. require_hw_prereqs failed) the teardown didn’t run. Manual recovery: labgrid-client -x <coordinator> -p <place> acquire, then power off in the labgrid shell.