JESD204 FSM

JESD204 FSM Interface Linux Kernel Framework.

The JESD204 Linux Kernel Framework is a Finite State Machine (FSM) that synchronizes other Linux device drivers so that they can properly bring up and manage one or more JESD204 links.

JESD204 link bring-up and management is complicated: it requires that many actors (device drivers) be in sync with each other throughout the various link bring-up states/stages. Typical components of a JESD204 link are the physical layer (PHY), the link layer (LL), the transport layer (TPL), the high-speed converter device and the clocking layer, with all its constraints and inter-dependencies. This has to happen not just at boot time but also at run time, in case a link is reconfigured, or breaks and has to recover.

To achieve this, the JESD204 Linux Kernel Framework hooks into all the drivers that participate in the link management (bring-up/bring-down) and each driver provides a set of callbacks for each state that it supports.

The relationship between the devices is defined in the device-tree. The relationship is called a connection, to avoid re-using the term link, which could be confused with the term link from the JESD204 standard. The whole group of devices is actually a graph (or topology) with a single top-level device.

Device Topology

JESD204 devices form a directed graph (topology) where connections represent data flow between devices. The topology is defined in Device Tree using jesd204-inputs properties that specify parent-child relationships.

================================================================================
                         JESD204 Topology Graph
================================================================================

                          +--------------------------+
                          | ad9081@0                 |
                          | [TOP]                    |
                          +--------------------------+
                                        |
                                        v

           +--------------------------+  +--------------------------+
           | axi-ad9081-rx-hpc@8      |  | axi-ad9081-tx-hpc@8      |
           |                          |  |                          |
           +--------------------------+  +--------------------------+
                         |                             |
                         v                             v

           +--------------------------+  +--------------------------+
           | axi-jesd204-rx@8         |  | axi-jesd204-tx@8         |
           |                          |  |                          |
           +--------------------------+  +--------------------------+
                         |                             |
                         v                             v

           +--------------------------+  +--------------------------+
           | axi-adxcvr-rx@8          |  | axi-adxcvr-tx@8          |
           |                          |  |                          |
           +--------------------------+  +--------------------------+
                         |                             |
                         v                             v

                          +--------------------------+
                          | hmc7044@0                |
                          | [CLK]                    |
                          +--------------------------+

Legend: [TOP] = Top device (ADC/DAC)  [CLK] = Clock/SYSREF source

--------------------------------------------------------------------------------
Link 2 - RX (JESD204B)  State: opt_post_running_stage
--------------------------------------------------------------------------------
  JESD Parameters:  L=8  M=4  N=16  N'=16  F=1  K=32  S=1
  Encoder: 8B/10B    Subclass: 1  Scrambling: Yes  HD: No
  Sample Rate:  1.500000000000 GHz
  Lane Rate:    15.000000000000 GHz
  LMFC Rate:   46.875000000 MHz
  Device Clock: 375.000000000 MHz

--------------------------------------------------------------------------------
Link 0 - TX (JESD204B)  State: opt_post_running_stage
--------------------------------------------------------------------------------
  JESD Parameters:  L=8  M=4  N=16  N'=16  F=1  K=32  S=1
  Encoder: 8B/10B    Subclass: 1  Scrambling: Yes  HD: No
  Sample Rate:  1.500000000000 GHz
  Lane Rate:    15.000000000000 GHz
  LMFC Rate:   46.875000000 MHz
  Device Clock: 375.000000000 MHz
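
The rates in the dump above follow from the link parameters. The sketch below reproduces that arithmetic for an 8B/10B link, using the standard JESD204B relations and assuming the sample rate is the per-converter data rate on the link. The struct and function names are hypothetical and are not part of the framework API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the parameters shown in the dump. */
struct link_params {
	uint64_t sample_rate_hz; /* per-converter sample rate on the link */
	uint32_t L;              /* number of lanes */
	uint32_t M;              /* number of converters */
	uint32_t Np;             /* N': bits per sample as sent on the link */
	uint32_t F;              /* octets per frame per lane */
	uint32_t K;              /* frames per multiframe */
};

/* Lane rate for 8B/10B: M * N' * (10 / 8) * sample_rate / L, in 64-bit math. */
static uint64_t lane_rate_hz_8b10b(const struct link_params *p)
{
	assert(p->L != 0);
	return p->sample_rate_hz * p->M * p->Np * 10 / (8ULL * p->L);
}

/* LMFC rate: one multiframe is F * K octets, and each octet is 10 line bits. */
static uint64_t lmfc_rate_hz_8b10b(const struct link_params *p)
{
	assert(p->F != 0 && p->K != 0);
	return lane_rate_hz_8b10b(p) / (10ULL * p->F * p->K);
}

/* Link (core) clock, using the typical lane rate / 40 ratio. */
static uint64_t link_clock_hz_8b10b(const struct link_params *p)
{
	return lane_rate_hz_8b10b(p) / 40;
}
```

With L=8, M=4, N'=16, F=1, K=32 and a 1.5 GHz sample rate, these functions give the lane rate, LMFC rate and 375 MHz clock listed for both links.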

Each topology has:

  • Top-level device: Initiates state transitions and defines link IDs

  • Input connections: Declared via jesd204-inputs property

  • Link IDs: Specify which JESD204 link(s) a device participates in

Device Tree Properties

jesd204-device

Boolean property marking a node as a JESD204 device.

jesd204-top-device

Marks device as the top-level device. Value is the topology ID.

jesd204-link-ids

Array of link IDs this top device manages.

jesd204-inputs

Array of phandles with arguments: <&parent_device topo_id link_id>

jesd204-sysref-provider

Marks this device as the primary SYSREF provider for the topology.

jesd204-secondary-sysref-provider

Marks this device as a secondary SYSREF provider (for link recovery).

jesd204-stop-states

Array of state indices where the FSM should pause (for multi-topology sync).

jesd204-ignore-errors

Boolean to continue despite errors (useful for debugging).

To illustrate, here’s an example device-tree and, below it, what the graph representation looks like for an ADRV9009 FMC card on a ZC706.

// SPDX-License-Identifier: GPL-2.0
/*
 * Analog Devices ADRV9009 (via jesd204-fsm)
 * https://wiki.analog.com/resources/eval/user-guides/adrv9009
 * https://wiki.analog.com/resources/tools-software/linux-drivers/iio-transceiver/adrv9009
 * https://wiki.analog.com/resources/tools-software/linux-software/adrv9009_advanced_plugin
 *
 * hdl_project: <adrv9009/zc706>
 * board_revision: <>
 *
 * Copyright (C) 2020 Analog Devices Inc.
 */

#include "zynq-zc706-adv7511-adrv9009.dts"

#include <dt-bindings/iio/adc/adi,adrv9009.h>

&trx0_adrv9009 {
    jesd204-device;
    #jesd204-cells = <2>;
    jesd204-top-device = <0>; /* This is the TOP device */
    jesd204-link-ids = <DEFRAMER_LINK_TX FRAMER_LINK_RX FRAMER_LINK_ORX>;

    jesd204-inputs =
        <&axi_adrv9009_rx_jesd 0 FRAMER_LINK_RX>,
        <&axi_adrv9009_rx_os_jesd 0 FRAMER_LINK_ORX>,
        <&axi_adrv9009_tx_jesd 0 DEFRAMER_LINK_TX>;

    /delete-property/ interrupts;
};

&axi_adrv9009_rx_jesd {
    jesd204-device;
    #jesd204-cells = <2>;
    jesd204-inputs = <&axi_adrv9009_adxcvr_rx 0 FRAMER_LINK_RX>;
};

&axi_adrv9009_rx_os_jesd {
    jesd204-device;
    #jesd204-cells = <2>;
    jesd204-inputs = <&axi_adrv9009_adxcvr_rx_os 0 FRAMER_LINK_ORX>;
};

&axi_adrv9009_tx_jesd {
    jesd204-device;
    #jesd204-cells = <2>;
    jesd204-inputs = <&axi_adrv9009_adxcvr_tx 0 DEFRAMER_LINK_TX>;
};

&axi_adrv9009_adxcvr_rx {
    jesd204-device;
    #jesd204-cells = <2>;
    jesd204-inputs = <&clk0_ad9528 0 FRAMER_LINK_RX>;
    clocks = <&clk0_ad9528 1>; /* div40 is controlled by axi_adrv9009_rx_jesd */
    clock-names = "conv";
};

&axi_adrv9009_adxcvr_rx_os {
    jesd204-device;
    #jesd204-cells = <2>;
    jesd204-inputs = <&clk0_ad9528 0 FRAMER_LINK_ORX>;
    clocks = <&clk0_ad9528 1>; /* div40 is controlled by axi_adrv9009_rx_os_jesd */
    clock-names = "conv";
};

&axi_adrv9009_adxcvr_tx {
    jesd204-device;
    #jesd204-cells = <2>;
    jesd204-inputs = <&clk0_ad9528 0 DEFRAMER_LINK_TX>;
    clocks = <&clk0_ad9528 1>; /* div40 is controlled by axi_adrv9009_tx_jesd */
    clock-names = "conv";
};

&clk0_ad9528 {
    jesd204-device;
    #jesd204-cells = <2>;
    jesd204-sysref-provider;

    adi,sysref-pattern-mode = <SYSREF_PATTERN_NSHOT>;
    /delete-property/ adi,sysref-request-enable;
};

The structure above translates to the image below.

https://wiki.analog.com/_media/resources/tools-software/linux-drivers/jesd204/adrv9009-fmc-jesd204-fsm-topology.png
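
As a rough illustration of how the jesd204-inputs entries in this device-tree form a graph, the sketch below models each entry as a plain C record and walks it per link ID. It is a user-space toy, not the framework's internal representation: the struct, the function names and the numeric link ID values are all hypothetical placeholders (the real link IDs come from the dt-bindings header).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder link IDs; the real values come from the dt-bindings header. */
enum { LINK_TX = 0, LINK_RX = 1, LINK_ORX = 2 };

/* One jesd204-inputs entry: 'dev' lists 'input' as its input for 'link_id'. */
struct con_sketch {
	const char *dev;
	const char *input;
	int topo_id;
	int link_id;
};

static const struct con_sketch cons[] = {
	{ "trx0_adrv9009", "axi_adrv9009_rx_jesd", 0, LINK_RX },
	{ "trx0_adrv9009", "axi_adrv9009_rx_os_jesd", 0, LINK_ORX },
	{ "trx0_adrv9009", "axi_adrv9009_tx_jesd", 0, LINK_TX },
	{ "axi_adrv9009_rx_jesd", "axi_adrv9009_adxcvr_rx", 0, LINK_RX },
	{ "axi_adrv9009_rx_os_jesd", "axi_adrv9009_adxcvr_rx_os", 0, LINK_ORX },
	{ "axi_adrv9009_tx_jesd", "axi_adrv9009_adxcvr_tx", 0, LINK_TX },
	{ "axi_adrv9009_adxcvr_rx", "clk0_ad9528", 0, LINK_RX },
	{ "axi_adrv9009_adxcvr_rx_os", "clk0_ad9528", 0, LINK_ORX },
	{ "axi_adrv9009_adxcvr_tx", "clk0_ad9528", 0, LINK_TX },
};

#define NUM_CONS (sizeof(cons) / sizeof(cons[0]))

/* Returns 1 if 'name' sits on either end of a connection for 'link_id'. */
static int dev_on_link(const char *name, int link_id)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < NUM_CONS; i++) {
		if (cons[i].link_id != link_id)
			continue;
		if (!strcmp(cons[i].dev, name) || !strcmp(cons[i].input, name))
			return 1;
	}
	return 0;
}

/* Returns the input of 'dev' for 'link_id', or NULL if it has none. */
static const char *input_of(const char *dev, int link_id)
{
	size_t i;

	for (i = 0; i < NUM_CONS; i++)
		if (cons[i].link_id == link_id && !strcmp(cons[i].dev, dev))
			return cons[i].input;
	return NULL;
}

/* Counts the connections walked from 'top' until a device has no input. */
static int hops_to_source(const char *top, int link_id)
{
	const char *cur = top;
	size_t hops = 0;

	assert(top != NULL);
	while (hops < NUM_CONS && (cur = input_of(cur, link_id)) != NULL)
		hops++;
	return (int)hops;
}
```

In this model the clock chip appears on every link, while each JESD and transceiver node appears on exactly one, and every link is three connections deep from the top device.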

Design Principles

The diagram makes things look really simple, but in reality they aren’t. If any of the devices in that topology/graph changes state, or an error occurs, multiple devices must be re-synchronized.

Also, the device-tree described above is an actual working device-tree. Some variations may be found in the ADI Linux kernel repository (i.e. some more nodes in-between the nodes described above).

There may be other frameworks in Linux that describe this kind of topology, but the challenge with JESD204 is that, at this point in time, there is no clear idea of the minimum number of states needed to synchronize or re-synchronize in order to recover a JESD204 link if it goes down.

The end-result is an FSM that tries to make all the devices go through the same states at once.

Some design principles, defined so far for this framework:

  1. A group of devices shall be named a topology (or informally graph or tree); while the picture above looks simple, more complicated topologies should be supported by this framework

  2. Each topology shall have a single top-level device; for a multi-chip topology, one will be picked to be the top-level one. For IIO devices, it is assumed that this device will also register the IIO buffer.

  3. Each device driver must register with the JESD204 framework to be able to take part in a topology

  4. A device may only be part of a topology, if it is defined in the device-tree (or other configuration mechanism) via a ‘jesd204-device’ node/definition.

  5. A top-level device may be defined via ‘jesd204-top-device = <ID>’; ID is a number defining the topology ID, which makes it possible to specify multiple topologies

  6. The top-level device defines the JESD204 link IDs in the device-tree (via ‘jesd204-link-ids’ array property); the order in this array, is the order in which the JESD204 links are initialized;

  7. Each device declares its connections using the jesd204-inputs array property. The jesd204-inputs are declared using the following syntax: jesd204-inputs = <phandleX TOPOLOGY_ID LINK_ID_X>, <phandleY TOPOLOGY_ID LINK_ID_Y>, …

  8. All devices in a topology must go through the same states together when bringing up a link, and through the same states in reverse order when bringing down or rolling back; example: all 8 devices must go from S0 to S9 together, and from S9 to S0 together

  9. When going through each state, each device driver provides its own set of callbacks for what to do in each state; if a callback is not provided, it is assumed that the device driver doesn’t care about that specific state, and the transition continues

  10. When an error occurs in any of the states, the states should automatically be rolled back from the state that has errored back to the initial/idle state; so, if going from S0 to S9 and S3 faults, the transition will be S0, S1, S2, S3, S2, S1, S0 (in perfect symmetry)

  11. Rolling back doesn’t stop even when any of the states errors out; it is of higher priority to reach back to IDLE state, than to stop when rolling back

  12. Each callback (in the driver) must return either JESD204_STATE_CHANGE_DONE (value 1) or JESD204_STATE_CHANGE_DEFER (value 0), or an error if one occurs (any negative value). The decision was made for JESD204_STATE_CHANGE_DONE to be 1 so that, when a new driver implements a callback for the framework, a plain return 0 doesn’t mean DONE (this avoids accidental/unwanted state transitions);

  13. JESD204_STATE_CHANGE_DEFER is important if a state should stop (but not roll back) and wait for an external call (a thread/retry mechanism) to restart the FSM and continue from the current state; so when transitioning from state S0 to S9, and S4 calls for a DEFER, the FSM will stop at S4, and an external entity (retry loop, workqueue, interrupt, etc.) would call the FSM to continue the transition up to S9; the DEFER mechanism/logic allows a transition of states to be paused if any device (in the topology) calls for it (because it isn’t ready yet)

  14. For any particular state, the callbacks of the top-level device must be called last; for the other devices it shouldn’t matter; the top-level is typically the ADC/DAC/XCVR, so it is important that this is called last to enable/disable the final bits of a link

  15. There can be only a single device that can act as a SYSREF provider in a topology; defining more than one will fail the initialization of the topology
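
The rollback and DEFER behaviour described in principles 8 to 13 can be sketched with a small user-space model. This is only a toy under stated assumptions: it collapses all devices into a single callback per state, the SKETCH_ names and the sketch_ functions are hypothetical, and it is not the code in jesd204-fsm.c. Only the return values (1 for DONE, 0 for DEFER, negative for errors) are taken from the text above.

```c
#include <assert.h>
#include <errno.h>

/* Return values mirror principle 12: DONE is 1, DEFER is 0, errors are < 0. */
enum { SKETCH_DEFER = 0, SKETCH_DONE = 1 };

#define SKETCH_MAX_TRACE 32

/* Hypothetical callback: what the whole topology does for one state. */
typedef int (*sketch_state_cb)(int state, int rolling_back);

struct sketch_fsm {
	int cur;                     /* state the topology is currently in */
	int deferred;                /* 1 if the callback for 'cur' must be retried */
	int trace[SKETCH_MAX_TRACE]; /* every state entered, in order */
	int trace_len;
};

static void sketch_enter(struct sketch_fsm *f, int state)
{
	assert(f->trace_len < SKETCH_MAX_TRACE);
	f->cur = state;
	f->trace[f->trace_len++] = state;
}

/* Start in S0 (idle). */
static void sketch_init(struct sketch_fsm *f)
{
	f->deferred = 0;
	f->trace_len = 0;
	sketch_enter(f, 0);
}

/*
 * Move towards 'target'. Returns SKETCH_DONE when the target is reached,
 * SKETCH_DEFER when paused (call again to resume), or the negative error
 * after rolling back to S0.
 */
static int sketch_run(struct sketch_fsm *f, int target, sketch_state_cb cb)
{
	int ret;

	while (f->cur < target || f->deferred) {
		if (!f->deferred)
			sketch_enter(f, f->cur + 1);
		f->deferred = 0;

		ret = cb(f->cur, 0);
		if (ret == SKETCH_DONE)
			continue;
		if (ret == SKETCH_DEFER) {
			f->deferred = 1;
			return SKETCH_DEFER;
		}

		/* Error: roll back to idle; errors during rollback are ignored. */
		while (f->cur > 0) {
			sketch_enter(f, f->cur - 1);
			(void)cb(f->cur, 1);
		}
		return ret;
	}
	return SKETCH_DONE;
}

/* Example callback: S3 fails on the way up. */
static int sketch_cb_fault_at_s3(int state, int rolling_back)
{
	return (!rolling_back && state == 3) ? -EIO : SKETCH_DONE;
}

/* Example callback: S4 asks for a DEFER the first time it is called. */
static int sketch_cb_defer_once_at_s4(int state, int rolling_back)
{
	static int deferred_once;

	(void)rolling_back;
	if (state == 4 && !deferred_once) {
		deferred_once = 1;
		return SKETCH_DEFER;
	}
	return SKETCH_DONE;
}
```

Running the model towards S9 with the faulting callback records the states S0, S1, S2, S3, S2, S1, S0, matching principle 10. With the deferring callback, the first call stops at S4 and a second call to sketch_run() continues up to S9.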

TL;DR - show me the code

The current source code of the JESD204 Linux framework resides in drivers/jesd204/

It consists of the following source files:

  • jesd204-core.c - the core file of the framework - it reads the device-tree, constructs the topology

  • jesd204-fsm.c - the entire FSM logic

  • jesd204-sysfs.c - the Linux sysfs code to export files for debug/control/etc under /sys/bus/jesd204/devices/jesd204:X

  • jesd204-priv.h - internal framework structures/functions to be shared inside the framework

  • include/linux/jesd204/jesd204.h - API definitions to be used by drivers registering with the framework

How does it work?

A typical driver needs to provide some data to the framework. Example (for ADRV9009):

static const struct jesd204_dev_data jesd204_adrv9009_init = {
    .state_ops = {
        [JESD204_OP_DEVICE_INIT] = {
            .per_device = adrv9009_jesd204_uninit,
        },
        [JESD204_OP_LINK_INIT] = {
            .per_link = adrv9009_jesd204_link_init,
        },
        [JESD204_OP_CLOCKS_ENABLE] = {
            .per_link = adrv9009_jesd204_clks_enable,
        },
        [JESD204_OP_LINK_SETUP] = {
            .per_device = adrv9009_jesd204_link_setup,
            .mode = JESD204_STATE_OP_MODE_PER_DEVICE,
            .post_state_sysref = true,
        },
        [JESD204_OP_LINK_ENABLE] = {
            .per_link = adrv9009_jesd204_link_enable,
            .post_state_sysref = true,
        },
        [JESD204_OP_LINK_RUNNING] = {
            .per_link = adrv9009_jesd204_link_running,
        },
        [JESD204_OP_OPT_SETUP_STAGE1] = {
            .per_device = adrv9009_jesd204_setup_stage1,
            .mode = JESD204_STATE_OP_MODE_PER_DEVICE,
            .post_state_sysref = true,
        },
        [JESD204_OP_OPT_SETUP_STAGE2] = {
            .per_device = adrv9009_jesd204_setup_stage2,
            .mode = JESD204_STATE_OP_MODE_PER_DEVICE,
            .post_state_sysref = true,
        },
        [JESD204_OP_OPT_SETUP_STAGE3] = {
            .per_device = adrv9009_jesd204_setup_stage3,
            .mode = JESD204_STATE_OP_MODE_PER_DEVICE,
            .post_state_sysref = true,
        },
        [JESD204_OP_OPT_SETUP_STAGE4] = {
            .per_device = adrv9009_jesd204_setup_stage4,
            .mode = JESD204_STATE_OP_MODE_PER_DEVICE,
        },
        [JESD204_OP_OPT_SETUP_STAGE5] = {
            .per_device = adrv9009_jesd204_setup_stage5,
            .mode = JESD204_STATE_OP_MODE_PER_DEVICE,
        },
        [JESD204_OP_OPT_POST_RUNNING_STAGE] = {
            .per_device = adrv9009_jesd204_post_running_stage,
            .mode = JESD204_STATE_OP_MODE_PER_DEVICE,
        },
    },

    .max_num_links = 3,
    .sizeof_priv = sizeof(struct adrv9009_jesd204_priv),
};

The driver needs to call devm_jesd204_dev_register(). All this does is bind the driver, from its probe function, to the device-tree binding/definition of this device’s place in a JESD204 topology. If devm_jesd204_dev_register() returns NULL, this driver is not part of any JESD204 topology/operation. For example, some clock-chip drivers can operate purely as clock drivers, or as JESD204 providers.

Example:

jdev = devm_jesd204_dev_register(&spi->dev, &jesd204_adrv9009_init);
if (IS_ERR(jdev))
    return PTR_ERR(jdev);

Finally, all drivers must call jesd204_fsm_start() (in probe) on their object from the framework. This is true for all devices, even those that are not top-level devices.

Example:

ret = jesd204_fsm_start(jdev, JESD204_LINKS_ALL);

If jdev is NULL, that is fine. Typically, a driver starts the FSM for all JESD204 links that are defined in the device-tree. Via sysfs, jesd204_fsm_start() may be called for a single JESD204 link.

There’s an equivalent jesd204_fsm_stop() that will stop the FSM.

The proper functioning of the FSM relies on the driver using the framework correctly and on the connections between devices being properly defined in the device-tree.

The initialization data

The initialization data has type:

/**
 * struct jesd204_dev_data - JESD204 device initialization data
 * @sysref_cb       SYSREF callback, if this device/driver supports it
 * @sizeof_priv     amount of data to allocate for private information
 * @links       JESD204 initial link configuration
 * @max_num_links   maximum number of JESD204 links this device can support
 * @num_retries     number of retries in case of error (only for top-level device)
 * @state_ops       ops for each state transition of type @struct jesd204_state_op
 */
struct jesd204_dev_data {
    jesd204_sysref_cb           sysref_cb;
    size_t                  sizeof_priv;
    const struct jesd204_link       *links;
    unsigned int                max_num_links;
    unsigned int                num_retries;
    struct jesd204_state_op         state_ops[__JESD204_MAX_OPS];
};

  • A SYSREF provider hooks itself in via the sysref_cb hook, but there must also be a device-tree property to mark that this is the SYSREF provider used in the topology.

  • Optionally, a driver may reserve some memory for private state data via sizeof_priv; this memory can be obtained via jesd204_dev_priv(jdev)

  • The links field is used to define JESD204 links in a static manner in the driver; these may go away if there aren’t any clear use-cases for them; but it could be that some devices allow only a fixed configuration, so these could be useful in those cases

  • max_num_links - maximum number of JESD204 links that this device supports; the actual number will be configured from the device-tree, but it shouldn’t exceed this number

  • num_retries - number of retries in case of error during an FSM start/link-bring-up

  • state_ops - more below

Each driver hooks in its callbacks by adding the proper entries to the state_ops array. The type of a state_op is defined as (may be subject to change):

/**
 * struct jesd204_state_op - JESD204 device per-state op
 * @mode        mode for this state op; depending on it, @per_device or @per_link is called
 * @per_device      op called for each JESD204 **device** during a transition
 * @per_link        op called for each JESD204 **link** individually during a transition
 * @post_state_sysref   true if a SYSREF should be issued after the state change
 */
struct jesd204_state_op {
    enum jesd204_state_op_mode  mode;
    jesd204_dev_cb          per_device;
    jesd204_link_cb         per_link;
    bool                post_state_sysref;
};

During a state-transition a state callback will be called:

  • once for each JESD204 link if the mode is default JESD204_STATE_OP_MODE_PER_LINK; in this case the per_link callback is called

  • once for each device (regardless of the number of JESD204 links per device) if mode is JESD204_STATE_OP_MODE_PER_DEVICE; in this case the per_device callback is called;

It is not yet clear whether it makes sense to call both per_link and per_device callbacks for a state. It could become an option at a later point in time.

Optionally, each state may request a SYSREF call, by setting post_state_sysref to true.
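
The difference between the two modes can be sketched in user space as follows. The types below are simplified stand-ins with hypothetical names; in particular, the callback signatures are reduced to the bare minimum and do not match the kernel's jesd204_dev_cb and jesd204_link_cb types. The per-link mode is given the value 0 here only to mirror the statement that it is the default.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the framework types; not the kernel definitions. */
enum sketch_op_mode { SKETCH_MODE_PER_LINK = 0, SKETCH_MODE_PER_DEVICE };

typedef int (*sketch_dev_cb)(void *dev);
typedef int (*sketch_link_cb)(void *dev, unsigned int link_idx);

struct sketch_state_op {
	enum sketch_op_mode mode;
	sketch_dev_cb per_device;
	sketch_link_cb per_link;
};

/*
 * Run one state's op for one device that takes part in 'num_links' links.
 * Returns the number of callback invocations, or the first negative error.
 */
static int sketch_run_op(const struct sketch_state_op *op, void *dev,
			 unsigned int num_links)
{
	unsigned int i;
	int ret, calls = 0;

	assert(op != NULL);
	if (op->mode == SKETCH_MODE_PER_DEVICE) {
		if (!op->per_device)
			return 0; /* no callback: the driver skips this state */
		ret = op->per_device(dev);
		return ret < 0 ? ret : 1;
	}

	if (!op->per_link)
		return 0;
	for (i = 0; i < num_links; i++) {
		ret = op->per_link(dev, i);
		if (ret < 0)
			return ret;
		calls++;
	}
	return calls;
}

/* Example callbacks: each invocation bumps an int counter passed as 'dev'. */
static int sketch_count_dev(void *dev)
{
	(*(int *)dev)++;
	return 1;
}

static int sketch_count_link(void *dev, unsigned int link_idx)
{
	(void)link_idx;
	(*(int *)dev)++;
	return 1;
}
```

For a device with three links, an op in per-link mode invokes its callback three times, while an op in per-device mode invokes its callback once.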

Why yet another kernel framework?

Before the introduction of the JESD204-FSM kernel framework, JESD204 link bring-up and management suffered from some known deficiencies incurred by the Linux driver model. To make the original challenges understandable, they are explained below together with their new solution.

Lack of common integrated management core (framework)

Prior to the new kernel framework, each converter driver required many Linux kernel common clock framework (CCF) clocks to be connected: one for the JESD lane clock, one for the JESD core/link clock (typically lane rate / 40), one for the converter clock and one for the SYSREF clock. Each converter driver implemented its own math to calculate the link and lane clocks from its configuration, which caused a lot of duplicated boilerplate code.

More problematic was that the link enable was done using the CCF clk_prepare_enable() API. This was convenient, since the CCF ensured that the parent of each clock was enabled before its children. But depending on the JESD204 link direction, this was not always the ideal sequence across the entire chain. Error propagation was also suboptimal, since an error code delivered to the driver that called clk_enable could have originated anywhere in the clock tree: the clock chip, the PHY layer, the link layer, etc.

Another issue was that, besides clk_enable and clk_set_rate(), there were other things to control, such as SYSREF N-SHOT mode, which wasn’t possible due to the lack of a proper API. Also, the CCF uses reference counting, so disabling a clock doesn’t necessarily disable it if it was enabled twice, possibly from a different device. There were further limitations; for example, controlling a clock from within a CCF clock implementation wasn’t possible due to the global CCF lock (spinlock). Last but not least, on 32-bit Linux systems the CCF rate is handled as a 32-bit value, which easily overflowed when the JESD204 lane rate was passed in Hz without scaling it down.

The new implementation still uses CCF clocks in their intended way, but no longer uses them for link bring-up and enable, which they weren’t intended for. With the new framework, the required clocks are automatically computed and checked. The framework implements the sequence, and error conditions are detected and handled.
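
The 32-bit rate problem is easy to reproduce in user space. The sketch below shows what happens to a 15 Gbps lane rate expressed in Hz when it is stored in a 32-bit variable. It also shows one possible way around it, carrying the value in kHz. The function names are hypothetical, and the kHz scaling is an illustrative assumption rather than a statement about what the drivers did.

```c
#include <assert.h>
#include <stdint.h>

/* What a 32-bit rate variable ends up holding when handed a rate in Hz. */
static uint32_t rate_as_u32(uint64_t rate_hz)
{
	return (uint32_t)rate_hz; /* keeps only the low 32 bits */
}

/* One possible workaround: carry the rate in kHz so that it fits. */
static uint32_t rate_hz_to_khz_u32(uint64_t rate_hz)
{
	uint64_t khz = rate_hz / 1000;

	assert(khz <= UINT32_MAX);
	return (uint32_t)khz;
}
```

A lane rate of 15000000000 Hz does not fit in 32 bits (the maximum is 4294967295), so rate_as_u32() returns a wrapped value, whereas the same rate in kHz is 15000000 and fits comfortably.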

More Information