

# NVMe-IP for Gen4 reference design manual

Rev1.2, 22-Dec-23

- 1 Overview
- 2 Hardware overview
  - 2.1 TestGen
  - 2.2 NVMe
    - 2.2.1 NVMe-IP
    - 2.2.2 Integrated Block for PCIe
    - 2.2.3 Dual port RAM
  - 2.3 CPU and Peripherals
    - 2.3.1 AsyncAxiReg
    - 2.3.2 UserReg
- 3 CPU Firmware
  - 3.1 Test firmware (nvmeiptest.c)
    - 3.1.1 Identify Command
    - 3.1.2 Write/Read Command
    - 3.1.3 SMART Command
    - 3.1.4 Flush Command
    - 3.1.5 Secure Erase Command
    - 3.1.6 Shutdown Command
  - 3.2 Function list in Test firmware
- 4 Example Test Result
- 5 Revision History



### 1 Overview

NVM Express (NVMe) is a specification that defines the interface between the host controller and solid state drive (SSD) through PCI Express. It optimizes the process of issuing commands and completions by utilizing only two registers (Command issue and Command completion), and enables parallel operation by supporting up to 64K commands within a single queue. This improves transfer performance for both sequential and random access.

In the PCIe SSD market, two standards are commonly used: AHCI and NVMe. AHCI is the older standard used for providing the interface to SATA hard disk drives while NVMe is optimized for non-volatile memory like SSDs. A detailed comparison between the AHCI and NVMe protocol is available in the "A Comparison of NVMe and AHCI" document at

https://sata-io.org/system/files/member-downloads/NVMe%20and%20AHCI_%20_long_.pdf

An example of an NVMe storage device can be found at http://www.nvmexpress.org/products/



Figure 1-1 NVMe protocol layer

To access an NVMe Gen4 SSD, a general system runs an NVMe driver on the processor, as shown on the left side of Figure 1-1. The physical connection of the NVMe standard is a PCIe connector, which is a one-to-one type: each PCIe host connects to one PCIe device without a PCIe switch. NVMe-IP implements the NVMe driver functions in pure hardware logic, so the user can access an NVMe SSD from the FPGA board without any processor or driver. Using pure hardware logic for the NVMe host controller reduces the overhead of the software-hardware handshake, resulting in high performance for both writing to and reading from the NVMe SSD.



### 2 Hardware overview



Figure 2-1 NVMe-IP for Gen4 demo hardware

The hardware modules in the test system can be categorized into three sections: test function (TestGen), NVMe function (CtmRAM, IdenRAM, U2IPFIFO, IP2UFIFO, NVMe-IP for Gen4, and PCIe block), and CPU system (CPU and LAxi2Reg).

The TestGen connects to the user interface of NVMe-IP for Gen4 and is responsible for generating the test data stream of the Write command and verifying the test data stream of the Read command. The write and read data streams are buffered in two FIFOs (U2IPFIFO and IP2UFIFO). The TestGen always writes or reads data when the FIFO is ready, so the transfer performance measured is the best that the system can achieve.

The NVMe function consists of the NVMe-IP for Gen4 and the PCIe hard IP (Integrated Block for PCI Express), enabling direct access to an NVMe Gen4 SSD without a PCIe switch connection. The command request and the parameters of each command, which are the inputs of NVMe-IP for Gen4, are controlled by the CPU through the LAxi2Reg module. The data interfaces for both Custom and Identify commands are connected to RAMs that are accessible by the CPU.



The CPU is connected to the LAxi2Reg module to interface with the NVMe test logic. Integrating the CPU into the test system allows the user to set the test parameters and monitor the test status via the Serial console. Using the CPU also facilitates the execution of multiple test cases to verify the functionality of the IP. The default firmware for the CPU includes functions for executing the NVMe commands by using the NVMe-IP for Gen4.

Figure 2-1 displays three clock domains: CpuClk, UserClk, and PCIeClk. CpuClk serves as the clock domain for the CPU and its peripherals, requiring a stable clock that can function independently from other hardware. UserClk is the clock domain utilized for the operation of the NVMe-IP for Gen4, RAM, and TestGen. According to the NVMe-IP for Gen4 datasheet, the frequency of UserClk must be greater than or equal to PCIeClk. The reference design utilizes 275 MHz for UserClk at PCIe Gen4 speed. PCIeClk, generated by PCIe hard IP, is synchronized with the 256-bit AXI4-stream and operates at 250 MHz for 4-lane PCIe Gen4. Further hardware details are described below.
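As a rough worked example of the clock relationship above, the raw bandwidth of a bus is its width in bytes multiplied by its clock frequency. The helper below is a hypothetical illustration (its name and the MB = 10^6 bytes convention are assumptions, not part of the reference design), and it ignores protocol overhead.

```c
#include <stdint.h>

/* Raw bus bandwidth in MB/s (1 MB = 10^6 bytes), assuming one transfer
 * of `bus_bits` bits on every clock cycle at `clk_mhz` MHz.
 * Hypothetical helper for illustration only. */
static uint32_t bus_bandwidth_mbps(uint32_t bus_bits, uint32_t clk_mhz)
{
    return (bus_bits / 8u) * clk_mhz;
}
```

With the figures quoted above, a 256-bit bus yields 8000 MB/s at 250 MHz (PCIeClk) and 8800 MB/s at 275 MHz (UserClk), which is consistent with the requirement that the user side runs at least as fast as the PCIe side.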

#### 2.1 TestGen



The TestGen module handles the data interface of NVMe-IP, facilitating data transfer for both Write and Read commands. In case of a Write command, TestGen sends 256-bit test data to NVMe-IP via U2IPFIFO. In contrast, for a Read command, the test data is received from IP2UFIFO for comparison with the expected value, ensuring data accuracy. Data bandwidth of TestGen is set to match that of NVMe-IP by running at the same clock and data bus size. The control logic ensures that the Write or Read enable is always asserted to 1b when the FIFO is ready to write or read data, respectively. This ensures that both U2IPFIFO and IP2UFIFO are always ready to transfer data with NVMe-IP without delay, providing the best performance for writing and reading data with the SSD through NVMe-IP.

To provide a flexible test environment, the user can set some test parameters to control the TestGen module such as total transfer size, transfer direction, and test pattern selector via the console. These test parameters are stored in the Register block. The detailed hardware logic of TestGen is illustrated in Figure 2-3.





Figure 2-3 TestGen hardware

In Figure 2-3, two key aspects of the system are depicted. The first part illustrates the control of data flow, while the second part details the generation of test data for use with the FIFO interface.

The upper portion of Figure 2-3 shows the control of data flow. Two signals, WrFfAFull and RdFfEmpty, are integral to the FIFO interface for flow control. When the FIFO is almost full (indicated by WrFfAFull=1b), the WrFfWrEn signal is set to 0b, pausing data transfer into the FIFO. In a read operation, when data is available within the FIFO (indicated by RdFfEmpty=0b), the system retrieves this data for comparison by setting RdFfRdEn to 1b. Both write and read operations are completed when the total transferred data matches the user-defined value. Consequently, the counter logic tracks the amount of data transferred during the command, and upon command completion, both WrFfWrEn and RdFfRdEn must be de-asserted.

The lower section of Figure 2-3 outlines the methods for generating test data, either for writing to the FIFO or for data verification. There are five available test patterns: all-zero, all-one, 32-bit incremental data, 32-bit decremental data, and LFSR. These patterns are selected by the Pattern Selector.

For the all-zero or all-one pattern, every bit of the data is set to zero or one, respectively. The other test patterns separate the data into two parts, a header and test data, to create unique test data within every 512-byte data, as shown in Figure 2-4.





Figure 2-4 Test pattern format in each 512-byte data for Increment/Decrement/LFSR pattern

Each 512-byte data block consists of a 64-bit header in Dword#0 and Dword#1, followed by the test data in the remaining words of the 512-byte data (Dword#2 – Dword#127). The header is created using the Address counter block, which operates in 512-byte units. The initial value of the Address counter is configured by the user and increases after transferring each 512-byte data.

The content of the remaining Dwords (DW#2 – DW#127) depends on the pattern selector, which could be 32-bit incremental data, 32-bit decremental data, or the LFSR pattern. The 32-bit incremental data is designed using the Data counter, while the decremental data can be created by connecting NOT logic to the incremental data. The LFSR pattern is generated by the LFSR counter, based on the equation $x^{31} + x^{21} + x + 1$. To generate 256-bit test data using the LFSR pattern, the data is divided into two sets of 128-bit data. Each set uses a different start value, as shown in Figure 2-5.
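The block layout described above can be sketched in C for the incremental and decremental patterns. This is a minimal software model, not the hardware implementation: the function and type names are hypothetical, and two details are assumptions that the text does not state, namely that the low half of the 64-bit header sits in Dword#0 and that the data counter value used for Dword#n is `data_cnt + n`. The LFSR pattern is left out because its bit ordering and seed selection are not specified here.

```c
#include <stdint.h>

#define DWORDS_PER_BLOCK 128u   /* 512 bytes / 4 bytes per Dword */

/* Selector values follow the Test Pattern Reg encoding (000b, 001b). */
enum pattern { PATT_INC = 0, PATT_DEC = 1 };

/* Fill one 512-byte block: 64-bit header in Dword#0-#1, test data in
 * Dword#2-#127.
 * Assumptions (not confirmed by the manual): the low 32 bits of the
 * address go to Dword#0, and Dword#n takes counter value data_cnt + n. */
static void fill_block(uint32_t blk[DWORDS_PER_BLOCK], uint64_t lba,
                       uint32_t data_cnt, enum pattern patt)
{
    blk[0] = (uint32_t)(lba & 0xFFFFFFFFu);  /* header, low half  */
    blk[1] = (uint32_t)(lba >> 32);          /* header, high half */
    for (uint32_t n = 2; n < DWORDS_PER_BLOCK; n++) {
        uint32_t inc = data_cnt + n;               /* incremental value     */
        blk[n] = (patt == PATT_DEC) ? ~inc : inc;  /* decrement = NOT(inc)  */
    }
}
```

Because the header carries the 512-byte address, two blocks written to different addresses differ even when the data counter restarts from the same value.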



Figure 2-5 256-bit LFSR Pattern in TestGen

Using a look-ahead technique, the test data generation process produces four 32-bit LFSR data or 128-bit data during each clock cycle. These 128-bit data sets are represented by the same color in Figure 2-5. The initial value of each data set is derived by combining certain bits of LBAAddr and LBAAddrB (a NOT logic operation of LBAAddr). This approach ensures that each 128-bit data set uses a unique start value.



The generated test data is then written to the FIFO as write data or used as expected data for verification with the read data from the FIFO. 'PattFail' flag is set to 1b when the data verification process fails. The timing diagram for writing data to the FIFO is shown below.



Figure 2-6 Timing diagram of Write operation in TestGen

- 1) The write operation is initiated by setting WrPattStart signal to 1b for a single clock cycle, which is followed by the assertion of rWrTrans to enable the control logic for generating write enable to FIFO.
- 2) If two conditions are satisfied (rWrTrans is asserted to 1b during the write operation and the FIFO is not full, indicated by WrFfAFull=0b), the write enable (rWrFfWrEn) to FIFO is asserted to 1b.
- 3) The write enable is fed back to the counter to count the total amount of data in the write operation.
- 4) If FIFO is almost full (WrFfAFull=1b), the write process is paused by de-asserting rWrFfWrEn to 0b.
- 5) The write operation is finished when the total data count (rDataCnt) is equal to the set value (rEndSize). At this point, both rWrTrans and rWrFfWrEn are de-asserted to 0b.
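
The five steps above can be modelled in software as a small state update that is called once per clock cycle. This is only a behavioural sketch: the field names mirror the signals in the text, but the struct, the function name, and the exact cycle alignment are assumptions and may differ from the timing in Figure 2-6.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of the TestGen write control (illustrative only). */
struct wr_ctrl {
    bool     rWrTrans;   /* write operation in progress        */
    bool     rWrFfWrEn;  /* write enable to FIFO               */
    uint64_t rDataCnt;   /* data words written so far          */
    uint64_t rEndSize;   /* total data words to write          */
};

/* Advance the model by one clock cycle. */
static void wr_ctrl_clock(struct wr_ctrl *c, bool WrPattStart, bool WrFfAFull)
{
    if (c->rWrFfWrEn)            /* step 3: count each word written */
        c->rDataCnt++;

    if (WrPattStart) {           /* step 1: start pulse opens the transfer */
        c->rWrTrans = true;
        c->rDataCnt = 0;
    }
    if (c->rWrTrans && c->rDataCnt >= c->rEndSize)
        c->rWrTrans = false;     /* step 5: all data has been written */

    /* steps 2 and 4: write only while transferring and FIFO not almost full */
    c->rWrFfWrEn = c->rWrTrans && !WrFfAFull;
}
```

Calling `wr_ctrl_clock` repeatedly with `rEndSize = 3` asserts the write enable for exactly three cycles in total, with any cycle where `WrFfAFull` is set inserting a pause rather than dropping data.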

For read transfer, the read enable of the FIFO is controlled by the empty flag of the FIFO. Unlike the write enable, the read enable signal is neither started by a start flag nor stopped by the total data count. When the read enable is asserted to 1b, the data counter and the address counter are increased for counting the total amount of data and generating the header of the expected value, respectively.



#### 2.2 NVMe



Figure 2-7 NVMe hardware

In the reference design, the NVMe-IP's user interface consists of a control interface and a data interface. The control interface receives commands and parameters from either the Custom command interface or dgIF typeS, depending on the type of command. For instance, Custom command interface is used when operating SMART, Secure Erase, or Flush command.

On the other hand, the data interface of NVMe-IP has four different interfaces with a data bus width of 256 bits. These interfaces include the Custom command RAM interface, Identify interface, FIFO input interface (dgIF typeS), and FIFO output interface (dgIF typeS). While the Custom command RAM interface is a bi-directional interface, the other interfaces are unidirectional. In the reference design, the Custom command RAM interface is used for one-directional data transfer when NVMe-IP sends SMART data to LAxi2Reg.

#### 2.2.1 NVMe-IP

The NVMe-IP implements the host side of the NVMe protocol to directly access an NVMe Gen4 SSD without a PCIe switch connection. It supports seven commands, i.e., Write, Read, Identify, Shutdown, SMART, Secure Erase, and Flush. More details of NVMe-IP are described in the datasheet.

https://dgway.com/products/IP/NVMe-IP/dg\_nvme\_ip\_data\_sheet\_g4\_en/



#### 2.2.2 Integrated Block for PCIe

This block is the hard IP integrated in certain Xilinx FPGAs to support PCIe Gen4 speed. It implements the Physical, Data Link, and Transaction Layers of the PCIe specification. More details are described in the following Xilinx documents.

PG213: UltraScale+ Devices Integrated Block for PCI Express https://www.xilinx.com/products/intellectual-property/pcie4-ultrascale-plus.html#documentation

PG343: Versal ACAP Integrated Block for PCI Express https://www.xilinx.com/products/intellectual-property/pcie-versal.html#documentation

The PCIe hard IP is created by using the IP wizard. It is recommended to select a "PCIe Block Location" that is close to the transceiver pins connected to the SSD. For the locations of the PCIe hard IP and the transceivers, see the following documents.

UG575: UltraScale and UltraScale+ FPGAs Packaging and Pinouts https://www.xilinx.com/support/documentation/user\_guides/ug575-ultrascale-pkg-pinout.pdf

AM013: Versal ACAP Packaging and Pinouts https://www.xilinx.com/support/documentation/architecture-manuals/am013-versal-pkg-pinout.pdf

The example of PCIe hard IP location on XCVC1902-VSVA2197 is shown in Figure 2-8.



Figure 22: VC1902 Banks in VSVA2197 Package

| GTY Quad 106<br>X0Y6<br>CD [L]        | HDIO<br>Bank 306<br>AA | HDIO<br>Bank 406<br>AB | GTY Quad 206<br>X1Y6<br>BG [R]        |
|---------------------------------------|------------------------|------------------------|---------------------------------------|
| GTY Quad 105<br>X0Y5<br>CC [L] (RCAL) | PCIE<br>X0Y2           | PCIE<br>X1Y2           | GTY Quad 205<br>X1Y5<br>BF [R]        |
| GTY Quad 104<br>X0Y4<br>CB [L] (RCAL) | PCIE<br>X0Y1           | MRMAC<br>X0Y3          | GTY Quad 204<br>X1Y4<br>BE [R]        |
| GTY Quad 103<br>X0Y3<br>CA [L]        | CPM4                   | MRMAC<br>X0Y2          | GTY Quad 203<br>X1Y3<br>BD [R] (RCAL) |
| CPM4                                  | CPM4                   | PCIE<br>X1Y0           | GTY Quad 202<br>X1Y2<br>BC [R] (RCAL) |
| LPDMIO<br>Bank 502                    | PMCDIO<br>Bank 503     | MRMAC<br>X0Y1          | GTY Quad 201<br>X1Y1<br>BB [R]        |
| PMCMIO/PMCDIO<br>Bank 500             | PMCMIO<br>Bank 501     | MRMAC<br>X0Y0          | GTY Quad 200<br>X1Y0<br>BA [R]        |

Figure 2-8 PCIe Hard IP Pin location



#### 2.2.3 Dual port RAM

Two dual port RAMs, CtmRAM and IdenRAM, store the data returned from the SMART command and the Identify command, respectively. IdenRAM has an 8 Kbyte size and is used to store the 8 KB output from the Identify command.

The data bus sizes of NVMe-IP and LAxi2Reg differ: NVMe-IP uses a 256-bit bus while LAxi2Reg uses a 32-bit bus. As a result, IdenRAM is configured as an asymmetric RAM with different bus sizes for its Write and Read interfaces.

NVMe-IP also has a double-word enable, which allows it to write only 32-bit data in certain cases. The RAM generated by the Xilinx IP tool supports a write byte enable, so a small logic circuit converts the double-word enable to a write byte enable, as shown in Figure 2-9.



Figure 2-9 Byte write enable conversion logic

The input to the AND logic is bit[0] of WrDWEn and the WrEn signal. The output of the AND logic is fed to bits[3:0] of IdenRAM byte write enable. Bit[1], [2], ..., [7] of WrDWEn are then applied to bits[7:4], [11:8], ..., [31:28] of IdenRAM write byte enable, respectively.
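
The conversion just described amounts to replicating each double-word enable bit four times. A minimal C model is shown below; the function name is hypothetical, and it assumes that every bit of WrDWEn is gated by WrEn in the same way the text describes for bit[0].

```c
#include <stdbool.h>
#include <stdint.h>

/* Expand the 8-bit double-word enable of a 256-bit bus into a 32-bit
 * byte write enable: bit[n] of WrDWEn drives byte-enable bits[4n+3:4n].
 * Assumption: all eight bits are qualified by WrEn, as stated for bit[0]. */
static uint32_t dwen_to_byteen(uint8_t WrDWEn, bool WrEn)
{
    uint32_t byte_en = 0;
    for (unsigned n = 0; n < 8; n++) {
        if (WrEn && ((WrDWEn >> n) & 1u))
            byte_en |= (uint32_t)0xFu << (4u * n);
    }
    return byte_en;
}
```

For example, enabling only Dword#0 and Dword#2 (WrDWEn = 0x05) with WrEn asserted produces a byte enable of 0x00000F0F.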

In contrast, CtmRAM is implemented as a true dual-port RAM with two read ports and two write ports, and with byte write enable. As with IdenRAM, a small logic circuit converts the double-word enable of the Custom interface to a byte write enable. The true dual-port RAM supports additional features when a customized Custom command requires data input. A simple dual-port RAM would be sufficient for the SMART command alone, since the SMART command only returns 512 bytes of data. However, CtmRAM is implemented as an 8KB RAM to support customized Custom commands.



#### 2.3 CPU and Peripherals

The CPU system uses a 32-bit AXI4-Lite bus as the interface to access peripherals such as the Timer and UART. The system also integrates an additional peripheral to access NVMe-IP test logic by assigning a unique base address and address range. To support CPU read and write operations, the hardware logic must comply with the AXI4-Lite bus standard. LAxi2Reg module, as shown in Figure 2-10, is designed to connect the CPU system via the AXI4-Lite interface, in compliance with the standard.



Figure 2-10 CPU and peripherals hardware

LAxi2Reg consists of AsyncAxiReg and UserReg. AsyncAxiReg converts AXI4-Lite signals into a simple Register interface with a 32-bit data bus, the same width as the AXI4-Lite data bus. It also includes asynchronous logic to handle clock domain crossing between the CpuClk and UserClk domains.

UserReg includes the register file of parameters and the status signals of other modules in the test system, including the CtmRAM, IdenRAM, NVMe-IP, and TestGen. More details of AsyncAxiReg and UserReg are explained below.



#### 2.3.1 AsyncAxiReg



Figure 2-11 AsyncAxiReg Interface

The signals on the AXI4-Lite bus interface can be grouped into five categories, i.e., LAxiAw\* (Write address channel), LAxiw\* (Write data channel), LAxiB\* (Write response channel), LAxiAr\* (Read address channel), and LAxir\* (Read data channel). More details on building custom logic for the AXI4-Lite bus can be found in the following document.

https://github.com/Architech-Silica/Designing-a-Custom-AXI-Slave-Peripheral/blob/master/designing\_a\_custom\_axi\_slave\_rev1.pdf

According to AXI4-Lite standard, the write channel and read channel operate independently for both control and data interfaces. Therefore, the logic within AsyncAxiReg to interface with AXI4-Lite bus is divided into four groups, i.e., Write control logic, Write data logic, Read control logic, and Read data logic, as shown in the left side of Figure 2-11. The Write control I/F and Write data I/F of the AXI4-Lite bus are latched and transferred to become the Write register interface with clock domain crossing registers. Similarly, the Read control I/F of AXI4-Lite bus is latched and transferred to the Read register interface, while Read data is returned from Register interface to AXI4-Lite bus via clock domain crossing registers. In the Register interface, RegAddr is a shared signal for write and read access, so it loads the value from LAxiAw for write access or LAxiAr for read access.

The Register interface is compatible with a single-port RAM interface for write transactions. The read transaction of the Register interface has been slightly modified from the RAM interface by adding the RdReq and RdValid signals to control read latency time. The address of the Register interface is shared for both write and read transactions, so the user cannot write and read the register simultaneously. The timing diagram of the Register interface is shown in Figure 2-12.





Figure 2-12 Register interface timing diagram

- 1) Timing diagram to write register is similar to that of a single-port RAM. The RegWrEn signal is set to 1b, along with a valid RegAddr (Register address in 32-bit units), RegWrData (write data for the register), and RegWrByteEn (write byte enable). The byte enable consists of four bits that indicate the validity of the byte data. For example, bit[0], [1], [2], and [3] are set to 1b when RegWrData[7:0], [15:8], [23:16], and [31:24] are valid, respectively.
- 2) To read register, AsyncAxiReg sets the RegRdReq signal to 1b with a valid value for RegAddr. The 32-bit data is returned after the read request is received. The slave detects the RegRdReq signal being set to start the read transaction. In the read operation, the address value (RegAddr) remains unchanged until RegRdValid is set to 1b. The address can then be used to select the returned data using multiple layers of multiplexers.
- 3) The slave returns the read data on RegRdData bus by setting the RegRdValid signal to 1b. After that, AsyncAxiReg forwards the read value to the LAxir\* interface.



#### 2.3.2 UserReg



Figure 2-13 UserReg Interface

The UserReg module consists of an Address decoder, a Register File, and a Register Mux. The Address decoder decodes the address requested by AsyncAxiReg and selects the active register for either write or read transactions. The assigned address range in UserReg is divided into six areas, as shown in Figure 2-13.

- 1) 0x0000 – 0x00FF: Mapped to set the command with the parameters of NVMe-IP and TestGen. This area is write-access only.
- 2) 0x0200 – 0x02FF: Mapped to set the parameters for the Custom command interface of NVMe-IP. This area is write-access only.
- 3) 0x0100 – 0x01FF: Mapped to read the status signals of NVMe-IP and TestGen. This area is read-access only.
- 4) 0x0300 – 0x03FF: Mapped to read the status of Custom command interface (NVMe-IP). This area is read-access only.
- 5) 0x2000 – 0x3FFF: Mapped to read data from IdenRAM. This area is read-access only.
- 6) 0x4000 – 0x5FFF: Mapped to write or read data using Custom command RAM interface. This area allows both write-access and read-access. However, the demo shows only read-access by running the SMART command.
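
The six areas listed above can be expressed as a simple decode function. This is a software sketch of the address map only; the enum and function names are hypothetical, and treating the gaps between areas as unmapped is an assumption, since the text does not say how the hardware responds to those addresses.

```c
#include <stdint.h>

/* Address areas of UserReg, named here for illustration only. */
enum reg_area {
    AREA_CTRL,        /* 0x0000 - 0x00FF: NVMe-IP/TestGen control (write only) */
    AREA_STATUS,      /* 0x0100 - 0x01FF: NVMe-IP/TestGen status (read only)   */
    AREA_CTM_CTRL,    /* 0x0200 - 0x02FF: Custom command parameters (write)    */
    AREA_CTM_STATUS,  /* 0x0300 - 0x03FF: Custom command status (read only)    */
    AREA_IDENRAM,     /* 0x2000 - 0x3FFF: IdenRAM data (read only)             */
    AREA_CTMRAM,      /* 0x4000 - 0x5FFF: CtmRAM data (read/write)             */
    AREA_UNMAPPED     /* assumption: anything else is not decoded              */
};

/* Classify a byte offset from the base address (BA). */
static enum reg_area decode_area(uint32_t offset)
{
    if (offset >= 0x6000u) return AREA_UNMAPPED;
    if (offset >= 0x4000u) return AREA_CTMRAM;
    if (offset >= 0x2000u) return AREA_IDENRAM;
    if (offset >= 0x0400u) return AREA_UNMAPPED;
    switch (offset >> 8) {            /* upper bits pick the 256-byte area */
    case 0:  return AREA_CTRL;
    case 1:  return AREA_STATUS;
    case 2:  return AREA_CTM_CTRL;
    default: return AREA_CTM_STATUS;  /* offset >> 8 == 3 */
    }
}
```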

The Address decoder decodes the upper bits of RegAddr to select the active hardware (NVMe-IP, TestGen, IdenRAM, or CtmRAM). The Register File within UserReg has a 32-bit bus size, so the write byte enable (RegWrByteEn) is not used in the test system, and the CPU uses a 32-bit pointer to set the hardware register.

To read a register, multi-level multiplexers (mux) select the data returned to the CPU based on the address. The lower bits of RegAddr are fed to each submodule to select the active data within that submodule, while the upper bits are used in UserReg to select which submodule's data is returned. Consequently, the read latency is two clock cycles, and RegRdValid is generated by delaying RegRdReq through two D flip-flops. More details of the address mapping within the UserReg module are shown in Table 2-1.



Table 2-1 Register Map

| Address   | Register Name<br>(Label in the "nvmeiptest.c") | Description |
|-----------|------------------------------------------------|-------------|
|           | 0x0000 – 0x00FF: Control signals of NVMe-IP and TestGen (Write access only) | |
| BA+0x0000 | User Address (Low) Reg<br>(USRADRL_INTREG) | [31:0]: Input to be bits[31:0] of start address as 512-byte unit<br>(UserAddr[31:0] of dgIF typeS) |
| BA+0x0004 | User Address (High) Reg<br>(USRADRH_INTREG) | [15:0]: Input to be bits[47:32] of start address as 512-byte unit<br>(UserAddr[47:32] of dgIF typeS) |
| BA+0x0008 | User Length (Low) Reg<br>(USRLENL_INTREG) | [31:0]: Input to be bits[31:0] of transfer length as 512-byte unit<br>(UserLen[31:0] of dgIF typeS) |
| BA+0x000C | User Length (High) Reg<br>(USRLENH_INTREG) | [15:0]: Input to be bits[47:32] of transfer length as 512-byte unit<br>(UserLen[47:32] of dgIF typeS) |
| BA+0x0010 | User Command Reg<br>(USRCMD_INTREG) | [2:0]: Input to be user command (UserCmd of dgIF typeS for NVMe-IP)<br>000b: Identify, 001b: Shutdown, 010b: Write SSD, 011b: Read SSD, 100b: SMART/Secure Erase, 110b: Flush, 101b/111b: Reserved<br>When this register is written, the command request is sent to NVMe-IP to start the operation. |
| BA+0x0014 | Test Pattern Reg<br>(PATTSEL_INTREG) | [2:0]: Select test pattern<br>000b-Increment, 001b-Decrement, 010b-All 0, 011b-All 1, 100b-LFSR |
| BA+0x0020 | NVMe Timeout Reg<br>(NVMTIMEOUT_INTREG) | [31:0]: Mapped to TimeOutSet[31:0] of NVMe-IP |
|           | 0x0100 – 0x01FF: Status signals of NVMe-IP and TestGen (Read access only) | |
| BA+0x0100 | User Status Reg<br>(USRSTS_INTREG) | [0]: UserBusy of dgIF typeS (0b: Idle, 1b: Busy)<br>[1]: UserError of dgIF typeS (0b: Normal, 1b: Error)<br>[2]: Data verification fail (0b: Normal, 1b: Error) |
| BA+0x0104 | Total disk size (Low) Reg<br>(LBASIZEL_INTREG) | [31:0]: Mapped to LBASize[31:0] of NVMe-IP |
| BA+0x0108 | Total disk size (High) Reg<br>(LBASIZEH_INTREG) | [15:0]: Mapped to LBASize[47:32] of NVMe-IP<br>[31]: Mapped to LBAMode of NVMe-IP |
| BA+0x010C | User Error Type Reg<br>(USRERRTYPE_INTREG) | [31:0]: Mapped to UserErrorType[31:0] of NVMe-IP to show error status |
| BA+0x0110 | PCIe Status Reg<br>(PCIESTS_INTREG) | [0]: PCIe linkup status from PCIe hard IP (0b: No linkup, 1b: linkup)<br>[3:2]: Two lower bits to show PCIe link speed of PCIe hard IP. MSB is bit[16].<br>(000b: Not linkup, 001b: PCIe Gen1, 010b: PCIe Gen2, 011b: PCIe Gen3, 111b: PCIe Gen4)<br>[7:4]: PCIe link width status from PCIe hard IP<br>(0001b: 1-lane, 0010b: 2-lane, 0100b: 4-lane, 1000b: 8-lane)<br>[13:8]: Current LTSSM State of PCIe hard IP. Please see more details of LTSSM value in Integrated Block for PCIe datasheet.<br>[16]: The upper bit to show PCIe link speed of PCIe hard IP. Two lower bits are bit[3:2]. |
| BA+0x0114 | Completion Status Reg<br>(COMPSTS_INTREG) | [15:0]: Mapped to AdmCompStatus[15:0] of NVMe-IP<br>[31:16]: Mapped to IOCompStatus[15:0] of NVMe-IP |
| BA+0x0118 | NVMe CAP Reg<br>(NVMCAP_INTREG) | [31:0]: Mapped to NVMeCAPReg[31:0] of NVMe-IP |
| BA+0x011C | NVMe IP Test pin Reg<br>(NVMTESTPIN_INTREG) | [31:0]: Mapped to TestPin[31:0] of NVMe-IP |



| Address     | Register Name                                                             | Description                                                                   |  |  |  |
|-------------|---------------------------------------------------------------------------|-------------------------------------------------------------------------------|--|--|--|
| Rd/Wr       | (Label in the "nvmeiptest.c")                                             |                                                                               |  |  |  |
|             | 0x0100 – 0x01FF: Status signals of NVMe-IP and TestGen (Read access only) |                                                                               |  |  |  |
| BA+0x0130 - | Expected value Word0-7 Reg                                                | 256-bit of the expected data at the 1st failure when executing Read           |  |  |  |
| BA+0x014F   | (EXPPATW0-W7_INTREG)                                                      | command.                                                                      |  |  |  |
|             |                                                                           | 0x0130: Bit[31:0], 0x0134[31:0]: Bit[63:32], ..., 0x014C[31:0]: Bit[255:224]  |  |  |  |
| BA+0x0150 - | Read value Word0-7 Reg                                                    | 256-bit of the read data at the 1st failure when executing Read command.      |  |  |  |
| BA+0x016F   | (RDPATW0-W7_INTREG)                                                       | 0x0150: Bit[31:0], 0x0154[31:0]: Bit[63:32], ..., 0x016C[31:0]: Bit[255:224]  |  |  |  |
| BA+0x0170   | Data Failure Address(Low) Reg                                             | [31:0]: Bit[31:0] of the byte address of the 1st failure when executing Read  |  |  |  |
|             | (RDFAILNOL_INTREG)                                                        | command.                                                                      |  |  |  |
| BA+0x0174   | Data Failure Address(High) Reg                                            | [24:0]: Bit[56:32] of the byte address of the 1st failure when executing Read |  |  |  |
|             | (RDFAILNOH_INTREG)                                                        | command.                                                                      |  |  |  |
| BA+0x0178   | Current test byte (Low) Reg                                               | [31:0]: Bit[31:0] of the current test data size in TestGen module             |  |  |  |
|             | (CURTESTSIZEL_INTREG)                                                     |                                                                               |  |  |  |
| BA+0x017C   | Current test byte (High) Reg                                              | [24:0]: Bit[56:32] of the current test data size of TestGen module            |  |  |  |
|             | (CURTESTSIZEH_INTREG)                                                     |                                                                               |  |  |  |
|             | Other interfaces (Custom command of NVMe-IP, IdenRAM, and Custom RAM)     |                                                                               |  |  |  |
| BA+0x0200-  | Custom Submission Queue Reg                                               | [31:0]: Submission queue entry of SMART, Secure Erase, and Flush              |  |  |  |
| BA+0x023F   |                                                                           | command                                                                       |  |  |  |
| Wr          | (CTMSUBMQ_STRUCT)                                                         | Input to be CtmSubmDW0-DW15 of NVMe-IP.                                       |  |  |  |
|             |                                                                           | 0x200: DW0, 0x204: DW1, ..., 0x23C: DW15                                      |  |  |  |
| BA+0x0300-  | Custom Completion Queue Reg                                               | [31:0]: CtmCompDW0-DW3 output from NVMe-IP                                    |  |  |  |
| BA+0x030F   |                                                                           | 0x300: DW0, 0x304: DW1, ..., 0x30C: DW3                                       |  |  |  |
| Rd          | (CTMCOMPQ_STRUCT)                                                         |                                                                               |  |  |  |
| BA+0x0800   | IP Version Reg                                                            | [31:0]: Mapped to IPVersion[31:0] of NVMe-IP                                  |  |  |  |
| Rd          | (IPVERSION_INTREG)                                                        |                                                                               |  |  |  |
| BA+0x2000-  | Identify Controller Data                                                  | 4KB Identify Controller Data structure                                        |  |  |  |
| BA+0x2FFF   |                                                                           |                                                                               |  |  |  |
| Rd          | (IDENCTRL_CHARREG)                                                        |                                                                               |  |  |  |
| BA+0x3000-  | Identify Namespace Data                                                   | 4KB Identify Namespace Data structure                                         |  |  |  |
| BA+0x3FFF   |                                                                           |                                                                               |  |  |  |
| Rd          | (IDENNAME_CHARREG)                                                        |                                                                               |  |  |  |
| BA+0x4000-  | Custom command Ram                                                        | Connect to 8KB CtmRAM interface for storing 512-byte data output from         |  |  |  |
| BA+0x5FFF   |                                                                           | SMART Command.                                                                |  |  |  |
| Wr/Rd       | (CTMRAM_CHARREG)                                                          |                                                                               |  |  |  |
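
The offsets listed in this part of the register map can be collected into C constants. The sketch below is only an illustration: the macro names and the `word_offset` helper are hypothetical and are not the labels defined in nvmeiptest.c, and the base address (BA) is left to the caller.

```c
#include <stdint.h>

/* Offsets from the base address (BA), copied from the register map above.
 * Macro names are illustrative, not the labels used in nvmeiptest.c. */
#define COMPSTS_OFFSET      0x0114u
#define NVMCAP_OFFSET       0x0118u
#define NVMTESTPIN_OFFSET   0x011Cu
#define EXPPATW0_OFFSET     0x0130u  /* Word0..Word7, 4 bytes apart */
#define RDPATW0_OFFSET      0x0150u  /* Word0..Word7, 4 bytes apart */
#define RDFAILNOL_OFFSET    0x0170u
#define RDFAILNOH_OFFSET    0x0174u
#define CURTESTSIZEL_OFFSET 0x0178u
#define CURTESTSIZEH_OFFSET 0x017Cu
#define CTMSUBMQ_OFFSET     0x0200u  /* DW0..DW15 */
#define CTMCOMPQ_OFFSET     0x0300u  /* DW0..DW3 */
#define IPVERSION_OFFSET    0x0800u
#define IDENCTRL_OFFSET     0x2000u  /* 4KB Identify Controller Data */
#define IDENNAME_OFFSET     0x3000u  /* 4KB Identify Namespace Data */
#define CTMRAM_OFFSET       0x4000u  /* 8KB CtmRAM window */

/* Offset of the 32-bit word `index` inside a block of consecutive registers. */
uint32_t word_offset(uint32_t block_offset, uint32_t index)
{
    return block_offset + 4u * index;
}
```

For example, `word_offset(EXPPATW0_OFFSET, 7)` gives 0x014C, the register that holds Bit[255:224] of the expected value.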



# 3 CPU Firmware

### 3.1 Test firmware (nvmeiptest.c)

The CPU follows these steps upon system startup to complete the initialization process.

- 1) Initialize UART and Timer settings.
- 2) Wait for the PCIe connection to become active (PCIESTS\_INTREG[0]=1b).
- 3) Wait for NVMe-IP to complete its own initialization process (USRSTS\_INTREG[0]=0b). If errors are encountered, the process will stop and display an error message.
- 4) Display the status of the PCIe link, including the number of lanes and the speed, by reading PCIESTS\_INTREG[16:2] status.
- 5) Display the main menu with options to run seven commands for NVMe-IP, i.e., Identify, Write, Read, SMART, Flush, Secure Erase, and Shutdown.
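
Step 4 can be illustrated with a small decoder for a value read from PCIESTS\_INTREG. This is a minimal sketch based only on the bit layout in the register map (speed code formed by bit[16] and bits[3:2], one-hot lane count in bits[7:4]); the function names are hypothetical and the actual firmware may decode the value differently.

```c
#include <stdint.h>

/* Returns the PCIe generation (1-4), or 0 for "not linkup" or an unlisted code.
 * The 3-bit speed code uses bit[16] as the upper bit and bits[3:2] as the
 * two lower bits, as described in the register map. */
int pcie_gen_from_sts(uint32_t pciests)
{
    uint32_t code = (((pciests >> 16) & 0x1u) << 2) | ((pciests >> 2) & 0x3u);

    switch (code) {
    case 0x1: return 1;   /* 001b: PCIe Gen1 */
    case 0x2: return 2;   /* 010b: PCIe Gen2 */
    case 0x3: return 3;   /* 011b: PCIe Gen3 */
    case 0x7: return 4;   /* 111b: PCIe Gen4 */
    default:  return 0;   /* 000b (not linkup) or a code not listed */
    }
}

/* Returns the lane count from the one-hot field in bits[7:4], or 0 if invalid. */
int pcie_lanes_from_sts(uint32_t pciests)
{
    switch ((pciests >> 4) & 0xFu) {
    case 0x1: return 1;
    case 0x2: return 2;
    case 0x4: return 4;
    case 0x8: return 8;
    default:  return 0;
    }
}
```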

More details on the sequence for each command in the CPU firmware are described in the following sections.

#### 3.1.1 Identify Command

The sequence for the firmware when the Identify command is selected by the user is as follows.

- 1) Set bits[2:0] of USRCMD\_INTREG to 000b to send the Identify command request to NVMe-IP. The busy flag (USRSTS\_INTREG[0]) will then change from 0b to 1b.
- 2) The CPU waits until the operation is completed or an error is detected by monitoring USRSTS\_INTREG[1:0].
  - Bit[0] is de-asserted to 0b when the command is completed. The data of Identify command returned by NVMe-IP will be stored in IdenRAM.
  - Bit[1] is asserted to 1b, indicating an error. In this case, the error message will be displayed on the console with details decoded from USRERRTYPE\_INTREG[31:0]. The process will then stop.
- 3) Once the busy flag (USRSTS\_INTREG[0]) is de-asserted to 0b, the CPU displays the information decoded from LBASIZEL/H\_INTREG, which includes the SSD capacity and LBA unit size. In addition, further information, such as the SSD model, can be retrieved from the IdenRAM (IDENCTRL\_CHARREG).
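
The request-and-wait pattern in steps 1 and 2 is shared by the commands that need no extra parameters. The sketch below shows it under stated assumptions: the `nvme_regs` structure, the enum names, and `run_simple_cmd` are hypothetical, the two pointers stand in for the mapped USRCMD\_INTREG and USRSTS\_INTREG addresses, and writing the whole register with only bits[2:0] set is an assumption about how the command field is programmed.

```c
#include <stdint.h>

/* Command codes written to bits[2:0] of USRCMD_INTREG, as given in section 3.1. */
enum {
    CMD_IDENTIFY = 0x0,  /* 000b */
    CMD_SHUTDOWN = 0x1,  /* 001b */
    CMD_WRITE    = 0x2,  /* 010b */
    CMD_READ     = 0x3,  /* 011b */
    CMD_SMART    = 0x4,  /* 100b */
    CMD_FLUSH    = 0x6   /* 110b */
};

/* Hypothetical register handles; on hardware these point at BA + offset. */
typedef struct {
    volatile uint32_t *usrcmd;  /* USRCMD_INTREG */
    volatile uint32_t *usrsts;  /* USRSTS_INTREG */
} nvme_regs;

/* Issue a command, then poll USRSTS_INTREG[1:0].
 * Returns 0 when busy (bit[0]) is de-asserted, -1 when error (bit[1]) is set. */
int run_simple_cmd(const nvme_regs *r, uint32_t cmd)
{
    *r->usrcmd = cmd & 0x7u;

    for (;;) {
        uint32_t sts = *r->usrsts;
        if (sts & 0x2u)
            return -1;              /* caller decodes USRERRTYPE_INTREG */
        if ((sts & 0x1u) == 0)
            return 0;               /* command completed */
    }
}
```

A caller would typically print the decoded error and stop when -1 is returned, matching the behavior described for Bit[1].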



#### 3.1.2 Write/Read Command

The sequence for the firmware when the Write/Read command is selected is as follows.

- 1) Receive the start address, transfer length, and test pattern from the serial console. If any input is invalid, the operation is cancelled.
  - Note: If LBA unit size = 4 KB, the start address and transfer length must align to 8.
- 2) After obtaining all the inputs, set them to USRADRL/H\_INTREG, USRLENL/H\_INTREG, and PATTSEL\_INTREG.
- 3) To execute either the Write or Read command, set bits[2:0] of USRCMD\_INTREG to 010b or 011b, respectively. This sends the command request to the NVMe-IP. Once the command is issued, the busy flag of NVMe-IP (USRSTS\_INTREG[0]) will change from 0b to 1b.
- 4) The CPU waits until the operation is completed or an error (excluding verification error) is detected by monitoring USRSTS\_INTREG[2:0].
  - Bit[0] is de-asserted to 0b when the command is completed.
  - Bit[1] is asserted to 1b, indicating an error. In this case, the error message will be displayed on the console with details decoded from USRERRTYPE\_INTREG[31:0]. The process will then stop.
  - Bit[2] is asserted when data verification fails. In this case, the verification error message will then be displayed on the console, but the CPU will continue to run until the operation is completed or the user inputs any key to cancel the operation.

While the command is running, the current transfer size, read from CURTESTSIZEL/H\_INTREG, is displayed every second.

- 5) Once the busy flag (USRSTS\_INTREG[0]) is de-asserted to 0b, the CPU calculates and displays the test result on the console, including the total time usage, total transfer size, and transfer speed.
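
The handling of USRSTS\_INTREG[2:0] in step 4 differs from the other commands because a verification failure does not end the transfer. A minimal sketch of that decision logic is shown below; the `wrrd_state` type and `wrrd_classify` function are hypothetical names, and the function only classifies one sampled status value rather than reproducing the firmware loop.

```c
#include <stdint.h>

typedef enum { WRRD_RUNNING, WRRD_DONE, WRRD_ERROR } wrrd_state;

/* Classify one sample of USRSTS_INTREG[2:0] during a Write/Read command.
 * Bit[2] (verification failure) is latched into *verify_failed but does not
 * stop the transfer; only bit[1] (error) or a cleared bit[0] (busy) ends it. */
wrrd_state wrrd_classify(uint32_t usrsts, int *verify_failed)
{
    if (usrsts & 0x4u)
        *verify_failed = 1;         /* report once, keep running */
    if (usrsts & 0x2u)
        return WRRD_ERROR;          /* decode USRERRTYPE_INTREG and stop */
    if ((usrsts & 0x1u) == 0)
        return WRRD_DONE;           /* busy de-asserted: show the result */
    return WRRD_RUNNING;
}
```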

#### 3.1.3 SMART Command

The sequence of the firmware when the SMART command is selected is as follows.

- 1) The 16-Dword of the Submission Queue entry (CTMSUBMQ\_STRUCT) is set to the SMART command value.
- 2) Set bits[2:0] of USRCMD\_INTREG to 100b to send the SMART command request to NVMe-IP. The busy flag (USRSTS\_INTREG[0]) will then change from 0b to 1b.
- 3) The CPU waits until the operation is completed or an error is detected by monitoring USRSTS\_INTREG[1:0].
  - Bit[0] is de-asserted to 0b after the operation is finished. The data of SMART command returned by NVMe-IP will be stored in CtmRAM.
  - Bit[1] is asserted to 1b, indicating an error. In this case, the error message will be displayed on the console with details decoded from USRERRTYPE\_INTREG[31:0]. The process will then stop.
- 4) After the busy flag (USRSTS\_INTREG[0]) is de-asserted to 0b, the CPU will display information decoded from CtmRAM (CTMRAM\_CHARREG), including Remaining Life, Percentage Used, Temperature, Total Data Read, Total Data Written, Power On Cycles, Power On Hours, and Number of Unsafe Shutdown.

For more information on the SMART log, refer to the NVM Express Specification: https://nvmexpress.org/specifications/
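
As an illustration of step 4, the sketch below decodes a few of the displayed fields from the 512-byte SMART data. The byte offsets follow the SMART / Health Information log page layout in the NVM Express Specification; the `smart_summary` structure and function names are hypothetical, and only the low 8 bytes of each 16-byte counter are read, which is an assumption made to keep the example short.

```c
#include <stdint.h>

typedef struct {
    int      temperature_c;       /* composite temperature, converted from Kelvin */
    unsigned percentage_used;
    uint64_t data_units_read;     /* low 64 bits of the 128-bit counter */
    uint64_t data_units_written;
    uint64_t power_cycles;
    uint64_t power_on_hours;
    uint64_t unsafe_shutdowns;
} smart_summary;

/* Read a little-endian 64-bit value byte by byte. */
static uint64_t le64(const volatile unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* `page` points at the start of the SMART data (e.g. the CtmRAM window). */
void decode_smart(const volatile unsigned char *page, smart_summary *out)
{
    unsigned kelvin = (unsigned)page[1] | ((unsigned)page[2] << 8);

    out->temperature_c      = (int)kelvin - 273;
    out->percentage_used    = page[5];
    out->data_units_read    = le64(page + 32);
    out->data_units_written = le64(page + 48);
    out->power_cycles       = le64(page + 112);
    out->power_on_hours     = le64(page + 128);
    out->unsafe_shutdowns   = le64(page + 144);
}
```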



#### 3.1.4 Flush Command

The sequence of the firmware when the Flush command is selected is as follows.

- 1) The 16-Dword of the Submission Queue entry (CTMSUBMQ\_STRUCT) is set to the Flush command value.
- 2) Set bits[2:0] of USRCMD\_INTREG to 110b to send the Flush command request to NVMe-IP. The busy flag of NVMe-IP (USRSTS\_INTREG[0]) will then change from 0b to 1b.
- 3) The CPU waits until the operation is completed or an error is detected by monitoring USRSTS\_INTREG[1:0].
  - Bit[0] is de-asserted to 0b after the operation is finished. The CPU will then return to the main menu.
  - Bit[1] is asserted to 1b, indicating an error. In this case, the error message will be displayed on the console with details decoded from USRERRTYPE\_INTREG[31:0]. The process will then stop.
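
Step 1 can be pictured as filling a 16-Dword buffer and copying it to CTMSUBMQ\_STRUCT. The sketch below is an assumption-laden illustration: it uses the Flush opcode (00h) and namespace ID placement defined by the NVM Express Specification, while the function names, the namespace ID passed by the caller, and the all-zero remaining Dwords are hypothetical and may not match the values programmed by nvmeiptest.c.

```c
#include <stdint.h>
#include <string.h>

#define SQ_DWORDS 16

/* Build a 16-Dword entry for an NVMe Flush command:
 * DW0[7:0] = opcode 00h, DW1 = namespace ID, all other Dwords zero. */
void build_flush_entry(uint32_t entry[SQ_DWORDS], uint32_t nsid)
{
    memset(entry, 0, SQ_DWORDS * sizeof entry[0]);
    entry[0] = 0x00u;   /* Flush opcode */
    entry[1] = nsid;    /* namespace to flush (assumed to be supplied by caller) */
}

/* Copy the entry to CTMSUBMQ_STRUCT (BA+0x0200 to BA+0x023F) with 32-bit writes. */
void write_submq(volatile uint32_t *ctmsubmq, const uint32_t entry[SQ_DWORDS])
{
    for (int i = 0; i < SQ_DWORDS; i++)
        ctmsubmq[i] = entry[i];
}
```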

#### 3.1.5 Secure Erase Command

The sequence of the firmware when the Secure Erase command is selected is as follows.

- 1) The 16-Dword of the Submission Queue entry (CTMSUBMQ\_STRUCT) is set to the Secure Erase command value.
- 2) Set NVMTIMEOUT\_INTREG to 0 to disable the timer and prevent a timeout error.
- 3) Set bits[2:0] of USRCMD\_INTREG to 100b to send the Secure Erase command request to NVMe-IP. The busy flag of NVMe-IP (USRSTS\_INTREG[0]) will then change from 0b to 1b.
- 4) The CPU waits until the operation is completed or an error is detected by monitoring USRSTS\_INTREG[1:0].
  - Bit[0] is de-asserted to 0b after the operation is finished. The CPU will then proceed to the next step.
  - Bit[1] is asserted to 1b, indicating an error. In this case, the error message will be displayed on the console with details decoded from USRERRTYPE\_INTREG[31:0]. The process will then stop.
- 5) After completing the command, the timer is re-enabled to generate timeout error in NVMe-IP by setting NVMTIMEOUT\_INTREG to the default value.
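
Steps 2 to 5 wrap the usual request-and-wait pattern with a timeout disable and restore. A minimal sketch follows, assuming the submission queue entry from step 1 has already been written; the `erase_regs` structure and `run_secure_erase` are hypothetical, and the default timeout is passed in by the caller because its value is not stated here.

```c
#include <stdint.h>

/* Hypothetical register handles; on hardware these point at BA + offset. */
typedef struct {
    volatile uint32_t *usrcmd;      /* USRCMD_INTREG */
    volatile uint32_t *usrsts;      /* USRSTS_INTREG */
    volatile uint32_t *nvmtimeout;  /* NVMTIMEOUT_INTREG */
} erase_regs;

/* Returns 0 on completion, -1 when the error flag is set.
 * On error the timer is left disabled, since the process stops at that point. */
int run_secure_erase(const erase_regs *r, uint32_t default_timeout)
{
    *r->nvmtimeout = 0;             /* step 2: disable the timeout timer */
    *r->usrcmd = 0x4u;              /* step 3: 100b requests the command */

    for (;;) {                      /* step 4: poll USRSTS_INTREG[1:0] */
        uint32_t sts = *r->usrsts;
        if (sts & 0x2u)
            return -1;
        if ((sts & 0x1u) == 0)
            break;
    }

    *r->nvmtimeout = default_timeout;   /* step 5: re-enable the timer */
    return 0;
}
```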

#### 3.1.6 Shutdown Command

The sequence of the firmware when the Shutdown command is selected is as follows.

- 1) Set bits[2:0] of USRCMD\_INTREG to 001b to send the Shutdown command request to NVMe-IP. The busy flag of NVMe-IP (USRSTS\_INTREG[0]) will then change from 0b to 1b.
- 2) The CPU waits until the operation is completed or an error is detected by monitoring USRSTS\_INTREG[1:0].
  - Bit[0] is de-asserted to 0b after the operation is finished. The CPU will then proceed to the next step.
  - Bit[1] is asserted to 1b, indicating an error. The error message will be displayed on the console with details decoded from USRERRTYPE\_INTREG[31:0]. The process will then stop.
- 3) After Shutdown command completes, both the SSD and NVMe-IP will become inactive and the CPU will be unable to receive any new commands from the user. To continue testing, the user must power off and power on the system.



### 3.2 Function list in Test firmware

| int exec_ctm(unsigned int user_cmd) |                                                                                                                                           |  |  |
|-------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|--|--|
| Parameters                          | user_cmd: 4-SMART command, 6-Flush command                                                                                                |  |  |
| Return value                        | 0: No error, -1: Some errors are found in the NVMe-IP                                                                                     |  |  |
| Description                         | Execute SMART command as outlined in section 3.1.3 (SMART Command) or execute Flush command as outlined in section 3.1.4 (Flush Command). |  |  |

| unsigned long long get_cursize(void) |                                                                                                                             |  |  |
|--------------------------------------|-----------------------------------------------------------------------------------------------------------------------------|--|--|
| Parameters                           | None                                                                                                                        |  |  |
| Return value                         | Read value of CURTESTSIZEL/H_INTREG                                                                                         |  |  |
| Description                          | The value of CURTESTSIZEL/H_INTREG is read and converted to byte units before being returned as the result of the function. |  |  |
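
The first half of get_cursize, combining the two 32-bit registers into one 64-bit value, can be sketched as follows. The function name is hypothetical, and the conversion to byte units is left out because the scale factor is not stated here.

```c
#include <stdint.h>

/* Combine CURTESTSIZEL_INTREG (Bit[31:0]) and CURTESTSIZEH_INTREG
 * (Bit[56:32], only bits [24:0] of the register are valid) into one value. */
uint64_t combine_size(uint32_t low, uint32_t high)
{
    return ((uint64_t)(high & 0x01FFFFFFu) << 32) | low;
}
```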

| int get_param(userin_struct* userin) |                                                                          |  |
|--------------------------------------|--------------------------------------------------------------------------|--|
| Parameters                           | userin: Three inputs from user, i.e., start address, total length i      |  |
|                                      | unit, and test pattern                                                   |  |
| Return value                         | 0: Valid input, -1: Invalid input                                        |  |
| Description                          | Receive the input parameters from the user and verify the value. When    |  |
|                                      | the input is invalid, the function returns -1. Otherwise, all inputs are |  |
|                                      | updated to userin parameter.                                             |  |

| void iden_dev(void) |                                                                           |  |  |
|---------------------|---------------------------------------------------------------------------|--|--|
| Parameters          | None                                                                      |  |  |
| Return value        | None                                                                      |  |  |
| Description         | Execute Identify command as outlined in section 3.1.1 (Identify Command). |  |  |

| int setctm_erase(void) |                                                                                                         |  |  |
|------------------------|---------------------------------------------------------------------------------------------------------|--|--|
| Parameters             | None                                                                                                    |  |  |
| Return value           | 0: No error, -1: Some errors are found in the NVMe-IP                                                   |  |  |
| Description            | Set Secure Erase command to CTMSUBMQ_STRUCT and call exec_ctm function to execute Secure Erase command. |  |  |

| int setctm_flush(void) |                                                                                             |  |
|------------------------|---------------------------------------------------------------------------------------------|--|
| Parameters             | None                                                                                        |  |
| Return value           | 0: No error, -1: Some errors are found in the NVMe-IP                                       |  |
| Description            | Set Flush command to CTMSUBMQ_STRUCT and call the execution function to run Flush command.  |  |

| int setctm_smart(void) |                                                                                                                                                          |  |  |
|------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|
| Parameters             | None                                                                                                                                                     |  |  |
| Return value           | 0: No error, -1: Some errors are found in the NVMe-IP                                                                                                    |  |  |
| Description            | Set SMART command to CTMSUBMQ_STRUCT and call exec_ctm function to run SMART command. Finally, decode and display SMART information on the console.      |  |  |



| void show_error(void) |                                                                                                                                                                       |  |
|-----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| Parameters            | None                                                                                                                                                                  |  |
| Return value          | None                                                                                                                                                                  |  |
| Description           | Read USRERRTYPE_INTREG, decode the error flag, and display the corresponding error message. Also, call show_pciestat function to check the hardware's debug signal.  |  |

| void show_pciestat(void) |                                                                                                                                                                                                         |  |
|--------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| Parameters               | None                                                                                                                                                                                                    |  |
| Return value             | None                                                                                                                                                                                                    |  |
| Description              | Read PCIESTS_INTREG repeatedly until two consecutive reads return the same value. After that, display the read value on the console. Also, the debug signal is read from NVMTESTPIN_INTREG and displayed on the console. |  |

| void show_result(void) |                                                                                                                                                                                                                                                                                     |  |
|------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| Parameters             | None                                                                                                                                                                                                                                                                                |  |
| Return value           | None                                                                                                                                                                                                                                                                                |  |
| Description            | Print total transfer size by calling get_cursize and show_size functions. After that, calculate total time usage from global parameters (timer_and timer_upper_val) and then display in usec, msec, or sec unit. Final transfer performance is calculated and displayed in MB/s unit. |  |
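
The arithmetic behind show_result can be sketched as below. The function names, the thresholds used to choose the time unit, and the definition of 1 MB as 10^6 bytes are all assumptions; the firmware's own rounding and unit choices may differ.

```c
#include <stdint.h>

/* Transfer speed in MB/s, assuming 1 MB = 10^6 bytes: bytes per microsecond
 * is numerically equal to MB per second. Returns 0 when no time has elapsed. */
uint64_t transfer_mb_per_s(uint64_t bytes, uint64_t elapsed_usec)
{
    if (elapsed_usec == 0)
        return 0;
    return bytes / elapsed_usec;
}

/* Pick usec, msec, or sec for display and store the scaled (truncated) value. */
const char *time_unit(uint64_t elapsed_usec, uint64_t *value)
{
    if (elapsed_usec >= 1000000u) {
        *value = elapsed_usec / 1000000u;
        return "sec";
    }
    if (elapsed_usec >= 1000u) {
        *value = elapsed_usec / 1000u;
        return "msec";
    }
    *value = elapsed_usec;
    return "usec";
}
```

For instance, 64,000,000,000 bytes transferred in 8,000,000 usec gives 8000 MB/s under these assumptions.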

| void show_size(unsigned long long size_input) |                                                         |  |  |
|-----------------------------------------------|---------------------------------------------------------|--|--|
| Parameters                                    | size_input: transfer size to display on the console     |  |  |
| Return value                                  | None                                                    |  |  |
| Description                                   | Calculate and display the input value in MB or GB unit. |  |  |

| void show_smart_hex16byte(volatile unsigned char *char_ptr) |                                                 |  |  |
|-------------------------------------------------------------|-------------------------------------------------|--|--|
| Parameters                                                  | *char_ptr: pointer of 16-byte SMART data        |  |  |
| Return value                                                | None                                            |  |  |
| Description                                                 | Display 16-byte SMART data as hexadecimal unit. |  |  |

| void show_smart_int8byte(volatile unsigned char *char_ptr) |                                                                              |  |
|------------------------------------------------------------|------------------------------------------------------------------------------|--|
| Parameters                                                 | *char_ptr: pointer of 8-byte SMART data                                      |  |
| Return value                                               | None                                                                         |  |
| Description                                                | When the input value is less than 4 billion (32-bit), the 8-byte SMART       |  |
|                                                            | data is displayed in decimal unit. If the input value exceeds this limit, an |  |
|                                                            | overflow message is displayed.                                               |  |

| void show_smart_size8byte(volatile unsigned char *char_ptr) |                                                                                                                                          |  |
|-------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|--|
| Parameters                                                  | *char_ptr: pointer of 8-byte SMART data                                                                                                  |  |
| Return value                                                | None                                                                                                                                     |  |
| Description                                                 | Display 8-byte SMART data in GB or TB unit. When the input value exceeds the limit (500 PB), an overflow message is displayed instead.    |  |



| void show_vererr(void) |                                                                                        |  |  |
|------------------------|----------------------------------------------------------------------------------------|--|--|
| Parameters             | None                                                                                   |  |  |
| Return value           | None                                                                                   |  |  |
| Description            | Read RDFAILNOL/H_INTREG (error byte address), EXPPATW0-W7_INTREG (expected value), and |  |  |
|                        | RDPATW0-W7_INTREG (read value) to display verification error details                   |  |  |
|                        | on the console.                                                                        |  |  |
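
When the expected and read values are displayed side by side, it can help to mark which of the eight 32-bit words differ. The sketch below is one possible way to do that and is not taken from the firmware; `mismatch_mask` is a hypothetical helper that operates on copies of EXPPATW0-W7\_INTREG and RDPATW0-W7\_INTREG.

```c
#include <stdint.h>

#define PAT_WORDS 8   /* 256-bit pattern = eight 32-bit words */

/* Returns a bit mask where bit i is set when word i of the read data
 * differs from word i of the expected data. */
unsigned mismatch_mask(const uint32_t expected[PAT_WORDS],
                       const uint32_t actual[PAT_WORDS])
{
    unsigned mask = 0;
    for (int i = 0; i < PAT_WORDS; i++) {
        if (expected[i] != actual[i])
            mask |= 1u << i;
    }
    return mask;
}
```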

| void shutdown_dev(void) |                                                                          |  |  |
|-------------------------|--------------------------------------------------------------------------|--|--|
| Parameters              | None                                                                     |  |  |
| Return value            | None                                                                     |  |  |
| Description             | Execute Shutdown command as outlined in section 3.1.6 (Shutdown Command) |  |  |

| int wrrd_dev(unsigned int user_cmd) |                                                                                                                                                                                                       |  |
|-------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| Parameters                          | user_cmd: 2-Write command, 3-Read command                                                                                                                                                             |  |
| Return value                        | 0: No error, -1: Receive invalid input or some errors are found.                                                                                                                                      |  |
| Description                         | Execute Write command or Read command as outlined in section 3.1.2 (Write/Read Command). In this function, 'show_result' is called to compute and display transfer performance in Write/Read command. |  |



# 4 Example Test Result

The performance results of executing Write and Read commands on the demo system using a 1 TB Samsung 990 Pro and the NVMe-IP for Gen4 with 1 MB buffer are shown in Figure 4-1. The test utilizes an LFSR pattern for data and specifies a transfer size of 64 GB.



Figure 4-1 Test Performance of NVMe-IP for Gen4 demo using 1 TB Samsung 990 Pro SSD

Utilizing the VCK190 board with PCIe Gen4, the measured write performance reached approximately 7,500 MB/sec, while the read performance achieved approximately 7,000 MB/sec.



# 5 Revision History

| Revision | Date      | Description                                                       |
|----------|-----------|-------------------------------------------------------------------|
| 1.02     | 22-Dec-23 | Update function list and performance result, and add Secure Erase feature. |
| 1.01     | 16-Aug-22 | Update firmware                                                   |
| 1.00     | 15-Sep-21 | Initial Release                                                   |

Copyright: 2021 Design Gateway Co., Ltd.