
Cryring BPM Gateware Documentation
- Resources
1 Introduction
- 1.1 Measurement principle
- 1.2 Processing hardware
2 BPM algorithm
3 Peripheral devices
4 Gateware implementation
5 Build flow and simulation
6 Gateware software interface
7 Extended gateware software interface
- 7.1 Extended register map
  - 7.1.1 Additional configuration registers
  - 7.1.2 Additional status registers
- 7.2 Architecture information storage
8 Test software
- 8.1 FPGA Observer
- 8.2 Test scripts
  - 8.2.1 PCIe access test script
9 Helper scripts
10 Continuous integration environment
11 Programming and hardware configuration
12 Hardware properties
13 Test coverage
- 13.1 BPM algorithm
  - 13.1.1 Simulation
  - 13.1.2 Using a function generator as data source
- 13.2 Reliability tests
14 System limitations and peculiarities
- 14.1 CPU unit reboot
- 14.2 PLL unlock on FMC boards
15 References

Cryring BPM Gateware Documentation

This documentation was automatically generated from LaTeX. Any changes should be applied to the LaTeX sources in the Git repository https://git.gsi.de/BEA_HDL/Cryring_BPM_Gateware
The LaTeX sources can be converted to the DokuWiki format using the script doc/scripts/create_dokuwiki.py (see chapter 10.5.3).

The documentation is also available as a PDF: Cryring_BPM_Gateware_Documentation.pdf

Resources

All of the code of this project, the helper scripts and also the source of this documentation are under version control in a Git repository whose upstream is: https://git.gsi.de/BEA_HDL/Cryring_BPM_Gateware. The relevant branch is master.

Additional datasheets and papers are included as a Git submodule of the main Git repository. The upstream of the submodule is: https://git.gsi.de/BEA_HDL/datasheets and the relevant branch is master.

Installation scripts to set up a Gitlab runner for continuous integration including all necessary software to build the gateware can be found in a Git repository whose upstream is: https://git.gsi.de/BEA_HDL/Gitlab_Runner_Setup_Centos_7. The relevant branch is master.

1 Introduction

This document describes the gateware (= FPGA firmware) implementation of the Beam Position Monitor (BPM) for the Cryring accelerator at GSI. The term Trajectory Measurement System (TMS) is also common for this system and is used as a synonym for BPM. The BPM measures the horizontal and vertical beam positions at nine locations around the accelerator ring, resulting in 18 position results.

There had been a previous implementation by Piotr Miedzik, but since no documentation could be found besides a conference paper [2], it was decided to reimplement the gateware.

1.1 Measurement principle

At each of the 18 measurement spots, two capacitor plates detect the electrostatic induction of the passing charged particle bunches.

Figure 1.1: Mechanical drawing of a single BPM. The two segments of the slotted tube are the capacitor plates. The box on the top is the amplifier. Image origin: [3]

The 36 voltages of the capacitor plates are amplified and led via coaxial cables to a single evaluation point, where the analog-to-digital conversion and the digital processing take place.

Figure 1.2: Cryring BPM system overview. This document only describes the implementation of the block labeled 125 MSa/s ADC DAQ System (36 ch.), excluding the software part. Image origin: [3]

Each beam position is calculated from the voltages of two related capacitor plates using the algorithm described in chapter 2.

1.2 Processing hardware

Each of the 36 voltages coming from the amplifiers at the capacitor plates is sampled by a Renesas ISLA216P ADC at a sampling rate of 125 MHz with a resolution of 16 bits. Four ADCs are placed on each FMC board. Two FMC boards (only one on the last carrier board) are mounted on each AFC carrier board, which is equipped with a Xilinx Artix XC7A200T FPGA for data processing [4].

Figure 1.3: An AFC carrier board with two mounted ADC FMC boards. The FPGA is located under the blue heat spreader.

The whole system uses five AFC carrier boards, which are mounted in a MicroTCA crate together with a timing receiver and a CPU unit for post-processing. Each of the five FPGAs is responsible for processing up to eight ADC data streams. The communication between the CPU unit and the FPGAs takes place via PCI Express over the so-called backplane of the MicroTCA crate.

Figure 1.4: MicroTCA crate with from left to right: power supply, MCH, CPU unit, timing receiver, 5 AFC boards with 9 mounted FMC ADC boards, second MCH

This document describes the gateware of the FPGAs on the five AFC carrier boards. The gateware is identical regardless of the number of mounted FMC boards.

2 BPM algorithm

The beam position $x$ is calculated from the measured voltages $U_1$ and $U_2$ of two corresponding plates:

$$x = k \cdot \frac{\Delta - a}{\Sigma}$$

with

$$\Delta = U_1 - U_2$$

and

$$\Sigma = U_1 + U_2$$

where $k$ is a proportionality factor influenced by the dimensions of the measurement system, $a$ is a possible voltage offset and $x$ is the beam position.

2.1 Capacitance correction

The capacitances of the two corresponding capacitor plates can differ from their nominal values, so one of the voltages has to be corrected by multiplying it with a correction factor $c$:

$$U_2' = c \cdot U_2$$

The default value of $c$ in the gateware is 1. It is configurable by the software via register accesses.

2.2 Least squares algorithm

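A linear least squares approach is used to reduce measurement errors. The choice of the algorithm is described in [1]. The optimal approach would be an orthogonal least squares algorithm. Since the relative error of the difference signal $\Delta$ dominates that of the sum signal $\Sigma$, it can be simplified to a vertical least squares algorithm over $n$ sample pairs $(\Sigma_i, \Delta_i)$:

Minimizing

$$S = \sum_{i=1}^{n} \left( \Delta_i - \frac{x}{k} \Sigma_i - a \right)^2$$

via partial differentiation

$$\frac{\partial S}{\partial x} = 0$$

and

$$\frac{\partial S}{\partial a} = 0$$

leads to

$$x = k \cdot \frac{n \sum \Sigma_i \Delta_i - \sum \Sigma_i \sum \Delta_i}{n \sum \Sigma_i^2 - \left( \sum \Sigma_i \right)^2}$$

Equation 2.1: BPM algorithm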

2.3 Averaging

To further reduce the data rate and the measurement noise, the result of the least squares algorithm is averaged over an adjustable number of samples $M$. This is implemented as a simple block average:

$$\bar{x} = \frac{1}{M} \sum_{j=1}^{M} x_j$$

2.4 Control signals

2.4.1 Gate

There is an input signal coming from a timing receiver that gates the calculation of the least squares algorithm. A calculation will start with the low to high transition of the gate signal and will be repeated continuously until a high to low transition is detected, after which the current calculation will still be completed.

2.4.2 RF pulse

The RF pulse signal is intended to synchronize the calculation of the least squares algorithm with the frequency of the particle bunches. Whenever an RF pulse is detected, a calculation in progress is finished and a new calculation is started.

2.5 Parameters

2.5.1 Least squares algorithm calculation length

This parameter defines the number of ADC samples that the least squares algorithm takes into account if no RF pulses are present. The detection of an RF pulse overrides this parameter. The override only works as expected if the calculation length is set to a value longer than the period of the RF pulses.

The available range of values for the calculation length is 3 to 65536.
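The interplay of the gate, the RF pulses and the calculation length can be illustrated with a small behavioural model. This is a sketch, not the gateware's VHDL. The function name calculation_periods and the representation of gate and RF as per-sample 0/1 sequences are assumptions made for illustration. Corner cases, such as an RF pulse on the very first sample of a calculation, may be handled differently in hardware.

```python
from typing import List, Optional, Sequence, Tuple


def calculation_periods(gate: Sequence[int], rf: Sequence[int],
                        calc_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of the least squares calculations.

    Hypothetical behavioural sketch: gate and rf hold one 0/1 value per ADC sample.
    """
    periods = []
    start: Optional[int] = None   # sample index at which the running calculation started
    prev_gate = 0
    for i, (g, pulse) in enumerate(zip(gate, rf)):
        if start is None:
            # A low to high transition of the gate starts a calculation.
            if g and not prev_gate:
                start = i
        elif pulse or i - start == calc_len:
            # An RF pulse or the configured length finishes the running calculation.
            periods.append((start, i))
            # Start the next calculation right away only while the gate is still high.
            start = i if g else None
        prev_gate = g
    return periods


# Gate high for samples 1..8, no RF pulses, calculation length 3:
print(calculation_periods([0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0], [0] * 14, 3))
```

Note that the last calculation in the example, (7, 10), ends after the gate has already dropped. This matches the rule that the current calculation is still completed after the high to low transition.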

2.5.2 Averaging length

This parameter defines the number of least squares algorithm results that the averaging algorithm takes into account.

The available range of values for the averaging length is 1 to 1048576.

3 Peripheral devices

There are three different peripheral devices on each of the FMC ADC boards that have to be configured by the gateware. Since they have no persistent storage, they have to be configured after every power cycle:

Si571 programmable VCXO
AD9510 PLL and clock distribution
ISLA216P ADC

3.1 Si571 programmable VCXO

The Si571 programmable VCXO is connected via I2C using 0x49 as device address. Additionally, there is an OE (output enable) pin, which has to be driven high or left unconnected since it provides an internal pullup. The device supports a maximum I2C bus speed of 400 kbit/s.

The startup frequency before configuration via I2C is 155.52 MHz. The Si571 is located below the heat spreader of the FMC board, which has to be unscrewed to read the labeling:

SiLabs 571
AJC000337 G
D09JW702+

The part properties can be decoded by providing the part number 571AJC000337 on a SiLabs web page [5]:

Product: Si571
Description: Differential/single-ended I2C programmable VCXO; 10-1417 MHz
Frequency A: 155.52 MHz
I2C Address (Hex Format): 49
Format: LVPECL
Supply Voltage: 3.3 V
OE Polarity: OE active high
Temperature Stability: 20 ppm
Tuning Slope: 135 ppm/V
Minimum APR: +/- 130 ppm
Frequency Range: 10 - 280 MHz
Operating Temp Range (C): -40 to +85

A datasheet can be found on the SiLabs website [6].

3.1.1 Programming the frequency

There are three adjustable parameters that define the output frequency:

$$f_{out} = \frac{f_{XTAL} \cdot \mathrm{RFREQ}}{\mathrm{HSDIV} \cdot \mathrm{N1}}$$

where

$f_{XTAL}$ is the fixed internal quartz frequency of 114.285 MHz +/- 2000 ppm.
$f_{DCO} = f_{XTAL} \cdot \mathrm{RFREQ}$ has to be in the range 4.85 GHz to 5.67 GHz.
allowed values for $\mathrm{HSDIV}$ are 4, 5, 6, 7, 9, 11
allowed values for $\mathrm{N1}$ are 1 and all even numbers in $[2, 128]$

The three parameters should be chosen such that $f_{DCO}$ is minimal, in order to reduce power consumption. If several choices remain, $\mathrm{HSDIV}$ should be chosen as large as possible.

For a desired output frequency of 125 MHz the optimum values are:

$\mathrm{HSDIV}$ = 5
$\mathrm{N1}$ = 8
$\mathrm{RFREQ}$ = 43.750273439
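The selection rule above can be checked with a short script. It is a sketch under the rules stated in this chapter only: $f_{DCO}$ must lie between 4.85 and 5.67 GHz, and only the listed HSDIV and N1 values are allowed. The function and constant names are hypothetical. The script uses the nominal quartz frequency, i.e. it ignores the factory calibration described below.

```python
F_XTAL_MHZ = 114.285                       # nominal internal quartz frequency
F_DCO_MIN_MHZ, F_DCO_MAX_MHZ = 4850.0, 5670.0
HSDIV_VALUES = (4, 5, 6, 7, 9, 11)
N1_VALUES = (1,) + tuple(range(2, 129, 2))  # 1 and all even numbers up to 128


def choose_dividers(f_out_mhz: float):
    """Return (hsdiv, n1, rfreq) with minimal f_DCO; ties are broken by maximal HSDIV."""
    best = None
    for hsdiv in HSDIV_VALUES:
        for n1 in N1_VALUES:
            f_dco = f_out_mhz * hsdiv * n1
            if not F_DCO_MIN_MHZ <= f_dco <= F_DCO_MAX_MHZ:
                continue
            key = (f_dco, -hsdiv)          # minimise f_DCO first, then maximise HSDIV
            if best is None or key < best[0]:
                best = (key, hsdiv, n1)
    if best is None:
        raise ValueError(f"{f_out_mhz} MHz cannot be generated")
    _, hsdiv, n1 = best
    return hsdiv, n1, f_out_mhz * hsdiv * n1 / F_XTAL_MHZ


hsdiv, n1, rfreq = choose_dividers(125.0)
print(hsdiv, n1, f"{rfreq:.9f}")   # 5 8 43.750273439
```

For 125 MHz, HSDIV = 4 with N1 = 10 gives the same $f_{DCO}$ of 5 GHz. The tie-break on the larger HSDIV is what selects HSDIV = 5 and N1 = 8.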

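Since the nominal quartz frequency has an inaccuracy of 2000 ppm, one should first read the initial value $\mathrm{RFREQ}_0$ and calculate

$$\mathrm{RFREQ}_{new} = \mathrm{RFREQ}_0 \cdot \frac{f_{new} \cdot \mathrm{HSDIV}_{new} \cdot \mathrm{N1}_{new}}{f_0 \cdot \mathrm{HSDIV}_0 \cdot \mathrm{N1}_0}$$

with the startup frequency $f_0$ = 155.52 MHz, in order to get a more accurate result. $\mathrm{RFREQ}_0$ is factory calibrated to compensate for the actual frequency offset of $f_{XTAL}$.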
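The following sketch shows why scaling the factory value works. Because $\mathrm{RFREQ}_0$ already absorbs the deviation of the real quartz, the unknown $f_{XTAL}$ cancels out of the result. The crystal deviation of 1500 ppm and the factory dividers HSDIV = 4 and N1 = 8 for the 155.52 MHz startup frequency are hypothetical example values, not read from a real device.

```python
def corrected_rfreq(rfreq_0: float, hsdiv_0: int, n1_0: int, f_0_mhz: float,
                    hsdiv_new: int, n1_new: int, f_new_mhz: float) -> float:
    """Scale the factory calibrated RFREQ value to a new output frequency."""
    return rfreq_0 * (f_new_mhz * hsdiv_new * n1_new) / (f_0_mhz * hsdiv_0 * n1_0)


# Hypothetical device: the quartz runs 1500 ppm above nominal, and the factory
# chose HSDIV = 4 and N1 = 8 for the 155.52 MHz startup frequency.
f_xtal_actual = 114.285 * (1 + 1500e-6)
rfreq_0 = 155.52 * 4 * 8 / f_xtal_actual      # value the factory would have stored

rfreq_new = corrected_rfreq(rfreq_0, 4, 8, 155.52, 5, 8, 125.0)
f_out = f_xtal_actual * rfreq_new / (5 * 8)
print(f"{f_out:.6f} MHz")   # 125.000000 MHz
```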

The VCXO has a built-in configuration timeout of 10 ms. All I2C write operations from freezing to unfreezing the digitally controlled oscillator have to complete during this period to become active.

3.1.2 Configuration

The following registers are read by the gateware for calculating the frequency correction:

address	description
`0x07`	HSDIV - 4 (bits 7 - 5), N1 - 1 MSB (bits 4 - 0)
`0x08`	RFREQ MSB (bits 5 - 0)
`0x09`	RFREQ
`0x0A`	RFREQ
`0x0B`	RFREQ
`0x0C`	RFREQ LSB

Table 3.1: Si571 registers read by the gateware

The value of register 0x07 is only used to determine whether the frequency has been programmed before, e.g. after the FPGA bitstream has been reloaded without a power cycle of the FMC ADC board. Applying the frequency correction again would lead to a wrong result, since the RFREQ registers no longer contain the factory defaults.

The following registers are programmed by the gateware after the calculation of the frequency correction:

address	value	description
`0x89`	`0x10`	freeze digitally controlled oscillator (bit 4)
`0x07`	`0x21`	HSDIV - 4 (bits 7 - 5), N1 - 1 MSB (bits 4 - 0)
`0x08`	`0xC2`	N1 - 1 LSB (bits 7 - 6), RFREQ MSB (bit 5 - 0)
`0x09`	result of the frequency correction	RFREQ
`0x0A`	result of the frequency correction	RFREQ
`0x0B`	result of the frequency correction	RFREQ
`0x0C`	result of the frequency correction	RFREQ LSB
`0x89`	`0x00`	unfreeze digitally controlled oscillator (bit 4)
`0x87`	`0x40`	new frequency applied (bit 6)

Table 3.2: Si571 registers programmed by the gateware
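The packing of HSDIV, N1 and RFREQ into registers 0x07 to 0x0C can be sketched as follows. The field positions follow Table 3.2. The 38-bit RFREQ format with 28 fractional bits is our reading of the Si57x datasheet [6] and should be checked against it. The function name is hypothetical.

```python
def si571_registers(hsdiv: int, n1: int, rfreq: float) -> dict:
    """Pack HSDIV, N1 and RFREQ into the values for registers 0x07 to 0x0C."""
    rfreq_fixed = round(rfreq * 2**28)   # assumed: 38 bits, 10 integer + 28 fractional
    if not 0 <= rfreq_fixed < 2**38:
        raise ValueError("RFREQ out of range")
    hs = hsdiv - 4                       # register field holds HSDIV - 4 (3 bits)
    n = n1 - 1                           # register field holds N1 - 1 (7 bits)
    return {
        0x07: (hs << 5) | (n >> 2),                   # HSDIV - 4, upper 5 bits of N1 - 1
        0x08: ((n & 0x3) << 6) | (rfreq_fixed >> 32),  # lower 2 bits of N1 - 1, RFREQ MSBs
        0x09: (rfreq_fixed >> 24) & 0xFF,
        0x0A: (rfreq_fixed >> 16) & 0xFF,
        0x0B: (rfreq_fixed >> 8) & 0xFF,
        0x0C: rfreq_fixed & 0xFF,                      # RFREQ LSB
    }


regs = si571_registers(5, 8, 43.750273439)
print(hex(regs[0x07]), hex(regs[0x08]))   # 0x21 0xc2
```

The two printed values reproduce the constants 0x21 and 0xC2 in Table 3.2, which supports the field layout assumed here.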

3.2 AD9510 PLL and clock distribution

The Analog Devices AD9510 is connected via SPI. Register writes only take effect after a subsequent write to register 0x5A with the LSB set in the write value (e.g. 0x01). Multiple writes can precede this write to register 0x5A, so it only needs to be done once at the end of a write sequence. The maximum SPI clock frequency is 25 MHz.

The phase frequency detector of the PLL, which compares the VCXO frequency to the reference frequency, has a maximum input frequency of 100 MHz. Higher frequencies have to be divided by the prescalers R (reference input) and N (VCXO input). A lock detect signal can be routed to the STATUS pin, which is connected to an FPGA GPIO.

A datasheet can be found on the Analog Devices website [7].

3.2.1 Configuration

The gateware configures the AD9510 device to lock the VCXO frequency to a reference clock coming from the FPGA.
The following registers are programmed by the gateware:

address	value	description
`0x08`	`0x33`	normal charge pump mode (bits 1 - 0), analog lock detect on STATUS pin (bit 5 - 2)
`0x09`	`0x70`	charge pump current: 4.8 mA (bits 6 - 4)
`0x0A`	`0x44`	PLL power up (bits 1 - 0), VCXO prescaler: 2 (bits 4 - 2), B counter bypass (bit 6)
`0x0B`	`0x00`	R divider MSB
`0x0C`	`0x01`	R divider LSB: divide reference clock input by 2 (value is divider - 1)
`0x0D`	`0x01`	anti backlash pulse width: 2.9 ns (bit 1 - 0)
`0x3C`	`0x08`	output 0 voltage: 810 mV (bits 3 - 2), output 0 enable (to ADC 0, bits 1 - 0)
`0x3D`	`0x08`	output 1 voltage: 810 mV (bits 3 - 2), output 1 enable (to ADC 1, bits 1 - 0)
`0x3E`	`0x08`	output 2 voltage: 810 mV (bits 3 - 2), output 2 enable (to ADC 2, bits 1 - 0)
`0x3F`	`0x08`	output 3 voltage: 810 mV (bits 3 - 2), output 3 enable (to ADC 3, bits 1 - 0)
`0x40`	`0x03`	output 4 power down (bit 0)
`0x41`	`0x03`	output 5 power down (bit 0)
`0x42`	`0x03`	output 6 power down (bit 0)
`0x43`	`0x02`	output 7 current 3.5 mA (bits 2 - 1), output 7 enable (to FPGA for monitoring, bit 0)
`0x45`	`0x02`	clk1 power down (bit 1), clock input from clk2 (VCXO) (bit 0)
`0x49`	`0x80`	bypass the divider in front of output 0 (bit 7)
`0x4B`	`0x80`	bypass the divider in front of output 1 (bit 7)
`0x4D`	`0x80`	bypass the divider in front of output 2 (bit 7)
`0x4F`	`0x80`	bypass the divider in front of output 3 (bit 7)
`0x57`	`0x80`	bypass the divider in front of output 7 (bit 7)
`0x5A`	`0x01`	load the register bank overlay to the actual register bank (bit 0)

Table 3.3: AD9510 registers programmed by the gateware
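The following sketch shows two rules from this chapter: the single trailing write to 0x5A, and the 100 MHz limit at the phase frequency detector. R = 2 follows from register 0x0C (value is divider - 1). N = 2 assumes that, with the B counter bypassed, the feedback divider equals the VCXO prescaler. That is our reading of Table 3.3 and is not stated explicitly. The function names are hypothetical.

```python
AD9510_UPDATE = (0x5A, 0x01)   # loads the register bank overlay into the active registers
PFD_MAX_MHZ = 100.0


def with_update(writes):
    """Append the single 0x5A update write to a sequence of (address, value) writes."""
    return list(writes) + [AD9510_UPDATE]


def pfd_inputs(f_ref_mhz: float, r_div: int, f_vcxo_mhz: float, n_div: int):
    """Return the two phase frequency detector input frequencies in MHz."""
    f_r, f_n = f_ref_mhz / r_div, f_vcxo_mhz / n_div
    if max(f_r, f_n) > PFD_MAX_MHZ:
        raise ValueError("PFD input frequency too high")
    return f_r, f_n


# Reference clk_125 from the FPGA, 125 MHz VCXO, R = 0x01 + 1, N = 2 (assumed).
print(pfd_inputs(125.0, 0x01 + 1, 125.0, 2))   # (62.5, 62.5)
```

Both detector inputs are 62.5 MHz, which is below the limit and identical on both sides, as required for lock at a 125 MHz VCXO frequency.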

3.3 ISLA216P ADC

The four ISLA216P ADCs are connected via SPI. Each chip is addressed via an individual chip select line, while the MOSI, MISO and CLK lines are shared between the four chips. Asserting all chip selects at the same time allows parallel configuration when writing registers, but not when reading, since there would be multiple drivers on the MISO line. The maximum SPI clock frequency is the ADC sampling frequency divided by 16. At a sampling frequency of 125 MHz, this corresponds to an SPI clock frequency of 7.8125 MHz.

A datasheet can be found on the Renesas website [8].

The ADCs provide a configurable gain correction of +/- 4.2% and a configurable offset correction of +/- 138 LSBs. Since the gain correction and the offset correction are implemented digitally in the gateware, most of the configuration registers can be left at their default values.

The SPI interface in the gateware is implemented as a four-wire interface, whereas the ISLA216P SPI interface defaults to three-wire mode. To be able to configure the ISLA216Ps interactively, the gateware switches them to four-wire mode at startup by writing the corresponding register.

3.3.1 Configuration

The following registers are programmed by the gateware:

address	value	description
`0x00`	`0x80`	enable four wire mode (enable the usage of a dedicated SPI MISO line)

Table 3.4: ISLA216P registers programmed by the gateware

4 Gateware implementation

4.1 Clocking

The gateware uses three primary clocks:

PCIe reference clock, 100 MHz
FMC 0 ADC clock, 125 MHz
FMC 1 ADC clock, 125 MHz

4.1.1 PCIe reference clock

The PCIe reference clock comes from an output of an ADN4604 clock switch on the AFC board [10]. The clock switch is controlled via I2C by the MMC firmware to output a 100 MHz clock, which enters the FPGA as a differential input signal on pins H20 and G20. The PCIe reference clock feeds the reference clock input of the PCIe IP core by Xilinx, which contains a PLL producing a 125 MHz output clock for the AXI interface named clk_125_pcie_axi.

clk_125_pcie_axi drives an MMCM to generate:

clk_125, 125 MHz, this is the main processing clock of the design
clk_100, 100 MHz, used for reading the FPGA serial number
clk_200, 200 MHz, used for the SDRAM interface IP core by Xilinx

The SDRAM interface IP core contains an MMCM which generates a 100 MHz clock named clk_100_sdram for the AXI interface.

4.1.2 FMC ADC clocks

On each of the two FMC boards there is a Si571 programmable VCXO (see chapter 3.1), which feeds the four ADCs. The frequency of the VCXO can be locked to a reference clock coming from the FPGA (see chapter 3.2). The VCXO is programmed to a nominal output frequency of 125 MHz and locked by the PLL to clk_125 coming from the FPGA. Achieving PLL lock is difficult, and with the current settings a stable lock cannot be guaranteed.

From each of the four ADCs of an FMC board, an individual clock signal is routed to the FPGA, where it is used for the deserialization in the IDDR primitives. For further processing, only the clock signal from the first ADC of an FMC board is used, since the clock frequencies of the four ADCs are identical.

There are two clock domain crossing FIFOs in the gateware to synchronize the data from the ADCs to the main processing clock clk_125.

4.2 Resets

4.2.1 PLL not in lock

As long as the PLL in the MMCM producing the main processing clock clk_125 is not in lock, the design is held in reset. Once locked, the PLL should remain in lock until the next power cycle.

4.2.2 Reset button

There is a push button labeled RST at the center of the AFC front panel, which is connected to the microcontroller running the MMC firmware. The firmware should forward a button press to the FPGA pin AG26 as an active low signal to initiate a reset of the gateware. With the current OpenMMC firmware this forwarding does not work, so pressing the RST button has no effect.

4.3 Data flow diagram

Figure 4.1: Simplified data flow diagram

Figure 4.1 shows a simplified data flow diagram. For simplicity, some features are not included in the diagram:

processing clocks and clock domain crossings
resets
gate and RF pulse inputs
ADC data offset and gain corrections
LEDs
read out logics of FPGA serial number and build time stamp
ADC maximum amplitude calculation
signals connected to the observer
data width conversions of AXI and AXI Stream connections

4.4 Input delays

The data inputs from the ADCs require a latency correction to compensate for clock and routing delays. This is implemented via individually configurable input delay primitives for both the clock and the data input pins. By increasing the input delay of either a clock or its associated data inputs, the alignment can be corrected in both directions.

The input delays provide a 32 tap delay line with a configurable delay between 0 and 31 taps [9]. Each tap corresponds to a delay of:

$$t_{tap} = \frac{1}{32 \cdot 2 \cdot f_{REF}}$$

with $f_{REF}$ being the frequency of the clock connected to the IDELAYCTRL primitive. With the 200 MHz clock connected in this design, this corresponds to a tap delay of 78 ps.
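The tap delay formula can be evaluated directly. This small helper is illustrative only (the name is hypothetical). It also prints the largest delay reachable with tap 31 in this design.

```python
def idelay_tap_ps(f_ref_mhz: float) -> float:
    """Delay of one IDELAY tap in picoseconds: 1 / (32 * 2 * f_REF)."""
    return 1e6 / (32 * 2 * f_ref_mhz)


tap = idelay_tap_ps(200.0)
print(f"{tap:.3f} ps per tap, {31 * tap:.1f} ps at tap 31")
# 78.125 ps per tap, 2421.9 ps at tap 31
```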

4.4.1 Calculation of optimal delay values

The ADCs offer programmable user patterns that can be sent in place of the ADC samples to check the timing of the digital interface. To find the optimum delay values, the following procedure is applied:

The clock input delay is increased until the pattern begins to deteriorate and the delay index at which that happens is noted.
After that the clock input delay is reset to 0 and the data delay is increased until the pattern begins to deteriorate.
The optimum value is assumed to be the midpoint between these values.

The configuration file src/software/fpga_observer/config/enable_adc_test_patterns.csv programs the ADCs to output reference patterns. It can be used together with the FPGA Observer software (see chapter 8.1.5).

4.4.2 Chosen delay values

For an ADC clock frequency of 125 MHz the optimal delay values are:

FMC0: no deterioration at any clock delay, but deterioration at and above data delay 0x06
- ADC clock delay value: 0x0D
- ADC data delay value: 0x00
FMC1: no deterioration at any clock delay, but deterioration at and above data delay 0x05
- ADC clock delay value: 0x0D
- ADC data delay value: 0x00
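The midpoint rule from chapter 4.4.1 can be expressed in a few lines. The sketch relies on our own interpretation, which is not spelled out in the text. Clock delay and data delay move the sampling point in opposite directions, so the valid window runs from minus the data failure tap to plus the clock failure tap. "No deterioration at any delay" is treated as a failure at tap 32, and the midpoint is truncated toward zero. Under these assumptions, the sketch reproduces the values chosen above for FMC0 and FMC1.

```python
NUM_TAPS = 32


def optimal_delays(clock_fail, data_fail):
    """Return (clock_delay, data_delay) in taps from the sweep in chapter 4.4.1.

    clock_fail / data_fail: first tap at which the pattern deteriorates,
    or None if it never deteriorated (assumed equivalent to NUM_TAPS).
    """
    c = NUM_TAPS if clock_fail is None else clock_fail
    d = NUM_TAPS if data_fail is None else data_fail
    # The valid window spans from -d (data delay) to +c (clock delay); aim for its centre.
    centre = (c - d) / 2
    taps = int(abs(centre))                  # truncate toward zero
    return (taps, 0) if centre >= 0 else (0, taps)


print(optimal_delays(None, 0x06))   # FMC0: (13, 0), i.e. clock delay 0x0D
print(optimal_delays(None, 0x05))   # FMC1: (13, 0), i.e. clock delay 0x0D
```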

These values are programmed to the IDELAY primitives at start up. They can be changed via individual configuration registers (see chapter 7.1.1).

4.5 Clock domain crossings

Even though the FMC clocks are coupled to the main processing clock by the FMC’s PLLs, they can jitter against the main processing clock or even run at a slightly different frequency if the PLLs unlock for any reason.

To prevent data corruption, two clock domain crossing FIFOs are used for the incoming ADC data, one for each FMC board. In the case of frequency deviations, there are two cases to differentiate:

the FMC clock is running slightly faster than the processing clock:
- one sample at a time will be discarded
- this happens synchronously for all four ADCs of an FMC board
the FMC clock is running slightly slower than the processing clock:
- one sample at a time will be repeated
- this happens synchronously for all four ADCs of an FMC board

Due to the synchronous handling of the four ADCs on an FMC board, the discarding or repetition of samples should not have any measurable effect on the BPM results, since the two inputs to each BPM come from the same FMC board.
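The discard and repeat behaviour can be reproduced with a behavioural model of a clock domain crossing FIFO. This is not the gateware's FIFO logic. The FIFO depth of 8, the start of reading at half fill level, and the policy of discarding the newest sample when the FIFO is full are all assumptions. If one FIFO word carries the samples of all four ADCs of an FMC board, the same decision automatically applies to all four channels. The clock frequencies in the example are scaled down and exaggerated (1 %) so that the effect shows up within a few thousand samples.

```python
from collections import deque
from fractions import Fraction


def simulate_cdc(n_reads: int, f_write: int, f_read: int, depth: int = 8):
    """Return the indices of the write-side samples as seen on the read side."""
    fifo = deque()
    out = []
    w = r = 0                                   # write and read event counters
    offset = Fraction(depth // 2, f_write)      # reading starts at half fill level
    while r < n_reads:
        t_write = Fraction(w, f_write)
        t_read = offset + Fraction(r, f_read)
        if t_write <= t_read:
            if len(fifo) < depth:
                fifo.append(w)
            # else: the FIFO is full and sample w is discarded
            w += 1
        else:
            # An empty FIFO repeats the previously read sample.
            out.append(fifo.popleft() if fifo else out[-1])
            r += 1
    return out


faster = simulate_cdc(2000, 1010, 1000)   # write clock 1 % faster: gaps of 2
slower = simulate_cdc(2000, 990, 1000)    # write clock 1 % slower: steps of 0
print(sorted({b - a for a, b in zip(faster, faster[1:])}),
      sorted({b - a for a, b in zip(slower, slower[1:])}))
```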

4.6 BPM algorithm

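The proportionality factor $k$ in equation 2.1 is implicitly set to 1, so the result has to be interpreted as a relative position in the range $[-1, 1)$:

$$x = \frac{n \sum \Sigma_i \Delta_i - \sum \Sigma_i \sum \Delta_i}{n \sum \Sigma_i^2 - \left( \sum \Sigma_i \right)^2}$$

Equation 4.1: Normalized BPM algorithm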
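A floating point reference of equation 4.1 is handy for checking the fixed-point pipeline described below. The sketch works on floats, whereas the gateware works on 17-bit integer samples. The function name and the synthetic test signal are illustrative. The example adds a voltage offset to $\Delta$ to show that the least squares fit absorbs it.

```python
import math


def bpm_position(u1, u2, c=1.0):
    """Normalised position x (equation 4.1 with k = 1) from two plate voltage sequences."""
    if len(u1) != len(u2) or len(u1) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    n = len(u1)
    s_s = s_d = s_sd = s_ss = 0.0
    for a, b in zip(u1, u2):
        b *= c                          # capacitance correction of the second plate
        sigma, delta = a + b, a - b
        s_s += sigma
        s_d += delta
        s_sd += sigma * delta
        s_ss += sigma * sigma
    return (n * s_sd - s_s * s_d) / (n * s_ss - s_s ** 2)


# Synthetic bunch signal: Delta = 0.3 * Sigma + 0.05 (position 0.3 plus a voltage offset)
sigma = [1.0 + 0.5 * math.sin(2 * math.pi * i / 10) for i in range(20)]
delta = [0.3 * s + 0.05 for s in sigma]
u1 = [(s + d) / 2 for s, d in zip(sigma, delta)]
u2 = [(s - d) / 2 for s, d in zip(sigma, delta)]
print(round(bpm_position(u1, u2), 6))   # 0.3
```

The denominator vanishes if $\Sigma_i$ is constant over the whole calculation period. A real bunch signal therefore has to vary within one linear regression period.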

The capacitance correction (see chapter 2.1) and the linear regression (equation 4.1) are implemented as a pipelined algorithm. The processing clock is equal to the sampling frequency of the ADCs.

4.6.1 Pipeline steps performed every clock cycle

The capacitance correction, the sums and differences of the incoming ADC data pairs $U_1$ and $U_2$, and the four different sums of equation 4.1 are calculated every clock cycle.

Step 0: Capacitance correction

offset and gain corrected ADC 0 data sample is registered unchanged. Input: 17 bits signed, output: 17 bits signed
offset and gain corrected ADC 1 data sample is multiplied by a correction factor coming from a configuration register. Input: 17 bits signed for data, 16 bits unsigned for correction factor, output: 17 bits signed

Step 1: Calculation of sum and difference signals

$\Sigma_i$: sum of data 0 and data 1, inputs: 17 bits signed, output: 18 bits signed
$\Delta_i$: difference of data 0 and data 1, inputs: 17 bits signed, output: 18 bits signed

Step 2: Calculation of products, sign extension, summation

The maximum word length of the adders in the DSP48 blocks of the FPGA is 48 bits. When using these adders, the word length of the longest term (36 bits) leaves 12 bits of headroom, which limits the summation length to $2^{12} = 4096$ terms.

$\sum \Sigma_i \Delta_i$: inputs: 18 bits signed, output: 48 bits signed
$\sum \Sigma_i^2$: input: 18 bits signed, output: 48 bits signed
$\sum \Sigma_i$: input: 18 bits signed, output: 30 bits signed
$\sum \Delta_i$: input: 18 bits signed, output: 30 bits signed
$n$: counter, output: 12 bits unsigned
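The 4096 limit can be checked with a worst-case bound. The sketch assumes the largest magnitude product of two 18-bit signed values, $(-2^{17}) \cdot (-2^{17}) = 2^{34}$, for every term. This is pessimistic, since $\Sigma_i$ and $\Delta_i$ cannot both be at their extreme at the same time, but it shows where the 12 bits of headroom come from. The function name is hypothetical.

```python
def accumulator_fits(n_terms: int, in_bits: int = 18, acc_bits: int = 48) -> bool:
    """True if n_terms worst-case products of two signed in_bits values fit a signed acc_bits accumulator."""
    worst_product = (1 << (in_bits - 1)) ** 2       # (-2**17) * (-2**17) = 2**34 for 18 bits
    return n_terms * worst_product <= (1 << (acc_bits - 1)) - 1


print(accumulator_fits(4096), accumulator_fits(8192))   # True False
```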

4.6.2 Pipeline steps performed with a reduced data rate

The following pipeline steps are only performed once for every linear regression period. The length of the linear regression period is defined by the BPM linear regression length register (see chapter 6.2.1).

If an RF signal is present, the length is additionally controlled by the spacing of its pulses. With every rising edge of the RF signal, a new linear regression calculation is started and the post-processing steps for the previous period begin.

Step 3: Conversion to floating point

The DSP48 blocks in the FPGA can only handle multiplications up to 18 bits times 25 bits. For this reason, a conversion to a floating point format is performed.

The floating point format is:

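$m$ (mantissa): 18 bits signed (integer, not fractional as usual for floating point formats)
$e$ (exponent): 6 bits unsigned

which decodes to:

$$v = m \cdot 2^{e}$$

The sums $\sum \Sigma_i \Delta_i$, $\sum \Sigma_i^2$, $\sum \Sigma_i$ and $\sum \Delta_i$ are converted to this floating point format.

A conversion is not necessary for $n$ since it is only 12 bits wide.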
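The conversion into this format can be modelled as repeated arithmetic right shifts until the value fits into an 18-bit signed mantissa. The sketch assumes truncation (rounding toward minus infinity). Hardware would more likely determine the shift in one step with a leading-bit detector; the loop is only used here for clarity. The function name is hypothetical.

```python
def to_float(value: int, mant_bits: int = 18, exp_bits: int = 6):
    """Convert a signed integer into (mantissa, exponent) with value ~= mantissa * 2**exponent."""
    lo, hi = -(1 << (mant_bits - 1)), (1 << (mant_bits - 1)) - 1
    mantissa, exponent = value, 0
    while not lo <= mantissa <= hi:
        mantissa >>= 1          # arithmetic shift: drops the LSB, rounds toward -infinity
        exponent += 1
    if exponent >= 1 << exp_bits:
        raise OverflowError("exponent does not fit")
    return mantissa, exponent


print(to_float(1000))          # (1000, 0)
print(to_float(-(2 ** 47)))    # (-131072, 30)
```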

Step 4: Calculation of the products of sums

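The products $n \cdot \sum \Sigma_i \Delta_i$, $\sum \Sigma_i \cdot \sum \Delta_i$, $n \cdot \sum \Sigma_i^2$ and $\left( \sum \Sigma_i \right)^2$ are calculated by multiplying the mantissas and adding the exponents of the floating point representations.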

Step 5: Shifting to align for subtraction and sign extensions

In general the results of step 4 will have different exponents, so that the mantissas have to be shifted to a common exponent before a subtraction can take place.

The mantissa of the number with the smaller exponent is shifted to the right by the difference of the exponents, and its exponent is set to the larger exponent. A sign extension by 1 bit prevents overflows and underflows in the subtraction.

Step 6: Calculation of the subtractions in the numerator and the denominator

Now that the operands have the same exponent, the subtractions can take place by subtracting the mantissas.

The exponents of the results stay the same as that of the operands.

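The results are the numerator $n \sum \Sigma_i \Delta_i - \sum \Sigma_i \sum \Delta_i$ and the denominator $n \sum \Sigma_i^2 - \left( \sum \Sigma_i \right)^2$ of equation 4.1.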
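Steps 4 to 6 operate on (mantissa, exponent) pairs and can be modelled compactly. The sums in the example are hypothetical and chosen small so the result is easy to verify by hand. As in step 5, the operand with the smaller exponent loses its low bits when it is shifted, which is the main source of rounding in this part of the pipeline. The sign extension of step 5 is implicit here because Python integers are unbounded.

```python
def multiply(a, b):
    """Step 4: product of two (mantissa, exponent) numbers."""
    return a[0] * b[0], a[1] + b[1]


def align_and_subtract(p, q):
    """Steps 5 and 6: p - q after shifting the operand with the smaller exponent."""
    (mp, ep), (mq, eq) = p, q
    if ep < eq:
        mp >>= eq - ep          # shift right by the difference of the exponents
    else:
        mq >>= ep - eq
    return mp - mq, max(ep, eq)


# Numerator n * sum(Sigma*Delta) - sum(Sigma) * sum(Delta) with hypothetical sums
n = (8, 0)
s_sd, s_s, s_d = (1000, 3), (50, 0), (20, 1)
m, e = align_and_subtract(multiply(n, s_sd), multiply(s_s, s_d))
print(m, e, m * 2 ** e)    # 7750 3 62000
```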

Step 7: Conversion of the mantissas to floating point

Due to the multiplication in step 4 and the sign extension in step 5, the mantissas now have a length of 37 bits, which is again too long for the final division. The mantissas are therefore converted to the same floating point format as described in step 3.

Each result now has an 18-bit mantissa and two 6-bit exponents, which are combined in the next step.

Step 8: Start of division and unification of exponents

Division is a costly operation in FPGAs. In this implementation it is performed by an IP core by Xilinx which is parametrized to 18 bits for both the divisor and the dividend. The result is 33 bits wide, of which 15 bits are fractional.

The division takes 25 clock cycles to complete. The divider IP core accepts a new division every three clock cycles, so 3 is the lower limit for the linear regression length with the current settings of the IP core.

The exponents generated in step 7 are combined with the existing ones from step 6 by addition.

Step 9: Subtraction of the exponents of dividend and divisor

The divider IP core only handles the mantissas. The exponents of the dividend and the divisor are subtracted.

Step 10 - 32: Waiting for the division to complete

The results of step 9 are pipelined until the completion of the division.

Step 33: Shifting and slicing the division result

The division result is shifted to the right by minus the exponent from step 9. After that, the lower 16 bits are sliced to form the result of the linear regression algorithm.

The result has to be interpreted as a relative position in the range $[-1, 1)$, multiplied by $2^{15}$.

Two signals are created for debugging purposes and are connected to the signal observer (see chapter 4.10.2):

result out of range (1 bit): High if the absolute value of the numerator is greater than that of the denominator. This can happen if the phases of the two input signals are not aligned. In this case the result is set to the maximum or minimum value.
division by zero (1 bit): Comes from the divider IP core and is high if the divisor is zero. This is very unlikely to happen. In this case the result is set to 0.
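Steps 8 to 33 can be summarised as a model that maps the (mantissa, exponent) numerator and denominator to the final 16-bit result. The rounding of the divider IP core is not described here, so the sketch assumes truncation toward zero for the quotient. It saturates an out-of-range result before slicing, in line with the behaviour described for the "result out of range" signal. The function name and constants are hypothetical.

```python
FRAC_BITS = 15
RESULT_MIN, RESULT_MAX = -(1 << 15), (1 << 15) - 1


def divide_and_slice(num, den):
    """Steps 8 to 33: (mantissa, exponent) numerator / denominator -> 16-bit result."""
    (mn, en), (md, ed) = num, den
    if md == 0:
        return 0                                 # division by zero: result is set to 0
    q = (abs(mn) << FRAC_BITS) // abs(md)        # quotient with 15 fractional bits
    if (mn < 0) != (md < 0):
        q = -q                                   # truncation toward zero (assumed)
    e = en - ed                                  # step 9: exponent of the quotient
    q = q << e if e >= 0 else q >> -e            # step 33: shift right by -e
    return max(RESULT_MIN, min(RESULT_MAX, q))   # out of range: saturate


print(divide_and_slice((3, 0), (4, 0)))    # 24576, i.e. 0.75 * 2**15
print(divide_and_slice((6, 2), (12, 3)))   # 8192, i.e. 0.25 * 2**15
print(divide_and_slice((5, 0), (4, 0)))    # 32767 (saturated)
```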

Limitations

Allowed values for the linear regression length are: 3, 4, 5, … , 4096

The lower limit is caused by the divider IP core which can only handle one division in three clock cycles.

The upper limit is caused by the maximum operand length of the adder in the DSP48 primitives in the FPGA. A higher limit would be implementable at the cost of an increased resource usage and two additional clock cycles of processing latency.

4.7 BPM averaging

The result from the BPM algorithm is sign extended and accumulated until the desired number of samples is reached. Only powers of two are allowed for the averaging length. An arbitrary length would require a general division at the end of the averaging process, whereas a division by a power of two can be implemented as a simple shift operation. This is why the configuration register ’log2 of BPM averaging length’ contains the binary logarithm of the averaging length (see chapter 6.2.1).

The result is sliced to the same number of bits as the result from the BPM algorithm. It also has to be interpreted as a relative position in the range $[-1, 1)$, multiplied by $2^{15}$.
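The power-of-two block averaging can be sketched as an accumulation followed by an arithmetic right shift. Python integers stand in for the sign extended accumulator. As in hardware, the shift rounds toward minus infinity; whether the gateware adds a rounding offset before shifting is not stated, so none is added here. The function name is hypothetical.

```python
def block_average(results, log2_len):
    """Average consecutive blocks of 2**log2_len results using a right shift."""
    n = 1 << log2_len
    averages = []
    for start in range(0, len(results) - n + 1, n):
        total = sum(results[start:start + n])    # wide, sign extended accumulator
        averages.append(total >> log2_len)       # division by 2**log2_len
    return averages


print(block_average([1, 2, 3, 4, 5, 6, 7, 8], 2))   # [2, 6]
print(block_average([-1, -2], 1))                  # [-2], the shift rounds toward -infinity
```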

Available values for the averaging length are 1, 2, 4, … , 1,048,576.

The upper limit is not caused by any implementation limitation, but was simply chosen because longer averaging lengths were not assumed to be useful.

4.8 AXI infrastructure

The memory mapped data transfers inside the FPGA are handled via the AXI protocol using a star topology with a central interconnect. The common data width is 256 bits and the common clock is the main processing clock of 125 MHz.

The AXI masters connected to the interconnect are:

PCIe interface
scope 0 / observer
scope 1
scope 2

The PCIe interface only supports an AXI data width of 128 bits, so that an AXI data width converter is used to be able to connect it to the interconnect. A clock domain crossing also takes place despite identical frequencies, since the AXI clock of the PCIe interface is derived directly from the PCIe reference clock and could jitter against the independently derived main processing clock.

The AXI slaves connected to the interconnect are:

SDRAM interface
register bank / Block RAM

The SDRAM interface only supports an AXI clock frequency of 100 MHz, so that an AXI clock converter is used to synchronize it to the main processing clock.

The AXI interconnect is configured to connect the scopes only to the SDRAM interface and only with write access since other accesses are not needed.

Even though there is no need for the PCIe interface to write to the SDRAM, this access is enabled because otherwise the PCIe driver will crash in case of an erroneous write access to the SDRAM.

4.9 AXI Stream infrastructure

The scopes / the observer internally use an AXI Stream bus to process the incoming data. The final data stream is converted to the AXI protocol.

4.10 Scopes and observers

4.10.1 Scopes

There are three so called scopes for interactively storing calculation results.

Scope 0: corrected ADC data

Scope 1: BPM results

Scope 2: BPM averaging results

4.10.2 Observer

The observer uses the same infrastructure as scope 0, so the two cannot be used at the same time. It additionally implements more complex observing features and a configurable two-state trigger.

4.11 Configuration of peripheral devices

The peripheral devices documented in chapter 3 are initially programmed by the gateware. During operation, they can be configured using the corresponding gateware registers (see chapter 7.1.1).

4.11.1 SPI Interface

4.11.2 I2C Interface

4.12 PCIe Interface

This gateware uses the Xilinx IP core DMA/Bridge Subsystem for PCI Express with the following configuration:

PCIe speed: 5 GTransfers/s

4.13 SDRAM interface

For the communication with the SDRAM, an IP core by Xilinx is used.

5 Build flow and simulation

The build flow is designed and tested to run on a Linux operating system. The bitstream generation should also work on a Windows installation, but the Bash and Python scripts it depends on would have to be adapted for Windows. For example, the VHDL file src/vhdl/generated_constant_package.vhd is generated by the script src/scripts/generate_monitoring_and_control.py from the content of the configuration files in src/config.

5.1 Prerequisites

An installation of Xilinx Vivado is required. Currently the IP cores are built for version 2019.2 so that this version should be installed.

You can set your preferred Vivado version in the script src/config/project_config.sh.

5.2 Build flow

5.2.1 Scripted build flow

For a completely automatic script based build flow without using the Vivado GUI proceed as follows:

navigate to the root folder of the repository in a terminal
type run/run_build_flow.sh

A project will be generated in the folder output/<Vivado version>/build_flow. The bitstream (if successful) will be generated in the subfolder aft_top.runs/impl_1.

5.2.2 Vivado GUI based build flow

Via a Bash script

navigate to the root folder of the repository in a terminal
type run/create_project.sh

This will open the Vivado GUI and set up a project, which can take some minutes. The project will be generated in the folder output/<Vivado version>/project.

Via the Vivado GUI

If you intend to set up the project from within the Vivado GUI, proceed as follows:

open Vivado
use the TCL console at the bottom of the GUI to navigate to src/scripts using the commands pwd and cd
type source create_project.tcl in the TCL console. This will set up the Vivado project.

5.3 Simulation

5.3.1 Vivado GUI based simulation

Prerequisite: an existing Vivado project (see chapter 5.2.2).

Click on Run Simulation in the Vivado GUI.

5.3.2 Scripted simulation

The scripted simulation checks that the simulation results match a predefined reference pattern.

navigate to the root folder of the repository in a terminal
type run/run_simulation.sh <name of module (or none for the toplevel simulation)>
you will find the output files of the simulation in the folder output/<Vivado version>/simulation.

5.3.3 Peripherals simulation models

The toplevel simulation includes a Verilog simulation model from Micron, the manufacturer of the AFC’s SDRAM, which allows the simulation of the behaviour of the external SDRAM.

The SDRAM interface IP needs an initial calibration process which finishes after about 120 µs. If the communication with the SDRAM is of interest, the simulation time should be longer than that.

6 Gateware software interface

The communication between the gateware inside the FPGA and the software running on the CPU unit takes place via a PCIe driver by Xilinx called XDMA. The gateware uses a single PCIe BAR, which maps a contiguous memory space of 0x80010000 bytes (= 2,147,549,184 in decimal) to different physical memories on the AFC board.

The following mapping is applied:

address	size	memory type	description
0x00000000	`2 GiB`	SDRAM	on AFC board, for scope data
0x80000000	`32 kiB`	Block RAM	inside FPGA, for device information and write registers
0x80008000	`32 kiB`	Flip Flops	inside FPGA, for read registers

Table 6.1: Memory mapping
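As an illustration of Table 6.1, the following Python sketch maps a byte offset into the BAR to the memory that backs it and checks that the three regions add up to the BAR size given above. The region list is taken from the table; the function name decode_bar_offset is hypothetical and not part of the project's software.

```python
from __future__ import annotations

KiB = 2**10
GiB = 2**30

# Regions of the single PCIe BAR (Table 6.1): (start offset, size in bytes, memory type)
MEMORY_MAP = [
    (0x00000000, 2 * GiB, "SDRAM"),
    (0x80000000, 32 * KiB, "Block RAM"),
    (0x80008000, 32 * KiB, "Flip Flops"),
]
BAR_SIZE = 0x80010000  # 2,147,549,184 bytes

# The three regions cover the whole BAR.
assert sum(size for _, size, _ in MEMORY_MAP) == BAR_SIZE


def decode_bar_offset(offset: int) -> tuple[str, int]:
    """Return (memory type, offset inside that memory) for a byte offset into the BAR."""
    for start, size, memory in MEMORY_MAP:
        if start <= offset < start + size:
            return memory, offset - start
    raise ValueError(f"offset 0x{offset:X} lies outside the 0x{BAR_SIZE:X}-byte BAR")
```

For example, the status register at 0x80008FE0 resolves to offset 0xFE0 inside the flip-flop region.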

6.1 Scope memory

There are three scope memory regions of which the one for the corrected ADC data is the largest since it has the highest data rate.

start address	size	description
`0x00000000`	`1 GiB`	corrected ADC data
`0x40000000`	`512 MiB`	BPM result
`0x60000000`	`512 MiB`	BPM averaging result

Table 6.2: Scopes memory map

6.1.1 Scope 0: corrected ADC data

The corrected ADC data is stored in the following format:

address	bits	radix	description
`0x00000000`	`16`	signed	ADC 0 data (time = 0)
`0x00000002`	`16`	signed	ADC 1 data (time = 0)
`0x00000004`	`16`	signed	ADC 2 data (time = 0)
`0x00000006`	`16`	signed	ADC 3 data (time = 0)
`0x00000008`	`16`	signed	ADC 4 data (time = 0)
`0x0000000A`	`16`	signed	ADC 5 data (time = 0)
`0x0000000C`	`16`	signed	ADC 6 data (time = 0)
`0x0000000E`	`16`	signed	ADC 7 data (time = 0)
`0x00000010`	`16`	signed	ADC 0 data (time = 1)
…	…	…	…

Table 6.3: Corrected ADC data storage format
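A minimal Python sketch for splitting a buffer read from the scope 0 region into the eight ADC channels. It assumes the buffer has already been transferred to the host (how this is done with the XDMA driver is not covered here) and that the 16-bit values are stored little-endian; the byte order is an assumption, not stated in the table.

```python
from __future__ import annotations

import struct

ADC_CHANNELS = 8
SAMPLE_FORMAT = "<8h"  # assumed little-endian: eight signed 16-bit values per sample
SAMPLE_BYTES = struct.calcsize(SAMPLE_FORMAT)  # 16 bytes per sample (Table 6.3)


def unpack_scope0(raw: bytes) -> list[list[int]]:
    """Split raw scope 0 memory into one list of signed samples per ADC."""
    if len(raw) % SAMPLE_BYTES:
        raise ValueError("buffer length must be a multiple of 16 bytes")
    channels = [[] for _ in range(ADC_CHANNELS)]
    for sample in struct.iter_unpack(SAMPLE_FORMAT, raw):
        for adc, value in enumerate(sample):
            channels[adc].append(value)
    return channels
```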

The corrected data is the result of two sequential operations on the raw ADC data:

offset correction by adding a correction summand
gain correction by multiplying by a correction factor

The correction summand and the correction factor can be set by individual configuration registers (see chapter 6.2.1).
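The two correction steps can be modelled as follows. This is a sketch under the assumption of ideal arithmetic: the internal bit widths, rounding and saturation of the gateware are not described in this chapter and are not modelled. The gain scaling follows chapter 6.2.1, where the register value 0x8000 corresponds to a factor of 1.

```python
GAIN_ONE = 0x8000  # gain register value corresponding to a factor of 1.0


def correct_sample(raw: int, offset: int, gain: int) -> float:
    """Offset correction first, then gain correction (ideal arithmetic only)."""
    return (raw + offset) * gain / GAIN_ONE


def gain_register_value(factor: float) -> int:
    """Encode a gain factor 0 <= factor < 2 as an unsigned 16-bit register value."""
    value = round(factor * GAIN_ONE)
    if not 0 <= value <= 0xFFFF:
        raise ValueError("gain factor must be in the range 0 <= factor < 2")
    return value
```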

The corrected ADC data scope memory can hold up to 2^26 (= 67,108,864) samples of 16 bytes each. At a sampling frequency of 125 MHz this corresponds to a maximum capture duration of 0.537 seconds.

6.1.2 Scope 1: BPM result

The BPM result is stored in the following format:

address	bits	radix	description
`0x40000000`	`16`	signed	BPM 0 result (time = 0)
`0x40000002`	`16`	unsigned	reserved for BPM 0 result confidence (time = 0)
`0x40000004`	`16`	signed	BPM 1 result (time = 0)
`0x40000006`	`16`	unsigned	reserved for BPM 1 result confidence (time = 0)
`0x40000008`	`16`	signed	BPM 2 result (time = 0)
`0x4000000A`	`16`	unsigned	reserved for BPM 2 result confidence (time = 0)
`0x4000000C`	`16`	signed	BPM 3 result (time = 0)
`0x4000000E`	`16`	unsigned	reserved for BPM 3 result confidence (time = 0)
`0x40000010`	`16`	signed	BPM 0 result (time = 1)
…	…	…	…

Table 6.4: BPM result storage format

The BPM result scope memory can hold up to 2^25 (= 33,554,432) samples of 16 bytes each. At a sampling frequency of 125 MHz and with a linear regression length of e.g. 1024, this corresponds to a maximum capture duration of about 4 minutes 35 seconds.
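The following Python sketch unpacks a buffer read from the scope 1 region. It keeps the signed BPM results and skips the words reserved for the result confidence; scope 2 uses the same layout (Table 6.5), so the sketch applies there as well. Little-endian byte order is an assumption, not stated in the table.

```python
from __future__ import annotations

import struct

BPM_COUNT = 4
# Assumed little-endian: (signed result, unsigned reserved word) for each of the four BPMs
SAMPLE_FORMAT = "<" + "hH" * BPM_COUNT
SAMPLE_BYTES = struct.calcsize(SAMPLE_FORMAT)  # 16 bytes per sample


def unpack_bpm_scope(raw: bytes) -> list[list[int]]:
    """Return one list of signed results per BPM; reserved confidence words are dropped."""
    if len(raw) % SAMPLE_BYTES:
        raise ValueError("buffer length must be a multiple of 16 bytes")
    results = [[] for _ in range(BPM_COUNT)]
    for sample in struct.iter_unpack(SAMPLE_FORMAT, raw):
        for bpm in range(BPM_COUNT):
            results[bpm].append(sample[2 * bpm])  # results sit at even positions
    return results
```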

6.1.3 Scope 2: BPM averaging result

The BPM averaging result is stored in the following format:

address	bits	radix	description
`0x60000000`	`16`	signed	BPM 0 averaging result (time = 0)
`0x60000002`	`16`	unsigned	reserved for BPM 0 averaging result confidence (time = 0)
`0x60000004`	`16`	signed	BPM 1 averaging result (time = 0)
`0x60000006`	`16`	unsigned	reserved for BPM 1 averaging result confidence (time = 0)
`0x60000008`	`16`	signed	BPM 2 averaging result (time = 0)
`0x6000000A`	`16`	unsigned	reserved for BPM 2 averaging result confidence (time = 0)
`0x6000000C`	`16`	signed	BPM 3 averaging result (time = 0)
`0x6000000E`	`16`	unsigned	reserved for BPM 3 averaging result confidence (time = 0)
`0x60000010`	`16`	signed	BPM 0 averaging result (time = 1)
…	…	…	…

Table 6.5: BPM averaging result storage format

The BPM averaging result scope memory can hold up to 2^25 (= 33,554,432) samples. At a sampling frequency of 125 MHz, with a linear regression length of e.g. 1024 and an averaging length of e.g. 1024, this corresponds to a maximum capture duration of 78.2 hours.
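The sample capacities and capture durations quoted for the three scopes follow from the region sizes in Table 6.2 and the 16-byte sample size. The sketch below reproduces them, assuming that no external RF pulse signal is present, so that one BPM result is produced per linear regression length and one averaging result per averaging length.

```python
SAMPLE_BYTES = 16        # every scope sample is 16 bytes wide
SAMPLE_RATE_HZ = 125e6   # ADC sampling frequency
SCOPE_REGION_BYTES = {0: 2**30, 1: 2**29, 2: 2**29}  # 1 GiB, 512 MiB, 512 MiB (Table 6.2)


def max_samples(scope: int) -> int:
    """Number of 16-byte samples that fit into the scope's memory region."""
    return SCOPE_REGION_BYTES[scope] // SAMPLE_BYTES


def max_capture_seconds(scope: int, regression_length: int = 1024,
                        averaging_length: int = 1024) -> float:
    """Capture duration until the scope's memory region is full."""
    rate = SAMPLE_RATE_HZ            # scope 0: one sample per ADC clock cycle
    if scope >= 1:
        rate /= regression_length    # one BPM result per regression window
    if scope == 2:
        rate /= averaging_length     # one averaging result per averaging window
    return max_samples(scope) / rate
```

With the default arguments, max_capture_seconds(1) is about 274.9 s and max_capture_seconds(2) / 3600 about 78.2 h.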

6.2 Register map

6.2.1 Configuration registers

The following registers can be written by software:

index	address	bits	radix	description	default value
`0`	`0x80007000`	`16`	signed	ADC 0 offset correction summand	`0x0000`
`1`	`0x80007020`	`16`	signed	ADC 1 offset correction summand	`0x0000`
`2`	`0x80007040`	`16`	signed	ADC 2 offset correction summand	`0x0000`
`3`	`0x80007060`	`16`	signed	ADC 3 offset correction summand	`0x0000`
`4`	`0x80007080`	`16`	signed	ADC 4 offset correction summand	`0x0000`
`5`	`0x800070A0`	`16`	signed	ADC 5 offset correction summand	`0x0000`
`6`	`0x800070C0`	`16`	signed	ADC 6 offset correction summand	`0x0000`
`7`	`0x800070E0`	`16`	signed	ADC 7 offset correction summand	`0x0000`
`8`	`0x80007100`	`16`	unsigned	ADC 0 gain correction factor	`0x8000`
`9`	`0x80007120`	`16`	unsigned	ADC 1 gain correction factor	`0x8000`
`10`	`0x80007140`	`16`	unsigned	ADC 2 gain correction factor	`0x8000`
`11`	`0x80007160`	`16`	unsigned	ADC 3 gain correction factor	`0x8000`
`12`	`0x80007180`	`16`	unsigned	ADC 4 gain correction factor	`0x8000`
`13`	`0x800071A0`	`16`	unsigned	ADC 5 gain correction factor	`0x8000`
`14`	`0x800071C0`	`16`	unsigned	ADC 6 gain correction factor	`0x8000`
`15`	`0x800071E0`	`16`	unsigned	ADC 7 gain correction factor	`0x8000`
`16`	`0x80007200`	`16`	unsigned	BPM 0 capacitance correction factor	`0x8000`
`17`	`0x80007220`	`16`	unsigned	BPM 1 capacitance correction factor	`0x8000`
`18`	`0x80007240`	`16`	unsigned	BPM 2 capacitance correction factor	`0x8000`
`19`	`0x80007260`	`16`	unsigned	BPM 3 capacitance correction factor	`0x8000`
`20`	`0x80007280`	`12`	unsigned	BPM linear regression length - 1	`0x3FF`
`21`	`0x800072A0`	`5`	unsigned	log2 of BPM averaging length	`0x0A`
`22`	`0x800072C0`	`4`	unsigned	gate signal input select	`0x0`
`23`	`0x800072E0`	`4`	unsigned	RF signal input select	`0x8`
`32`	`0x80007400`	`26`	unsigned	scope 0 capture length - 1	`0x0000FFF`
`33`	`0x80007420`	`2`	unsigned	scope 0 trigger mode	`0x2`
`34`	`0x80007440`	`1`	binary	scope 0 arm trigger	`0`
`40`	`0x80007500`	`25`	unsigned	scope 1 capture length - 1	`0x0000FFF`
`41`	`0x80007520`	`2`	unsigned	scope 1 trigger mode	`0x1`
`42`	`0x80007540`	`1`	binary	scope 1 arm trigger	`0`
`43`	`0x80007560`	`1`	binary	scope 1 capture mode	`0`
`48`	`0x80007600`	`25`	unsigned	scope 2 capture length - 1	`0x0000FFF`
`49`	`0x80007620`	`2`	unsigned	scope 2 trigger mode	`0x1`
`50`	`0x80007640`	`1`	binary	scope 2 arm trigger	`0`
`51`	`0x80007660`	`1`	binary	scope 2 capture mode	`0`

Table 6.6: List of configuration registers
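The configuration registers are 0x20 bytes apart, starting at 0x80007000; the status registers in chapter 6.2.2 follow the same scheme starting at 0x80008000. The sketch below computes a register address from its index and interprets the raw value of a signed register, assuming two's complement. Reading the raw value over PCIe is left out, and the function names are hypothetical.

```python
CONFIG_BASE = 0x80007000   # configuration register 0 (Table 6.6)
STATUS_BASE = 0x80008000   # status register 0 (Table 6.7)
REGISTER_STRIDE = 0x20     # byte distance between consecutive register indices
REGISTER_COUNT = 128       # indices 0 .. 127 in each register space


def _address(base: int, index: int) -> int:
    if not 0 <= index < REGISTER_COUNT:
        raise ValueError(f"register index {index} is outside 0..{REGISTER_COUNT - 1}")
    return base + index * REGISTER_STRIDE


def config_address(index: int) -> int:
    """PCIe address of configuration register `index`."""
    return _address(CONFIG_BASE, index)


def status_address(index: int) -> int:
    """PCIe address of status register `index`."""
    return _address(STATUS_BASE, index)


def to_signed(raw: int, bits: int) -> int:
    """Interpret the lowest `bits` bits of `raw` as a two's complement number."""
    raw &= (1 << bits) - 1
    return raw - (1 << bits) if raw >> (bits - 1) else raw
```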

0 - 7: ADC {0 - 7} offset correction summand

Correction summand for a possible offset deviation of the ADC. The offset correction precedes the gain correction.

8 - 15: ADC {0 - 7} gain correction factor

Correction factor for a possible gain deviation of the ADC. The default value 0x8000 corresponds to a multiplication by 1. The possible correction range is 0 ≤ factor < 2 (register values 0x0000 to 0xFFFF).

16 - 19: BPM {0 - 3} capacitance correction factor

The capacitances of the two corresponding capacitor plates of a single BPM can differ. Data 0 is fed unchanged into the BPM algorithm, while data 1 is multiplied by a correction factor. The default value 0x8000 corresponds to a multiplication by 1. The possible correction range is 0 ≤ factor < 2 (register values 0x0000 to 0xFFFF).

20: BPM linear regression length - 1

Number of samples over which the linear regression is calculated if no external RF pulse signal is present. This value is valid for all four BPMs. If an external RF pulse signal is present, the result of the linear regression is output and a new calculation is started on every rising edge of the RF pulse signal. For this to work, this register has to be set to a value larger than the number of samples between two RF pulses.

Allowed values: 0x002 - 0xFFF

The lower limit results from the throughput of the divider IP core used for the final division of the BPM algorithm, which accepts one input every 3 clock cycles.

21: Log2 of BPM averaging length

Binary logarithm (log2) of the number of linear regression results over which the averaging is calculated. This value is valid for all four BPMs.

Allowed range: 0 .. 20. Higher values are limited to the maximum allowed value of 20. This corresponds to an averaging length of 1, 2, 4, … , 1,048,576.
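The values for registers 20 and 21 can be derived from the desired lengths as sketched below. The checks mirror the allowed ranges given above; the helper names are hypothetical.

```python
def regression_length_register(length: int) -> int:
    """Value for register 20 ('BPM linear regression length - 1')."""
    value = length - 1
    if not 0x002 <= value <= 0xFFF:
        raise ValueError("linear regression length must be 3 .. 4096 samples")
    return value


def averaging_length_register(length: int) -> int:
    """Value for register 21 ('log2 of BPM averaging length')."""
    if length < 1 or length & (length - 1):
        raise ValueError("averaging length must be a power of two")
    log2 = length.bit_length() - 1
    # The gateware limits values above 20 to 20; this helper does the same.
    return min(log2, 20)
```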

22: Gate signal input select

value	input
0 - 7	MLVDS line 0 - 7 on the backplane
8	FMC 0 TRIG input
9	FMC 1 TRIG input

The gate signal input can be switched to one of the eight MLVDS lines on the backplane or to one of the two MMCX connectors labeled TRIG on the FMC front panels.

23: RF signal input select

value	input
0 - 7	MLVDS line 0 - 7 on the backplane
8	FMC 0 TRIG input
9	FMC 1 TRIG input

The RF signal input can be switched to one of the eight MLVDS lines on the backplane or to one of the two MMCX connectors labeled TRIG on the FMC front panels.

32, 40, 48: Scope {0, 1, 2} capture length - 1

The number of samples minus one that are stored after a scope has been triggered. Each sample consists of 16 bytes.

33, 41, 49: Scope {0, 1, 2} trigger mode

value	trigger mode
0	trigger on rising edge of gate signal
1	trigger on high state of gate signal
2, 3	trigger instantly after the trigger is armed, independent of the state of the gate signal

34, 42, 50: Scope {0, 1, 2} arm trigger

Writing a 1 to this register arms the trigger once. The register does not have to be reset to 0 before the trigger is armed again; just write another 1 to it. If the corresponding register ’continuous trigger’ (see chapter 7.1.1) is set to 1, writing to this register has no effect.

43, 51: Scope {1, 2} capture mode

value	capture mode
0	capture until the number of samples defined by register {40, 48} are stored
1	the same, but cancel capturing when the gate signal goes low

A capture mode register is only available for scopes 1 and 2. Scope 0 (for corrected ADC data) always operates in capture mode 0.

6.2.2 Status registers

The following status registers can be read by software:

index	address	bits	radix	description
`0`	`0x80008000`	`16`	signed	latest BPM 0 result
`1`	`0x80008020`	`16`	signed	latest BPM 1 result
`2`	`0x80008040`	`16`	signed	latest BPM 2 result
`3`	`0x80008060`	`16`	signed	latest BPM 3 result
`4`	`0x80008080`	`16`	signed	latest BPM 0 averaging result
`5`	`0x800080A0`	`16`	signed	latest BPM 1 averaging result
`6`	`0x800080C0`	`16`	signed	latest BPM 2 averaging result
`7`	`0x800080E0`	`16`	signed	latest BPM 3 averaging result
`32`	`0x80008400`	`2`	unsigned	scope 0 capture status
`33`	`0x80008420`	`32`	unsigned	scope 0 latest write address
`40`	`0x80008500`	`2`	unsigned	scope 1 capture status
`41`	`0x80008520`	`32`	unsigned	scope 1 latest write address
`48`	`0x80008600`	`2`	unsigned	scope 2 capture status
`49`	`0x80008620`	`32`	unsigned	scope 2 latest write address
`127`	`0x80008FE0`	`57`	unsigned	FPGA serial number

Table 6.7: List of status registers

0 - 3: latest BPM {0 - 3} result

This value divided by represents the relative beam position in the range .

4 - 7: latest BPM {0 - 3} averaging result

This value divided by represents the relative beam position in the range . Due to the averaging there should be less noise on this value than on the BPM result.

32, 40, 48: Scope {0, 1, 2} capture status

value	capture status
0	idle
1	waiting for trigger
2	capturing
3	done

The value 0 is only present before the trigger is armed for the first time. After that, the effective idle state is 3.

33, 41, 49: Scope {0, 1, 2} latest write address

Address where the latest data sample was stored during the scope’s capturing process.

127: FPGA serial number

The XDMA PCIe driver by Xilinx enumerates the devices in no fixed order and cannot identify the slot number of an AFC board. This register holds the FPGA’s unique serial number, which can be used to identify an AFC board.

6.3 Capturing procedure

6.3.1 Known number of samples

A typical procedure for capturing a predefined number of samples starting from the rising edge of the gate signal is the following:

write the number of samples minus 1 to the configuration register ’capture length - 1’
write a 0 to the configuration register named ’trigger mode’
write a 0 (= default) to the configuration register named ’capture mode’
write a 1 to the configuration register named ’arm trigger’
you can check the status register named ’capture status’ for the progress: 1: rising edge of gate signal not yet detected, 2: capturing is ongoing, 3: capturing completed
you can check the current write address by polling the status register named ’latest write address’

6.3.2 Unknown number of samples

BPM results are only calculated while the gate signal is high. If you want to capture e.g. the BPM averaging results of a complete high period of the gate signal, the total number of samples is not known in advance. Proceed as follows:

write the maximum value 0x1FFFFFF to the configuration register named ’capture length - 1’
write a 0 to the configuration register named ’trigger mode’
write a 1 to the configuration register named ’capture mode’
write a 1 to the configuration register named ’arm trigger’
you can check the status register ’capture status’ as above
the value of the status register named ’latest write address’ will be static after completion and indicates how many samples have been captured
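Both capture procedures can be sketched in Python against a register interface with the hypothetical methods write_config(index, value) and read_status(index); DictRegisters is an in-memory stand-in for dry runs, not a driver for real hardware. The sketch assumes that the capture status leaves the value 3 promptly after the trigger is armed; otherwise, polling right after arming could see the stale ’done’ state of a previous capture.

```python
import time

# Per scope: index of 'capture length - 1' (configuration) and of 'capture status'
# (status); the other registers of a scope follow at fixed offsets (Tables 6.6, 6.7).
SCOPE_BASE = {0: 32, 1: 40, 2: 48}
CAPTURE_DONE = 3


class DictRegisters:
    """In-memory stand-in for the register interface, for dry runs only."""

    def __init__(self):
        self.config = {}
        self.status = {}

    def write_config(self, index, value):
        self.config[index] = value

    def read_status(self, index):
        return self.status.get(index, 0)


def arm_capture(regs, scope, n_samples=None):
    """Configure and arm a scope triggered by the rising edge of the gate signal.

    n_samples given: capture exactly this many samples (chapter 6.3.1).
    n_samples None: capture a complete gate high period (chapter 6.3.2, scopes 1 and 2).
    """
    base = SCOPE_BASE[scope]
    if n_samples is None:
        if scope == 0:
            raise ValueError("scope 0 has no capture mode register")
        length_minus_1, capture_mode = 0x1FFFFFF, 1  # maximum length, stop when gate goes low
    else:
        if n_samples < 1:
            raise ValueError("n_samples must be at least 1")
        length_minus_1, capture_mode = n_samples - 1, 0
    regs.write_config(base, length_minus_1)        # capture length - 1
    regs.write_config(base + 1, 0)                 # trigger mode: rising edge of gate
    if scope != 0:
        regs.write_config(base + 3, capture_mode)  # capture mode
    regs.write_config(base + 2, 1)                 # arm trigger


def wait_for_capture(regs, scope, timeout_s=10.0, poll_s=0.01):
    """Poll 'capture status' until it reads 3 (done); return 'latest write address'."""
    base = SCOPE_BASE[scope]
    deadline = time.monotonic() + timeout_s
    while regs.read_status(base) != CAPTURE_DONE:
        if time.monotonic() > deadline:
            raise TimeoutError(f"scope {scope} capture not completed within {timeout_s} s")
        time.sleep(poll_s)
    return regs.read_status(base + 1)
```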

7 Extended gateware software interface

Besides the interface documented in chapter 6, which is meant for productive use, there is an extended interface for development and debugging purposes. The extended interface is also included in the bitstream by default.

While the productive interface is intended to remain as backward compatible as possible, the extended interface may be subject to major changes during the development process.

7.1 Extended register map

7.1.1 Additional configuration registers

The following additional registers can be written by software:

index	address	bits	radix	description	default value
`39`	`0x800074E0`	`1`	binary	scope 0 continuous trigger	`0`
`47`	`0x800075E0`	`1`	binary	scope 1 continuous trigger	`0`
`55`	`0x800076E0`	`1`	binary	scope 2 continuous trigger	`0`
`64`	`0x80007800`	`2`	unsigned	FMC 0 status LED select	`0x0`
`65`	`0x80007820`	`3`	unsigned	FMC 0 status LED value	`0x7`
`66`	`0x80007840`	`5`	unsigned	FMC 0 SPI chip select	`0x0F`
`67`	`0x80007860`	`1`	binary	FMC 0 SPI read/write	`0`
`68`	`0x80007880`	`8`	unsigned	FMC 0 SPI address	`0x00`
`69`	`0x800078A0`	`8`	unsigned	FMC 0 SPI write data	`0x00`
`70`	`0x800078C0`	`1`	binary	FMC 0 SPI trigger	`0`
`71`	`0x800078E0`	`1`	binary	FMC 0 ADC resetn	`1`
`72`	`0x80007900`	`1`	binary	FMC 0 I2C read/write	`0`
`73`	`0x80007920`	`7`	unsigned	FMC 0 I2C device address	`0x49`
`74`	`0x80007940`	`8`	unsigned	FMC 0 I2C register address	`0x00`
`75`	`0x80007960`	`8`	unsigned	FMC 0 I2C write data	`0x00`
`76`	`0x80007980`	`1`	binary	FMC 0 I2C trigger	`0`
`77`	`0x800079A0`	`1`	binary	FMC 0 PLL resetn	`1`
`78`	`0x800079C0`	`1`	binary	FMC 0 clock switch select	`1`
`79`	`0x800079E0`	`1`	binary	FMC 0 VCXO output enable	`1`
`80`	`0x80007A00`	`2`	unsigned	FMC 1 status LED select	`0x0`
`81`	`0x80007A20`	`3`	unsigned	FMC 1 status LED value	`0x7`
`82`	`0x80007A40`	`5`	unsigned	FMC 1 SPI chip select	`0x0F`
`83`	`0x80007A60`	`1`	binary	FMC 1 SPI read/write	`0`
`84`	`0x80007A80`	`8`	unsigned	FMC 1 SPI address	`0x00`
`85`	`0x80007AA0`	`8`	unsigned	FMC 1 SPI write data	`0x00`
`86`	`0x80007AC0`	`1`	binary	FMC 1 SPI trigger	`0`
`87`	`0x80007AE0`	`1`	binary	FMC 1 ADC resetn	`1`
`88`	`0x80007B00`	`1`	binary	FMC 1 I2C read/write	`0`
`89`	`0x80007B20`	`7`	unsigned	FMC 1 I2C device address	`0x49`
`90`	`0x80007B40`	`8`	unsigned	FMC 1 I2C register address	`0x00`
`91`	`0x80007B60`	`8`	unsigned	FMC 1 I2C write data	`0x00`
`92`	`0x80007B80`	`1`	binary	FMC 1 I2C trigger	`0`
`93`	`0x80007BA0`	`1`	binary	FMC 1 PLL resetn	`1`
`94`	`0x80007BC0`	`1`	binary	FMC 1 clock switch select	`1`
`95`	`0x80007BE0`	`1`	binary	FMC 1 VCXO output enable	`1`

Table 7.1: List of additional configuration registers - part 1

index	address	bits	radix	description	default value
`96`	`0x80007C00`	`5`	unsigned	FMC 0 ADC 0 clock delay	`0x0D`
`97`	`0x80007C20`	`5`	unsigned	FMC 0 ADC 1 clock delay	`0x0D`
`98`	`0x80007C40`	`5`	unsigned	FMC 0 ADC 2 clock delay	`0x0D`
`99`	`0x80007C60`	`5`	unsigned	FMC 0 ADC 3 clock delay	`0x0D`
`100`	`0x80007C80`	`5`	unsigned	FMC 0 ADC 0 data delay	`0x00`
`101`	`0x80007CA0`	`5`	unsigned	FMC 0 ADC 1 data delay	`0x00`
`102`	`0x80007CC0`	`5`	unsigned	FMC 0 ADC 2 data delay	`0x00`
`103`	`0x80007CE0`	`5`	unsigned	FMC 0 ADC 3 data delay	`0x00`
`104`	`0x80007D00`	`5`	unsigned	FMC 1 ADC 0 clock delay	`0x0D`
`105`	`0x80007D20`	`5`	unsigned	FMC 1 ADC 1 clock delay	`0x0D`
`106`	`0x80007D40`	`5`	unsigned	FMC 1 ADC 2 clock delay	`0x0D`
`107`	`0x80007D60`	`5`	unsigned	FMC 1 ADC 3 clock delay	`0x0D`
`108`	`0x80007D80`	`5`	unsigned	FMC 1 ADC 0 data delay	`0x00`
`109`	`0x80007DA0`	`5`	unsigned	FMC 1 ADC 1 data delay	`0x00`
`110`	`0x80007DC0`	`5`	unsigned	FMC 1 ADC 2 data delay	`0x00`
`111`	`0x80007DE0`	`5`	unsigned	FMC 1 ADC 3 data delay	`0x00`
`112`	`0x80007E00`	`1`	binary	productive mode	`1`
`113`	`0x80007E20`	`1`	binary	AFC LED select	`0`
`114`	`0x80007E40`	`3`	unsigned	AFC LED value	`0x7`
`115`	`0x80007E60`	`1`	binary	gate override	`0`
`116`	`0x80007E80`	`1`	binary	gate override value	`1`
`117`	`0x80007EA0`	`2`	unsigned	observer valid signal select	`0x0`
`118`	`0x80007EC0`	`8`	unsigned	MLVDS direction	`0x00`
`119`	`0x80007EE0`	`8`	unsigned	MLVDS output value	`0x00`
`120`	`0x80007F00`	`3`	unsigned	observer multiplexer 0 select	`0x0`
`121`	`0x80007F20`	`3`	unsigned	observer multiplexer 1 select	`0x0`
`122`	`0x80007F40`	`27`	unsigned	observer number of samples - 1	`0x00000FFF`
`123`	`0x80007F60`	`3`	unsigned	observer trigger select	`0x0`
`124`	`0x80007F80`	`64`	unsigned	observer trigger compare vector (t = -1)	`0x0000000000000000`
`125`	`0x80007FA0`	`64`	unsigned	observer trigger compare vector (t = 0)	`0x0000000000000000`
`126`	`0x80007FC0`	`64`	unsigned	observer trigger compare bit mask	`0xFFFFFFFFFFFFFFFF`
`127`	`0x80007FE0`	`1`	binary	observer arm trigger	`0`

Table 7.2: List of additional configuration registers - part 2

39, 47, 55: Scope {0, 1, 2} continuous trigger

If set to 1, the trigger is armed and will be rearmed automatically after every capture completion.

64, 80: FMC {0, 1} status LED select

There is one tricolor LED on the FMC front panel labeled status that can be controlled by the gateware.

value	input
0	ADC clock, blink frequency divided by , green if AD9510 PLL is in lock, otherwise red
1	AD9510 monitoring clock, blink frequency divided by
2, 3	static value from register ’status LED value’

65, 81: FMC {0, 1} status LED value

The static lighting pattern defined by this register becomes active if the corresponding register ’status LED select’ is set to 2 or 3.

bit	color
0	red
1	green
2	blue

66, 82: FMC {0, 1} SPI chip select

Chip select signals (active high) of the SPI bus to the four ADCs and to the AD9510 PLL and clock distribution.

bit	device
0	ADC 0
1	ADC 1
2	ADC 2
3	ADC 3
4	PLL and clock distribution

67, 83: FMC {0, 1} SPI read/write

0: write mode, 1: read mode

68, 84: FMC {0, 1} SPI address

The address of the register that shall be accessed.

69, 85: FMC {0, 1} SPI write data

The data that shall be written to a register.

70, 86: FMC {0, 1} SPI trigger

Write a 1 to this register to start a read or write access on the SPI bus. The register does not have to be reset to 0 before the next SPI trigger, just write another 1 to it.

71, 87: FMC {0, 1} ADC resetn

Active-low reset signal for all four ADCs in parallel. Write 0 and then 1 to this register to initiate a reset.

72, 88: FMC {0, 1} I2C read/write

0: write mode, 1: read mode

73, 89: FMC {0, 1} I2C device address

The address of the connected VCXO is 0x49.

74, 90: FMC {0, 1} I2C register address

The address of the register that shall be accessed.

75, 91: FMC {0, 1} I2C write data

The data that shall be written to a register.

76, 92: FMC {0, 1} I2C trigger

Write a 1 to this register to start a read or write access on the I2C bus. The register does not have to be reset to 0 before the next I2C trigger, just write another 1 to it.

77, 93: FMC {0, 1} PLL resetn

Active-low reset signal for the PLL and clock distribution. Write 0 and then 1 to this register to initiate a reset.

78, 94: FMC {0, 1} Clock switch select

There is a separate clock switch in front of the AD9510 PLL reference clock input.

value	connect to
0	MMCX connector labeled REF on the front panel of the FMC board
1	clock output from the FPGA via the FMC connector

79, 95: FMC {0, 1} VCXO output enable

Enables the frequency output of the VCXO.

96 - 99 and 104 - 107: FMC {0, 1} ADC {0 - 3} clock delay

There is a configurable input delay for setting the correct digital interface timing for both the clock and the data signals. Increasing this value increases the delay of the clock, so that the data is sampled later.

100 - 103 and 108 - 111: FMC {0, 1} ADC {0 - 3} data delay

See above. Increasing this value increases the delay of the data, so that the data is sampled at an earlier position.

112: Productive mode

When productive mode is 1, scope 0 operates in an easy-to-use mode for storing ADC data.

Setting this register to 0 enables additional functionality, such as selecting and combining different signals to store, and a more powerful and flexible two-stage trigger.

Registers 117 to 127 are only relevant in non-productive mode.

113: AFC LED select

There is one tricolor LED at the center of the AFC front panel labeled L3 that can be controlled by the gateware.

value	input
0	PCIe reference clock, blink frequency divided by , white
1	static value from register 114 ’AFC LED value’

114: AFC LED value

Static lighting pattern if register 113 ’AFC LED select’ = 1.

bit	color
0	red
1	green
2	blue

115: Gate override

For testing purposes without an external gate signal you can set this register to 1 and simulate a gate signal via register 116 ’gate override value’.

116: Gate override value

Can be used to simulate a gate signal when register 115 ’gate override’ is 1.

117: Observer valid signal select

Determines the data valid input to the observer. Samples are only stored when the valid signal is high.

value	input
0, 3	constant 1
1	BPM result valid
2	BPM averaging result valid

118: MLVDS direction

Determines the direction of the eight MLVDS lines on the AMC connector. A ’0’ corresponds to an input to the FPGA and a ’1’ to an output from the FPGA.

119: MLVDS output value

Determines the logic levels of the eight MLVDS lines if they are configured as outputs (see previous register).

120, 121: Observer multiplexer {0, 1} select

The observer stores 128-bit wide samples, each consisting of two concatenated 64-bit wide multiplexer outputs. Each multiplexer can choose between eight different input vectors. This way, any signal can be observed simultaneously with any other signal.

value	input vector (64 bits)
0	corrected ADC data of ADCs 0 - 3
1	corrected ADC data of ADCs 4 - 7
2	BPM 0 and 1 result, additional information
3	BPM 2 and 3 result, additional information
4	BPM 0 and 1 averaging result, additional information
5	BPM 2 and 3 averaging result, additional information
6	SPI and I2C signals, MLVDS signals, FMC trigger signals
7	test counter

For a detailed description of the input vectors see chapter 4.10.2.

122: Observer number of samples - 1

The number of samples minus one that are stored after the observer has been triggered. Each sample consists of 16 bytes.

123: Observer trigger select

Analogous to registers 120 and 121. Determines which observer input vector the trigger listens to.

124: Observer trigger compare vector (t = -1)

64-bit wide compare vector that is compared with the observer input vector selected by register 123 ’observer trigger select’. If the two patterns match, the next sample will be compared to the compare vector defined by register 125 ’trigger compare vector (t = 0)’.

125: Observer trigger compare vector (t = 0)

See above. If the patterns do not match, the next sample will be compared to the compare vector defined by register 124 ’trigger compare vector (t = -1)’. If the patterns match, the data acquisition is triggered.

126: Observer trigger compare bit mask

Determines which bits of the input vector are compared with those of the compare vectors. Valid for both trigger compare vectors (registers 124 and 125). For triggering, the patterns must match for all bits whose bit mask is 1.

127: Observer arm trigger

Starts the comparison process. Data is captured once the patterns defined by the previous three registers match.
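The two-stage trigger can be modelled in Python as shown below. The model is an interpretation: judging by the names t = -1 and t = 0, it assumes that the trigger fires on a sample matching the t = 0 compare vector if the directly preceding sample matched the t = -1 compare vector, both evaluated under the bit mask. The function names are hypothetical.

```python
from __future__ import annotations

ALL_BITS = (1 << 64) - 1  # default value of register 126 'observer trigger compare bit mask'


def matches(sample: int, vector: int, mask: int) -> bool:
    """True if sample and vector agree on every bit whose mask bit is 1."""
    return ((sample ^ vector) & mask) == 0


def find_trigger(samples, vector_prev, vector_now, mask=ALL_BITS) -> int | None:
    """Index of the first sample at which the trigger would fire, or None."""
    prev_matched = False
    for i, sample in enumerate(samples):
        if prev_matched and matches(sample, vector_now, mask):
            return i
        # The current sample becomes the t = -1 candidate for the next comparison.
        prev_matched = matches(sample, vector_prev, mask)
    return None
```

Choosing two compare vectors that differ in a single bit, with only that bit set in the mask, gives an edge trigger on that bit, e.g. find_trigger(samples, 0, 1, 1) for a rising edge on bit 0.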

7.1.2 Additional status registers

The following additional status registers can be read by software:

index	address	bits	radix	description
`64`	`0x80008800`	`1`	binary	FMC 0 SPI busy
`65`	`0x80008820`	`8`	unsigned	FMC 0 SPI read data
`66`	`0x80008840`	`1`	binary	FMC 0 I2C busy
`67`	`0x80008860`	`8`	unsigned	FMC 0 I2C read data
`68`	`0x80008880`	`1`	binary	FMC 0 PLL status
`69`	`0x800088A0`	`38`	unsigned	FMC 0 VCXO initial RFREQ
`70`	`0x800088C0`	`38`	unsigned	FMC 0 VCXO RFREQ
`71`	`0x800088E0`	`32`	unsigned	FMC 0 measured ADC clock frequency
`72`	`0x80008900`	`32`	unsigned	FMC 0 ADC FIFO underflow counter
`73`	`0x80008920`	`32`	unsigned	FMC 0 ADC FIFO overflow counter
`80`	`0x80008A00`	`1`	binary	FMC 1 SPI busy
`81`	`0x80008A20`	`8`	unsigned	FMC 1 SPI read data
`82`	`0x80008A40`	`1`	binary	FMC 1 I2C busy
`83`	`0x80008A60`	`8`	unsigned	FMC 1 I2C read data
`84`	`0x80008A80`	`1`	binary	FMC 1 PLL status
`85`	`0x80008AA0`	`38`	unsigned	FMC 1 VCXO initial RFREQ
`86`	`0x80008AC0`	`38`	unsigned	FMC 1 VCXO RFREQ
`87`	`0x80008AE0`	`32`	unsigned	FMC 1 measured ADC clock frequency
`88`	`0x80008B00`	`32`	unsigned	FMC 1 ADC FIFO underflow counter
`89`	`0x80008B20`	`32`	unsigned	FMC 1 ADC FIFO overflow counter
`96`	`0x80008C00`	`16`	unsigned	ADC 0 max peak to peak
`97`	`0x80008C20`	`16`	unsigned	ADC 1 max peak to peak
`98`	`0x80008C40`	`16`	unsigned	ADC 2 max peak to peak
`99`	`0x80008C60`	`16`	unsigned	ADC 3 max peak to peak
`100`	`0x80008C80`	`16`	unsigned	ADC 4 max peak to peak
`101`	`0x80008CA0`	`16`	unsigned	ADC 5 max peak to peak
`102`	`0x80008CC0`	`16`	unsigned	ADC 6 max peak to peak
`103`	`0x80008CE0`	`16`	unsigned	ADC 7 max peak to peak
`111`	`0x80008DE0`	`1`	binary	SDRAM initial calibration complete
`124`	`0x80008F80`	`1`	binary	observer triggered
`125`	`0x80008FA0`	`1`	binary	observer capture busy
`126`	`0x80008FC0`	`32`	unsigned	build timestamp

Table 7.3: List of additional status registers

64, 80: FMC {0, 1} SPI busy

Indicates that an SPI read or write access is in progress. This register must read 0 before an SPI access is triggered.

65, 81: FMC {0, 1} SPI read data

Contains the result of a read access to a SPI register.

66, 82: FMC {0, 1} I2C busy

Indicates that an I2C read or write access is in progress. This register must read 0 before an I2C access is triggered.

67, 83: FMC {0, 1} I2C read data

Contains the result of a read access to an I2C register.
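An SPI or I2C access combines the configuration registers 66 to 76 (FMC 0) or 82 to 92 (FMC 1) with the busy and read data status registers. The sketch below builds the sequence of register writes for one access and runs it against a register interface with the hypothetical methods write_config(index, value) and read_status(index). It assumes that the busy flag is already set when it is first polled after the trigger; this timing is not documented.

```python
import time

FMC_BASE = {0: 64, 1: 80}   # first extended register index of each FMC (Tables 7.1, 7.3)
PLL_CHIP_SELECT = 1 << 4     # SPI chip select bit of the AD9510; bits 0-3 select ADC 0-3
VCXO_I2C_ADDRESS = 0x49

# Status register offsets relative to FMC_BASE
SPI_BUSY, SPI_READ_DATA, I2C_BUSY, I2C_READ_DATA = 0, 1, 2, 3


def spi_writes(fmc, chip_select, address, data=None):
    """(configuration index, value) pairs for one SPI access; data=None means read."""
    base = FMC_BASE[fmc]
    writes = [(base + 2, chip_select), (base + 3, 1 if data is None else 0), (base + 4, address)]
    if data is not None:
        writes.append((base + 5, data))  # SPI write data
    writes.append((base + 6, 1))         # SPI trigger
    return writes


def i2c_writes(fmc, register, data=None, device=VCXO_I2C_ADDRESS):
    """(configuration index, value) pairs for one I2C access; data=None means read."""
    base = FMC_BASE[fmc]
    writes = [(base + 8, 1 if data is None else 0), (base + 9, device), (base + 10, register)]
    if data is not None:
        writes.append((base + 11, data))  # I2C write data
    writes.append((base + 12, 1))         # I2C trigger
    return writes


def run_access(regs, fmc, writes, busy_offset, read_offset, timeout_s=1.0):
    """Wait for an idle bus, apply the writes, wait again, return the read data register."""
    busy_index = FMC_BASE[fmc] + busy_offset

    def wait_idle():
        deadline = time.monotonic() + timeout_s
        while regs.read_status(busy_index):
            if time.monotonic() > deadline:
                raise TimeoutError("bus access did not finish in time")
            time.sleep(0.001)

    wait_idle()
    for index, value in writes:
        regs.write_config(index, value)
    wait_idle()
    return regs.read_status(FMC_BASE[fmc] + read_offset)
```

A read of AD9510 register 0x00 on FMC 0 would then be run_access(regs, 0, spi_writes(0, PLL_CHIP_SELECT, 0x00), SPI_BUSY, SPI_READ_DATA).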

68, 84: FMC {0, 1} PLL status

Value of the configurable output pin status of the AD9510 PLL and clock distribution IC. By default this pin indicates lock status of the PLL.

69, 85: FMC {0, 1} VCXO initial RFREQ

RFREQ is a factory-calibrated multiplier of the crystal (XTAL) frequency of the Si571 programmable VCXO. This value has to be read before a new output frequency is programmed (see chapter 3.1).

70, 86: FMC {0, 1} VCXO RFREQ

The VCXO output frequency is programmed to 125 MHz by the gateware. This register holds the value of RFREQ that has been programmed (see chapter 3.1).

71, 87: FMC {0, 1} measured ADC clock frequency

The ADC clock is measured against the main processing clock. This register holds the number of detected ADC clock cycles during 1 second of the main processing clock.

72, 88: FMC {0, 1} ADC FIFO underflow counter

If the ADC clock is slower than the main processing clock, samples will be repeated by the clock domain crossing FIFO output logic. For each repetition the underflow counter will be incremented by 1.

73, 89: FMC {0, 1} ADC FIFO overflow counter

If the ADC clock is faster than the main processing clock, samples are discarded by the clock domain crossing FIFO input logic. For each discarded sample the overflow counter is incremented by 1.

96 - 103: ADC {0 - 7} max peak to peak

The maximum and the minimum value of the ADC data are determined over a free-running period of 1 second. This register contains the difference between the maximum and the minimum value.

111: SDRAM initial calibration complete

The communication to the SDRAM is controlled by an IP core by Xilinx which performs a timing calibration at start up. The value of this register will be 1 after completion of the initial calibration.

124: observer triggered

Indicates that the observer has been triggered.

125: observer capture busy

Indicates that a capturing process is ongoing.

126: build timestamp

Time when the bitstream was created. This information can be used to identify the gateware version (together with the Git commit information documented in chapter 7.2.3).

Format:

bits	information
0 - 5	seconds
6 - 11	minutes
12 - 16	hours
17 - 22	last two decimal digits of the year
23 - 26	month
27 - 31	day
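The build timestamp can be decoded with a few shifts and masks according to the table above, as in the following sketch. It assumes that the two year digits refer to the years 2000 to 2063.

```python
from datetime import datetime


def decode_build_timestamp(value: int) -> datetime:
    """Decode status register 126 'build timestamp'."""
    second = value & 0x3F           # bits 0-5
    minute = (value >> 6) & 0x3F    # bits 6-11
    hour = (value >> 12) & 0x1F     # bits 12-16
    year = (value >> 17) & 0x3F     # bits 17-22, last two decimal digits
    month = (value >> 23) & 0xF     # bits 23-26
    day = (value >> 27) & 0x1F      # bits 27-31
    return datetime(2000 + year, month, day, hour, minute, second)
```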

7.2 Architecture information storage

The first seven eighths of the Block RAM (see memory mapping, table 6.1) are used to store information about the observer signals, the registers and the gateware version.

7.2.1 Observer signal information

Information about the signals connected to the eight observer multiplexer inputs is stored in the first half of the Block RAM. The following information is stored for every bit of each of the eight 64-bit wide multiplexer inputs:

name of signal (30 bytes)
display type of signal (1 byte)
bit index in signal (1 byte)

address	multiplexer input	bytes 0 - 29	byte 30	byte 31
`0x80000000`	0	name of signal A	display type of signal A	0
`0x80000020`	0	name of signal A	display type of signal A	1
…	…	…	…	width - 1 of signal A
`0x80000XXX`	0	name of signal B	display type of signal B	0
`0x80000XXX`	0	name of signal B	display type of signal B	1
…	…	…	…	width - 1 of signal B
…	…	…	…	…
`0x80000800`	1	name of signal C	display type of signal C	0
`0x80000820`	1	name of signal C	display type of signal C	1
…	…	…	…	width - 1 of signal C
`0x80000XXX`	1	name of signal D	display type of signal D	0
`0x80000XXX`	1	name of signal D	display type of signal D	1
…	…	…	…	width - 1 of signal D
…	…	…	…	…
`0x80003FE0`	7	name of signal X	display type of signal X	width - 1 of signal X

Table 7.4: Observer signal information storage format

Table 7.4 shows the storage format of the 512 entries (8 inputs × 64 bits), each of which is 32 bytes wide. The display type byte is coded as follows:

value	display type
0	hexadecimal
1	signed
2	unsigned
3	binary
4	analog

The names are stored as ASCII strings. Names shorter than 30 bytes are padded with null characters.
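The following Python sketch shows how the address of an entry follows from the layout in table 7.4 (64 entries of 32 bytes per multiplexer input) and how a 32-byte entry can be split into its fields. The constant and function names are hypothetical; reading the Block RAM over PCIe is not shown, so the parser works on bytes that have already been read.

```python
from typing import Tuple

OBSERVER_INFO_BASE = 0x80000000
ENTRY_SIZE = 32        # bytes per entry
BITS_PER_INPUT = 64    # width of each observer multiplexer input
DISPLAY_TYPES = {0: "hexadecimal", 1: "signed", 2: "unsigned", 3: "binary", 4: "analog"}


def observer_entry_address(mux_input: int, bit: int) -> int:
    """Address of the entry describing one bit of one multiplexer input."""
    if not (0 <= mux_input < 8 and 0 <= bit < BITS_PER_INPUT):
        raise ValueError("multiplexer input must be 0-7 and bit 0-63")
    return OBSERVER_INFO_BASE + (mux_input * BITS_PER_INPUT + bit) * ENTRY_SIZE


def parse_observer_entry(entry: bytes) -> Tuple[str, str, int]:
    """Split a 32-byte entry into (signal name, display type, bit index)."""
    if len(entry) != ENTRY_SIZE:
        raise ValueError("an entry is exactly 32 bytes long")
    name = entry[:30].rstrip(b"\x00").decode("ascii")   # bytes 0 - 29
    display_type = DISPLAY_TYPES.get(entry[30], "unknown")  # byte 30
    bit_index = entry[31]                                # byte 31
    return name, display_type, bit_index
```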

The observer signal information is used by the FPGA Observer software to display the observer signals in the Data Acquisition tab (see chapter 8.1.3). The information is also used to format the measurement data to be displayed by GTKWave.

7.2.2 Register information

Information about the 128 configuration registers and the 128 status registers is stored in the third quarter of the Block RAM. The following information is stored for every register:

name of register (31 bytes)
number of bits (1 byte)

address	bytes 0 - 30	byte 31
`0x80004000`	name of configuration register 0	width of configuration register 0
`0x80004020`	name of configuration register 1	width of configuration register 1
…	…	…
`0x80004FE0`	name of configuration register 127	width of configuration register 127
`0x80005000`	name of status register 0	width of status register 0
`0x80005020`	name of status register 1	width of status register 1
…	…	…
`0x80005FE0`	name of status register 127	width of status register 127

Table 7.5: Register information storage format

Table 7.5 shows the storage format of the 256 entries, each of which has a width of 32 bytes. The names are stored as ASCII strings. If a name is shorter than 31 bytes, the remaining bytes are filled with Null characters. If not all registers are in use, a width of 0 bits indicates that a register is not present.
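A corresponding sketch for table 7.5, again with hypothetical names and operating on bytes that have already been read: it computes the entry address of a configuration or status register and lists the registers that are present, skipping entries with a width of 0.

```python
from typing import List, Tuple

REGISTER_INFO_BASE = 0x80004000
CONFIG_OFFSET = 0x0000   # configuration registers 0 - 127
STATUS_OFFSET = 0x1000   # status registers 0 - 127
ENTRY_SIZE = 32


def register_entry_address(index: int, status: bool) -> int:
    """Address of the information entry of configuration or status register 'index'."""
    if not 0 <= index < 128:
        raise ValueError("register index must be 0-127")
    offset = STATUS_OFFSET if status else CONFIG_OFFSET
    return REGISTER_INFO_BASE + offset + index * ENTRY_SIZE


def parse_register_table(block: bytes) -> List[Tuple[int, str, int]]:
    """Return (index, name, width) for every register present in 'block'."""
    registers = []
    for index in range(len(block) // ENTRY_SIZE):
        entry = block[index * ENTRY_SIZE:(index + 1) * ENTRY_SIZE]
        width = entry[31]
        if width == 0:
            continue  # a width of 0 marks a register that is not present
        name = entry[:31].rstrip(b"\x00").decode("ascii")
        registers.append((index, name, width))
    return registers
```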

The register information is used by the FPGA Observer software to display the registers in the Register Access tab (see chapter 8.1.2).

7.2.3 Gateware information

The address range from 0x80006000 to 0x80006FFF is used to store information about the gateware version. The information is stored as an ASCII string of variable length (maximum 4 KiB), which is assembled from information in the Git repository. It contains the URL of the remote server of the Git repository, the latest commit hash and the latest commit date.

The gateware information is used by the FPGA Observer software to display the information in the Gateware Information tab (see chapter 8.1.6), except for the bitstream generation date, which is read from status register 126 ’build timestamp’.

8 Test software

8.1 FPGA Observer

The FPGA Observer is a graphical test application intended to run on the CPU unit. It is implemented in Python using the GTK 3 GUI toolkit.

8.1.1 Installation and usage

The sources and an installation script can be found under src/software/fpga_observer/ in the Cryring_BPM_Gateware Git repository.

Installation

Connect to the CPU unit, e.g. via SSH, and clone the Cryring_BPM_Gateware Git repository:

git clone git@git.gsi.de:BEA_HDL/Cryring_BPM_Gateware.git

Install the PCIe driver:

cd Cryring_BPM_Gateware/src/software/pcie_driver
sudo ./install.sh

Install the FPGA Observer software:

cd ../fpga_observer
sudo ./install.sh

Figure 8.1: FPGA Observer - Register Access tab

Usage

For the PCIe driver to work, the bitstreams of the FPGAs have to be loaded before powering the CPU unit. If that is not the case, power cycle the CPU unit by pulling out the Hot Swap Handle and pushing it in again. A software reboot does not work.

Connect to the CPU unit with X11 forwarding enabled, e.g. via ssh -X, so that the graphical user interface can be displayed. Start the FPGA Observer software by:

sudo fpga_observer local

A GUI should open, listing the available FPGA serial numbers in the upper left corner. If the list is empty, either the FPGAs finished loading only after the CPU unit had been powered, or the PCIe driver was not installed correctly. The FPGA serial numbers can be used to identify the AFC board you want to access. Choose a serial number and click connect.

8.1.2 Register Access tab

The names and widths of the registers are read from an information memory region in the FPGA (see chapter 7.2.2). The status registers are displayed on the left and the configuration registers on the right.

The read button reads all the status registers either once or continuously if the continuous check button is checked. The write button writes all the configuration registers whose check buttons next to the write value are checked.

8.1.3 Data Acquisition tab

The names and widths of the signals connected to the eight observer multiplexer inputs are read from an information memory region in the FPGA (see chapter 7.2.1).

The two combo boxes Observer 0 and Observer 1 determine which multiplexer inputs are selected for data acquisition. The combo box Trigger determines which of the observer inputs the trigger listens to.

Figure 8.2: FPGA Observer - Data Acquisition tab

The Number Of Samples entry determines how many samples will be stored after a trigger event when the capture button has been pressed. If the continuous check button is checked, the trigger will be rearmed automatically when the data acquisition completes.

The individual Trigger Active (&) check buttons define which signals the trigger listens to. A trigger event occurs only when all enabled conditions are true at the same time.

The trigger conditions for t = -1 and t = 0 contain the compare vectors of the two-stage trigger, which have to match in consecutive clock cycles. The Trigger Mask (&) defines which bits of a signal the trigger listens to.
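The trigger behaviour described above can be modelled in software, e.g. to check a trigger setup against recorded samples. This is a behavioural sketch, not the gateware implementation: the names are hypothetical, and the mask polarity (a 1 bit in the mask selects a bit for comparison) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class TriggerCondition:
    compare_t_minus_1: int  # compare vector for t = -1
    compare_t0: int         # compare vector for t = 0
    mask: int               # assumption: a 1 bit means "compare this bit"
    active: bool = True     # state of the Trigger Active check button


def masked_match(value: int, compare: int, mask: int) -> bool:
    return (value & mask) == (compare & mask)


def find_trigger(samples: Dict[str, Sequence[int]],
                 conditions: Dict[str, TriggerCondition]) -> Optional[int]:
    """Return the first sample index t at which all active conditions hold."""
    active = {name: c for name, c in conditions.items() if c.active}
    if not active:
        return None
    length = min(len(samples[name]) for name in active)
    for t in range(1, length):
        # Both stages must match in consecutive clock cycles for every active signal
        if all(masked_match(samples[n][t - 1], c.compare_t_minus_1, c.mask)
               and masked_match(samples[n][t], c.compare_t0, c.mask)
               for n, c in active.items()):
            return t
    return None
```

With compare vectors 0 (t = -1) and 1 (t = 0) and a mask of 1, for example, the condition detects a rising edge of a 1-bit signal.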

When the data acquisition completes, the captured data is stored to a .vcd file, which is then displayed with the open source waveform viewer GTKWave.
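To illustrate the hand-over to GTKWave, the following sketch writes captured samples as a minimal Value Change Dump (VCD) file. It is not the FPGA Observer's code; the function name, the scope name observer and the 8 ns sample period (one sample per clock cycle of an assumed 125 MHz clock) are assumptions. Signal names must not contain spaces, and the single-character identifiers limit the sketch to 94 signals.

```python
import io
from typing import Dict, List, TextIO, Tuple


def write_vcd(out: TextIO, signals: Dict[str, Tuple[int, List[int]]],
              period_ns: int = 8) -> None:
    """Write sample lists as VCD; 'signals' maps name -> (width in bits, samples)."""
    ids = {name: chr(33 + i) for i, name in enumerate(signals)}  # printable VCD ids
    out.write("$timescale 1 ns $end\n")
    out.write("$scope module observer $end\n")
    for name, (width, _) in signals.items():
        out.write(f"$var wire {width} {ids[name]} {name} $end\n")
    out.write("$upscope $end\n$enddefinitions $end\n")
    length = min(len(samples) for _, samples in signals.values())
    previous: Dict[str, int] = {}
    for t in range(length):
        changes = []
        for name, (width, samples) in signals.items():
            value = samples[t] & ((1 << width) - 1)  # two's complement for negative values
            if previous.get(name) == value:
                continue  # VCD only records value changes
            previous[name] = value
            if width == 1:
                changes.append(f"{value}{ids[name]}")
            else:
                changes.append(f"b{value:b} {ids[name]}")
        if changes:
            out.write(f"#{t * period_ns}\n" + "\n".join(changes) + "\n")


if __name__ == "__main__":
    buffer = io.StringIO()
    write_vcd(buffer, {"trigger": (1, [0, 1, 1]), "adc0": (16, [0, -1, 5])})
    print(buffer.getvalue())
```

1-bit signals are written in scalar notation, wider signals in binary vector notation, and only value changes are recorded, which keeps the file small for slowly changing signals.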

Figure 8.3: GTKWave

8.1.4 BPM Calculation tab

The results of the BPM algorithm and of the averaging are displayed in this tab. If the read button is pressed, the values of status registers 0 - 3 and 4 - 7 (see chapter 6.2.2) are read once (or continuously if the continuous check button is checked). The displayed values are the register values divided by 2^15 = 32768.
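A minimal sketch of this conversion. It assumes that the BPM status registers are 32 bits wide and hold two's complement values (negative results such as -10923 in the simulation test of chapter 13.1.1 point to a signed format) and that the divisor is 2^15 = 32768, which reproduces the expected values listed in chapter 13.1. The function names are hypothetical.

```python
BPM_SCALE = 2**15  # register value that corresponds to a displayed value of 1.0


def to_signed(raw: int, width: int = 32) -> int:
    """Interpret a raw register read-out as a two's complement number."""
    raw &= (1 << width) - 1
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw


def displayed_bpm_value(raw: int, width: int = 32) -> float:
    """Convert a BPM status register value to the value shown in the tab."""
    return to_signed(raw, width) / BPM_SCALE
```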

Figure 8.4: FPGA Observer - BPM Calculation tab

8.1.5 Peripherals Configuration tab

Configuration

A configuration file can be loaded to program the three different peripheral devices documented in chapter 3. An initial configuration is already performed by the gateware after startup.
The configuration file is a comma separated values (.csv) file with the following syntax:

device number (1 hexadecimal digit), register number (2 hexadecimal digits), value (2 hexadecimal digits)

The device number is encoded as follows:

number	device
0 - 3	ADCs 0 - 3 on FMC0
4	all ADCs on FMC0 in parallel
5	PLL on FMC0
6	VCXO on FMC0
7 - A	ADCs 0 - 3 on FMC1
B	all ADCs on FMC1 in parallel
C	PLL on FMC1
D	VCXO on FMC1
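A configuration file in the format described above could be checked before loading it with a sketch like the following. It assumes that the file contains no header row and that device numbers 7 - D address the second FMC board (FMC1); the class and function names are hypothetical.

```python
import csv
import io
from typing import List, NamedTuple


class PeripheralWrite(NamedTuple):
    device: int
    register: int
    value: int


def describe_device(device: int) -> str:
    """Map a device number (0 - D) to a readable description."""
    if not 0 <= device <= 0xD:
        raise ValueError(f"invalid device number {device:X}")
    fmc, local = divmod(device, 7)  # 0 - 6 -> FMC0, 7 - D -> FMC1 (assumption)
    if local <= 3:
        return f"ADC {local} on FMC{fmc}"
    return [f"all ADCs on FMC{fmc} in parallel", f"PLL on FMC{fmc}",
            f"VCXO on FMC{fmc}"][local - 4]


def parse_config(text: str) -> List[PeripheralWrite]:
    """Parse 'device, register, value' lines with hexadecimal fields."""
    writes = []
    for line_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        fields = [field.strip() for field in row]
        if not any(fields):
            continue  # ignore empty lines
        if len(fields) != 3:
            raise ValueError(f"line {line_number}: expected 3 fields, got {len(fields)}")
        device, register, value = (int(field, 16) for field in fields)
        describe_device(device)  # raises ValueError for unknown device numbers
        if not (0 <= register <= 0xFF and 0 <= value <= 0xFF):
            raise ValueError(f"line {line_number}: register and value must fit into one byte")
        writes.append(PeripheralWrite(device, register, value))
    return writes
```

For example, the line 5,0A,3F would write the value 0x3F to register 0x0A of the PLL on FMC0.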

A correct configuration of the VCXOs via software cannot be guaranteed because of their I2C configuration timeout of 10 ms (see chapter 3.1). Any VCXO configuration should therefore be verified by reading back the register values afterwards.

Reading the device registers

The complete register banks of the first ADC, the PLL and the VCXO on FMC0 can be read and stored to three individual log files in the /tmp directory.

Figure 8.5: FPGA Observer - Peripherals Configuration tab

8.1.6 Gateware Information tab

Figure 8.6: FPGA Observer - Gateware Information tab

Information about the gateware version is displayed in this tab. The URL of the remote server of the Git repository, the latest commit hash and the latest commit date are read from an information memory region in the FPGA (see chapter 7.2.3). The bitstream generation date is read from status register 126 ’build timestamp’.

8.2 Test scripts

8.2.1 PCIe access test script

There is a PCIe access test script under src/software/pcie_driver/test_pcie_access.sh in the Cryring_BPM_Gateware Git repository. It uses the tools provided with the XDMA PCIe driver to test basic read and write accesses to different memories via the PCIe driver. The read results are displayed via hexdump.

9 Helper scripts

9.1 VHDL beautification

There is a script src/scripts/beautify_vhdl.py for autoformatting VHDL files using the open source software Emacs.

The script expects one parameter: either <file that shall be formatted>, or all for formatting all VHDL files in the repository. The formatting is performed in place, overwriting the original source file.

The script applies several corrections and changes to the Emacs formatting result:

correction of the handling of the comparison operator <=
correction of the handling of initializations like (others => ’0’)
enforcing of spaces around the operators +, -, *, /, &
no indentation for closing brackets
aligning of full comment lines to the indentation level of the following VHDL command
indentation with tabs instead of spaces

9.2 Remote power cycling of the CPU unit

Whenever the bitstream of an FPGA is reloaded, the CPU unit has to be power cycled via its Hot Swap Handle in order to establish a PCIe connection. A software reboot does not work.

Alternatively, the CPU unit can be power cycled remotely via the MCH.

The script src/scripts/powercycle_cpu_unit.py instructs the MCH via SSH to power down and repower the CPU unit. The script expects one parameter: <name of MCH, e.g. sdmch023>.

The script takes about 60 seconds to complete.

9.3 Plotting of measurement data

The script src/scripts/plot_frequency_response.py can be used to plot measurement data. It was used to create figure 12.6.

9.4 Generation of a VHDL file for monitoring and control

The monitoring and control configuration of the gateware is defined by the configuration files observer_signals.csv, read_registers.csv and write_registers.csv in the folder src/config.

The script src/scripts/generate_monitoring_and_control.py is used to convert the configuration to a VHDL file stored as src/vhdl/generated_constant_package.vhd. It contains the register default values and a Block RAM initialization vector containing the observer and register configuration.
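The register part of such an initialization vector can be sketched as follows, using the entry layout of table 7.5 (a 31-byte name padded with null characters, followed by a 1-byte width). This is a hypothetical illustration, not the code of generate_monitoring_and_control.py; in particular, the byte order inside the generated VHDL constant is an assumption.

```python
from typing import Iterable, Tuple


def encode_register_entry(name: str, width: int) -> bytes:
    """Build one 32-byte register information entry (layout of table 7.5)."""
    raw = name.encode("ascii")
    if len(raw) > 31:
        raise ValueError(f"register name '{name}' is longer than 31 bytes")
    if not 0 <= width <= 255:
        raise ValueError("width must fit into one byte")
    return raw.ljust(31, b"\x00") + bytes([width])


def encode_register_table(registers: Iterable[Tuple[str, int]], count: int = 128) -> bytes:
    """Encode up to 'count' registers; unused slots get width 0 (not present)."""
    entries = [encode_register_entry(name, width) for name, width in registers]
    if len(entries) > count:
        raise ValueError("too many registers")
    entries += [bytes(32)] * (count - len(entries))
    return b"".join(entries)


def to_vhdl_hex_literal(data: bytes) -> str:
    """Format bytes as a VHDL hexadecimal bit string literal, byte 0 first (assumption)."""
    return 'x"' + data.hex().upper() + '"'
```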

The script is also executed by the gateware build flow documented in chapter 5.

9.5 Generation of documentation

9.5.1 PDF

There is a script doc/scripts/create_pdf.sh for generating a PDF file from the Latex sources in doc/tex. It calls the open source software Pdflatex twice on the top level documentation file Cryring_BPM_Gateware_Documentation.tex to enable the generation of references inside the PDF file.

9.5.2 Markdown

There is a script doc/scripts/create_markdown.py for generating the Markdown file README.md, which is displayed on the repository’s start page in Gitlab. The script uses the open source software Pandoc for an initial conversion of the Latex sources in doc/tex.

The result of Pandoc is postprocessed for multiple reasons:

conversion of the math syntax to Gitlab’s .md format
corrections of the bibliography, HTML syntax and Latex labels
corrections of the references to figures, tables and equations
enumeration of chapters, sections and subsections
adding of captions for figures and equations
enumeration of figures, tables and equations
generation of a table of contents
implementation of citations

Additional documentation which is not included in the Latex sources is appended from the file doc/markdown/epilog.md.

9.5.3 DokuWiki

There is a script doc/scripts/create_dokuwiki.py for generating a DokuWiki file which can be used to populate a Wiki page on e.g. https://www-bd.gsi.de/dokuwiki.

The script converts the Latex documentation sources to the DokuWiki format in four steps:

conversion of Latex sources to Markdown
preprocessing of Markdown before the conversion to DokuWiki
calling Pandoc to convert Markdown to DokuWiki
postprocessing for correction, extension of functionality and a different style

The preprocessing actions are:

removing the table of contents, since it is generated automatically by DokuWiki

The postprocessing actions are:

conversion of equations to images, since DokuWiki cannot render equations
replacement of HTML tags, Latex color tags, etc., since DokuWiki cannot handle them
conversion of the reference format

10 Continuous integration environment

A continuous integration environment is set up for the Cryring_BPM_Gateware Git repository. It is implemented as a so-called Gitlab Runner that communicates with the remote of the Git repository, the Gitlab server git.gsi.de.

At the moment the Gitlab Runner is running on the Linux server sdlx035 located in a server room in the basement.

The benefits of continuous integration are:

every change will be tested automatically
it is ensured that no files are missing in the repository
the master branch can be kept functional at any time
build results like e.g. bitstreams are automatically generated and can be archived

Figure 10.1: Gitlab: continuous integration pipelines

10.1 Installation

There is an installation script install.sh in the Gitlab_Runner_Setup_Centos_7 Git repository. It installs the Gitlab Runner as well as the software needed for simulation, generation of documentation and building an FPGA.

After the installation, the newly set up Gitlab Runner has to be configured to connect to a remote repository on a Gitlab server. In the repository’s web front end on the Gitlab server, go to Settings > CI/CD > Runners and copy the registration token, which you will need in the following step.

On the newly installed Gitlab Runner server, open a terminal and type sudo gitlab-runner register.

Enter the following information:

gitlab-ci coordinator URL: e.g. https://git.gsi.de
gitlab-ci token: enter the registration token copied before
gitlab-ci description: name of the server, e.g. sdlx035
gitlab-ci tags: leave empty
executor: shell

You can add multiple repositories with different tokens by running sudo gitlab-runner register multiple times.

10.2 Pipeline Stages

Each push to the Gitlab server triggers a so-called continuous integration / continuous delivery (CI/CD) pipeline. The pipeline setup is defined by the file .gitlab-ci.yml in the root folder of the repository.

The following pipeline stages are defined:

documentation
simulation
FPGA build

Figure 10.2: Gitlab: Pipeline stages

10.2.1 Documentation

The script create_documentation.sh in doc/tex is run to generate this documentation from the Latex source files. Pdflatex is run twice by the script to allow the generation of references inside the document. This pipeline stage succeeds if Pdflatex can generate the PDF without errors.

The log file of Pdflatex and - if successful - the PDF of the documentation are archived.

10.2.2 Simulation

The script run/run_simulation.sh uses the Vivado command line interface to simulate the top level of the gateware. Test signals from the ADCs are generated as inputs to the simulation. The BPM results are saved to a file, which is compared to a reference pattern. This pipeline stage succeeds if there is no error in simulation and if the BPM result file matches the reference pattern.

The log file of the simulation and - if successful - a file with the BPM results from the simulation are archived.

10.2.3 FPGA build

The script run/run_build_flow.sh uses the Vivado command line interface to build the gateware. This pipeline stage succeeds if there is no error during the build process and if a bitstream file has been generated.

Different log files from synthesis and implementation, different reports like utilization and timing reports and - if successful - the bitstream file are archived.

Figure 10.3: Gitlab: Pipeline progress console

10.3 Build results

For each pipeline stage, the build results are archived for a configurable retention period, which is currently set to one week. Once this period has passed and the build results have been deleted, they can be regenerated by restarting the pipeline.

The build results can be downloaded from the Gitlab web front end where they are called job artifacts (see figure 10.3).

The CI/CD pipelines can also be used to generate FPGA bitstreams without having to set up a build environment.

10.4 Settings

You can define individual settings for the CI/CD section of each Git repository in the Gitlab web front end. The following settings should fit for most cases:

Use git clone to get the recent application code, otherwise the pipelines might fail during git fetch:

Settings > CI/CD > General pipelines > Git strategy for pipelines: git clone

Increase the timeout to allow FPGA build to finish in any case:

Settings > CI/CD > General pipelines > Timeout: 6h

11 Programming and hardware configuration

11.1 Programming the gateware

11.1.1 Using a JTAG programmer

Before being able to access the FPGA you need to program the JTAG switch on the AFC board using a script from the Cryring_BPM_Gateware Git repository. Open the Vivado Hardware Manager software:

Tools > Run Tcl Script: src/scripts/program_scansta_jtag_switch.tcl

You should now see an xc7a200t_0 device. Right-click on it and choose Program Device.

Choose the correct bitstream (.bit file) and press OK. The programming takes about one minute.

11.1.2 Using a JTAG Switch Module

If there is a JTAG Switch Module (JSM) in the MicroTCA crate, the bitstream can also be programmed remotely via a so-called Xilinx Virtual Cable:

download svf_to_nsvf-linux-x86.bin from MCH GUI > JSM
download afc_scansta.sfv from https://github.com/lnls-dig/fpga-programming
convert afc_scansta.sfv to afc_scansta.nsfv using the command ./svf_to_nsvf-linux-x86.bin afc_scansta.sfv
upload afc_scansta.nsfv in MCH GUI > JSM to the port of the JTAG switch to which the AFC board you want to program is connected
open Vivado Hardware Manager
Open Target > New Target > Next > Local Server > Add Xilinx Virtual Cable (XVC)
Hostname: sdmch<xxx>.acc.gsi.de
Port: find the correct port number in NAT-MCH GUI > JSM
Finish
Open target
you should see the FPGA now in Vivado Hardware Manager and can program it

The first four steps are persistent and only have to be executed initially.

11.1.3 Storing a bitstream persistently in the SPI Flash

There is a 256 Mbit (32 MiB) SPI Flash memory on the AFCv3.1 board for persistent bitstream storage.

File format conversion

First you have to convert the bitstream (.bit) file to a .mcs file. There is a script in the Cryring_BPM_Gateware Git repository for this purpose:

src/scripts/convert_bit_to_mcs.sh <path to .bit file>

The .mcs file will be generated in the same folder as the .bit file.

Programming

Program the JTAG switch on the AFC board as described in chapter 11.1.1. You should now see an xc7a200t_0 device.
Right-click on it, choose Add Configuration Memory Device and select mt25ql256-spi-x1_x2_x4.

You should now see a mt25ql256-spi-x1_x2_x4 device.

Right-click on it and choose Program Configuration Memory Device.
Choose the .mcs file you created before and press OK. Programming is very slow and can take up to half an hour.

11.2 Configuration of the MCH

11.2.1 Via the MCH’s web interface

Base configuration

MCH global parameter > SSH access: enabled
This will trigger SSH key generation, which takes a few minutes to complete.

PCIe parameter > Upstream slot power up delay: 5 sec
Delay before the CPU unit powers up at startup. To make sure that the bitstreams are loaded from Flash memory into the AFC boards’ FPGAs before the CPU unit boots, you might have to increase this value.

PCIe parameter > PCIe hot plug delay for AMCs: 0 sec
Delay before the AFC boards power up at startup.

Switch PCIe x80

Set the CPU-Unit as upstream AMC source in ’Virtual Switch 0’:

PCIe Virtual Switches > Upstream AMC: AMC1/4..7
(for CPU unit in AMC slot 1)

Make sure you enable PCIe downstream ’4..7’ for the AMC slots which contain your AFC boards.

11.2.2 Via USB

The most convenient way of configuring the MCH is via its web interface. If you have accidentally disabled the web server, set an invalid IP or DHCP configuration or reset the MCH settings to default, you can access the MCH via a USB connection to the micro USB port on the left side of the front panel.

On a Linux PC, connect a micro USB cable and check via dmesg that a LUFA USB-RS232 Adapter has been detected. The device will be accessible at /dev/ttyACM<some number>; use e.g. PuTTY to connect to this serial port with a speed of 19200 baud.

Now typing mch will output information about the MCH. Typing ? will display a list of available commands. Most of the settings of the web interface are also available on the command line interface. You can for example set the IP address or a DHCP name to be able to connect to the web interface.

11.3 Enabling and disabling network boot on the CPU unit

Shortly after powering the CPU unit, press F2 to enter the BIOS.

In the Main tab, go to Boot Features and select the following (using F6 for enabling and F5 for disabling):

PXE BOOT: <Enabled>
Front ETH0: <Enabled>

In the Boot tab, go to Legacy > Boot Type Order. There should be an Others entry, which has to be moved to the top of the list using F6.

Save the settings by pressing F4.

The CPU unit should boot from network after the next reboot.

11.4 Programming the MMC firmware

To program the MMC firmware into the LPC microcontroller, you need to install the proprietary NXP software LPCxpresso.

11.4.1 Installation of LPCxpresso on Linux

Download LPCxpresso from the NXP website [11]; registration is required for the download. Follow the instructions in INSTALL.txt. After the installation, open the IDE via <installation directory>/lpcxpresso and register the installation via Help > Activate > Create serial number and register (Free Edition). Create a serial number in the dialog, copy it into the form on the website and afterwards paste the activation key you received from the website into Help > Activate > Activate (Free Edition).

11.4.2 Programming

Disconnect the AFC board completely. The power for programming the microcontroller will come from the LPC-Link programmer. Connect and power the LPC-Link programmer via USB and connect the customized cable to the CPU-JTAG connector on the AFC board. Connect the plug so that the flat cable is pointing in the direction of the FMC connector.

Program the device via:

lpcxpresso/bin/dfu-util -d 0x0471:0xdf55 -c 0 -t 2048 -R -D lpcxpresso/bin/LPCXpressoWIN.enc

sudo lpcxpresso/bin/crt_emu_cm3_nxp -pLPC1768 -g -wire=winusb -load-base=0 -flash-load-exec=<path to firmware binary>

You can find the openMMC firmware binary under firmware/openMMC-full-afc-bpm-v1.4.0.bin in the Cryring_BPM_Gateware Git repository.

11.4.3 Differences in the MMC firmwares of Creotech and LNLS

LNLS’s OpenMMC firmware routes a 100 MHz clock to the PCIe reference clock input, whereas Creotech’s MMC firmware routes a 125 MHz clock to this pin. The frequency of sys_clk is 125 MHz for both.

The OpenMMC firmware should forward the signal of the reset button on the AFC front panel to the FPGA pin AG26 after some seconds. However, no reaction can be observed on this pin. It is unclear if or how Creotech’s MMC firmware handles reset button actions, since there is no observable reaction.

The current gateware is functional with the OpenMMC firmware. For running it together with Creotech’s MMC firmware, the PCIe reference clock frequency in the IP pcie_dma_ip has to be changed to 125 MHz.

12 Hardware properties

12.1 LEDs on the AFC and FMC front panels

12.1.1 LEDs driven by the FPGA gateware

There are three tricolor LEDs connected to FPGA pins:

L3 in the center of the AFC front panel:

Currently displays the PCIe reference clock divided by in white.

LD1 (v1.0) or STATUS (v1.2 and v2.3) on the right of the two FMC boards’ front panels:

Currently display the ADC clock frequencies divided by in green if the PLLs indicate a lock, otherwise red.

Each tricolor LED consists of three independent LEDs (red, green and blue).

12.1.2 LEDs driven by the MMC

In Service (L1), green
Alarm (L2), red
Hot Swap (HS), blue

Lighting patterns of the Hot Swap LED

Insertion of an AFC board:

event	Hot Swap Handle	Hot Swap LED
AMC inserted into chassis with handle open	Open	On
AMC handle closed	Closed	Blinks
Activation granted and AMC powers up	Closed	Off

Source: [12]

Removal of an AFC board:

event	Hot Swap Handle	Hot Swap LED
AMC handle pulled open	Open	Blinks
Deactivation granted and AMC powers down (AMC can now be removed)	Open	On

Source: [12]

12.2 MCH PCIe status LEDs

The lighting patterns of the PCIe status LEDs on the MCH show the link status and the link speed of the PCIe connections:

LED state	meaning
off	no PCIe link
1 blink/sec	2.5 GBaud
2 blinks/sec	5 GBaud
on	8 GBaud

Source: [13]

12.3 Differences between hardware versions

12.3.1 Differences between AFC version 2 and AFC version 3.1

Both boards carry 2 GiBytes of DDR3-SDRAM, divided into four modules of 512 MiBytes each. The SDRAM model can be determined via the FBGA code printed on the modules using the Micron part decoder webpage [14].

AFC version 2

FBGA code: D9PBC, translates to Micron MT41J512M8RA-125:D
operates at 1.5 V

AFC version 3.1

FBGA code: D9QBV, translates to Micron MT41K512M8RH-125 IT:E
compatible with the older MT41J family, operates at 1.5 V or 1.35 V

Differences between FMC ADC 250 M 16B 4CH versions

The LD1 (v1.0) or STATUS (v1.2 and v2.3) LED on the right of the FMC board front panel is connected differently between v1.0 and (v1.2 and v2.3). When using the location constraints for v1.2 and v2.3 together with a v1.0 board, the LED lights as follows:

red requested: LED stays off
green requested: LED lights green
blue requested: LED lights red

Also, the MMCX input TRIG seems to be connected differently on v1.0. Feeding HF-Pulses into v1.0 boards does not work with the current bitstream.

12.4 Maximum achievable data rate to and from SDRAM

The gross data rate of the SDRAM interface is 800 MT/s at 32 bits per transfer, resulting in a theoretical gross data rate of 3.2 GBytes/s. The maximum achievable data rate is lower because of concurrent read and write accesses and SDRAM refresh cycles.

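Storing the samples of all eight ADCs in parallel at a sampling rate of 125 MHz results in a write data rate of:

$$8 \cdot 16\,\text{bit} \cdot 125\,\text{MHz} = 16\,\text{Gbit/s} = 2\,\text{GBytes/s}$$

The SDRAM capacity of 2 GiBytes ($2^{31}$ bytes) would therefore be sufficient to store the stream data of all eight ADCs for $2^{31}\,\text{bytes} / (2 \cdot 10^9\,\text{bytes/s}) \approx 1.07$ seconds.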
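The figures in this section can be reproduced with a short calculation. It assumes 16-bit samples, matching the 16-bit FMC ADC boards; the constant names are illustrative.

```python
ADC_COUNT = 8
BITS_PER_SAMPLE = 16            # assumption: 16-bit samples
SAMPLE_RATE_HZ = 125e6
SDRAM_BYTES = 2 * 2**30         # 2 GiBytes
SDRAM_TRANSFERS_PER_S = 800e6   # 800 MT/s
SDRAM_BITS_PER_TRANSFER = 32

gross_rate = SDRAM_TRANSFERS_PER_S * SDRAM_BITS_PER_TRANSFER / 8  # bytes/s
write_rate = ADC_COUNT * BITS_PER_SAMPLE * SAMPLE_RATE_HZ / 8      # bytes/s
storage_time = SDRAM_BYTES / write_rate                           # seconds

print(f"gross SDRAM data rate: {gross_rate / 1e9:.1f} GBytes/s")  # 3.2
print(f"ADC write data rate:   {write_rate / 1e9:.1f} GBytes/s")  # 2.0
print(f"storage time:          {storage_time:.2f} s")             # 1.07
```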

12.5 Analog characteristics

12.5.1 ADC input filter

The FMC ADC boards were originally designed for very high input frequencies and are equipped with input filters that have a pronounced high pass characteristic. The ADC input filter circuitry differs between the board versions.

Figure 12.1: Schematics of the original ADC input filter of versions 1.0 and 1.2. Image taken from [15]

Figure 12.2: Schematics of the original ADC input filter of version 2.3. Image taken from [16]

The part labeled TR1(B) BD0205F5050A00 is a balun with an operating range of 70 - 1000 MHz [17]. Lower frequencies are severely attenuated.

To be able to use the FMC ADC boards in the Cryring BPM system, the baluns have to be replaced by more suitable components.

Two approaches have been implemented on versions 1.0 and 1.2 (probably by Piotr Miedzik):

each balun is replaced by two wires
each balun is replaced by two capacitors of probably 100 nF (hint in an old email)

Figure 12.3: v1.2 ADC input filter: balun TR1B replaced by two capacitors

Figure 12.4: v1.0 ADC input filter: balun TR1B replaced by two wires

Figure 12.5: Original v2.3 ADC input filter

The heatspreader under the bottom of the FMC ADC board has to be unscrewed to access the baluns.

There is a significant difference in the ADC input filter circuitry between versions 1.0 and 1.2 and version 2.3. In version 2.3 the transmission line transformers L11 {A, B, C, D} have been removed and the RC filter has been modified.

Figure 12.6 shows the magnitude frequency responses of the original v2.3 input filter and of the two modifications of the v1.0 and v1.2 input filters. The diagram data was created by using a sine signal from a signal generator with an amplitude of 2 and by measuring the maximum amplitude swing of the raw ADC data.

Figure 12.6: Magnitude frequency responses of different ADC input filters

12.6 List of AFC v3.1 boards

AFC serial number	FPGA serial number	MMC firmware	bitstream in Flash memory	FMC	location
111154	0x004ACC24235885C	openMMC-full-afc-bpm-v1.4.0	Cryring BPM	2 x FMC ADC 250M 16b 4ch v1.0 & v1.2	SB2 4.111a
191087	0x004D5C242358854	openMMC-full-afc-bpm-v1.4.0	LNLS RT DAQ	2 x FMC ADC 100M 14b 4ch v5	ask Tobias Hoffmann
240030	0x078D5C24235885C	openMMC-full-afc-bpm-v1.4.0	Cryring BPM	2 x FMC ADC 250M 16b 4ch v2.3	ask René Geißler
260046	?	openMMC-full-afc-bpm-v1.4.0	LNLS RT DAQ	none, power supply damaged	ask Tobias Hoffmann
261056	0x068D5C24235885C	openMMC-full-afc-bpm-v1.4.0	LNLS RT DAQ	2 x FMC ADC 100M 14b 4ch v5	PowerBridge Computer
290148	0x058D5C24235885C	openMMC-full-afc-bpm-v1.4.0	LNLS RT DAQ	2 x FMC ADC 100M 14b 4ch v5	ask Harald Bräuning

The FPGA serial number is not printed anywhere on the board; it can only be read by the gateware from the DNA_PORT primitive. In this gateware the FPGA serial number is read out and stored in a register (see chapter 6.2.2).

13 Test coverage

13.1 BPM algorithm

13.1.1 Simulation

This test simulates the VHDL code of the gateware and is automatically run by the CI/CD pipelines of Gitlab.

All ADC data inputs are driven by the same repeated pattern of positive and negative values, but with different amplitudes.

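For identical signal shapes on both inputs of a BPM with the amplitudes $A$ and $B$, the two input signals can be expressed as $U_A(t) = A \cdot s(t)$ and $U_B(t) = B \cdot s(t)$ with a common waveform $s(t)$, and equation 4.1 simplifies to:

$$x = \frac{A - B}{A + B}$$

Equation 13.1: BPM result for equally shaped input signals

so that the expected BPM result, which the status registers hold scaled by 2^15 = 32768, can be calculated as:

BPM	ADCs	relative amplitudes A, B	expected BPM result	simulated BPM result
0	0, 1	1, 1/2	10922.67	10922
1	2, 3	1/2, 1	-10922.67	-10923
2	4, 5	1, 1	0	0
3	6, 7	1, 1/8	25486.22	25487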

The simulation results are consistent with the expectations, considering small numerical deviations accumulated over the numerous calculation steps, which might affect the least significant bits.
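The expected values can be recomputed in Python, assuming that equation 13.1 reduces to (A - B)/(A + B) for equally shaped inputs and that the status registers hold this ratio scaled by 2^15. The same function also gives the expected values of the function generator measurement in chapter 13.1.2 and of the reliability test in chapter 13.2.

```python
SCALE = 2**15  # register value corresponding to a BPM result of 1.0


def expected_bpm_result(a: float, b: float) -> float:
    """Expected BPM result for equally shaped signals with amplitudes a and b."""
    return (a - b) / (a + b)


# Relative amplitudes of the ADC pairs in the simulation test
simulation_inputs = {0: (1, 1/2), 1: (1/2, 1), 2: (1, 1), 3: (1, 1/8)}
for bpm, (a, b) in simulation_inputs.items():
    print(f"BPM {bpm}: {SCALE * expected_bpm_result(a, b):.2f}")

# Function generator measurement: second amplitude scaled in steps of 1/8
for k in range(1, 9):
    print(f"{k}/8: {expected_bpm_result(1, k / 8):.4f}")
```

The first loop prints 10922.67, -10922.67, 0.00 and 25486.22 for BPMs 0 to 3.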

13.1.2 Using a function generator as data source

Digital gain setting

The following measurement was made using a function generator which was configured to output two phase aligned sines with an amplitude of which were connected to the two inputs of a BPM.

The linear regression length and the averaging length were both set to 1024.

Before starting the measurement, the digital gain of one of the two ADC inputs was corrected so that the BPM averaging result equalled 0.

After that, the digital gain correction of the other ADC input was used to set different amplitudes in order to avoid possible nonlinearities of the function generator gain.

The results were read from the FPGA Observer GUI, which displays the BPM results divided by 2^15 = 32768.

relative amplitude	expected BPM result	measured BPM result
1/8	0.7778 (7/9)	0.778
2/8	0.6000 (3/5)	0.600
3/8	0.4545 (5/11)	0.454
4/8	0.3333 (1/3)	0.333
5/8	0.2308 (3/13)	0.230
6/8	0.1429 (1/7)	0.142
7/8	0.0667 (1/15)	0.066
8/8	0	0.000

The measurement results are consistent with the expectations, considering noise and small numerical deviations accumulated over the numerous calculation steps, which might affect the least significant bits.

13.2 Reliability tests

A test of the BPM scope and the BPM averaging scope was run overnight. The two inputs of a BPM were fed by two function generator outputs with the following settings:

sine signal
frequency: 1 MHz
output to ADC0: 2.001
output to ADC1: 0.667

According to equation 13.1, the expected BPM result for the chosen input amplitudes is 0.5.

The test was run for 12 hours, during which the scopes were read continuously and histograms of the occurring results were created.

Figure 13.1: BPM result histogram

The gateware was configured as follows:

ADC1 gain correction: 0x8398, resulting in a BPM averaging result of 0.5
linear regression length: 1024
averaging length: 1024
number of samples per capture: 1024

Figure 13.2: BPM averaging result histogram

The resulting histograms do not show Gaussian distributions, but still appear reasonably narrow. The deviations from Gaussian distributions might have been caused by temperature drifts during the night affecting the signal amplitudes.

14 System limitations and peculiarities

14.1 CPU unit reboot

The FPGA bitstreams have to be loaded before the CPU unit boots, otherwise the PCIe driver will not detect the FPGAs. This can be ensured by setting suitable power-up delays in the MCH (see chapter 11.2.1).

Whenever a new bitstream is loaded into an FPGA, the CPU unit has to be rebooted, either via its Hot Swap Handle or remotely via the MCH (see chapter 9.2). A reboot of the operating system alone is not sufficient.

14.2 PLL unlock on FMC boards

With the current clock configuration and the current settings of the PLLs on the FMC boards, locking of the PLLs cannot be guaranteed. This might result in clock frequency differences between the processing clock on the AFC board and the ADC clocks on the FMC boards. Synchronization FIFOs ensure correct clock domain crossings, but a small fraction of ADC samples might have to be discarded or repeated once, depending on which frequency is higher.
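The number of affected samples can be estimated from the frequency difference. The sketch below assumes that the ADC clock and the processing clock both nominally run at the sample rate, so the FIFO is written once per ADC clock cycle and read once per processing clock cycle. The 250 MHz clock and the 10 ppm offset in the usage example are hypothetical values, not measurements of this system.

```python
# Sketch: how many samples a synchronization FIFO has to discard or repeat
# when the write (ADC) and read (processing) clocks differ slightly.
# ASSUMPTION: both clocks nominally run at the sample rate, i.e. one sample
# is written per ADC clock cycle and one is read per processing clock cycle.


def sample_slips(f_adc_hz, f_proc_hz, duration_s=1.0):
    """Return (number_of_slips, action) for the given time span."""
    diff = f_adc_hz - f_proc_hz
    if diff > 0:
        action = "discard"  # ADC side writes faster than the processing side reads
    elif diff < 0:
        action = "repeat"   # processing side reads faster than the ADC side writes
    else:
        action = "none"
    return abs(diff) * duration_s, action


if __name__ == "__main__":
    # Hypothetical example: 250 MHz ADC clock, processing clock 10 ppm slower.
    f_adc = 250e6
    f_proc = f_adc * (1 - 10e-6)
    slips, action = sample_slips(f_adc, f_proc)
    per_capture = slips * 1024 / f_adc  # slips within one 1024-sample capture
    print(f"{slips:.0f} samples/s to {action}, {per_capture:.4f} per capture")
```

Under these assumed numbers, about 2500 samples per second would be affected, i.e. roughly one slip per hundred 1024-sample captures.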

This always affects all the samples of an FMC board in parallel, so no differences between the two input data streams of a single BPM occur and no measurable effect on the BPM results is expected. Nevertheless, the cause of this behaviour is still to be investigated.
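Why a parallel slip is harmless can be illustrated numerically. For illustration only, this sketch assumes that the BPM result is the least-squares slope of the difference signal (A − B) over the sum signal (A + B); the gateware's actual algorithm is described in chapter 2, and all function names here are hypothetical. A dropped sample is modelled by removing one list element.

```python
# Sketch: a sample slip that hits both inputs of a BPM in parallel does not
# change the result, while a slip in only one input would.
# ASSUMPTION (illustration only): the BPM result is the least-squares slope
# of the difference signal (A - B) over the sum signal (A + B).
import math


def regression_slope(delta, sigma):
    """Least-squares slope through the origin of delta versus sigma."""
    num = sum(d * s for d, s in zip(delta, sigma))
    den = sum(s * s for s in sigma)
    return num / den


def bpm_result(a, b):
    """Hypothetical BPM result computed from the two input streams."""
    delta = [x - y for x, y in zip(a, b)]
    sigma = [x + y for x, y in zip(a, b)]
    return regression_slope(delta, sigma)


def drop_sample(stream, index):
    """Model a synchronization FIFO discarding the sample at `index`."""
    return stream[:index] + stream[index + 1:]


# Two phase-aligned sines with a 3:1 amplitude ratio (expected result 0.5),
# 64 samples per period; 1025 samples so that 1024 remain after one slip.
N = 1025
a = [3.0 * math.sin(2 * math.pi * k / 64) for k in range(N)]
b = [1.0 * math.sin(2 * math.pi * k / 64) for k in range(N)]

# Slip in both streams at the same sample, as on a shared FMC board.
same = bpm_result(drop_sample(a, 128), drop_sample(b, 128))
# Slip in only one stream: A is shifted by one sample relative to B.
one = bpm_result(drop_sample(a, 128), b[:-1])

print(f"both streams slip: {same:.6f}, one stream slips: {one:.6f}")
```

With these numbers the first result stays at 0.5 (up to floating-point rounding), whereas the one-sided slip shifts the result to about 0.5008, because after the slip the two inputs are no longer phase-aligned.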

15 References

[1] A. Reiter, R. Singh: Comparison of beam position calculation methods for application in digital acquisition systems. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, February 2018, https://git.gsi.de/BEA_HDL/datasheets/-/blob/master/Paper_BPM_Algorithm.pdf

[2] P. Miedzik, H. Bräuning, T. Hoffmann, A. Reiter, R. Singh: A MicroTCA based beam position monitoring system at CRYRING@ESR. 16th Int. Conf. on Accelerator and Large Experimental Control Systems, Barcelona, Spain, October 2017, https://git.gsi.de/BEA_HDL/datasheets/-/blob/master/Paper_BPM_architecture.pdf

[3] A. Reiter, W. Kaufmann, R. Singh, P. Miedzik, T. Hoffmann, H. Bräuning: The CRYRING BPM cookbook, March 2015, https://git.gsi.de/BEA_HDL/datasheets/-/blob/master/Cryring_BPM_System_Overview.pdf

[4] AMC FMC Carrier (AFC) Git repository, Open Hardware Repository, https://ohwr.org/project/afc/wikis/home

[5] Silicon Labs: Timing part decoder web page, https://www.silabs.com/timing/lookup-customize

[6] Silicon Labs: Si571 datasheet, https://www.silabs.com/documents/public/data-sheets/si570.pdf

[7] Analog Devices: AD9510 datasheet, https://www.analog.com/media/en/technical-documentation/data-sheets/AD9510.pdf

[8] Renesas: ISLA216P datasheet, https://www.renesas.com/us/en/www/doc/datasheet/isla216p.pdf

[9] Xilinx: Artix 7 datasheet, https://www.xilinx.com/support/documentation/data_sheets/ds181_Artix_7_Data_Sheet.pdf

[10] AFC v3.1 schematics, https://git.gsi.de/BEA_HDL/datasheets/-/blob/master/Schematics_AFC_v3.1.pdf

[11] NXP: LPCxpresso download web page, https://www.nxp.com/design/microcontrollers-developer-resources/lpc-microcontroller-utilities/lpcxpresso-ide-v8-2-2:LPCXPRESSO

[12] NXP: AMC documentation, https://www.nxp.com/docs/en/reference-manual/MSC8156AMCUM.pdf

[13] NAT GmbH: MCH technical reference manual, https://www.nateurope.com/manuals/nat_mch_pciex48_v2x_man_hw.pdf

[14] Micron: FBGA and Component Marking Decoder, https://www.micron.com/support/tools-and-utilities/fbga

[15] FMC ADC 250M 16B 4ch v1.0 and v1.2 schematics, https://git.gsi.de/BEA_HDL/datasheets/-/blob/master/Schematics_FMC_ADC_250M_16B_4ch.pdf

[16] FMC ADC 250M 16B 4ch v2.3 schematics, https://github.com/lnls-dig/fmc250-hw/blob/master/circuit_board/ADC.SchDoc

[17] Anaren: Balun BD0205F5050A00 datasheet, https://git.gsi.de/BEA_HDL/datasheets/-/blob/master/Balun_BD0205F5050A00_datasheet.pdf
