This shows you the differences between two versions of the page.
playground:playground [2020/08/14 20:29] rgeissler |
playground:playground [2020/12/03 09:39] rhaseitl |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== | + | ====== |
- | + | ||
- | ===== HDL code for Xilinx Artix 7 FPGA on AMC board used in Cryring BPM ===== | + | |
- | + | ||
- | This documentation is automatically generated from the LaTeX sources by the script '' | + | |
- | + | ||
- | A PDF of this documentation can be generated in the CI/CD section, see [[# | + | |
- | + | ||
- | + | ||
- | ====== Resources ====== | + | |
- | + | ||
- | All of the code of this project, the helper scripts and also the source of this documentation are under version control in a Git repository whose upstream is: https:// | + | |
- | + | ||
- | Additional datasheets and papers are included as a Git submodule of the main Git repository. The upstream of the submodule is: https:// | + | |
- | + | ||
- | Installation scripts to set up a Gitlab runner for continuous integration including all necessary software to build the gateware can be found in a Git repository whose upstream is: https:// | + | |
- | + | ||
- | ====== 1 Introduction ====== | + | |
- | + | ||
- | This document describes the gateware ( = FPGA firmware) implementation of the Beam Position Monitor (BPM) for the Cryring accelerator at GSI. The term Trajectory Measurement System (TMS) is also common for this system and is used as a synonym for BPM.\\ | + | |
- | There had been a previous implementation by Piotr Miedzik, but since no documentation could be found besides a conference paper [[# | + | |
- | The BPM measures the horizontal and vertical beam positions at nine places of the accelerator ring, resulting in 18 location results. | + | |
- | + | ||
- | ===== 1.1 Measurement principle ===== | + | |
- | + | ||
- | At each of the 18 measurement spots two capacitor plates are used to detect the electrostatic induction of the passing charged particle bunches. | + |
- | + | ||
- | {{: | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | The 36 voltages of the capacitor plates are amplified and routed via coaxial cables to a single evaluation point where the analog-to-digital conversion and the digital processing take place. | + |
- | + | ||
- | {{: | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | The positions of the particle beam are calculated respectively from the voltage difference of two related capacitor plates using the algorithm described in [[# | + | |
- | + | ||
- | ===== 1.2 Processing hardware ===== | + | |
- | + | ||
- | Each of the 36 voltages coming from the amplifiers at the capacitor plates is sampled by a Renesas ISLA216P ADC at a sampling rate of 125 MHz with a resolution of 16 bits. Four ADCs are placed on each FMC board. Two FMC boards (or only one on the last carrier) are mounted on each AFC carrier board, which is equipped with a Xilinx Artix-7 XC7A200T FPGA for data processing. | + |
- | + | ||
- | {{: | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | The whole system uses five AFC carrier boards which are mounted in a MicroTCA crate together with a timing receiver and a CPU unit for post processing. Each of the five FPGAs is responsible for the processing of up to eight ADC data streams. The communication between the CPU unit and the FPGAs takes place via PCI Express over the so called backplane of the MicroTCA crate. | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | This document describes the gatewares of the FPGAs on the five AFC carrier boards. The gatewares are identical independent of the number of mounted FMC boards. | + | |
- | + | ||
- | ====== 2 BPM algorithm ====== | + | |
- | + | ||
- | The beam position is calculated from the measurement of the voltages of two corresponding plates: | + | |
- | + | ||
- | < | + | |
- | \delta = a + \frac{x}{\kappa}\sigma | + | |
- | </ | + | |
- | with | + | |
- | + | ||
- | < | + | |
- | \delta=U_R-U_L | + | |
- | </ | + | |
- | and | + | |
- | + | ||
- | < | + | |
- | \sigma=U_R+U_L | + | |
- | </ | + | |
- | where $`\kappa`$ is a proportionality factor influenced by the dimension of the measurement system, $`a`$ some possible voltage offset and $`x`$ the beam position. | + | |
- | + | ||
- | ===== 2.1 Capacitance correction ===== | + | |
- | + | ||
- | The capacitance of the two corresponding capacitor plates can differ from their nominal value, so one of the voltages has to be corrected by multiplying it by a correction factor: | + |
- | + | ||
- | < | + | |
- | U_R=U_{R, | + | |
- | </ | + | |
- | < | + | |
- | U_L=c_L \cdot U_{L, | + | |
- | </ | + | |
- | The default value of $`c_L`$ in the gateware is 1. It is configurable by the software via register accesses. | + | |
- | + | ||
- | ===== 2.2 Least squares algorithm ===== | + | |
- | + | ||
- | A linear least squares approach is used to reduce measurement errors. The choice of the algorithm is described in [[# | + | |
- | + | ||
- | < | + | |
- | E(x,a) = \sum_i(a + {\frac{x}{\kappa}\sigma_i - \delta_i})^2 | + | |
- | </ | + | |
- | Minimizing | + | |
- | + | ||
- | < | + | |
- | E(x,a) | + | |
- | </ | + | |
- | via partial differentiation | + | |
- | + | ||
- | < | + | |
- | \frac{\partial E}{\partial x} = 0 | + | |
- | </ | + | |
- | and | + | |
- | + | ||
- | < | + | |
- | \frac{\partial E}{\partial a} = 0 | + | |
- | </ | + | |
- | leads to | + | |
- | + | ||
- | < | + | |
- | \frac{x}{\kappa} = \frac{N\sum_i\sigma_i\delta_i-(\sum_i\sigma_i)(\sum_i\delta_i)}{N\sum_i\sigma_i^2-(\sum_i\sigma_i)^2} | + | |
- | + | ||
- | </ | + | |
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
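The closed-form solution above can be illustrated with a short Python sketch. It works in floating point and is not the gateware implementation; the function name `relative_position` and the argument names are chosen here for illustration only.

```python
def relative_position(u_right, u_left):
    """Least-squares estimate of x/kappa from paired plate voltages.

    u_right, u_left: equally long sequences of (corrected) voltages U_R, U_L.
    """
    if len(u_right) != len(u_left):
        raise ValueError("both plates need the same number of samples")
    n = len(u_right)
    sigma = [r + l for r, l in zip(u_right, u_left)]  # sigma_i = U_R + U_L
    delta = [r - l for r, l in zip(u_right, u_left)]  # delta_i = U_R - U_L

    sum_sd = sum(s * d for s, d in zip(sigma, delta))
    sum_ss = sum(s * s for s in sigma)
    sum_s = sum(sigma)
    sum_d = sum(delta)

    denominator = n * sum_ss - sum_s * sum_s
    if denominator == 0:
        raise ZeroDivisionError("all sigma samples are identical")
    return (n * sum_sd - sum_s * sum_d) / denominator
```

For samples that follow delta = 3 + 0.25 * sigma exactly, the function returns 0.25: the offset a drops out and only the slope x/kappa remains.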
- | ===== 2.3 Averaging ===== | + | |
- | + | ||
- | To further reduce the data rate and the measurement noise, the result of the least squares algorithm is averaged over an adjustable number of samples $`N`$. This is implemented as a simple block average: | + |
- | + | ||
- | < | + | |
- | x_{avg} = \frac{1}{N}\sum_{t=0}^{N-1}x(t) | + | |
- | </ | + | |
- | ====== 3 Peripheral devices ====== | + | |
- | + | ||
- | There are three different peripheral devices on each of the FMC ADC boards that have to be configured by the gateware. Since they have no persistent storage, they have to be configured after every power cycle: | + |
- | + | ||
- | * Si571 programmable VCXO | + | |
- | * AD9510 PLL and clock distribution | + | |
- | * ISLA216P ADC | + | |
- | + | ||
- | ===== 3.1 Si571 programmable VCXO ===== | + | |
- | + | ||
- | The Si571 programmable VCXO is connected via I2C using 0x49 as device address. Additionally, | + | |
- | + | ||
- | The startup frequency before configuration via I2C is 155.52 MHz. The Si571 is located below the heat spreader of the FMC board, which has to be unscrewed to read the labeling: | + | |
- | + | ||
- | //SiLabs 571\\ | + | |
- | AJC000337 G\\ | + | |
- | D09JW702+// | + | |
- | + | ||
- | The part properties can be decoded by providing the part number 571AJC000337 on a SiLabs web page [[# | + | |
- | + | ||
- | Product: Si571\\ | + | |
- | Description: | + | |
- | Frequency A: 155.52 MHz\\ | + | |
- | I2C Address (Hex Format): 49\\ | + | |
- | Format: LVPECL\\ | + | |
- | Supply Voltage: 3.3 V\\ | + | |
- | OE Polarity: OE active high\\ | + | |
- | Temperature Stability: 20 ppm\\ | + | |
- | Tuning Slope: 135 ppm/V\\ | + | |
- | Minimum APR: +/- 130 ppm\\ | + | |
- | Frequency Range: 10 - 280 MHz\\ | + |
- | Operating Temp Range (C): -40 to +85 | + | |
- | + | ||
- | A datasheet can be found on the SiLabs website [[# | + | |
- | + | ||
- | ==== 3.1.1 Programming the frequency ==== | + | |
- | + | ||
- | There are three adjustable parameters that define the output frequency: | + | |
- | + | ||
- | < | + | |
- | f_{out} = \frac{f_{XTAL} \cdot RFREQ}{HSDIV \cdot N1} | + | |
- | </ | + | |
- | where | + | |
- | + | ||
- | * $`f_{XTAL}`$ is the fixed internal quartz frequency of 114.285 MHz +/- 2000 ppm. | + | |
- | * $`f_{XTAL} \cdot RFREQ`$ has to be in the range $`[4850 \mathrm{MHz}, | + | |
- | * allowed values for $`HSDIV`$ are 4, 5, 6, 7, 9, 11 | + | |
- | * allowed values for $`N1`$ are 1 and all even numbers in $`[2, | + | |
- | + | ||
- | The three parameters should be chosen in a way that $`RFREQ`$ is minimal to reduce power consumption. If there should still be multiple possibilities for the choice of $`HSDIV \cdot N1`$, one should choose $`HSDIV`$ as maximal. | + | |
- | + | ||
- | For a desired output frequency of 125 MHz the optimum values are: | + | |
- | + | ||
- | * $`HSDIV`$ = 5 | + | |
- | * $`N1`$ = 8 | + | |
- | * $`RFREQ`$ = 43.750273439 | + | |
- | + | ||
- | Since the uncorrected $`f_{XTAL}`$ frequency has an inaccuracy of 2000 ppm, one should read the initial $`RFREQ`$ value first and calculate | + | |
- | + | ||
- | < | + | |
- | RFREQ = RFREQ_{init} \cdot \frac{f_{out} \cdot HSDIV \cdot N1}{f_{out, | + | |
- | </ | + | |
- | in order to get a more accurate result. $`RFREQ_{init}`$ is factory calibrated to compensate the actual frequency offset of $`f_{XTAL}`$. | + | |
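The following Python sketch illustrates the frequency calculation. It is an illustration under assumptions, not the gateware code: the function names are made up here, and because the correction formula is truncated in this page, `corrected_rfreq` assumes that the factory configuration is described by its own output frequency and divider values (`f_out_init_mhz`, `hsdiv_init`, `n1_init`).

```python
F_XTAL_NOMINAL_MHZ = 114.285  # nominal internal quartz frequency


def nominal_rfreq(f_out_mhz, hsdiv, n1, f_xtal_mhz=F_XTAL_NOMINAL_MHZ):
    """RFREQ for a desired output frequency, using the nominal f_XTAL."""
    return f_out_mhz * hsdiv * n1 / f_xtal_mhz


def corrected_rfreq(rfreq_init, f_out_init_mhz, hsdiv_init, n1_init,
                    f_out_mhz, hsdiv, n1):
    """RFREQ derived from the factory-calibrated start-up configuration.

    Assumption: the start-up values describe the same crystal, so the
    actual crystal frequency can be recovered from them first.
    """
    f_xtal_actual = f_out_init_mhz * hsdiv_init * n1_init / rfreq_init
    return f_out_mhz * hsdiv * n1 / f_xtal_actual


# Nominal value for 125 MHz with HSDIV = 5 and N1 = 8
print(nominal_rfreq(125.0, 5, 8))
```

With the nominal crystal frequency this reproduces the value of about 43.750273 given above. On real hardware the initial register contents would replace the nominal assumption.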
- | + | ||
- | ==== 3.1.2 Configuration ==== | + | |
- | + | ||
- | The following registers are read by the gateware for calculating the frequency correction: | + | |
- | + | ||
- | ^ | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | The value of register 0x07 is only used to determine if the frequency has been programmed before, e.g. after a reloading of the bitstream of the FPGA without a power cycle of the FMC ADC board. Applying the frequency correction again would lead to a wrong result, since the RFREQ registers do not contain the factory defaults any more. | + | |
- | + | ||
- | The following registers are programmed by the gateware after the calculation of the frequency correction: | + | |
- | + | ||
- | ^ | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | ===== 3.2 AD9510 PLL and clock distribution ===== | + | |
- | + | ||
- | The Analog Devices AD9510 is connected via SPI. Writing to registers must be completed with a write to the register address 0x5A with the LSBit set in the write value (e.g. 0x01) to take effect. Multiple writes can precede the writing of register 0x5A, so that this needs to be done only once at the end of a write sequence. The maximum SPI clock frequency is 25 MHz.\\ | + | |
- | The phase frequency detector of the PLL, which compares the VCXO frequency to the reference frequency, has a maximum input frequency of 100 MHz. Higher frequencies have to be divided by the prescalers R (reference input) and N (VCXO input). A lock signal can be routed to a status pin that is connected to an FPGA GPIO. | + |
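The two rules above (a final write to register 0x5A, and the 100 MHz limit at the phase frequency detector) can be sketched in Python as follows. The helper names are hypothetical, the register addresses in the usage line are placeholders rather than the values written by the gateware, and the SPI transfer itself is left out.

```python
UPDATE_REGISTER = 0x5A  # a write with the LSBit set makes earlier writes take effect
UPDATE_VALUE = 0x01


def with_update(writes):
    """Return the (address, value) write sequence terminated by the update write."""
    sequence = list(writes)
    sequence.append((UPDATE_REGISTER, UPDATE_VALUE))
    return sequence


def pfd_frequency_ok(f_in_mhz, divider, limit_mhz=100.0):
    """Check that an input divided by R or N stays within the PFD input limit."""
    return f_in_mhz / divider <= limit_mhz


# Placeholder register writes, followed by a single update at the end
sequence = with_update([(0x04, 0x00), (0x0B, 0x01)])
```

A 125 MHz VCXO signal therefore needs a divider of at least 2 in front of the phase frequency detector.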
- | + | ||
- | A datasheet can be found on the Analog Devices website [[# | + | |
- | + | ||
- | ==== 3.2.1 Configuration ==== | + | |
- | + | ||
- | The gateware configures the AD9510 device to lock the VCXO frequency to a reference clock coming from the FPGA.\\ | + | |
- | The following registers are programmed by the gateware: | + | |
- | + | ||
- | ^ | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | ===== 3.3 ISLA216P ADC ===== | + | |
- | + | ||
- | The four ISLA216P ADCs are connected via SPI. Each chip is enabled via an individual chip select line. The MOSI, MISO and CLK lines are shared between the four chips. Parallel configuration by asserting all chip selects at the same time works for writing registers, but not for reading, since there would be multiple drivers on the MISO line. The maximum SPI clock frequency is given by the ADC sampling frequency divided by 16. At a sample frequency of 125 MHz this corresponds to an SPI clock frequency of 7.8125 MHz. | + |
- | + | ||
- | A datasheet can be found on the Renesas website [[# | + | |
- | + | ||
- | The ADCs provide a configurable gain correction of +/- 4.2% and a configurable offset correction of +/- 138 LSBs. Since the gain correction and the offset correction are implemented digitally in the gateware, most of the configuration registers can be left at their default values.\\ | + |
- | The SPI interface in the gateware is implemented as a four wire interface, whereas the default setting of the ISLA216P SPI interface is a three wire mode. For being able to configure the ISLA216Ps interactively, | + | |
- | + | ||
- | ==== 3.3.1 Configuration ==== | + | |
- | + | ||
- | The following registers are programmed by the gateware: | + | |
- | + | ||
- | ^ | + | |
- | | '' | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | ====== 4 Gateware implementation ====== | + | |
- | + | ||
- | ===== 4.1 Clocking ===== | + | |
- | + | ||
- | The gateware uses three primary clocks: | + | |
- | + | ||
- | * PCIe reference clock, 100 MHz | + | |
- | * FMC 0 ADC clock, 125 MHz | + | |
- | * FMC 1 ADC clock, 125 MHz | + | |
- | + | ||
- | ==== 4.1.1 PCIe reference clock ==== | + | |
- | + | ||
- | The PCIe reference clock comes from an output of an ADN4604 clock switch on the AFC board [[# | + |
- | + | ||
- | // | + | |
- | + | ||
- | * // | + | |
- | * // | + | |
- | * // | + | |
- | + | ||
- | The SDRAM interface IP core contains a MMCM which generates a 100 MHz clock named // | + | |
- | + | ||
- | ==== 4.1.2 FMC ADC clocks ==== | + | |
- | + | ||
- | On each of the two FMC boards there is a Si571 programmable VCXO (see [[# | + | |
- | + | ||
- | From each of the four ADCs of an FMC board an individual clock signal is routed to the FPGA, where it is used for the deserialization in the IDDR primitives. For further processing, only the clock signal from the first ADC of an FMC board is used, since the clock frequencies of the four ADCs are identical. | + |
- | + | ||
- | There are two clock domain crossing FIFOs in the gateware to synchronize the data from the ADCs to the main processing clock // | + | |
- | + | ||
- | ===== 4.2 Resets ===== | + | |
- | + | ||
- | The gateware uses two reset sources: | + | |
- | + | ||
- | * main PLL not in lock | + | |
- | * reset button on the AFC front panel | + | |
- | + | ||
- | ==== 4.2.1 PLL not in lock ==== | + | |
- | + | ||
- | As long as the PLL in the MMCM producing the main processing clock //clk_125// is not yet in lock, the design is held in reset. After this the lock should be stable until the next power cycle. | + | |
- | + | ||
- | ==== 4.2.2 Reset button ==== | + | |
- | + | ||
- | There is a push button labeled //RST// at the center of the AFC front panel which is connected to the microcontroller for the MMC firmware. The firmware should forward a button press to the FPGA pin AG26 as an active low signal to initiate a reset of the gateware.\\ | + | |
- | With the current OpenMMC firmware this forwarding does not work, so pressing the //RST// button has no effect. | + |
- | + | ||
- | ===== 4.3 BPM algorithm ===== | + | |
- | + | ||
- | The proportionality factor $`\kappa`$ in equation 2.1 is set implicitly to 1 so that the result has to be interpreted as a relative position in the range $`[-1, 1]`$: | + |
- | + | ||
- | < | + | |
- | x = \frac{N\sum_i\sigma_i\delta_i-(\sum_i\sigma_i)(\sum_i\delta_i)}{N\sum_i\sigma_i^2-(\sum_i\sigma_i)^2} | + | |
- | + | ||
- | </ | + | |
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | The capacitance correction (see [[# | + | |
- | + | ||
- | ==== 4.3.1 Pipeline steps performed every clock cycle ==== | + | |
- | + | ||
- | The gain correction, the differences and sums of the incoming ADC data pairs $`\sigma`$ and $`\delta`$ and the four different sums of equation 4.1 are calculated every clock cycle. | + | |
- | + | ||
- | === Step 0: Capacitance correction === | + | |
- | + | ||
- | * offset and gain corrected ADC 0 data sample is strobed unchanged. Input: 17 bits signed, output 17 bits signed | + | |
- | * offset and gain corrected ADC 1 data sample is multiplied with a correction factor coming from a configuration register. Input: 17 bits signed for data, 16 bits unsigned for correction factor, output 17 bits signed | + |
- | + | ||
- | === Step 1: Calculation of `\sigma` and `\delta` === | + | |
- | + | ||
- | * $`\sigma`$: sum of data 0 and data 1, inputs: 17 bits signed, output: 18 bits signed | + | |
- | * $`\delta`$: difference of data 0 and data 1, inputs: 17 bits signed, output: 18 bits signed | + | |
- | + | ||
- | === Step 2: Calculation of `\sigma\delta` and `\sigma^{2}`, | + | |
- | + | ||
- | The maximum word length of the adders in the DSP48 blocks of the FPGA is 48 bits. When using these adders, the word length of the longest term $`\sigma\delta`$ (36 bits) leaves 12 bits of headroom, which limits the summation length to $`2^{12} = 4096`$ samples. | + |
- | + | ||
- | * $`\sum_i\sigma_i\delta_i`$: | + | |
- | * $`\sum_i\sigma_i^2`$: | + | |
- | * $`\sum_i\sigma_i`$: | + | |
- | * $`\sum_i\delta_i`$: | + | |
- | * $`N`$: counter, output: 12 bits unsigned | + | |
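The bit-width argument behind the summation limit can be checked with a few lines of Python. This is only a worked calculation; the function names are illustrative and do not appear in the gateware.

```python
def product_bits(a_bits, b_bits):
    """Word length needed for the product of two signed operands."""
    return a_bits + b_bits


def max_summation_length(adder_bits, term_bits):
    """Number of worst-case terms that fit into the adder without overflow.

    Each doubling of the number of summands costs one additional bit.
    """
    return 1 << (adder_bits - term_bits)


sigma_delta_bits = product_bits(18, 18)             # sigma and delta are 18 bits each
limit = max_summation_length(48, sigma_delta_bits)  # 48 bit DSP48 adder
```

Here `sigma_delta_bits` is 36 and `limit` is 4096, which matches the upper limit of the linear regression length.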
- | + | ||
- | ==== 4.3.2 Pipeline steps performed with a reduced data rate ==== | + | |
- | + | ||
- | The following pipeline steps are only performed once for every linear regression period. The length of the linear regression period is defined by the BPM linear regression length register (see [[# | + | |
- | + | ||
- | If an RF signal is present, the length is additionally controlled by the spacing of the pulses of this signal. A new linear regression calculation is started with every rising edge of the RF signal, and at the same time the post processing steps for the previous period are started. | + |
- | + | ||
- | === Step 3: Conversion to floating point === | + | |
- | + | ||
- | The DSP48 blocks in the FPGA can only handle multiplications up to 18 bits times 25 bits. For this reason, a conversion to a floating point format is performed. | + | |
- | + | ||
- | The floating point format is: | + | |
- | + | ||
- | * $`mantissa`$: | + | |
- | * $`exponent`$: | + | |
- | + | ||
- | which decodes to: $`value = mantissa \cdot 2^{exponent}`$ | + | |
- | + | ||
- | The sums $`\sum_i\sigma_i\delta_i`$, | + | |
- | + | ||
- | A conversion is not necessary for $`N`$ since it is only 12 bits wide. | + | |
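A minimal Python sketch of such a conversion is shown below. It assumes an 18 bit signed mantissa (the width mentioned in step 7) and truncation when bits are shifted out; whether the gateware truncates or rounds is not stated here, and the function name `to_float` is illustrative.

```python
MANTISSA_BITS = 18  # assumption: signed mantissa width


def to_float(value, mantissa_bits=MANTISSA_BITS):
    """Return (mantissa, exponent) so that value is about mantissa * 2**exponent.

    The value is shifted right until it fits into the signed mantissa.
    Shifted-out bits are discarded (truncation), which is an assumption.
    """
    lowest = -(1 << (mantissa_bits - 1))
    highest = (1 << (mantissa_bits - 1)) - 1
    mantissa, exponent = value, 0
    while not lowest <= mantissa <= highest:
        mantissa >>= 1  # arithmetic shift keeps the sign
        exponent += 1
    return mantissa, exponent
```

For example, `to_float(1 << 20)` gives `(65536, 4)`, and the largest positive 48 bit value `(1 << 47) - 1` gives `(131071, 30)`, so an exponent of 6 bits is sufficient for a 48 bit sum.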
- | + | ||
- | === Step 4: Calculation of the products of sums === | + | |
- | + | ||
- | The products $`N\sum_i\sigma_i\delta_i`$, | + | |
- | + | ||
- | === Step 5: Shifting to align for subtraction and sign extensions === | + | |
- | + | ||
- | In general the results of step 4 will have different exponents, so that the mantissas have to be shifted to a common exponent before a subtraction can take place. | + | |
- | + | ||
- | The mantissa of the float number with the smaller exponent is shifted to the right by the difference of the exponents, and its exponent is set to the larger exponent. Sign extensions by 1 bit take place to prevent over- and underflows in the subtraction. | + |
- | + | ||
- | === Step 6: Calculation of the subtractions in the numerator and the denominator === | + | |
- | + | ||
- | Now that the operands have the same exponent, the subtractions can take place by subtracting the mantissas.\\ | + | |
- | The exponents of the results stay the same as that of the operands. | + | |
- | + | ||
- | The results are: $`N\sum_i\sigma_i\delta_i-(\sum_i\sigma_i)(\sum_i\delta_i)`$ and $`N\sum_i\sigma_i^2-(\sum_i\sigma_i)^2`$ | + | |
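Steps 5 and 6 can be sketched together in Python. The sketch represents each number as a `(mantissa, exponent)` tuple and uses Python integers, so the 1 bit sign extension needed in hardware does not appear; the function name is illustrative.

```python
def align_and_subtract(a, b):
    """Subtract two (mantissa, exponent) numbers: returns a - b.

    The operand with the smaller exponent is shifted right to the larger
    exponent (step 5), then the mantissas are subtracted (step 6).
    """
    (mant_a, exp_a), (mant_b, exp_b) = a, b
    if exp_a < exp_b:
        mant_a >>= exp_b - exp_a
        exp_a = exp_b
    elif exp_b < exp_a:
        mant_b >>= exp_a - exp_b
    return mant_a - mant_b, exp_a


# 100 * 2**4 - 48 * 2**2 = 1600 - 192 = 1408 = 88 * 2**4
result = align_and_subtract((100, 4), (48, 2))
```

In this example no bits are lost by the shift, so the result `(88, 4)` is exact. In general the right shift discards low-order bits of the smaller operand.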
- | + | ||
- | === Step 7: Conversion of the mantissas to floating point === | + | |
- | + | ||
- | Due to the multiplication in step 4 and the sign extension in step 5, the mantissas now have a length of 37 bits, which is again too long for the final division. The mantissa is converted to the same floating point format as described in step 3. | + |
- | + | ||
- | The results respectively have a mantissa of 18 bits and two exponents of 6 bits each which have to be united in the next step. | + | |
- | + | ||
- | === Step 8: Start of division and unification of exponents === | + | |
- | + | ||
- | Division is a costly operation in FPGAs. In this implementation it is performed by an IP core by Xilinx which is parametrized to 18 bits for both the divisor and the dividend. The result is 33 bits wide, of which 15 bits are fractional. | + | |
- | + | ||
- | The division takes 25 clock cycles to complete. The divider IP core reaches a throughput of 1 in 3 clock cycles. Thus 3 is the lower limit for the linear regression length for the current settings of the IP core.\\ | + | |
- | The exponents generated in step 7 are united to the existing ones from step 6 by addition. | + | |
- | + | ||
- | === Step 9: Subtraction of the exponents of dividend and divisor === | + | |
- | + | ||
- | The divider IP core only handles the mantissas. The exponents of the dividend and the divisor are subtracted. | + | |
- | + | ||
- | === Step 10 - 32: Waiting for the division to complete === | + | |
- | + | ||
- | The results of step 9 are pipelined until the completion of the division. | + | |
- | + | ||
- | === Step 33: Shifting and slicing the division result === | + | |
- | + | ||
- | The division result is shifted to the right by minus the exponent from step 9. After that, the lower 16 bits are sliced to form the result of the linear regression algorithm. | + | |
- | + | ||
- | The result has to be interpreted as a relative position in the range $`[-1,1[`$, multiplied by $`2^{15}`$. | + | |
- | + | ||
- | Two signals are created for debugging purposes and are connected to the signal observer (see [[# | + | |
- | + | ||
- | * //result out of range// (1 bit): High if the absolute value of the numerator is greater than that of the denominator. This can happen if the phases of the two input signals are not aligned. In this case the result is set to the maximum or minimum value. | + | |
- | * //division by zero// (1 bit): Comes from the divider IP core and is high if the divisor is zero. This is very unlikely to happen. In this case the result is set to 0. | + | |
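On the software side, the 16 bit result can be converted back to a relative position as in the following Python sketch. It assumes that the raw value is read as an unsigned 16 bit word holding a two's complement number; the function name is illustrative.

```python
def decode_position(raw16):
    """Convert a 16 bit two's complement result to a relative position.

    The result is scaled by 2**15, so the decoded range is [-1, 1[.
    """
    raw16 &= 0xFFFF
    if raw16 & 0x8000:       # sign bit set: negative value
        raw16 -= 0x10000
    return raw16 / 32768.0
```

For example, 0x4000 decodes to 0.5 and 0x8000 decodes to -1.0, the most negative position.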
- | + | ||
- | === Limitations === | + | |
- | + | ||
- | Allowed values for the linear regression length are: 3, 4, 5, … , 4096 | + | |
- | + | ||
- | The lower limit is caused by the divider IP core which can only handle one division in three clock cycles. | + | |
- | + | ||
- | The upper limit is caused by the maximum operand length of the adder in the DSP48 primitives in the FPGA. A higher limit would be implementable at the cost of increased resource usage and two additional clock cycles of processing latency. | + | |
- | + | ||
- | ===== 4.4 BPM averaging ===== | + | |
- | + | ||
- | The result from the BPM algorithm is sign extended and added up until the desired number of samples is reached. Only powers of two are allowed for the averaging length. Allowing any desired number would require a general division operation at the end of the averaging process, whereas a division by a power of two can be implemented by a simple shift operation. This is why the configuration register ’log2 of BPM averaging length’ contains the dual logarithm of the averaging length (see [[# | + | |
- | The result is sliced to the same number of bits as the result from the BPM algorithm. It also has to be interpreted as a relative position in the range $`[-1,1[`$, multiplied by $`2^{15}`$. | + | |
- | + | ||
- | Available values for the averaging length are 1, 2, 4, … , 1, | + | |
- | + | ||
- | The upper limit is not caused by any implementation limitation, but was simply chosen because longer averaging lengths were not assumed to be useful. | + | |
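The power-of-two averaging can be sketched in Python as follows. The maximum of 20 for the logarithm is taken from the register description of 'Log2 of BPM averaging length'; the function names are illustrative, and Python's `>>` rounds towards minus infinity, which may differ from the slicing used in the gateware.

```python
MAX_LOG2_LENGTH = 20  # upper limit of the 'log2 of BPM averaging length' register


def clamp_log2_length(log2_length):
    """Higher values are set to the maximum allowed value."""
    return min(log2_length, MAX_LOG2_LENGTH)


def block_average(results, log2_length):
    """Average 2**log2_length results using a shift instead of a division."""
    length = 1 << log2_length
    if len(results) != length:
        raise ValueError("expected exactly 2**log2_length results")
    return sum(results) >> log2_length


average = block_average([100, 200, 300, 400], 2)  # (1000) >> 2
```

Here `average` is 250. With `log2_length = 0` the input value is passed through unchanged.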
- | + | ||
- | ===== 4.5 AXI infrastructure ===== | + | |
- | + | ||
- | ===== 4.6 AXI Stream infrastructure ===== | + | |
- | + | ||
- | ===== 4.7 Scopes and observers ===== | + | |
- | + | ||
- | ==== 4.7.1 Scopes ==== | + | |
- | + | ||
- | ==== 4.7.2 Observer ==== | + | |
- | + | ||
- | ===== 4.8 Configuration of peripheral devices ===== | + | |
- | + | ||
- | ===== 4.9 PCIe ===== | + | |
- | + | ||
- | ====== 5 Build flow ====== | + | |
- | + | ||
- | ====== 6 Continuous integration environment ====== | + | |
- | + | ||
- | There is a continuous integration environment set up for the // | + |
- | + | ||
- | At the moment the Gitlab Runner is running on the Linux server //sdlx035// located in a server room in the basement. | + | |
- | + | ||
- | The benefits of continuous integration are: | + | |
- | + | ||
- | * every change will be tested automatically | + | |
- | * it is ensured that no files are missing in the repository | + | |
- | * the master branch can be kept functional at any time | + | |
- | * build results like e.g. bitstreams are automatically generated and can be archived | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | ===== 6.1 Installation ===== | + | |
- | + | ||
- | There is an installation script '' | + | |
- | + | ||
- | After the installation, | + | |
- | + | ||
- | On the newly installed Gitlab Runner server, open a terminal and type '' | + | |
- | Enter the following information: | + | |
- | + | ||
- | * gitlab-ci coordinator URL: e.g. '' | + | |
- | * gitlab-ci token: enter the registration token copied before | + | |
- | * gitlab-ci description: | + | |
- | * gitlab-ci tags: leave empty | + | |
- | * executor: '' | + | |
- | + | ||
- | You can add multiple repositories with different tokens by running '' | + | |
- | + | ||
- | ===== 6.2 Pipeline Stages ===== | + | |
- | + | ||
- | Each '' | + | |
- | + | ||
- | * documentation | + | |
- | * simulation | + | |
- | * FPGA build | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | ==== 6.2.1 Documentation ==== | + | |
- | + | ||
- | The script '' | + | |
- | + | ||
- | The log file of Pdflatex and - if successful - the PDF of the documentation are archived. | + | |
- | + | ||
- | ==== 6.2.2 Simulation ==== | + | |
- | + | ||
- | The script '' | + | |
- | + | ||
- | The log file of the simulation and - if successful - a file with the BPM results from the simulation are archived. | + |
- | + | ||
- | ==== 6.2.3 FPGA build ==== | + | |
- | + | ||
- | The script '' | + | |
- | + | ||
- | Different log files from synthesis and implementation, | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | ===== 6.3 Build results ===== | + | |
- | + | ||
- | For each of the pipeline stages the archiving of build results can be configured for an adjustable time period, which is set to one week. If the period has passed and the build results have been deleted, they can be generated again by restarting the pipeline. | + | |
- | + | ||
- | The build results can be downloaded from the Gitlab web frontend where they are called //job artifacts// (see figure 6.3). | + | |
- | + | ||
- | $`\color{green}{\text{\textbf{The CI/CD pipelines can also be used to generate FPGA bitstreams without | + | |
- | having to set up a build environment.}}}`$ | + | |
- | + | ||
- | ====== 7 Gateware software interface ====== | + | |
- | + | ||
- | The communication between the gateware inside the FPGA and the software running on the CPU unit takes place via a PCIe driver by Xilinx called XDMA. | + | |
- | + | ||
- | There is only one PCIe BAR in use in the gateware which maps a contiguous memory space of 0x80010000 bytes (= 2, | + |
- | + | ||
- | The following mapping is applied: | + | |
- | + | ||
- | ^ **address**^ | + | |
- | | | + | |
- | | | + | |
- | | | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | ===== 7.1 Scope memory ===== | + | |
- | + | ||
- | There are three scope memory regions of which the one for the corrected ADC data is the largest since it has the highest data rate. | + | |
- | + | ||
- | ^ | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | ==== 7.1.1 Scope 0: corrected ADC data ==== | + | |
- | + | ||
- | The corrected ADC data is stored in the following format: | + | |
- | + | ||
- | ^ | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | The corrected data is the result of two sequential operations on the raw ADC data: | + | |
- | + | ||
- | - offset correction by adding a correction summand | + | |
- | - gain correction by multiplying by a correction factor | + |
- | + | ||
- | The correction summand and the correction factor can be set by individual configuration registers (see [[# | + | |
- | + | ||
- | The corrected ADC data scope memory can hold up to $`2^{26}`$ samples. At a sampling frequency of 125 MHz this corresponds to a maximum capture duration of 0.537 seconds. | + | |
- | + | ||
- | ==== 7.1.2 Scope 1: BPM result ==== | + | |
- | + | ||
- | The BPM result is stored in the following format: | + | |
- | + | ||
- | ^ | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | The BPM result scope memory can hold up to $`2^{25}`$ samples. At a sampling frequency of 125 MHz and with a linear regression length of e.g. 1024 this corresponds to a maximum capture duration of 4:35 minutes. | + | |
- | + | ||
- | ==== 7.1.3 Scope 2: BPM averaging result ==== | + | |
- | + | ||
- | The BPM averaging result is stored in the following format: | + | |
- | + | ||
- | ^ | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | The BPM averaging result scope memory can hold up to $`2^{25}`$ samples. At a sampling frequency of 125 MHz, with a linear regression length of e.g. 1024 and with an averaging length of e.g. 1024 this corresponds to a maximum capture duration of 78.2 hours. | + | |
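The capture durations quoted for the three scopes follow from one formula, shown here as a small Python calculation. The function and parameter names are illustrative.

```python
F_SAMPLE_HZ = 125e6  # ADC sampling frequency


def capture_duration_s(memory_samples, regression_length=1,
                       averaging_length=1, f_sample_hz=F_SAMPLE_HZ):
    """Maximum capture duration in seconds for a scope memory.

    Each stored sample covers regression_length * averaging_length ADC samples.
    """
    return memory_samples * regression_length * averaging_length / f_sample_hz


scope0 = capture_duration_s(2**26)              # corrected ADC data
scope1 = capture_duration_s(2**25, 1024)        # BPM result
scope2 = capture_duration_s(2**25, 1024, 1024)  # BPM averaging result
```

This gives about 0.537 s for scope 0, about 274.9 s (4:35 minutes) for scope 1 and about 78.2 hours for scope 2.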
- | + | ||
- | ===== 7.2 Register map ===== | + | |
- | + | ||
- | ==== 7.2.1 Configuration registers ==== | + | |
- | + | ||
- | The following registers can be written by software: | + | |
- | + | ||
- | ^ | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | === 0 - 7: ADC {0 - 7} offset correction summand === | + | |
- | + | ||
- | Correction summand for a possible offset deviation of the ADC. The offset correction precedes the gain correction. | + | |
- | + | ||
- | === 8 - 15: ADC {0 - 7} gain correction factor === | + | |
- | + | ||
- | Correction factor for a possible gain deviation of the ADC. The default value 0x8000 corresponds to a multiplication by 1. The possible correction range is $`[0, 2[`$. | + | |
- | + | ||
- | === 16 - 19: BPM {0 - 3} capacitance correction factor === | + | |
- | + | ||
- | The capacitances of the two corresponding capacitor plates of a single BPM can differ. Data 0 is fed unchanged into the BPM algorithm, while data 1 is multiplied by a correction factor. The default value 0x8000 corresponds to a multiplication by 1. The possible correction range is $`[0, 2[`$. | + | |
- | + | ||
- | === 20: BPM linear regression length - 1 === | + | |
- | + | ||
- | Number of samples over which the linear regression is calculated if no external RF pulse signal is present. This value applies to all four BPMs. If an external RF pulse signal is present, the result of the linear regression is output and a new calculation is started on every rising edge of the RF pulse signal. For this to work, this register has to be set to a value larger than the number of samples between two RF pulses. | + |
- | + | ||
- | Allowed values: 0x0002 - 0xFFFF | + | |
- | + | ||
- | The lower limit results from the divider IP core used for the final division of the BPM algorithm, which has a throughput of one result per 3 clock cycles. | + |
- | + | ||
- | === 21: Log2 of BPM averaging length === | + | |
- | + | ||
- | Dual logarithm of the number of linear regression results over which the averaging is calculated. This value is valid for all four BPMs. Allowed range: 0 .. 20. Higher values will be set to the maximum allowed value. This corresponds to an averaging length of 1, 2, 4, … , 1, | + | |
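The mapping from the register value to the effective averaging length, including the clamping of values above 20, can be sketched as follows. The names are illustrative and not taken from the gateware sources.

```python
MAX_LOG2_AVERAGING_LENGTH = 20  # upper limit stated for register 21


def averaging_length(register_value):
    """Effective number of linear regression results per averaging result."""
    # Values above the limit are treated as the maximum allowed value.
    log2_length = min(register_value, MAX_LOG2_AVERAGING_LENGTH)
    return 1 << log2_length
```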
- | + | ||
- | === 22: Gate signal input select === | + | |
- | + | ||
- | ^ value^input | + | |
- | | 0 - 7|MLVDS line 0 - 7 on the backplane | + | |
- | | 8|FMC 0 //TRIG// input | | + | |
- | | 9|FMC 1 //TRIG// input | | + | |
- | + | ||
- | The gate signal input can be switched between one of the eight MLVDS lines on the backplane and the two MMCX connectors labeled //TRIG// on the FMC front panels. | + | |
- | + | ||
- | === 23: RF signal input select === | + | |
- | + | ||
- | ^ value^input | + | |
- | | 0 - 7|MLVDS line 0 - 7 on the backplane | + | |
- | | 8|FMC 0 //TRIG// input | | + | |
- | | 9|FMC 1 //TRIG// input | | + | |
- | + | ||
- | The RF signal input can be switched between one of the eight MLVDS lines on the backplane and the two MMCX connectors labeled //TRIG// on the FMC front panels. | + | |
- | + | ||
- | === 32, 40, 48: Scope {0, 1, 2} capture length - 1 === | + | |
- | + | ||
- | The number of samples minus one that are stored after a scope has been triggered. Each sample consists of 16 bytes. | + | |
- | + | ||
- | === 33, 41, 49: Scope {0, 1, 2} trigger mode === | + | |
- | + | ||
- | ^ value^trigger mode ^ | + | |
- | | 0|trigger on rising edge of gate signal | + | |
- | | 1|trigger on high state of gate signal | + | |
- | | 2, 3|trigger instantly after the trigger is armed, independent of the state of the gate signal | + | |
- | + | ||
- | === 34, 42, 50: Scope {0, 1, 2} continuous trigger === | + | |
- | + | ||
- | If set to 1 the trigger is armed and will be rearmed automatically after every capture completion. | + | |
- | + | ||
- | === 35, 43, 51: Scope {0, 1, 2} arm trigger === | + | |
- | + | ||
- | Writing a 1 to this register will arm the trigger once. The register does not have to be reset to 0 before the next arm trigger, just write another 1 to it. If the corresponding register ’continuous trigger’ is set to 1, writing to this register does not have any effect. | + | |
- | + | ||
- | === 44, 52: Scope {1, 2} capture mode === | + | |
- | + | ||
- | ^ value^capture mode ^ | + | |
- | 0|capture until the number of samples defined by register {40, 48} has been stored | + |
- | | 1|the same, but cancel capturing when the gate signal goes low | | + | |
- | + | ||
- | A capture mode register is only available for scopes 1 and 2. Scope 0 (for corrected ADC data) always operates in capture mode 0. | + | |
- | + | ||
- | ==== 7.2.2 Status registers ==== | + | |
- | + | ||
- | The following status registers can be read by software: | + | |
- | + | ||
- | ^ **index**^ | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | === 0 - 3: latest BPM {0 - 3} result === | + | |
- | + | ||
- | This value divided by $`2^{15}`$ represents the relative beam position in the range $`[-1, 1[`$. | + | |
- | + | ||
- | === 4 - 7: latest BPM {0 - 3} averaging result === | + | |
- | + | ||
- | This value divided by $`2^{15}`$ represents the relative beam position in the range $`[-1, 1[`$. Due to the averaging there should be less noise on this value than on the BPM result. | + | |
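Converting a raw result register value to a relative beam position can be sketched as follows. The sketch assumes that the result is a 16-bit two's complement number, which matches the stated scaling by $`2^{15}`$ and the range $`[-1, 1[`$; the function name is illustrative.

```python
def bpm_result_to_position(raw):
    """Convert a raw BPM (averaging) result to a relative position in [-1, 1[."""
    raw &= 0xFFFF            # keep the assumed 16 result bits
    if raw & 0x8000:         # sign bit set: negative value
        raw -= 0x10000
    return raw / 2**15
```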
- | + | ||
- | === 32, 40, 48: Scope {0, 1, 2} capture status === | + | |
- | + | ||
- | ^ value^capture status | + | |
- | | 0|idle | + | |
- | | 1|waiting for trigger | + | |
- | | 2|capturing | + | |
- | | 3|done | + | |
- | + | ||
- | The value 0 is only present before the trigger is armed for the first time. After that, the effective idle state is 3. | + |
- | + | ||
- | === 33, 41, 49: Scope {0, 1, 2} latest write address === | + | |
- | + | ||
- | Address where the latest data sample was stored during the scope’s capturing process. | + | |
- | + | ||
- | === 127: FPGA serial number === | + | |
- | + | ||
- | The XDMA PCIe driver by Xilinx numbers the devices randomly and is not able to identify the slot number of an AFC board. This register holds the FPGA’s unique serial number and can be used to identify an AFC board. | + | |
- | + | ||
- | ===== 7.3 Capturing procedure ===== | + | |
- | + | ||
- | ==== 7.3.1 Known number of samples ==== | + | |
- | + | ||
- | A typical procedure for capturing a predefined number of samples starting from the rising edge of the gate signal is the following: | + |
- | + | ||
- | * write the number of samples minus 1 to the configuration register ’capture length - 1’ | + | |
- | * write a 1 to the configuration register named ’arm trigger’ | + | |
- | * you can check the status register named ’capture status’ for the progress: 1: rising edge of gate signal not yet detected, 2: capturing is ongoing, 3: capturing completed | + | |
- | * you can check the current write address by polling the status register named ’latest write address’ | + | |
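The steps above can be sketched in Python as follows. The callables `write_config(index, value)` and `read_status(index)` are hypothetical placeholders for whatever register access the PCIe driver provides; the register offsets follow the register map in section 7.2. Whether the capture status changes immediately after arming is hardware dependent, so a stale value 3 from a previous capture may have to be handled separately.

```python
import time

# Offsets relative to the scope base index (32, 40, 48 for scopes 0, 1, 2).
CFG_CAPTURE_LENGTH, CFG_TRIGGER_MODE, CFG_ARM_TRIGGER = 0, 1, 3
STS_CAPTURE_STATUS, STS_WRITE_ADDRESS = 0, 1
STATUS_DONE = 3


def capture_known_length(write_config, read_status, scope, num_samples,
                         poll_interval=0.01, timeout=5.0):
    """Arm a scope for num_samples samples and wait until it is done."""
    base = 32 + 8 * scope
    write_config(base + CFG_CAPTURE_LENGTH, num_samples - 1)
    write_config(base + CFG_TRIGGER_MODE, 0)   # rising edge of gate signal
    write_config(base + CFG_ARM_TRIGGER, 1)
    deadline = time.monotonic() + timeout
    while read_status(base + STS_CAPTURE_STATUS) != STATUS_DONE:
        if time.monotonic() > deadline:
            raise TimeoutError("capture did not complete")
        time.sleep(poll_interval)
    # Raw value of the 'latest write address' status register.
    return read_status(base + STS_WRITE_ADDRESS)
```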
- | + | ||
- | ==== 7.3.2 Unknown number of samples ==== | + | |
- | + | ||
- | BPM results are only calculated while the gate signal is high. If you want to capture a complete high period of e.g. BPM average samples, the total number of samples is unknown. Proceed as follows: | + | |
- | + | ||
- | * write the maximum value 0x1FFFFFF to the configuration register named ’capture length - 1’ | + | |
- | * write a 1 to the configuration register named ’capture mode’ | + | |
- | * write a 1 to the configuration register named ’arm trigger’ | + | |
- | * you can check the status register ’capture status’ as above | + | |
- | * the value of the status register named ’latest write address’ will be static after completion and indicates how many samples have been captured | + | |
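The register writes for a gated capture of unknown length can be sketched as follows. As before, `write_config(index, value)` is a hypothetical placeholder for the actual register access; the function name is illustrative.

```python
MAX_CAPTURE_LENGTH_MINUS_1 = 0x1FFFFFF


def arm_gated_capture(write_config, scope):
    """Arm scope 1 or 2 so that capturing stops when the gate goes low."""
    if scope not in (1, 2):
        # Scope 0 has no capture mode register.
        raise ValueError("capture mode 1 is only available for scopes 1 and 2")
    base = 32 + 8 * scope
    write_config(base + 0, MAX_CAPTURE_LENGTH_MINUS_1)  # capture length - 1
    write_config(base + 4, 1)                           # capture mode
    write_config(base + 3, 1)                           # arm trigger
```

After completion, the status register 'latest write address' (index `base + 1`) indicates how many samples have been captured.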
- | + | ||
- | ====== 8 Extended gateware software interface ====== | + | |
- | + | ||
- | Besides the interface documented in [[# | + | |
- | + | ||
- | ===== 8.1 Extended register map ===== | + | |
- | + | ||
- | ==== 8.1.1 Additional configuration registers ==== | + | |
- | + | ||
- | The following additional registers can be written by software: | + | |
- | + | ||
- | ^ | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | ^ **index**^ | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | === 64, 80: FMC {0, 1} status LED select === | + | |
- | + | ||
- | There is one tricolor LED on the FMC front panel labeled //status// that can be controlled by the gateware. | + | |
- | + | ||
- | ^ value^input | + | |
- | | 0|ADC clock, blink frequency divided by $`2^{27}`$, green if AD9510 PLL is in lock, otherwise red | | + | |
- | | 1|AD9510 monitoring clock, blink frequency divided by $`2^{27}`$ | + | |
- | | 2, 3|static value from register ’status LED value’ | + | |
- | + | ||
- | === 65, 81: FMC {0, 1} status LED value === | + | |
- | + | ||
- | Static lighting pattern if register ’status LED select’ = 2 or 3. | + | |
- | + | ||
- | ^ bit^color | + | |
- | | 0|red | | + | |
- | | 1|green | + | |
- | | 2|blue | + | |
- | + | ||
- | === 66, 82: FMC {0, 1} SPI cs === | + | |
- | + | ||
- | Chip select signals (active high) of the SPI bus to the four ADCs and to the AD9510 PLL and clock distribution. | + | |
- | + | ||
- | ^ bit^device | + | |
- | | 0|ADC 0 | | + | |
- | | 1|ADC 1 | | + | |
- | | 2|ADC 2 | | + | |
- | | 3|ADC 3 | | + | |
- | | 4|PLL and clock distribution | + | |
- | + | ||
- | === 67, 83: FMC {0, 1} SPI read/write === | + | |
- | + | ||
- | 0: write mode, 1: read mode | + | |
- | + | ||
- | === 68, 84: FMC {0, 1} SPI address === | + | |
- | + | ||
- | The address of the register that shall be accessed. | + | |
- | + | ||
- | === 69, 85: FMC {0, 1} SPI write data === | + | |
- | + | ||
- | The data that shall be written to a register. | + | |
- | + | ||
- | === 70, 86: FMC {0, 1} SPI trigger === | + | |
- | + | ||
- | Write a 1 to this register to start a read or write access on the SPI bus. The register does not have to be reset to 0 before the next SPI trigger, just write another 1 to it. | + | |
- | + | ||
- | === 71, 87: FMC {0, 1} ADC resetn === | + | |
- | + | ||
- | Active-low reset signal to the four ADCs in parallel. Set to 0 and then back to 1 to initiate a reset. | + |
- | + | ||
- | === 72, 88: FMC {0, 1} I2C read/write === | + |
- | + | ||
- | 0: write mode, 1: read mode | + | |
- | + | ||
- | === 73, 89: FMC {0, 1} I2C device address === | + | |
- | + | ||
- | The address of the connected VCXO is 0x49. | + | |
- | + | ||
- | === 74, 90: FMC {0, 1} I2C register address === | + | |
- | + | ||
- | The address of the register that shall be accessed. | + | |
- | + | ||
- | === 75, 91: FMC {0, 1} I2C write data === | + | |
- | + | ||
- | The data that shall be written to a register. | + | |
- | + | ||
- | === 76, 92: FMC {0, 1} I2C trigger === | + | |
- | + | ||
- | Write a 1 to this register to start a read or write access on the I2C bus. The register does not have to be reset to 0 before the next I2C trigger, just write another 1 to it. | + | |
- | + | ||
- | === 77, 93: FMC {0, 1} PLL resetn === | + | |
- | + | ||
- | Active-low reset signal to the PLL and clock distribution. Set to 0 and then back to 1 to initiate a reset. | + |
- | + | ||
- | === 78, 94: FMC {0, 1} Clock switch select === | + | |
- | + | ||
- | There is a separate clock switch in front of the AD9510 PLL reference clock input. | + | |
- | + | ||
- | ^ value^connect to ^ | + | |
- | | 0|MMCX connector labeled //REF// on the front panel of the FMC board | | + | |
- | | 1|clock output from the FPGA via the FMC connector | + | |
- | + | ||
- | === 79, 95: FMC {0, 1} VCXO output enable === | + | |
- | + | ||
- | Enables the frequency output of the VCXO. | + | |
- | + | ||
- | === 96 - 99 and 104 - 107: FMC {0, 1} ADC {0 - 3} clock delay === | + | |
- | + | ||
- | There is a configurable input delay for setting the correct digital interface timing for both the clock and the data signals. Increasing this value increases the delay of the clock, so that the data is sampled later. | + | |
- | + | ||
- | === 100 - 103 and 108 - 111: FMC {0, 1} ADC {0 - 3} data delay === | + | |
- | + | ||
- | See above. Increasing this value increases the delay of the data, so that the data is sampled at an earlier position. | + | |
- | + | ||
- | === 112: Productive mode === | + | |
- | + | ||
- | When productive mode is 1, scope 0 operates in an easy to use mode for storing ADC data.\\ | + | |
- | Setting this register to 0 enables additional functionality like combining and choosing different signals to store and a more powerful and flexible two-stage trigger.\\ | + | |
- | Registers 117 to 127 are only relevant in non-productive mode. | + |
- | + | ||
- | === 113: AFC LED select === | + | |
- | + | ||
- | There is one tricolor LED at the center of the AFC front panel labeled //L3// that can be controlled by the gateware. | + | |
- | + | ||
- | ^ value^input | + | |
- | | 0|PCIe reference clock, blink frequency divided by $`2^{27}`$, white | | + | |
- | | 1|static value from register 114 ’AFC LED value’ | + | |
- | + | ||
- | === 114: AFC LED value === | + | |
- | + | ||
- | Static lighting pattern if register 113 ’AFC LED select’ = 1. | + | |
- | + | ||
- | ^ bit^color | + | |
- | | 0|red | | + | |
- | | 1|green | + | |
- | | 2|blue | + | |
- | + | ||
- | === 115: Gate override === | + | |
- | + | ||
- | For testing purposes without an external gate signal you can set this register to 1 and simulate a gate signal via register 116 ’gate override value’. | + | |
- | + | ||
- | === 116: Gate override value === | + | |
- | + | ||
- | Can be used to simulate a gate signal when register 115 ’gate override’ is 1. | + | |
- | + | ||
- | === 117: Observer valid signal select === | + | |
- | + | ||
- | Determines the data valid input to the observer. Samples are only stored when the valid signal is high. | + | |
- | + | ||
- | ^ value^input | + | |
- | | 0, 3|constant 1 | | + | |
- | | 1|BPM result valid | | + | |
- | | 2|BPM averaging result valid | | + | |
- | + | ||
- | === 120, 121: Observer multiplexer {0, 1} select === | + | |
- | + | ||
- | The observer stores 128-bit wide samples, each consisting of two concatenated 64-bit wide multiplexer outputs. Each multiplexer can choose between eight different input vectors. This way, any signal can be observed in parallel with any other signal. | + |
- | + | ||
- | ^ value^input vector(64 bits) ^ | + | |
- | | 0|corrected ADC data of ADCs 0 - 3 | | + | |
- | | 1|corrected ADC data of ADCs 4 - 7 | | + | |
- | | 2|BPM 0 and 1 result, additional information | + | |
- | | 3|BPM 2 and 3 result, additional information | + | |
- | | 4|BPM 0 and 1 averaging result, additional information | + | |
- | | 5|BPM 2 and 3 averaging result, additional information | + | |
- | | 6|SPI and I2C signals, MLVDS signals, FMC trigger signals | + | |
- | | 7|test counter | + | |
- | + | ||
- | For a detailed description of the input vectors see [[# | + | |
- | + | ||
- | === 122: Observer number of samples - 1 === | + | |
- | + | ||
- | The number of samples minus one that are stored after the observer has been triggered. Each sample consists of 16 bytes. | + | |
- | + | ||
- | === 123: Observer trigger select === | + | |
- | + | ||
- | Analogous to registers 120 and 121. Determines which observer input vector the trigger listens to. | + |
- | + | ||
- | === 124: Observer trigger compare vector (t = -1) === | + | |
- | + | ||
- | 64-bit wide compare vector that is compared with the observer input vector selected by register 123 ’observer trigger select’. If the two patterns match, the next sample will be compared to the compare vector in register 125 ’trigger compare vector (t = 0)’. | + |
- | + | ||
- | === 125: Observer trigger compare vector (t = 0) === | + |
- | + | ||
- | See above. If the patterns do not match, the next sample will again be compared to the compare vector in register 124 ’trigger compare vector (t = -1)’. If the patterns match, the data acquisition is triggered. | + |
- | + | ||
- | === 126: Observer trigger compare bit mask === | + | |
- | + | ||
- | Determines which bits of the input vector are compared with those of the compare vectors. Valid for both trigger compare vectors (registers 124 and 125). For a trigger event, the patterns must match in all bits where the bit mask is 1. | + |
- | + | ||
- | === 127: Observer arm trigger === | + |
- | + | ||
- | Starts the comparing process. Data is captured if the patterns defined by the previous three registers match. | + | |
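The masked two-stage trigger condition can be modelled in software as follows. This is a sketch of the condition only: it assumes that every pair of consecutive samples is checked, i.e. the previous sample against the t = -1 vector and the current sample against the t = 0 vector. Whether the gateware evaluates overlapping pairs in exactly this way is an assumption; the function name is illustrative.

```python
def find_trigger(samples, compare_prev, compare_now, mask):
    """Return the index of the first sample at t = 0 that fires the trigger.

    samples      -- sequence of 64-bit input vector values (as integers)
    compare_prev -- compare vector for t = -1 (register 124)
    compare_now  -- compare vector for t = 0 (register 125)
    mask         -- compare bit mask (register 126), 1 = bit is compared
    """
    for i in range(1, len(samples)):
        prev_matches = ((samples[i - 1] ^ compare_prev) & mask) == 0
        now_matches = ((samples[i] ^ compare_now) & mask) == 0
        if prev_matches and now_matches:
            return i
    return None
```

For example, a rising edge on bit 0 corresponds to `compare_prev = 0`, `compare_now = 1` and `mask = 1`.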
- | + | ||
- | ==== 8.1.2 Additional status registers ==== | + | |
- | + | ||
- | The following additional status registers can be read by software: | + | |
- | + | ||
- | ^ **index**^ | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | === 64, 80: FMC {0, 1} SPI busy === | + | |
- | + | ||
- | Indicates that an SPI read or write access is in progress. This register has to read 0 before an SPI access is triggered. | + |
- | + | ||
- | === 65, 81: FMC {0, 1} SPI read data === | + | |
- | + | ||
- | Contains the result of a read access to an SPI register. | + |
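An SPI read access using the configuration and status registers of section 8.1 can be sketched as follows. The callables `write_config(index, value)` and `read_status(index)` are hypothetical placeholders for the actual register access, and the polling limit is an arbitrary choice. The sketch assumes that the busy flag is already set when it is polled after the trigger.

```python
def spi_read(write_config, read_status, fmc, device_bit, address,
             max_polls=1000):
    """Read one SPI register of a device on FMC 0 or 1."""
    base = 64 + 16 * fmc   # same base index for config and status registers

    def wait_until_idle():
        for _ in range(max_polls):
            if read_status(base + 0) == 0:   # SPI busy
                return
        raise TimeoutError("SPI bus stayed busy")

    wait_until_idle()
    write_config(base + 2, 1 << device_bit)  # SPI cs, active high
    write_config(base + 3, 1)                # 1: read mode
    write_config(base + 4, address)          # SPI address
    write_config(base + 6, 1)                # SPI trigger
    wait_until_idle()
    return read_status(base + 1)             # SPI read data
```

Here `device_bit` is 0 to 3 for the ADCs and 4 for the PLL and clock distribution, as listed in the table of the register 'SPI cs'.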
- | + | ||
- | === 66, 82: FMC {0, 1} I2C busy === | + | |
- | + | ||
- | Indicates that an I2C read or write access is in progress. This register has to read 0 before an I2C access is triggered. | + |
- | + | ||
- | === 67, 83: FMC {0, 1} I2C read data === | + | |
- | + | ||
- | Contains the result of a read access to an I2C register. | + | |
- | + | ||
- | === 68, 84: FMC {0, 1} PLL status === | + | |
- | + | ||
- | Value of the configurable output pin //status// of the AD9510 PLL and clock distribution IC. By default this pin indicates lock status of the PLL. | + | |
- | + | ||
- | === 69, 85: FMC {0, 1} VCXO initial RFREQ === | + | |
- | + | ||
- | //RFREQ// is a factory-calibrated multiplier of the XTAL frequency of the Si571 programmable VCXO. Before a new output frequency is programmed, this value has to be read (see [[# | + |
- | + | ||
- | === 70, 86: FMC {0, 1} VCXO RFREQ === | + |
- | + | ||
- | The VCXO output frequency is programmed to 125 MHz by the gateware. This register holds the value of //RFREQ// that has been programmed (see [[# | + | |
- | + | ||
- | === 71, 87: FMC {0, 1} measured ADC clock frequency === | + | |
- | + | ||
- | The ADC clock is measured against the main processing clock. This register holds the number of detected ADC clock cycles during 1 second of the main processing clock. | + | |
- | + | ||
- | === 72, 88: FMC {0, 1} ADC FIFO underflow counter === | + | |
- | + | ||
- | If the ADC clock is slower than the main processing clock, samples will be repeated by the clock domain crossing FIFO output logic. For each repetition the underflow counter will be incremented by 1. | + | |
- | + | ||
- | === 73, 89: FMC {0, 1} ADC FIFO overflow counter === | + |
- | + | ||
- | If the ADC clock is faster than the main processing clock, samples will be discarded by the clock domain crossing FIFO input logic. For each discarded sample the overflow counter will be incremented by 1. | + |
- | + | ||
- | === {96 - 103}: ADC {0 - 7} max peak to peak === | + | |
- | + | ||
- | The maximum and the minimum value of the ADC data are determined over a free-running period of 1 second. This register contains the difference between the maximum and the minimum value. | + |
- | + | ||
- | === 111: SDRAM initial calibration complete === | + | |
- | + | ||
- | The communication to the SDRAM is controlled by an IP core by Xilinx which performs a timing calibration at start up. The value of this register will be 1 after completion of the initial calibration. | + | |
- | + | ||
- | === 124: observer triggered === | + | |
- | + | ||
- | Indicates that the observer has been triggered. | + | |
- | + | ||
- | === 125: observer capture busy === | + | |
- | + | ||
- | Indicates that a capturing process is ongoing. | + | |
- | + | ||
- | === 126: build timestamp === | + | |
- | + | ||
- | Time when the bitstream was created. Can be used to identify the gateware version (together with the Git commit information documented in [[# | + | |
- | + | ||
- | ^ | + | |
- | | 0 - 5|seconds | + | |
- | | 6 - 11|minutes | + | |
- | | 12 - 16|hours | + | |
- | | 17 - 22|last two decimal digits of the year | | + | |
- | | 23 - 26|month | + | |
- | | 27 - 31|day | + | |
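Decoding the bit fields of the build timestamp can be sketched as follows. The function name and the dictionary keys are illustrative; the year is returned as the stored two-digit value without assuming a century.

```python
def decode_build_timestamp(value):
    """Split the 32-bit build timestamp register into its bit fields."""
    return {
        "seconds": value & 0x3F,            # bits 0 - 5
        "minutes": (value >> 6) & 0x3F,     # bits 6 - 11
        "hours": (value >> 12) & 0x1F,      # bits 12 - 16
        "year": (value >> 17) & 0x3F,       # bits 17 - 22, last two digits
        "month": (value >> 23) & 0x0F,      # bits 23 - 26
        "day": (value >> 27) & 0x1F,        # bits 27 - 31
    }
```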
- | + | ||
- | ===== 8.2 Architecture information storage ===== | + | |
- | + | ||
- | The first seven eighths of the Block RAM (see memory mapping, table 7.1) are used to store information about the observer signals, the registers and the gateware version. | + |
- | + | ||
- | ==== 8.2.1 Observer signal information ==== | + | |
- | + | ||
- | Information about the signals connected to the eight observer multiplexer inputs is stored in the first half of the Block RAM. The following information is stored for every bit of each of the eight 64-bit wide multiplexer inputs: | + |
- | + | ||
- | name of signal (30 bytes), display type of signal (1 byte), bit index in signal (1 byte) | + | |
- | + | ||
- | ^ | + | |
- | | '' | + | |
- | | '' | + | |
- | | | + | |
- | | '' | + | |
- | | '' | + | |
- | | | + | |
- | | | + | |
- | | '' | + | |
- | | '' | + | |
- | | | + | |
- | | '' | + | |
- | | '' | + | |
- | | | + | |
- | | | + | |
- | | '' | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | Table 8.4 shows the storage format of the 512 entries, each of which has a width of 32 bytes. The coding of the display type byte is the following: | + | |
- | + | ||
- | ^ value^display type ^ | + | |
- | | 0|hexadecimal | + | |
- | | 1|signed | + | |
- | | 2|unsigned | + | |
- | | 3|binary | + | |
- | | 4|analog | + | |
- | + | ||
- | The names are stored as ASCII strings. If a name is shorter than 30 bytes, the remaining bytes are filled with Null characters.\\ | + | |
- | The observer signal information is used by the FPGA Observer software to display the observer signals in the Data Acquisition tab (see [[# | + | |
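Parsing the observer signal information can be sketched as follows. The sketch assumes that the three fields are stored within each 32-byte entry in the order listed above (name, display type, bit index) and that `blob` already contains the raw bytes read from the Block RAM; how these bytes are read is outside the scope of this example. The function name is illustrative.

```python
import struct

DISPLAY_TYPES = {0: "hexadecimal", 1: "signed", 2: "unsigned",
                 3: "binary", 4: "analog"}

# 30 bytes name, 1 byte display type, 1 byte bit index = 32 bytes per entry.
SIGNAL_ENTRY = struct.Struct("30sBB")


def parse_signal_entries(blob):
    """Return a list of (name, display type, bit index) tuples."""
    entries = []
    for offset in range(0, len(blob) - SIGNAL_ENTRY.size + 1,
                        SIGNAL_ENTRY.size):
        name, display, bit_index = SIGNAL_ENTRY.unpack_from(blob, offset)
        entries.append((
            name.rstrip(b"\0").decode("ascii"),   # names are Null padded
            DISPLAY_TYPES.get(display, "unknown"),
            bit_index,
        ))
    return entries
```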
- | + | ||
- | ==== 8.2.2 Register information ==== | + | |
- | + | ||
- | Information about the 128 configuration registers and the 128 status registers is stored in the third quarter of the Block RAM. The following information is stored for every register: | + |
- | + | ||
- | name of register (31 bytes), number of bits (1 byte) | + | |
- | + | ||
- | ^ | + | |
- | | '' | + | |
- | | '' | + | |
- | | | + | |
- | | '' | + | |
- | | '' | + | |
- | | '' | + | |
- | | | + | |
- | | '' | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | Table 8.5 shows the storage format of the 256 entries, each of which has a width of 32 bytes. The names are stored as ASCII strings. If a name is shorter than 31 bytes, the remaining bytes are filled with Null characters. If not all registers are in use, a width of 0 bits indicates that a register is not present.\\ | + | |
- | The register information is used by the FPGA Observer software to display the registers in the Register Access tab (see [[# | + | |
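Parsing the register information can be sketched in the same way. The sketch assumes that the name precedes the width byte within each 32-byte entry and that `blob` holds the raw bytes of the table; the returned keys are entry positions within the table, since the order of configuration and status registers is not assumed here. The function name is illustrative.

```python
import struct

# 31 bytes name, 1 byte number of bits = 32 bytes per entry.
REGISTER_ENTRY = struct.Struct("31sB")


def parse_register_entries(blob):
    """Return {entry position: (name, number of bits)} for present registers."""
    registers = {}
    for position in range(len(blob) // REGISTER_ENTRY.size):
        name, width = REGISTER_ENTRY.unpack_from(
            blob, position * REGISTER_ENTRY.size)
        if width == 0:
            continue   # a width of 0 bits marks a register as not present
        registers[position] = (name.rstrip(b"\0").decode("ascii"), width)
    return registers
```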
- | + | ||
- | ==== 8.2.3 Gateware information ==== | + | |
- | + | ||
- | The address range from 0x80006000 to 0x80006FFF is used to store information about the gateware version. The information is stored as an ASCII string of variable length (maximum 4 KiB), which is assembled from information in the Git repository. It contains the URL of the remote server of the Git repository, the latest commit hash and the latest commit date.\\ | + |
- | The gateware information is used by the FPGA Observer software to display the information in the Gateware Information tab (see [[# | + | |
- | + | ||
- | ====== 9 Test software ====== | + | |
- | + | ||
- | ===== 9.1 FPGA Observer ===== | + | |
- | + | ||
- | There is a graphical test software intended to be run on the CPU unit. It is implemented in Python using the GTK 3 GUI toolkit. | + | |
- | + | ||
- | ==== 9.1.1 Installation and usage ==== | + | |
- | + | ||
- | The sources and an installation script can be found under '' | + | |
- | + | ||
- | === Installation === | + | |
- | + | ||
- | Connect to the CPU unit e.g. via ssh. Clone the // | + | |
- | + | ||
- | '' | + | |
- | + | ||
- | Install the PCIe driver: | + | |
- | + | ||
- | '' | + | |
- | '' | + | |
- | + | ||
- | Install the FPGA Observer software: | + | |
- | + | ||
- | '' | + | |
- | '' | + | |
- | + | ||
- | === Usage === | + | |
- | + | ||
- | For the PCIe driver to work, the bitstreams of the FPGAs have to be loaded before powering the CPU unit. If that is not the case, power cycle the CPU unit by pulling out the Hot Swap Handle and pushing it in again. A software reboot does not work. | + | |
- | + | ||
- | Connect to the CPU unit e.g. via '' | + | |
- | + | ||
- | '' | + | |
- | + | ||
- | A GUI should open and a choice of FPGA serial numbers should be displayed on the upper left corner. If the list is empty, either the loading of the FPGAs finished after powering the CPU unit or the PCIe driver did not install correctly. The FPGA serial numbers can be used to identify the AFC board you like to access. Choose a serial number and click // | + | |
- | + | ||
- | ==== 9.1.2 Register Access tab ==== | + | |
- | + | ||
- | The names and widths of the registers are read from an information memory region in the FPGA (see [[# | + | |
- | The //read// button reads all the status registers either once or continuously if the // | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | ==== 9.1.3 Data Acquisition tab ==== | + | |
- | + | ||
- | The names and widths of the signals connected to the eight observer multiplexer inputs are read from an information memory region in the FPGA (see [[# | + | |
- | The two combo boxes //Observer 0// and //Observer 1// determine which multiplexer inputs are selected for data acquisition. The combo box //Trigger// determines on which of the observer inputs the trigger will listen.\\ | + | |
- | The //Number Of Samples// entry determines how many samples will be stored after a trigger event when the //capture// button has been pressed. If the // | + | |
- | + | ||
- | The individual //Trigger Active (&)// check buttons define on which signals the trigger will listen. All of the enabled conditions have to become true for a trigger event.\\ | + | |
- | The trigger conditions for t = -1 and t = 0 contain the compare vectors of the two stage trigger which have to match in consecutive clock cycles. The //Trigger Mask (&)// defines on which bits of a signal the trigger will listen. | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | When the data acquisition completes, the open source waveform viewer GTKWave is called to display the captured data, which has been stored to a .vcd file before. | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | ==== 9.1.4 BPM Calculation tab ==== | + | |
- | + | ||
- | The result of the BPM algorithm and of the averaging are displayed in this tab. If the //read// button is pressed, the values from status registers 0 - 3 and 4 - 7 (see [[# | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | ==== 9.1.5 Peripherals Configuration tab ==== | + | |
- | + | ||
- | A configuration file with a special syntax can be loaded to configure the different peripheral devices documented in [[# | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | ==== 9.1.6 Gateware Information tab ==== | + | |
- | + | ||
- | Information about the gateware version is displayed in this tab. The URL of the remote server of the Git repository, the latest commit hash and the latest commit date are read from an information memory region in the FPGA (see [[# | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | < | + | |
- | + | ||
- | + | ||
- | + | ||
- | ===== 9.2 Test scripts ===== | + | |
- | + | ||
- | ==== 9.2.1 PCIe access test script ==== | + | |
- | + | ||
- | There is a PCIe access test script under '' | + | |
- | + | ||
- | ====== 10 Helper scripts ====== | + | |
- | + | ||
- | ===== 10.1 VHDL beautification ===== | + | |
- | + | ||
- | There is a script '' | + | |
- | + | ||
- | The script expects one parameter: '' | + | |
- | + | ||
- | The script applies several corrections and changes to the //Emacs// formatting result: | + | |
- | + | ||
- | * correction of the handling of the comparison operator '' | + | |
- | * correction of the handling of initializations like '' | + | |
- | * enforcing of spaces around the operators '' | + | |
- | * no indentation for closing brackets | + | |
- | * aligning of full comment lines to the indentation level of the following VHDL command | + | |
- | * indentation with tabs instead of spaces | + | |
- | + | ||
- | ===== 10.2 Remote power cycling of the CPU unit ===== | + | |
- | + | ||
- | Whenever the bitstream of an FPGA is reloaded, the CPU unit has to be rebooted via its Hot Swap Handle in order to establish a PCIe connection. A software reboot does not work. | + | |
- | + | ||
- | Alternatively, the CPU unit can be power cycled remotely via the MCH. | + |
- | + | ||
- | The script '' | + | |
- | + | ||
- | The script takes about 60 seconds to complete. | + | |
- | + | ||
- | ===== 10.3 Plotting of measurement data ===== | + | |
- | + | ||
- | The script '' | + | |
- | + | ||
- | ===== 10.4 Generation of a VHDL file for monitoring and control ===== | + | |
- | + | ||
- | The monitoring and control configuration of the gateware is defined by the configuration files '' | + | |
- | + | ||
- | The script '' | + | |
- | + | ||
- | The script is also executed by the gateware build flow documented in [[# | + | |
- | + | ||
- | ===== 10.5 Generation of documentation ===== | + | |
- | + | ||
- | ==== 10.5.1 PDF ==== | + | |
- | + | ||
- | There is a script '' | + | |
- | + | ||
- | ==== 10.5.2 Markdown ==== | + | |
- | + | ||
- | There is a script '' | + | |
- | + | ||
- | The result of //Pandoc// is postprocessed for multiple reasons: | + | |
- | + | ||
- | * conversion of the math syntax to Gitlab’s .md format | + | |
- | * corrections of the bibliography, | + | |
- | * corrections of the references to figures, tables and equations | + | |
- | * enumeration of chapters, sections and subsections | + | |
- | * adding of captions for figures and equations | + | |
- | * enumeration of figures, tables and equations | + | |
- | * generation of a table of contents | + | |
- | * implementation of citations | + | |
- | + | ||
- | Additional documentation which is not included in the LaTeX sources is appended from the file '' | + |
- | + | ||
- | ==== 10.5.3 DokuWiki ==== | + | |
- | + | ||
- | ====== 11 Programming and hardware configuration ====== | + | |
- | + | ||
- | ===== 11.1 Programming the gateware ===== | + | |
- | + | ||
- | ==== 11.1.1 Using a JTAG programmer ==== | + | |
- | + | ||
- | Before being able to access the FPGA you need to program the JTAG switch on the AFC board using a script from the // | + | |
- | + | ||
- | //Tools Run Tcl Script//: '' | + | |
- | + | ||
- | You should now see a // | + | |
- | Right click on it and choose //Program Device// | + | |
- | Choose the correct bitstream (.bit file) and press //OK//. The programming takes about one minute. | + | |
- | + | ||
- | ==== 11.1.2 Using a JTAG Switch Module ==== | + | |
- | + | ||
- | If there is a JTAG Switch Module (JSM) in the MicroTCA crate, the bitstream can also be programmed remotely via a so-called Xilinx Virtual Cable: | + | 
- | + | ||
- | * download '' | + | |
- | * download '' | + | |
- | * convert '' | + | |
- | * upload '' | + | |
- | * open Vivado Hardware Manager | + | |
- | * //Open Target New Target Next Local Server Add Xilinx Virtual Cable (XVC)// | + | |
- | * // | + | |
- | * Port: find correct port number in //NAT-MCH GUI JSM// | + | |
- | * // | + | |
- | * //Open target// | + | |
- | * you should see the FPGA now in Vivado Hardware Manager and can program it | + | |
- | + | ||
- | The results of the first four steps are persistent, so these steps only have to be executed once. | + | 
- | + | ||
- | ==== 11.1.3 Storing a bitstream persistently in the SPI Flash ==== | + | |
- | + | ||
- | There is a 256 MB SPI Flash memory on the AFCv3.1 board for persistent bitstream storage. | + | |
- | + | ||
- | === File format conversion === | + | |
- | + | ||
- | First you have to convert the bitstream (.bit) file to a .mcs file. There is a script in the // | + | |
- | + | ||
- | '' | + | |
- | + | ||
- | The .mcs file will be generated in the same folder as the .bit file. | + | |
- | + | ||
- | === Programming === | + | |
- | + | ||
- | Program the JTAG switch on the AFC board as described in [[# | + | |
- | Right click on it and choose //Add Configuration Memory Device// and choose // | + | |
- | + | ||
- | You should now see a // | + | |
- | + | ||
- | Right click on it and choose //Program Configuration Memory Device// | + | |
- | Choose the .mcs file you created before and press //OK//. Programming is slow and can take up to half an hour. | + | 
- | + | ||
- | ===== 11.2 Configuration of the MCH ===== | + | |
- | + | ||
- | ==== 11.2.1 Via the MCH’s web interface ==== | + | |
- | + | ||
- | === Base configuration === | + | |
- | + | ||
- | //MCH global parameter SSH access: enabled// | + | |
- | This will trigger SSH key generation which takes some minutes to complete. | + | |
- | + | ||
- | //PCIe parameter Upstream slot power up delay: 5 sec//\\ | + | |
- | Delay before the CPU unit powers up on start-up. To make sure that the bitstreams are loaded from Flash memory to the AFC’s FPGAs before the CPU unit boots, you might have to increase this value. | + | 
- | + | ||
- | //PCIe parameter PCIe hot plug delay for AMCs: 0 sec//\\ | + | |
- | Delay before the AFC boards will power up on start up. | + | |
- | + | ||
- | === Switch PCIe x80 === | + | |
- | + | ||
- | Set the CPU-Unit as upstream AMC source in ’Virtual Switch 0’: | + | |
- | + | ||
- | //PCIe Virtual Switches Upstream AMC: AMC1/ | + | |
- | (for CPU unit in AMC slot 1) | + | |
- | + | ||
- | Make sure you enable //PCIe downstream ’4..7’// | + | |
- | + | ||
- | ==== 11.2.2 Via USB ==== | + | |
- | + | ||
- | The most convenient way of configuring the MCH is via its web interface. If you have accidentally disabled the webserver, set an invalid IP or DHCP configuration or reset the MCH settings to default, you can access the MCH via a USB connection to the micro USB port on the left side of the front panel. | + | 
- | + | ||
- | On a Linux PC, connect a micro USB cable and check via '' | + | |
- | + | ||
- | Now typing '' | + | |
- | + | ||
- | ===== 11.3 Programming the MMC firmware ===== | + | |
- | + | ||
- | To program the MMC firmware into the LPC microcontroller, you need to install LPCxpresso, a proprietary software package from NXP. | + | 
- | + | ||
- | ==== 11.3.1 Installation of LPCxpresso on Linux ==== | + | |
- | + | ||
- | Download LPCxpresso from the NXP website [[# | + | |
- | + | ||
- | ==== 11.3.2 Programming ==== | + | |
- | + | ||
- | Disconnect the AFC board completely. The power for programming the microcontroller will come from the LPC-Link programmer. Connect and power the LPC-Link programmer via USB and connect the customized cable to the // | + | |
- | + | ||
- | Program the device via: | + | |
- | + | ||
- | '' | + | |
- | + | ||
- | '' | + | |
- | + | ||
- | You can find the openMMC firmware binary under '' | + | |
- | + | ||
- | ====== 12 Analog characteristics ====== | + | |
- | + | ||
- | ===== 12.1 ADC input filter ===== | + | |
- | + | ||
- | The FMC ADC boards were originally designed for very high input frequencies and are equipped with input filters that show a pronounced high pass characteristic. | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | The part labeled //TR1B BD0205F5050A00// | + | |
- | To use the FMC ADC boards in the Cryring BPM system, the baluns have to be replaced by more suitable components. | + | 
- | + | ||
- | Two approaches have been implemented (probably by Piotr Miedzik): | + | |
- | + | ||
- | - each balun is replaced by two wires | + | |
- | - each balun is replaced by two capacitors of probably 100 nF (hint in an old email) | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | The heatspreader under the bottom of the FMC ADC board has to be unscrewed to access the baluns. | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | Figure 12.4 shows the magnitude frequency responses of the original input filter and of the two modifications. The data for the diagram was obtained by applying a sine signal with an amplitude of 2 $`V_{pp}`$ from a signal generator and measuring the maximum amplitude swing of the raw ADC data. | + | 
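
As a rough illustration of how a measured swing becomes one point of such a magnitude response, the sketch below converts the peak-to-peak swing of raw ADC samples into decibels relative to a reference swing. The function names, the sample values and the choice of reference are assumptions made for this example; they are not taken from the project's scripts.

```python
import math


def peak_to_peak(samples):
    """Return the maximum amplitude swing (max - min) of raw ADC samples."""
    return max(samples) - min(samples)


def magnitude_db(swing, reference_swing):
    """Return the magnitude in dB of a swing relative to a reference swing."""
    if swing <= 0 or reference_swing <= 0:
        raise ValueError("swings must be positive")
    return 20.0 * math.log10(swing / reference_swing)


# Hypothetical raw ADC captures at two generator frequencies.
reference_capture = [-1000, 0, 1000, 0]   # frequency taken as 0 dB reference
attenuated_capture = [-100, 0, 100, 0]    # frequency where the filter attenuates

point_db = magnitude_db(peak_to_peak(attenuated_capture),
                        peak_to_peak(reference_capture))
print(f"{point_db:.1f} dB")
```

Repeating this for every generator frequency gives the points of one curve; which swing serves as the 0 dB reference is a free choice.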
- | + | ||
- | ====== 13 References ====== | + | |
- | + | ||
- | [1] A. Reiter, R. Singh: Comparison of beam position calculation methods for application in digital acquisition systems. //Nuclear Instruments and Methods in Physics Research Section A: Accelerators, | + | |
- | + | ||
- | [2] P. Miedzik, H. Bräuning, T. Hoffmann, A. Reiter, R. Singh: A MicroTCA based beam position monitoring system at CRYRING@ESR. //16th Int. Conf. on Accelerator and Large Experimental Control Systems, Barcelona, Spain//, October 2017, https:// | + | |
- | + | ||
- | [3] A. Reiter, W. Kaufmann, R. Singh, P. Miedzik, T. Hoffmann, H. Bräuning: The CRYRING BPM cookbook, March 2015, https:// | + | |
- | + | ||
- | [4] AMC FMC Carrier (AFC) Git repository, Open Hardware Repository, https:// | + | |
- | + | ||
- | [5] Silicon Labs: Timing part decoder web page, https:// | + | |
- | + | ||
- | [6] Silicon Labs: Si571 datasheet, https:// | + | |
- | + | ||
- | [7] Analog Devices: AD9510 datasheet, https:// | + | |
- | + | ||
- | [8] Renesas: ISLA216P datasheet, https:// | + | |
- | + | ||
- | [9] AFC v3.1 schematics, https:// | + | |
- | + | ||
- | [10] NXP: LPCxpresso download web page, https:// | + | |
- | + | ||
- | [11] FMC ADC 250M 16B 4ch schematics, https:// | + | |
- | + | ||
- | [12] Anaren: Balun BD0205F5050A00 datasheet, https:// | + | |
- | + | ||
- | ====== Additional yet unordered documentation ====== | + | |
- | + | ||
- | ===== Block diagram ===== | + | |
- | + | ||
- | {{: | + | |
- | + | ||
- | ===== Prerequisites ===== | + | |
- | + | ||
- | Xilinx Vivado 2019.2 or 2017.4 | + | |
- | + | ||
- | Set your preferred Vivado version in '' | + | |
- | + | ||
- | The IPs are currently only maintained for Vivado 2019.2. | + | |
- | + | ||
- | ===== Vivado GUI based build flow ===== | + | |
- | + | ||
- | If you intend to use the Vivado GUI proceed as follows: | + | |
- | + | ||
- | ==== Windows ==== | + | |
- | + | ||
- | * open Vivado GUI | + | |
- | * use the Tcl console at the bottom of the GUI to navigate to '' | + | 
- | * type '' | + | |
- | + | ||
- | There is one generated VHDL file '' | + | |
- | + | ||
- | ==== Linux (Windows description also works) ==== | + | |
- | + | ||
- | * navigate to the root folder of the repository in a terminal | + | |
- | * type '' | + | |
- | + | ||
- | This will open the Vivado GUI and set up a project, which can take some minutes.\\ | + | |
- | The project will be generated in the folder '' | + | |
- | + | ||
- | ===== Non GUI based build flow ===== | + | |
- | + | ||
- | For a completely automatic script based build flow without using the Vivado GUI proceed as follows: | + | |
- | + | ||
- | ==== Linux ==== | + | |
- | + | ||
- | * navigate to the root folder of the repository in a terminal | + | |
- | * type '' | + | |
- | + | ||
- | A project will be generated in the folder '' | + | |
- | If the build succeeds, the bitstream will be generated in the subfolder '' | + | 
- | + | ||
- | ===== GUI Simulation ===== | + | |
- | + | ||
- | * click on “Run Simulation” in the Vivado GUI (prerequisite: | + | |
- | + | ||
- | ===== Simulation times ===== | + | |
- | + | ||
- | The SDRAM interface IP needs an initial calibration process which finishes after about 120 us. The simulation time should be longer than that to see the actual system behavior, e.g. 200 us. | + | 
- | + | ||
- | ===== Non GUI Simulation ===== | + | |
- | + | ||
- | ==== Linux ==== | + | |
- | + | ||
- | * navigate to the root folder of the repository in a terminal | + | |
- | * type '' | + | |
- | * you will find the output files of the simulation in the folder '' | + | |
- | + | ||
- | ===== Differences between AFC version 2 and AFC version 3.1 ===== | + | |
- | + | ||
- | Both boards carry 2 GiBytes of DDR3-SDRAM, divided into four modules of 512 MiBytes each.\\ | + | 
- | The SDRAM model can be determined via the FBGA code printed on the modules using [[https:// | + | |
- | + | ||
- | === AFC version 2: === | + | |
- | + | ||
- | * FBGA code: D9PBC, translates to Micron MT41J512M8RA-125: | + | |
- | * operates at 1.5 V | + | |
- | + | ||
- | === AFC version 3.1: === | + | |
- | + | ||
- | * FBGA code: D9QBV, translates to Micron MT41K512M8RH-125 IT:E | + | |
- | * compatible with the older MT41J family, operates at 1.5 V or 1.35 V | + | 
- | + | ||
- | ===== Differences between FMC ADC 250 M 16B 4CH versions ===== | + | |
- | + | ||
- | The ‘LD1’ (v1.0) or ‘STATUS’ (v1.2 and v2.3) LED on the right of the FMC board front panel is connected differently on v1.0 than on v1.2 and v2.3. When the location constraints for v1.2 and v2.3 are used with a v1.0 board, the LED behaves as follows: | + | 
- | + | ||
- | wanted red → off\\ | + | |
- | wanted green → lights green\\ | + | |
- | wanted blue → lights red | + | |
- | + | ||
- | Besides that, there seem to be no differences in the connections to the FPGA. | + | |
- | + | ||
- | ===== Maximum achievable data rate to and from SDRAM ===== | + | |
- | + | ||
- | The gross data rate of the SDRAM interface is 800 MT/s with 32 bits/ | + | |
- | 125 MT/s * 16 bits/ | + | |
- | The SDRAM capacity of 2 GiBytes is sufficient to store the stream data of all eight ADCs for 1 second. | + | |
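
These figures can be checked with a few lines of arithmetic. The sketch below assumes 32 bits per SDRAM transfer, eight ADC channels delivering 16 bit samples at 125 MS/s each, and no protocol overhead; the variable names are made up for this example.

```python
# Assumed parameters (see the lead-in above).
SDRAM_TRANSFERS_PER_S = 800e6         # 800 MT/s
SDRAM_BITS_PER_TRANSFER = 32          # assumption: 32 bit wide data bus
ADC_SAMPLES_PER_S = 125e6             # 125 MS/s per ADC
ADC_BITS_PER_SAMPLE = 16
ADC_COUNT = 8
SDRAM_CAPACITY_BYTES = 2 * 1024**3    # 2 GiB

# Gross SDRAM bandwidth in bytes per second.
sdram_bytes_per_s = SDRAM_TRANSFERS_PER_S * SDRAM_BITS_PER_TRANSFER / 8

# Combined stream data rate of all ADCs in bytes per second.
adc_bytes_per_s = ADC_SAMPLES_PER_S * ADC_BITS_PER_SAMPLE * ADC_COUNT / 8

# How long the SDRAM can buffer the full ADC stream.
record_time_s = SDRAM_CAPACITY_BYTES / adc_bytes_per_s

print(f"SDRAM gross rate: {sdram_bytes_per_s / 1e9:.1f} GB/s")
print(f"ADC stream rate:  {adc_bytes_per_s / 1e9:.1f} GB/s")
print(f"Recording time:   {record_time_s:.2f} s")
```

Under these assumptions the ADC stream uses well below the gross SDRAM bandwidth, and the capacity covers slightly more than one second of data.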
- | + | ||
- | ===== Configuration of IDELAYS in FPGA logic to compensate ADC clock vs. ADC data misalignment ===== | + | |
- | + | ||
- | In the FPGA design there are configurable input delay modules for the clock and for the data input pins. By increasing the input delay of either a clock or of the associated data inputs, the alignment can be corrected in both directions. The input delays provide a 32 tap delay line with a configurable delay of 0 to 31 taps. | + | 
- | + | ||
- | TODO: Find a data sheet which provides the delay time of one tap. | + | |
- | + | ||
- | The ADCs offer programmable user patterns that can be sent in place of the ADC samples to check the correct timing of the digital interface.\\ | + | 
- | For finding the optimum delay values, the following procedure is applied: | + | |
- | + | ||
- | * The clock input delay is increased until the pattern begins to deteriorate and the delay index at which that happens is noted. | + | |
- | * After that the clock input delay is reset to 0 and the data delay is increased until the pattern begins to deteriorate. | + | |
- | * The optimum value is assumed to be the midpoint between these values. | + | |
- | + | ||
- | For an ADC clock frequency of 125 MHz the result is: | + | |
- | + | ||
- | FMC0: no deterioration at clock delay 0x1f but at data delay 0x06 → | + | |
- | + | ||
- | * ADC clock delay value: 0x0D | + | |
- | * ADC data delay value: 0x00 | + | |
- | + | ||
- | FMC1: no deterioration at clock delay 0x1f but at data delay 0x05 → | + | |
- | + | ||
- | * ADC clock delay value: 0x0D | + | |
- | * ADC data delay value: 0x00 | + | |
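
One way to read the midpoint rule that reproduces the values listed above is to place both delays on a single signed axis (clock delay positive, data delay negative) and take the centre of the working window. The sketch below assumes that "no deterioration at 0x1f" can be treated as a limit of 31 taps and that half taps are rounded up; both are assumptions, as are the function and parameter names.

```python
from typing import Tuple


def optimum_delays(clock_limit: int, data_limit: int) -> Tuple[int, int]:
    """Return (clock_delay, data_delay) at the centre of the working window.

    clock_limit: clock delay tap at which the pattern deteriorates
                 (31 if it never deteriorates, by assumption).
    data_limit:  data delay tap at which the pattern deteriorates.
    """
    # Signed axis: +clock_limit on one side, -data_limit on the other.
    signed_sum = clock_limit - data_limit
    # Midpoint of the window, rounding half taps up (assumption).
    midpoint = (abs(signed_sum) + 1) // 2
    if signed_sum >= 0:
        return midpoint, 0   # shift the clock, leave the data undelayed
    return 0, midpoint       # shift the data, leave the clock undelayed


for name, clock_limit, data_limit in (("FMC0", 0x1F, 0x06), ("FMC1", 0x1F, 0x05)):
    clock_delay, data_delay = optimum_delays(clock_limit, data_limit)
    print(f"{name}: clock 0x{clock_delay:02X}, data 0x{data_delay:02X}")
```

With the measured limits for FMC0 and FMC1 this yields a clock delay of 0x0D and a data delay of 0x00 in both cases.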
- | + | ||
- | ===== Software running on microTCA CPU unit ===== | + | |
- | + | ||
- | ==== Installation of Xilinx PCIe DMA driver ==== | + | |
- | + | ||
- | run '' | + | |
- | + | ||
- | ==== install necessary software ==== | + | |
- | + | ||
- | run '' | + | |
- | + | ||
- | This will: | + | 
- | + | 
- | * install Git | + | 
- | * install development tools needed for building kernel modules | + | 
- | * clone the Xilinx PCIe DMA driver Git repository | + | 
- | + | ||
- | ==== compile, install and load the PCIe DMA kernel module ==== | + | |
- | + | ||
- | run '' | + | |
- | + | ||
- | ==== additional documentation ==== | + | |
- | + | ||
- | You can find additional documentation in a README file in the Xilinx PCIe DMA driver Git repository: | + | |
- | + | ||
- | ‘dma_ip_drivers/ | + | |
- | + | ||
- | ===== FPGA Observer software for monitoring and controlling the FPGA gateware ===== | + | |
- | + | ||
- | run '' | + | |
- | + | ||
- | run '' | + | |
- | + | ||
- | '' | + | |
- | The raw monitoring data can be stored to a binary file, described in [[# | + | |
- | The observer software converts the binary .bin file to a .vcd file which can be read in waveform viewers like GTKWave. | + | |
- | + | ||
- | ==== GTKWave ==== | + | |
- | + | ||
- | GTKWave is available in the EPEL package repository for CentOS 7. Install it with: | + | 
- | + | ||
- | '' | + | |
- | + | ||
- | '' | + | |
- | + | ||
- | Copy ‘.gtkwaverc’ to your home directory for disabling GTKWave’s splash screen: | + | |
- | + | ||
- | '' | + | |
- | + | ||
- | Run GTKWave: | + | |
- | + | ||
- | '' | + | |
- | + | ||
- | The file ‘gtkwave_settings/ | + | |
- | + | ||
- | ===== PCIe access to SDRAM content ===== | + | |
- | + | ||
- | After loading the bitstream to the FPGA on the AFC board, the CPU unit has to be rebooted by pulling its hot swap handle out and pushing it in again. A Linux reboot via ‘sudo reboot’ does NOT work; it disables the PCIe connection to the AFC board. | + | 
- | + | ||
- | TODO: Find the reasons for that behaviour, maybe it has something to do with the BIOS settings or with the MCH PCIe settings. | + | |
- | + | ||
- | This behaviour does not occur with the LNLS 100 MHz design, which survives a Linux soft reboot. The 100 MHz design does not even need a reboot, since its PCIe driver works directly after programming the bitstream (different from the Xilinx XDMA driver). | + | |
- | + | ||
- | TODO: Check for differences in the settings of the PCIe Xilinx IP cores. LNLS’s 100 MHz design uses ‘7 Series Integrated Block for PCI Express’ which seems to be a subset of ‘DMA/ | + | |
- | + | ||
- | ===== Alternative to CPU unit reboot via Hot Swap Handle ===== | + | |
- | + | ||
- | The CPU unit can also be shut down and started again via the command line interface of the MCH: | + | |
- | + | ||
- | Connect to the MCH via '' | + | |
- | + | ||
- | '' | + | |
- | + | ||
- | '' | + | |
- | + | ||
- | '' | + | |
- | + | ||
- | The script [[../ | + | |
- | You might have to adapt the FRU number in the script to your needs, depending on the location of the CPU unit. | + | |
- | + | ||
- | ===== Installation of Xilinx XDMA driver ===== | + | |
- | + | ||
- | Install necessary packages and get the driver’s source code: | + | |
- | + | ||
- | '' | + | |
- | + | ||
- | Build and load the driver: | + | |
- | + | ||
- | '' | + | |
- | + | ||
- | The driver can be loaded again (e.g. after a reboot) by '' | + | |
- | + | ||
- | ===== Reading data from SDRAM on AFC board to a file ===== | + | |
- | + | ||
- | Example for reading 4096 bytes of data from address 0x00000000: | + | |
- | + | ||
- | '' | + | |
- | + | ||
- | The output file is of a binary format which you can display with | + | |
- | + | ||
- | '' | + | |
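
If no hex viewer is at hand, the content of such a file can also be inspected with a short Python sketch. This is not the command referred to above; the function name and the file name ''sdram_dump.bin'' in the usage comment are hypothetical.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as lines of 'offset  hex bytes', `width` bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        lines.append(f"{offset:08x}  {hex_part}")
    return "\n".join(lines)


# Small in-memory demonstration.
print(hex_dump(bytes(range(20))))

# Usage with a file read from the SDRAM (hypothetical file name):
# with open("sdram_dump.bin", "rb") as dump_file:
#     print(hex_dump(dump_file.read(4096)))
```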
- | + | ||
- | Reading the whole SDRAM content (2 GiB) to a file on the CPU-Unit’s hard disk takes about 17 seconds. | + | |
- | + | ||
- | ==== Creating a RAM disk for accelerating the reading ==== | + | |
- | + | ||
- | '' | + | |
- | + | ||
- | Now the reading of the whole SDRAM content takes about 5 seconds: | + | |
- | + | ||
- | '' | + | |
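
The two timings translate into the following approximate read throughputs. This is a plain arithmetic sketch; the function name is made up, and 2 GiB is taken as 2 * 1024^3 bytes.

```python
SDRAM_BYTES = 2 * 1024**3  # 2 GiB


def throughput_mib_per_s(num_bytes: int, seconds: float) -> float:
    """Return the average throughput in MiB/s."""
    return num_bytes / seconds / 1024**2


to_hard_disk = throughput_mib_per_s(SDRAM_BYTES, 17)  # about 17 s to hard disk
to_ram_disk = throughput_mib_per_s(SDRAM_BYTES, 5)    # about 5 s to RAM disk

print(f"hard disk: {to_hard_disk:.0f} MiB/s")
print(f"RAM disk:  {to_ram_disk:.0f} MiB/s")
```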
- | + | ||
- | ===== Dependency of MMC firmware and PCIe IP core settings ===== | + | |
- | + | ||
- | The clock frequency of the input clock to the PCIe IP core is different between the MMC firmwares of Creotech and LNLS: | + | |
- | + | ||
- | PCIE_CLK1_C (name on AFC schematic): 125 MHz (Creotech), 100 MHz (LNLS) | + | |
- | + | ||
- | There is a setting ‘Reference Clock Frequency’ in the Xilinx ‘DMA/ | + | |
- | At the moment it is set to 125 MHz to support Creotech’s original MMC firmware. | + | |
- | + | ||
- | To make this design work with LNLS’s MMC firmware, the setting has to be changed to 100 MHz. | + | |
- | + | ||
- | ===== LEDs on AFC and FMC board front panels ===== | + | |
- | + | ||
- | ==== LEDs driven by FPGA pins ==== | + | |
- | + | ||
- | There are two tricolor LEDs connected to FPGA pins: | + | |
- | + | ||
- | * ‘L3’ in the center of the AFC board front panel | + | |
- | * ‘LD1’ (v1.0) or ‘STATUS’ (v1.2) on the right of the FMC board front panel | + | |
- | + | ||
- | Each tricolor LED consists of three independent LEDs (red, green and blue). | + | |
- | + | ||
- | ==== LEDs driven by MMC ==== | + | |
- | + | ||
- | In Service (‘L1’, green)\\ | + | |
- | Alarm (‘L2’, red)\\ | + | |
- | Hot Swap (‘HS’, blue) | + | |
- | + | ||
- | Insertion of AFC board: | + | |
- | + | ||
- | ^ | + | |
- | | AMC inserted into chassis with handle open | + | |
- | | AMC handle closed (requests activation from chassis IPMI Controller) | + | |
- | | Activation granted and AMC powers up | Closed | + | |
- | + | ||
- | Removal of AFC board: | + | |
- | + | ||
- | ^ | + | |
- | | AMC handle pulled open (requests de-activation from chassis IPMI controller) | + | |
- | | | + | |
- | + | ||
- | Source: [[https:// | + | |
- | + | ||
- | ===== MCH PCIe status LEDs ===== | + | |
- | + | ||
- | ^ LED state ^ meaning | + | |
- | | off | + | |
- | | 1 blink/ | + | |
- | | 2 blinks/ | + | |
- | | | + | |
- | + | ||
- | Source: [[https:// | + | |
- | + | ||
- | ===== Settings of Gitlab CI / CD ===== | + | |
- | + | ||
- | * Use git clone to get the recent application code, otherwise the pipelines might fail during git fetch.\\ | + | |
- | Settings → CI / CD → General pipelines → Git strategy for pipelines: git clone | + | |
- | * Increase timeout to allow FPGA build to finish.\\ | + | |
- | Settings → CI / CD → General pipelines → Timeout: 6h | + | |
- | + | ||
- | ===== Run times from scratch ===== | + | |
- | + | ||
- | Core i5-8500, 2018, 6 cores / 6 threads, 3.0 - 4.1 GHz, CPU Mark: (11830 MT, 2405 ST), 32 GB RAM, SSD | + | |
- | + | ||
- | * simulation: 00:05:20 h | + | |
- | * FPGA build: 01:16:17 h | + | |
- | + | ||
- | Xeon W3505, 2009, 2 cores / 2 threads, 2.53 GHz, CPU Mark: (1842 MT, 1052 ST), 24 GB RAM, HDD | + | |
- | + | ||
- | * simulation: 00:14:16 h | + | |
- | * FPGA build: 02:54:53 h | + | |
- | + | ||
- | ===== JSM JTAG device numbering on PowerBridge 6 slot crate ===== | + | 
- | + | ||
- | JTAG Device | + | |
- | + | ||
- | ^ left slots ^ right slots ^ | + | |
- | | - (CPU) | | + | |
- | | | + | |
- | | | + | |
- | | ? | + | |
- | + | ||
- | ===== Differences in MMC firmwares of Creotech and LNLS ===== | + | |
- | + | ||
- | Creotech’s MMC firmware routes a 125 MHz clock to the PCIe reference clock input, whereas LNLS’s Open MMC firmware routes a 100 MHz clock to this pin.\\ | + | 
- | The frequency of sys_clk is 125 MHz for both.\\ | + | |
- | The Open MMC firmware forwards the signal of the reset button on the front panel to the FPGA pin AG26 after some seconds. Additionally, | + | |
- | + | ||
- | This gateware design is currently functional for the Creotech MMC firmware. For running it together with the Open MMC firmware, the PCIe reference clock frequency in the IP ‘pcie_dma_ip’ has to be changed to 100 MHz. | + | |
- | + | ||
- | ===== Handling of multiple AFC boards together with one CPU unit ===== | + | |
- | + | ||
- | The ‘xdma’ PCIe driver enumerates the detected PCIe connections randomly starting with ’/ | + | 
- | + | ||
- | ===== List of AFC v3.1 boards ===== | + | |
- | + | ||
- | ^ AFC serial number | + | |
- | | | + | |
- | | | + | |
- | | | + | |
- | | | + | |
- | | | + | |
- | | | + | |