# 16-point DCT as the spectral $\mathbf{1}^{st}$ level surface detector trigger in the Pierre Auger Observatory Z. SZADKOWSKI<sup>1,2</sup> <sup>1</sup>Department of Experimental Physics, University of Łódź, Pomorska 151, 90-236 Łódź, Poland. zszadkow@kfd2.phys.uni.lodz.pl **Abstract:** Hadron induced, very inclined EAS, starting their development early in the atmosphere produce narrow, relatively flat muonic fronts on the Pierre Auger detection level. The signatures of FADC traces (very short rise time with fast exponential attenuation) from water Cherenkov tanks can be used for their detection. Currently used triggers in the Pierre Auger surface detector (Threshold and Time over Threshold triggers) have been selected to detect showers with possibly wide range of distances from the core, angles and energies. However they are not optimized for horizontal and very inclined showers, interesting as potentially generated by neutrinos. The paper describes the implementation of 16-point discrete cosine transform algorithm into progressive Altera<sup>®</sup> FPGA, used in the 4th generation of the surface detector trigger. An application of DSP blocks significantly improves a performance and reduces the total resources utilization. # Introduction The Pierre Auger Observatory is a 3000 $km^2$ ground based detector located in Malargüe (Argentina) at 1400 m above the see level and dedicated to ultra high-energy cosmic ray detection. The surface detector of the Pierre Auger Observatory is a 1600 water Cherenkov tank array on a triangular 1.5 km grid. Each tank contains $12 m^3$ of super-pure water. Ultra-relativistic particles of air showers passing through the water generate the Cherenkov light, which is detected by three 9-inch photo-multiplier tubes (PMTs) placed on the top of each tank. Signals are extracted both from the anode and the last dynode of each PMT, amplified to achieve 15-bit dynamic range, filtered by a 5 pole anti-aliasing filters with a cutoff at 20 MHz and finally digitized at 40 MHz by 10-bit FADC [1]. Two different triggers are currently implemented at the $1^{st}$ level. The first uses a Time over Threshold (ToT) trigger, requiring that 13 bins in a 120 bins window be above a threshold of $0.2\ I_{VEM}$ in coincidence on 2 PMTs. The estimated current for a Vertical Equivalent Muon ( $I_{VEM}$ ) is the reference unit for the calibration of FADC traces signals. This trigger has a relatively low rate of about 1.6 Hz, which is the expected rate for dou- ble muons for the Auger tank. It is dedicated for selecting small but spread signals, typical for high energy distant EAS or for low energy showers, while ignoring single muons background. The $2^{nd}$ trigger is a 3-fold coincidence of a simple 1.75 $I_{VEM}$ threshold. This trigger is noisier, with a rate of $\sim$ 100 Hz, but it is needed to detect fast signals (~200 ns) corresponding to the muonic component generated by horizontal showers. However, currently used triggers are not dedicated to very inclined showers, because the threshold 1.75 $I_{VEM}$ is relatively high and the efficiency of detection decreases with the zenithal angle [2]. Those showers should be triggered using their signatures, i.e. a curvature of the shower front. Hadron induced showers with dominant muon component give early peak with a typical rise time of about 15 ns and decay time of the order of 80 ns [3]. The spectral trigger investigating on-line a contribution of spectral components may support the current pure threshold analysis. #### DCT vs. DFT A trigger based on DFT $\ [4]$ has already been implemented in the $3^{rd}$ generation of the Front End Board (FEB) based on Cyclone® Altera® chip [5]. However, for real signal $x_n$ , $\bar{X}_{\frac{N}{2}+k} = \bar{X}_{\frac{N}{2}-k}^*$ and $\frac{N}{2}$ spectral line of $\bar{X}_k$ , k=0,1,...,N-1 is lying on a symmetry axis: the real part is symmetric, the imaginary part is asymmetric. The useful information is contained only in $1^{st}$ $\frac{N}{2}+1$ spectral lines for k=0,1,...,N/2 corresponding to frequencies $f_k=k\cdot f_0=k\frac{1}{N\Delta t}$ , changing from zero to $\frac{f_{smpl}}{2}$ with $\frac{f_{smpl}}{N}$ grid. Discrete Cosine Transform (DCT-II) is defined as follows: $$\bar{X}_k = \alpha_k \sum_{n=0}^{N-1} x_n \cos\left(\frac{\pi}{N}(n+\frac{1}{2})k\right)$$ (1) where $\alpha_0 = \frac{1}{\sqrt{N}}$ and $\alpha_k = \frac{2}{\sqrt{N}}$ for $k \ge 1$ . DCT for real signal $x_n$ gives independent spectral coefficients for $\mathbf{k}=0,1,...,N$ -1, changing $f_k$ also from zero to $\frac{f_{smpl}}{2}$ , but with $\frac{f_{smpl}}{2N}$ grid. DCT vs. DFT gives twice better resolution. # General algorithm Splitting the sum (1) and redefining the indices as well as using a symmetry of the cosine function we can introduce the new set of variables. DCT coefficients can be separated for even and odd indices respectively: $$\bar{X}_{\binom{even}{odd}} = \alpha_k \sum_{n=\binom{0}{N-1}}^{\frac{N}{2} - \binom{1}{0}} A_n \cos\left(\frac{\pi}{N}(n + \frac{1}{2})k\right)$$ Let us notice that (2) for even indices has the same structure as (1) with only shorter range of indices. Recurrently we can introduce new sets of variables for the set of indices k=2p, where p is integer, till $k\leq N.$ In order to use symmetry of trigonometric functions in a maximal way, N should be a power of 2, similarly to Radix-2 approach used in FFT algorithm. If $N=2^q,$ recurrent minimization is possible till p=q. The twiddle factors for successive minimization steps m equal to $\cos\left(\frac{2\pi}{2^q}\frac{2^{p+m}}{2}\right)=-1$ , because the sum of step index m and range factor p is constant and equals to q. For the rest of indices twiddle factor depends on fractional angle $\alpha=\frac{\pi 2^{q-m}}{2N}$ . After the 1st step of minimization, the terms of the sum (2) for odd indices depends on only odd multiplicity of fractional angle $$\bar{X}_k = \alpha_k \sum_{n=N-1}^{N/2-1} A_n \cos\left(\frac{\pi}{2N}(2n+1)k\right) \quad (3)$$ Using a following trigonometric identity $$cos\alpha = \frac{1}{2cos\beta}(cos(\alpha + \beta) + cos(\alpha - \beta)) \quad (4)$$ the fractional angles can be increased by the factor of 2 for $\beta=\frac{k\pi}{2N}$ . Thus: $$\bar{X}_{k} = \frac{\alpha_{k}}{2\cos\left(\frac{k\pi}{2N}\right)} \sum_{n=N-1}^{N/2} A_{n} \times \left[\cos\left(\frac{k\pi(2n+1)}{2N}\right) + \cos\left(\frac{k\pi n}{N}\right)\right]$$ (5) Let us notice that: - 1). $cos(k\pi) = (-1)^k$ , for n = N-1, hence pure $A_n$ coefficient survives, - 2). $cos(\frac{k\pi}{2}) = 0$ , for $n = \frac{N}{2}$ because of odd k, - 3). the rest of indices appear in cosine terms twice in $A_{n+1}$ and $A_n$ coefficients, which allows introducing new set of variables with continuous range, which can be split again on even and odd parts. The above procedure can be repeated recurrently. #### **Timing** Trigger described in [4] based on 16-point FFT algorithm and was clocked with 40 MHz. Analyzing sliding window corresponded to 400 ns. 40 MHz clock was used in all three generations of the 1 st level surface detector trigger of the Pierre Auger Observatory [6, 7, 5]. 400 ns length of a sliding window is sufficient for the preliminary analysis of the muonic bump, especially for the "old" showers. The next $4^{th}$ generation of the Front-End being developed for the Infill Array (dedicated detectors on the Auger South with half a geometrical grid and targeted to lower EAS energy range) [8] as well as being a test platform for the Auger North will be clocked with at least 80 MHz (being developed 160 MHz). 16-point sampling would correspond to 200 ns or 100 ns respectively. A "length" of spikes coming from "old" showers is expected less than 200 ns. So, for 160 MHz sampling rate, when 16-point transform corresponds to 100 ns, an undersampling can be considered for an optimization. # 16-point algorithm The 16-point DCT algorithm has been implemented as the pipeline structure. The scaling procedure was used for odd indices of $X_k$ with the fractional angles $\beta = \frac{k\pi}{32}$ . For even indices $X_k$ can be expressed by variables $C_k$ from the $3^{rd}$ pipeline stage and scaling factors $S_k = \cos\left(\frac{k\pi}{16}\right)$ with the following factorization: $$V = \begin{bmatrix} 1 & 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & -1 \\ 1 & 1 & 0 & 0 & -1 & 1 \\ 1 & 1 & -1 & -1 & 0 & 0 \end{bmatrix} \begin{bmatrix} C_7 \\ S_4 C_5 \\ S_2 C_6 \\ S_6 C_4 \\ S_6 C_6 \\ S_2 C_4 \end{bmatrix}$$ $$(6)$$ A direct hardware implementation requires 5 multiplications. Well known and widely used in numerical calculations AAN algorithm [9], derived from 16-point DFT calculated for real samples, corresponds to the following factorization: $$V = \begin{bmatrix} 1 & 1 & 1 & 0 & -1 \\ 1 & -1 & 0 & -1 & 1 \\ 1 & -1 & 0 & 1 & -1 \\ 1 & 1 & -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} C_7 \\ S_4C_5 \\ (S_2 + S_6)C_6 \\ (S_2 - S_6)C_6 \\ S_6(C_6 - C_4) \end{bmatrix}$$ (7) and requires only 4 multiplications (compare Fig.2 in [9]). A minimization of multiplications amounts is one of a fundamental goal in long-term numerical calculations. Reduction of product terms significantly speed up sophisticated calculations, because a single multiplication requires several clock cycles of processor. Multiplications in powerful FPGA chips can be performed in very fast dedicated DSP blocks in a single clock cycle. Signals processed in parallel threads in a hardware implementation of a pipeline design have to be synchronized to each other. Pipeline approach requires additional shift registers for synchronization also for signal currently not being processed. However, such synchronization needs additional resources. | algo | chip | ALUT | regs | DSP | $f_{reg}$ | |-------|------|------|------|-----|-----------| | rithm | | /LUT | | | (MHz) | | oAAN | S | 175 | 195 | 8 | 300 | | D+E | C | 211 | 195 | 8 | 167 | | oAAN | S | 175 | 170 | 8 | 182 | | DE | C | 199 | 170 | 8 | 110 | | this | S | 155 | 139 | 10 | 310 | | paper | C | 178 | 139 | 10 | 180 | Table 1: Resouces utilization for various algorithms calculated $\bar{X}_4,...,\bar{X}_7$ implemented in EP2S15F484I4 (S) and EP2C35F484I8 (C). The $2^{nd}$ row corresponds to merging two routines D and E in a single pipeline stage. The rest of cases correspond to all pipelined stages. Coefficients in direct (6) approach can be calculated in 3 pipeline stages. Direct implementation of the pure AAN algorithm (7) requires two stages more (see Fig. 2 in [9]), which utilize additional resources of shift registers for synchronization for operations like: X(t+1) = X(t). In a numerical calculation in processors data are simply waiting for a next performance cycle. Additionally, the $lpm\_add\_sub$ megafunction from the Altera® library of parameterized modules (LPM) does not support an inversion of a sum i.e.: $C_4 = -(B_4 + B_5)$ . This operation should be performed in a cascade way by an adder and a sign inversion, otherwise performed in the same clock cycle would significantly slow down a global registered performance. The original AAN algorithm has been optimized for a FPGA environment. A simple implementation of a part of DCT8 algorithm for $\bar{X}_4,...,\bar{X}_7$ shows that a reduction of a single pipeline stage at the expense of one additional multiplication performed in a dedicated DSP block significantly decreases a logic elements utilization as well as significantly improves a registered performance. Table 1 shows that a merging of two routines (D and E) into a single clock one admittedly reduces some registers, but dramatically decreases a speed. Next pipeline stages has been developed according to the fast approach corresponding to the factorization (6). Relations between new variables and coefficients in subsequent pipeline stages are shown in Fig.1. Figure 1: Global pipeline internal structure of 16-point DCT. # **Pedestal** $$\bar{X}_k(ped) = \bar{X}_k + \alpha_k ped \sum_{n=0}^{N-1} F(k,n)$$ where $F(k, n) = cos\left(\frac{k\pi}{N}(n + \frac{1}{2})\right)$ Taking into account a symmetry of cosine func. $$\sum_{n=0}^{N-1} F(k,n) = \sum_{n=0}^{\frac{N}{2}-1} F(k,n) \left(1 + (-1)^k\right) = 0$$ for even k or: $$2 \sum_{n=0}^{\frac{N}{2}-1} F(k,n) \text{ for odd k.}$$ Repeating reccurently, we get finally $\frac{N}{2}=2$ and $k=0,\frac{N}{2}.$ For $k=\frac{N}{2}:\sum_{n=0}^{1}\cos\left(\frac{\pi}{2}(n+\frac{1}{2})\right)=0$ $\bar{X}_k$ are independent of pedestal for $k \geq 1$ . Daily temperature variation in the electronic box may reach more than $50\,^{\circ}\text{C}$ . It causes a variation of the pedestal of 4-5 ADC-counts and may significantly affect the trigger conditions especially for the ToT. Currently, the calibration channel modifies thresholds in order to keep required trigger conditions. The trigger independent of pedestal seems to be better approach than thresholds tuning via the slow calibration loop. # Acknowledgements This work was funded by the Polish Committee of Science under KBN Grant No. 2 P03D 001 24 # References - [1] Pierre Auger Collaboration, *Nucl. Instr. Meth.* **A523**, pp. 50-95, 2004. - [2] D. Allard et. al, in Proc. 29th ICRC, Pune, 2005 - [3] M. Aglietta, et. al, in Proc. 29th ICRC, Pune, 2005. - [4] Z. Szadkowski, Nucl. Instr. Meth., A560, pp. 309-316, 2006. - [5] Z. Szadkowski, K-H. Becker, K-H. Kampert, Nucl. Instr. Meth., A545, pp. 793-802, 2005. - [6] Z. Szadkowski, D. Nitz, Nucl. Instr. Meth., A545, pp. 624-631, 2005. - [7] Z. Szadkowski, Nucl. Instr. Meth., A551, pp. 477-486, 2005. - [8] Z. Szadkowski, to be submitted to *Nucl. Instr. Meth.* - [9] Y. Arai, T. Agui, M. Nakajima. *Trans. IEICE*, E-71, no. 11, pp. 1095-1097, 1988.