Compression algorithm

What follows is only a first attempt at defining this very important algorithm. I appreciate feedback (mailto Nicolas.Produit@cern.ch), and to give you the means to test your own ideas I am releasing the most representative data that we currently have.

Description of the algorithm

The programs themselves have the last word, but I will try to explain the algorithm anyway:
The voltage of the shaped signal of each strip is proportional to the charge deposited in the strip. It is read out by a 12 bit ADC. Let us call it x(i,n), where i is the strip and n is the event number. The signal sits on a baseline that depends on the input stage of the ADC and on the particular strip. We try to set this baseline at around 500 ADC counts. We call pedestal, ped(i), the ADC readout value with no signal, averaged over n.

The readout also has two different noise contributions for normal strips. The first is a noise that is correlated across all strips of a VA chip (we think it is correlated across the whole hybrid, but we always treat it chip wide because each chip can have a different gain; perhaps another algorithm could be designed that uses the hybrid wide correlation). We call it common noise, cn(l,n), where l is the VA number and n the event number. The second is a strip noise, which we call sigma(i), where i is the strip number.

Some strips are abnormal because of: a bad channel in the VA chip, bad bonding to the detector, a strip with high reverse current, a strip near a high reverse current strip, a strip near the border of the chip, a short to a neighbour.

The calibration procedure has the job of measuring for each strip ped(i) and sigma(i), and for each VA the typical baseline (<cn>(l)) and its fluctuation (sig(cn)(l)), and of determining which strips are good and which are bad. A very useful measurement would be the energy deposition to ADC conversion factor (we will call it gain) for each strip. We plan to measure it in AMS, but for the moment I have no experience with it and we will consider it constant for all good strips.

Bad strips normally have a higher noise (or a very small one for unbonded strips) and their amplification of common noise can be vastly different from that of good strips. It is therefore very important to know which strips are bad if we want to get rid of the common noise. We estimate the common noise by taking a chip wide mean of the readout minus pedestal, x(i,n)-ped(i), over all good strips.
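The common noise estimate described above can be sketched in a few lines of Python. This is only an illustration, not the code from the distributed programs: the function names, the list-based layout (one chip at a time) and the choice to return 0 when a chip has no good strips are all assumptions made for the example.

```python
def common_noise(x, ped, good):
    """Mean of x(i,n) - ped(i) over the good strips of one VA chip.

    x    : raw ADC values of the chip's strips for one event
    ped  : pedestal of each strip
    good : True for good strips, False for bad ones
    """
    diffs = [xi - pi for xi, pi, g in zip(x, ped, good) if g]
    if not diffs:
        # Assumption for this sketch: with no good strip, apply no correction.
        return 0.0
    return sum(diffs) / len(diffs)


def subtract_pedestal_and_cn(x, ped, good):
    """Return x(i,n) - ped(i) - cn(l,n) for every strip of the chip."""
    cn = common_noise(x, ped, good)
    return [xi - pi - cn for xi, pi in zip(x, ped)]


# Toy event (made-up numbers): eight strips, pedestal 500, common shift of
# +4 counts, and one bad strip reading far from its pedestal.
x = [504, 504, 504, 504, 504, 504, 504, 900]
ped = [500] * 8
good = [True] * 7 + [False]
cn = common_noise(x, ped, good)
signal = subtract_pedestal_and_cn(x, ped, good)
```

The bad strip is excluded from the mean but still gets the chip's common noise subtracted, so its value is available afterwards even though it never contributes to the estimate.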
If we leave a bad strip in the mean, it may well be that the contribution from this single strip completely dominates the mean.

The compression algorithm tries to find clusters. We know that the charge deposition of a MIP is mainly in one or two strips (this depends on the readout pitch, so it can vary from one place to another in the detector). The current algorithm does the following: starting from the 12 bit value read from the strip, we subtract the pedestal, then compute the common noise for the chip and subtract it. We call this quantity signal:
signal(i,n)=x(i,n)-ped(i)-cn(l,n) where l is the VA corresponding to i.
A good strip with signal(i,n) > 5 sigma(i) is the seed of a cluster. We add to the cluster adjacent strips that have signal(i,n) > 3 sigma(i). The sum of the signal over the cluster is the integral; the integral divided by the square root of the sum of sigma(i)**2 over the cluster is the signal over noise. We also compute the center of gravity as the coordinate of the hit. We can use the signal over noise to further select clusters and the integral to measure dE/dx (if we knew the gain we could weight the integral by the gain).

The clustering algorithm updates the pedestal after each event by feeding a fraction of the signal into the pedestal if the signal is not too big. This seems useful because we saw strips with very slow time variation. Especially after power up we can see a pedestal variation of -10 counts over 10 minutes, after which it stabilises. So in fact, in the formula for signal(i,n) above, ped is ped(i,n).

You can find here the full distribution of the programs that we use on the PC to evaluate the hybrid. You can find here files from a real hybrid/detector (beware that the code mentioned above has evolved a lot since those files were made):

calib.cal: calibration file
enoise.lis: event file with no source ~1000 events
cnoise.lis: clustering result of enoise.lis
esignal.lis: event file with Sr90 source ~2000 events random trigger
csignal.lis: clustering result of esignal.lis
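For readers who want to experiment with these files, the clustering and pedestal update steps described earlier can be sketched as below. This is a hedged illustration, not a transcription of treat.pas. The function names, the dictionary layout, the signal weighting of the center of gravity, and the values of `fraction` and `max_sigmas` are assumptions. The text does not say whether neighbour strips must be good or whether a cluster may cross a chip boundary, so the sketch requires only the seed to be good and ignores chip boundaries.

```python
import math


def find_clusters(signal, sigma, good, seed_cut=5.0, neighbour_cut=3.0):
    """Seed on good strips above seed_cut*sigma, grow over adjacent strips
    above neighbour_cut*sigma, and return one dict per cluster."""
    n = len(signal)
    used = [False] * n
    clusters = []
    for i in range(n):
        if used[i] or not good[i] or signal[i] <= seed_cut * sigma[i]:
            continue
        # Grow to the left, then to the right, while strips pass the cut.
        first = i
        while (first > 0 and not used[first - 1]
               and signal[first - 1] > neighbour_cut * sigma[first - 1]):
            first -= 1
        last = i
        while (last < n - 1 and not used[last + 1]
               and signal[last + 1] > neighbour_cut * sigma[last + 1]):
            last += 1
        strips = range(first, last + 1)
        for k in strips:
            used[k] = True
        integral = sum(signal[k] for k in strips)
        noise = math.sqrt(sum(sigma[k] ** 2 for k in strips))
        clusters.append({
            "first": first,
            "last": last,
            "integral": integral,
            "s_over_n": integral / noise if noise > 0 else float("inf"),
            # Assumption: center of gravity weighted by the strip signal.
            "cog": sum(k * signal[k] for k in strips) / integral,
        })
    return clusters


def update_pedestals(ped, signal, sigma, fraction=0.05, max_sigmas=3.0):
    """Feed a fraction of the signal into the pedestal when the signal is
    small. Both fraction and max_sigmas are hypothetical values."""
    return [p + fraction * s if abs(s) < max_sigmas * sg else p
            for p, s, sg in zip(ped, signal, sigma)]


# Toy example with made-up numbers: sigma of 2 counts on every strip.
signal = [0.5, -1.0, 7.0, 30.0, 8.0, 1.0, 0.0, 0.0]
sigma = [2.0] * 8
good = [True] * 8
clusters = find_clusters(signal, sigma, good)
new_ped = update_pedestals([500.0] * 8, signal, sigma)
```

In the toy example strip 3 is the only seed, and strips 2 and 4 join it because they exceed 3 sigma. A further selection on signal over noise can then be applied to the `s_over_n` field.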

The format of an event file is one strip per line (one 12 bit number). Each event is 128 lines long. There is no separation between events. For the format of the cluster files, look in treat.pas at the routine dump_cluster. All files are for 2 v; cluster files have a signal over noise cut (sovern>5). You can find here some plots and kumacs to interpret calibration and cluster files.
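A minimal sketch of a reader for the event file layout described above follows. It assumes the numbers are written as decimal ASCII integers, which the description does not state explicitly. The names `parse_events` and `read_event_file` are made up for this example.

```python
def parse_events(lines, strips_per_event=128):
    """Group a flat sequence of text lines (one ADC value per line) into
    events of strips_per_event values each."""
    # Assumption: values are decimal integers; blank lines are skipped.
    values = [int(line) for line in lines if line.strip()]
    for v in values:
        if not 0 <= v <= 4095:
            raise ValueError("value %d does not fit in 12 bits" % v)
    if len(values) % strips_per_event != 0:
        raise ValueError("file does not contain a whole number of events")
    return [values[k:k + strips_per_event]
            for k in range(0, len(values), strips_per_event)]


def read_event_file(path, strips_per_event=128):
    """Read an event file such as enoise.lis and return a list of events."""
    with open(path) as f:
        return parse_events(f, strips_per_event)


# Synthetic input standing in for a real file: two events of 128 strips.
fake_lines = [str(k) + "\n" for k in range(256)]
events = parse_events(fake_lines)
```

With the real data one would call `read_event_file("enoise.lis")`. Each returned event is a list of 128 raw ADC values, to be split per VA chip before the common noise is computed.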