Students are required to hand in the exercises and questions below in the form of a SWAN notebook or pdf file (with code snippets) by Wednesday, May 2. Please send the material to Anna Sfyrla (anna.sfyrla@cern.ch) and T.J. Khoo (Teng.Jian.Khoo@cern.ch).
Last time, we familiarised ourselves with the procedures for fitting and estimating parameter values. We will now apply these techniques to the goal of hypothesis testing, that is comparing two hypotheses to determine which is preferred by the data. For now, we neglect systematic uncertainties, as these will be covered in the last lecture.
The statistical test will determine whether the data are consistent with the "null hypothesis"; a statistically significant difference is usually the interesting outcome, and is established if the data show much better consistency with the "alternative hypothesis".
We will carry out two hypothesis tests: a discovery test, quantifying the significance of any excess over the background-only expectation, and an exclusion test, setting confidence limits on the size of a possible signal.
Note that this notebook won't run all together this time; there are parts of the code for you to fill in (see "Q:").
As usual, we start by importing the ROOT and RooFit modules and extracting the sample data, but this time we want to use Sample1a.
import ROOT
from ROOT import RooFit
ROOT.enableJSVis()
inputfile = ROOT.TFile('files/ttreedata.root')
inputtree = inputfile.Get('Sample1a')
Nevents = inputtree.GetEntries()
c = ROOT.TCanvas('plot','plot',800,600)
Then, as before, we define the variable x and generate a dataset from the tree.
w = ROOT.RooWorkspace()
w.factory('x[0.,200.]')
x = w.var('x')
inputdata = ROOT.RooDataSet('data_1a','data_1a',inputtree,ROOT.RooArgSet(x))
Before manipulating any of the data this time, let's focus on the model of signal + background that we want to use. For a cut & count analysis, we have the following basic parameters: the observed event count Nobs, the number of signal events Nsig, and the number of background events Nbkg.
The data will tell us Nobs, which if we theorise correctly satisfies Nobs = Nexp = (Nsig+Nbkg). This number is Poisson-distributed, so we can construct the relevant pdf, building the sum Nexp as the Poisson mean.
w.factory('sum::Nexp(Nsig[0,-100.,100],Nbkg[100,0,500])')
w.factory('Poisson::pois(Nobs[100,0,500],Nexp)')
Now we define the selection (i.e. the range of x in which we will count events).
##### Set the values of SRmin and SRmax here
SRmin = 1
SRmax = 2
To extract the background estimate without biasing ourselves based on the possibility of observing a signal, we can fit the background distribution to sidebands. Here we generate the RooDataSet explicitly excluding the SR region whose boundaries you chose.
sidebanddata = ROOT.RooDataSet('sideband_1a','sideband_1a',inputtree,ROOT.RooArgSet(x),'x<{0} || x>{1}'.format(SRmin,SRmax))
xframe = x.frame()
sidebanddata.plotOn(xframe)
xframe.Draw()
c.Draw()
To carry out the background-only fit, we need to build the extended PDF, as we did for the first hands-on session. Define:
- the background pdf bkg and the extended background pdf bkg_ext, with decay constant tau
- the signal+background pdf splusb, reusing Nsig as we did for the cut-and-count model (this is reasonable if our signal region selects most of the events in the signal peak region)
- bkgnorm for the total number of background events in the entire range of x (remember that Nbkg is specifically the number of background events in the SR)
##### Use the workspace factory to generate the relevant pdfs
Now we retrieve these from the workspace as Python variables.
bkgpdf = w.pdf('bkg_ext')
sbpdf = w.pdf('splusb')
bkgnorm = w.var('bkgnorm')
As before, we can simply fit the plot, but now we need to specify the allowed fit ranges. We can attach "ranges" to the variable x, so that we can exclude portions of the spectrum, e.g. to avoid unblinding the SR.
Beware: running the fit more than once can corrupt the normalisation, which then increases every time the curve is plotted.
# Limit the fit range, so that the 0 values don't influence the fit
x.setRange('sideband_low',0.,SRmin)
x.setRange('sideband_high',SRmax,200.)
# Now we explicitly run an extended fit
fitresult = bkgpdf.fitTo(sidebanddata,RooFit.PrintLevel(-1),
RooFit.Range('sideband_low,sideband_high'),
RooFit.Extended())
print "Extracted decay constant tau={0:.3g}, total background normalisation bkgnorm={1:.2f}".format(w.var('tau').getVal(),bkgnorm.getVal())
# Add the data and fitted function to the plot
xframe = x.frame()
sidebanddata.plotOn(xframe)
# Plot function with dashed line in full fit range, and solid in fitted range (default)
bkgpdf.plotOn(xframe,RooFit.Range('Full'),RooFit.LineStyle(ROOT.kDashed))
bkgpdf.plotOn(xframe)
xframe.Draw()
c.Draw()
Having completed the sideband fit we can now estimate the normalisation in the signal region. To do so, we need to create an "integral" object over the region of interest, i.e. the SR -- so we need a range for that as well.
The RooFit syntax for generating integrals also requires us to specify over what range we want to normalise the integral (i.e. it will be 1 over this range). So we extract the fraction of the distribution that falls in the SR, and then are able to scale this to the total background normalisation that we got from the fit.
x.setRange('signalregion',SRmin,SRmax)
bkgint_SR = bkgpdf.createIntegral(ROOT.RooArgSet(x),ROOT.RooArgSet(x),'signalregion')
Nbkg = w.var('Nbkg')
Nbkg_cnc = bkgint_SR.getVal() * bkgnorm.getVal()
Nbkg.setVal(Nbkg_cnc)
print 'Integral normalised over x range: {0:.3g}, normalisation: {1:.2f}'.format(bkgint_SR.getVal(),bkgnorm.getVal())
print '*** Number of fitted background events in SR: {0:.2f}'.format(Nbkg.getVal())
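As a cross-check of the createIntegral logic, the SR fraction of an exponential pdf truncated to the fit range has a closed form that we can evaluate with the Python standard library alone. The values of tau, bkgnorm and the SR boundaries below are illustrative placeholders, not the results of the fit above:

```python
import math

def exp_fraction(tau, lo, hi, xmin=0.0, xmax=200.0):
    """Fraction of an exponential pdf, truncated to [xmin, xmax], falling in [lo, hi]."""
    norm = math.exp(-xmin / tau) - math.exp(-xmax / tau)
    return (math.exp(-lo / tau) - math.exp(-hi / tau)) / norm

tau = 50.0        # hypothetical decay constant
bkgnorm = 400.0   # hypothetical total background normalisation
frac = exp_fraction(tau, 60.0, 90.0)  # hypothetical SR boundaries
Nbkg_check = frac * bkgnorm  # scale the fraction to the fitted normalisation
print('SR fraction: {0:.3f}, Nbkg in SR: {1:.1f}'.format(frac, Nbkg_check))
```

This mirrors what bkgint_SR.getVal() * bkgnorm.getVal() computes numerically in RooFit.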
For the cut-and-count analysis, we of course need to set the number of signal events. To test whether we have discovered a new particle, we use the following hypotheses: the null hypothesis, that there is no signal (Nsig = 0), and the alternative hypothesis, that there is a signal (Nsig > 0).
Now, set Nsig according to the null hypothesis.
Nsig = w.var('Nsig')
Knowing how much background we expect in the signal region, we can now compare that to what we see in data (Nobs), by extracting the number of events in our input tree satisfying the SR cuts.
In the signal region, we can then compare the total number of observed events with the number of background events estimated from the fit, and from this comparison estimate the p-value.
inputtree.Draw('>>elist','x>{0} && x<{1}'.format(SRmin,SRmax))
elist = ROOT.gDirectory.Get('elist')
NSR = elist.GetN() # Note: this is just a plain integer, which we'll use to integrate for the p-value
print '*** Number of total (observed) events in SR:', NSR
Nobs = w.var('Nobs')
Nobs.setVal(NSR)
# We can do the following if Nobs > Nbkg
Nsig_cnc = Nobs.getVal() - Nbkg.getVal()
print 'A first estimate of the signal number of events: {0:.2f}'.format(Nsig_cnc)
Hopefully you got an encouraging result (i.e. a reasonably large Nsig value). But of course we wish to test this quantitatively with our Poisson background model, so let's conduct our hypothesis test.
In its simplest form, we can state the test in terms of the following question: what is the probability that the observed value Nobs originates purely from a statistical fluctuation of the background (the $p$-value)? This defines Nobs-Nbkg as the "test statistic".
To get a binary answer from our test, we have to establish the significance level $\alpha$ at which we will claim a statistically significant result. In HEP, the standard thresholds, expressed as equivalent Gaussian standard deviations, are 3$\sigma$ for evidence and 5$\sigma$ for observation.
To compute the (one-sided) $p$-value, we integrate the Poisson distribution corresponding to our null hypothesis (Nsig=0) from Nobs to infinity.
# Get the value of the Poisson distribution for Nbkg, just for plotting purposes
pois = w.pdf('pois')
Nobs.setVal(Nbkg.getVal())
pois_Nbkg = pois.getVal(ROOT.RooArgSet(Nobs))
Nobs.setVal(NSR)
# Set a range to integrate over
# For discovery, we test the hypothesis that the SR is only populated by background.
# The discovery p-value p0 is the probability that the background has fluctuated up to the observed value.
Nobs.setRange('greaterThanObs',NSR,ROOT.RooNumber.infinity())
p0 = pois.createIntegral(ROOT.RooArgSet(Nobs),'greaterThanObs')
z = ROOT.RooStats.PValueToSignificance(p0.getVal())
# Analytically, this would be erfInverse(1-2p0)*sqrt(2) to get the normalisation correct
print "Discovery p-value: {0:.3g}, significance = {1:.3f} sigma".format(p0.getVal(),z)
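The RooFit integral can be cross-checked with the Python standard library alone. The sketch below sums the Poisson pmf directly for the one-sided $p$-value and inverts the Gaussian significance numerically; the counts Nobs=130, Nbkg=100 are illustrative placeholders, not the fit results from above:

```python
import math

def poisson_pvalue(n_obs, mean):
    """One-sided p-value: P(N >= n_obs) for a Poisson with the given mean."""
    # Sum the pmf for k = 0 .. n_obs-1 via the recursion pmf(k+1) = pmf(k)*mean/(k+1),
    # then take the complement.
    cum = 0.0
    term = math.exp(-mean)  # pmf at k = 0
    for k in range(int(n_obs)):
        cum += term
        term *= mean / (k + 1)
    return 1.0 - cum

def significance(p):
    """Convert a one-sided p-value (p < 0.5) to a Gaussian z-value, i.e. solve
    p = 1 - Phi(z). The stdlib lacks erfinv, so invert erf by bisection."""
    target = 1.0 - 2.0 * p  # erf(z/sqrt(2)) = 1 - 2p
    lo, hi = 0.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erf(mid / math.sqrt(2.0)) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical counts, not the fit results:
p0 = poisson_pvalue(130, 100.0)
print('p0 = {0:.3g}, z = {1:.2f} sigma'.format(p0, significance(p0)))
```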
# Explicitly set the number of bins in the frame
# Otherwise the plot values can go wrong, because the histogram gets rebinned
obsframe = Nobs.frame(500)
Nobs.setRange('toshade',NSR,500)
pois.plotOn(obsframe)
# We draw the pdf in the integrated range, shaded to show the integral.
# We need to renormalise to the integral, otherwise the pdf normalisation
# will be adjusted to unity
pois.plotOn(obsframe,RooFit.FillColor(ROOT.kRed),
RooFit.DrawOption('BC'), RooFit.Range('toshade'),
RooFit.Normalization(p0.getVal()))
# Redraw the full pdf
pois.plotOn(obsframe)
obsframe.Draw()
NSRarrow = ROOT.TArrow(Nbkg.getVal(),pois_Nbkg*0.2,Nbkg.getVal(),0)
NSRarrow.Draw()
c.Draw()
In the above example, we didn't actually bother comparing the null hypothesis with an explicit alternative hypothesis. Given that we are just trying to reject the null hypothesis, what alternative hypothesis could we compare to?
One reasonable choice is -- whatever the data prefer. In other words, we can compare the likelihood under the assumption of no signal to that under the assumption that Nsig is equal to Nobs-Nbkg. Specifically, we can look at the distribution of the ratio of the two likelihoods -- this is the "likelihood ratio test". So we use the likelihood ratio $\Lambda(N_\mathrm{obs})/\Lambda(N_\mathrm{bkg})$ as the test statistic, rather than simply $N_\mathrm{obs} - N_\mathrm{bkg}$.
It turns out to be easier in RooFit to extract the negative log likelihood (NLL) than the likelihood function itself. Of course, we get back to the likelihood ratio by computing the difference in the logs ($\Delta$NLL). We can plot the NLL for our data as follows:
Nobs_bkg = ROOT.RooDataSet('Nobs_bkg', 'Nobs_bkg', ROOT.RooArgSet(Nobs))
Nobs_bkg.add(ROOT.RooArgSet(Nobs))
nll_cnc = pois.createNLL(Nobs_bkg)
Nsigframe = Nsig.frame()
# We set the plotted min to 0, as we're only interested in differences wrt the min NLL
nll_cnc.plotOn(Nsigframe,RooFit.ShiftToZero())
Nsigframe.SetMaximum(10)
Nsigframe.SetMinimum(-1)
Nsigframe.Draw()
c.Draw()
We have the NLL; now let's extract the $\Delta$NLL and determine the $p$-value for this. As we have determined that the signal can only be positive, we want the test to be one-sided, such that we do not disfavour the null hypothesis if we observe fewer events in the signal region than our background prediction.
For how to do this, see https://arxiv.org/abs/1007.1727. Referring to section 2.3, we should define our test statistic as $q_0 = -2 \ln \Lambda(0)$ when the fitted signal strength is > 0, and 0 otherwise, where $\Lambda(0)$ is the likelihood ratio assuming zero signal. The pdf of $q_0$ is approximately chi-square distributed with 1 degree of freedom (in the limit of large statistics).
Nsig.setVal(0)
nll_0sig = nll_cnc.getVal()
Nsig.setVal(Nsig_cnc)
nll_obs = nll_cnc.getVal()
# We divide by two because we assigned 50% of the probability to the case
# where the observation is no greater than the background prediction,
# and the chi2 probability is symmetric.
p0_nosyst = ROOT.TMath.Prob( 2*(nll_0sig-nll_obs), 1)/2
print "P-value based on DeltaNLL = {0:.3g}".format(p0_nosyst)
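For a single-bin counting experiment, the $\Delta$NLL has a closed form, so the asymptotic result can be cross-checked analytically with the standard library alone. The counts below are illustrative, not the fit results from above:

```python
import math

def q0_counting(n_obs, n_bkg):
    """Asymptotic discovery test statistic for one counting bin:
    q0 = -2 ln Lambda(0) = 2*(n_bkg - n_obs + n_obs*ln(n_obs/n_bkg)),
    and 0 when the observation does not exceed the background."""
    if n_obs <= n_bkg:
        return 0.0
    return 2.0 * (n_bkg - n_obs + n_obs * math.log(n_obs / n_bkg))

# Hypothetical counts, not the fit results:
n_obs, n_bkg = 130.0, 100.0
q0 = q0_counting(n_obs, n_bkg)
z = math.sqrt(q0)                        # significance in the asymptotic limit
p0 = 0.5 * math.erfc(z / math.sqrt(2.0))  # = TMath::Prob(q0, 1) / 2
print('q0 = {0:.2f}, z = {1:.2f}, p0 = {2:.3g}'.format(q0, z, p0))
```

The factor 1/2 in p0 plays the same role as the division by two in the RooFit computation above.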
Earlier, you defined a more specific signal model, which we can also use for hypothesis testing. How well do we expect this to behave compared to the cut & count analysis?
Let's now use the detailed signal+background model and fit this to the data.
sbpdf = w.pdf('splusb')
fitresult = sbpdf.fitTo(inputdata,ROOT.RooFit.Save(),ROOT.RooFit.PrintLevel(-1))
fitresult.Print()
Nsig_fit = w.var('Nsig').getVal()
We will now construct the NLL plot and compare it to the one we retrieved from the cut & count procedure.
nll_fit = sbpdf.createNLL(inputdata)
Nsigframe = Nsig.frame()
nll_cnc.plotOn(Nsigframe,RooFit.ShiftToZero(),RooFit.LineColor(ROOT.kRed))
nll_fit.plotOn(Nsigframe,RooFit.ShiftToZero())
Nsigframe.SetMaximum(10)
Nsigframe.SetMinimum(-1)
Nsigframe.Draw()
zeroline = ROOT.TLine(0,0,100,0)
zeroline.Draw()
c.Draw()
What if, instead of trying to establish how significant a signal is in the context of the background-only hypothesis, we have an expected signal, but wish to understand if this is larger than predicted due to the effects of some new physics?
For this we will determine confidence limits, i.e. the range in which the data are expected to fall in a fraction $\alpha$ of trials, where $\alpha$ is the chosen "confidence level". By convention, we typically "exclude" models for which the parameters are outside the 95% confidence level range.
In this case, where we assume new physics must have a positive contribution, our null hypothesis now becomes a signal of a given size, while the alternative hypothesis is a smaller signal strength. Often you will find that the limits are set on a signal strength parameter $\mu$, scaled relative to a nominal cross-section value (which may be the SM or some other specific model).
We can work mostly with the models that we have already built. First, we illustrate two ways to approach the problem using just the naïve Poisson integral:
1. Find the value of Nsig that is exceeded by only 5% of trials, assuming the observed Nsig to be the true value. (Especially in the case where we have systematics, we often call this a "model-independent upper limit".)
2. Scan the hypothesised Nsig and find the value for which only 5% of trials would give the observed Nsig. (This is a model-dependent limit.)
# Finding Nsig such that p(Nobs>Nbkg+Nsig|Nexp=Nobs)<5%
Nobs.setVal(NSR)
Nsig.setVal(Nsig_cnc)
Nbkg.setVal(Nbkg_cnc)
# Use numpy to generate a float range
# Obviously the upper limit on Nsig has to be larger than the observed Nsig...
Nsig_95 = Nsig_cnc
from numpy import arange
for Nsig_hypo in arange(int(Nsig_cnc),100.,0.01):
Nobs.setRange('greaterThanHypo',Nbkg_cnc+Nsig_hypo,ROOT.RooNumber.infinity())
p1 = pois.createIntegral(ROOT.RooArgSet(Nobs),'greaterThanHypo')
# Store the values of Nsig until we hit the magic 5% range
if p1.getVal()<0.05:
Nsig_95 = Nsig_hypo
break
obsframe = Nobs.frame(500)
Nobs.setRange('toshade',Nbkg_cnc+Nsig_95,500)
pois.plotOn(obsframe)
# We draw the pdf in the integrated range, shaded to show the integral.
# We need to renormalise to the integral, otherwise the pdf normalisation
# will be adjusted to unity
pois.plotOn(obsframe,RooFit.FillColor(ROOT.kRed),
RooFit.DrawOption('BC'), RooFit.Range('toshade'),
RooFit.Normalization(0.05))
# Redraw the full pdf
pois.plotOn(obsframe)
obsframe.Draw()
c.Draw()
print "95% of trials assuming Nsig equal to the observed value {0:.2f} have Nsig<{1:.2f}".format(Nsig_cnc,Nsig_95)
# Finding Nsig such that p(Nobs>Nbkg+Nsig|Nexp=Nbkg+Nsig)<5%
Nobs.setVal(NSR)
Nbkg.setVal(Nbkg_cnc)
# Use numpy to generate a float range
# Obviously the upper limit on Nsig has to be larger than the observed Nsig...
Nsig_95 = Nsig_cnc
from numpy import arange
for Nsig_hypo in arange(int(Nsig_cnc),100.,0.01):
Nsig.setVal(Nsig_hypo)
p1 = pois.createIntegral(ROOT.RooArgSet(Nobs),'greaterThanObs')
# Store the values of Nsig until we hit the magic 5% range
if p1.getVal()>0.95:
Nsig_95 = Nsig_hypo
break
obsframe = Nobs.frame(500)
Nobs.setRange('toshade',0,Nobs.getVal())
pois.plotOn(obsframe)
# We draw the pdf in the integrated range, shaded to show the integral.
# We need to renormalise to the integral, otherwise the pdf normalisation
# will be adjusted to unity
pois.plotOn(obsframe,RooFit.FillColor(ROOT.kRed),
RooFit.DrawOption('BC'), RooFit.Range('toshade'),
RooFit.Normalization(0.05))
# Redraw the full pdf
pois.plotOn(obsframe)
obsframe.Draw()
c.Draw()
print "Hypothesised Nsig above {1:.2f} is excluded at 95% CL (observed Nsig: {0:.2f})".format(Nsig_cnc,Nsig_95)
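The second scan can also be sketched with the Python standard library alone, summing the Poisson pmf directly instead of using RooFit integrals. The counts below are illustrative placeholders, not the fit results:

```python
import math

def poisson_cdf(n, mean):
    """P(N <= n) for a Poisson with the given mean, by direct summation."""
    cum, term = 0.0, math.exp(-mean)  # term starts at pmf(0)
    for k in range(int(n) + 1):
        cum += term
        term *= mean / (k + 1)
    return cum

def upper_limit(n_obs, n_bkg, step=0.01):
    """Scan the hypothesised signal upwards until
    P(N <= n_obs | mean = n_bkg + n_sig) < 5%, i.e. the 95% CL upper limit."""
    n_sig = max(n_obs - n_bkg, 0.0)  # the limit must exceed the observed signal
    while poisson_cdf(n_obs, n_bkg + n_sig) >= 0.05:
        n_sig += step
    return n_sig

# Hypothetical counts, not the fit results:
print('95% CL upper limit on Nsig: {0:.2f}'.format(upper_limit(130, 100.0)))
```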
Step 1: Load the Sample1b tree and generate a corresponding dataset, plotting this data (also defined in terms of the real variable x).
Step 2:
Step 3: Fit the Sample1b dataset to extract the background prediction for this SR, then with this plot the combined PDF vs the number of events observed in SR1b. Determine the number of signal events in SR1b, based on this background estimate.
Step 4: Use the createHistogram function of the pdf (https://root.cern.ch/doc/master/classRooAbsReal.html#a552a08367c964e689515f2b5c92c8bbe). This will return a 2D histogram that you can draw as usual in ROOT (to display this, use Draw("colz") to make a contour plot with colours).
Step 5:
Now we perform the hypothesis test.
We need to redefine our null hypothesis in this case, stating that Nsig=Nsig1b=0. For exclusion limits it is actually a little trickier, as they need a consistent signal definition across the two regions.
But for discovery, it's still quite easy for us to do the hypothesis test so long as we use the NLL. Simply integrating over the two Poisson distributions is more difficult, because one has to choose the region to integrate over, and it does not make sense to pick e.g. rectangular cuts.
Generate the NLL for the combined PDF, then draw it as a function of each of the two signal yield parameters, Nsig and Nsig1b.
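As a sketch of how the combination works: for independent regions the likelihoods multiply, so the NLLs add. The following standard-library example, with purely illustrative counts (not your fit results) and simplified counting-only models, evaluates the combined $\Delta$NLL for two regions:

```python
import math

def poisson_nll(n_obs, mean):
    """Negative log likelihood of one Poisson count (constant terms dropped)."""
    return mean - n_obs * math.log(mean)

def combined_nll(n_sig, n_sig1b, n_obs, n_obs1b, n_bkg, n_bkg1b):
    """For independent regions the likelihoods multiply, so the NLLs add."""
    return (poisson_nll(n_obs, n_bkg + n_sig)
            + poisson_nll(n_obs1b, n_bkg1b + n_sig1b))

# Hypothetical counts for the two signal regions:
n_obs, n_bkg = 130.0, 100.0
n_obs1b, n_bkg1b = 60.0, 40.0
# Delta-NLL between the null (no signal) and the best-fit signal yields
# (a Poisson NLL is minimised when the mean equals the observed count):
nll_null = combined_nll(0.0, 0.0, n_obs, n_obs1b, n_bkg, n_bkg1b)
nll_best = combined_nll(n_obs - n_bkg, n_obs1b - n_bkg1b,
                        n_obs, n_obs1b, n_bkg, n_bkg1b)
q0 = 2.0 * (nll_null - nll_best)
print('combined q0 = {0:.2f}'.format(q0))
```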
Step 6:
Step 7:
Determine the p-value and Gaussian significance a second time, now using toys instead of the asymptotic estimate. Think carefully about the appropriate initial values to use. How many toys do you need for a reliable p-value calculation? Does this result agree well with the asymptotic approach?
Note: for performance reasons, it may be good to delete the toy dataset and the associated NLL in each iteration of the loop.
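A minimal standard-library sketch of the toy-based idea for a single counting region, with illustrative counts (the full exercise should instead generate toys from the RooFit model and compare the test statistic distribution):

```python
import math
import random

def toy_p0(n_obs, n_bkg, n_toys=10000, seed=1):
    """Toy-based discovery p-value: generate Poisson toys under the
    background-only hypothesis and count how often the toy count is
    at least as large as the observation."""
    rng = random.Random(seed)

    def poisson_toy(mean):
        # Knuth's multiplication algorithm; fine for modest means.
        limit, k, prod = math.exp(-mean), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        return k

    n_extreme = sum(1 for _ in range(n_toys) if poisson_toy(n_bkg) >= n_obs)
    return n_extreme / float(n_toys)

# Hypothetical counts, not the fit results:
print('toy p0 = {0:.4f}'.format(toy_p0(130, 100.0, n_toys=20000)))
```

The statistical precision on the p-value scales as sqrt(p0/n_toys), so a 5 sigma claim (p0 ~ 3e-7) needs far more toys than a 3 sigma one.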