Statistics Hands-on session for Lecture 3

Students are required to hand in the exercises and questions below in the form of a SWAN notebook or pdf file (with code snippets) by Wednesday, May 16. Please send the material to Anna Sfyrla (anna.sfyrla@cern.ch) and T.J. Khoo. (Teng.Jian.Khoo@cern.ch).

Advanced hypothesis testing: systematic uncertainties and Look Elsewhere Effect

As you heard in the lecture, a serious attempt at hypothesis tests requires us to take into account a few more effects that can cause us to overestimate the statistical significance of a result:

  • Systematic uncertainties -- these can affect the normalisation of both signal and background as well as correlated information about different observations, whether these are different bins in a cut-and-count experiment, or different data points in an unbinned fit.
  • Look Elsewhere Effect -- the fact that we may have multiple trials of a given analysis increases the probability of observing a statistical fluctuation.

Today you will see how to incorporate treatments of these effects into the statistical models that we built last time.

In [ ]:
# Usual setup
import ROOT
from ROOT import RooFit
ROOT.gStyle.SetOptStat(0) # Disable stat box
ROOT.enableJSVis()
inputfile = ROOT.TFile('files/ttreedata.root')
inputtree = inputfile.Get('Sample1a')
Nevents = inputtree.GetEntries()
c = ROOT.TCanvas('plot','plot',800,600)

w = ROOT.RooWorkspace()
w.factory('x[0.,200.]')
x = w.var('x')
inputdata = ROOT.RooDataSet('data_1a','data_1a',inputtree,ROOT.RooArgSet(x))

Cut & count analysis with systematic uncertainty on the background

We reproduce the single-bin cut & count analysis from the previous demonstration. The first step is to do the sideband fit to extract the background estimate. This time, however, we want to also record how the number of estimated background events in the SR varies depending on the uncertainties on the fitted parameters.

In [ ]:
# Define sidebands & SR
SRmin = 35
SRmax = 50
x.setRange('signalregion',SRmin,SRmax)
x.setRange('sideband_low',0.,SRmin)
x.setRange('sideband_high',SRmax,200.)

# Fit background pdf to sideband data
w.factory('Exponential::bkg(x,tau[-0.1,-0.2,0.])')
w.factory('SUM::bkg_ext(bkgnorm[1.,0.,1000]*bkg)')
bkgpdf = w.pdf('bkg_ext')
sidebanddata = ROOT.RooDataSet('sideband_1a','sideband_1a',inputtree,ROOT.RooArgSet(x),'x<{0} || x>{1}'.format(SRmin,SRmax))
fitresult = bkgpdf.fitTo(sidebanddata,RooFit.PrintLevel(-1),
                         RooFit.Range('sideband_low,sideband_high'),
                         RooFit.Extended())

bkgnorm = w.var('bkgnorm')
bkgnorm_nom = bkgnorm.getVal()
bkgnorm_err = bkgnorm.getError()
print "Estimated total background events: {0:.2f} +/- {1:.2f}".format(bkgnorm_nom, bkgnorm_err)
tau = w.var('tau')
tau_nom = tau.getVal()
tau_err = tau.getError()
print "Background exponential display constant: {0:.2g} +/- {1:.2g}".format(tau_nom, tau_err)

bkgint_SR = bkgpdf.createIntegral(ROOT.RooArgSet(x),ROOT.RooArgSet(x),'signalregion')
Nbkg_vals = {"Nominal":     bkgint_SR.getVal()*bkgnorm_nom,
             "BkgNormUp":   None,
             "BkgNormDown": None,
             "TauUp":       None,
             "TauDown":     None
            }
print "Nominal SR Nbkg: {0:.2f}".format(Nbkg_vals["Nominal"])

We typically refer to the output we get from carrying out steps of our analysis (e.g. background estimates, parameter estimation from MLE) without taking into account systematic uncertainties as the "nominal" result, and then determine the size of our systematic uncertainties by comparing the nominal value with "systematic variations".

In this case, we could evaluate the systematic uncertainties on the background estimate by computing Nbkg where instead of using the nominal bkgnorm and tau parameter values, we add/subtract the fitted uncertainties. These are what we refer to as $\pm 1\sigma$ variations.

Q: Complete the Nbkg_vals dictionary by computing the $\pm 1\sigma$ variations on Nbkg independently for the uncertainties on tau and on bkgnorm.

HINT: You don't need to recreate the integral of bkgpdf for each variation. Try changing the value of tau by calling tau.setVal(...) and see how the return value of bkgint_SR.getVal() changes.

In [ ]:
# Up to you...
In [ ]:
for var,val in sorted(Nbkg_vals.iteritems()):
    percentdelta = 100.*(val/Nbkg_vals["Nominal"]-1)
    print "For variation '{0:12}', SR Nbkg = {1:.2f} (delta = {2:.2f}%)".format(var,val,percentdelta)
In [ ]:
diff_BkgNorm = max(Nbkg_vals["BkgNormUp"]-Nbkg_vals["Nominal"],Nbkg_vals["Nominal"]-Nbkg_vals["BkgNormDown"])
w.factory('prod::delta_BkgNorm(sigma_BkgNorm[{0}],theta_BkgNorm[0.,-5.,5.])'.format(diff_BkgNorm))
diff_Tau = max(Nbkg_vals["TauUp"]-Nbkg_vals["Nominal"],Nbkg_vals["Nominal"]-Nbkg_vals["TauDown"])
w.factory('prod::delta_Tau(sigma_Tau[{0}],theta_Tau[0.,-5.,5.])'.format(diff_Tau))

w.factory('sum::Nbkg_var(Nbkg[{0},0,200],delta_BkgNorm,delta_Tau)'.format(Nbkg_vals["Nominal"]))
w.factory('sum::Nexp(Nsig[0,-20.,100.],Nbkg_var)')
w.factory('Poisson::pois(Nobs[100,0,250],Nexp)')

Now, we carry on with the hypothesis test, but need to extend the PDF to incorporate nuisance parameters for the two uncertainty components we have identified.

To set up a convention, we label these nuisance parameters with "$\theta$", and define them such that a value of $\theta_\mathrm{syst} = \pm1$ corresponds to a $\pm1\sigma$ variation of Nbkg by $\sigma_\mathrm{syst}$, where "syst" can be "Tau" or "BkgNorm". The uncertainties we extracted above are pretty symmetric, so we can be a little conservative and define the $\sigma$ values to be the larger difference between the nominal value of Nbkg and the up/down variations.

Then, we define a poisson pdf like last time, but build the Nexp value from an expression that incorporates the $\theta$ nuisance parameters: $$ N_\mathrm{exp} = N_\mathrm{sig} + N_\mathrm{bkg} + \sum_\mathrm{syst} (\sigma_\mathrm{syst} * \theta_\mathrm{syst}), $$ where $N_\mathrm{bkg}$ of course refers to the nominal background estimate.

Instead of using absolute uncertainties, we could instead compute relative uncertainties on $N_\mathrm{bkg}$ and then apply them using the formula $$ N_\mathrm{exp} = N_\mathrm{sig} + N_\mathrm{bkg} \times \prod_\mathrm{syst} (1 +\sigma_\mathrm{syst}^\mathrm{rel} * \theta_\mathrm{syst}). $$

What remains missing from the model is to allow the nuisance parameters to vary. Recall from the lecture that the standard way to do so is to restrict each one to be normal-distributed (i.e. to follow a gaussian with mean 0 and width 1).

Q: Define a PDF (call it "pois_with_constraint") that applies the gaussian constraints to the two nuisance parameters that we defined above.

In [ ]:
# Up to you...

Now we proceed as before, extracting the Nsig estimate from the observed number of events.

In [ ]:
inputtree.Draw('>>elist','x>{0} && x<{1}'.format(SRmin,SRmax))
elist = ROOT.gDirectory.Get('elist')
NSR = elist.GetN() # Note: this is just a plain float, which we'll use to integrate for the p-value
print '*** Number of total (observed) events in SR:', NSR
Nobs = w.var('Nobs')
Nobs.setVal(NSR)
Nbkg = w.var('Nbkg')

# We can do the following if Nobs > Nbkg
Nsig_cnc = Nobs.getVal() - Nbkg.getVal()
print 'A first estimate of the signal number of events: {0:.2f}'.format(Nsig_cnc)

As discussed in the lecture, we can now use the Profile Likelihood to carry out statistical tests, much as we did with the NLL previously. However, the nuisance parameters inject some elasticity into the likelihood, allowing it to absorb deviations from the model parameters more easily.

As before, we create the NLL, but then proceed to produce the PL from the NLL (or rather, the negative Profile Log Likelihood, abbreviated PLL). To do this, we have to declare the parameters of interest to RooFit, i.e. state which parameters are not nuisance parameters to be profiled.

Is the plot of the PLL compared to the NLL w/o systematics as you would expect it to be?

In [ ]:
Nobs_bkg = ROOT.RooDataSet('Nobs_bkg', 'Nobs_bkg', ROOT.RooArgSet(Nobs))
Nobs_bkg.add(ROOT.RooArgSet(Nobs))
pois_with_constraint = w.pdf('pois_with_constraint')
nll_cnc = pois_with_constraint.createNLL(Nobs_bkg)

Nsig = w.var('Nsig')
Nobs.setVal(NSR)
Nsigframe = Nsig.frame()

# We declared the poisson+constraint PDF to have four parameters: Nsig, Nbkg, theta_BkgNorm, theta_Tau
# So declare the first two as POIs
pll_cnc = nll_cnc.createProfile(ROOT.RooArgSet(Nsig,Nbkg))

# We set the plotted min to 0, as we're only interested in differences wrt the min NLL
nll_cnc.plotOn(Nsigframe,RooFit.ShiftToZero())
pll_cnc.plotOn(Nsigframe,RooFit.LineColor(ROOT.kRed),RooFit.ShiftToZero())
Nsigframe.SetMaximum(10)
Nsigframe.SetMinimum(-1)
Nsigframe.Draw()
c.Draw()

As before, we can extract the p-values for the models with and without systematics. Now, we rely on the stronger version of Wilks's theorem, stating that the $\chi^2$ approximation is still valid in the case of the profile likelihood.

Note: Because evaluating the PLL requires a scan over the NPs, we need to reset the thetas to 0 if we want to evaluate the plain poisson NLL safely.

In [ ]:
w.var('theta_Tau').setVal(0)
w.var('theta_BkgNorm').setVal(0)
Nsig.setVal(0)
nll_0sig = nll_cnc.getVal()
pll_0sig = pll_cnc.getVal()

w.var('theta_Tau').setVal(0)
w.var('theta_BkgNorm').setVal(0)
Nsig.setVal(Nsig_cnc)
nll_obs = nll_cnc.getVal()
pll_obs = pll_cnc.getVal()

# We divide by two because we assigned 50% of the probability to the case
# where the observation is no greater than the background prediction,
# and the chi2 probability is symmetric.
p0_nosyst = ROOT.TMath.Prob( 2*(nll_0sig-nll_obs), 1)/2
z_nosyst = ROOT.RooStats.PValueToSignificance(p0_nosyst)
print "P-value based on DeltaNLL = {0:.3g} ==> {1:.3g} sigma".format(p0_nosyst,z_nosyst)

p0_syst = ROOT.TMath.Prob( 2*(pll_0sig-pll_obs), 1)/2
z_syst = ROOT.RooStats.PValueToSignificance(p0_syst)
print "P-value based on DeltaPLL = {0:.3g} ==> {1:.3g} sigma".format(p0_syst,z_syst)

The uncertainties we determined from the background fit were pretty small, and as a result don't have a very significant impact on the p-value.

Q: How big is the impact on the p-value and significance if we assume a larger uncertainty (e.g. set one of the sigma values to 10% of the background)?

Exercise 1:
It's important, as with other fits, to effectively test the pulls with toys. It's not only the POIs whose pulls are interesting. Generate toys and make histograms of the pulls for the NPs and verify that they are reasonable.

Exercise 2:
When considering systematic uncertainty treatments for multiple signal regions, we also need to consider whether the uncertainties are correlated, uncorrelated or anticorrelated. Build an analysis of a multiple SR bins as in the second exercise, but rather than using Sample1a and Sample1b, define a few (at least 2) cut & count SRs for different ranges in x. Then try a similar analysis to the single bin profile likelihood, and compare the results when you assume (a) uncorrelated systematics across all bins, and (b) correlated systematics across all bins.

This is a very essential, yet quite long exercise. We will get you started in what follows.

Here is the basic poisson product PDF for a three-bin signal region. It is of course up to you to choose where you would like to place the bin boundaries. Feel free to adjust variable ranges depending on which bin boundaries you choose. You could even consider automating the setup such that you could define an arbitrary set of bins.

Question: In a realistic experimental setup, what factors would you need to take into account to determine the minimum bin width that should be used?

In [ ]:
w.factory('sum::Nexp_1(Nbkg_1[50,0,100],Nsig_1[0,0,100])')
w.factory('sum::Nexp_2(Nbkg_2[50,0,100],Nsig_2[0,0,100])')
w.factory('sum::Nexp_3(Nbkg_3[50,0,100],Nsig_3[0,0,100])')

w.factory('Poisson::pois_bin1(Nobs_1[50,0,100],Nexp_1)')
w.factory('Poisson::pois_bin2(Nobs_2[50,0,100],Nexp_2)')
w.factory('Poisson::pois_bin3(Nobs_3[50,0,100],Nexp_3)')

w.factory('PROD::pois_multibin(pois_bin1,pois_bin2,pois_bin3)')

Then, as before, we extract the background estimates and bin counts for each of the signal region bins.

In [ ]:
binrange = [(35,40),(40,45),(45,50)]
Nsig_obs_bins = [0.,0.,0.]
for ibin in range(len(binrange)):
    binxlow,binxhigh = binrange[ibin]
    x.setRange('SR_bin{0}'.format(ibin),binxlow,binxhigh)
    bkgint = bkgpdf.createIntegral(ROOT.RooArgSet(x),ROOT.RooArgSet(x),'SR_bin{0}'.format(ibin))    
    bkgest = bkgint.getVal()*bkgnorm_nom
    
    inputtree.Draw('>>elist','x>{0} && x<{1}'.format(binxlow,binxhigh))
    elist = ROOT.gDirectory.Get('elist')
    bincount = elist.GetN()

    w.var('Nbkg_{0}'.format(ibin+1)).setVal(bkgest)
    w.var('Nobs_{0}'.format(ibin+1)).setVal(bincount)
    
    Nsig_obs_bins[ibin] = bincount - bkgest

Nobs_multibin = ROOT.RooDataSet('Nobs_multibin', 'Nobs_multibin', ROOT.RooArgSet(w.var('Nobs_1'),w.var('Nobs_2'),w.var('Nobs_3')))
Nobs_multibin.add(ROOT.RooArgSet(w.var('Nobs_1'),w.var('Nobs_2'),w.var('Nobs_3')))
nll_multibin = w.pdf('pois_multibin').createNLL(Nobs_multibin)

We can compute the deltaNLL and thereby the p-value and significance as before.

In [ ]:
for ibin in range(3):
    w.var('Nsig_{0}'.format(ibin+1)).setVal(0)
nll_multibin_0sig = nll_multibin.getVal()

for ibin in range(3):
    w.var('Nsig_{0}'.format(ibin+1)).setVal(Nsig_obs_bins[ibin])
nll_multibin_obs = nll_multibin.getVal()

dnll_multibin = nll_multibin_0sig - nll_multibin_obs
p0_multibin = ROOT.TMath.Prob( 2*(dnll_multibin), 1)/2
z_multibin = ROOT.RooStats.PValueToSignificance(p0_multibin)
print 'Discovery p-value for {0}-bin counting experiment = {1:.2g} (Z = {2:.2f})'.format(len(binrange),p0_multibin,z_multibin)

Now, it's your turn! Adapt the code above to do the following:

  • Modify the poisson PDFs to incorporate the bkgnorm and tau nuisance parameters as in the first part of the demo -- do this assuming uncertainties that are uncorrelated across bins, and again assuming uncertainties that are fully correlated across bins.
  • Create the profile likelihood and compute the p-value and significance with the two uncertainty correlation scenarios. Which assumptions cause the uncertainties to have a larger impact?
  • Generate toys and study the pulls of the various nuisance parameters that you have defined in the two cases of correlated and uncorrelated systematics. What do you observe?

Hint: Don't forget that you need to create new variable and pdf names for the cases with the nuisance parameters. It might be helpful to experiment in a standalone python script, or use another notebook where you can keep the code together. If you need to overwrite existing objects in the workspace, either restart the kernel, or call .Delete() on the object first.

End of exercise 2

Look Elsewhere Effect with bump-hunt

We want to return to the bump-hunt (gaussian + exponential fit) we did previously and compare the local and global significance of the observed bump in Sample1a. We will do this in several steps.

First, we set up the signal+background model for the "bump-hunt" analysis that we performed previously, but restrict ourselves to a scan only in the mass of the signal resonance, fixing the width to be a fraction of the mass.

In [ ]:
w.factory('prod::sigma(widthfrac[0.1],mu[40,0,200])')
w.factory('Gaussian::sig(x,mu,sigma)')
w.factory('SUM::splusb(Nsig*sig,bkgnorm*bkg)')
sbpdf = w.pdf('splusb')
fitresult = sbpdf.fitTo(inputdata,ROOT.RooFit.Save(),ROOT.RooFit.PrintLevel(-1))
fitresult.Print()
Nsig_fit = w.var('Nsig').getVal()

xframe = x.frame()
inputdata.plotOn(xframe)
sbpdf.plotOn(xframe)
sbpdf.plotOn(xframe,RooFit.LineStyle(ROOT.kDashed),RooFit.Components('bkg'))
sbpdf.plotOn(xframe,RooFit.LineColor(ROOT.kRed),RooFit.Components('sig'))
xframe.Draw()
c.Draw()

As usual, we extract the p-value and significance for the fitted signal. Remember, this is the local p-value for a single value of the mass parameter (i.e. the MLE).

In [ ]:
nll_fit = sbpdf.createNLL(inputdata)

Nsig.setVal(0)
nll_0sig = nll_fit.getVal()
Nsig.setVal(Nsig_fit)
nll_obs = nll_fit.getVal()

p0_fit = ROOT.TMath.Prob( 2*(nll_0sig-nll_obs), 1)/2
z_fit = ROOT.RooStats.PValueToSignificance(p0_fit)
print "P-value based on DeltaNLL = {0:.3g} ==> {1:.3g} sigma".format(p0_fit,z_fit)

We can scan the mass parameter over the full range and make the local p-value plot, to illustrate how many other significant bumps we might have found satisfying the constraints we placed on our signal. To do so, we need to make mu a constant, so it is not allowed to vary in the fit. We also take the absolute delta NLL, mainly because a negative value cannot be evaluated for a $\chi^2$ probability.

Note: If you are keen-eyed, you might have noticed that we let the signal range go a little bit negative. For a few mu values, this leads to the PLL printing errors because the PDF is undefined, but RooFit can basically cope.

In [ ]:
mu = w.var('mu')
mu.setConstant(True)
from numpy import arange

graph_plocal = ROOT.TGraph(200)
graph_zlocal = ROOT.TGraph(200)
for idx,mu_hypo in enumerate(arange(1.,201.,1.)):
    mu.setVal(mu_hypo)
    fitresult = sbpdf.fitTo(inputdata,ROOT.RooFit.Save(),ROOT.RooFit.PrintLevel(-1))
    Nsig_fit = w.var('Nsig').getVal()
    Nsig.setVal(0)
    nll_0sig = nll_fit.getVal()
    Nsig.setVal(Nsig_fit)
    nll_obs = nll_fit.getVal()
    dnll = abs(nll_0sig-nll_obs)
    p0_local = ROOT.TMath.Prob( 2*dnll, 1)/2
    z_local = ROOT.RooStats.PValueToSignificance(p0_local)
    graph_plocal.SetPoint(idx,mu_hypo,p0_local)
    graph_zlocal.SetPoint(idx,mu_hypo,z_local)

Now we draw the distribution of the local Z significance (feel free to draw the p-value as well if you find that more intuitive). The line at a chosen $Z_\mathrm{test}$ is marked.

In [ ]:
z_test = 1

zframe = ROOT.TH2F("zframe","zframe",1,0,200,1,0,7)
zframe.Draw()
graph_zlocal.SetLineWidth(2)
graph_zlocal.SetLineColor(ROOT.kBlue)
graph_zlocal.Draw("c")
Ztestline = ROOT.TLine(0,z_test,200,z_test)
Ztestline.SetLineWidth(2)
Ztestline.SetLineStyle(ROOT.kDashed)
Ztestline.SetLineColor(ROOT.kRed)
Ztestline.Draw()
c.Draw()

For the asymptotic trial factor estimates, we want to use: $$ N_\mathrm{trials} = 1 + \frac{1}{p_\mathrm{local}} \left\langle N_\mathrm{up} (Z_\mathrm{test}) \right\rangle \exp\left( \frac{Z_\mathrm{test}^2-Z_\mathrm{local}^2}{2} \right) $$ where $N_\mathrm{up}$ is the number of "up-crossings", i.e. the number of times $Z_\mathrm{local} > Z_\mathrm{test}$ (i.e. we count the number of excesses that exceed some test significance, and then use this to calibrate the trial factor.

In [ ]:
isAboveZtest = False
Nup = 0
# We only want to count the crossings, so need to test for when the Zlocal increases past Ztest
for ipoint in xrange(graph_zlocal.GetN()):
    z_local = graph_zlocal.GetY()[ipoint]
    if z_local > z_test:
        if not isAboveZtest:
            Nup += 1
            isAboveZtest = True
    else:
        isAboveZtest = False
print "Number of upcrossings for Ztest={0:.1f} is {1}".format(z_test,Nup)

import math
p0_global = p0_fit + Nup * math.exp( 0.5 * (z_test*z_test - z_fit*z_fit) )
z_global = ROOT.RooStats.PValueToSignificance(p0_global)
Ntrials = p0_global / p0_fit
print "Trials factor for local p0 = {0:.2g} (Z = {1:.2f}) is {2}".format(p0_fit,z_fit,Ntrials)
print "Global p0 = {0:.2g} (Z = {1:.2f})".format(p0_global,z_global)

Not a very impressive result after all! Try changing z_test, and see if the result changes. Also, switch to the high statistics data sample and repeat the process. How much does a factor of 10 increase in statistics improve the global p0? Does the LEE matter more for low- or high-significance results?

Hints: Remember, this is the file called "ttreedata_highstats.root", not "_hs.root". You may need to change some of the allowable ranges for a few fit parameters

Exercise 3:
Now generate background-only pseudodata and determine (a) how often you find an observation at least as large as that seen in "Sample1a", (b) how large the most significant observation is in 100, 1000, 10000 toys. How do you compare these observations with the trials factor we calculated?

Exercise 4:
Repeat the "bump-hunt" analysis using "Sample0" as the input dataset (this we know has no signal), relaxing the constraint on the gaussian width. What is the largest significance bump that you observe? *Bear in mind that you will have to allow both $\mu$ and $\sigma$ to vary across the full range that is being studied*.

Bonus Exercise:
Evaluate the trials factor for the width-agnostic bump hunt, using the asymptotic results for the 2D parameter scan. *This will really be quite a long exercise, mostly due to the computation of the Euler characteristic, so it is really optional.*