MSSM Higgs to Tau Tau Analysis
The first step is to get the ROOT files containing the shapes of the signal and background processes, as well as the shape uncertainties. This is done using the script scripts/makePlots_datacardsMSSMHtt.py (TODO: This doesn't work out-of-the-box). The input required for this is the output from running Artus. Note that the discriminating variable used in the HTauTau MSSM analysis is mttot, so be sure to set this as well when producing your own shape files.
Alternatively, the files produced by other groups are available here on GitLab. You will need to get this repository anyway, as it also contains files about the MSSM scenarios, which are needed later. To copy it into your workspace, use:
git clone https://gitlab.cern.ch/cms-htt/MSSM-Full-2016.git shapes
You should put this "shapes" folder in the "CombineHarvester/MSSMFull2016/" folder (which only exists in the "MSSMFull2016-dev" branch of CombineHarvester).
CombineHarvester is required for all of the following steps. It also provides a list of tutorials on how to use it for different purposes. These can be found here (though some of them might be outdated). Instructions specifically on the MSSM workflow are also found here.
The production of the (2016 MSSM) datacards is done by running MorphingMSSMFull2016. There are many options which can be set here. Here is an explanation of the most important ones:
- "input_folder_xy": For the channel xy, use the shape files of the group set by this parameter. The default values are the same ones used for the official analysis. If you want to use your own shape files, make a new folder in "CombineHarvester/MSSMFull2016/shapes/", place the shape-file in that folder and set the corresponding parameter to the name of the folder.
- "postfix": This is the name of the discriminating variable, which also appears in the names of the shape files. Set it to "-mttot".
- "real_data": If this is set to "false" (which is the default), the data is replaced with an Asimov dataset.
- "mass": Despite its name, this does not set a mass value. If it is left empty (the default), the datacards will contain contributions from each of the neutral MSSM bosons h, H and A. If it is set to "MH", the contributions are combined into one single signal H. This is used for model-independent limits.
- "SM125": Setting this parameter to "bkg_SM125", "signal_SM125" or "" (default) determines whether the contribution from the SM Higgs is added to the background, added to the signal, or not included at all.
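As a rough illustration of how the options above fit together, the following Python sketch assembles a MorphingMSSMFull2016 command line. The option names are the ones listed above; the "--name=value" syntax and the folder name "MyGroup" are assumptions made for this example, so check them against the help output of the executable before relying on them.

```python
import shlex


def morphing_command(postfix="-mttot", real_data=False, mass="", sm125="",
                     input_folders=None):
    """Assemble a MorphingMSSMFull2016 command line (assumed --name=value syntax)."""
    args = [
        "MorphingMSSMFull2016",
        "--postfix=" + postfix,
        "--real_data=" + ("true" if real_data else "false"),
    ]
    # An empty "mass" keeps h, H and A separate; "MH" merges them into one signal.
    if mass:
        args.append("--mass=" + mass)
    # An empty "SM125" leaves the SM Higgs contribution out entirely.
    if sm125:
        args.append("--SM125=" + sm125)
    # One "input_folder_<channel>" option per channel with custom shape files.
    for channel, folder in sorted((input_folders or {}).items()):
        args.append("--input_folder_{}={}".format(channel, folder))
    return shlex.join(args)


# "MyGroup" is a hypothetical folder inside CombineHarvester/MSSMFull2016/shapes/.
print(morphing_command(real_data=True, mass="MH", input_folders={"mt": "MyGroup"}))
```

The printed string can be pasted into the shell. Keeping the options in one function makes it easier to produce the model-independent ("MH") and model-dependent datacards with otherwise identical settings.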
The output you will get from running MorphingMSSMFull2016 will be datacards for each channel and each category within that channel. These are divided into folders. For example, the "mt" folder contains all datacards for the mt channel, while the "htt_cmb_btag_13TeV" folder contains the datacards of all channels, but only from the btag category. This means you will find copies of the same datacard in several folders.
Unless you want to work with the output of only one category of one channel, you need to combine many datacards into a single one. This is done with the "T2W" method of combineTool.py. Use:
combineTool.py -M T2W -i output/mssm_run2/cmb --cc
This will combine all datacards (because the folder "cmb" contains all of them) into a file output/mssm_run2/cmb/combined.txt.cmb. You might want to rename that file into something more obvious:
mv output/mssm_run2/cmb/combined.txt.cmb output/mssm_run2/htt_cmb_13TeV.txt
(Actually you can just set the output name in the combineTool.py step after "--cc", but for some reason that doesn't work for me)
The next step is to turn the datacard into a workspace. In this step you will also set the MSSM scenario you want to use. This will scale the cross sections and branching ratios of the signal processes for each mA and tanb to their values in the chosen scenario. The command used to do this is called text2workspace.py. As an example, this command sets the mhmodp scenario on the datacard:
text2workspace.py -b output/mssm_run2/htt_cmb_13TeV.txt -o output/htt.output.mssm.root --default-morphing shape2 -P CombineHarvester.CombinePdfs.MSSM:MSSM --PO filePrefix=$CMSSW_BASE/src/CombineHarvester/MSSMFull2016/shapes/Models/ --PO modelFiles=13TeV,mhmodp_mu200_13TeV.root,1
As input you give the datacard; "-b" requests a binary workspace in a ROOT file and "-o" sets the name of the output file. The option "-P" refers to the model you want to apply. In this case, the model named "MSSM" in the file CombineHarvester/CombinePdfs/python/MSSM.py is chosen. This model requires additional parameters, which are given with "--PO". "filePrefix" refers to the location of the MSSM scenario files. The files that you find in the GitLab repository above are the same ones you can also find here on the official TWiki page. In the "modelFiles" parameter you specify the energy, the name of the file of the MSSM scenario you want to use, and the version number (leave this at "1" for all 13/14 TeV analyses).
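If you want to build workspaces for several scenario files, it may help to generate the command instead of editing it by hand. The sketch below only rearranges the arguments of the command shown above; the function name and its parameters are made up for this example and are not part of CombineHarvester.

```python
def text2workspace_args(datacard, workspace, model_dir, energy, model_file,
                        version="1"):
    """Return the text2workspace.py argument list for one MSSM scenario file."""
    # modelFiles has the form "<energy>,<scenario file>,<version>".
    model_files = ",".join([energy, model_file, version])
    return [
        "text2workspace.py", "-b", datacard,
        "-o", workspace,
        "--default-morphing", "shape2",
        "-P", "CombineHarvester.CombinePdfs.MSSM:MSSM",
        "--PO", "filePrefix=" + model_dir,
        "--PO", "modelFiles=" + model_files,
    ]


args = text2workspace_args(
    datacard="output/mssm_run2/htt_cmb_13TeV.txt",
    workspace="output/htt.output.mssm.root",
    # $CMSSW_BASE is left unexpanded so that the shell resolves it.
    model_dir="$CMSSW_BASE/src/CombineHarvester/MSSMFull2016/shapes/Models/",
    energy="13TeV",
    model_file="mhmodp_mu200_13TeV.root",
)
print(" ".join(args))
```

The printed line reproduces the mhmodp example; to switch scenario, only "model_file" (and usually the workspace name) needs to change.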
Now you can finally start to produce the limit. This step is slightly different depending on whether you want to calculate the limits using the quick asymptotic method or the much more time-consuming and resource-heavy (though slightly more accurate) toy-based method. I recommend using the toy-based method only when you want to produce a plot for your thesis.
For the asymptotic method, use combineTool.py again, this time with the method "AsymptoticGrid". The simplest command line may look like this:
combineTool.py -M AsymptoticGrid scripts/mssm_asymptotic_grid.json -d output/htt.output.mssm.root
As input you set the workspace created in the previous step along with the argument "-d". You also need to specify a json file, which contains the options for the limit calculation. Here is a minimal example:
{
  "opts" : "--singlePoint 1.0 --redefineSignalPOIs x --setPhysicsModelParameters r=1 --freezeNuisances r",
  "POIs" : ["mA", "tanb"],
  "grids" : [
    ["90:500|10", "1:30|1", ""],
    ["90:500|10", "32:60|2", ""]
  ],
  "hist_binning" : [87, 130, 1000, 60, 1, 60]
}
You should take these values for "opts", "POIs" and "hist_binning" if you make your own json file. The values in "grids" specify for which combinations of mA and tanb a computation should be performed. In this example, calculations are performed between 90 GeV and 500 GeV in steps of 10 GeV on the mA axis. The tanb values range from 1 to 30 in steps of 1, then (for the same mA range) from 32 to 60 in steps of 2. Some of the options in "opts" define that an MSSMvsSM test is to be performed (another possible test would be MSSMvsBG).
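To get a feeling for how many mA-tanb points a "grids" setting produces, you can expand it yourself. The following is a minimal sketch and not the code used by combineTool.py: it assumes that "start:stop|step" includes the stop value, and it ignores the third entry of each grid, which is left empty in the example.

```python
from decimal import Decimal
from itertools import product


def expand_range(spec):
    """Expand 'start:stop|step' into a list of values, stop value included."""
    bounds, step_text = spec.split("|")
    start_text, stop_text = bounds.split(":")
    start, stop, step = Decimal(start_text), Decimal(stop_text), Decimal(step_text)
    if step <= 0:
        raise ValueError("step must be positive: " + spec)
    values = []
    current = start
    while current <= stop:
        values.append(current)
        current += step
    return values


def expand_grids(grids):
    """Return the sorted set of (mA, tanb) points covered by all grids."""
    points = set()
    for mA_spec, tanb_spec, _unused in grids:
        points.update(product(expand_range(mA_spec), expand_range(tanb_spec)))
    return sorted(points)


grids = [
    ["90:500|10", "1:30|1", ""],
    ["90:500|10", "32:60|2", ""],
]
points = expand_grids(grids)
# 42 mA values times (30 + 15) tanb values
print(len(points))  # 1890
```

Decimal is used instead of float so that non-integer step sizes do not accumulate rounding errors.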
The command above calculates the limits interactively. If you want to send jobs to NAF, add the following options:
--job-mode 'SGE' --task-name 'AsymptoticGrid' --sub-opts '-l distro=sld6 -l h_vmem=2000M -l h_rt=3:00:00 -j y -o /dev/null' --merge=20 --prefix-file $CMSSW_BASE/src/CombineHarvester/CombineTools/input/job_prefixes/job_prefix_naf.txt
The "task-name" can be anything you want. A runtime of 3h and 2G memory are plenty for asymptotic limits (these settings also put you on the fastest queue on the BIRD server). A single job is the calculation of the CLs value (needed for the limit) at a single mA-tanb point. This takes about a minute. Since this is not efficient in terms of job submission, you can use the option "merge" to set how many of these calculations are combined into a single submitted job.
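The effect of "merge" can be estimated with a few lines of arithmetic. This sketch assumes the example grid from the json file above (42 mA values and 45 tanb values) and the rough figure of one minute per point; the real runtime per point will vary.

```python
import math


def submitted_jobs(n_points, merge):
    """Number of batch jobs when 'merge' points are grouped into one job."""
    return math.ceil(n_points / merge)


def minutes_per_job(merge, minutes_per_point=1.0):
    """Rough runtime of one merged job, assuming a fixed time per point."""
    return merge * minutes_per_point


n_points = 42 * (30 + 15)  # points in the example grid
print(submitted_jobs(n_points, 20))  # 95
print(minutes_per_job(20))  # 20.0
```

With "--merge=20" the example grid therefore needs 95 jobs of roughly 20 minutes each, which stays well below the 3h runtime requested in "sub-opts".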
If you want to run jobs on the rwthcondor instead, you only need to add the option --job-mode 'script' --merge=20. This will only create the .sh files, which you can give to grid-control. (TODO: I haven't exactly tested this yet)
Once the jobs are done, rerun the same combineTool.py command. This will either submit new jobs for the mA-tanb points which previously failed, or, if everything went well, create the file "asymptotic_grid.root", which contains the results of the calculations at each mA-tanb point. Use this file as the input for the last step "Creating the plot".
The general procedure when using the toy-based method is similar. I do not recommend using this for anything other than a final plot for your thesis, as it takes AT LEAST one week to produce an acceptable result. To use it, use the "HybridNewGrid" method of combineTool.py:
combineTool.py -M HybridNewGrid scripts/mssm_hybrid_grid.json -d output/htt.output.mssm.root
You should NOT use the "merge" option from before, because it will take many hours for a single job to complete. You'll also probably want to use CRAB3 to submit jobs (TODO: How to, I haven't done this with CombineHarvester in a while).
The json file here is a bit more complicated:
{
  "verbose" : false,
  "opts" : "--testStat=TEV --frequentist --singlePoint 1.0 --saveHybridResult --clsAcc 0 --fork 0 --fullBToys --redefineSignalPOIs x --setPhysicsModelParameters r=1 --freezeNuisances r",
  "POIs" : ["mA", "tanb"],
  "grids" : [
    ["90:500|10", "1:30|1", ""],
    ["90:500|10", "32:60|2", ""]
  ],
  "toys_per_cycle" : 1000,
  "min_toys" : 1000,
  "max_toys" : 10000,
  "signif" : 4.0,
  "CL" : 0.95,
  "contours" : ["obs", "exp-2", "exp-1", "exp0", "exp+1", "exp+2"],
  "zipfile" : "collected.zip",
  "statusfile" : "status.json",
  "output" : "HybridNewGridMSSM.root",
  "output_incomplete" : true
}
The "--testStat" parameter in "opts" is important for toy-based calculations; for MSSMvsSM tests, "TEV" is the usual choice. In "toys_per_cycle" you specify how many toys should be generated for each job. In my experience, if you set this to 2500, you can expect your jobs to be finished within 24h.
At first, the workflow is similar to the asymptotic case: run combineTool.py to submit all of your jobs, and then rerun it while also adding the "--output" argument to collect the output in a single file named "HybridNewGridMSSM.root". However, rerunning combineTool.py will also submit new jobs for mA-tanb points where more toys are required to get a smooth limit in the end. This means that additional toys are not computed in regions of the mA-tanb plane which are clearly excluded or clearly not excluded, but only where a point is close to any exclusion limit (including the contours of the error bands). This is determined based on a calculation with the variable "signif" set in the json. A higher value means that a higher precision (so more toys) is needed at a single mA-tanb point before producing more toys for that point is considered useless. Even so, it won't produce more toys at a single mA-tanb point than given by the "max_toys" variable.
After running combineTool.py multiple times, you can start using the "--cycles 2" argument (2 or higher). This is basically the opposite of "merge": for those mA-tanb points which require more toys, in this case 2 jobs are submitted, each producing the number of toys specified in "toys_per_cycle".
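To judge how "--cycles" shortens the procedure, you can estimate how many more submissions a point close to the exclusion contour needs before it reaches "max_toys". This is a simplified sketch: it assumes every job succeeds and delivers exactly "toys_per_cycle" toys, and that the point keeps being flagged as needing more toys.

```python
import math


def reruns_to_max_toys(current_toys, max_toys, toys_per_cycle, cycles=1):
    """Submissions still needed until a point has collected max_toys toys."""
    missing = max(0, max_toys - current_toys)
    toys_per_submission = cycles * toys_per_cycle
    return math.ceil(missing / toys_per_submission)


# Values from the json example: 1000 toys per job, at most 10000 per point.
# After the first submission a point has 1000 toys.
print(reruns_to_max_toys(1000, 10000, 1000, cycles=1))  # 9
print(reruns_to_max_toys(1000, 10000, 1000, cycles=2))  # 5
```

Under these assumptions, "--cycles 2" roughly halves the number of times you have to rerun combineTool.py for the most demanding points, at the cost of twice as many jobs per submission.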
After each iteration of submitting new jobs, you should create a plot of the current results so you can estimate how many more times you need to run combineTool.py.
You can create the plot using the script CombineHarvester/CombineTools/scripts/plotLimitGrid.py. It doesn't matter at this point whether you used the asymptotic or toy-based method earlier.
python ../../CombineTools/scripts/plotLimitGrid.py asymptotic_grid.root --scenario-label "m_{h}^{mod+}"
Aside from the output of the previous step, you should also add a scenario label.