CMS Data Analysis School Pre-Exercises - First Set
Overview
Teaching: 0 min
Exercises: 60 min
Questions
How do you set up a CMSSW release?
How do you find a dataset using the Data Aggregation Service (DAS)?
What are some EDM standalone utilities and what do they do?
What is MiniAOD and how do you use it?
Objectives
Understand how to set up a CMSSW release.
Know how to find a CMS dataset.
Know how to use the EDM utilities to find information about a dataset.
Become familiar with the MiniAOD format.
Introduction
Welcome to the first set of CMS Data Analysis School (CMSDAS) pre-exercises. The purpose of these exercises is to become familiar with the basic software tools required to perform physics analysis at the school. Please run and complete these exercises. Throughout the exercises there will be questions for you to answer. Submit your answers in the online response form available from the course web area; for CMSDAS@CERN 2023, the complete set of links can be found on the CMSDAS pre-exercises Indico page. A large amount of additional information about these topics is available in the twikis that we reference. Please remember that twikis evolve; they aim to provide the best information available at any time.
Note
The CMSDAS exercises (pre-exercises as well as exercises during the school) are intended to be as generic as possible. However, CMSDAS is held at different CMS collaborating institutes (e.g. CERN, the LPC at Fermilab, DESY, etc.). Participants are expected to request and obtain local (at the intended school location) computer accounts well in advance of the school start date, to ensure they will be able to work right away. In the case of CMSDAS@CERN 2023, the computer account you should use for all exercises is the standard CERN computing account. It is very important for participants to use the pre-exercises as a setup tool, so we recommend using the same laptop you intend to bring to the school (no computer/laptop will be provided at the school), and connecting to the CERN computing resources that will be used for the school.
There are several sets of pre-exercises. As outlined above, if you are going through the pre-exercises in preparation for attending a CMSDAS, we strongly recommend using the laptop you intend to bring to the school and logging into the computing cluster local to the school, as specified below.
Note
Before proceeding with this and the following pre-exercises, make sure that you have gone through all setup steps.
Exercise 1 - Simple cut and paste exercise
This exercise is designed to run only on lxplus as copies of the scripts are present there.
Login to the lxplus cluster. If you are preparing for CMSDAS@CERN 2023, this is the cluster you are supposed to use for the pre-exercises. If you have not used the Linux command line before, you may learn more at WorkBookBasicLinux.
To connect to lxplus service, try the following commands (using Terminal with a Mac/Linux operating system; or putty or cygwin with a Windows operating system):
ssh -Y <YourUsername>@lxplus.cern.ch
replacing <YourUsername>
with your actual username. Enter the password. After a successful login, you should see the following message:
* ********************************************************************
* Welcome to lxplus753.cern.ch, CentOS Linux release 7.9.2009 (Core)
* Archive of news is available in /etc/motd-archive
* Reminder: you have agreed to the CERN
* computing rules, in particular OC5. CERN implements
* the measures necessary to ensure compliance.
* https://cern.ch/ComputingRules
* Puppet environment: production, Roger state: production
* Foreman hostgroup: lxplus/nodes/login
* Availability zone: cern-geneva-b
* LXPLUS Public Login Service - http://lxplusdoc.web.cern.ch/
* An AlmaLinux8 based lxplus8.cern.ch is now available
* An AlmaLinux9 based lxplus9.cern.ch is now available
* Please read LXPLUS Privacy Notice in http://cern.ch/go/TpV7
* ********************************************************************
As the exercises often require copying and pasting from the instructions, let us first verify that cut and paste to/from a terminal window works. Once connected, use the following commands to copy the runThisCommand.py script and make it executable (Mac/Linux/Windows):
cp /afs/cern.ch/cms/Tutorials/CMSDASatCERN23/runThisCommand.py .
chmod +x runThisCommand.py
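If you want to confirm that the copy succeeded and the execute bit is set, a quick listing works (the size and date will of course differ for you):
ls -l runThisCommand.py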
Next, cut and paste the following and then hit return:
./runThisCommand.py "asdf;klasdjf;kakjsdf;akjf;aksdljf;a" "sldjfqewradsfafaw4efaefawefzdxffasdfw4ffawefawe4fawasdffadsfef"
The response should be your username followed by an alphanumeric string of characters unique to your username, for example for a user named slaurila:
success: slaurila fynhevyn
If you executed the command without copy-pasting (i.e. only ./runThisCommand.py without the additional arguments), the command will return:
Error: You must provide the secret key
Alternatively, copying incorrectly (i.e. different arguments) will return:
Error: You didn't paste the correct input string
If you are not running on lxplus7 (for example locally on a laptop), trying to run the command will result in:
bash: ./runThisCommand.py: No such file or directory
or (for example):
Unknown user: slaurila.
Question 1
Post the alphanumeric string of characters unique to your username. For CMSDAS@CERN 2023 please submit your answers for the CMSDAS@CERN 2023 Google Form first set. NOTE, answer only Question 1 at this point. Question 2 in the form is related to the next exercise. There is a one-to-one correspondence between the question numbers here and in the Google Form.
Exercise 2 - Simple edit exercise
This exercise is designed to run only on lxplus.
The purpose of this exercise is to ensure that the user can edit files. We will first copy and then edit the editThisCommand.py script. This means that you need to be able to use one of the standard text editors (emacs, pico, nano, vi, vim, etc.) available on the cluster you are running (lxplus), open a file, edit it and save it!
On the lxplus cluster, run:
cp /afs/cern.ch/cms/Tutorials/CMSDASatCERN23/editThisCommand.py .
Then open editThisCommand.py with your favorite editor (e.g. emacs -nw editThisCommand.py) and make sure that the 11th line has # (hash character) as the first character of the line. If not, explicitly change the following three lines:
# Please comment the line below out by adding a '#' to the front of
# the line.
raise RuntimeError, "You need to comment out this line with a #"
to:
# Please comment the line below out by adding a '#' to the front of
# the line.
#raise RuntimeError, "You need to comment out this line with a #"
Save the file (e.g. in emacs CTRL+x CTRL+s to save, CTRL+x CTRL+c to quit the editor) and execute the command:
./editThisCommand.py
If this is successful, the result will again contain your username and another string, i.e. something like:
success: slaurila 0x-7343CEEA
If the file has not been successfully edited, an error message will result such as:
Traceback (most recent call last):
File "./editThisCommand.py", line 11, in ?
raise RuntimeError, "You need to comment out this line with a #"
RuntimeError: You need to comment out this line with a #
Question 2
Paste the line beginning with “success”, resulting from the execution of ./editThisCommand.py, into the form provided.
Exercise 3 - Set up a CMSSW release area
CMSSW is the CMS SoftWare framework used in our collaboration to process and analyze data. In order to use it, you need to set up your environment and set up a local CMSSW release.
### If you are using Bash shell
source /cvmfs/cms.cern.ch/cmsset_default.sh
export CMSSW_GIT_REFERENCE=/cvmfs/cms.cern.ch/cmssw.git.daily
### Alternatively, if you are using the default tcsh shell (or csh shell)
source /cvmfs/cms.cern.ch/cmsset_default.csh
setenv CMSSW_GIT_REFERENCE /cvmfs/cms.cern.ch/cmssw.git.daily
You should actually edit your ~/.tcshrc file (or ~/.bash_profile if bash is your default shell), creating it if you do not have one, to include the above commands so that they are automatically executed at login and you do not have to execute them manually each time you log into the cluster.
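For example, for bash users the lines to append to ~/.bash_profile would look like the following (a sketch of the bash variant; tcsh users would put the setenv versions in ~/.tcshrc instead):
# Append to ~/.bash_profile so the CMS environment is set at every login:
source /cvmfs/cms.cern.ch/cmsset_default.sh
export CMSSW_GIT_REFERENCE=/cvmfs/cms.cern.ch/cmssw.git.daily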
For the following exercises, or generally when you start working with larger scripts, code repositories, configuration files, and possibly larger input and output files, it is a good idea NOT to do this inside your lxplus home directory, but in an area with more disk space. We won’t stop you if you wish to use your AFS user space, but keep in mind that you might face a “disk quota full” problem at some point. An alternative on CERN lxplus is the EOS user home directory of the form /eos/user/z/zorro (for a user named Zorro), which can be used for “heavier” projects.
Now let us proceed with the creation of a working area (called YOURWORKINGAREA in the following):
cd /eos/user/<first-letter-of-username>/<username>
mkdir YOURWORKINGAREA
cd YOURWORKINGAREA
### If you are using Bash shell
export SCRAM_ARCH=slc7_amd64_gcc700
### Alternatively, If you are using the default tcsh shell (or csh shell)
setenv SCRAM_ARCH slc7_amd64_gcc700
### Then, in both cases:
cmsrel CMSSW_10_6_18
cd CMSSW_10_6_18/src
cmsenv
To be able to check out specific CMSSW packages from GitHub, you will need to configure your local account. You only have to run these commands once on any given cluster you are working on, such as lxplus:
git config --global user.name "[Name]"
git config --global user.email [Email]
git config --global user.github [Account]
Note
To see your current git configuration you can use the following command:
git config --global -l
More information will be given in the fifth set of pre-exercises.
Now you can initialize the CMSSW area as a local git repository:
git cms-init
This last command will take some time to execute and will produce some long output; be patient.
When you get the prompt again, run the following command:
echo $CMSSW_BASE
Question 3
Paste the result of executing the above command (echo $CMSSW_BASE) in the form provided.
Note
The directory (on lxplus) /eos/user/<initial>/<username>/YOURWORKINGAREA/CMSSW_10_6_18/src is referred to as your WORKING DIRECTORY.
Every time you log out or exit a session you will need to set up your environment in your working directory again. To do so, once you have executed the steps above for the first time (assuming you have added source /cvmfs/cms.cern.ch/cmsset_default.(c)sh to your ~/.tcshrc or ~/.bash_profile file), you can simply do:
cd /eos/user/<initial>/<username>/YOURWORKINGAREA/CMSSW_10_6_18/src
cmsenv
And you are ready to go!
Exercise 4 - Find data in the Data Aggregation Service (DAS)
In this exercise we will locate the MC dataset RelValZMM and the collision dataset /DoubleMuon/Run2018A-12Nov2019_UL2018-v2/MINIAOD using the Data Aggregation Service (not to be confused with the Data Analysis School in which you are partaking!).
Go to the DAS webpage. You will be asked for your Grid certificate, which you should have loaded into your browser by now. Also note that there may be a security warning message, which you will need to ignore and still load the page. From there, enter the following into the space provided:
dataset release=CMSSW_10_6_14 dataset=/RelValZMM*/*CMSSW_10_6_14*/MINIAOD*
This will search for datasets processed with the release CMSSW_10_6_14 whose name matches /RelValZMM*/*CMSSW_10_6_14*/MINIAOD*. The syntax for searches is described here, with many useful common search patterns under “CMS Queries”.
For this query, several results should be displayed (you may be queried for security exceptions in the process). Select (click) on the dataset name /RelValZMM_13/CMSSW_10_6_14-106X_mc2017_realistic_v7-v1/MINIAODSIM and after a few seconds another page will appear.
Question 4.1a
What is the size of this dataset (/RelValZMM_13/CMSSW_10_6_14-106X_mc2017_realistic_v7-v1/MINIAODSIM) in MB? Make sure your answer is only numerical (no units).
Question 4.1b
Click on “Sites” to get a list of sites hosting this data. Is this data available at FNAL or DESY?
Back in the main dataset page, click on the “Files” link to get a list of the ROOT files in our selected dataset. One of the files contained in the dataset should look like this:
/store/relval/CMSSW_10_6_14/RelValZMM_13/MINIAODSIM/106X_mc2017_realistic_v7-v1/10000/0EB976F4-F84B-814D-88DA-CB2C29A52D72.root
If you want to know the name of the dataset from the name of a file, you can go to DAS and type:
dataset file=/store/relval/CMSSW_10_6_14/RelValZMM_13/MINIAODSIM/106X_mc2017_realistic_v7-v1/10000/0EB976F4-F84B-814D-88DA-CB2C29A52D72.root
and hit “Enter”.
Now we will locate a fresh 2023 collisions dataset using the keyword search, which is often convenient if you know the dataset you are looking for. In this example, the dataset that we are looking for is the “MuonEG” dataset (which contains events with a muon plus an electron or photon).
In DAS type:
dataset=/MuonEG/*Run2023A*/MINIAOD*
and hit “Enter”.
Question 4.2
What release was the dataset /MuonEG/Run2023A-PromptReco-v2/MINIAOD collected in?
Note: If you see more than one release, just answer with a single release.
Having set up your CMSSW environment, you can also search for the dataset /MuonEG/Run2023A-PromptReco-v2/MINIAOD by invoking the DAS command in your WORKING DIRECTORY. The DAS command dasgoclient is in the path for CMSSW_9_X_Y versions and above, so you do not need to download anything additional. More about dasgoclient can be found here.
First, we need to initialize the Grid proxy:
voms-proxy-init --valid 192:00 --voms cms
You will be asked for your grid certificate passphrase. Then you can execute the query with:
dasgoclient --query="dataset=/MuonEG/Run2023A-PromptReco-v2/MINIAOD" --format=plain
You will see something like:
/MuonEG/Run2023A-PromptReco-v2/MINIAOD
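dasgoclient supports the same query types as the web interface; for example, you can also list the files in a dataset or the sites hosting it (the head command below just truncates the potentially long file list):
dasgoclient --query="file dataset=/MuonEG/Run2023A-PromptReco-v2/MINIAOD" | head -5
dasgoclient --query="site dataset=/MuonEG/Run2023A-PromptReco-v2/MINIAOD"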
More information about accessing data in the Data Aggregation Service can be found in WorkBookDataSamples.
Exercise 5 - Event Data Model (EDM) standalone utilities
The overall collection of CMS software, referred to as CMSSW, is built around a framework, an Event Data Model (EDM), and services needed by the simulation, calibration and alignment, and reconstruction modules that process event data so that physicists can perform analysis. The primary goal of the Framework and EDM is to facilitate the development and deployment of reconstruction and analysis software. The EDM is centered around the concept of an Event. An Event is a C++ object container for all RAW and reconstructed data related to a particular recorded collision. To understand what is in a data file and more, several EDM utilities are available. In this exercise, one will use three of these EDM utilities. They will be very useful at CMSDAS and after. More about these EDM utilities can be found at WorkBookEdmUtilities. These together with the GitHub web interface for CMSSW and the CMS LXR Cross Referencer are very useful to understand and write CMS code.
AAA and xrootd
Since the various datasets listed in CMSDAS and needed for data analysis may be stored on different grid sites around the world, CMS has implemented a service known as Any Data, Anytime, Anywhere (AAA), which is an implementation of a more generic xrootd service. It allows analysis of CMS data located at any grid site with bare ROOT or the CMSSW/FWLite environment, without downloading it to your local storage space.
The AAA service works via so-called redirectors, which are intermediate servers that automatically find the physical location of the given file and transmit it to you. Which redirector you use depends on your region, to minimize the distance over which the data must travel and thus minimize the reading latency. These “regional” redirectors will try file locations in your region first before trying to go overseas.
If you are working in the US, it is best to use the redirector cmsxrootd.fnal.gov, while in Europe and Asia it is best to use xrootd-cms.infn.it. There is also a “global redirector” at cms-xrd-global.cern.ch which will query all locations.
In the examples below, cms-xrd-global.cern.ch is always used, but feel free to replace it with the choice more appropriate for your region.
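If reading remotely is too slow or unreliable, you can also copy a file to local disk with the xrdcp utility (assuming a valid grid proxy and enough free space; the destination name here is just an example):
xrdcp root://cms-xrd-global.cern.ch//store/relval/CMSSW_10_6_14/RelValZMM_13/MINIAODSIM/106X_mc2017_realistic_v7-v1/10000/0EB976F4-F84B-814D-88DA-CB2C29A52D72.root ./local_copy.root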
To open a file from the MuonEG 2023A dataset (stored at CERN) with ROOT:
root -l
TFile *f =TFile::Open("root://cms-xrd-global.cern.ch///store/data/Run2023A/MuonEG/MINIAOD/PromptReco-v2/000/366/323/00000/f2b1462f-6d41-4b11-b8e3-7624af2e29bf.root");
If this works correctly, you should see a long list of warnings about missing dictionaries, such as:
Warning in <TClass::Init>: no dictionary for class pat::TauJetCorrFactors is available
Soon we will learn how to properly deal with the MiniAOD file format. Similarly, you can open the RelValZMM_13 file that we previously located at FNAL:
TFile *f =TFile::Open("root://cms-xrd-global.cern.ch///store/relval/CMSSW_10_6_14/RelValZMM_13/MINIAODSIM/106X_mc2017_realistic_v7-v1/10000/0EB976F4-F84B-814D-88DA-CB2C29A52D72.root");
You can quit the ROOT command line with:
.q
edmDumpEventContent
Next we will use edmDumpEventContent to dump a summary of the products that are contained within the file we’re interested in. We will be able to see what class names, etc. to use in order to access the objects in the MiniAOD file.
If you want to look at a specific object (say, slimmedMuons), then execute:
edmDumpEventContent --all --regex slimmedMuons root://cms-xrd-global.cern.ch//store/relval/CMSSW_10_6_14/RelValZMM_13/MINIAODSIM/106X_mc2017_realistic_v7-v1/10000/0EB976F4-F84B-814D-88DA-CB2C29A52D72.root
This will return:
Type Module Label Process Full Name
-----------------------------------------------------------------------------------------
edm::RangeMap<CSCDetId,edm::OwnVector<CSCSegment,edm::ClonePolicy<CSCSegment> >,edm::ClonePolicy<CSCSegment> > "slimmedMuons" "" "RECO" CSCDetIdCSCSegmentsOwnedRangeMap_slimmedMuons__RECO
edm::RangeMap<DTChamberId,edm::OwnVector<DTRecSegment4D,edm::ClonePolicy<DTRecSegment4D> >,edm::ClonePolicy<DTRecSegment4D> > "slimmedMuons" "" "RECO" DTChamberIdDTRecSegment4DsOwnedRangeMap_slimmedMuons__RECO
vector<pat::Muon> "slimmedMuons" "" "RECO" patMuons_slimmedMuons__RECO
The output of edmDumpEventContent is divided into four variable-width columns. The first column is the C++ class type of the data, the second is the module label, the third is the product instance label, and the fourth is the process name. More information is available at Identifying Data in the Event.
Instead of the above, let us try without the option --regex slimmedMuons. This will dump the entire event content, producing many lines of output. For this reason we’ll send the output to a file called EdmDumpEventContent.txt with a UNIX output redirection command (you can then inspect the file with your favorite editor, or with less EdmDumpEventContent.txt):
edmDumpEventContent root://cms-xrd-global.cern.ch//store/relval/CMSSW_10_6_14/RelValZMM_13/MINIAODSIM/106X_mc2017_realistic_v7-v1/10000/0EB976F4-F84B-814D-88DA-CB2C29A52D72.root > EdmDumpEventContent.txt
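As a hint for the next question, standard UNIX tools can help digest the dump; for example, something like the following lists and counts the products whose C++ type starts with a plain vector (do eyeball a few matches to make sure the pattern catches what you intend):
grep "^vector<" EdmDumpEventContent.txt
grep -c "^vector<" EdmDumpEventContent.txt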
Question 5.1a
How many modules produce products of type vector in this particular MiniAOD file?
Note: We mean a plain std::vector, not a BXVector or any other type.
Question 5.1b
What are the names of (any) three of the modules that produce products of type vector?
edmProvDump
To aid in understanding the full history of an analysis, the framework accumulates provenance for all data stored in the standard ROOT output files. Using the command edmProvDump one can print out all the tracked parameters used to create the data file. For example, one can see which modules were run and the CMSSW version used to make the MiniAOD file. In executing the command below it is important to follow the instructions carefully; otherwise a large number of warning messages may appear. The ROOT warning messages can be ignored.
To do this on lxplus execute:
edmProvDump root://cms-xrd-global.cern.ch//store/relval/CMSSW_10_6_14/RelValZMM_13/MINIAODSIM/106X_mc2017_realistic_v7-v1/10000/0EB976F4-F84B-814D-88DA-CB2C29A52D72.root > EdmProvDump.txt
Note
EdmProvDump.txt is a very large file, of the order of 40000-60000 lines. Open and look at this file and locate the Processing History (about 20-40 lines from the top).
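Rather than scrolling, you can jump straight to that block with grep; the -A option prints a number of lines after each match (adjust it to taste):
grep -A 20 "Processing History" EdmProvDump.txt | head -30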
Question 5.2
Which version of CMSSW was used to produce the MiniAOD file? The answer will take the form CMSSW_X_Y_Z, where you will need to fill in the X, Y, and Z with the correct numerical values.
edmEventSize
Finally we will execute edmEventSize to determine the size of different branches in the data file. Further details about this utility may be found at SWGuideEdmEventSize. edmEventSize isn’t actually a ‘Core’ helper function (anyone can slap ‘edm’ on the front of a program in CMSSW).
At lxplus execute the following command:
edmEventSize -v `edmFileUtil -d root://cmsxrootd-site.fnal.gov//store/relval/CMSSW_10_6_14/RelValZMM_13/MINIAODSIM/106X_mc2017_realistic_v7-v1/10000/0EB976F4-F84B-814D-88DA-CB2C29A52D72.root` > EdmEventSize.txt
Question 5.3
What is the number of events processed (contained in this file) if you execute the edmEventSize command at lxplus?
Open and look at the file EdmEventSize.txt and locate the line containing the text patJets_slimmedJetsPuppi__RECO. There are two numbers following this text, which measure the plain and the compressed size of this branch.
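Instead of searching by eye, grep can pull out that line directly:
grep patJets_slimmedJetsPuppi__RECO EdmEventSize.txt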
Question 5.4
What are the plain and compressed size numbers for the branch patJets_slimmedJetsPuppi__RECO in this file?
Exercise 6 - Get familiar with the MiniAOD format
Analyzing physics data at CMS is a very complicated task involving multiple steps, sharing of expertise, cross checks, and comparisons of different analyses. To maximize physics productivity, CMS developed the high-level data tier MiniAOD in 2014 to serve the needs of mainstream physics analyses while keeping a small event size (30-50 kB/event), with easy access to the algorithms developed by the Physics Object Groups (POGs) in the framework of the CMSSW offline software. The production of MiniAODs is done centrally for common samples. MiniAOD samples are commonly used for Run 2 physics analyses. More information about MiniAOD can be found in WorkBookMiniAOD.
Note
A new, even more compact data tier called NanoAOD has been developed more recently. The goal of this tier is to centralize the ntuple production of ~50% of analyses and to keep the event size below 2 kB/event. This pre-exercise will not cover the use of NanoAOD, but you will get familiar with it during the school week. More information can be found at WorkBookNanoAOD.
The main contents of the MiniAOD are:
- High level physics objects (leptons, photons, jets, ETmiss), with detailed information in order to allow e.g. retuning of identification criteria, saved using PAT dataformats. Some preselection requirements are applied on the objects, and objects failing these requirements are either not stored or stored only with a more limited set of information. Some high level corrections are applied: L1+L2+L3(+residual) corrections to jets, type1 corrections to ETmiss.
- The full list of particles reconstructed by ParticleFlow, storing only the most basic quantities for each object (4-vector, impact parameter, PDG id, some quality flags), and with reduced numerical precision; these are useful to recompute isolation, or to perform jet substructure studies. For charged particles with pT > 0.9 GeV, more information about the associated track is saved, including the covariance matrix, so that they can be used for b-tagging purposes.
- MC truth information: a subset of the genParticles sufficient to describe the hard scattering process, jet flavour information, and final state leptons and photons; GenJets with pT > 8 GeV are also stored, as is other MC summary information (e.g. event weight, LHE header, PDF, PU information). In addition, all the stable genParticles with MC status code 1 are also saved, to allow reclustering of GenJets with different algorithms and substructure studies.
- Trigger information: MiniAOD contains the trigger bits associated to all paths, and all the trigger objects that have contributed to firing at least one filter within the trigger. In addition, we store all objects reconstructed at L1 and the L1 global trigger summary, and the prescale values of all the triggers.
Please note that the files used in the following are from older releases, but they still illustrate the intended points. Because RelVal files (produced to validate new releases in the rapid CMSSW development cycle) become unavailable on a short (months) timescale, a small set of files has been copied to the CERN EOS storage. They are available at root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/.
The Z to dimuon MC file root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_MiniAODZMM730pre1.root was made with the CMSSW_7_3_0_pre1 release, and the data file root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_Data_706_MiniAOD.root was made from the collision data skim /DoubleMu/CMSSW_7_0_6-GR_70_V2_AN1_RelVal_zMu2011A-v1/MINIAOD.
In your working directory, open the ROOT file root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_MiniAODZMM730pre1.root. Begin by opening ROOT:
root -l
Note
If you already have a custom .rootrc or .rootlogon.C, you can start ROOT without them by using the command root -l -n.
On the ROOT prompt, type (or copy-paste) the following:
gSystem->Load("libFWCoreFWLite.so");
FWLiteEnabler::enable();
gSystem->Load("libDataFormatsFWLite.so");
gROOT->SetStyle ("Plain");
gStyle->SetOptStat(111111);
TFile *theFile = TFile::Open("root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_MiniAODZMM730pre1.root");
TBrowser b;
Note
The TBrowser is a graphical ROOT file browser. It runs on the computer where you started ROOT. Its graphical interface needs to be forwarded to your computer, which can be very slow. You either need a lot of patience, a good connection, or you can try to run ROOT locally, copying the ROOT files that are to be inspected. Since everyone is running a different operating system on their local computer, we do not support the setup of ROOT on your local computer. However, instructions exist on the official ROOT website.
Note
You can start the ROOT interpreter and open the file in a single step by doing:
root -l <filename>
This may have some issues when using the xrootd redirector; here we are avoiding that by directly addressing the file at FNAL.
To be able to use the member functions of a CMSSW data class from within ROOT, a ‘dictionary’ for that class needs to be available to ROOT. To obtain that dictionary, it is necessary to load the proper library into ROOT. The first three lines of the code above do exactly that. More information is at WorkBookFWLiteExamples. Note that gROOT->SetStyle("Plain"); sets a plain white background for all the plots in ROOT.
Note
If the rootlogon.C is created in the home area, and the above five lines of code (the fifth line being gStyle) are in that file, the dictionary will be obtained, and all the plots will have a white background automatically upon logging in to ROOT.
Now a ROOT browser window opens and looks like this (“Root Files” may or may not be selected):
In this window, click on ROOT Files in the left menu; now the window looks like this:
Double-click on the ROOT file you opened: root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_MiniAODZMM730pre1.root, then Events, then scroll down and click patMuons_slimmedMuons__PAT (or the little + that appears next to it), and then patMuons_slimmedMuons__PAT.obj. A window appears that looks like this:
Scroll a long way down the file (not too fast) and click on pt(). A PAT Muon pt distribution will appear. These muons have been produced in Z to mumu interactions, as the name of the data sample implies.
Question 6.1
What is the mean value of the muon pt for this file (root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_MiniAODZMM730pre1.root)?
Note
To exit ROOT simply type .q in the command line.
Now open the data file root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_Data_706_MiniAOD.root. Similarly, run the following commands and answer the following question:
root -l
On the ROOT prompt type the following:
gSystem->Load("libFWCoreFWLite.so");
FWLiteEnabler::enable();
gSystem->Load("libDataFormatsFWLite.so");
gROOT->SetStyle ("Plain");
gStyle->SetOptStat(111111);
TFile *theFile = TFile::Open("root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_Data_706_MiniAOD.root");
TBrowser b;
Question 6.2
What is the mean value of the muon pt for the collision data (current file)?
Remember
Be sure to submit your answers to the Google Form first set, then proceed to the second set.
Helpful Hint
Rather than using the TBrowser, you can perform the drawing action using the ROOT interpreter. An example is shown below:
root -l root://cms-xrd-global.cern.ch//store/user/cmsdas/2022/pre_exercises/Set1/CMSDataAnaSch_MiniAODZMM730pre1.root
Events->Draw("patMuons_slimmedMuons__PAT.obj.pt()")
Key Points
Setting up CMSSW requires some environment setup and the cmsrel command.
You can use the web portal for DAS or the dasgoclient to find information about a given dataset.
There are several utilities for gaining insight into EDM ROOT files.
CMS Data Analysis School Pre-Exercises - Second Set
Overview
Teaching: 0 min
Exercises: 30 min
Questions
How to slim a MiniAOD file?
How to know the size of a MiniAOD file?
How to use FWLite to analyze data and MC?
Objectives
Learn how to reduce the size of a MiniAOD by only keeping physics objects of interest.
Learn how to determine the size of a MiniAOD file using EDM standalone utilities.
Learn to use FWLite to perform simple analysis.
Introduction
Welcome to the second set of CMSDAS pre-exercises. As you know by now, the purpose of the pre-workshop exercises is for prospective workshop attendees to become familiar with the basic software tools required to perform physics analysis at CMS before the workshop begins. Post the answers in the online response form available from the course web area:
Indico page
The second set of exercises begins with Exercise 7. We will use collision data events and simulated events (Monte Carlo, MC). To comfortably work with these files, we will first make them smaller by selecting only the objects that we are interested in (electrons and muons in our case).
The collision data events are stored in DoubleMuon.root. DoubleMuon refers here to the fact that, when recording these events, we believed there were two muons in the event. This is true most of the time, but other objects can fake muons, hence at closer inspection we might find events that actually don’t have two muons.
The MC file is called DYJetsToLL. You will need to get used to cryptic names like this if you want to survive in the high energy physics environment! The MC file contains Drell-Yan events that decay to two leptons and might be accompanied by one or several jets.
Exercises 8 and 9 use FWLite (FrameWork Lite), an interactive analysis tool integrated with the CMSSW EDM (Event Data Model) framework. It allows you to automatically load the shared libraries defining CMSSW data formats and the tools provided, to easily access parts of the event in the EDM format within ROOT interactive sessions. It reads produced ROOT files, has full access to the class methods, and there is no need to write full-blown framework modules. Thus, having a FWLite distribution locally on the desktop, one can do CMS analysis outside the full CMSSW framework. In these two exercises, we will analyze the data stored in a MiniAOD sample using FWLite. We will loop over muons and make a Z mass peak.
We assume that, having done the first set of pre-exercises by now, one is comfortable with logging onto lxplus.cern.ch and setting up the CMS environment.
Exercise 7 - Slim MiniAOD sample to reduce its size by keeping only Muon and Electron branches
In order to reduce the size of the MiniAOD, we would like to keep only the slimmedMuons and slimmedElectrons objects and drop all others. The config files that do this are slimMiniAOD_MC_MuEle_cfg.py and slimMiniAOD_data_MuEle_cfg.py. To work with these config files and make the slim MiniAODs, execute the following steps in the directory YOURWORKINGAREA/CMSSW_10_6_18/src.
Cut and paste each of the scripts slimMiniAOD_MC_MuEle_cfg.py and slimMiniAOD_data_MuEle_cfg.py in its entirety and save it under the same name. Open these python files with your favorite editor and take a look at them. The number of events has been set to 1000:
process.maxEvents = cms.untracked.PSet( input = cms.untracked.int32(1000) )
To run over all events in the sample, one can change it to -1.
Now run the following command:
cmsRun slimMiniAOD_MC_MuEle_cfg.py
This produces an output file called slimMiniAOD_MC_MuEle.root in your $CMSSW_BASE/src area.
Now run the following command:
cmsRun slimMiniAOD_data_MuEle_cfg.py
This produces an output file called slimMiniAOD_data_MuEle.root in your $CMSSW_BASE/src area.
On opening these two MiniAODs one observes that only the slimmedMuons and the slimmedElectrons objects are retained as intended.
To find the size of your MiniAOD, execute the following Linux command:
ls -lh slimMiniAOD_MC_MuEle.root
and
ls -lh slimMiniAOD_data_MuEle.root
You may also try the following:
To know the size of each branch, use the edmEventSize utility as follows (also explained in the First Set of Exercises):
edmEventSize -v slimMiniAOD_MC_MuEle.root
and
edmEventSize -v slimMiniAOD_data_MuEle.root
To see what objects there are, open the ROOT file as follows and browse to the MiniAOD samples as you did in Exercise 6:
Here is how you do it for the output file slimMiniAOD_MC_MuEle.root:
root -l slimMiniAOD_MC_MuEle.root
TBrowser b;
OR
root -l
TFile *theFile = TFile::Open("slimMiniAOD_MC_MuEle.root");
TBrowser b;
To quit ROOT application, execute:
.q
Remember
For CMSDAS@CERN2023 please submit your answers at the Google Form second set.
Question 7.1a
What is the size of the MiniAOD slimMiniAOD_MC_MuEle.root in MB? Make sure your answer is only numerical (no units).
Question 7.1b
What is the size of the MiniAOD slimMiniAOD_data_MuEle.root in MB? Make sure your answer is only numerical (no units).
Question 7.2a
What is the mean eta of the muons for MC?
Question 7.2b
What is the mean eta of the muons for data?
Question 7.3a
What is the size of the slimmed output file compared to the original sample?
Compare one of your slimmed output files to the original MiniAOD file it came from. To find sizes of the files in EOS, you can use e.g., edmFileUtil -l root://cms-xrd-global.cern.ch///store/user/filepath/filename.root
with the appropriate path and filename.
Question 7.3b
Is the mean eta of muons for MC and data the same as in the MC and data samples in Exercise 6?
Exercise 8 - Use FWLite on the MiniAOD created in Exercise 7 and make a Z Peak (applying pt and eta cuts)
FWLite (pronounced “framework-light”) is basically a ROOT session with CMS data format libraries loaded. CMS uses ROOT to persistify data objects. CMS data formats are thus “ROOT-aware”; that is, once the shared libraries containing the ROOT-friendly description of CMS data formats are loaded into a ROOT session, these objects can be accessed and used directly from within ROOT like any other ROOT class!
In addition, CMS provides a couple of classes that greatly simplify the access to the collections of CMS data objects. Moreover, these classes (Event and Handle) have the same name as analogous ones in the Full Framework; this mnemonic trick helps in making the code to access CMS collections very similar between the FWLite and the Full Framework.
In this exercise we will make a ZPeak using our data and MC samples. We will use the corresponding slim MiniAODs created in Exercise 7. To read more about FWLite, have a look at Section 3.5 of Chapter 3 of the WorkBook.
We will first make a ZPeak. We will loop over the slimmedMuons in the MiniAOD and compute the invariant mass of oppositely charged muon pairs. These masses are filled into a histogram that is written to an output ROOT file.
First make sure that you have the MiniAODs created in Exercise 7. They should be called slimMiniAOD_MC_MuEle.root and slimMiniAOD_data_MuEle.root.
Go to the src area of the current CMSSW release:
cd $CMSSW_BASE/src
The environment variable CMSSW_BASE points to the base area of the current CMSSW release.
Check out a package from GitHub.
Make sure that you have GitHub set up properly, as described in obtain a GitHub account. It’s particularly important to set up ssh keys so that you can check out code without problems: https://help.github.com/articles/generating-ssh-keys
To check out the package, run:
git cms-addpkg PhysicsTools/FWLite
Then, to compile the package, do:
scram b
cmsenv
Note
You can try scram b -j 4 to speed up the compiling. Here -j 4 will compile with 4 cores. When occupying several cores to compile, you will also make the interactive machine slower for others, since you are using more resources. Use with care!
Note 2
It is necessary to call cmsenv again after compiling this package because it adds executables in the $CMSSW_BASE/bin area.
To make a Z peak, we will use the FWLite executable called FWLiteHistograms. The corresponding code should be in $CMSSW_BASE/src/PhysicsTools/FWLite/bin/FWLiteHistograms.cc. With this executable we will use command line options; more about these can be learned from SWGuideCommandLineParsing.
To make a ZPeak from this executable, using the MC MiniAOD, run the following command (which will not work out of the box, see below):
FWLiteHistograms inputFiles=slimMiniAOD_MC_MuEle.root outputFile=ZPeak_MC.root maxEvents=-1 outputEvery=100
You will get the following error:
terminate called after throwing an instance of 'cms::Exception'
what(): An exception of category 'ProductNotFound' occurred.
Exception Message:
getByLabel: Found zero products matching all criteria
Looking for type: edm::Wrapper<std::vector<reco::Muon> >
Looking for module label: muons
Looking for productInstanceName:
The data is registered in the file but is not available for this event
This error occurs because your input file slimMiniAOD_MC_MuEle.root is a MiniAOD and does not contain reco::Muon objects with the label muons. It does, however, contain slimmedMuons (check yourself by opening the ROOT file with the ROOT browser). In the code FWLiteHistograms.cc there are lines that say:
using reco::Muon;
and
event.getByLabel(std::string("muons"), muons);
This means you need to change reco::Muon to pat::Muon, and muons to slimmedMuons.
To implement these changes, open the code $CMSSW_BASE/src/PhysicsTools/FWLite/bin/FWLiteHistograms.cc. In this code, look at the line that says:
using reco::Muon;
and change it to:
using pat::Muon;
Then look at the line:
event.getByLabel(std::string("muons"), muons);
and change it to:
event.getByLabel(std::string("slimmedMuons"), muons);
Now you need to re-compile:
scram b
Now again run the executable as follows:
FWLiteHistograms inputFiles=slimMiniAOD_MC_MuEle.root outputFile=ZPeak_MC.root maxEvents=-1 outputEvery=100
You can see that now it runs successfully, and you get a ROOT file called ZPeak_MC.root. Open this ROOT file and look at the Z mass peak histogram called mumuMass. Answer the following questions.
Question 8.1a
What is the mean mass of the ZPeak for your MC MiniAOD?
Question 8.1b
How can you increase statistics in your ZPeak histogram?
Now a little bit about the command that you executed. In the command above, slimMiniAOD_MC_MuEle.root is the input file and ZPeak_MC.root is the output file. maxEvents is the number of events you want to run over; you can change it to any other number. The value -1 means running over all the events, which is 1000 in this case. outputEvery specifies after how many events the code should report the number of the event being processed. As you may have noticed, when your executable runs it prints processing event: after every 100 events, as you specified.
If you look at the code FWLiteHistograms.cc, it also contains the defaults corresponding to the above command line options. Answer the following question:
Question 8.2
What is the default name of the output file?
Exercise 9 - Re-run the above executable with the data MiniAOD
Re-run the above executable with the data MiniAOD file called slimMiniAOD_data_MuEle.root as follows:
FWLiteHistograms inputFiles=slimMiniAOD_data_MuEle.root outputFile=ZPeak_data.root maxEvents=-1 outputEvery=100
This will create an output histogram ROOT file called ZPeak_data.root. Then answer the following questions.
Question 9a
What is the mean mass of the ZPeak for your data MiniAOD?
Question 9b
How can you increase statistics in your ZPeak histogram?
Key Points
A MiniAOD file can be slimmed by just retaining physics objects of interest.
EDM standalone utilities can be used to determine the size of MiniAOD files.
FWLite is a useful tool to perform simple analysis on a MiniAOD file.
CMS Data Analysis School Pre-Exercises - Third Set
Overview
Teaching: 0 min
Exercises: 240 min
Questions
How do I do an analysis with so much data that I cannot run it interactively on my computer?
What is CRAB? How do I use it to run an analysis on the grid?
What do configuration files look like?
How do I extract the luminosity of the dataset I analyzed?
Objectives
Become familiar with the basic Grid tools used in CMS for user analysis
Learn about grid certificate usage
Know what CRAB is and how to use it for your analysis
Know how to use BRILcalc to extract luminosities
Introduction
This is the third set of CMSDAS exercises. The purpose of these exercises is for the workshop attendees to become familiar with the basic Grid tools used in CMS for user analysis. Please run and complete each of these exercises. However, unlike the previous sets of exercises, this set will take considerably longer. Having your storage space set up may take several days, Grid jobs run with some latency, and there can be problems. You should set aside about a week to complete these five exercises. The actual effort required is not the whole week but a few hours (more than the previous two sets). If, at any time, problems are encountered with the exercise, please e-mail cmsdas-cern-organizers@cern.ch with a detailed description of your problem. For CRAB questions unrelated to passing these exercises, to send feedback, or to ask for support in case of CRAB-related problems, please consult the CRAB troubleshooting twiki. All CRAB users should subscribe to the very useful hn-cms-computing-tools@cern.ch hypernews forum.
Note
This section assumes that you have access to lxplus at CERN. Learn more about lxplus here and the lxplus knowledge guide.
Later on, you can check with your university contact for a Tier 2 or Tier 3 storage area. Once you are granted write permission to the specified site, for later analysis you can use CRAB as in the exercise below, but store the output to your Tier 2 or Tier 3 storage area.
AGAIN: To perform this set of exercises, lxplus access, Grid Certificate, and CMS VO membership are required. You should already have these things, but if not, follow these instructions from the first set of exercises.
Question
Questions for each exercise are in boxes such as this.
For CMSDAS@CERN 2023 please submit your answers for the CMSDAS@CERN Google Form third set.
Support
There is a dedicated Mattermost team, called CMSDAS@CERN 2023, setup to facilitate communication and discussions via live chat (which is also archived). You will need your CERN login credentials (SSO) and you will need to join the private CMSDAS@CERN 2023 team in order to be able to see (or find using the search channels functionality) the channels setup for communications related to the school. The sign-up link is here and the Pre-exercises channel can be found here.
Exercise 10 - Verify your grid certificate is OK
This exercise depends on obtaining a grid certificate and VOMS membership, but does not depend on any previous exercises. After you’ve installed your grid certificate, you need to verify it has all the information needed.
Login to lxplus.cern.ch and initialize your proxy:
voms-proxy-init -voms cms
Then run the following command:
voms-proxy-info -all | grep -Ei "role|subject"
The response should look like this:
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=vmilosev/CN=757854/CN=Vukasin Milosevic/CN=40175424
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=vmilosev/CN=757854/CN=Vukasin Milosevic
attribute : /cms/Role=NULL/Capability=NULL
attribute : /cms/country/Role=NULL/Capability=NULL
attribute : /cms/country/ch/Role=NULL/Capability=NULL
If you do not have the first attribute line listed above, you have not completed the VO registration above and you must complete it before continuing.
Question 10
Copy the output corresponding to the text in the output box above.
For CMSDAS@CERN 2023 please submit your answers for the CMSDAS@CERN 2023 Google Form third set.
Exercise 11 - Obtain a /store/user area and setup CRAB
Obtain a /store/user area
This exercise depends on successfully completing Exercise 10. Completion of this exercise requires a user to have /store/user/YourCERNUserName at a Tier 2 or Tier 3 site (e.g. the EOS area at lxplus); a user should get this automatically once they have an lxplus account.
CRAB Introduction
In this exercise, you will learn about an important tool, CRAB, which is used in all data analysis at CMS. CRAB (CMS Remote Analysis Builder) is a utility to submit CMSSW jobs to distributed computing resources. By using CRAB you will be able to access CMS data and Monte Carlo, which are distributed to CMS-aligned centres worldwide, and exploit the CPU and storage resources at these centres. You will also test your grid certificate and your CMS EOS storage element, which will be useful during CMSDAS@CERN2023.
Help or questions about CRAB: Follow the FAQ to get help with CRAB.
The most recent CRAB3 tutorial is always in the WorkBook under WorkBookCRABTutorial. This tutorial provides complete instructions for beginner and expert users to use CRAB in their studies. We strongly recommend you go through the CRAB tutorial after you finish these exercises. In this exercise, you will use CRAB to generate an MC sample yourself and publish it to DAS.
Setup CRAB
In this exercise, we will use CMSSW_10_6_18
.
You can follow the same instructions from Exercise 3. The instructions are reproduced here:
cd ~/YOURWORKINGAREA
export SCRAM_ARCH=slc7_amd64_gcc700
### If you are using the default tcsh shell (or csh shell)
setenv SCRAM_ARCH slc7_amd64_gcc700
###
cmsrel CMSSW_10_6_18
cd CMSSW_10_6_18/src
cmsenv
git cms-init
After setting up the CMSSW environment via cmsenv, you’ll have access to the latest version of CRAB. It is possible to use CRAB from any directory after setup. One can check that the crab command is indeed available, and which version is being used, by executing:
which crab
/cvmfs/cms.cern.ch/common/crab
or
crab --version
CRAB client v3.230404
The /store/user area is commonly used for output storage from CRAB. When you complete Exercise 11, you can follow these instructions to make sure you can read from and write to your space using CRAB commands.
Initialize your proxy:
voms-proxy-init -voms cms
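You can verify that the proxy was created, and check how many seconds of validity remain, with:
voms-proxy-info --timeleft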
Check if you can write to the /store/user/ area. The crab checkwrite command can be used to check whether you have write permission in a given directory path (by default /store/user/<HN-username>/) at a given site. The syntax to be used is:
crab checkwrite --site=<site-name>
For example:
crab checkwrite --site=T3_CH_CERNBOX
The output should look like this:
Show/Hide
Will check write permission in the default location /store/user/<username>
Validating LFN /store/user/vmilosev...
LFN /store/user/vmilosev is valid.
Will use `gfal-copy`, `gfal-rm` commands for checking write permissions
Will check write permission in /store/user/vmilosev on site T3_CH_CERNBOX
Will use PFN: davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/crab3checkwrite_20230421_105013.tmp
Attempting to create (dummy) directory crab3checkwrite_20230421_105013 and copy (dummy) file crab3checkwrite_20230421_105013.tmp to /store/user/vmilosev
Executing command: which scram >/dev/null 2>&1 && eval `scram unsetenv -sh`; gfal-copy -p -v -t 180 file:///afs/cern.ch/user/v/vmilosev/Test_CMSDAS_Crab/CMSSW_10_6_18/src/crab3checkwrite_20230421_105013.tmp 'davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/crab3checkwrite_20230421_105013.tmp'
Please wait...
Successfully created directory crab3checkwrite_20230421_105013 and copied file crab3checkwrite_20230421_105013.tmp to /store/user/vmilosev
Attempting to delete file davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/crab3checkwrite_20230421_105013.tmp
Executing command: which scram >/dev/null 2>&1 && eval `scram unsetenv -sh`; gfal-rm -v -t 180 'davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/crab3checkwrite_20230421_105013.tmp'
Please wait...
Successfully deleted file davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/crab3checkwrite_20230421_105013.tmp
Attempting to delete directory davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/
Executing command: which scram >/dev/null 2>&1 && eval `scram unsetenv -sh`; gfal-rm -r -v -t 180 'davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/'
Please wait...
Successfully deleted directory davs://eosuserhttp.cern.ch:443//eos/user/v/vmilosev/crab3checkwrite_20230421_105013/
Checkwrite Result: Success: Able to write in /store/user/vmilosev on site T3_CH_CERNBOX
Choosing the T3_CH_CERNBOX “site” gives you the option of outputting CRAB jobs to your EOS area, providing an easy way to access the produced files. However, this does not allow for publishing of produced samples, as CERNBOX is NOT a CMS storage site, and files there cannot be listed in DBS. For more details about CRAB output options, visit the following link.
Question 11
What is the name of your directory in EOS?
For CMSDAS@CERN2023 please submit your answers for the CMSDAS@CERN2023 Google Form third set.
Exercise 12 - Generate (and publish) a minimum bias dataset with CRAB
CMSSW configuration file to generate MC events
In this section we provide an example of a CMSSW parameter-set configuration file to generate minimum bias events with the Pythia MC generator. We call it CMSDAS_MC_generation.py. Using CRAB to generate MC events requires some special settings in the CRAB configuration file, as we will show later.
We use the cmsDriver tool to generate our configuration file:
cmsDriver.py MinBias_13TeV_pythia8_TuneCUETP8M1_cfi --conditions auto:run2_mc -n 10 --era Run2_2018 --eventcontent FEVTDEBUG --relval 100000,300 -s GEN,SIM --datatier GEN-SIM --beamspot Realistic25ns13TeVEarly2018Collision --fileout file:step1.root --no_exec --python_filename CMSDAS_MC_generation.py
If successful, cmsDriver will return the following:
We have determined that this is simulation (if not, rerun cmsDriver.py with --data)
Step: GEN Spec:
Loading generator fragment from Configuration.Generator.MinBias_13TeV_pythia8_TuneCUETP8M1_cfi
Step: SIM Spec:
Step: ENDJOB Spec:
Config file CMSDAS_MC_generation.py created
Feel free to investigate (look at) the newly created CMSDAS_MC_generation.py.
Generating MC events locally
We want to test this configuration file locally on a small number of events before we submit to CRAB for large-scale generation. To test this file, we can run:
cmsRun CMSDAS_MC_generation.py
This MC generation code will then produce an EDM output file called step1.root with the content of a GEN-SIM data tier for 10 generated events.
Show/Hide
(The job first prints the PYTHIA welcome banner and initialization summaries, then one message per generated event. A trimmed excerpt:)
Welcome to the Lund Monte Carlo! This is PYTHIA version 8.240
*------- PYTHIA Process Initialization --------------------------*
 We collide p+ with p+ at a CM energy of 1.300e+04 GeV
*------- End PYTHIA Process Initialization -----------------------*
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:06.079 CEST
Begin processing the 2nd record. Run 1, Event 2, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:09.990 CEST
Begin processing the 3rd record. Run 1, Event 3, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:11.147 CEST
Begin processing the 4th record. Run 1, Event 4, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:15.916 CEST
Begin processing the 5th record.
Run 1, Event 5, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:15.918 CEST Begin processing the 6th record. Run 1, Event 6, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:22.698 CEST Begin processing the 7th record. Run 1, Event 7, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:22.858 CEST Begin processing the 8th record. Run 1, Event 8, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:25.345 CEST Begin processing the 9th record. Run 1, Event 9, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:26.413 CEST Begin processing the 10th record. Run 1, Event 10, LumiSection 1 on stream 0 at 21-Apr-2023 11:32:39.373 CEST *------- PYTHIA Event and Cross Section Statistics -------------------------------------------------------------* | | | Subprocess Code | Number of events | sigma +- delta | | | Tried Selected Accepted | (estimated) (mb) | | | | | |-----------------------------------------------------------------------------------------------------------------| | | | | | non-diffractive 101 | 7 7 7 | 5.642e+01 0.000e+00 | | A B -> X B single diffractive 103 | 1 1 1 | 6.416e+00 6.416e+00 | | A B -> A X single diffractive 104 | 1 1 1 | 6.416e+00 6.416e+00 | | A B -> X X double diffractive 105 | 1 1 1 | 8.798e+00 8.798e+00 | | | | | | sum | 10 10 10 | 7.805e+01 1.264e+01 | | | *------- End PYTHIA Event and Cross Section Statistics ----------------------------------------------------------* *------- PYTHIA Error and Warning Messages Statistics ----------------------------------------------------------* | | | times message | | | | 3 Warning in MultipartonInteractions::init: maximum increased | | | *------- End PYTHIA Error and Warning Messages Statistics ------------------------------------------------------* *------- PYTHIA Event and Cross Section Statistics -------------------------------------------------------------* | | | Subprocess Code | Number of events | sigma +- delta | | | Tried Selected Accepted | (estimated) (mb) | | | | | |-----------------------------------------------------------------------------------------------------------------| | | | | | non-diffractive 101 | 7 7 7 | 5.642e+01 0.000e+00 | | A B -> X B single diffractive 103 | 1 1 1 | 6.416e+00 6.416e+00 | | A B -> A X single diffractive 104 | 1 1 1 | 6.416e+00 6.416e+00 | | A B -> X X double diffractive 105 | 1 1 1 | 8.798e+00 8.798e+00 | | | | | | sum | 10 10 10 | 7.805e+01 1.264e+01 | | | *------- End PYTHIA Event and Cross Section Statistics ----------------------------------------------------------* *------- PYTHIA Error and Warning Messages Statistics ----------------------------------------------------------* | | | times message | | | | 3 Warning in MultipartonInteractions::init: maximum increased | | | *------- End PYTHIA Error and Warning Messages Statistics ------------------------------------------------------* ------------------------------------ GenXsecAnalyzer: ------------------------------------ Before Filter: total cross section = 7.805e+10 +- 1.264e+10 pb Filter efficiency (taking into account weights)= (10) / (10) = 1.000e+00 +- 0.000e+00 Filter efficiency (event-level)= (10) / (10) = 1.000e+00 +- 0.000e+00 [TO BE USED IN MCM] After filter: final cross section = 7.805e+10 +- 1.264e+10 pb After filter: final fraction of events with negative weights = 0.000e+00 +- 0.000e+00 After filter: final equivalent lumi for 1M events (1/fb) = 1.281e-08 +- 2.075e-09 =============================================
Question 12.1
What is the file size of
step1.root
?
For CMSDAS@CERN2023 please submit your answers for the CMSDAS@CERN2023 Google Form third set.
Generate MC dataset using CRAB
CRAB is driven by a configuration file; in CRAB3, the configuration file is written in Python. Here we give an example CRAB configuration file used to run the CMSDAS_MC_generation.py MC event generation code. You can download a copy of crabConfig_MC_generation.py.
Below you also find the file:
Show/Hide
from WMCore.Configuration import Configuration
config = Configuration()

config.section_("General")
config.General.requestName = 'CMSDAS_MC_generation_test0'
config.General.workArea = 'crab_projects'

config.section_("JobType")
config.JobType.pluginName = 'PrivateMC'
config.JobType.psetName = 'CMSDAS_MC_generation.py'
config.JobType.allowUndistributedCMSSW = True

config.section_("Data")
config.Data.outputPrimaryDataset = 'MinBias'
config.Data.splitting = 'EventBased'
config.Data.unitsPerJob = 10
NJOBS = 10  # This is not a configuration parameter, but an auxiliary variable that we use in the next line.
config.Data.totalUnits = config.Data.unitsPerJob * NJOBS
config.Data.publication = True
config.Data.outputDatasetTag = 'CMSDAS2023_CRAB3_MC_generation_test0'

config.section_("Site")
config.Site.storageSite = 'T3_CH_CERNBOX'
Put the copy of crabConfig_MC_generation.py
under YOURWORKINGAREA/CMSSW_10_6_18/src
.
All available CRAB configuration parameters are defined at CRAB3ConfigurationFile.
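To see what the splitting parameters above imply, note that an EventBased task generates unitsPerJob events in each of NJOBS jobs. The short sketch below is a sanity check of that arithmetic only; it is illustrative and not part of the CRAB configuration:

# Sanity check of the EventBased splitting arithmetic in crabConfig_MC_generation.py.
units_per_job = 10                   # config.Data.unitsPerJob: events per job
njobs = 10                           # the auxiliary NJOBS variable
total_units = units_per_job * njobs  # becomes config.Data.totalUnits
print("Jobs:", njobs, "- total events generated:", total_units)  # 10 jobs, 100 events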
Now let us try to submit this job via CRAB:
crab submit -c crabConfig_MC_generation.py
Details of the CRAB commands can be found at CRABCommands. You will be asked to enter your grid certificate password.
Then you should get an output similar to this:
Will use CRAB configuration file crabConfig_MC_generation.py
Enter GRID pass phrase for this identity:
Importing CMSSW configuration CMSDAS_MC_generation.py
Finished importing CMSSW configuration CMSDAS_MC_generation.py
Sending the request to the server at cmsweb.cern.ch
Success: Your task has been delivered to the prod CRAB3 server.
Task name: 230421_132846:vmilosev_crab_CMSDAS_MC_generation_test0
Project dir: crab_projects/crab_CMSDAS_MC_generation_test0
Please use ' crab status -d crab_projects/crab_CMSDAS_MC_generation_test0 ' to check how the submission process proceeds.
Log file is /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_MC_generation_test0/crab.log
Now you might notice that a directory called crab_projects has been created under CMSSW_10_6_18/src/. See what is under that directory. After you have submitted the job successfully (give it a few moments), you can check the status of the task by executing the following CRAB command:
crab status [-t] <CRAB-project-directory>
In our case, we run:
crab status crab_projects/crab_CMSDAS_MC_generation_test0
The crab status command will produce an output containing the task name, the status of the task as a whole, the details of how many jobs are in which state (submitted, running, transferring, finished, cooloff, etc.), and the location of the CRAB log (crab.log) file. It will also print the URLs of two web pages that one can use to monitor the jobs. In summary, it should look something like this:
CRAB project directory: /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_MC_generation_test0
Task name: 230421_132846:vmilosev_crab_CMSDAS_MC_generation_test0
Grid scheduler - Task Worker: crab3@vocms0196.cern.ch - crab-prod-tw01
Status on the CRAB server: SUBMITTED
Task URL to use for HELP: https://cmsweb.cern.ch/crabserver/ui/task/230421_132846%3Avmilosev_crab_CMSDAS_MC_generation_test0
Dashboard monitoring URL: https://monit-grafana.cern.ch/d/cmsTMDetail/cms-task-monitoring-task-view?orgId=11&var-user=vmilosev&var-task=230421_132846%3Avmilosev_crab_CMSDAS_MC_generation_test0&from=1682080126000&to=now
Task bootstrapped at 2023-04-21 13:29:37 UTC. 19 seconds ago
Status information will be available within a few minutes
Log file is /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_MC_generation_test0/crab.log
Now you can take a break and have some fun. Come back after a couple of hours and check the status again.
[vmilosev@lxplus700 src]$ crab status crab_projects/crab_CMSDAS_MC_generation_test0/
CRAB project directory: /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_MC_generation_test0
Task name: 230421_132846:vmilosev_crab_CMSDAS_MC_generation_test0
Grid scheduler - Task Worker: crab3@vocms0196.cern.ch - crab-prod-tw01
Status on the CRAB server: SUBMITTED
Task URL to use for HELP: https://cmsweb.cern.ch/crabserver/ui/task/230421_132846%3Avmilosev_crab_CMSDAS_MC_generation_test0
Dashboard monitoring URL: https://monit-grafana.cern.ch/d/cmsTMDetail/cms-task-monitoring-task-view?orgId=11&var-user=vmilosev&var-task=230421_132846%3Avmilosev_crab_CMSDAS_MC_generation_test0&from=1682080126000&to=now
Status on the scheduler: COMPLETED
Jobs status: finished 100.0% (10/10)
Publication status of 1 dataset(s): done 100.0% (10/10)
(from CRAB internal bookkeeping in transferdb)
Output dataset: /MinBias/vmilosev-CMSDAS2023_CRAB3_MC_generation_test0-67359df6f8a0ef3c567d7c8fea38a809/USER
Output dataset DAS URL: https://cmsweb.cern.ch/das/request?input=%2FMinBias%2Fvmilosev-CMSDAS2023_CRAB3_MC_generation_test0-67359df6f8a0ef3c567d7c8fea38a809%2FUSER&instance=prod%2Fphys03
Warning: the max jobs runtime is less than 30% of the task requested value (1250 min), please consider to request a lower value for failed jobs (allowed through crab resubmit) and/or improve the jobs splitting (e.g. config.Data.splitting = 'Automatic') in a new task.
Warning: the average jobs CPU efficiency is less than 50%, please consider to improve the jobs splitting (e.g. config.Data.splitting = 'Automatic') in a new task
Summary of run jobs:
* Memory: 26MB min, 66MB max, 40MB ave
* Runtime: 0:04:34 min, 0:05:05 max, 0:04:41 ave
* CPU eff: 14% min, 58% max, 33% ave
* Waste: 1:17:58 (62% of total)
Log file is /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_MC_generation_test0/crab.log
Note: If at lxplus, CRAB will write the output to your EOS area. You can access the files under /eos/user/$U/$USER/SUBDIR, with SUBDIR being the subdirectory name you provided. Take a look at that directory. (In our example the primary dataset is MinBias and the task tag is CMSDAS2023_CRAB3_MC_generation_test0; the subsequent date string depends on when you started your task.)
From the bottom of the output, you can see the name of the dataset and the DAS link to it. Congratulations! This is your first CMS dataset.
Question 12.2
What is the name of the dataset you produced?
For CMSDAS@CERN2023 please submit your answers for the CMSDAS@CERN2023 Google Form third set.
Exercise 13 - Running on a dataset with CRAB
Now we’re going to apply what you’ve learned using CRAB to the MiniAOD exercises you’ve been working on in the first two sets of exercises. Make sure that you finished Exercise 7 and still have those scripts under YOURWORKINGAREA/CMSSW_10_6_18/src.
Set up CRAB to run your MiniAOD jobs
If you have logged out in the meantime, go back to YOURWORKINGAREA/CMSSW_10_6_18/src and set up CRAB again:
cmsenv
We will make another CRAB config file: crabConfig_data_slimMiniAOD.py
. You can download a copy of crabConfig_data_slimMiniAOD.py, and it is also shown below:
Show/Hide
from WMCore.Configuration import Configuration
config = Configuration()

config.section_("General")
config.General.requestName = 'CMSDAS_Data_analysis_test0'
config.General.workArea = 'crab_projects'

config.section_("JobType")
config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'slimMiniAOD_data_MuEle_cfg.py'
config.JobType.allowUndistributedCMSSW = True

config.section_("Data")
config.Data.inputDataset = '/DoubleMuon/Run2016C-03Feb2017-v1/MINIAOD'
config.Data.inputDBS = 'global'
config.Data.splitting = 'LumiBased'
config.Data.unitsPerJob = 50
config.Data.lumiMask = 'https://cms-service-dqmdc.web.cern.ch/CAF/certification/Collisions16/13TeV/Cert_271036-275783_13TeV_PromptReco_Collisions16_JSON.txt'
config.Data.runRange = '275776-275782'

config.section_("Site")
config.Site.storageSite = 'T3_CH_CERNBOX'
Most of this file should be familiar by now, but a few things may be new. The runRange parameter further restricts your jobs to a subset of the runs listed in the lumiMask file. This is needed, for example, when two input datasets overlap, so that you can control which events come from which dataset; see the sketch after this paragraph. Instructions for this are at https://twiki.cern.ch/twiki/bin/viewauth/CMS/PdmVAnalysisSummaryTable. You can find the year-specific instructions by clicking any of the links at the bottom.
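To make the interplay between lumiMask and runRange concrete, here is a short illustrative Python sketch (not part of the exercise; it assumes the certification file has the standard {run: [[firstLS, lastLS], ...]} layout) that keeps only the certified runs inside the chosen runRange:

import json

# Filter a certification (lumi-mask) JSON down to the runs in runRange.
with open("Cert_271036-275783_13TeV_PromptReco_Collisions16_JSON.txt") as f:
    good_lumis = json.load(f)        # e.g. {"275776": [[1, 104], [106, 126]], ...}

run_min, run_max = 275776, 275782    # config.Data.runRange = '275776-275782'
selected = {run: ls for run, ls in good_lumis.items()
            if run_min <= int(run) <= run_max}
print("Runs CRAB will actually process:", sorted(selected))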
Run CRAB
Now go through the same process for this config file. You submit it with
crab submit -c crabConfig_data_slimMiniAOD.py
and check the status with
crab status
After a while, you should see something like below:
CRAB project directory: /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_Data_analysis_test0
Task name: 230421_160319:vmilosev_crab_CMSDAS_Data_analysis_test0
Grid scheduler - Task Worker: crab3@vocms0199.cern.ch - crab-prod-tw01
Status on the CRAB server: SUBMITTED
Task URL to use for HELP: https://cmsweb.cern.ch/crabserver/ui/task/230421_160319%3Avmilosev_crab_CMSDAS_Data_analysis_test0
Dashboard monitoring URL: https://monit-grafana.cern.ch/d/cmsTMDetail/cms-task-monitoring-task-view?orgId=11&var-user=vmilosev&var-task=230421_160319%3Avmilosev_crab_CMSDAS_Data_analysis_test0&from=1682089399000&to=now
Status on the scheduler: COMPLETED
Jobs status: finished 100.0% (31/31)
Publication status of 1 dataset(s): done 100.0% (31/31)
(from CRAB internal bookkeeping in transferdb)
Output dataset: /DoubleMuon/vmilosev-crab_CMSDAS_Data_analysis_test0-dfbd2918d11fceef1aa67bdee18b8002/USER
Output dataset DAS URL: https://cmsweb.cern.ch/das/request?input=%2FDoubleMuon%2Fvmilosev-crab_CMSDAS_Data_analysis_test0-dfbd2918d11fceef1aa67bdee18b8002%2FUSER&instance=prod%2Fphys03
Warning: the max jobs runtime is less than 30% of the task requested value (1250 min), please consider to request a lower value for failed jobs (allowed through crab resubmit) and/or improve the jobs splitting (e.g. config.Data.splitting = 'Automatic') in a new task.
Summary of run jobs:
* Memory: 153MB min, 914MB max, 578MB ave
* Runtime: 0:03:25 min, 0:17:22 max, 0:07:30 ave
* CPU eff: 22% min, 77% max, 56% ave
* Waste: 0:04:15 (2% of total)
Log file is /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_Data_analysis_test0/crab.log
Create reports of data analyzed
Once all jobs are finished (see crab status
above) you can report:
crab report
You’ll get something like this:
Running crab status first to fetch necessary information.
Will save lumi files into output directory /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_Data_analysis_test0/results
Summary from jobs in status 'finished':
Number of files processed: 64
Number of events read: X
Number of events written in EDM files: X
Number of events written in TFileService files: 0
Number of events written in other type of files: 0
Processed lumis written to processedLumis.json
Summary from output datasets in DBS:
Number of events:
/DoubleMuon/vmilosev-crab_CMSDAS_Data_analysis_test0-dfbd2918d11fceef1aa67bdee18b8002/USER: 2167324
Output datasets lumis written to outputDatasetsLumis.json
Additional report lumi files:
Input dataset lumis (from DBS, at task submission time) written to inputDatasetLumis.json
Lumis to process written to lumisToProcess.json
Log file is /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_Data_analysis_test0/crab.log
crab report
prints to the screen how many events were analyzed.
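The numbers come from the per-lumi JSON files listed above. If you are curious, you can inspect processedLumis.json yourself; below is a minimal sketch, assuming the standard {run: [[firstLS, lastLS], ...]} layout:

import json

# Count the runs and lumi sections recorded in processedLumis.json.
with open("crab_projects/crab_CMSDAS_Data_analysis_test0/results/processedLumis.json") as f:
    processed = json.load(f)

n_ls = sum(last - first + 1
           for ls_ranges in processed.values()
           for first, last in ls_ranges)
print("Runs:", len(processed), "- lumi sections:", n_ls)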
Question 13
How many events were analyzed? (N.B. the numbers in the above example were replaced with X.)
For CMSDAS@CERN2023 please submit your answers for the CMSDAS@CERN2023 Google Form third set.
Optional: View the reconstructed Z peak in the combined data
Note
You will be doing a short analysis later, in the fourth set of exercises.
Use the FWLiteHistograms executable you were using in the previous exercises to aggregate the data from all the CRAB output files. The ROOT files created in the above step are kept in the directory below:
/eos/user/$U/$USER/DoubleMuon/crab_CMSDAS_Data_analysis_test0/
One can use the command:
FWLiteHistograms inputFiles=File1,File2,File3,... outputFile=ZPeak_data.root maxEvents=-1 outputEvery=100
In our example, File1=/eos/user/v/vmilosev/DoubleMuon/crab_CMSDAS_Data_analysis_test0/230421_160319/0000/slimMiniAOD_data_MuEle_1.root, etc. Make sure there are no spaces in File1,File2,File3,...
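Typing thirty-odd file names by hand is error-prone, so you may prefer to build the comma-separated list programmatically. A minimal sketch, assuming the example EOS path above (substitute your own output directory):

import glob

# Build the comma-separated inputFiles argument for FWLiteHistograms.
files = sorted(glob.glob("/eos/user/v/vmilosev/DoubleMuon/crab_CMSDAS_Data_analysis_test0/"
                         "230421_160319/0000/slimMiniAOD_data_MuEle_*.root"))
# Join with commas and no spaces, as required by the command line above.
print("FWLiteHistograms inputFiles=" + ",".join(files)
      + " outputFile=ZPeak_data.root maxEvents=-1 outputEvery=100")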
You may look at ZPeak_data.root
using TBrowser
.
Exercise 14 - Combining the data and calculating luminosity
Note
This last exercise in this set is done on lxplus.
Install the BRIL Work Suite
We will use the BRIL work suite, a command-line toolkit for the CMS Beam Radiation, Instrumentation and Luminosity (BRIL) project, to calculate the total luminosity of the data we ran over.
Refer to the documentation for further information on BRIL.
Enter the following command:
/cvmfs/cms-bril.cern.ch/brilconda3/bin/python3 -m pip install --user --upgrade brilws
When running crab report, the report gives you the location of a JSON-formatted file containing the luminosity information:
Will save lumi files into output directory /afs/cern.ch/user/v/vmilosev/CMSDAS2023/Pre-exercises/CMSSW_10_6_18/src/crab_projects/crab_CMSDAS_Data_analysis_test0/results
This directory contains various luminosity files. Let’s figure out how much luminosity was run on by our jobs.
The first step is to copy the processedLumis.json file to your ~/.local/bin/ folder:
cp [lumi directory]/processedLumis.json ~/.local/bin/
Here, [lumi directory]
is the directory reported by crab report
.
Find the luminosity for the dataset
We now let brilcalc calculate the luminosity we processed with our jobs, using the JSON file, by typing the following commands:
cd ~/.local/bin/
./brilcalc lumi -b "STABLE BEAMS" --normtag /afs/cern.ch/user/l/lumipro/public/Normtags/normtag_DATACERT.json -i processedLumis.json -u /fb
If the above does not work, try instead:
./brilcalc lumi -b "STABLE BEAMS" --normtag /afs/cern.ch/user/l/lumipro/public/Normtags/normtag_DATACERT.json -i processedLumis.json -c /cvmfs/cms.cern.ch/SITECONF/T0_CH_CERN/JobConfig/site-local-config.xml -u /fb
The end of the output should look similar to this (note this example summary is for a different json file):
#Summary:
+-------+------+-------+-------+-------------------+------------------+
| nfill | nrun | nls | ncms | totdelivered(/fb) | totrecorded(/fb) |
+-------+------+-------+-------+-------------------+------------------+
| 9 | 37 | 17377 | 17377 | 2.761 | 2.646 |
+-------+------+-------+-------+-------------------+------------------+
#Check JSON:
#(run,ls) in json but not in results: [(275890, 721)]
In the example of that other json file, the total recorded luminosity for those CRAB jobs is 2.6 fb-1.
Question 14
What is the reported number of inverse femtobarns analyzed? (N.B. it is not the same sample as listed above, which had luminosity 2.6 fb-1.) For CMSDAS@CERN2023 please submit your answers for the CMSDAS@CERN2023 Google Form third set.
Where to find more on CRAB
- CRAB Home
- CRAB FAQ
- CRAB troubleshooting guide: Steps to address the problems you experience with CRAB and how to ask for support.
- CMS Computing Tools mailing list, where to send feedback and ask for support in case of job problems (please send us your CRAB task HELP URL from the crab status output).
Note also that all CMS members using the Grid subscribe to the Grid Announcements CMS HyperNews forum. Important CRAB announcements will be posted on the CERN Computing Announcement HyperNews forum.
Last reviewed: 2023/04/20 by Vukasin Milosevic
Key Points
Use and validate your grid certificate.
Set up your CRAB configuration and run jobs over the CMS grid.
Publish your CRAB datasets.
Calculate the luminosities of the datasets processed via CRAB.
CMS Data Analysis School Pre-Exercises - Fourth Set
Overview
Teaching: 0 min
Exercises: 60 min
Questions
How do we analyze an EDM ROOT file using an EDAnalyzer?
How do we analyze an EDM ROOT file using an FWLite executable?
How do we use ROOT/RooFit to fit a function to a histogram?
Objectives
Learn how to use an EDAnalyzer
Learn how to use FWLite
Understand a variety of methods for performing a fit to a histogram
Introduction
In this set of exercises, we will analyze the MiniAOD file that was made in the third set of exercises. You must have this skimmed MiniAOD stored locally (in your eos user space) in order to access it. We will use several different workflows for analyzing the MiniAOD, namely an EDAnalyzer, a FWLite executable, a FWLite macro, and a FWLite PyROOT script. We will basically re-make the Z peak and a few other histograms and store them in an output ROOT file. In the final exercise we will fit the peak with a Gaussian, a Breit-Wigner function, etc.
Warning
To perform this set of exercises, a CERN computing account, grid certificate, and CMS VO membership are required. You should already have these things; if not, follow the setup instructions.
Objective
Please post your answers to the questions in the Google form fourth set.
Exercise 15 - Analyzing MiniAOD with an EDAnalyzer
In this exercise we will analyze the skimmed MiniAODs created in the third set of exercises using an EDAnalyzer. In these skimmed MiniAODs, if you recall, we saved only the muons and electrons, so do not look for jets, photons, or other objects: they were simply not saved. We will use a python config file and an EDAnalyzer (a .cc file) to make a Z mass peak. You can find an example list of files below, but please first try using the files you created.
Example file list
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_1.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_10.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_11.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_12.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_13.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_14.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_15.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_16.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_17.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_18.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_19.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_2.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_20.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_21.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_22.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_23.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_24.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_25.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_26.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_27.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_28.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_29.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_3.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_30.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_31.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_4.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_5.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_6.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_7.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_8.root
root://cmseos.fnal.gov//eos/uscms/store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/slimMiniAOD_data_MuEle_9.root
First we will add the PhysicsTools/PatExamples package to <YOURWORKINGAREA>/CMSSW_10_6_18/src. The PatExamples package has a lot of examples for a user to try. However, we will add our own code and config file to it and then compile. To add this package, do this:
cd $CMSSW_BASE/src/
git cms-addpkg PhysicsTools/PatExamples
Note
We are assuming that you’ve already checked out a CMSSW_10_6_18 release and have performed the
cmsenv
setup command.
In this package, you will find the python configuration file $CMSSW_BASE/src/PhysicsTools/PatExamples/test/analyzePatBasics_cfg.py
. You will also see the EDAnalyzer in $CMSSW_BASE/src/PhysicsTools/PatExamples/plugins/PatBasicAnalyzer.cc
.
Next, create the following two files (download/save): $CMSSW_BASE/src/PhysicsTools/PatExamples/src/MyZPeakAnalyzer.cc and $CMSSW_BASE/src/MyZPeak_cfg.py.
Hint
A quick way to do this on Linux, or any machine with wget, is by using the following commands:
wget https://cern-cms-das-2023.github.io/cms-das-pre-exercises/code/MyZPeakAnalyzer-CMSSW_10_6_18.cc -O $CMSSW_BASE/src/PhysicsTools/PatExamples/src/MyZPeakAnalyzer.cc
wget https://cern-cms-das-2023.github.io/cms-das-pre-exercises/code/MyZPeak_cfg.py -O $CMSSW_BASE/src/MyZPeak_cfg.py
Then we will compile the code that you just saved by doing:
cd $CMSSW_BASE/src/
scram b
The compilation should print many lines of text to your terminal. Among those lines you should see a line like the one below. If you can’t find a similar line, the code you just added was not compiled.
>> Compiling <$CMSSW_BASE>/src/PhysicsTools/PatExamples/src/MyZPeakAnalyzer.cc
After successful compilation, you must run the config file as follows:
cmsRun MyZPeak_cfg.py
Successful running of the above config file will produce an output file myZPeakCRAB.root. Besides the Z-peak histogram, called mumuMass, the output file contains several other histograms, such as muonMult, muonEta, muonPhi, and muonPt, and the corresponding ones for electrons.
Note
In the case above, the file MyZPeak_cfg.py will read from the area root://cmseos.fnal.gov//store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/. You should have a similar location from which you can read your CRAB output ROOT files. You can edit the MyZPeak_cfg.py file to use the MiniAOD files you made in Exercise 13 by replacing the location of the input files with the path of the files you generated. In our example, the files are stored in:
'/eos/user/v/vmilosev/DoubleMuon/crab_CMSDAS_Data_analysis_test0/230421_160319/0000/slimMiniAOD_data_MuEle_1.root'
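For orientation, the input-file block you would edit typically looks like the sketch below; this is a minimal illustration of the usual PoolSource pattern, and the exact contents of MyZPeak_cfg.py may differ:

import FWCore.ParameterSet.Config as cms

# Illustrative sketch of the input-source block inside a cmsRun config.
process = cms.Process("MyZPeak")
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring(
        # Replace with your own CRAB outputs, e.g.:
        'file:/eos/user/v/vmilosev/DoubleMuon/crab_CMSDAS_Data_analysis_test0/230421_160319/0000/slimMiniAOD_data_MuEle_1.root'
    )
)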
Question 15
What is the number of entries in the mumuMass plot if you just used the first input file, probably named slimMiniAOD_data_MuEle_1.root?
Exercise 16 - Analyzing MiniAOD with an FWLite executable
In this exercise we will make the same ROOT file as in Exercise 15, but we will call it myZPeakCRAB_fwlite.root so that you do not end up overwriting the file made in Exercise 15.
First, check out the following two packages by doing:
cd $CMSSW_BASE/src/
git cms-addpkg PhysicsTools/FWLite
git cms-addpkg PhysicsTools/UtilAlgos
Next, replace the existing $CMSSW_BASE/src/PhysicsTools/FWLite/bin/FWLiteWithPythonConfig.cc
with this FWLiteWithPythonConfig.cc. You are simply updating an existing analyzer. Then, create the file $CMSSW_BASE/src/parameters.py.
Hint
You can easily download the needed files by running the following commands:
wget https://cern-cms-das-2023.github.io/cms-das-pre-exercises/code/FWLiteWithPythonConfig.cc -O $CMSSW_BASE/src/PhysicsTools/FWLite/bin/FWLiteWithPythonConfig.cc
wget https://cern-cms-das-2023.github.io/cms-das-pre-exercises/code/parameters.py -O $CMSSW_BASE/src/parameters.py
Note
In case you have completed Exercise Set 3 successfully, put the names and paths of the ROOT files that you made yourself via submitting CRAB jobs, instead of those currently in parameters.py.
parameters.py will read from the area root://cmseos.fnal.gov//store/user/cmsdas/2023/pre_exercises/Set4/Input/DoubleMuon/. You should have a similar location from which you can read your CRAB output ROOT files. You can edit the parameters.py file to use the MiniAOD files you made in Exercise 13 by replacing the location of the input files. In our example, the files are stored in:
'/eos/user/v/vmilosev/DoubleMuon/crab_CMSDAS_Data_analysis_test0/230421_160319/0000/slimMiniAOD_data_MuEle_1.root'
Then we will compile the code that you just saved by doing:
cd $CMSSW_BASE/src/
scram b -j 4
You should see among the output a line like the one below. If not, the code we are working on probably did not get compiled.
>> Compiling /your_path/YOURWORKINGAREA/CMSSW_10_6_18/src/PhysicsTools/FWLite/bin/FWLiteWithPythonConfig.cc
After successful compilation, you must run the config file as follows:
cd $CMSSW_BASE/src/
cmsenv
FWLiteWithPythonConfig parameters.py
Note
Take note of the extra cmsenv: it ensures that the changes to files in the bin subdirectory are picked up in your path.
Warning
You might get a segfault when running this exercise. Just ignore it; the output
ROOT
file will still be created and be readable.
Note
Take a look at how the parameters defined in parameters.py are passed as input to the executable code FWLiteWithPythonConfig.cc.
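For reference, FWLite parameter files follow a layout like the sketch below; this is an illustration of the general pattern, and the actual parameters.py from the exercise may name the blocks differently:

import FWCore.ParameterSet.Config as cms

# A FWLite executable reads plain cms.PSet blocks rather than a full cmsRun process.
process = cms.PSet()
process.fwliteInput = cms.PSet(
    fileNames = cms.vstring('file:slimMiniAOD_data_MuEle_1.root'),  # your CRAB outputs
    maxEvents = cms.int32(-1),      # -1 means run over all events
    outputEvery = cms.uint32(100),  # print a progress line every 100 events
)
process.fwliteOutput = cms.PSet(
    fileName = cms.string('myZPeakCRAB_fwlite.root')  # output ROOT file
)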
A successful running of the FWLite executable, FWLiteWithPythonConfig
, results in an output file called myZPeakCRAB_fwlite.root
.
The output ROOT file myZPeakCRAB_fwlite.root is a bit different from the myZPeakCRAB.root made in Exercise 15, since we did not make any of the electron histograms. It does contain mumuMass, along with muonEta, muonPhi, and muonPt.
Question 16
What is the number of entries in the mumuMass histogram obtained in Exercise 16, again using only the first input file?
Exercise 17 - Fitting the Z mass peak
The main intention of fitting the Z mass peak is to show how to fit a distribution. To do this exercise, you will need the ROOT files that you made in Exercise 15 and Exercise 16. Make sure you have the ROOT file $CMSSW_BASE/src/myZPeakCRAB.root (Exercise 15) or myZPeakCRAB_fwlite.root (Exercise 16). If you have not managed to create at least one of these ROOT files, you can get them from the following locations:
File list
/afs/cern.ch/cms/Tutorials/TWIKI_DATA/CMSDataAnaSch/myZPeakCRAB.root                              # lxplus or Bari
root://cmseos.fnal.gov//store/user/cmsdas/2023/pre_exercises/Set4/Output/myZPeakCRAB.root         # cmslpc
root://cmseos.fnal.gov//store/user/cmsdas/2023/pre_exercises/Set4/Output/myZPeakCRAB_fwlite.root  # cmslpc
This will allow you to continue with Exercise 17. For this exercise, we will use the ROOT file myZPeakCRAB.root. Alternatively, you can use the file myZPeakCRAB_fwlite.root; just make sure to use the right name of the ROOT file. What matters most is that both of these files contain the histogram mumuMass.
We also ask that you create a rootlogon.C file in the $CMSSW_BASE/src/
directory. We will reference this version as opposed to anyone’s personalized rootlogon file. This sets up the libraries needed to complete this exercise.
The different distributions that we will fit to the Z mass peak are:
- Gaussian
- Relativistic Breit-Wigner
- Convolution of relativistic Breit-Wigner plus interference term with a Gaussian
Some general remarks about fitting a Z peak
To fit a generator-level Z peak, a Breit-Wigner fit makes sense. However, at reconstruction level the detector resolution smears the Z mass peak. If the detector resolution is relatively poor, it is usually good enough to fit a Gaussian (since the Gaussian detector resolution will overwhelm the inherent Breit-Wigner shape of the peak). If the detector resolution is fairly good, another option is to fit a Breit-Wigner (for the inherent shape) convolved with a Gaussian (to describe the detector effects). This is the “no-background” case. If you have backgrounds in your sample (Drell-Yan, cosmics, etc.), and you want to do the fit over a large mass range, another function needs to be included to take care of this; an exponential is commonly used.
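As an illustration of that last point, a signal-plus-background model takes only a few lines of RooFit. The sketch below (PyROOT, with illustrative parameter values) builds a Voigtian signal, i.e. a Breit-Wigner convolved with a Gaussian, plus an exponential background:

import ROOT

# Voigtian (Breit-Wigner x Gaussian) signal plus exponential background.
x      = ROOT.RooRealVar("x", "m_{#mu#mu} [GeV]", 60.0, 120.0)
mean   = ROOT.RooRealVar("mean", "Z mass", 91.0, 85.0, 97.0)
width  = ROOT.RooRealVar("width", "BW width", 2.5, 0.5, 10.0)
sigma  = ROOT.RooRealVar("sigma", "resolution", 1.5, 0.1, 10.0)
signal = ROOT.RooVoigtian("signal", "BW (x) Gauss", x, mean, width, sigma)

slope      = ROOT.RooRealVar("slope", "bkg slope", -0.05, -1.0, 0.0)
background = ROOT.RooExponential("background", "exp bkg", x, slope)

fsig  = ROOT.RooRealVar("fsig", "signal fraction", 0.9, 0.0, 1.0)
model = ROOT.RooAddPdf("model", "signal + background",
                       ROOT.RooArgList(signal, background), ROOT.RooArgList(fsig))

# To fit: wrap the mumuMass histogram in a RooDataHist and call fitTo, e.g.
# f = ROOT.TFile.Open("myZPeakCRAB.root")
# data = ROOT.RooDataHist("data", "data", ROOT.RooArgList(x),
#                         f.Get("analyzeBasicPat/mumuMass"))
# model.fitTo(data)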
Fitting a Gaussian
There are several options for fitting a Gaussian.
Using the inbuilt Gaussian in ROOT
Open ROOT
as follows:
root -l
Then execute the following commands:
TFile f("myZPeakCRAB.root");
f.cd("analyzeBasicPat");
gStyle->SetOptFit(111111);
mumuMass->Fit("gaus");
This will pop up the following histogram. Save this histogram as a pdf, ps, or eps file using the menu of the histogram window. As you can see, we should fit a sub-range, because this fit is not a good one. In the next part of this exercise, we will fit a sub-range of the mumuMass distribution, but for this we will use a ROOT macro, since the built-in ROOT functions are of limited use; for more complex or more useful fitting functions, one has to use a macro.
For now, we can improve the fit description of the Z resonance by limiting our fit range:
TFile f("myZPeakCRAB.root");
f.cd("analyzeBasicPat");
gStyle->SetOptFit(111111);
g1 = new TF1("m1","gaus",85,95);
mumuMass->Fit(g1,"R");
One should obtain a histogram similar to this:
Reminder
You can quit
ROOT
using the.q
command.
The line gStyle->SetOptFit(111111); enables all the fit statistics to be displayed in the histogram statistics box. For more options and other information, please refer to the ROOT documentation.
Question 17.1a
What is the value of the mean Z Mass that you get?
Question 17.1b
What is the value of the chisquare/ndf that you get?
Using a macro of your own in ROOT
As you have seen above, we should fit a sub-range of the Z mass distribution, because the fit over the full range is not all that great. In this exercise, we will fit a sub-range of the mumuMass distribution using a ROOT macro. The macro to run is FitZPeak.C. This macro calls another macro, BW.C. Please download/save them with the corresponding names in $CMSSW_BASE/src. Note that the myZPeakCRAB.root file is now opened by the macro itself, in addition to the macro fitting the Z mass peak.
To run this macro execute the following command from the $CMSSW_BASE/src
directory:
root -l FitZPeak.C
This should pop up a histogram (shown below) and you will find yourself in a ROOT
session.
Reminder
You can save this plot from the menu on top of the histogram and then quit
ROOT
using the.q
command.
Hint
You can also save the plot to an encapsulated postscript file by running the macro as:
root -l FitZPeak.C\(true\)
Here is some explanation of the macro. We have defined the Gaussian distribution that we want to fit in the macro BW.C (shown below). Note that in the same macro we have also defined a Breit-Wigner function that you can try yourself. However, in the later part of the exercise, we will use RooFit to fit the distribution with a Breit-Wigner function.
Double_t mygauss(Double_t * x, Double_t * par)
{
Double_t arg = 0;
if (par[2]<0) par[2]=-par[2]; // par[2]: sigma
if (par[2] != 0) arg = (x[0] - par[1])/par[2]; // par[1]: mean
//return par[0]*BIN_SIZE*TMath::Exp(-0.5*arg*arg)/
// (TMath::Sqrt(2*TMath::Pi())*par[2]);
return par[0]*TMath::Exp(-0.5*arg*arg)/
(TMath::Sqrt(2*TMath::Pi())*par[2]); // par[0] is constant
}
par[0], par[1], and par[2] are the constant, mean, and sigma parameters, respectively, and x[0] is the x-axis variable. BW.C is loaded by FitZPeak.C in the line gROOT->LoadMacro("BW.C");. The initial values of the three fitted parameters are set in FitZPeak.C as follows:
func->SetParameter(0,1.0); func->SetParName(0,"const");
func->SetParameter(2,5.0); func->SetParName(2,"sigma");
func->SetParameter(1,95.0); func->SetParName(1,"mean");
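If you prefer Python, the same user-defined fit can be reproduced with PyROOT, where a Python callable can serve as the TF1 body. This is a minimal sketch, assuming the myZPeakCRAB.root file from Exercise 15 as input:

import math
import ROOT

# PyROOT analogue of mygauss in BW.C: a user-defined Gaussian for TF1.
def mygauss(x, par):
    sigma = abs(par[2])
    if sigma == 0:
        return 0.0
    arg = (x[0] - par[1]) / sigma
    return par[0] * math.exp(-0.5 * arg * arg) / (math.sqrt(2.0 * math.pi) * sigma)

f = ROOT.TFile.Open("myZPeakCRAB.root")
h = f.Get("analyzeBasicPat/mumuMass")

func = ROOT.TF1("mygauss", mygauss, 85.0, 96.0, 3)  # 3 parameters, sub-range fit
func.SetParameters(1.0, 95.0, 5.0)                  # const, mean, sigma starting values
h.Fit(func, "R")                                    # "R" restricts the fit to the TF1 range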
Also note that in the macro FitZPeak.C, we have commented out the following lines and used the two lines below them, because we want to fit a sub-range. If you want to fit the entire range of the histogram, take the minimum and maximum of the range from the commented lines instead.
//float massMIN = Z_mass->GetBinLowEdge(1);
//float massMAX = Z_mass->GetBinLowEdge(Z_mass->GetNbinsX()+1);
float massMIN = 85.0;
float massMAX = 96.0;
Question 17.2
What mean value of the Z mass do you get in the fitted sub-range?
Using a macro in RooFit
Before we start, have a look at the RooFit twiki to get a feeling for it. Then save the macro RooFitMacro.C in the $CMSSW_BASE/src/
directory. This macro will fit the Z mass peak using RooFit
.
Take a look at the code and then execute the following:
root -l RooFitMacro.C
You may need to add the following line to your rootlogon.C
file to get this interpreted code to work:
gROOT->ProcessLine(".include $ROOFITSYS/include/");
This should pop up a histogram (shown below) and you will find yourself in a ROOT session.
Reminder
You can save this plot from the menu on top of the histogram and then quit
ROOT
using the.q
command.
We fit the distribution with a Gaussian by default. However, we can fit a Breit-Wigner or Voigtian (convolution of Breit-Wigner and Gaussian) by uncommenting the appropriate lines.
Question 17.3a
What is the mean for the gaussian fit in RooFit?
Question 17.3b
What is the sigma for the gaussian fit in RooFit?
Fitting a Breit-Wigner
Using a macro in ROOT
To fit the Z mass peak using a Breit-Wigner distribution, we first uncomment the Breit-Wigner part of FitZPeak.C
and comment out the Gaussian part as follows (using /*
and */
):
////////////////
//For Gaussian//
///////////////
/*
TF1 *func = new TF1("mygauss",mygauss,massMIN, massMAX,3);
func->SetParameter(0,1.0); func->SetParName(0,"const");
func->SetParameter(2,5.0); func->SetParName(2,"sigma");
func->SetParameter(1,95.0); func->SetParName(1,"mean");
Z_mass->Fit("mygauss","QR");
TF1 *fit = Z_mass->GetFunction("mygauss");
*/
/////////////////////
// For Breit-Wigner//
////////////////////
TF1 *func = new TF1("mybw",mybw,massMIN, massMAX,3);
func->SetParameter(0,1.0); func->SetParName(0,"const");
func->SetParameter(1,5.0); func->SetParName(1,"sigma"); // width parameter, as named here
func->SetParameter(2,95.0); func->SetParName(2,"mean"); // mass parameter, as named here
Z_mass->Fit("mybw","QR");
TF1 *fit = Z_mass->GetFunction("mybw");
Then execute the following:
root -l FitZPeak.C
This should pop up a histogram (shown below) and you will find yourself in a ROOT session.
Reminder
You can save this plot from the menu on top of the histogram and then quit
ROOT
using the.q
command.
Question 17.4a
What is the mean for the Breit-Wigner fit using the macro?
Question 17.4b
What is the sigma for Breit-Wigner fit using the macro?
Using a macro in RooFit
Before we proceed, we need to uncomment and comment out a few lines in RooFitMacro.C so that they look as follows:
//RooGaussian gauss("gauss","gauss",x,mean,sigma);
RooBreitWigner gauss("gauss","gauss",x,mean,sigma);
// RooVoigtian gauss("gauss","gauss",x,mean,width,sigma);
Then execute:
root -l RooFitMacro.C
This should pop up a histogram (shown below) and you will find yourself in a ROOT session.
Reminder
You can save this plot from the menu on top of the histogram and then quit
ROOT
using the.q
command.
Question 17.5a
What is the mean for the Breit-Wigner fit using RooFit tool?
Question 17.5b
What is the sigma for the Breit-Wigner fit using RooFit tool?
Fitting a Convolution of Gaussian and Breit-Wigner
Using a macro in RooFit
Before we proceed, we need to uncomment and comment out a few lines in RooFitMacro.C so that they look as follows:
//RooGaussian gauss("gauss","gauss",x,mean,sigma);
// RooBreitWigner gauss("gauss","gauss",x,mean,sigma);
RooVoigtian gauss("gauss","gauss",x,mean,width,sigma);
Then execute:
root -l RooFitMacro.C
This should pop up a histogram (shown below) and you will find yourself in a ROOT session.
Reminder
You can save this plot from the menu on top of the histogram and then quit
ROOT
using the.q
command.
Question 17.6a
What is the mean for the convolved fit using RooFit tool?
Question 17.6b
What is the sigma for the convolved fit using RooFit tool?
Key Points
You can use either an EDAnalyzer or FWLite to analyze MiniAOD files.
Various methods exist for performing fits. You can use inbuilt functions or user defined functions. You can use plain ROOT or the RooFit package.
CMS Data Analysis School Pre-Exercises - Fifth Set
Overview
Teaching: 0 min
Exercises: 30 min
Questions
How do I setup git on my computer/cluster?
How do I collaborate using GitHub?
Objectives
Setup your git configuration for a given computer.
Learn how to make and commit changes to a git repository.
Learn how to create a pull request on GitHub.
Introduction
This exercise is intended to provide you with basic familiarity with Git and GitHub for personal and collaborative use, including terminology, commands, and user interfaces. The exercise proceeds step-by-step through a standard collaboration “Fork and Pull” workflow. This is a highly condensed version of the tutorial exercises at CMSGitTutorial. Students are encouraged to explore those more in-depth exercises if they want to learn more about using Git. There are also accompanying slides on that twiki page. Students with no experience using Git or other version control software are recommended to read at least the first set of slides.
Warning
As a prerequisite for this exercise, please make sure that you have correctly followed the instructions for obtaining a GitHub account in the setup instructions.
Google Form
Please post your answers to the questions in the Google form fifth set.
Exercise 18 - Learning Git and GitHub
Git Configuration
Begin by setting up your .gitconfig on your local machine or lxplus:
git config --global user.name "[Name]"
git config --global user.email [Email]
git config --global user.github [Account]
Make sure you replace [Name]
, [Email]
, and [Account]
with the values corresponding to your GitHub account. After this, you can check the contents of .gitconfig by doing:
cat ~/.gitconfig
Output
[user]
        name = [Name]
        email = [Email]
        github = [Account]
Optional settings:
- Your preferred editor:
git config --global core.editor [your preferred text editor]
- This setting makes Git push the current branch by default, so only the command
git push origin
is needed. (NOTE: do not try to execute that command now; it will not work without a local repository, which you have not created yet.)
git config --global push.default current
- This is an alias to make the print out of the log more concise and easier to read.
git config --global alias.lol 'log --graph --decorate --pretty=oneline --abbrev-commit'
- These make it easier to clone repositories from GitHub or CERN GitLab, respectively. For example,
git clone github:GitHATSLPC/GitHATS.git
.
git config --global url."git@github.com:".insteadOf github:
git config --global url."ssh://git@gitlab.cern.ch:7999/".insteadOf gitlab:
GitHub User Interface
Look carefully at the GitHub user interface on the main page for the GitHATSLPC/GitHATS repository. Click on various tabs.
- Top left row: Code, Issues, Pull Requests, Actions, Projects, Wiki, Security, Insights, Settings
- Settings: Options, Collaborators, Branches
- Top right row: Notifications, Star, Fork
- Lower row on Code page: commits, branches, releases, contributors
Collaboration on GitHub
Fork the repository GitHATSLPC/GitHATS repository by clicking “Fork” at the top right corner of the page. This makes a copy of the repository under your GitHub account.
Clone your fork of the repository to a scratch directory on your local machine or lxplus:
mkdir scratch
git clone git@github.com:[user]/GitHATS.git
Output
Cloning into 'GitHATS'...
Enter passphrase for key '/home/------/.ssh/id_rsa':
remote: Counting objects: 21, done.
remote: Total 21 (delta 0), reused 0 (delta 0), pack-reused 21
Receiving objects: 100% (21/21), done.
Resolving deltas: 100% (5/5), done.
Checking connectivity... done.
What does the ls
command show?
cd GitHATS
ls -a
Output
. .. .git README.md standard_model.md
The .git folder contains a full local copy of the repository.
Inspect the .git directory:
ls .git
Output
config description HEAD hooks index info logs objects packed-refs refs
When you use git clone
as we did above, it starts your working area on the default branch for the repository. In this case, that branch is master. (The default branch for a repo can be changed in the “Branches” section of the GitHub settings page, which you explored in the previous step.)
Inspect the branches of the repository.
git branch -a
Output
* master
  remotes/origin/HEAD -> origin/master
  remotes/origin/atlas_discovery
  remotes/origin/cms_discovery
  remotes/origin/dune_discovery
  remotes/origin/master
Adding remotes and synchronizing
Look at your remote(s):
git remote
Output
origin
Hint
For additional information you can add the
-v
option to the commandgit remote -v
Output
origin	git@github.com:[user]/GitHATS.git (fetch)
origin	git@github.com:[user]/GitHATS.git (push)
The “origin” remote is set by default when you use git clone
. Because your repository is a fork, you also want to have a remote that points to the original repo, traditionally called “upstream”.
Add the upstream remote and inspect the result:
git remote add upstream git@github.com:GitHATSLPC/GitHATS.git
git remote -v
Output
origin	git@github.com:[user]/GitHATS.git (fetch)
origin	git@github.com:[user]/GitHATS.git (push)
upstream	git@github.com:GitHATSLPC/GitHATS.git (fetch)
upstream	git@github.com:GitHATSLPC/GitHATS.git (push)
Before you make edits to your local repo, you should make sure that your fork is up to date with the main repo. (Someone else might have made some updates in the meantime.)
Check for changes in upstream:
git pull upstream master
Output
From github.com:GitHATSLPC/GitHATS
 * branch            master     -> FETCH_HEAD
 * [new branch]      master     -> upstream/master
Already up-to-date.
Note
git pull upstream master
is equivalent to the following two commands:
git fetch upstream master
git merge upstream/master
If you pulled any changes from the upstream repository, you should push them back to origin. (Even if you didn’t, you can still practice pushing; nothing will happen.)
Push your local master branch back to your remote fork:
git push origin master
Output
Everything up-to-date
Making edits and committing
When collaborating with other developers on GitHub, it is best to make a separate topic branch to store any changes you want to submit to the main repo. This way, you can keep the default branch in your fork synchronized with upstream, and then make another topic branch when you want to make more changes.
Make a topic branch:
git checkout -b MyBranch
Edit the table standard_model.md to add a new particle. The new particle is called a Giton, with symbol G, spin 2, charge 0, and mass 750 GeV.
Note
Any resemblance to any other real or imaginary particles is entirely coincidental.
Once you have made changes in your working area, you have to stage the changes and then commit them. First, you can inspect the status of your working area.
Try the following commands to show the status:
git status
Output
On branch MyBranch
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   standard_model.md

no changes added to commit (use "git add" and/or "git commit -a")
git status -s
Output
M standard_model.md
git diff
Output
diff --git a/standard_model.md b/standard_model.md
index 607b7b6..68f37ad 100644
--- a/standard_model.md
+++ b/standard_model.md
@@ -18,4 +18,5 @@ The Standard Model of Particle Physics
 | Z boson     | Z | 1 | 0  | 91.2 |
 | W boson     | W | 1 | ±1 | 80.4 |
 | gluon       | g | 1 | 0  | 0    |
-| Higgs boson | H | 0 | 0  | 125  |
\ No newline at end of file
+| Higgs boson | H | 0 | 0  | 125  |
+| Giton       | G | 2 | 0  | 750  |
Now stage your change, and check the status:
git add standard_model.md
git status -s
Output
M standard_model.md
Commit your change:
git commit -m "add Giton to standard model"
Output
[MyBranch b9bc2ce] add Giton to standard model 1 file changed, 2 insertions(+), 1 deletion(-)
Push your topic branch, which now includes the new commit you just made, to origin:
git push origin MyBranch
Output
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 356 bytes | 356.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
remote:
remote: Create a pull request for 'MyBranch' on GitHub by visiting:
remote:      https://github.com/mtonjes/GitHATS/pull/new/MyBranch
remote:
To github.com:mtonjes/GitHATS.git
 * [new branch]      MyBranch -> MyBranch
Making a pull request
Now that you have made your change, you can submit it for inclusion in the central repository.
When you open the page to send a pull request on GitHub, you will notice that you can send a pull request to any fork of the repo (and any branch).
Send a pull request to the master branch of the upstream repo (GitHATSLPC).
Question 18.1
Post the link to your pull request.
For CMSDAS@CERN 2023 please submit your answer at the Google Form fifth set.
Optional
If you want to practice merging a pull request, you can send a pull request from your branch MyBranch to your own master branch.
Advanced topics
Advanced topics not explored in this exercise include: merging, rebasing, cherry-picking, undoing, removing binary files, and CMSSW-specific commands and usage.
Students are encouraged to explore these topics on their own at CMSGitTutorial.
Key Points
Interact with your git configuration using git config --global.
Use the git clone command to obtain a local copy of a git repository.
Add and interact with new remotes using the git remote command.
Use the add and commit commands to add changes to the local repository.
The pull and push commands will transfer changes between the remote and local copies of the repository.
CMS Data Analysis School Pre-Exercises - Sixth Set
Overview
Teaching: 0 min
Exercises: 30 min
Questions
What is Jupyter?
What is pyROOT?
Objectives
Learn how to use Jupyter and the Jupyter service (SWAN) at CERN.
Learn how to interact with the ROOT libraries using pyROOT.
Introduction
This exercise is intended to provide you with basic familiarity with pyROOT, which provides bindings for all classes within the ROOT libraries and allows you to replace the usual C++ with often less cumbersome Python. The goal is to obtain a general understanding of the syntax required to import and make use of the ROOT libraries within a basic Python script. Various examples are provided to demonstrate TH1 histogram manipulation, including: reading from a .root file, creating, binning, re-binning, scaling, plotting, and fitting to a Gaussian.
Many courses have begun to use Jupyter notebooks as a teaching tool, and this exercise has been formatted as a notebook to give a preliminary introduction to how they work. This knowledge will be used later in various DAS exercises.
Whether you use Python or C++ to complete your analysis is a personal preference. However, given the current lack of documentation on pyROOT, many students stick with C++ in order to ensure their access to coding examples and experts. It is our hope that, by providing you with this basic introduction and a GitHub repository of example scripts (which you are encouraged to add to), we can bring together the existing pyROOT community within CMS and foster its growth.
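To give a flavor of the syntax before you open the notebooks, here is a minimal, hypothetical pyROOT sketch; the file name histos.root and the histogram name jetMass are placeholders, not objects from the exercise:
import ROOT

# Open a ROOT file and retrieve a histogram (names here are placeholders)
f = ROOT.TFile.Open("histos.root")
h = f.Get("jetMass")

# Rebin by a factor of 2 and scale the histogram to unit area
h.Rebin(2)
h.Scale(1.0 / h.Integral())

# Fit a Gaussian ("gaus" is a built-in ROOT fit function) and print the fitted mean
h.Fit("gaus")
print("Fitted mean:", h.GetFunction("gaus").GetParameter(1))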
Warning
As a prerequisite for this exercise, please make sure that you have correctly followed the instructions for obtaining a GitHub account in the setup instructions.
It is also helpful to have already completed the “Collaboration on GitHub” section of the fifth set of exercises.
Objective
Please post your answers to the questions in the Google form sixth set.
Exercise 19 - Introduction to pyROOT and Jupyter
Load and execute the exercise on JupyterHub
This exercise is stored completely within Jupyter notebooks and uses SWAN, a premade Jupyter service hosted at CERN. To begin, visit pyROOTforCMSDAS and follow the directions on the first page.
Question 19.1
What is the mean value of the Gaussian fit of the jet mass spectrum for jets of pt 300-400 GeV?
Hopefully this extremely brief introduction has piqued your interest in pyROOT and encouraged you to learn more about this versatile tool.
Advanced topics
Advanced topics not explored in this exercise, but to be included on the pyROOT for CMSDAS GitHub page in the near future are:
- reading and writing a TTree
- using a python analyzer to skim a TTree
- creating plots in the CMS PubCom format
Students are encouraged to explore these and other topics on their own and to assist with the CMS effort to document pyROOT by creating your own fork of pyROOTforCMSDAS and adding to the example scripts available there.
Key Points
pyROOT is an easy to use alternative to using the ROOT libraries in a C++ program.
Jupyter notebooks are a great way to perform real-time analysis tasks.
CMS Data Analysis School Pre-Exercises - Seventh Set
Overview
Teaching: 0 min
Exercises: 60 min
Questions
What is an image? How about a container?
What is Docker/Singularity?
Why is containerization useful?
Ummmm…how is this different from a virtual machine?
Objectives
Gain a basic understanding of how to run and manage a container.
Understand the absolute basic commands for Docker.
Know how to start a Singularity container.
Introduction
Warning
As a prerequisite for this exercise, please make sure that you have correctly followed the setup instructions for installing Docker and obtaining a DockerHub account.
Objective
Please post your answers to the questions in the Google form seventh set.
Limitation
This exercise seeks to introduce the student to the benefits of containerization and a handful of container services. We cannot cover all topics related to containerization in this short exercise. In particular, we do not seek to explain what is happening under the hood or how to develop your own images. There are other great tutorials covering a variety of containerization topics as they relate to LHC experiments:
- Docker/Singularity HATS@LPC
- Introduction to Docker
- Software containers for CMSSW
- Official Docker documentation and tutorial
There are undoubtedly also other, non-LHC oriented tutorials online.
Containers and Images
Containers are like lightweight virtual machines. They behave as if they were their own complete OS, but actually contain only the components necessary to operate, while sharing the host machine's system kernel; this significantly reduces their size. In essence, they run a second OS natively on the host machine with just a thin additional layer, which means they can be faster than traditional virtual machines. These containers take up only as much memory as necessary, which allows many of them to be run simultaneously, and they can be spun up quite rapidly.
Images are read-only templates that contain a set of instructions for creating a container. Different container orchestration programs have different formats for these images. Often a single image is made of several files (layers) which contain all of the dependencies and application code necessary to create and configure the container environment. In other words, Docker containers are the runtime instances of images — they are images with a state.
This allows us to package up an application with just the dependencies we need (OS and libraries) and then deploy that image as a single package. This allows us to:
- replicate our environment/workflow on other host machines
- run a program on a host OS other than the one for which it was designed (not 100% foolproof)
- sandbox our applications in a secure environment (still important to take proper safety measures)
Container Runtimes
For the purposes of this tutorial we will only be considering Docker and Singularity for container runtimes. That said, these are really powerful tools which are so much more than just container runtimes. We encourage you to take the time to explore the Docker and Singularity documentation.
Side Note
As a side note, Docker has very similar syntax to Git and Linux, so if you are familiar with the command line tools for them then most of Docker should seem somewhat natural (though you should still read the docs!).
Exercise 20 - Pulling Docker Images
Much like GitHub allows for web hosting and searching of code, image registries allow the same for Docker/Singularity images. Without going into too much detail, there are several public and private registries available. For Docker, however, the de facto default registry is Docker Hub. Singularity, on the other hand, does not have a de facto default registry.
To begin with, we're going to pull down the Docker image we'll be working in for this part of the tutorial. (Note: if you already did the docker pull during setup, this image will already be on your machine. In this case, Docker should notice it's there and not attempt to re-pull it, unless the image has changed in the meantime.):
docker pull sl
#if you run into a permission error, use "sudo docker pull ..." as a quick fix
# to fix this for the future, see https://docs.docker.com/install/linux/linux-postinstall/
# if you have a M1 chip Mac, you may want to do "docker pull sl --platform amd64"
Using default tag: latest
latest: Pulling from library/sl
175b929ba158: Pull complete
Digest: sha256:d38e6664757e138c43f1c144df20fb93538b75111f922fce57930797114b7728
Status: Downloaded newer image for sl:latest
docker.io/library/sl:latest
The image names are composed of NAME[:TAG|@DIGEST], where the NAME is composed of REGISTRY-URL/NAMESPACE/IMAGE and is often referred to as a repository. Here are some things to know about specifying the image (see the examples after this list):
- Some repositories will include a USERNAME as part of the image name (i.e. fnallpc/fnallpc-docker), and others, usually Docker verified content, will include only a single name (i.e. sl).
- A registry path (REGISTRY-URL/NAMESPACE) is similar to a URL, but does not contain a protocol specifier (https://). Docker uses the https:// protocol to communicate with a registry, unless the registry is allowed to be accessed over an insecure connection. Registry credentials are managed by docker login. If no registry path is given, the docker daemon assumes you meant to pull from Docker Hub and automatically prepends docker.io/library to the image name.
- If no tag is provided, Docker Engine uses the :latest tag as a default.
- The SHA256 DIGEST is much like a Git hash, in that it allows you to pull a specific version of an image.
- CERN GitLab's repository path is gitlab-registry.cern.ch/<username>/<repository>/<image_name>[:<tag>|@<digest>].
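As a short illustration of these rules, the following pull commands should all resolve to the same sl image pulled above (the digest is the one printed in the earlier pull output):
docker pull sl:latest   # identical to "docker pull sl"; the :latest tag is the default
docker pull docker.io/library/sl:latest   # fully qualified repository path, same image
docker pull sl@sha256:d38e6664757e138c43f1c144df20fb93538b75111f922fce57930797114b7728   # pinned by digest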
Now, let’s list the images that we have available to us locally
docker images
If you have many images and want to get information on a particular one you can apply a filter, such as the repository name
docker images sl
REPOSITORY TAG IMAGE ID CREATED SIZE
sl latest 5237b847a4d0 2 weeks ago 186MB
or more explicitly
docker images --filter=reference="sl"
REPOSITORY TAG IMAGE ID CREATED SIZE
sl latest 5237b847a4d0 2 weeks ago 186MB
You can see here that there is the TAG field associated with the sl image. Tags are a way of further specifying different versions of the same image. As an example, let's pull the 7 release tag of the sl image (again, if it was already pulled during setup, Docker won't attempt to re-pull it unless it's changed since last pulled).
# if you have a M1 chip Mac, this may not work. In that case continue the following examples using sl instead of sl:7
docker pull sl:7
docker images sl
7: Pulling from library/sl
Digest: sha256:d38e6664757e138c43f1c144df20fb93538b75111f922fce57930797114b7728
Status: Downloaded newer image for sl:7
docker.io/library/sl:7
REPOSITORY TAG IMAGE ID CREATED SIZE
sl 7 5237b847a4d0 2 weeks ago 186MB
sl latest 5237b847a4d0 2 weeks ago 186MB
Question 20.1
Pull down the python:3.7-slim image and then list all of the python images along with the sl:7 image. What is the 'Image ID' of the python:3.7-slim image? Try to do this without looking at the solution.
Solution
docker pull python:3.7-slim
docker images --filter=reference="sl" --filter=reference="python"
3.7-slim: Pulling from library/python
7d63c13d9b9b: Pull complete
7c9d54bd144b: Pull complete
a7f085de2052: Pull complete
9027970cef28: Pull complete
97a32a5a9483: Pull complete
Digest: sha256:1189006488425ef977c9257935a38766ac6090159aa55b08b62287c44f848330
Status: Downloaded newer image for python:3.7-slim
docker.io/library/python:3.7-slim
REPOSITORY   TAG        IMAGE ID       CREATED       SIZE
python       3.7-slim   375e181c2688   13 days ago   120MB
sl           7          5237b847a4d0   2 weeks ago   186MB
sl           latest     5237b847a4d0   2 weeks ago   186MB
Exercise 21 - Running Docker Images
To use a Docker image as a particular instance on a host machine you run it as a container. You can run in either a detached or foreground (interactive) mode.
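Detached mode is not needed for this exercise, but as a minimal sketch (the container name my-background and the sleep command are placeholders used to keep the container alive):
docker run -d --name my-background sl:7 sleep 600   # start the container in the background
docker ps                                           # it shows up as a running container
docker stop my-background                           # stop it when you are done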
Run the image we pulled as a container with an interactive bash terminal:
docker run -it sl:7 /bin/bash
The -i option here enables the interactive session, the -t option gives access to a terminal, and the /bin/bash command makes the container start up in a bash session.
You are now inside the container in an interactive bash session. Check the working directory and its contents
pwd
ls -alh
Output
/
total 56K
drwxr-xr-x 1 root root 4.0K Oct 25 04:43 .
drwxr-xr-x 1 root root 4.0K Oct 25 04:43 ..
-rwxr-xr-x 1 root root 0 Oct 25 04:43 .dockerenv
lrwxrwxrwx 1 root root 7 Oct 4 13:19 bin -> usr/bin
dr-xr-xr-x 2 root root 4.0K Apr 12 2018 boot
drwxr-xr-x 5 root root 360 Oct 25 04:43 dev
drwxr-xr-x 1 root root 4.0K Oct 25 04:43 etc
drwxr-xr-x 2 root root 4.0K Oct 4 13:19 home
lrwxrwxrwx 1 root root 7 Oct 4 13:19 lib -> usr/lib
lrwxrwxrwx 1 root root 9 Oct 4 13:19 lib64 -> usr/lib64
drwxr-xr-x 2 root root 4.0K Apr 12 2018 media
drwxr-xr-x 2 root root 4.0K Apr 12 2018 mnt
drwxr-xr-x 2 root root 4.0K Apr 12 2018 opt
dr-xr-xr-x 170 root root 0 Oct 25 04:43 proc
dr-xr-x--- 2 root root 4.0K Oct 4 13:19 root
drwxr-xr-x 11 root root 4.0K Oct 4 13:19 run
lrwxrwxrwx 1 root root 8 Oct 4 13:19 sbin -> usr/sbin
drwxr-xr-x 2 root root 4.0K Apr 12 2018 srv
dr-xr-xr-x 13 root root 0 Oct 25 04:43 sys
drwxrwxrwt 2 root root 4.0K Oct 4 13:19 tmp
drwxr-xr-x 13 root root 4.0K Oct 4 13:19 usr
drwxr-xr-x 18 root root 4.0K Oct 4 13:19 var
and check the hostname to see that you are no longer on your local host system
hostname
<generated hostname>
Question 21.1
Check the /etc/os-release file to see that you are actually inside a release of Scientific Linux. What is the Version ID of this SL image? Try to do this without looking at the solution.
Solution
cat /etc/os-release
NAME="Scientific Linux"
VERSION="7.9 (Nitrogen)"
ID="scientific"
ID_LIKE="rhel centos fedora"
VERSION_ID="7.9"
PRETTY_NAME="Scientific Linux 7.9 (Nitrogen)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:scientificlinux:scientificlinux:7.9:GA"
HOME_URL="http://www.scientificlinux.org//"
BUG_REPORT_URL="mailto:scientific-linux-devel@listserv.fnal.gov"
REDHAT_BUGZILLA_PRODUCT="Scientific Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
REDHAT_SUPPORT_PRODUCT="Scientific Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.9"
Exercise 22 - Monitoring, Exiting, Restarting, and Stopping Containers
Monitoring Your Containers
Open up a new terminal tab on the host machine and list the containers that are currently running
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<generated id> <image:tag> "/bin/bash" n minutes ago Up n minutes <generated name>
Notice that your container has been given a randomly generated name. To make the name more helpful, rename the running container
docker rename <CONTAINER ID> my-example
and then verify it has been renamed
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<generated id> <image:tag> "/bin/bash" n minutes ago Up n minutes my-example
Specifying a name
You can also start up a container with a specific name
docker run -it --name my-example sl:7 /bin/bash
Exiting a Container
As a test, go back into the terminal used for your container, and create a file in the container
touch test.txt
In the container exit at the command line
exit
You are returned to your shell. If you list the containers you will notice that none are running
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
but you can see all containers that have been run and not removed with
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
<generated id> <image:tag> "/bin/bash" n minutes ago Exited (0) t seconds ago my-example
Restarting a Container
To restart your exited Docker container, start it again and then attach it interactively to your shell
docker start <CONTAINER ID>
docker attach <CONTAINER ID>
The exec command
The attach command used here is a handy shortcut to interactively access a running container with the same start command (in this case /bin/bash) that it was originally run with.
In case you'd like some more flexibility, the exec command lets you run any command in the container, with options similar to the run command to enable an interactive (-i) session, etc.
For example, the exec equivalent to attaching in our case would look like:
docker start <CONTAINER ID>
docker exec -it <CONTAINER ID> /bin/bash
You can start multiple shells inside the same container using exec.
Notice that your entry point is still / and then check that your test.txt still exists
ls -alh test.txt
-rw-r--r-- 1 root root 0 Oct 25 04:46 test.txt
Clean up a container
If you want a container to be cleaned up (that is, deleted) after you exit it, then run with the --rm option flag:
docker run --rm -it <IMAGE> /bin/bash
Stopping a Container
Sometimes you will exit a container and it won't stop. Other times your container may crash or enter a bad state, but still be running. In order to stop a container, exit it (exit) and then enter:
docker stop <CONTAINER ID> # or <NAME>
Exercise 23 - Removing Containers and Images
You can clean up/remove a container with docker rm
docker rm <CONTAINER NAME>
Note: A container must be stopped in order for it to be removed.
Start an instance of the sl:latest container, exit it, and then remove it:
docker run sl:latest
docker ps -a
docker rm <CONTAINER NAME>
docker ps -a
Output
CONTAINER ID     IMAGE         COMMAND       CREATED         STATUS                     PORTS   NAMES
<generated id>   <image:tag>   "/bin/bash"   n seconds ago   Exited (0) t seconds ago           <name>
<generated id>
CONTAINER ID     IMAGE         COMMAND       CREATED         STATUS                     PORTS   NAMES
You can remove an image from your computer entirely with docker rmi
docker rmi <IMAGE ID>
Question 23.1
Pull down the Python 2.7 image (2.7-slim tag) from Docker Hub and then delete it. What was the image ID for the python:2.7-slim image? Try not to look at the solution.
Solution
docker pull python:2.7-slim
docker images python
docker rmi <IMAGE ID>
docker images python
2.7-slim: Pulling from library/python
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
<some numbers>: Pull complete
Digest: sha256:<the relevant SHA hash>
Status: Downloaded newer image for python:2.7-slim
docker.io/library/python:2.7-slim
REPOSITORY   TAG        IMAGE ID       CREATED        SIZE
python       2.7-slim   eeb27ee6b893   14 hours ago   148MB
python       3.7-slim   375e181c2688   13 days ago    120MB
Untagged: python@sha256:<the relevant SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
Deleted: sha256:<layer SHA hash>
REPOSITORY   TAG        IMAGE ID       CREATED       SIZE
python       3.7-slim   375e181c2688   13 days ago   120MB
Exercise 24 - File I/O with Containers
Copying Files To and From a Container
Copying files between the local host and Docker containers is possible. On your local host, create a file that you want to transfer to the container and then:
touch io_example.txt
# If on Mac need to do: chmod a+w io_example.txt
echo "This was written on local host" > io_example.txt
docker cp io_example.txt <NAME>:<remote path>
Note: Remember to do docker ps if you don't know the name of your container.
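For instance, if the container from the earlier exercise is still running under the name my-example, a concrete (hypothetical) version of the copy could be:
docker cp io_example.txt my-example:/root/io_example.txt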
From the container check and modify the file in some way
pwd
ls
cat io_example.txt
echo "This was written inside Docker" >> io_example.txt
Output
<remote path>
io_example.txt
This was written on local host
and then on the local host copy the file out of the container
docker cp <NAME>:<remote path>/io_example.txt .
and verify if you want that the file has been modified as you wanted
cat io_example.txt
This was written on local host
This was written inside Docker
Volume Mounting
What is more common, and arguably more useful, is to mount volumes to containers with the -v flag. This allows for direct access to the host file system inside of the container and for container processes to write directly to the host file system.
docker run -v <path on host>:<path in container> <image>
For example, to mount your current working directory on your local machine to the data directory in the example container
docker run --rm -it -v $PWD:/home/`whoami`/data sl:7
From inside the container you can ls to see the contents of your directory on your local machine
ls
and yet you are still inside the container
pwd
/home/<username>/data
You can also see that any files created in this path in the container persist upon exit
touch created_inside.txt
exit
ls *.txt
created_inside.txt
This I/O allows Docker images to be used for specific tasks that may be difficult to do with the tools or software installed on the local host machine. For example, debugging problems that arise with cross-platform software, or having a specific version of software perform a task (e.g., using Python 2 when you don't want it on your machine, or using a specific release of TeX Live when you aren't ready to update your system release).
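As a sketch of that last kind of use case, assuming the python:2.7-slim image and a placeholder script name my_legacy_script.py, you could run a legacy Python 2 script from your current directory without installing Python 2 on the host:
docker run --rm -it -v $PWD:/work -w /work python:2.7-slim python my_legacy_script.py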
Mounts in Cygwin
Special care needs to be taken when using Cygwin and trying to mount directories. Assuming you have Cygwin installed at C:\cygwin and you want to mount your current working directory:
echo $PWD
/home/<username>/<path_to_cwd>
You will then need to mount that folder using:
-v /c/cygwin/home/<username>/<path_to_cwd>:/home/docker/data
Exercise 25 - Using Singularity on lxplus
So far we've only discussed using Docker images and the Docker runtime. For a variety of reasons, Docker is not ideal for use on machines like lxplus, but luckily Singularity is. Therefore, this next section will cover how to run Docker and Singularity images in a Singularity runtime environment.
Before we go into any detail, you should be aware of the central CMS documentation.
Running custom images with Singularity
As an example, we are going to run a container using the ubuntu:latest image. Begin by logging into lxplus:
ssh -Y <username>@lxplus.cern.ch
Before running Singularity, you should set the cache directory (i.e. the directory to which the images are pulled) to a place outside your $HOME/AFS space (here we use the /tmp/<user> directory):
export APPTAINER_CACHEDIR="/tmp/$(whoami)/Singularity"
singularity shell -B $HOME -B /tmp/$(whoami)/ -B /cvmfs docker://ubuntu:latest
# try accessing cvmfs inside of the container
source /cvmfs/cms.cern.ch/cmsset_default.sh
INFO: Environment variable SINGULARITY_CACHEDIR is set, but APPTAINER_CACHEDIR is preferred
INFO: Converting OCI blobs to SIF format
INFO: Starting build...
Getting image source signatures
Copying blob 2ab09b027e7f done
Copying config 08d22c0ceb done
Writing manifest to image destination
Storing signatures
2023/04/22 14:05:16 info unpack layer: sha256:2ab09b027e7f3a0c2e8bb1944ac46de38cebab7145f0bd6effebfe5492c818b6
INFO: Creating SIF file...
INFO: underlay of /etc/localtime required more than 50 (69) bind mounts
If you are asked for a docker username and password, just hit enter twice.
One particular difference from Docker is that the image name needs to be prepended with docker:// to tell Singularity that this is a Docker image. Singularity has its own registry system, which doesn't have a de facto default registry like Docker Hub.
As you can see from the output, Singularity first downloads the layers from the registry and then unpacks them into a format that can be read by Singularity, the Singularity Image Format (SIF). This is a somewhat technical detail, but it is different from Docker. It then unpacks the SIF file into what it calls a sandbox, the uncompressed image files needed to make the container.
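A related option, not used in this exercise, is to perform the conversion explicitly with singularity pull, which stores the SIF file so it can be reused directly (the output file name ubuntu_latest.sif is just a choice):
export APPTAINER_CACHEDIR="/tmp/$(whoami)/Singularity"
singularity pull ubuntu_latest.sif docker://ubuntu:latest   # download and convert once
singularity shell -B $HOME ubuntu_latest.sif                # start a shell from the stored SIF file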
-B (bind strings)
The -B option allows the user to specify paths to bind to the Singularity container. This option is similar to -v in Docker. By default paths are mounted as rw (read/write), but they can also be specified as ro (read-only); see the example below.
You must bind any mounted file systems to which you would like access (i.e. nobackup).
If you would like Singularity to run your .bashrc file on startup, you must bind mount your home directory.
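For example, to make /cvmfs available read-only inside the container (a sketch using the same image as above):
singularity shell -B $HOME -B /cvmfs:/cvmfs:ro docker://ubuntu:latest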
In the next example, we are executing a script with singularity using the same image.
export APPTAINER_CACHEDIR="/tmp/$(whoami)/Singularity"
echo -e '#!/bin/bash\n\necho "Hello World!"\n' > hello_world.sh
singularity exec -B $HOME -B /tmp/$(whoami)/ docker://ubuntu:latest bash hello_world.sh
exec
vs.shell
Singularity differentiates between providing you with an interactive shell (
singularity shell
) and executing scripts non-interactively (singularity exec
).
Saving the Singularity Sandbox
You may have noticed that singularity caches both the Docker and SIF images so that they don’t need to be pulled/created on subsequent Singularity calls. That said, the sandbox needed to be created each time we started a container. If you will be using the same container multiple times, it may be useful to store the sandbox and use that to start the container.
Begin by building and storing the sandbox:
export APPTAINER_CACHEDIR="/tmp/$(whoami)/Singularity"
singularity build --sandbox ubuntu/ docker://ubuntu:latest
INFO: Starting build...
Getting image source signatures
Copying blob d72e567cc804 skipped: already exists
Copying blob 0f3630e5ff08 skipped: already exists
Copying blob b6a83d81d1f4 [--------------------------------------] 0.0b / 0.0b
Copying config bbea2a0436 done
Writing manifest to image destination
Storing signatures
2020/09/28 00:14:16 info unpack layer: sha256:d72e567cc804d0b637182ba23f8b9ffe101e753a39bf52cd4db6b89eb089f13b
2020/09/28 00:14:17 warn xattr{etc/gshadow} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
2020/09/28 00:14:17 warn xattr{/uscms_data/d2/aperloff/rootfs-7379bde5-0149-11eb-9685-001a4af11eb0/etc/gshadow} destination filesystem does not support xattrs, further warnings will be suppressed
2020/09/28 00:14:38 info unpack layer: sha256:0f3630e5ff08d73b6ec0e22736a5c8d2d666e7b568c16f6a4ffadf8c21b9b1ad
2020/09/28 00:14:38 info unpack layer: sha256:b6a83d81d1f4f942d37e1f17195d9c519969ed3040fc3e444740b884e44dec33
INFO: Creating sandbox directory...
INFO: Build complete: ubuntu/
Once we have the sandbox we can use that when starting the container. Run the same command as before, but use the sandbox rather than the Docker image:
export APPTAINER_CACHEDIR="/tmp/$(whoami)/Singularity"
singularity exec -B $HOME -B /tmp/$(whoami)/ ubuntu/ bash hello_world.sh
WARNING: underlay of /etc/localtime required more than 50 (66) bind mounts
Hello World!
You will notice that the startup time for the container is significantly reduced.
Question 25.1
What is the size of the Singularity sandbox? Hint: Use the command du -hs <sandbox>.
Key Points
Docker images are super useful for encapsulating a desired environment.
Docker images can be run using the Docker or Singularity runtimes.