Release notes for DAQ/HLT-I S/W Release tdaq-01-09-00
General notes from s/w librarian
General info
This is a general DAQ/HLT-I release intended for the ATLAS detector commissioning and the first data runs starting from April 2008.
Since tdaq-01-08-04 the externals have been changed in order to get the latest patches for LCG packages: the release is compatible with LCG54d s/w and the offline-14.0.X release branch.
Supported platforms and compilers
| System and compiler | CMTCONFIG | Compatibility list |
| Linux 2.6.9 (SLC4), gcc-3.4.x | i686-slc4-gcc34-opt | SLC4.x, RHEL4 |
| Linux 2.6.9 (SLC4), gcc-3.4.x | i686-slc4-gcc34-dbg | - |
External s/w and run-time environment tuning.
| External package & version |
| tdaq-common-01-09-03 (NB: API changes in ers and eformat!) |
| dqm-common-00-04-02 |
| LCG 54d |
| Java Runtime Environment 1.6.0 |
Release distribution
This release is distributed in RPM format, along with all required dependencies.
Development environment
Versions and paths of the external s/w used are defined in the TDAQExternal package.
Tools needed for development:
- The default compiler is gcc-3.4.x on SLC4; there is no need to install it additionally.
- CMT v1r20p20070720 (installed with RPM in <inst_root>/CMT/v1r20p20070720)
- JDK 1.6.0 (installed with RPM in <inst_root>/sw/lcg/external/Java/JDK/1.6.0)
Important changes overview (see detailed notes below)
- new Run Control
- setup (infrastructure initialization, h/w and PMG testing) functionality is merged with the main run controller functionality and is now distributed
- setup segments are moved from the release installation area to the databases area
- dal/DFConfiguration: schema changes
- tmgr: schema changes
Packages and tags used in the release
| ac | v4r3p1 |
| AccessManager | AccessManager-00-05-10 |
| clips | clips-06-10-06 |
| cmdl | cmdl-01-04-14 |
| cmem_rcc | v2r0p23 |
| coca | coca-01-05-00 |
| config | config-01-16-02 |
| coral_auth | coral_auth-01-02-00 |
| dal | tdaq-01-09-00_patches-01 |
| DataflowPolicy | v1r6p0 |
| dccommon | v1r1p12 |
| dcmessages | tdaq-01-09-00_patches_01 |
| ddc | ddc-05-04-021 |
| DFConfiguration | v8r5p18 |
| DFDebug | v2r0p19 |
| DFExceptions | v3r0p0 |
| DFInfo | v2r2p21 |
| DFM | v2r18p13 |
| DFMessage | v2r0p0 |
| DFRelease | DFnightly-00-00-43 |
| DFSubSystemItem | v6r3p4 |
| DFTests | v2r1p6 |
| DFThreads | v2r4p0 |
| dqm_config | dqm_config-00-01-02 |
| dqmf | dqmf-00-01-07 |
| dvs | dvs-00-31-11 |
| ed | ed-00-02-16 |
| efd | efd-01-13-11 |
| efio | v2r6p9 |
| emon | tdaq-01-09-00_patches_01 |
| errorRecovery | v3r2p16 |
| ErrorReporting | v3r0p1 |
| FarmTools | FarmTools-01-05-03 |
| file_sampler | tdaq-01-09-00-p3 |
| gatherer | v8r5p81 |
| genconfig | genconfig-02-08-01 |
| gnam | gnam-010900-03-02-06 |
| gnamDummyLib | gnamDummyLib-01-01-01 |
| histmon | histmon-00-02-20 |
| igui | tdaq-01-09-00_patches_03 |
| instrumentation | v1r3p6 |
| io_rcc | v2r0p41 |
| ipc | ipc-04-07-01 |
| is | is-05-04-00 |
| ispy | ispy-00-00-11 |
| ktidbexplorer | HEAD |
| l2dummy | v1r8p10 |
| l2pu | tdaq-01-09-00_patch_03 |
| l2rh | tdaq-01-09-00_patch_03 |
| L2streamTest | |
| l2sv | v1r14p6 |
| ls | ls-01-02-02 |
| mda | mda-03-00-00 |
| MonaIsa | tdaq-01-09-00-p5 |
| MonGatherer | MonGatherer-02-07-00 |
| mrs | mrs-01-09-04 |
| msg | v1r1p7 |
| msgconf | v1r2p24 |
| msginput | v1r2p2 |
| msgsctp | v1r0p3 |
| msgtcp | v1r1p5 |
| msgudp | v1r0p48 |
| oh | oh-00-00-78 |
| ohp | ohp-02-04-04 |
| ohpplugins | ohpplugins-00-00-08 |
| oks | tdaq-01-09-00_patches_01 |
| oks2cool | oks2cool_tdaq-01-09-00_01 |
| oks2coral | tdaq-01-09-00_patches_01 |
| oksconfig | oksconfig-02-02-00 |
| OMD | tdaq-01-09-00_p5 |
| omni | omni-04-08-07 |
| onasic | onasic_tdaq-01-08-09_00 |
| OnlinePolicy | online-00-22-23 |
| OnlineRecovery | OnlineRecovery-02-01-11 |
| OnlineRelease | nightly-00-00-53 |
| opmon | opmon-00-00-07 |
| owl | owl-00-00-26 |
| PackageID | v1r0p19 |
| PartitionMaker | PartitionMaker-06-04-23 |
| PmgGui | PmgGui-00-00-06 |
| ProcessManager | ProcessManager-01-02-26 |
| pt | v4r4p5 |
| ptdummy | v3r10p5 |
| PTIO | PTIO-03-08-01 |
| queues | v1r0p16 |
| rcc_corbo | v2r0p4 |
| rcc_error | v2r0p5 |
| rcc_rodbusy | v2r0p8 |
| rcc_time_stamp | v2r0p6 |
| rcdal | rcdal-00-01-11 |
| RCDBitString | v1r4p1 |
| RCDExampleConfiguration | v2r1p1 |
| RCDExampleModules | v2r3p3 |
| RCDExampleTriggers | v0r3p1 |
| RCDJtagChain | v1r2p0 |
| RCDLtp | v1r9p2 |
| RCDLtpi | v1r9p3 |
| RCDLtpiModule | v1r9p4 |
| RCDLTPModule | v1r9p2 |
| RCDMenu | v1r3p0 |
| RCDModuleDesign | v2r2p2 |
| RCDTtc | v1r4p0 |
| RCDUtilities | v1r3p0 |
| RCDVme | v1r6p1 |
| RCInfo | RCInfo-00-01-00 |
| RCUtils | tdaq-01-09-00_05 |
| rdb | tdaq-01-09-00_patches-01 |
| rdbconfig | tdaq-01-09-00_patches_01 |
| rm | rm-02-02-04 |
| rn | rn-01-02-05 |
| robin_ppc | v0r0p77 |
| RobinTestSuite | v2r1p21 |
| RODBusy | v1r9p2 |
| RODBusyModule | v1r9p2 |
| roib | v2r7p2 |
| ROSApplication | tdaq-01-09-00_patch01 |
| ROSBufferManagement | v2r3p0 |
| ROSCore | tdaq-01-09-00_patch01 |
| rose | tdaq-01-09-00_patch_03 |
| ROSEventFragment | v2r1p20 |
| ROSEventInputManager | v2r1p3 |
| ROSfilar | v2r0p35 |
| ROSGetInput | v2r0p4 |
| ROSInterruptScheduler | v0r0p7 |
| ROSIO | tdaq-01-09-00_patch01 |
| ROSMemoryPool | v2r2p0 |
| ROSModules | v2r7p17 |
| ROSMonitor | v1r1p2 |
| ROSObjectAllocation | v2r0p1 |
| ROSRCDdrivers | ROSRCDdrivers-00-00-36 |
| ROSRobin | v0r1p56 |
| ROSslink | v2r0p11 |
| ROSsolar | v2r0p48 |
| ROSUtilities | v2r4p0 |
| RunController | RunController-01-06-18 |
| setup | |
| SFI | v4r7p32 |
| SFIOEmulators | SFIOEmulators-00-06-04 |
| SFO | v2r12p24 |
| siom | v1r2p6 |
| sysmon | v2r2p13 |
| sysmonapps | tdaq-01-09-00_patch_03 |
| system | system-00-00-12 |
| TDAQExternal | TDAQExternal-00-10-00 |
| TDAQPolicy | TDAQPolicy-00-07-08 |
| thread_allocator | v1r0p7 |
| tidb2 | tidb2_tdaq-01-09-00_01 |
| tmgr | tmgr-01-04-11 |
| training | training-00-04-05 |
| transport | v1r0p57 |
| TriP | TriP-00-00-12 |
| ttcpr | v2r2p0 |
| TTCviModule | v1r9p2 |
| vme_rcc | v2r0p48 |
| wmi | wmi-00-01-07 |
| xmext | xmext-01-02-09 |
Changes in packages (in ABC order)
AccessManager, config, dal, ddc, DFConfiguration, dqm_config, dqmf, dvs, errorRecovery, gatherer, gnam,
histmon, igui, is, l2pu, l2rh, ls, MonGatherer, mrs, msg, msginput, oh, ohp, ohpplugins, oks, OMD,
OnlineRecovery, PmgGui, ProcessManager, ptdummy, PTIO, queues, RCDExampleModules, RCDLtpi, RCDLtpiModule,
RCDLTPModule, RCInfo, RCUtils, rdbconfig, rm, rn, RODBusyModule, ROSApplication, ROSCore, rose,
ROSEventFragment, ROSfilar, ROSInterruptScheduler, ROSIO, ROSModules, ROSRCDdrivers, ROSRobin, ROSslink,
ROSsolar, RunController, setup, SFI, sysmon, sysmonapps, system, tmgr, training, TriP, TTCviModule,
vme_rcc, wmi
AccessManager
Roles
The roles used to be defined in LDAP using a special LDAP schema developed for this purpose.
The drawback of this approach was that the operating system security mechanism (PAM) could not easily access the role information. The NIS netgroups defined in LDAP are used by several PAM modules (pam_access, for example) and, at the same time, they support aggregation, which can be used to implement the roles hierarchy.
In this TDAQ release the roles are defined using NIS netgroups in LDAP.
- The roles defined as NIS netgroups can now be used to define access control policies with various OS tools:
  - sudo: the execution of some applications can be restricted to roles
  - remote access: remote login (ssh) to nodes can be restricted to a set of roles
- The roles hierarchy is defined in LDAP using the netgroups inclusion mechanism.
Policies
The Policy Access Point code has been reviewed and one of the changes is that the roles hierarchy is looked up in LDAP.
Changes and additions:
- The PMG policy has one more attribute to specify whether the requestor is also the owner of the process to be accessed.
- There is a new resource category (Operating System) to be used for various policies regarding OS specific tools (e.g., open a shell, remote login through the application gateway).
Server
- Added the functionality to monitor the server load and to respond with SERVER_BUSY message when the load increases over a specified limit.
- When the SERVER_BUSY message is sent, a list of secondary AM servers is attached to the message so the clients may try to contact the other servers to get a valid response for their authorization request.
Client API (Java and C++)
The API accepts a list of AM servers to be contacted: the servers provided in the environment variables are checked one after the other until one of them responds with a valid answer.
If a server responds with a "server busy" message and provides a list of servers to be contacted, then this list is used before continuing to iterate through the client's own list of AM servers.
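The failover order described above can be sketched as follows. This is only an illustration of the iteration logic, not the AccessManager client API: AmReply, AskServer and authorize are hypothetical names, and the way a request is actually sent to a server is hidden behind a callback.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical reply of an AM server to an authorization request.
struct AmReply {
    enum Status { GRANTED, DENIED, SERVER_BUSY, UNREACHABLE } status;
    std::vector<std::string> secondary_servers;  // meaningful only with SERVER_BUSY
};

// Hypothetical transport: sends the request to one server and returns its reply.
typedef std::function<AmReply(const std::string&)> AskServer;

// Try the client's servers in order. When a server is busy, the servers it
// suggests are tried before the rest of the client's own list.
// Returns GRANTED or DENIED from the first server that gives a valid answer,
// or UNREACHABLE if no server does.
AmReply::Status authorize(const std::vector<std::string>& client_servers,
                          const AskServer& ask) {
    std::deque<std::string> queue(client_servers.begin(), client_servers.end());
    std::set<std::string> tried;
    while (!queue.empty()) {
        const std::string server = queue.front();
        queue.pop_front();
        if (!tried.insert(server).second) continue;  // already contacted
        const AmReply reply = ask(server);
        if (reply.status == AmReply::GRANTED || reply.status == AmReply::DENIED)
            return reply.status;
        if (reply.status == AmReply::SERVER_BUSY) {
            // Suggested servers go to the front, keeping their relative order.
            queue.insert(queue.begin(), reply.secondary_servers.begin(),
                         reply.secondary_servers.end());
        }
    }
    return AmReply::UNREACHABLE;
}
```

Keeping a set of already contacted servers avoids asking the same server twice when it appears both in the client list and in a suggested list.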
Tools
- The amRoles script has been replaced by the amUserRoles script. The command line arguments are backward compatible.
- The amRolesManager script has been added to administer the roles and the roles hierarchy in LDAP.
- The amServer script can set up a cronjob to archive the server logs periodically.
config
New ConfigActions Class
An action is called on a database load/reload/unload operation or on a config object modification.
As an example, it is used to implement DAL's Component::disabled() algorithm in a way transparent to users (the disabled() algorithm holds a static set that needs to be updated when the database is modified).
To add an action the user has to:
- implement a new class deriving from the ConfigAction class (defined in config/ConfigAction.h for C++ and config/jsrc/config/ConfigAction.java for Java)
- create a new action object and register it using the Configuration::add_config_action(ConfigAction * ac) method in C++ or config.Configuration.add_config_action(config.ConfigAction obj) in Java
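The two steps above follow the usual callback registration pattern, sketched below with stand-in classes. Only the add_config_action name comes from these notes; ConfigActionSketch, ConfigurationSketch and the on_database_changed() method are hypothetical, and the real virtual methods are declared in config/ConfigAction.h.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Stand-in for the ConfigAction base class (method name is hypothetical).
class ConfigActionSketch {
public:
    virtual ~ConfigActionSketch() {}
    virtual void on_database_changed() = 0;  // load, reload, unload or modification
};

// Stand-in for Configuration: keeps the registered actions and calls them.
class ConfigurationSketch {
public:
    void add_config_action(ConfigActionSketch* ac) { m_actions.push_back(ac); }
    void reload() { notify(); }  // a real reload would re-read the database first
private:
    void notify() {
        for (std::size_t i = 0; i < m_actions.size(); ++i)
            m_actions[i]->on_database_changed();
    }
    std::vector<ConfigActionSketch*> m_actions;  // not owned
};

// Example action: drop a cached set so that it is rebuilt on next use.
class ClearCacheAction : public ConfigActionSketch {
public:
    explicit ClearCacheAction(std::set<std::string>& cache) : m_cache(cache) {}
    virtual void on_database_changed() { m_cache.clear(); }
private:
    std::set<std::string>& m_cache;
};
```

In this sketch the configuration object does not own the actions, so an action has to outlive the configuration it is registered with.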
Unread All Template Objects (C++)
Added the method Configuration::unread_all_objects() to unread all template objects.
As an example, it is used when the attribute string converter reads many template objects (partition, segments, resources, applications, sw repositories) while building the conversion map. All such objects have to be unread after the conversion map has been built, since their attributes may contain variables that also need to be converted.
The existing unread_objects() method was replaced by _unread_objects(CacheBase*) so that it can be used efficiently by the new unread_all_objects().
Thread Safe (C++)
As required by the new RunControl, which accesses DAL objects simultaneously from several threads, the config and generated code have been made thread-safe.
Performance Improvements and API Changes (C++)
The performance of most DAL methods has been improved by replacing STL set and map with GNU hash_set and hash_map.
This requires changes in code using the Configuration superclasses() and subclasses() methods:
Old methods:
- const std::map<std::string, std::set<std::string> >& superclasses() const throw ()
- const std::map<std::string, std::set<std::string> >& subclasses() const
New methods:
- const config::map<config::set>& superclasses() const throw ()
- const config::map<config::set>& subclasses() const
See config/map.h and config/set.h for more information about the config::map and config::set classes.
Note that the new methods return unordered data.
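Code that relied on the alphabetical iteration order of the old std::map and std::set results now has to sort explicitly. A minimal sketch is given below; it uses std::unordered_map and std::unordered_set as stand-ins for config::map and config::set, whose real interfaces may differ, and sorted_superclasses is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Stand-ins for config::set and config::map<config::set>.
typedef std::unordered_set<std::string> ClassSet;
typedef std::unordered_map<std::string, ClassSet> ClassMap;

// Return the superclasses of one class in alphabetical order,
// or an empty vector if the class is not known.
std::vector<std::string> sorted_superclasses(const ClassMap& all,
                                             const std::string& name) {
    std::vector<std::string> result;
    const ClassMap::const_iterator it = all.find(name);
    if (it != all.end()) {
        result.assign(it->second.begin(), it->second.end());
        std::sort(result.begin(), result.end());
    }
    return result;
}
```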
dal
Visit the DAL package TWiki page: https://twiki.cern.ch/twiki/bin/view/Atlas/DaqHltDal
Algorithms
- if the Segment's default host is defined, it is used as the host for "localhost" applications (i.e. when "RunsOn" is not set) of that segment;
- remove the unused Partition:get_applications() algorithm; modify the Application:get_host() algorithm (add segment);
- add template application instance "magic number": if the TemplateApplication::Instances attribute value is 0, the number of template application instances for a given host is set equal to the number of CPU cores;
- bug fix: re-implement the Component::disabled() algorithm using iterations over auto-disabling resource sets:
  - the new algorithm calculates the full set of all disabled components when it is called for the first time;
  - on subsequent calls the result is returned by searching in the disabled set, which is efficient in terms of performance and scalability;
  - when the database is reloaded, opened, closed or modified, the disabled set is automatically cleared and re-initialized on the next call of the disabled() algorithm.
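The caching scheme of the new disabled() algorithm can be illustrated as below. This is a sketch of the compute-once, look-up-afterwards, clear-on-change idea only: DisabledCache and its callback are hypothetical, the iteration over auto-disabling resource sets is not reproduced, and no locking is shown.

```cpp
#include <cassert>
#include <set>
#include <string>

class DisabledCache {
public:
    // Hypothetical callback that builds the full set of disabled component IDs.
    typedef std::set<std::string> (*ComputeFn)();

    explicit DisabledCache(ComputeFn compute)
        : m_compute(compute), m_valid(false), m_rebuilds(0) {}

    bool disabled(const std::string& component_id) {
        if (!m_valid) {  // first call, or first call after invalidate()
            m_set = m_compute();
            m_valid = true;
            ++m_rebuilds;
        }
        return m_set.count(component_id) != 0;  // plain lookup afterwards
    }

    // To be called when the database is opened, closed, reloaded or modified.
    void invalidate() {
        m_set.clear();
        m_valid = false;
    }

    int rebuilds() const { return m_rebuilds; }

private:
    ComputeFn m_compute;
    std::set<std::string> m_set;
    bool m_valid;
    int m_rebuilds;
};
```

The rebuilds() counter exists only to make the behaviour observable in a test; a production version used from several threads would also need a mutex around the cache.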
Schema Changes
- remove the Partition:get_applications() algorithm;
- modify the Application:get_host() algorithm (add segment);
- remove the ControlledByOnline attribute from the BaseApplication class;
- remove the Java dal.Algorithms.get_template_applications() algorithm since no one is using it (the same will be done in C++ as soon as the opmon package is fixed);
- add abstract TestableObject class;
- modify the MasterTrigger relationship in the Partition class;
- add MasterTrigger class;
- split the InfrastructureApplication class into the normal and the template branches:
  - InfrastructureApplication is derived from abstract InfrastructureBase;
  - InfrastructureTemplateApplication is derived from abstract InfrastructureBase;
Changes in dal_create_sw_repository
Changes in dal_dump_apps
- remove the option -c host-id to provide the ID of the partition's default computer object (this is not supported by the Partition::get_all_applications() algorithm used);
- change the meaning of the -g seg-id option: when defined, print all applications of this segment (previously only template applications).
ddc
Introduction
The complete documentation for the DAQ - DCS Communication package
may be found at CERN/EDMS as https://edms.cern.ch/document/684955/5.3
General changes
The DIM library version 17 release 1 is installed, in which a series of DIM problems is fixed.
Extension of the Command Transfer facilities
The DDC Commander has been moved to the new control interface of the TDAQ Run Controller (this does not affect DDC users).
The facility of sending different non-transition commands (i.e. addressed to different PVSS datapoints) in parallel is introduced. In order to enable this facility:
- The additional command line option -B YES of the ddc_ct_dim application should be used. The default option is NO.
- The optional last parameter (bool parallelizm) of the ddcExecCommand() call must be set to true.
Note that with these settings an NT-command will be sent to DCS for execution independently of both the NT-commands addressed to other PVSS DIM RPC services and any transition command. If the required service is occupied by another command of the same client (user process), the incoming command will be queued. A command from another client will be rejected in this case.
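The queue-or-reject rule for a single PVSS DIM RPC service can be written as a small decision function. This is a sketch under stated assumptions, not part of the DDC API: Decision and dispatch are hypothetical names, and an occupied service is modelled simply by the name of the client whose command it is executing.

```cpp
#include <cassert>
#include <string>

enum Decision { SENT, QUEUED, REJECTED };

// 'busy_owner' is the client whose command currently occupies the service,
// or an empty string when the service is free (a modelling assumption).
Decision dispatch(const std::string& busy_owner, const std::string& client) {
    if (busy_owner.empty()) return SENT;              // service is free
    return busy_owner == client ? QUEUED : REJECTED;  // same client waits, others fail
}
```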
Extension of the DDC Message Transfer facility
By default, the DDC text variables declared as messages for TDAQ will not be passed to TDAQ immediately after the DDC-MT application starts, but only on change. In order to keep passing data immediately after DDC starts, one has to set the string attribute SendWhileConnection to "YES" in the DDC configuration (the schema of the DdcTextMessage class has been extended).
No changes in the Data Transfer application
DFConfiguration
Configuration Schema Changes since tdaq-01-08-04
SFO
- Attribute AllStreamsTypes added to class SFOConfiguration (vector of the stream type strings for which you don't want multiple data streams)
- Attribute LumiBlockTimeout added to class SFOConfiguration
- Relationship to class SFODBConnection added to class SFOConfiguration
- New SFODBConnection class added
EFD
- Created base class EFD_IOTask for connection to external components (inherited by EFD_InputTask and EFD_OutputTask)
  - the parameter mask is a binary veto mask that can be used to mask out SFIs (SFOs) associated with the parent EFD_Application. E.g.:
    - the value 0xFFFE (i.e. ...111111110) masks out all the SFIs (SFOs) but the 1st one
    - the value 0x4 (i.e. 100) masks out the 3rd SFI (SFO)
  - the parameter connectionMultiplicity defines the number of parallel connections between the EFD and each SFI (SFO)
- Created base class EFD_Routing for internal routing based on the eformat::StreamTag::Type field (inherited by EFD_InputTask and EFD_ExtPTsTask)
  - IndexOfPhysicsTag: index of the next task to which events of type physics are forwarded.
  - IndexOfCalibrationTag: index of the next task to which events of type calibration are forwarded.
  - IndexOfDebugTag: index of the next task to which events of type debug are forwarded.
  - IndexOfReservedTag: index of the next task to which events of type reserved are forwarded.
- Removed the ExtPTsTask::answer parameter. The routing is now based on eformat::StreamTag::Type and is configured via the EFD_Routing class.
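The veto mask examples given for EFD_IOTask can be checked with a few lines of bit arithmetic. The sketch below assumes that bit n of the mask (counting from the least significant bit) refers to the (n+1)-th SFI or SFO and that the mask fits in 32 bits; is_masked_out and usable are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// True if the SFI (or SFO) at the given zero-based position is vetoed.
bool is_masked_out(std::uint32_t mask, unsigned index) {
    assert(index < 32);  // assumed mask width
    return ((mask >> index) & 1u) != 0;
}

// Zero-based positions of the SFIs (SFOs) that remain usable out of 'count'.
std::vector<unsigned> usable(std::uint32_t mask, unsigned count) {
    std::vector<unsigned> result;
    for (unsigned i = 0; i < count; ++i)
        if (!is_masked_out(mask, i)) result.push_back(i);
    return result;
}
```

With 16 SFIs, usable(0xFFFE, 16) keeps only position 0 (the 1st SFI), and with 4 SFIs, usable(0x4, 4) keeps every position except 2 (the 3rd SFI).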
ptdummy
- In class PtDummySteering, the attributes tagName and tagType are replaced by streamtags: it is a vector of strings encoded according to the syntax "streamName@streamType:ObeyLumiBlock". If not set, the default is a single entry: "ptdummy@physics:1"
- In class PtDummySteering, added attribute nbrPscErrors, used to generate dummy PSC error words for debugging purposes. Default value is 0.
- In class PtDummySteering, added attribute partialEventPercentage, which can be used to generate partial events. E.g. if it is set to 10, the ptdummy steering will return a list of RobIDs containing only 10% of the ROBs. Default value is 100 (100%)
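One streamtags entry can be split into its three fields as sketched below. The StreamTag struct and parse_stream_tag are hypothetical names, and the ObeyLumiBlock flag is assumed to be written as 0 or 1, which these notes show only for the default entry.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

struct StreamTag {
    std::string name;
    std::string type;
    bool obey_lumi_block;
};

// Parse "streamName@streamType:ObeyLumiBlock".
StreamTag parse_stream_tag(const std::string& text) {
    const std::string::size_type at = text.find('@');
    const std::string::size_type colon = text.rfind(':');
    if (at == std::string::npos || colon == std::string::npos || colon < at)
        throw std::invalid_argument("expected name@type:flag, got '" + text + "'");
    const std::string flag = text.substr(colon + 1);
    if (flag != "0" && flag != "1")  // assumed encoding of the flag
        throw std::invalid_argument("ObeyLumiBlock must be 0 or 1 in '" + text + "'");
    StreamTag tag;
    tag.name = text.substr(0, at);
    tag.type = text.substr(at + 1, colon - at - 1);
    tag.obey_lumi_block = (flag == "1");
    return tag;
}
```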
Information Service Info Description Changes since tdaq-01-08-04
SFO
- Attribute CurrentEventSavedRate added to class SFO (current rate of saved events, events/s)
- Attribute CurrentDataSavedRate added to class SFO (current rate of saved data, MB/s)
SFI
DFM
Configuration Data Changes (examples) since tdaq-01-08-04
dqm_config
New features:
- DQM.schema.xml changes to accommodate new features in dqm_core: Parameters with multiple input sources and Parameter input sources containing a regular expression
  - DQParameter.InputDataSource changed from single to multi-value string
  - new enumeration attribute DQParameter.InputDataType = Plain (default), RegularExpression. The attribute value is ignored in case of multiple input strings.
- The AlgorithmConfig::getReference method is able to read the reference from an OH server as well as from static ROOT files.
  - The reference histogram is read from a ROOT file:
    - path_to_root_file.root:Histogram_path_and_name
      Ex: ${TDAQ_INST_PATH}/share/data/ReferenceHistograms.root:CaloClusterVecMon/EtaPhi/EtaPhiCut0
    - path_to_root_file.root: (don't forget the ':'). In this case the reference histogram path and name are the same as those of the checked histogram.
      Ex: ${TDAQ_INST_PATH}/share/data/ReferenceHistograms.root:
  - The reference histogram is read from an OH server:
    - OH#Server_name.Provider_name.Histogram_name
      Ex: OH#Histogramming.LAr._EtaEcut0
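The reference string formats listed above can be told apart as sketched below. This is not the dqm_config implementation: ReferenceSpec and parse_reference are hypothetical, the OH form is assumed to be split at the first two dots after "OH#", and variables such as ${TDAQ_INST_PATH} are left unexpanded.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

struct ReferenceSpec {
    bool from_oh;           // true: OH server, false: ROOT file
    std::string location;   // ROOT file path, or OH server name
    std::string provider;   // OH provider name (empty for ROOT files)
    std::string histogram;  // empty means "same as the checked histogram"
};

ReferenceSpec parse_reference(const std::string& text) {
    const std::string::size_type npos = std::string::npos;
    ReferenceSpec spec = {false, "", "", ""};
    if (text.compare(0, 3, "OH#") == 0) {
        // OH#Server_name.Provider_name.Histogram_name
        const std::string::size_type first = text.find('.', 3);
        const std::string::size_type second =
            (first == npos) ? npos : text.find('.', first + 1);
        if (second == npos)
            throw std::invalid_argument("expected OH#server.provider.histogram");
        spec.from_oh = true;
        spec.location = text.substr(3, first - 3);
        spec.provider = text.substr(first + 1, second - first - 1);
        spec.histogram = text.substr(second + 1);
    } else {
        // path_to_root_file.root:Histogram_path_and_name (name may be empty)
        const std::string::size_type colon = text.rfind(':');
        if (colon == npos)
            throw std::invalid_argument("expected file.root:histogram (':' is mandatory)");
        spec.location = text.substr(0, colon);
        spec.histogram = text.substr(colon + 1);
    }
    return spec;
}
```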
Known bugs/To do:
- improve performance on IS.RunParams subscription
dqmf
New features:
- DQMF Agent:
  - add variable substitution from the database. The database examples in /dqmf/data/ are updated with this use case.
  - The DQMF agent binary is able to write the DQ status to the COOL database (patch available for tdaq-01-08-04 as well). In order to use this functionality one has to make some changes to the sub-system DQ configurations. The instructions are:
    1. Add an include statement for the "daq/sw/setup-environment.data.xml" file to the sub-system DQ segment file:
       <file path="daq/sw/setup-environment.data.xml"/>
    2. Add the "CORAL_AUTH_PATH" Variable to the ProcessEnvironment relationship of the sub-system specific DQAgent object.
    3. Add an extra command line parameter to the command line of the sub-system specific DQAgent object:
       -c oracle://ATONR_COOL;schema=ATLAS_COOLONL_GLOBAL;dbname=COMP200;
       Note that NO quotes are necessary around the connect string.
    4. Pick from the following list those names which represent your specific sub-systems:
       PIXB, PIXEA, PIXEC, SCTB, SCTEA, SCTEC, TRTB, TRTEA, TRTEC, EMBA, EMBC, EMECA,
       EMECC, HECA, HECC, FCALA, FCALC, TILBA, TILBC, TIEBA, TIEBC, MDTBA, MDTBC, MDTEA,
       MDTEC, RPCBA, RPCBC, TGCEA, TGCEC, CSCEA, CSCEC, L1CAL, L1MU, L1CTP, HLTL2,
       HLTEF, EIDB, EIDEA, EIDEC, MIDB, MIDEA, MIDEC, JETB, JETEA, JETEC, MET,
       BTGB, BTGEA, BTGEC
       and use them as the names of the respective DQRegions in your DQ configuration. The status of these regions will be written to COOL by the DQMF Agent.
Known bugs/To do:
dvs
New features:
- pmgserver log files are changed to:
  - in Point1: /var/log/pmg.log (the .err file is not produced)
  - elsewhere:
    InitialPartition_LogRoot/initial/pmg_agent_MachineName_Timestamp.out
    InitialPartition_LogRoot/initial/pmg_agent_MachineName_Timestamp.err
- application log files are changed to:
  Partition_LogRoot/Partition_Name/ApplicationName_MachineName_Timestamp.out
  Partition_LogRoot/Partition_Name/ApplicationName_MachineName_Timestamp.err
- dvs registers for variable substitution from the database.
- setup no longer uses the dvs core functionality.
errorRecovery
General changes
- No ERS Issue when an application is disabled by RunControl.
- Minor improvements, e.g., better code protection against uninitialised use.
- Replace obsolete ERS macros with package-specific ERS Issues or the ERS_LOG macro.
gatherer
The ResetAfterPublish parameter is removed from schema. Now resetting is done after each iteration.
gnam
Changes from tdaq-01-08-04 to tdaq-01-09-00
New plugin API
- Added a new entry point:
This function will be called at the stopGathering transition. Users should not update the histograms in this function because the changes will not be published.
The function has been added mainly to support a clean, final IS gathering by MonaIsa.
histmon
General changes
Updated ERS macros just to compile the package. No new functionality.
The implementation of THistRegister has been updated to reflect changes in hltinterfaces/ITHistRegister.
This means that the register can now accept all ROOT objects. It does reasonable things with TH1 (the whole family of ROOT histograms), TGraph and TGraph2D, which are already supported by offline and OH.
igui
Introduction
The IGUI (Integrated Graphical User Interface) web page is at:
http://atlas-onlsw.web.cern.ch/Atlas-onlsw/components/igui/welcome.html
General changes
- The IGUI supports the new Run Control. The igui_start option -Digui.newRC (added in tdaq-01-08-04) is no longer needed.
- A new version of the Infrastructure panel is used. The displayed information is obtained from the Setup IS server. An Infrastructure panel is available for each segment. In the final IGUI frame there is no longer a tab for the Infrastructure panel. In order to view the partition infrastructure the user has to select the Root Controller in the RunControl panel and then select the Infrastructure sub-panel.
- In the Infrastructure panel, right-clicking a component in the Infrastructure tree opens a menu with the options Ignore, Restart and Get Log. For the HW and PMG groups of components the menu contains only one command (Retest Components). For a component in the HW or PMG groups the menu contains two items (Ignore and Get Log). The Get Log option displays the log of the test application for the selected component.
- The interface with the setup component has been removed. The IGUI start-up and exit procedures have been updated according to the use of the Root Controller for infrastructure setup.
- The main command panel and the Run Control Commands panel have been modified according to the new Run Control FSM. There are 3 new states (BOOTED, GTHSTOPPED and SFOSTOPPED).
- In the RunControl Status panel, the clear error button has been replaced by a restart button. The restart button is enabled for controllers and applications whose parent controller is running.
- In the RunControl Status panel, the kill button is enabled for running controllers and applications whose parent controller is running.
- In the Settings menu, a switch has been added to control the display of the applications of the Online segment (Setup applications) in the RunControl panel tree. At IGUI start, the Online segment applications are not displayed.
- In the Segment & Resource panel, it is possible to see whether an enabled Resource Set has a disabled child in its sub-tree. In this case, the text color of the Enabled label is red (not black as for an enabled Resource Set without disabled children).
- In the Monitoring and Data Set Tags panel the configuration information is obtained using Java DAL.
- The new flag logic is used for the Recording Enable/Disable.
- The search for the ROS/RCD recording application has been updated according to the DataFlow schema changes.
- For the DataFlow panel, the subscription to the DF IS servers is done at the INITIAL to CONFIGURED transition. The unsubscription is done at the CONFIGURED to INITIAL transition and at the Shutdown command.
- An updated ELOG - IGUI interface is used. Only the ATLAS type elog is supported. In the Comments field, the user is asked to give some information on the Run goal (at the start of the run) and on the Stop reason (at the stop of the run). The Component Affected selection boxes have been removed. Two new pieces of information are automatically inserted in the ELOG: the TTC partitions list and the trigger keys.
- The subscription to the MRS server is done earlier, to allow the display of messages during basic infrastructure setup.
- For an IGUI in Status Display mode, if the partition infrastructure (ipc_server) is stopped, the displayed RUN CONTROL STATE will be ABSENT. If the partition is restarted (the Setup IS server is running again), the IGUI re-connects to the new infrastructure.
- In the RunParams panel the Run Types information is obtained using Java DAL.
- In the RunParams panel, spaces are not allowed in the Filename Tag.
- The IS Logger button has been removed.
- In the IguiPanel class (used by user panels), two new methods have been added (connectSpecificInfrastructure() and disconnectSpecificInfrastructure()). They can be used for specific segment infrastructure related operations.
- An updated version of the Online Help is available.
Advice
- If your panel subclasses IguiPanel, there are some methods that can be very useful to you:
  - public boolean panelSelected(): called every time the panel is selected (you could use it to subscribe to IS);
  - public boolean panelDeselected(): called every time the panel is de-selected (you could use it to unsubscribe from IS);
  - public boolean iguiExit(): called when the IGUI exits (very useful for doing some cleanup and/or removing IS subscriptions);
  - public boolean accessControlChanged(int newStatus): called every time the access control status changes (to know the current status you can use the accessControlStatus field of the IguiFrame (i.e., the main IGUI) instance);
  - public boolean runStateChanged(String newState): called every time the run state changes (you could use this method to keep a permanent subscription to IS, implementing it for the right state);
  - public boolean connectSpecificInfrastructure(): called at the end of the transition NONE to BOOTED;
  - public boolean disconnectSpecificInfrastructure(): called at the end of the transition BOOTED to NONE.
- Due to the new Run Control structure, the connectInfrastructure() method of user panels may be called before the IS servers are started. If this is the case, one of the previously described methods should be used.
Bug fixes
- Correctly display the run control applications in the Run Control tree.
- For the Segments/Resource enable/disable commit operation, correct the error frame in the case the partition file was read-only.
- For the DataFlow panel, correct the list of info types for DF IS server subscribe/unsubscribe.
- In the auxiliary IGUI, use a separate thread to update panels after a database notification.
is
Using information object tags
The IS repository provides a way of keeping several versions of an information object. Each version has a unique 32-bit tag associated with it. Using tags it is possible to update, remove and read the value of a specific version of an information object. The IS API which provides this functionality has been slightly changed with respect to the tdaq-01-08-03(04) releases, where it was introduced for the first time.
Automatic tags assignment
When an information object is updated using the keep_history mode, a new tag is automatically created and assigned to the new version of the object. The new tag is equal to the maximum tag value for the given object plus 1. This mode can be used by calling one of the following functions with the true value for the keep_history parameter:
ISInfoDictionary::update( const std::string & name, ISInfo & info, bool keep_history = false );
ISInfoDictionary::checkin( const std::string & name, ISInfo & info, bool keep_history = false );
ISNamedInfo::checkin( bool keep_history = false );
Note that the default value of the keep_history parameter is false, which means that the old object version will be overwritten by the new one and the tag will remain unchanged.
Setting tags explicitly
It is possible to set new tags explicitly by using another form of the update (checkin) functions:
ISInfoDictionary::update( const std::string & name, int tag, ISInfo & info );
ISInfoDictionary::checkin( const std::string & name, int tag, ISInfo & info );
ISNamedInfo::checkin( int tag );
If the object version associated with the given tag already exists in the repository then it will be replaced with the new one. Otherwise a new version will be created and associated with the given tag.
Reading objects with tags
There are two groups of functions which can be used for reading objects from the IS repository. The first group contains functions which do not require a tag value to be provided:
ISInfoDictionary::getValue( const std::string & name, ISInfo & info );
ISNamedInfo::checkout( );
These functions return the most recent value of the given information object.
The other group of functions requires object tags to be provided explicitly:
ISInfoDictionary::getValue( const std::string & name, int tag, ISInfo & info );
ISNamedInfo::checkout( int tag );
These functions return the object value which is associated with the given tag. If there is no value for the given tag in the IS repository then the daq::is::InfoNotFound exception will be thrown.
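The tag rules can be illustrated with a toy in-memory model. This is not the IS implementation: TaggedRepository and its methods are hypothetical, values are plain strings instead of ISInfo objects, the first version of an object is assumed to get tag 0, and std::out_of_range stands in for daq::is::InfoNotFound.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

class TaggedRepository {
public:
    // Update with automatic tag handling.
    void update(const std::string& name, const std::string& value,
                bool keep_history = false) {
        Entry& e = m_objects[name];
        if (e.versions.empty()) {
            e.current = 0;                               // assumed initial tag
        } else if (keep_history) {
            e.current = e.versions.rbegin()->first + 1;  // maximum tag plus 1
        }
        e.versions[e.current] = value;  // otherwise overwrite, tag unchanged
    }

    // Update with an explicit tag: replaces that version or creates it.
    void update_tagged(const std::string& name, int tag, const std::string& value) {
        Entry& e = m_objects[name];
        e.versions[tag] = value;
        e.current = tag;
    }

    // Most recently written version.
    const std::string& get(const std::string& name) const { return find(name, 0); }
    // Version associated with a given tag.
    const std::string& get(const std::string& name, int tag) const {
        return find(name, &tag);
    }

private:
    struct Entry {
        std::map<int, std::string> versions;
        int current;
    };

    const std::string& find(const std::string& name, const int* tag) const {
        const std::map<std::string, Entry>::const_iterator obj = m_objects.find(name);
        if (obj == m_objects.end()) throw std::out_of_range("no such object: " + name);
        const std::map<int, std::string>::const_iterator ver =
            obj->second.versions.find(tag ? *tag : obj->second.current);
        if (ver == obj->second.versions.end())
            throw std::out_of_range("no such tag for: " + name);
        return ver->second;
    }

    std::map<std::string, Entry> m_objects;
};
```

The explicit-tag variant has a separate name here only to avoid overload ambiguity between an integer tag and the boolean flag in this small model.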
l2pu
General changes
- Update to the new dcmessages API.
- Handle transactions in the DataCollector for which the send operation fails. This condition could lead to a segmentation fault.
- All Resources (monitor objects) in the L2PU are sent to ERS_LOG during the STOP transition.
- Replace obsolete ERS macros with package-specific ERS Issues or the ERS_LOG macro.
l2rh
General changes
- The LVL2 Result can span any number (was one in unpatched tdaq-01-08-04) of message passing pages up to a total message size of 4MB.
- Use the oldest LVL1 ID to garbage collect 'old' entries in the LVL2Result store.
- Add/update monitoring attributes.
- All Resources (monitor objects) in the L2RH are sent to ERS_LOG during the STOP transition.
- Replace obsolete ERS macros with package-specific ERS Issues or the ERS_LOG macro.
- Internal updates, e.g., use std::map for result store bookkeeping.
ls
Introduction
This package replaces the obsolete logService. A new requirement that database technologies other than ORACLE be dropped came in around spring 2007. This meant rewriting the logService package, which was highly dependent on MySQL. The opportunity was taken to refactor the code, especially the log manager, which was never very user friendly. Database access in C++ is done using the CORAL interface, which hides the underlying technology. For the log manager, Java was chosen, since it brings the flexibility required to make this tool more intuitive. The resulting Java application can be run from the console, or remotely using the Java Web Start technology.
Known issues/bugs
None known.
To be implemented
Add an option to display statistics, internal and from IS.
Changes from previous release
Not many changes have been introduced. A few bugs have been fixed (such as SIGTERM not being caught) and a new utility has been added: logGetPartitionNames, which retrieves the names of the partitions currently stored in the database.
Example applications
None exist at the moment.
Applications
Log Manager
Usage: log_manager
Usage instructions are still to be written.
The Log Manager can also be run using Java Web Start technology from the link:
If possible, it is preferable to launch this tool as a Java application rather than from the link above.
lsReceiver
Description: This application subscribes to the MRS service to receive messages produced by TDAQ applications and log them in a database.
Usage: lsReceiver [-p partition-name] [-u user-name] [-n IS-server-name] [-s threshold-size] -c connect-string [-S subscribe-expression]
Options/Arguments:
-p partitionName Partition name
-u userName User name
-n ISserverName Name of the Information Service to publish the message rate into.
-c connectionString Database connection string.
Test units
logTest
Description: Test binary for the Log Receiver application.
Usage: logTest -c connect-string [-p partition-name] [-l complexity-level]
Options/Arguments:
-p partitionName Partition name
-l level Level of Complexity of the test [1: open/close - 2: tests the Log Service Infrastructure].
-c connectionString Database connection string.
Utilities
logSelect
Description: Application to retrieve log messages for a given partition according to the search criteria specified. By default, messages are dumped on std::cout.
Usage: logSelect -c connect-string -p partition-name [-i message-name] [-m machine-name]
[-a application-name] [-l time-low] [-u time-up] [-s severity]
[-x text] [-r parameters] [-d order-list] [-e max-rows] [-f offset-row]
Options/Arguments:
-p partitionName Partition name
-c connectionString Database connection string.
-i message-name Message name or ID.
-m machine-name Machine name where the message was issued.
-a application-name Application name where the message was issued.
-l time-low Lower time threshold.
-u time-up Upper time threshold.
-s severity Message severity:
1 - FATAL
2 - ERROR
3 - WARNING
4 - DEBUG
5 - INFORMATION
6 - SUCCESS
-x text Text in the message body.
-r parameters Message parameters.
-d order-list Parameter to sort the messages by.
-e max-rows Maximum number of rows to retrieve from the database; 100 by default. If 0, all entries are retrieved.
-f offset-row Offset in the table to retrieve the messages from.
logDelete
Description: Application to remove log messages for a given partition according to the search criteria specified.
Usage: logDelete -c connect-string -p partition-name [-i message-name] [-m machine-name]
[-a application-name] [-l time-low] [-u time-up] [-s severity]
[-x text] [-r parameters]
Options/Arguments:
-p partitionName Partition name
-c connectionString Database connection string.
-i message-name Message name or ID.
-m machine-name Machine name where the message was issued.
-a application-name Application name where the message was issued.
-l time-low Lower time threshold.
-u time-up Upper time threshold.
-s severity Message severity:
1 - FATAL
2 - ERROR
3 - WARNING
4 - DEBUG
5 - INFORMATION
6 - SUCCESS
-x text Text in the message body.
-r parameters Message parameters.
logGetPartitionNames
Description: Application to retrieve the list of partition names.
Usage: logGetPartitionNames -c connectionString
Options/Arguments:
-c connectionString Database connection string.
logCleanDatabase
Description: Application to clean the database by removing all the existing tables.
Usage: logCleanDatabase -c connectionString
Options/Arguments:
-c connectionString Database connection string.
MonGatherer
Implemented summation of arbitrary IS objects with ISInfoDynAny class.
If an object has some composite structure, the respective fields will be summed if they are numbers or vectors of numbers.
Other fields are not summed, instead the first observed value is assigned to the sum.
A basic check of types and dimensionality of arrays is performed.
Corrected averaging of profile histograms.
In the 'average' mode, profile histograms are considered independent measurements, so
the error of the average is smaller than the error of each histogram.
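The summation rule described above can be sketched as follows. This is only an illustration: Field, Object and accumulate are hypothetical names, and a std::variant stands in for the fields of an IS object instead of the real ISInfoDynAny class.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for one field of an IS object (not ISInfoDynAny).
using Field = std::variant<double, std::vector<double>, std::string>;
using Object = std::map<std::string, Field>;

// Add 'next' into 'sum' field by field:
// - numbers and vectors of numbers are summed,
// - any other field keeps the first observed value,
// - a type or array-size mismatch is rejected.
void accumulate(Object &sum, const Object &next)
{
    for (const auto &[name, value] : next) {
        auto it = sum.find(name);
        if (it == sum.end()) {              // first observation of this field
            sum.emplace(name, value);
            continue;
        }
        Field &target = it->second;
        if (target.index() != value.index())
            throw std::runtime_error("type mismatch in field " + name);
        if (auto *num = std::get_if<double>(&target)) {
            *num += std::get<double>(value);
        } else if (auto *vec = std::get_if<std::vector<double>>(&target)) {
            const auto &other = std::get<std::vector<double>>(value);
            if (vec->size() != other.size())
                throw std::runtime_error("size mismatch in field " + name);
            for (std::size_t i = 0; i < vec->size(); ++i)
                (*vec)[i] += other[i];
        } else {
            // Non-numeric field: keep the first observed value.
            assert(std::holds_alternative<std::string>(target));
        }
    }
}
```

Calling accumulate() once per provider leaves the numeric sums in 'sum', while text fields still hold the value of the first provider.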
mrs
General changes
New implementation of algorithm to handle messages when internal queue on private server (worker) is full.
New implementation of subscribe and is_subscribe_active methods to fix possible inconsistency between
private and public servers if private server has connectivity problems during subscription/subscription_checking operation.
New IS information provided by public and private servers
MRS server (public and private component)
- The Message Reporting System (MRS) follows a design with public and private servers. The public server
is now registered with interface name mrs/manager and server name MRS. The private servers
are not published in IPC. For MRS to work, at least one worker must be running,
started after the public server. It is the worker that processes messages and notifies receivers.
Both servers can now publish IS information about their working status.
- Public Server binary: mrs_server (new/changed are bold)
Options/Arguments:
-p partition-name partition name
-t seconds backup period in seconds
-b file-name backup file name
-r file-name restore file name
-l int set load balancing algorithm, default 0 (round-robin)
-v verbosity-level possible levels are 0, 1 (default) and 2
-i seconds IS info publishing rate period in seconds, default = 0 (no publish); if 0 < period < 5, the period is set to 5
-n IS-server-name IS server name, default RunCtrlStatistics (to publish information rest of name taken from TDAQ_APPLICATION_NAME, to retrieve information rest of name provided by worker)
-s use PMG synchronization
- Private Server binary: mrs_worker (new are bold)
Options/Arguments:
-p partition-name partition name
-D file-names database data files
-I plugin-name e.g. oksconfig (default)
-v verbosity-level possible levels are 0, 1 (default) and 2
-t seconds IS info publishing rate period in seconds, default = 0 (no publish); if 0 < period < 5, the period is set to 5
-n IS-server-name IS server name, default RunCtrlStatistics (rest of name taken from TDAQ_APPLICATION_NAME)
-a msg Threshold for internal queue in numbers of msg, default 500k messages
-b value Fraction to decrease queue size during suppression, default 0.8
-u Throttle-switch Set 1 to allow message suppression (throttling), by default turned off (set to 0)
-c msg Threshold to start message suppression (throttling) of messages, default 30 messages
-d seconds Time interval between messages for throttling reset, default 30 seconds
-S severity Severities of messages to be throttled, separated by |, default 'SUCCESS|INFORMATION|DIAGNOSTIC|WARNING|ERROR'
-e int Number of active internal threads for queue processing, default=0, max = 8
-s use PMG synchronization.
- For an example of the settings, see the MRS part of the infrastructure application in setup.data.xml
- Decoupling message receiving and processing
- In this implementation a received message is stored in an internal queue and processed in parallel
by internal threads. The queue is meant to absorb peaks, i.e. temporary rises in the number of received messages. The size
of the queue is set as a number of messages and should be scaled according to the resources of the machine on which the private server is
running. By default it is set to 500k messages (the standard use case needs no change). NEW: If the queue limit is
reached, the MRS private server starts suppressing incoming messages. Processing of the messages in the queue continues and the queue processing settings
are not changed. The maximum queue size (the queue threshold) is changed to a new value equal to the product of the initial queue threshold and the fraction
(see parameter -b above, 0.8 by default). The queue processing frees space in the queue. When the queue size reaches the new, temporarily decreased threshold,
suppression of incoming messages is stopped, suppression issues are reported and all records produced by the suppression are reset. The temporary
threshold value is then changed back to the original value.
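The threshold handling described above behaves like a hysteresis. A minimal sketch, assuming a hypothetical QueueGate class that is asked for every incoming message (the names are illustrative and not taken from the mrs_worker sources):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the queue-limit logic of a private server.
class QueueGate {
public:
    QueueGate(std::size_t threshold, double fraction)
        : m_initial(threshold), m_limit(threshold), m_fraction(fraction)
    {
        assert(fraction > 0.0 && fraction <= 1.0);
    }

    // Called for each incoming message with the current queue size.
    // Returns true if the message may be queued, false if it is suppressed.
    bool accept(std::size_t queue_size)
    {
        if (!m_suppressing && queue_size >= m_limit) {
            // Queue limit reached: start suppression, lower the threshold.
            m_suppressing = true;
            m_limit = static_cast<std::size_t>(m_initial * m_fraction);
        } else if (m_suppressing && queue_size <= m_limit) {
            // Queue drained to the lowered threshold: stop suppression
            // and restore the original threshold.
            m_suppressing = false;
            m_limit = m_initial;
        }
        return !m_suppressing;
    }

    bool suppressing() const { return m_suppressing; }

private:
    std::size_t m_initial;   // configured threshold (-a)
    std::size_t m_limit;     // threshold currently in force
    double m_fraction;       // fraction applied during suppression (-b)
    bool m_suppressing = false;
};
```

Lowering the threshold while suppression is active avoids switching suppression on and off for every message when the queue size hovers around the limit.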
- Using IS service by MRS
- The private MRS server (worker) publishes in IS if parameter -t is > 0; by default it is set to 0.
A non-zero value (it should be greater than 5) is the info publishing period in seconds.
By default the IS info is published under the server name RunCtrlStatistics.app_name, where app_name
is taken from the environment variable TDAQ_APPLICATION_NAME. The app_name must be unique
(a second private server with a name already registered at the public server will not join the public server).
The name RunCtrlStatistics can be changed with parameter -n (on the private as well as on the public MRS server).
If a proper IS server is not running at MRS service startup, the MRS public/private servers switch to IS presence test mode: they keep testing
IS server availability with a period of 3 min. Once the presence of IS is detected, the MRS service starts
publishing with the period set by the MRS parameters.
- The public MRS server retrieves information from and publishes information to the IS server. The retrieved information is used for optional load balancing.
MRS API
The sender
The receiver
The commander:
- New switch to apply changes to internal queue of private servers
- The options related to internal queue of MRS private servers
-
-t msg Threshold for internal queue in numbers of msg, default 500'000 messages
-
-b value Fraction to decrease queue size during suppression, default 0.8
- Example: Set the internal queue at private servers to 200'000 messages and the fraction to decrease the threshold to 0.7 * threshold
- mrs_commander -c 1 -t 200000 -i 0.7
- New options related to message suppression (throttling)
-u Throttle-switch 0 switch off, 1 switch on message suppression
-a msg Initial threshold for message suppression throttling, default 30 messages
-b seconds Time interval between messages for throttling reset, default 30 seconds
-S severity Severities to be throttled, separated by |, e.g. 'INFORMATION|WARNING'
- Example: Turn on message suppression with an initial threshold of 50 messages and a message reset timeout of 10 seconds, and suppress messages with severity SUCCESS and ERROR
- mrs_commander -c 0 -u -a 50 -b 10 -S "SUCCESS|ERROR"
Documentation
msg
A new default buffer manager is used that is simpler than the old
version with thread specific data, but more efficient than the
alternatives (which were never meant for production use).
class MultipleBufferManager
This implementation guarantees that the just received message is contained in a single
memory area (manipulation of the Buffer object may still lead to multiple
internal memory areas), and that there are no more size limits on the
message - the old default version had an upper limit due to the internal
structure and the maximum number of iovec structures that can be used
in a recvmsg call.
The old default buffer manager can be selected by choosing threaded
as the DFParameters.BufferManager entry. The simple
and single options should no longer be used, they were initially
only meant for testing.
msginput
Package msginput
The 'msginput' package contains various utility functions which
are used throughout the dataflow software.
class MessageHeader
The 'MessageHeader' class can be used to decode the generic
values in every message buffer. A typical use is to initialize
it with a buffer that has just been received. The various methods
return the field values:
MessagePassing::Buffer *buf = ...;
MessageInput::MessageHeader header(buf);
if (header.valid()) {
    switch (header.type()) {
    case ...:
    }
} else {
    // error message
}
class MessageDispatcher
The 'MessageDispatcher' class can be used to register callback handlers
for different message types and handle them centrally. There are two different
ways to do this: a backward compatible 'old style', where every callback
function has to inherit from the 'InputHandler' base class and implement
two methods:
void InputHandler::message(Buffer *, void *);
void InputHandler::timeout(unsigned int xid, void *);
When registering an InputHandler, an additional 'void *' cookie parameter can
be passed, which will in turn be passed back when the callbacks are executed
(this is the second 'void*' parameter in the signature above).
The new-style callbacks use the boost::function and boost::bind packages
to provide much more flexibility. Any free function, an arbitrary method
or an object with the function call operator can be used as a callback.
All it has to do is be compatible with the boost::function taking
one parameter (a 'Buffer *') for the message handlers, and no parameter
for the timeout callbacks.
Note there are no more 'cookie' parameters, and the timeout callback does
not even get the 'xid' parameter it used to get in the old scheme. The reason
is that all these additional parameters can now be specified via
boost::bind(), providing much more flexibility than before.
Some examples:
- Using a free function as callback:
void my_default_handler(Buffer *buf)
{
    std::cout << "Got a buffer !" << std::endl;
}
dispatcher->register(my_default_handler);
- Using different methods of the same object for
callbacks:
class MyClass {
public:
    MyClass()
    {
        dispatcher->register(MSG_TYPE_A, boost::bind(&MyClass::handle_msg_type_A,
                                                     this,
                                                     _1));
        dispatcher->register(MSG_TYPE_B, boost::bind(&MyClass::handle_msg_type_B,
                                                     this,
                                                     _1));
    }
private:
    void handle_msg_type_A(Buffer *buf);
    void handle_msg_type_B(Buffer *buf);
};
- Additional parameters for a handler function:
class MyClass {
public:
    class Context { /* ... */ };
    MyClass()
    {
        Context *ctx = new Context(...);
        dispatcher->register(MSG_TYPE_A, boost::bind(&MyClass::handle_msg_type_A,
                                                     this,
                                                     _1,
                                                     ctx));
    }
private:
    void handle_msg_type_A(Buffer *buf, Context *mycontext);
};
- Using separate xid and timeout handlers, passing the xid to the timeout handler:
class MyClass {
public:
    class Context { /* ... */ };
    MyClass()
    {
        unsigned int xid = ...;
        dispatcher->register(xid,
                             boost::bind(&MyClass::reply_handler,
                                         this,
                                         _1),
                             boost::bind(&MyClass::timeout_handler,
                                         this,
                                         xid),
                             10000);
    }
private:
    void reply_handler(Buffer *buf);
    void timeout_handler(unsigned int xid);
};
- Assume that you don't really care about the xid, or that you would like
to use your own internal data structures for book-keeping. You can add
parameters to the normal and timeout handlers, and bind them to the
appropriate values. The message dispatcher never cares about them:
class MyClass {
public:
    class DataRequest { /* ... */ };
    MyClass()
    {
        unsigned int xid = ...;
        DataRequest *req = new DataRequest();
        dispatcher->register(xid,
                             boost::bind(&MyClass::reply_handler,
                                         this,
                                         _1,
                                         req),
                             boost::bind(&MyClass::timeout_handler,
                                         this,
                                         req),
                             10000);
    }
private:
    void reply_handler(Buffer *buf, DataRequest *req);
    void timeout_handler(DataRequest *);
};
- Using a function object. Maybe you would like to couple the
handler to some state:
class MyCallback {
public:
    explicit MyCallback(int someState);
    void operator()(Buffer *buf)
    {
        // use 'buf' together with state
    }
private:
    // object state, that can be copied and assigned
    int m_someState;
};
dispatcher->register(MSG_TYPE_A,
                     MyCallback(112));
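The fragments above depend on the dataflow classes. To try the callback idea in isolation, the following self-contained sketch may help. It is an assumption-laden stand-in: Buffer, MiniDispatcher, add_handler and record are hypothetical names, and std::function / std::bind are used in place of boost::function / boost::bind.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are not the msginput classes.
struct Buffer {
    unsigned int type;
    std::string payload;
};

class MiniDispatcher {
public:
    using Handler = std::function<void(Buffer *)>;

    // Register a callback for one message type.
    void add_handler(unsigned int type, Handler handler)
    {
        m_handlers[type] = std::move(handler);
    }

    // Deliver a buffer; returns false if no handler is registered.
    bool dispatch(Buffer *buf)
    {
        assert(buf != nullptr);
        auto it = m_handlers.find(buf->type);
        if (it == m_handlers.end())
            return false;
        it->second(buf);
        return true;
    }

private:
    std::map<unsigned int, Handler> m_handlers;
};

// A handler with an extra parameter that the dispatcher never sees;
// it is bound by the caller when the handler is registered.
void record(Buffer *buf, std::vector<std::string> *log)
{
    log->push_back(buf->payload);
}
```

A caller would register the handler with d.add_handler(1, std::bind(record, std::placeholders::_1, &log)); the dispatcher only ever sees a callable taking a 'Buffer *'.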
class InputThread
The 'InputThread' is an additional class that helps to execute
the MessageDispatcher in a separate thread.
dispatch = new MessageDispatcher();
dispatch->register(...);
inputThread = new InputThread(dispatch);
inputThread->start();
....
inputThread->stop();
inputThread->wait(Thread::Finished);
oh
New features of the OH Display application (oh_display):
- Displays IS server name for all providers and histograms
- Items in the partitions/providers tree are always sorted
alphabetically
- Can display several histograms in the same canvas
- Automatically updates histograms which are displayed in canvas
ohp
General changes
OHP is the new Online Histogram Presenter.
For more details, please refer to the "README" file (in the ohp installation directory).
New drawing options are supported. In share directory a commented example (example.conf.xml) is provided
with instructions on how to write a configuration file for OHP.
OHP can be configured through an XML file instead of the ASCII format. XML is more flexible: for example it is possible to break the configuration into multiple files and use an "include" statement. Configuration of OHP through the old ASCII format is deprecated: ohp will still work with ASCII, but new features are compatible only with XML.
OHP has a plug-in system. Users can develop their own GUIs to extend or modify OHP functionality. The standard GUI has been migrated to a plug-in, although its appearance and usage are unchanged. You can refer to the ohpplugins package as an example of developing plug-ins. More information on the plug-in system can be found on the OHP twiki.
There are some new improvements for this TDAQ release:
- Browser is now much faster in being filled/updated
Web pages are available:
OhpMonitoring TWiki
OhpUserGuide TWiki
To be implemented/known issues:
- New GUI
- Connection to multiple OH servers
ohpplugins
General changes
This package contains common plug-ins for the OHP general enough to be used by all ATLAS sub-systems. Also specific examples from detectors are included.
The package contains the following plug-ins:
Plug-ins list
- Histo Window: Simple Plugins to display histograms in a tab
- Histo Window Tab: A set of Histo Window plug-ins organized in tabs
- Status Window: A panel with the status of OHP and information on the active session
- Tile Cosmics: This is a TileCal specific plugin that selects some histograms out of a list to be displayed.
Under Development
- Browser: An extended version of classic OHP browser
- MDI Interface: A desktop to arrange and control the plug-ins
- Histo Window RegExp: A modified version of Histo Window with support for regular expression
- Options Editor: To help modifying histogram drawing options
- Help System: Simple help system, html-based
oks
OKS Library
- reload any consistent data file (also including changes in the
included files)
- speed up OKS XML loading (about 25% faster compared with release
1.8.4)
- when reading an XML schema file, throw an exception if the base class is not
loaded
- oks query supports regular expressions (add attribute comparator
'~=')
OKS Archiving Library
- use temporal tables to create "try" incremental data version (to
reduce unnecessary overhead on Oracle stream replication as requested
by ATLAS Oracle DBA)
OKS GUI Library
- add support for mouse wheel (can be used in most dialogs of OKS
schema and data editors)
OKS Data Editor
- improvements in the Find/Replace dialog:
- optionally find by Class and Attribute/Relationship names
- present result as table
- select visible classes by name and objects by
UID (Savannah request http://savannah.cern.ch/bugs/?34890)
- see search panel at bottom of main window and object dialogs
- the search panel supports simple string search (auto-select
when the selection pattern is modified) and regular expressions (press
the button that appears
when this option is selected to apply the regular expression)
- improve performance when building a class dialog containing a large number
of objects (noticeable in 1.8.4 when the number of objects is greater than
10K)
OMD
Changes
- Configuration-File Format Changed. Use "-o" option to convert old files. See Help file
- Added "-C" option to combine multiple configuration files. See Help file.
- Custom Tables are available. Custom Tables display the selected properties of the
selected objects in a row. See Help file.
- Help file contains information about commandline parameters.
OnlineRecovery
Introduction
The OnlineRecovery package is responsible for all recovery mechanisms from the RunControl point of view. It consists of two main parts. The first is a
plug-in to the new RunController and will reproduce all recovery related behavior seen in the old RunControl (such as restart, ignore, etc). In addition it
will include some more advanced recovery mechanism and also better statistics. The second part is a stand-alone server which will handle errors with a
system wide impact and will also receive information from the RunController plug-ins.
RunController expert system
This is integrated as a plug-in to the new RunController. It receives updates directly from the controller and decides what to do in error-cases (such
as ignoring, restarting, etc). It implements the ExpertSystemInterface defined in the RunController package.
Server
The OnlineRecovery server is responsible for handling all system wide errors. Currently the automatic disabling of RODs and the notification to the
corresponding ROS has been enabled.
Core functionality
The OnlineRecovery decides what to do when an application dies, goes into the error state, fails a test, etc.
Normally the action taken will be according to the configuration settings for the specific application (IF-FAILS, IF-DIES, IF-ERROR)
with the following exceptions:
-
An application with decision set to RESTART will be restarted up to a maximum of 5 times. After this restarts will not happen and it will be considered an error instead.
If the last restart happened more than 30 min ago the counter is reset (it is also reset if the controller goes back to NONE state).
-
An application with a failed test will be re-tested up to a maximum of 3 times. After this a failed test is considered an error. If the last retest was more than
10 min ago the counter is reset.
-
If specific recovery is defined (through extra rules files) these might override the database settings.
- An application with decision IGNORE will not be ignored if it is depended upon by an application that is running and has membership IN. In this case it is considered
an error and an ERS message will be sent listing all the applications that depend upon it.
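The restart and retest limits above follow the same pattern: a bounded counter that is reset when the last attempt is old enough. A minimal sketch, assuming a hypothetical RetryBudget helper that is not part of the OnlineRecovery sources:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper modelling "at most N attempts, counter reset
// if the last attempt happened longer ago than a given interval".
class RetryBudget {
public:
    using Clock = std::chrono::steady_clock;

    RetryBudget(unsigned int max_attempts, Clock::duration reset_after)
        : m_max(max_attempts), m_reset_after(reset_after)
    {
        assert(max_attempts > 0);
    }

    // Returns true and records the attempt if one more is allowed at 'now';
    // returns false once the budget is used up (to be treated as an error).
    bool try_attempt(Clock::time_point now)
    {
        if (m_count > 0 && now - m_last > m_reset_after)
            m_count = 0;            // last attempt is old enough: reset
        if (m_count >= m_max)
            return false;
        ++m_count;
        m_last = now;
        return true;
    }

    // E.g. when the controller goes back to the NONE state.
    void reset() { m_count = 0; }

private:
    unsigned int m_max;
    Clock::duration m_reset_after;
    unsigned int m_count = 0;
    Clock::time_point m_last{};
};
```

With the values quoted above, restarts would use RetryBudget(5, std::chrono::minutes(30)) and retests RetryBudget(3, std::chrono::minutes(10)).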
Changes since tdaq-01-08-04
Please read through 'Core functionality' section for more details on changes.
- Automatic re-enabling
- Raising error instead of ignore if application is depended upon
- Maximum number of restarts and retests
- Specific recovery for L2, EF and EB have been implemented. The behavior should be the same as for the old controller.
Known bugs
- Backup-hosts are not taken into account
Utilities
A graphical utility that allows a complete view of all errors in the system (including when and where) is under development and
will be introduced as a patch when it is ready.
PmgGui
Introduction
The PmgGui package
includes GUI interfaces to the ProcessManager
system.
Changes wrt tdaq-01-08-04
PmgISPanel
- The iguiExit() method
has been implemented to make sure the IS subscription is removed at exit (this
change has an effect only when running in the main IGUI);
- Checking if the PMG IS server is ready again after a failure.
PmgControlPanel
- Added an option to the process tree to show information about
agents too.
ProcessManager
Changes since tdaq-01-08-04
General
- Using boost program_options
instead of cmdl for command
line parsing;
- Using IPCPipeline in
helper applications involving partition wide operations;
- The Handle constructors
now throw the pmg::Invalid_Handle
exception when the built Handle
is not valid;
- Added the name of the host where the application could not be
started to the pmg::Failed_Start
exception.
Server
- AMBridge: added check on
the process owner;
- ProcInteface: fixed
problems with proc fs and 64 bit machines;
- Fixed problem stopping the report threads when the server is asked
to exit;
- Added idl methods to ask the server information about itself (and
the host where it is running);
- Fixed launcher reconnection problem (bug #34183);
- Improved the way to check if the launcher is running;
- Fixed problem showing in IS Agent info more than 4GB of RAM.
Launcher
- Added the possibility to propagate the GID bit (if present) on
working and log dirs: needed to solve some log files issue at P1;
Helpers
- pmg_start_app: The current LD_LIBRARY_PATH
and PATH vars are added to the process environment only if not defined in the command line
options;
- New helper application pmg_wait_app
(it waits for an application to exit):
Return codes:
0 - The application exited
1 - Timeout elapsed while waiting
2 - Application not found
3 - Some error occurred (i.e., while contacting the pmgserver)
Command line options:
-n [ --AppName ] arg The name of the application
-p [ --Partition ] arg The name of the partition
-t [ --Timeout ] arg Time to wait in seconds (if none, wait forever)
-H [ --Host ] arg Host the application should be running on
-h [ --help ] Print help message
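A caller that launches pmg_wait_app and inspects its exit status could translate the return codes listed above as in the following sketch. The enum and function names are hypothetical; only the numeric codes come from the list.

```cpp
#include <cassert>
#include <string>

// Return codes of pmg_wait_app as listed above (enum names are hypothetical).
enum class WaitResult { Exited = 0, Timeout = 1, NotFound = 2, Error = 3 };

// Translate an exit code into a short description.
std::string describe_wait_result(int exit_code)
{
    switch (static_cast<WaitResult>(exit_code)) {
    case WaitResult::Exited:   return "application exited";
    case WaitResult::Timeout:  return "timeout elapsed while waiting";
    case WaitResult::NotFound: return "application not found";
    case WaitResult::Error:    return "error occurred";
    }
    return "unexpected exit code";
}
```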
Note:
Because of the move from cmdl
to boost program_options some
helper applications have small changes in their command line options.
If you use any of them, please check the new options using the -h switch.
ptdummy
Introduction
Dummy implementation of the HLT Steering functionality. The HLT
processing operation is emulated by a burning loop.
Changes since tdaq-01-08-04
The configuration
schema has been changed and some new functionalities have been added.
The ptdummy steering is configured via the DB class PtDummySteering:
-
burnTime Burn
time in milliseconds; not used if burnTimeDistribution is
set, def=0
-
burnTimeDistribution
Eg: "H|rootFileName|histoName|rescaleFactor" or
"F|exp(-((x-300)**2)/(2*(40**2)))|600"
-
resultSize
Size of the ef fragment (in words)
-
acceptance
Acceptance of the selection: range=0...1, def=1.
-
streamtags
Vector of StreamTags associated to
accepted events. Syntax: streamName@streamType:ObeyLumiBlock|prob,
def="ptdummy@physics:1|100". The "prob" field is used to set the
probability of the given streamTag.
-
nbrEfTriggerInfo
Number of EF trigger info words, def=0
-
nbrPscErrors
Number of PSC error words, def=0
-
partialEventPercentage
Percentage of ROBs to be selected for event stripping tests,
range="1..100", def=100
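The streamtags syntax streamName@streamType:ObeyLumiBlock|prob can be taken apart as in the sketch below. It is not the ptdummy parser: the struct and function names are hypothetical, and it assumes that ObeyLumiBlock is written as 0 or 1 and prob as an integer, as in the default value "ptdummy@physics:1|100".

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical parsed form of one 'streamtags' entry.
struct StreamTagSpec {
    std::string name;
    std::string type;
    bool obey_lumi_block;
    int probability;    // the "prob" field, as written in the string
};

// Split "name@type:obey|prob" at its three separators.
StreamTagSpec parse_stream_tag(const std::string &text)
{
    const auto npos = std::string::npos;
    const auto at = text.find('@');
    const auto colon = text.find(':', at == npos ? 0 : at);
    const auto bar = text.find('|', colon == npos ? 0 : colon);
    if (at == npos || colon == npos || bar == npos)
        throw std::invalid_argument("malformed stream tag: " + text);
    assert(at < colon && colon < bar);

    StreamTagSpec spec;
    spec.name = text.substr(0, at);
    spec.type = text.substr(at + 1, colon - at - 1);
    // std::stoi throws if a numeric field is empty or not a number.
    spec.obey_lumi_block =
        std::stoi(text.substr(colon + 1, bar - colon - 1)) != 0;
    spec.probability = std::stoi(text.substr(bar + 1));
    return spec;
}
```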
PTIO
Introduction
Define communication library between EFD and PTs
General changes
- Removed answer strings in PTIO -> EFD communication
- Use eformat::helper::streamtag class to route the events in the
EFD
- Removed old method based on answer strings
- Support for event stripping:
- The HLT steering can provide the PTIO with the list of
ROB IDs (or detector IDs) to be used by the SFO (or, in the future, by
the EFD) to strip the event. The list is written in a special event
header (DAQEvent, defined in the efio package) that wraps the eformat
event. The DAQEvent header is read by the SFO, which does the actual
stripping and removes the header before storing the event.
queues
Package queues
class ProtectedQueue
The ProtectedQueue class has a new templated clear() method which
allows you to specify a function. This function is called on each remaining
item in the queue as it is removed.
This handles the case where the user called wakeup() on the queue,
but then has to manually clear the queue because there is some additional
action to be taken for each item. At the moment the only way to do this
is to call try_get() repeatedly.
The most common case where the queue entry type is a pointer which has to be
deleted is handled by the pre-defined deleter class:
ProtectedQueue<SomeT*> queue;
// many times..
queue.put(new SomeT());
queue.wakeup();
queue.clear(ProtectedQueue<SomeT*>::deleter());
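To illustrate how a templated clear() with a per-item action can work, here is a reduced sketch. MiniQueue is a hypothetical stand-in with only put(), clear() and size(); it does not reproduce the ProtectedQueue interface (for instance wakeup() and try_get() are left out).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical stand-in for ProtectedQueue, reduced to the clear() idea.
template <typename T>
class MiniQueue {
public:
    // Pre-defined action for queues of pointers: delete each entry.
    struct deleter {
        void operator()(T item) const { delete item; }
    };

    void put(T item)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_items.push_back(item);
    }

    // Remove every remaining item and call 'action' on each one.
    template <typename Action>
    void clear(Action action)
    {
        std::deque<T> drained;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            drained.swap(m_items);
        }
        for (T item : drained)
            action(item);       // runs outside the lock
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_items.size();
    }

private:
    mutable std::mutex m_mutex;
    std::deque<T> m_items;
};
```

Any callable taking one item works as the action, for example a lambda that logs or counts the entries before they are dropped.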
RCDExampleModules
Introduction
The package is described in EDMS.
General changes since release TDAQ-01-08-04
- new scheme for publishing to IS ( no more strings ..)
Changes in API
Known bugs, problems and limitations
Currently none.
RCDLtpi
Introduction
This package contains low-level software for the Local Trigger Processor. Please see https://edms.cern.ch/document/584060/1 for further details.
General changes
New package, with parts of code taken from RCDLTPModule.
RCDLtpiModule
Introduction
This package allows the LTPi to be configured in the standard way. More details can be found on TWiki: LTPi user guide
General changes
RCDLTPModule
Introduction
This package contains RCD Software for the Local Trigger Processor. Please see ATLAS Timing Signal Distribution and https://edms.cern.ch/document/588024/1 for further details.
General changes
- restructured oks schema: RCDLTPModule now has a relationship 'Configuration' to an 'LTPConfigBase'. There is an LTPConfig class for every typical use case of the LTP.
RCInfo
Introduction
This new package contains the xml schema for several IS classes used by
the DAQ:
ChildrenStatistics (statistics about dead/restarted applications)
DAQApplicationInfo (status of an application from a "process" point of
view)
LuminosityInfo (luminosity block number)
RCStateInfo (run control state of applications)
TestInfo (information about tests executed on the DAQ system)
RunParams (basic information about the run)
StorageInfo (basic information about raw data files stored on file)
The c++ and java classes are generated dynamically.
C++ headers are installed in rc/ClassName.h and rc/ClassNameNamed.h.
The Java classes are installed as RCInfo.jar.
RCUtils
Introduction
This new package contains several utilities which used to be in the
RunControl, setup and onl_integ packages which have been removed from
the release.
As an example it contains the setup_daq script and a script to
migrate the databases from the old to the new run control.
In addition this package contains the run control plugin for the web
based monitoring (WMI).
General changes
New package.
New utilities
old2newRC.sh: script to translate databases from the old to the new run
control. (go into directory containing your database files and execute
it)
setup_daq <-h>: script to launch the DAQ. Note that the
<-i> option is not supported anymore.
The old script get_tdaq_env.sh
(onl_integ package) is substituted by the c++ utility rc_print_partition_env. From
within a bash script also the function get_partition_env ${TDAQ_PARTITION}
can be used after having sourced the file
${TDAQ_INST_PATH}/share/bin/setup_functions: this function requires
${TDAQ_DB_DATA} to be set.
rc_commander: simple graphical tool to send commands to a Controller.
rc_sendcommand <-h>: sends commands to a Controller.
rc_isread <-h>: reads the RC state of a controller from IS.
rc_waitstate <-h>: waits for a controller to reach a particular
RC state.
rc_timetest <-h>: sends a series of commands to a Controller and
measures the duration of each transition.
rc_decode_detectormask <-h>: decodes a numerical detector mask
into human readable TTC partition names.
rc_print_root <-h>: prints the name of the root controller and
the top segment.
rc_print_tree <-h>: prints the tree of controllers for a
partition.
rc_getrunnumber <-h>: gets the current run number for a
partition from IS.
rc_checkapps <-h>: dumps a tabular view of the applications
in a
partition with their basic properties.
rc_print_partition_env <-h>: extracts from the database the basic
environment for the partition.
rc_is2cool_archive <-h>: inserts basic run information into COOL.
Changes in API
No APIs.
Known bugs, problems and limitations
None.
rdbconfig
Bug fixes
In C++ fix unread/read problem for vectors (before values of attributes
were duplicated on each unread() operation instead of replacement).
New Format for plug-in parameter (C++ and Java)
Add new format for plug-in parameter: "server-name[@partition-name]", e.g. "rdbconfig:RDB@be_test" references
rdb_server with name RDB
running in partition be_test.
The old format "[partition-name::]server-name"
is also supported, e.g. "rdbconfig:be_test::RDB"
references rdb_server with name RDB
running in partition be_test.
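Both parameter formats can be recognised as in the following sketch. It is not the rdbconfig implementation: RdbTarget and parse_rdb_parameter are hypothetical names, and the function expects only the part after the "rdbconfig:" prefix.

```cpp
#include <cassert>
#include <string>

// Hypothetical result of parsing the plug-in parameter.
struct RdbTarget {
    std::string server;
    std::string partition;   // empty if no partition was given
};

// Accepts "server[@partition]" (new) and "[partition::]server" (old).
RdbTarget parse_rdb_parameter(const std::string &param)
{
    RdbTarget target;
    const auto at = param.find('@');
    const auto sep = param.find("::");
    if (at != std::string::npos) {            // new: server@partition
        target.server = param.substr(0, at);
        target.partition = param.substr(at + 1);
    } else if (sep != std::string::npos) {    // old: partition::server
        target.partition = param.substr(0, sep);
        target.server = param.substr(sep + 2);
    } else {                                  // server name only
        target.server = param;
    }
    return target;
}
```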
rm
Introduction
The last main changes in the rm package for release tdaq-01-08-04 are
described at the end of these notes because they were not included correctly
in the corresponding release notes. The main change for release tdaq-01-09-01 is
the change in the RM_Server backup facility.
RM server backup facility changes for
tdaq-01-09-01
- The API was not changed, but RM now keeps a DAL backup and a journal
instead of using a log file for backup store/restore.
- The backup file keeps all current RM data. The log file keeps all
requests that changed RM data since the backup was saved.
- A backup is saved every time a partition is started or updated,
in order to reduce the size of the journal.
- The names of the backup and the journal differ only by extension
(".journal" is appended to the backup name to form the journal file name). The name of
the corresponding RM_Server is added automatically to the backup name if it
is not already part of it. This is done in order to avoid the loss
of data if somebody starts another RM server for tests. Before a save backup
operation, the RM server keeps the existing backup and journal data in files
with the same names but with the additional extension ".old".
- The RM server saves a backup automatically every 5 hours if there have been
no requests to the RM that changed data during 12 hours.
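The file naming rules above can be illustrated with a short sketch. It is not taken from the RM_Server code: RmBackupFiles and rm_backup_files are hypothetical names, and the '.' separator used when the server name is added to the backup name is an assumption, since the notes do not say how the name is attached.

```cpp
#include <cassert>
#include <string>

// Hypothetical container for the four file names discussed above.
struct RmBackupFiles {
    std::string backup;       // current backup file
    std::string journal;      // backup name + ".journal"
    std::string old_backup;   // previous backup, kept with ".old" appended
    std::string old_journal;  // previous journal, kept with ".old" appended
};

inline RmBackupFiles rm_backup_files(const std::string& backup_name,
                                     const std::string& server_name) {
    RmBackupFiles f;
    f.backup = backup_name;
    // ASSUMPTION: the server name is appended after a '.' separator; the
    // notes only say it is added when it is not already part of the name.
    if (f.backup.find(server_name) == std::string::npos) {
        f.backup += "." + server_name;
    }
    f.journal = f.backup + ".journal";
    f.old_backup = f.backup + ".old";
    f.old_journal = f.journal + ".old";
    return f;
}

// Usage example with made-up file and server names.
inline void example_rm_backup_files() {
    const RmBackupFiles f = rm_backup_files("/tmp/rm_backup", "RM_Server");
    assert(f.backup == "/tmp/rm_backup.RM_Server");
    assert(f.journal == "/tmp/rm_backup.RM_Server.journal");
}
```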
tdaq-01-08-04 changes:
Changes in IDL
Changed:
- numberOfCores added in Computer and Computer_Resource structures
Removed:
- void registerResources (
in string partition, in string dbname) (registerResourcesFromClient
should be used instead)
- void updatePartition(in
string p) (void updatePartitionFromClient( in string p, in
daq::rm::Conf_Partition confp_id ) should be used instead.)
Added:
- Exception ProcessNotFoundE
{ string partition; string pid; string computer; }
- Methods
- daq::rm::RMHandle
requestResourceForProcess ( in string p, in daq::rm::Res_ID_list
res_list, in string c, in unsigned long pid, in string
client) raises(
daq::rm::PartitionNotFoundE, daq::rm::ObjectNotFoundE,
daq::rm::UnavailableResourcesE, daq::rm::ComputerNotFoundE,
daq::rm::ResourceNotFoundE); Here p is the partition ID, res_list is the list of resources,
c is the computer, pid is the PID of the
process, and client is
the name of the client (user account can be used). This interface can be
used in Java to request the resources in the input list for process
pid that is to be run on computer c.
- string freeProcessResources(in
string p, in string c, in unsigned long
pid) raises( daq::rm::PartitionNotFoundE,
daq::rm::ProcessNotFoundE, daq::rm::ObjectNotFoundE );
Here p is the partition ID, c is the ID of the computer and pid is the process ID. All resources
that were granted to the process that was running on computer c for
partition p will be released.
Changes in the RM_Client API
Added:
- Requests a resource for a process. This method gets the process ID
automatically from the system; this value is used by the RM server
in the resource request and stored in the RM if the resource is
granted. This value can be used to free the resource or to get information
about it. The caller of this method is responsible for freeing the resource
after use. Signature:
long requestResourceForMyProcess(
const std::string& partition, const std::string& resource,
const std::string& computer,const std::string& clientName =
"NO");
Here partition is the ID of
the partition, resource is
the resource ID, computer is the UID
of the computer and clientName is
the name of the client (user account can be used).
- Frees the resources of a process. A resource that was granted
to a
process with the same partition, resource and computer UID will be
released in the RM and can be used by other clients. Signature:
void freeProcessResources( const
std::string& partition, const std::string& computer, unsigned
long pid );
Here partition is the ID of
the partition, computer is the UID
of the computer and pid is the PID of the child
process (the one for which the resource was requested in the partition).
Removed:
- void registerPartition
(const std::string& pname, const std::string& dbname);
Please use one of the methods below
instead:
void registerMyPartition (const std::string& pname,
const std::string& dbname);
void registerMyPartition (::Configuration *config, const
daq::core::Partition *p);
- void updatePartition(
const std::string& p );
Please use this method instead:
void updateMyPartition (const std::string& pname, const
std::string& dbname)
RM_Server changes
RM_Server was upgraded in order to
support the new functionality of the RM_Client and RM_ClientExt classes.
TODO: the preparation for storing/restoring RM_Server data in a backup
using DAL is done, but it remains to be finished and added to the release as
a patch.
rm java package changes
API
changes (RMClientImpl interface) :
Added:
- Register partition by loading DB configuration data on client
side: public void
registerMyPartition(String pname, String db) throws
config.SystemException , daq.rm.ConfigException,
daq.rm.PartitionAlreadyLoadedE, RuntimeException Here
pname is the name of partition and db is the Configuration DB name.
- Update partition by loading DB configuration data on client side:
public void
updateMyPartition(String pname, String db) throws
config.SystemException , daq.rm.ConfigException,
daq.rm.PartitionNotFoundE, daq.rm.ObjectNotFoundE, RuntimeException
- Method to check whether the
RM server is running (returns TRUE if running): public boolean rmServerExists()
- Method to check whether a
partition is registered (returns TRUE if registered): public boolean partitionRegistered(String
p)
Changed:
- daq.rm.PartitionNotFoundE
exception added in method public
String getPartitionStatus(String p) throws daq.rm.PartitionNotFoundE,
RuntimeException
Removed:
- registerPartition(String
pname, String db) . Use registerMyPartition instead.
- updatePartition(String
pname, String db) . Use updateMyPartition instead.
RM GUI panel changes:
- pid and number of cores for calculative resource
added in info panel.
- rdb server list removed from register and update partition panels.
- enabling of the push button in the request resources panel after pasting
with the mouse fixed.
- partition info panel is now shown first by default when the RM GUI is
activated.
Executables changes:
- rm_free_resources_for_application:
Parameter for the process pid added: -e process_pid. The utility
is used to free RM resources that were granted to a binary to run on the
computer. If the process pid is not empty, the process resources will be
released.
- rm_request_resources:
2 parameters for test usage added:
-t Only for tests! Request resources for the
process that will be started by this binary.
-r
resource_id Used only for tests, together with the 't'
parameter. UID of the RM resource to be requested for the process.
rn
List RunNumber Info
Add a new command line option to the rn_ls utility to present results in a
user-preferred time zone and to use that time zone for timestamp inputs (-s and -t
command line options).
-z name time zone name (run with "-z list-regions" to see all supported time zones)
Example
Get info in UTC:
bash$ rn_ls -c oracle://devdb10/tdaq_dev_backup -w onlcool -s '2008-04-14 00:00:00'
=======================================================================================================================================
| Name | Num | Start At (UTC) | Duration | User | Host | Partition | Config | Comment |
=======================================================================================================================================
| nightly_check | 27326 | 2008-Apr-14 10:15:38 | 0:00:04 | mcaprini | pcatd114.cern.ch | igui_test | 21.39 | Clean stop of run. |
| nightly_check | 27325 | 2008-Apr-14 06:22:10 | 0:00:02 | atdsoft | pcatdbuild04.cern.ch | dummu | | It is just a test. |
| nightly_check | 27324 | 2008-Apr-14 02:39:12 | 0:00:02 | atdsoft | pcatdbuild04.cern.ch | dummu | | It is just a test. |
=======================================================================================================================================
Get the same info in CERN local time zone:
bash$ rn_ls -c oracle://devdb10/tdaq_dev_backup -w onlcool -s '2008-04-14 00:00:00' -z 'Europe/Zurich'
============================================================================================================================================
| Name | Num | Start At (local time) | Duration | User | Host | Partition | Config | Comment |
============================================================================================================================================
| nightly_check | 27326 | 2008-Apr-14 12:15:38 CEST | 0:00:04 | mcaprini | pcatd114.cern.ch | igui_test | 21.39 | Clean stop of run. |
| nightly_check | 27325 | 2008-Apr-14 08:22:10 CEST | 0:00:02 | atdsoft | pcatdbuild04.cern.ch | dummu | | It is just a test. |
| nightly_check | 27324 | 2008-Apr-14 04:39:12 CEST | 0:00:02 | atdsoft | pcatdbuild04.cern.ch | dummu | | It is just a test. |
============================================================================================================================================
The "Run Number" Web
interface was updated to support the user-preferred time zones.
RODBusyModule
Introduction
This package contains RCD Software for the RODBusyModule. Please see for further details.
General changes
ROSApplication
Introduction
This package contains the main program and configuration plugins for
the ROS / RCD.
Changes since tdaq-01-09-00
- In ROSApplication, we don't deal with emon or
MonitoringDataOut anymore. This is all handled by the EMonDataOut
plugin. This means the command line options -k and -v are no
longer applicable. In this release a warning will be issued if -k
or -v arguments are given on the command line, but they will
otherwise be ignored.
- In ROSDBConfig we now allow a configuration with no TriggerIn
plugin (i.e. we don't invent a NullTriggerIn if there isn't a plugin
specified in the database).
- In ROSDBConfig we now allow for 3 types of output plugin and
provide a default configuration for the "sampled" output.
- In ROSDBConfig we now allow any type of Resource to be
contained by a ReadoutModule; those that aren't
InputChannels are simply ignored instead of throwing an exception.
ROSCore
Introduction
The ROSCore package is the heart of a ROS/RCD implementation. It
includes the IOManager class that some people like to use as the name
of the whole application.
Changes since tdaq-01-08-00
Changes to the structure of the package
- New classes FragmentRequest and
FragmentBuilder have been added.
- The UserActionScheduler, SequentialInputHandler and associated
classes have been moved here from ROSModules
- The ReadoutModule base class has been moved here from
ROSModules
Functional changes in IOManager
- A new interrupt handler, the RobinInterruptHandler is now
instantiated and has its state transition methods called by
IOManager.
- The transition to using proper IS objects has been completed
and the getInfo() method is no longer called for any plugin.
- Up to 3 different DataOuts are now instantiated.
A new class FragmentRequest has been provided which handles all
the types of data request formerly handled by subclasses of
DataRequest in the ROSIO package. It takes arguments to specify
whether or not the requested event fragment should be passed to the
monitoring system, whether the data should be cleared after the
request, etc. The DataRequest class is still available, but users are
encouraged to base any new requests that involve gathering data on
the FragmentRequest class (assuming it is not sufficient as it is).
Instead of the buildFragment() method that was
implemented by each subclass of DataRequest, the FragmentRequest
calls a simple interface class, FragmentBuilder (also in this package),
which is subclassed according to the type of Fragment being
retrieved from the ReadoutModules (it always builds a
FullEventFragment).
There are now 3 instead of 2 types of DataOut: MAIN, SAMPLED
and DEBUGGING. Pointers to each are obtained via the static methods
main(), sampled() and debug(). The getSampled() and releaseSampled()
methods have been dropped; any mutex protection required should be
implemented internally within the DataOut implementations.
rose
General changes
-
Update to new dcmessages API.
-
A single handler, the DataReqHandler, now handles requests
from both LVL2 and EB(SFI).
-
All Resources (monitor objects) in the ROSE are sent to ERS_LOG
during the STOP transition.
-
Replace obsolete ERS macros with package specific
ERS Issues or the ERS_LOG macro.
-
Optionally the ROSE can emulate errors or delays for LVL2 requests.
For details on how to enable these options,
see the release notes for tdaq-01-08-04 under DFConfiguration
with the heading 'ROS emulator'.
ROSEventFragment
General changes since
release
tdaq-01-08-00
- General update to version 4.0 of the event format
- The ROS and Subdetector fragments have been removed
- FormatVersionNumber updated to 0x04000000 (except for ROD
fragments which still use format 0x03010000)
- Dummy selectionCriteriaMatch() method added to the base class.
This method will be filled with code later. Application: Monitoring of
events based on user defined event properties
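The two FormatVersionNumber values quoted above can be read as a major.minor pair. The sketch below assumes that the most significant byte holds the major version and the next byte the minor version, which is consistent with 0x04000000 being "version 4.0" but is not stated in these notes; FormatVersion and decode_format_version are hypothetical names, and the lower 16 bits are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair holding the decoded version fields.
struct FormatVersion {
    unsigned major_part;  // ASSUMPTION: bits 31..24 of the version word
    unsigned minor_part;  // ASSUMPTION: bits 23..16 of the version word
};

inline FormatVersion decode_format_version(std::uint32_t word) {
    FormatVersion v;
    v.major_part = (word >> 24) & 0xFFu;
    v.minor_part = (word >> 16) & 0xFFu;
    return v;
}

// Usage example with the two values quoted in the notes.
inline void example_decode_format_version() {
    assert(decode_format_version(0x04000000u).major_part == 4);  // new format
    assert(decode_format_version(0x03010000u).minor_part == 1);  // ROD fragments
}
```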
Changes in API
- The classes ROSFragment and SDFragment have been removed
- Some methods in FullFragment.cpp have been modified
Known bugs, problems and limitations
ROSfilar
Introduction
The documentation for this package is kept in EDMS.
General changes since release tdaq-01-07-00
- filar_slink_dst.cpp: flag added to ignore errors in status
word
Changes in API
None.
Known bugs, problems and limitations
Currently none.
ROSInterruptScheduler
Introduction
Contact markus.joos@cern.ch if you need documentation for this package.
General description of this new package
- RobinInterruptCatcher: This class implements a stand-alone thread
which handles interrupts sent by Robin cards in case of the reception
of a corrupted fragment. The data of the corrupted fragment is
retrieved from the Robin and sent to a monitoring stream
- InterruptCatcher: This class was moved from ROSModules. It
handles VMEbus interrupts in a RCD context
- InterruptHandler: This class was moved from ROSModules. It
implements the interface between InterruptCatcher and RCD user code
- UserActionScheduler: This class was moved from ROSModules.
It implements a thread that calls user code in user-defined intervals
- ScheduledUserAction: This class was moved from ROSModules. It
implements the interface between UserActionScheduler and RCD user code
Known bugs, problems and
limitations
Currently none.
ROSIO
Changes since tdaq-01-08-00
All plugins have been updated to provide statistics through specific
ISInfo classes.
Trigger Plugins
- DcTriggerIn updated to reflect changes in dcmessages.
- All TriggerIns that make data requests have been updated to use
the new FragmentRequest class (see ROSCore).
- FragmentBuilder classes are provided for the TriggerIns that
make data requests
- All the old style data request classes have been removed
Output plugins
- DcDataOut no longer checks for SFI destination type since the
same data message is now used for both L2 and EF
- EmulatedDataOut no longer writes data files, it is a pure
emulation, only delaying by the specified amount.
- A new FileDataOut has been implemented to write data to files.
- The MonitoringDataOut has been re-implemented as EMonDataOut
ROSTestDC
- Updated to use the new dcmessages
- Adapted to new event format (partially)
- Data checking has been disabled since this feature has not
been updated to handle the new event format
ROSModules
Introduction
The package is described in EDMS_IOM and EDMS_RCD.
General changes since release TDAQ-01-08-00
- Some classes that were originally located in this package have
been moved to other packages (ROSCore, ROSInterruptScheduler)
- Generally: Introduction of the new interface to ISInfo
- PreloadedDataChannel: Update for eformat 4.0
- RobinDataChannel: getFragment() returns an empty fragment
(instead of "0") on "may_come"
- Bug fixes
Known bugs, problems and
limitations
Currently none.
ROSRCDdrivers
Introduction
The documentation for this package is kept in EDMS.
General changes since release tdaq-01-07-00
- robin.c:
- bug fixed with respect to driver unloading and /proc/interrupts
- some other bugs fixed
- IRQ support added
- vme_rcc.c:
Changes in API
None.
Known bugs, problems and limitations
Currently none.
ROSRobin
Introduction
If you need documentation for this package please contact
markus.joos@cern.ch
General changes since release tdaq-01-08-00
- Globally: Support for interrupts added
- robin_irq_catch: This new application was added for the testing
of the interrupts from the Robin
- The CRC word now is in the trailer of the ROB fragment. Therefore
status word #2 in the ROB header is the L1ID of the most recently
received event
Changes in API
- Robin.cpp: method waitForInterrupt() added
Known bugs, problems and limitations
Currently none.
ROSslink
Introduction
Please contact markus.joos@cern.ch if you need the documentation for
this package.
General changes since release tdaq-01-07-00
- Change related to version 4.0 of the event format
Changes in API
None.
Known bugs, problems and limitations
Currently none.
ROSsolar
Introduction
Please contact markus.joos@cern.ch if you need the documentation for
this package.
General changes since release tdaq-01-08-00
- Changes related to the new event format (4.0)
- quest_test: some bugs fixed
Known bugs, problems and limitations
Currently none.
RunController
Introduction
This package contains the Run Control for the TDAQ. It provides two
C++ interfaces which allow the user to introduce case-specific actions
carried out when a command is received: UserRoutines.h and
Controllable.h. The two interfaces have completely different purposes!
A developer shall extend from UserRoutines.h only if he is customizing
a controller at an intermediate level of the run control tree (i.e.
with child applications). He can create UserRoutines objects to be
called
at the corresponding state-transitions and commands before and/or after
they are processed by the Controller itself (i.e. transmitted to the
children).
A developer of leaf applications (run control applications without
children) shall only extend the Controllable.h interface. He creates a
Controllable object to be called
at the corresponding state-transitions and commands.
Changes from previous release
The most significant change is the complete integration of the Setup
service into the Run Controller. In fact, the Setup server has
disappeared from the TDAQ.
Extension of the Finite State Machine
The startup procedure is modified. There is an extra state called
BOOTED (NONE -> boot -> BOOTED -> initialize -> INITIAL).
The infrastructure and run control tree are started during the boot transition, while the leaf
applications are started at initialize.
There is no reason to go back to the NONE state during a data taking
session, except if there was a major error on the run control tree. The
configuration can be reloaded up to (and including) the INITIAL state.
On request from the monitoring working group the FSM has been extended
with 2 states (in the stop sequence): SFOSTOPPED and GTHSTOPPED.
The transition commands are the following:
EFSTOPPED->stopRecording->SFOSTOPPED->stopGathering->GTHSTOPPED->stopArchiving->CONNECTED.
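The transitions named in this section can be written down as a lookup table. The sketch below covers only the transitions quoted above (the real FSM has more states and commands); next_state is a hypothetical helper and is not part of the RunController API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Returns the state reached from `state` on `command`, or an empty string
// if the pair is not one of the transitions quoted in these notes.
inline std::string next_state(const std::string& state, const std::string& command) {
    typedef std::pair<std::string, std::string> Key;  // (state, command)
    static const std::map<Key, std::string> table = {
        {Key("NONE", "boot"), "BOOTED"},
        {Key("BOOTED", "initialize"), "INITIAL"},
        {Key("EFSTOPPED", "stopRecording"), "SFOSTOPPED"},
        {Key("SFOSTOPPED", "stopGathering"), "GTHSTOPPED"},
        {Key("GTHSTOPPED", "stopArchiving"), "CONNECTED"},
    };
    const std::map<Key, std::string>::const_iterator it =
        table.find(Key(state, command));
    return it == table.end() ? std::string() : it->second;
}

// Usage example: walk the extended stop sequence.
inline void example_next_state() {
    std::string s = "EFSTOPPED";
    s = next_state(s, "stopRecording");
    s = next_state(s, "stopGathering");
    s = next_state(s, "stopArchiving");
    assert(s == "CONNECTED");
}
```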
Migration to the new Run Controller
In order to migrate from the old Run Control to the new Run Controller
interface, users must follow these steps:
- In the requirements file (only applicable to TDAQ packages),
remove the statement 'use ClipsServer' and/or 'use
RunControl' and add instead 'use RunController'
- The namespace for all run control classes is now daq::rc::,
while it was a mixture of RC::, CS:: and rc::
in the past.
- The old Run Control headers must be replaced as follows:
- RunControl/Controllable.h --> RunController/Controllable.h
- RunControl/ItemCtrl.h --> RunController/ItemCtrl.h
- rc/UserRoutines.h --> RunController/UserRoutines.h
- rc/Controller.h -->
RunController/Controller.h
- RunControl/UserExceptions.h
--> RunController/UserExceptions.h
- In the databases, the Binary rc_empty_controller must be
replaced by run_controller. The script RCUtils: old2newRC.sh is
available to perform this action automatically.
- The Constructor of the Controller class has changed. The new
constructor needs the following parameters:
Controller (const std::string
& partitionName, const std::string & segmentName, const
std::string & parentName, const std::string &expertSystemRules,
UserRoutines * preUserRoutines, const std::string &
postUserRoutinesLib)
- The API of the UserRoutines class has changed:
- dropped controller method
- dropped send method
- dropped raiseError method (an ers::fatal message now automatically
causes the controller to go into the ERROR state)
- mout method dropped
- mrsReceiver method dropped
- noAction method dropped
- resetAction dropped
- clearError replaced by clearAction
- runParameter and updateRunParameter methods replaced by
getRunParameter.
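The header replacements from the migration steps above lend themselves to a small lookup, for example inside a one-off conversion tool. This is only a sketch; migrate_header is a hypothetical name and no such tool is shipped with the release as far as these notes say.

```cpp
#include <cassert>
#include <map>
#include <string>

// Maps an old Run Control header path to its RunController replacement.
// Paths that are not in the migration list are returned unchanged.
inline std::string migrate_header(const std::string& old_header) {
    static const std::map<std::string, std::string> table = {
        {"RunControl/Controllable.h", "RunController/Controllable.h"},
        {"RunControl/ItemCtrl.h", "RunController/ItemCtrl.h"},
        {"rc/UserRoutines.h", "RunController/UserRoutines.h"},
        {"rc/Controller.h", "RunController/Controller.h"},
        {"RunControl/UserExceptions.h", "RunController/UserExceptions.h"},
    };
    const std::map<std::string, std::string>::const_iterator it =
        table.find(old_header);
    return it == table.end() ? old_header : it->second;
}

// Usage example.
inline void example_migrate_header() {
    assert(migrate_header("rc/Controller.h") == "RunController/Controller.h");
}
```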
Dealing with substates
Substates can be introduced into an intermediate Controller by
extending the UserRoutines class and passing it to the Controller as a
plug-in. More on this below. Substates are performed when the FSM
transition they
are associated with is 'finished'. Before the Controller transitions to
this FSM state, it performs all the substates
(there can be one or more) by broadcasting them to the children and
waiting for these to acknowledge the termination
of a given state. Only when all the substates have been performed does
the Controller transition to the original
FSM state.
For leaf applications substates are dealt with as before, as special
types of user commands.
Operation
The Root Controller is now in charge of bootstrapping
the system. When the setup_daq script is launched, rc_setup is invoked to
start the pmgserver, IPC server, RDB server and the IS servers Setup
and RunCtrl. Once these applications are running, rc_setup exits and
the setup_daq script launches the actual Root Controller. This two-step
operation is necessary
because the Root Controller requires the above-mentioned applications
in order to start the remaining
Online Segment infrastructure applications. The main reason is
that rc_setup reads the database configuration from OKS (since no
RDB server is running yet). The Root Controller, however, must access and
subscribe to the database via RDB, and logically can only do so if this
service is already running. The Setup and RunCtrl IS servers are
required to publish the status, state, test results, etc. of the
applications being controlled by the Root Controller and any other
controller.
This information is read by the IGUI to update the information
displayed to the user.
With this new philosophy, the setup server has been removed completely
from the TDAQ system.
Each controller is in charge of starting, stopping, distributing
commands to and reacting to errors from its children (other controllers
or applications). If a Controller dies and is restarted, it goes into
the state of its children (if they are all in a consistent state); if
there are no children, it goes through all required state transitions
in order to reach the state of the parent. This mimics the behaviour of
applications that extend from the Controllable interface.
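The recovery rule for a restarted Controller can be sketched as a small decision function. This is an illustration only: recovery_target_state is a hypothetical name, states are plain strings here, and the case of children in inconsistent states is not covered by these notes, so the sketch returns an empty string for it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the state a restarted controller should end up in, or an empty
// string when the children disagree (that case is not described in the notes).
inline std::string recovery_target_state(const std::vector<std::string>& child_states,
                                         const std::string& parent_state) {
    if (child_states.empty()) {
        // No children: step through the transitions up to the parent's state.
        return parent_state;
    }
    for (std::size_t i = 1; i < child_states.size(); ++i) {
        if (child_states[i] != child_states[0]) {
            return std::string();  // inconsistent children, left unspecified
        }
    }
    return child_states[0];  // all children agree: adopt their state
}

// Usage example with the state names used elsewhere in these notes.
inline void example_recovery_target_state() {
    const std::vector<std::string> children(3, "CONNECTED");
    assert(recovery_target_state(children, "INITIAL") == "CONNECTED");
}
```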
Errors
The RunController takes hardly any decisions when errors occur.
Instead,
the corresponding errors are forwarded to the Expert System (Online
Recovery), which
takes a decision accordingly. This decision currently depends on the
settings defined in the configuration database. Three fields are
available to the user to define the behaviour: IfFails, IfDies,
IfError. Note on restart: if the option restart is selected
for any of these fields and the restart fails repeatedly, it will cause
an error instead.
Timeouts
There are currently three different timeouts:
Action timeout - used for transition and standard commands
(RESTART, STOP, etc).
For a controller, the actual transition timeout used is its
action timeout plus the highest action timeout among its
children.
Short timeout - used for killing applications.
Init timeout - after an application is started by a controller,
the application must send a pmg_sync to notify that it has started
correctly. Not doing this within the init timeout
will cause an error on the controller side. (This implies that the init
timeout should always be LESS than the ACTION timeout.) This
synchronization is handled internally for all
RunControl applications and should not be done in the user code.
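The timeout arithmetic above is simple enough to show as code. In this sketch the function names are hypothetical and the unit (seconds) is an assumption; only the rule "own action timeout plus the highest child action timeout" and the "init timeout less than action timeout" constraint come from the text.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Transition timeout actually used by a controller: its own action timeout
// plus the highest action timeout among its children (0 if it has none).
inline unsigned transition_timeout(unsigned own_action_timeout,
                                   const std::vector<unsigned>& child_action_timeouts) {
    unsigned highest_child = 0;
    if (!child_action_timeouts.empty()) {
        highest_child = *std::max_element(child_action_timeouts.begin(),
                                          child_action_timeouts.end());
    }
    return own_action_timeout + highest_child;
}

// The init timeout should always be less than the action timeout.
inline bool init_timeout_is_consistent(unsigned init_timeout, unsigned action_timeout) {
    return init_timeout < action_timeout;
}

// Usage example with made-up values in seconds.
inline void example_transition_timeout() {
    std::vector<unsigned> children;
    children.push_back(30);
    children.push_back(90);
    children.push_back(45);
    assert(transition_timeout(60, children) == 150);  // 60 + max(30, 90, 45)
}
```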
Substates
Substates are performed before the Controller transitions to a given
FSM state, that is, when all its children
have successfully moved to that state. Before the Controller actually
transitions, it propagates the user-defined substates to its children.
When all these perform the action associated to the substate, the
Controller will either
propagate the next substate if any, or terminate the original FSM
transition.
Substating is possible by extending the UserRoutines class. The user
must generate a library and pass it
to the Controller via the command line argument -u. This
facilitates the development of the substates
logic as there is no need to recompile the controller code and allows
the loading of any substate plugin by simply changing the Parameters
field of the Controller in the configuration database.
The example RunController/examples/ExampleSubstate.cc can be
taken as a starting point. In order to define substates
for, let's say, transition Configure, the user must extend
daq::rc::UserRoutines::configureAction as follows:
bool MySubstates::configureAction()
{
    if (true == currentSubstate_.empty()) {
        currentSubstate_ = "configureSubstate_1";
    }
    else if (0 == currentSubstate_.compare("configureSubstate_1")) {
        currentSubstate_ = "configureSubstate_2";
    }
    else {
        currentSubstate_ = "";
    }
    doSubstate(currentSubstate_);
    return true;
}
In this example MySubstates inherits from daq::rc::UserRoutines.
The user must keep
track of which substate is currently being handled and the order of
substates if there are more than one. In the
example above, we use the string currentSubstate_. The method daq::rc::UserRoutines::doSubstate(const
std::string& substate) is the interface to the Run Controller
to pass the next substate name. When the last substate is processed,
the user must call doSubstate("") with an empty string.
This informs the Controller that substating is finished.
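The hand-over between user code and Controller can be tried out without the TDAQ libraries. The sketch below replaces daq::rc::UserRoutines with a hypothetical stand-in (MockUserRoutines) and drives it with a hypothetical loop (run_configure_substates); how the real Controller schedules these calls is an assumption, only the doSubstate protocol is taken from the text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for daq::rc::UserRoutines; it only records the last substate
// passed to doSubstate(). The real class has a much larger interface.
class MockUserRoutines {
public:
    virtual ~MockUserRoutines() {}
    virtual bool configureAction() = 0;
    const std::string& requested() const { return requested_; }
protected:
    void doSubstate(const std::string& substate) { requested_ = substate; }
private:
    std::string requested_;
};

// Same logic as the example above, on top of the stand-in base class.
class MySubstates : public MockUserRoutines {
public:
    bool configureAction() override {
        if (currentSubstate_.empty()) {
            currentSubstate_ = "configureSubstate_1";
        } else if (currentSubstate_ == "configureSubstate_1") {
            currentSubstate_ = "configureSubstate_2";
        } else {
            currentSubstate_ = "";  // tells the Controller substating is over
        }
        doSubstate(currentSubstate_);
        return true;
    }
private:
    std::string currentSubstate_;
};

// Hypothetical driver: keeps calling configureAction() and collects the
// substates until an empty name signals the end.
inline std::vector<std::string> run_configure_substates(MockUserRoutines& routines) {
    std::vector<std::string> performed;
    for (;;) {
        routines.configureAction();
        if (routines.requested().empty()) break;
        performed.push_back(routines.requested());
    }
    return performed;
}

// Usage example: two substates are performed, in order.
inline void example_run_configure_substates() {
    MySubstates routines;
    const std::vector<std::string> performed = run_configure_substates(routines);
    assert(performed.size() == 2);
}
```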
Known issues/bugs
None
To be implemented
The Controller that launches the initial applications in the
rc_setup binary is reactive to the
Online Recovery. In other words, it waits for the latter to 'tell it'
what to do: start, test, exit, etc.
However, the Expert System should be reactive and not active.
Therefore, the controller in rc_setup should be the one that
drives this initial bootstrapping, updating the Online Recovery as
necessary.
Example applications
None exist at the moment.
Applications
rc_setup
Description: Binary to start the basic infrastructure needed by the Root Controller.
Options/Arguments:
-p partition Name of the IPC partition (default $TDAQ_PARTITION)
-d database Name of the database (TDAQ_DB)
-s segmentname Name of the segment
-n controller Name of the controller
-R expertSystem Expert System library name (without the 'lib' prefix or the '.so' extension), followed by the arguments if any.
run_controller
Description: The RunController is a general purpose control entity for the ATLAS Online infrastructure.
Options/Arguments:
-p partition Name of the IPC partition (default $TDAQ_PARTITION)
-P parentname Name of the parent
-s segmentname Name of the segment
-n controller Name of the controller
-u substates Library with the substates definition. The library name has to be given without the 'lib' prefix or the '.so' extension.
-R expertSystem Expert System library name (without the 'lib' prefix or the '.so' extension), followed by the arguments if any.
rc_test_controller
Description: Test unit for rc_controller binaries.
Options/Arguments:
-p partition IPC partition name (default $TDAQ_PARTITION)
-n controller Controller names
setup
Changes since
tdaq-01-08-04
The package is phased out in tdaq-01-09-00.
The setup functionality (testing of the h/w, initializing and
testing of the partition infrastructure and PMG servers) is merged with
the main run controller functionality. Each controller is now responsible
for setting up and testing its segment and, partially, its child
segments' h/w and infrastructure. The Root Controller is responsible for
the "setup segment" partition-wide infrastructure.
In the IGUI, an infrastructure panel is now added to each controller in the
tree.
The command-line interface to setup_daq remains unchanged, but
there are some internal changes.
Setup segment changes
Setup and setup-initial segments are moved from the release installation
area to the database area. Partitions must now include the file
daq/segments/setup.data.xml
SFI
Introduction
Sub Farm Input (SFI) is an application of the Event Building system. It
uses the DataCollection framework for error reporting, monitoring, state
transitions, message passing, etc. Only code specific to the SFI application
is in this package. There is no API, no other code should depend on this
package.
The tag of the SFI package for release tdaq-01-09-00 is v4r8P0.
Major changes since the previous release (tdaq-01-08-04):
- Partial Event Building is implemented: the SFI builds events given
a list of subdetectors and a list of ROBs (let's call them "partial-EB lists").
These lists are filled by the L2PU and are passed to the SFI via the event-assignment
data-collection message from the DFM (the dcmessages package was heavily changed
in this release, in order to accommodate our needs).
The model is that,
- If the "partial EB" lists are empty, then the event is built fully.
- Events which are tagged as Physics events are also always built fully no matter if the
associated partial-EB lists are empty or not.
- The only events which are actually built partially are events with Calibration
tags only.
- The SFI makes sure that the partial-EB information is transported downstream (to the EF)
so that the SFO can do "event stripping" (= stripping an event of unwanted ROB fragments
before writing the event to disk).
To do this, the SFI creates DAQEvents instead of eformat FullEventFragments as before.
A DAQEvent is made out of a DAQEventHeader and a FullEventFragment, where
the DAQEventHeader holds (among other things) the partial-EB lists (see previous item).
- Each DAQEvent is passed to the EF and then to the SFO; thus, the SFO can eventually
do event stripping.
- But, an event written by the SFI to disk, or given to a monitoring task is always an
eformat FullEventFragment, as before.
- NOTE: An eformat FullEventFragment is made out of a flat list of ROB fragments in this release; no SubDetectorFragment, no ROSFragment in an event.
More detailed description of changes (for each SFI tag) can be found in doc/ChangeLog
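The partial event building model described above boils down to a three-way decision. The sketch below is not SFI code: EventSummary, BuildMode and choose_build_mode are hypothetical names, only physics and calibration tags are modelled, and treating an event with neither tag as a full build is an assumption.

```cpp
#include <cassert>
#include <cstddef>

enum class BuildMode { Full, Partial };

// Hypothetical summary of what the SFI knows about an assigned event.
struct EventSummary {
    bool has_physics_tag;
    bool has_calibration_tag;
    std::size_t n_subdetectors;  // entries in the partial-EB subdetector list
    std::size_t n_robs;          // entries in the partial-EB ROB list
};

inline BuildMode choose_build_mode(const EventSummary& ev) {
    const bool lists_empty = (ev.n_subdetectors == 0 && ev.n_robs == 0);
    if (lists_empty) return BuildMode::Full;              // nothing to restrict to
    if (ev.has_physics_tag) return BuildMode::Full;       // physics is always full
    if (ev.has_calibration_tag) return BuildMode::Partial;  // calibration tags only
    return BuildMode::Full;  // ASSUMPTION: untagged events are built fully
}

// Usage example: a calibration-only event with non-empty lists.
inline void example_choose_build_mode() {
    const EventSummary calib = {false, true, 1, 3};
    assert(choose_build_mode(calib) == BuildMode::Partial);
}
```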
sysmon
General changes
-
Add get_all_resources function.
Helper for applications that want to print
all resources, whether exported to IS or not.
Typically used during the STOP transition.
-
Replace obsolete ERS macros.
-
Add the "hostname" to PROC Resource.
sysmonapps
General changes
-
Replace obsolete ERS macros with package specific
ERS Issues or the ERS_LOG macro.
system
Changes since tdaq-01-08-04
- Now all exceptions derive from the System::Exception class.
tmgr
Changes since tdaq-01-08-04
Test Repository schema changes
New abstract class TestableObject
was introduced in the core schema. All testable classes should be
inherited from this class. Currently the following classes are
inheriting: HW_Object, SW_Object, BaseApplication.
In the test repository schema, there are related changes in the way
instances of classes Test4Class and Test4Object are associated with
testable objects:
Test4Object has a new relationship 'objects' 1..N
to the TestableObject class. This replaces the attribute 'object_id', a list
of strings.
In the Test4Class class, the type of the attribute 'class_name'
changed from string to the
new OKS type "class",
so now only the classes of the loaded schema can be selected as the
value of this attribute.
See the changes in schemas here:
https://pcatdwww.cern.ch/cmt/releases/nightly/tmgr/data/schema/TestRepository.ps
Actions for end users
The instances of existing tests must be adapted to the new schema.
For Test4Object instances the changes are as follows:
- <attr name="object_id" type="string" num="1">
- "Computer@lxplus066.cern.ch"
- </attr>
+ <rel name="objects" num="1">
+ "Computer" "lxplus066.cern.ch"
+ </rel>
For Test4Class instances, the type of the "class_name"
attribute shall be changed from "string"
to "class":
- <attr name="class_name" type="string">"Application"</attr>
+ <attr name="class_name" type="class">"Application"</attr>
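For Test4Object instances, the old "Class@id" strings have to be split into a class name and an object id. A minimal sketch of that split follows; split_object_id and to_rel_entry are hypothetical helpers, not part of tmgr or OKS, and the split is assumed to happen at the first '@'.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Splits an old-style object_id such as "Computer@lxplus066.cern.ch" into
// (class name, object id). If there is no '@', the class name is left empty.
inline std::pair<std::string, std::string> split_object_id(const std::string& object_id) {
    const std::string::size_type at = object_id.find('@');
    if (at == std::string::npos) {
        return std::make_pair(std::string(), object_id);
    }
    return std::make_pair(object_id.substr(0, at), object_id.substr(at + 1));
}

// Formats the pair the way it appears inside the new "objects" relationship.
inline std::string to_rel_entry(const std::string& object_id) {
    const std::pair<std::string, std::string> p = split_object_id(object_id);
    return "\"" + p.first + "\" \"" + p.second + "\"";
}

// Usage example with the object from the notes.
inline void example_split_object_id() {
    assert(split_object_id("Computer@lxplus066.cern.ch").first == "Computer");
}
```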
API changes
New method for testing Applications is added
std::string TestApplication( ConfigObject& obj,
const std::string& stdOut,
const env_map* extra_env,
const std::string& app_id, .... )
The difference w.r.t. TestObject is that additional information, such as the
extra environment and app_id, is passed to the Test Manager. This information
cannot be retrieved via the generic ConfigObject pointer for templated
(i.e. non-DB) objects.
This method is used from the rcdal::Application::test() method.
New features
Testing of Templated (Infrastructure) Applications is implemented. The
environment which is passed to the test process is extended with the
environment of the testable Application. Thus, a test can read the
environment which is set by dal algorithms for Templated objects in a
Segment and get, e.g., the RDB database name via the environment. For example,
the parameter for the RDB server test is now configured as
-p #partition -a #this.SegmentProcEnvVarName
and the test then internally reads the environment (e.g. TDAQ_DB_DATA) to
get the name of the particular instance of the templated RDB server under
test, in the same manner as the rdb_server binary does.
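The environment lookup described above can be sketched as follows. This is only an illustration: it assumes that the value passed with -a is the name of an environment variable (such as TDAQ_DB_DATA), and the function name read_env is hypothetical and not taken from the tmgr or rdb sources.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: looks up the environment variable whose name was
// passed to the test (assumed here to arrive via the -a option).
// Returns false if the variable is unset or empty.
bool read_env(const std::string& var_name, std::string& value)
{
    const char* raw = std::getenv(var_name.c_str());
    if (raw == nullptr || *raw == '\0')
        return false;
    value = raw;
    return true;
}
```

A test binary would call read_env with the name received on its command line and report a failure if the variable is missing, since it then has no way to identify the server instance under test.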
Known bugs, problems and limitations
Changes foreseen for next major release
Merging of the test DAL schema with the core schema/igui.
training
Introduction
A new version of the Training Manual (Version 3.2) is available at:
$TDAQ_INST_PATH/share/doc/training/training_doc.pdf
General changes
- The controller exercise has been modified according to the Run
Control changes.
TriP
General changes
- Farm status implemented.
- The farm structure is read from the dedicated server names and object names. For L2, after iterating over the servers, the segment n, subsegment m and L2PU id are read from the object names of the L2PU and L2SV servers. For EF, the rack n,m, EFD and PT number are read from the object names of the EFRack servers. The attribute values of interest are read according to the object type: for L2, PROC (IntervalCPU, TotalVirtualSize), L2PU (LVL2IntervalRate, Errors) and L2SV (IntervalEventRate, errors); for EF, PROC and PT (EF_decisions, ConfigurationErrors+OtherErrors).
The values are read out and updated on the widget's buttons and in the history graphs.
The regular expressions used for matching are defined in, and read from, the configuration file.
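The name-based readout described above can be sketched with std::regex. The object name format and the pattern used here (kExamplePattern) are purely hypothetical, since the real expressions come from the TriP configuration file and are not reproduced in these notes. The struct and function names are likewise invented for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: position of an L2PU in the farm structure.
struct L2puLocation {
    int segment;
    int subsegment;
    int l2pu_id;
};

// Hypothetical pattern for names such as "seg-1-2.L2PU-7".
// The real patterns are read from the configuration file.
const char* const kExamplePattern = "seg-(\\d+)-(\\d+)\\.L2PU-(\\d+)";

// Extracts segment, subsegment and L2PU id from an object name.
// Expects a pattern with exactly three numeric capture groups.
// Returns false if the name does not match the pattern.
bool parse_l2pu_name(const std::string& object_name,
                     const std::regex& pattern,
                     L2puLocation& out)
{
    std::smatch m;
    if (!std::regex_match(object_name, m, pattern) || m.size() != 4)
        return false;
    out.segment = std::stoi(m[1].str());
    out.subsegment = std::stoi(m[2].str());
    out.l2pu_id = std::stoi(m[3].str());
    return true;
}
```

Passing the pattern as a parameter mirrors the fact that the expressions are configurable rather than compiled into the application.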
To be implemented
TTCviModule
Introduction
This package contains RCD Software for the TTCvi Module. Please see for further details.
General changes
vme_rcc
Introduction
The documentation for this package is kept in EDMS.
General changes since release tdaq-01-08-00:
Changes in API:
None.
Known bugs, problems and limitations:
Currently none.
wmi
Introduction
The Web Monitoring Interface, one of the software components of the TDAQ Software sub-system of the ATLAS TDAQ, is intended to give remote users a view of the status of the data acquisition system and its sub-systems.
New features
- The WMI library provides new WMI classes which use JavaScript.
- The WMI library provides a new class called TimeGraph. Objects of this
class create time graphs. The user can define the minimum and/or maximum time of the graph, and the graph is repainted dynamically.
- Small changes in plug-ins (needed only to compile against the new release).
Generated: Mon Jun 23 14:33:47 CEST 2008 by /afs/cern.ch/atlas/project/tdaq/cmt/adm/bin/do_release_notes (c)