Recorder's notes from LHCb architecture review

 

Introduction (John Harvey)

Vincenzo: What is role of prototyping?

Pere: We have a dilemma. It is nice to prototype, but we have to produce code for subdetector developers. We chose to freeze the definition of components and interfaces; individual components can be prototyped later.

Vincenzo: We don’t know if certain collaboration patterns crucial to architecture will work, because we are not experts.

Lassi: A different approach can be to implement most "risky" aspects first (this is the "incremental approach" of Atlas, not prototyping).

Dirk: Does it make sense to have parallel developments (i.e. by the physicists) if we know that interfaces will change?

Pere: We will implement by year end empty components to test the architecture.

 

Project scope (Pere Mato)

Christian: Which of the old SICB stuff do we want to interface to the new software, and when?

Pere: The intention is to interface the data, plus some wrappers around existing things.

Christian: Encouraging physicists to take part from the start has great implications on design.

Pere: That is why we want to get architecture basically right now, before releasing the framework.

RD: Interfaces are not everything, physicists have to be taught to think object oriented; it's a long process.

Lassi: Learning good design takes a long time; people should be exposed to well designed class libraries to speed up the learning process.

Thomas: Physicists may also be convinced by quality of software (performance, usability, debuggability, documentation etc.), they need to see the benefits in order to justify the pain.

RD: Execution speed is not everything, advantages of OO may be less tangible or more long term.

Dirk: How do we ensure that current requirement to cohabit with FORTRAN world now is not influencing architecture design in ways that will be irrelevant when we are in purely OO environment?

Dirk: Separation of data and algorithms implies that data object has to expose its implementation - does this not break encapsulation?

Pere: Role of data object is to store its own data, plus methods that manipulate internally this data, but not interactions with other objects.

TO BE DISCUSSED: scenarios where e.g. track data object has to interact with its environment in order to calculate its trajectory

Lassi: We are missing one type of data: intermediate data that will not be saved. Would like to discuss architectural choice that intermediate data is also event data which one chooses not to save.

Vincenzo: Many (different) physics algorithms cache data inside algorithm in order to speed up the algorithm (e.g. in FORTRAN take data out of COMMON block into local variables to speed up nested do loops). Given that this copy will be done in any case, is transient data just another copy that will be redundant?

Lassi: Do we decide to fix just ONE transient representation of a given data object - this is a major architectural choice to be clarified.

Pere: It is up to user to decide on transient representation, to try to put some order in the private caching.

Lassi, Dirk: Should algorithms not know about persistent storage, or not use it? There is a major difference. If there is a one-to-one mapping of transient to persistent for a given class, converter may not be necessary, it may be sufficient to include the appropriate header file in the algorithm. Requires recompilation. Does not allow use of two stores at the same time.

Lassi: Namespaces could eventually be used for this (does not work now)

Pere: Could be addressed by "generic converter", while still keeping converter hook for later more specialised use.
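Pere's "generic converter" with a hook for later specialisation could be sketched roughly as follows. This is purely illustrative (class names, the string-based class IDs and the string "persistent form" are all assumptions); a real converter would map transient objects to a persistency technology:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical sketch: a converter registry with a generic fallback.
// Specialised converters can be hooked in per class ID later; until
// then every type goes through the generic one.
struct ConverterRegistry {
    using Converter = std::function<std::string(const std::string&)>;

    // Hook in a specialised converter for one class ID.
    void addHook(const std::string& clid, Converter c) { hooks[clid] = c; }

    // Convert transient data to its persistent form: use the hook if
    // one exists, otherwise fall back to the generic conversion.
    std::string convert(const std::string& clid, const std::string& data) const {
        auto it = hooks.find(clid);
        if (it != hooks.end()) return it->second(data);
        return "generic:" + data;   // generic one-to-one mapping
    }

private:
    std::map<std::string, Converter> hooks;
};
```

The point of the hook is that algorithms never see which branch was taken; specialising a converter later does not change client code.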

Lassi: Change terminology: "data store centred" rather than "data centred" architectural style.

Lassi: Be careful that there is not global knowledge that gets passed as global variables in order for algorithms to communicate to each other. How do algorithms know whether data they require is in the store? If it is not there, who is responsible for loading it from storage, or for creating it?

Lassi: Encapsulation of user code, creation of placeholders. Be careful to make sure we identify all necessary placeholders, what would it cost to add some later?

Lassi: Use English in class and service names, not abbreviations.

Lassi: Do we foresee storing the history of which algorithm etc. created a given object - is it stored with the object or is it metadata? (Vincenzo thinks the latter.)

Pere: Architecture allows both, algorithm developer can choose.

Dirk: How are links between multiple persistent stores handled? Are there real pointers?

Pere: There are pointers internally in a given store, but across stores links are handled via conventions (naming etc.).
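As an illustration only, a naming convention for cross-store links might look like the following minimal sketch. The `store:path` syntax is an assumption, not the actual convention; the point is that a cross-store link is a logical name each side resolves independently, not a real pointer:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical sketch: inside a store, objects are reached by real
// pointers; a link that crosses stores is a logical path of the form
// "<store>:<path>", split here into (store name, path within store).
std::pair<std::string, std::string> resolveLink(const std::string& link) {
    auto sep = link.find(':');
    if (sep == std::string::npos)
        return {"", link};                  // local link: same store
    return {link.substr(0, sep),            // target store name
            link.substr(sep + 1)};          // path inside that store
}
```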

Vincenzo: Even if links are logical (e.g. identifiers) there is a coupling between the different stores in the logical data model.

Dirk: This is fragile because some of the knowledge is inside the code, not in the data model.

Lassi: It introduces global knowledge (global invariant) which has to be very well documented. Everybody who writes code needs to know all the invariants to ensure consistency.

Vincenzo: What is common between services (what abstraction does service correspond to?). What is role of IService base class?

Pere: Algorithm manager uses IService to keep references to created services.

Vincenzo: This is not specific to services, it is common to any object that can be identified by its name. Could use singletons.

Pere: There are no static variables - there could be many message services in the same architecture.

RD: Is there a problem with this choice?

Vincenzo: No.

Lassi: Name of a service implies its type - more global knowledge!

Pere: However this is under the control of the framework (encapsulated in Algorithm base class): all basic services are there - an algorithm needs specific knowledge only if it is asking for an esoteric service.
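A minimal sketch of this encapsulation, with all class and service names hypothetical: the Algorithm base class resolves the basic services once, so only an esoteric service requires the concrete algorithm to know a name:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical minimal service abstraction.
struct IService {
    virtual ~IService() = default;
    virtual std::string name() const = 0;
};

struct MessageSvc : IService {
    std::string name() const override { return "MessageSvc"; }
};

// Hypothetical locator kept by the application manager: services are
// identified by name.
struct ServiceLocator {
    std::map<std::string, std::shared_ptr<IService>> services;
    IService* service(const std::string& n) {
        auto it = services.find(n);
        return it == services.end() ? nullptr : it->second.get();
    }
};

// Algorithm base class: basic services are pre-resolved here, so a
// concrete algorithm never needs their names; only an esoteric service
// is requested by name through the locator.
struct Algorithm {
    explicit Algorithm(ServiceLocator& loc)
      : locator(loc), msgSvc(loc.service("MessageSvc")) {}
    IService* messageService() const { return msgSvc; }                   // basic
    IService* service(const std::string& n) { return locator.service(n); } // esoteric
private:
    ServiceLocator& locator;
    IService* msgSvc;
};
```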

Vincenzo: " The transient event store IS the event data service" because it owns the data in it.

Lassi insists: write list of all things that need to be known to everybody. Throw some out when list gets too long.

RD, Vincenzo: IProperty should be able to deal with arbitrary properties, not just basic types.

 

RD: It would be useful to expand some interfaces (e.g. IInterface).

Pere: IInterface stores dynamic type information and reference count.

Lassi challenges this OLE/ActiveX way of passing dynamic type information. Easy to write, sure it works, but is it what we want?

Vincenzo argues that it could be done in a typesafe way.

Pere explained that when an Algorithm wants a given interface of a specific service, the ApplManager returns the IService interface, which is cast by the algorithm to the requested type. The application never calls queryInterface; this is done by the ApplManager.
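The OLE/COM-style mechanism under discussion can be sketched roughly as follows (interface names hypothetical; a real scheme would use interface IDs rather than strings, and the framework rather than the user would make the query call):

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: every interface derives from IInterface, which
// carries reference counting and dynamic interface lookup.
struct IInterface {
    virtual ~IInterface() = default;
    virtual unsigned long addRef() = 0;
    virtual unsigned long release() = 0;
    virtual void* queryInterface(const std::string& iid) = 0;
};

struct IHistogramSvc : IInterface {
    virtual int nHistos() const = 0;
};

struct HistogramSvc : IHistogramSvc {
    unsigned long addRef() override { return ++refs; }
    unsigned long release() override { return --refs; }

    // Return a pointer to the requested interface, or nullptr if this
    // component does not implement it. The caller casts the result.
    void* queryInterface(const std::string& iid) override {
        if (iid == "IHistogramSvc") { addRef(); return static_cast<IHistogramSvc*>(this); }
        if (iid == "IInterface")    { addRef(); return static_cast<IInterface*>(this); }
        return nullptr;   // interface not supported
    }

    int nHistos() const override { return 0; }
private:
    unsigned long refs = 0;
};
```

Lassi's challenge stands regardless of the sketch: this is easy to write and known to work, but the string/ID lookup and manual cast are not typesafe.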

Lassi: Remove references to queryInterface in the documentation; he was confused and thought it could be used directly.

 

Transient store

Lassi: Do we structure objects hierarchically, or is this a logical hierarchy (symbolic links)?

Christian doesn’t see why it is required to have physical containment, rather than just logical.

Lassi: This model applies well to raw data, but not to random collection of analysis objects.

Pere: We have to learn what the difference is between the two.

Vincenzo: Can an object be in two different folders (i.e. soft links) - if yes, does it have a reference count? What happens when one of the two folders is deleted?

Dirk does not understand why we need naming scheme rather than just pointers of the language.

Pere: Loose coupling of algorithms: creator of new data does not know about other algorithms, so cannot pass pointers.
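A minimal sketch of this loose coupling, with hypothetical names: producer and consumer agree only on a store path, never on each other:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical transient store: objects are registered and retrieved
// by path, so the producing algorithm never hands a pointer directly
// to the consuming one.
struct TransientStore {
    template <class T>
    void registerObject(const std::string& path, std::shared_ptr<T> obj) {
        objects[path] = obj;
    }

    template <class T>
    std::shared_ptr<T> retrieve(const std::string& path) const {
        auto it = objects.find(path);
        if (it == objects.end()) return nullptr;   // not (yet) produced
        return std::static_pointer_cast<T>(it->second);
    }

private:
    std::map<std::string, std::shared_ptr<void>> objects;
};
```

Dirk's objection maps directly onto the sketch: the coupling has not disappeared, it has moved into the global path names.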

Dirk: Buy loose coupling at expense of global names.

RD: In dataflow slide (17), application programmer wants to see only apparent dataflows.

Pere: this is achieved at configuration (ideally user interface allows user to configure by joining algorithms with their dataflows)

Vincenzo thinks apparent dataflow is irrelevant.

What happens if some of the data is on the persistent store? Knowledge is needed at time of configuration.

Vincenzo would like some kind of reconstruction on demand (if data is there, use it, if not produce it).

Lassi does not favour this, thinks user should know what he is doing.

Lassi: How fault-tolerant is it against bugs: does it fail at compile, link or run time if a data input is missing?

Christian assumes that there are debugging mechanisms that allow querying state of the system.

Lassi: What about optional data? E.g. tracking may work without vertex information - if it works whether it is there or not, and it may or may not find it, results are unpredictable.

 

Discussion on abstraction. High level abstraction for framework, but keep it simple for the physicist.

Lassi believes there are high abstraction models that can be simple. E.g. a URL is simpler than a file name, even though it is more abstract. Do we want a quick solution (lower abstraction) or a higher abstraction (takes longer)?

Christian: if we do not get fast solution at least partly right, risk is high.

 

 

Afternoon session

 

Questions:

Should algorithms have on their interface what their inputs and outputs are?

Properties initialised during algorithm’s initialising state

Inputs and outputs are not explicitly on the interface. Could be asked for as properties.
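One way inputs and outputs could be exposed as properties, sketched with hypothetical names (`declareProperty` here is illustrative of the idea, not the framework's actual API): the locations are ordinary named properties, set during the algorithm's initialising state rather than fixed on its compile-time interface.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical property mechanism: an algorithm declares named string
// properties bound to its data members; the framework can set them by
// name at initialisation.
struct PropertyHolder {
    void declareProperty(const std::string& name, std::string& ref,
                         const std::string& def) {
        ref = def;             // default value
        props[name] = &ref;    // remember where to write overrides
    }
    void setProperty(const std::string& name, const std::string& value) {
        auto it = props.find(name);
        if (it != props.end()) *it->second = value;
    }
private:
    std::map<std::string, std::string*> props;
};

// Input and output store locations are properties, not part of the
// algorithm's C++ interface.
struct TrackFitAlg : PropertyHolder {
    std::string input, output;
    TrackFitAlg() {
        declareProperty("InputLocation",  input,  "/Event/Hits");
        declareProperty("OutputLocation", output, "/Event/Tracks");
    }
};
```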

Two aspects to question: knowledge needed by physicist when connecting algorithms together, and knowledge needed by developer

John: Surely algorithms only know about the data they are transforming. It is the application builder who has to understand the collaboration between algorithms.

When is data type checking done? Is it done at compilation (defined in the interface) or at run time?

Lassi and Pere both favour run time discovery, but there is advantage in describing the inputs and outputs explicitly in the header file. Just a psychological issue, not a technical issue.

Pere: Thinking of having a generic algorithm, into which you can plug a user algorithm (e.g. a generic event selection algorithm) - in this case, you do not know types of input and output.

RD: but then generic algorithm has no inputs and outputs, other than a standard communication path with the nested algorithm.

With many instantiations of a given algorithm, name of created objects must be appropriate to the environment in which each instance runs.

But Vincenzo challenges this. Suppose a developer needs tracks (created by running algorithm B) to do vertexing. Another developer needs tracks (created by another instance of B) to do electron id. When you combine the two in one job, do you run B twice even though they are identical?

Pere: algorithm B could check whether its output already exists in transient store.

Vincenzo agrees: need to check that output of B exists, created by B with correct properties.

Pere: this intelligence could be put in parent A
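Pere's suggestion that B check for its own output can be sketched as follows (hypothetical, with the transient store reduced to a map): combining two sequences that both need B's tracks then runs the expensive part only once.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch: before producing its output, an algorithm looks
// in the transient store; if an identical instance already registered
// the result, it is reused instead of recomputed.
using Store = std::map<std::string, int>;

int runCount = 0;   // how many times the expensive algorithm really ran

void trackingAlg(Store& store, const std::string& outPath) {
    if (store.count(outPath)) return;   // output already there: reuse it
    ++runCount;
    store[outPath] = 42;                // stand-in for the fitted tracks
}
```

Vincenzo's caveat still applies: the check must also verify that the existing output was created with the correct properties, which this sketch ignores.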

Do we believe that this scenario affects the architecture?

Christian thinks that the lifetime of the intermediate data is important.

Pere explained mechanism for clearing store and deciding what is stored back to persistent store.

Lassi: How does user know what to mark for storage? What if an algorithm puts a temporary object in a subtree that is marked for saving?

RD says that it is not clear in architecture who is responsible for deciding what to save.

Vincenzo: decision on what to save may need to be made at run time based on some event selection criterion (cf. level-3 rejects in Aleph)

Vincenzo,Dirk: Saving relationships is a real problem. Suppose you have hits linked to segments and segments linked to tracks. You only want to save tracks and hits. How does converter preserve the links without the segments?

Dirk: How does S-PD-1 scenario work? How do you have private repository?

Pere: Copy of event root is needed.

Vincenzo: If you read something from the official persistent store, refit a track and then save it, how do you know that only this refitted track has to go to the private repository? If the refitted track is in a different transient store directory, how is directory switching done? (Beware RZ directories!) Agreement between Pere and Vincenzo that cd in the store is a problem; the algorithm has to remember its own state.

Lassi: If an alternative algorithm B’ has to be tried instead of B and outputs compared, but only for one of several instances of B in the job, is that possible?

Vincenzo: If a data type is ALWAYS created by a specific algorithm, is the fact that the data is produced by that algorithm a property of the data? Should it be known that, if the data is not there, the creating algorithm should be run?

 

Should algorithms always communicate via transient store, or can you have direct communication between algorithms?

- Is there another class of data (intermediate data)

Pere: One mechanism can be the friend mechanism in order to couple two algorithms if using the transient store is too much of an overhead.

Vincenzo says it is dangerous to make such an exception.

RD asks why transient store is needed at all for intermediate data that you know no-one will want to save

Lassi agrees. But is worried that, from the outset, there is a workaround to the architecture.

Vincenzo asks whether sharing of data between deeply nested subdetector algorithms is of concern for the architecture - do not want to micromanage.

Pere agrees but says this is a worst case, architecture does not forbid it, but gives mechanism for doing things in a more structured way.

However (Vincenzo) transient store should be seen as something to use when a high degree of data reuse is necessary, should not put any old crap there. Pere agrees.

RD: transient store is appropriate if algorithms sharing data are distant in the execution sequence. If they are next to each other, a direct link is more appropriate.

Vincenzo says that everything on transient store should be saveable.

RD: Then everything on transient store must have a converter. Then it is more obvious that we need another mechanism for passing info between algorithms (some kind of pipe).

Christian has an example of a shell-like mechanism in a tool: algorithms are like shell commands, combined with pipes etc.

Vincenzo: concrete examples are needed for answering these questions.

Lassi: Think about transmission of global knowledge (data types input and output by algorithms). Also, he fears that forcing intermediate data into the transient store may reduce reusability.

 

How do you transfer a web (or tree) of objects from the persistent store to the transient model?

- Should algorithms access persistent storage directly or through a copy in transient store

- How are relationships managed between objects in transient store?

- Converters

Vincenzo objects to a track being only 10 numbers: there is also info about the hits on the track.

Vincenzo: To store hits and tracks, hits have to be stored first. How does converter of track know how to convert C++ pointer to hit into a persistent reference? It knows the hit identifier in the transient hit set, but how do you map from transient hit set to persistent hit container?
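One possible answer, sketched with hypothetical names: the hit converter records where each transient hit was written, and the track converter consults that table to turn a C++ pointer into a (container, index) persistent reference.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Hit { double x; };

// Hypothetical persistent reference: which container, and where in it.
struct PersistentRef { std::string container; std::size_t index; };

// Hypothetical table filled by the hit converter as it writes the hit
// container; the track converter uses it to map transient pointers to
// persistent references.
struct HitConversionTable {
    std::string container;              // persistent hit container name
    std::vector<const Hit*> written;    // written[i] landed at index i

    PersistentRef refFor(const Hit* h) const {
        for (std::size_t i = 0; i < written.size(); ++i)
            if (written[i] == h) return {container, i};
        return {"", 0};   // hit not yet persistent: must be written first
    }
};
```

The sketch also shows why ordering matters, as Vincenzo says: the table only exists once the hits have been stored, so hits must be converted before tracks.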

Vincenzo gave a complex example to show that there is a huge overhead if an algorithm which needs the hits of one track has to ask the data service to complete a transient track with the pointers to its hits. The track should know how to complete itself. The alternative is a smart pointer - but the smart pointer must come from the persistent store, so it must be in the design of the persistent store from the beginning. But then the transient model is not transient any more, it becomes persistence-aware, and one pays the price that the persistent world is no longer clearly separated.
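The smart-pointer alternative could be sketched as a lazily loading reference (purely illustrative; the loader callback stands in for the persistency service, which is exactly the persistence-awareness being objected to):

```cpp
#include <cassert>
#include <functional>

// Hypothetical sketch: the reference holds a persistent identifier and
// loads the object only on first dereference, so a track can "complete
// itself" without the algorithm calling the data service explicitly.
template <class T>
class SmartRef {
public:
    SmartRef(int persistentId, std::function<T(int)> loader)
      : id(persistentId), load(loader) {}

    const T& operator*() {
        if (!loaded) {              // first dereference: fetch from store
            cached = load(id);
            loaded = true;
        }
        return cached;              // later dereferences reuse the copy
    }
private:
    int id;
    std::function<T(int)> load;
    T cached{};
    bool loaded = false;
};
```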

We should list the extra management complications introduced by the choice of a purely transient store, and weigh the advantages against the disadvantages.

RD: In Atlas, it is evident that a transient model is essential for raw data, but they have not decided for other data.

Vincenzo: if pointer is similar in size to the hit, why not store hit by value?

Lassi: it is too restrictive to absolutely forbid smart pointers.

Pere agrees, but must be careful of the consequences.

Lassi: be careful also of different lifetimes of objects in collaborations between objects.

How does architecture function in a distributed storage environment?

Change management: how to handle transition between two existing software systems?

What about fault tolerance?

 

Conclusion

John: Is basic conclusion that next step is to go ahead and prototype or should we do some redesign?

Lassi is very positive that we could implement it, but we should be prepared to redesign many parts of it.

We should incorporate ideas from this discussion into our prototype design. Prototype should try to cover both the requirement to give something to the physicists, and also test the more tricky parts of the architecture.

Christian suggests that prototype should be compared to Lassi's system.

Lassi: worst mistake would be not to deliver. Better to deliver something incomplete that may have to be changed than to wait 6 months.

Vincenzo: Some of alternative options for architecture could be prototyped in parallel

RD wants discussion to continue at the level of LHC experiments, not just in context of LHCb review.

Vincenzo: very useful to discuss one concrete architecture in detail, with real experts who have implemented similar projects.