-----------------------------------------------
LHCb: FRONT-END MULTIPLEXING / READOUT UNIT
Internal Review Report
-----------------------------------------------

Date: 25th July 2001
Place: CERN

Reviewers:
----------
Jorgen Christiansen
Hans Dijkstra
Fabio Formenti
Clara Gaspar
Ph. Gavillet (Chairman)
Markus Schulz
Steve Wotton
S.Schmeling (Secretary)

Presentations
-------------
http://lhcb-comp.web.cern.ch/lhcb-comp/daq/RU-review-documentation.htm

1. Introduction
---------------
An Internal Review of the Front-End Multiplexing / Readout Unit was
requested from within the DAQ Project. Its scope was to assess the
current situation of both the FPGA and Network Processor based
proposals and thus to provide valuable input to the future
implementation choice to be made for the Trigger/DAQ/Computing
Technical Design Report.

A review of the scope and functionality of the FEM/RU unit and of the
assessment criteria took place before the actual review, in order to
ensure that the debate rested on already agreed, objective criteria.
It was repeated as an introduction to the review.

The reviewers would like to acknowledge the LHCb-DAQ and EP-ED teams
for all the effort to prepare the review and to congratulate them for
the excellent set of presentations.

The general opinion of the reviewers is that both proposals have the
required functionality and the requested level of performance.
Section 2 discusses this appraisal in more detail, together with the
cost issue. Sections 3 and 4 give the general appreciation of the
current state of the FPGA and NP-based systems respectively and
discuss the short term work program and the conditions of support,
from lab system integration, through system commissioning at the
experiment site, up to long term support in operation. This is
followed by an enumeration of what can be considered as the
strong / weak(er) points of each system. Section 5 draws some
conclusions and expresses a few recommendations to help make the
final choice.

2. General
----------
The FEM/RU readout unit is devised to achieve data merging at the
Front-End level (Front-end multiplexing: FEM), in the DAQ from the
Front-End Level1 output buffers to the readout network, and in the
Level1 Trigger from the VELO sources to the Level1 network.

The three applications (FEM, DAQ-RU and Level1 respectively) differ
by their input rate (40-100, 40-100, 1100 kHz), event fragment size
per input link (250, 250, 30-60 bytes), and output conditions (e.g.
blocking or not: N:Y:Y). This translates into specific bandwidth and
buffering requirements in order to keep the various transfers
essentially deadtimeless. In practice a RU unit sustaining a 50MB/s
output data rate and with 1-2MB of buffer memory would fit the
requirements. This is the case of both the FPGA and NP-based modules.

The I/O port type depends on the data network standard. S-Link is
used up to the Level1 buffers, and LHCb is in the process of adopting
optical Gbit Ethernet link technology downstream in the data
acquisition system. The S-Link-Gb Ethernet interface is not yet
available.

- FEM: Input & Output: S-Link or Gb Ethernet.
  S-Link is the default for the FPGA model. The NP version is by
  nature Gb Ethernet.
- DAQ-RU: Input: S-Link or Gb Ethernet, Output: Gb Ethernet.
  The FPGA version needs a Gb Ethernet smart-NIC card as output
  interface. It could be implemented in the form of a standard
  PCI/PMC mezzanine card [1] or as an ordinary PCI card. The NP
  version has the Gb Ethernet interface built in and therefore does
  not need a NIC interface.
- Level1: Input: S-Link, Output: PCI-NIC(SCI).
  The FPGA version 2 has been developed to include the VELO-Level1
  application (I/O port standard and availability of the tagnet
  traffic scheduler). The use of the NP version as Level1 data merger
  would imply rethinking the whole Level1 on the basis of a Gb
  Ethernet architecture similar to the readout one. This would
  represent a sizeable new financial and manpower investment.

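As an illustration of the figures quoted above, the bandwidth per
input link of each application follows from input rate times fragment
size. The sketch below is not part of the reviewed material: the names
(Application, link_rate_mb_s, merged_rate_mb_s) are hypothetical, 1 MB
is taken as 10^6 bytes, the three value sets are assigned to FEM,
DAQ-RU and Level1 in the order in which the applications are listed,
and any header or protocol overhead added by the merging is ignored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Application:
    name: str
    rate_khz: Tuple[int, int]        # (min, max) input rate in kHz
    fragment_bytes: Tuple[int, int]  # (min, max) fragment size per input link


# Values taken from the report; the mapping to the names is an assumption
# based on the order in which the three applications are listed.
APPLICATIONS = [
    Application("FEM", (40, 100), (250, 250)),
    Application("DAQ-RU", (40, 100), (250, 250)),
    Application("Level1", (1100, 1100), (30, 60)),
]


def link_rate_mb_s(app: Application) -> Tuple[float, float]:
    """(min, max) data rate on one input link, in MB/s (1 MB = 10**6 bytes)."""
    # kHz * bytes = 10**3 bytes/s, so dividing by 1000 gives MB/s.
    low = app.rate_khz[0] * app.fragment_bytes[0] / 1000
    high = app.rate_khz[1] * app.fragment_bytes[1] / 1000
    return low, high


def merged_rate_mb_s(app: Application, n_inputs: int) -> Tuple[float, float]:
    """(min, max) payload rate after merging n_inputs links, overhead ignored."""
    low, high = link_rate_mb_s(app)
    return low * n_inputs, high * n_inputs


if __name__ == "__main__":
    for app in APPLICATIONS:
        low, high = link_rate_mb_s(app)
        print(f"{app.name:7s} {low:6.1f} - {high:6.1f} MB/s per input link")
```

Under these assumptions the per-link payload rate is 10-25 MB/s for
FEM and DAQ-RU and 33-66 MB/s for Level1. The multiplexing factor
passed to merged_rate_mb_s is a free parameter of the sketch, not a
figure fixed by this section.
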
As for the ECS interface for control, configuration and diagnosis,
both versions propose a host PCI PC-based implementation. The FPGA
version plugs in a home-designed PMC-MCU card; however, it can be
replaced by a standard, commercially available PMC-MCU device [1].
The NP version intends to adopt the standard LHCb commercial
Credit-card PC (CC-PC). Both provide JTAG and I2C scan facilities.

The programmability of both FEM/RU versions is considered to be
(relatively) accessible although based on different approaches. The
FPGA programming is based on the use of the advanced Hardware
Description Language VHDL (IEEE 1076-1993) with code generation,
simulation and debugging tools commonly used and supported at CERN,
in particular for FPGA programming. The NP benefits from an elaborate
development environment comprising assembler language, simulator,
debugger and profiler tools.

As for the potential to adapt (rather) easily to a new environment,
e.g. a new transport protocol, it is felt that the level of
versatility and diagnosis capability of both versions is adequate.

As a general appreciation, it appears that both FEM/RU systems are
technically feasible. The FPGA version has been developed over the
last 3 years and now exists in its almost final implementation. Five
units are available. The NP version aims to be a generic data merger
module implementing a new, rather promising technology. This
naturally encompasses some risk. Its design is taking advantage of
IBM simulation tools and of a hardware reference kit whose
functionality is close to the needed one.

The production costs of both versions were presented at the review
but they could not easily be understood and extrapolated to 2003.
Therefore a specific query was addressed, after the review, to
H.Muller and B.Jost to provide cost estimates, based on a commercial
offer, at today's & 2003 prices, for a quantity of over 100 units,
for a commercial production including standard and functional tests.
Design, development and tooling costs were asked to be quoted
separately. The answers are summarised in sections 3 & 4
respectively. The commercial quotations look rather close, which is
not too surprising considering that both versions, at first sight,
have similar complexity and board implementation.

3. FPGA system assessment
-------------------------
The FPGA-based module was originally designed as a prototype readout
unit. This allowed the second version to be substantially simplified
and optimised; it is now well tailored to the revised LHCb needs.

The current FPGA version appears to be a flexible and reconfigurable
DAQ/trigger module, meeting the requirements of the three target
applications (provided the PCI bus operates at 66MHz/64bit for the
VELO Level1 [1], instead of 33MHz currently).

Although the prototype stage is being finished with 5 modules
available, and the basic functional tests (S-Link to S-Link transfer,
FPGA bitstream loading, PCI initialisation, dual-port memory access
from MCU via PCI) are nearing completion, it still remains to:

- Complete the setup of the PC-based RU exerciser station;
- Perform from it systematic performance measurements to validate the
  expected I/O characteristics;
- Test the PCI operation mode at 66MHz/64bit for Level1;
- Deliver a working module ready to use in any of the foreseen
  applications (FEM, DAQ-RU, Level1), i.e.:
  . Fully hardware configured and tested
  . With a basic control/test package to exercise the module from the
    PC-based exerciser station

On a different time scale from the TDR, the general conditions of the
FPGA system support will have to be settled between the LHCb-DAQ and
EP-ED teams. This should cover, in chronological order:

- System integration in the lab
- System installation / commissioning (detector level & central
  system)
- Long term support over the experiment lifetime (repair, component
  obsolescence,..)

It requires these teams to work out a support scheme taking advantage
of the competences of both groups (more hardware for EP-ED and more
software for LHCb-DAQ) and to provide the needed resources.

In answer to the price enquiry, the cost of the FPGA versions was
estimated to be:

1) CERN production:
   - At today's prices, in quantity of 100 units
     a) FEM versions
        4 link RU (5 S-Link I/O) + MCU card                  4.4 kCHF
        16 link RU (4*quad S-Link + 1*S-Link I/O) + MCU card 4.9 kCHF
     b) DAQ version
        4 link RU (4*S-Link + MCU card, 1*NIC)               4.2 kCHF
                                                      + MCU (1 kCHF?)
   - At 2003 prices, in quantity of 100 units
     Considering that component prices would go down by ~20 %, this
     would translate into a ~10 % reduction of the cost of the
     various FPGA versions.

2) Commercial production including standard and functional tests:
   multiply the above production cost by a factor 2 (minimum). This
   brings the price of the commercial FPGA versions into the 10 kCHF
   range.

The development and tooling costs amount to 60-70 kCHF.

3.1 Strong points
-----------------
- The FPGA RU unit exists and capitalises on 3 years of development
  at CERN;
- It has been optimised for the FEM and DAQ-RU applications and has
  been updated to also fit the foreseen Level1 trigger
  implementation. In particular the sustained 200MB/s output data
  rate allows cost effective 4:1 multiplexing of Level1 at 1.1 MHz;
- It appears as a simple 8-layer 9U board, with mezzanine cards
  ensuring flexibility for the I/O and ECS technology choices;
- It basically uses standard technology (e.g. PCI bus), standard
  components (e.g. FPGAs, PMC mezzanine cards) and standard
  connectors, which should guarantee manufacturer independence,
  protect against obsolescence and ease any required upgrade;
- The use of FPGA/FPSCs and VHDL allows code development, maintenance
  and evolution;
- 5 prototype units (RU version II) are being validated for use
  (Level1,..);
- If necessary, possibility of in-house "low" cost upgrade/redesign;
- Cost: Being developed at CERN: "hidden" design cost.

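The FPGA cost figures quoted in section 3 can be combined with a few
lines of arithmetic. The sketch below only restates the numbers of the
price enquiry; the function and dictionary names are hypothetical, the
10 % reduction for 2003 and the factor 2 for commercial production are
the report's own estimates (the latter quoted as a minimum), and the
tentative 1 kCHF MCU of the DAQ version is left out.

```python
# CERN production cost at today's prices, in kCHF, for 100 units
# (figures from the price enquiry; DAQ entry excludes the "1 kCHF?" MCU).
CERN_COST_KCHF = {
    "FEM 4 link": 4.4,
    "FEM 16 link": 4.9,
    "DAQ 4 link": 4.2,
}


def price_2003(today_kchf: float, reduction: float = 0.10) -> float:
    """Apply the ~10 % board-level reduction estimated for 2003."""
    return today_kchf * (1.0 - reduction)


def commercial(cern_kchf: float, factor: float = 2.0) -> float:
    """Scale a CERN production cost to a commercial one (factor 2 minimum)."""
    return cern_kchf * factor


if __name__ == "__main__":
    for name, cost in CERN_COST_KCHF.items():
        print(
            f"{name:12s} CERN today {cost:4.1f}  "
            f"CERN 2003 {price_2003(cost):4.2f}  "
            f"commercial today >= {commercial(cost):4.1f} kCHF"
        )
```

With the minimum factor of 2 the commercial prices at today's level
come out between roughly 8 and 10 kCHF, consistent with the
"10 kCHF range" quoted in the enquiry.
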
3.2 Weak(er) points
-------------------
- The NIC interface has to be selected in agreement with the LHCb-DAQ
  team;
- One would prefer a standard commercial ECS interface, instead of
  the home-made MCU board. This unfortunately cannot be the standard
  LHCb CC-PC card as it is very unlikely to support 66MHz PCI;
- The implemented PCI bus is a 66MHz/64bit bus. The FPGAs support
  this mode, but currently the system has to operate in a 33MHz mode,
  insufficient for Level1. Test of a 66MHz PMC-MCU is planned in the
  short term [1];
- The complete architecture was not simulated;
- The error handling is pending, awaiting definition from the
  LHCb-DAQ team.

4. NP system assessment
-----------------------
The NP-based RU is a recent development of the LHCb DAQ team,
stimulated by the advent of powerful network processors focused on
replacing ASICs in LAN and WAN equipment. The main virtue of this new
type of processor is to offer flexible, software-based means to
manipulate input streams of frames/packets in high-performance
routers and switches. Several semiconductor vendors are developing
NPs, including large companies (Intel, IBM, Motorola, Lucent,..), and
the market is poised for takeoff, with network equipment vendors
(Alcatel, CISCO, NORTEL,..) already providing NP-based equipment
(e.g. Alcatel 7770 Edge Services Router).

The LHCb DAQ team has performed an evaluation study of the IBM NP4GS3
network processor using the software development environment provided
by IBM. Two versions of a data multiplexing/merging code, for small
fragments of 1-100 bytes (e.g. Level1 input packets) and larger
fragments of 100-500 bytes (e.g. sub-events), have been developed and
profiled. The results of this simulation show that event-building
speeds exceeding wire speed at the output port can be achieved.

A hardware NP4GS3 Reference kit has been acquired recently.
A test setup using 4 Tigon2 1000LX NIC modules as data sources and
destinations has been built and first data merging tests have been
performed. They essentially confirm the simulation results. The short
term work program intends to run more extensive event building tests
using up to 7 input Tigon NICs.

The block diagram of a FEM/RU board has been presented at the review.
It proposes to implement up to two NP mezzanine cards on a carrier
board providing the basic supplies (power, clock) and the ECS
interface (CC-PC). This board is envisaged as a very generic module,
usable for:

- Front-End Multiplexing / Readout Unit (DAQ and Level1)
- Building block for the readout network (8-port switch)
- Final event-building in front of the SFC, as a replacement of the
  NIC.

The realisation of this module, and thus the long term future of an
NP-based FEM/RU unit, largely depends upon the choice of FEM/RU to be
made for the forthcoming TDR. It was pointed out that interest from
another LHC experiment to participate in the development work would
be much appreciated.

The support of such a module, should it be adopted, falls on the LHCb
DAQ team, which would take care of all hardware and software aspects:
from its fabrication by an external firm, the full acceptance tests
and the development of a small scale lab setup, through the
installation/commissioning at the experiment site, up to the full
support during operation.

In answer to the price enquiry, the cost of the NP versions, based on
a commercial offer, was estimated to be:

- At today's prices
    3:1 multiplexing:   2900 $ + 2000 $ = 4900 $  ( 8.3 kCHF)
    7:1 multiplexing: 2x2900 $ + 2000 $ = 7800 $  (13.2 kCHF)
- At 2003 prices
    3:1 multiplexing:   2200 $ + 2000 $ = 4200 $  ( 7.1 kCHF)
    7:1 multiplexing: 2x2200 $ + 2000 $ = 6400 $  (10.8 kCHF)

The development and tooling costs amount to 75 kCHF. The design fee
is estimated to be 335-420 kCHF.

4.1 Strong points
-----------------
- DAQ Generic NxM (N+M<=8) data merger module (e.g. 7:1 or 2x3:1)
  for:
  . Front-end Multiplexing (FEM)
  . Readout unit (DAQ and Level1)
  . Building block for Switching Network [2]
  . Final Event Building element before SFC (no need of NIC)
- Basically a software-driven readout unit
  . Function defined by the running software
  . Elaborate debugging and diagnostic capabilities
  . Easy control/monitoring/error handling of dataflow
- Simple functional board
  . Simple carrier board (power, clock, ECS interface)
  . Mezzanine board inspired by the IBM reference kit, where the most
    complex parts are confined
- Design facilities
  . Commercial advanced software development environment
  . Hardware reference kit with basic RU functionality
- FEM: the 2x3:1 version could be cost-effective for some detectors.

4.2 Weak(er) points
-------------------
- Intrinsic risk of new technology
  . Economical failure, short life or discontinuity of production &
    support
  . Single vendor source
- Hardware development resources
  . Design/development of the mezzanine card
  . Design/development of the mother board
- Gbit Ethernet standard requires an S-Link-Gb Ethernet interface not
  yet available;
- Level1: Gbit Ethernet speed is marginal w.r.t. I/O performance
  needs;
- Design, development and prototype schedule a bit tight, in
  particular if done outside CERN as it would require a market survey
  and tendering;
- Cost: Design cost if to be done outside CERN.

5. Conclusion / Recommendations
-------------------------------
The review board has focused its effort on assessing, independently
and as exhaustively as possible, the status of both the FPGA and
NP-based readout proposals against the LHCb requirements
(functionality, performance, support, cost,..). Due to the different
states of the two proposals, no attempt was made to compare them
directly. On the other hand, it is expected that the information
coming out of this review, together with the following
recommendations, will help in making the choice of which FEM/RU unit
to adopt.

- Whatever the chosen FEM/RU version, it would be wise to review, in
  the short term, the program of work and the resources needed to
  reach precise milestones such as delivering a working module, i.e.
  ready to use in any of the foreseen applications (FEM, DAQ-RU,
  Level1). An enquiry on the subject was addressed to H.Muller and
  B.Jost after the review. Detailed and precise answers were
  provided, which suggest that one should:
  . Review the situation of specific hardware components expected
    either to be available commercially or to be provided from
    outside LHCb, whose sourcing could become a problem (e.g.
    S-Link-Gbit Ethernet interface, smart-NIC card,..). Possibly show
    that alternative solutions can be provided;
  . Draw up a detailed schedule of the hardware and software
    developments still required up to a working module;
  . Evaluate precisely the needed resources. This would probably
    highlight a shortage of manpower already in the short/medium
    term.

- In view of the forthcoming TDR milestone, one would recommend
  carrying out, within the coming couple of months, the foreseen
  extensive I/O tests, possibly in configurations/conditions
  approaching the future FEM/DAQ/Level1 implementation, to validate
  the expected I/O performance (FPGA version) and confirm the first
  results (NP version). Progress in overcoming (if not fixing) the
  weak points would be appreciated on the same time scale.

- Progress in the definition of detector partitioning would allow a
  better choice of the multiplexing factor and thus a more precise
  count of the number of FEM/RU modules. Savings of up to 30% could
  be achieved for certain detectors.

Notes
-----
[1] Several PMC-NIC and 66MHz/64bit PMC-MCU mezzanine cards are now
    commercially available.
[2] Up to now, experiments have been considering, for obvious
    reasons, the use of switches readily available commercially.
    The main advantage of a home-built NP-based switch over
    commercial equipment is again its programmability, i.e. the
    ability to define the frame manipulation and transfer protocol
    and the facility of easy debugging, diagnosis, monitoring and
    error handling.

+------- CERN - European Laboratory for Particle Physics -------+
| E-mail: Philippe.Gavillet@cern.ch                             |
| Earth-mail: EP/DELPHI, CERN, CH-1211 GENEVE 23, Switzerland   |
| Phone: +41 22 767 30 18                 Fax: +41 22 767 91 00 |
+---------------------------------------------------------------+