Last Update Feb 4, 16:08

Introduction
------------
This document collects and expands on all the comments circulated so far on the subject of "Regional Centers" (RC), in order to set up a framework in terms of:

- parameters and criteria
- questions/issues (possibly with answers)

The exercise aims to identify and list the services and resources that are both necessary and sufficient for a RC, to lay the ground for discussion with potential RC candidates, and to provide material to the simulation/testbed WGs.

Basic Assumptions
-----------------
Existence of one "central site" (CERN): the central site is able to provide all the services. The following steps happen at the central site only:

- Online data acquisition and storage
- Possible data pre-processing before first reconstruction
- Calibration data storage
- First data reconstruction
- Creation and storage of Analysis Object Data (AOD)

[Note: I use here the BaBar terminology for the sake of brevity. It is in any case understood that, starting from the raw data, a data format will exist that is small in event size and carries enough information to perform analysis. This could even be a set of links in an OODB. This set is called AOD in the following.]

The central site holds:

- a complete archive of all raw data
- a master copy of the calibration constants (including geometry, gains etc.)
- a complete copy of all AOD, possibly online

No quantitative estimates for CPU, storage etc. at the central site are given at the moment. However, taking into account the following figures:

- 1 PB of raw data per year per experiment
- 10**9 events per year per experiment
- 100 days of data taking (i.e. 10**7 events per day per experiment)
- AOD size between 10 and 100 kB per event

one can at least assume that 10 to 100 TB of disk space is allocated to AOD at the central site, and that a total disk space of 500 TB could be online at the central site.

In the following, resources for the RC will be expressed as a percentage of the central site (CS) ones.
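
The AOD disk estimate above follows directly from the event count and the event size. A minimal sketch of that arithmetic is given here, assuming the storage figures are bytes in decimal units (1 kB = 10**3 bytes, 1 TB = 10**12 bytes); the constant and function names are illustrative only.

```python
# Back-of-envelope check of the AOD disk estimate.
# Assumption: sizes are bytes in decimal units.
KB = 10**3
TB = 10**12

EVENTS_PER_YEAR = 10**9          # per experiment, from the text
DATA_TAKING_DAYS = 100           # from the text
AOD_EVENT_SIZES_KB = (10, 100)   # lower and upper bound per event, from the text


def aod_volume_bytes(events, event_size_kb):
    """Return the AOD volume in bytes for a given number of events."""
    return events * event_size_kb * KB


events_per_day = EVENTS_PER_YEAR // DATA_TAKING_DAYS
yearly_aod_tb = [
    aod_volume_bytes(EVENTS_PER_YEAR, size) / TB for size in AOD_EVENT_SIZES_KB
]

print(f"events per day per experiment: {events_per_day}")
print(f"AOD per year per experiment: "
      f"{yearly_aod_tb[0]:.0f} to {yearly_aod_tb[1]:.0f} TB")
```

The two bounds reproduce the 10 to 100 range quoted for the AOD disk space, for one year of data from one experiment.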
Tasks
-----
The offline software of each experiment is supposed to perform the following tasks:

- data reconstruction (possibly in N steps)
- MC production (includes generation, simulation AND reconstruction)
- offline (re)calibration
- successive data reconstruction
- analysis

Services
--------
To execute the above tasks completely and successfully, services are required. A starting list includes:

Data Services

- (re)processing of data through the official reconstruction program [requires CPU, storage, bookkeeping, SW support]
- generation of events [requires little CPU and storage, bookkeeping]
- simulation of events [requires a lot of CPU, storage, bookkeeping]
- reconstruction of MC events [see the first item above]
- creation of the official AOD
- updating of the official AOD under new conditions
- AOD access (possibly with added layers of functionality)
- data archival/retrieval

Technical Services

- database maintenance (including backup, recovery, installation of new versions)
- basic and experiment-specific software maintenance (backup, updating, installation)
- production of tools for data services
- production and maintenance of documentation (including Web pages)
- storage management (disks, tapes, distributed file systems if applicable)
- CPU usage monitoring and policing
- database access monitoring and policing
- network maintenance (as appropriate)
- large bandwidth [Note: this is an attribute somewhere between a service and a resource]

Motivations for a RC
--------------------
- Insufficient resources at the CS
- Inability to spend (large amounts of) money on equipment at the CS
- Slow turnaround for analysis at the CS
- Outsourcing of specific activities from the CS

Assuming that centers exist outside the CS, they may be specific or wide-ranging in terms of the above services. I propose to define a specific center as a "Service Center" for the experiment, i.e. a site where one or more specific services are covered but NOT all.
So one could have a MC Service Center or a Documentation Service Center. What we call a RC in the following is a multiservice center.

Capabilities
------------
A RC (tier 1) should provide ALL the technical services, ALL the data services needed for the analysis and at least one other class of data services (MC production or data reprocessing). A RC should have resources of the order of 10 to 20% of the CS.

[Note: if one assumes 500 TB of disk online at the CS, 10 to 20% is probably a lower limit for a large RC. It is my opinion that a RC could easily have 100 to 200 TB of disk space online.]

Constituency
------------
The relation between a RC and its users should be analyzed in terms of which service the user is accessing, and from where. The "quantity" of the service required is not an issue, because validation procedures for service requests should be operational and enforced. The big debate is probably about users of the AOD access for analysis purposes. The reason is that this service is in demand on an individual and interactive basis. However, MC production may be an issue too in terms of competing requests and needs. Essentially there are three possible priority schemes:

1- services are provided on a first come, first served basis
2- services are provided to physicists of the "region"
3- services are provided to defined analysis groups (interregional)

[Note: this is really difficult to decide and has a lot of implications, technical and political. I would really like to have broad feedback here.]

Data Profile
------------
A RC should maintain a large fraction of the AOD. This is not a problem in the first year, but becomes one as data accumulate. If the RC serves prioritized data access to defined analysis groups, then the obvious choice is to keep all the AOD and a fraction of the larger formats for those analysis channels.

A RC should maintain a fixed statistical fraction of fully reconstructed data.
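
How quickly AOD accumulation becomes a problem can be sketched with the figures already quoted: 500 TB of disk at the CS, a RC at 10 to 20% of that, and 10 to 100 TB of AOD per year. The sketch below rests on assumptions that are not in the text: a constant yearly AOD volume, a single experiment, and the whole RC disk devoted to a full AOD copy. All names are illustrative.

```python
# Sketch: how many full years of AOD fit on the disk of a Regional Center.
# Assumptions (not from the text): constant yearly AOD volume, one experiment,
# and the whole RC disk devoted to a complete AOD copy.
CS_DISK_TB = 500                 # central-site disk online, from the text
RC_PERCENT_OF_CS = (10, 20)      # RC resources as a percentage of the CS
AOD_PER_YEAR_TB = (10, 100)      # 10**9 events/year at 10 to 100 kB each


def years_until_full(rc_disk_tb, aod_per_year_tb):
    """Number of whole years of AOD that fit on the RC disk."""
    return rc_disk_tb // aod_per_year_tb


for percent in RC_PERCENT_OF_CS:
    rc_disk_tb = CS_DISK_TB * percent // 100
    for yearly_tb in AOD_PER_YEAR_TB:
        years = years_until_full(rc_disk_tb, yearly_tb)
        print(f"RC disk {rc_disk_tb} TB, AOD {yearly_tb} TB/year: "
              f"{years} full year(s) fit")
```

Under these assumptions, the large event size leaves room for at most one year of AOD at a RC holding 20% of the CS disk, which is consistent with the remark that accumulation becomes a problem after the first year.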
Calibration constants are not an issue; they are a tiny fraction of the data (GB vs. TB). Moreover, past experience shows that they are not used in physics analysis once one trusts the reconstruction. Calibration data (special runs) should only go where calibration studies are performed.

Bookkeeping data (the catalog) should be everywhere and synchronized.

Communication Profile
---------------------
The mechanism of data transfer from the CS to the RC depends also on the underlying storage mechanism. If data are stored in Objectivity in a single federation, as is continually promoted by some, then small-size data (AOD, as they are produced) could be moved over the network using Objectivity mechanisms (replication). The present state of the product and of the bandwidth of course makes this unfeasible now. Large-size data should be moved by FedEx (or a competitor).

Data access at the RC should be fast, lest one lose all the advantage of computing at the RC. So, again, if the analysis service is concentrated on physics channels, all data needed for that channel should be online.

[Note: 10**7 events per day per experiment make 100 GB to 1 TB of AOD per day per experiment. Counting on 100 Mb/s bandwidth, 100 GB is doable, 1 TB is at the limit, considering that you have four experiments.]

It is difficult to judge whether facilities a la HPSS are needed/requested in a RC, mostly because of their cost and maintenance requirements.

Collaboration, Dependency
-------------------------
The dependence of the RC on the CS can be further classified into:

- data dependence
- software dependence
- synchronization mechanisms

Data Dependence: the obvious part concerns data which are copies of the CS ones (AOD, reconstructed, calibrations). These data are transferred to the RC once and for all, or regularly, to fulfill RC services. Less obvious is the "creation" of new data at the RC, for example reprocessing of data done at the RC, creation of the official AOD for a given analysis channel etc.
In the latter case the RC holds a unique copy of these data until they are transferred back to the CS. Whether this is allowed/forbidden/discouraged has to be discussed.

Software Dependence: here again there is an obvious part, the installation of the official experiment software. It is probably meaningless to develop software at the RC, with the exception of site-specific tools.

Synchronization Mechanisms: this depends on the underlying OODB; if it is Objectivity and if one has a single federation with the CS, then data will be synchronized by the Objy mechanism. If one has different federations, then synchronization becomes a management issue, and automatic procedures must be envisaged. In both cases the so-called tag db must be kept in sync continuously; this is demanding in terms of network bandwidth, reliability and backup procedures in case of network failure.
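
Since network bandwidth recurs as a constraint, the estimate in the Communication Profile note (10**7 events per day, 100 Mb/s, four experiments) can be checked with a short calculation. This is a rough sketch under stated assumptions: "Mb/s" is read as megabits per second, data sizes are decimal bytes, and the link is fully used with no protocol overhead. The function and constant names are illustrative.

```python
# Sketch: time needed to ship one day's AOD over a 100 Mb/s link.
# Assumptions: "Mb/s" means megabits per second, data sizes are decimal
# bytes, and the link is fully used with no protocol overhead.
GB = 10**9
BITS_PER_BYTE = 8
SECONDS_PER_HOUR = 3600

LINK_BITS_PER_SECOND = 100 * 10**6
DAILY_AOD_GB = (100, 1000)       # 10**7 events/day at 10 to 100 kB each
EXPERIMENTS = 4                  # from the text


def transfer_hours(volume_gb, link_bps=LINK_BITS_PER_SECOND):
    """Hours needed to move volume_gb gigabytes over the link."""
    seconds = volume_gb * GB * BITS_PER_BYTE / link_bps
    return seconds / SECONDS_PER_HOUR


for volume_gb in DAILY_AOD_GB:
    one = transfer_hours(volume_gb)
    everyone = one * EXPERIMENTS
    print(f"{volume_gb} GB/day: {one:.1f} h for one experiment, "
          f"{everyone:.1f} h for {EXPERIMENTS} experiments")
```

Under these assumptions one experiment's 1 TB takes a little over 22 hours, so a single shared link keeps up with the daily volume of one experiment at most, and four experiments at that event size exceed 24 hours.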