KIT Data Manager provides a generic and flexible architecture for large-scale scientific data management. Its main goal is to support a broad spectrum of scientific communities in managing structured data. Many ideas incorporated into KIT Data Manager were inspired by well-established, community-specific data management solutions like FedoraCommons, DSPACE, OMERO or ICAT.

The main difference between these solutions and KIT Data Manager is its much greater flexibility, which comes from not focusing on a specific community. Thanks to the service-based approach of our solution, almost any community can nevertheless be supported: custom services that build on the basic services integrate community-specific features.

To support this high degree of flexibility and customizability, KIT Data Manager is organized into several layers offering different services, as shown in the figure below.

Figure: KIT Data Manager Architecture (High Level Services layer, Basic Services layer, Access layer)


The main building blocks of the entire architecture are the High Level Services, which can be divided into six categories. Some categories, e.g. Data Management, comprise one or more High Level Services, whereas others, e.g. Security or Policy Enforcement, unite services and components acting in the background:

 

Data Management: The main focus of KIT Data Manager lies on data and data management, currently for file-based data. There are services for high-performance ingest of data into managed storage, for download of data by the user, and for moving data internally, e.g. to another data center for replication or processing. All these services support a wide range of protocols, e.g. GridFTP, HTTP and SMB/NFS, allowing access from anywhere with appropriate performance.


Metadata Management: In combination with the data management, metadata management services are responsible for enabling KIT Data Manager to handle structured data (datasets). Each dataset is automatically enriched by a basic set of metadata entries for administrative and management purposes, reflecting e.g. ownership, responsibilities and data location. In addition, arbitrary community-specific metadata entities might be extracted automatically during data ingest or may be provided manually by the user. All these metadata can be used to retrieve datasets using suitable search mechanisms.
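The split between basic and community-specific metadata can be illustrated with a small sketch. It is not KIT Data Manager's actual data model or API: the `Dataset` class, its field names and the `search` function are hypothetical names chosen for illustration, assuming that basic metadata is a fixed set of fields and community metadata is a free-form key/value map.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical names for the basic (administrative) metadata fields.
BASIC_FIELDS = ("dataset_id", "owner", "location")


@dataclass
class Dataset:
    # Basic metadata, assumed to exist for every dataset.
    dataset_id: str
    owner: str
    location: str
    # Community-specific metadata, extracted at ingest or provided by the user.
    community_metadata: dict[str, Any] = field(default_factory=dict)


def search(datasets: list[Dataset], **criteria: Any) -> list[Dataset]:
    """Return the datasets whose metadata match all given criteria."""
    def matches(ds: Dataset) -> bool:
        for key, wanted in criteria.items():
            if key in BASIC_FIELDS:
                value = getattr(ds, key)
            else:
                # Unknown keys are looked up in the community metadata.
                value = ds.community_metadata.get(key)
            if value != wanted:
                return False
        return True

    return [ds for ds in datasets if matches(ds)]
```

A call such as `search(catalog, owner="alice", instrument="microscope")` would then combine a basic entry with a community-specific one in a single query.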


Security: A fundamental question when talking about data is how security aspects like ownership and privacy are handled. KIT Data Manager offers security mechanisms on many levels. For external, web-based access, authentication can be realized by state-of-the-art solutions like Shibboleth or OAuth. File transfer services are protected by mechanisms offered by the chosen transfer protocol, e.g. X.509 certificates for GridFTP transfers. For authorization, KIT Data Manager offers a custom, role-based concept allowing fine-grained access control on entity level, e.g. for single datasets or dataset collections. This makes it possible to share datasets with users or groups of users for easy collaboration at any time.
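Role-based authorization on entity level can be sketched as follows. This is a minimal illustration under stated assumptions, not the custom concept implemented in KIT Data Manager: the role names, their ordering and the `AccessControl` class are hypothetical, and the sketch assumes that a user's effective role on an entity is the highest role granted either directly or through one of the user's groups.

```python
from enum import IntEnum


class Role(IntEnum):
    # Hypothetical roles, ordered so that a higher value includes lower ones.
    NO_ACCESS = 0
    GUEST = 1    # may read
    MEMBER = 2   # may read and write
    MANAGER = 3  # may also change grants


class AccessControl:
    def __init__(self) -> None:
        self._user_grants: dict[tuple[str, str], Role] = {}
        self._group_grants: dict[tuple[str, str], Role] = {}
        self._memberships: dict[str, set[str]] = {}

    def add_to_group(self, user: str, group: str) -> None:
        self._memberships.setdefault(user, set()).add(group)

    def grant_user(self, entity_id: str, user: str, role: Role) -> None:
        self._user_grants[(entity_id, user)] = role

    def grant_group(self, entity_id: str, group: str, role: Role) -> None:
        self._group_grants[(entity_id, group)] = role

    def effective_role(self, entity_id: str, user: str) -> Role:
        # Start with the direct grant, then consider every group of the user.
        roles = [self._user_grants.get((entity_id, user), Role.NO_ACCESS)]
        for group in self._memberships.get(user, set()):
            roles.append(self._group_grants.get((entity_id, group), Role.NO_ACCESS))
        return max(roles)

    def is_allowed(self, entity_id: str, user: str, required: Role) -> bool:
        return self.effective_role(entity_id, user) >= required
```

Sharing a dataset with a whole group then amounts to a single `grant_group` call, while grants on other entities remain unaffected.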


Data Processing: For research data, storing and managing the data is just a small part of the data life cycle. In many cases raw data has to be processed at least once, and more often multiple processing steps are applied to the data. For this purpose, KIT Data Manager offers integrated data processing capabilities. Workflows can be linked to ingested datasets so that the data is processed automatically. After successful processing, result datasets are linked to their input datasets by metadata and can be accessed like any other dataset. This provides seamless documentation of dataset provenance and makes results easier to reproduce.
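The idea of linking result datasets to their inputs by metadata can be sketched as below. The record layout and the function names `register_result` and `lineage` are assumptions made for this example and do not reflect KIT Data Manager's actual metadata schema; the sketch only assumes that each result stores the identifiers of its inputs and the name of the workflow that produced it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetRecord:
    dataset_id: str
    # Provenance metadata (hypothetical field names).
    derived_from: list[str] = field(default_factory=list)
    produced_by: Optional[str] = None


def register_result(registry: dict[str, DatasetRecord], result_id: str,
                    input_ids: list[str], workflow: str) -> DatasetRecord:
    """Store a result dataset and link it to its input datasets."""
    record = DatasetRecord(result_id, list(input_ids), workflow)
    registry[result_id] = record
    return record


def lineage(registry: dict[str, DatasetRecord], dataset_id: str) -> list[str]:
    """Return all direct and indirect ancestors of a dataset."""
    seen: list[str] = []
    stack = list(registry[dataset_id].derived_from)
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited through another path
        seen.append(current)
        stack.extend(registry[current].derived_from)
    return seen
```

Walking the `derived_from` links back to the raw data yields the chain of processing steps that would have to be repeated to reproduce a result.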

Life Cycle Management: When thinking of data on a large scale, one of the most challenging aspects is the management of the data life cycle. Apart from documenting where a dataset comes from and what has happened to it, it is also important to manage what is planned to be done with it in the future. Some datasets may become open access after a period of time, while other datasets have to be processed periodically, e.g. for validation purposes. However, most data life cycles strongly depend on the community they belong to. KIT Data Manager therefore allows community-specific data life cycles to be defined.


Policy Enforcement: Policy Enforcement is part of many services, as most of them have to perform automated work steps. One example is the processing of a dataset as soon as its ingest is finished. Other policies can define regular integrity checks or trigger curation tasks like format conversion. In short, policy enforcement is a central aspect of the whole KIT Data Manager architecture.
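One common way to realize such automated work steps is to attach actions to events. The following sketch shows this pattern only; it is not how KIT Data Manager implements policy enforcement. The `PolicyEngine` class and the event names `ingest_finished` and `integrity_check_due` are hypothetical.

```python
from typing import Callable

# An action receives the identifier of the dataset the event refers to.
Action = Callable[[str], None]


class PolicyEngine:
    def __init__(self) -> None:
        self._policies: dict[str, list[Action]] = {}

    def register(self, event: str, action: Action) -> None:
        """Attach an action to an event name (event names are hypothetical)."""
        self._policies.setdefault(event, []).append(action)

    def emit(self, event: str, dataset_id: str) -> int:
        """Run all actions registered for the event; return how many ran."""
        actions = self._policies.get(event, [])
        for action in actions:
            action(dataset_id)
        return len(actions)


# Example: start processing as soon as the ingest of a dataset is finished.
processing_queue: list[str] = []
engine = PolicyEngine()
engine.register("ingest_finished", processing_queue.append)
engine.emit("ingest_finished", "dataset-42")
```

Further policies, such as a periodic integrity check, could be registered in the same way under their own event name and emitted by a scheduler.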

To realize the functionality of all these categories, KIT Data Manager integrates standard technologies as far as possible and defines interfaces to and from every architectural layer, so that single components can be exchanged when required. This is intended to achieve a high degree of sustainability.
