
The data storage service provided by CESGA is oriented towards high-performance storage and processing of information, large volumes of data, and high availability, as well as access from any computer connected to the Internet.

  • Criteria for the classification of the information in the storage service

Tape robot

With the objective of responding to the increasing demands in quantity and quality of service in the storage systems, as well as to the different storage options available on the market, it is necessary to classify the types of data in order to adapt the different storage services to the specific needs of each group of information. These classifications can respond to criteria such as quantity of information, level of availability required, security and access control, etc. Taking into account the diversity of data served by CESGA, we have established the following list of main criteria to classify the information (a schematic sketch of these criteria follows the list below).

  • Level of availability and fault tolerance indicates the criticality of the data, ranging from data that must be “always available” at one end of the scale to data that is only “occasionally available” at the other. “Always available” identifies data that is critical for the operation of 24x7x365 services, while “occasionally available” identifies data that only needs to be accessed on demand. Between both extremes there are situations with windows of non-availability (4 hours, 8 hours, etc.). Note that availability does not refer to the speed of access to the data, but to its robustness against any type of problem that may appear in the system (in computing terms, “fault tolerance”, which ultimately determines the maximum number of SPOFs, or single points of failure). Within this classification we could establish, for example, a high level (with multiple access paths to the data and RAID-type data redundancy), an average level (with RAID solutions but without component redundancy), and a low level (without any type of RAID or component redundancy).

  • Periodicity of the backup copies is largely determined by the frequency of data modification. Backups can be made daily, weekly, on demand when new information is entered, or not at all, for example in those cases in which a backup copy of the data already exists elsewhere.

  • Connectivity defines at least two performance parameters, the access bandwidth and the latency, as well as the mode in which the storage is used (for example, whether it can be shared or hot-attached to new servers) and the reachable distance. The connection interfaces (optical fibre, the different SCSI buses, or local or wide area network connections using NFS or CIFS) largely define these parameters, but the classification should not be restricted to them (for example, with SCSI interfaces it is possible to expand the bandwidth by using multiple HBAs to access the same volume of information).
  • Storage capacity identifies the quantity of storage that the data may require. Absolute values are not representative for this parameter because they shift over time: a quantity of information that is considered small today, some tens of megabytes, represented a very high volume of information only a decade ago. Thus, for this parameter we use percentages referenced to the maximum capacity available at each moment.

  • Data sharing classifies data according to whether it must be accessed from different hosts and/or by different user communities.
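
As a purely illustrative aid (not part of CESGA's actual tooling), the criteria above can be gathered into a simple record; the names, enumerations, and example values below are assumptions made for this sketch.

```python
# Illustrative sketch only: one possible way to model the five classification
# criteria described above. Names and example values are hypothetical.
from dataclasses import dataclass
from enum import Enum


class Availability(Enum):
    HIGH = "always available: 24x7x365, multiple access paths, RAID plus redundant components"
    AVERAGE = "RAID, but without redundant components"
    LOW = "occasionally available: no RAID, no component redundancy"


class BackupPeriodicity(Enum):
    DAILY = "daily"
    WEEKLY = "weekly"
    ON_DEMAND = "when new information is entered"
    NONE = "no backup (a copy of the data already exists elsewhere)"


@dataclass
class StorageProfile:
    """Storage needs of a group of information, expressed with the five criteria."""
    availability: Availability
    backup: BackupPeriodicity
    connectivity: str       # e.g. "FC, low latency" or "NFS over the intranet"
    capacity_pct: int       # percentage of the maximum capacity available at the moment
    shared: bool            # accessed from several hosts and/or user communities


# Hypothetical example: a profile for temporary, scratch-like data
scratch_profile = StorageProfile(
    availability=Availability.LOW,
    backup=BackupPeriodicity.NONE,
    connectivity="local disk or low-latency fabric, maximum bandwidth",
    capacity_pct=20,
    shared=False,
)
```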

In view of the previous parameters, it is reasonable to think that specifying one of them significantly influences the others (that is to say, they do not represent a strictly orthogonal set). However, it should be kept in mind that what we try to do at this stage is to separate the storage needs from the available technologies, so that, once the requirements have been specified, we can search for the technology that best fulfils them at each moment. For example, some years ago it was necessary to connect the storage directly to the system that used it in order to obtain high bandwidth, whereas nowadays this is no longer required thanks to the deployment of broadband networks (even in WAN environments).
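
A minimal sketch of that decoupling, assuming a hypothetical catalogue of technologies with made-up attribute values: the requirements are written down first, and the catalogue is only filtered afterwards.

```python
# Illustrative sketch: choose candidate storage technologies from the stated
# requirements, rather than starting from a fixed technology.
# The catalogue and its figures are hypothetical, for illustration only.
technologies = [
    {"name": "direct-attached SCSI", "bandwidth_mbps": 320, "shareable": False},
    {"name": "Fibre Channel SAN",    "bandwidth_mbps": 800, "shareable": True},
    {"name": "NFS over the LAN",     "bandwidth_mbps": 120, "shareable": True},
]


def candidates(min_bandwidth_mbps: int, must_share: bool) -> list:
    """Return the technologies that satisfy the stated requirements."""
    return [
        t["name"]
        for t in technologies
        if t["bandwidth_mbps"] >= min_bandwidth_mbps
        and (t["shareable"] or not must_share)
    ]


print(candidates(min_bandwidth_mbps=300, must_share=True))
# -> ['Fibre Channel SAN']
```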

 

In addition to these criteria, others can be introduced, such as the temporal nature of the data (that is to say, whether the data must be permanently present or, on the contrary, is continuously replaced), the security and confidentiality of the information, etc. These can be really important, but they would also imply an excessive increase in the number of classes. Since they tend to be secondary factors, subcategories may be established later within some specific types of data, for the most significant cases.

  • Classification of the information at CESGA

Following these criteria, we have classified the available information, as well as the computational servers and storage, into five types (an illustrative sketch follows the list):

  • Type 1 or SCRATCH data requires very high performance (very low latency and maximum bandwidth), since it affects the performance of the computational systems of the Centre, and an average capacity (according to the number of simultaneous jobs that must be supported), since the data is only stored while the execution of the computations lasts. The availability can be low (since the data is temporary) and, for this reason, it is not necessary to make backups.

  • Type 2 or home directories contain data that may be analysed and modified at any moment and that is critical, since the operation of the computational services of the Centre depends on its availability. They must therefore give priority to availability (maximum), with an adequate balance between capacity (average, as a function of the number of users) and performance (average). Backups are made daily.

  • Type 3 or massive data storage (MSS) is used to store databases and research results. The content usually does not vary (the data is typically WORM: write once, read many) and the access speed is usually not critical, although a high bandwidth to the servers is required, as this may be where the results of experiments are stored. Backups can be made on demand, given that the content is only modified sporadically. Examples of this type of data are the results of the daily meteorological prediction or the databases used in genomics.

  • Type 4 or backups (internal and external) to disk are copies that users make of their own servers or personal computers onto the storage systems of CESGA, in order to have a data backup. It is not necessary to make backups of this data (it is itself “the” backup) and the availability of the service can be low. The service is offered through the network (internal or external), so a high-performance connection is not required (the bottleneck is located in the interconnection between the final user and the storage). The capacity can be low or average, according to the number of users of the service.

  • Type 5 or SCRATCH PARALLEL requires very high performance (very low latency and maximum bandwidth), similar to Type 1, with the addition that the scratch data is shared by all of the cluster nodes and distributed among them, which increases both the access bandwidth to the files and the total scratch capacity beyond that of the local disk. The availability can be very low, since it depends on many components that are not redundant. No backups are made of this data.
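
Read together, the five types are the result of applying the classification criteria; the sketch below (illustrative only, with characteristics taken from the descriptions above and the summary table that follows) encodes that mapping as a simple decision function.

```python
# Illustrative sketch: map (hypothetical) data characteristics to one of the
# five CESGA data types described above.
def storage_type(temporary: bool, shared_across_nodes: bool,
                 is_user_backup: bool, mostly_worm: bool) -> str:
    """Return the data type that best matches the given characteristics."""
    if is_user_backup:
        return "Type 4: backups to disk (no further backup needed, low availability)"
    if temporary:
        if shared_across_nodes:
            return "Type 5: parallel scratch (shared by all cluster nodes, no backup)"
        return "Type 1: scratch (temporary, no backup)"
    if mostly_worm:
        return "Type 3: massive data storage, MSS (on-demand backups)"
    return "Type 2: home directories (daily backups, maximum availability)"


# Example: temporary job data shared by every node of the cluster
print(storage_type(temporary=True, shared_across_nodes=True,
                   is_user_backup=False, mostly_worm=False))
# -> Type 5: parallel scratch (shared by all cluster nodes, no backup)
```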
 
| Type | Availability | Backups (periodicity) | Connectivity | Capacity | Sharing or accessibility |
|------|--------------|-----------------------|--------------|----------|--------------------------|
| Type 1: Scratch | Low | No | Low latency, maximum bandwidth | Average (20%) | No sharing |
| Type 2: Home directories | Maximum (the operation of the system depends on it) | Daily | Average (standard architectures, FC) | Average (30%) | Among all the nodes of the system or cluster |
| Type 3: MSS | Average | On demand | Intranet network or FC, to reach maximum sharing with high internal bandwidth | Maximum (90%) | High, internal to the Centre and sporadically external |
| Type 4: Backups | Low | No (the data is itself the backup) | Network, intranet and internet, with moderate bandwidth | Low (10%) | Maximum, includes internal and external systems |
| Type 5: Scratch Parallel | Low | No | Low latency, maximum bandwidth | High (50%) | No sharing |