DISTRIBUTED COMPUTING CYBERENVIRONMENTS
 

To research and develop high-performance cyberenvironments based on distributed computing for innovative, cutting-edge medical applications.

Cloud computing is increasingly used in the scientific community as some of its limitations are mitigated, particularly those concerning network communication capacity and performance. For the specific use case of supporting massively parallel and distributed computing, several studies have examined how the virtualization layer affects performance, or explored the use of public providers as a means of acquiring additional or even primary resources for the execution of scientific applications.

Nevertheless, there is still a gap in understanding how the virtualized cloud system as a whole affects the inherent properties of scientific applications. Regarding the infrastructure, these properties include the topology of the communication layer and of the “virtual clusters” created to execute those applications. Regarding a specific application, they include the coupling between its intrinsic features (the algorithms and parallel libraries used in its implementation). The set comprising infrastructure and applications, even in the context of the physical layer, has long been the subject of studies aimed at optimizing the utilization of computational resources in pursuit of higher performance. With computational clouds, or more generally in virtualized environments, the virtualization layer introduces a new component that makes both the use of massively parallel and distributed computing and its evaluation extremely complex.

The approach currently used, mainly by commercial providers, can be classified as “brute force”: additional resources are added, either horizontally (adding more virtual servers) or, to a lesser degree, vertically (increasing the capacity of virtual resources).
This proposal builds on studies of the use of cloud computing in support of scientific applications, in order to address the specific needs of the applications, projects and research undertaken by the groups associated with this project. It draws on knowledge, obtained during the current INCT-MACC project, about the implications of executing research-oriented applications in the virtualized and shared environments typically found in clouds. These works (papers, dissertations and theses) concern the evaluation of virtualization layer effects, the creation of a private cloud infrastructure, allocation models for virtual environments, and the coupled effects of application class and the implementation libraries used within each class.
These research works have built a solid knowledge base, which enabled the design of a computational infrastructure for hosting various scientific applications, seeking to maximize performance by providing the best environment for the application class to be deployed.

Background

The following paragraphs present some examples of projects dedicated to evaluating the use of clouds in support of scientific computing.
The first large study of the use of clouds in the scientific environment was carried out by the U.S. Department of Energy (DoE) in the 2009-2011 period, together with its two largest centers of massively parallel and distributed computing (the Argonne Leadership Computing Facility at Argonne National Laboratory and NERSC, the National Energy Research Scientific Computing Center at Lawrence Berkeley National Laboratory). Its main objective was to determine the contribution of cloud computing to supporting scientific research within the DoE. Among the conclusions in its final report, “The Magellan Report on Cloud Computing for Science” (MAGELLAN), is a survey of its users on which factors they found most attractive in cloud-based resources: 79% cited access to additional resources, 59% cited the resource control enabled by the environment, 52% cited the possibility of sharing these resources with their peers, and another 52% cited ease of operation compared with a DoE cluster (respondents could choose more than one option). However, at the time it was produced, the report highlighted on the one hand the need for more mature applications supporting this environment, and on the other that the DoE infrastructure already incorporated the support and services proposed by clouds, although cloud computing could be used in the future as a source of extra resources.

Another large evaluation was undertaken by the Helix-Nebula project, the Scientific Cloud Computing Infrastructure for Europe (CERN). Its objective was to create a cloud-enabled infrastructure and environment capable of supporting several organizations, based on the specific needs of the European research community and space agencies. The project was rooted in a strong collaboration between public and private initiatives, which together examined the possibilities of using cloud computing in support of research institutions and projects. To this end, one of its initial commitments was to identify a small number of key projects to serve as a model for these partnerships. The intent of the Helix-Nebula project is therefore to develop, explore and use the cloud infrastructure, grounded initially in the needs of European research organizations, while enabling the inclusion of other organizations (government, business and society) as requirements emerge. The project sees independence from commercial cloud providers, which guarantees interoperability and service level objectives, as one of the main benefits for its users, along with security and transparency. Service and infrastructure providers that join the project will be able to interact with a wide production and testing environment, cutting-edge technological developments and government resources, with the assurance of a minimum number of initial users.
The ongoing project currently rests on four case studies; the following ones concern medical activities:

  • The European Molecular Biology Laboratory (EMBL), whose objective is the development of a portal based on a computational cloud, capable of enabling the sequencing and analysis of large DNA strands.
  • The “Port d’Informació Científica” (PIC) project is concerned with the study of degenerative diseases. Helix-Nebula makes it possible to develop portals and environments for image analysis and data processing, which medical professionals use through a friendly, dedicated interface.

Another ongoing project is EU-Brazil Cloud infrastructure Connecting federated resources for Scientific Advancement (EUBrazil-CC), an environment dedicated to users from the Brazilian and European scientific communities. It is a two-year project, started in 2014, that explores the possibilities of a computational cloud serving both scientific communities. Its case studies will involve three large, complementary and multidisciplinary scenarios covering the research areas of epidemiology, health, biodiversity, natural resources and climate change. The study intends to explore how clouds can be leveraged in this federated environment to manage the complex, high-volume-data workflows common to these studies.

The following organizations represent the European community in that project: Universitat Politècnica de València (UPVLC), Barcelona Supercomputing Center (BSC) and Instituto de Salud Carlos III (ISCIII), Spain; University of Newcastle (UNEW), England; Euro-Mediterranean Center on Climate Change (CMCC), Italy; and University of Amsterdam (UvA), Netherlands. The Brazilian community is represented by: Universidade Federal de Campina Grande (UFCG), Campina Grande; Laboratório Nacional de Computação Científica (LNCC), Petrópolis; Centro de Referência em Informação Ambiental (CRIA), Campinas; Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro; Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio), Rio de Janeiro; and IBM Research Brazil (IBM), Rio de Janeiro. At the time of writing this proposal, the project is defining the interconnectivity and security of the infrastructures and case studies. Two laboratories associated with INCT-MACC, Hemolab and ComCiDis, are participating in the EUBrazil-CC project.
Regarding the evaluation of clouds and dedicated portals, two further examples are noteworthy. The first is Blue Collar Computing (BLUE), which focuses on high performance computing in support of the productive and research sectors. The project seeks to help manufacturing enterprises in the US improve their efficiency through process adjustment and loss reduction. This is achieved with computational models and simulations spanning the stages from design to manufacture, so that enterprises can use simulations to analyze the behavior of materials during the production process. By leveraging the processing power of the Ohio Supercomputer Center, the project seeks to support the creation of applications that are accessed and used through a portal. The other is the UberCloud HPC Experiment, initiated in 2012 with the goal of exploring the remote use of resources located in high-performance centers by means of portals and applications tailored to their users.

Given the characteristics, objectives, challenges and results of the aforementioned projects, we can conclude that massively parallel, distributed computing benefits from the use of clouds, insofar as clouds allow remote access to and sharing of resources and thus become an additional source of resources. The infrastructure, as well as the existing tools for managing it, benefits both the users and the maintainers of these resources. For users in particular, the infrastructure enables remote access to dedicated environments and to resources oriented towards massively parallel and distributed computing, which most users could not afford to maintain themselves, whether because of the technical staff required or because of the fixed and maintenance costs of these resources. For maintainers, gathering a diversity of high-performance resources into a federation makes it possible to work with scientific applications and their related problems inside proper environments, whose infrastructure actually meets the requirements of these applications.

INCT-MACC Infrastructure

Figure 3 – INCT-MACC Cluster/Cloud Mixed Infrastructure


The computational infrastructure currently under INCT-MACC management comprises two high-performance clusters, which together provide an environment with architectures capable of serving various classes of applications. The first cluster is an HPC Bull System, composed of 154 CPU nodes totaling 1992 CPU cores and 3548 GPU (Graphics Processing Unit) cores, which is configured and dedicated to application processing within a classic high-performance cluster model. The second is an SGI Altix cluster with 94 CPU nodes, totaling 1128 CPU cores and 10752 GPU cores, configured into three distinct environments. The first environment consists of 18 nodes and is configured as a cloud environment based on OpenStack. It is controlled by the VirtualIS (Virtual Infrastructure for Science) environment manager, which was developed within ComCiDis with technological grants and resources from CNPq, FINEP and FAPERJ. This cloud aims to provide dedicated execution environments for its users. The second environment, currently being deployed, is Quimera, composed of the set of GPGPUs (General Purpose Graphics Processing Units), servers with Intel architecture CPUs, a server with 64-core AMD architecture processors, and two servers equipped with co-processors, amounting to 120 cores. This set was formed to allow different types of available technologies to be combined according to the inherent properties of applications. Finally, the third environment is also a classic cluster, with 36 nodes, similar to the Bull HPC cluster, although it can be configured independently of provider support. The four aforementioned environments grant INCT-MACC the flexibility needed to execute various applications and inquiries (Figure 3).
One application, the Virtual ImageLab, has already been ported to the virtual environment and hosted in the cloud. Initially developed for instantiation on workstations, it was hosted in a cloud environment and given a browser interface for remote access, so that it can be used remotely, leveraging the power of the INCT-MACC servers, as well as on other platforms such as mobile devices. The environment can be accessed at http://imagelabvirtual.lncc.br/


Activities

The following activities are scheduled for the period covered by this project.

  • Evaluation of the application classes targeted for scientific research, with a focus on applications developed by INCT-MACC as detailed in this project. Assessing infrastructure adequacy by class of application, rather than by individual application, decouples the infrastructure from the support of any specific application. This makes for a much more flexible environment, able to meet the research and development needs that may arise within INCT-MACC alongside the technological development of the infrastructure. The foundation for this approach is given by studies that categorize applications into classes and by assessments already carried out within INCT-MACC. These demonstrate that grouping applications by their consumption of computing resources, data size and relative importance allows new applications to be designed and analyzed by association with these classes, with the benefit of anticipating their behavior when running in a virtualized environment. To date, three of the thirteen application classes proposed by the Dwarfs approach have been analyzed.
  • Analysis of the interactions among applications competing for the same resources in a virtualized environment. This concurrency affects the performance and variability of the environment. To make the best use of the resources, it is necessary to establish which sets of applications hosted in these virtualized environments can coexist in the same real environment, and which combinations should be avoided. This concept, called "Affinity", can be used for concurrency analysis in both real and virtualized environments: it aims to determine which types of applications can coexist on the same physical machine, whether virtualized or not, without significant performance loss. To date, no studies associating affinity with the effect of concurrency were found, although such a study, and the determination of the right combinations, will be fundamental to using a cloud computing environment to host the applications developed within INCT-MACC.
  • Evaluation of the influence on performance of the type of parallel libraries used in the development of applications. This influence grows to the extent that applications are executed concurrently in virtualized environments. Previous evaluations show that the combination of application class, language/library and type of environment (real or virtual) has major implications for both the variability and the performance of the computing environment, and therefore for its quality and effectiveness, respectively. This study is needed to guide the development of new applications so as to optimize the use of the infrastructure.
  • Determination of the type of virtual infrastructure topology suitable for the applications implemented and used inside the INCT-MACC environment. The performance of the aforementioned combination of application class, language/library and type of real/virtual environment is also influenced by the topology of the virtual infrastructures created in support of massively parallel and distributed computing. It is important to determine a priori the best topology to implement when creating "virtual clusters" in virtualized environments. The research carried out so far shows that, for each application class, a particular topology performs better. The choice between distributing the virtual nodes over the largest number of servers and concentrating them on fewer servers depends directly on the factors mentioned above. Thus, this assessment of topology types and distribution strategies is necessary in order to schedule applications efficiently on existing resources.
  • Acquisition and maintenance of infrastructure. The existing infrastructure in INCT-MACC was acquired with resources from the following projects: FINEP PROINFRA 2008, subsequently expanded with resources from "Computational Modeling and Simulation of the Cardiovascular System and its Applications in Medicine Assisted by High-Performance Computing" (FAPERJ 19/2008); "National Institute of Science and Technology in Medicine Assisted by Scientific Computing" (CNPq/FAPERJ No. 015/2008); and "Cyber Infrastructure for Network R&D in Medicine Assisted by Scientific Computing of Rio de Janeiro" (FAPERJ/2008), subsequently expanded with funds from the "CyberInfrastructure in Simulations: Grids, Clouds, and Multi Web" project (FAPERJ no. 19/2008). This infrastructure has been maintained and updated with resources from the projects mentioned above. Much of the equipment that exists today is more than three years old and is out of warranty. Although purchasing spare parts is possible, their cost becomes prohibitive once production is discontinued. After analyzing possible solutions to this problem, we chose a process of "continuous revitalization" of the equipment, gradually replacing damaged equipment through the acquisition of new processing nodes.
  • Improvement of the virtual environment portal. One of the main requests and difficulties reported by users of massively parallel and distributed computing environments concerns access to, and use of, these resources. Many efforts have been made to create more user-friendly interfaces for those users. The INCT-MACC project environment allows its users to access its resources through a portal focused on usability. Current efforts aim to give those users the friendliest possible environment while optimizing the utilization of these resources. This will be based on the studies and research described in the first three activities above: with information about the user and her behavior obtained through the portal, it will be possible to classify the application by application class, type of deployment, affinity, optimal topology and, when virtualized environments are used, the virtualization layer. With this approach, it will be possible to allocate the user's application to the environment that best meets its requirements. Another goal is to increase the functionality and adaptability of the portal for mobile devices, to enable remote data transmission and result retrieval.
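The "Affinity" concept described in the activities above can be illustrated with a minimal sketch in Python. It assumes that pairwise slowdown factors have already been measured for application classes sharing one physical host. The class names, the slowdown values, the 15% tolerance and the function names are all hypothetical and chosen only for illustration; they are not results or code from INCT-MACC.

```python
from itertools import combinations

# Hypothetical slowdown factors observed when two application classes share
# one physical host (1.0 means no performance loss). Illustrative values only.
SLOWDOWN = {
    frozenset(["dense_linear_algebra", "n_body"]): 1.05,
    frozenset(["dense_linear_algebra", "spectral"]): 1.40,
    frozenset(["n_body", "spectral"]): 1.10,
    frozenset(["dense_linear_algebra"]): 1.30,  # two instances of the same class
}


def has_affinity(class_a, class_b, max_slowdown=1.15):
    """Return True if the two classes may share a host within the tolerated loss.

    Pairs that were never measured are treated as having no affinity,
    which is a conservative assumption of this sketch.
    """
    factor = SLOWDOWN.get(frozenset([class_a, class_b]))
    return factor is not None and factor <= max_slowdown


def can_coexist(classes, max_slowdown=1.15):
    """A group of applications can share a host only if every pair has affinity."""
    return all(
        has_affinity(a, b, max_slowdown) for a, b in combinations(classes, 2)
    )
```

A scheduler or portal could call `can_coexist` before placing a new application on a host that is already running others. Requiring affinity for every pair is the simplest possible policy; it ignores effects that only appear when three or more applications compete at once.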

Goals

  • By the end of 2017: Evaluation of the other classes and subclasses of Dwarfs, based on the applications and new developments of INCT-MACC. This goal aims to map the parameters of applications already developed or under development. Its benefits are guidance for improving these applications and a means of determining the best infrastructure for their implementation.
  • By the end of 2017: Validation of the degree of affinity of the INCT-MACC applications, in order to obtain the sets of application classes that can be executed in shared (cloud computing) environments, as well as the effects of the virtualization layer on their performance. This goal is intended to establish which group of applications can be ported to cloud computing and which should remain deployed on dedicated infrastructure.
  • By the end of 2017: Analysis of parallel libraries based on classes of applications. This goal is designed to determine which libraries used to implement applications make the best use of the existing resources. With this approach, it will be possible to determine, before an algorithm or application is implemented, what kind of language or library should be used for its development. This study should take into account whether or not a shared environment is used (affinity).
  • By the end of 2019: Determination of the topology of real and virtual environments based on the classes of applications and their affinities. This study builds on the results of the first three goals and aims to determine the computational architecture topology that provides the best performance for the research and applications developed by INCT-MACC.
  • By the end of 2021: Revitalization of the INCT-MACC computational infrastructure. This goal allows the ongoing research to continue and preserves the processing power and heritage of the existing computational resources of INCT-MACC.
  • By the end of 2021: Improvement of the access portal. This goal aims to ensure the continued development of portal-enabled applications and of the Virtual Infrastructure for Science (VirtualIS), thus allowing the computational resources and applications to be used transparently and remotely, with a focus on access to those resources and applications via mobile devices.
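The topology question raised in the activities and goals above, namely whether the virtual nodes of a "virtual cluster" should be distributed over as many servers as possible or concentrated on few, can be sketched as two placement strategies. This is a minimal illustration in Python, assuming each host is described only by its number of free virtual-node slots. The function name and the "pack" and "spread" labels are hypothetical and do not describe the VirtualIS or OpenStack schedulers.

```python
def place_virtual_nodes(num_vnodes, host_capacity, strategy):
    """Return how many virtual nodes each host receives.

    host_capacity: free virtual-node slots per host, in host order.
    strategy: "pack" concentrates nodes on as few hosts as possible;
              "spread" distributes them round-robin over all hosts.
    """
    if num_vnodes > sum(host_capacity):
        raise ValueError("not enough free slots for the requested virtual cluster")

    placement = [0] * len(host_capacity)
    remaining = num_vnodes

    if strategy == "pack":
        # Fill each host completely before moving on to the next one.
        for i, cap in enumerate(host_capacity):
            take = min(cap, remaining)
            placement[i] = take
            remaining -= take
            if remaining == 0:
                break
    elif strategy == "spread":
        # Give one node per host per round, skipping hosts that are full.
        while remaining > 0:
            for i, cap in enumerate(host_capacity):
                if remaining == 0:
                    break
                if placement[i] < cap:
                    placement[i] += 1
                    remaining -= 1
    else:
        raise ValueError("strategy must be 'pack' or 'spread'")

    return placement
```

For eight virtual nodes on four hosts with four free slots each, "pack" uses two hosts and "spread" uses all four. Which of the two performs better for a given application class is exactly what the proposed study sets out to determine, so the sketch deliberately makes no claim about it.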

Impact

  • Availability of a computational environment, as shown in Figure 4, for the research and development activities of INCT-MACC, allowing remote access to computing resources that are available to researchers and dedicated to their needs. This is expected to relieve researchers of the configuration and execution tasks typical of massively parallel and distributed computing, so that they can devote themselves to their research.


Figure 4. Environment created for users of Scientific Computing in Medicine.

  • Understanding of the classes of scientific applications according to their intrinsic and specific requirements, especially those related to massively parallel processing and distributed computing, thus optimizing the existing or newly acquired computer infrastructure. This knowledge is necessary in order to guide the allocation of applications submitted to the environment, onto an infrastructure that best meets their specificities.
  • The aforementioned knowledge will also enable a more effective use of the available computational resources and, above all, an understanding for the acquisition of resources based on the real needs of the research, thus contributing to a more rational utilization as well as assisting researchers in specifying their requirements to the funding agencies.
  • Ability to guide the development of new applications in the context of distributed and shared environments, such as cloud computing, or of dedicated ones, such as high-performance architectures and real cluster environments, contributing to the efficiency of algorithms under development.
  • Ability to determine which applications can be ported to a cloud-computing environment and which should not be ported, contributing to the optimized use of resources, and to work with mixed massively parallel computing environments (cloud infrastructure and high performance computing).
  • Regarding the security aspect of clouds, knowledge of the actual composition of the created virtual environments, in terms of architecture and topology, will enable these environments to be created in private clouds within existing computing resources. This optimizes the use of those resources and provides secure environments to users, along with the main benefits of cloud computing: environments that are customizable, friendly and remotely accessible.
  • Finally, this proposal contributes to the maintenance of the infrastructure acquired during the first phase of INCT-MACC, allowing research to continue. In particular, new acquisitions will be based on the studies and goals described above, with a focus on training personnel in high-performance computing, now considered strategic by many nations for both the scientific community and industry.