Vast efficiency improvements when using the grid

New software from gravitational physics and genome research reduces the work involved by 95%. Presentation from 22 to 24 March in Dresden.

March 19, 2010
Astrophysicists and genome researchers will present software tools in Dresden that are designed to ease access to the immense computing capacities of the grid. Only the automatic distribution and control of computer programmes in the grid makes their intensive use possible in the first place, bringing grid computing within everyone's reach. The results will be presented for the very first time at the “All-Hands-Meeting”, the most important German conference devoted to grid computing, which will take place from 22 to 24 March at the TU Dresden: Hörsaalzentrum, Bergstrasse 64, 01062 Dresden.

Dr. Alexander Beck-Ratzka, Head of the “eScience” Group at the Max Planck Institute for Gravitational Physics (Albert Einstein Institute/AEI) and member of the Technical Advisory Board of D-Grid, explains: “Whereas around two years ago three to four hours a day were required to control 5,000 simultaneously running data analyses, today 5-10 minutes are sufficient to get the job done. This is equal to a work reduction of 95%! This means that grid computing has reached production maturity.”

Assist. Prof. U.D. Dr. Tobias A. Knoch, head of the international working group Biophysical Genomics (Bioquant Center/Deutsches Krebsforschungszentrum, Heidelberg, and Erasmus Medical Center, Rotterdam), adds: “For us, grids are the key technology for understanding life and, most importantly, for the battle against disease. In an international cooperative venture, grid resources were a decisive element in finding, in a group of over 20,000 people, DNA sequences that provide particularly important leads for the treatment of chronic pulmonary diseases.”

The new software made it possible for Beck-Ratzka and Knoch to use 400,000-500,000 CPU hours daily, thanks to the automatic installation of the necessary programmes and the automatic elimination of correctable errors. This high number of CPU hours makes the scientists from the AEI and the Biophysical Genomics working group the biggest users of grid resources in Germany, and places them among the main users worldwide.

The astrophysicists and genome researchers want to make their expertise and software available in the near future in order to provide users with easy access to the grid. To date, over 50 interested parties have contacted the scientists.

Dr. Alexander Beck-Ratzka and Assist. Prof. U.D. Dr. Tobias A. Knoch will gladly make time for a personal conversation at the All-Hands-Meeting in Dresden.

Publications:

Software GAT: A. Eifer et al., 2009: JavaGAT Adaptor for UNICORE 6 - Development and Evaluation in the Project AeroGrid. Proceedings of 5th UNICORE Summit 2009 in conjunction with EuroPar 2009, Delft, The Netherlands, LNCS (accepted).

Software GRIMP: K. Estrada et al., 2009: GRIMP: a web- and grid-based tool for high-speed analysis of large-scale genome-wide association using imputed data. Bioinformatics 25, 2750-2752.

Pulmonary research: D. B. Hancock et al., 2009: Meta-analyses of genome-wide association studies identify multiple loci associated with pulmonary function. Nature Genetics. Online Dec. 13, 2009.

Background information

Since 2004, the German Federal Ministry of Education and Research (BMBF) has been funding the cross-linking of data centres within the framework of the D-Grid Initiative. Grid refers to a highly heterogeneous network of computers. This network encompasses computing centres at universities and research facilities as well as individual PCs. The aim of grid projects is the joint, effective use of these computer resources, since they rarely run at 100% capacity.

The Group headed by Beck-Ratzka at the AEI has been successfully using grid resources for data analyses since 2007. By far the largest share of the D-Grid resources used there is claimed by the scientists who analyze the data from the gravitational wave observatories. Since its founding in 1995, the AEI has played a leading role in the development of software for grid computing. This includes the Grid Application Toolkit (GAT), with which the AEI will make previously inaccessible D-Grid resources available.

In addition to these uses in astrophysics, grid resources were a decisive factor in an international cooperative venture that found, in a group of over 20,000 people, DNA sequences that provide particularly important leads for the treatment of chronic pulmonary diseases. Tobias A. Knoch has been working with parallel supercomputers since 1996 to understand more precisely the three-dimensional organization of DNA and of the entire cell nucleus. For his international working group, grid usage has meanwhile become the largest share of the resources employed for the analysis of DNA sequence patterns and for locating and analyzing genetic indicators of disease.

With the help of an automated method for distributing various data analyses across decentralized computers, the Group headed by Alexander Beck-Ratzka is now facilitating the use of idle resources. Within the framework of the BMBF-funded AstroGrid-D, DGI-1 and DGI-2 projects, the GAT (Grid Application Toolkit), developed at the AEI as part of the EU GridLab project, has been completed to the extent that it can now be used for the simple distribution of programmes in the grid.

The working group headed by Tobias A. Knoch has not only set up its own grid with desktop computers (Erasmus Computing Grid), but, within the framework of the BMBF-funded MediGRID and Services@MediGRID projects as well as the European EDGeS project, has also developed a software package with which computing tasks can be easily distributed and managed across various grid infrastructures.

Both systems are to be made available in as user-friendly a way as possible and, in the medium term, to be integrated with each other. To this end, the scientific application programmers want to develop a web portal with all of the necessary functionality.

What do gene analyses and gravitational waves have in common?

The biggest common challenge in both cases is the enormous amounts of data that need to be analyzed:

In genomics research, the genome-wide association of genetic alterations with disease symptoms and disease progression is becoming increasingly important. For this, very large population groups have to be analyzed in order to successfully correlate millions of genetic variants and factors with the progression of diseases. This work includes, for example, the international Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium. Based on the general grid software of the working group headed by Tobias A. Knoch, an additional, particularly efficient software package was developed specifically for this objective. With normal resources alone, these analyses would take years to complete. These and other analyses consume around 250,000 to 350,000 CPU hours per day in the D-Grid and other international grid infrastructures. This is equal to a cluster with up to 14,500 nodes (individual PCs).

In gravitational physics, researchers have been working on the analysis of data generated by the gravitational wave detectors, with the aim of opening a new window onto the cosmos: gravitational wave astronomy will serve to study the dark side of the universe. In order to analyze the huge amount of data, efficient methods for distributed computing must be developed very pragmatically, step by step. This includes, among other things, the Einstein@Home project. Einstein@Home searches the data from the gravitational wave detectors for gravitational wave signals and involves hundreds of thousands of users worldwide who have made a share of their PCs' computing time available. In the D-Grid, Einstein@Home uses a daily computing capacity of 100,000 to 150,000 CPU hours, which is equal to a cluster with 6,250 nodes (individual PCs).
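
As a rough plausibility check of the node equivalences quoted in the two paragraphs above (assuming one CPU core per node and round-the-clock operation, neither of which is stated explicitly in the figures):

$$\frac{350{,}000\ \text{CPU hours/day}}{24\ \text{h/day}} \approx 14{,}600 \text{ nodes}, \qquad \frac{150{,}000\ \text{CPU hours/day}}{24\ \text{h/day}} = 6{,}250 \text{ nodes},$$

which is consistent with the stated cluster sizes of up to 14,500 and 6,250 single-PC nodes, respectively.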

Genome organization
Despite considerable effort over the past 130 years, it has not yet been possible to fully understand the organization of the genome in the cell nucleus. Although decisive progress was made thanks to the decoding of the base sequence of DNA, this is only part of the information that is stored in the genome. The most important factor here is that the 46 individual molecules - the chromosomes - result in a length of 2 m when stretched out; functionally, however, they are packed into a cell nucleus with a diameter of ca. 10 micrometres. This equals a compacting factor of ca. 100,000. To this end, the DNA is compacted in several packing steps: initially, the DNA is wound around protein complexes, creating the so-called nucleosomes, which line up like a string of pearls. The nucleosomes then come together to form chromatin fibres. These fibres form loops, which in turn gather into large groups. Around 100 of these aggregates typically make up a chromosome. The chromosomes themselves are assigned to certain positions in the cell nucleus in a more or less defined way. In each of these packing steps there are also chemical or structural modifications that in turn constitute a code. This code is, for example, responsible for the correct reading of the genetic information of individual genes. It is known that alterations in the spatial genome architecture, in combination, have a strong effect on genome function and on disease. Understanding genomic function therefore requires the clarification of as many correlations as possible at all of these levels. Here there is a considerable need for further research.
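
A simple order-of-magnitude estimate of this packing ratio, using only the figures given above:

$$\frac{2\ \text{m}}{10\ \mu\text{m}} = \frac{2\ \text{m}}{10^{-5}\ \text{m}} = 2\times 10^{5},$$

i.e. a linear compaction of the order of 100,000-fold.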

Biophysical genomics
The international “Biophysical Genomics” working group focuses on understanding the organization of the genome in the cell nucleus. It researches this from the level of DNA sequence organization through to the morphologically visible level of the entire cell nucleus under the microscope. A particular focus is the relationship between the three-dimensional structure of the genome and its function. The working group is inter- and transdisciplinary and combines various methods in a so-called systems-biological approach in which theoretical and experimental procedures are merged: for example, three-dimensional structure models of chromosomes and entire cell nuclei are simulated with parallel supercomputers. At the same time, experimental data about the structure are obtained through high-resolution imaging and biochemical methods. Both are ultimately compared, and new models and experiments are initiated. In the past twelve months, Knoch and his Biophysical Genomics working group were thereby able to obtain numerous new insights that have had a fundamental and lasting effect on research in this area. In parallel, new methods could be developed, such as brand-new genome staining techniques and new bioinformatic tools, including grid infrastructures. These efforts have been recognized with honours and prizes.

Genome-wide association studies
In order to make reliable claims about the function of genetic variations or mutations, especially with regard to disease onset, treatment and progression, population-based screenings are necessary. To do this, genetic variants/mutations are compared with health-related data in large population groups. The number of known variations has meanwhile reached the millions. This explains why corresponding studies require tens of thousands of participants to obtain statistically significant results. The work therefore takes place in large international consortia, since it would otherwise be unaffordable; the same holds true for astronomy and elementary particle physics. Newly discovered variations/mutations that can be associated with a disease must first be examined in complex laboratory-based procedures, as long as they are not already the sought-after missing building block. In many cases, however, they then reveal a correlation between function and disease that is important for treatment, or even enable the development of treatment options in the first place. The discovery of new variations/mutations that are important for the treatment of chronic pulmonary diseases thus represents, as mentioned above, substantial progress in this research.

Grid Application Toolkit (GAT)
The GAT software developed by the eScience Group at the AEI can communicate with all known grid middleware services, e.g. Globus, UNICORE and gLite. Middleware services are programmes that enable resources to communicate with one another. GAT considerably eases access to computing resources: because it is universally applicable, each middleware no longer has to be addressed individually with software specially tailored to it.
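
To illustrate how this universality looks from a programmer's perspective, the following is a minimal, hypothetical sketch of a job submission with JavaGAT. The class names follow the published JavaGAT API, but the host URI is a placeholder and details such as state constants or broker creation may differ between GAT versions; this is not taken from the AEI's production setup.

```java
// Hypothetical minimal example: submit a single job through JavaGAT.
// The "any" scheme lets GAT pick a suitable adaptor (Globus, UNICORE,
// gLite, SSH, local, ...) at run time, so the same code can address
// different middleware without being tailored to any one of them.
import org.gridlab.gat.GAT;
import org.gridlab.gat.URI;
import org.gridlab.gat.resources.Job;
import org.gridlab.gat.resources.JobDescription;
import org.gridlab.gat.resources.ResourceBroker;
import org.gridlab.gat.resources.SoftwareDescription;

public class GatSubmitExample {
    public static void main(String[] args) throws Exception {
        // Describe what should run and where its output should end up.
        SoftwareDescription sd = new SoftwareDescription();
        sd.setExecutable("/bin/hostname");
        sd.setStdout(GAT.createFile("hostname.out"));

        // Hand the job to a resource broker; "grid.example.org" is a placeholder.
        JobDescription jd = new JobDescription(sd);
        ResourceBroker broker = GAT.createResourceBroker(new URI("any://grid.example.org"));
        Job job = broker.submitJob(jd);

        // Poll until the job has stopped (exact state names vary between GAT versions).
        while (job.getState() != Job.JobState.STOPPED) {
            Thread.sleep(5000);
        }
        GAT.end();
    }
}
```

The decisive point for the scientists is that the submission code stays the same regardless of which middleware ultimately executes the job; only the broker URI changes.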

So far, the AEI has used mostly those D-Grid resources that are accessible via the Globus middleware. In order to also tap the computing capacities offered via the other middleware services, AEI scientists want to use GAT in the future. Moreover, the software package is intended to be made available to the entire scientific community via a web portal.

Einstein@Home
Einstein@Home analyzes the data gathered within the LIGO Scientific Collaboration (LSC) by the linked gravitational wave detectors GEO600 and LIGO, together with the data from the Italian-French Virgo detector. With over 200,000 participants, Einstein@Home is one of the biggest distributed computing projects in the world. It is operated jointly by the University of Wisconsin-Milwaukee, USA, and the Max Planck Institute for Gravitational Physics (Albert Einstein Institute) in Potsdam and Hannover.

Gravitational waves
Within the framework of his general theory of relativity, Albert Einstein had already predicted the existence of gravitational waves in 1916 - however, he was thoroughly convinced that these tiny ripples in space-time would never be measurable. Yet he was mistaken about this: in the past few decades, an international community of scientists has developed highly sensitive gravitational wave detectors with which the tiny changes in length that occur when a gravitational wave passes through can be measured. One of these revolutionary projects, the German-British gravitational wave detector GEO600, is located near Hannover and is operated by AEI scientists together with their colleagues in Cardiff and Glasgow. The measurements with GEO600 are Germany's contribution to the international “Laser Interferometer Gravitational Wave Observatory” (LIGO), whose data is evaluated through the Einstein@Home analyses. Einstein@Home in the D-Grid - the “Usecase GEO600” - makes by far the biggest contribution to this work.

The direct measurement of gravitational waves will provide us with new insights into the universe, as, for the first time, we will be able to “see” areas that are not accessible to other observational methods. Since astronomical observations are always a look into the past, we will be able, for the first time, to catch a glimpse of the first moments of the universe and better understand how it came into being. The observational methods used to date reach back to around 380,000 years after the Big Bang; only from that time on did the universe become transparent to electromagnetic radiation, e.g. X-ray, gamma or infrared radiation. Gravitational wave astronomy is thus a perfect addition to the existing astronomical observation options. This is why research in this area is receiving funding throughout the world.
