Vast efficiency improvements when using the grid
New software from the fields of gravitational physics and genome research reduces the workload by 95%. It will be presented from 22 to 24 March in Dresden.
What do gene analyses and gravitational waves have in common?
The biggest challenge shared by both fields is the enormous amount of data that needs to be analyzed:
In genomics research, the genome-wide association of genetic alterations with disease symptoms and disease progression is becoming increasingly important. This requires analyzing very large population groups in order to correlate millions of genetic variants and factors with the progression of diseases. One example is the international Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium. Building on the general grid software of the working group headed by Tobias A. Knoch, a further, particularly efficient software package was developed specifically for this purpose. With normal resources alone, these analyses would take years to complete. In the D-Grid and other international grid infrastructures, these and other analyses consume roughly 250,000 to 350,000 CPU-hours per day. This is equivalent to a cluster with up to about 14,500 nodes (individual PCs).
In gravitational physics, researchers are analyzing the data generated by gravitational wave detectors, with the aim of opening a new window on the cosmos: gravitational wave astronomy will serve to study the dark side of the universe. To analyze the huge amount of data, efficient methods for distributed computing must be developed pragmatically, step by step. This includes, among other things, the Einstein@Home project. Einstein@Home searches the detector data for gravitational wave signals and involves hundreds of thousands of users worldwide who donate a share of the computing time of their PCs. In the D-Grid, Einstein@Home uses a daily computing capacity of 100,000 to 150,000 CPU-hours, which is equivalent to a cluster with up to 6,250 nodes (individual PCs).
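The node figures quoted for both workloads can be reproduced with a short calculation. The sketch below assumes that one node contributes one CPU-hour per hour and runs around the clock; that assumption is not stated in the text, and the function name is purely illustrative.

```python
HOURS_PER_DAY = 24


def equivalent_nodes(cpu_hours_per_day: float, cores_per_node: int = 1) -> float:
    """Nodes needed to deliver the given CPU-hours per day when running 24 hours a day.

    Assumption: each node contributes `cores_per_node` CPU-hours per hour.
    """
    return cpu_hours_per_day / (HOURS_PER_DAY * cores_per_node)


# Daily CPU-hour ranges as quoted in the text (low, high).
workloads = {
    "genome analyses": (250_000, 350_000),
    "Einstein@Home": (100_000, 150_000),
}

for name, (low, high) in workloads.items():
    print(f"{name}: {equivalent_nodes(low):,.0f} to {equivalent_nodes(high):,.0f} nodes")
```

Under this assumption the upper bounds come out at about 14,583 and exactly 6,250 nodes, in line with the rounded figures given above.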
Despite considerable effort over the past 130 years, the organization of the genome in the cell nucleus is still not fully understood. Although the decoding of the DNA base sequence brought decisive progress, the sequence is only part of the information stored in the genome. A key point is that the 46 individual molecules, the chromosomes, have a combined length of about 2 m when stretched out, yet they are packed, in functional form, into a cell nucleus with a diameter of ca. 10 micrometres. This corresponds to a compaction factor of ca. 100,000. The DNA is compacted in several packing steps: first, the DNA is wound around a protein complex, forming the so-called nucleosomes, which line up like pearls on a necklace. The nucleosomes then come together to form chromatin fibres. These fibres form loops, which in turn gather into larger aggregates. Around 100 of these aggregates typically make up a chromosome. The chromosomes themselves occupy more or less defined positions in the cell nucleus. At each of these packing levels there are chemical or structural modifications that in turn constitute a code. This code is responsible, for example, for the correct reading of the genetic information of individual genes. Combined alterations in the spatial architecture of the genome are known to have a strong effect on genome function and on disease. Understanding genome function therefore requires clarifying as many relationships as possible across all of these levels, and considerable further research is needed.
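The scale of the packing problem can be checked with a back-of-the-envelope calculation. The sketch below simply divides the stretched-out DNA length by the nucleus diameter; this definition of the compaction factor is an assumption, since the text does not say how its figure was derived.

```python
TOTAL_DNA_LENGTH_M = 2.0      # combined stretched-out length of all chromosomes
CHROMOSOME_COUNT = 46
NUCLEUS_DIAMETER_M = 10e-6    # ca. 10 micrometres


def linear_compaction(length_m: float, container_m: float) -> float:
    """Ratio of stretched-out length to container diameter (assumed definition)."""
    return length_m / container_m


ratio = linear_compaction(TOTAL_DNA_LENGTH_M, NUCLEUS_DIAMETER_M)
mean_chromosome_cm = TOTAL_DNA_LENGTH_M / CHROMOSOME_COUNT * 100

print(f"linear compaction ratio: about {ratio:,.0f}")
print(f"mean stretched-out length per chromosome: about {mean_chromosome_cm:.1f} cm")
```

This simple ratio gives about 200,000, the same order of magnitude as the factor quoted in the text; the exact value depends on how compaction is defined.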
The international “Biophysical Genomics” working group focuses on understanding the organization of the genome in the cell nucleus, from the level of DNA sequence organization up to the morphology of the entire cell nucleus as visible under the microscope. A particular focus is the relationship between the three-dimensional structure of the genome and its function. The working group is inter- and transdisciplinary and combines various methods in a systems biology approach in which theoretical and experimental methods are merged: for example, three-dimensional structural models of chromosomes and entire cell nuclei are simulated on parallel supercomputers. At the same time, experimental data on the structure are obtained through high-resolution imaging and biochemical methods. The two are then compared, and new models and experiments are initiated. In the past twelve months, Knoch and his Biophysical Genomics working group obtained numerous new insights in this way that have had a fundamental and lasting effect on research in this area. In parallel, new methods were developed, such as novel staining methods for the genome and new bioinformatics tools, including grid infrastructures. These efforts have been recognized with honours and prizes.
Genome-wide association studies
To make reliable claims about the function of genetic variations or mutations, especially with regard to disease onset, treatment and progression, population-based screenings are necessary. In these, genetic variants/mutations are compared with health-related data in large population groups. The number of known variations has meanwhile reached the millions, which is why such studies require tens of thousands of participants to obtain statistically relevant results. The work therefore takes place in large international consortia, since it would otherwise be unaffordable; the same holds true for astronomy and elementary particle physics. Newly discovered variations/mutations that can be associated with a disease must first be examined in complex laboratory-based procedures, unless they turn out to be the sought-after missing building block. In many cases, however, they then explain a link between function and disease that is important for treatment, or that makes the development of treatment options possible in the first place. The discovery of new variations/mutations that are important for the treatment of chronic pulmonary diseases thus represents, as mentioned above, substantial progress in this research.
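Why millions of variants call for tens of thousands of participants can be illustrated with a minimal sketch. It is not the consortium's actual pipeline: it assumes a Bonferroni correction for multiple testing and a simple allelic chi-square test on a 2x2 table, and all allele counts are hypothetical.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-variant significance threshold when n_tests variants are tested."""
    return alpha / n_tests


def allelic_chi_square(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int) -> tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi-square statistic, p-value).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_sums)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# One million tested variants at an overall significance level of 0.05.
threshold = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical counts: 10,000 cases and 10,000 controls, two alleles each.
chi2, p = allelic_chi_square(6_000, 14_000, 5_000, 15_000)
print(f"threshold={threshold:.1e}, chi2={chi2:.1f}, significant={p < threshold}")
```

With a million tests, the per-variant threshold becomes very strict, so a modest difference in allele frequency only reaches significance when the cohort is large.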