Computer clusters at the AEI

The Max Planck Institute for Gravitational Physics runs large computing facilities in Potsdam-Golm and Hannover. The Hannover facilities provide computing power for data-analysis tasks in which gravitational-wave detector data are searched for astrophysical signals. The facilities in Potsdam-Golm are used both for complex numerical simulations and for gravitational-wave data analysis.

General-purpose high-performance computer clusters

The AEI has run high-performance computing (HPC) clusters since 2003. The first cluster, Peyote, was installed in 2003 and ranked 395th on the Top 500 list. In 2005 the HPC cluster Belladonna was installed; it was superseded in 2007 by Damiana (rank 192 on the 2007 Top 500 list). In 2011 the Datura cluster, with 200 Intel-based compute nodes, was installed.

High-performance computer cluster Minerva at the AEI in Potsdam-Golm.

The most recent HPC cluster is Minerva, installed in 2016. It consists of 594 dual-socket compute nodes with 8-core Intel Haswell E5-2630v3 CPUs (2.40 GHz), for a total of 9504 CPU cores. Each core is equipped with 4 GB of RAM. Two storage systems running BeeGFS provide about 500 TB of disk space. Currently (June 2016) Minerva is ranked 463rd on the Top 500 list with an Rpeak of 365.0 TFlop/s. Minerva is primarily used for simulations of black holes and neutron stars to calculate the resulting gravitational radiation.
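As a quick sanity check of the figures quoted above, the following minimal sketch recomputes the core count from the node specification; the aggregate RAM figure is inferred from the 4 GB per core and is not quoted in the text.

```python
# Back-of-envelope check of the Minerva figures quoted above.
nodes = 594             # dual-socket compute nodes
cores_per_node = 2 * 8  # two sockets x eight Haswell cores each
ram_per_core_gb = 4     # 4 GB of RAM per core

total_cores = nodes * cores_per_node
total_ram_tb = total_cores * ram_per_core_gb / 1024

print(total_cores)             # 9504 cores, matching the quoted total
print(round(total_ram_tb, 1))  # about 37.1 TB aggregate RAM (inferred, not quoted)
```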


High-throughput data-analysis computer clusters

The Atlas Computing Cluster is the world's largest and most powerful resource dedicated to gravitational wave searches and data analysis.

The high-throughput data-analysis computer cluster Atlas at the AEI in Hannover.

Atlas was officially launched in May 2008 with 1344 quad-core compute nodes running at a 2.4 GHz CPU clock speed. One month later it was ranked number 58 on the June 2008 Top 500 list of the world's fastest computers. At that time it was the sixth-fastest computer in Germany.

Atlas was also the world's fastest computer that used Ethernet as its networking interconnect. This is notable because Ethernet is a relatively inexpensive networking technology; the faster machines on the Top 500 list all used costlier interconnects such as InfiniBand or proprietary technologies. Worldwide, Atlas therefore led the field in terms of price/performance. In recognition of this, Atlas received an InfoWorld 100 award as one of the 100 best IT solutions of 2008.

Currently, Atlas consists of more than 3300 compute nodes, each with at least four CPU cores, plus 850 GPUs. Atlas can store 5 petabytes on hard drives and 4.5 petabytes on magnetic tape for data archiving. Its extrapolated computing power is of order 400 TeraFLOP/s. A total of 15 kilometers of Ethernet cable connects the compute nodes, providing a total bandwidth of 30 Terabit/s.

Computer cluster Vulcan at the AEI in Potsdam-Golm.

In addition to the above, another computer pool is available to scientists at AEI Potsdam-Golm and their collaborators worldwide. After Merlin (2002-2008) and Morgane (2007-2014), Vulcan now provides about 2000 processor cores (in 4-core Intel Haswell CPUs).

Vulcan is a general-purpose gravitational-wave data-analysis cluster. Within the LIGO-Virgo collaboration, the cluster is used for devising and testing new methods, large-scale Monte Carlo simulations, waveform development, and studies of systematic biases. As part of the LISA consortium, we currently participate in the mission design study: we vary the configuration of the mission and investigate the impact on its scientific outcome. Within the European Pulsar Timing Array collaboration, we search for gravitational-wave signals from the population of supermassive black hole binaries in the nanohertz band.

Like Atlas at AEI Hannover, Vulcan has been designed for high-throughput computing (HTC) and is best suited to running many, mostly independent, tasks in parallel. It is built from inexpensive parts and uses a Gigabit Ethernet network. The running processes may allocate 8 TB of memory in total and use 50 TB of disk storage. HTCondor is used as the batch scheduler. In the larger context of the LIGO Scientific Collaboration (LSC), Vulcan also serves as a testbed for software and OS upgrades.
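To illustrate the high-throughput workflow, here is a minimal sketch that submits a batch of independent jobs through HTCondor's Python bindings. It is a generic example rather than an actual AEI workflow: the executable name, resource requests, and job count are hypothetical.

```python
import htcondor  # Python bindings for HTCondor, the batch scheduler used on Vulcan

# Describe one job template; $(ProcId) distinguishes the independent tasks.
# Executable name and resource requests are placeholders, not AEI settings.
job = htcondor.Submit({
    "executable": "analyse_segment.sh",
    "arguments": "$(ProcId)",
    "output": "logs/segment_$(ProcId).out",
    "error": "logs/segment_$(ProcId).err",
    "log": "logs/segments.log",
    "request_cpus": "1",
    "request_memory": "2GB",
})

# Queue 100 mostly independent copies of the job with the local scheduler.
schedd = htcondor.Schedd()
result = schedd.submit(job, count=100)
print("Submitted job cluster", result.cluster())
```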
