Compute clusters at the AEI
General-purpose high-performance compute clusters
The AEI has operated high-performance computing (HPC) clusters since 2003. The first cluster, Peyote, was installed in 2003 and was ranked number 395 on the Top 500 list. In 2005 the HPC cluster Belladonna was installed; it was superseded in 2007 by Damiana (number 192 on the Top 500 list of 2007). In 2011 the Datura cluster was installed, with 200 compute nodes based on Intel CPUs.
Photo of the Minerva cluster. Credit: AEI / Armin Okulla
Photo of the Sakura cluster at the MPCDF. Credit: © K. Zilker (MPCDF)
In 2016, the HPC cluster Minerva was installed. Minerva's primary use is numerical-relativity simulations of coalescing black holes and neutron stars in order to calculate the resulting gravitational-wave radiation. The cluster was ranked number 463 on the Top 500 list with 365.0 TFlop/s. It consists of 594 compute nodes (dual-socket, 8-core Intel Haswell E5-2630v3, 2.40 GHz) with a total of 9,504 CPU cores and 4 GB of RAM per core. The cluster has two storage systems with about 500 TB of disk space running BeeGFS.
In 2019, Sakura, an HPC cluster located at the Max Planck Computing and Data Facility (MPCDF), started operation. Numerical-relativity simulations of astrophysical events that generate both gravitational waves and electromagnetic radiation – e.g. mergers of neutron stars – are run on Sakura. The 11,600-CPU-core cluster is interconnected via a fast Omni-Path 100 network and 10 Gb Ethernet connections. It consists of head nodes with 10-core Intel Xeon Silver processors and 192 GB to 384 GB of main memory, as well as compute nodes with Intel Xeon Gold 6148 CPUs.
High-throughput data-analysis compute clusters
The Atlas Computing Cluster at the AEI in Hannover is the world's largest and most powerful resource dedicated to gravitational wave searches and data analysis.
Atlas was officially launched in May 2008 with 1344 quad-core compute nodes running at 2.4 GHz CPU clock speed. One month later it was ranked number 58 on the June 2008 Top-500 list of the world’s fastest computers. At that time it was the sixth fastest computer in Germany.
Atlas was also the world's fastest computer that used Ethernet as the networking interconnect. This is notable because Ethernet is a relatively inexpensive networking technology; the faster machines on the Top-500 list all used costlier interconnects such as InfiniBand or proprietary technologies. Worldwide, Atlas thus led the field in performance per price. In recognition of this, Atlas received an InfoWorld 100 award as one of the 100 best IT solutions of 2008.
Currently, Atlas consists of more than 50,000 physical CPU cores (about 90,000 logical) in 3,000 servers. These range from 2,000 older 4-core systems with 16 GB of RAM each, through 550 systems with 28 CPU cores and 192 GB of RAM, to the newest 444 systems with 64 CPU cores and 512 GB of RAM each. In addition, there are about 350 high-performance, specialized graphics cards (GPUs) alongside about 2,000 "regular" graphics cards for specialized applications. Atlas' theoretical peak computing performance amounts to more than 2 PFLOP/s.
All compute and storage servers are connected via common Gigabit Ethernet. A total of 15 kilometers of Ethernet cable was used to connect all compute nodes; the total bandwidth is about 20 terabit/s.
In addition to the Atlas cluster in Hannover, the AEI operates a computer cluster in Potsdam. After Merlin (2002–2008), Morgane (2007–2014), and Vulcan (2013–2019), Hypatia was installed in 2019, providing 9,000 processor cores (in 16-core AMD EPYC CPUs). Hypatia is dedicated to the development of waveform models for current and future gravitational-wave detectors, to the analysis of gravitational-wave data to infer the properties of black holes and neutron stars, and to tests of General Relativity. In 2022, Hypatia was extended by about 4,000 cores and 32 TB of RAM. The extension consists of 64 compute nodes with dual-socket AMD "Milan" processors (3rd-generation EPYC, 32 cores per CPU), each node providing 512 GB of memory.
Within the LIGO-Virgo collaboration, Hypatia is used for devising and testing new methods, large-scale Monte-Carlo simulations, waveform development, and the study of systematic biases. As part of the LISA consortium, AEI scientists currently participate in developing the science case for LISA. Within the European Pulsar Timing Array collaboration, AEI researchers search for gravitational-wave signals from the population of supermassive black-hole binaries in the nanohertz band.
Like Atlas at AEI Hannover, Hypatia has been designed for High Throughput Computing (HTC) and is best suited for running many, mostly independent, tasks in parallel. It is built from commodity parts and uses a Gigabit-Ethernet network. Running processes may allocate 34 TB of memory in total, and the cluster provides 400 TB of disk storage. HTCondor is used as the batch scheduler. In the larger context of the LIGO Scientific Collaboration (LSC), parts of Hypatia also serve as testbeds for software as well as for alternative processor architectures, storage concepts, and operating systems.
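To give a flavour of how such an HTC workload reaches the cluster: with HTCondor, a user describes many independent jobs in a single submit file and lets the scheduler match them to free slots. The sketch below is purely illustrative — the executable name, arguments, and resource requests are assumptions, not the actual AEI configuration.

```
# analyze.sub -- hypothetical HTCondor submit description for an
# embarrassingly parallel sweep: 100 independent instances of the
# same analysis program, each processing its own data segment.
executable      = analyze_segment         # assumed user program
arguments       = --segment $(Process)    # $(Process) runs 0..99, one per job
request_cpus    = 1
request_memory  = 4 GB
request_disk    = 2 GB
output          = logs/out.$(Process)
error           = logs/err.$(Process)
log             = logs/condor.log
queue 100                                 # enqueue 100 independent jobs
```

Submitted with `condor_submit analyze.sub`, each job is scheduled independently — exactly the many-loosely-coupled-tasks pattern that HTC clusters like Hypatia and Atlas are built for, in contrast to the tightly coupled MPI simulations run on HPC clusters like Minerva and Sakura.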
Graphic Processing Unit Clusters
In 2021, the GPU cluster Saraswati was installed. Saraswati's primary use is the development and application of machine-learning methods for parameter inference in current detectors, and the development of GPU-accelerated inference codes for future detectors. It consists of 8 Nvidia A100 GPUs with 40 GB of RAM per GPU, 2 AMD EPYC 7713 64-core CPUs, and 1 TB of system RAM. It has a dedicated storage system with 12 TB of disk space. A second GPU machine, Lakshmi, will be installed in early 2023, with a similar configuration to Saraswati but twice as much RAM per GPU and for the system.