Compute clusters at the AEI
General purpose high performance compute clusters
The AEI has operated high-performance computing (HPC) clusters for numerical-relativity simulations since 2003. The first cluster, Peyote, was installed in 2003 and entered the Top 500 list at rank 395. In 2005 the HPC cluster Belladonna was installed; it was superseded in 2007 by Damiana (rank 192 in the 2007 Top 500 list). In 2011 the Datura cluster, with 200 Intel-CPU compute nodes, was installed. The HPC cluster Minerva was operated from 2016 to 2023. Minerva's primary use was numerical-relativity simulations of coalescing black holes and neutron stars in order to calculate the resulting gravitational-wave radiation. Upon installation, the cluster was ranked 463rd on the Top 500 list with 365.0 TFlop/s.
Sakura at the MPCDF. Picture: © K. Zilker (MPCDF)
Urania at the MPCDF in Garching. Picture: © L. Hüdepohl (MPCDF)
In 2019, Sakura, an HPC cluster located at the Max Planck Computing and Data Facility (MPCDF), began operation. Numerical-relativity simulations of astrophysical events that generate both gravitational waves and electromagnetic radiation – e.g. mergers of neutron stars – are run on Sakura. The cluster's 16,000 CPU cores are interconnected by a fast Omni-Path 100 network and 10 Gb Ethernet connections. It consists of head nodes with 10-core Intel Xeon Silver processors and 192 GB to 384 GB of main memory, as well as compute nodes with Intel Xeon Gold 6148 CPUs.
In 2023, the HPC cluster Urania was put into operation at the Max Planck Computing and Data Facility in Garching. With 6,048 compute cores and 22 terabytes of memory it is just as powerful as its predecessor Minerva but requires only half the electricity to operate. Urania is used for in-depth studies of binary black holes and the gravitational waves they emit. It consists of 84 compute nodes, each with two 36-core Intel Xeon Platinum 8360Y processors and 256 GB of RAM.
High throughput data analysis compute clusters
The Atlas Computing Cluster at the AEI in Hannover is the world's largest and most powerful resource dedicated to gravitational wave searches and data analysis.
Atlas was officially launched in May 2008 with 1,344 quad-core compute nodes running at a 2.4 GHz clock speed. One month later it was ranked number 58 on the June 2008 Top 500 list of the world's fastest computers. At that time it was the sixth-fastest computer in Germany.
Atlas was also the world's fastest computer using Ethernet as its network interconnect. This is notable because Ethernet is a relatively inexpensive networking technology; the faster machines on the Top 500 list all used costlier interconnects such as InfiniBand or proprietary technologies. Worldwide, Atlas therefore led the field in performance per price. In recognition of this, Atlas received an InfoWorld 100 award as one of the 100 best IT solutions of 2008.
Currently, Atlas comprises more than 50,000 physical CPU cores (about 90,000 logical) in 3,000 servers. These range from 2,000 older 4-core systems with 16 GB of RAM each, through 550 systems with 28 CPU cores and 192 GB of RAM, to the newest 444 servers with 64 CPU cores and 512 GB of RAM each. In addition, there are about 350 high-performance, specialized graphics cards (GPUs) alongside about 2,000 "regular" graphics cards for specialized applications. Atlas' theoretical peak computing performance amounts to more than 2 PFLOP/s.
All these computers are connected by common Gigabit Ethernet to all other compute and storage servers. A total of 15 kilometers of Ethernet cable was used to connect the compute nodes, and the total bandwidth is about 20 terabit/s.
In addition to the Atlas cluster in Hannover, the AEI operates a computer cluster in Potsdam. After Merlin (2002–2008), Morgane (2007–2014), and Vulcan (2013–2019), Hypatia was installed in 2019, providing 9,000 processor cores (in 16-core AMD EPYC CPUs). Hypatia is dedicated to developing waveform models for current and future gravitational-wave detectors, analyzing gravitational-wave data to infer the properties of black holes and neutron stars, and testing General Relativity. In 2022, Hypatia was extended by about 4,000 cores and 32 TB of RAM. The extension consists of 64 compute nodes with dual-socket AMD "Milan" processors (3rd-generation EPYC, 32 cores per CPU), each node providing 512 GB of memory.
Within the LIGO-Virgo collaboration, Hypatia is used for devising and testing new methods, large-scale Monte-Carlo simulations, waveform development, and the study of systematic biases. As part of the LISA consortium, AEI scientists currently participate in developing the science case for LISA. Within the European Pulsar Timing Array collaboration, AEI researchers search for gravitational-wave signals from the population of supermassive black hole binaries in the nanohertz band.
Like Atlas at AEI Hannover, Hypatia has been designed for High Throughput Computing (HTC) and is best suited for running many, mostly independent, tasks in parallel. It is built from commodity parts and uses a Gigabit-Ethernet network. The running processes may allocate 34 TB of memory in total and use 400 TB of disk storage. HTCondor is used as the batch scheduler. Also, in the larger context of the LIGO Scientific Collaboration (LSC), parts of Hypatia serve as testbeds for software as well as for alternative processor architectures, storage concepts, and operating systems.
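HTC workloads of this kind are typically expressed as a large number of independent jobs handed to the batch scheduler. As a minimal, hypothetical sketch (the executable name, resource requests, and job count are illustrative, not taken from Hypatia's actual configuration), an HTCondor submit description for such a parameter sweep might look like this:

```
# Hypothetical HTCondor submit file: run 1000 independent analysis jobs.
universe       = vanilla
executable     = analyze_segment.sh      # illustrative job script
arguments      = $(Process)              # each job receives its own index 0..999
request_cpus   = 1
request_memory = 4GB
output         = logs/job.$(Process).out
error          = logs/job.$(Process).err
log            = logs/analysis.log
queue 1000
```

Once submitted with condor_submit, the scheduler places each job on any free core in the pool. Because the jobs do not communicate with one another, this pattern needs no low-latency interconnect, which is why Ethernet-based commodity clusters such as Hypatia and Atlas are well suited to it.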
Graphics Processing Unit Clusters
In 2021, the GPU cluster Saraswati was installed. Saraswati's primary use is the development and application of machine-learning methods for parameter inference with current detectors, and the development of GPU-accelerated inference codes for future detectors. It consists of 8 Nvidia A100 GPUs with 40 GB of RAM each, two 64-core AMD EPYC 7713 CPUs, and 1 TB of system RAM. It has a dedicated storage system with 12 TB of disk space. A second GPU machine, Lakshmi, was installed in early 2023 with a configuration similar to Saraswati's, but with twice as much RAM per GPU and for the system.