Fastest computer cluster in Berlin-Brandenburg

Science Minister Sabine Kunst unveils new Datura high performance computing cluster at the Albert Einstein Institute

April 05, 2011

2,400 processor cores, 200 servers, 4.8 terabytes of main memory and a peak computing performance of 25.5 teraflops – that is, 25.5 trillion floating-point operations per second: these are some of the key figures of the Datura high performance computer, which will now help scientists at the Max Planck Institute for Gravitational Physics (Albert Einstein Institute/AEI) to calculate collisions of black holes and neutron stars.

Further information

Technical data

200 Compute nodes, each with
2 Intel Xeon X5650 Westmere processors, 2.66 GHz each
24 GB RAM
300 GB storage capacity
3 network connections (2 x Gigabit, 1 x InfiniBand)
IPMI 2.0 card

6 Storage nodes, each with
2 Intel Xeon E5520 Nehalem processors, 2.27 GHz each
8 GB RAM
2 x 300 GB internal discs
RAID controller with 30 TB net storage capacity attached (gross: 2 x 12 SAS HDDs of 2 TB each)
3 network connections (2 x Gigabit, 1 x InfiniBand)
IPMI 2.0 card
Redundant power supply units

1 Head node (also serving as login, access and management node) with
2 Intel Xeon X5650 Westmere processors, 2.66 GHz each
24 GB RAM
300 GB storage capacity
3 network connections (2 x Gigabit, 1 x InfiniBand)
IPMI 2.0 card
Redundant power supply units

These components are housed in eight air-cooled 19” racks. The CentOS 5.5 operating system has been installed on all computers.

The following system software is used:
Compilers: GNU C++, Intel C++, Intel Fortran
Libraries: BLAS, LAPACK, Intel MKL, Intel MPI, OpenMPI, MVAPICH
Programming tools: Intel Cluster Toolkit
Batch system: Sun Grid Engine (see the example job submission after this list)
Monitoring: Nagios, Ganglia (Open Source)
Management software: Perceus (Open Source)
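
As an illustration of how the batch system and the MPI libraries listed above are typically used together, the following is a minimal sketch in Python that writes a Sun Grid Engine job script and submits it with qsub. The parallel environment name ("openmpi"), slot count, run-time limit and executable name are hypothetical examples, not Datura's actual configuration.

#!/usr/bin/env python
# Minimal sketch: generate and submit a Sun Grid Engine job that starts an
# MPI program with OpenMPI. Parallel environment name, slot count and
# executable are hypothetical examples, not Datura's actual setup.
import subprocess
import textwrap

job_script = textwrap.dedent("""\
    #!/bin/bash
    # Job name, working directory, parallel environment and wall-clock limit
    #$ -N bbh_merger
    #$ -cwd
    #$ -pe openmpi 24
    #$ -l h_rt=12:00:00
    mpirun -np $NSLOTS ./simulation
    """)

with open("job.sge", "w") as f:
    f.write(job_script)

# qsub prints the assigned job ID on success
subprocess.check_call(["qsub", "job.sge"])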

Details

Each compute node has three network interfaces for three dedicated networks. The most important is the interprocess and storage network, which interconnects the compute nodes at 80 Gbit/s bidirectionally via a high performance InfiniBand switch. Here, a SilverStorm switch from the company Voltaire is used; it has a backplane (circuit board) capacity of 51.8 Tbit/s.
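
As a rough illustration of how such a point-to-point figure can be probed in practice, the following ping-pong sketch measures the bandwidth between two MPI ranks. It uses mpi4py and NumPy, which are not part of the software stack listed above, and the message size and iteration count are arbitrary choices.

# Ping-pong bandwidth probe between two MPI ranks, e.g. started with
# "mpirun -np 2 python pingpong.py" across two compute nodes.
# mpi4py/NumPy and the message size are illustrative assumptions.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

buf = np.zeros(8 * 1024 * 1024, dtype=np.float64)  # 64 MB message
iters = 20

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(iters):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=1)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=1)
t1 = MPI.Wtime()

if rank == 0:
    # Each iteration moves the buffer once in each direction
    gbit = 2 * iters * buf.nbytes * 8 / (t1 - t0) / 1e9
    print("ping-pong bandwidth: %.1f Gbit/s" % gbit)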

The second network is used to operate and administer all components of the cluster. Here, five switches from Netgear (5 x GSM7352S v2) are used; they are cascaded to keep cable lengths as short as possible.

There is an additional network that is not involved in normal operation but helps the system administrator detect hardware problems early. Via the IPMI cards installed in all nodes, sensor values such as CPU temperature and fan speed can be checked. If a preset threshold is crossed, the system automatically sends a message to the system administrator by e-mail or SMS, depending on the urgency. The administrator can then take the necessary measures to prevent failures.
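
The following is a minimal sketch, in Python, of the kind of check described above: it reads sensor values with ipmitool and sends an e-mail when a threshold is crossed. The host name, credentials, thresholds and mail address are hypothetical, and the actual alerting on Datura is handled by the monitoring tools mentioned above rather than by this script.

# Hedged sketch: poll IPMI sensors with ipmitool and alert by e-mail when a
# threshold is crossed. Host, credentials, limits and address are examples.
import subprocess
import smtplib
from email.mime.text import MIMEText

THRESHOLDS = {"CPU Temp": 75.0, "FAN 1": 2000.0}  # degC limit, minimum RPM
ADMIN = "admin@example.org"                        # hypothetical address

def read_sensors(host, user, password):
    """Return {sensor_name: value} parsed from 'ipmitool sensor' output."""
    out = subprocess.check_output(
        ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password,
         "sensor"], text=True)
    values = {}
    for line in out.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 2:
            try:
                values[fields[0]] = float(fields[1])
            except ValueError:
                pass  # skip non-numeric readings such as 'na'
    return values

def check(host, user, password):
    alerts = []
    for name, value in read_sensors(host, user, password).items():
        limit = THRESHOLDS.get(name)
        if limit is None:
            continue
        # Temperatures alarm above the limit, fan speeds below it
        exceeded = value > limit if "Temp" in name else value < limit
        if exceeded:
            alerts.append("%s: %s = %s (limit %s)" % (host, name, value, limit))
    if alerts:
        msg = MIMEText("\n".join(alerts))
        msg["Subject"] = "IPMI alert on " + host
        msg["From"] = msg["To"] = ADMIN
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)

if __name__ == "__main__":
    check("node001-ipmi", "monitor", "secret")  # hypothetical IPMI interface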

Intelligent power distribution units (PDUs) are another tool available to the cluster's system administrator.

Cooling of the cluster

Since the end of 2006, the AEI has had a large computer room at its disposal for the high performance computers of the Numerical Relativity Theory Group. The room measures about 127 m², with a height of 3.40 m and an air volume of about 433 m³. Cold air is supplied to the room through the raised floor. When the cluster was set up, the hot-aisle/cold-aisle principle was followed: cold air is blown from the raised floor through perforated tiles into the cold aisle in front of the racks, flows from front to back through the devices and, now heated, reaches the hot aisle. There, the warm air is drawn in by the recirculating cooling units, cooled again, blown back under the raised floor and fed into the cold aisle once more.

For the clusters of the Numerical Relativity Theory Group, a total of 380 kW of cooling capacity is currently available; of this, Datura requires about 90 kW. The rest is needed for the other clusters, Damiana and Peyote, as well as for other systems.

Power supply

Power for the Datura cluster is supplied via PDUs connected to 32 A feeds. In the event of a power failure, a central uninterruptible power supply (UPS) keeps the storage and head nodes running for a maximum of 15 minutes. Special software ensures that these computers are automatically powered down and switched off as soon as the remaining UPS capacity drops to a specified level or the room temperature exceeds a certain value.
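
The article does not name the software used for this shutdown logic. Purely as an illustration, the following sketch assumes a setup in which the UPS can be queried with Network UPS Tools (upsc) and a room temperature reading is available from a file; both of these, as well as the node names and limits, are assumptions, not a description of the actual installation.

# Illustrative sketch only: shut down the storage and head nodes when the
# remaining UPS capacity or the room temperature crosses a limit. The UPS
# query via NUT's upsc, the temperature source and the node names are all
# assumptions; the actual software on Datura is not specified in the article.
import subprocess

CHARGE_LIMIT = 40.0      # shut down below this remaining UPS capacity (%)
TEMP_LIMIT = 35.0        # shut down above this room temperature (degC)
PROTECTED_NODES = ["head01", "storage01", "storage02"]  # hypothetical names

def ups_charge(ups="ups@localhost"):
    """Remaining battery charge in percent, queried via NUT's upsc."""
    out = subprocess.check_output(["upsc", ups, "battery.charge"], text=True)
    return float(out.strip())

def room_temperature(path="/var/run/roomtemp"):
    """Room temperature from a hypothetical environmental sensor file."""
    with open(path) as f:
        return float(f.read().strip())

def shutdown(node):
    # Power the node down cleanly over ssh (assumes passwordless root ssh)
    subprocess.call(["ssh", node, "shutdown", "-h", "now"])

if __name__ == "__main__":
    if ups_charge() < CHARGE_LIMIT or room_temperature() > TEMP_LIMIT:
        for node in PROTECTED_NODES:
            shutdown(node)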
