The team designing Oak Ridge National Laboratory’s new Summit supercomputer correctly predicted the rise of data-centric computing – but its builders couldn’t forecast how bad weather would disrupt the delivery of key components.
Nevertheless, almost four years after IBM won the contract to build it, Summit is up and running on schedule. Jack Wells, Director of Science for Oak Ridge Leadership Computing Facility (OLCF), expects the 200-petaflop machine to be fully operational by early next year.
“It’s the world’s most powerful and largest supercomputer for science,” he said.
Summit was designed for workloads including nuclear physics, seismology and climate science, which typically start with a model and a set of initial conditions and generate huge volumes of data on their way to a solution.
But its creators also planned for new kinds of computing problem that begin with vast datasets and seek succinct explanations for them. Genomics studies are one example, machine-learning problems another.
“We thought there might be a lot of growth in our user programs in data-intensive applications, … and indeed that happened,” Wells said.
For example, there are now 10 or so deep-learning projects wanting time on Summit, up from none a few years ago, he said.
Summit’s architecture – the way its memory is shared between processors and its ability to perform greater volumes of calculations at reduced precision – is particularly suited to such problems. (Summit runs Red Hat Linux as its OS.)
It’s an unusual supercomputer in other respects, too.
If performance benchmarks match predictions, it will lead the Top500 list of the world’s fastest supercomputers with a peak performance of 200 petaflops, or 200 million billion floating-point operations per second.
But by another measure, Summit can perform at over 1.88 exaflops, or 1.88 billion billion operations per second. Instead of the 64-bit, double-precision, floating-point arithmetic commonly used in scientific modelling, these calculations are performed using 16-bit, or half-precision, floating-point arithmetic, Wells said. That’s sufficient for many of the calculations used in deep learning or genomics.
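To make the precision trade-off concrete, here is a minimal Python sketch that stores the same number in 16-bit and 64-bit IEEE 754 formats using the standard library's struct module. The sample value 0.1 and the helper name round_trip are illustrative choices, not anything taken from Summit's software. The last line compares the two throughput figures quoted above.

```python
import struct

def round_trip(value: float, fmt: str) -> float:
    """Store a value in the given IEEE 754 format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

HALF = "<e"    # 16-bit half precision
DOUBLE = "<d"  # 64-bit double precision

value = 0.1  # arbitrary sample value, chosen for illustration
half = round_trip(value, HALF)
double = round_trip(value, DOUBLE)

# Half precision uses a quarter of the storage but loses digits.
print(struct.calcsize(HALF), "bytes, error:", abs(half - value))
print(struct.calcsize(DOUBLE), "bytes, error:", abs(double - value))

# Ratio of the quoted half-precision rate to the double-precision peak.
ratio = 1.88e18 / 200e15
print("reduced-precision speed-up: about", round(ratio, 1))
```

The half-precision copy is off by a small but visible amount, while the double-precision copy is unchanged. Workloads that can tolerate that error get the higher operation rate.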
Summit has far fewer computing nodes than the machine it is destined to replace, Titan, which was itself the world’s fastest in November 2012. But where each of Titan’s 18,688 nodes consisted of an AMD Opteron CPU backed with a single Nvidia Kepler GPU, Summit’s 4,600 nodes are each made up of two IBM Power9 CPUs and six Nvidia Tesla V100 GPUs. These are the chips that can handle computing to different levels of precision so efficiently.
Its nodes are packed with memory: 512 GB of DDR4 RAM for the Power9s, 96 GB of High Bandwidth Memory (HBM2) for the V100s, and 1.6 TB for use as a burst buffer. From the programmer’s point of view, that memory is shared between the CPUs and GPUs and can be treated as a single block, further speeding operation.
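The per-node figures add up quickly across the machine. The sketch below multiplies them out, assuming that the memory sizes quoted are per node, that the node count is exactly 4,600, and that decimal units apply (1 PB = 1,000,000 GB). The variable names are illustrative.

```python
# Figures quoted in the article; memory sizes are assumed to be per node.
SUMMIT_NODES = 4_600
CPUS_PER_NODE = 2
GPUS_PER_NODE = 6
DDR4_GB = 512
HBM2_GB = 96
BURST_BUFFER_TB = 1.6

TITAN_NODES = 18_688
TITAN_GPUS_PER_NODE = 1

total_cpus = SUMMIT_NODES * CPUS_PER_NODE
total_gpus = SUMMIT_NODES * GPUS_PER_NODE
titan_gpus = TITAN_NODES * TITAN_GPUS_PER_NODE

hbm2_per_gpu_gb = HBM2_GB / GPUS_PER_NODE
total_ddr4_pb = SUMMIT_NODES * DDR4_GB / 1e6
total_hbm2_tb = SUMMIT_NODES * HBM2_GB / 1e3
total_burst_pb = SUMMIT_NODES * BURST_BUFFER_TB / 1e3

print("Summit CPUs:", total_cpus)
print("Summit GPUs:", total_gpus, "vs Titan GPUs:", titan_gpus)
print("HBM2 per GPU (GB):", hbm2_per_gpu_gb)
print("Total DDR4 (PB):", round(total_ddr4_pb, 2))
print("Total HBM2 (TB):", round(total_hbm2_tb, 1))
print("Total burst buffer (PB):", round(total_burst_pb, 2))
```

On these assumptions, Summit has roughly a quarter as many nodes as Titan but about half again as many GPUs.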
The nodes are divided into three categories: login nodes for compiling code and submitting jobs, launch nodes for running batches, and compute nodes where the hard computing work is done. However, the nodes are all physically identical, so there’s no need to cross-compile jobs for different targets.
Linking the nodes is a dual-rail EDR InfiniBand network with a node injection bandwidth of 23 GB/s. The switches are laid out in a three-level non-blocking fat tree topology, which means that any two nodes should be able to communicate at the full bandwidth, no matter what the other nodes are doing.
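As a rough check on those network figures, the sketch below works out the nominal rate of two EDR links and the host capacity of a textbook three-level fat tree. It assumes the nominal EDR rate of 100 Gbit/s per link. The 36-port switch radix is a hypothetical value chosen for illustration, since the article does not say which switches Summit uses, and the k^3/4 capacity formula applies to an idealised fat tree built from identical switches.

```python
EDR_GBIT_PER_LINK = 100      # nominal EDR InfiniBand rate per link, Gbit/s
RAILS = 2                    # dual-rail: two links per node
QUOTED_INJECTION_GBYTE = 23  # injection bandwidth quoted in the article, GB/s

# Nominal bandwidth of both rails, converted from Gbit/s to GB/s.
nominal_gbyte = RAILS * EDR_GBIT_PER_LINK / 8
fraction_of_nominal = QUOTED_INJECTION_GBYTE / nominal_gbyte

def fat_tree_hosts(ports: int) -> int:
    """Hosts supported by an idealised three-level non-blocking
    fat tree built from identical switches with `ports` ports."""
    return ports ** 3 // 4

ASSUMED_PORTS = 36  # hypothetical switch radix, not from the article
capacity = fat_tree_hosts(ASSUMED_PORTS)

print("nominal dual-rail bandwidth (GB/s):", nominal_gbyte)
print("quoted / nominal:", fraction_of_nominal)
print("hosts supported with", ASSUMED_PORTS, "port switches:", capacity)
```

Under these assumptions, the quoted 23 GB/s is a little below the nominal rate of the two links, and three switch levels leave ample room for 4,600 nodes.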
Around the same time the U.S. Department of Energy commissioned IBM to build Summit at Oak Ridge, it also asked the company to build another supercomputer, Sierra, at Lawrence Livermore National Laboratory.
The two sites differ in one important respect: where Lawrence Livermore uses a traditional raised-floor design in its data center, at Oak Ridge services such as water and power arrive overhead.
“We had to reorient the system, the cabinet itself, in order to be able to accommodate Oak Ridge,” said Wayne Howell, Vice President for Design and Engineering at IBM Systems.
That also meant that all the infrastructure – the racks, the cooling, the network – had to be installed before the first nodes were delivered.
“If we had tried to build the infrastructure at the same time as we were trying to plug these things in, it would have been a coordination mess,” Howell said.
That, though, meant that the nodes had to be installed at a steady pace over a relatively short period.
“One of the challenges that we experienced is, once you get this train rolling with all of these deliveries coming in, you don’t want interruptions.”
Interruptions such as, say, a tractor-trailer breakdown, or bad weather. (The nodes were set up during the North American winter, between the fourth quarter of 2017 and the first quarter of 2018.)
When a vehicle breakdown left a load of servers stranded somewhere between IBM’s facility in California and the laboratory in Tennessee, the team sent another tractor back to fetch it rather than wait for the next delivery to overtake it. The net result was just a few hours’ delay on a journey thousands of kilometers long, Howell said.
When bad weather stopped the trucks altogether, IBM chartered planes instead.
“As we were delivering them across the U.S., we flew them across rather than driving them across to make up time. But some of these components are extremely large, so we had to charter large-capacity planes to be able to do that,” he said.
The servers brought other problems too: “Think of all the packaging that comes with them. We were quickly overwhelming Oak Ridge’s capacity to handle it,” he said. Instead of taking that to the local waste handling facility, IBM shipped it out again in one of the empty trucks that had delivered the servers.
The last equipment was delivered in March 2018, said Wells.
“We’re continuing to shake down the system software,” said Wells. The plan is to complete the acceptance tests later this summer and then, he said, “We’ll be in full user operation by January, 2019.”