Computer makers are unveiling a total of 50 servers with Nvidia’s A100 graphics processing units (GPUs) to power AI, data science, and scientific computing applications.
Unveiled in May, the A100 GPU has 54 billion transistors (the on-off switches that are the building blocks of all things electronic) and can deliver five petaflops of performance, or about 20 times more than the previous-generation Volta chip. That means that $20 million worth of central processing unit (CPU) servers taking up 22 racks can be replaced by new GPU-based servers that cost $3 million and take up just four racks, said Paresh Kharya, director of product marketing for accelerated computing at Nvidia, in a press briefing.
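To put Kharya's comparison in proportion, the short sketch below works out the ratios from the figures quoted above. The variable names are illustrative only, and the calculation assumes the two configurations handle an equivalent workload, as the briefing implies.

```python
# Figures as quoted in the press briefing (costs in millions of US dollars).
cpu_cost_musd = 20   # CPU-based servers
gpu_cost_musd = 3    # A100-based servers
cpu_racks = 22
gpu_racks = 4

# How many times cheaper and more compact the GPU configuration is.
cost_ratio = cpu_cost_musd / gpu_cost_musd
rack_ratio = cpu_racks / gpu_racks

# Absolute differences between the two configurations.
cost_saved_musd = cpu_cost_musd - gpu_cost_musd
racks_saved = cpu_racks - gpu_racks

print(f"Cost: {cost_ratio:.1f}x lower, saving ${cost_saved_musd} million")
print(f"Racks: {rack_ratio:.1f}x fewer, freeing {racks_saved} racks")
```

By these numbers, the GPU-based setup costs a little under one-seventh as much and occupies less than one-fifth of the rack space.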
The systems are coming from computer makers including Asus, Atos, Cisco, Dell, Fujitsu, Gigabyte, Hewlett Packard Enterprise, Inspur, Lenovo, One Stop Systems, Quanta/QCT, and Supermicro. Availability of the servers varies, with 30 systems expected this summer, and over 20 more by the end of the year, Kharya said.
The first GPU based on the Nvidia Ampere architecture, the A100 is the company’s largest leap in GPU performance to date with features such as the ability for one GPU to be partitioned into seven separate GPUs as needed, Nvidia said.
Integrating Mellanox
Nvidia made the announcement ahead of ISC High Performance, an online event dedicated to high-performance computing. The new machines also include new InfiniBand interconnect technology from Mellanox, which Nvidia paid $7 billion to acquire in 2019.
Nvidia has integrated Mellanox technology with the A100 to create Selene, which Nvidia bills as a top 10 supercomputer and the world’s most energy-efficient computer. Selene was designed in less than a month and it provides over one exaflop of AI processing. Kharya said that supercomputers like Selene will help Nvidia penetrate further into the world’s top supercomputers.
Last year, Nvidia’s graphics processing units (GPUs) were part of 125 of the top 500 supercomputers in the world, according to ISC. If you count the supercomputers with Mellanox InfiniBand technology, the number is more than 300. The list is expected to grow even larger in 2020.
“If you look at the top 500 list, the reason why Nvidia is so successful in supercomputing is because scientific computing has changed,” Kharya said. “We’ve entered a new era, one that has expanded beyond traditional modeling and simulation workloads to include AI, data analytics, edge screening, and big data visualization.”
Kharya said that Mellanox interconnect chips power the world’s leading weather forecast supercomputers. Weather and climate models are both compute and data intensive. Forecast quality depends on model complexity and resolution. And supercomputer performance depends on interconnect technology to move data quickly across different computers.
“It’s exciting to have the best compute on one side and the best network on the other, and now we can start to combine those technologies together and start building amazing things,” said Gilad Shainer, senior vice president at Nvidia, in a press briefing.
Customers using Mellanox include the Spanish Meteorological Agency, the China Meteorological Administration, the Finnish Meteorological Institute, NASA, and the Royal Netherlands Meteorological Institute.
The Beijing Meteorological Service has selected 200 Gigabit HDR InfiniBand interconnect technology to accelerate its new supercomputing platform, which will be used for enhancing weather forecasting, improving climate and environmental research, and serving the weather forecasting information needs of the 2022 Winter Olympics in Beijing.
Nvidia said its new Nvidia DGX A100 systems, which use the new Nvidia A100 AI GPU, ran the RAPIDS suite of open-source data science software in just 14.5 minutes, beating the previous performance record by a factor of 19.5. A rival central processing unit (CPU) system does the same task in 4.7 hours. The 16 Nvidia DGX A100 systems used in the benchmark test had a total of 128 Nvidia A100 GPUs with Mellanox interconnects.
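The benchmark figures can be cross-checked with a few lines of arithmetic. The sketch below assumes the 19.5 times claim is measured against the 4.7-hour CPU run, which the figures suggest but Nvidia's statement does not spell out. The variable names are illustrative.

```python
# Figures as quoted by Nvidia.
cpu_hours = 4.7        # rival CPU system
gpu_minutes = 14.5     # 16 DGX A100 systems
dgx_systems = 16
total_gpus = 128

# Put both runtimes in minutes before comparing them.
cpu_minutes = cpu_hours * 60
speedup = cpu_minutes / gpu_minutes

# Each DGX A100 system contributes an equal share of the GPUs.
gpus_per_system = total_gpus // dgx_systems

print(f"Speedup over the CPU system: about {speedup:.1f}x")
print(f"GPUs per DGX A100 system: {gpus_per_system}")
```

The result comes out at roughly 19.4, in line with the quoted 19.5 once rounding of the published runtimes is taken into account, and the GPU count works out to eight per system.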
Nvidia also unveiled the Nvidia Mellanox UFM Cyber-AI platform, which minimizes downtime in InfiniBand data centers by harnessing AI-powered analytics to detect security threats and operational issues.
This extension of the UFM platform product portfolio — which has managed InfiniBand systems for nearly a decade — applies AI to learn a data center’s operational cadence and network workload patterns. It draws on both real-time and historic telemetry and workload data. Against this baseline, it tracks the system’s health and network modifications, and detects performance problems.
The new platform issues alerts about abnormal system and application behavior, potential system failures, and threats, and it can also take corrective action. It delivers security alerts in cases of attempted system hacking, such as cryptocurrency mining. The result is reduced data center downtime, which typically costs more than $300,000 an hour, according to ITIC’s 2020 report.
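As a rough illustration of what that hourly figure means for shorter outages, the sketch below scales it linearly by duration. The function name and the linear scaling are assumptions made for illustration, not part of the ITIC report, and because the report says "more than" $300,000 the result is only a lower bound.

```python
# Hourly downtime cost quoted in the article, treated as a floor.
HOURLY_COST_FLOOR_USD = 300_000


def downtime_cost_floor(outage_minutes: float) -> float:
    """Return a lower-bound cost in US dollars for an outage of the given length.

    Assumes cost accrues evenly over time, which is a simplification.
    """
    if outage_minutes < 0:
        raise ValueError("outage_minutes must be non-negative")
    return HOURLY_COST_FLOOR_USD * outage_minutes / 60


# Example: a 15-minute outage.
print(f"${downtime_cost_floor(15):,.0f} or more")
```

On that assumption, even a quarter-hour outage would cost at least $75,000.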
Fighting the coronavirus
Kharya said that Nvidia’s scientific computing platform has been enlisted in the fight against COVID-19. In genomics, Oxford Nanopore Technologies was able to sequence the virus genome in just 7 hours using Nvidia GPUs.
In infection analysis and prediction, the Nvidia RAPIDS team has helped GPU-accelerate Plotly’s Dash, a data visualization tool, enabling clearer insights into real-time infection rate analysis. Nvidia’s tools can be used to predict the availability of hospital resources across the U.S. In structural biology, the U.S. National Institutes of Health and the University of Texas at Austin are using the GPU-accelerated software CryoSPARC to reconstruct the first 3D structure of the virus protein using cryogenic electron microscopy.
In treatment, Nvidia worked with the National Institutes of Health to build an AI that accurately classifies COVID-19 infection based on lung scans so efficient treatment plans can be devised. In drug discovery, Oak Ridge National Laboratory ran the Scripps Research Institute’s AutoDock on the GPU-accelerated Summit supercomputer to screen a billion potential drug combinations in just 12 hours.
In robotics, startup Kiwi is building robots to deliver medical supplies autonomously. And in edge detection, Whiteboard Coordinator built an AI system to automatically measure and screen elevated body temperatures, screening well over 2,000 healthcare workers per hour. Nvidia accelerates more than 700 high-performance computing applications.