Published May 14, 2015
Walt Disney World and supercomputing facilities have something in common: waiting lines.
Like tourists queuing up to ride Space Mountain and other attractions, researchers who rely on high performance computers to tackle complex scientific problems also must wait their turn because the demand to use these powerful machines outpaces their availability.
One way in which the National Science Foundation (NSF) is addressing the issue is to improve the efficiency of supercomputers.
In 2010, NSF awarded UB’s Center for Computational Research (CCR) a five-year, $7.7 million grant to create a comprehensive supercomputer management tool. UB researchers responded by developing XD Metrics on Demand (XDMoD), a tool that monitors the performance of NSF’s supercomputers and the software programs that run on them.
The tool, along with an open source version, is now used worldwide by industry, academic and government supercomputing facilities to improve performance and limit backlogged requests, which can stretch to months. It also helps NSF and the user community understand trends, such as the increase of computing power in the past 10 years and the geographic diversity of scientific communities engaged in high performance computing.
Based on that success, NSF has awarded UB a new five-year, $9 million grant — “The XD Metrics Service (XMS) for High Performance Computing (HPC)” — to improve the tool.
“The XMS team, led by the University at Buffalo Center for Computational Research, is to be commended for the excellent and important work they are doing to help ensure that the NSF portfolio of advanced cyberinfrastructure is operating as effectively and efficiently as possible to meet the advanced computing and data analytic needs of the U.S. research community,” said Rudolf Eigenmann, NSF program officer overseeing the XMS award.
“The XMS award is a recognition of the impact the XDMoD tool has had on the high performance computing community and I look forward to working with them over the next five years to further improve XDMoD’s capability and reach,” he added.
Specifically, XDMoD allows computing personnel to maximize supercomputer performance by automatically identifying failed or poorly performing hardware and software. Analysts can utilize XDMoD’s data and analytic capability to display historical usage trends, guide system upgrades and provide metrics to help quantify scientific impact and return on investment. Software developers and other users can use it to improve code performance to maximize their productivity.
“The XDMoD project has greatly raised the visibility of the university and CCR within the national and international supercomputing community,” said Thomas Furlani, CCR director and principal investigator on both awards. “The impact extends far beyond the NSF portfolio of supercomputing centers and includes national and international academic and industrial high performance computing centers. XDMoD routinely supports operations in the global HPC community.
“This award,” Furlani noted, “is really a reflection on the highly skilled and capable CCR staff who are responsible for the success of the XMS project and the wide acceptance of XDMoD within the supercomputing community.”
Co-investigators on the new grant are Matthew Jones, associate director and lead computational scientist, and Steven Gallo, lead software engineer and database administrator, both at CCR; Abani Patra, professor in UB’s Department of Mechanical and Aerospace Engineering; and Gregor von Laszewski, assistant director of cloud computing at Indiana University’s Pervasive Technology Institute.
Other CCR personnel working on the grant include Robert DeLeon, Nikolay Simakov, Joseph White, Jeffrey Palmer, Thomas Yearke, Ryan Rathsam, Jeanette Sperhac, Martins Innus and Cynthia Cornelius.