CCR has been at the forefront of developing open-source tools that HPC centers can use to obtain quantitative and qualitative metrics, including resource utilization, resource performance, and impact on scholarship and research. These tools help ensure the optimal operation of such centers and their resources, and help demonstrate the utility, service, competitive advantage, and return on investment that these centers provide.
Funded by the National Science Foundation, XDMoD (https://xdmod.ccr.buffalo.edu) is a comprehensive auditing framework for high performance computing centers. It provides metrics on resource utilization, resource performance, application performance, quality of service, and impact on scholarship and research. In addition to the XSEDE version of XDMoD, an open-source version (Open XDMoD), targeted at academic and industrial HPC centers, has been developed and is available for download at http://xdmod.sourceforge.net/.
XDMoD and Open XDMoD include a computationally lightweight application kernel system that measures overall system performance (quality of service). This system enables continuous resource monitoring across system performance, including file-system performance, processor and memory performance, and network latency and bandwidth, and can proactively identify underperforming hardware and software. XDMoD and Open XDMoD also give system support personnel job-level performance data, without the need to recompile application codes, so that they can identify poorly performing codes and tune them for optimal performance.
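To illustrate the idea behind this kind of monitoring, the following is a minimal sketch of how repeated runs of a small benchmark could be compared against a baseline to flag degraded performance. It is not XDMoD's actual algorithm or API: the function name `flag_underperforming`, the threshold rule (baseline mean plus `k` standard deviations), and the sample run times are all assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_underperforming(baseline, recent, k=3.0):
    """Return indices of recent run times that exceed baseline mean + k * stdev.

    Hypothetical helper for illustration only; not part of XDMoD.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline runs to estimate spread")
    threshold = mean(baseline) + k * stdev(baseline)
    return [i for i, seconds in enumerate(recent) if seconds > threshold]


# Made-up wall-clock times (seconds) for one benchmark on one resource.
baseline_seconds = [100.0, 102.0, 98.0, 101.0, 99.0]
recent_seconds = [100.5, 131.0, 99.2]

flagged = flag_underperforming(baseline_seconds, recent_seconds)
print(flagged)  # indices of recent runs that look unusually slow
```

A simple threshold like this is easy to reason about, but a production system would likely need to account for trends, changes in software environment, and differences between resources.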