CCR has been at the forefront of the development of open source
tools for use by HPC centers to provide quantitative and
qualitative metrics relevant to HPC, including resource
utilization, resource performance, and impact on scholarship and
research. These tools help ensure the optimal
operation of such centers and their resources, and they
demonstrate the utility, service, competitive advantage, and return
on investment that these centers provide.
Funded by the National Science Foundation, XDMoD
is a comprehensive auditing framework for use by high performance
computing centers. It provides metrics regarding resource
utilization, resource performance, application performance, quality
of service, and impact on scholarship and research. In
addition to the XSEDE version of XDMoD, an open source
version (Open XDMoD), targeted at academic and industrial HPC centers,
has also been developed and is available for download at http://xdmod.sourceforge.net/.
XDMoD and Open XDMoD include a computationally lightweight application kernel system that measures overall system performance (quality of service). The kernels allow continuous monitoring of all aspects of system performance, including file-system performance, processor and memory performance, and network latency and bandwidth, so that underperforming hardware and software can be identified proactively. XDMoD and Open XDMoD also give system support personnel job-level performance data without the need to recompile application codes, enabling them to identify poorly performing codes and subsequently tune them for optimal performance.
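As a rough illustration of the kind of check that continuous kernel monitoring makes possible, the following Python sketch flags benchmark runs whose wall-clock time exceeds a simple control limit. It is not XDMoD's implementation: the function name, the threshold rule (baseline mean plus k sample standard deviations), and the timing values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_underperforming(baseline, recent, k=3.0):
    """Return the recent run times that exceed the control limit.

    Hypothetical rule: limit = mean(baseline) + k * stdev(baseline),
    where stdev is the sample standard deviation.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline runs")
    limit = mean(baseline) + k * stdev(baseline)
    return [t for t in recent if t > limit]


# Made-up wall-clock times (seconds) for one kernel on one resource.
baseline_runs = [100.0, 102.0, 98.0, 101.0, 99.0]
recent_runs = [100.5, 131.0, 99.2]

# Runs slower than the control limit are candidates for investigation.
slow = flag_underperforming(baseline_runs, recent_runs)
print(slow)
```

A real deployment would keep a separate baseline for each kernel, resource, and node count, and would likely use a more robust statistic than a fixed multiple of the standard deviation.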