CCR has been at the forefront of the development of open source
tools for use by HPC centers to provide quantitative and
qualitative metrics relevant to HPC, including resource
utilization, resource performance, and impact on scholarship and
research. These tools help ensure the optimal
operation of such centers and their resources, and they
demonstrate the utility, service, competitive advantage, and return
on investment that these centers provide.
Funded by the National Science Foundation, XDMoD
is a comprehensive auditing framework for high-performance
computing centers that provides metrics on resource
utilization, resource performance, application performance, quality
of service, and impact on scholarship and research. In
addition to the XSEDE version of XDMoD, an open source
version (Open XDMoD), targeted at academic and industrial HPC centers,
has also been developed and is available for download at http://xdmod.sourceforge.net/.
The XDMoD and Open XDMoD frameworks include a computationally lightweight application kernel auditing system that uses performance kernels to measure overall system performance. This allows continuous resource auditing that covers file-system performance, processor and memory performance, and network latency and bandwidth. The frameworks also provide job-level performance data for every job running on the cluster, without the need to recompile application codes, giving system personnel the ability to identify poorly performing codes and subsequently tune them for optimal performance.
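
To make the idea of application kernel auditing concrete, the following is a minimal Python sketch of the general technique: run a small, inexpensive kernel at regular intervals, record its run time, and flag a run that is markedly slower than the historical baseline. It is not XDMoD code and does not use any XDMoD API. The names `time_kernel`, `is_degraded`, and `memory_kernel`, as well as the three-standard-deviation threshold, are assumptions chosen purely for illustration.

```python
import statistics
import time
from typing import Callable, List


def time_kernel(kernel: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over several runs of a kernel."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel()
        best = min(best, time.perf_counter() - start)
    return best


def is_degraded(history: List[float], latest: float, tolerance: float = 3.0) -> bool:
    """Flag a run that is slower than the historical mean by more than
    `tolerance` sample standard deviations (an illustrative threshold)."""
    if len(history) < 2:
        return False  # too little history to estimate a baseline
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return latest > mean + tolerance * stdev


def memory_kernel() -> int:
    """Hypothetical lightweight kernel: allocate and copy a 1 MB buffer."""
    data = bytearray(1_000_000)
    copy = bytes(data)
    return len(copy)


if __name__ == "__main__":
    # In practice the history would come from earlier scheduled runs.
    history = [time_kernel(memory_kernel) for _ in range(5)]
    latest = time_kernel(memory_kernel)
    print("degraded" if is_degraded(history, latest) else "within baseline")
```

A production auditing system would run a suite of such kernels (file system, processor, memory, network) on a schedule and store the results for trend analysis. The sketch shows only the compare-against-baseline step for a single kernel.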