Department of Informatics
Technische Universität München
Informatik X: Rechnertechnik und Rechnerorganisation / Parallelrechnerarchitektur
Prof. Dr. Arndt Bode, Prof. Dr. Hans Michael Gerndt

SMiLE: Tools


Data locality is one of the most important performance issues on NUMA systems, since remote memory accesses are still an order of magnitude slower than local memory accesses despite the low latency offered by current interconnection technologies. To reduce remote memory accesses, a comprehensive monitoring infrastructure has been developed. It comprises a low-level data acquisition system, based on the SMiLE hardware monitor, that gathers performance data; a tool environment for improving data locality; and a standardized interface that makes the acquired information accessible to the tool environment.

Within the tool environment, the focus currently lies on a visualization tool called DLV and an adaptive runtime system called ARS. Even after preprocessing by the low-level software, the monitoring information is still based on individual memory accesses observed at the SCI network adapter. A visualization tool is therefore needed to raise the level of abstraction and to project the monitoring data onto data structures within the source code. DLV helps the user analyze an application's run-time data layout and optimize programs with respect to data locality. ARS, on the other hand, modifies the location of data during execution using page migration, improving data locality and thereby performance on the fly.

DLV provides a set of display windows showing the memory access histograms in various views and projecting the memory addresses back to data structures within the source code. This allows the programmer to analyze an application's access pattern and thereby forms the basis for any optimization of the physical data layout and distribution. The following figures demonstrate a few sample display windows available from DLV.
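The projection from memory addresses back to source-level data structures can be illustrated with a minimal sketch. It is not DLV's implementation: the symbol table, the address ranges, and the names `symbol_for` and `histogram_by_symbol` are hypothetical and chosen only for illustration. The sketch assumes that each shared data structure occupies one contiguous address range.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, size in bytes, name).
# A real tool would obtain this from the compiler or allocator, not hard-code it.
SYMBOLS = sorted([
    (0x1000, 0x400, "grid"),
    (0x2000, 0x100, "boundary"),
    (0x3000, 0x800, "result"),
])
STARTS = [start for start, _, _ in SYMBOLS]


def symbol_for(address):
    """Return the name of the data structure containing address, or None."""
    i = bisect_right(STARTS, address) - 1  # last symbol starting at or before address
    if i < 0:
        return None
    start, size, name = SYMBOLS[i]
    return name if address < start + size else None


def histogram_by_symbol(accesses):
    """Aggregate (address, count) samples into per-data-structure totals."""
    totals = Counter()
    for address, count in accesses:
        totals[symbol_for(address) or "<unmapped>"] += count
    return totals
```

Calling `histogram_by_symbol` on a list of `(address, count)` pairs yields one total per data structure, which is the kind of source-level view the text describes; addresses outside every known range are collected under `<unmapped>`.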


The "Run-time transfer" window gives a global overview of the actual data transfer performed on the interconnection fabric by visualizing the number of network packets exchanged between all nodes in the system. Because excessive transfers are highlighted, communication bottlenecks can be detected easily. The "Block character" window presents the memory accesses over the application's whole shared virtual address space. For each virtual page it shows the highest number of accesses issued by any single node and, by color, the node that issued them. This view guides the user in dividing data into blocks and placing each block on the appropriate node according to the observed block characteristics. Detailed accesses to single pages can be seen in the "Access diagram", which presents the relative frequency of accesses from each node as colored columns. Inappropriate data allocation is easily detected from the differing heights of the columns. This view is needed to place individual pages correctly when they show no block characteristics.
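The information behind the "Block character" view can be sketched as follows. This is an illustrative sketch under stated assumptions, not DLV code: the per-page access counts are assumed to be available as a mapping from page number to a per-node count, and the function names are hypothetical.

```python
def dominant_accessor(counts_by_node):
    """Return (node, count) for the node with the most accesses to one page.

    On a tie, the node that appears first in the mapping wins.
    """
    node = max(counts_by_node, key=counts_by_node.get)
    return node, counts_by_node[node]


def misplaced_pages(page_counts, home_node):
    """List (page, home, dominant) for pages whose dominant accessor is not the home node.

    page_counts: {page: {node: access count}}  (hypothetical input format)
    home_node:   {page: node currently holding the page}
    """
    result = []
    for page in sorted(page_counts):
        node, _ = dominant_accessor(page_counts[page])
        if node != home_node[page]:
            result.append((page, home_node[page], node))
    return result
```

Pages reported by `misplaced_pages` are candidates for a different initial placement; runs of consecutive pages with the same dominant node correspond to the blocks mentioned above.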

Using DLV in combination with SIMT, a few programs have been optimized with respect to data placement. The following figure shows the memory access distribution of a sample program before the optimization (top) and after it (bottom). The number of remote memory accesses is heavily reduced after the optimization.

DLV aims at a correct initial data placement through optimization of the source code. Because the user can combine information about a program's run-time memory access behavior with application-specific semantics, significant performance improvements can be achieved. However, there are applications whose access patterns change dynamically during execution. It is complicated or even impossible for users to optimize such applications appropriately, even though the visualization tool can show a program's memory access behavior per phase. An adaptive runtime system that migrates data dynamically at run time is therefore necessary.

Such an adaptive runtime system, called ARS, has been designed and is currently being implemented. It analyzes the monitoring information, finds the communication hot spots, determines the correct location of shared data, and performs page migration. As the SMiLE hardware is still under development, ARS is currently built on top of SIMT. To show the run-time page movement, a graphical user interface has been integrated with the data layout visualizer.
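The analyze, decide, and migrate steps described above can be sketched as a simple threshold policy. This is only an assumed example policy, not the one used by ARS: the input format, the function names, and the `min_accesses` and `ratio` parameters are hypothetical. The thresholds are there so that a page moves only when the evidence is strong, which avoids pages bouncing back and forth between nodes.

```python
def plan_migrations(page_counts, home_node, min_accesses=100, ratio=2.0):
    """Return {page: target_node} for pages worth migrating.

    A page is migrated when its dominant accessor is a remote node that
    issued at least min_accesses accesses and at least ratio times as many
    accesses as the current home node. Both thresholds are assumptions.
    """
    plan = {}
    for page, counts in page_counts.items():
        if not counts:  # no accesses observed for this page
            continue
        home = home_node[page]
        local = counts.get(home, 0)
        target = max(counts, key=counts.get)
        remote = counts[target]
        if target != home and remote >= min_accesses and remote >= ratio * local:
            plan[page] = target
    return plan


def apply_plan(home_node, plan):
    """Return a new page-to-node placement with the planned migrations applied."""
    updated = dict(home_node)
    updated.update(plan)
    return updated
```

In a running system this decision would be re-evaluated periodically on fresh monitoring data, so that the placement follows changes in the access pattern from one program phase to the next.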

