The Computing Innovation Fellows Project

Matchmaking Service for Mentors and CIFellows

Dorian Arnold

University/Research Lab: University of New Mexico
Location: (Albuquerque, NM)
Personal Research Web Page: http://www.cs.unm.edu/~darnold

Keywords: High performance computing, Large scale distributed systems, Autonomous systems, Fault tolerance, HPC tools

Posted on: Thursday, June 4th, 2009
Broad Research Area: Information Systems / Information Science, Numerical/Scientific Computing / HPC / Data-Intensive Scalable Computing

Research Interests:

My research interests fall under the broad areas of high performance computing and large scale distributed systems. In particular, I am interested in abstractions, mechanisms and tools that allow non-experts in systems to harness the power of high-performance systems in scalable, efficient and reliable ways.

I collaborate with researchers at the University of Wisconsin and Lawrence Livermore National Laboratory, and collaborations with Sandia National Laboratories and Los Alamos National Laboratory are in the formative stages. These collaborations give us the privilege of working with world-class scientists on some of the largest systems in the world.

Autonomous Systems:
Currently, we are studying autonomous (also called self-adaptive or self-managing) overlay networks that support in-network data analysis and aggregation. Such networks use non-intrusive mechanisms to monitor health, performance and offered load, and use online performance modeling to reconfigure and optimize overlay topologies dynamically. We are also studying the broader applicability of the tree-based computational model (described below) to scientific applications, information analytics, data mining and enterprise computing.
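
To make "online performance modeling to reconfigure overlay topologies" a little more concrete, here is a minimal sketch under purely hypothetical assumptions: a balanced reduction tree over N leaves, a fixed overhead per tree level, and a per-message cost that each parent pays once per child. The cost model and the names `tree_depth`, `modeled_latency` and `choose_fanout` are illustrative only and are not taken from MRNet or any other system described here. A self-adaptive overlay would feed measured costs into some model of this kind and re-evaluate the topology as conditions change.

```python
def tree_depth(num_leaves: int, fanout: int) -> int:
    """Levels needed for a balanced fanout-ary tree to reach num_leaves leaves."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    depth, capacity = 0, 1
    while capacity < num_leaves:
        capacity *= fanout
        depth += 1
    return depth


def modeled_latency(num_leaves: int, fanout: int,
                    per_msg_cost: float, per_level_overhead: float) -> float:
    """Toy cost model (an assumption): at every level a parent handles
    `fanout` child messages one after another, plus a fixed per-level overhead."""
    return tree_depth(num_leaves, fanout) * (per_level_overhead + fanout * per_msg_cost)


def choose_fanout(num_leaves: int, per_msg_cost: float, per_level_overhead: float,
                  candidates=range(2, 65)) -> int:
    """Return the candidate fan-out with the lowest modeled latency
    (ties go to the smallest fan-out, since candidates are scanned in order)."""
    return min(candidates,
               key=lambda k: modeled_latency(num_leaves, k, per_msg_cost, per_level_overhead))
```

For 1,024 leaves, `choose_fanout(1024, 1.0, 100.0)` picks a wide, shallow tree (fan-out 32), while `choose_fanout(1024, 1.0, 0.0)` picks a narrow, deep one (fan-out 2): which topology is best depends on the measured costs, which is why re-evaluating the model online can pay off.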

Our other interests include alternatives to contemporary fault-tolerance mechanisms and new programming models and paradigms for future large scale systems.

Tree-based Overlay Networks:
At the University of Wisconsin, working with Bart Miller, I studied the use of hierarchical, or tree-based, overlay networks (TBONs) for efficient, reliable data communication and analysis in scalable tools and applications. A major outcome of this work is a scalable method that exploits the inherent data redundancies of certain broad classes of data aggregation computations to make them robust to node and process failures, while avoiding the non-scalable overhead of explicit state replication (e.g., checkpoints). MRNet, the Multicast/Reduction Network, is the TBON prototype we developed and continue to use to evaluate most of our TBON-related research.
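
The idea of using inherent data redundancy instead of checkpoints can be illustrated with a minimal sketch. It assumes an aggregation that is associative, commutative and idempotent (here, set union of observed events, a hypothetical stand-in for a real tool's data), so that a parent's state can be rebuilt from its children's states. When an internal node fails, its orphaned children re-attach to the failed node's parent and re-send their aggregated states; because union is idempotent, merging data the parent had already absorbed does not distort the result. The `Node`, `aggregate`, `recover` and `build_example_tree` names are hypothetical and are not MRNet's API.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    name: str
    children: list = field(default_factory=list)
    local: frozenset = frozenset()   # data produced at this node (leaves, in this example)
    state: frozenset = frozenset()   # aggregated state held by this node


def aggregate(node: Node) -> frozenset:
    """Bottom-up reduction: a node's state is the union of its own data
    and the states of all of its children."""
    merged = set(node.local)
    for child in node.children:
        merged |= aggregate(child)
    node.state = frozenset(merged)
    return node.state


def recover(parent: Node, failed: Node) -> frozenset:
    """Handle the failure of internal node `failed`: its children re-attach
    to `parent` and re-send their aggregated states. Union is idempotent, so
    data `parent` had already absorbed through `failed` is not counted twice."""
    parent.children.remove(failed)
    parent.children.extend(failed.children)
    merged = set(parent.state)
    for orphan in failed.children:
        merged |= orphan.state
    parent.state = frozenset(merged)
    return parent.state


def build_example_tree():
    """Hypothetical 3-level tree: root -> {A, B}, each with two leaves."""
    leaves = [Node(f"leaf{i}", local=frozenset({f"event{i}"})) for i in range(4)]
    a = Node("A", children=leaves[:2])
    b = Node("B", children=leaves[2:])
    root = Node("root", children=[a, b])
    return root, a
```

Running `aggregate` on the root and then `recover(root, a)` leaves the root's state unchanged and equal to a fresh reduction over the repaired tree. A plain sum in place of union would double-count the re-sent data, which is one reason this style of recovery applies only to certain classes of aggregation.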

Scalable Application Debugging:
STAT, the Stack Trace Analysis Tool, is being developed collaboratively by researchers from Lawrence Livermore National Laboratory, the University of Wisconsin, and the University of New Mexico. STAT was developed to explore lightweight debugging techniques for extremely large applications (thousands to millions of processes). STAT identifies process equivalence classes, groups of processes exhibiting similar behavior, so that a single representative of each class can then be examined in depth with full-featured (but less scalable) tools such as TotalView or DDT. The initial version of STAT used only stack traces to determine process equivalence classes; our current work explores lightweight program analyses that classify application processes based on various notions of progress.
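
A minimal sketch of the equivalence-class idea, assuming each process's stack trace has already been collected as a tuple of function names (outermost frame first): ranks with identical traces fall into the same class, and one rank per class (here, arbitrarily, the lowest) is picked for in-depth inspection. STAT's published design merges traces into a call-graph prefix tree rather than only grouping exact matches, so this is a simplification; the function names and the sample traces below are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(traces):
    """Group ranks whose stack traces are identical.

    traces: dict mapping rank -> sequence of function names, outermost first.
    Returns a dict mapping each distinct trace (as a tuple) to its sorted ranks.
    """
    classes = defaultdict(list)
    for rank, trace in traces.items():
        classes[tuple(trace)].append(rank)
    return {trace: sorted(ranks) for trace, ranks in classes.items()}


def representatives(classes):
    """Pick one rank per class (the lowest) to hand to a full-featured debugger."""
    return sorted(ranks[0] for ranks in classes.values())


# Hypothetical hang: rank 0 sits in a barrier, ranks 1-4 wait on messages,
# and rank 5 is blocked in a receive inside a compute phase.
SAMPLE_TRACES = {
    0: ("main", "solve", "MPI_Barrier"),
    1: ("main", "solve", "MPI_Waitall"),
    2: ("main", "solve", "MPI_Waitall"),
    3: ("main", "solve", "MPI_Waitall"),
    4: ("main", "solve", "MPI_Waitall"),
    5: ("main", "solve", "compute", "MPI_Recv"),
}
```

Here six processes collapse into three classes, so a heavyweight debugger would only need to attach to ranks 0, 1 and 5; at scale, the savings come from the number of distinct behaviors being far smaller than the number of processes.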

Contact Information:

Phone: (505) 277-1546