News & Events

  • Data Logistics Toolkit software to help scientists manage, share, and manipulate Big Data

    Date: 07/01/2014

School of Informatics and Computing Associate Professor Martin Swany is leading a multi-institutional team that has launched the Data Logistics Toolkit (DLT), a software package that incorporates and extends existing research software. Supported by a $991,351 grant from the National Science Foundation, the DLT will help scientists—even researchers and students at schools with more limited network connections—store and share massive amounts of data more effectively and efficiently.

    “We define ‘data logistics’ as the problem of managing the time-sensitive positioning of data relative to its intended users and the resources they need to utilize,” said Swany, who is also associate director of the Center for Research in Extreme Scale Technologies. “Distributed research teams too often find themselves frustrated when the data they want is not where it needs to be, when it needs to be there, for work to proceed efficiently.”

    The Data Logistics Toolkit 1.0 software release will help address the problem of “campus bridging” by integrating an innovative form of distributed storage with emerging high-performance networks. “The DLT is exciting because it will allow storage and networking technologies to work together in new and innovative ways to increase the control people have over the provisioning and delivery of high-volume data services,” said Swany.

    These improvements will also provide a powerful platform for educators and scientists who want to build open content distribution networks that can deliver high-definition multimedia content to locations that would otherwise be unable to receive it.

    Under the NSF’s Campus Cyberinfrastructure - Network Infrastructure and Engineering Program (CC-NIE), Swany is the principal investigator of the two-year award to add the DLT to the nation’s basic software cyberinfrastructure. Working with colleagues Micah Beck of the University of Tennessee, Knoxville, and Paul Sheldon of Vanderbilt University, the team has created a package of easy-to-install, production-quality software that makes it easy to set up new storage nodes—called “depots”—on campuses and at other well-connected sites, and then use these depots for high-performance sharing of large-scale data.

    To help improve the system’s performance, the DLT also includes perfSONAR, a collection of software packages used to monitor, measure, and characterize networks that is widely deployed in Research and Education (R&E) networks worldwide. The perfSONAR implementation not only monitors networks and end systems for healthy operation, but also provides the traffic-sensitive network “roadmap” necessary to optimize the logistics of data distribution.
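    To illustrate the idea of a traffic-sensitive “roadmap,” the sketch below shows how a client might pick which depot replica to fetch data from, given recent throughput measurements of the kind a monitoring system like perfSONAR could supply. This is a minimal conceptual example, not the DLT or perfSONAR API; the function and hostnames are hypothetical.

    ```python
    # Hypothetical sketch: choose a depot replica using recent bandwidth
    # measurements (all names here are invented for illustration).

    def best_depot(depots, bandwidth_mbps):
        """Return the depot with the highest measured bandwidth to the client.

        `depots` is a list of depot hostnames; `bandwidth_mbps` maps each
        hostname to a recent throughput measurement in Mbit/s.
        """
        # Treat depots with no (or zero) measured bandwidth as unreachable.
        reachable = [d for d in depots if bandwidth_mbps.get(d, 0) > 0]
        if not reachable:
            raise RuntimeError("no reachable depot replicas")
        return max(reachable, key=lambda d: bandwidth_mbps[d])

    # Example: measurements for three hypothetical depot replicas.
    measurements = {
        "depot.campus-a.edu": 940,   # well-connected site
        "depot.campus-b.edu": 310,   # slower campus link
        "depot.campus-c.edu": 0,     # currently unreachable
    }
    print(best_depot(list(measurements), measurements))
    ```

    A real deployment would of course weigh more than raw throughput (freshness of measurements, depot load, data locality), but the principle is the same: measurement data steers data placement and retrieval.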

    The DLT is already in use. Sheldon uses DLT technology at Vanderbilt to manage and share nearly a petabyte of data from the Large Hadron Collider experiments. More recently, academic organizations that work with satellite data, such as the 39-member AmericaView consortium, are beginning to use the DLT to distribute large collections of geospatial data files.

    The DLT group and AmericaView, in collaboration with the United States Geological Survey (USGS), have teamed to develop the Earth Observation Depot Network (EODN) to accelerate remote sensing workflows, with a particular emphasis on distributing Landsat satellite data, even to users at smaller sites whose connectivity would preclude ready access to it.

    “With DLT we now have the tools to put into place a user-owned and operated, production-level distributed content management system for remote sensing data. Using DLT could revolutionize the way remote sensing data is managed and accessed around the world,” said PR Blackwell, director of Columbia Regional Geospatial Services Center at Stephen F. Austin State University and founding member of AmericaView, Inc.

    Visit the Data Logistics Toolkit website to learn more and to download the software.