The StreamWare Project was launched in 2021 to enable users to develop streaming data-science applications. The StreamWare team acknowledges the generous support of the National Science Foundation through awards NSF-2119816, NSF-2118458, and NSF-2118985, "StreamWare - A Scalable Framework for Accelerating Streaming Data Science," funded as part of the Principles and Practice of Scalable Systems (PPoSS) program.
In grand-challenge scientific applications, the enormous amount of data produced by the sensing and instrumentation infrastructure often loses its value after a small window of time. Thus, to obtain actionable intelligence from the data, streaming analytics, i.e., the ability to analyze in-motion data, is becoming increasingly critical. Moreover, modern computing systems are highly heterogeneous, consisting of processors, accelerators, and large high-bandwidth external memories. Developing scalable streaming analytics applications requires addressing challenges across the full system stack, from application to target platform. To that end, this planning project is identifying a comprehensive set of research challenges, goals, key innovations, and timelines in algorithms and applications, systems software, hardware-software co-design, and computer architecture. It is bringing together a community of application developers and users, computer scientists, and data scientists whose interests lie in building streaming data-science applications targeting a wide variety of scalable systems. The project is also demonstrating preliminary results on how it will achieve significant cross-stack performance improvements, using Privacy Preserving Streaming Graph Learning for Secure Smart Grids as the driving application.
Modern data-science applications are highly decentralized and distributed. They require composition and orchestration between localized analytics on thousands or millions of edge platforms and massive centralized analytics in cloud/data centers, as well as real-time analytics on streaming data. To enable scalable performance of grand-challenge streaming data-science applications, developers need a framework that lets them seamlessly build these applications for a wide variety of scalable systems. This planning project is conducting preliminary research toward a large proposal for developing an open-source framework, StreamWare, that will enable users to develop streaming data-science applications. The project is establishing a community of application developers and users, computer scientists, and data scientists who would serve as early adopters and developers of the StreamWare framework. In consultation with domain experts, a list of key data-science kernels for StreamWare is being generated, and their existing state-of-the-art algorithms and hardware IPs are being evaluated to identify performance limitations and opportunities for improvement. The project is also articulating the requirements of novel abstractions that can represent and operate on streaming data on heterogeneous platforms. It uses Privacy Preserving Streaming Graph Learning for Secure Smart Grids as a motivating application to show preliminary evidence of end-to-end scalability, using a novel notion of symbiotic scalability that captures the impact of StreamWare’s cross-layer optimizations. The expected outcomes of this planning project include a proposal for the research activities to be carried out in the large grant; publications on the results of the survey activities and future research directions for enabling streaming data science; and curricula for future graduate and undergraduate courses.