Design and implementation of a real-time interactive analytics system for large spatio-temporal data

Shiming Zhang, Yin Yang, Wei Fan, Marianne Winslett

Research output: Contribution to journal › Conference article › peer-review

7 Citations (Scopus)

Abstract

In real-time interactive data analytics, the user expects to receive the results of each query within a short time period such as seconds. This is especially challenging when the data is big (e.g., on the scale of petabytes), and the analytics system runs on top of cloud infrastructure (e.g., thousands of interconnected commodity servers). We have been building such a system, called OceanRT, for managing large spatio-temporal data such as call logs and mobile web browsing records collected by a telecommunication company. Although there already exist systems for querying big data in real time, OceanRT's performance stands out due to several novel designs and components that address key efficiency and scalability issues that were largely overlooked in existing systems. First, OceanRT makes extensive use of software RDMA one-sided operations, which reduce networking costs without requiring specialized hardware. Second, OceanRT exploits the parallel computing capabilities of each node in the cloud through a novel architecture consisting of Access-Query Engines (AQEs) connected with minimal overhead. Third, OceanRT contains a novel storage scheme that optimizes for queries with joins and multi-dimensional selections, which are common for large spatio-temporal data. Experiments using the TPC-DS benchmark show that OceanRT is usually more than an order of magnitude faster than the current state-of-the-art systems.

Original language: English
Pages (from-to): 1754-1759
Number of pages: 6
Journal: Proceedings of the VLDB Endowment
Volume: 7
Issue number: 13
Publication status: Published - 2014
Externally published: Yes
Event: Proceedings of the 40th International Conference on Very Large Data Bases, VLDB 2014 - Hangzhou, China
Duration: 1 Sept 2014 to 5 Sept 2014
