On Compressing Massive Streaming Graphs with Quadtrees

Abstract—Online social networks  constantly change as new ‘members’ join, existing ‘members’ leave, and ‘followers’ or ‘friends’ are formed and disappear. Dynamic properties of such graphs can be represented by the streaming graph model.  Given a massive graph data stream wherein the number of nodes is in the order of millions and the number of edges is in the tens of millions, we develop an efficient algorithm to compress this graph without having to read in the entire graph into the main memory. Our algorithm uses the quad-tree data structure that is implicitly constructed to produce the compressed graph output.  As a result of this implicit construction, our algorithm allows for node and edge additions/deletions directly into the output compressed graph. Furthermore, we propose new algorithms to solve various edge and node queries that directly operate on the compressed graph. We have performed extensive empirical evaluations of our algorithms using publicly available large social networks such as Facebook, LiveJournal, Twitter, and others. Our empirical evaluation is based on several parameters including time to perform compression, memory required by the compression algorithm, size of compressed graph, and time and memory required to execute queries. We have also presented extensions to the compression algorithm that we have developed. Our primary contributions are a) the novelty of using quad-trees for massive
streaming graph compression on social networks while achieving efficient space-time tradeoffs and b) algorithms for querying that graph that directly operates on the compressed graph.

This research work is published in  IEEE Big Data – Workshop on Mining Big Data in Social Networks,  Santa Clara, October, 2015.

Posted in Uncategorized.