Connecting the dots: Triangle completion and related problems on large data sets using GPUs

Studying the properties of Online Social Networks (OSNs) and other real world graphs have gained importance due to the large amount of information available from them. These large graphs contain data that can be analyzed and effectively used in advertising, security and improving the overall experience of the users of these networks. However, the analysis of these graphs for studying specific properties requires combinatorially explosive number of computations. Compute Unified Device Architecture (CUDA) is a programming model available from Nvidia for solving general-purpose problems using the massively parallel and highly multi-threaded Graphics Processing Units (GPUs). Therefore, using GPUs to solve these types of problems is appropriate. In addition, due to the properties of real-world data, the graphs being considered are sparse and have irregular data dependencies. Hence, using efficient techniques to store the graph data for initial preprocessing and final computation by taking advantage of heterogeneous CPU-GPU systems can address these issues. In this paper, we are interested in studying different properties of these real-world entities that transform into the following graph problems: a) identifying a missing edge, which when added would result in maximum increase in the number of triangles, b) identifying an existing edge whose removal would result in the maximum decrease in the number of triangles, c) identifying an existing edge whose removal would increase the number of connected components in the graph. In this paper, we develop and implement algorithms to solve the above problems using both CPU and GPU. Specifically, given a graph G = (V, E), we provide algorithms for the following: a) find (vi, Vj) ∉ E, such that Δf – Δc is maximized, where Δf and Δc are the number of triangles in Gm = (V, E ∪(vi, Vj)) and G, respectively, b) find a (vi, Vj) ϵ E, such that Δc – Δf is maximized, where Δf and Δc are the number of triangles in Gm = (V, E (vi, Vj)) and G respectively, c) find a (vi, Vj) ϵ E, such that Φc > Φc, where Φc and Φc are the number of connected components in Gm = (V, E (vi, Vj)) and G respectively. We implement the algorithms using a GPU and achieve a 10 × speedup as compared to a sequential implementation. Thereafter, we design a heuristic for finding an edge whose existence would result in the maximum increase in the number of triangles. The heuristic is implemented and the results are reported and compared to those of the regular algorithm on the GPU.

Published in  IEEE International Conference on Big Data 2014, Washington DC, 2014.

 

 

Posted in Uncategorized.