CCGraMi: An Effective Method for Mining Frequent Subgraphs in a Single Large Graph
In modern applications, large graphs are usually applied in the simulation and analysis of large complex systems such as social networks, computer networks, maps, traffic networks. Therefore, graph mining is also an interesting subject attracting many researchers. Among them, frequent subgraph mining in a single large graph is one of the most important branches of graph mining, it is defined as finding all subgraphs whose occurrences in a dataset are greater than or equal to a given frequency threshold. In which, the GraMi algorithm is considered the state of the art approach and many algorithms have been proposed to improve this algorithm. In 2020, the SoGraMi algorithm was proposed to optimize the GraMi algorithm and presented an outstanding performance in terms of runtime and storage space. In this paper, we propose a new algorithm to improve SoGraMi based on connected components, called CCGraMi (Connected Components GraMi). Our experiments on four real datasets (both directed and undirected) show that the proposed algorithm outperforms SoGraMi in terms of running time as well as memory requirements.
Abdelhamid, E., Abdelaziz, I., Kalnis, P., Khayyat, Z., and Jamour, F. Scalemine: Scalable parallel frequent subgraph mining in a single large graph. In SC’16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (2016), pp. 716–727.
Acosta-Mendoza, N., Gago-Alonso, A., and Medina-Pagola, J. E. Frequent approximate subgraphs as features for graph-based image classification. Knowledge-Based Systems 27 (2012), 381–392.
Amaranatha Reddy, P., and Hazarath Murali Krishna Prasad, M. High utility item-set mining from retail market data stream with various discount strategies using egui-tree. Journal of Ambient Intelligence and Humanized Computing (2021), 1–12.
Ansari, Z. A., Jahiruddin, and Abulaish, M. An efficient subgraph isomorphism solver for large graphs. IEEE Access 9 (2021), 61697–61709.
Basu, K., Zhou, C., Sen, A., and Goliber, V. H. A novel graph analytic approach to monitor terrorist networks. In 2018 IEEE Intl Conf on Parallel Distributed Processing with Applications, Ubiquitous Computing Communications, Big Data Cloud Computing, Social Computing Networking, Sustainable Computing Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom) (2018), pp. 1159–1166.
Daniel, C. B., Saravanan, S., and Mathew, S. Gis based road connectivity evaluation using graph theory. In Transportation Research (Singapore, 2020), T. V. Mathew, G. J. Joshi, N. R. Velaga, and S. Arkatkar, Eds., Springer Singapore, pp. 213–226.
Dhiman, A., and Jain, S. Optimizing frequent subgraph mining for single large graph. Procedia Computer Science 89 (2016), 378–385. Twelfth International Conference on Communication Networks, ICCN 2016, August 19– 21, 2016, Bangalore, India Twelfth International Conference on Data Mining and Warehousing, ICDMW 2016,August 19-21, 2016, Bangalore, India Twelfth International Conference on Image and Signal Processing, ICISP 2016, August 19-21, 2016, Bangalore, India.
Ding, R.-X., Wang, X., Shang, K., and Herrera, F. Social network analysis-based conflict relationship investigation and conflict degreebased consensus reaching process for large scale decision making using sparse representation. Information Fusion 50 (2019), 251–272.
El Islem Karabadji, N., Aridhi, S., and Seridi, H. A closed frequent subgraph mining algorithm in unique edge label graphs. In Machine Learning and Data Mining in Pattern Recognition (Cham, 2016), P. Perner, Ed., Springer International Publishing, pp. 43–57.
Elseidy, M., Abdelhamid, E., Skiadopoulos, S., and Kalnis, P. Grami: Frequent subgraph and pattern mining in a single large graph. Proc. VLDB Endow. 7, 7 (mar 2014), 517–528.
Faroughi, A., Morichetta, A., Vassio, L., Figueiredo, F., Mellia, M., and Javidan, R. Towards website domain name classification using graph based semi-supervised learning. Computer Networks 188 (2021), 107865.
Fiedler, M., and Borgelt, C. Subgraph support in a single large graph. In Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007) (2007), pp. 399–404.
Iqbal, R., Doctor, F., More, B., Mahmud, S., and Yousuf, U. Big data analytics and computational intelligence for cyber–physical systems: Recent trends and state of the art applications. Future Generation Computer Systems 105 (2020), 766–778.
Jia, Y., Zhang, J., and Huan, J. An efficient graph-mining method for complicated and noisy data with real-world applications. Knowledge and Information Systems 28, 2 (2011), 423–447.
Jung, J. J. Constraint graph-based frequent pattern updating from temporal databases. Expert Systems with Applications 39, 3 (2012), 3169– 3173.
Kuramochi, M., and Karypis, G. Finding frequent patterns in a large sparse graph. Data mining and knowledge discovery 11, 3 (2005), 243–271.
Le, N.-T., Vo, B., Nguyen, L. B., Fujita, H., and Le, B. Mining weighted subgraphs in a single large graph. Information Sciences 514 (2020), 149–165.
Mai, S. T., et al. Scalable interactive dynamic graph clustering on multicore cpus. IEEE Transactions on Knowledge and Data Engineering 31, 7 (2019), 1239–1252.
Matousek, R., Dobrovsky, L., and Kudela, J. How to start a heuristic? utilizing lower bounds for solving the quadratic assignment problem. International Journal of Industrial Engineering Computations 13, 2 (2022), 151–164.
Mrzic, A., et al. Grasping frequent subgraph mining for bioinformatics applications. BioData mining 11, 1 (2018), 1–24.
Nabti, C. E. Subgraph Isomorphism Search In Massive Graph Data. PhD thesis, Universit´e de Lyon, 2017.
Nguyen, L. B., Nguyen, L. T., Zelinka, I., Snasel, V., Nguyen, H. S., and Vo, B. A method for closed frequent subgraph mining in a single large graph. IEEE Access (2021), 1–1.
Nguyen, L. B., Vo, B., Le, N.-T., Snasel, V., and Zelinka, I. Fast and scalable algorithms for mining subgraphs in a single large graph. Engineering Applications of Artificial Intelligence 90 (2020), 103539.
Qiao, F., Zhang, X., Li, P., Ding, Z., Jia, S., and Wang, H. A parallel approach for frequent subgraph mining in a single large graph using spark. Applied Sciences 8, 2 (2018).
Ray, A., Holder, L., and Choudhury, S. Frequent subgraph discovery in large attributed streaming graphs. In Proceedings of the 3rd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (New York, New York, USA, 24 Aug 2014),
W. Fan, A. Bifet, Q. Yang, and P. S. Yu, Eds., vol. 36 of Proceedings of Machine Learning Research, PMLR, pp. 166–181.
Schenker, A., Last, M., Bunke, H., and Kandel, A. Classification of web documents using a graph model. In Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings. (2003), pp. 240–244 vol.1.
Shahrivari, S., and Jalili, S. Distributed discovery of frequent subgraphs of a network using mapreduce. Computing 97, 11 (2015), 1101–1120.
Talukder, N., and Zaki, M. J. A distributed approach for graph mining in massive networks. Data Mining and Knowledge Discovery 30, 5 (2016), 1024–1052.
Yan, X., and Han, J. gspan: graph-based substructure pattern mining. In 2002 IEEE International Conference on Data Mining, 2002. Proceedings. (2002), pp. 721–724.
Yu, Y., Zhang, Z., Sun, R., Liu, H., Yuan, S., Jiang, T., Wu, M., Guo, C., Guo, Y., Weng, J., Zheng, X., and Yuan, F. Ai-guided resource allocation and rescue decision system for medical applications. Future Generation Computer Systems 118 (2021), 485–491.
Zeng, J., Yang, L. T., Lin, M., Ning, H., and Ma, J. A survey: Cyber-physical-social systems and their system-level design methodology. Future Generation Computer Systems 105 (2020), 1028–1042.
Zhao, X., Chen, Y., Xiao, C., Ishikawa, Y., and Tang, J. Frequent subgraph mining based on pregel. The Computer Journal 59, 8 (2016), 1113–1128.
Copyright (c) 2021 MENDEL
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
MENDEL open access articles are normally published under a Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/ . Under the CC BY-NC-SA 4.0 license permitted 3rd party reuse is only applicable for non-commercial purposes. Articles posted under the CC BY-NC-SA 4.0 license allow users to share, copy, and redistribute the material in any medium of format, and adapt, remix, transform, and build upon the material for any purpose. Reusing under the CC BY-NC-SA 4.0 license requires that appropriate attribution to the source of the material must be included along with a link to the license, with any changes made to the original material indicated.