|
Co-authorship network with ground truth (for overlapping community detection)
We construct co-authorship networks from DBLP and Microsoft Academic Graph (MAG). Please see our paper for the citations to these two datasets. For DBLP, each community is a group of conferences; for MAG, each community is denoted by a “field of study” (FOS) tag. Each author’s ground truth community distribution (\(\mathbf{\theta}\) vector) is constructed by normalizing the number of papers the author has published in the conferences of each subfield (or the number of papers that carry each FOS tag). We also construct a bipartite version of the DBLP networks, where each node is either an author or a paper, and the edges are between authors and papers. Please read our paper for details.

Community Structure for Different Networks
DBLP1 has 6 communities:
- Machine Learning: NIPS, ICML, AISTATS, UAI
- Theoretical Computer Science: STOC, FOCS, SODA, COLT, ITCS, RANDOM, ICALP, ISAAC
- Data Mining: KDD, ICDM, CIKM, SDM, WSDM, RecSys
- Computer Vision: CVPR, ICCV, ECCV, ICIP
- Artificial Intelligence: AAAI, IJCAI
- Natural Language Processing: ACL, NAACL, EMNLP, CONLL, COLING, EACL, SIGIR
DBLP2 has 3 communities:
- Networking and Communications: INFOCOM, GLOBECOM, ICC
- Systems: OSDI, SOSP, NSDI, SIGCOMM, MOBICOM, MOBISYS, CONEXT, ATC
- Information Theory: ISIT, ITA, SIGMETRICS, MOBIHOC
DBLP3 has 3 communities:
- Databases: VLDB, SIGMOD, PODS, CIKM, ICDE
- Data Mining: KDD, ICDM, SDM, SIGIR
- World Wide Web: WWW, WSDM, WINE, ICWSM
DBLP4 has 3 communities:
- Programming Languages: PLDI, POPL, OOPSLA, ICLP, ESOP, ICFP
- Software Engineering: FSE, ICSE, ASE/KBSE
- Formal Methods: CAV, FM, SAS, FMSD, IFM, ICFEM, FORTE, CADE, TABLEAUX, LPAR
DBLP5 has 4 communities:
- Computer Architecture: ASPLOS, ISCA, MICRO, HPCA
- Computer Hardware: FPGA, CHES, ICCD, ISLPED, ASAP, ISPD
- Real-time and Embedded Systems: RTSS, RTAS, ECRTS, MODELS, LCTRTS, CASES, EMSOFT, SCOPES
- Computer-aided Design: DAC, ICCAD, DATE, ASPDAC
MAG1 has 3 communities:
- Computational Biology and Bioinformatics
- Organic Chemistry
- Genetics
MAG2 has 3 communities:
- Machine Learning
- Artificial Intelligence
- Mathematical Optimization

Data Format
For each network, there are two txt files:
Adjacency Matrix \(\mathbf{A}\in\mathbb{R}^{n\times n}\):
Community Ground Truth \(\mathbf{\Theta}\in\mathbb{R}^{n\times K}\):
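
To make the construction of the ground truth concrete, here is a minimal sketch of the normalization step described earlier: each row of \(\mathbf{\Theta}\) is an author's per-community paper counts divided by that author's total. The function name `theta_from_counts`, the toy counts, and the handling of authors with no papers are assumptions made for illustration; they are not taken from the released data or code.

```python
def theta_from_counts(counts):
    """Row-normalize an n x K table of per-community paper counts.

    counts[i][k] is the number of papers author i has in community k.
    Returns a list of rows; each row with a positive total sums to 1.
    """
    theta = []
    for row in counts:
        total = sum(row)
        # Assumption of this sketch: an author with no papers in any
        # community gets an all-zero row instead of a division by zero.
        theta.append([c / total if total > 0 else 0.0 for c in row])
    return theta


# Hypothetical toy example: 3 authors, K = 3 communities.
counts = [
    [4, 0, 0],  # publishes in a single community
    [1, 1, 2],  # mixed membership across all three
    [0, 3, 1],  # mixed membership across two
]
theta = theta_from_counts(counts)
for row in theta:
    print(row)
```

An author who publishes in only one community gets a row with a single entry equal to 1, while an author spread across several communities gets a row with several nonzero entries, which is what makes these networks suitable for overlapping community detection.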

Download
The data can be downloaded from here. The bipartite version of the DBLP networks can be downloaded from here. Separate files:
Code
Citation
Xueyu Mao, Purnamrita Sarkar, and Deepayan Chakrabarti, “On Mixed Memberships and Symmetric Nonnegative Matrix Factorizations”, in Proceedings of the 34th International Conference on Machine Learning, PMLR 70:2324-2333, 2017. [BibTeX]
Xueyu Mao, Purnamrita Sarkar, and Deepayan Chakrabarti, “Estimating Mixed Memberships with Sharp Eigenvector Deviations”, Journal of the American Statistical Association, DOI: 10.1080/01621459.2020.1751645 [BibTeX] |