Formal Language Theory Meets Graph Theory

Salvador Aguinaga, Rodrigo Palacios, David Chiang, and Tim Weninger. Growing Graphs with Hyperedge Replacement Graph Grammars. International Conference on Information and Knowledge Management (CIKM), Indianapolis, IN, October 2016.

ACM Portal | Code

A CFG For Graphs

Discovering the underlying structures present in large real world graphs is a fundamental scientific problem. In this paper we show that a graph’s clique tree can be used to extract a hyperedge replacement grammar (HRG). If we store an ordering from the extraction process, the extracted graph grammar is guaranteed to generate an isomorphic copy of the original graph. Or, a stochastic application of the graph grammar rules can be used to quickly create random graphs. In experiments on large real world networks, we show that random graphs, generated from extracted graph grammars, exhibit a wide range of properties that are very similar to the original graphs. In addition to graph properties like degree or eigenvector centrality, what a graph "looks like" ultimately depends on small details in local graph substructures that are difficult to define at a global level. We show that our generative graph model is able to preserve these local substructures when generating new graphs and performs well on new and difficult tests of model robustness.

Consider the following example where we take a graph or its hypergraph equivalent, our approach transforms the original graph into its clique tree. Using the properties of clique trees we expand and focus on sepsets and derive a set of production rules.

Fact Checking Example Fact Checking Architecture

Using these rules we can grow exact and approximate graphs. Exact graph generation produces isomophic graphs. On the other hand, using stochastic graph generation we end up with approximate graphs that have many of the global properties observed in the original graph. Evaluating the performance of HRG against leading graph generators, like Kronerker Product, Exponential Random, and Chung-Lu graph models HRG outperforms these others. Especially when we compare the original graph to sets of generated graphs from the various generators using the connonical method of graph comparison: Graphlet Correlation Distance.

Growing Graphs Graph Comparison with GCD

Here we grow graphs by applying the rules in the order that they were derived. If the rules are instead applied stochastically, we generate approximate graphs. Computing the network properties like node degree, hop-plot, eigenvector centrality, graphlet correlation distance (GCD), and others allows us to compare the performance of the generator.

Infinity Mirror

Salvador Aguinaga and Tim Weninger. The Infinity Mirror Test for Analyzing the Robustness of Graph Generators. ACM International Conference on Knowledge Discovery in Data Mining (SIGKDD) Workshop on Mining and Learning with Graphs, San Francisco, CA, August 2016.

Paper | Poster | Code

Testing Robustness of Graph Generators

When we study complex systems such as communications networks, or networks of the brain we derive a graphical representation to work with. Often times during the validation of a new method—we are interested in similar networks (network that exhibit similar network properties). To address this, we can pick one of several well studied graph generators that learn a model given graph, for example: Kronecker Product or Chung-Lu. Using the learned model parameters we can generate synthetic graphs that on the surface have many of the properties observed in the original graph. In some cases we are interested in similar graphs that are an order or magnitude larger in size. But how well do these models capture the intricacies of the interactions to be able to scale. How do we shed light on the biases and assumptions that are at the core of these models? To do this, we use a simple method we call infinity mirror which takes the newly created synthetic graph and we feed it back to the graph generator to see how well it can relearn a set of model parameters. At each step, a new synthetic graph is generated, we compute a battery of network properties and compare these to the signature of the original graph. This way we can see what network properties are consistently preserved and which are not.

Infinity Mirror Test Summary of Infinity Mirror

Human Navigation in Information Networks

Salvador Aguinaga and Tim Weninger. The Infinity Mirror Test for Analyzing the Robustness of Graph Generators. ACM International Conference on Knowledge Discovery in Data Mining (SIGKDD) Workshop on Mining and Learning with Graphs, San Francisco, CA, August 2016.

IEEE Explore

Concept Hierarchies

In order to efficiently reason about knowledge and information, humans have evolved efficient strategies for organizing complex concepts in order to form connections between and recall information. This behavior can be observed and codified when people search for objects within digital information networks. Current models of search behavior exhibit unnecessary or extraneous complexity. Minimal or simple modifications to well established algorithms yield valid models of human navigation by exploring hierarchical information inherent in networks. We explore and validate a new model of how humans navigate an information networks. To that end, we present a new path finding algorithm that approximates human navigation by leveraging the categorical classification of the nodes within the network. We compare our new model, CatPath, to existing graph distance measures when possible and show that the category paths are largely correlated with traces of human navigation.

CatPath: Hops to Page of Interest Results of CatPath