The basic data structure of hornet.network is the graph, which is currently implemented with NetworkX's networkx.DiGraph.
A graph has two core components: the Node and the Edge.
A node represents a person or thing in the network. HORNET stores nodes as hornet.network.Node objects, each of which holds the node's identifier along with other information about the node. Two nodes are considered equal if their ids are equal:
Node('Joe')
Class that represents a node in a graph.
>>> Node('abc')
<Node('abc', size=0)>
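The following minimal sketch illustrates the equality rule: equality and hashing depend only on the id, so other data such as `size` is ignored. The class name SketchNode and its `size` field are assumptions modeled on the repr shown above. This is not HORNET's actual Node implementation.

```python
class SketchNode:
    """Hypothetical stand-in for hornet.network.Node: identity is the id alone."""

    def __init__(self, node_id, size=0):
        self.id = node_id
        self.size = size  # extra information; ignored by == and hash()

    def __eq__(self, other):
        # Two nodes are equal if and only if their ids are equal.
        if not isinstance(other, SketchNode):
            return NotImplemented
        return self.id == other.id

    def __hash__(self):
        # Hash on the id so that equal nodes collapse to one dict/set entry.
        return hash(self.id)

    def __repr__(self):
        return f"<SketchNode({self.id!r}, size={self.size})>"


a = SketchNode('Joe', size=3)
b = SketchNode('Joe')
print(a == b)             # True: same id, different size
print(len({a, b}))        # 1
print(SketchNode('abc'))  # <SketchNode('abc', size=0)>
```

Because `__hash__` is derived from the same field as `__eq__`, nodes behave consistently when used as dictionary keys, which is how graph libraries such as NetworkX index nodes.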
An edge connects two nodes. Edges are represented as 3-tuples: the first two elements are the nodes that form the edge, and the third is a hornet.network.EdgeDetail object containing information about the edge:
(Node('Joe'), Node('Jane'), EdgeDetail())
Edges can be either directed or undirected. If the graph is undirected,
Structure that holds details about an edge, such as the similarity and the size. EdgeDetail objects are not hashable, so they cannot be used as dictionary keys or set members.
Attributes:
- similarity
- size The number of items in common between the two nodes.
- joint The joint probability of node A and node B, P(A,B) or P(A and B). Also known as the support of a rule.
- conditional The conditional probability of node B given node A, P(B|A). Also known as the confidence of a rule.
These attributes default to 0. They can optionally be set as keyword arguments to the constructor:
>>> d = EdgeDetail(size=450, joint=0.3, conditional=0.7)
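The following sketch shows the EdgeDetail contract described above: four attributes defaulting to 0, settable by keyword, and an object that deliberately cannot be hashed. The class name SketchEdgeDetail is hypothetical. Its repr, which lists only non-zero attributes, is an assumption based on the `<EdgeDetail(size=15)>` output shown elsewhere on this page.

```python
class SketchEdgeDetail:
    """Hypothetical stand-in for hornet.network.EdgeDetail."""

    __hash__ = None  # unhashable: cannot be a dict key or set member

    def __init__(self, similarity=0, size=0, joint=0, conditional=0):
        # Every attribute defaults to 0 and can be set by keyword.
        self.similarity = similarity
        self.size = size
        self.joint = joint
        self.conditional = conditional

    def __repr__(self):
        # Show only the attributes that differ from the default of 0.
        fields = ("similarity", "size", "joint", "conditional")
        shown = ", ".join(f"{f}={getattr(self, f)}" for f in fields if getattr(self, f))
        return f"<SketchEdgeDetail({shown})>"


d = SketchEdgeDetail(size=450, joint=0.3, conditional=0.7)
edge = ('Joe', 'Jane', d)  # an edge is a 3-tuple: (nodeA, nodeB, detail)
print(edge[2])             # <SketchEdgeDetail(size=450, joint=0.3, conditional=0.7)>
try:
    {d: 1}
except TypeError as err:
    print("unhashable:", err)
```

Setting `__hash__ = None` is the standard Python way to mark a mutable class as unhashable. It prevents an edge's details from being silently used as a key whose value could later change.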
HORNET provides a set of basic primitives for creating and manipulating a graph.
Creates a new edge in the form node1 -> node2 without adding it to a graph. The function is the same for both directed and undirected edges.
Additional keyword arguments are set as attributes of the edge's EdgeDetail.
Creates a new edge of the form node1 -> node2 and adds a reference to it to graph and, always, to nodeA. If the graph is undirected, nodeB also gets a reference to the edge. The values for the edge’s EdgeDetail can be specified as keyword arguments (kw).
Returns the newly created edge.
>>> # create a graph
>>> g = create_directed_graph()
>>> # add an edge to the graph
>>> e = add_edge(g, Node('x'), Node('y'), size=15)
>>> g.edges(data=True)
[(<Node('x', size=0)>, <Node('y', size=0)>, <EdgeDetail(size=15)>)]
>>> # copy the graph
>>> h = copy_graph(g)
>>> id(h) != id(g) # they are different objects
True
>>> # remove the edge from the graph
>>> remove_edge(g, e)
>>> g.edges()
[]
>>> # if we just want an edge without adding it to a graph
>>> create_edge('x', 'y', size=4)
('x', 'y', <EdgeDetail(size=4)>)
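The following plain-Python sketch approximates what each primitive in the session above does to the data. The graph is simplified to a dictionary that maps (nodeA, nodeB) to an EdgeDetail-like object, standing in for the networkx.DiGraph that HORNET actually uses. The per-node edge references described above are omitted. The function names create_edge_sketch, add_edge_sketch, remove_edge_sketch and copy_graph_sketch are hypothetical. Using a deep copy for the graph copy is also an assumption.

```python
import copy
from types import SimpleNamespace


def create_edge_sketch(node_a, node_b, **kw):
    """Build a (node_a, node_b, detail) 3-tuple without touching any graph."""
    detail = SimpleNamespace(similarity=0, size=0, joint=0, conditional=0)
    for key, value in kw.items():
        setattr(detail, key, value)  # keyword arguments become detail attributes
    return (node_a, node_b, detail)


def add_edge_sketch(graph, node_a, node_b, **kw):
    """Create node_a -> node_b, store it in graph, and return the new edge."""
    edge = create_edge_sketch(node_a, node_b, **kw)
    graph[(node_a, node_b)] = edge[2]
    return edge


def remove_edge_sketch(graph, edge):
    """Delete the edge's (node_a, node_b) entry from graph."""
    del graph[(edge[0], edge[1])]


def copy_graph_sketch(graph):
    """Return an independent deep copy (an assumption about copy semantics)."""
    return copy.deepcopy(graph)


g = {}
e = add_edge_sketch(g, 'x', 'y', size=15)
h = copy_graph_sketch(g)
remove_edge_sketch(g, e)
print(list(g))  # []
print(list(h))  # [('x', 'y')]
```

Because h is an independent copy, removing the edge from g leaves h unchanged, which mirrors the `id(h) != id(g)` check in the session above.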
Filtering is the process of removing edges from a graph. The prune function performs the filtering. It takes a filtering function, which is applied to each edge in the graph. If the filtering function determines that an edge should be kept, it returns True and prune leaves the edge in place; otherwise prune removes it.
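The following sketch shows prune-style filtering. It assumes edges are (nodeA, nodeB, detail) tuples held in a plain list rather than a graph. prune_sketch and the detail helper are hypothetical names, and this sketch does not reproduce HORNET's prune signature.

```python
from types import SimpleNamespace


def prune_sketch(edges, keep):
    """Remove, in place, every edge for which keep(edge) returns a false value."""
    edges[:] = [edge for edge in edges if keep(edge)]


def detail(**kw):
    # Minimal stand-in for EdgeDetail: unspecified attributes default to 0.
    base = dict(similarity=0, size=0, joint=0, conditional=0)
    base.update(kw)
    return SimpleNamespace(**base)


edges = [
    ('a', 'b', detail(conditional=0.9)),
    ('a', 'c', detail(conditional=0.2)),
    ('b', 'c', detail(conditional=0.6)),
]

# Keep only edges whose confidence P(B|A) is at least 0.5.
prune_sketch(edges, lambda edge: edge[2].conditional >= 0.5)
print([(a, b) for a, b, _ in edges])  # [('a', 'b'), ('b', 'c')]
```

Using slice assignment (`edges[:] = ...`) modifies the caller's list in place, which matches the idea that pruning removes edges from an existing structure rather than returning a new one.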
Sorting a set of edges is useful for determining which edges have the highest conditional or joint probabilities, or the highest similarities. Sorting is also useful when creating a random version of the graph.
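For example, ranking edges by confidence or support comes down to sorting the edge tuples on one attribute of their detail object. The following sketch assumes edges are (nodeA, nodeB, detail) tuples. sort_edges_sketch is a hypothetical helper, not a HORNET function.

```python
from types import SimpleNamespace


def sort_edges_sketch(edges, attribute, descending=True):
    """Return a new list of edges ordered by one EdgeDetail attribute."""
    return sorted(edges, key=lambda edge: getattr(edge[2], attribute), reverse=descending)


edges = [
    ('a', 'b', SimpleNamespace(joint=0.10, conditional=0.40)),
    ('a', 'c', SimpleNamespace(joint=0.30, conditional=0.75)),
    ('b', 'c', SimpleNamespace(joint=0.05, conditional=0.90)),
]

by_conf = sort_edges_sketch(edges, 'conditional')
print([(a, b) for a, b, _ in by_conf])   # [('b', 'c'), ('a', 'c'), ('a', 'b')]
by_joint = sort_edges_sketch(edges, 'joint')
print([(a, b) for a, b, _ in by_joint])  # [('a', 'c'), ('a', 'b'), ('b', 'c')]
```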
Generates association rules for the graph. Assumes a directed graph and is ‘destructive’ in that it assigns a confidence and a support to each edge in the graph.
- denominator_fn is the function used to normalize the edge sizes. By default, hornet.network.attribute_count() is used, but hornet.network.transaction_count() is also valid.
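The following sketch shows the rule computation under the standard association-rule definitions. Support is P(A,B) = size(A,B) / N, where N is the value produced by the denominator function. Confidence is P(B|A) = size(A,B) / size(A), with size(A) taken as the node's item count. Several details here are hypothetical: the name generate_rules_sketch, the node_size mapping, and the assumption that denominator_fn is a callable receiving the edge list. HORNET's attribute_count() and transaction_count() are not reproduced.

```python
from types import SimpleNamespace


def generate_rules_sketch(edges, node_size, denominator_fn):
    """Assign joint (support) and conditional (confidence) to every edge in place.

    edges: list of (nodeA, nodeB, detail) tuples from a directed graph.
    node_size: maps each node to its item count, size(A).
    denominator_fn: hypothetical callable returning N, the normalizing total.
    """
    total = denominator_fn(edges)
    for node_a, node_b, detail in edges:
        detail.joint = detail.size / total                    # P(A,B), the support
        detail.conditional = detail.size / node_size[node_a]  # P(B|A), the confidence


edges = [('bread', 'butter', SimpleNamespace(size=30, joint=0, conditional=0))]
sizes = {'bread': 60, 'butter': 40}
generate_rules_sketch(edges, sizes, lambda _edges: 100)
print(edges[0][2].joint, edges[0][2].conditional)  # 0.3 0.5
```

The in-place assignment to each detail object reflects the ‘destructive’ behavior described above: the existing edges are annotated rather than copied.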