Subgraph Matching¶
This module provides a class for representing subgraph matching problems.
-
class
InexactMatchingProblem(tmplt, world, fixed_costs=None, local_costs=None, global_costs=None, node_attr_fn=None, edge_attr_fn=None, missing_edge_cost_fn=None, local_cost_threshold=0, global_cost_threshold=0, strict_threshold=False, ground_truth_provided=False, candidate_print_limit=10, cache_path=None, edgewise_costs_cache=None, ignore_edgewise_costs_cache=False, use_monotone=True, match_fixed_costs=False)¶ A class representing any subgraph matching problem, noisy or otherwise.
TODO: describe the class in more detail. TODO: optionally accept ground truth map argument. TODO: Is it okay to describe the tmplt and world attributes using the same descriptions as were used for the corresponding parameters? TODO: Introduce local_cost threshold.
Examples
>>> tmplt = uclasm.load_edgelist(template_filepath) >>> world = uclasm.load_edgelist(world_filepath) >>> smp = uclasm.MatchingProblem(tmplt, world)
Parameters: - tmplt (Graph) – Template graph to be matched.
- world (Graph) – World graph to be searched.
- fixed_costs (2darray, optional) – Cost of assigning a template node to a world node, ignoring structure. One row for each template node, one column for each world node.
- local_costs (2darray, optional) – Initial local costs.
- global_costs (2darray, optional) – Initial global costs.
- node_attr_fn (function) – Function for comparing node attributes. Should take two pd.Series of node attributes and return the cost associated with the difference between them.
- edge_attr_fn (function) – Function for comparing edge attributes. Should take two pd.Series of edge attributes and return the cost associated with the difference between them.
- missing_edge_cost_fn (function) – Function for computing the cost of a missing template edge. Should take a pd.Series of node attributes and return the cost associated with removing that edge.
- local_cost_threshold (int, optional) – A template node cannot be assigned to a world node if it will result in more than this number of its edges missing in an eventual match.
- global_cost_threshold (int, optional) – A subgraph whose cost againt the template exceeds this threshold will not be considered a match. It can also be used to eliminate candidates from the world graph. A cost of 0 corresponds to an exact match for the template, whereas a cost of 1 means that the match may be missing a single edge which is present in the template but not in the world.
- ground_truth_provided (bool, optional) – A flag indicating whether a signal has been injected into the world graph with node identifiers that match those in the template.
- candidate_print_limit (int, optional) – When summarizing the candidates of each template node, limit the list of candidates to this many.
- use_monotone (bool, optional) – Whether to use monotone arrays for the cost. Defaults to true.
-
shape¶ Size of the matching problem: Number of template nodes and world nodes.
Type: (int, int)
-
local_cost_threshold¶ A template node cannot be assigned to a world node if it will result in more than this number of its edges missing.
Type: int, optional
-
global_cost_threshold¶ A subgraph whose cost againt the template exceeds this threshold will not be considered a match. It can also be used to eliminate candidates from the world graph. A cost of 0 corresponds to an exact match for the template, whereas a cost of 1 means that the match may be missing a single edge which is present in the template but not in the world.
Type: int, optional
-
copy(copy_graphs=True)¶ Returns a copy of the MatchingProblem.
-
set_costs(fixed_costs=None, local_costs=None, global_costs=None)¶ Set the cost arrays by force. Override monotonicity.
Parameters: - fixed_costs (2darray, optional) –
- local_costs (2darray, optional) –
- global_costs (2darray, optional) –
-
fixed_costs¶ Fixed costs such as node attribute mismatches.
Cost of assigning a template node to a world node, ignoring structure. One row for each template node, one column for each world node. TODO: Better docstrings.
Type: 2darray
-
local_costs¶ Local costs such as missing edges around each node.
Each entry of this matrix denotes a bound on the local cost of matching the template node corresponding to the row to the world node corresponding to the column. TODO: Better docstrings.
Type: 2darray
-
global_costs¶ Costs of full graph match.
Each entry of this matrix bounds the global cost of matching the template node corresponding to the row to the world node corresponding to the column. TODO: Better docstrings.
Type: 2darray
-
candidates(tmplt_idx=None)¶ Get the matrix of compatibility between template and world nodes.
World node j is considered to be a candidate for a template node i if there exists an assignment from template nodes to world nodes in which i is assigned to j whose cost does not exceed the desired threshold.
This could be a property, but it is not particularly cheap to compute.
Returns: A boolean matrix where each entry indicates whether the world node corresponding to the column is a candidate for the template node corresponding to the row. Return type: 2darray
-
reduce_world()¶ Reduce the size of the world graph.
Check whether there are any world nodes that are not candidates to any tmplt nodes. If so, remove them from the world graph and update the matching problem.
Returns: A boolean array of values corresponding to which nodes were kept. True where nodes were kept and false where they were removed. Return type: np.ndarray(bool)
-
have_candidates_changed()¶ Check whether candidates have changed.
Returns: True if any of the candidates have been eliminated. False otherwise. Return type: bool
-
enforce_matching(matching)¶ Enforce the given matching tuple by setting fixed costs in off-match rows and columns to float(“inf”) :param matching: Iterable of 2-tuples indicating pairs of template-world indexes :type matching: iterable
-
prevent_match(tmplt_idx, world_idx)¶ Prevent matching the template node with the given index to the world node with the given index. :param tmplt_idx: The index of the template node not to be matched. :type tmplt_idx: int :param world_idx: The index of the world node not to be matched. :type world_idx: int
-
MatchingProblem¶ alias of
uclasm.matching.matching_problem.InexactMatchingProblem