-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Finding the largest common monomorphic subgraph #34
Comments
@bheijden this is a super neat use of grandiso, and I imagine it might be very helpful indeed, though it may require some modifications. The first thing that comes to mind is that For example, if you have a graph
One way around this is to "seed" your search with the Would be happy to discuss more if you think this looks doable! |
Thanks for your detailed answer! First off, I wanted to double check that the steps described in algorithm 1 of the DotMotif paper match the steps taken in grandiso. Also, what would you say are the main differences with Below, I describe my current approach. I am interested to hear your thoughts (and perhaps optimizations I could take). I created a new function called
To address your starting point problem for a graph A⟷B⟷C⟷A and motif A⟷B⟷C, I now add all possible starting point to the candidates queue in the very first evaluation (in With this approach, I noticed that the queuing order does not results in a depth-first search anymore, because all starting nodes are added as candidates in the queue, and new candidates are enqueued after them. Ideally, I want to do a depth first search, so that I find the largest candidate as fast as possible (as I may bump into Another inefficiency of the implementation above is that two different matches may start from different nodes, but may quickly lead to proposing the same candidates. I could built in a routine that checks whether a candidate has already been checked before, but this may, in itself be costly. Is this something that can be implemented efficiently with the I also found that |
Tried to address all your comments inline —
That's right, although the grandiso codebase has diverged slightly in favor of performance optimizations that are not reflected in the DotMotif paper. The main examples of this are different queue strategies (which it looks like you're familiar with now!) and some memoization, which makes a huge difference when it can be used.
That sounds exactly right, as long as "largest motif match" means "largest exact subgraph match". My strategy would be—
—ah yes; exactly that!
Frankly, I'm not 100% sure, but I am relatively convinced that this should work as you expect, with one exception: I "short-circuit" the search and quit early when I see a structural mismatch (similar to the structural matching done by VF and others). Structural mismatch here just means that the degree of the host vertex is smaller than the degree of the motif vertex (though my recollection is that there were clever additions to this in a VF-derived implementation. Oh well — to wit, I only implement the degree check for now!). This is done by this function: grandiso-networkx/grandiso/__init__.py Lines 61 to 63 in f9be813
...which you can override in the top-level call to the
I would love this!! The I've written the library so that you don't need to edit the grandiso core to bring your own queue; you should be able to do something like from grandiso import find_motifs_iter
class MyCleverQueue(...):
def put(...)
def get(...)
def empty(...)
find_motifs_iter(
...,
queue=MyCleverQueue()
) To be clear: I'd love to merge in some smarter queue logic into the library; and ALSO, you might be able to get away with not having to dig into grandiso guts to make it work! I am also totally happy to redo some of the core code to accommodate a more advanced replacement for the current queues if you think it's necessary, though I'm sensitive to keeping the current exact-match performance as close to (or better than) its current perf as possible when making the change. And the
I would do exactly this —
Ah yes — this is another one of my long-standing gotchas! The main reason this doesn't work out of the box (which is kinda deliberate, but lazy) is that there are some pretty clear performance benefits to searching many times on smaller motifs than once on a big (disconnected) motif; AND there's a HUGE result-count explosion if you allow disconnected motifs (because you count ALL mappings of the second submotif for every valid mapping of the first submotif). Three disconnected motif pieces yields O(count³) results, and so on. Huge numbers! And tons of wasted compute. So my proposal would be to either count for each component individually, or figure out a way to "conditionally" connect them so that grandiso can walk across the complete motif for the purposes of search... OR maybe there's a third option that is smarter that I'm not thinking of? |
Thank you for the long and detailed response! Greatly appreciated :). I'll follow that with an apology for my late reply (I was on Holiday the last two weeks)! Below my comments inlined:
Smart! I hadn't thought of this yet. Even though a node may itself not lead to a perfect match (hence the short circuit approach if you are looking for perfect matches), it may actually be part of the largest common subgraph! Thank you for this pointer! I'm going to test this. I only hope that it won't blow up the number of candidates that are to be evaluated.
Smart queuing is an optimization rabbit hole with so many possibilities, and it may be hard to find a smart queque strategy that generally works for any graph. One could sort the queue based on interestingness, but that operation in itself may be a computationally expensive operation that can ultimately negate any performance gained by the sort. Simple queue strategies where you can, for example, add in the front or back are computationally cheap and may be a simple way to favor promising over unlikely candidates.
From what I understand, is that the
Perhaps not unpacking the candidates, but providing it as a single argument
Hmm, but to find the largest subgraph match, you have to consider all these different disconnected motifs and there combinations thereof simultaneously right? If you consider them separately, you may use the same node in the host graph in the mappings of two disconnected graphs in the motif. I do see your problem of the count explosion though... In my specific case, the host graph is always connected (single graph), a complete multi-partite graph, and directed acyclical. Perhaps there is some optimizations to be done by considering these characteristics of the host? Also, I was wondering what your take is on using maximum clique finding algorithms for finding the largest subgraph isomorphisms (e.g. here)? Are (modifications of) these algorithms useful in finding monomorphic subgraphs as well ? I can find quite some literature on largest subgraph isomorphism, but much less on (the less strict?) monomorphic variant. And finally, grandiso seems to take a different approach, why? Btw, I'm probably ab/misusing all the network theoretic definitions. I'll apologize for that upfront (not my field :P). |
Hi,
I am looking for an algorithm that can find the largest common monomorphic subgraph between two graphs. Specifically, I have two graphs G1 and G2, and I would like to identify the largest subgraph (may be non-unique) of G2 that is monomorphic to a subgraph of G1.
I am wondering whether
find_motifs_iter
can be useful in an iterative method to find an approximation of the largest subgraph of G2 that is a monomorphic to a subgraph of G1. I suppose that somewhere in this method, I may be able to extract the largest considered candidate subgraph that failed to be extended to a full graph monomorphism.Any ideas on where to start with this?
Thanks!
The text was updated successfully, but these errors were encountered: