A path query tool for program analysis.
Given a labelled directed graph and a query expression, LDGGrep computes the minimal graph representing all matching paths.
With experience, LDGGrep can be used to quickly reduce large, unwieldy graphs to minimal, problem-focused graphs for further analysis and visualization.
The Ghidra extension includes graph models built from program data, a query dialog with basic documentation, and a graph viewer.
-
RefGrep
uses the graph of Ghidra references. Nodes are addresses and edges are references like function calls and data indirections. (example queries) -
RefGrepExt
demonstrates extendingRefGrep
with extra predicates. (example queries) -
RefGrepWithDataStarts
isRefGrep
except all reference sources, and not just functions, are added to the set of starting nodes. That can be a large set, so queries should start with a selective first predicate. (example queries) -
RefGrepWithTypes
isRefGrep
with extra references to Structure fields. Note, the first time a function is visited, it's decompiled - subsequent visits will be faster. (example queries) -
BlockGrep
extendsRefGrep
with basic blocks for control flow graph queries. (example queries) -
BaseGhidraGrep
is the abstractGhidraScript
that all of the above inherit from. It provides a query dialog with history and all the wiring to the engine, just add a model.
Install the extension and restart Ghidra. (alternatively, add the bundle and ghidra_scripts directory via the Bundle Manager)
From the script manager, select the LDGGrep category.
Running each script displays a dialog with a query textbox and a help window. Clicking on a query expression in the help window populates the query box.
Clicking the "graph" button or pressing enter will submit the graph query. If parsing fails, an error message will show in Ghidra's console. On success, either no match is found and "no match" is written to Ghidra's console, or a graph window will open containing the minimized graph of matching paths.
In the graph window, clicking on a node or edge jumps to the corresponding location in the open program. Selecting a set of nodes will highlight the corresponding locations in the code browser.
Clicking the "mem" button in the query dialog will perform the same query, but
the stored nodes are presented in a table. If the sto
predicate doesn't
appear in the query expression, the table will be empty.
mvn package -Dghidra.version=9.2.3
ls -l ./ghidra/extension/target/ghidra_9.2.3_*_ldggrep.zip
(Maven calls the Bash script ghidra/extension/build.sh
, so for now building
the extension depends on Bash)
Tools that use LDGGrep and support classes for the development of new ones.
mvn package
ls -l ./tools/target/appassembler/bin
Disassemble jars and classes to generate a graph of references, then query it from a repl.
e.g. to find all call paths from LDGGrep's primary match method to the
dk.brics
API:
./tools/target/appassembler/bin/javagrep ./ldggrep/target/ldggrep-*.jar
> </LDGMatcher::match/> (callx </jpleasu/>)* callx </dk\.brics/>
A graph file grep built from the jgrapht-io parsers.
./tools/target/appassembler/bin/gfgrep ./tools/src/test/resources/test.dot
> /x/ </b/>
restgrep
is a web service for querying graphs and
restclient.py
is a sample client.
# start the server w/ "showmatch" so that every matched graph shows in a window
./tools/target/appassembler/bin/restgrep -showmatch
# install client dependencies
pip3 install pydot networkx requests
# send a graph and do some queries
./tools/src/main/python/restclient.py
To change the listening port, use the -port #
option to restgrep
, and
change the URL in restclient.py
.
If the subject of "regular expressions" provokes you, LDGGrep is not for you.
Query expressions in LDGGrep are like regular expressions, except instead of matching a string composed of a sequence of a characters, we're matching a walk in a directed graph composed of a sequence of nodes and edges.
LDGGrep uses .
, |
, +
, ?
, *
, ()
, and {#,#}
in the exact same way
as regular expressions, but to go from characters to objects we need some more
syntax.
First, recall that in regular expressions (most) characters are matched with an
identical literal - so a
matches a
. For more power matching a character, we
have character
classes,
e.g. [:alpha:]
matches a
but not 1
.
With just a tiny bit of imagination, we might allow any
predicate to
take the place of [:alpha:]
. With a stateless predicate, the semantics of
regular expressions are unchanged. We could even replace each literal, like
b
, with a predicate, like [:is_b:]
. It's predicates all the way down!
In LDGGrep, this idea of using predicates lets us generalize from characters to objects. (note: LDGGrep uses square brackets entirely differently, only the idea is shared!)
As noted above, .
does the same thing in LDGGrep as in regular expressions -
it matches anything. E.g. the LDGGrep query .{2,4}
matches all walks with
length from 2 to 4.
We often want to match objects by their name, so there are two kinds of predicates to match strings. The model of an graph provides the conversion function to make strings of objects.
String literal predicates are enclosed in double quotes and regex predicates
are enclosed in forward slashes. E.g. the LDGGrep query "a" /b/
matches
walks of length 2 whose first edge is named a
and second edge has a name that
contains b
.
For direct access to objects in a predicate, there are Iverson
bracketed JavaScript expressions. The object to be tested is named x
in the
expression, and the expession is executed in the with(x)
scope for more
succinct access to its members.
Iverson bracket predicates are enclosed in square brackets. E.g the LDGGrep
query [x.fieldA]
, or equivalently [fieldA]
, matches edges whose "fieldA"
member is (convertable to) true.
Finally, the model of a graph can provide "bareword" predicate methods, annotated methods of the model that take an object and return true or false. They're referred to as "bareword" because they're distinguished from the other predicates by not being enclosed in anything in particular.
In LDGGrep, node predicates are different from edge predicates because they do not advance in the graph during matching, they only filter.
Node predicates are enclosed in angle brackets. What's inside the angle
brackets is a predicate as in the previous section. For example, the LDGGrep
query <"x"> "b"
matches all walks of length one that start on a node with
name "x" and continue to an edge with name "b".
expr ::= alt (";" alt)*
alt ::= cat ("|" cat)*
cat ::= atom atom*
atom ::= "(" expr ")" | rep
rep ::= pred ("{" NUMBER?, NUMBER? "}" | "*" | "+" )?
pred ::= node_pred | edge_pred
edge_pred ::= pred_expr
node_pred ::= "<" pred_expr ">"
pred_expr ::= "!" pred_expr | DOUBLE_QUOTE STRING DOUBLE_QUOTE | "/" REGEX "/" | IDENTIFIER | "[" JAVASCRIPT "]"
see ldgpat.jj for more detail.
- ldggrep-1.1
- added start generators
- added Ghidra script RefGrepWithTypes
- ldggrep-1.0
- initial releasa