PyGhidra, originally developed by the Department of Defense Cyber Crime Center (DC3) under the name "Pyhidra", is a Python library that provides direct access to the Ghidra API within a native CPython 3 interpreter using JPype. It includes conveniences for setting up analysis on a given sample and for running a Ghidra script locally. It also includes a Ghidra plugin that allows CPython 3 to be used from the Ghidra GUI.
Ghidra provides an out-of-the-box integration with the PyGhidra Python library, which makes installation and usage fairly straightforward. This integration enables the Ghidra GUI and headless Ghidra to run GhidraScripts written in native CPython 3. It also lets you interact with the Ghidra GUI through a built-in REPL. To launch Ghidra in PyGhidra mode, see Ghidra's latest Installation Guide.
It is also possible (and encouraged!) to use PyGhidra as a standalone Python library in reverse engineering workflows where Ghidra may be one of many components involved. The remainder of this document focuses on this type of usage.
To install the PyGhidra Python library:
- Download and install Ghidra 11.3 or later to a desired location.
- Set the GHIDRA_INSTALL_DIR environment variable to point to the directory where Ghidra is installed.
- Install PyGhidra:
  - Online: pip install pyghidra
  - Offline: python3 -m pip install --no-index -f <GhidraInstallDir>/Ghidra/Features/PyGhidra/pypkg/dist pyghidra
Optionally, you can also install the Ghidra type stubs to improve your development experience (assuming your Python editor supports it). The type stubs module is specific to each version of Ghidra:
- Online: pip install ghidra-stubs==<version>
- Offline: python3 -m pip install --no-index -f <GhidraInstallDir>/docs/ghidra_stubs ghidra-stubs
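Before importing PyGhidra in a standalone workflow, it can help to confirm that GHIDRA_INSTALL_DIR is set and points at an existing directory. The following is a minimal sketch using only the standard library. The helper name resolve_ghidra_install_dir is hypothetical and not part of PyGhidra, and the check does not validate the contents of the Ghidra installation.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_ghidra_install_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory named by GHIDRA_INSTALL_DIR, or raise if unusable.

    Hypothetical helper for illustration; not part of the PyGhidra API.
    """
    # Fall back to the real process environment when no mapping is supplied.
    env = os.environ if env is None else env
    value = env.get("GHIDRA_INSTALL_DIR")
    if not value:
        raise RuntimeError("GHIDRA_INSTALL_DIR is not set")
    install_dir = Path(value)
    if not install_dir.is_dir():
        raise RuntimeError(f"GHIDRA_INSTALL_DIR is not a directory: {install_dir}")
    return install_dir
```

Calling resolve_ghidra_install_dir() with no arguments reads the process environment; passing a mapping makes the check easy to exercise in isolation.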
The current version of PyGhidra inherits an API from the original "Pyhidra" project that provides an excellent starting point for interacting with a Ghidra installation. NOTE: These functions are subject to change in the future as more feedback is collected on PyGhidra's role in the greater Ghidra ecosystem.
To get a raw connection to Ghidra, use the start() function. This will set up a JPype connection and initialize Ghidra in headless mode, which will allow you to directly import ghidra and java.
NOTE: No projects or programs get set up in this mode.
def start(verbose=False, *, install_dir: Path = None) -> "PyGhidraLauncher":
    """
    Starts the JVM and fully initializes Ghidra in Headless mode.
    :param verbose: Enable verbose output during JVM startup (Defaults to False)
    :param install_dir: The path to the Ghidra installation directory.
        (Defaults to the GHIDRA_INSTALL_DIR environment variable)
    :return: The PyGhidraLauncher used to start the JVM
    """
import pyghidra
pyghidra.start()
import ghidra
from ghidra.app.util.headless import HeadlessAnalyzer
from ghidra.program.flatapi import FlatProgramAPI
from ghidra.base.project import GhidraProject
from java.lang import String
# do things
To check whether PyGhidra has been started, use the started() function.
def started() -> bool:
    """
    Whether the PyGhidraLauncher has already started.
    """

import pyghidra
if pyghidra.started():
    ...
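The two functions above can be combined so that a workflow starts Ghidra at most once. The sketch below is one possible way to do this, not something PyGhidra ships. The helper name ensure_started is hypothetical, and the pyghidra module is passed in as an argument (an assumption of this sketch) so the helper can be exercised without a Ghidra installation.

```python
from typing import Any, Optional


def ensure_started(pyghidra_module: Any, **start_kwargs: Any) -> Optional[Any]:
    """Call start() only if started() reports that Ghidra is not yet running.

    `pyghidra_module` is expected to be the imported `pyghidra` module.
    Returns the launcher returned by start(), or None if nothing was started.
    Hypothetical helper for illustration; not part of the PyGhidra API.
    """
    if pyghidra_module.started():
        # Ghidra is already running, so there is nothing to do.
        return None
    # Forward keyword arguments such as verbose= or install_dir= to start().
    return pyghidra_module.start(**start_kwargs)
```

In a real session this would be called as ensure_started(pyghidra, verbose=True) after import pyghidra.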
To have PyGhidra set up a binary file for you, use the open_program() function. This will set up a Ghidra project and import the given binary file as a program for you. Again, this will also allow you to import ghidra and java to perform more advanced processing.
def open_program(
        binary_path: Union[str, Path],
        project_location: Union[str, Path] = None,
        project_name: str = None,
        analyze=True,
        language: str = None,
        compiler: str = None,
        loader: Union[str, JClass] = None
) -> ContextManager["FlatProgramAPI"]: # type: ignore
    """
    Opens given binary path in Ghidra and returns FlatProgramAPI object.
    :param binary_path: Path to binary file, may be None.
    :param project_location: Location of Ghidra project to open/create.
        (Defaults to same directory as binary file)
    :param project_name: Name of Ghidra project to open/create.
        (Defaults to name of binary file suffixed with "_ghidra")
    :param analyze: Whether to run analysis before returning.
    :param language: The LanguageID to use for the program.
        (Defaults to Ghidra's detected LanguageID)
    :param compiler: The CompilerSpecID to use for the program. Requires a provided language.
        (Defaults to the Language's default compiler)
    :param loader: The `ghidra.app.util.opinion.Loader` class to use when importing the program.
        This may be either a Java class or its path. (Defaults to None)
    :return: A Ghidra FlatProgramAPI object.
    :raises ValueError: If the provided language, compiler or loader is invalid.
    :raises TypeError: If the provided loader does not implement `ghidra.app.util.opinion.Loader`.
    """
import pyghidra
with pyghidra.open_program("binary_file.exe") as flat_api:
    program = flat_api.getCurrentProgram()
    listing = program.getListing()
    print(listing.getCodeUnitAt(flat_api.toAddr(0x1234)))
    # We are also free to import ghidra while in this context to do more advanced things.
    from ghidra.app.decompiler.flatapi import FlatDecompilerAPI
    decomp_api = FlatDecompilerAPI(flat_api)
    ...
    decomp_api.dispose()
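In the example above, dispose() is skipped if the code between construction and disposal raises. One way to guard against that is a small context manager, sketched below. The name decompiler is hypothetical and not part of PyGhidra. The decompiler class is passed in as an argument, on the assumption that the caller imports FlatDecompilerAPI after Ghidra has started.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def decompiler(flat_api: Any, decompiler_factory: Callable[[Any], Any]) -> Iterator[Any]:
    """Yield a decompiler API object and always dispose of it afterwards.

    `decompiler_factory` is expected to be Ghidra's FlatDecompilerAPI class.
    Hypothetical helper for illustration; not part of the PyGhidra API.
    """
    decomp_api = decompiler_factory(flat_api)
    try:
        yield decomp_api
    finally:
        # Runs on normal exit and when the body raises.
        decomp_api.dispose()
```

Inside a pyghidra.open_program(...) block, this would be used as: with decompiler(flat_api, FlatDecompilerAPI) as decomp_api, followed by the decompilation work in the body.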
By default, PyGhidra will run analysis for you. If you would like to do this yourself, set analyze to False.
import pyghidra
with pyghidra.open_program("binary_file.exe", analyze=False) as flat_api:
    from ghidra.program.util import GhidraProgramUtilities
    program = flat_api.getCurrentProgram()
    if GhidraProgramUtilities.shouldAskToAnalyze(program):
        flat_api.analyzeAll(program)
The open_program() function also accepts optional arguments to control the name and location of the project that gets created (helpful for opening up a sample in an already existing project).
import pyghidra
with pyghidra.open_program("binary_file.exe", project_name="MyProject", project_location=r"C:\projects") as flat_api:
    ...
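To see where a project will end up when these arguments are omitted, the defaults documented for open_program() can be mirrored in plain Python. This is only an illustration of the documented behavior, not PyGhidra's own code. The helper name resolve_project is hypothetical, and the sketch assumes that "name of binary file" means the full file name, extension included.

```python
from pathlib import Path
from typing import Optional, Tuple, Union


def resolve_project(
    binary_path: Union[str, Path],
    project_location: Optional[Union[str, Path]] = None,
    project_name: Optional[str] = None,
) -> Tuple[Path, str]:
    """Return the (location, name) pair implied by the documented defaults.

    Hypothetical helper for illustration; not part of the PyGhidra API.
    """
    binary_path = Path(binary_path)
    # Documented default: same directory as the binary file.
    if project_location is None:
        location = binary_path.parent
    else:
        location = Path(project_location)
    # Documented default: binary file name suffixed with "_ghidra".
    # Assumption: the file extension is kept as part of the name.
    if project_name is None:
        name = binary_path.name + "_ghidra"
    else:
        name = project_name
    return location, name
```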
PyGhidra can also be used to run an existing Ghidra Python script directly in your native CPython interpreter using the run_script() function. While you can technically run an existing Ghidra script unmodified, you may run into issues due to differences between Jython 2 and CPython 3/JPype, so some modification to the script may be needed.
def run_script(
    binary_path: Optional[Union[str, Path]],
    script_path: Union[str, Path],
    project_location: Union[str, Path] = None,
    project_name: str = None,
    script_args: List[str] = None,
    verbose=False,
    analyze=True,
    lang: str = None,
    compiler: str = None,
    loader: Union[str, JClass] = None,
    *,
    install_dir: Path = None
):
    """
    Runs a given script on a given binary path.
    :param binary_path: Path to binary file, may be None.
    :param script_path: Path to script to run.
    :param project_location: Location of Ghidra project to open/create.
        (Defaults to same directory as binary file if None)
    :param project_name: Name of Ghidra project to open/create.
        (Defaults to name of binary file suffixed with "_ghidra" if None)
    :param script_args: Command line arguments to pass to script.
    :param verbose: Enable verbose output during Ghidra initialization.
    :param analyze: Whether to run analysis, if a binary_path is provided, before running the script.
    :param lang: The LanguageID to use for the program.
        (Defaults to Ghidra's detected LanguageID)
    :param compiler: The CompilerSpecID to use for the program. Requires a provided language.
        (Defaults to the Language's default compiler)
    :param loader: The `ghidra.app.util.opinion.Loader` class to use when importing the program.
        This may be either a Java class or its path. (Defaults to None)
    :param install_dir: The path to the Ghidra installation directory. This parameter is only
        used if Ghidra has not been started yet.
        (Defaults to the GHIDRA_INSTALL_DIR environment variable)
    :raises ValueError: If the provided language, compiler or loader is invalid.
    :raises TypeError: If the provided loader does not implement `ghidra.app.util.opinion.Loader`.
    """
import pyghidra
pyghidra.run_script(r"C:\input.exe", r"C:\some_ghidra_script.py")
This can also be done on the command line using pyghidra.
> pyghidra C:\input.exe C:\some_ghidra_script.py <CLI ARGS PASSED TO SCRIPT>
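When the command-line form is driven from another Python process, it is usually safer to build the argument list explicitly than to assemble a shell string. The sketch below only reproduces the positional layout shown above (binary, script, then script arguments). The helper name build_pyghidra_command is hypothetical, and any additional command-line options are outside the scope of this sketch.

```python
from pathlib import Path
from typing import List, Sequence, Union


def build_pyghidra_command(
    binary_path: Union[str, Path],
    script_path: Union[str, Path],
    script_args: Sequence[str] = (),
) -> List[str]:
    """Build an argv list of the form: pyghidra <binary> <script> <script args...>.

    Hypothetical helper for illustration; not part of the PyGhidra API.
    """
    # Each element stays a separate argv entry, so no shell quoting is needed.
    return ["pyghidra", str(binary_path), str(script_path), *script_args]
```

The resulting list can be passed to subprocess.run(command, check=True) from the standard library.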
JVM configuration for the classpath and vmargs may be done through a PyGhidraLauncher.
class PyGhidraLauncher:
    """
    Base pyghidra launcher
    """
    def add_classpaths(self, *args):
        """
        Add additional entries to the classpath when starting the JVM
        """
        self.class_path += args
    def add_vmargs(self, *args):
        """
        Add additional vmargs for launching the JVM
        """
        self.vm_args += args
    def add_class_files(self, *args):
        """
        Add additional entries to be added to the classpath after Ghidra has been fully loaded.
        This ensures that all of Ghidra is available so classes depending on it can be properly loaded.
        """
        self.class_files += args
    def start(self, **jpype_kwargs):
        """
        Starts JPype connection to Ghidra (if not already started).
        """
The following PyGhidraLaunchers are available:
class HeadlessPyGhidraLauncher(PyGhidraLauncher):
    """
    Headless pyghidra launcher
    """
class DeferredPyGhidraLauncher(PyGhidraLauncher):
    """
    PyGhidraLauncher which allows full Ghidra initialization to be deferred.
    initialize_ghidra must be called before all Ghidra classes are fully available.
    """
class GuiPyGhidraLauncher(PyGhidraLauncher):
    """
    GUI pyghidra launcher
    """
from pyghidra.launcher import HeadlessPyGhidraLauncher
launcher = HeadlessPyGhidraLauncher()
launcher.add_classpaths("log4j-core-2.17.1.jar", "log4j-api-2.17.1.jar")
launcher.add_vmargs("-Dlog4j2.formatMsgNoLookups=true")
launcher.start()
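If the JVM settings come from a configuration file or command-line options, it may be convenient to apply them to a launcher in one place. The following is a minimal sketch under that assumption. The helper name configure_launcher is hypothetical, and it relies only on the add_classpaths() and add_vmargs() methods shown above.

```python
from typing import Any, Iterable


def configure_launcher(
    launcher: Any,
    classpaths: Iterable[str] = (),
    vmargs: Iterable[str] = (),
) -> Any:
    """Apply classpath entries and JVM arguments to a launcher, then return it.

    `launcher` is expected to be a PyGhidraLauncher that has not been started.
    Hypothetical helper for illustration; not part of the PyGhidra API.
    """
    classpaths = list(classpaths)
    vmargs = list(vmargs)
    # Skip the calls entirely when there is nothing to add.
    if classpaths:
        launcher.add_classpaths(*classpaths)
    if vmargs:
        launcher.add_vmargs(*vmargs)
    return launcher
```

For example, configure_launcher(HeadlessPyGhidraLauncher(), vmargs=["-Xmx4g"]).start() would raise the JVM heap limit before Ghidra is started.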
There may be some Python modules and Java packages with the same import path. When this occurs, the Python module takes precedence. While JPype has its own mechanism for handling this situation, PyGhidra automatically makes the Java package accessible by allowing it to be imported with an underscore appended to the package name:
import pdb # imports Python's pdb
import pdb_ # imports Ghidra's pdb