Difficulties sharing DataType Archive between multiple programs in a project #3012
Replies: 9 comments 3 replies
-
You may want to convert this issue to a discussion. You will probably get more eyeballs there than here. If GitHub isn't giving you the "Transfer Issue" option, I or another team member could do it. As far as your original question goes, I can't think of any shortcuts that would help your situation.
-
Hmm. If it were available to me, I gather that I would see a "Convert to Discussion" button on the
-
Having lived with this problem for some time now, I have a suggestion that I think would help. What I want is something like an AliasDataType that contains the name of the DataType it aliases. When the type is referenced, it looks up the alias target. Or perhaps it looks up itself or something.
The idea is that one file defines an alias for msgType to be localMsgType, which is an enum with the particular message codes for this binary. Many functions take a msgType argument, and message structures have a msgType member. The FunctionDefinitionDataType signatures and the referencing structures can all be shared between different binaries, but when it comes time to use the DataType, it resolves to the localMsgType enum in that particular binary, which is not shared with other binaries.
Does that make any sense at all? I want to share structs and signatures that refer to msgType, but have a unique msgType for each binary (since each one has a different numbering). Similarly, I might have an alias for "struct foo" that some binaries set up to refer to "struct foo1" and others to "struct foo2", because at some point in the code's evolution a few fields were added to the real struct foo, but I want to share everything that only references struct foo between all of the different binaries.
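To make the alias idea concrete, here is a minimal standalone sketch in Python. It does not use Ghidra's API; `AliasType`, `EnumType`, `StructType`, `ProgramTypes`, `bind`, `resolve`, and `describe` are all hypothetical names invented for illustration, and the message codes are made up. It only models the lookup behavior described above: shared types refer to `msgType`, and each program decides what `msgType` resolves to.

```python
from dataclasses import dataclass


@dataclass
class EnumType:
    name: str
    values: dict  # numeric code -> symbolic name


@dataclass
class AliasType:
    name: str  # shared name that each program binds locally, e.g. "msgType"


@dataclass
class StructType:
    name: str
    fields: tuple  # (field_name, type_name) pairs


class ProgramTypes:
    """Toy stand-in for one program's data type manager (not Ghidra's class)."""

    def __init__(self, name, shared):
        self.name = name
        self.shared = shared  # types from the common archive
        self.local = {}       # program-specific types
        self.aliases = {}     # alias name -> local type name

    def bind(self, alias_name, local_type):
        """Point a shared alias at a type that exists only in this program."""
        self.local[local_type.name] = local_type
        self.aliases[alias_name] = local_type.name

    def resolve(self, type_name):
        """Follow the alias if this program bound one, else use the shared type."""
        target = self.aliases.get(type_name, type_name)
        if target in self.local:
            return self.local[target]
        return self.shared[target]


# The shared archive holds the alias and a struct that refers to it by name.
shared = {
    "msgType": AliasType("msgType"),
    "MsgHeader": StructType("MsgHeader", (("type", "msgType"), ("length", "ushort"))),
}

prog_a = ProgramTypes("binary_a", shared)
prog_a.bind("msgType", EnumType("localMsgType", {1: "MSG_INIT", 2: "MSG_DATA"}))

prog_b = ProgramTypes("binary_b", shared)
prog_b.bind("msgType", EnumType("localMsgType", {1: "MSG_DATA", 2: "MSG_INIT"}))


def describe(program, code):
    """Name a message code using the shared header and the program's own enum."""
    header = program.resolve("MsgHeader")
    type_name = dict(header.fields)["type"]
    return program.resolve(type_name).values[code]
```

In this sketch both programs use the very same `MsgHeader` object, while `describe(prog_a, 1)` and `describe(prog_b, 1)` give different names because each program bound `msgType` to its own enum. A program with no binding simply gets the alias back, which corresponds to the "looks up itself" case.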
-
I think this makes sense. You'd like a way to have certain types get replaced with program-specific versions when applied to a particular program. It seems like we'd need to have a new type, something like a Placeholder Type, that Ghidra knows must be resolved with a program-specific type. If that type does not exist, then the UI would flag that somehow so that you'd know to create that program's placeholder type before you can start using the containing data type.
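As a rough illustration of the "flag it if the program-specific type is missing" behavior, here is a small standalone Python sketch. It is not Ghidra code; `unresolved_placeholders`, `apply_struct`, `PLACEHOLDERS`, and `MSG_HEADER` are hypothetical names, and a program's types are modeled as a plain dictionary.

```python
def unresolved_placeholders(struct_fields, placeholders, program_types):
    """Return the placeholder type names a struct uses that the program lacks."""
    missing = {
        type_name
        for _, type_name in struct_fields
        if type_name in placeholders and type_name not in program_types
    }
    return sorted(missing)


def apply_struct(struct_name, struct_fields, placeholders, program_types):
    """Refuse to apply a struct until every placeholder it uses is defined."""
    missing = unresolved_placeholders(struct_fields, placeholders, program_types)
    if missing:
        raise LookupError(
            f"{struct_name}: define program-specific type(s) first: {', '.join(missing)}"
        )
    return struct_name


# Assumed example data: one placeholder and one shared struct that uses it.
PLACEHOLDERS = {"msgType"}
MSG_HEADER = [("type", "msgType"), ("length", "ushort")]

program_a = {"msgType": {1: "MSG_INIT", 2: "MSG_DATA"}}  # placeholder defined
program_b = {}                                           # placeholder not yet defined
```

Here `unresolved_placeholders(MSG_HEADER, PLACEHOLDERS, program_b)` reports `msgType` as missing, and `apply_struct` raises for `program_b` while succeeding for `program_a`. A real UI would presumably show a marker rather than raise, but the check itself would be similar.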
Are you familiar with the notion of Data Type Synchronizing in Ghidra? This allows you to add types to a program archive from various other data type libraries. Then, as the types change, Ghidra shows that the source archive and program archive have diverged. There are actions to resynchronize the types in the archives. This is clearly not what you are asking for in this feature, but it does help somewhat with managing archives as types change.
-
Yes, at the moment we have a two-layer hierarchy of data type management. There is a top-layer archive with some basic types that seem pretty invariant. Then, as we see different families of binaries, we break off data types into family-specific archives. This is "manageable" but is a rather troublesome manual process. There are a few annoyances: drag and drop is the only way to share, and there is no context menu entry for "share with archive", so I have to drag an entry across 5 screens of datatypes to get to the library, and if the tree is expanded it is easy to skip past the library itself...
We also do things like using ushort for the message number in the structure of a generic message, with a comment that this is the msgType, because the enum is specific to that particular program (the issue above). This means that the message structure is sharable without the conflicts it would cause if it had the actual enum datatype in the structure, but it also means that Ghidra doesn't understand the data as well as it could.
BTW, we currently have ~30 binaries that are all related, and it is a struggle to manage this. Ghidra does a lot more than anything else I have ever seen. Things like BSim help (we also have a lot of home-grown tools), and version tracker markup is the only thing that I've seen that helps with porting volatile data from one program to another. But it's a difficult problem which is still just starting to be solved.
-
Just between the two of us, I feel like our Data Type management never really matured.
This really is the expectation. Obviously for user-specific needs, but also to address the tool's deficiencies, we assume our clients write into Ghidra the functionality that they need. The plugin and extension point features are designed to this end. Of course, being open source ultimately allows clients to make Ghidra whatever they need when the extensibility is not enough. We have to walk a fine line when deciding which user-made features should be pulled back into the tool. Whatever we decide to put into the tool has to be useful and generic enough for us to easily maintain. As you are pointing out in this ticket, I think the overall Data Type workflow needs quite a bit of work. This is something that we should fix. Admittedly though, this is a hard problem. You have also pointed out that long-term management of RE'd libraries is something that has not gotten enough attention. This is something that likely would require more resources and collaboration for us to improve. Perhaps we need more great developers working on the Ghidra team... 🐉
-
I think it probably is worth it in the context of a bigger data type overhaul.
-
We have a project consisting of multiple versions of the same basic program. These programs share the same DataTypes (in particular, structure definitions), so we have a common DataType archive to pull from. However, a persistent problem we see is illustrated by this example:
Consider messages passed between tasks within the program. Different versions of the program may have different definitions of particular message numbers, so the enum that defines the message number is unique for each program within the project.
The message header itself is common to all versions, except that each version uses a different enum for the message number. The DataType archive, however, can contain only a single message number enum. Furthermore, message queues point to the message header, so they too would need to be unique based on which enum is in the header, and so on.
So, what we want is a "generic" enum in the DataType archive, with the specific values of the enum taken from the program's data type manager. At the moment, we use ushort in the message header, but this disables the very useful enum expansion that Ghidra is capable of.
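The trade-off between ushort and a per-program enum can be shown with a small standalone Python sketch. Everything here is assumed for illustration: the header layout (two little-endian 16-bit fields), the enum names `MsgTypeV1` and `MsgTypeV2`, their values, and the `decode_header` helper. None of it comes from the actual binaries or from Ghidra.

```python
import struct
from enum import IntEnum


class MsgTypeV1(IntEnum):
    """Message numbers as one (hypothetical) program version defines them."""
    INIT = 1
    DATA = 2


class MsgTypeV2(IntEnum):
    """A later (hypothetical) version that renumbered the messages."""
    INIT = 1
    STATUS = 2
    DATA = 3


# Assumed shared header layout: ushort message number, ushort length.
HEADER = struct.Struct("<HH")


def decode_header(raw, msg_enum=None):
    """Decode a header; without an enum the message number stays a bare integer."""
    msg_type, length = HEADER.unpack(raw)
    if msg_enum is not None:
        msg_type = msg_enum(msg_type)  # raises ValueError for unknown codes
    return msg_type, length


raw = HEADER.pack(2, 16)
```

With no enum, `decode_header(raw)` yields the bare number 2, which is the ushort workaround. Passing `MsgTypeV1` names it `DATA`, while passing `MsgTypeV2` names the same bytes `STATUS`, which is why a single shared enum in the archive cannot serve every program.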
And sometimes the variation is in the contents of the structures themselves, which differ slightly between versions and are more problematic to work around.
So, the question is, are we overlooking a better way to deal with this problem?