Difficulties sharing DataType Archive between multiple programs in a project #3012

marcushall42 · 2019-09-20T17:34:09Z

We have a project consisting of multiple versions of the same basic program. These programs share the same DataTypes (in particular, structure definitions), so we have a common DataType archive to pull from. However, a persistent problem we see is illustrated by this example:

Consider messages passed between tasks within the program. Different versions of the program may define particular message numbers differently, so the enum that defines the message number is unique to each program in the project.
The message header itself is common to all versions, except for which enum it uses for the message number, yet the DataType archive can contain only a single message-number enum. Furthermore, message queues point to the message header, so they in turn need to be unique based on which enum is in the header, and so on.

What we want is a "generic" enum in the DataType archive, with the specific enum values taken from each program's own datatype manager. At the moment we use ushort in the message header, but this disables the very useful enum expansion that Ghidra is capable of.
Sometimes the variation is instead a slight difference in the contents of a structure, which is harder to work around.
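
To make the trade-off concrete, here is a toy illustration in plain Python (it does not use the Ghidra API). The enum names and values are made up for the example; the point is only that the same raw ushort reads differently depending on which program's enum is applied, and reads as a bare number when no enum is attached.

```python
from enum import IntEnum

# Hypothetical message-number enums for two versions of the program.
class MsgTypeV1(IntEnum):
    INIT = 1
    DATA = 2

class MsgTypeV2(IntEnum):
    INIT = 1
    STATUS = 2   # same number, different meaning in this version
    DATA = 3

def describe(raw: int, enum_cls=None) -> str:
    """Render a raw message number, expanding it when an enum is known."""
    if enum_cls is None:
        return str(raw)            # the plain ushort case: no expansion
    try:
        return enum_cls(raw).name
    except ValueError:
        return str(raw)            # value not defined in this enum

print(describe(2))                 # 2
print(describe(2, MsgTypeV1))      # DATA
print(describe(2, MsgTypeV2))      # STATUS
```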

So, the question is, are we overlooking a better way to deal with this problem?

mpaxson · 2019-12-17T12:59:54Z

Bump

anarelion · 2021-05-05T20:43:32Z

Bump

dev747368 (Collaborator) · 2021-05-05T23:56:52Z

You may want to convert this issue to a discussion. You will probably get more eyeballs there than here. If GitHub isn't giving you the "Transfer Issue" option, I or another team member could do it.

As far as your original question goes, I can't think of any shortcuts that would help your situation.

marcushall42 (Author) · 2021-05-06T22:52:37Z

Hmm. If it were available to me, I gather I would see a "Convert to Discussion" button on the right, but I don't, so I guess this is a request to convert it when you get a chance.

marcushall42 (Author) · 2021-05-06T23:04:03Z

Having lived with this problem for some time now, I do have a suggestion that I think would help.
I had thought that the "new -> typedef" entry in the DataTypeManager might give me what I wanted, but it seems that a typedef gets replaced with the type it points to when it is used. That is, if I have an enum localMsgType and a typedef msgType => localMsgType, then when I define an element in a structure or function signature as a msgType, it gets replaced with localMsgType when the data is parsed.

What I want is perhaps an AliasDataType that contains the name of a DataType to alias to. When the type is referenced, it looks up the alias target by name (or perhaps looks itself up, or something similar). The idea is that one file defines an alias from msgType to localMsgType, which is an enum with the particular message codes for this binary. Many functions take a msgType argument, and message structures have a msgType member. The FunctionDefinitionDataType signatures and the referencing structures can all be shared between different binaries, but when it comes time to use the DataType, it refers to the localMsgType enum in that particular binary, which is not shared with other binaries.

Does that make any sense at all? I want to share structs and signatures that refer to msgType, but have a unique msgType for each binary (since each one has a different numbering).

Similarly, I might have an alias for "struct foo" that some binaries set up to refer to "struct foo1" and others refer to "struct foo2" because at some point in the code's evolution a few fields were added to the real struct foo, but I want to share everything that only references struct foo between all of the different binaries.
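
A rough sketch of the lookup being proposed, written as a plain-Python model rather than against the Ghidra API. The classes `Alias`, `EnumType` and `Program`, the binary names, and the enum values are all hypothetical stand-ins; the sketch only shows one shared alias resolving to a different local enum in each binary.

```python
from dataclasses import dataclass, field

@dataclass
class EnumType:
    name: str
    values: dict  # message number -> label

@dataclass
class Alias:
    name: str     # shared name, e.g. "msgType"
    target: str   # name looked up in the program's own types

@dataclass
class Program:
    name: str
    types: dict = field(default_factory=dict)  # type name -> type object

    def resolve(self, type_name: str):
        """Follow aliases by name until a concrete type is reached."""
        seen = set()
        dt = self.types[type_name]
        while isinstance(dt, Alias):
            if dt.name in seen:
                raise ValueError(f"alias cycle at {dt.name}")
            seen.add(dt.name)
            dt = self.types[dt.target]
        return dt

# The shared archive only knows the alias; each program supplies its own enum.
shared_alias = Alias("msgType", "localMsgType")

prog_a = Program("binary_a", {
    "msgType": shared_alias,
    "localMsgType": EnumType("localMsgType", {1: "INIT", 2: "DATA"}),
})
prog_b = Program("binary_b", {
    "msgType": shared_alias,
    "localMsgType": EnumType("localMsgType", {1: "INIT", 2: "STATUS"}),
})

print(prog_a.resolve("msgType").values[2])  # DATA
print(prog_b.resolve("msgType").values[2])  # STATUS
```

The same model would cover the "struct foo" case: the alias stays shared, and each binary binds it to its own foo1 or foo2.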

dragonmacher (Collaborator) · 2021-05-07T13:41:05Z

> Does that make any sense at all? I want to share structs and signatures that refer to msgType, but have a unique msgType for each binary (since each one has a different numbering).

I think this makes sense. You'd like a way to have certain types get replaced with program-specific versions when applied to a particular program. It seems like we'd need to have a new type, something like a Placeholder Type, that Ghidra knows must be resolved with a program-specific type. If that type does not exist, then the UI would flag that somehow so that you'd know to create that program's placeholder type before you can start using the containing data type.

> but I want to share everything that only references struct foo between all of the different binaries.

Are you familiar with the notion of Data Type Synchronizing in Ghidra? This allows you to add types to a program archive from various other data type libraries. Then, as the types change, Ghidra shows that the source archive and program archive have diverged. There are actions to resynchronize the types in the archives. This is clearly not what you are asking for in this feature, but it does help somewhat with managing archives as types change.

marcushall42 (Author) · 2021-05-07T14:00:26Z

Yes, at the moment we have a two-layer hierarchy of data-type management. There is a top-layer archive with some basic types that seem pretty invariant. Then, as we see different families of binaries, we break off datatypes into family-specific archives. This is "manageable" but is a rather troublesome manual process. There are a few annoyances: drag and drop is the only way to share, and there is no context menu entry for "share with archive", so I have to drag an entry across 5 screens of datatypes to get to the library, and if it's expanded, it is easy to skip right over the library itself...

We also do things like using ushort for the message number in the structure of a generic message, with a comment that this is the msgType, because the enum is specific to that particular program (the issue above). This makes the message structure sharable without the conflicts it would cause if it contained the actual enum datatype, but it also means that Ghidra doesn't understand the data as well as it could.

BTW, we currently have ~30 binaries that are all related, and it is a struggle to manage this. Ghidra does a lot more than anything else I have ever seen. Things like BSim help (we also have a lot of home grown tools), and Version Tracker markup is the only thing that I've seen that helps with porting volatile data from one program to another. But it's a difficult problem which is still just starting to be solved.

dragonmacher (Collaborator) · 2021-05-07T14:22:12Z

> There are a few annoyances: drag and drop is the only way to share, and there is no context menu entry for "share with archive", so I have to drag an entry across 5 screens of datatypes to get to the library, and if it's expanded, it is easy to skip right over the library itself...

Just between the two of us, I feel like our Data Type management never really matured.

> (we also have a lot of home grown tools)

This really is the expectation. Obviously for user-specific needs, but also to address the tool's deficiencies, we assume our clients write into Ghidra the functionality that they need. The plugin and extension point features are designed to this end. Of course, being open source ultimately allows clients to make Ghidra whatever they need when the extensibility is not enough.

We have to walk a fine line when deciding which user-made features should be pulled back into the tool. Whatever we decide to put into the tool has to be useful and generic enough for us to easily maintain. As you are pointing out in this ticket, I think the overall Data Type workflow needs quite a bit of work. This is something that we should fix. Admittedly though, this is a hard problem. You have also pointed out that long-term management of RE'd libraries is something that has not gotten enough attention. This is something that likely would require more resources and collaboration for us to improve. Perhaps we need more great developers working on the Ghidra team... 🐉

2 replies

marcushall42 May 7, 2021
Author

Sure, some things may be a little rough, but overall it is quite capable. And it's 1000x better than anything else.

Creating an AliasDataType that holds a datatype name and some way to get to a datatypemanager isn't hard, but all of the places in the code like "dt instanceof Composite" would have to account for it. That would take something like a DataType method that returns the "real datatype", tested as "dt.getRealType() instanceof Composite", or perhaps "dt.isInstanceOf(Composite)" or something. Such a sweeping change is daunting, and the direct test would still creep back in, since use of the new method cannot be enforced. So perhaps there is a more clever solution possible...
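
The concern can be shown with a small stand-alone model. This is Python, not Ghidra's Java, and every class here (`DataType`, `Composite`, `Structure`, `AliasDataType`) is a simplified stand-in defined for the example, as are the helpers `real_type` and `is_instance_of`. It shows why a direct type test misses an alias and what a "real type" helper would have to do instead.

```python
class DataType:
    """Stand-in base class for this sketch."""

class Composite(DataType):
    pass

class Structure(Composite):
    def __init__(self, name):
        self.name = name

class AliasDataType(DataType):
    """Holds only a name; the target is looked up when it is needed."""
    def __init__(self, name, lookup):
        self.name = name
        self._lookup = lookup   # callable: type name -> DataType

    def get_real_type(self):
        return self._lookup(self.name)

def real_type(dt):
    """Unwrap aliases until a non-alias type is reached."""
    while isinstance(dt, AliasDataType):
        dt = dt.get_real_type()
    return dt

def is_instance_of(dt, cls):
    """Alias-aware replacement for a direct isinstance test."""
    return isinstance(real_type(dt), cls)

# Hypothetical per-program table binding the shared name to a local struct.
program_types = {"foo": Structure("foo1")}
alias = AliasDataType("foo", program_types.__getitem__)

print(isinstance(alias, Composite))      # False: the direct test misses it
print(is_instance_of(alias, Composite))  # True: the helper sees through it
```

Nothing stops new code from calling `isinstance` directly, which is the enforcement problem described above.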

Although it seems general enough to be useful to anyone else facing a similar problem, is it useful enough for everyone to be truly worth the complexity?

Unfortunately, in our case the environment (and Ghidra itself) is provided by our customer, and hacking something into Ghidra and getting them to use it throughout the network is not really possible (perhaps if it were something truly critical). So we are limited to ghidra_scripts, which is a lot of capability, really.

astrelsky May 7, 2021

> Sure, some things may be a little rough, but overall it is quite capable. And it's 1000x better than anything else.
>
> Creating an AliasDataType that holds a datatype name and some way to get to a datatypemanager isn't hard, but all of the places in the code like "dt instanceof Composite" would have to account for it.

This is a fundamental problem with how that portion of the code is laid out. Sure, DataType is an ExtensionPoint and you can implement your own behavior. Unfortunately, you only get the behavior you want until you try to apply it somewhere. Once Ghidra goes to create the DatabaseObject, it will no longer behave the way you want, and there is no way to change that behavior.

With the exception of DynamicDataType. You might be able to hack something up with that.

dragonmacher (Collaborator) · 2021-05-07T18:12:50Z

> is it useful enough for everyone to be truly worth the complexity?

I think it probably is worth it in the context of a bigger data type overhaul.

1 reply

marcushall42 May 28, 2021
Author

Well, I did a little playing around to revisit the TypedefDataType. Some simple experiments seem to show that it does provide the capabilities that I need. There are just a few caveats to get things to work for me...

I create an enum "MsgType_local" with the type values for this particular program.
I create a typedef "MsgType" that points to "MsgType_local".
Then, I create a "Msg" structure that contains "MsgType" as one of the components.

Now, I copy "Msg" to my datatype archive and answer "Yes" to associate the types with the archive. This also copies and associates "MsgType" and "MsgType_local" into the archive. Then I remove the association of "MsgType_local".

Now, I can change entries in the MsgType_local in the program without triggering any updates to the archive. I can change elements in the "Msg" structure and update that to the archive, without affecting the "MsgType_local" in the archive (which is just a placeholder anyhow). So this use seems to be good...

Furthermore (I haven't tested this yet, but it seems like it must work), I should be able to define various versions of a structure in the archive as "SomeStruct_V0", "SomeStruct_V1", etc. Then I add two typedefs: "SomeStruct" points to "SomeStruct_local", and "SomeStruct_local" points to "SomeStruct_V1". Again, I disassociate "SomeStruct_local" from the archive. Now I can embed things like "SomeStruct *" in arguments or other structures and I should get the appropriate version for the local program.
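
The two-level typedef idea can be modelled in a few lines of plain Python (again, not the Ghidra API). Here a string value stands for a typedef target and a list stands for a struct definition; the field names are invented for the example. The archive supplies the versioned structs and the stable "SomeStruct" typedef, while each program overlays only its own "SomeStruct_local".

```python
from collections import ChainMap

# Shared archive: versioned structs (field-name lists) plus the stable typedef.
archive = {
    "SomeStruct_V0": ["id", "length"],
    "SomeStruct_V1": ["id", "length", "flags"],
    "SomeStruct": "SomeStruct_local",     # typedef, shared by every program
}

def resolve(types, name):
    """Follow typedefs (string targets) until a struct definition is found."""
    target = types[name]
    hops = 0
    while isinstance(target, str):
        hops += 1
        if hops > len(types):
            raise ValueError(f"typedef cycle starting at {name}")
        target = types[target]
    return target

# Each program holds only its own, disassociated "_local" typedef.
old_binary = ChainMap({"SomeStruct_local": "SomeStruct_V0"}, archive)
new_binary = ChainMap({"SomeStruct_local": "SomeStruct_V1"}, archive)

print(resolve(old_binary, "SomeStruct"))  # ['id', 'length']
print(resolve(new_binary, "SomeStruct"))  # ['id', 'length', 'flags']
```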

I just have to always remember to disassociate all of the _local typedefs from the archive whenever I pull in one of these datatypes. Probably create a category path just for them so that they are all segregated.

This might just work....
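
As a sanity check on the workflow above, here is a toy model of the association behaviour in plain Python. It is not how Ghidra implements archive synchronization; `ProgType`, `commit_to_archive` and `forgotten_locals` are hypothetical names, and the definitions are placeholders. It only illustrates that a disassociated "_local" type can change freely while associated types still flow back to the archive.

```python
from dataclasses import dataclass

@dataclass
class ProgType:
    definition: object
    associated: bool = True   # still linked to the shared archive?

def commit_to_archive(program: dict, archive: dict) -> list:
    """Push changed, still-associated types to the archive; return their names."""
    pushed = []
    for name, t in program.items():
        if t.associated and archive.get(name) != t.definition:
            archive[name] = t.definition
            pushed.append(name)
    return pushed

def forgotten_locals(program: dict) -> list:
    """List "_local" types that were never disassociated."""
    return [n for n, t in program.items() if n.endswith("_local") and t.associated]

archive = {
    "Msg": ["MsgType type", "ushort length"],
    "MsgType": "typedef MsgType_local",
    "MsgType_local": {},   # placeholder only
}
program = {name: ProgType(defn) for name, defn in archive.items()}
print(forgotten_locals(program))             # ['MsgType_local']
program["MsgType_local"].associated = False  # the manual disassociation step

# Program-specific edits: new enum values and an extra Msg field.
program["MsgType_local"].definition = {1: "INIT", 2: "DATA"}
program["Msg"].definition = ["MsgType type", "ushort length", "uint seq"]

print(commit_to_archive(program, archive))   # ['Msg']
print(archive["MsgType_local"])              # {}
```

A check like `forgotten_locals` is the kind of thing a script could run to catch the "always remember to disassociate" step.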
