There seems to be no way to determine the base of a number when there are no analysis hints defined #17727

Closed
plaets opened this issue Sep 28, 2020 · 12 comments
Labels: API (New API requests, changes, removal), RAnal

Comments

@plaets
Contributor

plaets commented Sep 28, 2020

I'm trying to implement a hotkey to toggle the immediate base in Cutter. For the hotkey to always work correctly, I need to obtain the current immediate base of the selected instruction. However, there seems to be no way to get that information unless an analysis hint was defined for that particular address earlier.
The only alternative I have considered is parsing the opcode, but I'm not sure that's a good idea (radare supports many different instruction sets, syntaxes, and disassemblers). `aoj`/`ao` don't seem useful in this case either.

See https://github.com/radareorg/cutter/pull/2429#issuecomment-699868626 :

@ret2libc wrote:

> I think there is currently no way to determine which base is used for a number when no hint is defined yet. I see https://github.com/radareorg/radare2/blob/master/libr/parse/filter.c#L402 , which seems to mean that whatever the disassembler returns is shown to the user. Capstone seems to show numbers < 10 as "decimal" (by "decimal" I mean without the 0x prefix; of course, numbers < 10 are the same in hex and decimal), but you probably can't assume that, since radare2 supports multiple disassemblers.

Expected behavior

[0x00003362]> pd1
│           0x00003362      mov     edx, 5 
[0x00003362]> ahj 0x00003362
[{"addr":13154,"immbase":10}] 
[0x00003362]> pd1 @ 0x00003a94
│           0x00003a94      lea     rax, [0x00007766] 
[0x00003a94]> ahj 0x00003a94
[{"addr":14996,"immbase":16}] 

(Or maybe a different command to get the default immediate base or other information that could be useful in this case)

Actual behavior

[0x00003362]> pd1
│           0x00003362      mov     edx, 5 
[0x00003362]> ahj 0x00003362
[] 
[0x00003362]> ahi 10 @ 0x00003362

[0x00003362]> ahj 0x00003362
[{"addr":13154,"immbase":10}] 
(...)

Related to

https://github.com/radareorg/cutter/pull/2429

@ret2libc
Contributor

Do you need a command, or is an API enough for Cutter? Adding a command just for this seems a bit overkill to me, but an API can easily be added. Also, IMO, storing a RAnalHint for every instruction that contains a number could be too much.

ret2libc added the API (New API requests, changes, removal) and RAnal labels on Sep 28, 2020
@plaets
Contributor Author

plaets commented Sep 28, 2020

I'm still a beginner, so I'm not sure (I believe Cutter communicates with radare using commands?).
@ITAYC0HEN
Having a hint for every instruction does not sound like a good idea, I agree.

@plaets
Contributor Author

plaets commented Sep 28, 2020

OK, I guess Cutter uses the radare2 API after all (not sure why this wasn't obvious to me); only some actions are performed through commands. I guess an API would be enough then.

@ret2libc
Contributor

@trufae, considering that guessing which base each disassembler uses seems like a very bad solution, maybe we could enforce an "r2 rule" for immediates. For example, numbers < 10 are shown in decimal, without 0x, while numbers >= 10 are shown in hexadecimal, with 0x. This would have the disadvantage of always requiring some filter operations on instructions, but it would ensure consistency and avoid having RAnalHints saved everywhere. With this hard rule, it would be very easy for r2 to provide an API that gets the base used by an instruction. WDYT?

@trufae
Collaborator

trufae commented Sep 29, 2020

This rule is defined in sdb already, and capstone has a similar behaviour I think

@ret2libc
Contributor

> This rule is defined in sdb already, and capstone has a similar behaviour I think

Where exactly? Could you point to where this is done? I know Capstone behaves the same way, but since we don't support only Capstone, we should probably provide consistent behaviour across disassemblers.

@trufae
Collaborator

trufae commented Sep 29, 2020

A quick grep spotted this:

```c
SDB_API int sdb_array_add_num(Sdb *s, const char *key, ut64 val, ut32 cas) {
        char buf[SDB_NUM_BUFSZ];
        /* default textual form of the number, in SDB_NUM_BASE */
        char *v = sdb_itoa (val, buf, SDB_NUM_BASE);
        if (!sdb_array_contains (s, key, v, NULL)) {
                if (val < 256) {
                        /* small values not already in the array are added in decimal */
                        char *v = sdb_itoa (val, buf, 10);
                        return sdb_array_add (s, key, v, cas);
                }
        }
        return sdb_array_add (s, key, v, cas);
}
```

This was also discussed in the Capstone repo.

@ret2libc
Contributor

Sorry, I'm not sure I understand. What does sdb_array_add_num have to do with the way numbers are shown in the disassembly?

@plaets
Contributor Author

plaets commented Oct 4, 2020

I checked this with a couple of disassemblers, and it seems that the default formatting of immediates depends on the disassembler. For example, x86 usually outputs immediates below 10 as decimal, brainfuck outputs all immediates as decimal, and in GNU PowerPC it depends on the instruction (lhz, lis... display immediates as decimal; call, bsola... as hexadecimal).

What would also help me solve this issue (and another one related to that Cutter pull request) is an API/command to get the immediates with the formatting applied. I'm not sure whether such a feature could be implemented easily. Are the immediate-base analysis hints converted into some kind of configuration that's passed to external disassemblers? Or are those hints used only in radare when displaying the assembly?

(Also, I noticed that `wa` is somewhat counter-intuitive, at least on x86: `wa mov rax, 9` writes `mov rax, 9`, but `wa mov rax, 10` writes `mov rax, 0x10`. `wa mov rax, a` does not assemble until I put 0x before the `a`, which makes sense to me; the previous examples, however, don't.)

@ret2libc
Contributor

ret2libc commented Oct 5, 2020

> Or are those hints used only in radare when displaying the assembly?

This.

However, in some cases we also pass options to the disassembler. For example, IIRC, we pass an option to x86 Capstone to always use hexadecimal.

@ITAYC0HEN
Contributor

How do you suggest we proceed with this issue? I'm sure @plaets is not the only one confused by this; I am too. Do we have a way to determine the base of a number without analysis hints everywhere?

@trufae
Collaborator

trufae commented Aug 27, 2024

If the number starts with 0x, it's hexadecimal; otherwise, it's decimal. That's the way to determine the base if no hints are defined. You can't rely on all the disassembler engines or parse plugins to behave the same way, because users can modify them, and there's no need to overengineer this simple topic.


4 participants