-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
llext: Fix off-by-one in RISC-V truncation check #83198
llext: Fix off-by-one in RISC-V truncation check #83198
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks great, it would be a good value add here to have a test validating this edge that clearly got tripped on
I will add a second commit with a test for the error condition, using a cb-type instruction as an example. |
@teburd a test that uses both the maximum positive and negative values of the immediate has been added in a separate commit. |
Small formatting issue, but the tests are much appreciated by me |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, though I expect a small force push will cause some churn again. Happy to reapprove again if needed
66d11e6
to
9381ec2
Compare
The RISC-V architecture-specific relocations need to check whether each required relocation can fit into the modified instruction's immediate. All immediates in RISC-V are encoded as two's complement. The current truncation check has an off-by-one error for checking the maximum negative distance, as two's complement encoding can represent a negative value that is the maximum positive value plus one, causing LLEXT to refuse loading valid code. This commit adds an additional condition to the check that fixes the aforementioned issue. Signed-off-by: Eric Ackermann <[email protected]>
9381ec2
to
e457c2a
Compare
This branch has been rebased to main; I have tested again that undoing the fix in the first commit causes tests to fail in the expected manner. |
|
||
/* | ||
* 2+4 bytes for _backward_jump_target itself | ||
* need to insert 122 padding bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this correct? The comment says 122 bytes, but the code in line 34 seems to be adding 256 bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, the comment is out-of-date, it should state that 252 bytes of padding are needed. The c.addi
and ret
instructions are 2 bytes each, and we need to have exactly 256 bytes between _do_jump
and _backward_jump_target
to trigger the edge case.
I will update this and also change ret
to a c.jr
compressed jump to make this more obvious.
e457c2a
to
c6a42d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor clarification - otherwise LGTM!
* we pad with 0xff here, which at the time of writing is an invalid opcode and causes | ||
* an exception should it ever be executed | ||
*/ | ||
.fill 126, 2, 0xff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will actually fill the file with sequences of 0x00 0xff
. Is this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, this is not what I intended.
On second thought, my padding with invalid instructions to detect silent failures with incorrect relocation is not really future-proof - there is no guarantee that the padding will remain an invalid instruction forever.
Thus, I have changed the padding to use 2-byte return instructions instead and added padding for both possible jump directions, clearly and obviously causing a test failure if the relocation is not exactly correct (too short or in the wrong direction).
All immediates in RISC-V are encoded as two's complement. This commit adds a test for relocating jumps that utilize the full range of the immediate, in both positive and negative direction. To this end, the test uses the compressed b-type (CB) instruction to branch to its maximum negative (-256) and maximum positive (+254) targets. In case of test failure, expect relocating the corresponding llext to fail. Signed-off-by: Eric Ackermann <[email protected]>
c6a42d9
to
8ae53ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the detail work, thanks for this!
The RISC-V architecture-specific relocations need to check whether each required relocation can fit into the modified instruction's immediate. All immediates in RISC-V are encoded as two's complement. The current truncation check has an off-by-one error for checking the maximum negative distance, as two's complement encoding can represent a negative value that is the maximum positive value plus one, causing LLEXT to refuse loading valid code.
This commit adds an additional condition to the check that fixes the aforementioned issue.
For a concrete example of the problem, consider the CB-type instruction: It has an 8-bit immediate with an explicit LSB of 0. The current code accepts relocations for CB-type instruction within a range of [-255,+255] from the operation's location, relying on standard RISC-V alignment requirements to ensure that the LSB is 0.
However, while the maximum positive jump distance of 255 is correct, the maximum negative jump distance for a CB-type instruction is actually -256 bit. Here is an example in a RISC-V assembler for a jump with -256 byte offset: https://luplab.gitlab.io/rvcodecjs/#q=c.beqz+a0,+-256&abi=false&isa=AUTO
Here is an example for trying to jump +256 byte (interpreted as -256 byte jump): https://luplab.gitlab.io/rvcodecjs/#q=c.beqz+a0,+256&abi=false&isa=AUTO
The RISC-V ISA manual states (somewhat misleadingly in my opinion) that the CB-type instruction can perform jumps in a [-256,+256] byte range: https://github.com/riscv/riscv-isa-manual/releases/download/riscv-isa-release-a8395e7-2024-12-16/riscv-privileged.pdf, page 167; I have raised a separate issue for the ISA manual: riscv/riscv-isa-manual#1781