Bluetooth: Controller: Fix HCI command buffer allocation failure #83774

Open
wants to merge 1 commit into main from github_hci_cmd_buf_alloc_check


cvinayak
Contributor

@cvinayak commented Jan 10, 2025

Fix an HCI command buffer allocation failure that can cause the Host Number of Completed Packets command to be lost.

Instead, reject the HCI Host Buffer Size command if the required number of HCI command buffers is not allocated in the Controller implementation.

Relates to commit 8161430 ("Bluetooth: Add workaround
for no command buffer available").

Relates to commit 297f4f4 ("Bluetooth: Split HCI
command & event buffers to two pools").

Relates to #81866

One of many CI failures exposing the incorrect allocation:

tests/bsim/bluetooth/host/l2cap/einprogress/test_scripts/run.sh FAILED (0.838 s)
d_01: @00:00:00.000000  *** Booting Zephyr OS build v4.0.0-3133-ge9f6c00b05bc ***
d_01: @00:00:00.000000 INFO: Test start: tester
d_00: @00:00:00.000000  *** Booting Zephyr OS build v4.0.0-3133-ge9f6c00b05bc ***
d_00: @00:00:00.000000 INFO: Test start: dut
d_01: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_01: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_01: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_01: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_01: @00:00:00.005768  [00:00:00.005,767] <err> [sysworkq] bt_ctlr_hci: FC: Require Host ACL packets (2) < CONFIG_BT_BUF_CMD_TX_COUNT (2)
d_01: @00:00:00.005768  [00:00:00.005,767] <wrn> [main] bt_hci_core: opcode 0x0c33 status 0x12 
d_01: @00:00:00.005768  ASSERTION FAIL [!err] @ WEST_TOPDIR/zephyr/tests/bsim/bluetooth/host/l2cap/einprogress/src/tester.c:51
d_01: @00:00:00.005768  @ WEST_TOPDIR/zephyr/lib/os/assert.c:43
d_01: @00:00:00.005768  [00:00:00.005,767] <err> [main] os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
d_01: @00:00:00.005768  [00:00:00.005,767] <err> [main] os: Current thread: 0x826bc40 (main)
d_01: @00:00:00.005768  [00:00:00.005,767] <err> [main] os: Halting system
d_00: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_00: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_00: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_00: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_00: @00:00:00.005768  [00:00:00.005,767] <err> [sysworkq] bt_ctlr_hci: FC: Require Host ACL packets (2) < CONFIG_BT_BUF_CMD_TX_COUNT (2)
d_00: @00:00:00.005768  [00:00:00.005,767] <wrn> [main] bt_hci_core: opcode 0x0c33 status 0x12 
d_00: @00:00:00.005768  ASSERTION FAIL [!err] @ WEST_TOPDIR/zephyr/tests/bsim/bluetooth/host/l2cap/einprogress/src/dut.c:108
d_00: @00:00:00.005768  @ WEST_TOPDIR/zephyr/lib/os/assert.c:43
d_00: @00:00:00.005768  [00:00:00.005,767] <err> [main] os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
d_00: @00:00:00.005768  [00:00:00.005,767] <err> [main] os: Current thread: 0x826bc40 (main)
d_00: @00:00:00.005768  [00:00:00.005,767] <err> [main] os: Halting system
p_2G4: WARNING: (src/bs_pc_base.c:437): Device 0 left the party unsuspectingly.. I treat it as if it disconnected
timeout: the monitored command dumped core
/__w/zephyr/zephyr/tests/bsim/sh_common.source: line 22: 413863 Trace/breakpoint trap   $@
p_2G4: WARNING: (src/bs_pc_base.c:437): Device 1 left the party unsuspectingly.. I treat it as if it disconnected
timeout: the monitored command dumped core
/__w/zephyr/zephyr/tests/bsim/sh_common.source: line 22: 413864 Trace/breakpoint trap   $@

Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>
@cvinayak force-pushed the github_hci_cmd_buf_alloc_check branch from 3881e41 to 7c899ff on January 10, 2025 09:05
Comment on lines +205 to +208
When Controller to Host data flow control is enabled in the Host only
build, ensure that BT_BUF_CMD_TX_COUNT is at least 2, to be able to
send one normal HCI command and one Host Number of Completed Packets
command.
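A minimal sketch of why one buffer is not enough in the Host-only case, assuming that the buffer of a normal command stays allocated until its Command Complete arrives. The pool is modelled as a simple counter; `struct cmd_pool`, `cmd_buf_try_alloc()` and `hncp_fits_while_cmd_pending()` are hypothetical names, not Zephyr APIs. Allocation never blocks, matching the "Timeout discarded. No blocking in syswq" warnings in the log above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter model of the Host's HCI command buffer pool.
 * 'total' stands in for BT_BUF_CMD_TX_COUNT. */
struct cmd_pool {
	unsigned int total;
	unsigned int in_use;
};

/* Non-blocking allocation: fails immediately when the pool is empty. */
static bool cmd_buf_try_alloc(struct cmd_pool *pool)
{
	if (pool->in_use >= pool->total) {
		return false;
	}
	pool->in_use++;
	return true;
}

static void cmd_buf_free(struct cmd_pool *pool)
{
	assert(pool->in_use > 0);
	pool->in_use--;
}

/* A normal command is outstanding when the Host must send a Host Number
 * of Completed Packets command. Returns true if that second command can
 * get a buffer. */
static bool hncp_fits_while_cmd_pending(unsigned int cmd_tx_count)
{
	struct cmd_pool pool = { .total = cmd_tx_count, .in_use = 0 };
	bool normal_ok = cmd_buf_try_alloc(&pool); /* outstanding normal command */
	bool hncp_ok;

	assert(normal_ok);
	(void)normal_ok;

	/* Host Number of Completed Packets needs a second buffer right now. */
	hncp_ok = cmd_buf_try_alloc(&pool);
	if (hncp_ok) {
		cmd_buf_free(&pool);
	}
	cmd_buf_free(&pool); /* Command Complete for the normal command */
	return hncp_ok;
}
```

With a count of 1 the second allocation fails, which is the lost Host Number of Completed Packets command; with a count of 2 it succeeds.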
Collaborator

Can't this be enforced via Kconfig? Something like range 2 64 if !BT_CTLR && BT_HCI_ACL_FLOW_CONTROL?

Contributor Author

Yes, with depends on !BT_HCI_RAW || !HAS_BT_CTLR?

I want to keep BT_BUF_CMD_TX_COUNT configurable only for Host-only and Host+Controller builds, not for Controller-only builds. There it does not make sense, since the Controller only supports 1 Num_HCI_Command_Packets plus BT_BUF_ACL_RX_COUNT (for Host Number of Completed Packets commands). Where do you think is the right place to define BT_BUF_CMD_TX_COUNT internally, so that it can still be referenced in hci_raw.c and in the Controller code as it is today?

Collaborator

Can we keep it here, but make it promptless and default to 1 for Controller-only builds?

Comment on lines +210 to +213
When Controller to Host data flow control is supported in the
Controller-only build, ensure that BT_BUF_CMD_TX_COUNT is greater than
or equal to (BT_BUF_ACL_RX_COUNT + Ncmd), where Ncmd is the supported
maximum Num_HCI_Command_Packets in the Controller implementation.
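The two help-text rules quoted so far, plus the default of 1 without flow control, can be summarised as a small function. This is an illustrative sketch only: `enum build_type` and `min_cmd_tx_count()` are hypothetical, and combined Host+Controller builds are deliberately left out because the quoted help texts do not state a rule for them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build types; in Zephyr these are selected through Kconfig. */
enum build_type {
	BUILD_HOST_ONLY,
	BUILD_CONTROLLER_ONLY,
};

/* Minimum HCI command buffer count implied by the help texts.
 * acl_rx_count stands in for BT_BUF_ACL_RX_COUNT; ncmd is the maximum
 * Num_HCI_Command_Packets the Controller supports (1 for Zephyr's). */
static unsigned int min_cmd_tx_count(enum build_type type, bool acl_flow_control,
				     unsigned int acl_rx_count, unsigned int ncmd)
{
	assert(ncmd >= 1);

	if (!acl_flow_control) {
		/* Without Controller to Host flow control, 1 is sufficient. */
		return 1;
	}
	if (type == BUILD_CONTROLLER_ONLY) {
		/* Room for every Host Number of Completed Packets command plus
		 * the normal commands the Controller advertises. */
		return acl_rx_count + ncmd;
	}
	/* Host-only: one normal command plus one Host Number of Completed
	 * Packets command. */
	return 2;
}
```

For example, a Controller-only build with flow control, BT_BUF_ACL_RX_COUNT = 4 and Ncmd = 1 needs at least 5 command buffers.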
Collaborator

I don't think this can be enforced via Kconfig, but we can have BUILD_ASSERTs for it.
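A sketch of what such a compile-time check could look like. In Zephyr the check would use BUILD_ASSERT() against the real Kconfig symbols; to keep the example standalone it uses C11 static_assert, and the EXAMPLE_* macros are hypothetical placeholders for a Controller-only build with flow control enabled.

```c
#include <assert.h>

/* Hypothetical placeholder values for the Kconfig symbols involved. */
#define EXAMPLE_BUF_ACL_RX_COUNT 4
#define EXAMPLE_CTLR_NCMD        1 /* Zephyr Controller: Num_HCI_Command_Packets = 1 */
#define EXAMPLE_BUF_CMD_TX_COUNT 5

/* Fails the build if there are too few command buffers to hold one Host
 * Number of Completed Packets command per ACL RX buffer plus Ncmd normal
 * commands. */
static_assert(EXAMPLE_BUF_CMD_TX_COUNT >= EXAMPLE_BUF_ACL_RX_COUNT + EXAMPLE_CTLR_NCMD,
	      "Too few HCI command buffers for Host Number of Completed Packets");
```

Lowering EXAMPLE_BUF_CMD_TX_COUNT to 4 would turn the misconfiguration into a build error rather than a runtime allocation failure.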

Contributor Author

Yes, I added a runtime check on the Host Buffer Size command, and will derive the value for Controller-only builds (so that it cannot be configured incorrectly).

Comment on lines +561 to +563
* Note: Zephyr Controller does not support Num_HCI_Command_Packets > 1.
*/
if (acl_pkts >= CONFIG_BT_BUF_CMD_TX_COUNT) {
Collaborator

Should CONFIG_BT_BUF_CMD_TX_COUNT be limited to range 1 1 for the ZLL then?

Contributor Author

It should be 1 when Controller to Host data flow control is not enabled, and (1 + BT_BUF_ACL_RX_COUNT) when it is enabled.
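A standalone sketch of the runtime check discussed in this thread, modelled on the quoted condition `if (acl_pkts >= CONFIG_BT_BUF_CMD_TX_COUNT)` and on the log above, where Host Buffer Size (opcode 0x0c33) is rejected with status 0x12 (Invalid HCI Command Parameters). The helper `host_buffer_size_check()` is hypothetical; it assumes Num_HCI_Command_Packets is fixed at 1, so the command buffers must exceed the Host ACL packet count by at least one.

```c
#include <assert.h>
#include <stdint.h>

#define HCI_SUCCESS           0x00
#define HCI_ERR_INVALID_PARAM 0x12 /* status seen for opcode 0x0c33 in the log */

/* Hypothetical model of the check in the Host Buffer Size handler.
 * acl_pkts is the Host's total number of ACL data packets; cmd_tx_count
 * stands in for CONFIG_BT_BUF_CMD_TX_COUNT. With Ncmd = 1, this requires
 * cmd_tx_count >= acl_pkts + 1. */
static uint8_t host_buffer_size_check(uint16_t acl_pkts, uint16_t cmd_tx_count)
{
	assert(cmd_tx_count >= 1);

	if (acl_pkts >= cmd_tx_count) {
		return HCI_ERR_INVALID_PARAM;
	}
	return HCI_SUCCESS;
}
```

The values from the CI log (2 Host ACL packets, CONFIG_BT_BUF_CMD_TX_COUNT = 2) are rejected, while 1 packet with 2 buffers is accepted.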

Comment on lines +193 to +195
HCI Controllers may not support Num_HCI_Command_Packets > 1, hence
default to 1 when not enabling Controller to Host data flow control,
BT_HCI_ACL_FLOW_CONTROL.
Member

This paragraph isn't completely clear to me. Even if a controller supports ncmd > 1, the Zephyr host has a command queue and will always round down the value to 1, or rather, treat it as a boolean:

/* Allow next command to be sent */
if (ncmd) {
	k_sem_give(&bt_dev.ncmd_sem);
	bt_tx_irq_raise();
}
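To illustrate the "treat it as a boolean" point, here is a small model of the quoted handler. `struct sem_model` is a hypothetical stand-in for a Zephyr k_sem: like k_sem_give(), giving never raises the count above the limit passed at initialisation. Because the handler gives once per event regardless of the ncmd value, only one command can follow each event, even if the limit were larger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a k_sem: count never exceeds limit. */
struct sem_model {
	unsigned int count;
	unsigned int limit;
};

static void sem_give(struct sem_model *sem)
{
	if (sem->count < sem->limit) {
		sem->count++;
	}
}

static bool sem_try_take(struct sem_model *sem)
{
	if (sem->count == 0) {
		return false;
	}
	sem->count--;
	return true;
}

/* Mirrors the quoted code: any non-zero ncmd results in a single give. */
static void on_cmd_complete(struct sem_model *ncmd_sem, uint8_t ncmd)
{
	if (ncmd) {
		sem_give(ncmd_sem);
	}
}

/* Commands the Host could send back to back after one event. */
static unsigned int cmds_sendable(unsigned int limit, uint8_t ncmd)
{
	struct sem_model sem = { .count = 0, .limit = limit };
	unsigned int sent = 0;

	on_cmd_complete(&sem, ncmd);
	assert(sem.count <= sem.limit);
	while (sem_try_take(&sem)) {
		sent++;
	}
	return sent;
}
```

Both cmds_sendable(1, 5) and cmds_sendable(4, 5) yield 1, and ncmd = 0 yields 0, so the Controller's ncmd only acts as a go/no-go signal.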

The other help text paragraphs sort of make sense, but I just wanted to make sure the above host implementation detail is clear, so that this PR isn't making any false assumptions (since I haven't quite wrapped my head around all the details of it).

Contributor Author

Yes, the Zephyr Host only implements ncmd = 1. See:

/* Give cmd_sem allowing to send first HCI_Reset cmd, the only
 * exception is if the controller requests to wait for an
 * initial Command Complete for NOP.
 */
if (!IS_ENABLED(CONFIG_BT_WAIT_NOP)) {
	k_sem_init(&bt_dev.ncmd_sem, 1, 1);
} else {
	k_sem_init(&bt_dev.ncmd_sem, 0, 1);
}

I guess it would not be difficult to use the same semaphore to support ncmd > 1, but the native Controller does not support that today.

Contributor Author

the Zephyr host has a command queue and will always round down the value to 1, or rather, treat it as a boolean:

@jhedberg I believe the implementation "can" support ncmd > 1. Today, ncmd in the event is only used as a boolean to pause queuing, but ncmd_sem, if initialized correctly, would support ncmd > 1. This PR does not go in that direction, though.

@cvinayak requested a review from KyraLengfeld on January 10, 2025 10:01