Does DeepSpeed MoE support #device < #expert during inference #4574
Unanswered
iteratorlee asked this question in Q&A
I've been trying out DeepSpeed for MoE inference, but I found that the code at lines 269-272 of DeepSpeed/deepspeed/inference/engine.py might be buggy.
As shown in the attached screenshot of that code, at line 269, if dist.get_world_size() is smaller than moe_ep_size (say there are 2 GPUs and 8 experts, so moe_ep_size=8), num_ep_groups evaluates to 0 because of the integer division, and the if branch at line 270 is never reached.
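In case the screenshot doesn't load, here is a minimal, self-contained sketch of the pattern I mean (illustrative only; the variable names and the rank check are my own placeholders, not the exact engine.py code):

```python
# Sketch of the expert-parallel group setup I'm describing (illustrative only).
world_size = 2      # stand-in for dist.get_world_size() with 2 GPUs
moe_ep_size = 8     # expert-parallel size for 8 experts
my_rank = 0         # stand-in for dist.get_rank()

num_ep_groups = world_size // moe_ep_size   # 2 // 8 == 0
for i in range(num_ep_groups):              # range(0): the body never executes
    ranks = list(range(i * moe_ep_size, (i + 1) * moe_ep_size))
    if my_rank in ranks:                    # this branch can never be reached
        print(f"rank {my_rank} joins expert-parallel group {ranks}")

print(f"num_ep_groups = {num_ep_groups}")   # prints 0: no EP groups created
```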
I wonder whether this code is simply redundant, or whether DeepSpeed does not support the case where #device < #expert (i.e., one device holds multiple experts) in MoE inference (I'm pretty sure this is supported in training)?
In addition, I'd like to know whether there are any off-the-shelf, open-sourced pre-trained MoE models that can be used with DeepSpeed inference to test expert parallelism.
Thanks. Any replies would be appreciated! @tjruwase