
Add one more AT_DISPATCH for layer norm gamma scalar type #883

Draft
Liangliang-Ma wants to merge 1 commit into main

Conversation

Liangliang-Ma commented Sep 9, 2024

The original kernel only does AT_DISPATCH on the input dtype, while the weight/bias (gamma/beta) may have a different data type. It breaks when trying to get gamma.data_ptr<weight_t>(), because the wrong scalar type is used.

The same breakage is confirmed on CUDA.

This is fixed by adding one more AT_DISPATCH on the gamma/beta dtype.
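Roughly, the intended pattern is sketched below; the tensor and function names (X, gamma, beta, launch_layer_norm_kernel) are placeholders rather than the actual torch-xpu-ops code. The outer dispatch resolves the input dtype, the inner one resolves the gamma/beta dtype, so the weight/bias pointers are read through their own template type:

#include <ATen/ATen.h>
#include <ATen/Dispatch.h>

// Hypothetical kernel launcher, declared only to keep the sketch self-contained.
template <typename input_t, typename weight_t>
void launch_layer_norm_kernel(const input_t* X, const weight_t* gamma, const weight_t* beta);

void layer_norm_nested_dispatch_sketch(
    const at::Tensor& X, const at::Tensor& gamma, const at::Tensor& beta) {
  AT_DISPATCH_FLOATING_TYPES_AND2(
      at::ScalarType::Half, at::ScalarType::BFloat16,
      X.scalar_type(), "layer_norm_xpu", [&] {
        // Capture the input dtype before the inner dispatch shadows scalar_t.
        using input_t = scalar_t;
        AT_DISPATCH_FLOATING_TYPES_AND2(
            at::ScalarType::Half, at::ScalarType::BFloat16,
            gamma.scalar_type(), "layer_norm_xpu_gamma", [&] {
              using weight_t = scalar_t;  // dtype of gamma/beta, may differ from input_t
              launch_layer_norm_kernel<input_t, weight_t>(
                  X.data_ptr<input_t>(),
                  gamma.data_ptr<weight_t>(),  // matches gamma's real dtype (e.g. float32)
                  beta.data_ptr<weight_t>());
            });
      });
}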

xytintel (Contributor) commented

@Liangliang-Ma What scenarios need this fix? Since CUDA doesn’t specifically address this issue, is it necessary? @fengyuan14 @EikanWang

Liangliang-Ma (Author) commented Sep 11, 2024

@Liangliang-Ma What scenarios need this fix? Since CUDA doesn’t specifically address this issue, is it necessary? @fengyuan14 @EikanWang

I met this in an IPEX rebase 2.5 workload JIRA: a BART model doing a question-answering task in float16 precision (no AMP). It turned out to be easy to reproduce in a single UT, which also fails with the latest stock PyTorch wheel on CUDA.

import torch
from torch.nn import LayerNorm

dtype1 = torch.float16
# Input is float16, but LayerNorm's weight/bias default to float32.
inputs = torch.rand([1, 1024, 4096], dtype=dtype1, requires_grad=True).cuda()
ln = LayerNorm(4096, eps=1e-05, elementwise_affine=True).cuda()
outputs = ln(inputs)
# RuntimeError: expected scalar type Half but found Float

More detailed description:
LayerNorm's weight/bias are initialized as float32 by default if no dtype is specified in the Python interface. The layer norm kernel then receives parameters with mixed dtypes: input float16, weight float32, bias float32. But currently AT_DISPATCH dispatches all three to the same type <scalar_t, scalar_t, scalar_t>, which ends up failing when getting the data_ptr.
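For reference, a minimal sketch of why the single dispatch fails (again with placeholder tensor names, not the actual kernel source): data_ptr<T>() checks that T matches the tensor's dtype, so reading a float32 gamma as Half raises the error above.

#include <ATen/ATen.h>
#include <ATen/Dispatch.h>

void layer_norm_single_dispatch_sketch(
    const at::Tensor& X, const at::Tensor& gamma, const at::Tensor& beta) {
  AT_DISPATCH_FLOATING_TYPES_AND2(
      at::ScalarType::Half, at::ScalarType::BFloat16,
      X.scalar_type(), "layer_norm_xpu", [&] {
        // scalar_t is at::Half when the input is float16.
        const scalar_t* input_ptr = X.data_ptr<scalar_t>();
        // gamma/beta default to float32, so data_ptr's dtype check throws:
        // "RuntimeError: expected scalar type Half but found Float".
        const scalar_t* gamma_ptr = gamma.data_ptr<scalar_t>();
        const scalar_t* beta_ptr = beta.data_ptr<scalar_t>();
        // (kernel launch omitted in this sketch)
      });
}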

xytintel marked this pull request as draft November 5, 2024 03:52