HaloAttention mask problem #121

Open
L-wwww opened this issue Dec 19, 2024 · 0 comments

Comments

@L-wwww

L-wwww commented Dec 19, 2024

    # mask out padding (in the paper, they claim to not need masks, but what about padding?)

    mask = torch.ones(1, 1, h, w, device = device)
    mask = F.unfold(mask, kernel_size = block + (halo * 2), stride = block, padding = halo)
    mask = repeat(mask, '() j i -> (b i h) () j', b = b, h = heads)
    mask = mask.bool()

    max_neg_value = -torch.finfo(sim.dtype).max
    sim.masked_fill_(mask, max_neg_value)

The code appears to fill every non-zero (i.e. valid, non-padding) position with the large negative value, whereas, judging from the comment, the intent seems to be to fill the padding positions with the large negative value instead.
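
For reference, here is a minimal, self-contained sketch of what the comment appears to intend, assuming the padding positions (the zeros that F.unfold introduces around the borders) are the ones that should be filled with the large negative value. All shape parameters (b, heads, block, halo, h, w) are made up for illustration; the only change from the snippet above is inverting the mask before masked_fill_:

    import torch
    import torch.nn.functional as F
    from einops import repeat

    # hypothetical sizes, for illustration only
    b, heads = 2, 4
    block, halo = 8, 3
    h = w = 32
    device = 'cpu'

    # same mask construction as in the snippet above:
    # ones everywhere, zeros where F.unfold had to pad around the borders
    mask = torch.ones(1, 1, h, w, device = device)
    mask = F.unfold(mask, kernel_size = block + (halo * 2), stride = block, padding = halo)
    mask = repeat(mask, '() j i -> (b i h) () j', b = b, h = heads)
    mask = mask.bool()   # True = valid position, False = padding

    # dummy attention logits with a matching shape:
    # (batch * num_blocks * heads, queries per block, keys per block-with-halo)
    num_blocks = (h // block) * (w // block)
    sim = torch.randn(b * num_blocks * heads, block ** 2, (block + halo * 2) ** 2)

    max_neg_value = -torch.finfo(sim.dtype).max
    # invert the mask so that only the padding positions (False above) are filled,
    # which is what the source comment seems to describe
    sim.masked_fill_(~mask, max_neg_value)

With the mask as constructed above, valid positions are True and padding positions are False, so passing it to masked_fill_ directly masks the valid positions; inverting it (or equivalently masking where mask == 0) masks the padding instead.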
