Attention docstring missing head dimension for arguments mask and nonbatched_bias #894

louis-rf · 2024-02-05T11:19:47Z

Linked to the following PR and fork.

The Attention module is only called in four places, with the following shapes:

- TriangleAttention:           [q_data: (B, Q, ..); m_data: (B, K, ..); mask: (B, H=1, Q=1, K); nonbatched_bias: (H, Q, K)]
- TemplateEmbedding:           [q_data: (B, Q, ..); m_data: (B, K, ..); mask: (B, H=1, Q=1, K)]
- MSARowAttentionWithPairBias: [q_data: (B, Q, ..); m_data: (B, K, ..); mask: (B, H=1, Q=1, K); nonbatched_bias: (H, Q, K)]
- MSAColumnAttention:          [q_data: (B, Q, ..); m_data: (B, K, ..); mask: (B, H=1, Q=1, K)]

In every case the mask has singleton axes inserted at the head and query positions (its second and third axes). All four calls are wrapped in alphafold.model.mapping.inference_subbatch, but this only changes the axis sizes, not the number of dimensions.

The docstring gives incorrect shapes for the arguments mask and nonbatched_bias. Based on the usage, it should read:

      mask: A mask for the attention, shape [batch_size, N_heads, N_queries, N_keys].
      nonbatched_bias: Shared bias, shape [N_heads, N_queries, N_keys].

instead of:

      mask: A mask for the attention, shape [batch_size, N_queries, N_keys].
      nonbatched_bias: Shared bias, shape [N_queries, N_keys].

This is clear when looking at where mask is used:

...

    logits = jnp.einsum('bqhc,bkhc->bhqk', q, k)
    if nonbatched_bias is not None:
      logits += jnp.expand_dims(nonbatched_bias, axis=0)
    logits = jnp.where(mask, logits, _SOFTMAX_MASK)
...

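The output of the einsum has shape bhqk, so both the bias (after expand_dims) and the mask have to broadcast against four axes: batch, heads, queries, keys.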

I believe some implementations of attention won't have a head dimension in the mask. Since the head dimension of the mask is never actually used in AlphaFold (it is always size 1), it might be worth dropping it at the call sites and adding an expand_dims for the head dimension inside the Attention module. However, changing only the docstring is easier, and the current code is still a valid implementation of attention, so I think that is the way to go.
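For comparison, the alternative mentioned above could look roughly like the sketch below. This is not what the issue proposes to merge; add_head_axis is a hypothetical helper, and NumPy is used in place of jax.numpy with illustrative sizes.

```python
import numpy as np


def add_head_axis(mask):
    """Hypothetical helper: insert a head axis, (B, Q, K) -> (B, 1, Q, K)."""
    return np.expand_dims(mask, axis=1)


# Illustrative sizes only: batch, heads, queries, keys.
B, H, Q, K = 2, 4, 3, 5
logits = np.zeros((B, H, Q, K))

# Callers would pass the mask without a head axis, here (B, Q=1, K).
mask = np.ones((B, 1, K), dtype=bool)
mask[..., -1] = False  # mask out the last key

# Inside the module, the head axis is added before masking.
masked = np.where(add_head_axis(mask), logits, -1e9)
print(masked.shape)
```

The trade-off is that every call site would need to change, which is why the docstring-only fix is the smaller change.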
