About RoPE in the sampling process #54

Open
Leedonus opened this issue Aug 6, 2024 · 6 comments

Leedonus commented Aug 6, 2024

Hi, thanks for the interesting work. Why is the PE (positional embedding) of the text tokens set to zero during the generation process?
[screenshot: the sampling code where the text tokens' positional embedding is set to zero]

@daiyixiang666

This comment was marked as abuse.


zxduan90 commented Aug 6, 2024

@daiyixiang666 But these zeros will make the q and k of the text tokens zero, because

xq = apply_rotary_emb(xq, freqs_cis)

multiplies xq elementwise by freqs_cis. The attention scores between an output token and every text token will then be identical.
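
For concreteness, here is a minimal sketch of a typical complex-multiplication RoPE (an illustrative reconstruction, not necessarily this repo's exact apply_rotary_emb): a freqs_cis that is literally zeroed annihilates the query, whereas a true position 0 would be the identity rotation e^{i*0} = 1.

```python
import torch

def apply_rotary_emb(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # View the head dim as (dim/2) complex pairs and rotate each by freqs_cis.
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    return torch.view_as_real(x_c * freqs_cis).flatten(-2).type_as(x)

head_dim = 8
xq = torch.randn(1, 1, head_dim)  # one text-token query

# Zeroed "PE": the complex multiply wipes xq out entirely.
zeros = torch.zeros(1, head_dim // 2, dtype=torch.complex64)
print(apply_rotary_emb(xq, zeros))  # all zeros -> q.k logits are 0

# Identity rotation (what position 0 normally means): xq is unchanged.
ones = torch.ones(1, head_dim // 2, dtype=torch.complex64)
print(torch.allclose(apply_rotary_emb(xq, ones), xq))  # True
```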

@daiyixiang666

Oh, I see, so the text only takes effect via xv?
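
A tiny single-head attention sketch (illustrative shapes, not this repo's code) shows exactly that: with zeroed text keys, every query scores 0 against every text token, softmax gives them equal weight, and the text contribution collapses to a plain average of its values.

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 4)                # one image-token query
k_text = torch.zeros(3, 4)           # text keys zeroed by the RoPE issue above
v_text = torch.randn(3, 4)           # text values still carry content
logits = q @ k_text.T                # tensor([[0., 0., 0.]])
weights = F.softmax(logits, dim=-1)  # uniform: tensor([[1/3, 1/3, 1/3]])
out = weights @ v_text               # a plain mean of the text values
print(torch.allclose(out, v_text.mean(dim=0, keepdim=True)))  # True
```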


zxduan90 commented Aug 6, 2024

@daiyixiang666 Yes, I think there is a problem here.


zxduan90 commented Aug 6, 2024

The text tokens would then have less influence on the image than in Stable Diffusion-style methods.

@daiyixiang666

And would it also mean the text attention mask is useless?
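
A quick sketch of that question (again illustrative, not this repo's masking code): a -inf mask still removes text tokens from the softmax, so it can still select which text tokens contribute their values, but since the pre-mask logits are all zero it can no longer reweight them by content.

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 4)
k_text = torch.zeros(5, 4)
logits = q @ k_text.T                       # all zeros, whatever the text says
mask = torch.tensor([[0., 0., float("-inf"), 0., float("-inf")]])
weights = F.softmax(logits + mask, dim=-1)  # uniform over the unmasked tokens
print(weights)  # tensor([[0.3333, 0.3333, 0.0000, 0.3333, 0.0000]])
```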
