
question! #111

Open
bxhsort opened this issue Dec 17, 2024 · 2 comments

Comments

@bxhsort

bxhsort commented Dec 17, 2024

Hi, I'd like to ask a question: is the idea of adding convolutional layers to retain more information the same as in diffusion models? Is Algorithm 1 for denoising, and Algorithm 2 for denoising the reconstructed image? I have a question about this part and hope you can answer it. Thank you very much!

@iFighting
Contributor

iFighting commented Dec 17, 2024

> Hi, I'd like to ask a question: is the idea of adding convolutional layers to retain more information the same as in diffusion models? Is Algorithm 1 for denoising, and Algorithm 2 for denoising the reconstructed image?

@bxhsort Actually, that is not the case. Algorithms 1 and 2 describe the VAR image tokenizer's encoding and decoding, not the image generation process. Image generation is done by the autoregressive process.
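For readers landing here, the encode/decode setting being described can be sketched as multi-scale residual vector quantization: each scale quantizes only the residual left by coarser scales, and decoding sums the upsampled quantized maps. This is a minimal NumPy sketch under simplifying assumptions — average-pool downsampling and nearest-neighbor upsampling stand in for the paper's learned interpolation/convolution layers, and the codebook, shapes, and function names are hypothetical, not the repository's API.

```python
import numpy as np

def quantize(z, codebook):
    """Map each spatial position of z (H, W, C) to its nearest codebook entry."""
    flat = z.reshape(-1, z.shape[-1])
    dists = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = dists.argmin(1)
    return idx.reshape(z.shape[:-1]), codebook[idx].reshape(z.shape)

def down(z, k):
    """Average-pool a (H, W, C) feature map down to (k, k, C)."""
    H, W, C = z.shape
    return z.reshape(k, H // k, k, W // k, C).mean(axis=(1, 3))

def up(z, H):
    """Nearest-neighbor upsample a (k, k, C) map back to (H, H, C)."""
    r = H // z.shape[0]
    return np.repeat(np.repeat(z, r, axis=0), r, axis=1)

def encode(f, codebook, scales):
    """Algorithm-1-style encoding: at each scale, quantize the downsampled
    residual, then subtract its upsampled reconstruction from what remains."""
    H = f.shape[0]
    tokens, residual = [], f.copy()
    for k in scales:
        idx, zq = quantize(down(residual, k), codebook)
        tokens.append(idx)
        residual = residual - up(zq, H)   # keep only the unexplained part
    return tokens

def decode(tokens, codebook, scales, H):
    """Algorithm-2-style decoding: sum the upsampled quantized maps."""
    f_hat = np.zeros((H, H, codebook.shape[1]))
    for idx, k in zip(tokens, scales):
        f_hat += up(codebook[idx.reshape(-1)].reshape(k, k, -1), H)
    return f_hat
```

The `tokens` produced this way (one index map per scale) are what the stage-2 autoregressive model is trained to predict; the tokenizer itself never runs a denoising process.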

@bxhsort
Author

bxhsort commented Dec 17, 2024

> Hi, I'd like to ask a question: is the idea of adding convolutional layers to retain more information the same as in diffusion models? Is Algorithm 1 for denoising, and Algorithm 2 for denoising the reconstructed image?

> @bxhsort Actually, that is not the case. Algorithms 1 and 2 describe the VAR image tokenizer's encoding and decoding, not the image generation process. Image generation is done by the autoregressive process.

Thank you for your answer. My understanding is that Algorithms 1 and 2 provide the ground truth for the subsequent autoregressive training. Is the purpose of this process to better train the tokens and their embeddings, similar to how tokens are trained in NLP? Stage 2 is then the autoregressive process, which is also the inference process.

In Algorithm 1, after you extract features with the convolution layer, you subtract them from the image feature f. I have some doubts about this step: why does doing this preserve information well? Looking forward to your reply, thank you very much!
[Attached screenshot: 屏幕截图 2024-12-17 164004.png — upload did not complete]
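Regarding the subtraction asked about above: it is the standard residual-quantization trick, where the error left by a coarse quantization is itself encoded by the next scale rather than being thrown away. A toy one-dimensional sketch (the codebooks and values here are hypothetical, chosen only to illustrate the idea, and are not taken from the paper):

```python
# Toy 1-D residual quantization: the subtraction keeps the quantization
# error so a finer scale can encode it, instead of discarding it.
coarse = [0.0, 0.5, 1.0]   # hypothetical coarse codebook
fine = [-0.25, 0.0, 0.25]  # hypothetical finer codebook

def nearest(x, codebook):
    """Return the codebook entry closest to x."""
    return min(codebook, key=lambda c: abs(c - x))

f = 0.8                    # feature value to encode
q1 = nearest(f, coarse)    # coarse code: 1.0
r1 = f - q1                # the subtraction: residual -0.2 is kept
q2 = nearest(r1, fine)     # finer code encodes most of that error: -0.25
recon = q1 + q2            # 0.75, closer to f than q1 alone
```

Because each scale only has to model what the previous scales missed, the sum of reconstructions converges toward f, which is why the subtraction "saves" information rather than losing it.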
