
Code for compared methods #12

Open
AshkanTaghipour opened this issue Dec 17, 2023 · 12 comments

Comments

@AshkanTaghipour

Hi,
Thank you for your interesting work and in-depth analysis.
I would like to ask about the possibility of releasing the code for the compared methods in Table 2.

@Karine-Huang
Owner

Karine-Huang commented Dec 19, 2023

Hi! The compared methods are all from the official repos, and their usage is explained in T2I-CompBench.
The CLIP code can be found in CLIP_similarity, with the argument "--complex" set to False for Table 2.
B-CLIP uses BLIP to generate captions and CLIP to calculate the score.
B-VQA-n uses the direct prompt as the question, without disentangling it.
The models are also from the official repos, with the same generator as in inference_eval.
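For reference, here is a minimal sketch of the image-text CLIP similarity that such a CLIP metric computes, using the openai `clip` package. The model name, file path, and prompt are placeholders, not the exact CLIP_similarity code.

```python
# Minimal sketch of image-text CLIP similarity (not the exact CLIP_similarity script;
# model choice, preprocessing, and prompt handling are assumptions).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.png")).unsqueeze(0).to(device)  # placeholder path
text = clip.tokenize(["a green bench and a red car"]).to(device)       # placeholder prompt

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    score = (image_feat @ text_feat.T).item()  # cosine similarity

print(f"CLIP image-text similarity: {score:.4f}")
```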

@AshkanTaghipour
Author


Thank you!
For the BLIP-VQA evaluation, is the final accuracy the average of the answers in the "vqa_result.json" file in the "examples/annotation_blip/" directory?
(screenshot of vqa_result.json attached)

@Karine-Huang
Owner

Yes, the BLIP-VQA score is the average of the results in "vqa_result.json". We have updated BLIPvqa_eval/BLIP_vqa.py at L#117-120 to complete the calculation of the average. Thank you!
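For anyone reproducing this, a minimal sketch of that averaging step is below. It assumes vqa_result.json is a list of entries whose "answer" field holds a numeric score; the key names are assumptions, and the updated BLIPvqa_eval/BLIP_vqa.py (L#117-120) is authoritative.

```python
# Sketch of averaging BLIP-VQA answers from vqa_result.json.
# Assumes a list of dicts with a numeric (or numeric-string) "answer" field;
# the official averaging lives in BLIPvqa_eval/BLIP_vqa.py (L#117-120).
import json

with open("examples/annotation_blip/vqa_result.json") as f:
    results = json.load(f)

scores = [float(r["answer"]) for r in results]
print(f"BLIP-VQA score: {sum(scores) / len(scores):.4f} over {len(scores)} answers")
```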

@Retr0573


Hello! And an early Happy New Year! As you said, I can see the B-VQA, CLIP, UniDet, and MiniGPT-CoT evaluations in your project, but B-CLIP and B-VQA-n are not in the project, right? I just want to check this. Thank you for this work, it helps a lot!

@AshkanTaghipour
Author


Thank you for your response. For the UniDet evaluation, there is one file after evaluation called vqa_result.json; for BLIP, however, there are additional files such as color_test.json that provide a one-to-one mapping between a given image and its result. Is it possible to add something similar for UniDet as well?
Attached is the vqa_result.json for the UniDet evaluation; as you know, analysing such a result is hard when working with several generated images per prompt (several seeds).
(screenshot of vqa_result.json attached)

@Karine-Huang
Owner


Hello! The mapping file is added in UniDet_eval/determine_position_for_eval.py, and it will be saved as mapping.json in the same directory of vqa_result.json. Thank you!
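Once mapping.json is available, per-prompt averages over seeds can be computed along these lines. This is only a sketch: the key names ("question_id", "answer"), the mapping direction, and the "<prompt>_<seed>.png" filename convention are assumptions, not the repo's exact schema.

```python
# Sketch: pair UniDet scores with images via mapping.json, then average over seeds per prompt.
# Key names and the "<prompt>_<seed>.png" filename convention are assumptions.
import json
from collections import defaultdict

with open("vqa_result.json") as f:
    scores = {str(r["question_id"]): float(r["answer"]) for r in json.load(f)}
with open("mapping.json") as f:
    mapping = json.load(f)  # assumed to map question_id -> image filename

per_prompt = defaultdict(list)
for qid, image_name in mapping.items():
    prompt = image_name.rsplit("_", 1)[0]  # strip the assumed "_<seed>.png" suffix
    per_prompt[prompt].append(scores[str(qid)])

for prompt, vals in sorted(per_prompt.items()):
    print(prompt, sum(vals) / len(vals))
```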

@Karine-Huang
Owner


Hello!
For B-CLIP, we use the same approach as in Attend-and-Excite. You can use the official repo of BLIP to generate captions and then use CLIP text-text similarity to calculate the score between the generated captions and the ground-truth prompts.
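As a reference, here is a minimal sketch of that scoring step, assuming the captions have already been generated with the official BLIP repo; the caption, prompt, and CLIP model name are placeholders.

```python
# Sketch of B-CLIP scoring: CLIP text-text similarity between a BLIP-generated
# caption and the ground-truth prompt. Caption generation (official BLIP repo)
# is assumed to happen beforehand; this only shows the similarity step.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

caption = "a red car parked next to a green bench"  # placeholder BLIP caption
prompt = "a green bench and a red car"              # ground-truth prompt

with torch.no_grad():
    feats = model.encode_text(clip.tokenize([caption, prompt]).to(device))
    feats = feats / feats.norm(dim=-1, keepdim=True)
    score = (feats[0] @ feats[1]).item()  # cosine similarity

print(f"B-CLIP text-text similarity: {score:.4f}")
```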

For B-VQA-n, as explained in the paper: "BLIP-VQA-naive (denoted as B-VQA-n) applies BLIP VQA to ask a single question (e.g., a green bench and a red car?) with the whole prompt." All you need to do in BLIPvqa_eval/BLIP_vqa.py is (1) set np_num=1 and (2) replace L#32-37 with image_dict['question'] = f'{f}?'.

Hope this helps!

@AshkanTaghipour
Author


Thank you very much,
is it possible to add the spatial checkpoints for GORS to the repo?
thanks

@Karine-Huang
Owner


The checkpoints are added in GORS_finetune/checkpoint. Thank you!

@Chao0511

Chao0511 commented Jan 29, 2024


Hello, thank you for releasing the code.
About your fine-tuning process:
(1) Why did you choose 5e-6 as the text encoder's learning rate? Does 5e-5 not work, for example?
(2) Did you find the denoising loss gradually growing, or did it remain quite stable?
(3) Why didn't you consider an unconditional denoising loss, e.g. randomly dropping 10% of text prompts to improve classifier-free guidance?
Many thanks!

@Karine-Huang
Owner

Karine-Huang commented Jan 31, 2024


Thanks for your questions!
For (1):
We chose 5e-6 as the text encoder's learning rate because it was set as the default value in the linked training script.
For (2):
The denoising loss exhibited some oscillations, but overall it trended downwards.
For (3):
We did not consider an unconditional denoising loss, such as randomly dropping 10% of text prompts to enhance classifier-free guidance. While this approach may improve performance, our focus was primarily on demonstrating the positive effect of the reweighting method on alignment, so we did not experiment with it.
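For completeness, the idea in (3) is usually implemented by randomly replacing a fraction of captions with the empty string during fine-tuning. A minimal sketch follows; it is not part of the released GORS training code, and the 10% rate is just the figure mentioned in the question.

```python
# Sketch of the classifier-free-guidance trick from (3): randomly drop ~10% of
# text prompts (replace them with "") so the model also learns an unconditional
# branch. Not part of the released GORS fine-tuning code.
import random

def maybe_drop_prompt(prompt: str, drop_prob: float = 0.1) -> str:
    """Return an empty caption with probability drop_prob, else the original prompt."""
    return "" if random.random() < drop_prob else prompt

# Example inside a (hypothetical) training loop:
batch_prompts = ["a green bench and a red car", "a blue bird on a yellow chair"]
conditioned = [maybe_drop_prompt(p) for p in batch_prompts]
print(conditioned)
```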

@Chao0511

Chao0511 commented Feb 1, 2024


Thank you very much for your reply, it is very clear. :)
