
Code for compared methods #12

Open
AshkanTaghipour opened this issue Dec 17, 2023 · 12 comments

Comments

@AshkanTaghipour

Hi,
Thank you for your interesting work and in-depth analysis.
I would like to ask about the possibility of releasing the code for the compared methods in Table 2.

@Karine-Huang
Owner

Karine-Huang commented Dec 19, 2023

Hi! The compared methods are all from the official repos, and their usage is explained in T2I-CompBench.
The CLIP code can be found in CLIP_similarity, with the argument "--complex" set to False for Table 2.
B-CLIP uses BLIP to generate captions and CLIP to calculate the score.
B-VQA-n uses the direct prompt as the question, without disentangling it.
The models are also from the official repos, with the same generator as in inference_eval.
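For reference, here is a minimal sketch of the image-text CLIP similarity that such a CLIP metric computes, using the openai `clip` package. The model name, file path, and prompt are placeholders, not the exact CLIP_similarity code.

```python
# Minimal sketch of image-text CLIP similarity (not the exact CLIP_similarity script;
# model choice, preprocessing, and prompt handling are assumptions).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.png")).unsqueeze(0).to(device)  # placeholder path
text = clip.tokenize(["a green bench and a red car"]).to(device)       # placeholder prompt

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    score = (image_feat @ text_feat.T).item()  # cosine similarity

print(f"CLIP image-text similarity: {score:.4f}")
```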

@AshkanTaghipour
Author


Thank you!
For the BLIP-VQA evaluation, is the final accuracy the average of the answers in the "vqa_result.json" file in the "examples/annotation_blip/" directory?
(screenshot of vqa_result.json attached)

@Karine-Huang
Owner

Yes, the BLIP-VQA score is the average of the results in "vqa_result.json". We have updated BLIPvqa_eval/BLIP_vqa.py at L#117-120 to complete the calculation of the average. Thank you!
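For anyone reproducing this, a minimal sketch of that averaging step is below. It assumes vqa_result.json is a list of entries whose "answer" field holds a numeric score; the key names are assumptions, and the updated BLIPvqa_eval/BLIP_vqa.py (L#117-120) is authoritative.

```python
# Sketch of averaging BLIP-VQA answers from vqa_result.json.
# Assumes a list of dicts with a numeric (or numeric-string) "answer" field;
# the official averaging lives in BLIPvqa_eval/BLIP_vqa.py (L#117-120).
import json

with open("examples/annotation_blip/vqa_result.json") as f:
    results = json.load(f)

scores = [float(r["answer"]) for r in results]
print(f"BLIP-VQA score: {sum(scores) / len(scores):.4f} over {len(scores)} answers")
```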

@Retr0573


Hello! And an early Happy New Year! As you said, I can see the B-VQA, CLIP, UniDet, and MiniGPT-CoT evaluations in your project, but B-CLIP and B-VQA-n are not in the project, right? I just want to check this. Thank you for this work, it helps a lot!

@AshkanTaghipour
Author


Thank you for your response. For the UniDet evaluation, there is one file after evaluation called vqa_result.json; for BLIP, however, there are additional files such as color_test.json that provide a one-to-one mapping between a given image and its result. Is it possible to add something similar for UniDet as well?
Attached is the vqa_result.json for the UniDet evaluation; as you know, analysing such a result is hard when working with several generated images per prompt (several seeds).
(screenshot of vqa_result.json attached)

@Karine-Huang
Owner


Hello! The mapping file is added in UniDet_eval/determine_position_for_eval.py, and it will be saved as mapping.json in the same directory of vqa_result.json. Thank you!
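Once mapping.json is available, per-prompt averages over seeds can be computed along these lines. This is only a sketch: the key names ("question_id", "answer"), the mapping direction, and the "<prompt>_<seed>.png" filename convention are assumptions, not the repo's exact schema.

```python
# Sketch: pair UniDet scores with images via mapping.json, then average over seeds per prompt.
# Key names and the "<prompt>_<seed>.png" filename convention are assumptions.
import json
from collections import defaultdict

with open("vqa_result.json") as f:
    scores = {str(r["question_id"]): float(r["answer"]) for r in json.load(f)}
with open("mapping.json") as f:
    mapping = json.load(f)  # assumed to map question_id -> image filename

per_prompt = defaultdict(list)
for qid, image_name in mapping.items():
    prompt = image_name.rsplit("_", 1)[0]  # strip the assumed "_<seed>.png" suffix
    per_prompt[prompt].append(scores[str(qid)])

for prompt, vals in sorted(per_prompt.items()):
    print(prompt, sum(vals) / len(vals))
```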

@Karine-Huang
Owner


Hello!
For B-CLIP, we use the same approach as in Attend-and-Excite. You can use the official repo of BLIP to generate captions and then use CLIP text-text similarity to calculate the score between the generated captions and the ground-truth prompts.
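As a reference, here is a minimal sketch of that scoring step, assuming the captions have already been generated with the official BLIP repo; the caption, prompt, and CLIP model name are placeholders.

```python
# Sketch of B-CLIP scoring: CLIP text-text similarity between a BLIP-generated
# caption and the ground-truth prompt. Caption generation (official BLIP repo)
# is assumed to happen beforehand; this only shows the similarity step.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

caption = "a red car parked next to a green bench"  # placeholder BLIP caption
prompt = "a green bench and a red car"              # ground-truth prompt

with torch.no_grad():
    feats = model.encode_text(clip.tokenize([caption, prompt]).to(device))
    feats = feats / feats.norm(dim=-1, keepdim=True)
    score = (feats[0] @ feats[1]).item()  # cosine similarity

print(f"B-CLIP text-text similarity: {score:.4f}")
```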

For B-VQA-n, as explained in the paper: "BLIP-VQA-naive (denoted as B-VQA-n) applies BLIP VQA to ask a single question (e.g., a green bench and a red car?) with the whole prompt." All you need to do in BLIPvqa_eval/BLIP_vqa.py is (1) set np_num=1 and (2) replace L#32-37 with image_dict['question'] = f'{f}?'.

Hope this helps!

@AshkanTaghipour
Author


Thank you very much,
is it possible to add the spatial checkpoints for GORS to the repo?
thanks

@Karine-Huang
Owner


The checkpoints are added in GORS_finetune/checkpoint. Thank you!

@Chao0511

Chao0511 commented Jan 29, 2024


Hello, thank you for releasing the code.
About your fine-tuning process:
(1) Why did you choose 5e-6 as the text encoder's learning rate? Does 5e-5 not work, for example?
(2) Did you find the denoising loss gradually growing, or did it remain quite stable?
(3) Why didn't you consider an unconditional denoising loss, e.g. randomly dropping 10% of text prompts to improve classifier-free guidance?
Many thanks!

@Karine-Huang
Owner

Karine-Huang commented Jan 31, 2024


Thanks for your questions!
For (1):
We chose 5e-6 as the text encoder's learning rate because it was set as the default value in the linked training script.
For (2):
The denoising loss exhibited some oscillations, but overall it trended downwards.
For (3):
We did not consider an unconditional denoising loss, such as randomly dropping 10% of text prompts to enhance classifier-free guidance. While this approach may improve performance, our focus was primarily on demonstrating the positive effect of the reweighting method on alignment, so we did not experiment with it.
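For completeness, the idea in (3) is usually implemented by randomly replacing a fraction of captions with the empty string during fine-tuning. A minimal sketch follows; it is not part of the released GORS training code, and the 10% rate is just the figure mentioned in the question.

```python
# Sketch of the classifier-free-guidance trick from (3): randomly drop ~10% of
# text prompts (replace them with "") so the model also learns an unconditional
# branch. Not part of the released GORS fine-tuning code.
import random

def maybe_drop_prompt(prompt: str, drop_prob: float = 0.1) -> str:
    """Return an empty caption with probability drop_prob, else the original prompt."""
    return "" if random.random() < drop_prob else prompt

# Example inside a (hypothetical) training loop:
batch_prompts = ["a green bench and a red car", "a blue bird on a yellow chair"]
conditioned = [maybe_drop_prompt(p) for p in batch_prompts]
print(conditioned)
```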

@Chao0511

Chao0511 commented Feb 1, 2024


Thank you very much for your reply, it is very clear. :)
