What's the difference between the base and DAVE model files, and between 1-shot and 3-shot? Can I use 5 shots or more? #5

Open
lisenjie757 opened this issue Jun 15, 2024 · 6 comments

Comments

@lisenjie757

lisenjie757 commented Jun 15, 2024

The demo only provides 3-shot inference. How can I run zero-shot inference on my own image? Could you provide a demo_zero? Thank you!

@jerpelhan
Owner

The provided models are optimized for a specific number of inputs. I believe the method should work when adding more exemplars, but I am unsure if it yields better results. If you test this, let me know your findings.
I will see whether I have time to add a zero-shot demo in the future.

@GioFic95
Contributor

I would also appreciate a clarification of the difference between the models base_0_shot.pth and DAVE_0_shot.pth, base_3_shot.pth and DAVE_3_shot.pth.

Moreover, I'm trying to implement a zero-shot demo, but I have some doubts:

  • From the zero-shot test script it looks like the num_objects parameter should be 3 (ref), in fact, zero-shot models (i.e. base_0_shot.pth and DAVE_0_shot.pth) have objectness shape equal to (3, 9, 256), where the first dimension is self.num_objects (ref). Could you explain why?
  • Running your demo with --num_objects 3 --model_name DAVE_0_shot --zero_shot and changing the model path to os.path.join(args.model_path, 'DAVE_0_shot.pth') it works (i.e. no errors), but the model still seems to be taking exemplars into account, since it produces different results when provided with different exemplars, both with and without --use_query_pos_emb. Is this expected?

So, could you please provide some hints or code references on how to implement a zero-shot demo?

Thank you!

@jerpelhan
Owner

jerpelhan commented Jun 28, 2024

> I would also appreciate a clarification of the difference between the models base_0_shot.pth and DAVE_0_shot.pth, base_3_shot.pth and DAVE_3_shot.pth.

base_3_shot.pth and base_0_shot.pth are LOCA weights: they cover only the density map prediction part and do not include the weights for bounding box prediction.

> From the zero-shot test script it looks like the num_objects parameter should be 3 (ref), in fact, zero-shot models (i.e. base_0_shot.pth and DAVE_0_shot.pth) have objectness shape equal to (3, 9, 256), where the first dimension is self.num_objects (ref). Could you explain why?

This is also part of the LOCA method. In the few-shot setting, each exemplar is pooled into a 3x3 prototype; when flattened, this gives 9 (the second dimension). The zero-shot setting keeps this unchanged and simply uses trainable parameters instead of RoI pooling from exemplars in the image.
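
To make the shape concrete, here is a minimal pure-Python sketch. It assumes plain average pooling as a stand-in for the RoI pooling LOCA uses, and `pool_to_grid`, `kernel_dim`, `emb_dim`, and the toy values are hypothetical names chosen for illustration, not taken from the repository. It pools one exemplar crop of a single-channel feature map into a 3x3 grid and flattens it to 9 entries; with 3 exemplars and 256 channels this matches the (3, 9, 256) shape mentioned in the question.

```python
def pool_to_grid(crop, k=3):
    """Average-pool a 2D list (H x W) into a k x k grid and flatten it to k*k values.

    Simplified stand-in for RoI pooling: bin edges follow the usual
    adaptive-pooling rule (floor for the start, ceil for the end).
    """
    h, w = len(crop), len(crop[0])
    out = []
    for i in range(k):
        r0, r1 = (i * h) // k, -(-((i + 1) * h) // k)  # ceil via negated floor division
        for j in range(k):
            c0, c1 = (j * w) // k, -(-((j + 1) * w) // k)
            cells = [crop[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            out.append(sum(cells) / len(cells))
    return out


# Toy values (assumptions): 3 exemplars, 3x3 pooling, 256 feature channels.
num_objects, kernel_dim, emb_dim = 3, 3, 256

# A 6x6 single-channel crop standing in for the features under one exemplar box.
crop = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
prototype = pool_to_grid(crop, kernel_dim)

# One flattened prototype has kernel_dim**2 = 9 entries per channel.
shape = (num_objects, len(prototype), emb_dim)
print(shape)  # (3, 9, 256)
```

In the zero-shot case, a tensor of this same shape would be a learned parameter rather than something pooled from the image, which is why the first dimension stays at 3 even without exemplars.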

> Running your demo with --num_objects 3 --model_name DAVE_0_shot --zero_shot and changing the model path to os.path.join(args.model_path, 'DAVE_0_shot.pth') it works (i.e. no errors), but the model still seems to be taking exemplars into account, since it produces different results when provided with different exemplars, both with and without --use_query_pos_emb. Is this expected?

Did you run demo.py with all parameters unchanged except for the model name and the addition of --zero_shot? If not, please try running it this way and let us know if you encounter any issues. I will check this soon and post a demo for the zero-shot setup as soon as possible.

@jerpelhan
Owner

I just saw what the issue probably is: in a few-shot setup, the image is resized based on the exemplar size. Since demo.py was created for few-shot counting, it includes resizing on line 74. Try adapting it: modify the function resize in demo.py to return in line 24, before it resizes the image based on the bounding boxes.

Additionally, note that DAVE in zero-shot performs two passes. In the first pass, it estimates the size of objects, based on which it resizes the image and performs a second pass, which improves the results (see main.py).
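
The two-pass idea can be sketched roughly as follows. This is only an illustration of the control flow described above, not the code in main.py: `two_pass_count`, the `predict` and `resize` interfaces, and the `target_size` value are all hypothetical placeholders, and the toy functions exist only so the sketch runs on its own.

```python
def two_pass_count(image, predict, resize, target_size=64.0):
    """Estimate object size in a first pass, rescale the image, then count again.

    Assumed (hypothetical) interfaces:
      predict(image) -> (count, mean_object_size_in_pixels)
      resize(image, scale) -> rescaled image
    target_size is an assumed object size at which the model works well.
    """
    first_count, est_size = predict(image)      # pass 1: mainly used for the size estimate
    if est_size <= 0:
        return first_count, 1.0                 # nothing to rescale by; keep pass-1 result
    scale = target_size / est_size
    second_count, _ = predict(resize(image, scale))  # pass 2 on the rescaled image
    return second_count, scale


# Toy stand-ins so the sketch is runnable; a real model and image would replace these.
def toy_predict(image):
    return image["n"], image["object_size"]


def toy_resize(image, scale):
    return {"n": image["n"], "object_size": image["object_size"] * scale}


count, scale = two_pass_count({"n": 12, "object_size": 16.0}, toy_predict, toy_resize)
print(count, scale)  # 12 4.0
```

Passing `predict` and `resize` in as arguments keeps the sketch independent of how the actual model and preprocessing are implemented.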

@GioFic95
Contributor

GioFic95 commented Jul 2, 2024

Hi @jerpelhan, thank you for your reply.

> Did you run demo.py with all parameters unchanged except for the model name and the addition of --zero_shot?

I tried running demo.py as you suggested. It doesn't raise any error, but it still requires the exemplars. When they are provided, they appear to be taken into account even with the --zero_shot parameter, since the results change with different exemplars. This may be due to the different resizing that is applied.

> Modify the function resize in demo.py to return in line 24, before it resizes the image based on the bounding boxes.

I finally found out what kept the code from working without exemplars, even after removing the dependency of the resize function on the bounding boxes: in COTR.forward(), the line self.num_objects = bboxes.shape[1] caused a RuntimeError: shape '[1, 0, 3, 3, -1]' is invalid for input of size 6912 when no bbox was provided.
After commenting out this line, the code seems to work.
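
The numbers in the error message are consistent with the objectness shape discussed earlier in the thread: 6912 = 3 x 9 x 256, and a reshape to [1, 0, 3, 3, -1] cannot hold 6912 elements once one dimension is 0. The sketch below mimics that size-inference rule in plain Python; `infer_reshape` is a hypothetical helper written for illustration, not PyTorch's implementation.

```python
from math import prod


def infer_reshape(numel, shape):
    """Resolve a single -1 in `shape` for a tensor holding `numel` elements."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = prod(d for d in shape if d != -1)   # product of the explicit dimensions
    if -1 not in shape:
        if known != numel:
            raise ValueError(f"shape {list(shape)} is invalid for input of size {numel}")
        return list(shape)
    # A zero-sized dimension (no exemplars) can never account for a non-empty tensor.
    if known == 0 or numel % known != 0:
        raise ValueError(f"shape {list(shape)} is invalid for input of size {numel}")
    return [numel // known if d == -1 else d for d in shape]


NUMEL = 3 * 9 * 256  # elements in a (3, 9, 256) tensor

# With num_objects = 3 the last dimension resolves cleanly.
print(infer_reshape(NUMEL, [1, 3, 3, 3, -1]))

# With num_objects = 0 (bboxes.shape[1] for an empty bbox tensor) the reshape fails.
try:
    infer_reshape(NUMEL, [1, 0, 3, 3, -1])
except ValueError as err:
    print("error:", err)
```

This suggests why leaving self.num_objects at its configured value of 3, instead of overwriting it from an empty bboxes tensor, avoids the error.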

[Image: Figure_zero_1 (image from the Video Object Counting Dataset)]


> Additionally, note that DAVE in zero-shot performs two passes. In the first pass, it estimates the size of objects, based on which it resizes the image and performs a second pass, which improves the results (see main.py).

I'll try to add this two-step approach to the zero-shot demo as well. Thank you very much.

@GioFic95 GioFic95 mentioned this issue Jul 3, 2024
@jerpelhan
Owner

We wanted to share an update regarding this repository. We've developed a novel method, GeCo, which outperforms the older approach in this repo by a large margin. You can check out the code and an easy-to-run demo here: https://github.com/jerpelhan/GeCo.
