Other potential detection models? #4

Open
yinanyz opened this issue Jul 17, 2023 · 2 comments

Comments

@yinanyz

yinanyz commented Jul 17, 2023

I'm curious about the choice of detection model (i.e., UniDet). How did you choose it, and have you by chance tried other detection models and compared them with it? Thanks!

@yinanyz
Author

yinanyz commented Jul 18, 2023

Another question related to object detection: what if multiple objects are detected in the image? For example, with "a horse to the right of a person", how do you handle the case where there are multiple horses in the image? I saw that in the code you're doing
obj1_pos = obj.index(obj1) # position of object 1
so I'm wondering what happens if there are multiple obj1?

@Karine-Huang
Owner

Thanks for the question!
For Q1:
We tried multi-modal models such as miniGPT4, mPlug-Owl, MultiModal-GPT, InternChat, and BLIP, but they may not perform well in spatial understanding. Therefore, we selected a more accurate and intuitive approach: object detection. UniDet suits the current task because its strong performance on standard object detection benchmarks, like COCO, makes it a good choice for tasks that require accurately detecting a wide range of objects. Other object detection methods might also be able to accomplish this task.
For Q2:
If multiple objects are detected in an image, we determine a main object and prioritize it over the others: we first detect the objects in the image, then rank them by probability and select the object with the highest probability as the main object. We then compare the spatial position of the chosen main object with the other objects in the image.
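
To make the selection step concrete, here is a minimal sketch of the idea, not the repository's actual code. It assumes each detection is a (label, score, box) record with the box given as (x1, y1, x2, y2) in pixels, that the ranking is done among detections sharing the same label, and that "to the right of" is decided by comparing box centers. The names `Detection`, `pick_main`, and `is_right_of` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    # Hypothetical record type; real detector outputs will differ.
    label: str
    score: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def pick_main(detections: List[Detection], label: str) -> Optional[Detection]:
    """Return the highest-scoring detection with this label, or None if absent."""
    candidates = [d for d in detections if d.label == label]
    return max(candidates, key=lambda d: d.score, default=None)


def center_x(d: Detection) -> float:
    """Horizontal center of the bounding box."""
    return (d.box[0] + d.box[2]) / 2


def is_right_of(a: Detection, b: Detection) -> bool:
    """Assumed rule: a is right of b if its box center has a larger x."""
    return center_x(a) > center_x(b)


# Two horses are detected; the more confident one becomes the main object.
detections = [
    Detection("horse", 0.62, (10, 40, 110, 140)),
    Detection("horse", 0.91, (300, 50, 420, 160)),
    Detection("person", 0.88, (150, 30, 200, 150)),
]

main_horse = pick_main(detections, "horse")
person = pick_main(detections, "person")
result = is_right_of(main_horse, person)
```

With this rule, the lower-confidence horse on the left is ignored, so the prompt "a horse to the right of a person" is judged only against the horse scored 0.91. A stricter check could require every detected horse to satisfy the relation, which is a different design choice from the one described above.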
