VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders

TLDR: In this work, we explore directly applying a pre-trained generative diffusion model to the challenging discriminative task of visual grounding, without any fine-tuning or additional training ...
