Ali Abdollahi
Email: ali.abdollahi024@student.sharif.edu
Google Scholar: -
GitHub: -
LinkedIn: ali-abdollahi024a
Arash Marioriyad
Email: arash.marioriyad98@sharif.edu
Google Scholar: Arash-Marioriyad
GitHub: Arash-Mari-Oriyad
LinkedIn: arash-mari-oriyad
Feraidoon Mehri
Email: f.meh16@student.sharif.edu
Google Scholar: -
GitHub: NightMachinery
LinkedIn: feraidoon-mehri
In recent years, text-to-image (T2I) diffusion models such as Stable Diffusion and DALL-E have shown impressive performance in generating realistic, creative, diverse, and high-quality images from textual descriptions. These models inject text embeddings into an iterative denoising process through the cross-attention mechanism, and the resulting images are used in many applications across various domains. However, despite these capabilities, such models often fail to faithfully capture all the entities, attributes, and relationships described in the input prompt, leading to compositional misalignments such as missing entities, improper attribute binding, wrong spatial relationships, and counting errors. We aim to equip T2I models with compositional generation capabilities that overcome these failure modes.
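As a concrete illustration, the sketch below samples an image from a compositional prompt with Stable Diffusion via the Hugging Face diffusers library. The checkpoint name, prompt, and generation settings are illustrative assumptions, not part of this project description; prompts like this one, which bind distinct colors to distinct objects and fix a spatial relation, are exactly the inputs on which T2I models tend to exhibit the misalignments listed above.

```python
# Minimal sketch: sampling from Stable Diffusion with a compositional prompt.
# Assumes the Hugging Face `diffusers` library and a CUDA GPU; the checkpoint
# and settings below are illustrative choices, not prescribed by this project.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # hypothetical choice of checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# A prompt that binds attributes to entities and specifies a spatial relation --
# the kind of input on which T2I models often drop entities or swap attributes.
prompt = "a red ball on top of a blue cube"

image = pipe(
    prompt,
    num_inference_steps=50,  # iterative denoising steps
    guidance_scale=7.5,      # classifier-free guidance strength
).images[0]
image.save("red_ball_blue_cube.png")
```

Inspecting outputs of such prompts (e.g., checking whether both objects appear with the correct colors and ordering) is a simple way to surface the entity-missing, attribute-binding, spatial-relationship, and counting failures described above.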
Description
Requirements: ...
Authors
Link