CVPR 2026 Workshop

AI for Creative Visual Content Generation, Editing and Understanding

June 3, 2026 · 8:30 AM – 12:30 PM · Room 501

Denver, Colorado, USA



About

The AI for Creative Visual Content Generation, Editing and Understanding (CVEU) workshop at CVPR 2026 continues a series of events held in the computer vision community (CVPR'25, CVPR'24, ICCV'23, ECCV'22, ICCV'21), the computer graphics community (SIGGRAPH'24, SIGGRAPH'25, SIGGRAPH Asia'25), and the AI art community (HKUST AI Film Festival 2025, HKUST AI Film Festival 2026).

It brings together researchers, artists and entrepreneurs working on computer graphics, human–computer interaction, computer vision, machine learning, and cognitive research to explore how AI can assist creative visual content generation, editing and understanding.

Workshop Theme: World Modeling

World modeling is the development of AI systems that build rich internal representations of the physical and digital world—its objects, agents, dynamics, and uncertainties—and use these models to reason, plan, and act. Instead of only mapping inputs directly to outputs, world models learn predictive, structured abstractions (e.g., 3D scenes, causal relationships, multimodal narratives) that support simulation and counterfactual reasoning.

For creative visual content, world modeling enables models to maintain global coherence across space, time, and modalities: characters stay consistent across shots, lighting and physics remain plausible, and edits respect scene geometry, narrative intent, and user constraints. This workshop highlights methods that leverage such structured world understanding to advance controllable generation, robust editing, interactive storytelling, and human–AI collaboration in visual media.

Tentative Program (CVPR 2026)

Schedule (all times Mountain Time, MT)
June 3, 2026 · Room 501 · 8:30 AM – 12:30 PM
Opening Remarks 8:25 – 8:30 AM
Keynote Talk #1 — Soo Ye Kim (25 min + 5 min Q&A) 8:30 – 9:00 AM
Keynote Talk #2 — Jack Parker-Holder (25 min + 5 min Q&A) 9:00 – 9:30 AM
Keynote Talk #3 — Amir Bar (25 min + 5 min Q&A) 9:30 – 10:00 AM
Oral Presentation #1 10:00 – 10:10 AM
Oral Presentation #2 10:10 – 10:20 AM
Oral Presentation #3 10:20 – 10:30 AM
Coffee Break & Poster Session 10:30 – 11:30 AM
Keynote Talk #4 — Mike Zheng Shou (25 min + 5 min Q&A) 11:30 AM – 12:00 PM
Keynote Talk #5 — Jiajun Wu (25 min + 5 min Q&A) 12:00 – 12:30 PM
Closing Remarks 12:30 – 12:35 PM

Tentative Speakers

More speakers to be announced.

Amir Bar
FAIR
Jiajun Wu
Stanford University
Jack Parker-Holder
Google DeepMind
Mike Zheng Shou
National University of Singapore
Soo Ye Kim
Adobe Research

Call for Papers

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026
This workshop is the next installment of AI for Creative Visual Content Generation, Editing and Understanding (CVEU). It brings together researchers working on computer vision, machine listening, computer graphics, human-computer interaction, and cognitive research. It aims to raise awareness of recent advances in machine learning technologies that enable assisted creative video creation and understanding. The workshop will include invited talks by experts in the area and give the community opportunities to share their work through oral and poster presentations. We encourage practitioners, designers, students, post-docs, and researchers to submit work describing new ideas, work in progress, and previously or concurrently published research.

  • Video GenAI
  • Video storyboarding
  • GenAI for visual productions
  • GenAI for audio-visual storytelling
  • Watermarking
  • Content attribution
  • Safety in content generation
  • Computational Video Editing
  • Computational Videography/Virtual Cinematography
  • Video Shortening and Manipulation
  • Video Description and Storyboarding
  • Video Search and Retrieval
  • Human-centric Video Understanding
  • Event/Story Understanding
  • Emotion/Sound Recognition
  • Generative Adversarial Network for Video
  • Diffusion Model and GAN for Video
  • Neural Rendering for Video
  • VR/AR/Panorama Video
  • Dataset and Evaluation

Submission Tracks

In-proceeding Track: 8 pages (excl. references) · CVPR proceedings · CVPR template
Updated per CVPR workshop chairs: Abstract, Submission, and Notification dates revised below.
  • Abstract: Mar 16, 2026 · 11:59 PM UTC
  • Submission: Mar 23 Mar 17, 2026 · 11:59 PM UTC
  • Notification: Apr 6 Mar 24, 2026 · 11:59 PM UTC
  • Camera-ready: Apr 9, 2026
Extended Abstract Track: 4 pages (excl. references) · Not in proceedings · CVPR template
  • Submission: Apr 22, 2026
  • Notification: May 1, 2026
  • Camera-ready: May 7, 2026
Invited Poster Presentation Track: Already published papers (e.g. CVPR 2026) · Must relate to CVEU topics
  • Submission: May 7, 2026
  • Notification: May 10, 2026

Accepted Papers

In-proceeding Track
  • 1. Generating Fit Check Videos with a Handheld Camera · Bowei Chen, Brian Curless, Ira Kemelmacher-Shlizerman, Steve Seitz
  • 2. Diamonds in the Sky: Pareidolic Animals in Clouds · Miriam Horovicz, Yacov Hel-Or, Yael Moses
  • 3. LaDe: Unified Multi-Layered Graphic Media Generation and Decomposition · Vlad-Constantin Lungu-Stan, Ionut Mironica, Iuliana Georgescu
  • 4. LumiCtrl: Learning Illuminant Prompts for Lighting Control in Personalized Text-to-Image Models · Muhammad Atif Butt, Kai Wang, Javier Vazquez-Corral, Joost van de Weijer
  • 5. FOCUS: Optimal Control for Multi-Entity World Modeling in Text-to-Image Generation · Eric Bill, Enis Simsar, Thomas Hofmann
  • 6. Learning Temporal Relations for Evaluating Instruction-Guided Image Editing in Vision-Language Models · Pia Donabauer, Alexander Tack, Udo Kruschwitz
  • 7. PLATO++: Pose-Conditioned Part-Aware Object Generation via Residual Structure Learning · Varghese Kuruvilla, Harishwar J., Ravi Kiran Sarvadevabhatla
  • 8. Temporal Environment-Aware Image Generation via Latent Diffusion · Nasrin Kalanat, Yiqun Xie, Yanhua Li, Xiaowei Jia
  • 9. Towards Design Composition · Abhinav Mahajan, Abhikhya Tripathy, Sudeeksha Pala, Vaibhav Methi, K Joseph, Balaji Vasan Srinivasan
  • 10. ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images · Xianghao Kong, Qiaosong Qi, Yuanbin Wang, Biaolong Chen, Aixi Zhang, Anyi Rao
  • 11. X-Aligner: Composed Visual Retrieval without the Bells and Whistles · Yuqian Zheng, Iuliana Georgescu
  • 12. Visual Composition Generation of Multi-Source Heterogeneous Concepts — A Practical Study Based on the AIGC Short Film: The Meeting · Shiqin Hou, Jiayan Chen, Tianyi Zhang, Baoyang Chen, Anyi Rao
  • 13. Hi-Light: A Path to High-fidelity, High-resolution Video Relighting With A Refined Evaluation Paradigm · Xiangrui Liu, Haoxiang Li, Yezhou Yang
  • 14. From Pixels to Layers: Multimodal Generation of Editable Vector Graphics · Rongxiang Zhang, Xinyi Shang, Peng Sun, Tao Lin
  • 15. Faces in the Wild: GAN-Driven Normalization for Robust Facial Recognition · Mohit, Atul Kumar, Akshay Agarwal
  • 16. Mem-UniVST: Recurrent Gram Memory for Training-Free Temporally Consistent Video Stylization · Nimay Ballal, Shylaja S S
  • 17. Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models · Songlin Yang, Xianghao Kong, Anyi Rao
  • 18. FashionDraft: Resolving Structural Ambiguity in Text-To-Image Fashion Generation · Santhosh V, Arushi Jain, Shubham Paliwal, Monika Sharma
  • 19. DRA-MTransfer: Physically Realistic Video Motion Transfer with Dual-Grained Re-Adaptation · Guoli Jia, Zhiyuan Ma, Junyao Hu, Xinwei Long, Kai Tian, Kaikai Zhao, Zhaoxiang Liu, Kai Wang, Shiguo Lian, Bowen Zhou
Extended Abstract Track
  • 1. Temporal Consistency-Guided Video Editing using Diffusion Models · Mahule Roy, Subhas Roy
  • 2. Implicit Neural Representation of Textures · Albert Kwok, Zheyuan Hu, Dounia Hammou
  • 3. Selective Timestep Weighting and Advantage-Based Replay for Sample-Efficient Diffusion RLHF · Eric Zhu, Abhinav Shrivastava, Soumik Mukhopadhyay
  • 4. Benchmarking Single-Step Inpainting Methods for Multi-Object 3D Gaussian Splatting Scenes · Finn Dröge, Cecilia Curreli, Abhishek Saroha, Daniel Cremers
  • 5. MiMix: Character Mixing for Video Generation · Tingting Liao, Chongjian Ge, Guangyi Liu, Hao Li, Yi Zhou
  • 6. DiverseVAR: Balancing Diversity and Quality of Next-Scale Visual Autoregressive Models · Mingue Park, Prin Phunyaphibarn, Phillip Y. Lee, Minhyuk Sung
  • 7. Coarse-to-Real: Generative Rendering for Populated Dynamic Scenes · Gonzalo Gomez-Nogales, Yicong Hong, Chongjian Ge, Marc Comino-Trinidad, Dan Casas, Peiye Zhuang, Yi Zhou
  • 8. Text-Guided Object Extraction in SVG via Raster Grounding and Filtering Primitives · Ivan Jarsky, Boris Timofeenko, Boris Malashenko, Valeria Efimova
Invited Poster Presentation Track
  • 1. Charts Are Not Images: On the Challenges of Scientific Chart Editing (ICLR 2026) · Shawn Li, Ryan Rossi, Sungchul Kim, Sunav Choudhary, Franck Dernoncourt, Puneet Mathur, Zhengzhong Tu, Yue Zhao
  • 2. MatLat: Material Latent Space for PBR Texture Generation (CVPR 2026) · Kyeongmin Yeo, Yunhong Min, Jaihoon Kim, Minhyuk Sung
  • 3. InstructMix2Mix: Consistent Sparse-View Editing Through Multi-View Model Personalization (CVPR 2026) · Daniel Gilo, Or Litany
  • 4. FuLLaMa: Training-free Diffusion-based Object Removal with Context Preservation (WACV 2026) · Ilke Demir, Umur Aybars Ciftci
  • 5. Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models (WACV 2026) · Héctor Laria, Alexandra Gomez-Villa, Jiang Qin, Muhammad Atif Butt, Bogdan Raducanu, Javier Vazquez-Corral, Joost van de Weijer, Kai Wang
  • 6. Making Video Models Adhere to User Intent with Minor Adjustments (TMLR 2026) · Daniel Ajisafe, Eric Hedlin, Helge Rhodin, Kwang Moo Yi
  • 7. First Frame Is the Place to Go for Video Content Customization (CVPR 2026) · Jingxi Chen, Zongxia Li, Zhichao Liu, Guangyao Shi, Xiyang Wu, Fuxiao Liu, Cornelia Fermuller, Brandon Y. Feng, Yiannis Aloimonos
  • 8. Through the PRISM: Principle-Aware, Interpretable, and Multi-Scale Evaluation of Visual Designs (CVPR 2026 Findings) · Mona Gandhi, K J Joseph, Srinivasan Parthasarathy, Sayan Nag
  • 9. A Training-Free Style-Personalization via SVD-Based Feature Decomposition (CVPR 2026) · Kyoungmin Lee, Jihun Park, Jongmin Gim, Wonhyeok Choi, Kyumin Hwang, Jaeyeul Kim, Sunghoon Im
  • 10. SIGMA-GEN: Structure and Identity Guided Multi-subject Assembly for Image Generation (ICLR 2026) · Oindrila Saha, Vojtech Krs, Radomir Mech, Subhransu Maji, Kevin Blackburn-Matzen, Matheus Gadelha

Organizers

Ozgur Kara
UIUC
(Program Chair)
Junho Kim
UIUC
(Program Chair)
Victor Escorcia
Independent
Dong Liu
Adobe
Fabian Caba
Adobe
Jiaju Ma
Stanford University
(Program Chair)
Songlin Yang
HKUST

Advisory Committee

Maneesh Agrawala
Stanford University
Anyi Rao
HKUST

Podcast

To explore with us how AI is shaping the future of filmmaking, tune in to the podcast Empowering Storytellers.

OUR GUESTS INCLUDE

Tony Ngai

Governor, Asia-Pacific Region, Society of Motion Picture and Television Engineers (SMPTE)

Chris Williamson

Senior 3D Generalist & Technical Advisor, Wētā Workshop

Jacky Zheng

Art Director, Wētā Workshop

Terry Lam

Dean, School of Film and Television, The Hong Kong Academy for Performing Arts

Lucas Mariano

2D Animator & Motion Designer

More guests to be announced...