Artificial intelligence in the creative industries: a review.
N Anantrasirichai, D Bull.
Artificial Intelligence Review, 2021.
A complete survey on generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 all you need?
C Zhang, C Zhang, S Zheng, Y Qiao, C Li, et al.
arXiv, 2023.
State of the art on diffusion models for visual computing.
R Po, W Yifan, V Golyanik, K Aberman, JT Barron, AH Bermano, ER Chan, T Dekel, et al.
arXiv:2310.07204, 2023.
[Paper]
Image Generation from Layout.
B Zhao, L Meng, W Yin, L Sigal.
CVPR, 2019.
[Paper]
[Github]
Layout2image: Image Generation from Layout.
B Zhao, W Yin, L Meng, L Sigal.
IJCV, 2020.
PosterLayout: A new benchmark and approach for content-aware visual-textual presentation layout.
HY Hsu, X He, Y Peng, H Kong, Q Zhang.
CVPR, 2023.
[Paper]
[Github]
Making images real again: A comprehensive survey on deep image composition.
L Niu, W Cong, L Liu, Y Hong, B Zhang, J Liang, et al.
arXiv, 2021.
[Paper]
Shadow generation for composite image in real-world scenes.
Y Hong, L Niu, J Zhang.
AAAI, 2022.
[Paper]
Current advances and future perspectives of image fusion: A comprehensive review.
S Karim, G Tong, J Li, A Qadir, U Farooq, Y Yu.
Information Fusion, 2023.
[Paper]
In-domain gan inversion for real image editing.
J Zhu, Y Shen, D Zhao, B Zhou.
ECCV, 2020.
Anycost gans for interactive image synthesis and editing.
J Lin, R Zhang, F Ganz, S Han, et al.
CVPR, 2021.
EditGAN: High-Precision Semantic Image Editing.
H Ling, K Kreis, D Li, SW Kim, et al.
NeurIPS, 2021.
Condition-Aware Neural Network for Controlled Image Generation.
H Cai, M Li, Q Zhang, MY Liu, S Han.
CVPR, 2024.
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation.
M Huang, Y Long, X Deng, R Chu, J Xiong, X Liang, H Cheng, Q Lu, W Liu.
arXiv:2403.08857, 2024.
[Paper]
[Github]
Prompt Highlighter: Interactive Control for Multi-Modal LLMs.
Y Zhang, S Qian, B Peng, S Liu, J Jia.
CVPR, 2024.
[Paper]
[Github]
High-resolution image synthesis with latent diffusion models.
R Rombach, A Blattmann, D Lorenz, et al.
CVPR, 2022.
[Paper]
LayoutDiffusion: Controllable diffusion model for layout-to-image generation.
G Zheng, X Zhou, X Li, Z Qi, et al.
CVPR, 2023.
[Paper]
InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models.
JT Hoe, X Jiang, CS Chan, et al.
CVPR, 2024.
[Paper]
[Github]
Intelligent design of multimedia content in Alibaba.
K Liu, et al.
Front Inform Technol Electron Eng, 2019, 20(12):1657-1664.
[Paper]
[Github]
Content-aware generative modeling of graphic design layouts.
X Zheng, X Qiao, Y Cao, RWH Lau.
TOG, 2019.
Automatic synthesis of advertising images according to a specified style.
W You, et al.
Front Inform Technol Electron Eng, 2020.
[Paper]
[Github]
Enabling hyper-personalisation: Automated ad creative generation and ranking for fashion e-commerce.
S Vempati, KT Malayil, V Sruthi, R Sandeep.
FRS, 2020.
NÜWA: Visual synthesis pre-training for neural visual world creation.
C Wu, J Liang, L Ji, F Yang, Y Fang, D Jiang, et al.
ArXiv, 2021.
Vinci: An Intelligent Graphic Design System for Generating Advertising Posters.
S Guo, Z Jin, F Sun, J Li, Z Li, Y Shi, N Cao.
CHI, 2021.
Preparing for an era of deepfakes and AI-generated ads: A framework for understanding responses to manipulated advertising.
C Campbell, K Plangger, S Sands, et al.
Journal of Advertising, 2021.
Learning Rich Features for Image Manipulation Detection.
P Zhou, X Han, VI Morariu, et al.
CVPR, 2018.
[Paper]
FaceForensics++: Learning to detect manipulated facial images.
A Rossler, D Cozzolino, L Verdoliva, et al.
CVPR, 2019.
Constrained R-CNN: A general image manipulation detection model.
C Yang, H Li, F Lin, B Jiang, et al.
ICME, 2020.
[Paper]
Media Forensics and DeepFakes.
L Verdoliva.
IEEE Journal of Selected Topics in Signal Processing, 2020.
[Paper]
The creation and detection of deepfakes: A survey.
Y Mirsky, W Lee.
ACM Computing Surveys (CSUR), 2021.
Multi-Modality Image Manipulation Detection.
C Yang, Z Wang, H Shen, H Li, et al.
ICME, 2021.
[Paper]
Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples.
S Hussain, P Neekhara, M Jere, et al.
WACV, 2021.
[Paper]
Exploiting deep generative prior for versatile image restoration and manipulation.
X Pan, X Zhan, B Dai, D Lin, CC Loy, et al.
TPAMI, 2021.
Online handwritten signature verification using feature weighting algorithm relief.
L Yang, Y Cheng, X Wang, Q Liu.
Soft Computing, 2018.
[Paper]
Characterizing and evaluating adversarial examples for Offline Handwritten Signature Verification.
LG Hafemann, R Sabourin, et al.
IEEE Transactions on Information Forensics and Security, 2020.
[Paper]
TextStyleBrush: Transfer of Text Aesthetics from a Single Example.
P Krishnan, R Kovvuri, G Pang, B Vassilev, et al.
ArXiv, 2021.
[Paper]
Video to Video Synthesis.
TC Wang, MY Liu, JY Zhu, G Liu, A Tao, J Kautz, et al.
NeurIPS, 2018.
MoCoGAN: Decomposing motion and content for video generation.
S Tulyakov, MY Liu, X Yang, et al.
CVPR, 2018.
Playable Video Generation.
W Menapace, S Lathuilière, et al.
CVPR, 2021.
[Paper]
A good image generator is what you need for high-resolution video synthesis.
Y Tian, J Ren, M Chai, K Olszewski, X Peng, et al.
ICLR, 2021.
[Paper]
From Sora What We Can See: A Survey of Text-to-Video Generation.
R Sun, Y Zhang, T Shah, J Sun, S Zhang, W Li, H Duan, B Wei, R Ranjan.
arXiv:2405.10674, 2024.
[Paper]
Sora as an agi world model? a complete survey on text-to-video generation.
J Cho, FD Puspitasari, S Zheng, J Zheng, LH Lee, TH Kim, CS Hong, C Zhang.
arXiv:2403.05131, 2024.
[Paper]
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions.
Y Zhang, Y Kang, Z Zhang, X Ding, S Zhao, X Yue.
arXiv:2402.03040, 2024.
[Paper]
[Github]
Direct-a-video: Customized video generation with user-directed camera movement and object motion.
S Yang, L Hou, H Huang, C Ma, P Wan, D Zhang, X Chen, J Liao.
SIGGRAPH, 2024.
Cameractrl: Enabling camera control for text-to-video generation.
H He, Y Xu, Y Guo, G Wetzstein, B Dai, H Li, C Yang.
arXiv:2404.02101, 2024.
[Paper]
Training-free Camera Control for Video Generation.
C Hou, G Wei, Y Zeng, Z Chen.
arXiv:2406.10126, 2024.
[Paper]
Deepfake Video Detection Using Recurrent Neural Networks.
D Güera, EJ Delp.
AVSS, 2018.
[Paper]
FaceForensics: A large-scale video dataset for forgery detection in human faces.
A Rössler, D Cozzolino, L Verdoliva, C Riess, et al.
ArXiv, 2018.
[Paper]
MesoNet: a compact facial video forgery detection network.
D Afchar, V Nozick, J Yamagishi, et al.
WIFS, 2018.
[Paper]
Face Forensics in the Wild.
T Zhou, W Wang, Z Liang, et al.
CVPR, 2021.
WaveNet: A generative model for raw audio.
A Oord, S Dieleman, H Zen, K Simonyan, et al.
ArXiv, 2016.
Applications of Deep Learning to Audio Generation.
Y Zhao, X Xia, R Togneri.
ICSM, 2018.
GANSynth: Adversarial neural audio synthesis.
J Engel, KK Agrawal, S Chen, I Gulrajani, et al.
ICLR, 2019.
Magenta.
Magenta is a research project exploring the role of machine learning in the process of creating art and music.
[Github]
All your voices are belong to us: Stealing voices to fool humans and machines.
D Mukhopadhyay, M Shirvanian, N Saxena.
ESORICS, 2015.
[Paper]
DeepSonar: Towards effective and robust detection of AI-synthesized fake voices.
R Wang, F Juefei-Xu, Y Huang, Q Guo, X Xie, et al.
MM, 2020.
[Paper]
ASVspoof 2019: Future horizons in spoofed and fake audio detection.
M Todisco, X Wang, V Vestman, M Sahidullah, et al.
ArXiv, 2019.
[Paper]
Deep4SNet: deep learning for fake speech classification.
DM Ballesteros, Y Rodriguez-Ortega, D Renza, et al.
ESWA, 2021.
[Paper]
Deep neural models for illumination estimation and relighting: A survey.
F Einabadi, JY Guillemaut, A Hilton.
Computer Graphics Forum, 2021.
Lightit: Illumination modeling and control for diffusion models.
P Kocsis, J Philip, K Sunkavalli, M Nießner, Y Hold-Geoffroy.
CVPR, 2024.
[Paper]
Retinex-Diffusion: On Controlling Illumination Conditions in Diffusion Models via Retinex Theory.
X Xing, VT Hu, JH Metzen, K Groh, S Karaoglu, T Gevers.
arXiv:2407.20785, 2024.
[Paper]
KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera.
S Izadi, D Kim, O Hilliges, D Molyneaux, et al.
UIST, 2011.
Soft 3D reconstruction for view synthesis.
E Penner, L Zhang.
ACM Transactions on Graphics (TOG), 2017.
State of the Art on 3D Reconstruction with RGB-D Cameras.
M Zollhöfer, P Stotko, A Görlitz, et al.
Computer Graphics Forum, 2018.
DISN: Deep implicit surface network for high-quality single-view 3D reconstruction.
Q Xu, W Wang, D Ceylan, R Mech, et al.
NeurIPS, 2019.
Occupancy networks: Learning 3d reconstruction in function space.
L Mescheder, M Oechsle, M Niemeyer, et al.
CVPR, 2019.
Fast Online 3D Reconstruction of Dynamic Scenes From Individual Single-Photon Detection Events.
Y Altmann, S McLaughlin, et al.
IEEE Transactions on Signal Processing, 2019.
DI-Fusion: Online Implicit 3D Reconstruction with Deep Priors.
J Huang, SS Huang, H Song, et al.
CVPR, 2021.
SP-GAN: Sphere-guided 3D shape generation and manipulation.
R Li, X Li, KH Hui, CW Fu.
ACM Transactions on Graphics (TOG), 2021.
Neural scene representation and rendering.
SMA Eslami, DJ Rezende, F Besse, F Viola, et al.
Science, 2018.
[Github]
Deferred neural rendering: Image synthesis using neural textures.
J Thies, M Zollhöfer, M Nießner.
ACM Transactions on Graphics (TOG), 2019.
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis.
B Mildenhall, PP Srinivasan, M Tancik, JT Barron, R Ramamoorthi, R Ng.
ECCV, 2020.
[Paper]
[Project]
SIREN: Implicit Neural Representations with Periodic Activation Functions.
V Sitzmann, JNP Martel, AW Bergman, DB Lindell, et al.
NeurIPS, 2020 (Oral).
[Paper]
[Github]
Neural Ray-Tracing: Learning Surfaces and Reflectance for Relighting and View Synthesis.
J Knodt, SH Baek, F Heide.
ArXiv, 2021.
[Github]
Autoint: Automatic integration for fast neural volume rendering.
DB Lindell, JNP Martel, et al.
CVPR, 2021.
[Github]
NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections.
R Martin-Brualla, N Radwan, et al.
CVPR, 2021.
Neural scene graphs for dynamic scenes.
J Ost, F Mannan, N Thuerey, et al.
CVPR, 2021.
[Github]
ACORN: Adaptive Coordinate Networks for Neural Scene Representation.
JNP Martel, DB Lindell, CZ Lin, ER Chan, et al.
SIGGRAPH, 2021.
[Github]
Awesome Neural Rendering.
Deep image or video generation approaches that enable explicit or implicit control of scene properties such as illumination, camera parameters, pose, geometry, appearance, and semantic structure.
[Github]
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion.
B Deng, R Tucker, Z Li, L Guibas, N Snavely, G Wetzstein.
SIGGRAPH, 2024.
MegaScenes: Scene-Level View Synthesis at Scale.
J Tung, G Chou, R Cai, G Yang, K Zhang, G Wetzstein, B Hariharan, et al.
ECCV, 2024.
Street-view image generation from a bird's-eye view layout.
A Swerdlow, R Xu, B Zhou.
IEEE Robotics and Automation Letters, 2024.
UrbanWorld: An Urban World Model for 3D City Generation.
Y Shang, J Chen, H Fan, J Ding, J Feng, Y Li.
ArXiv, 2024.
[Paper]
GAUDI: A neural architect for immersive 3D scene generation.
MA Bautista, P Guo, S Abnar, et al.
NeurIPS, 2022.
Text2Immersion: Generative immersive scene with 3D Gaussians.
H Ouyang, K Heal, S Lombardi, T Sun.
arXiv:2312.09242, 2023.
DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling.
H Li, H Shi, W Zhang, W Wu, Y Liao, L Wang, et al.
ArXiv, 2024.
Prompt Engineering, Tools and Methods for Immersive Experience Development.
A Rozo-Torres, WJ Sarmiento.
IEEE VR, 2024.
Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation.
M Kim, SW Chung, Y Ji, HG Kang, MS Choi.
arXiv:2406.12688, 2024.
Neural Moderation of ASMR Erotica Content in Social Networks.
Y Chen, D Jiang, C Tan, Y Song, C Zhang, L Chen.
IEEE Transactions on Knowledge and Data Engineering, 2023.