Computational-Visual-Generation-Resources

Review

Artificial intelligence in the creative industries: a review.
N Anantrasirichai, D Bull.
Artificial Intelligence Review, 2021.

A complete survey on generative ai (aigc): Is chatgpt from gpt-4 to gpt-5 all you need?
C Zhang, C Zhang, S Zheng, Y Qiao, C Li, et al.
arXiv, 2023.

State of the art on diffusion models for visual computing.
R Po, W Yifan, V Golyanik, K Aberman, JT Barron, AH Bermano, ER Chan, T Dekel, et al.
arXiv:2310.07204, 2023. [Paper]

Image Generation

Layout

Image Generation from Layout.
B Zhao, L Meng, W Yin, L Sigal.
CVPR, 2019. [Paper] [Github]

Layout2image: Image Generation from Layout.
B Zhao, W Yin, L Meng, L Sigal.
IJCV, 2020.

Posterlayout: A new benchmark and approach for content-aware visual-textual presentation layout.
HY Hsu, X He, Y Peng, H Kong, Q Zhang.
CVPR, 2023. [Paper] [Github]

Composition

Making images real again: A comprehensive survey on deep image composition.
L Niu, W Cong, L Liu, Y Hong, B Zhang, J Liang, et al.
arXiv, 2021. [Paper]

Shadow generation for composite image in real-world scenes.
Y Hong, L Niu, J Zhang.
AAAI, 2022. [Paper]

Current advances and future perspectives of image fusion: A comprehensive review.
S Karim, G Tong, J Li, A Qadir, U Farooq, Y Yu.
Information Fusion, 2023. [Paper]

Editing

In-domain gan inversion for real image editing.
J Zhu, Y Shen, D Zhao, B Zhou.
ECCV, 2020.

Anycost gans for interactive image synthesis and editing.
J Lin, R Zhang, F Ganz, S Han, et al.
CVPR, 2021.

EditGAN: High-Precision Semantic Image Editing.
H Ling, K Kreis, D Li, SW Kim, et al.
NeurIPS, 2021.

Controllable

Condition-Aware Neural Network for Controlled Image Generation.
H Cai, M Li, Q Zhang, MY Liu, S Han.
CVPR, 2024.

DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation.
M Huang, Y Long, X Deng, R Chu, J Xiong, X Liang, H Cheng, Q Lu, W Liu.
arXiv:2403.08857, 2024. [Paper] [Github]

Prompt Highlighter: Interactive Control for Multi-Modal LLMs.
Y Zhang, S Qian, B Peng, S Liu, J Jia.
CVPR, 2024. [Paper] [Github]

Diffusion

High-resolution image synthesis with latent diffusion models.
R Rombach, A Blattmann, D Lorenz, et al.
CVPR, 2022. [Paper]

Layoutdiffusion: Controllable diffusion model for layout-to-image generation.
G Zheng, X Zhou, X Li, Z Qi, et al.
CVPR, 2023. [Paper]

InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models.
JT Hoe, X Jiang, CS Chan, et al.
CVPR, 2024. [Paper] [Github]

Applications

Intelligent design of multimedia content in Alibaba.
K Liu, et al.
Front Inform Technol Electron Eng, 2019, 20(12):1657-1664. [Paper] [Github]

Content-aware generative modeling of graphic design layouts.
X Zheng, X Qiao, Y Cao, RWH Lau.
TOG, 2019.

Automatic synthesis of advertising images according to a specified style.
W You, et al.
Front Inform Technol Electron Eng, 2020. [Paper] [Github]

Enabling hyper-personalisation: Automated ad creative generation and ranking for fashion e-commerce.
S Vempati, KT Malayil, V Sruthi, R Sandeep.
FRS, 2020.

NÜWA: Visual synthesis pre-training for neural visual world creation.
C Wu, J Liang, L Ji, F Yang, Y Fang, D Jiang, et al.
arXiv, 2021.

Vinci: An Intelligent Graphic Design System for Generating Advertising Posters.
S Guo, Z Jin, F Sun, J Li, Z Li, Y Shi, N Cao.
CHI, 2021.

Preparing for an era of deepfakes and AI-generated ads: A framework for understanding responses to manipulated advertising.
C Campbell, K Plangger, S Sands, et al.
Journal of Advertising, 2021.

Image Manipulation Detection

Learning Rich Features for Image Manipulation Detection.
P Zhou, X Han, VI Morariu, et al.
CVPR, 2018. [Paper]

Faceforensics++: Learning to detect manipulated facial images.
A Rössler, D Cozzolino, L Verdoliva, et al.
ICCV, 2019.

Constrained R-CNN: A general image manipulation detection model.
C Yang, H Li, F Lin, B Jiang, et al.
ICME, 2020. [Paper]

Media Forensics and DeepFakes.
L Verdoliva.
IEEE Journal of Selected Topics in Signal Processing, 2020. [Paper]

The creation and detection of deepfakes: A survey.
Y Mirsky, W Lee.
ACM Computing Surveys (CSUR), 2021.

Multi-Modality Image Manipulation Detection.
C Yang, Z Wang, H Shen, H Li, et al.
ICME, 2021. [Paper]

Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples.
S Hussain, P Neekhara, M Jere, et al.
WACV, 2021. [Paper]

Exploiting deep generative prior for versatile image restoration and manipulation.
X Pan, X Zhan, B Dai, D Lin, CC Loy, et al.
TPAMI, 2021.

Online handwritten signature verification using feature weighting algorithm relief.
L Yang, Y Cheng, X Wang, Q Liu.
Soft Computing, 2018. [Paper]

Characterizing and evaluating adversarial examples for Offline Handwritten Signature Verification.
LG Hafemann, R Sabourin, et al.
IEEE Transactions on Information Forensics and Security, 2020. [Paper]

TextStyleBrush: Transfer of Text Aesthetics from a Single Example.
P Krishnan, R Kovvuri, G Pang, B Vassilev, et al.
arXiv, 2021. [Paper]

Video Generation

Video to Video Synthesis.
TC Wang, MY Liu, JY Zhu, G Liu, A Tao, J Kautz, et al.
NeurIPS, 2018.

Mocogan: Decomposing motion and content for video generation.
S Tulyakov, MY Liu, X Yang, et al.
CVPR, 2018.

Playable Video Generation.
W Menapace, S Lathuilière, et al.
CVPR, 2021. [Paper]

A good image generator is what you need for high-resolution video synthesis.
Y Tian, J Ren, M Chai, K Olszewski, X Peng, et al.
ICLR, 2021. [Paper]

From Sora What We Can See: A Survey of Text-to-Video Generation.
R Sun, Y Zhang, T Shah, J Sun, S Zhang, W Li, H Duan, B Wei, R Ranjan.
arXiv:2405.10674, 2024. [Paper]

Sora as an agi world model? a complete survey on text-to-video generation.
J Cho, FD Puspitasari, S Zheng, J Zheng, LH Lee, TH Kim, CS Hong, C Zhang.
arXiv:2403.05131, 2024. [Paper]

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions.
Y Zhang, Y Kang, Z Zhang, X Ding, S Zhao, X Yue.
arXiv:2402.03040, 2024. [Paper] [Github]

Direct-a-video: Customized video generation with user-directed camera movement and object motion.
S Yang, L Hou, H Huang, C Ma, P Wan, D Zhang, X Chen, J Liao.
SIGGRAPH, 2024.

Cameractrl: Enabling camera control for text-to-video generation.
H He, Y Xu, Y Guo, G Wetzstein, B Dai, H Li, C Yang.
arXiv:2404.02101, 2024. [Paper]

Training-free Camera Control for Video Generation.
C Hou, G Wei, Y Zeng, Z Chen.
arXiv:2406.10126, 2024. [Paper]

Video Manipulation Detection

Deepfake Video Detection Using Recurrent Neural Networks.
D Güera, EJ Delp.
AVSS, 2018. [Paper]

Faceforensics: A large-scale video dataset for forgery detection in human faces.
A Rössler, D Cozzolino, L Verdoliva, C Riess, et al.
arXiv, 2018. [Paper]

Mesonet: a compact facial video forgery detection network.
D Afchar, V Nozick, J Yamagishi, et al.
WIFS, 2018. [Paper]

Face Forensics in the Wild.
T Zhou, W Wang, Z Liang, et al.
CVPR, 2021.

Audio Generation

Wavenet: A generative model for raw audio.
A Oord, S Dieleman, H Zen, K Simonyan, et al.
arXiv, 2016.

Applications of Deep Learning to Audio Generation.
Y Zhao, X Xia, R Togneri.
ICSM, 2018.

Gansynth: Adversarial neural audio synthesis.
J Engel, KK Agrawal, S Chen, I Gulrajani, et al.
ICLR, 2019.

magenta
Magenta is a research project exploring the role of machine learning in the process of creating art and music.
[Github]

Audio Manipulation

All your voices are belong to us: Stealing voices to fool humans and machines.
D Mukhopadhyay, M Shirvanian, N Saxena.
ESORICS, 2015. [Paper]

Deepsonar: Towards effective and robust detection of ai-synthesized fake voices.
R Wang, F Juefei-Xu, Y Huang, Q Guo, X Xie, et al.
MM, 2020. [Paper]

ASVspoof 2019: Future horizons in spoofed and fake audio detection.
M Todisco, X Wang, V Vestman, M Sahidullah, et al.
arXiv, 2019. [Paper]

Deep4SNet: deep learning for fake speech classification.
DM Ballesteros, Y Rodriguez-Ortega, D Renza, et al.
ESWA, 2021. [Paper]

Illumination

Deep neural models for illumination estimation and relighting: A survey.
F Einabadi, JY Guillemaut, A Hilton.
Computer Graphics Forum, 2021.

Lightit: Illumination modeling and control for diffusion models.
P Kocsis, J Philip, K Sunkavalli, M Nießner, Y Hold-Geoffroy.
CVPR, 2024. [CVPR]

Retinex-Diffusion: On Controlling Illumination Conditions in Diffusion Models via Retinex Theory.
X Xing, VT Hu, JH Metzen, K Groh, S Karaoglu, T Gevers.
arXiv:2407.20785, 2024. [ArXiv]

Reconstruction

KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera.
S Izadi, D Kim, O Hilliges, D Molyneaux, et al.
UIST, 2011.

Soft 3D reconstruction for view synthesis.
E Penner, L Zhang.
ACM Transactions on Graphics (TOG), 2017.

State of the Art on 3D Reconstruction with RGB‐D Cameras.
M Zollhöfer, P Stotko, A Görlitz, et al.
Computer Graphics Forum, 2018.

Disn: Deep implicit surface network for high-quality single-view 3d reconstruction.
Q Xu, W Wang, D Ceylan, R Mech, et al.
NeurIPS, 2019.

Occupancy networks: Learning 3d reconstruction in function space.
L Mescheder, M Oechsle, M Niemeyer, et al.
CVPR, 2019.

Fast Online 3D Reconstruction of Dynamic Scenes From Individual Single-Photon Detection Events.
Y Altmann, S McLaughlin, et al.
IEEE Transactions on Signal Processing, 2019.

DI-Fusion: Online Implicit 3D Reconstruction with Deep Priors.
J Huang, SS Huang, H Song, et al.
CVPR, 2021.

SP-GAN: Sphere-guided 3D shape generation and manipulation.
R Li, X Li, KH Hui, CW Fu.
ACM Transactions on Graphics (TOG), 2021.

Neural Rendering

Neural scene representation and rendering.
SMA Eslami, DJ Rezende, F Besse, F Viola, et al.
Science, 2018. [Github]

Deferred neural rendering: Image synthesis using neural textures.
J Thies, M Zollhöfer, M Nießner.
ACM Transactions on Graphics (TOG), 2019.

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis.
B Mildenhall, PP Srinivasan, M Tancik, JT Barron, R Ramamoorthi, R Ng.
ECCV, 2020. [Paper] [Project]

SIREN: Implicit Neural Representations with Periodic Activation Functions.
V Sitzmann, JNP Martel, AW Bergman, DB Lindell, et al.
NeurIPS, 2020 (Oral). [Paper] [Github]

Neural Ray-Tracing: Learning Surfaces and Reflectance for Relighting and View Synthesis.
J Knodt, SH Baek, F Heide.
arXiv, 2021. [Github]

Autoint: Automatic integration for fast neural volume rendering.
DB Lindell, JNP Martel, et al.
CVPR, 2021. [Github]

NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections.
R Martin-Brualla, N Radwan, et al.
CVPR, 2021.

Neural scene graphs for dynamic scenes.
J Ost, F Mannan, N Thuerey, et al.
CVPR, 2021. [Github]

ACORN: Adaptive Coordinate Networks for Neural Scene Representation.
JNP Martel, DB Lindell, CZ Lin, ER Chan, et al.
SIGGRAPH, 2021. [Github]

awesome neural rendering.
Deep image or video generation approaches that enable explicit or implicit control of scene properties such as illumination, camera parameters, pose, geometry, appearance, and semantic structure.
[Github]

Scene Generation

Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion.
B Deng, R Tucker, Z Li, L Guibas, N Snavely, G Wetzstein.
SIGGRAPH, 2024.

MegaScenes: Scene-Level View Synthesis at Scale.
J Tung, G Chou, R Cai, G Yang, K Zhang, G Wetzstein, B Hariharan, et al.
ECCV, 2024.

Street-view image generation from a bird's-eye view layout.
A Swerdlow, R Xu, B Zhou.
IEEE Robotics and Automation Letters, 2024.

UrbanWorld: An Urban World Model for 3D City Generation.
Y Shang, J Chen, H Fan, J Ding, J Feng, Y Li.
arXiv, 2024. [ArXiv]

Immersive-Experiences

Vision Scene

Gaudi: A neural architect for immersive 3d scene generation.
MA Bautista, P Guo, S Abnar, et al.
NeurIPS, 2022.

Text2immersion: Generative immersive scene with 3d gaussians.
H Ouyang, K Heal, S Lombardi, T Sun.
arXiv:2312.09242, 2023.

DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling.
H Li, H Shi, W Zhang, W Wu, Y Liao, L Wang, et al.
arXiv, 2024.

Prompt Engineering, Tools and Methods for Immersive Experience Development.
A Rozo-Torres, WJ Sarmiento.
IEEE VR, 2024.

Audio Scene

Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation.
M Kim, SW Chung, Y Ji, HG Kang, MS Choi.
arXiv:2406.12688, 2024.

ASMR

Neural Moderation of ASMR Erotica Content in Social Networks.
Y Chen, D Jiang, C Tan, Y Song, C Zhang, L Chen.
IEEE Transactions on Knowledge and Data Engineering, 2023.

About

Resources for Computational Visual Generation & Graphics