Artificial intelligence in the creative industries: a review.
N Anantrasirichai, D Bull.
Artificial Intelligence Review, 2021.
A complete survey on generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 all you need?
C Zhang, C Zhang, S Zheng, Y Qiao, C Li, et al.
arXiv, 2023.
State of the art on diffusion models for visual computing.
R Po, W Yifan, V Golyanik, K Aberman, JT Barron, AH Bermano, ER Chan, T Dekel, et al.
arXiv:2310.07204, 2023.
[Paper]
Image Generation from Layout.
B Zhao, L Meng, W Yin, L Sigal.
CVPR, 2019.
[Paper]
[Github]
Layout2image: Image Generation from Layout.
B Zhao, W Yin, L Meng, L Sigal.
IJCV, 2020.
PosterLayout: A new benchmark and approach for content-aware visual-textual presentation layout.
HY Hsu, X He, Y Peng, H Kong, Q Zhang.
CVPR, 2023.
[Paper]
[Github]
Making images real again: A comprehensive survey on deep image composition.
L Niu, W Cong, L Liu, Y Hong, B Zhang, J Liang, et al.
arXiv, 2021.
[Paper]
Shadow generation for composite image in real-world scenes.
Y Hong, L Niu, J Zhang.
AAAI, 2022.
[Paper]
Current advances and future perspectives of image fusion: A comprehensive review.
S Karim, G Tong, J Li, A Qadir, U Farooq, Y Yu.
Information Fusion, 2023.
[Paper]
In-domain gan inversion for real image editing.
J Zhu, Y Shen, D Zhao, B Zhou.
ECCV, 2020.
Anycost gans for interactive image synthesis and editing.
J Lin, R Zhang, F Ganz, S Han, et al.
CVPR, 2021.
EditGAN: High-Precision Semantic Image Editing.
H Ling, K Kreis, D Li, SW Kim, et al.
NeurIPS, 2021.
Condition-Aware Neural Network for Controlled Image Generation.
H Cai, M Li, Q Zhang, MY Liu, S Han.
CVPR, 2024.
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation.
M Huang, Y Long, X Deng, R Chu, J Xiong, X Liang, H Cheng, Q Lu, W Liu.
arXiv:2403.08857, 2024.
[Paper]
[Github]
Prompt Highlighter: Interactive Control for Multi-Modal LLMs.
Y Zhang, S Qian, B Peng, S Liu, J Jia.
CVPR, 2024.
[Paper]
[Github]
High-resolution image synthesis with latent diffusion models.
R Rombach, A Blattmann, D Lorenz, et al.
CVPR, 2022.
[Paper]
LayoutDiffusion: Controllable diffusion model for layout-to-image generation.
G Zheng, X Zhou, X Li, Z Qi, et al.
CVPR, 2023.
[Paper]
InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models.
JT Hoe, X Jiang, CS Chan, et al.
CVPR, 2024.
[Paper]
[Github]
Intelligent design of multimedia content in Alibaba.
K Liu, et al.
Front Inform Technol Electron Eng, 2019, 20(12):1657-1664.
[Paper]
[Github]
Content-aware generative modeling of graphic design layouts.
X Zheng, X Qiao, Y Cao, RWH Lau.
TOG, 2019.
Automatic synthesis of advertising images according to a specified style.
W You, et al.
Front Inform Technol Electron Eng, 2020.
[Paper]
[Github]
Enabling hyper-personalisation: Automated ad creative generation and ranking for fashion e-commerce.
S Vempati, KT Malayil, V Sruthi, R Sandeep.
FRS, 2020.
NÜWA: Visual synthesis pre-training for neural visual world creation.
C Wu, J Liang, L Ji, F Yang, Y Fang, D Jiang, et al.
ArXiv, 2021.
Vinci: An Intelligent Graphic Design System for Generating Advertising Posters.
S Guo, Z Jin, F Sun, J Li, Z Li, Y Shi, N Cao.
CHI, 2021.
Preparing for an era of deepfakes and AI-generated ads: A framework for understanding responses to manipulated advertising.
C Campbell, K Plangger, S Sands, et al.
Journal of Advertising, 2021.
Learning Rich Features for Image Manipulation Detection.
P Zhou, X Han, VI Morariu, et al.
CVPR, 2018.
[Paper]
FaceForensics++: Learning to detect manipulated facial images.
A Rossler, D Cozzolino, L Verdoliva, et al.
CVPR, 2019.
Constrained R-CNN: A general image manipulation detection model.
C Yang, H Li, F Lin, B Jiang, et al.
ICME, 2020.
[Paper]
Media Forensics and DeepFakes.
L Verdoliva.
IEEE Journal of Selected Topics in Signal Processing, 2020.
[Paper]
The creation and detection of deepfakes: A survey.
Y Mirsky, W Lee.
ACM Computing Surveys (CSUR), 2021.
Multi-Modality Image Manipulation Detection.
C Yang, Z Wang, H Shen, H Li, et al.
ICME, 2021.
[Paper]
Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples.
S Hussain, P Neekhara, M Jere, et al.
WACV, 2021.
[Paper]
Exploiting deep generative prior for versatile image restoration and manipulation.
X Pan, X Zhan, B Dai, D Lin, CC Loy, et al.
TPAMI, 2021.
Online handwritten signature verification using feature weighting algorithm relief.
L Yang, Y Cheng, X Wang, Q Liu.
Soft Computing, 2018.
[Paper]
Characterizing and evaluating adversarial examples for Offline Handwritten Signature Verification.
LG Hafemann, R Sabourin, et al.
IEEE Transactions on Information Forensics and Security, 2020.
[Paper]
TextStyleBrush: Transfer of Text Aesthetics from a Single Example.
P Krishnan, R Kovvuri, G Pang, B Vassilev, et al.
ArXiv, 2021.
[Paper]
Video to Video Synthesis.
TC Wang, MY Liu, JY Zhu, G Liu, A Tao, J Kautz, et al.
NeurIPS, 2018.
MoCoGAN: Decomposing motion and content for video generation.
S Tulyakov, MY Liu, X Yang, et al.
CVPR, 2018.
Playable Video Generation.
W Menapace, S Lathuilière, et al.
CVPR, 2021.
[Paper]
A good image generator is what you need for high-resolution video synthesis.
Y Tian, J Ren, M Chai, K Olszewski, X Peng, et al.
ICLR, 2021.
[Paper]
From Sora What We Can See: A Survey of Text-to-Video Generation.
R Sun, Y Zhang, T Shah, J Sun, S Zhang, W Li, H Duan, B Wei, R Ranjan.
arXiv:2405.10674, 2024.
[Paper]
Sora as an agi world model? a complete survey on text-to-video generation.
J Cho, FD Puspitasari, S Zheng, J Zheng, LH Lee, TH Kim, CS Hong, C Zhang.
arXiv:2403.05131, 2024.
[Paper]
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions.
Y Zhang, Y Kang, Z Zhang, X Ding, S Zhao, X Yue.
arXiv:2402.03040, 2024.
[Paper]
[Github]
Direct-a-video: Customized video generation with user-directed camera movement and object motion.
S Yang, L Hou, H Huang, C Ma, P Wan, D Zhang, X Chen, J Liao.
SIGGRAPH, 2024.
Cameractrl: Enabling camera control for text-to-video generation.
H He, Y Xu, Y Guo, G Wetzstein, B Dai, H Li, C Yang.
arXiv:2404.02101, 2024.
[Paper]
Training-free Camera Control for Video Generation.
C Hou, G Wei, Y Zeng, Z Chen.
arXiv:2406.10126, 2024.
[Paper]
Deepfake Video Detection Using Recurrent Neural Networks.
D Güera, EJ Delp.
AVSS, 2018.
[Paper]
FaceForensics: A large-scale video dataset for forgery detection in human faces.
A Rössler, D Cozzolino, L Verdoliva, C Riess, et al.
ArXiv, 2018.
[Paper]
MesoNet: a compact facial video forgery detection network.
D Afchar, V Nozick, J Yamagishi, et al.
WIFS, 2018.
[Paper]
Face Forensics in the Wild.
T Zhou, W Wang, Z Liang, et al.
CVPR, 2021.
WaveNet: A generative model for raw audio.
A Oord, S Dieleman, H Zen, K Simonyan, et al.
ArXiv, 2016.
Applications of Deep Learning to Audio Generation.
Y Zhao, X Xia, R Togneri.
ICSM, 2018.
GANSynth: Adversarial neural audio synthesis.
J Engel, KK Agrawal, S Chen, I Gulrajani, et al.
ICLR, 2019.
Magenta.
Magenta is a research project exploring the role of machine learning in the process of creating art and music.
[Github]
All your voices are belong to us: Stealing voices to fool humans and machines.
D Mukhopadhyay, M Shirvanian, N Saxena.
ESORICS, 2015.
[Paper]
DeepSonar: Towards effective and robust detection of AI-synthesized fake voices.
R Wang, F Juefei-Xu, Y Huang, Q Guo, X Xie, et al.
MM, 2020.
[Paper]
ASVspoof 2019: Future horizons in spoofed and fake audio detection.
M Todisco, X Wang, V Vestman, M Sahidullah, et al.
ArXiv, 2019.
[Paper]
Deep4SNet: deep learning for fake speech classification.
DM Ballesteros, Y Rodriguez-Ortega, D Renza, et al.
ESWA, 2021.
[Paper]
Deep neural models for illumination estimation and relighting: A survey.
F Einabadi, JY Guillemaut, A Hilton.
Computer Graphics Forum, 2021.
Lightit: Illumination modeling and control for diffusion models.
P Kocsis, J Philip, K Sunkavalli, M Nießner, Y Hold-Geoffroy.
CVPR, 2024.
[Paper]
Retinex-Diffusion: On Controlling Illumination Conditions in Diffusion Models via Retinex Theory.
X Xing, VT Hu, JH Metzen, K Groh, S Karaoglu, T Gevers.
arXiv:2407.20785, 2024.
[Paper]
KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera.
S Izadi, D Kim, O Hilliges, D Molyneaux, et al.
UIST, 2011.
Soft 3D reconstruction for view synthesis.
E Penner, L Zhang.
ACM Transactions on Graphics (TOG), 2017.
State of the Art on 3D Reconstruction with RGB-D Cameras.
M Zollhöfer, P Stotko, A Görlitz, et al.
Computer Graphics Forum, 2018.
DISN: Deep implicit surface network for high-quality single-view 3D reconstruction.
Q Xu, W Wang, D Ceylan, R Mech, et al.
NeurIPS, 2019.
Occupancy networks: Learning 3d reconstruction in function space.
L Mescheder, M Oechsle, M Niemeyer, et al.
CVPR, 2019.
Fast Online 3D Reconstruction of Dynamic Scenes From Individual Single-Photon Detection Events.
Y Altmann, S McLaughlin, et al.
IEEE Transactions on Signal Processing, 2019.
DI-Fusion: Online Implicit 3D Reconstruction with Deep Priors.
J Huang, SS Huang, H Song, et al.
CVPR, 2021.
SP-GAN: Sphere-guided 3D shape generation and manipulation.
R Li, X Li, KH Hui, CW Fu.
ACM Transactions on Graphics (TOG), 2021.
Neural scene representation and rendering.
SMA Eslami, DJ Rezende, F Besse, F Viola, et al.
Science, 2018.
[Github]
Deferred neural rendering: Image synthesis using neural textures.
J Thies, M Zollhöfer, M Nießner.
ACM Transactions on Graphics (TOG), 2019.
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis.
B Mildenhall, PP Srinivasan, M Tancik, JT Barron, R Ramamoorthi, R Ng.
ECCV, 2020.
[Paper]
[Project]
SIREN: Implicit Neural Representations with Periodic Activation Functions.
V Sitzmann, JNP Martel, AW Bergman, DB Lindell, et al.
NeurIPS, 2020 (Oral).
[Paper]
[Github]
Neural Ray-Tracing: Learning Surfaces and Reflectance for Relighting and View Synthesis.
J Knodt, SH Baek, F Heide.
ArXiv, 2021.
[Github]
Autoint: Automatic integration for fast neural volume rendering.
DB Lindell, JNP Martel, et al.
CVPR, 2021.
[Github]
NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections.
R Martin-Brualla, N Radwan, et al.
CVPR, 2021.
Neural scene graphs for dynamic scenes.
J Ost, F Mannan, N Thuerey, et al.
CVPR, 2021.
[Github]
ACORN: Adaptive Coordinate Networks for Neural Scene Representation.
JNP Martel, DB Lindell, CZ Lin, ER Chan, et al.
SIGGRAPH, 2021.
[Github]
Awesome Neural Rendering.
Deep image or video generation approaches that enable explicit or implicit control of scene properties such as illumination, camera parameters, pose, geometry, appearance, and semantic structure.
[Github]
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion.
B Deng, R Tucker, Z Li, L Guibas, N Snavely, G Wetzstein.
SIGGRAPH, 2024.
MegaScenes: Scene-Level View Synthesis at Scale.
J Tung, G Chou, R Cai, G Yang, K Zhang, G Wetzstein, B Hariharan, et al.
ECCV, 2024.
Street-view image generation from a bird's-eye view layout.
A Swerdlow, R Xu, B Zhou.
IEEE Robotics and Automation Letters, 2024.
UrbanWorld: An Urban World Model for 3D City Generation.
Y Shang, J Chen, H Fan, J Ding, J Feng, Y Li.
ArXiv, 2024.
[Paper]
GAUDI: A neural architect for immersive 3D scene generation.
MA Bautista, P Guo, S Abnar, et al.
NeurIPS, 2022.
Text2Immersion: Generative immersive scene with 3D Gaussians.
H Ouyang, K Heal, S Lombardi, T Sun.
arXiv:2312.09242, 2023.
DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling.
H Li, H Shi, W Zhang, W Wu, Y Liao, L Wang, et al.
ArXiv, 2024.
Prompt Engineering, Tools and Methods for Immersive Experience Development.
A Rozo-Torres, WJ Sarmiento.
IEEE VR, 2024.
Speak in the Scene: Diffusion-based Acoustic Scene Transfer toward Immersive Speech Generation.
M Kim, SW Chung, Y Ji, HG Kang, MS Choi.
arXiv:2406.12688, 2024.
Neural Moderation of ASMR Erotica Content in Social Networks.
Y Chen, D Jiang, C Tan, Y Song, C Zhang, L Chen.
IEEE Transactions on Knowledge and Data Engineering, 2023.