Hi everyone,
I have been working on training a talking face model using the Hallo code, but I've encountered several issues that I need some advice on. We used a dataset comprising 32 hours of VFHQ and 12 hours of HDTF videos, without performing any data cleaning.
Issue Description:
Background Artifacts: Large "blotchy" artifacts appear in the background, many of them resembling "hands." We suspect this is due to the presence of hand gestures in the dataset. Did you perform any data cleaning to remove frames with hands or other unwanted elements? Or have you trained a version without data cleaning, and did you encounter similar issues?
Lip Sync Mismatch: The lip movements do not match the audio accurately. Despite aligning our training parameters, steps, and resources with the original code, the synchronization between audio and lip movements is significantly worse than the results achieved with the pre-trained model provided by the authors. Did you use any specific tricks or techniques to improve lip synchronization?
Training Details:
Model Architecture: Hallo code for talking face generation
Dataset: 32 hours of VFHQ + 12 hours of HDTF videos (uncleaned)
Training Parameters: Aligned with the parameters provided in the original code
Request for Advice:
Has anyone encountered similar issues with background artifacts and lip sync mismatch in talking face models?
Are there any recommended data cleaning steps or techniques to mitigate these artifacts and improve lip synchronization?
Any insights or suggestions would be greatly appreciated!
Thank you in advance for your help!
Hi, I also encountered similar issues. Did you find any solution to mitigate the artifacts or improve lip synchronization?
It seems the only real fix is to clean the dataset. As long as a high-quality dataset is used, the problems above do not occur.
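For reference, here is a minimal sketch of the kind of cleaning pass this implies: dropping clips that contain visible hands or have poor audio-visual sync. Everything in it is an assumption, not the authors' actual pipeline: the `ClipStats` fields, the thresholds, and the idea that per-frame hand detections come from something like MediaPipe Hands and the sync score from a SyncNet-style model are all illustrative.

```python
from dataclasses import dataclass

@dataclass
class ClipStats:
    """Hypothetical per-clip statistics, precomputed by external tools
    (e.g. a hand detector such as MediaPipe Hands, and a SyncNet-style
    audio-visual sync scorer). Not part of the Hallo codebase."""
    name: str
    hand_frame_ratio: float   # fraction of frames where a hand was detected
    sync_confidence: float    # audio-visual sync confidence (higher = better)

# Illustrative thresholds, not values from the Hallo paper or code.
MAX_HAND_RATIO = 0.05   # drop clips where >5% of frames contain hands
MIN_SYNC_CONF = 3.0     # drop clips with low sync confidence

def filter_clips(clips):
    """Keep only clips that pass both cleaning criteria."""
    return [
        c for c in clips
        if c.hand_frame_ratio <= MAX_HAND_RATIO
        and c.sync_confidence >= MIN_SYNC_CONF
    ]

clips = [
    ClipStats("a.mp4", 0.00, 5.2),  # clean background, good sync -> keep
    ClipStats("b.mp4", 0.40, 6.0),  # hands in 40% of frames -> drop
    ClipStats("c.mp4", 0.01, 1.5),  # poor lip sync -> drop
]
kept = filter_clips(clips)
print([c.name for c in kept])  # -> ['a.mp4']
```

The same two signals (hand presence and sync confidence) map directly to the two symptoms reported above: hand-shaped background artifacts and lip-sync mismatch.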
Clip+MYHQB49k6AU+P0+C1+F50229-50388_reference_Clip+MYHQB49k6AU+P0+C1+F50229-50388_6.4s_25fps.mp4
Clip+d5LmekndQyw+P0+C0+F8739-8922_reference_Clip+d5LmekndQyw+P0+C0+F8739-8922_6.16s_25fps.mp4