Sincerely ask for your help #6

Open
ILearn-better opened this issue Aug 15, 2023 · 8 comments

Comments

@ILearn-better

I am a data processing engineer, and I am also working on a project of my own, an English-to-Chinese PDF translation tool, but I have run into trouble lately.
Here is my question:
How do you recover the structure and style of the translated text?

@CBIhalsen

Hello, have you solved the problem?

@ILearn-better
Author

I solved this problem a different way.

I use an AI model to recognize the page layout.

@CBIhalsen

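Do you have a better way to translate PDFs now? I currently have only two approaches: (1) not fully preserving the layout, which is very fast; (2) preserving the layout by converting to docx, then translating and replacing each paragraph individually. I could not get the author's project to run. Judging from the introduction in the video, translating a PDF took about 100 s? Did you get it to run? In this project, is the time spent on re-laying out the document, or on the translation itself?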
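
As a rough illustration of the second approach (translate each paragraph separately and swap the text in place while keeping its style), here is a minimal sketch. It is not code from any project in this thread. `Paragraph` and `translate_paragraphs` are hypothetical names, a paragraph is modeled as plain text plus a style label rather than a real docx object, and `translate` stands for whatever translation backend is used.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Paragraph:
    # Hypothetical stand-in for one paragraph of a converted document.
    text: str
    style: str  # e.g. "Heading 1" or "Normal"


def translate_paragraphs(
    paragraphs: List[Paragraph],
    translate: Callable[[str], str],
) -> List[Paragraph]:
    """Translate each paragraph on its own and keep its style unchanged."""
    out = []
    for p in paragraphs:
        if p.text.strip():
            # Only the text changes; the style label is carried over.
            out.append(replace(p, text=translate(p.text)))
        else:
            out.append(p)  # keep empty spacer paragraphs as they are
    return out
```

Because the paragraph list keeps its order and style labels, writing it back gives a document with the same structure and translated text.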

@ILearn-better
Author

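Translation itself is not very time-consuming; layout recovery is. Layout recovery is a complex piece of engineering. I use a layout-recognition AI model to identify paragraphs, images, titles, formulas, and so on, extract the text using the returned box coordinates, then translate it and put it back in its original position. The results are about the same as this project's, and it also supports translating large PDFs. My guess is that the products on the market (Baidu, Google, Sogou, WPS) all work this way, since none of them are fast. If you have a better method, I would be glad to exchange ideas.
P.S. 1. I don't think I ever ran this project; it was a while ago and I have forgotten.
2. I am quite curious how your docx approach turns out. If you are interested, you could send a sample image to [email protected]. Thanks.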
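
The box-based pipeline described in this comment (detect regions, collect the text inside each box, translate it, and put it back at the same position) can be sketched roughly as follows. This is an illustration under stated assumptions, not the commenter's implementation. `Word`, `Region`, and `translate_regions` are hypothetical names, the layout model and the PDF text extractor are assumed to have already produced the `regions` and `words` lists, and words on the same line are assumed to share the same top coordinate.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types for illustration only.
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward


@dataclass
class Word:
    text: str
    box: Box


@dataclass
class Region:
    kind: str  # e.g. "paragraph", "title", "figure", "formula"
    box: Box


def center_inside(inner: Box, outer: Box) -> bool:
    """Return True when the center of `inner` lies within `outer`."""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def translate_regions(
    regions: List[Region],
    words: List[Word],
    translate: Callable[[str], str],
    text_kinds: Tuple[str, ...] = ("paragraph", "title"),
) -> List[Tuple[Region, str]]:
    """Pair each text region with the translation of the words it contains."""
    placed = []
    for region in regions:
        if region.kind not in text_kinds:
            continue  # figures and formulas are left untouched
        inside = [w for w in words if center_inside(w.box, region.box)]
        # Reading order: top to bottom, then left to right.
        inside.sort(key=lambda w: (w.box[1], w.box[0]))
        source = " ".join(w.text for w in inside)
        if source:
            placed.append((region, translate(source)))
    return placed
```

Each returned pair keeps the original box, so a renderer can draw the translated string at the position the source text occupied.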

@ILearn-better
Author

This is my project: https://gitee.com/zhoulikun621/pdf-paper-translation (note that there are two branches). Take a look if you are interested.

@discus0434
Owner

discus0434 commented Apr 29, 2024

Sorry it took soooo long for me to get back to you. I use DiT to analyze layouts, and temporarily save them so that I can paste the translated sentences back there.

And then, I use this helper class to style the translated sentences, though its settings might only work for Japanese.
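
To make the styling step more concrete, here is a minimal sketch of one way to fit a translated sentence into a saved layout box by wrapping it and shrinking the font size. It is not the project's helper class. `char_width`, `wrap`, and `fit_text` are hypothetical names, and the width model (full-width CJK characters take about 1 em, everything else about 0.5 em) is an assumption that suits Japanese or Chinese better than proportional Latin fonts.

```python
import unicodedata
from typing import List, Tuple


def char_width(ch: str, font_size: float) -> float:
    """Rough advance width: full-width CJK ~ 1 em, everything else ~ 0.5 em."""
    wide = unicodedata.east_asian_width(ch) in ("W", "F")
    return font_size * (1.0 if wide else 0.5)


def wrap(text: str, box_width: float, font_size: float) -> List[str]:
    """Greedy per-character wrap, which suits CJK text written without spaces."""
    lines, current, used = [], "", 0.0
    for ch in text:
        w = char_width(ch, font_size)
        if current and used + w > box_width:
            lines.append(current)
            current, used = "", 0.0
        current += ch
        used += w
    if current:
        lines.append(current)
    return lines


def fit_text(
    text: str,
    box_width: float,
    box_height: float,
    max_size: float = 12.0,
    min_size: float = 6.0,
    line_spacing: float = 1.2,
) -> Tuple[float, List[str]]:
    """Shrink the font in 0.5 pt steps until the wrapped lines fit the box."""
    size = max_size
    while size > min_size:
        lines = wrap(text, box_width, size)
        if len(lines) * size * line_spacing <= box_height:
            return size, lines
        size -= 0.5
    # Fall back to the minimum size even if the text still overflows.
    return min_size, wrap(text, box_width, min_size)
```

A per-character wrap like this breaks Latin words in the middle, which is one reason settings tuned for Japanese may not carry over to other target languages.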

@isah333

isah333 commented Sep 18, 2024

Can someone translate a PDF file for me? I can paypal if necessary

@CBIhalsen

CBIhalsen commented Sep 19, 2024 via email
