Commit ccd63e6

Merge pull request #33 from GreatRSingh/main
Build PDF Book
2 parents 1922692 + 9b8cb56 commit ccd63e6

5 files changed: +97 -0 lines changed
new-website/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -2,5 +2,6 @@
 /docs
 /utils/tutorials/html-notebooks
 /utils/tutorials/ipynb-notebooks
+/utils/tutorials/storage
 /utils/tutorials/website-render-order
 /utils/tutorials/notebooks.txt

new-website/README.md

Lines changed: 5 additions & 0 deletions
@@ -80,13 +80,18 @@ A detailed description of the working of the scripts is given below.
 - The CSV file itself contains the Titles and File names of the tutorials in the order in which they should be read.

 - ### `export_tutorials.py`
+
 - This script reads the list of notebooks from `/utils/tutorials/notebooks.txt` and parses the HTML files (downloaded temporarily to `/utils/tutorials/html-notebooks`) using `BeautifulSoup`.
 - The script then creates a react component for each tutorial and exports it to the `/deepchem/pages/tutorials` directory.
 - The script also creates a json data file for each tutorial and exports it to the `/deepchem/data/tutorials` directory.
 - The template for the react components is stored in `utils/tutorials/tutorial_component_template.py`.
 Please note, that any files required by scripts are generated by the scripts themselves and are not stored in the repository.

+- ### `build_pdf_book.py`

+- The script reads the list of notebooks from `utils/tutorials/website-render-order` and converts the HTML files (downloaded temporarily to `/utils/tutorials/html-notebooks`) to PDF files using `pdfkit` and stores them in `/utils/tutorials/storage/`.
+- The script then merged these PDFs and creates the file `merged.pdf`.
+- Please note, pdfunite package is required to be installed for merging. `apt install poppler-utils`


 ## Deployment
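
The conversion step described in the README hunk above maps each entry of a render-order CSV to an HTML input and a PDF output. A minimal sketch of that mapping, assuming the `File Name` column holds notebook names ending in `.ipynb`. The helper name `pdf_job_paths` and its directory arguments are hypothetical and not part of this commit:

```python
from pathlib import Path


def pdf_job_paths(file_name, html_dir, pdf_dir):
    """Return the (HTML input, PDF output) paths for one notebook entry.

    Hypothetical helper: assumes `file_name` looks like "Some_Tutorial.ipynb".
    """
    stem = Path(file_name).stem  # drop the extension, whatever its length
    return Path(html_dir) / f"{stem}.html", Path(pdf_dir) / f"{stem}.pdf"


html_path, pdf_path = pdf_job_paths(
    "Some_Tutorial.ipynb",
    "utils/tutorials/html-notebooks",
    "utils/tutorials/storage",
)
print(html_path)
print(pdf_path)
```

Deriving both names from `Path.stem` avoids relying on fixed-length slicing, which only works when the extension has exactly the expected number of characters.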

new-website/utils/requirements.txt

30 Bytes
Binary file not shown.

build_pdf_book.py

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+"""
+This script is used to build the pdf book from DeepChem Tutorials.
+
+Requirements:
+- pdfunite
+- pdfkit
+
+Example Usage:
+- Run the script "fetch_tutorials.py" // It will fetch all the tutorials.
+- Run the script "build_pdf_book.py"
+- It may cause error, mostly due to the type of graphic used in some tutorials
+which donot compile properly, remove them from the website-render-order or fix
+them, and run this script again.
+
+NOTE:
+- NO FILES OR DIRECTORIES HAVE TO BE CREATED MANUALLY. The script will create the required directories and files.
+- Run scripts in the Top-Level folder.
+
+"""
+import os
+import pandas as pd
+import pdfkit
+from utils import numeric_sorter
+
+
+INFO_PATH = "/workspaces/deepchem.github.io/new-website/utils/tutorials/website-render-order/"
+DATA_PATH = "/workspaces/deepchem.github.io/new-website/utils/tutorials/html-notebooks/"
+PDF_PATH = "/workspaces/deepchem.github.io/new-website/utils/tutorials/storage/"
+
+files = os.listdir(INFO_PATH)
+files = sorted(files)
+
+files_list = numeric_sorter(files)
+
+def html_to_pdf():
+    """
+    Converts HTML files to PDF files.
+
+    Raises
+    ------
+    ProtocolUnknownError
+        If it faces some unknown kind of graphic.
+
+    """
+    for i in files_list:
+        chapter = pd.read_csv(INFO_PATH + "-".join(i))
+        for j in chapter["File Name"]:
+            print(i, j)
+            pdfkit.from_file(DATA_PATH + j[:-5] + "html", PDF_PATH + j[:-5] + "pdf")
+
+def merge_pdf():
+    """Merges the compiled PDFs."""
+    command = "pdfunite "
+    for i in files_list:
+        chapter = pd.read_csv(INFO_PATH + "-".join(i))
+        for j in chapter["File Name"]:
+            print(i, j)
+            command = command + PDF_PATH + j[:-5] + "pdf "
+    os.system(command + "merged.pdf")
+
+
+if __name__ == "__main__":
+    os.system("mkdir " + PDF_PATH)
+    html_to_pdf()
+    merge_pdf()
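
The script assembles the `pdfunite` call by string concatenation and runs it through `os.system`, so a path containing spaces would be split by the shell and a failing exit status goes unnoticed. One possible alternative is sketched here, assuming `pdfunite` is available on the `PATH`. The function names are hypothetical and not part of this commit:

```python
import os
import subprocess


def build_pdfunite_command(pdf_paths, output_path):
    """Build the argument list for pdfunite: sources first, destination last."""
    pdf_paths = [str(p) for p in pdf_paths]
    if not pdf_paths:
        raise ValueError("at least one source PDF is required")
    return ["pdfunite", *pdf_paths, str(output_path)]


def merge_pdfs(pdf_paths, output_path):
    """Run pdfunite and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_pdfunite_command(pdf_paths, output_path), check=True)


def ensure_dir(path):
    """Create the output directory; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
```

Passing the arguments as a list bypasses the shell entirely, and `os.makedirs(..., exist_ok=True)` could stand in for the `mkdir` call so that a second run does not report an error for an existing directory.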

new-website/utils/tutorials/utils.py

Lines changed: 26 additions & 0 deletions
@@ -8,6 +8,32 @@
 import re


+def numeric_sorter(s):
+    """
+    Sorts the tutorials according to their serial number.
+
+    Parameters
+    ----------
+    s: List[str]
+        The List to be sorted.
+
+    Returns
+    -------
+    s_sorted: List[List[str]]
+        The sorted and Broken into parts list.
+
+    """
+    s_splitted_list = []
+    s_sorted = []
+    for i in s:
+        s_splitted_list.append(i.split("-"))
+    for i in range(len(s_splitted_list)+1):
+        for j in s_splitted_list:
+            if i == int(j[0]):
+                s_sorted.append(j)
+    return s_sorted
+
+
 def to_valid_identifier(s):
     """
     Converts a given string into a valid identifier.
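
The loop in `numeric_sorter` scans the whole list once per candidate index and only matches indices from 0 to `len(s)`, so a file whose leading number is larger than the number of files would be left out of the result. A minimal sketch of an alternative built on `sorted` with a numeric key, assuming every name starts with an integer followed by `-`. The name `numeric_sorter_sketch` is hypothetical and not part of this commit:

```python
def numeric_sorter_sketch(names):
    """Split each name on "-" and order the parts by their leading integer.

    Unlike the committed loop, entries with a leading number above
    len(names) are kept. Ties keep their input order because sorted()
    is stable.
    """
    split_names = [name.split("-") for name in names]
    return sorted(split_names, key=lambda parts: int(parts[0]))
```

The return shape matches the original (a list of split parts), so callers that rebuild the file name with `"-".join(...)` would keep working.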
