VNC (#52)
* Some base

* Testing

* Working vnc but not working HEADFULL

* Working!

* Not working solution in HEADLESS=false

* Working VNC!

* Puppeteer version update

* Simplified Dockerfile

* Docs

* Changes due to self-review

* Dependencies
MatthewZMSU authored Aug 22, 2024
1 parent 581e3be commit c6225c0
Showing 10 changed files with 1,357 additions and 815 deletions.
2 changes: 1 addition & 1 deletion .dockerignore
@@ -1,3 +1,3 @@
node_modules
.gitignore
.git
.git
16 changes: 13 additions & 3 deletions Dockerfile
@@ -1,10 +1,14 @@
FROM node:18

ENV USER=root
ENV DISPLAY=:1
ENV RESOLUTION=1080x720

# Install latest chrome dev package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
# Note: this installs the necessary libs to make the bundled version of Chromium that Puppeteer
# installs, work.
RUN apt-get update \
&& apt-get install -y wget gnupg \
&& apt-get install -y wget gnupg tightvncserver xfce4 xfce4-goodies xfonts-base dbus-x11 \
&& wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor -o /usr/share/keyrings/googlechrome-linux-keyring.gpg \
&& sh -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/googlechrome-linux-keyring.gpg] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
@@ -20,11 +24,17 @@ ENV NODE_PATH=/app/node_modules
RUN groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser \
&& mkdir -p /home/pptruser/Downloads \
&& chown -R pptruser:pptruser /home/pptruser \
&& chown -R pptruser:pptruser /app
&& chown -R pptruser:pptruser /app \
&& chmod +x start_vnc.sh \
&& chmod +x start_container.sh

USER pptruser

RUN yarn install

# puppeteer-service
EXPOSE 3000
CMD npm start
# VNC-server
EXPOSE 5901

CMD ./start_container.sh
26 changes: 18 additions & 8 deletions README.md
@@ -18,27 +18,35 @@ To start service run the docker container.
Since the Dockerfile adds a pptr user as a non-privileged user, it may not have all the necessary privileges.
So you should use `docker run --cap-add=SYS_ADMIN` option.
```shell script
$ docker run -d -p 3000:3000 --name scrapy-puppeter-service --cap-add SYS_ADMIN isprascrawlers/scrapy-puppeteer-service
$ docker run -d -p 3000:3000 --name scrapy-puppeteer-service --cap-add SYS_ADMIN isprascrawlers/scrapy-puppeteer-service
```

To run example which shows how to deploy several instances of service with load balancer use this command.
```shell script
$ docker-compose up -d
```

To run headful puppeteer in the container, provide the `HEADLESS=false` environment variable to the container.
In this case a VNC server is started at `localhost:5901` with the password `password`.
You may change the password by providing the `VNC_PASSWORD` environment variable.

```shell script
$ docker run -d -p 3000:3000 -p 5901:5901 -e HEADLESS=false -e VNC_PASSWORD=puppeteer --name scrapy-puppeteer-service --cap-add SYS_ADMIN scrapy-puppeteer-service
```

## API

Here is the list of implemented methods that could be used to connect to puppeteer.
For All requests puppeteer browser creates new incognito browser context and new page in it.
If your want to reuse your browser context simple send context_id in your query.
All request return their context ids in response.
Also you could reuse your browser page and more actions with it.
Also, you could reuse your browser page and more actions with it.
In order to do so you should send in your request pageId that is returned in your previous request,
that would make service reuse current page and return again its pageId.
If you want to close the page you are working with you should send in query param "closePage" with non-empty value.
If you want your requests on page make through proxy, just add to normal request "proxy" param.
Proxy username and password params are optional.
Also you can add extra http headers to each request that is made on page.
Also, you can add extra http headers to each request that is made on page.
```json5
{
//request params
@@ -74,9 +82,10 @@ This POST method allows to goto a page with a specific url in puppeteer.

Params:

url - the url which puppeteer should navigate to.
navigationOptions - [possible options to use for request.](https://pptr.dev/api/puppeteer.page.goto#remarks)
waitOptions - [wait for selector](https://pptr.dev/api/puppeteer.page.waitforselector), [xpath](https://pptr.dev/api/puppeteer.page.waitforxpath), or timeout after navigation completes.
url - the url which puppeteer should navigate to. \
navigationOptions - [possible options to use for request.](https://pptr.dev/api/puppeteer.page.goto#remarks) \
waitOptions - [wait for selector](https://pptr.dev/api/puppeteer.page.waitforselector), [xpath](https://pptr.dev/api/puppeteer.page.waitforxpath), or timeout after navigation completes. \
harRecording - whether to start writing HAR or not.

Example request body
```json5
@@ -98,7 +107,8 @@ Example request body
"options": { // <object> Options to wait for elements (see https://pptr.dev/api/puppeteer.waitforselectoroptions)
"timeout": 10000
}
}
},
"harRecording": true,
}
```
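
Reviewer note: the new `harRecording` flag is easiest to see from the client side. The sketch below builds a request for the goto method with the fields shown in the example body above. The `/goto` path, the helper name `buildGotoRequest`, and the default timeout are assumptions for illustration; only `url`, `waitOptions`, and `harRecording` come from the README.

```javascript
// Hypothetical helper (not part of this commit): builds a POST request for the
// goto method. The "/goto" path is an assumption based on routes/goto.js.
function buildGotoRequest(serviceUrl, targetUrl, { selector, timeout = 10000, harRecording = false } = {}) {
  // Fields documented in the README: url, waitOptions, harRecording.
  const body = { url: targetUrl, harRecording };
  if (selector) {
    body.waitOptions = { selector, options: { timeout } };
  }
  return {
    endpoint: new URL('/goto', serviceUrl).toString(),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}
```

With Node 18 or later, the result could be passed straight to the built-in `fetch`, for example `const { endpoint, init } = buildGotoRequest('http://localhost:3000', 'https://example.com', { harRecording: true }); const res = await fetch(endpoint, init);`.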

@@ -205,7 +215,7 @@ async function action(page, request) {

This POST method returns screenshots of current page more.
Description of options you can see on [puppeteer GitHub](https://github.com/GoogleChrome/puppeteer/blob/v1.19.0/docs/api.md#pagescreenshotoptions).
The path options is omitted in options. Also the only possibly encoding is `base64`.
The path options is omitted in options. Also, the only possible encoding is `base64`.

Example request body:
```json5
5 changes: 4 additions & 1 deletion app.js
@@ -76,7 +76,10 @@ async function setupBrowser() {
{
headless: HEADLESS,
defaultViewport: { width: VIEWPORT_WIDTH, height: VIEWPORT_HEIGHT },
timeout: CONNECT_TIMEOUT
timeout: CONNECT_TIMEOUT,
args: [
"--no-sandbox",
]
});
browser.on('disconnected', setupBrowser);
app.set('browser', browser);
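
Reviewer note: this diff does not show how `HEADLESS`, `VIEWPORT_WIDTH`, `VIEWPORT_HEIGHT`, and `CONNECT_TIMEOUT` are derived. A minimal sketch of one way to map environment variables onto the launch options above follows. The helper name `buildLaunchOptions` and every default value are assumptions, not the project's actual code. Note that `--no-sandbox` disables Chromium's sandbox, which trades process isolation for compatibility inside containers.

```javascript
// Hypothetical helper: maps environment variables to puppeteer.launch() options.
// All defaults below are illustrative assumptions, not values from this repository.
function buildLaunchOptions(env = {}) {
  // Only the literal string "false" (in any letter case) selects headful mode.
  const headless = String(env.HEADLESS ?? 'true').toLowerCase() !== 'false';
  return {
    headless,
    defaultViewport: {
      width: Number(env.VIEWPORT_WIDTH ?? 1280),
      height: Number(env.VIEWPORT_HEIGHT ?? 720),
    },
    timeout: Number(env.CONNECT_TIMEOUT ?? 180000),
    // Same flag as in the diff above.
    args: ['--no-sandbox'],
  };
}
```

The returned object could then be passed to the launch call, for instance `buildLaunchOptions(process.env)`.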
2 changes: 1 addition & 1 deletion helpers/utils.js
@@ -132,7 +132,7 @@ async function newContext(browser, options = {}) {
}

try {
const context = await browser.createIncognitoBrowserContext(options);
const context = await browser.createBrowserContext(options);
limitContext.incContextCounter();
timeoutContext.setContextTimeout(context);
return context;
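
Reviewer note: the rename from `createIncognitoBrowserContext` to `createBrowserContext` goes together with the Puppeteer bump to `^22.15.0` in `package.json`. To my knowledge the old method name was dropped in Puppeteer 22. If code ever had to run against both major versions, a small shim along these lines could be used. The helper name `createIsolatedContext` is hypothetical and not part of this commit.

```javascript
// Hypothetical compatibility shim: prefers the newer method name and falls back
// to the older one when running against an earlier Puppeteer release.
function createIsolatedContext(browser, options = {}) {
  const method = browser.createBrowserContext ?? browser.createIncognitoBrowserContext;
  if (typeof method !== 'function') {
    throw new TypeError('browser object cannot create a new browser context');
  }
  // With a real Puppeteer Browser this returns a Promise<BrowserContext>.
  return method.call(browser, options);
}
```

In `newContext` this would be awaited the same way as the direct call, for example `const context = await createIsolatedContext(browser, options);`.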
4 changes: 2 additions & 2 deletions package.json
@@ -1,6 +1,6 @@
{
"name": "scrapy-puppeteer-service",
"version": "0.3.5",
"version": "0.3.6",
"private": true,
"scripts": {
"start": "node ./bin/www"
@@ -20,7 +20,7 @@
"fingerprint-injector": "^2.1.30",
"morgan": "~1.10.0",
"npm-run-all": "^4.1.5",
"puppeteer": "^20.1.2",
"puppeteer": "^22.15.0",
"puppeteer-extra": "^3.3.6",
"puppeteer-extra-plugin-recaptcha": "^3.6.8",
"puppeteer-extra-plugin-stealth": "^2.11.2",
3 changes: 2 additions & 1 deletion routes/goto.js
@@ -24,7 +24,8 @@ async function action(page, request) {
// "selector": <string> Wait for element by selector (see https://pptr.dev/api/puppeteer.page.waitforselector)
// "xpath": <string> Wait for element by xpath (see https://pptr.dev/api/puppeteer.page.waitforxpath)
// "options": <object> Options to wait for elements (see https://pptr.dev/api/puppeteer.waitforselectoroptions)
// }
// },
// "harRecording": true,
// }
//
router.post('/', async function (req, res, next) {
16 changes: 16 additions & 0 deletions start_container.sh
@@ -0,0 +1,16 @@
#!/bin/bash

# Start VNC server if puppeteer is headfull
if [ "${HEADLESS}" == "false" ]; then
printf "%s\n%s\nn" "${VNC_PASSWORD:=password}" "${VNC_PASSWORD:=password}" | vncpasswd
./start_vnc.sh &
fi

# Start scrapy-puppeteer-service in the background so that `wait -n` below can observe it
yarn start &

# Wait for any process to exit
wait -n

# Exit with status of process that exited first
exit $?
8 changes: 8 additions & 0 deletions start_vnc.sh
@@ -0,0 +1,8 @@
#!/bin/bash

echo "Starting VNC server at $RESOLUTION..."
vncserver -kill :1 || true
vncserver -geometry "${RESOLUTION}" &
echo "VNC server started at ${RESOLUTION}! ^-^"

tail -f /dev/null