update puppeteer and chromium
ckuijjer committed Apr 8, 2024
1 parent 79ce3e8 commit 6a3278e
Showing 4 changed files with 215 additions and 41 deletions.
26 changes: 21 additions & 5 deletions README.md
@@ -51,10 +51,6 @@ if (require.main === module) {

with `LOG_LEVEL=debug` set to have debug output from the scrapers show up in the console

### Installing Chromium for use by puppeteer-core locally

See https://github.com/Sparticuz/chromium#running-locally--headlessheadful-mode for instructions

## CI/CD

GitHub Actions is used: `web/` uses JamesIves/github-pages-deploy-action to deploy to the _gh-pages_ branch, and the GitHub Pages settings take the source branch _gh-pages_, which triggers GitHub's built-in _pages-build-deployment_ workflow
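A deploy step along these lines would do this (a hypothetical sketch only; the actual workflow file is not part of this commit, and the `folder` value is an assumption):

```yml
# hypothetical sketch of the web/ deploy step in the GitHub Actions workflow
- name: Deploy web/ to gh-pages
  uses: JamesIves/github-pages-deploy-action@v4
  with:
    folder: web/build # assumed build output folder
    branch: gh-pages
```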
@@ -74,6 +70,26 @@ aws dynamodb scan --table-name expatcinema-scrapers-analytics --profile casper >
- Use https://favicongrabber.com/ to grab a favicon for the cinema.json file
- Use https://www.google.com/s2/favicons?domain=www.natlab.nl to get the favicon for the cinema.json file

## Chromium

Some scrapers need to run in a real browser, for which we use puppeteer and a lambda layer with Chromium.

### Upgrading puppeteer and chromium

- Find the preferred version of Chromium for the latest version of puppeteer at https://pptr.dev/supported-browsers, e.g. _Chrome for Testing 123.0.6312.105 - Puppeteer v22.6.3_
- Check if this version of Chromium is available (for running locally) at https://github.com/Sparticuz/chromium; check its package.json
- Check if this version of Chromium is available (as a lambda layer) at https://github.com/shelfio/chrome-aws-lambda-layer, e.g. _Has Chromium v123.0.1_ and _arn:aws:lambda:us-east-1:764866452798:layer:chrome-aws-lambda:45_

```sh
yarn add puppeteer@22.6.3 @sparticuz/chromium@^123.0.1
```

After installing the new versions of puppeteer and chromium, update the lambda layer in serverless.yml by doing a search and replace on `arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:`, changing e.g. `44` to `45`
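That substitution can also be scripted (a sketch using GNU sed's `-i` for the in-place edit of serverless.yml; on macOS use `sed -i ''`), demonstrated here on the ARN string itself:

```sh
# sketch: the layer-version bump, shown on the ARN string;
# run the same sed expression with -i over serverless.yml to edit the file
echo 'arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:44' \
  | sed 's/chrome-aws-lambda:44/chrome-aws-lambda:45/'
# prints arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:45
```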

### Installing Chromium for use by puppeteer-core locally

See https://github.com/Sparticuz/chromium#running-locally--headlessheadful-mode for instructions

## Troubleshooting

When running a puppeteer-based scraper locally, e.g. `yarn tsx scrapers/ketelhuis.ts`, and getting an error like
@@ -82,4 +98,4 @@ When running a puppeteer based scraper locally, e.g. `yarn tsx scrapers/ketelhui
Error: Failed to launch the browser process! spawn /tmp/localChromium/chromium/mac_arm-1205129/chrome-mac/Chromium.app/Contents/MacOS/Chromium ENOENT
```

you need to install Chromium locally: run `yarn install-chromium` to do so and update `LOCAL_CHROMIUM_EXECUTABLE_PATH` in `browser.ts` to point to the Chromium executable
you need to install Chromium locally: run `yarn install-chromium` to do so and update `LOCAL_CHROMIUM_EXECUTABLE_PATH` in `browser.ts` to point to the Chromium executable. See https://github.com/Sparticuz/chromium#running-locally--headlessheadful-mode for instructions
3 changes: 2 additions & 1 deletion cloud/package.json
@@ -31,7 +31,7 @@
"@apollo/client": "^3.9.10",
"@aws-lambda-powertools/logger": "^2.0.3",
"@middy/core": "^5.3.2",
"@sparticuz/chromium": "^117.0.0",
"@sparticuz/chromium": "^123.0.1",
"axios": "^1.6.8",
"camelcase-keys": "^9.1.3",
"cross-fetch": "^4.0.0",
@@ -45,6 +45,7 @@
"p-map": "7.0.2",
"p-retry": "6.2.0",
"public-ip": "^6.0.2",
"puppeteer": "22.6.3",
"puppeteer-core": "^21.3.2",
"ramda": "^0.29.1",
"react": "^18.2.0",
10 changes: 5 additions & 5 deletions cloud/serverless.yml
@@ -79,13 +79,13 @@ functions:
SCRAPERS: ${env:SCRAPERS, ''} # '' as default value, as SCRAPERS is the only optional env var
SCRAPEOPS_API_KEY: ${env:SCRAPEOPS_API_KEY}
layers:
- arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:38 # https://github.com/shelfio/chrome-aws-lambda-layer
- arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:45 # https://github.com/shelfio/chrome-aws-lambda-layer

playground:
handler: handler.playground
timeout: 300
layers:
- arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:38 # https://github.com/shelfio/chrome-aws-lambda-layer
- arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:45 # https://github.com/shelfio/chrome-aws-lambda-layer
environment:
DYNAMODB_MOVIE_METADATA: ${env:DYNAMODB_MOVIE_METADATA}
TMDB_API_KEY: ${env:TMDB_API_KEY}
@@ -102,7 +102,7 @@ functions:
- cloudwatchLog:
logGroup: /aws/lambda/expatcinema-${self:provider.stage}-scrapers
layers:
- arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:38 # https://github.com/shelfio/chrome-aws-lambda-layer
- arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:45 # https://github.com/shelfio/chrome-aws-lambda-layer

analytics:
handler: handler.analytics
@@ -113,7 +113,7 @@
environment:
DYNAMODB_ANALYTICS: ${self:custom.scrapersAnalyticsTableName}
layers:
- arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:38 # https://github.com/shelfio/chrome-aws-lambda-layer
- arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:45 # https://github.com/shelfio/chrome-aws-lambda-layer

fillAnalytics:
handler: handler.fillAnalytics
@@ -122,7 +122,7 @@
PRIVATE_BUCKET: ${self:custom.scrapersOutputBucketName}
DYNAMODB_ANALYTICS: ${self:custom.scrapersAnalyticsTableName}
layers:
- arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:38 # https://github.com/shelfio/chrome-aws-lambda-layer
- arn:aws:lambda:eu-west-1:764866452798:layer:chrome-aws-lambda:45 # https://github.com/shelfio/chrome-aws-lambda-layer

resources:
Description: expatcinema.com - ${self:provider.stage} stage
