
question: How to prevent yarn's Linking dependencies always running in CI #24

dbalatero opened this issue Feb 8, 2024 · 1 comment

dbalatero commented Feb 8, 2024

👋 @belgattitude, posting a question here in case you have insights around the "Linking dependencies" phase of yarn install. I've gone through tons of GH issues + docs, and don't have a good sense of how this works.

I have a monorepo with about 70 packages in it. A cold yarn install takes about 5 minutes on CI, and the bulk of that time is spent in "Linking dependencies" (3 minutes).

I'm attempting to build a Docker image for running my GitHub Actions jobs, containing a prewarmed cache of:

  • all node_modules folders in the monorepo
  • the NPM global cache folder
  • the .yarn/install-state.gz

We commit the .yarn/cache folder with the package .zip files as part of Yarn zero-installs, so I don't need to cache those, as they'll be there when I git clone the monorepo in CI.

The problem

My issue is that when I restore these caches inside the Docker image during a GitHub Actions CI run, the yarn install process relinks the dependencies, which means any packages with native (C/C++) addons get rebuilt.

I get output like this, and the Linking dependencies phase takes around 3 minutes to run every time:

  ➤ YN0008: │ @datadog/native-appsec@npm:3.2.0 must be rebuilt because its dependency tree changed
  ➤ YN0008: │ @datadog/native-metrics@npm:2.0.0 must be rebuilt because its dependency tree changed
  ➤ YN0008: │ @datadog/pprof@npm:3.1.0 must be rebuilt because its dependency tree changed
  ➤ YN0008: │ protobufjs@npm:7.2.4 must be rebuilt because its dependency tree changed
  ➤ YN0008: │ sharp@npm:0.32.6 must be rebuilt because its dependency tree changed
  ➤ YN0008: │ core-js@npm:3.22.5 must be rebuilt because its dependency tree changed
  ➤ YN0008: │ core-js-pure@npm:3.22.5 must be rebuilt because its dependency tree changed
  ➤ YN0008: │ nodemon@npm:2.0.16 must be rebuilt because its dependency tree changed
  ➤ YN0008: │ esbuild@npm:0.19.2 must be rebuilt because its dependency tree changed

My expectation

The Linking dependencies phase should be instant, because we restored the caches.

My questions

  1. Is it possible to avoid recompiling these libraries?
  2. Am I missing some obvious flags or cache folders here?
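
For question 1, one lever I haven't actually tried yet (so, an untested idea): Yarn's skip-build install mode, which as far as I can tell runs the install but skips executing build scripts:

# Untested: skip build scripts entirely, on the assumption that the restored
# node_modules already contains the compiled artifacts
yarn install --immutable --mode=skip-build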

My Docker setup

My main scheme for setting up the Docker images is to:

  • copy the git repo over
  • run yarn install
  • save off all the artifacts
# Copy the whole git repo over and run yarn install; it's the easiest way to generate all the node_modules/etc. folders.
COPY . ./
RUN yarn install --immutable --inline-builds

# Recursively tar up every node_modules directory (requires the `fd` binary in the image)
RUN fd -0 -t d node_modules | tar --zstd -cf /build-artifacts/node_modules_archive.tar.zst --null -T -

# Copy over the yarn install state
RUN cp .yarn/install-state.gz /build-artifacts/yarn-install-state.gz

# Copy over the NPM global cache folder
RUN cd $(npm config get cache) && tar --zstd -cf /build-artifacts/npm_global_cache.tar.zst *
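
For reference, this bakes the following layout into the image:

/build-artifacts/
├── node_modules_archive.tar.zst   # every node_modules dir in the monorepo
├── yarn-install-state.gz          # snapshot of .yarn/install-state.gz
└── npm_global_cache.tar.zst       # contents of $(npm config get cache)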

My composite yarn install action

Taking some inspiration from your composite action, this just grabs all the cache artifacts and tries to untar them back into place:

name: "fast monorepo yarn install"
description: |
  Our base CI image contains prebaked npm + yarn + node_modules caches inside
  an artifacts directory.

  This shared action will:

    - Set up all the caches from the artifacts directory
    - Run `yarn install --immutable --inline-builds` to resolve any drift

  This action _will_ get slower over time as we add more dependencies to the
  monorepo, so rebuilding the base CI image every so often to resolve package
  drift is advised.

runs:
  using: composite
  steps:
    - name: Find the NPM global cache directory
      id: npm-config
      shell: bash
      run: |
        echo "NPM_GLOBAL_CACHE_FOLDER=$(npm config get cache)" >> $GITHUB_OUTPUT

    - name: Move yarn install state into place
      shell: bash
      run: |
        mv /build-artifacts/yarn-install-state.gz .yarn/install-state.gz

    - name: Unpack npm global cache
      shell: bash
      run: |
        mkdir -p "${{ steps.npm-config.outputs.NPM_GLOBAL_CACHE_FOLDER }}"
        tar xf /build-artifacts/npm_global_cache.tar.zst -C "${{ steps.npm-config.outputs.NPM_GLOBAL_CACHE_FOLDER }}"

    - name: Unpack recursive node_modules cache directly into the monorepo
      shell: bash
      run: |
        tar xf /build-artifacts/node_modules_archive.tar.zst -C .

    - name: Run yarn install
      shell: bash
      run: |
        yarn install --immutable --inline-builds
      env:
        # Use local cache folder to keep downloaded archives
        YARN_ENABLE_GLOBAL_CACHE: "false"

        # Reduce node_modules size
        YARN_NM_MODE: "hardlinks-local"

        # Ensure we're using the local repo's cache
        YARN_CACHE_FOLDER: ".yarn/cache"

Anyways, if you have any quick insights about this, I'd be super curious to hear them!

dbalatero changed the title from "question: How to prevent Linking dependencies" to "question: How to prevent yarn's Linking dependencies always running in CI" on Feb 8, 2024

dbalatero commented Feb 8, 2024

Some things I tried:

  • ✅ If I delete all the node_modules in the Docker container, then restore them back from the tar archive, yarn install is still fast
  • ✅ If I temporarily move the `npm config get cache` folder out of the way, yarn install is still fast
    • Maybe I don't need to cache this after all?
  • ❌ If I move .yarn/install-state.gz out of the project, yarn install tries to rebuild all the packages in Linking dependencies step
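
Roughly, the experiment for that last bullet:

# Fast: install state present, Linking dependencies is near-instant
time yarn install --immutable --inline-builds

# Slow: without the install state, every native package rebuilds during Linking
mv .yarn/install-state.gz /tmp/
time yarn install --immutable --inline-builds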

The install-state.gz file is a binary blob (a gzipped, v8.serialize'd object), so I had to extract some code from yarn itself to inspect it:

// minimal package.json: util, zlib, and v8 are Node builtins, so the only
// real dependency is @yarnpkg/fslib
{
  "type": "module",
  "dependencies": {
    "@yarnpkg/fslib": "^3.0.2"
  }
}

Minimal print program:

import { promisify } from "node:util";
import v8 from "node:v8";
import zlib from "node:zlib";
import { xfs } from "@yarnpkg/fslib";

const gunzip = promisify(zlib.gunzip);

async function restoreInstallState() {
  const installStatePath = "./yarn-install-state.gz";

  // install-state.gz is just a gzipped, v8.serialize'd object
  const installStateBuffer = await gunzip(
    await xfs.readFilePromise(installStatePath)
  );
  const installState = v8.deserialize(installStateBuffer);

  // Note: Map-valued fields (which yarn uses for most of these) print as {}
  console.log(JSON.stringify(installState, null, 2));
}

restoreInstallState();
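
To run it (assuming the script above is saved as print-state.mjs, with the install state copied next to it):

cp .yarn/install-state.gz ./yarn-install-state.gz
yarn install   # pulls @yarnpkg/fslib
node print-state.mjs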

This yields an object like this:

{
  "linkersCustomData": {},
  "accessibleLocators": {},
  "conditionalLocators": {},
  "disabledLocators": {},
  "optionalBuilds": {},
  "storedDescriptors": {},
  "storedResolutions": {},
  "storedPackages": {},
  "lockFileChecksum": "8977399ec15a0316db47a347296e1ca4fd05477336fa1f5901470c13da158f70e83985dd40d77bb01af490e26dd199edca0a837fff2a212be1b64b7e3e1fdfb2",
  "skippedBuilds": {},
  "storedBuildState": {}
}

This makes me think that somehow lockFileChecksum differs between when I build the Docker image and when I actually run the GitHub Action in CI inside it.
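
A cheap way to test that hypothesis (assuming the checksum is derived from the yarn.lock contents) is to record the lockfile's hash at image build time and compare it in CI:

# At the end of the Dockerfile:
RUN sha512sum yarn.lock > /build-artifacts/yarn.lock.sha512

# In CI, before running yarn install:
sha512sum -c /build-artifacts/yarn.lock.sha512 || echo "yarn.lock drifted since image build"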

Will keep digging.
