Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make NPM post-processing agnostic to node_modules location #31519

Closed
wants to merge 3 commits into from
Closed

Make NPM post-processing agnostic to node_modules location #31519

wants to merge 3 commits into from

Conversation

kdmccormick
Copy link
Member

@kdmccormick kdmccormick commented Jan 10, 2023

Description

See openedx-unsupported/wg-developer-experience#150 for the full background.

This PR handles this particular piece of that issue:

  • The edx-platform asset pipeline unfortunately needs to reach into node_modules and copy stuff out of it, notably in the case of node_modules dependencies that also need to be copied into "vendor" directories for use by non-Webpack frontend code. So: We need to update the assets pipeline to tolerate the fact that node modules have moved to /openedx/node_modules. As necessary, do one or more of these:
    • Edit edx-platform/pavelib/assets.py to be agnostic to the location of node_modules. Keep in mind that assets.py must continue working if node_modules is located inside the edx-platform directory as to not break Devstack.
    • Reimplement parts of edx-platform/pavelib/assets.py upstream as shell scripts, agnostic to the location of node_modules. Point both edx-platform/pavelib/assets.py and Tutor's openedx-assets at that reimplementation.
    • Reimplement more parts of edx-platform/pavelib/assets.py into Tutor's openedx-assets script, agnostic to the location of node_modules, but don't upstream the changes.

Specifically, this PR:

  • reimplements edx-platform/pavelib/assets.py:process_npm_assets as a shell script (first commit),
  • allows the script to be parameterized on the location of node_modules (second commit).
  • It also deletes the symlink common/static/edx-ui-toolkit/js -> ../../../node_modules/edx-ui-toolkit/src/js in favor of just copying edx-ui-tookit/js as part of process_npm_assets (third commit).

You can view the individual commits for more depth on each step.

You can see here how Tutor will use this new shell script.

Why not just edit edx-platform/pavelib/assets.py instead? Frankly, I could have. I just find the pavelib code to be overly complex and way too dependency-heavy what is really just a set of build scripts. Furthermore, Arbi-BOM's good work with unit tests affirms that we are moving away from pavelib. With that in mind, I decided to fully re-implement rather than update process_npm_assets in place. If you disagree with this decision, let me know; I'm happy to revisit it.

This is not meant to be a breaking change for any consumers of edx-platform.

Supporting information

Part of:

Blocks:

Related to

Testing instructions

  1. Check out the latest version of Tutor nightly
  2. Ensure your images are fresh: tutor config save && tutor images pull openedx && tutor dev dc build lms
  3. Check the contents of the "vendor" directories in both local and dev mode. Record them somewhere.
    tutor local run lms ls common/static/common/js/vendor
    tutor local run lms ls common/static/common/css/vendor
    tutor local run lms ls common/static/edx-ui-toolkit/js/*
    
    tutor dev run lms ls common/static/common/js/vendor
    tutor dev run lms ls common/static/common/css/vendor
    tutor dev run lms ls common/static/edx-ui-toolkit/js/*
    
  4. Check out this branch of Tutor.
  5. Build the lms image using this PR's edx-platform branch:
    tutor config save --set EDX_PLATFORM_REPOSITORY=https://github.com/kdmccormick/edx-platform
    tutor config save --set EDX_PLATFORM_VERSION=kdmccormick/no-node-modules-links
    tutor images build openedx
    tutor dev dc build lms
    
  6. Repeat step (3) and ensure the directory contents match.

Deadline

No hard deadline, but sooner is better, since this is one small piece in a greater effort to make tutor dev an acceptable replacement for devstack.

Other information

N/A

@kdmccormick
Copy link
Member Author

@davidjoy @regisb @jmbowman These are the node_modules changes that I mentioned in a couple meetings today. Before I polish this PR up, I would appreciate any input on whether this seems like a reasonable direction!

@davidjoy
Copy link
Contributor

Apologies for calling the original task into question a bit, but this very first sentence feels suspect to me:

The edx-platform asset pipeline unfortunately needs to reach into node_modules and copy stuff out of it, notably in the case of node_modules dependencies that also need to be copied into "vendor" directories for use by non-Webpack frontend code.

This is a very odd situation that doesn't follow the normal rules of npm packages. I'd expect non-Webpack frontend code living in a vendor directory to have its own package.json file that lists its dependencies, and that they should also be npm install'd. Are these things just not npm packages?

I guess it's already happening that we're sharing the stuff we're copying out of node_modules? Do we ever have version skew issues? I'd expect so if edx-platform's version of its dependencies is being used for other, arbitrary JS code... seems like it would come up from time to time and be hard to debug.

Another option that comes to mind would be for us to install the needed "stuff" as global npm packages (i.e., npm i -g <package name>) for use by this vendor code. That would feel more canonical, and would allow edx-platform to keep its node_modules directory in a standard, vanilla location. It'd help us isolate the weirdness to the vendor stuff.

Sidebar, what is the vendor stuff we're referring to?! Got a quick pointer?

@davidjoy
Copy link
Contributor

I find the rules of how npm determines which node_modules directory to use to be a bit confusing, but my understanding is that it crawls up until it finds a package. When installing, it does the same thing... my concern is that by moving this up a directory, we create a weird situation where any siblings of edx-platform are finding a node_modules directory above them and resolving dependencies to it. Maybe that's not a thing that'll actually happen in reality, but it sort of puts us in a really confusing spot.

Some more information we could stare at here: https://docs.npmjs.com/cli/v8/configuring-npm/folders

@kdmccormick
Copy link
Member Author

Thanks for taking a look @davidjoy .

This is a very odd situation that doesn't follow the normal rules of npm packages. I'd expect non-Webpack frontend code living in a vendor directory to have its own package.json file that lists its dependencies, and that they should also be npm install'd. Are these things just not npm packages?

Indeed it is odd. I do not know exactly how we got to this point, but it feels like the result of moving from one framework to another without cleaning up the old stuff first. git-blame on most of the files in question sent me back 5-10 years.

I guess it's already happening that we're sharing the stuff we're copying out of node_modules?

Yup. I'm just tweaking what's already going on. Unless you pass a custom --node-modules-path= to the script, there will no change to the build output.

Do we ever have version skew issues? I'd expect so if edx-platform's version of its dependencies is being used for other, arbitrary JS code... seems like it would come up from time to time and be hard to debug.

I mean, the JS that is using these copied depencies is not "arbitrary other" JS; it's just older JS in edx-platform. Anyway, if you take a look at the list of dependencies, they don't strike me as things that we are keeping particularly fresh. To pick a random example, the last time our pin of moment-timezone was updated was >5 years ago.

(To be clear, I'm not defending the system, I'm just guessing as to why it's held together all this time :)

Another option that comes to mind would be for us to install the needed "stuff" as global npm packages (i.e., npm i -g ) for use by this vendor code. That would feel more canonical, and would allow edx-platform to keep its node_modules directory in a standard, vanilla location. It'd help us isolate the weirdness to the vendor stuff.

That would defeat the purpose of this PR, which is to allow node_modules to be placed in a parent directory edx-platform in a Docker image.

my understanding is that it crawls up until it finds a package. When installing, it does the same thing...

Yes--in fact, that's the behavior I'm counting on for the Tutor side of this change. The idea is that developers should be able to mount their own copy of edx-platform into a container /openedx/edx-platform, such that:

  • Node modules, built into the image at /openedx/node_modules, are unaffected, so that the platform runs out-of-the box.
  • Developers can use npm install in order to install extra packages or particular versions of packages at /openedx/edx-platform/node_modules. Those packages will "shadow" the built-in packages installed at /openedx/node_modules

my concern is that by moving this up a directory, we create a weird situation where any siblings of edx-platform are finding a node_modules directory above them and resolving dependencies to it.

In the context of Tutor, I'm not concerned about this, because edx-platform is the only app on the container image. There is no sibling app to worry about.

In this edx-platform PR, I've made sure not say where node_modules could be moved to. If you're ignoring Tutor and using edx-platform directly, then node_modules will still be faithfully located under edx-platform/node_modules. This PR just allows for node_modules to be moved by external tooling.

@kdmccormick
Copy link
Member Author

Sidebar, what is the vendor stuff we're referring to?! Got a quick pointer?

You know, I think there's some terminology misuse going on either by me, edx-platform, or both. Let me try to clarify:

edx-platform has a package.json and running npm install will install stuff into node_modules, as is typical.

There are JS modules in edx-platform that were never wired up look at node_modules; I guess this is the "non-Webpack" JS. Instead, they expect the code to be located in certain "vendor" directories. If I had to speculate, I'd assume they're called "vendor" dirs because the code was previously vendored-in (commited) to the git repo at those paths; once npm+node_modules was introduced, we wrote tooling to copy said libraries from node_modules into the vendor dirs instead.

@davidjoy
Copy link
Contributor

davidjoy commented Jan 12, 2023

Sounds like we're doing this to cut down on install time, and just adding configuration for how you decide where the vendor files are originally copied from so that those using the tutor image can use the pre-baked node_modules in the image.

I think it should work fine, according to the npm docs. My only hesitation is that node_modules can be particularly finicky at times, and this adds a layer to debugging dependency resolution that will be different depending on whether or not a given operator is using the image. Maybe it'll never come up, or rarely, at least. 😄

But I get it, and if it helps shorten the development cycle in some situations, I say :shipit: Which is also to say it's reversible and doesn't sound like one need rely on this behavior. Yeehah.

@jmbowman
Copy link
Contributor

Ah, I think I see what happened here. My interpretation:

  • In ye olde days, before npm, most web apps (like early edx-platform) copied the source and minified source for any JS libraries they used into their repo. This made for reliable builds, but big repos, especially as big minified libraries that are practically binaries kept getting upgraded. (Even worse after one or two mistakes that checked in much larger image assets, etc. than was intended.)
  • npm became available as the standard place to install JS libraries from, but edx-platform still used a lot of RequireJS that didn't know how to take advantage of that. Rather than continuing to check in new library versions (or put effort into a major project to completely change JS bundlers), the decision was made to install them from npm and then symlink them into the directory where they used to live.
  • There were some bugs with that approach where some of the tooling didn't always handle files referenced by symlinks as expected, so they switched to copying the files instead.
  • Webpack was later grafted onto edx-platform, with some pages loading all their assets via npm, but there are still plenty of pages using RequireJS which still needs local copies at specific known directories of all the files that are going to be bundled together.

Does that sound accurate? Does any of it help?

@davidjoy
Copy link
Contributor

That's super helpful context, I think, yeah... @kdmccormick and I chatted in Slack and I summarized the conversation above for posterity. Without worrying too much about the off-the-rails-ness of the situation we're in, the PR adds some helpful flexibility (especially for backend work!) that doesn't destabilize anything in any significant way, just adds a tiny bit more wackiness for a big benefit. :)

"$NODE_MODULES_PATH/requirejs/require.js" \
"$NODE_MODULES_PATH/underscore.string/dist/underscore.string.js" \
"$NODE_MODULES_PATH/underscore/underscore.js" \
"$NODE_MODULES_PATH/which-country/index.js" \
Copy link
Contributor

@regisb regisb Jan 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cp command is not going to replicate the previous paver command. For instance: $NODE_MODULES_PATH/which-country/index.js will be moved to JS_VENDOR_PATH/index.js. But it should be moved to "JS_VENDOR_PATH/which-country/index.js. (if I understand the code correctly)

I suspect that the same comment applies to the other cp commands below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code here is correct, although I wrote it months ago so I'm glad you are having me double-check.

If you run tutor local run lms openedx-assets npm on tutor master (which calls the old process_npm_assets), you get:

...
Copying vendor files into static directory
/bin/cp -rf node_modules/backbone.paginator/lib/backbone.paginator.js common/static/common/js/vendor
/bin/cp -rf node_modules/backbone/backbone.js common/static/common/js/vendor
/bin/cp -rf node_modules/bootstrap/dist/js/bootstrap.bundle.js common/static/common/js/vendor
... etc
Copying vendor library dir: node_modules/@edx/studio-frontend/dist/
/bin/cp -rf node_modules/@edx/studio-frontend/dist/assets.min.js.map common/static/common/js/vendor
/bin/cp -rf node_modules/@edx/studio-frontend/dist/courseHealthCheck.min.js common/static/common/js/vendor
/bin/cp -rf node_modules/@edx/studio-frontend/dist/common.min.js common/static/common/js/vendor
/bin/cp -rf node_modules/@edx/studio-frontend/dist/common.min.css common/static/common/css/vendor
/bin/cp -rf node_modules/@edx/studio-frontend/dist/courseHealthCheck.min.js.map common/static/common/js/vendor
... etc
...

It's copying the files directly into the common/static/common/[js|css]/vendor directories. This matches my reading of copy_vendor_library.

The corresponding output from this PR's process-npm-assets.sh script is:

+ find ./node_modules/@edx/studio-frontend/dist -type f ! -name *.css ! -name *.css.map -print0
+ xargs --null cp --target-directory=./common/static/common/js/vendor
+ find ./node_modules/@edx/studio-frontend/dist -type f ( -name *.css -o -name *.css.map+  ) -print0
xargs --null cp --target-directory=./common/static/common/css/vendor
+ rm -rf ./common/static/edx-ui-toolkit/js
+ cp --recursive ./node_modules/edx-ui-toolkit/src/js ./common/static/edx-ui-toolkit
+ cp --force ./node_modules/backbone.paginator/lib/backbone.paginator.js ./node_modules/backbone/backbone.js ./node_modules/bootstrap/dist/js/bootstrap.bundle.js ./node_modules/hls.js/dist/hls.js ./node_modules/jquery-migrate/dist/jquery-migrate.js ./node_modules/jquery.scrollto/jquery.scrollTo.js ./node_modules/jquery/dist/jquery.js ./node_modules/moment-timezone/builds/moment-timezone-with-data.js ./node_modules/moment/min/moment-with-locales.js ./node_modules/picturefill/dist/picturefill.js ./node_modules/requirejs/require.js ./node_modules/underscore.string/dist/underscore.string.js ./node_modules/underscore/underscore.js ./node_modules/which-country/index.js ./common/static/common/js/vendor

(The find ... | xargs cp ... calls correspond to the Copying vendor library dir: node_modules/@edx/studio-frontend/dist/ section. The simple cp calls correspond to the Copying vendor files into static directory section.)

@@ -604,60 +571,11 @@ def process_npm_assets():
"""
Process vendor libraries installed via NPM.
"""
def copy_vendor_library(library, skip_if_missing=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this process_npm_assets function now call the process-npm-assets.sh script? It seems to me that we would need this to be backward compatible.
As a side note, the openedx-assets.py script in Tutor calls the process_npm_assets function, so we would have to modify it if the function does not call the bash script: https://github.com/overhangio/tutor/blob/b903c69fac95e797532288d4e6f60530faf1db94/tutor/templates/build/openedx/bin/openedx-assets#L117

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this process_npm_assets function now call the process-npm-assets.sh script? It seems to me that we would need this to be backward compatible.

It does, see this line of the diff. And yes, goal is backwards-compatibility.

As a side note, the openedx-assets.py script in Tutor calls the process_npm_assets function, so we would have to modify it if the function does not call the bash script: https://github.com/overhangio/tutor/blob/b903c69fac95e797532288d4e6f60530faf1db94/tutor/templates/build/openedx/bin/openedx-assets#L117

For what it's worth, I do plan to change Tutor's openedx-assets.py script to call this bash script directly--we will need to do so in order to pass in the --node-modules=/openedx/node_modules. That change is not necessary in order to merge this current PR, though.

@kdmccormick kdmccormick changed the title build: allow node_modules to reside outside of edx-platform Allow node_modules to reside outside of edx-platform Jan 17, 2023
@kdmccormick kdmccormick marked this pull request as ready for review January 17, 2023 16:54
@kdmccormick
Copy link
Member Author

When you've got the chance @regisb, this is ready for review.

@kdmccormick kdmccormick changed the title Allow node_modules to reside outside of edx-platform Allow node_modules to reside outside of edx-platform, pt 1 Jan 17, 2023
@kdmccormick kdmccormick changed the title Allow node_modules to reside outside of edx-platform, pt 1 Make NPM post-processing agnostic to node_modules location Jan 17, 2023
@kdmccormick kdmccormick linked an issue Jan 19, 2023 that may be closed by this pull request
2 tasks
@kdmccormick
Copy link
Member Author

I updated the script's path from scripts/process-npm-assets.sh to scripts/assets/copy-node-modules.sh, because:

Reimplement `pavelib/assets.py:process_npm_assets` equivalently
as `scripts/assets/copy-node-modules.sh`. For backwards compatibility,
`pavelib/assets.py:process_npm_assets` will still exist, just as
a thin wrapper around the new shell script.

Rationale:

* Other parts of pavelib have already been reimplemented, like Python
  unit tests. We're following that trend.
* The Python logic in pavelib is harder to understand than simple
  shell scripts.
* pavelib has dependencies (Python, paver, edx-platform, other libs)
  which means that any pavelib scripts must be executed later in
  the edx-platform build process than we might want them to. For
  example, in a Dockerfile, it might be more performant to process
  npm assets *before* installing Python, but as long as we are still
  using pavelib, that is not an option.
* The benefits of paver have been eclipsed by other tools, like
  Docker (for requisite management) and Click (for CLI building).
* In the next couple commits, we make improvements to
  process-npm-assets.sh. These improvements would have been possible
  in the pavelib implementation, but would have been more complicated.

Part of openedx-unsupported/wg-developer-experience#150
Closes #31606
The problem:
  In Tutor, NPM packages are installet d into the openedx image at
  /openedx/edx-platform/node_modules. So, when you bind-mount
  your own edx-platform, you override that node_modules folder.
  Then, you must re-install node_modules yourself
  (`tutor dev run --mount=edx-platform lms npm install`),
  which takes a long time and is completely redundant with the
  node_modules that you had to download when you downloaded
  the openedx image. If you forget to do this, then your
  edx-platform frontend will be broken.

The solution:
  In Tutor: Move node_modules somewhere else in the openedx image, such
  as /openedx/node_modules.

What this commit changes:
  Add a `--node-modules` parameter to copy-node-modules.sh, thus
  helping remove the assumption that node_modules must be a child
  directory of edx-platform.

Part of openedx-unsupported/wg-developer-experience#150
The problem: (See previous commit)
The solution: (See previous commit)

What this commit changes:
  In order to allow certain especially old parts of edx-platform
  to use edx-ui-toolkit without involving webpack, the platform
  has a symlink:
    $EDX_PLATFORM/common/static/edx-ui-toolkit/js ->
    $EDX_PLATFORM/node_modules
  Unfortunately, this symlink is based on the assumption that
  node_modules will always be a child directory of edx-platform.
  In order to eliminate this assumption, we delete the symlink,
  and then have the existing `scripts/assets/copy-node-modules.sh`
  script copy edx-ui-toolkit/js from node_modules over to where the symlink
  used to be.

Part of openedx-unsupported/wg-developer-experience#150
@kdmccormick
Copy link
Member Author

This is ready for review again.

@kdmccormick
Copy link
Member Author

Thanks everyone for the reviews on this! I decided it'd be best to rewrite all of assets.py in bash first, separate from the node_modules changes here. Here's the ADR for that: #31790

Once the asset build is fully rewritten, I'll come back to this specific node_modules change, which will be much smaller and easier to review.

@kdmccormick kdmccormick deleted the kdmccormick/no-node-modules-links branch February 20, 2023 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rewrite node_modules copying in Bash
4 participants