[🐞] Non-deterministic builds #6242

Open
Nefcanto opened this issue May 6, 2024 · 11 comments
Labels
STATUS-1: needs triage New issue which needs to be triaged TYPE: bug Something isn't working

Comments

@Nefcanto

Nefcanto commented May 6, 2024

Which component is affected?

Qwik Rollup / Vite plugin

Describe the bug

In brief: we built a meta-framework on top of Qwik and Qwik-City and earlier reported #5281, which was never resolved. We worked around it by abandoning some features, but now we are stuck again.

The problem: on our sign-in page, clicking the button produces this error:

Uncaught (in promise) TypeError: Failed to resolve module specifier "QuestionParts". Relative references must start with either "/", "./", or "../".
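
For readers unfamiliar with this error: the browser only resolves a specifier without help when it is a relative reference or a URL, and "QuestionParts" is neither, so it looks like a source-level import name leaked into the built output unresolved. A minimal sketch of that classification, following the wording of the error message (classify_specifier is a hypothetical helper, not part of Qwik, Vite, or Rollup):

```shell
# Classify an ES module specifier along the lines of the browser error message.
# Prints "relative", "url", or "bare".
classify_specifier() {
  case "$1" in
    /*|./*|../*) echo "relative" ;;  # starts with "/", "./", or "../"
    *://*)       echo "url" ;;       # absolute URL such as https://...
    *)           echo "bare" ;;      # needs a bundler or import map to resolve
  esac
}

classify_specifier "QuestionParts"     # the specifier from the error: prints "bare"
classify_specifier "./QuestionParts"   # prints "relative"
```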

Screenshot from 2024-05-06 12-34-01

We have not been able to create an MRE, despite dozens of attempts.
This time we tried a different approach:

  1. We ran npm run build and node server/entry.express.js in the development environment. The problem did not occur.
  2. We compared the code inside the dev Docker container with the code in our CI/CD. There were some differences, which we eliminated (for example, 404.tsx existed in development but not in CI/CD, because it causes SSG problems). The CI/CD build still produced the error.
  3. Then we tried something unusual. We built the code in CI/CD without setting ENV NODE_ENV production and, right after the build, copied the CI/CD output together with the original source code into the development container. Running node server/entry.express.js there produced the same error.
  4. Finally, we removed the build directories (rm -rf dist && rm -rf server && rm -rf tmp) and ran npm run build on the same code inside the dev container. It worked.
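
Step 4 can be wrapped in a small helper so that every build starts from an empty output tree. This is only a sketch: clean_build_outputs is a hypothetical name, and the directory list (dist, server, tmp) is taken from the command above.

```shell
# Remove the build output directories named in step 4 so the next build starts clean.
# Runs in a subshell (parentheses) so its variables do not leak into the caller.
clean_build_outputs() (
  project_dir="${1:-.}"
  for d in dist server tmp; do
    # ":?" aborts instead of expanding to an empty path
    rm -rf "${project_dir:?}/$d"
  done
)

# Typical use from the project root (commented out so the sketch has no side effects):
# clean_build_outputs . && npm run build && node server/entry.express.js
```

Running this before every build, in development and in CI/CD alike, rules out leftovers from an earlier build as a source of differences.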

This is a Beyond Compare screenshot. The left side is the code that does not work; the right side is the same code that works:

Screenshot from 2024-05-06 12-52-18

As you can see, the only differences are in the dist, server, and tmp directories. Running grep -rl QuestionParts gives this:
Screenshot from 2024-05-06 12-53-55

In the version that does not work, QuestionParts is built into an additional file of its own. In the version that works, it is not.
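
The grep comparison above can be made repeatable by printing paths relative to each build root, so that two trees can be diffed directly. A sketch (files_mentioning is a hypothetical helper; the argument order is root directory, then identifier):

```shell
# List files under a build root that mention an identifier,
# as sorted paths relative to that root.
files_mentioning() (
  cd "$1" || exit 1
  grep -rl -- "$2" . | LC_ALL=C sort
)

# Example (paths are placeholders for the two trees being compared):
# files_mentioning /path/to/broken/dist  QuestionParts > broken.txt
# files_mentioning /path/to/working/dist QuestionParts > working.txt
# diff broken.txt working.txt
```

If chunk file names differ between the two builds, the diff shows that as well, so it helps to look at the number of matching files in addition to their names.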

We now use our own dev image for building in the CI/CD pipeline, so both builds use the same NODE_ENV and the same node_modules. You can even see that both node_modules directories are symbolic links.

I can't work out why this happens. Why are the build outputs not deterministic when we have:

  1. Exactly the same source code (even inside the same directory)
  2. Containers created from the same Docker image
  3. No NODE_ENV=production in either build
  4. Exactly the same node_modules
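
One way to test the determinism question directly is to fingerprint each build's output and compare the values. A sketch, assuming GNU coreutils' sha256sum is available (as it is on Debian); build_fingerprint is a hypothetical helper name:

```shell
# Print a single SHA-256 covering the relative path and content of every file
# under a directory. Equal trees give equal fingerprints.
build_fingerprint() (
  cd "$1" || exit 1
  find . -type f | LC_ALL=C sort | while IFS= read -r f; do
    sha256sum "$f"            # emits "<hash>  <relative path>"
  done | sha256sum | cut -d ' ' -f 1
)

# Example: run after each build and compare the printed values.
# build_fingerprint dist
# build_fingerprint server
```

Building twice from a clean tree in the same container and getting two different fingerprints would confirm that the build itself is non-deterministic, independent of CI/CD.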

Reproduction

https://github.com/Nefcanto/QwikModuleSpecifier

Steps to reproduce

The reproduction is not simple, because we have no idea where the problem lies. The linked repository is not a true MRE; it is code from our real project. If you would like, I can work through it with you, give you a demonstration, or send a screen recording.

System Info

System:
    OS: Linux 6.8 Debian GNU/Linux 12 (bookworm) 12 (bookworm)
    CPU: (8) x64 Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
    Memory: 9.51 GB / 15.50 GB
    Container: Yes
    Shell: 5.2.15 - /bin/bash
  Binaries:
    Node: 20.12.2 - /usr/local/bin/node
    Yarn: 1.22.19 - /usr/local/bin/yarn
    npm: 10.7.0 - /usr/local/bin/npm
  Browsers:
    Chromium: 124.0.6367.78
  npmPackages:
    @builder.io/partytown: ^0.10.2 => 0.10.2 
    @builder.io/qwik: ^1.5.3 => 1.5.3 
    @builder.io/qwik-auth: ^0.1.3 => 0.1.3 
    @builder.io/qwik-city: ^1.5.3 => 1.5.3 
    typescript: ^5.4.5 => 5.4.5 
    undici: ^6.15.0 => 6.15.0 
    vite: ^5.2.10 => 5.2.10

Additional Information

No response

@wmertens
Member

Are you sure that the versions are pinned in both environments?

Bundling is done by rollup and we don't control it.

@Nefcanto
Author

@wmertens, yes, I'm sure that both use the same NPM packages (versions) for build.

Does rollup also build the source code? Are building and bundling combined into a single phase?
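
The pinning question can be checked mechanically by comparing the lockfiles from the two environments byte for byte. A sketch (same_lockfile is a hypothetical helper; it assumes each environment has a package-lock.json that can be copied out for comparison):

```shell
# Succeeds (exit status 0) only when the two lockfiles are byte-for-byte identical.
same_lockfile() {
  cmp -s "$1" "$2"
}

# Example (file names are placeholders):
# same_lockfile dev/package-lock.json ci/package-lock.json && echo "lockfiles match"
```

Installing with npm ci rather than npm install also helps here, since npm ci installs exactly what the lockfile records instead of re-resolving the caret ranges in package.json.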

@Nefcanto
Author

Nefcanto commented Jul 2, 2024

Recently we have seen these types of errors:

Screenshot from 2024-07-02 14-00-47

It usually appears on its own when we start a project, and it goes away once the page reloads automatically.

Since we do not have a dynamic import in the Layout/Component, I guess this might be related to the non-deterministic build.

@Nefcanto
Author

Nefcanto commented Jul 2, 2024

This is another example of the non-deterministic builds. It shows the same app side by side: on the left, our development environment; on the right, the same app built and running in the production environment.

These are the differences:

  1. Clicking the image of a food should open a modal. It works in dev, but not in prod (and nothing is logged to the console).
  2. Clicking a category (top of the page) scrolls the page to the foods in that category. It works in both environments.
  3. Scrolling the page should select the category automatically, based on the scroll position. It works in dev, but not in prod (again with nothing logged).
non-deterministic-build.mp4

What do you think we should do? We are stuck delivering sites to our customers, because what works in the development environment may or may not work in production.

@gioboa
Member

gioboa commented Jul 2, 2024

Can you share a minimal reproduction for this? A public URL can be good as well 👍

@wmertens
Member

wmertens commented Jul 2, 2024

@Nefcanto it seems as if you are mixing files from different builds somehow. When you run in dev or preview on your own system does it also fail in this way?
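
One generic way to check for files mixed in from different builds is to create a marker file just before building and then list any output files that are not newer than it. A sketch (stale_files is a hypothetical helper; files written within the same timestamp tick as the marker may be reported as false positives):

```shell
# List files under a build directory that were NOT modified after the marker file,
# i.e. candidates left over from an earlier build.
stale_files() (
  find "$1" -type f ! -newer "$2" | LC_ALL=C sort
)

# Example (commented out so the sketch has no side effects):
# touch /tmp/build-start && npm run build && stale_files dist /tmp/build-start
```

An empty result for dist and server after a build suggests that every file in them came from that build.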

@Nefcanto
Author

Nefcanto commented Jul 9, 2024

Creating an MRE is hard for this. Are there other options for troubleshooting it?

@Nefcanto
Author

Nefcanto commented Jul 9, 2024

A weird observation:

It seems that route-level event listeners are the ones affected. For example, the login page has a top-level event listener defined inside the page itself, and it does not work. Yet the Add to basket listener, which lives inside a component imported into the products page, works.

@wmertens
Member

wmertens commented Jul 9, 2024

@Nefcanto you have to find out exactly what is going wrong, e.g. it can't find some file. And then you can try to reproduce it in your dev environment and maybe find the issue in qwik or in your deployment system.

@Nefcanto
Author

@wmertens, @gioboa, please look at these two Dockerfiles, taken from two CI/CD pipelines. The pipelines are identical except for these Dockerfiles: one works, the other does not.

Does not work:

FROM holism/site:latest as builder

ENV AUTH_SECRET some_secret
ENV KEYCLOAK_CLIENT_SECRET some_other-secret
ENV KEYCLOAK_ISSUER https://accounts.example.com/realms/Production
ENV NODE_ENV production

COPY . /

RUN chmod 777 --preserve-root /Build \
    && bash /Build

WORKDIR /Holism/SiteQwik

FROM node:lts-bookworm-slim as runner

WORKDIR /Holism/SiteQwik
ENV NODE_ENV production
COPY --from=builder /Holism/SiteQwik/dist ./dist
RUN true
COPY --from=builder /Holism/SiteQwik/server ./server
RUN true
COPY --from=builder /Holism/SiteQwik/node_modules ./node_modules
RUN true
RUN mkdir -p /Temp \
    && chmod -R 777 --preserve-root /Temp

EXPOSE 3000
CMD ["node", "server/entry.express.js"]

Works:

FROM holism/site:latest as builder

ENV AUTH_SECRET some_secret
ENV KEYCLOAK_CLIENT_SECRET some_other-secret
ENV KEYCLOAK_ISSUER https://accounts.example.com/realms/Production
ENV NODE_ENV production

COPY . /

RUN chmod 777 --preserve-root /Build \
    && bash /Build

WORKDIR /Holism/SiteQwik

# FROM node:lts-bookworm-slim as runner

# WORKDIR /Holism/SiteQwik
# ENV NODE_ENV production
# COPY --from=builder /Holism/SiteQwik/dist ./dist
# RUN true
# COPY --from=builder /Holism/SiteQwik/server ./server
# RUN true
# COPY --from=builder /Holism/SiteQwik/node_modules ./node_modules
# RUN true
# RUN mkdir -p /Temp \
#     && chmod -R 777 --preserve-root /Temp

EXPOSE 3000
CMD ["node", "server/entry.express.js"]

As you can see, the only difference is that in the first one, we copy the build output into a new image.
Is that not weird?
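
Since the failing Dockerfile copies only dist, server, and node_modules into the runner stage (which also uses a different base image, node:lts-bookworm-slim), one thing worth checking is what else exists in the builder's project directory that the runner never receives. A sketch to run inside the builder container (uncopied_entries is a hypothetical helper; the excluded names mirror the three COPY --from=builder lines above):

```shell
# List top-level entries of the project directory that the runner stage does not copy.
uncopied_entries() (
  cd "$1" || exit 1
  # -A includes dotfiles; -v -x drops lines that exactly match a copied directory
  ls -A | grep -v -x -e dist -e server -e node_modules
)

# Example, inside the builder image:
# uncopied_entries /Holism/SiteQwik
```

Anything listed that the server reads at runtime would be present in the working image and missing from the failing one.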

@wmertens
Member

wmertens commented Jul 13, 2024 via email
