CI: Plugin Verification task lacking space for plugins with wide version range #462

Open
WarningImHack3r opened this issue Jun 22, 2024 · 11 comments

@WarningImHack3r

What happened?

The "Run Plugin Verification tasks" step (./gradlew runPluginVerifier) in the verify job periodically fails to download (some?) IDE versions.

From what I can tell, it's due to a lack of disk space, despite the step to maximize space.

I support a wide range of IDE versions in my plugins, which may be why I'm more likely to run into this kind of problem. What I can't explain is why it is so intermittent (sometimes no problems for months, then consistent failures for a week or two); perhaps the EAP IDEs sometimes take up a lot more space than usual.

My fear is that the more IDE versions you release, the more often I'll hit this issue, so it won't get better any time soon. At some point, past a certain number of IDE versions, my CI may never pass again.

I have no clue what and where the code for your Gradle tasks is, but my suggestion is the following: run each verification (download + test against this version) in parallel (if it's not already the case), then after each test immediately delete that version. Additionally, try again 1 or 2 times in case of download failure (if it's not caused by disk space) to avoid failing jobs as much as possible.
It seems to be the most scalable solution to me.
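
To make the suggestion concrete, here is a minimal Python sketch of the download, verify, delete loop with a couple of retries. It is not how the Gradle task works: `download` and `verify` are hypothetical callables supplied by the caller, and I'm assuming download failures surface as `OSError`. Out-of-disk errors are deliberately not retried.

```python
import errno
import shutil
import tempfile
from pathlib import Path


def verify_one_at_a_time(versions, download, verify, retries=2):
    """Download, verify, then delete each IDE before touching the next one.

    `download(version, workdir)` and `verify(version, workdir)` are
    hypothetical callables; they stand in for whatever does the real work.
    """
    results = {}
    for version in versions:
        workdir = Path(tempfile.mkdtemp(prefix=f"ide-{version}-"))
        try:
            for attempt in range(retries + 1):
                try:
                    download(version, workdir)
                    break
                except OSError as exc:
                    # Retrying cannot help when the disk is full, so give up
                    # immediately in that case or when attempts run out.
                    if exc.errno == errno.ENOSPC or attempt == retries:
                        raise
            results[version] = verify(version, workdir)
        finally:
            # Free the disk space right away, whether verification passed or not.
            shutil.rmtree(workdir, ignore_errors=True)
    return results
```

With this shape, at most one IDE is on disk at any time, at the cost of running everything sequentially.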

Thank you!

Relevant log output or stack trace

Run ./gradlew runPluginVerifier -Dplugin.verifier.home.dir=~/.pluginVerifier
  
Starting a Gradle Daemon (subsequent builds will be faster)
Calculating task graph as no cached configuration is available for tasks: runPluginVerifier
> Task :checkKotlinGradlePluginConfigurationErrors
> Task :initializeIntelliJPlugin
> Task :downloadAndroidStudioProductReleasesXml
> Task :downloadIdeaProductReleasesXml
> Task :patchPluginXml
> Task :verifyPluginConfiguration
> Task :processResources
> Task :listProductsReleases
> Task :compileKotlin
> Task :compileJava NO-SOURCE
> Task :classes
> Task :instrumentCode
> Task :jar
> Task :instrumentedJar
> Task :prepareSandbox
> Task :verifyPlugin
2024-06-22 17:27:01,078 [    295]   WARN - llij.ide.plugins.PluginManager - id redefinition ([row,col,system-id]: [2,3,"product classpath"]) 
> Task :buildSearchableOptions
> Task :compileTestKotlin
> Task :classpathIndexCleanup SKIPPED
> Task :buildSearchableOptions
Jun 22, 2024 5:27:04 PM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
Jun 22, 2024 5:27:04 PM java.util.prefs.FileSystemPreferences$6 run
WARNING: Prefs file removed in background /home/runner/.java/.userPrefs/prefs.xml
Starting searchable options index builder
2024-06-22 17:27:05,096 [   4313]   WARN - l.NotificationGroupManagerImpl - Notification group CodeWithMe is already registered (group=com.intellij.notification.NotificationGroup@2dc67902). Plugin descriptor: PluginDescriptor(name=Code With Me, id=com.jetbrains.codeWithMe, descriptorPath=plugin.xml, path=~/.gradle/caches/modules-2/files-2.1/com.jetbrains.intellij.idea/ideaIU/2021.3/3597c84ef4f0d240f4f9d9eacba7509e01674c4d/ideaIU-2021.3/plugins/cwm-plugin, version=213.5744.223, package=null, isBundled=true) 
2024-06-22 17:27:22,987 [  22204]   WARN - ellchecker.SpellCheckerManager - Couldn't load dictionary 'svelte.dic' with loader 'class dev.blachut.svelte.lang.spellchecker.SvelteSpellcheckingDictionaryProvider' 
2024-06-22 17:27:24,296 [  23513]   WARN - #com.intellij.ui.jcef.JBCefApp - JCEF is manually disabled in headless env via 'ide.browser.jcef.headless.enabled=false' 
Found 355 configurables
Searchable options index builder completed
> Task :jarSearchableOptions
> Task :buildPlugin
> Task :runPluginVerifier
[gradle-intellij-plugin :runPluginVerifier] Cannot download 'IU-2022.1.4' from 'release' channel: https://cache-redirector.jetbrains.com/download.jetbrains.com/idea/ideaIU-2022.1.4.tar.gz. Run with --debug option to get more log output.

Steps to reproduce

Run the CI in https://github.com/WarningImHack3r/intellij-shadcn-plugin (preferred) or https://github.com/WarningImHack3r/npm-update-dependencies

Gradle IntelliJ Plugin version

1.17.4

Gradle version

8.8

Operating System

None

Link to build, i.e. failing GitHub Action job

https://github.com/WarningImHack3r/intellij-shadcn-plugin/actions/runs/9605368413/job/26492877373

@WarningImHack3r added the bug label on Jun 22, 2024
@ChrisCarini

Hi @WarningImHack3r - just wanted to provide my thoughts on some of the (super valid) points you raised; see below

I have no clue what and where the code for your Gradle tasks

If you hunt through the Gradle plugin code, you might be able to find it:

...run each verification (download + test against this version) in parallel (if it's not already the case)...

I think doing download+test in an unbounded parallel way would certainly increase the likelihood of running into this issue, no? I'm thinking of it this way: if there are, say, 100 versions being tested, and the average size of each is, say, 2GB, then when all 100 are downloaded at once the CI machine is going to try to download ~200GB of data, which I suspect would fill up the CI machine (it certainly does for GHA CI machines), and would also mean ~100 network requests to the same artifact server in a very brief period of time. I think it might make more sense if the number of download+test runs in parallel were bounded, say to 10; that way there's only ~20GB of disk (in my example) being used at any given time.
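
The arithmetic in that example can be written out as a small Python sketch. The figures (100 versions, 2GB each, a bound of 10) are the illustrative numbers from the comment, not measurements, and the helper name is made up for this example. It assumes each IDE is deleted as soon as its check finishes.

```python
def peak_disk_gb(total_versions, avg_size_gb, max_parallel=None):
    """Worst-case disk held by IDEs at once, if each is deleted after its check."""
    if max_parallel is None:
        in_flight = total_versions  # unbounded: everything is on disk together
    else:
        in_flight = min(total_versions, max_parallel)
    return in_flight * avg_size_gb


print(peak_disk_gb(100, 2))      # unbounded: 200
print(peak_disk_gb(100, 2, 10))  # bounded to 10: 20
```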

...try again 1 or 2 times in case of download failure...

This (retries) would be ideal to have, IMO. FWIW, I don't personally believe it's the code in the gradle task being flaky, but more likely the artifact server (or, the network connections to) that's providing the IDEs having some network issues (or, it just being some other network issue). I'm the author of a GitHub action (ChrisCarini/intellij-platform-plugin-verifier-action) that does plugin verification (created before the gradle task existed, I believe) which also downloads whichever IDEs the user specifies, and I have run into this issue myself downloading IDEs countless times (I have ChrisCarini/intellij-platform-plugin-verifier-action#68 open for myself, a branch I've been testing in my 11 OSS IntelliJ plugins for the past month, and I don't believe I've had my CI in any of the 11 projects fail in that time period for IDE downloading, which has been stellar 🎉 ).

@WarningImHack3r
Author

If you hunt through the Gradle plugin code, you might be able to find it:

Thanks for that!!

I'm thinking of it this way, if there are say 100 versions that are being tested, and the average size of each is say 2GB, when all 100 are being downloaded, the CI machine is going to try and download ~200GB of data which I suspect would fill up the CI machine

Yes of course some sort of bound will have to be set as you describe; I'd say between 3 and 5 at any given time, depending on the right balance between CI perf, speed and reliability.
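
As a rough illustration of such a bound, here is a Python sketch using a fixed-size thread pool. `check_version` is a hypothetical callable that would download, verify, and delete one IDE; the default of 4 is just a value inside the 3 to 5 range mentioned above, not a recommendation from the plugin.

```python
from concurrent.futures import ThreadPoolExecutor


def verify_bounded(versions, check_version, max_parallel=4):
    """Run check_version(version) for every version, at most max_parallel at once."""
    versions = list(versions)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() preserves input order; an exception raised by a check
        # surfaces here, when its result is consumed.
        return dict(zip(versions, pool.map(check_version, versions)))
```

The pool size is the only knob: a lower value trades CI speed for a smaller peak disk footprint.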

FWIW, I don't personally believe it's the code in the gradle task being flaky

I agree: download code isn't hard to get right, and JetBrains is more than capable of doing it well, even without my looking at the source!

but more likely the artifact server (or, the network connections to) that's providing the IDEs having some network issues

I'm indeed thinking so too, but maybe the errors are misleading and 99-100% of the failures are actually due to disk space rather than network issues, despite what the error states.
TBH, fixing the disk issue would already be a very good improvement; fixing potential additional network issues would just be a bonus at this point.

@ChrisCarini

I'm indeed thinking so too, but maybe the errors are misleading and 99-100% of the failures are actually due to disk space rather than network issues, despite what the error states.

🤷 maybeee they are misleading, maybe not. IME with the GitHub Action I authored, debugged, actively maintain, and personally use in 11 repos for ~4 years, when the runner runs out of disk space, it's pretty clear (you'd see a write: No space left on device in the logs; check ChrisCarini/intellij-platform-plugin-verifier-action#2 as an old historical example). Now that you know where the source is, you could take a look and see what might be logged in each condition (if it'd differ or be the same), and if you wanted, I'm sure it'd take <10min to mock up a test that intentionally runs out of disk space on the GitHub runners and see if what happens aligns with your current expectations or not.
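
For anyone who wants to tell the two failure modes apart in a script of their own, a small Python sketch follows. It relies only on the standard `errno.ENOSPC` code (the one behind "No space left on device") and `shutil.disk_usage`; the function names are made up for this example and have nothing to do with the Gradle plugin or my action.

```python
import errno
import shutil


def is_out_of_disk(exc):
    """True when an exception is the OS reporting 'No space left on device'."""
    return isinstance(exc, OSError) and exc.errno == errno.ENOSPC


def free_gib(path="."):
    """Free space on the filesystem containing path, in GiB."""
    return shutil.disk_usage(path).free / 2**30
```

Logging `free_gib()` before and after each IDE download would show whether the runner is actually close to full when the download fails.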

@YannCebron
Member

Thanks for raising this issue.

https://github.com/WarningImHack3r/intellij-shadcn-plugin uses IntelliJ Ultimate

pluginSinceBuild = 213
pluginUntilBuild = (open ended)

so a total of currently 9-10 GA releases
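
A quick way to see where a figure like that comes from is to enumerate the branch numbers in the range. This Python sketch assumes three releases per year (branches ending in 1, 2, 3) and takes 242 as the newest branch for an open-ended pluginUntilBuild; both are assumptions for illustration, not values read from the plugin's configuration.

```python
def branches(since, until):
    """List branch numbers from since to until, assuming releases .1, .2, .3 each year."""
    out = []
    for year in range(since // 10, until // 10 + 1):
        for release in (1, 2, 3):
            branch = year * 10 + release
            if since <= branch <= until:
                out.append(branch)
    return out


print(branches(213, 242))
# [213, 221, 222, 223, 231, 232, 233, 241, 242]
```

Each entry means one more IDE to download during verification, so the list grows by about three per year while the lower bound stays at 213.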

@WarningImHack3r
Author

https://github.com/WarningImHack3r/intellij-shadcn-plugin uses IntelliJ Ultimate [...] so a total of currently 9-10 GA releases

(same thing for https://github.com/WarningImHack3r/npm-update-dependencies)

@WarningImHack3r changed the title from "CI: Plugin Verification tasks randomly failing to download IDEs" to "CI: Plugin Verification task lacking space for plugins with wide version range" on Aug 1, 2024
@WarningImHack3r
Author

WarningImHack3r commented Aug 1, 2024

Update: @YannCebron / @novotnyr it seems even worse with the Plugin Template 2.0: with the same plugins, and after dropping 1 whole major version and 2 minor versions respectively, I now always get the error.
@ChrisCarini did you try yourself?

@ChrisCarini

@WarningImHack3r I haven't had issues, but I also don't test as wide of a range as you have listed.

@rosuH

rosuH commented Oct 26, 2024

I have encountered a similar issue.

Before upgrading to plugin template 2.0

Prior to upgrading to plugin template 2.0, I encountered an out-of-space error on one occasion. At that time, the issue was resolved by adding the jlumbroso/free-disk-space action. This solution worked effectively, and I did not encounter any related problems for at least a year. At that time, my plugin configuration was: pluginSinceBuild >= 213.

After upgrading to plugin template 2.0

The build has been continuously failing. The failure occurs at the "Verify plugin" step. The error message is as follows:

System.IO.IOException: No space left on device : '/home/runner/runners/2.320.0/_diag/Worker_20241026-140347-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
System.IO.IOException: No space left on device : '/home/runner/runners/2.320.0/_diag/Worker_20241026-140347-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at GitHub.Runner.Common.HostTraceListener.WriteHeader(String source, TraceEventType eventType, Int32 id)
   at GitHub.Runner.Common.HostTraceListener.TraceEvent(TraceEventCache eventCache, String source, TraceEventType eventType, Int32 id, String message)
   at System.Diagnostics.TraceSource.TraceEvent(TraceEventType eventType, Int32 id, String message)
   at GitHub.Runner.Common.Tracing.Error(Exception exception)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
Unhandled exception. System.IO.IOException: No space left on device : '/home/runner/runners/2.320.0/_diag/Worker_20241026-140347-utc.log'
   at System.IO.RandomAccess.WriteAtOffset(SafeFileHandle handle, ReadOnlySpan`1 buffer, Int64 fileOffset)
   at System.IO.Strategies.BufferedFileStreamStrategy.FlushWrite()
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.Diagnostics.TextWriterTraceListener.Flush()
   at System.Diagnostics.TraceSource.Flush()
   at GitHub.Runner.Common.TraceManager.Dispose(Boolean disposing)
   at GitHub.Runner.Common.TraceManager.Dispose()
   at GitHub.Runner.Common.HostContext.Dispose(Boolean disposing)
   at GitHub.Runner.Common.HostContext.Dispose()
   at GitHub.Runner.Worker.Program.Main(String[] args)

The current plugin support has been modified to: pluginSinceBuild >= 223. However, this change has had no effect.

@WarningImHack3r , has this problem been resolved on your end?

@WarningImHack3r
Author

@WarningImHack3r , has this problem been resolved on your end?

No, it hasn't. I've simply disabled this step on both my plugins, unfortunately…
We kinda get "punished" for supporting a wide range of versions, which is a shame.

@rosuH

rosuH commented Oct 27, 2024

@WarningImHack3r Thanks. We are also planning to disable the verify step...

@ramonvermeulen

ramonvermeulen commented Nov 12, 2024

I have encountered a similar issue.

@WarningImHack3r , has this problem been resolved on your end?

I recently ran into a similar issue in the CI pipelines of my plugin dbt-toolkit. However, in my case it already throws an IOException during the build phase. Is there anything I can do to resolve this? I guess custom GitHub runners with more provisioned disk space would work, but ideally I'd have another solution.

Because I am not certain (since it is in another step) if the cause is related, I opened a separate issue #488.

EDIT: for those interested, applying these changes to my workflows fixed the problem, at least for the time being. I don't know if it is the desired solution, but if you're stuck you could give it a try:

#482
