Optimize `SoftReference` caches using `ConcurrentHashMap`'s `compute` (#3641)
Conversation
Changed commits: 9a459be to e045769
```diff
@@ -708,4 +685,36 @@ object ZincWorkerImpl {
     toAnalyze = List((List(start), deps(start).toList))
   ).reverse
 }
```
```scala
private class AutoCloseableCache[A, B <: AnyRef] extends Cache[A, B] {
```
Why is this called `AutoCloseableCache`? Did you forget a `with AutoCloseable`?

Also, I'd prefer to have `Cache` defined before its derivatives.
What I want to express is a cache that, on `clear`, also tries to find `AutoCloseable` values and `close` them. I don't know what the best name for it is. Do you have suggestions? I will change the order.
Instead of a sub-class with an unlucky name, we could add a `handleAutoClosables: Boolean` parameter to `clear`.
I'm personally unsure about the motivation here. Are we seeing performance bottlenecks that we think will be solved by this optimization? How will we know that this PR actually achieves what we hope it will? There's always a risk of bugs when making changes to this kind of concurrent code, both nondeterministic semantic bugs and performance bugs, and the risk typically goes up as your concurrency model gets more fine-grained. If we're not seeing any real bottlenecks, I would prefer coarse-grained `synchronized` blocks and locks over fine-grained concurrent data structures; the more coarse-grained the better.
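To make the coarse-grained style concrete, here is a minimal Java sketch (class and method names are ours, not from the PR, and the PR itself is Scala) of a cache guarded by a single intrinsic lock:

```java
import java.util.HashMap;
import java.util.function.Supplier;

// Hypothetical coarse-grained cache: one lock for the whole map.
class CoarseCache<K, V> {
    private final HashMap<K, V> map = new HashMap<>();

    // The single intrinsic lock serializes all access,
    // which makes the behavior easy to reason about.
    synchronized V getOrElseCreate(K key, Supplier<V> create) {
        return map.computeIfAbsent(key, k -> create.get());
    }

    synchronized void clear() {
        map.clear();
    }
}
```

The trade-off is exactly the one named above: every call contends on one lock, which is usually fine unless profiling shows real contention.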
Changed commits: 860bb05 to ac7e438
You are right, there is no particular performance reason behind this change. I should have named the PR
A possibly problematic behavioral change of this PR that I hadn't noticed until now:
I agree that there are probably tons of concurrency issues in the current implementation, but we still need to be clear about what we are trying to achieve here. Just dropping in

To be clear, I think using
```scala
    case (_, v @ SoftReference(_)) => v
    case _ => SoftReference(create)
  }
)()
```
Isn't there a tiny risk that the instance returned by `create` (or the existing one) is garbage-collected by the time the `SoftReference` is dereferenced? Having a `var` to maintain a strong reference throughout the lifetime of `getOrElseCreate` feels safer.
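A minimal Java sketch of that suggestion (names are ours; the PR code is Scala): hold the value in a local strong reference for the duration of `getOrElseCreate`, so the `SoftReference` is never the only path to it:

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: a SoftReference cache whose getOrElseCreate keeps
// a strong reference alive across the compute() call.
class SoftCache<K, V> {
    private final ConcurrentHashMap<K, SoftReference<V>> map =
        new ConcurrentHashMap<>();

    V getOrElseCreate(K key, Supplier<V> create) {
        // Local strong reference: the value cannot be collected between
        // compute() returning and the caller receiving it.
        final Object[] strong = new Object[1];
        map.compute(key, (k, ref) -> {
            V v = (ref == null) ? null : ref.get();
            if (v == null) v = create.get();  // recreate if missing or cleared
            strong[0] = v;                    // escape via strong reference
            return new SoftReference<>(v);
        });
        @SuppressWarnings("unchecked")
        V result = (V) strong[0];
        return result;
    }
}
```

Without the local strong reference, a GC pass between `compute` and the subsequent dereference could, in principle, clear the freshly created value.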
Yeah, this doesn't look right. The scheme that I used in `Soft` (which is a `SoftReference` that can recreate itself if it's needed after it's cleared) is better: https://github.com/Ichoran/kse3/blob/5a900ad9578008f0f3454c759cb83e73ec0e21a8/flow/src/Cached.scala#L105-L154
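For illustration only, a compact Java approximation of such a self-recreating soft reference (the real `Soft` in kse3 linked above is more elaborate; names here are ours):

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Hypothetical sketch of a soft reference that recreates its value
// on demand after the GC clears it.
final class Soft<V> {
    private final Supplier<V> make;
    private SoftReference<V> ref;

    Soft(Supplier<V> make) {
        this.make = make;
        this.ref = new SoftReference<>(make.get());
    }

    synchronized V get() {
        V v = ref.get();
        if (v == null) {                  // cleared under memory pressure
            v = make.get();               // recreate on demand
            ref = new SoftReference<>(v);
        }
        return v;                         // caller now holds a strong ref
    }
}
```

Callers always receive a strong reference, so the value cannot vanish between `get` and its use.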
```scala
.values()
.iterator()
.asScala
.foreach { case SoftReference(v: AutoCloseable) =>
```
`foreach` doesn't take a partial function, so we should provide a catch-all `case`. Not doing so will fail for non-`AutoCloseable` entries.
Opened a ticket to centralize discussion of the more general issue here: #3730
@bjaglin raised a great point about concurrent initialization in mill-scalafix: joan38/mill-scalafix#206 (comment)
In `ZincWorkerImpl` we were using `synchronized` to avoid that, but the built-in `compute` method in `java.util.concurrent.ConcurrentHashMap` is probably better.

Pull Request: #3641
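A small Java demonstration of the property being relied on here (the harness names are ours): `ConcurrentHashMap.compute` runs atomically per key, so concurrent callers cannot both observe "absent" and double-initialize, which is what the `synchronized` block previously guarded against:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo: two threads bump the same key via compute();
// per-key atomicity means no lost updates, unlike an unsynchronized
// get-then-put sequence.
class ComputeDemo {
    static int hammer(int perThread) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Runnable inc = () -> {
            for (int i = 0; i < perThread; i++) {
                counts.compute("hits", (k, v) -> v == null ? 1 : v + 1);
            }
        };
        Thread t1 = new Thread(inc);
        Thread t2 = new Thread(inc);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counts.get("hits");
    }
}
```

One caveat from the `ConcurrentHashMap` documentation: the remapping function should be short and must not attempt to update any other mappings of the same map, since it may block other updates while it runs.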