
add maximum mean discrepancy metric #56


Open · samuelstanton wants to merge 9 commits into main

Conversation

samuelstanton (Contributor)

Estimates maximum mean discrepancy (MMD) from samples.
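For reference, here is a minimal sketch of the standard unbiased squared-MMD estimator with a Gaussian kernel and a median-heuristic bandwidth (Gretton et al., 2012). This illustrates the quantity being estimated; the function name, the torch backend, and the bandwidth heuristic are assumptions, not the PR's actual code:

```python
import torch

def mmd_unbiased(x, y, kernel_width=None):
    # unbiased MMD^2 estimate between samples x (m, d) and y (n, d)
    m, n = x.shape[0], y.shape[0]
    xx = torch.cdist(x, x) ** 2  # pairwise squared distances within x
    yy = torch.cdist(y, y) ** 2  # pairwise squared distances within y
    xy = torch.cdist(x, y) ** 2  # pairwise squared distances across samples
    if kernel_width is None:
        # median heuristic: bandwidth from the median pairwise distance
        all_d2 = torch.cat([xx.flatten(), yy.flatten(), xy.flatten()])
        kernel_width = all_d2.median().sqrt()
    k_xx = torch.exp(-xx / (2 * kernel_width**2))
    k_yy = torch.exp(-yy / (2 * kernel_width**2))
    k_xy = torch.exp(-xy / (2 * kernel_width**2))
    # drop the diagonal (i == j) terms for the unbiased within-sample means
    return (
        (k_xx.sum() - k_xx.diagonal().sum()) / (m * (m - 1))
        + (k_yy.sum() - k_yy.diagonal().sum()) / (n * (n - 1))
        - 2 * k_xy.mean()
    )
```

Note the unbiased form can come out slightly negative, which is what the eps discussion further down is about; a biased V-statistic variant would keep the diagonal terms and stay nonnegative.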

@@ -0,0 +1,89 @@
import numpy
Collaborator
probably better to use torch instead of numpy for consistency, unless numpy is needed for a particular reason

samuelstanton (Contributor, Author)
would be nice if this worked with arrays of strings as well, since that's a common data structure. We allowed numpy arrays here for the same reason: https://github.com/Genentech/beignet/blob/main/src/beignet/_farthest_first_traversal.py
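To illustrate the string use case, a hypothetical call with a custom distance over numpy string arrays; the maximum_mean_discrepancy name and its distance_fn parameter are assumed from the signature fragment shown in the diff below, so the call is left commented out:

```python
import numpy

def hamming(a: str, b: str) -> float:
    # per-character mismatch count for equal-length strings
    return sum(c1 != c2 for c1, c2 in zip(a, b))

X = numpy.array(["ACGT", "ACGG", "TTGA"])
Y = numpy.array(["TCGT", "ACTT"])

# hypothetical call, assuming the distance_fn parameter from the diff below:
# mmd = maximum_mean_discrepancy(X, Y, distance_fn=hamming)
```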

samuelstanton (Contributor, Author)

that being said, I could see wanting this to be both differentiable and GPU-enabled... any thoughts on that? implement two versions?
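For what it's worth, a torch-backed implementation like the sketch above is already differentiable and GPU-capable without a second version; a quick check, assuming the hypothetical mmd_unbiased sketch from earlier:

```python
import torch

x = torch.randn(128, 8, requires_grad=True)
y = torch.randn(256, 8)

mmd2 = mmd_unbiased(x, y)  # sketch defined above
mmd2.backward()            # gradients flow back to x through cdist/exp
assert x.grad is not None

# the same code runs on GPU just by moving the inputs:
# mmd_unbiased(x.to("cuda"), y.to("cuda"))
```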

samuelstanton (Contributor, Author)
casting to tensor opens the tokenization-and-padding can of worms, and usually you just want something simple

    Y,
    distance_fn=None,
    kernel_width: float | None = None,
    eps: float = 1e-16,

Collaborator

What's the motivation for lower bounding the MMD at sqrt(eps), rather than 0?

samuelstanton (Contributor, Author)

I suppose I was trying to forestall division-by-zero errors, but upon reflection that's a decision downstream users should make; will revert the lower bound to 0
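Concretely, the change under discussion amounts to something like the following (variable names assumed, not the PR's exact code):

```python
# before: clamping the squared estimate at eps floors the returned
# distance at sqrt(eps), guarding later divisions by zero
# mmd = torch.sqrt(torch.clamp(mmd_squared, min=eps))

# after: the unbiased MMD^2 estimate can be slightly negative, so clamp
# at exactly 0 and let downstream users guard their own divisions
mmd = torch.sqrt(torch.clamp(mmd_squared, min=0.0))
```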

samuelstanton (Contributor, Author)
ok, I have attempted to make this compliant with the Python array API standard and NEP 56, to support both numpy and pytorch arrays
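For anyone unfamiliar with the pattern, array-API-compliant code dispatches on the inputs' namespace instead of importing numpy or torch directly. A minimal sketch using the array-api-compat helper; the helper function and its name are illustrative, and whether the PR uses array-api-compat or raw `__array_namespace__` is an assumption:

```python
from array_api_compat import array_namespace

def pairwise_squared_distances(x, y):
    # resolve a numpy- or torch-flavored namespace from the inputs;
    # numpy >= 2.0 ndarrays (per NEP 56) and torch tensors both work
    xp = array_namespace(x, y)
    diff = x[:, None, :] - y[None, :, :]  # (m, n, d) broadcast differences
    return xp.sum(diff**2, axis=-1)       # (m, n) squared Euclidean distances
```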

@@ -9,6 +9,7 @@ requires = [
[project]
authors = [{ email = "[email protected]", name = "Allen Goodman" }]
dependencies = [
"numpy>=2.0.0",
Collaborator

Let torch manage the numpy dependency

Collaborator

I think torch doesn't depend on numpy now

samuelstanton (Contributor, Author)

yeah, that's what I was seeing

samuelstanton (Contributor, Author)
@kleinhenz finally fixed the broken test; are we good to merge?

samuelstanton (Contributor, Author) commented Apr 3, 2025

I do think there is a valid question around whether we want to force beignet users to upgrade to NumPy 2.0 (to support the array API standard). @0x00b1 thoughts?
