Merge branch 'release/0.5.1'
wmayner committed Jan 29, 2018
2 parents 38c0b2e + 468faf9 commit e9fb657
Showing 5 changed files with 71 additions and 54 deletions.
7 changes: 2 additions & 5 deletions CONTRIBUTING.md
Expand Up @@ -2,19 +2,16 @@ Installation issues
===================

Before opening an issue related to installation, please try to install PyEMD in
a fresh, empty Python 3 virtual environment and check that the problem
persists:
a fresh, empty Python virtual environment and check that the problem persists:

```shell
pip install virtualenvwrapper
mkvirtualenv -p `which python3` pyemd
# Now we're an empty Python 3 virtual environment
mkvirtualenv pyemd
pip install pyemd
```

PyEMD is not officially supported for (but may nonetheless work with) the following:

- Python 2
- Anaconda distributions
- Windows operating systems

Expand Down
95 changes: 57 additions & 38 deletions README.rst
Expand Up @@ -17,16 +17,10 @@ at the end of this document.**
Installation
------------

To install the latest release:

.. code:: bash
pip install pyemd
Before opening an issue related to installation, please try to install PyEMD in
a fresh, empty Python 3 virtual environment and check that the problem
persists.

Usage
-----
Expand Down Expand Up @@ -68,17 +62,19 @@ emd()

.. code:: python
emd(first_histogram, second_histogram, distance_matrix,
emd(first_histogram,
second_histogram,
distance_matrix,
extra_mass_penalty=-1.0)
*Arguments:*

- ``first_histogram`` *(np.ndarray)*: A 1D array of type ``np.float64`` of
length ``N``.
length *N*.
- ``second_histogram`` *(np.ndarray)*: A 1D array of ``np.float64`` of length
``N``.
*N*.
- ``distance_matrix`` *(np.ndarray)*: A 2D array of ``np.float64,`` of size at
least ``N × N``. This defines the underlying metric, or ground distance, by
least *N* × *N*. This defines the underlying metric, or ground distance, by
giving the pairwise distances between the histogram bins. It must represent a
metric; there is no warning if it doesn't.

Expand All @@ -91,27 +87,35 @@ emd()
not guaranteed to be a metric). The default value is ``-1.0``, which means the
maximum value in the distance matrix is used.

*Returns:* *(float)* The EMD value.
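
As an illustration of the quantity ``emd()`` returns, the sketch below computes the minimum transport cost for one special case: two 1D histograms of equal total mass whose ground distance between bins ``i`` and ``j`` is ``|i - j|``. In that case the cost equals the sum of absolute differences between the running totals of the two histograms. The helper name ``emd_1d`` is hypothetical, and this is not how PyEMD computes the result (it wraps Pele and Werman's C++ implementation and accepts an arbitrary distance matrix).

```python
from itertools import accumulate


def emd_1d(first_histogram, second_histogram):
    """Minimum transport cost between two equal-mass 1D histograms.

    Assumes the ground distance between bins i and j is |i - j|.
    Hypothetical helper for illustration only; not part of PyEMD.
    """
    if len(first_histogram) != len(second_histogram):
        raise ValueError("histograms must have the same length")
    if abs(sum(first_histogram) - sum(second_histogram)) > 1e-9:
        raise ValueError("this shortcut only holds for equal total mass")
    # Running total of the bin-wise differences: the mass that still has
    # to be carried across each boundary between neighbouring bins.
    carried = accumulate(a - b for a, b in zip(first_histogram, second_histogram))
    return sum(abs(c) for c in carried)


# All mass moves one bin to the right, so the cost is 1.0.
print(emd_1d([0.0, 1.0, 0.0], [0.0, 0.0, 1.0]))
```

The shortcut breaks down as soon as the total masses differ or the ground distance is anything other than ``|i - j|``; those cases need the general solver and, for unequal masses, the ``extra_mass_penalty`` described above.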

----

emd_with_flow()
~~~~~~~~~~~~~~~

.. code:: python
emd_with_flow(first_histogram, second_histogram, distance_matrix,
emd_with_flow(first_histogram,
second_histogram,
distance_matrix,
extra_mass_penalty=-1.0)
Arguments are the same as for ``emd()``.

*Returns:* *(tuple(float, list(list(float))))* The EMD value and the associated
minimum-cost flow.

----

emd_samples()
~~~~~~~~~~~~~

.. code:: python
emd_samples(first_array, second_array,
extra_mass_penalty=DEFAULT_EXTRA_MASS_PENALTY,
emd_samples(first_array,
second_array,
extra_mass_penalty=-1.0,
distance='euclidean',
normalized=True,
bins='auto',
Expand All @@ -126,36 +130,53 @@ emd_samples()

*Keyword Arguments:*

- ``extra_mass_penalty`` *(float)*: Same as for ``emd()``. ``bins`` (int or
string): The number of bins to include in the generated histogram. If a
string, must be one of the bin selection algorithms accepted by
``np.histogram()``. Defaults to 'auto', which gives the maximum of the
'sturges' and 'fd' estimators.
- ``distance_matrix`` *(string or function)*: A string or function implementing
- ``extra_mass_penalty`` *(float)*: Same as for ``emd()``.
- ``distance`` *(string or function)*: A string or function implementing
a metric on a 1D ``np.ndarray``. Defaults to the Euclidean distance. Currently
limited to 'euclidean' or your own function, which must take a 1D array and
return a square 2D array of pairwise distances. - ``normalized`` (boolean): If
true, treat histograms as fractions of the dataset. If false, treat histograms
as counts. In the latter case the EMD will vary greatly by array length.
return a square 2D array of pairwise distances.
- ``normalized`` (*boolean*): If true (default), treat histograms as fractions
of the dataset. If false, treat histograms as counts. In the latter case the
EMD will vary greatly by array length.
- ``bins`` *(int or string)*: The number of bins to include in the generated
histogram. If a string, must be one of the bin selection algorithms accepted
by ``np.histogram()``. Defaults to ``'auto'``, which gives the maximum of the
'sturges' and 'fd' estimators.
- ``range`` *(tuple(int, int))*: The lower and upper range of the bins, passed
to ``numpy.histogram()``. Defaults to the range of the union of
``first_array`` and `second_array``.` Note: if the given range is not a
``first_array`` and ``second_array``. Note: if the given range is not a
superset of the default range, no warning will be given.

*Returns:* *(float)* The EMD value between the histograms of ``first_array`` and
``second_array``.
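
The ``distance`` argument accepts a function that takes a 1D array and returns a square 2D array of pairwise distances. A minimal sketch of such a function follows; the name ``absolute_difference`` is an assumption made for illustration, and for 1D inputs it should coincide with the Euclidean distance.

```python
import numpy as np


def absolute_difference(x):
    """Return the square matrix of pairwise |x[i] - x[j]| for a 1D array.

    Hypothetical example of a custom ``distance`` callable.
    """
    x = np.asarray(x, dtype=np.float64)
    # Broadcasting a column against a row yields every pairwise difference.
    return np.abs(x[:, np.newaxis] - x[np.newaxis, :])


centers = np.array([0.0, 1.0, 3.0])
print(absolute_difference(centers))
```

Following the signature above, a function like this would presumably be passed as ``emd_samples(first_array, second_array, distance=absolute_difference)``.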

----

Limitations and Caveats
-----------------------

- ``distance_matrix`` is assumed to represent a metric; there is no check to
ensure that this is true. See the documentation in ``pyemd/lib/emd_hat.hpp``
for more information.
- The flow matrix does not contain the flows to/from the extra mass bin.
- The histograms and distance matrix must be numpy arrays of type
``np.float64``. The original C++ template function can accept any numerical
C++ type, but this wrapper only instantiates the template with ``double``
(Cython converts ``np.float64`` to ``double``). If there's demand, I can add
support for other types.
- ``emd()`` and ``emd_with_flow()``:

- The ``distance_matrix`` is assumed to represent a metric; there is no check
to ensure that this is true. See the documentation in
``pyemd/lib/emd_hat.hpp`` for more information.
- The histograms and distance matrix must be numpy arrays of type
``np.float64``. The original C++ template function can accept any numerical
C++ type, but this wrapper only instantiates the template with ``double``
(Cython converts ``np.float64`` to ``double``). If there's demand, I can add
support for other types.

- ``emd_with_flow()``:

- The flow matrix does not contain the flows to/from the extra mass bin.

- ``emd_samples()``:

- Using the default ``bins='auto'`` results in an extra call to
``np.histogram()`` to determine the bin lengths, since `the NumPy
bin-selectors are not exposed in the public API
<https://github.com/numpy/numpy/issues/10183>`_. For performance, you may
want to set the bins yourself.
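One way to set the bins yourself is to compute the bin count once and reuse it across calls. The sketch below assumes a NumPy version that provides ``np.histogram_bin_edges`` (added in releases later than the one this note was written against) and derives the count from the union of both arrays, mirroring the default ``range`` described above. Whether this reproduces exactly the bins that ``emd_samples()`` would choose internally is not confirmed by the text.

```python
import numpy as np

# Synthetic sample data, for illustration only.
rng = np.random.RandomState(0)
first_array = rng.normal(size=1000)
second_array = rng.normal(loc=0.5, size=1000)

# Estimate the bin edges once on the combined data with the 'auto' rule.
combined = np.concatenate([first_array, second_array])
edges = np.histogram_bin_edges(combined, bins='auto')
n_bins = len(edges) - 1

print(n_bins)
```

The resulting integer could then be passed as ``bins=n_bins`` on every subsequent call, which avoids re-estimating the bins each time.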


Contributing
Expand Down Expand Up @@ -187,9 +208,8 @@ Credit
Please cite these papers if you use this code:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ofir Pele and Michael Werman, "A linear time histogram metric for improved SIFT
matching," in *Computer Vision - ECCV 2008*, Marseille, France, 2008, pp.
495-508.
Ofir Pele and Michael Werman. A linear time histogram metric for improved SIFT
matching. *Computer Vision - ECCV 2008*, Marseille, France, 2008, pp. 495-508.

.. code-block:: latex

Expand All @@ -203,9 +223,8 @@ matching," in *Computer Vision - ECCV 2008*, Marseille, France, 2008, pp.
publisher={Springer}
}

Ofir Pele and Michael Werman, "Fast and robust earth mover's distances," in
*Proc. 2009 IEEE 12th Int. Conf. on Computer Vision*, Kyoto, Japan, 2009, pp.
460-467.
Ofir Pele and Michael Werman. Fast and robust earth mover's distances. *Proc.
2009 IEEE 12th Int. Conf. on Computer Vision*, Kyoto, Japan, 2009, pp. 460-467.

.. code-block:: latex

Expand Down
2 changes: 1 addition & 1 deletion pyemd/__about__.py
Expand Up @@ -5,7 +5,7 @@
"""PyEMD metadata"""

__title__ = 'pyemd'
__version__ = '0.5.0'
__version__ = '0.5.1'
__description__ = ("A Python wrapper for Ofir Pele and Michael Werman's "
"implementation of the Earth Mover's Distance.")
__author__ = 'Will Mayner'
Expand Down
18 changes: 9 additions & 9 deletions pyemd/emd.pyx
Expand Up @@ -118,7 +118,7 @@ def emd_with_flow(np.ndarray[np.float64_t, ndim=1, mode="c"] first_histogram,
matrix is used.
Returns:
(float, list(list(float))): The EMD value and the associated
(tuple(float, list(list(float)))): The EMD value and the associated
minimum-cost flow.
Raises:
Expand Down Expand Up @@ -148,7 +148,7 @@ def emd_samples(first_array,
range=None):
u"""Return the EMD between the histograms of two arrays.
See `emd()` for more information about the EMD.
See ``emd()`` for more information about the EMD.
Note:
Pairwise ground distances are taken from the center of the bins.
Expand All @@ -167,17 +167,17 @@ def emd_samples(first_array,
then the resulting distance is not guaranteed to be a metric). The
default value is -1, which means the maximum value in the distance
matrix is used.
distance (string or function): A string or function implementing
a metric on a 1D ``np.ndarray``. Defaults to the Euclidean distance.
Currently limited to 'euclidean' or your own function, which must
take a 1D array and return a square 2D array of pairwise distances.
normalized (boolean): If true (default), treat histograms as fractions
of the dataset. If false, treat histograms as counts. In the latter
case the EMD will vary greatly by array length.
bins (int or string): The number of bins to include in the generated
histogram. If a string, must be one of the bin selection algorithms
accepted by ``np.histogram()``. Defaults to 'auto', which gives the
maximum of the 'sturges' and 'fd' estimators.
distance_matrix (string or function): A string or function implementing
a metric on a 1D ``np.ndarray``. Defaults to the Euclidean distance.
Currently limited to 'euclidean' or your own function, which must
take a 1D array and return a square 2D array of pairwise distances.
normalized (boolean): If true, treat histograms as fractions of the
dataset. If false, treat histograms as counts. In the latter case
the EMD will vary greatly by array length.
range (tuple(int, int)): The lower and upper range of the bins, passed
to ``numpy.histogram()``. Defaults to the range of the union of
``first_array`` and `second_array``.` Note: if the given range is
Expand Down
3 changes: 2 additions & 1 deletion setup.py
@@ -1,6 +1,7 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import io
import os
import sys
from warnings import warn
Expand Down Expand Up @@ -83,7 +84,7 @@ def finalize_options(self):
}


with open('README.rst', 'r') as f:
with io.open('README.rst', encoding='utf-8') as f:
README = f.read()

ABOUT = {}
Expand Down
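
The switch to ``io.open`` with an explicit encoding in ``setup.py`` can be illustrated in isolation. A plausible motivation, not stated in the commit, is that ``README.rst`` contains non-ASCII characters such as ``×``, which a plain ``open()`` decodes with the platform default encoding on Python 3. The helper name ``read_text`` and the temporary file below are assumptions made for the example.

```python
import io
import os
import tempfile


def read_text(path):
    """Read a file as UTF-8 text regardless of the platform's default encoding."""
    # io.open accepts ``encoding`` on both Python 2 and Python 3.
    with io.open(path, encoding='utf-8') as f:
        return f.read()


# Write a small file containing a non-ASCII character, then read it back.
path = os.path.join(tempfile.mkdtemp(), 'README.rst')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'N \u00d7 N')

print(read_text(path) == u'N \u00d7 N')
```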
