Merge branch 'release/0.5.0'
wmayner committed Jan 22, 2018
2 parents ef08133 + 1eec3c7 commit 38c0b2e
Showing 17 changed files with 685 additions and 289 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
__pycache__
.gitconfig
.cache
.tox
.env
8 changes: 6 additions & 2 deletions .travis.yml
@@ -2,10 +2,14 @@ sudo: false
language: python
python:
- '2.7'
- '3.4'
- '3.5'
- '3.6'
install:
install:
- pip install -r dev_requirements.txt
- make build
- pip uninstall --yes -r dev_requirements.txt
- pip install tox-travis
- pip install cython
script: tox
notifications:
email: false
34 changes: 26 additions & 8 deletions Makefile
@@ -1,12 +1,30 @@
default:
python setup.py build_ext -b .
.PHONY: default test build clean dist test-dist check-dist build-dist clean-dist

build:
cython -v -t --cplus pyemd/emd.pyx
src = pyemd
dist_dir = dist

clean:
rm -rf build
rm -rf pyemd/emd.so
default: build

test: default
test: build
py.test

build: clean
python setup.py build_ext -b .

clean:
rm -f pyemd/*.so

dist: build-dist check-dist
twine upload $(dist_dir)/*

test-dist: build-dist check-dist
twine upload --repository-url https://test.pypi.org/legacy/ $(dist_dir)/*

check-dist:
python setup.py check --restructuredtext --strict

build-dist: clean-dist
python setup.py sdist bdist_wheel --dist-dir=$(dist_dir)

clean-dist:
rm -rf $(dist_dir)
134 changes: 103 additions & 31 deletions README.rst
@@ -4,19 +4,18 @@
:target: https://wiki.python.org/moin/Python2orPython3
:alt: Python versions badge

**************************
PyEMD: Fast EMD for Python
**************************
==========================

PyEMD is a Python wrapper for `Ofir Pele and Michael Werman's implementation
<http://www.ariel.ac.il/sites/ofirpele/fastemd/code/>`_ of the `Earth Mover's
<http://ofirpele.droppages.com/>`_ of the `Earth Mover's
Distance <http://en.wikipedia.org/wiki/Earth_mover%27s_distance>`_ that allows
it to be used with NumPy. **If you use this code, please cite the papers listed
at the end of this document.**


Installation
~~~~~~~~~~~~
------------

To install the latest release:

@@ -30,15 +29,15 @@ persists.


Usage
~~~~~
-----

.. code:: python
>>> from pyemd import emd
>>> import numpy as np
>>> first_histogram = np.array([0.0, 1.0])
>>> second_histogram = np.array([5.0, 3.0])
>>> distance_matrix = np.array([[0.0, 0.5],
>>> distance_matrix = np.array([[0.0, 0.5],
... [0.5, 0.0]])
>>> emd(first_histogram, second_histogram, distance_matrix)
3.5
@@ -51,28 +50,102 @@ You can also get the associated minimum-cost flow:
>>> emd_with_flow(first_histogram, second_histogram, distance_matrix)
(3.5, [[0.0, 0.0], [0.0, 1.0]])
You can also calculate the EMD directly from two arrays of observations:

.. code:: python
>>> from pyemd import emd_samples
>>> first_array = [1, 2, 3, 4]
>>> second_array = [2, 3, 4, 5]
>>> emd_samples(first_array, second_array, bins=2)
0.5
Documentation
-------------

API
~~~
emd()
~~~~~

.. code:: python
emd(first_histogram, second_histogram, distance_matrix)
emd(first_histogram, second_histogram, distance_matrix,
extra_mass_penalty=-1.0)
*Arguments:*

- ``first_histogram`` *(np.ndarray)*: A 1D array of type ``np.float64`` of
length ``N``.
- ``second_histogram`` *(np.ndarray)*: A 1D array of type ``np.float64`` of
length ``N``.
- ``distance_matrix`` *(np.ndarray)*: A 2D array of ``np.float64``, of size at
least ``N × N``. This defines the underlying metric, or ground distance, by
giving the pairwise distances between the histogram bins. It must represent a
metric; there is no warning if it doesn't.

- ``first_histogram``: A 1-dimensional numpy array of type ``np.float64``, of
length :math:`N`.
- ``second_histogram``: A 1-dimensional numpy array of type ``np.float64``, of
length :math:`N`.
- ``distance_matrix``: A 2-dimensional array of type ``np.float64``, of size at
least :math:`N \times N`. This defines the underlying metric, or ground
distance, by giving the pairwise distances between the histogram bins. It
must represent a metric; there is no warning if it doesn't.
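Because the library does not validate ``distance_matrix``, a caller may want to check it before use. The following is a minimal sketch of such a check, not something pyemd provides: ``is_metric`` and its ``tol`` parameter are hypothetical names, and the function tests only non-negativity, a zero diagonal, symmetry, and the triangle inequality on a square matrix given as nested lists.

```python
from itertools import product


def is_metric(distance_matrix, tol=1e-12):
    """Return True if a square matrix looks like a ground distance.

    Hypothetical helper (not part of pyemd). Checks a zero diagonal,
    non-negativity, symmetry, and the triangle inequality.
    """
    n = len(distance_matrix)
    # The matrix must be square.
    if any(len(row) != n for row in distance_matrix):
        return False
    # Every bin must be at distance zero from itself.
    for i in range(n):
        if abs(distance_matrix[i][i]) > tol:
            return False
    # Distances must be non-negative and symmetric.
    for i, j in product(range(n), repeat=2):
        if distance_matrix[i][j] < 0:
            return False
        if abs(distance_matrix[i][j] - distance_matrix[j][i]) > tol:
            return False
    # Going through an intermediate bin must never be shorter.
    for i, j, k in product(range(n), repeat=3):
        if distance_matrix[i][k] > distance_matrix[i][j] + distance_matrix[j][k] + tol:
            return False
    return True


print(is_metric([[0.0, 0.5], [0.5, 0.0]]))
print(is_metric([[0.0, 1.0, 5.0], [1.0, 0.0, 1.0], [5.0, 1.0, 0.0]]))
```

The triangle-inequality loop is cubic in the number of bins, so this is only practical as a one-off sanity check on small matrices.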
*Keyword Arguments:*

The arguments to ``emd_with_flow`` are the same.
- ``extra_mass_penalty`` *(float)*: The penalty for extra mass. If you want the
resulting distance to be a metric, it should be at least half the diameter of
the space (maximum possible distance between any two points). If you want
partial matching you can set it to zero (but then the resulting distance is
not guaranteed to be a metric). The default value is ``-1.0``, which means the
maximum value in the distance matrix is used.
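As a rough check of how the default penalty enters the result, the ``3.5`` from the usage example can be reproduced by hand. This sketch assumes that the total is the cost of the reported flow plus the unmatched mass multiplied by the penalty, with the default ``-1.0`` resolved to the largest entry of the distance matrix. ``emd_from_flow`` is a hypothetical helper written for illustration; it does not solve for the flow and is not part of pyemd.

```python
def emd_from_flow(first_histogram, second_histogram, distance_matrix, flow,
                  extra_mass_penalty=-1.0):
    """Recompute the EMD value from an already-known flow.

    Hypothetical helper (not part of pyemd). Assumes
    total = flow cost + |mass difference| * penalty.
    """
    n = len(flow)
    # Cost of moving mass along the given flow.
    flow_cost = sum(flow[i][j] * distance_matrix[i][j]
                    for i in range(n) for j in range(n))
    # Mass that cannot be matched because the histograms differ in total.
    extra_mass = abs(sum(first_histogram) - sum(second_histogram))
    # A penalty of -1.0 stands for "use the maximum ground distance".
    if extra_mass_penalty == -1.0:
        extra_mass_penalty = max(max(row) for row in distance_matrix)
    return flow_cost + extra_mass * extra_mass_penalty


# Values taken from the usage example in the README.
result = emd_from_flow(
    first_histogram=[0.0, 1.0],
    second_histogram=[5.0, 3.0],
    distance_matrix=[[0.0, 0.5], [0.5, 0.0]],
    flow=[[0.0, 0.0], [0.0, 1.0]],
)
print(result)  # 0.0 flow cost + 7.0 extra mass * 0.5 penalty = 3.5
```

Under the same assumption, passing ``extra_mass_penalty=0.0`` would leave only the flow cost, which is the partial-matching case described above.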

----

emd_with_flow()
~~~~~~~~~~~~~~~

.. code:: python
emd_with_flow(first_histogram, second_histogram, distance_matrix,
extra_mass_penalty=-1.0)
Arguments are the same as for ``emd()``.

----

emd_samples()
~~~~~~~~~~~~~

.. code:: python
emd_samples(first_array, second_array,
extra_mass_penalty=DEFAULT_EXTRA_MASS_PENALTY,
distance='euclidean',
normalized=True,
bins='auto',
range=None)
*Arguments:*

- ``first_array`` *(Iterable)*: A 1D array of samples used to generate a
histogram.
- ``second_array`` *(Iterable)*: A 1D array of samples used to generate a
histogram.

*Keyword Arguments:*

- ``extra_mass_penalty`` *(float)*: Same as for ``emd()``.
- ``bins`` *(int or string)*: The number of bins to include in the generated
histogram. If a string, must be one of the bin selection algorithms accepted
by ``np.histogram()``. Defaults to ``'auto'``, which gives the maximum of the
'sturges' and 'fd' estimators.
- ``distance`` *(string or function)*: A string or function implementing
a metric on a 1D ``np.ndarray``. Defaults to the Euclidean distance. Currently
limited to 'euclidean' or your own function, which must take a 1D array and
return a square 2D array of pairwise distances.
- ``normalized`` *(boolean)*: If true, treat histograms as fractions of the
dataset. If false, treat histograms as counts. In the latter case the EMD will
vary greatly by array length.
- ``range`` *(tuple(int, int))*: The lower and upper range of the bins, passed
to ``numpy.histogram()``. Defaults to the range of the union of
``first_array`` and ``second_array``. Note: if the given range is not a
superset of the default range, no warning will be given.
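To make the roles of ``bins``, ``range`` and ``normalized`` concrete, here is a dependency-free sketch of the pipeline the arguments describe: bin both arrays over their shared range, normalize, and compare the histograms. It is an illustration under stated assumptions, not the library's implementation. It assumes equal-width bins with the last bin closed on the right, a Euclidean ground distance between bin centers, and normalized histograms, for which the 1D EMD reduces to the accumulated difference of the two cumulative distributions. ``histogram`` and ``emd_samples_sketch`` are hypothetical names.

```python
def histogram(samples, bins, lo, hi):
    """Count samples in equal-width bins over [lo, hi]; last bin is closed."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        # Clamp so that x == hi falls into the last bin.
        index = min(int((x - lo) / width), bins - 1)
        counts[index] += 1
    return counts, width


def emd_samples_sketch(first_array, second_array, bins):
    """1D EMD between two sample sets (hypothetical, normalized case only)."""
    # Default range: the range of the union of both arrays.
    lo = min(min(first_array), min(second_array))
    hi = max(max(first_array), max(second_array))
    first_counts, width = histogram(first_array, bins, lo, hi)
    second_counts, _ = histogram(second_array, bins, lo, hi)
    # Treat histograms as fractions of each dataset.
    first = [c / len(first_array) for c in first_counts]
    second = [c / len(second_array) for c in second_counts]
    # Mass carried past each bin boundary travels one bin width.
    carried = 0.0
    total = 0.0
    for p, q in zip(first, second):
        carried += p - q
        total += abs(carried) * width
    return total


print(emd_samples_sketch([1, 2, 3, 4], [2, 3, 4, 5], bins=2))  # 0.5
```

With the inputs from the usage example, the histograms are ``[0.5, 0.5]`` and ``[0.25, 0.75]`` over bins of width 2, so a quarter of the mass moves one bin and the result agrees with the ``0.5`` shown there. The sketch does not handle the degenerate case where all samples are equal (zero-width range).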

----

Limitations and Caveats
~~~~~~~~~~~~~~~~~~~~~~~
-----------------------

- ``distance_matrix`` is assumed to represent a metric; there is no check to
ensure that this is true. See the documentation in ``pyemd/lib/emd_hat.hpp``
@@ -86,34 +159,33 @@ Limitations and Caveats


Contributing
~~~~~~~~~~~~
------------

To help develop PyEMD, fork the project on GitHub and install the requirements
with ``pip``.
with ``pip install -r requirements.txt``.

The ``Makefile`` defines some tasks to help with development:

* ``default``: compile the Cython code into C++ and build the C++ into a Python
extension, using the ``setup.py`` build command
* ``build``: same as default, but using the ``cython`` command
* ``clean``: remove the build directory and the compiled C++ extension
* ``test``: run unit tests with ``py.test``
- ``test``: Run the test suite
- ``build``: Generate and compile the Cython extension
- ``clean``: Remove the compiled Cython extension
- ``default``: Run ``build``

Tests for different Python environments can be run with ``tox``.

Tests for different Python environments can be run by installing ``tox`` with
``pip install tox`` and running the ``tox`` command.

Credit
~~~~~~
------

- All credit for the actual algorithm and implementation goes to `Ofir Pele
<http://www.ariel.ac.il/sites/ofirpele/>`_ and `Michael Werman
<http://www.cs.huji.ac.il/~werman/>`_. See the `relevant paper
<http://www.seas.upenn.edu/~ofirpele/publications/ICCV2009.pdf>`_.
- Thanks to the Cython devlopers for making this kind of wrapper relatively
- Thanks to the Cython developers for making this kind of wrapper relatively
easy to write.

Please cite these papers if you use this code:
``````````````````````````````````````````````
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ofir Pele and Michael Werman, "A linear time histogram metric for improved SIFT
matching," in *Computer Vision - ECCV 2008*, Marseille, France, 2008, pp.
3 changes: 2 additions & 1 deletion conftest.py
@@ -1,5 +1,6 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# conftest.py


collect_ignore = ["setup.py", ".pythonrc.py", "build"]
collect_ignore = ["setup.py", "build", "dist"]
1 change: 1 addition & 0 deletions dev_requirements.txt
@@ -0,0 +1 @@
Cython >=0.20.2
4 changes: 4 additions & 0 deletions dist_requirements.txt
@@ -0,0 +1,4 @@
docutils
pygments
twine
wheel
2 changes: 1 addition & 1 deletion pyemd/__about__.py
@@ -5,7 +5,7 @@
"""PyEMD metadata"""

__title__ = 'pyemd'
__version__ = '0.4.4'
__version__ = '0.5.0'
__description__ = ("A Python wrapper for Ofir Pele and Michael Werman's "
"implementation of the Earth Mover's Distance.")
__author__ = 'Will Mayner'
13 changes: 10 additions & 3 deletions pyemd/__init__.py
@@ -33,6 +33,14 @@
>>> emd_with_flow(first_signature, second_signature, distance_matrix)
(3.5, [[0.0, 0.0], [0.0, 1.0]])
You can also calculate the EMD directly from two arrays of observations:
>>> from pyemd import emd_samples
>>> first_array = [1,2,3,4]
>>> second_array = [2,3,4,5]
>>> emd_samples(first_array, second_array, bins=2)
0.5
Limitations and Caveats
~~~~~~~~~~~~~~~~~~~~~~~
@@ -59,10 +67,9 @@
easy to write.
:copyright: Copyright (c) 2014-2017 Will Mayner.
:copyright: Copyright (c) 2014-2018 Will Mayner.
:license: See the LICENSE file.
"""

from .__about__ import *
from .emd import emd
from .emd import emd_with_flow
from .emd import emd, emd_with_flow, emd_samples