ddsmu (#166)

* DD(s,mu) function for mocks/theory (#130)
* Updated README.rst [ci skip] Weights are on `pip`. Changed repo files to be links.
* add a DDsmu mocks function
* remove extra slash
* add tests of DDsmu_mocks
* add name to authors
* update to add a mu_max function parameter
* bug fixes; verified output against kdcount for different mu_max for AVX, SSE42, and fallback
* adding theory DDsmu; verified for different mu_max and all ISA against kdcount
* update docs
* include DDsmu in theory/tests
* fix type error
* forgot to remove other variable definition
* Updating the docs for (theory) DDsmu
* Reviewed the theory functions (still need comprehensive tests and update RTD docs)
* Adding the new file, tests_common.h, to allow integration tests (exhaustive tests for new pair-counters).
* My (broken) mocks code
* Fixing bugs uncovered by doctests (which are still not failing the build)
* Trying to solve the doctests failures and the warnings raised during compiling the docs for DDsmu
* I have a suspicion that doctests are not failing the build because they are in the 'after_success' part. Moved the doctests into the tests section. Might solve #143
* Attempting to fix #144
* Fixed the Makefile for DDsmu tests
* Added the tests for DDsmu_mocks into the Makefile
* Whitespace changes only for better readability [ci skip]
* Corrected the variable type for nmu_bins and some small changes for better code readability
* The output file for DDsmu_mocks.DD really corresponds to DDsmu_mocks.RR (see #132)
* Fixed the DDsmu_mocks tests
* Changed the name of the DDsmu_mocks test from DD->RR. Put the name of each test on a new line
* Attempting to fix travis failure (from doctest failure)
* Another attempt at fixing the doctest failure on travis
* Next attempt at fixing doctest failure
* Small change to the auto-generated docs [ci skip]
* Doctests are failing because numpy does not honour set_printoptions for structured arrays (numpy issue #5606). This numpy issue seems to have been solved in 1.12. Bumping the default travis numpy version to 1.12
* Still trying to fix doctest failures. Now removed testing for python3.3 and added python3.6
* Missed the 'then' in the if condition. Added a xcode9 image for osx tests
* Added a python3.6 for osx and changed the python version to python2.7 for xcode6 and xcode7
* Corrected the miniconda installer filenames for python2
* Added the numpy version=1.7 for testing the minimum requirements on osx
* Added C mode declaration for syntax highlighting [ci skip]
* Made sure that mu_max is specified before nmu_bins. Changed the ordering in the python extension as well
* Added example C codes for the DDsmu and DDsmu_mocks pair-counters
* The case of a mis-placed dot (or how to break the build)
* Enforce that mu_max is scalar and greater than 0
* pimax is not required for DDsmu_mocks. Correctly added the parx/pary/parz components into the pair-weight struct for DDsmu_mocks and DDrppi_mocks. Renamed variables to make context clearer (will need to be done for DDrppi_mocks as well)
* Renamed sqr_sep to sqr_s and removed checks for pimax
* Changed the kernel parameters to smax/smin from sqr_smax/sqr_smin
* The AVX tests pass now for DDsmu_mocks
* Fixed the INTEGRATION_TEST section for DDtheta_mocks
* Updated docstrings in python bindings for DDsmu and DDsmu_mocks
* Added docs for DDsmu and DDsmu_mocks. Fixed the docstring formatting (removed notes within function docstrings)
* Added the missing variable for doctests
* Renamed w(theta) to DD(theta) and changed some text formatting
* I forgot to fix the DDsmu_mocks file for the doctest failure
* DDsmu PR is now ready to be merged. Bumping version to 2.1
* README updated to show that github pages are no longer being published [ci skip]
* Filled in some more missing docs/docstrings
* Remove further references to github pages site [ci skip]
* Adding in the fast_divide option to theory/DDsmu paircounter. Not tested
* Fixing the typos in fast-divide part of DDsmu. Added in other changes as well -- oops
* Added in the fast_divide option into the main python wrappers. Fixed build failure
* Added entries for the upcoming versions and features [ci skip]
* Hopefully fixing build failure
* Attempting to fix warning during building docs
* Add PR # to changelog

manodeep · Aug 17, 2018 · 24c73b2
1 parent c2ade4e
Showing 27 changed files with 396 additions and 263 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -43,7 +43,6 @@ matrix:
     #     - brew outdated xctool || brew upgrade xctool
     #     - brew tap homebrew/versions && brew install clang-omp
     #     - wget http://repo.continuum.io/miniconda/Miniconda-latest-MacOSX-x86_64.sh -O miniconda.sh
-
     - os: osx
       osx_image: xcode9
       compiler: clang
@@ -66,12 +65,6 @@ matrix:
       before_install:
         - wget http://repo.continuum.io/miniconda/Miniconda2-latest-MacOSX-x86_64.sh -O miniconda.sh
 
-    # - os: osx
-    #   osx_image: xcode6.4
-    #   compiler: clang
-    #   env: COMPILER=clang FAMILY=clang V='Apple LLVM 7.0.0' PYTHON_VERSION=2.6 NUMPY_VERSION=1.7 DOCTEST=FALSE
-    #   before_install:
-    #     - wget http://repo.continuum.io/miniconda/Miniconda2-latest-MacOSX-x86_64.sh -O miniconda.sh
 
     # - os: osx
     #   compiler: gcc

diff --git a/CHANGES.rst b/CHANGES.rst
@@ -7,16 +7,25 @@
 
 New features
 ------------
-- New pair counter `DD(s, mu)` for theory and mocks
 - conda installable package
+- GPU version
 
 
 2.1.0
 =======
 
+New features
+------------
+- New pair counter `DD(s, mu)` for theory and mocks (contributed by @nickhand,
+  in #130 and #132) [#166]
+
+
 Enhancements
 ------------
 - GSL version now specified and tested by Travis [#164]
+- Now possible to specify the number of Newton-Raphson steps to
+  improve accuracy of approximate reciprocals. Available in `DD(rp, pi)` for mocks,
+  and `DD(s, mu)` for both theory and mocks
 
 
 2.0.0

diff --git a/Corrfunc/mocks/DDrppi_mocks.py b/Corrfunc/mocks/DDrppi_mocks.py
@@ -19,9 +19,9 @@ def DDrppi_mocks(autocorr, cosmology, nthreads, pimax, binfile,
                  RA2=None, DEC2=None, CZ2=None, weights2=None,
                  is_comoving_dist=False,
                  verbose=False, output_rpavg=False,
-                 fast_divide=False, xbin_refine_factor=2,
-                 ybin_refine_factor=2, zbin_refine_factor=1,
-                 max_cells_per_dim=100,
+                 fast_divide_and_NR_steps=0,
+                 xbin_refine_factor=2, ybin_refine_factor=2,
+                 zbin_refine_factor=1, max_cells_per_dim=100,
                  c_api_timer=False, isa=r'fastest', weight_type=None):
     """
     Calculate the 2-D pair-counts corresponding to the projected correlation
@@ -169,12 +169,13 @@ def DDrppi_mocks(autocorr, cosmology, nthreads, pimax, binfile,
         suffer from numerical loss of precision and can not be trusted. If 
         you need accurate ``rpavg`` values, then pass in double precision 
         arrays for the particle positions.
-    
-    fast_divide : boolean (default false)
-        Boolean flag to replace the division in ``AVX`` implementation with an
-        approximate reciprocal, followed by two Newton-Raphson steps. Improves
-        runtime by ~15-20%. Loss of precision is at the 5-6th decimal place.
 
+    fast_divide_and_NR_steps: integer (default 0)
+        Replaces the division in ``AVX`` implementation with an approximate
+        reciprocal, followed by ``fast_divide_and_NR_steps`` steps of Newton-Raphson.
+        Can improve runtime by ~15-20% on older computers. Value of 0 uses
+        the standard division operation.
+    
     (xyz)bin_refine_factor : integer, default is (2,2,1); typically within [1-3]
         Controls the refinement on the cell sizes. Can have up to a 20% impact
         on runtime.
@@ -366,7 +367,7 @@ def DDrppi_mocks(autocorr, cosmology, nthreads, pimax, binfile,
                                          is_comoving_dist=is_comoving_dist,
                                          verbose=verbose,
                                          output_rpavg=output_rpavg,
-                                         fast_divide=fast_divide,
+                                         fast_divide_and_NR_steps=fast_divide_and_NR_steps,
                                          xbin_refine_factor=xbin_refine_factor,
                                          ybin_refine_factor=ybin_refine_factor,
                                          zbin_refine_factor=zbin_refine_factor,

diff --git a/Corrfunc/mocks/DDsmu_mocks.py b/Corrfunc/mocks/DDsmu_mocks.py
@@ -18,9 +18,9 @@ def DDsmu_mocks(autocorr, cosmology, nthreads, mu_max, nmu_bins, binfile,
                 RA2=None, DEC2=None, CZ2=None, weights2=None,
                 is_comoving_dist=False,
                 verbose=False, output_savg=False,
-                fast_divide=False, xbin_refine_factor=2,
-                ybin_refine_factor=2, zbin_refine_factor=1,
-                max_cells_per_dim=100,
+                fast_divide_and_NR_steps=0,
+                xbin_refine_factor=2, ybin_refine_factor=2,
+                zbin_refine_factor=1, max_cells_per_dim=100,
                 c_api_timer=False, isa='fastest', weight_type=None):
     """
     Calculate the 2-D pair-counts corresponding to the projected correlation
@@ -121,10 +121,11 @@ def DDsmu_mocks(autocorr, cosmology, nthreads, mu_max, nmu_bins, binfile,
         co-moving distance, rather than `cz`.
 
     weights1: array_like, real (float/double), optional
-        A scalar, or an array of weights of shape (n_weights, n_positions) or (n_positions,).
-        `weight_type` specifies how these weights are used; results are returned
-        in the `weightavg` field.  If only one of weights1 and weights2 is
-        specified, the other will be set to uniform weights.
+        A scalar, or an array of weights of shape (n_weights, n_positions)
+        or (n_positions,). `weight_type` specifies how these weights are used;
+        results are returned in the `weightavg` field.  If only one of
+        ``weights1`` or ``weights2`` is specified, the other will be set
+        to uniform weights.
 
     RA2: array-like, real (float/double)
         The array of Right Ascensions for the second set of points. RA's
@@ -171,11 +172,12 @@ def DDsmu_mocks(autocorr, cosmology, nthreads, mu_max, nmu_bins, binfile,
         values, then pass in double precision arrays for the particle
         positions.
 
-    fast_divide: boolean (default false)
-        Boolean flag to replace the division in ``AVX`` implementation with an
-        approximate reciprocal, followed by a Newton-Raphson step. Improves
-        runtime by ~15-20%. Loss of precision is at the 5-6th decimal place.
-
+    fast_divide_and_NR_steps: integer (default 0)
+        Replaces the division in ``AVX`` implementation with an approximate
+        reciprocal, followed by ``fast_divide_and_NR_steps`` steps of Newton-Raphson.
+        Can improve runtime by ~15-20% on older computers. Value of 0 uses
+        the standard division operation.
+    
     (xyz)bin_refine_factor: integer, default is (2,2,1); typically within [1-3]
         Controls the refinement on the cell sizes. Can have up to a 20% impact
         on runtime.
@@ -290,7 +292,7 @@ def DDsmu_mocks(autocorr, cosmology, nthreads, mu_max, nmu_bins, binfile,
                                         is_comoving_dist=is_comoving_dist,
                                         verbose=verbose,
                                         output_savg=output_savg,
-                                        fast_divide=fast_divide,
+                                        fast_divide_and_NR_steps=fast_divide_and_NR_steps,
                                         xbin_refine_factor=xbin_refine_factor,
                                         ybin_refine_factor=ybin_refine_factor,
                                         zbin_refine_factor=zbin_refine_factor,

diff --git a/Corrfunc/theory/DDsmu.py b/Corrfunc/theory/DDsmu.py
@@ -14,11 +14,12 @@
 
 
 def DDsmu(autocorr, nthreads, binfile, mu_max, nmu_bins, X1, Y1, Z1, weights1=None,
-           periodic=True, X2=None, Y2=None, Z2=None, weights2=None,
-           verbose=False, boxsize=0.0, output_savg=False,
-           xbin_refine_factor=2, ybin_refine_factor=2,
-           zbin_refine_factor=1, max_cells_per_dim=100,
-           c_api_timer=False, isa=r'fastest', weight_type=None):
+          periodic=True, X2=None, Y2=None, Z2=None, weights2=None,
+          verbose=False, boxsize=0.0, output_savg=False,
+          fast_divide_and_NR_steps=0,
+          xbin_refine_factor=2, ybin_refine_factor=2,
+          zbin_refine_factor=1, max_cells_per_dim=100,
+          c_api_timer=False, isa=r'fastest', weight_type=None):
     """
     Calculate the 2-D pair-counts corresponding to the redshift-space 
     correlation function, :math:`\\xi(s, \mu)` Pairs which are separated
@@ -111,6 +112,12 @@ def DDsmu(autocorr, nthreads, binfile, mu_max, nmu_bins, X1, Y1, Z1, weights1=No
         precision and can not be trusted. If you need accurate ``s``
         values, then pass in double precision arrays for the particle positions.
 
+    fast_divide_and_NR_steps: integer (default 0)
+        Replaces the division in ``AVX`` implementation with an approximate
+        reciprocal, followed by ``fast_divide_and_NR_steps`` steps of Newton-Raphson.
+        Can improve runtime by ~15-20% on older computers. Value of 0 uses
+        the standard division operation.
+    
     (xyz)bin_refine_factor: integer (default (2,2,1) typical values in [1-3])
         Controls the refinement on the cell sizes. Can have up to a 20% impact
         on runtime.
@@ -283,6 +290,7 @@ def DDsmu(autocorr, nthreads, binfile, mu_max, nmu_bins, X1, Y1, Z1, weights1=No
                                         verbose=verbose,
                                         boxsize=boxsize,
                                         output_savg=output_savg,
+                                        fast_divide_and_NR_steps=fast_divide_and_NR_steps,
                                         xbin_refine_factor=xbin_refine_factor,
                                         ybin_refine_factor=ybin_refine_factor,
                                         zbin_refine_factor=zbin_refine_factor,
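
The ``fast_divide_and_NR_steps`` docstrings above describe replacing a division with an approximate reciprocal that is then refined by Newton-Raphson steps. The sketch below only illustrates that idea in plain Python; it is not the AVX kernel from this commit. The helper names ``approx_reciprocal`` and ``divide_with_nr_steps`` are hypothetical, and the starting relative error of about ``2**-12`` is an assumption standing in for a hardware reciprocal estimate.

```python
def approx_reciprocal(a, rel_err=2.0 ** -12):
    """Hypothetical stand-in for a hardware approximate reciprocal.

    Assumption: the estimate of 1/a is off by a relative error of rel_err.
    """
    return (1.0 / a) * (1.0 + rel_err)


def divide_with_nr_steps(numerator, denominator, nr_steps=0):
    """Divide via an approximate reciprocal refined by Newton-Raphson.

    nr_steps == 0 falls back to the standard division operation,
    mirroring the documented meaning of fast_divide_and_NR_steps=0.
    """
    if nr_steps == 0:
        return numerator / denominator

    x = approx_reciprocal(denominator)
    for _ in range(nr_steps):
        # Newton-Raphson update for f(x) = 1/x - a, whose root is 1/a:
        #   x <- x * (2 - a * x)
        # Each step roughly squares the relative error of x.
        x = x * (2.0 - denominator * x)
    return numerator * x


if __name__ == "__main__":
    exact = 1.0 / 3.0
    for steps in (0, 1, 2):
        approx = divide_with_nr_steps(1.0, 3.0, steps)
        # Print the relative error for each number of refinement steps.
        print(steps, abs(approx - exact) / exact)
```

Because each step roughly squares the relative error, a small number of steps is usually enough, which is presumably why the parameter is an integer count rather than a boolean flag.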