[paper] Update paper in line with review comments.

annoviko · annoviko · commit 48e902404b9c · 2019-04-06T22:41:35.000+03:00
diff --git a/paper/paper.bib b/paper/paper.bib
@@ -1,29 +1,3 @@
-@article{Novikov2014,
-    author    = {Novikov, A. V. and Benderskaya, E. N.},
-    title     = {Oscillatory neural networks based on the Kuramoto model for cluster analysis},
-    journal   = {Pattern Recognition and Image Analysis},
-    year      = {2014},
-    month     = {Sep},
-    day       = {01},
-    volume    = {24},
-    number    = {3},
-    pages     = {365--371},
-    abstract  = {This paper presents the results of a study of synchronization processes in oscillatory neural networks of various structures based on the Kuramoto model. The estimates of synchronization processes occurring in the oscillatory networks are examined. The results of studying the practical application of oscillatory networks for solving cluster analysis problems are presented.},
-    issn      = {1555-6212},
-    doi       = {10.1134/S1054661814030146},
-    url       = {https://doi.org/10.1134/S1054661814030146}
-}
-
-@article{Cumin2007,
-    author    = {Unsworth, C.P. and Cumin D.},
-    year      = {2007},
-    month     = {02},
-    title     = {Generalising the Kuramoto Model for the study of Neuronal Synchronisation in the Brain},
-    volume    = {226},
-    journal   = {http://www.esc.auckland.ac.nz/research/tech/esc-tr-638.pdf},
-    doi       = {10.1016/j.physd.2006.12.004}
-}
-
 @book{Oliphant2006,
     author    = {Oliphant, Travis},
     year      = {2006},
diff --git a/paper/paper.md b/paper/paper.md
@@ -23,19 +23,17 @@ bibliography: paper.bib
 
 # Introduction
 
-The exponential growth of data from sensors, devices, mobile networks, social media, and other data sources leads to an appearance of new disciplines such as cluster analysis and machine learning, and therefore information requires processing in different areas like medicine, marketing, science, engineering, etc. Data processing can be performed to extract clusters and structures to understand the nature of the data, for example, finding customer groups with similar needs and identifying their behavior in each group. And using obtained information make decisions or create models that help to predict further behavior. As a result, more and more algorithms and methods appear to resolve faced problems. PyClustering is an open source data mining library written in Python and C++ that provides a wide range of clustering algorithms and methods including bio-inspired oscillatory networks for data analysis. PyClustering is mostly focused on cluster analysis to make it more accessible and understandable for users. The library is distributed under GNU Public License and provides a comprehensive interface that makes it easy to use in every project.
+A variety of scientific and industrial sectors continue to experience exponential growth in their data volumes, and so automatic categorization techniques have become standard tools for dataset exploration. Automatic categorization techniques -- typically referred to as clustering -- help expose the structure of a dataset. For example, the generated clusters might each correspond to a customer group with reasonably similar needs and behavior. Because the resulting clusters are often used as building blocks for higher-level -- often custom -- predictive models, researchers have continually refined existing clustering techniques and invented new ones. PyClustering is an open source data mining library written in Python and C++ that provides a wide range of clustering algorithms and methods, including bio-inspired oscillatory networks. PyClustering focuses mainly on cluster analysis, with the aim of making it more accessible and understandable for users.
 
 # Summary
 
-The PyClustering library is a Python and C++ data mining library focused on cluster analysis. The library provides an implementation of each algorithm and method in the Python and C++ programming languages. By default, the C++ part of the library is used for processing in order to achieve maximum performance. This is especially relevant for algorithms that are based on oscillatory networks, whose dynamic is described by a system of differential equations and where the C++ implementation is more suitable. If PyClustering detects that it is not possible to delegate th computation to C++ part, for example, in case of an unsupported hardware platform or operating system (current version 0.8.2 supports x86 and x86_64 for Windows and Linux operating systems), then the Python implementation is used. Python supports a wide range of platforms and operating systems and that ensures high portability of the library. Such an architecture gives a balance between portability and performance. PyClustering uses the NumPy [@Oliphant2006] package to increase the performance of the Python implementation. NumPy is a fundamental package for scientific computing that provides efficient operations on large N-dimensional arrays.
+The PyClustering library is a Python and C++ data mining library focused on cluster analysis. By default, the C++ part of the library is used for processing in order to achieve maximum performance. This is especially relevant for algorithms that are based on oscillatory networks, whose dynamics are governed by a system of differential equations. If PyClustering cannot delegate the computation to the C++ part, for example on an unsupported hardware platform or operating system, it falls back to pure Python implementations of the algorithms. To increase the performance of the Python implementations, PyClustering uses the NumPy [@Oliphant2006] library for its array manipulations.
 
-PyClustering provides a separate and highly optimized implementation of clustering algorithms using parallel computing using C++ 11, 14 (`std::thread`, `std::async`, and in case of Windows functionality that is provided by `ppl.h`), without any third-party code and it can therefore easily integrated in any C++ project as a library or as some part of it. In other words, PyClustering usage is not restricted to the Python language and corresponding dependencies. The C++ implementation of the library is based on the C++14 standard and can be built using common compilers, including gcc, clang, mingw, and VS2015. Such flexibility allows developers or scientists to focus on their own projects and not think about library integration and implementation details.
+PyClustering provides optimized, parallel C++14 clustering implementations; on most platforms, threading is provided by `std::thread`, while the Parallel Patterns Library is used on Windows. Because the C++ part relies only on these facilities and contains no third-party code, it is simple to integrate into pre-existing C++ projects.
 
-The Python implementation uses the SciPy [@SciPy], MatPlotLib [@Hunter2007], NumPy, and Pillow packages. SciPy and NumPy are mandatory dependencies that are used for computing purposes. The MatPlotLib and Pillow packages are optional and are used for visualization services. If these latter two packages are not installed, the PyClustering visualization tools are not available. The PyClustering visualization services display, for example, clustering results, data and its clusters in N-dimensional space, image segments, histograms, algorithm-specific features, dynamics of oscillatory and neural network outputs, etc. Visualization makes the clustering outcome easier to understand and useful for research and educational purposes, especially in case of complex clustering algorithms. For example, in the case of algorithms that are based on oscillatory networks, synchronization processes should be visualized in order to understand the clustering results.
+The core Python dependencies of PyClustering are NumPy and SciPy [@SciPy]; Matplotlib [@Hunter2007] and Pillow are optional and are needed only for visualization support. The visualization functionality includes 2D and 3D plots of the cluster embeddings, image segments, and, in the case of oscillatory networks, graphs of the synchronization processes.
 
-One of the unique features of the library is a collection of oscillatory networks for cluster analysis, graph coloring, and image segmentation. Oscillatory networks are biologically plausible neural networks that use synchronization processes for solving practical problems. Formally, oscillatory neural networks are nonlinear dynamic systems in which the neuron is an oscillating element called an oscillator. There is an assumption that the synchronization processes between neurons in the brain are used to implement cognitive functions [@Novikov2014][@Cumin2007]. Thus, oscillatory networks are of great interest because they allow to research mechanisms that synchronize the neuronal activity at the model level.
-
-The PyClustering library is available on PyPi and from a github repository. Since the first release on PyPi in 2014, it has been downloaded more than 141.000 times. The quality of the library is supported by static and dynamic analyzers, such as cppcheck, scan-build, and valgrid [@Nethercote2007], including compilers gcc, clang, and VS2015. Code coverage is more than 93% that is ensured by unit and integration tests (there are over 2.200 tests). Each commit to the repository triggers building, analysis, and testing on CI services such as travis-ci or appveyor. PyClustering provides fully-documented code for each library version, including examples, math and algorithms description, and installation instructions. The API documentation is generated by doxygen without any warnings and notes to ensure completeness.
+The PyClustering library is available on PyPI and from a GitHub repository. Since the first release on PyPI in 2014, it has been downloaded more than 141,000 times. The quality of the library is supported by static and dynamic analyzers, such as cppcheck, scan-build, and valgrind [@Nethercote2007]. More than 2,200 unit and integration tests provide over 93% code coverage. Each commit to the repository triggers building, analysis, and testing on CI services such as Travis CI or AppVeyor. PyClustering provides fully documented code for each library version, including examples, descriptions of the mathematics and algorithms, and installation instructions.
 
 # Clustering Algorithms