diff --git a/docs/HTML/DataFrame.html b/docs/HTML/DataFrame.html
index b827ef676..1aaa4396c 100644
--- a/docs/HTML/DataFrame.html
+++ b/docs/HTML/DataFrame.html
@@ -1305,6 +1305,16 @@

Visitors

See this document, DataFrameStatsVisitors.h, DataFrameMLVisitors.h, DataFrameFinancialVisitors.h, DataFrameTransformVisitors.h, and test/dataframe_tester[_2].cc for more examples and documentation.

+ I have been asked many times why I chose the visitor pattern for algorithms instead of member functions.
+ I had a few reasons:
+
+ 1. If I had implemented them as member functions, I would have had hundreds of member functions in DataFrame, and I already have too many. That would not be a good design.
+ 2. I wanted users to be able to incorporate their own algorithms without touching the DataFrame codebase. If you follow a simple interface, you can easily write a custom visitor and use it with DataFrame.
+ 3. Algorithms sometimes have complex results. Sometimes the result is a single number, but sometimes it is one or more vectors. That is not efficient to implement as a member function.
+ 4. I wanted the algorithms to be self-contained. That means a single object should contain the algorithm, its parameters, and its results.
+ 5. Because algorithms are self-contained, they can be passed to other algorithms and used there.
+
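To make the reasons above concrete, here is a minimal, self-contained C++ sketch of the pattern. It does not use the DataFrame library: `ToyFrame`, `load_column()`, `AboveThresholdVisitor`, and `FractionAboveVisitor` are hypothetical names, and the `pre()` / `operator()` / `post()` interface is a simplified assumption, not the library's exact visitor signatures. It shows a container with a single generic `visit()` entry point, a user-written visitor plugged in without changing the container, a visitor that keeps its parameter plus a scalar and a vector result in one object, and a visitor that is passed to another visitor and reused inside it.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for a DataFrame: named columns
// of doubles plus one generic visit() entry point. Adding an algorithm
// never requires adding a member function here.
class ToyFrame {
public:
    void load_column(const std::string &name, std::vector<double> data) {
        columns_[name] = std::move(data);
    }

    // Any object providing pre(), operator()(double) and post() can be
    // plugged in. columns_.at() throws std::out_of_range for unknown names.
    template<typename V>
    V &visit(const std::string &name, V &visitor) const {
        const std::vector<double> &col = columns_.at(name);
        visitor.pre();
        for (double v : col) visitor(v);
        visitor.post();
        return visitor;
    }

private:
    std::map<std::string, std::vector<double>> columns_;
};

// A user-written visitor: one object holds the parameter (threshold) and
// the results (a count and a vector of the matching values).
class AboveThresholdVisitor {
public:
    explicit AboveThresholdVisitor(double threshold) : threshold_(threshold) {}

    void pre() { count_ = 0; values_.clear(); }
    void operator()(double v) {
        if (v > threshold_) {
            ++count_;
            values_.push_back(v);
        }
    }
    void post() {}

    std::size_t get_count() const { return count_; }
    const std::vector<double> &get_values() const { return values_; }

private:
    double threshold_;
    std::size_t count_ = 0;
    std::vector<double> values_;
};

// A visitor that receives another visitor and reuses it: every value is
// forwarded to the inner visitor, and the total is counted here so the
// fraction of values above the threshold can be reported.
class FractionAboveVisitor {
public:
    explicit FractionAboveVisitor(AboveThresholdVisitor inner)
        : inner_(std::move(inner)) {}

    void pre() { inner_.pre(); total_ = 0; }
    void operator()(double v) { inner_(v); ++total_; }
    void post() { inner_.post(); }

    double get_result() const {
        return total_ == 0 ? 0.0
                           : static_cast<double>(inner_.get_count()) /
                             static_cast<double>(total_);
    }

private:
    AboveThresholdVisitor inner_;
    std::size_t total_ = 0;
};
```

For example, loading `{0.5, 1.25, 0.75, 2.0}` into a column and visiting it with `AboveThresholdVisitor(1.0)` leaves a count of 2 and the values `{1.25, 2.0}` in the visitor, and a `FractionAboveVisitor` wrapped around the same threshold reports 0.5. In the real library you would use its own visitor interface, documented in this file and in the headers listed above.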

Numeric Generators

diff --git a/docs/HTML/read.html b/docs/HTML/read.html
index 4596ea257..485862274 100644
--- a/docs/HTML/read.html
+++ b/docs/HTML/read.html
@@ -174,7 +174,15 @@
This is a convenient function (simple implementation) to restore a DataFrame from a string that was previously generated by calling to_string(). It utilizes the read() member function of DataFrame.
- These functions could be used to transmit a DataFrame from one place to another or store a DataFrame in databases, caches, …
+ These functions could be used to transmit a DataFrame from one place to another or store a DataFrame in databases, caches, …

+ I have been asked why I implemented from_string() instead of, or before, a “from binary format” option.
+ Binary serialization is a legitimate request, and I will add that option when I find time to implement it. But a binary format is more involved to implement, and it is not always more efficient than the string format. Two issues stand out:
+
+ 1. Consider options market data. Option prices and sizes are usually small numbers. For example, take the number 0.5. In string format it is 3 bytes, ".5|". In binary format, as a double, it is always 8 bytes. If you have a dataset with millions or billions of such numbers, that makes a significant difference.
+ 2. In binary format you must deal with big-endian vs. little-endian byte order. It is a pain in the neck, and it affects efficiency.
+
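A rough illustration of both points from the reading side, as a self-contained C++ sketch. The functions `parse_text_field()` and `decode_le_double()` are hypothetical and are not part of DataFrame; the sketch assumes IEEE-754 64-bit doubles and a writer that, by assumption, emitted them in little-endian order. The text field parses identically on every host, while the binary reader has to know and undo the writer's byte order.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Text path: a field such as ".5" parses the same way on any host,
// whatever its native byte order.
inline double parse_text_field(const std::string &field) {
    return std::strtod(field.c_str(), nullptr);
}

// Binary path: the reader must know the byte order the writer used.
// Assumption: the writer stored IEEE-754 64-bit doubles little-endian.
inline double decode_le_double(const std::array<unsigned char, 8> &bytes) {
    std::uint64_t bits = 0;
    // Assemble the integer from the most significant byte (bytes[7])
    // down to the least significant (bytes[0]). Shifts operate on values,
    // so this gives the same result on big- and little-endian hosts.
    for (std::size_t i = bytes.size(); i-- > 0;) {
        bits = (bits << 8) | bytes[i];
    }
    double value;
    static_assert(sizeof(value) == sizeof(bits), "expects a 64-bit double");
    // Assumes double and uint64_t share the same in-memory byte order,
    // which holds on mainstream platforms.
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

Both paths recover 0.5: `parse_text_field(".5")` from 2 characters of text (3 with the `|` separator), and `decode_le_double()` from the 8 bytes `00 00 00 00 00 00 E0 3F`.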
+ data_frame: A null-terminated string that was generated by calling to_string(). It must contain a complete DataFrame.
diff --git a/docs/HTML/write.html b/docs/HTML/write.html
index 5d9089411..991ee14be 100644
--- a/docs/HTML/write.html
+++ b/docs/HTML/write.html
@@ -186,7 +186,15 @@
This is a convenient function (simple implementation) to convert a DataFrame into a string that could be restored later by calling from_string(). It utilizes the write() member function of DataFrame.
- These functions could be used to transmit a DataFrame from one place to another or store a DataFrame in databases, caches, …
+ These functions could be used to transmit a DataFrame from one place to another or store a DataFrame in databases, caches, …

+ I have been asked why I implemented to_string() instead of, or before, a “to binary format” option.
+ Binary serialization is a legitimate request, and I will add that option when I find time to implement it. But a binary format is more involved to implement, and it is not always more efficient than the string format. Two issues stand out:
+
+ 1. Consider options market data. Option prices and sizes are usually small numbers. For example, take the number 0.5. In string format it is 3 bytes, ".5|". In binary format, as a double, it is always 8 bytes. If you have a dataset with millions or billions of such numbers, that makes a significant difference.
+ 2. In binary format you must deal with big-endian vs. little-endian byte order. It is a pain in the neck, and it affects efficiency.
+
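From the writing side, here is a hedged C++ sketch of the size comparison for 0.5. `encode_text_field()` and `encode_le_double()` are hypothetical helpers, not DataFrame's to_string() or write(); in particular, dropping the leading zero is an assumption made only to reproduce the ".5|" example. The text form comes out as 3 bytes, while the fixed-width binary form is always 8 bytes and its byte order has to be fixed explicitly.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical text encoder: default stream formatting gives "0.5"; the
// leading "0" is dropped (an assumption, to mirror the ".5|" example)
// and a '|' separator is appended.
inline std::string encode_text_field(double v) {
    std::ostringstream os;
    os << v;
    std::string s = os.str();
    if (s.size() > 1 && s[0] == '0' && s[1] == '.') {
        s.erase(0, 1);
    }
    return s + '|';
}

// Fixed-width binary encoder: always 8 bytes per double. The byte order
// is chosen explicitly (little-endian) so that a reader on a host with a
// different native byte order can still decode it.
inline std::array<unsigned char, 8> encode_le_double(double v) {
    std::uint64_t bits = 0;
    static_assert(sizeof(bits) == sizeof(v), "expects a 64-bit double");
    std::memcpy(&bits, &v, sizeof(bits));
    std::array<unsigned char, 8> out{};
    // Emit the least significant byte first.
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<unsigned char>(bits & 0xFFu);
        bits >>= 8;
    }
    return out;
}
```

`encode_text_field(0.5)` returns ".5|" and `encode_le_double(0.5)` returns the bytes `00 00 00 00 00 00 E0 3F`. The comparison cuts both ways: a value written out with full precision, such as 0.123456789, needs more than 8 bytes as text, which is why neither format is always smaller.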
+ Ts: The list of types for all columns. A type should be specified only once.