Added more docs

hosseinmoein · Oct 20, 2022 · 9f67bd9 · 9f67bd9
1 parent 606e46b
commit 9f67bd9
Showing 3 changed files with 28 additions and 2 deletions.
diff --git a/docs/HTML/DataFrame.html b/docs/HTML/DataFrame.html
@@ -1305,6 +1305,16 @@ <H2><font color="blue">Visitors</font></H2>
     See this document, <I>DataFrameStatsVisitors.h, DataFrameMLVisitors.h, DataFrameFinancialVisitors.h, DataFrameTransformVisitors.h</I>, and <I>test/dataframe_tester[_2].cc</I> for more examples and documentation.
   </P>
 
+  I have been asked many times why I chose the visitor pattern for algorithms instead of member functions.<BR>
+  I had a few reasons:<BR>
+  <OL>
+    <LI>If I had implemented them as member functions, DataFrame would have hundreds of member functions, and it already has too many. That would not be a good design</LI>
+    <LI>I wanted users to be able to incorporate their custom algorithms without touching the DataFrame codebase. If you follow a simple interface, you can write your own visitor and use it with DataFrame easily</LI>
+    <LI>Algorithms sometimes have complex results. Sometimes the result is a single number, but sometimes the result of an algorithm is one or more vectors. That is not efficient to implement as a member function</LI>
+    <LI>I wanted the algorithms to be self-contained. That means a single <I>object</I> should contain the algorithm, parameters, and results</LI>
+    <LI>Because algorithms are self-contained, they can be passed to other algorithms to be used</LI>
+  </OL>
+
 <HR>
 
   <H2><font color="blue">Numeric Generators</font></H2>
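
The reasons listed in the hunk above can be sketched in a few lines of standalone C++. This is only an illustration of the idea, not DataFrame's actual visitor interface: `MeanVisitor`, `DeviationVisitor`, and the free function `visit()` are hypothetical names, and a plain `std::vector<double>` stands in for a column.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical visitor: one object holds the algorithm, its state, and its
// result (here a single number).
struct MeanVisitor {
    using result_type = double;

    void pre() { sum_ = 0.0; count_ = 0; }  // reset before a pass
    void operator()(double value) { sum_ += value; ++count_; }
    void post() {}                          // nothing to finalize
    result_type get_result() const {
        return count_ == 0 ? 0.0 : sum_ / static_cast<double>(count_);
    }

private:
    double sum_ = 0.0;
    std::size_t count_ = 0;
};

// Hypothetical visitor whose result is a vector. It takes another visitor
// as a parameter and uses that visitor's result as the center.
template <typename CenterVisitor>
struct DeviationVisitor {
    using result_type = std::vector<double>;

    explicit DeviationVisitor(CenterVisitor center = {}) : center_(center) {}

    void pre() { center_.pre(); values_.clear(); result_.clear(); }
    void operator()(double value) { center_(value); values_.push_back(value); }
    void post() {
        center_.post();
        const double c = center_.get_result();
        for (double v : values_) result_.push_back(v - c);
    }
    const result_type &get_result() const { return result_; }

private:
    CenterVisitor center_;
    std::vector<double> values_;
    result_type result_;
};

// Stand-in for a container's visit call: drives any visitor over a column.
template <typename Visitor>
Visitor &visit(const std::vector<double> &column, Visitor &visitor) {
    visitor.pre();
    for (double v : column) visitor(v);
    visitor.post();
    return visitor;
}
```

Under these assumptions, `visit(column, dev).get_result()` with a `DeviationVisitor<MeanVisitor> dev` returns one deviation per input value, and swapping in a different center visitor requires no change to the container.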

diff --git a/docs/HTML/read.html b/docs/HTML/read.html
@@ -174,7 +174,15 @@
       </td>
       <td>
         This is a convenient function (simple implementation) to restore a DataFrame from a string that was previously generated by calling to_string(). It utilizes the read() member function of DataFrame.<R>
-        These functions could be used to transmit a DataFrame from one place to another or store a DataFrame in databases, caches, … <R>
+        These functions could be used to transmit a DataFrame from one place to another or store a DataFrame in databases, caches, … <BR><BR>
+
+        I have been asked why I implemented from_string instead of/before doing “from binary format”.<BR>
+        A binary serialization format is a legitimate request, and I will add that option when I find time to implement it. But a binary format is more involved to implement, and it is not always more efficient than a string format. Two issues stand out:<BR>
+        <OL>
+          <LI>Consider options market data. Option prices and sizes are usually small numbers. For example, consider the number 0.5. In string format that is 3 bytes, ".5|". In binary format, as a double, it is always 8 bytes. So, if you have a dataset with millions or billions of such numbers, it makes a significant difference</LI>
+          <LI>In binary format you must deal with big-endian vs. little-endian. It is a pain in the neck and affects efficiency</LI>
+        </OL>
+
       </td>
       <td>
         <B>data_frame</B>: A null terminated string that was generated by calling to_string(). It must contain a complete DataFrame<BR>
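
The size argument in the added text can be checked with a small sketch. It is not DataFrame's serializer: `text_field_size` and `binary_field_size` are hypothetical helpers, the `|` delimiter is taken from the example in the text, and the `%.17g` format is an assumption chosen so that any double round-trips. Note that `%.17g` keeps the leading zero, so 0.5 costs 4 bytes here rather than the 3 bytes quoted for ".5|".

```cpp
#include <cstddef>
#include <cstdio>

// Bytes needed to store one value as text followed by a one-byte delimiter.
// "%.17g" prints enough digits to round-trip any IEEE 754 double.
std::size_t text_field_size(double value) {
    char buffer[64];
    const int written = std::snprintf(buffer, sizeof buffer, "%.17g", value);
    return written < 0 ? 0 : static_cast<std::size_t>(written) + 1;  // +1 for '|'
}

// Bytes needed to store one value as a raw double.
constexpr std::size_t binary_field_size() { return sizeof(double); }
```

Short values such as 0.5 come out smaller as text than as a raw double. Values that need all 17 significant digits, such as 0.1, come out larger, which is why the text says binary is "not always" more efficient rather than never.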

diff --git a/docs/HTML/write.html b/docs/HTML/write.html
@@ -186,7 +186,15 @@
       </td>
       <td>
         This is a convenient function (simple implementation) to convert a DataFrame into a string that could be restored later by calling from_string(). It utilizes the write() member function of DataFrame.<BR>
-        These functions could be used to transmit a DataFrame from one place to another or store a DataFrame in databases, caches, … <BR>
+        These functions could be used to transmit a DataFrame from one place to another or store a DataFrame in databases, caches, … <BR><BR>
+
+        I have been asked why I implemented to_string instead of/before doing “to binary format”.<BR>
+        A binary serialization format is a legitimate request, and I will add that option when I find time to implement it. But a binary format is more involved to implement, and it is not always more efficient than a string format. Two issues stand out:<BR>
+        <OL>
+          <LI>Consider options market data. Option prices and sizes are usually small numbers. For example, consider the number 0.5. In string format that is 3 bytes, ".5|". In binary format, as a double, it is always 8 bytes. So, if you have a dataset with millions or billions of such numbers, it makes a significant difference</LI>
+          <LI>In binary format you must deal with big-endian vs. little-endian. It is a pain in the neck and affects efficiency</LI>
+        </OL>
+
       </td>
       <td>
         <B>Ts</B>: The list of types for all columns. A type should be specified only once<BR>
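
The endianness point in the added text can be illustrated with a sketch of the extra work a portable binary format needs. This is not part of DataFrame: `to_little_endian` and `from_little_endian` are hypothetical helpers, and the code assumes a 64-bit IEEE 754 `double` whose byte order matches the platform's integer byte order, which holds on mainstream hardware.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Serialize a double into 8 bytes, least significant byte first, regardless
// of the host's byte order.
std::array<unsigned char, 8> to_little_endian(double value) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // reinterpret without UB
    std::array<unsigned char, 8> out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<unsigned char>((bits >> (8 * i)) & 0xFF);
    return out;
}

// Rebuild the double from 8 little-endian bytes.
double from_little_endian(const std::array<unsigned char, 8> &bytes) {
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i)
        bits |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    double value = 0.0;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Every value written or read pays for this shifting, whereas a text format sidesteps byte order entirely because it is a sequence of single-byte characters.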