Commit

Added more docs
hosseinmoein committed Oct 20, 2022
1 parent 606e46b commit 9f67bd9
Showing 3 changed files with 28 additions and 2 deletions.
10 changes: 10 additions & 0 deletions docs/HTML/DataFrame.html
@@ -1305,6 +1305,16 @@ <H2><font color="blue">Visitors</font></H2>
See this document, <I>DataFrameStatsVisitors.h, DataFrameMLVisitors.h, DataFrameFinancialVisitors.h, DataFrameTransformVisitors.h</I>, and <I>test/dataframe_tester[_2].cc</I> for more examples and documentation.
</P>

I have been asked many times why I chose the visitor pattern for algorithms instead of implementing them as member functions.<BR>
I had a few reasons:<BR>
<OL>
<LI>If I had implemented them as member functions, DataFrame would have hundreds of member functions, and it already has too many. That would not be a good design.</LI>
<LI>I wanted users to be able to incorporate their custom algorithms without touching the DataFrame codebase. If you follow a simple interface, you can easily write your own visitor and use it with DataFrame.</LI>
<LI>Algorithms sometimes have complex results. Sometimes the result is a single number, but sometimes the result of an algorithm is one or more vectors. That is not efficient to implement as a member function.</LI>
<LI>I wanted the algorithms to be self-contained. That means a single <I>object</I> should contain the algorithm, its parameters, and its results.</LI>
<LI>Because algorithms are self-contained, they can be passed to other algorithms and used by them.</LI>
</OL>
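
The reasons above can be illustrated with a minimal, self-contained sketch. It does not use the DataFrame headers: the free function visit(), the two visitor types, and the member names pre(), post(), and get_result() are hypothetical names chosen for this example, and the real interface may differ. The sketch only shows one object holding algorithm, parameters, and result, a result that is a vector rather than a single number, and one algorithm being reused by another.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical visitor: the algorithm, its state, and its result live in one object.
struct MeanVisitor {
    void pre() { sum_ = 0.0; count_ = 0; result_ = 0.0; }
    void operator()(double value) { sum_ += value; ++count_; }
    void post() { result_ = count_ ? sum_ / static_cast<double>(count_) : 0.0; }
    double get_result() const { return result_; }

private:
    double sum_ = 0.0;
    std::size_t count_ = 0;
    double result_ = 0.0;
};

// Hypothetical visitor that reuses another visitor over a sliding window.
// Its parameter is the window size and its result is a vector.
template <typename Inner>
struct RollingVisitor {
    explicit RollingVisitor(std::size_t window) : window_(window) {}

    void pre() { buffer_.clear(); result_.clear(); }
    void operator()(double value) {
        buffer_.push_back(value);
        if (buffer_.size() < window_) return;  // not enough values yet
        Inner inner;
        inner.pre();
        for (std::size_t i = buffer_.size() - window_; i < buffer_.size(); ++i)
            inner(buffer_[i]);
        inner.post();
        result_.push_back(inner.get_result());
    }
    void post() {}
    const std::vector<double> &get_result() const { return result_; }

private:
    std::size_t window_;
    std::vector<double> buffer_;
    std::vector<double> result_;
};

// Stand-in for a container that accepts any visitor following the interface above.
template <typename V>
V &visit(const std::vector<double> &column, V &visitor) {
    visitor.pre();
    for (double value : column) visitor(value);
    visitor.post();
    return visitor;
}
```

With this sketch, a caller constructs a visitor, passes it to visit() along with a column, and reads the answer from get_result(). Adding a new algorithm means writing a new visitor type; the container code does not change.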

<HR>

<H2><font color="blue">Numeric Generators</font></H2>
10 changes: 9 additions & 1 deletion docs/HTML/read.html
@@ -174,7 +174,15 @@
</td>
<td>
This is a convenience function (simple implementation) that restores a DataFrame from a string previously generated by calling to_string(). It utilizes the read() member function of DataFrame.<BR>
These functions could be used to transmit a DataFrame from one place to another or store a DataFrame in databases, caches, … <R>
These functions could be used to transmit a DataFrame from one place to another or store a DataFrame in databases, caches, … <BR><BR>

I have been asked why I implemented from_string instead of, or before, a “from binary format” function.<BR>
A binary format as a form of serialization is a legitimate request, and I will add that option when I find time to implement it. But implementing a binary format is more involved, and a binary format is not always more efficient than a string format. Two issues stand out:<BR>
<OL>
<LI>Consider options market data. Option prices and sizes are usually small numbers. Take 0.5 as an example: in string format it takes 3 bytes, ".5|", while in binary format a double-precision value always takes 8 bytes. In a dataset with millions or billions of such numbers, that makes a significant difference.</LI>
<LI>In binary format you must deal with big-endian vs. little-endian byte order. It is a pain in the neck, and the conversion affects efficiency.</LI>
</OL>
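
The size argument in the first point can be checked with a small sketch. The helper encode_text() below is hypothetical and is not how to_string() is implemented. It formats a value with "%.17g" (enough digits to round-trip a double), drops a leading zero only to mirror the ".5|" example above, and appends the '|' delimiter. It also shows the other side of the trade-off: a value such as 0.1 needs more bytes as text than as an 8-byte double.

```cpp
#include <cstdio>
#include <string>

// Hypothetical text encoding of one value followed by a '|' delimiter.
// This is an illustration only, not the format produced by to_string().
std::string encode_text(double value) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.17g", value);  // round-trippable digits
    std::string text(buf);
    if (text.rfind("0.", 0) == 0)  // starts with "0." : drop the leading zero
        text.erase(0, 1);
    text += '|';
    return text;
}

// Bytes needed by the hypothetical text encoding.
std::size_t text_bytes(double value) { return encode_text(value).size(); }

// Bytes needed by a raw binary double.
std::size_t binary_bytes() { return sizeof(double); }
```

Comparing text_bytes(x) with binary_bytes() for values typical of a dataset gives a rough idea of which representation is smaller for that data.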

</td>
<td>
<B>data_frame</B>: A null-terminated string that was generated by calling to_string(). It must contain a complete DataFrame<BR>
10 changes: 9 additions & 1 deletion docs/HTML/write.html
@@ -186,7 +186,15 @@
</td>
<td>
This is a convenience function (simple implementation) that converts a DataFrame into a string that can be restored later by calling from_string(). It utilizes the write() member function of DataFrame.<BR>
These functions could be used to transmit a DataFrame from one place to another or store a DataFrame in databases, caches, … <BR>
These functions could be used to transmit a DataFrame from one place to another or store a DataFrame in databases, caches, … <BR><BR>

I have been asked why I implemented to_string instead of, or before, a “to binary format” function.<BR>
A binary format as a form of serialization is a legitimate request, and I will add that option when I find time to implement it. But implementing a binary format is more involved, and a binary format is not always more efficient than a string format. Two issues stand out:<BR>
<OL>
<LI>Consider options market data. Option prices and sizes are usually small numbers. Take 0.5 as an example: in string format it takes 3 bytes, ".5|", while in binary format a double-precision value always takes 8 bytes. In a dataset with millions or billions of such numbers, that makes a significant difference.</LI>
<LI>In binary format you must deal with big-endian vs. little-endian byte order. It is a pain in the neck, and the conversion affects efficiency.</LI>
</OL>
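
To give a sense of the extra work the byte-order point refers to, here is a minimal sketch of writing and reading a double in a fixed little-endian layout regardless of the host's byte order. The function names are hypothetical and nothing here is part of DataFrame. The sketch assumes an 8-byte IEEE-754 double whose in-memory byte order matches that of 64-bit integers, which holds on mainstream platforms.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: serialize a double as 8 bytes, least significant byte first.
std::array<unsigned char, 8> to_little_endian(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "sketch assumes an 8-byte double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));  // reinterpret without undefined behavior
    std::array<unsigned char, 8> out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<unsigned char>((bits >> (8 * i)) & 0xFF);
    return out;
}

// Hypothetical helper: rebuild the double from the little-endian bytes.
double from_little_endian(const std::array<unsigned char, 8> &in) {
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < in.size(); ++i)
        bits |= static_cast<std::uint64_t>(in[i]) << (8 * i);
    double value = 0.0;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

Every value passes through a shift-and-mask loop like this on both the writing and the reading side, which is the kind of per-value cost a string format does not have to pay.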

</td>
<td>
<B>Ts</B>: The list of types for all columns. A type should be specified only once<BR>
