Commit 7d8bcd8

kosiew and timsaucer authored
Partial fix for 1078: Enhance DataFrame Formatter Configuration with Memory and Display Controls (#1119)
* feat: add configurable max table bytes and min table rows for DataFrame display
* Revert "feat: add configurable max table bytes and min table rows for DataFrame display"
  This reverts commit f9b78fa.
* feat: add FormatterConfig for configurable DataFrame display options
* refactor: simplify attribute extraction in get_formatter_config function
* refactor: remove hardcoded constants and use FormatterConfig for display options
* refactor: simplify record batch collection by using FormatterConfig for display options
* feat: add max_memory_bytes, min_rows_display, and repr_rows parameters to DataFrameHtmlFormatter
* feat: add tests for HTML formatter row display settings and memory limit
* refactor: extract Python formatter retrieval into a separate function
* Revert "feat: add tests for HTML formatter row display settings and memory limit"
  This reverts commit e089d7b.
* feat: add tests for HTML formatter row and memory limit configurations
* Revert "feat: add tests for HTML formatter row and memory limit configurations"
  This reverts commit 4090fd2.
* feat: add tests for new parameters and validation in DataFrameHtmlFormatter
* Reorganize tests
* refactor: rename and restructure formatter functions for clarity and maintainability
* feat: implement PythonFormatter struct and refactor formatter retrieval for improved clarity
* refactor: improve comments and restructure FormatterConfig usage in PyDataFrame
* Add DataFrame usage guide with HTML rendering customization options (#1108)
* docs: enhance user guide with detailed DataFrame operations and examples
* move /docs/source/api/dataframe.rst into user-guide
* docs: remove DataFrame API documentation
* docs: fix formatting inconsistencies in DataFrame user guide
* Two minor corrections to documentation rendering
---------
Co-authored-by: Tim Saucer <[email protected]>
* Update documentation
* refactor: streamline HTML rendering documentation
* refactor: extract validation logic into separate functions for clarity
* Implement feature X to enhance user experience and optimize performance
* feat: add validation method for FormatterConfig to ensure positive integer values
* add comment - ensure minimum rows are collected even if memory or row limits are hit
* Update html_formatter documentation
* update tests
* remove unused type hints from imports in html_formatter.py
* remove redundant tests for DataFrameHtmlFormatter and clean up assertions
* refactor get_attr function to support generic default values
* build_formatter_config_from_python return PyResult
* fix ruff errors
* trigger ci
* fix: remove redundant newline in test_custom_style_provider_html_formatter
* add more tests
* trigger ci
* Fix ruff errors
* fix clippy error
* feat: add validation for parameters in configure_formatter
* test: add tests for invalid parameters in configure_formatter
* Fix ruff errors
---------
Co-authored-by: Tim Saucer <[email protected]>
1 parent 15b96c4 commit 7d8bcd8

4 files changed: +413 -68 lines changed

docs/source/user-guide/dataframe.rst

Lines changed: 45 additions & 7 deletions
@@ -75,13 +75,17 @@ You can customize how DataFrames are rendered in HTML by configuring the formatt
 
     # Change the default styling
     configure_formatter(
-        max_rows=50, # Maximum number of rows to display
-        max_width=None, # Maximum width in pixels (None for auto)
-        theme="light", # Theme: "light" or "dark"
-        precision=2, # Floating point precision
-        thousands_separator=",", # Separator for thousands
-        date_format="%Y-%m-%d", # Date format
-        truncate_width=20 # Max width for string columns before truncating
+        max_cell_length=25, # Maximum characters in a cell before truncation
+        max_width=1000, # Maximum width in pixels
+        max_height=300, # Maximum height in pixels
+        max_memory_bytes=2097152, # Maximum memory for rendering (2MB)
+        min_rows_display=20, # Minimum number of rows to display
+        repr_rows=10, # Number of rows to display in __repr__
+        enable_cell_expansion=True,# Allow expanding truncated cells
+        custom_css=None, # Additional custom CSS
+        show_truncation_message=True, # Show message when data is truncated
+        style_provider=None, # Custom styling provider
+        use_shared_styles=True # Share styles across tables
     )
 
 The formatter settings affect all DataFrames displayed after configuration.
@@ -113,6 +117,25 @@ For advanced styling needs, you can create a custom style provider:
     # Apply the custom style provider
     configure_formatter(style_provider=MyStyleProvider())
 
+Performance Optimization with Shared Styles
+-------------------------------------------
+The ``use_shared_styles`` parameter (enabled by default) optimizes performance when displaying
+multiple DataFrames in notebook environments:
+
+.. code-block:: python
+    from datafusion.html_formatter import StyleProvider, configure_formatter
+    # Default: Use shared styles (recommended for notebooks)
+    configure_formatter(use_shared_styles=True)
+
+    # Disable shared styles (each DataFrame includes its own styles)
+    configure_formatter(use_shared_styles=False)
+
+When ``use_shared_styles=True``:
+- CSS styles and JavaScript are included only once per notebook session
+- This reduces HTML output size and prevents style duplication
+- Improves rendering performance with many DataFrames
+- Applies consistent styling across all DataFrames
+
 Creating a Custom Formatter
 ---------------------------
 
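The emit-once behavior that the added "Shared Styles" section describes can be pictured with a small sketch. This is not the datafusion implementation: `SharedStyleRenderer`, its `_styles_emitted` flag, and the CSS string are hypothetical, and the only assumption taken from the docs is that styles are included once per session when `use_shared_styles=True`.

```python
class SharedStyleRenderer:
    """Hypothetical renderer; illustrates emit-once styling only."""

    # Class-level flag shared by every instance (stands in for "per session").
    _styles_emitted = False

    def __init__(self, use_shared_styles: bool = True) -> None:
        self.use_shared_styles = use_shared_styles

    def render(self, table_html: str) -> str:
        css = "<style>.df-table { border-collapse: collapse; }</style>"
        if not self.use_shared_styles:
            # Every table carries its own copy of the styles.
            return css + table_html
        if SharedStyleRenderer._styles_emitted:
            # Styles were already sent with an earlier table.
            return table_html
        SharedStyleRenderer._styles_emitted = True
        return css + table_html


shared = SharedStyleRenderer(use_shared_styles=True)
first = shared.render("<table>a</table>")    # includes the <style> block
second = shared.render("<table>b</table>")   # table markup only
standalone = SharedStyleRenderer(use_shared_styles=False).render("<table>c</table>")
```

The trade-off in this sketch is that a table rendered without its styles depends on an earlier output still being present, which is why the docs recommend the shared mode for notebooks specifically.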

@@ -177,3 +200,18 @@ You can also use a context manager to temporarily change formatting settings:
 
     # Back to default formatting
     df.show()
+
+Memory and Display Controls
+---------------------------
+
+You can control how much data is displayed and how much memory is used for rendering:
+
+.. code-block:: python
+
+    configure_formatter(
+        max_memory_bytes=4 * 1024 * 1024, # 4MB maximum memory for display
+        min_rows_display=50, # Always show at least 50 rows
+        repr_rows=20 # Show 20 rows in __repr__ output
+    )
+
+These parameters help balance comprehensive data display against performance considerations.
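How a memory budget and a minimum row count might interact is easier to see in code. The sketch below is an assumption based on the commit message line "ensure minimum rows are collected even if memory or row limits are hit"; it is not the collection code from this commit. `collect_for_display`, the `(rows, bytes)` tuples, and the `row_limit` parameter are hypothetical stand-ins.

```python
from typing import Iterable, List, Tuple

# Hypothetical stand-in for a record batch: (row count, estimated size in bytes).
Batch = Tuple[int, int]


def collect_for_display(
    batches: Iterable[Batch],
    max_memory_bytes: int = 2 * 1024 * 1024,
    min_rows_display: int = 20,
    row_limit: int = 10,
) -> List[Batch]:
    """Collect whole batches until a limit is hit, but never stop short of the minimum."""
    collected: List[Batch] = []
    rows_so_far = 0
    bytes_so_far = 0
    for num_rows, num_bytes in batches:
        within_limits = bytes_so_far < max_memory_bytes and rows_so_far < row_limit
        # Stop only when a limit is exceeded AND the minimum is already satisfied.
        if not within_limits and rows_so_far >= min_rows_display:
            break
        collected.append((num_rows, num_bytes))
        rows_so_far += num_rows
        bytes_so_far += num_bytes
    return collected


# Five batches of 10 rows and 1000 bytes each, with a 1500-byte budget.
batches = [(10, 1000)] * 5
shown = collect_for_display(
    batches, max_memory_bytes=1500, min_rows_display=30, row_limit=1000
)
# The budget is exceeded after two batches; a third is taken to reach 30 rows.
```

In this sketch the minimum takes priority over the memory budget, so a large `min_rows_display` combined with wide rows can exceed `max_memory_bytes`.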

python/datafusion/html_formatter.py

Lines changed: 84 additions & 20 deletions
@@ -27,6 +27,36 @@
 )
 
 
+def _validate_positive_int(value: Any, param_name: str) -> None:
+    """Validate that a parameter is a positive integer.
+
+    Args:
+        value: The value to validate
+        param_name: Name of the parameter (used in error message)
+
+    Raises:
+        ValueError: If the value is not a positive integer
+    """
+    if not isinstance(value, int) or value <= 0:
+        msg = f"{param_name} must be a positive integer"
+        raise ValueError(msg)
+
+
+def _validate_bool(value: Any, param_name: str) -> None:
+    """Validate that a parameter is a boolean.
+
+    Args:
+        value: The value to validate
+        param_name: Name of the parameter (used in error message)
+
+    Raises:
+        TypeError: If the value is not a boolean
+    """
+    if not isinstance(value, bool):
+        msg = f"{param_name} must be a boolean"
+        raise TypeError(msg)
+
+
 @runtime_checkable
 class CellFormatter(Protocol):
     """Protocol for cell value formatters."""
@@ -91,6 +121,9 @@ class DataFrameHtmlFormatter:
         max_cell_length: Maximum characters to display in a cell before truncation
         max_width: Maximum width of the HTML table in pixels
         max_height: Maximum height of the HTML table in pixels
+        max_memory_bytes: Maximum memory in bytes for rendered data (default: 2MB)
+        min_rows_display: Minimum number of rows to display
+        repr_rows: Default number of rows to display in repr output
         enable_cell_expansion: Whether to add expand/collapse buttons for long cell
             values
         custom_css: Additional CSS to include in the HTML output
@@ -108,6 +141,9 @@ def __init__(
         max_cell_length: int = 25,
         max_width: int = 1000,
         max_height: int = 300,
+        max_memory_bytes: int = 2 * 1024 * 1024, # 2 MB
+        min_rows_display: int = 20,
+        repr_rows: int = 10,
         enable_cell_expansion: bool = True,
         custom_css: Optional[str] = None,
         show_truncation_message: bool = True,
@@ -124,6 +160,12 @@ def __init__(
             Maximum width of the displayed table in pixels.
         max_height : int, default 300
             Maximum height of the displayed table in pixels.
+        max_memory_bytes : int, default 2097152 (2MB)
+            Maximum memory in bytes for rendered data.
+        min_rows_display : int, default 20
+            Minimum number of rows to display.
+        repr_rows : int, default 10
+            Default number of rows to display in repr output.
         enable_cell_expansion : bool, default True
             Whether to allow cells to expand when clicked.
         custom_css : str, optional
@@ -139,7 +181,8 @@ def __init__(
         Raises:
         ------
         ValueError
-            If max_cell_length, max_width, or max_height is not a positive integer.
+            If max_cell_length, max_width, max_height, max_memory_bytes,
+            min_rows_display, or repr_rows is not a positive integer.
         TypeError
             If enable_cell_expansion, show_truncation_message, or use_shared_styles is
             not a boolean,
@@ -148,27 +191,17 @@ def __init__(
             protocol.
         """
         # Validate numeric parameters
-
-        if not isinstance(max_cell_length, int) or max_cell_length <= 0:
-            msg = "max_cell_length must be a positive integer"
-            raise ValueError(msg)
-        if not isinstance(max_width, int) or max_width <= 0:
-            msg = "max_width must be a positive integer"
-            raise ValueError(msg)
-        if not isinstance(max_height, int) or max_height <= 0:
-            msg = "max_height must be a positive integer"
-            raise ValueError(msg)
+        _validate_positive_int(max_cell_length, "max_cell_length")
+        _validate_positive_int(max_width, "max_width")
+        _validate_positive_int(max_height, "max_height")
+        _validate_positive_int(max_memory_bytes, "max_memory_bytes")
+        _validate_positive_int(min_rows_display, "min_rows_display")
+        _validate_positive_int(repr_rows, "repr_rows")
 
         # Validate boolean parameters
-        if not isinstance(enable_cell_expansion, bool):
-            msg = "enable_cell_expansion must be a boolean"
-            raise TypeError(msg)
-        if not isinstance(show_truncation_message, bool):
-            msg = "show_truncation_message must be a boolean"
-            raise TypeError(msg)
-        if not isinstance(use_shared_styles, bool):
-            msg = "use_shared_styles must be a boolean"
-            raise TypeError(msg)
+        _validate_bool(enable_cell_expansion, "enable_cell_expansion")
+        _validate_bool(show_truncation_message, "show_truncation_message")
+        _validate_bool(use_shared_styles, "use_shared_styles")
 
         # Validate custom_css
         if custom_css is not None and not isinstance(custom_css, str):
@@ -183,6 +216,9 @@ def __init__(
         self.max_cell_length = max_cell_length
         self.max_width = max_width
         self.max_height = max_height
+        self.max_memory_bytes = max_memory_bytes
+        self.min_rows_display = min_rows_display
+        self.repr_rows = repr_rows
         self.enable_cell_expansion = enable_cell_expansion
         self.custom_css = custom_css
         self.show_truncation_message = show_truncation_message
@@ -597,6 +633,9 @@ def configure_formatter(**kwargs: Any) -> None:
         **kwargs: Formatter configuration parameters like max_cell_length,
             max_width, max_height, enable_cell_expansion, etc.
 
+    Raises:
+        ValueError: If any invalid parameters are provided
+
     Example:
         >>> from datafusion.html_formatter import configure_formatter
         >>> configure_formatter(
@@ -606,6 +645,31 @@ def configure_formatter(**kwargs: Any) -> None:
         ... use_shared_styles=True
         ... )
     """
+    # Valid parameters accepted by DataFrameHtmlFormatter
+    valid_params = {
+        "max_cell_length",
+        "max_width",
+        "max_height",
+        "max_memory_bytes",
+        "min_rows_display",
+        "repr_rows",
+        "enable_cell_expansion",
+        "custom_css",
+        "show_truncation_message",
+        "style_provider",
+        "use_shared_styles",
+    }
+
+    # Check for invalid parameters
+    invalid_params = set(kwargs) - valid_params
+    if invalid_params:
+        msg = (
+            f"Invalid formatter parameters: {', '.join(invalid_params)}. "
+            f"Valid parameters are: {', '.join(valid_params)}"
+        )
+        raise ValueError(msg)
+
+    # Create and set formatter with validated parameters
     set_formatter(DataFrameHtmlFormatter(**kwargs))
 
 
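The keyword check added to `configure_formatter` can be exercised on its own. The standalone sketch below mirrors the set-difference logic from this diff without importing datafusion; `check_formatter_kwargs` is a hypothetical name, and the `sorted()` calls are an addition made here so the message order is deterministic (joining a set directly, as the diff does, yields an arbitrary order).

```python
from typing import Any

# Parameter names copied from the valid_params set in this diff.
VALID_PARAMS = {
    "max_cell_length",
    "max_width",
    "max_height",
    "max_memory_bytes",
    "min_rows_display",
    "repr_rows",
    "enable_cell_expansion",
    "custom_css",
    "show_truncation_message",
    "style_provider",
    "use_shared_styles",
}


def check_formatter_kwargs(**kwargs: Any) -> None:
    """Raise ValueError if any keyword is not a known formatter parameter."""
    invalid = set(kwargs) - VALID_PARAMS
    if invalid:
        msg = (
            f"Invalid formatter parameters: {', '.join(sorted(invalid))}. "
            f"Valid parameters are: {', '.join(sorted(VALID_PARAMS))}"
        )
        raise ValueError(msg)


# Names that the user guide listed before this commit are not in the valid set.
try:
    check_formatter_kwargs(max_rows=50, theme="dark")
except ValueError as exc:
    error_message = str(exc)

# Known names pass without raising.
check_formatter_kwargs(max_width=800, repr_rows=20)
```

This also illustrates why the documentation hunk in this commit matters: the parameters shown in the old example (`max_rows`, `theme`, and so on) would be rejected by the new check.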
