Allow users to specify string matching case sensitivity behavior via configurations #257
Labels
documentation
Improvements or additions to documentation
effort - medium
mid-sized issue with average implementation time/difficulty
enhancement
New feature or request
extensibility
Increasing situations in which PyDough works
user feature
Adding a new user-facing feature/functionality
Currently, the default in SQLite is that string matching via `LIKE` is case-insensitive, but users may want to make it case-sensitive (and frankly, this makes sense as the default). To that end, the goal of this issue is to add a new config property to the `PyDoughConfigs` class: `case_sensitive_string_matching` (default=True).

When True, `LIKE` and similar operators (`STARTSWITH`, `ENDSWITH`, `CONTAINS`) are all case-sensitive. For SQLite, this means that during conversion, the 2nd argument to `LIKE` should be wrapped in `COLLATE BINARY`. For example, if I have PyDough code with a predicate `X LIKE 'A%'`, that should become `SELECT ... FROM ... WHERE X LIKE 'A%' COLLATE BINARY`, which has a corresponding tree expression in sqlglot.

When the config is False, `LIKE` and similar functions should go back to being case-insensitive. We can keep the `COLLATE` but switch it from `BINARY` to `NOCASE`, which matches the current case-insensitive default. For other dialects, we can think about how to handle the translation, but if a dialect's default is case-sensitive then all we need to do is call `LOWER` on both arguments when case-insensitivity is requested. According to ChatGPT, the default behavior varies by dialect.

The behavior of the config should be well documented in the usage guide and various README files.
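The translation rules proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `PyDoughConfigs` class here is a hypothetical stand-in that models nothing but the proposed field, `quote_literal` and `like_predicate` are made-up helper names, and the output is plain SQL text, whereas a real implementation would presumably build sqlglot expression nodes. The non-SQLite branch assumes a dialect whose `LIKE` is case-sensitive by default. The sketch only shows the SQL each setting would produce; whether a given engine's `LIKE` honors the collation clause should be checked against that engine's documentation.

```python
from dataclasses import dataclass


@dataclass
class PyDoughConfigs:
    """Hypothetical stand-in for the real class; only the proposed field is modeled."""

    case_sensitive_string_matching: bool = True


def quote_literal(value: str) -> str:
    """Render a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def like_predicate(
    column: str,
    pattern: str,
    configs: PyDoughConfigs,
    dialect: str = "sqlite",
) -> str:
    """Build the SQL text for a LIKE predicate according to the config (hypothetical helper)."""
    literal = quote_literal(pattern)
    if dialect == "sqlite":
        # Proposal from the issue: wrap the 2nd argument in COLLATE BINARY
        # (case-sensitive) or COLLATE NOCASE (case-insensitive).
        collation = "BINARY" if configs.case_sensitive_string_matching else "NOCASE"
        return f"{column} LIKE {literal} COLLATE {collation}"
    # Assumption: any other dialect handled here is case-sensitive by default.
    if configs.case_sensitive_string_matching:
        return f"{column} LIKE {literal}"
    # Case-insensitivity requested: call LOWER on both arguments.
    return f"LOWER({column}) LIKE LOWER({literal})"


if __name__ == "__main__":
    print(like_predicate("X", "A%", PyDoughConfigs()))
    print(like_predicate("X", "A%", PyDoughConfigs(case_sensitive_string_matching=False)))
    print(like_predicate("X", "A%", PyDoughConfigs(False), dialect="other"))
```

Keeping the decision in one helper means `STARTSWITH`, `ENDSWITH`, and `CONTAINS` could reuse it by passing the patterns `prefix%`, `%suffix`, and `%substring%` respectively, assuming those operators are lowered to `LIKE`.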