diff --git a/exercises/practice/matching-brackets/.approaches/config.json b/exercises/practice/matching-brackets/.approaches/config.json new file mode 100644 index 0000000000..295cbca6e3 --- /dev/null +++ b/exercises/practice/matching-brackets/.approaches/config.json @@ -0,0 +1,30 @@ +{ + "introduction": { + "authors": [ + "colinleach", + "BethanyG" + ] + }, + "approaches": [ + { + "uuid": "449c828e-ce19-4930-83ab-071eb2821388", + "slug": "stack-match", + "title": "Stack Match", + "blurb": "Maintain context during stream processing by use of a stack.", + "authors": [ + "colinleach", + "BethanyG" + ] + }, + { + "uuid": "b4c42162-751b-42c8-9368-eed9c3f4e4c8", + "slug": "repeated-substitution", + "title": "Repeated Substitution", + "blurb": "Use substring replacement to iteratively simplify the string.", + "authors": [ + "colinleach", + "BethanyG" + ] + } + ] +} diff --git a/exercises/practice/matching-brackets/.approaches/introduction.md b/exercises/practice/matching-brackets/.approaches/introduction.md new file mode 100644 index 0000000000..0096dac45c --- /dev/null +++ b/exercises/practice/matching-brackets/.approaches/introduction.md @@ -0,0 +1,78 @@ +# Introduction + +The aim in this exercise is to determine whether opening and closing brackets are properly paired within the input text. + +These brackets may be nested deeply (think Lisp code) and/or dispersed among a lot of other text (think complex LaTeX documents). + +Community solutions fall into two main groups: + +1. Those which make a single pass or loop through the input string, maintaining necessary context for matching. +2. Those which repeatedly make global substitutions within the text for context. 
+ + +## Single-pass approaches + +```python +def is_paired(input_string): + bracket_map = {"]" : "[", "}": "{", ")":"("} + tracking = [] + + for element in input_string: + if element in bracket_map.values(): + tracking.append(element) + if element in bracket_map: + if not tracking or (tracking.pop() != bracket_map[element]): + return False + return not tracking +``` + +The key in this approach is to maintain context by pushing open brackets onto some sort of stack (_in this case appending to a `list`_), then checking if there is a corresponding closing bracket to pair with the top stack item. + +See [stack-match][stack-match] approaches for details. + + +## Repeated-substitution approaches + +```python +def is_paired(text): + text = "".join(item for item in text if item in "()[]{}") + while "()" in text or "[]" in text or "{}" in text: + text = text.replace("()","").replace("[]", "").replace("{}","") + return not text +``` + +In this approach, we first remove any non-bracket characters, then use a loop to repeatedly remove inner bracket pairs. + +See [repeated-substitution][repeated-substitution] approaches for details. + + +## Other approaches + +Languages prizing immutability are likely to use techniques such as `foldl()` or recursive matching, as discussed on the [Scala track][scala]. + +This is possible in Python, but can read as unidiomatic and will (likely) result in inefficient code if not done carefully. + +For anyone wanting to go down the functional-style path, Python has [`functools.reduce()`][reduce] for folds and added [structural pattern matching][pattern-matching] in Python 3.10. + +Recursion is not highly optimised in Python and there is no tail call optimization, but the default stack depth of 1000 should be more than enough for solving this problem recursively. + + +## Which approach to use + +For short, well-defined input strings such as those currently in the test file, repeated-substitution allows a passing solution in very few lines of code. 
+But as input grows, this method could become less and less performant, due to the multiple passes and changes needed to determine matches. + +The single-pass strategy of the stack-match approach allows for stream processing, scales linearly (_`O(n)` time complexity_) with text length, and will remain performant for very large inputs. + +Examining the community solutions published for this exercise, it is clear that many programmers prefer the stack-match method which avoids the repeated string copying of the substitution approach. + +Thus it is interesting and perhaps humbling to note that repeated-substitution is **_at least_** as fast in benchmarking, even with large (>30 kB) input strings! + +See the [performance article][article-performance] for more details. + +[article-performance]:https://exercism.org/tracks/python/exercises/matching-brackets/articles/performance +[pattern-matching]: https://docs.python.org/3/whatsnew/3.10.html#pep-634-structural-pattern-matching +[reduce]: https://docs.python.org/3/library/functools.html#functools.reduce +[repeated-substitution]: https://exercism.org/tracks/python/exercises/matching-brackets/approaches/repeated-substitution +[scala]: https://exercism.org/tracks/scala/exercises/matching-brackets/dig_deeper +[stack-match]: https://exercism.org/tracks/python/exercises/matching-brackets/approaches/stack-match diff --git a/exercises/practice/matching-brackets/.approaches/repeated-substitution/content.md b/exercises/practice/matching-brackets/.approaches/repeated-substitution/content.md new file mode 100644 index 0000000000..2c8c17d637 --- /dev/null +++ b/exercises/practice/matching-brackets/.approaches/repeated-substitution/content.md @@ -0,0 +1,67 @@ +# Repeated Substitution + + +```python +def is_paired(text): + text = "".join([element for element in text if element in "()[]{}"]) + while "()" in text or "[]" in text or "{}" in text: + text = text.replace("()","").replace("[]", "").replace("{}","") + return not text +``` + +In 
this approach, the steps are: + +1. Remove all non-bracket characters from the input string (_as done through the filter clause in the list-comprehension above_). +2. Iteratively remove all remaining bracket pairs: this reduces nesting in the string from the inside outwards. +3. Test for a now empty string, meaning all brackets have been paired. + + +The code above spells out the approach particularly clearly, but there are (of course) several possible variants. + + +## Variation 1: Walrus Operator within a Generator Expression + + +```python +def is_paired(input_string): + symbols = "".join(char for char in input_string if char in "{}[]()") + while (pair := next((pair for pair in ("{}", "[]", "()") if pair in symbols), False)): + symbols = symbols.replace(pair, "") + return not symbols +``` + +The second solution above does essentially the same thing as the initial approach, but uses a generator expression assigned with a [walrus operator][walrus] `:=` (_introduced in Python 3.8_) in the `while-loop` test. 
+ + +## Variation 2: Regex Substitution in a While Loop + +Regex enthusiasts can modify the previous approach, using `re.sub()` instead of `string.replace()` in the `while-loop` test: + +```python +import re + +def is_paired(text: str) -> bool: + text = re.sub(r'[^{}\[\]()]', '', text) + while text != (text := re.sub(r'{\}|\[]|\(\)', '', text)): + continue + return not bool(text) +``` + + +## Variation 3: Regex Substitution and Recursion + + +It is possible to combine `re.sub()` and recursion in the same solution, though not everyone would view this as idiomatic Python: + + +```python +import re + +def is_paired(input_string): + replaced = re.sub(r"[^\[\(\{\}\)\]]|\{\}|\(\)|\[\]", "", input_string) + return not input_string if input_string == replaced else is_paired(replaced) +``` + +Note that solutions using regular expressions ran slightly *slower* than `string.replace()` solutions in benchmarking, so adding this type of complexity brings no benefit to this problem. + +[walrus]: https://martinheinz.dev/blog/79/ diff --git a/exercises/practice/matching-brackets/.approaches/repeated-substitution/snippet.txt b/exercises/practice/matching-brackets/.approaches/repeated-substitution/snippet.txt new file mode 100644 index 0000000000..0fa6d54abd --- /dev/null +++ b/exercises/practice/matching-brackets/.approaches/repeated-substitution/snippet.txt @@ -0,0 +1,5 @@ +def is_paired(text): + text = "".join(element for element in text if element in "()[]{}") + while "()" in text or "[]" in text or "{}" in text: + text = text.replace("()","").replace("[]", "").replace("{}","") + return not text \ No newline at end of file diff --git a/exercises/practice/matching-brackets/.approaches/stack-match/content.md b/exercises/practice/matching-brackets/.approaches/stack-match/content.md new file mode 100644 index 0000000000..9619e83390 --- /dev/null +++ b/exercises/practice/matching-brackets/.approaches/stack-match/content.md @@ -0,0 +1,50 @@ +# Stack Match + + +```python +def 
is_paired(input_string): + bracket_map = {"]" : "[", "}": "{", ")":"("} + stack = [] + + for element in input_string: + if element in bracket_map.values(): + stack.append(element) + if element in bracket_map: + if not stack or (stack.pop() != bracket_map[element]): + return False + return not stack +``` + +The point of this approach is to maintain a context of which bracket sets are currently "open": + +- If a left bracket is found, push it onto the stack (_append it to the `list`_). +- If a right bracket is found, **and** it pairs with the last item placed on the stack, pop the bracket off the stack and continue. +- If there is a mismatch, for example `'['` with `'}'` or there is no left bracket on the stack, the code can immediately terminate and return `False`. +- When all the input text is processed, determine if the stack is empty, meaning all left brackets were matched. + +In Python, a [`list`][concept:python/lists]() is a good implementation of a stack: it has [`list.append()`][list-append] (_equivalent to a "push"_) and [`list.pop()`][list-pop] methods built in. + +Some solutions use [`collections.deque()`][collections-deque] as an alternative implementation, though this has no clear advantage (_since the code only uses appends to the right-hand side_) and near-identical runtime performance. + +The default iteration for a dictionary is over the _keys_, so the code above uses a plain `bracket_map` to search for right brackets, while `bracket_map.values()` is used to search for left brackets. + +Other solutions created two sets of left and right brackets explicitly, or searched a string representation: + +```python + if element in ']})': +``` + +Such changes made little difference to code length or readability, but ran about 5-fold faster than the dictionary-based solution. + +At the end, success is an empty stack, tested above by using the [False-y quality][falsey] of `[]` (_as Python programmers often do_). 
+ +To be more explicit, we could alternatively use an equality: + +```python + return stack == [] +``` + +[list-append]: https://docs.python.org/3/tutorial/datastructures.html#more-on-lists +[list-pop]: https://docs.python.org/3/tutorial/datastructures.html#more-on-lists +[collections-deque]: https://docs.python.org/3/library/collections.html#collections.deque +[falsey]: https://docs.python.org/3/library/stdtypes.html#truth-value-testing diff --git a/exercises/practice/matching-brackets/.approaches/stack-match/snippet.txt b/exercises/practice/matching-brackets/.approaches/stack-match/snippet.txt new file mode 100644 index 0000000000..571b6792a6 --- /dev/null +++ b/exercises/practice/matching-brackets/.approaches/stack-match/snippet.txt @@ -0,0 +1,8 @@ + bracket_map = {"]" : "[", "}": "{", ")":"("} + stack = [] + for element in input_string: + if element in bracket_map.values(): stack.append(element) + if element in bracket_map: + if not stack or (stack.pop() != bracket_map[element]): + return False + return not stack \ No newline at end of file diff --git a/exercises/practice/matching-brackets/.articles/config.json b/exercises/practice/matching-brackets/.articles/config.json new file mode 100644 index 0000000000..0a5a8856a3 --- /dev/null +++ b/exercises/practice/matching-brackets/.articles/config.json @@ -0,0 +1,14 @@ +{ + "articles": [ + { + "uuid": "af7a43b5-c135-4809-9fb8-d84cdd5138d5", + "slug": "performance", + "title": "Performance", + "blurb": "Compare a variety of solutions using benchmarking data.", + "authors": [ + "colinleach", + "BethanyG" + ] + } + ] +} diff --git a/exercises/practice/matching-brackets/.articles/performance/code/Benchmark.py b/exercises/practice/matching-brackets/.articles/performance/code/Benchmark.py new file mode 100644 index 0000000000..1ca6ff0025 --- /dev/null +++ b/exercises/practice/matching-brackets/.articles/performance/code/Benchmark.py @@ -0,0 +1,184 @@ +import timeit + +import pandas as pd +import numpy as np +import 
requests + + +# ------------ FUNCTIONS TO TIME ------------- # + +def stack_match1(input_string): + bracket_map = {"]" : "[", "}": "{", ")":"("} + tracking = [] + + for element in input_string: + if element in bracket_map.values(): + tracking.append(element) + if element in bracket_map: + if not tracking or (tracking.pop() != bracket_map[element]): + return False + return not tracking + + +def stack_match2(input_string): + opening = {'[', '{', '('} + closing = {']', '}', ')'} + pairs = {('[', ']'), ('{', '}'), ('(', ')')} + stack = list() + + for char in input_string: + if char in opening: + stack.append(char) + elif char in closing: + if not stack or (stack.pop(), char) not in pairs: + return False + return stack == [] + + + +def stack_match3(input_string): + BRACKETS = {'(': ')', '[': ']', '{': '}'} + END_BRACKETS = {')', ']', '}'} + + stack = [] + + def is_valid(char): + return stack and stack.pop() == char + + for char in input_string: + if char in BRACKETS: + stack.append(BRACKETS[char]) + elif char in END_BRACKETS and not is_valid(char): + return False + + return not stack + + +def stack_match4(input_string): + stack = [] + r = {')': '(', ']': '[', '}': '{'} + for c in input_string: + if c in '[{(': + stack.append(c) + if c in ']})': + if not stack: + return False + if stack[-1] == r[c]: + stack.pop() + else: + return False + return not stack + + +from collections import deque +from typing import Deque + + +def stack_match5(text: str) -> bool: + """ + Determine if the given text properly closes any opened brackets. 
+ """ + PUSH = {"[": "]", "{": "}", "(": ")"} + PULL = set(PUSH.values()) + + stack: Deque[str] = deque() + for char in text: + if char in PUSH: + stack.append(PUSH[char]) + elif char in PULL: + if not stack or char != stack.pop(): + return False + return not stack + + +def repeated_substitution1(text): + text = "".join(x for x in text if x in "()[]{}") + while "()" in text or "[]" in text or "{}" in text: + text = text.replace("()","").replace("[]", "").replace("{}","") + return not text + + +def repeated_substitution2(input_string): + symbols = "".join(c for c in input_string if c in "{}[]()") + while (pair := next((pair for pair in ("{}", "[]", "()") if pair in symbols), False)): + symbols = symbols.replace(pair, "") + return not symbols + + +import re + +def repeated_substitution3(str_: str) -> bool: + str_ = re.sub(r'[^{}\[\]()]', '', str_) + while str_ != (str_ := re.sub(r'{\}|\[]|\(\)', '', str_)): + pass + return not bool(str_) + + +def repeated_substitution4(input_string): + replaced = re.sub(r"[^\[\(\{\}\)\]]|\{\}|\(\)|\[\]", "", input_string) + return not input_string if input_string == replaced else repeated_substitution4(replaced) + +## ---------END FUNCTIONS TO BE TIMED-------------------- ## + +## -------- Timing Code Starts Here ---------------------## + +def get_file(url): + resp = requests.get(url) + return resp.text + +short = "\\left(\\begin{array}{cc} \\frac{1}{3} & x\\\\ \\mathrm{e}^{x} &... 
x^2 \\end{array}\\right)" +mars_moons = get_file("https://raw.githubusercontent.com/colinleach/PTYS516/main/term_paper/term_paper.tex") +galaxy_cnn = get_file("https://raw.githubusercontent.com/colinleach/proj502/main/project_report/report.tex") + + +# Input Data Setup +inputs = [short, mars_moons, galaxy_cnn] + +# Ensure the code doesn't terminate early with a mismatch +assert all([stack_match1(txt) for txt in inputs]) + +# #Set up columns and rows for Pandas Data Frame +col_headers = ['short', 'mars_moons', 'galaxy_cnn'] +row_headers = [ + "stack_match1", + "stack_match2", + "stack_match3", + "stack_match4", + "stack_match5", + + "repeated_substitution1", + "repeated_substitution2", + "repeated_substitution3", + "repeated_substitution4" + ] + +# Empty dataframe will be filled in one cell at a time later +df = pd.DataFrame(np.nan, index=row_headers, columns=col_headers) + +# Function List to Call When Timing +functions = [stack_match1, stack_match2, stack_match3, stack_match4, stack_match5, + repeated_substitution1, repeated_substitution2, repeated_substitution3, repeated_substitution4] + +# Run timings using timeit.autorange(). Run Each Set 3 Times. +for function, title in zip(functions, row_headers): + timings = [[ + timeit.Timer(lambda: function(data), globals=globals()).autorange()[1] / + timeit.Timer(lambda: function(data), globals=globals()).autorange()[0] + for data in inputs] for rounds in range(3)] + + # Only the fastest Cycle counts. 
+ timing_result = min(timings) + + print(f'{title}', f'Timings : {timing_result}') + # Insert results into the dataframe + df.loc[title, col_headers[0]:col_headers[-1]] = timing_result + +# Save the data to avoid constantly regenerating it +df.to_feather('run_times.feather') +print("\nDataframe saved to './run_times.feather'") + +# The next bit is useful for `introduction.md` +pd.options.display.float_format = '{:,.2e}'.format +print('\nDataframe in Markdown format:\n') +print(df.to_markdown(floatfmt=".2e")) + diff --git a/exercises/practice/matching-brackets/.articles/performance/code/run_times.feather b/exercises/practice/matching-brackets/.articles/performance/code/run_times.feather new file mode 100644 index 0000000000..72ef312418 Binary files /dev/null and b/exercises/practice/matching-brackets/.articles/performance/code/run_times.feather differ diff --git a/exercises/practice/matching-brackets/.articles/performance/content.md b/exercises/practice/matching-brackets/.articles/performance/content.md new file mode 100644 index 0000000000..0d34786e73 --- /dev/null +++ b/exercises/practice/matching-brackets/.articles/performance/content.md @@ -0,0 +1,41 @@ +# Performance + +All functions were tested on three inputs, a short string from the exercise tests plus two scientific papers in $\LaTeX$ format. + +Python reported these string lengths: + +``` + short: 84 + mars_moons: 34836 + galaxy_cnn: 31468 +``` + +A total of 9 community solutions were tested: 5 variants of stack-match and 4 of repeated-substitution. +Full details are in the [benchmark code][benchmark-code], including URLs for the downloaded papers. 
+Results are summarized in the table below, with all times in seconds: + + +| | short | mars_moons | galaxy_cnn | +|:-----------------------|:--------:|:------------:|:------------:| +| stack_match4 | 1.77e-06 | 5.92e-04 | 5.18e-04 | +| stack_match2 | 1.71e-06 | 7.38e-04 | 6.64e-04 | +| stack_match3 | 1.79e-06 | 7.72e-04 | 6.95e-04 | +| stack_match5 | 1.70e-06 | 7.79e-04 | 6.97e-04 | +| stack_match1 | 5.64e-06 | 21.9e-04 | 39.7e-04 | +| repeated_substitution1 | 1.20e-06 | 3.50e-04 | 3.06e-04 | +| repeated_substitution2 | 1.86e-06 | 3.58e-04 | 3.15e-04 | +| repeated_substitution3 | 4.27e-06 | 14.0e-04 | 12.5e-04 | +| repeated_substitution4 | 4.96e-06 | 14.9e-04 | 13.5e-04 | + + +Overall, most of these solutions had fairly similar performance, and runtime scaled similarly with input length. + +There is certainly no evidence for either class of solutions being systematically better than the other. + +The slowest was `stack_match1`, which did a lot of lookups in dictionary +keys and values. Searching instead in sets or strings gave a small but perhaps useful improvement. + +Among the repeated-substitution solutions, the first two used standard Python string operations, running slightly faster than the last two, which use regular expressions. + + +[benchmark-code]: https://github.com/exercism/python/blob/main/exercises/practice/matching-brackets/.articles/performance/code/Benchmark.py diff --git a/exercises/practice/matching-brackets/.articles/performance/snippet.md b/exercises/practice/matching-brackets/.articles/performance/snippet.md new file mode 100644 index 0000000000..1479ad508e --- /dev/null +++ b/exercises/practice/matching-brackets/.articles/performance/snippet.md @@ -0,0 +1,3 @@ +# Performance + +Compare a variety of solutions using benchmarking data.