The 'Awesome List' of Patterns (the only repo of its kind)
Please edit this draft wildly 🎉: Spreadsheet / Readme
Please don't hesitate to add sublists for specific scientific fields such as DNA
Exploration: Patterns make bite-sized tools 🍒🍟. When you search such a list (once it is well populated), you will already have mentally defined the specific, regular scope of your goal (the task of identifying specific data / matches). That can be more efficient and versatile than searching Stack Overflow answers or Node.js npm packages. Yet each regex could also be an npm package or a module/package in any language.
Regexes are the most common patterns and the most efficient to type, even though they are one of the oldest disciplines in programming for making sense of data, converting it, cleaning it, or spell-checking it. https://en.wikipedia.org/wiki/Regular_expression
Regexes are versatile because they work in most languages, most editors, and many apps.
Common Data Formats² | match | replacement | comment/justify | extra³_ |
---|---|---|---|---|
ISBN | ||||
Youtube Video ID | `[^\w-]([\w-]{11})[^\w-]` | `$1` | 11-char base64 is almost unique | `(?:https?://\|//)?(?:www\.\|m\.)?youtu\.?be(?:\.com)?/(?:embed/\|v/\|watch/?\?[&\w=]{0,128}v=)?([\w-]{11})[^\w-]` |
Hashes, Public Keys, Signatures | match | |||
MD6 | ||||
SHA256, Bitcoin, ... | ||||
Convert | match | replacement | ||
MarkDown links to HTML links | `\[([^\]]*)\]\(([^\)]*)\)` | `<a href="$2">$1</a>` | | |
this table2Javascript | |`([^\`]*)`\|`([^\`]*)`| | replaceAll(/$1/g, "$2").replaceAll("\|","|") | | |
Javascript 2 Python | ... | $1$2$3 |
² date, postal code, formal greeting, formal __, ...
³extra: match typos too (common) and/or add precision ('no false positives' / perfectionism)
[we could add 1000s]
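
As a rough illustration of how rows from the table above could be used, here is a minimal Python sketch. It assumes Python's `re` module, where the replacement `$1` is written `\1`. The function names `find_youtube_ids` and `markdown_links_to_html` are hypothetical and not part of any existing package.

```python
import re

# "Youtube Video ID" row: 11 characters from [A-Za-z0-9_-], delimited on both
# sides by a character outside that set.
YOUTUBE_ID = re.compile(r"[^\w-]([\w-]{11})[^\w-]")

# "MarkDown links to HTML links" row.
MARKDOWN_LINK = re.compile(r"\[([^\]]*)\]\(([^\)]*)\)")


def find_youtube_ids(text: str) -> list:
    """Return every capture of group 1 (the candidate video ID)."""
    return YOUTUBE_ID.findall(text)


def markdown_links_to_html(text: str) -> str:
    """Rewrite [label](url) as <a href="url">label</a>."""
    # Python spells the table's $2 and $1 as \2 and \1.
    return MARKDOWN_LINK.sub(r'<a href="\2">\1</a>', text)


if __name__ == "__main__":
    print(find_youtube_ids("see https://youtu.be/dQw4w9WgXcQ for details"))
    print(markdown_links_to_html("[docs](https://example.com)"))
```

The short ID pattern also matches any other delimited 11-character word, which is the kind of false positive the extra³ column is meant to rule out by adding URL context.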
Currently little of this is automated. Solutions such as Microsoft Power Automate for Desktop (Windows 11) want to change some of it.
Raw text or data source material, or a list or category of patterns, can sometimes be analyzed for similarities and thus combined into one preprocessing step. For example, preprocessing might already reduce the input data by 90% in a fraction of the time / CPU.
word-lists, topics, frequencies, thesaurus, antonyms, semantic dictionaries, psychological & sentiment dictionaries
wordnet, framenet, google ngrams, google trends, ...
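
One possible way to combine several patterns into a single preprocessing step is to join them into one alternation with a named group per pattern, so the input is scanned only once. The sketch below assumes Python's `re` module. The catalogue `PATTERNS`, its two entries, and the function `scan` are illustrative assumptions, not patterns taken from this list.

```python
import re

# Hypothetical pattern catalogue: name -> regex source.
# The sources contain no capture groups of their own, so the named wrapper
# group is the only group that can match.
PATTERNS = {
    "iso_date": r"\b\d{4}-\d{2}-\d{2}\b",
    "markdown_link": r"\[[^\]]*\]\([^\)]*\)",
}

# One alternation with a named group per pattern: the text is scanned once.
COMBINED = re.compile(
    "|".join("(?P<{}>{})".format(name, src) for name, src in PATTERNS.items())
)


def scan(text: str) -> list:
    """Return (pattern name, matched text) pairs found in a single pass."""
    return [(m.lastgroup, m.group()) for m in COMBINED.finditer(text)]


if __name__ == "__main__":
    print(scan("2024-01-31 [a](b)"))
```

Whether this saves time depends on the patterns involved. Alternatives that share a common prefix tend to benefit most, and that would need measuring on real data.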
~synonyms a|b AROUND(3) c|d -e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
https://ahrefs.com/blog/google-advanced-search-operators/
https://github.com/edobashira/speech-language-processing#readme
-
Other lists / potential sources: ___ , ___ , ___ , ____ ,____ , (not a list but one repo per regex: https://github.com/regexhq, where it takes several clicks to see a single regex: regexhq/youtube-regex/index.js)
-
Compare: https://www.mulesoft.com/exchange/?type=connector&view=list (>10000 'enterprise converts')
Name | pattern match | replacement | language | comment/justify | raw³ | extra context/precision |
---|---|---|---|---|---|---|
regex | ||||||
css | ||||||