Merge pull request #7 from es7s/dev
delameter authored Nov 10, 2023
2 parents d1f9107 + 7879f8f commit d1c6b9d
Showing 18 changed files with 118 additions and 53 deletions.
6 changes: 6 additions & 0 deletions CHANGES.rst
@@ -8,6 +8,12 @@ This project uses Semantic Versioning -- https://semver.org
Version list
===============

1.1.0 (Nov 23)
---------------
- 🌱 NEW: basic latin letters different look
- 🐞 FIX: `--version` option
- 🐞 FIX: `dev` environment initializing

1.0.0 (Nov 23)
---------------
- 🌱 NEW: `raw`, `type`, `typename` columns
57 changes: 39 additions & 18 deletions README.md
@@ -42,7 +42,7 @@ Basic usage
<div align="center">
<img alt="example001" width="49%" src="https://github.com/es7s/holms/assets/50381946/df486162-bd47-4a33-ba10-f9a5c932977c">
<img alt="example004" width="49%" src="https://github.com/es7s/holms/assets/50381946/872a0a88-f09c-41c7-9950-11d77b6eef10">
<img alt="example002" width="49%" src="https://github.com/es7s/holms/assets/50381946/7d3c899b-bc1a-4731-9680-35acd8c79b31">
<img alt="example002" width="49%" src="https://github.com/es7s/holms/assets/50381946/ef1e15b7-4652-475f-82a1-a546b425b41f">
<img alt="example003" width="49%" src="https://github.com/es7s/holms/assets/50381946/cdf8766d-a0ae-430c-8737-fa19b5678589">
</div>

@@ -234,7 +234,7 @@ dot/bullet code points:
the occurrence rate of each one:

<div align="center">
<img alt="example008" src="https://github.com/es7s/holms/assets/50381946/b7d5b1d2-a78f-4597-8ebb-38b99f733ea1">
<img alt="example008" src="https://github.com/es7s/holms/assets/50381946/20195bc3-115d-4eac-99dc-742ef74e5b88">
</div>

<details>
@@ -299,31 +299,52 @@ e.g. for frequency domain analysis:
When `--format` is specified exactly as a single `char` column: `--format=char`,
the application omits all the columns and prints the original file contents,
while highlighting each character with a color that indicates its Unicode
category. Note that ASCII control codes, as well as Unicode ones, are kept
category.

> Note that ASCII control codes, as well as Unicode ones, are kept
untouched and invisible.

<div align="center">
<img alt="example007" src="https://github.com/es7s/holms/assets/50381946/a29053d9-7da4-4050-a2e8-c8c943a32a2b">
<img alt="example007" src="https://github.com/es7s/holms/assets/50381946/7e1532ac-b313-49c1-8051-9a78ebeefe7b">
</div>

<details>
<summary>Plain text output</summary>

> sed chars.txt -nEe 150,159p |
  holms --format=char -S -
‰ ‱ ′ ″ ‴ ‵ ‶ ‷ ‸ ‹ › ※ ‼ ‽ ‾ ‿
⁀ ⁁ ⁂ ⁃ ⁄ ⁅ ⁆ ⁇ ⁈ ⁉ ⁊ ⁋ ⁌ ⁍ ⁎ ⁏
⁐ ⁑ ⁒ ⁓ ⁔ ⁕ ⁖ ⁗ ⁘ ⁙ ⁚ ⁛ ⁜ ⁝ ⁞
⁰ ⁱ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ⁺ ⁻ ⁼ ⁽ ⁾ ⁿ
₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉ ₊ ₋ ₌ ₍ ₎
ₐ ₑ ₒ ₓ ₔ ₕ ₖ ₗ ₘ ₙ ₚ ₛ ₜ
₣ ₤ ₩ ₪ ₫ €
₱ ₲ ₳ ₵ ₹ ₺ ₽ ₿
⃐ ⃑ ⃒ ⃓ ⃔ ⃕ ⃖ ⃗ ⃘ ⃙ ⃚ ⃛ ⃜ ⃝ ⃞ ⃟
⃠ ⃡ ⃢ ⃣ ⃤ ⃥ ⃦ ⃧ ⃨ ⃩ ⃪ ⃫ ⃬ ⃭ ⃮ ⃯
> sed chars.txt -nEe 1,12p |
  holms --format=char -S -
! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~
¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯
° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
à á â ã ä å æ ç è é ê ë ì í î ï
ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ

</details>


ASCII Latin letters (`A-Za-z`) are deliberately colored 50% gray instead of the
regular white: this can be extremely helpful when the task is to find
non-ASCII character(s) in a massive text of plain ASCII ones, or vice versa.
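
As a rough illustration of the same task done without the application, the Python sketch below lists every non-ASCII character in a string together with its position and Unicode category. `find_non_ascii` is a hypothetical helper written for this example only; it is not part of holms.

```python
import unicodedata


def find_non_ascii(text: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode category) for each non-ASCII character."""
    return [
        (index, char, unicodedata.category(char))
        for index, char in enumerate(text)
        if not char.isascii()
    ]


if __name__ == "__main__":
    # The second letter is CYRILLIC SMALL LETTER A (U+0430), not Latin "a".
    for index, char, category in find_non_ascii("p\u0430ypal"):
        print(f"{index}: U+{ord(char):04X} {category}")
```

Such a listing finds the odd character, but it loses the surrounding context, which is what the dimmed ASCII letters in the highlighted output are meant to preserve.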

Below is a real example of broken characters which are the result of two
operations being applied in the wrong order: *UTF-8 decoding* and *URL %-based
unescaping*. This error is different from incorrect codepage selection errors,
which mess up the whole text or a part of it: here all byte sequences are valid
UTF-8 encoded code points, but the result differs from the original and is
completely unreadable nevertheless.

<div align="center">
<img alt="example007" src="https://github.com/es7s/holms/assets/50381946/438e7f7a-4487-4a7c-98fb-bf269b4d0c96">
</div>
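
One way this class of error can arise is sketched below in Python, using only the standard `urllib.parse` module. The sketch assumes that the percent-escaped bytes get turned into code points one by one (here via `encoding="latin-1"`) instead of being collected into bytes and decoded as UTF-8. This is an assumed reproduction for illustration, not necessarily how the sample file above was produced.

```python
from urllib.parse import quote, unquote

original = "робот"

# Percent-escape the UTF-8 bytes of the text.
escaped = quote(original)

# Right order: unescape to bytes, then decode those bytes as UTF-8.
correct = unquote(escaped)  # the default encoding is utf-8

# Wrong order (assumed failure mode): every unescaped byte becomes
# its own code point, so each Cyrillic letter turns into two characters.
broken = unquote(escaped, encoding="latin-1")

if __name__ == "__main__":
    print(escaped)
    print(correct)
    print(repr(broken))
    # The damage is reversible as long as no character was lost on the way.
    print(broken.encode("latin-1").decode("utf-8"))
```

Note that `broken` is still a perfectly valid Unicode string, so encoding it as UTF-8 raises no error. That is why a validity check alone cannot detect the problem.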


### ASCII C0 / C1 details

While developing the application I encountered strange (as it seemed to be at
@@ -358,7 +379,7 @@ Python's full transparent Unicode support, we don't even need to bother much
about the encodings and such):

<div align="center">
<img alt="example013" src="https://github.com/es7s/holms/assets/50381946/b8448375-552f-443f-a347-8e9741ec7cf6">
<img alt="example013" src="https://github.com/es7s/holms/assets/50381946/a6bdf9a2-fb19-4dbd-a507-f474c5c6a314">
</div>

<details>
@@ -384,7 +405,7 @@ The image below illustrates the color scheme developed for the app specifically,
to simplify distinguishing code points of one category from others.

<div align="center">
<img alt="example009" src="https://github.com/es7s/holms/assets/50381946/6d7f9372-fe20-4e04-a9ab-9018f60648df">
<img alt="example009" src="https://github.com/es7s/holms/assets/50381946/fd71430a-becb-4d9d-84e3-12900f4fc548">
</div>

The most frequently encountered control codes also have a unique character
4 changes: 2 additions & 2 deletions holms/_version.py
@@ -2,5 +2,5 @@
# es7s/holms
# (c) 2023 A. Shavykin <[email protected]>
# ------------------------------------------------------------------------------
__version__ = "1.0.2"
__updated__ = "2023-11-08 06:09:57+03:00"
__version__ = "1.1.0"
__updated__ = "2023-11-10 05:18:07+03:00"
18 changes: 9 additions & 9 deletions holms/cmd.py
@@ -55,21 +55,21 @@ def invoke_legend(ctx: click.Context, **kwargs):
def invoke_version(ctx: click.Context, value: int, **kwargs):
# fmt: off
"""
███████
ₑₛ₇ₛ ║███╔═══╗███║
┏━┓┏━┓═╣███║ ║█████╠═┏━┓════┏━┓═┏━┓════════┏━━━━┓
┃ ┗┛ ┃ ║███║ ╚═╗███║ ┃ ┃ ┃ ┗┳┛ ┃ ╎ -┬-╵ ┃ ━━━┫
┃ ┏┓ ┃ ║█████║███║ ┃ ┗━━┓ ┃ ┣━┫ ┃ ╎ -┴-┐ ┣━━━ ┃
┗━┛┗━┛ ║███╔═╝ ║███║ ┗━━━━┛ ┗━┛ ┗━┛ '╌╌╌╌╵ ┗━━━━┛
╚═╗███████╔═╝
╚═══════╝
███████
ₑₛ₇ₛ ║███╔═══╗███║
┏━┓┏━┓═╣███║ ║█████╠═┏━┓════┏━┓═┏━┓════════┏━━━━┓
┃ ┗┛ ┃ ║███║▐█▌╗███║ ┃ ┃ ┃ ┗┳┛ ┃ ╎ -┬-╵ ┃ ━━━┫
┃ ┏┓ ┃ ║█████╔╝║███║ ┃ ┗━━┓ ┃ ┣━┫ ┃ ╎ -┴-┐ ┣━━━ ┃
┗━┛┗━┛ ║███╔═╝ ║███║ ┗━━━━┛ ┗━┛ ┗━┛ '╌╌╌╌╵ ┗━━━━┛
╚═╗███████╔═╝
╚═══════╝
"""
# fmt: on
if not value or ctx.resilient_parsing:
return
vfmt = lambda s: pt.Fragment(s, "green")
ufmt = lambda s: pt.Fragment(s, "gray")
regex = re.compile("(+)|([┃━┏┳┓┣╋┫┗┻┛]+)|([╎╌└╵╴╷┘,'┐┴┬-]+)|(.+?)")
regex = re.compile("([▐█▌]+)|([┃━┏┳┓┣╋┫┗┻┛]+)|([╎╌└╵╴╷┘,'┐┴┬-]+)|(.+?)")
group_colors = [pt.cv.DARK_RED, pt.NOOP_COLOR, pt.cv.GRAY, pt.cv.DARK_GOLDENROD]

def replace(m: re.Match) -> str:
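
The hunk is cut off before the body of `replace`, so the following is only a sketch of how a callback of this shape could pick a color by the index of the alternation group that matched. The pattern is a simplified stand-in for the one above, and `GROUP_SGR` holds hypothetical raw ANSI codes rather than the pytermor colors used in the real code.

```python
import re

# Simplified stand-in pattern: one capturing group per glyph class.
PATTERN = re.compile(r"([▐█▌]+)|([┃━┏┓┗┛]+)|(.+?)")

# Hypothetical ANSI SGR codes, one per group (assumption for this sketch).
GROUP_SGR = ["31", "0", "33"]


def colorize(line: str) -> str:
    def replace(m: re.Match) -> str:
        # With a flat alternation, lastindex is the number of the group that matched.
        sgr = GROUP_SGR[m.lastindex - 1]
        return f"\x1b[{sgr}m{m.group()}\x1b[m"

    return PATTERN.sub(replace, line)


if __name__ == "__main__":
    print(colorize("┏━┓ ███ es7s"))
```

Keeping the groups and the color list in the same order means that adding a new glyph class, as this commit does with `▐` and `▌`, only requires touching the pattern.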
6 changes: 6 additions & 0 deletions holms/common.py
@@ -36,9 +36,11 @@ class Attribute(str, pt.ExtendedEnum):
Attribute.TYPE_NAME,
]


class Char(t.Generic[_CT]):
_ASCII_C0 = [*range(0x00, 0x20), 0x7F]
_ASCII_C1 = [*range(0x80, 0xA0)]
_ASCII_LETTERS = [*pt.char_range('A', 'Z'), *pt.char_range('a', 'z')]

def __init__(self, c: _CT):
if isinstance(c, int):
@@ -127,6 +129,10 @@ def is_ascii_c1(self) -> bool:
def is_ascii_cc(self) -> bool:
return self.is_ascii_c0 or self.is_ascii_c1

@property
def is_ascii_letter(self) -> bool:
return self._value in self._ASCII_LETTERS

def _get_name(self) -> str:
if self.is_surrogate:
return "UTF-16 SURROGATE"
7 changes: 5 additions & 2 deletions holms/writer.py
@@ -201,6 +201,7 @@ class CliWriter:
CPNUM_PFX_STYLE = pt.FrozenStyle(fg=pt.cv.GRAY_30)
CHAR_STYLE = pt.FrozenStyle(fg=0xFFFFFF, bg=0)
INVALID_STYLE = pt.FrozenStyle(fg=pt.cv.GRAY_30)
PLAIN_STYLE = pt.FrozenStyle(fg=pt.cv.GRAY_50)

COLUMN_SEPARATOR = " "
LTR_CHAR = "\u200e" # to normalize the output after possible RTL switch
@@ -490,19 +491,21 @@ def _render_char(self, row: Row) -> str:
override = self._OVERRIDE_CHARS.get(ord(value), None)
if override:
chr_st = override.style
if row.char.is_ascii_letter:
chr_st = cat_st = self.PLAIN_STYLE
chr_st = pt.merge_styles(self.CHAR_STYLE, overwrites=[self._styles._BASE, chr_st])

if self._setup.highlight_only_mode:
if row.char.is_ascii_c0:
return value
if row.char.is_surrogate or row.char.is_invalid:
return "▯"
value = "▯"
pad = " " * bool(unicodedata.combining(value))
return pt.render(pad + value, cat_st)

pad = ""

if override := self._OVERRIDE_CHARS.get(ord(value), None):
if override:
val_len = 1
value = override.char
elif (
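
The `pad` expression in the highlight-only branch above relies on `unicodedata.combining()` returning `0` for ordinary characters and a positive combining class otherwise. A minimal standalone sketch of that idiom follows; `pad_combining` is a hypothetical name used here for illustration and does not exist in holms.

```python
import unicodedata


def pad_combining(char: str) -> str:
    """Prefix a space to characters that have a non-zero combining class."""
    # bool(0) is False, so " " * False yields an empty string.
    pad = " " * bool(unicodedata.combining(char))
    return pad + char


if __name__ == "__main__":
    print(repr(pad_combining("a")))       # ordinary letter, no padding
    print(repr(pad_combining("\u0301")))  # COMBINING ACUTE ACCENT, padded
```

The extra space gives the combining mark a base to attach to, so it does not merge with whatever character was printed before it.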
Binary file modified misc/example002.png
Binary file modified misc/example007.png
24 changes: 13 additions & 11 deletions misc/example007.png.txt
@@ -1,12 +1,14 @@
> sed chars.txt -nEe 150,159p |
> sed chars.txt -nEe 1,12p |
  holms --format=char -S -
‰ ‱ ′ ″ ‴ ‵ ‶ ‷ ‸ ‹ › ※ ‼ ‽ ‾ ‿
⁀ ⁁ ⁂ ⁃ ⁄ ⁅ ⁆ ⁇ ⁈ ⁉ ⁊ ⁋ ⁌ ⁍ ⁎ ⁏
⁐ ⁑ ⁒ ⁓ ⁔ ⁕ ⁖ ⁗ ⁘ ⁙ ⁚ ⁛ ⁜ ⁝ ⁞
⁰ ⁱ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ⁺ ⁻ ⁼ ⁽ ⁾ ⁿ
₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉ ₊ ₋ ₌ ₍ ₎
ₐ ₑ ₒ ₓ ₔ ₕ ₖ ₗ ₘ ₙ ₚ ₛ ₜ
₣ ₤ ₩ ₪ ₫ €
₱ ₲ ₳ ₵ ₹ ₺ ₽ ₿
⃐ ⃑ ⃒ ⃓ ⃔ ⃕ ⃖ ⃗ ⃘ ⃙ ⃚ ⃛ ⃜ ⃝ ⃞ ⃟
⃠ ⃡ ⃢ ⃣ ⃤ ⃥ ⃦ ⃧ ⃨ ⃩ ⃪ ⃫ ⃬ ⃭ ⃮ ⃯
! " # $ % & ' ( ) * + , - . /
0 1 2 3 4 5 6 7 8 9 : ; < = > ?
@ A B C D E F G H I J K L M N O
P Q R S T U V W X Y Z [ \ ] ^ _
` a b c d e f g h i j k l m n o
p q r s t u v w x y z { | } ~
¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯
° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
à á â ã ä å æ ç è é ê ë ì í î ï
ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
Binary file modified misc/example008.png
Binary file modified misc/example009.png
Binary file modified misc/example013.png
Binary file added misc/example015.png
13 changes: 13 additions & 0 deletions misc/example015.png.txt
@@ -0,0 +1,13 @@
> holms -fchar -S ./tests/data/broken-utf8.txt

<path d="M4 12.5L9.5 18.5L20 6.5" stroke="#000" stroke-width="3" stroke-linecap
="round" stroke-linejoin="round"></path></svg></div></div><div class="CheckboxC
aptcha-Label"><span class="Text Text_weight_regular Text_typography_control-xxl
CheckboxCaptcha-LabelText">Я не ÑобоÑ</span><span class="Text Text_color_
control-secondary Text_weight_regular Text_typography_control-l CheckboxCaptcha
-SecondaryText">ÐажмиÑе, ÑÑÐ¾Ð±Ñ Ð¿ÑодолжиÑÑ</span></div></div><div
class="Text Text_color_ghost Text_weight_regular Text_typography_control-s Capt
chaLinks CheckboxCaptcha-Links"><div class="CaptchaLinks-Links"><a color="secon
dary" target="_blank" href="https://cloud.yandex.ru/services/smartcaptcha" aria
-describedby="service-link-description" class="Link Link_color_secondary Link_v
...
Binary file modified misc/logo.png
Binary file modified misc/logo.xcf
Binary file not shown.
25 changes: 14 additions & 11 deletions scripts/make-readme-images.sh
@@ -26,9 +26,11 @@ function measure_width() {
# ------------------
__help() {
__SELF="$(basename "$0" | sed -Ee 's/\..+$//')"
echo "USAGE: $__SELF [OUTPUT_DIR]"
echo "USAGE: $__SELF [OUTPUT_DIR] [EXEC_DIR]"
echo
echo " Readme images renderer."
echo " Render readme images, place into OUTPUT_DIR. If specified,"
echo " use EXEC_DIR as custom path to 'holms'. Both can be provided"
echo " in a format relative to current working directory."
}
__holms() {
local -ir MIN_RESULT_W_PX=500
@@ -37,7 +39,7 @@ __holms() {
local -ir OFFSET_Y_PX=68
local -ir PADDING_X_CH=1

local -ir PP=14
local -ir PP=15

# shellcheck disable=SC2086
function invoke() {
@@ -62,9 +64,8 @@ __holms() {
hend+=$'\n '
fi


{ [[ -n $input ]] && printf %s $input || "${@:4}" ; } |
ES7S_PADBOX_HEADER="${hstart}${hopts}${hend}" padbox holms "$file" -cbS $opts
ES7S_PADBOX_HEADER="${hstart}${hopts}${hend}" padbox "${execpath:-holms}" "$file" -cbS $opts
}
function invoke_simple() { invoke "" "" "${1:?}" ; }
function invoke_cut() {
@@ -84,7 +85,7 @@ __holms() {
function p4() { invoke_simple '🌯👄🤡🎈🐳🐍' ; }
function p5() { invoke_limit -m ~/phpstan.txt ; }
function p6() { invoke "" "" "" sed ./tests/data/confusables.txt -Ee 's/^.|\t//g' -e 3620!d ; }
function p7() { invoke --format=char "" "" sed ./tests/data/chars.txt -nEe '150,159p' ; }
function p7() { invoke --format=char "" "" sed ./tests/data/chars.txt -nEe '1,12p' ; }
function p8() { invoke_limit -g ./tests/data/confusables.txt ; }
function p11() { invoke_limit -gg ./tests/data/confusables.txt ; }
function p12() { invoke_limit -ggg ./tests/data/confusables.txt ; }
@@ -93,6 +94,7 @@ __holms() {
function p13() { _p13 printf '\x80\x90\x9f' ; _p13 python -c 'print("\x80\x90\x9f", end="")' ; }
function _p13() { invoke "-u --format=raw,number,char,type,name" "" "" "$@" ; }
function p14() { invoke "" ./tests/data/specials ; }
function p15() { invoke_limit -fchar ./tests/data/broken-utf8.txt ; }

function measure() {
# arg: filepath
@@ -137,21 +139,24 @@ __holms() {
gmic "${1:?}" "${cmds[@]}"
}

local wd="${1:-.}"
local outdir=$(realpath "${1:-.}")
local execpath=$(realpath "${2:-./run}")

local tmpout=/tmp/pbc-out
local tmpimg=/tmp/pbc.png
local tmpimgpp=/tmp/pbcpp.png
local promptyn=$'\x1b[m Save? \x1b[34m[y/N/^C]\x1b[94m>\x1b[m '

[[ -n $wd ]] && { pushd "$wd" || exit 1 ; }
[[ -d "$outdir" ]] || { echo "ERROR: Dir does not exist: ${outdir@Q}" && exit 1 ; }
[[ -x "$execpath" ]] || execpath=

export ES7S_PADBOX_PAD_Y=0
export ES7S_PADBOX_PAD_X=$PADDING_X_CH
export ES7S_PADBOX_NO_CLEAR=true
export PAGER=

for fn in $(seq $PP) ; do
local imgout="./example$(printf %03d $fn).png"
local imgout="$outdir/example$(printf %03d $fn).png"
local txtout="$imgout.txt"
local prompt=$(printf '\x1b[33m[\x1b[93;1m%2d\x1b[;33m/\x1b[1m%2d\x1b[;33m]\x1b[m' "$fn" $PP)

@@ -176,8 +181,6 @@ __holms() {
fstat "$imgout" "$txtout"
fi
done

[[ -n $wd ]] && { popd || exit 1 ; }
}

[[ ${*/ /s} =~ (^| )-{,2}h(elp)?( |$) ]] && __help && exit
11 changes: 11 additions & 0 deletions tests/data/broken-utf8.txt
@@ -0,0 +1,11 @@
<path d="M4 12.5L9.5 18.5L20 6.5" stroke="#000" stroke-width="3" stroke-linecap
="round" stroke-linejoin="round"></path></svg></div></div><div class="CheckboxC
aptcha-Label"><span class="Text Text_weight_regular Text_typography_control-xxl
CheckboxCaptcha-LabelText">Я не ÑобоÑ</span><span class="Text Text_color_
control-secondary Text_weight_regular Text_typography_control-l CheckboxCaptcha
-SecondaryText">ÐажмиÑе, ÑÑÐ¾Ð±Ñ Ð¿ÑодолжиÑÑ</span></div></div><div
class="Text Text_color_ghost Text_weight_regular Text_typography_control-s Capt
chaLinks CheckboxCaptcha-Links"><div class="CaptchaLinks-Links"><a color="secon
dary" target="_blank" href="https://cloud.yandex.ru/services/smartcaptcha" aria
-describedby="service-link-description" class="Link Link_color_secondary Link_v
iew_captcha">SmartCaptcha by Yandex Cloud</a></div><button type="button" class=
