Commit

Deployed bacff63 with MkDocs version: 1.6.1
drdv committed Dec 10, 2024
1 parent ecc394e commit 55c421e
Showing 5 changed files with 29 additions and 27 deletions.
22 changes: 12 additions & 10 deletions blog/202412-python-strings/index.html
@@ -1049,29 +1049,31 @@ <h2 id="code-units">Code units<a class="headerlink" href="#code-units" title="Pe
<pre class="mermaid"><code>flowchart TD
%%{init: {'themeVariables': {'title': 'My Flowchart Title'}}}%%

- s["0x301"]
+ s["U+0301"]
s --&gt; utf8["UTF-8"]
s --&gt; utf16["UTF-16"]
s --&gt; utf32["UTF-32"]

C@{ shape: framed-circle, label: "Stop" }
- C -.-&gt; utf8-1["0xCC"]
- C -.-&gt; utf8-2["0x81"]
+ C -.-&gt; utf8-1["CC"]
+ C -.-&gt; utf8-2["81"]

utf8 -.-&gt; C
- utf16 -.-&gt; utf16-1["0x0103"]
- utf32 -.-&gt; utf16-2["0x01030000"]
+ utf16 -.-&gt; utf16-1["0103"]
+ utf32 -.-&gt; utf16-2["01030000"]

style utf8 stroke-width:2px,stroke-dasharray: 5 5
style utf16 stroke-width:2px,stroke-dasharray: 5 5
style utf32 stroke-width:2px,stroke-dasharray: 5 5</code></pre>
<ul>
<li>with a <code>utf-8</code> encoding there are two 8-bit code units (<code>0xCC</code> and <code>0x81</code>)</li>
- <li>with a <code>utf-16</code> encoding there is one 16-bit code unit (<code>0x0103</code>)</li>
- <li>with a <code>utf-32</code> encoding there is one 32-bit code unit (<code>0x01030000</code>).</li>
+ <li>with a <code>utf-16</code> encoding there is one 16-bit code unit</li>
+ <li>with a <code>utf-32</code> encoding there is one 32-bit code unit .</li>
</ul>
+ <p>Note that, in the above example, the code units for <code>utf-16</code> and <code>utf-32</code> are stored
+ using little-endian.</p>
<h3 id="four-string-encodings">Four string encodings<a class="headerlink" href="#four-string-encodings" title="Permanent link">#</a></h3>
- <p>Python uses a different encoding in each of the four cases discussed above.</p>
+ <p>A different encoding is used in each of the four cases discussed above.</p>
<ul>
<li>case 1 <script type="math/tex">\left(\mu(s) < 2^7\right)</script>: ASCII (which is equivalent to UTF-8 in this range)</li>
<li>case 2 <script type="math/tex">\left(\mu(s) < 2^8\right)</script>: UCS1 (i.e., LATIN-1)</li>
@@ -1087,8 +1089,8 @@ <h3 id="four-string-encodings">Four string encodings<a class="headerlink" href="
</span><span id="__span-9-3"><a id="__codelineno-9-3" name="__codelineno-9-3" href="#__codelineno-9-3"></a><span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">mess</span><span class="p">)</span> <span class="o">==</span> <span class="mi">8</span>
</span><span id="__span-9-4"><a id="__codelineno-9-4" name="__codelineno-9-4" href="#__codelineno-9-4"></a><span class="k">assert</span> <span class="nb">ord</span><span class="p">(</span><span class="nb">max</span><span class="p">(</span><span class="n">mess</span><span class="p">))</span> <span class="o">==</span> <span class="mi">65039</span> <span class="c1"># case 3: 255 &lt; 65039 &lt; 65536</span>
</span><span id="__span-9-5"><a id="__codelineno-9-5" name="__codelineno-9-5" href="#__codelineno-9-5"></a>
- </span><span id="__span-9-6"><a id="__codelineno-9-6" name="__codelineno-9-6" href="#__codelineno-9-6"></a><span class="c1"># [2:] removes the Byte Order Mark (little-endian)</span>
- </span><span id="__span-9-7"><a id="__codelineno-9-7" name="__codelineno-9-7" href="#__codelineno-9-7"></a><span class="n">encoding</span> <span class="o">=</span> <span class="sa">b</span><span class="s1">&#39;&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="n">char</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s2">&quot;utf-16&quot;</span><span class="p">)[</span><span class="mi">2</span><span class="p">:]</span> <span class="k">for</span> <span class="n">char</span> <span class="ow">in</span> <span class="n">mess</span><span class="p">])</span><span class="o">.</span><span class="n">hex</span><span class="p">()</span>
+ </span><span id="__span-9-6"><a id="__codelineno-9-6" name="__codelineno-9-6" href="#__codelineno-9-6"></a><span class="c1"># utf-16-le stands for utf-16 with little-endian</span>
+ </span><span id="__span-9-7"><a id="__codelineno-9-7" name="__codelineno-9-7" href="#__codelineno-9-7"></a><span class="n">encoding</span> <span class="o">=</span> <span class="sa">b</span><span class="s1">&#39;&#39;</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="n">char</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s2">&quot;utf-16-le&quot;</span><span class="p">)</span> <span class="k">for</span> <span class="n">char</span> <span class="ow">in</span> <span class="n">mess</span><span class="p">])</span><span class="o">.</span><span class="n">hex</span><span class="p">()</span>
</span><span id="__span-9-8"><a id="__codelineno-9-8" name="__codelineno-9-8" href="#__codelineno-9-8"></a>
</span><span id="__span-9-9"><a id="__codelineno-9-9" name="__codelineno-9-9" href="#__codelineno-9-9"></a><span class="k">assert</span> <span class="n">string_bytes</span><span class="p">(</span><span class="n">mess</span><span class="p">)</span> <span class="o">==</span> <span class="mi">74</span> <span class="c1"># 56 + (8 + 1) * 2</span>
</span><span id="__span-9-10"><a id="__codelineno-9-10" name="__codelineno-9-10" href="#__codelineno-9-10"></a><span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">encoding</span><span class="p">)</span> <span class="o">==</span> <span class="mi">32</span> <span class="c1"># i.e., 16 bytes as it is in hex</span>
Expand Down
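The code-unit bullets in the hunk above can be checked directly in Python. This is a minimal sketch and not part of the commit: the helper name `code_units` is hypothetical, and the `-le` codec variants are used so that no byte order mark is emitted.

```python
def code_units(char: str) -> dict:
    """Hex dump of `char` under three encodings (little-endian, no BOM)."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: char.encode(name).hex() for name in encodings}


# U+0301 is COMBINING ACUTE ACCENT, the code point used in the diagram.
units = code_units("\u0301")
print(units)
```

For U+0301 this yields two 8-bit units in UTF-8 (`cc`, `81`), one 16-bit unit in UTF-16 (bytes `01 03`), and one 32-bit unit in UTF-32 (bytes `01 03 00 00`), which agrees with the diagram labels introduced by the commit.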
10 changes: 5 additions & 5 deletions blog/202412-python-strings/verify_string_encoding.py
@@ -20,8 +20,8 @@ class Pep393VerifyEncoding:
"""

- def __init__(self, numb_test=100000):
- self.numb_tests = numb_test
+ def __init__(self, numb_tests=100_000):
+ self.numb_tests = numb_tests
self.max_code_poit = 1114112

self.data = {
@@ -97,7 +97,7 @@ def verify_case2(self):

e1 = self.memory_dump(s)[56:-1].hex()
e2 = s.encode("latin-1").hex()
- e3 = s.encode("utf-16")[2:].hex() # [2:] removes BOM
+ e3 = s.encode("utf-16-le").hex()
e3 = e3[:2] + e3[-4:-2]
assert e1 == e2
assert e1 == e3
@@ -112,7 +112,7 @@ def verify_case3(self):
s = chr(i1) + chr(i2)

e1 = self.memory_dump(s)[56:-2].hex()
- e2 = s.encode("utf-16")[2:].hex() # [2:] removes BOM
+ e2 = s.encode("utf-16-le").hex()
assert e1 == e2

def verify_case4(self):
@@ -125,7 +125,7 @@ def verify_case4(self):
s = chr(i1) + chr(i2)

e1 = self.memory_dump(s)[56:-4].hex()
- e2 = s.encode("utf-32")[4:].hex() # [4:] removes BOM
+ e2 = s.encode("utf-32-le").hex()
assert e1 == e2

@staticmethod
Expand Down
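One plausible motivation for replacing `encode("utf-16")[2:]` with `encode("utf-16-le")` in this file: to my understanding, Python's plain `utf-16` and `utf-32` codecs prepend a byte order mark and then write the data in the machine's native byte order, so slicing off the BOM gives little-endian bytes only on a little-endian host. The sketch below is not part of the commit; the sample string and variable names are illustrative assumptions.

```python
import sys

s = "a\u0301"  # illustrative sample: 'a' followed by U+0301

# Old approach from the diff: encode with a BOM, then slice it off.
old_utf16 = s.encode("utf-16")[2:]  # BOM is 2 bytes
old_utf32 = s.encode("utf-32")[4:]  # BOM is 4 bytes

# New approach from the diff: name the byte order explicitly, no BOM emitted.
new_utf16 = s.encode("utf-16-le")
new_utf32 = s.encode("utf-32-le")

# The sliced result follows the host byte order, whatever it is.
native = "le" if sys.byteorder == "little" else "be"
assert old_utf16 == s.encode(f"utf-16-{native}")
assert old_utf32 == s.encode(f"utf-32-{native}")

print(new_utf16.hex())
print(new_utf32.hex())
```

On a little-endian machine both approaches agree, which is presumably why the old code worked; the explicit `-le` codecs give the same bytes regardless of host.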
2 changes: 1 addition & 1 deletion search/search_index.json

Large diffs are not rendered by default.

22 changes: 11 additions & 11 deletions sitemap.xml
@@ -2,46 +2,46 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://drdv.github.io/</loc>
- <lastmod>2024-12-09</lastmod>
+ <lastmod>2024-12-10</lastmod>
</url>
<url>
<loc>https://drdv.github.io/cv/</loc>
- <lastmod>2024-12-09</lastmod>
+ <lastmod>2024-12-10</lastmod>
</url>
<url>
<loc>https://drdv.github.io/publications/</loc>
- <lastmod>2024-12-09</lastmod>
+ <lastmod>2024-12-10</lastmod>
</url>
<url>
<loc>https://drdv.github.io/blog/</loc>
- <lastmod>2024-12-09</lastmod>
+ <lastmod>2024-12-10</lastmod>
</url>
<url>
<loc>https://drdv.github.io/blog/202411-summer-walking-challenge/</loc>
- <lastmod>2024-12-09</lastmod>
+ <lastmod>2024-12-10</lastmod>
</url>
<url>
<loc>https://drdv.github.io/blog/202412-python-strings/</loc>
- <lastmod>2024-12-09</lastmod>
+ <lastmod>2024-12-10</lastmod>
</url>
<url>
<loc>https://drdv.github.io/blog/202411-summer-walking-challenge/</loc>
- <lastmod>2024-12-09</lastmod>
+ <lastmod>2024-12-10</lastmod>
</url>
<url>
<loc>https://drdv.github.io/blog/202412-python-strings/</loc>
- <lastmod>2024-12-09</lastmod>
+ <lastmod>2024-12-10</lastmod>
</url>
<url>
<loc>https://drdv.github.io/blog/archive/2024/</loc>
- <lastmod>2024-12-09</lastmod>
+ <lastmod>2024-12-10</lastmod>
</url>
<url>
<loc>https://drdv.github.io/blog/category/python/</loc>
- <lastmod>2024-12-09</lastmod>
+ <lastmod>2024-12-10</lastmod>
</url>
<url>
<loc>https://drdv.github.io/blog/category/sports/</loc>
- <lastmod>2024-12-09</lastmod>
+ <lastmod>2024-12-10</lastmod>
</url>
</urlset>
Binary file modified sitemap.xml.gz
Binary file not shown.