<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Furffiblog</title><link>https://blog.furffisite.link/</link><description>Recent content on Furffiblog</description><generator>Hugo -- gohugo.io</generator><language>zh-cn</language><lastBuildDate>Sun, 20 Oct 2024 17:05:46 +0800</lastBuildDate><atom:link href="https://blog.furffisite.link/index.xml" rel="self" type="application/rss+xml"/><item><title>Writing a simple single-page static site generator in Python</title><link>https://blog.furffisite.link/p/liveserver/</link><pubDate>Sun, 20 Oct 2024 17:05:46 +0800</pubDate><guid>https://blog.furffisite.link/p/liveserver/</guid><description><img src="https://files.furffisite.link/blogimg/20241020234336-67f7ee39e8f8df0b.jpg" alt="Featured image of post Writing a simple single-page static site generator in Python" /><p>My original task was to build a project page, based on an existing template, for a paper I had contributed to. The template I picked was <a class="link" href="https://github.com/eliahuhorwitz/Academic-project-page-template" target="_blank" rel="noopener"
>eliahuhorwitz/Academic-project-page-template</a>. However, this template is a single html file: anything list-like (the author list, for example) has to be copy-pasted and edited entry by entry. That is tedious, and it makes later changes harder (changing the display format, say, or reusing the page for another project). My lazy nature would not allow it (so much hassle 😫), so a natural idea came up: turn the html into a <a class="link" href="https://jinja.palletsprojects.com/en/3.0.x/" target="_blank" rel="noopener"
>jinja</a> template and let a program render it into the final static page.</p>
<h2 id="第一版基础功能">Version 1: basic features
</h2><p>At this point my idea was simple: use <a class="link" href="https://docs.python.org/3/library/tomllib.html" target="_blank" rel="noopener"
><code>tomllib (python&gt;=3.11)</code></a> from the Python standard library to read a toml file as the context, use <a class="link" href="https://jinja.palletsprojects.com/en/3.0.x/" target="_blank" rel="noopener"
>jinja</a> to load the template and render it, and write the rendered result to an html file. The implementation is short, just a few lines of code:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">tomllib</span><span class="o">,</span> <span class="nn">os</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">jinja2</span> <span class="kn">import</span> <span class="n">Environment</span><span class="p">,</span> <span class="n">FileSystemLoader</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">target_folder</span> <span class="o">=</span> <span class="s2">&#34;./dist&#34;</span>
</span></span><span class="line"><span class="cl"><span class="n">env</span> <span class="o">=</span> <span class="n">Environment</span><span class="p">(</span><span class="n">loader</span><span class="o">=</span><span class="n">FileSystemLoader</span><span class="p">(</span><span class="s2">&#34;./template&#34;</span><span class="p">),</span> <span class="n">autoescape</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&#34;./config.toml&#34;</span><span class="p">,</span> <span class="s1">&#39;rb&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">config_file</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">config</span> <span class="o">=</span> <span class="n">tomllib</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">config_file</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">page</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="n">get_template</span><span class="p">(</span><span class="s1">&#39;index.html&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">render</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">target_folder</span><span class="p">,</span> <span class="s2">&#34;index.html&#34;</span><span class="p">),</span> <span class="s1">&#39;w&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">page</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>The input toml file looks something like this:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-toml" data-lang="toml"><span class="line"><span class="cl"><span class="p">[</span><span class="nx">paper</span><span class="p">]</span>
</span></span><span class="line"><span class="cl"><span class="nx">title</span> <span class="p">=</span> <span class="s2">&#34;...&#34;</span>
</span></span><span class="line"><span class="cl"><span class="nx">conference</span> <span class="p">=</span> <span class="s2">&#34;...&#34;</span>
</span></span><span class="line"><span class="cl"><span class="nx">abstract</span> <span class="p">=</span> <span class="s2">&#34;...&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="p">[[</span><span class="nx">paper</span><span class="p">.</span><span class="nx">authors</span><span class="p">]]</span>
</span></span><span class="line"><span class="cl"><span class="nx">name</span> <span class="p">=</span> <span class="s2">&#34;...&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">......</span>
</span></span></code></pre></td></tr></table>
</div>
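</div><p>By combining a TOML config file with the loops and conditionals of a Jinja template, the work of editing html turns into editing a config file, which is clearly easier than editing the html directly.</p>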
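<p>The code above then gets some logic for handling static files, for example:</p>
<ol>
<li>Clear the target folder before building</li>
<li>Copy files from a given folder into the target folder during the build</li>
</ol>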
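<p>The two steps above could look roughly like this. It is a sketch rather than the code from the linked script: the function name <code>prepare_target</code> and its parameters are assumptions made for the example.</p>

```python
import shutil
from pathlib import Path


def prepare_target(static_dir: str, target_dir: str) -> None:
    """Clear target_dir, then copy everything from static_dir into it."""
    target = Path(target_dir)
    if target.exists():
        shutil.rmtree(target)  # step 1: remove the previous build output
    if Path(static_dir).is_dir():
        shutil.copytree(static_dir, target)  # step 2: copy static files (creates target)
    else:
        target.mkdir(parents=True)  # no static files: still provide an empty target
```

<p>Calling <code>prepare_target("./static", "./dist")</code> before rendering leaves a fresh <code>dist</code> folder for the generated <code>index.html</code>.</p>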
<p>So how do you preview the build result? What I did at the time was start Python's simple file server from inside <code>./dist</code> with this command:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ python3 -m http.server
</span></span></code></pre></td></tr></table>
</div>
</div><p>Then, with <a class="link" href="https://github.com/ai4co/reevo/blob/7fcf3878943629c08a38983bbab1b1822f5a924b/docs/publish.sh" target="_blank" rel="noopener"
>a bit of bash script</a> for one-command deployment to GitHub Pages, the result is a simple single-page static site generator. This is <a class="link" href="https://github.com/ai4co/reevo/blob/27f4bcd57034bdf82dff76f91b2cf4c1575f4c6f/docs/build.py" target="_blank" rel="noopener"
>version 1</a>.</p>
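<h2 id="第二版监测文件更改并自动构建">Version 2: watching for file changes and rebuilding automatically
</h2><p>During development I quickly noticed an obvious problem with version 1: after every change to the template or the config, I had to run the build command by hand and refresh the page.
For the former, my fix was to use the <code>watch</code> command to run the build script once a second, for example:</p>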
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ watch -d -n1 python3 build.py
</span></span></code></pre></td></tr></table>
</div>
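</div><p>Still, I did not find this very elegant, so I wondered: could the build script itself keep watching for file changes and rebuild whenever needed? Of course it can. The internet offers four approaches:</p>
<ol>
<li>Use the <a class="link" href="https://python-watchdog.readthedocs.io/en/stable/index.html" target="_blank" rel="noopener"
>watchdog library</a></li>
<li>Use the <a class="link" href="https://pypi.org/project/inotify/" target="_blank" rel="noopener"
>inotify library</a> (Linux only)</li>
<li>Compute file hashes and compare them to detect changes</li>
<li>Use the os library to read file modification times and compare them to detect changes</li>
</ol>
<p>The first two are the most convenient, since they rely on existing libraries. The third computes a hash of each file: it tells you exactly whether the content changed (the other three only detect that a modifying operation happened), but it costs more resources. The fourth compares modification times, which is easy to understand and simple to implement. In the end I chose the fourth approach.</p>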
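<p>For comparison, the third approach (the one not chosen here) could be sketched as follows. The function names <code>snapshot</code> and <code>has_changed</code> are made up for this example. Every file is read in full on every check, which is where the extra resource cost comes from.</p>

```python
import hashlib
from pathlib import Path


def snapshot(*paths: str) -> dict[str, str]:
    """Map every file under the given paths to the SHA-256 digest of its content."""
    digests = {}
    for path in map(Path, paths):
        if path.is_file():
            files = [path]
        else:  # a folder (or a missing path, which yields nothing)
            files = sorted(p for p in path.rglob("*") if p.is_file())
        for file in files:
            digests[str(file)] = hashlib.sha256(file.read_bytes()).hexdigest()
    return digests


def has_changed(old: dict[str, str], new: dict[str, str]) -> bool:
    """True if any file was added, removed, or now has different content."""
    return old != new
```

<p>Unlike a modification-time check, saving a file without changing its content does not count as a change here.</p>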
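<p>First, define the <code>all_filepaths</code> function (a generator) that yields every path under all of the given directories. Folders are yielded too, because a folder's last-modified time reflects whether files inside it have been moved or deleted.</p>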
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">all_filepaths</span><span class="p">(</span><span class="o">*</span><span class="n">paths</span><span class="p">):</span> <span class="c1"># Yield every path under the given directories, files and folders alike</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">path</span> <span class="ow">in</span> <span class="n">paths</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">exists</span><span class="p">(</span><span class="n">path</span><span class="p">):</span> <span class="c1"># Skip paths that do not exist</span>
</span></span><span class="line"><span class="cl">            <span class="k">pass</span>
</span></span><span class="line"><span class="cl">        <span class="k">elif</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isfile</span><span class="p">(</span><span class="n">path</span><span class="p">):</span> <span class="c1"># A file: yield its path directly</span>
</span></span><span class="line"><span class="cl">            <span class="k">yield</span> <span class="n">path</span>
</span></span><span class="line"><span class="cl">        <span class="k">else</span><span class="p">:</span> <span class="c1"># A folder: walk every file and subfolder inside it</span>
</span></span><span class="line"><span class="cl">            <span class="k">for</span> <span class="n">root</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">files</span> <span class="ow">in</span> <span class="n">os</span><span class="o">.</span><span class="n">walk</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">                <span class="k">yield</span> <span class="n">root</span>
</span></span><span class="line"><span class="cl">                <span class="k">yield from</span> <span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">root</span><span class="p">,</span> <span class="n">file</span><span class="p">)</span> <span class="k">for</span> <span class="n">file</span> <span class="ow">in</span> <span class="n">files</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
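</div><p>Next, define a function <code>build_on_change</code> that loops, reads each file's last-modified time, and checks whether anything has changed. If a file has changed since the last build, it calls the <code>build</code> function to rebuild the page. Initialising <code>last_update</code> to zero ensures that a build is triggered right after the program starts.</p>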
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">build_on_change</span><span class="p">(</span><span class="o">*</span><span class="n">paths</span><span class="p">,</span> <span class="o">**</span><span class="n">build_kwargs</span><span class="p">):</span> <span class="c1"># Watch for file changes and rebuild the page automatically</span>
</span></span><span class="line"><span class="cl">    <span class="n">last_update</span> <span class="o">=</span> <span class="mf">0.0</span> <span class="c1"># Start at 0 so that a build is triggered right after startup</span>
</span></span><span class="line"><span class="cl">    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">for</span> <span class="n">filepath</span> <span class="ow">in</span> <span class="n">all_filepaths</span><span class="p">(</span><span class="o">*</span><span class="n">paths</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">            <span class="n">last_modified_at</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">stat</span><span class="p">(</span><span class="n">filepath</span><span class="p">)</span><span class="o">.</span><span class="n">st_mtime</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="n">last_modified_at</span> <span class="o">&gt;=</span> <span class="n">last_update</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="n">build</span><span class="p">(</span><span class="o">**</span><span class="n">build_kwargs</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                <span class="n">last_update</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">                <span class="k">break</span>
</span></span><span class="line"><span class="cl">        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># Check for file changes once a second</span>
</span></span></code></pre></td></tr></table>
</div>
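</div><p>With the code above, just 20 lines are enough for a pure-Python file watcher (using a library would be quicker, of course, but it means installing a dependency).
Finally, a little more code uses <code>sys.argv</code> to read the command-line arguments and, at the program entry point, distinguishes two modes: <strong>single build</strong> and <strong>auto update</strong> (preview). This is version 2.</p>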
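<p>A minimal sketch of such an entry point is shown below. The argument value <code>preview</code> and the helper names <code>parse_mode</code> and <code>main</code> are assumptions for illustration; the actual script may name its modes differently. The two build functions are passed in as callables so that the sketch stays self-contained.</p>

```python
import sys
from typing import Callable


def parse_mode(argv: list[str]) -> str:
    """Return "preview" if the first argument asks for it, otherwise "build"."""
    return "preview" if len(argv) > 1 and argv[1] == "preview" else "build"


def main(argv: list[str], build: Callable[[], None], build_on_change: Callable[[], None]) -> None:
    """Dispatch to a single build or to the watch loop, depending on argv."""
    if parse_mode(argv) == "preview":
        build_on_change()  # auto update: keeps watching and rebuilding
    else:
        build()  # single build: render once and exit
```

<p>In the real script the call would be something like <code>main(sys.argv, build, lambda: build_on_change("./template", "./config.toml"))</code>, where the watched paths are again placeholders.</p>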
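<h2 id="第三版集成静态文件服务器自动刷新页面">Version 3: built-in static file server + automatic page refresh
</h2><p>Version 2 solved part of the "run the build command by hand and refresh the page" problem, but the page still had to be refreshed by hand after every save. Version 3 aims to remove that pain point entirely.</p>
<p>First, bring the <code>http.server</code> that had to be started by hand every time into the script: adapt <a class="link" href="https://docs.python.org/3/library/http.server.html#http.server.SimpleHTTPRequestHandler.do_GET" target="_blank" rel="noopener"
>the example code in the official <code>http.server</code> documentation</a>, then use <a class="link" href="https://docs.python.org/3/library/threading.html#threading.Thread" target="_blank" rel="noopener"
>the threading standard library</a> to run it as a daemon thread (it ends automatically when the main program exits, with no need to stop it by hand). This takes only 10 lines of code (not counting blank lines):</p>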
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">start_server_daemon</span><span class="p">(</span><span class="n">ip</span><span class="o">=</span><span class="s2">&#34;127.0.0.1&#34;</span><span class="p">,</span> <span class="n">port</span><span class="o">=</span><span class="mi">8123</span><span class="p">,</span> <span class="n">directory</span><span class="o">=</span><span class="s2">&#34;./dist&#34;</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="kn">from</span> <span class="nn">threading</span> <span class="kn">import</span> <span class="n">Thread</span>
</span></span><span class="line"><span class="cl">    <span class="kn">import</span> <span class="nn">http.server</span><span class="o">,</span> <span class="nn">socketserver</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">start_server</span><span class="p">():</span> <span class="c1"># Create a simple HTTP server bound to the given IP and port</span>
</span></span><span class="line"><span class="cl">        <span class="k">with</span> <span class="n">socketserver</span><span class="o">.</span><span class="n">TCPServer</span><span class="p">((</span><span class="n">ip</span><span class="p">,</span> <span class="n">port</span><span class="p">),</span> <span class="n">http</span><span class="o">.</span><span class="n">server</span><span class="o">.</span><span class="n">SimpleHTTPRequestHandler</span><span class="p">)</span> <span class="k">as</span> <span class="n">httpd</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Serving live preview at http://</span><span class="si">{</span><span class="n">ip</span><span class="si">}</span><span class="s2">:</span><span class="si">{</span><span class="n">port</span><span class="si">}</span><span class="s2">/&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="n">httpd</span><span class="o">.</span><span class="n">serve_forever</span><span class="p">()</span> <span class="c1"># Start the server and keep it running</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># Create a daemon thread; it exits automatically when the main thread exits</span>
</span></span><span class="line"><span class="cl">    <span class="n">t</span> <span class="o">=</span> <span class="n">Thread</span><span class="p">(</span><span class="n">target</span><span class="o">=</span><span class="n">start_server</span><span class="p">,</span> <span class="n">daemon</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">t</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
</span></span></code></pre></td></tr></table>
</div>
</div><blockquote>
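<p>This implementation has one problem, though: after the main program terminates and the daemon thread exits with it, the occupied port is not released properly. In principle the <code>with</code> statement on line 6 should release the port on exit; perhaps that does not work in daemon-thread mode?</p>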
</blockquote>
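<p>One plausible explanation, offered as an assumption rather than something verified against the setup above: Python stops daemon threads abruptly at interpreter shutdown, so the cleanup of the <code>with</code> block is not guaranteed to run, and a plain <code>socketserver.TCPServer</code> leaves <code>allow_reuse_address</code> off, so rebinding the same port shortly afterwards can fail. The variant below is a sketch, not the author's code. It uses <code>http.server.ThreadingHTTPServer</code>, which enables address reuse, passes the <code>directory</code> argument through to the handler (the snippet above accepts it but never uses it), and returns the server so that the caller can close it explicitly.</p>

```python
import functools
import http.server
import threading


def start_server_daemon(ip="127.0.0.1", port=8123, directory="./dist"):
    """Serve `directory` over HTTP from a daemon thread and return the server."""
    # Bind the handler to `directory` (supported since Python 3.7) instead of
    # relying on the current working directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # ThreadingHTTPServer inherits allow_reuse_address from HTTPServer,
    # unlike a bare socketserver.TCPServer.
    httpd = http.server.ThreadingHTTPServer((ip, port), handler)
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    print(f"Serving live preview at http://{ip}:{httpd.server_address[1]}/")
    return httpd
```

<p>On exit, for example in a <code>finally</code> block around the watch loop, calling <code>httpd.shutdown()</code> followed by <code>httpd.server_close()</code> releases the socket deliberately instead of leaving it to interpreter shutdown.</p>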
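<p>Next, how to implement automatic page refresh. In other words, how does the server tell the browser that the content has been updated?
<a class="link" href="https://docs.python.org/3/library/http.server.html#http.server.SimpleHTTPRequestHandler.do_GET" target="_blank" rel="noopener"
>The official <code>http.server</code> documentation</a> has the answer here as well: when serving a file, it adds a <a class="link" href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Last-Modified" target="_blank" rel="noopener"
><code>Last-Modified</code> header</a> to the response, and the timestamp in that header comes from the last-modified time recorded by the file system. Since my program only ever does full builds, every file in the <code>dist</code> directory necessarily has a last-modified time equal to the time of the most recent build.</p>
<p>Given that, the browser only needs some JavaScript that polls a file on the server (here, an extra empty file created in auto-update mode), reads <code>Last-Modified</code> from the response headers, decides whether the page has been updated, and if so calls <code>window.location.reload()</code> to refresh the page.</p>
<p>The Python side only needs to create that empty file and add a parameter to the template context saying whether auto refresh is enabled. The template needs the following javascript code:</p>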
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="cl"><span class="p">(</span><span class="kd">function</span><span class="p">(){</span> <span class="c1">// Avoid name clashes
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="kd">let</span> <span class="nx">last_updated</span> <span class="o">=</span> <span class="k">new</span> <span class="nb">Date</span><span class="p">();</span> <span class="c1">// Initialise to the current time
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="kr">const</span> <span class="nx">dummy_file_path</span> <span class="o">=</span> <span class="s2">&#34;/{{preview_mode.dummy_file_path}}&#34;</span><span class="p">;</span> <span class="c1">// Filled in with the empty file's name when the template is rendered
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>    <span class="kr">const</span> <span class="nx">refresh_on_change</span> <span class="o">=</span> <span class="p">()</span> <span class="p">=&gt;</span> <span class="p">{</span> <span class="c1">// Define the polling function
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>        <span class="nx">fetch</span><span class="p">(</span><span class="nx">dummy_file_path</span><span class="p">).</span><span class="nx">then</span><span class="p">(</span><span class="nx">res</span> <span class="p">=&gt;</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="kr">const</span> <span class="nx">last_modified</span> <span class="o">=</span> <span class="k">new</span> <span class="nb">Date</span><span class="p">(</span><span class="nx">res</span><span class="p">.</span><span class="nx">headers</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="s2">&#34;Last-Modified&#34;</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span><span class="p">(</span><span class="nx">last_updated</span> <span class="o">&lt;</span> <span class="nx">last_modified</span><span class="p">){</span>
</span></span><span class="line"><span class="cl">                <span class="nb">window</span><span class="p">.</span><span class="nx">location</span><span class="p">.</span><span class="nx">reload</span><span class="p">();</span> <span class="c1">// Refresh the page
</span></span></span><span class="line"><span class="cl"><span class="c1"></span>            <span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="p">})</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="nx">setInterval</span><span class="p">(</span><span class="nx">refresh_on_change</span><span class="p">,</span> <span class="mi">1000</span><span class="p">);</span> <span class="c1">// Check for updates once a second
</span></span></span><span class="line"><span class="cl"><span class="c1"></span><span class="p">})();</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>Of course, this JavaScript has to sit inside a <code>&lt;script&gt;</code> tag, and the template must make sure the tag is only rendered when auto update is enabled. This is version 3.</p>
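<p>The Python side of this mechanism might look like the sketch below. The marker file name <code>.last_build</code> and the helper names are hypothetical; only the <code>preview_mode.dummy_file_path</code> key is taken from the JavaScript above. The second helper shows how a file's modification time maps to the text of a <code>Last-Modified</code> header, which has one-second resolution.</p>

```python
from email.utils import formatdate
from pathlib import Path

DUMMY_FILE = ".last_build"  # hypothetical name for the empty marker file


def enable_preview(context: dict, target_dir: str) -> dict:
    """Create the empty marker file and return a context that turns on auto refresh."""
    marker = Path(target_dir) / DUMMY_FILE
    marker.touch()  # created, or its mtime refreshed, on every build
    return {**context, "preview_mode": {"dummy_file_path": DUMMY_FILE}}


def last_modified_header(path: Path) -> str:
    """Format a file's mtime the way an HTTP Last-Modified header is written."""
    return formatdate(path.stat().st_mtime, usegmt=True)
```

<p>In single-build mode the context simply has no <code>preview_mode</code> key, so a check such as <code>{% if preview_mode %}</code> around the <code>&lt;script&gt;</code> tag would skip it, assuming Jinja's default handling of undefined variables.</p>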
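<h2 id="第四版最小化生成的文件">Version 4: minifying the generated files
</h2><p>html files rendered by a template engine often contain a lot of redundant whitespace, quotes and comments. These help readability and maintenance during development, but when the page is actually loaded they only add to the file size and slow the page down. To improve load performance, it is worth stripping these redundant characters after the build to shrink the files.</p>
<p>The same optimisation applies to css files. The difference is that the redundant characters in css tend to follow specific patterns, so regular expressions can match and remove them directly. My code is below (it is not comprehensive, but it already shrinks the files noticeably, which is enough for me):</p>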
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">re</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">minify_css</span><span class="p">(</span><span class="n">filepath</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">content</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">filepath</span><span class="p">,</span><span class="s1">&#39;r&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">minimized</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="sa">r</span><span class="s1">&#39; *\n *|/\*.*?\*/&#39;</span><span class="p">,</span> <span class="s1">&#39;&#39;</span><span class="p">,</span> <span class="n">content</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">minimized</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="sa">r</span><span class="s1">&#39;; *(?=})|(?&lt;=:) +| +(?={)&#39;</span><span class="p">,</span> <span class="s1">&#39;&#39;</span><span class="p">,</span> <span class="n">minimized</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">filepath</span><span class="p">,</span> <span class="s1">&#39;w&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">minimized</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="n">filepath</span><span class="si">}</span><span class="s1">: reduced from </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">content</span><span class="p">)</span><span class="si">}</span><span class="s1"> Bytes to </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">minimized</span><span class="p">)</span><span class="si">}</span><span class="s1"> Bytes&#39;</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
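</div><p>html has more complicated rules, and removing some spaces and newlines changes how the page renders (the content of a <code>&lt;pre&gt;</code> tag, for example). So for html files, rather than writing my own, I decided to use the implementation in the <a class="link" href="https://htmlmin.readthedocs.io/en/latest/" target="_blank" rel="noopener"
>htmlmin library</a>. The code is below. Note that <code>remove_all_empty_space=True</code> can also affect rendering in some cases: for example, the space between two <code>&lt;span&gt;</code> tags is removed as well. When that matters, either drop this argument or add a <a class="link" href="https://developer.mozilla.org/zh-CN/docs/Web/CSS/margin" target="_blank" rel="noopener"
><code>margin-right</code></a> in the css or in the <code>style</code> attribute to create the gap.</p>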
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">htmlmin.main</span> <span class="kn">import</span> <span class="n">minify</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">minify_html</span><span class="p">(</span><span class="n">filepath</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">content</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="n">filepath</span><span class="p">,</span><span class="s1">&#39;r&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">minimized</span> <span class="o">=</span> <span class="n">minify</span><span class="p">(</span><span class="n">content</span><span class="p">,</span> <span class="n">remove_comments</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">remove_all_empty_space</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">filepath</span><span class="p">,</span> <span class="s1">&#39;w&#39;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">minimized</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;</span><span class="si">{</span><span class="n">filepath</span><span class="si">}</span><span class="s1">: reduced from </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">content</span><span class="p">)</span><span class="si">}</span><span class="s1"> Bytes to </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">minimized</span><span class="p">)</span><span class="si">}</span><span class="s1"> Bytes&#39;</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
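</div><p>As for JavaScript files, I wrote very little JS myself in this project and all of it is already embedded in the HTML file, while the external libraries are loaded from CDNs, which mostly serve optimized files. So I did not bother minifying JS files.</p>
<p>Finally, after the build, I use the <code>all_filepaths</code> function written earlier to walk through every file in the <code>dist</code> folder, pick the minification method based on the file extension, and call the corresponding function. Since minification only serves deployment, this code does not run in preview mode.</p>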
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span><span class="lnt">7
</span><span class="lnt">8
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">for</span> <span class="n">path</span> <span class="ow">in</span> <span class="n">all_filepaths</span><span class="p">(</span><span class="n">target_folder</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isfile</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">continue</span>
</span></span><span class="line"><span class="cl">    <span class="n">extension</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">basename</span><span class="p">(</span><span class="n">path</span><span class="p">)</span><span class="o">.</span><span class="n">rsplit</span><span class="p">(</span><span class="s2">&#34;.&#34;</span><span class="p">,</span><span class="mi">1</span><span class="p">)[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="c1"># get the file extension</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">extension</span> <span class="o">==</span> <span class="s2">&#34;css&#34;</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">minify_css</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">elif</span> <span class="n">extension</span> <span class="ow">in</span> <span class="p">{</span><span class="s1">&#39;html&#39;</span><span class="p">,</span> <span class="s1">&#39;svg&#39;</span><span class="p">,</span> <span class="s1">&#39;xml&#39;</span><span class="p">}:</span>
</span></span><span class="line"><span class="cl">        <span class="n">minify_html</span><span class="p">(</span><span class="n">path</span><span class="p">)</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>Putting all of the above together gives the fourth version, which is also <a class="link" href="https://github.com/ai4co/reevo/blob/30e87a43e05bce6a8ccb42b9d453a89f6d4a1740/docs/build.py" target="_blank" rel="noopener"
>the current final version</a>.</p>
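<p>The extension-based dispatch described above can also be written as a lookup table. The following is only a minimal sketch, not the code from the final script: <code>file_extension</code>, <code>pick_minifier</code>, <code>MINIFIERS</code> and the two <code>fake_*</code> functions are hypothetical names introduced for illustration, and the stand-in minifiers do not touch any file. It uses <code>os.path.splitext</code>, so a file name without a dot yields an empty extension instead of the whole name.</p>

```python
import os


def file_extension(path):
    """Return the lowercased extension without the dot, or '' if there is none."""
    return os.path.splitext(path)[1].lstrip('.').lower()


def pick_minifier(path, minifiers):
    """Look up the minifier registered for this path's extension (None if unsupported)."""
    return minifiers.get(file_extension(path))


# Hypothetical stand-ins for minify_css / minify_html so the sketch runs on its own.
def fake_minify_css(path):
    return f'css:{path}'


def fake_minify_html(path):
    return f'html:{path}'


# One table entry per supported extension; unsupported files are simply skipped.
MINIFIERS = {
    'css': fake_minify_css,
    'html': fake_minify_html,
    'svg': fake_minify_html,
    'xml': fake_minify_html,
}

for example in ['dist/style.CSS', 'dist/index.html', 'dist/LICENSE', 'dist/app.js']:
    handler = pick_minifier(example, MINIFIERS)
    print(example, '->', handler(example) if handler else 'skipped')
```

<p>Supporting another file type then only requires one more entry in the table.</p>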
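<h2 id="总结">Summary
</h2><p>Looking back on the whole process, I originally just wanted to be lazy, yet I ended up spending more time than I would have without the shortcut. Still, given what I gained along the way, I think the time was well spent. For instance, when I used off-the-shelf live servers before, it never occurred to me that the feature could be implemented so simply (granted, my implementation is rather crude). The experience, scripts and templates from this project will also come in handy when I build single-page static sites or project pages for new papers.</p>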
<p>Of course, I know there are plenty of ready-made static site generators, such as the Python-based <a class="link" href="https://docs.getpelican.com/en/latest/" target="_blank" rel="noopener"
>Pelican</a>, the Ruby-based <a class="link" href="https://jekyllrb.com/" target="_blank" rel="noopener"
>Jekyll</a>, and the Golang-based <a class="link" href="https://gohugo.io/" target="_blank" rel="noopener"
>hugo</a>, which I use to build this blog. The problem is that they all target multi-page, content-oriented websites and have fairly complex project structures, so they really are overkill for my single-page project page with only two images. That is why I chose to reinvent the wheel myself; the process was bumpy, but I learned a lot.</p></description></item><item><title>[Paper Reading] An End-to-End Submodular Framework for Data-Efficient In-Context Learning</title><link>https://blog.furffisite.link/p/read-papers/div-s3/</link><pubDate>Mon, 08 Jul 2024 12:33:03 +0800</pubDate><guid>https://blog.furffisite.link/p/read-papers/div-s3/</guid><description><img src="https://files.furffisite.link/blogimg/20240709193826-409d3c5c4adf797af0b0d55992852fce-5194f.jpg" alt="Featured image of post [Paper Reading] An End-to-End Submodular Framework for Data-Efficient In-Context Learning" /><p>This blog post is written entirely in English, just to practice my English writing skills. Please let me know if you have any suggestions.</p>
<hr>
<h2 id="basic-information">Basic Information
</h2><ul>
<li>Title: <strong>An End-to-End Submodular Framework for Data-Efficient In-Context Learning</strong> <sup><a id='ref-cite1-1' href='#cite1'>[1]</a></sup></li>
<li>Authors: Lilly Kumari, Shengjie Wang, Arnav Das, Tianyi Zhou, Jeff Bilmes</li>
<li>Conference: <a class="link" href="https://2024.naacl.org/" target="_blank" rel="noopener"
>NAACL 2024</a></li>
<li>Open Access: <a class="link" href="https://aclanthology.org/2024.findings-naacl.209" target="_blank" rel="noopener"
>https://aclanthology.org/2024.findings-naacl.209</a></li>
</ul>
<h2 id="the-problem-to-solve">The Problem to Solve
</h2><p><strong>The problem of annotating, selecting and ordering in-context exemplars for Large Language Models (LLMs).</strong></p>
<p>The In-Context Learning (ICL) performance of LLMs is largely affected by the selection and ordering of in-context exemplars, which makes it necessary to develop a methodology to select the in-context exemplars according to the query.</p>
<h2 id="the-proposed-algorithm">The Proposed Algorithm
</h2><p>In practice, most of the data we have is unannotated, <em>i.e.</em> queries without answers. Some recent methods <sup><a id='ref-cite2-1' href='#cite2'>[2]</a></sup> propose selecting and annotating a subset of a huge unannotated dataset
$\mathcal X_\mathrm{unlabeled}$ to form a small annotated dataset $\mathcal X_\mathrm{labeled}$, and then choosing the in-context exemplars from this subset during evaluation. This paper follows this paradigm and proposes <strong>Div-S3</strong>, a two-stage, data-efficient, learning-free framework for exemplar selection.</p>
<ul>
<li>The first stage (<strong>Div</strong>): Exemplar Annotation, from $\mathcal X_\mathrm{unlabeled}$ to $\mathcal X_\mathrm{labeled}$.</li>
<li>The second stage (<strong>S3</strong>): Exemplar Retrieval, from $\mathcal X_\mathrm{labeled}$ and query set $Q$ to get in-context exemplars $\mathcal D_\mathrm{context}$.</li>
</ul>
<h3 id="exemplar-annotation">Exemplar Annotation
</h3><p><strong>Problem Definition.</strong>
The first stage of <em>Div-S3</em>, as mentioned above, selects examples from the unannotated data $\mathcal X_\mathrm{unlabeled}$ and has <em>homo sapiens</em> annotate them, yielding an informative and diverse subset $\mathcal X_\mathrm{labeled}$ under the constraint $|\mathcal X_\mathrm{labeled}| \ll |\mathcal X_\mathrm{unlabeled}|$. This process is similar to one iteration of pool-based active learning. With the objectives of promoting diversity and reducing redundancy, it can also be formulated as a set optimization problem <sup><a id='ref-cite3-1' href='#cite3'>[3]</a></sup> as follows:
$$\max_{A\subset \mathcal X_\mathrm{unlabeled},|A|\le k} f(A),$$
where $k$ is a hyperparameter representing the annotation budget, and $f:~2^{\mathcal X_\mathrm{unlabeled}}\rightarrow\mathbb R$ is a submodular function that maps each subset of $\mathcal X_\mathrm{unlabeled}$ to a score. The higher the score, the better the subset.</p>
<blockquote>
<p>The submodular function must be monotone and have decreasing marginal profit, <em>i.e.</em>, respectively,
$$\begin{aligned}\forall A\subseteq T, \quad&amp;f(A) \le f(T), \\ \forall A\subset T,~ \forall x\not\in T, \quad&amp;f(\{x\}|A) \ge f(\{x\}|T), \end{aligned}$$
where $f(\{x\}|T) = f(\{x\}\cup T) - f(T)$ is the marginal profit of adding $\{x\}$ to $T$.</p>
</blockquote>
<p>The authors set the submodular function to be <em>facility location</em>, which is defined as
$$f(A) = \sum_{s_i\in\mathcal X_\mathrm{unlabeled}}\max_{s_j\in A}\mathrm{sim}(s_i,s_j),$$
where $\mathrm{sim}(\cdot,\cdot)$ denotes the cosine similarity between the embeddings of two queries, generated by Sentence-BERT <sup><a id='ref-cite4-1' href='#cite4'>[4]</a></sup>.</p>
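<p>To make the definition concrete, here is a small sketch that evaluates the facility location function on a toy example and brute-force checks the two properties quoted above. It is an illustration only, not code from the paper: the 2-D vectors are made-up stand-ins for Sentence-BERT embeddings, the names <code>cosine</code>, <code>facility_location</code> and <code>is_monotone_submodular</code> are hypothetical, and it assumes $f(\emptyset)=0$ and non-negative similarities.</p>

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def facility_location(selected, sim):
    """f(A) = sum over all items i of max_{j in A} sim[i][j]; f(empty set) = 0."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def all_subsets(n):
    """Yield every subset of {0, ..., n-1} as a frozenset."""
    for r in range(n + 1):
        for combo in combinations(range(n), r):
            yield frozenset(combo)


def is_monotone_submodular(sim, tol=1e-12):
    """Brute-force check of monotonicity and decreasing marginal profit."""
    n = len(sim)
    for small in all_subsets(n):
        for large in all_subsets(n):
            if not small <= large:
                continue
            # Monotone: a superset never scores lower.
            if facility_location(small, sim) > facility_location(large, sim) + tol:
                return False
            # Decreasing marginal profit: adding x helps the smaller set at least as much.
            for x in set(range(n)) - large:
                gain_small = facility_location(small | {x}, sim) - facility_location(small, sim)
                gain_large = facility_location(large | {x}, sim) - facility_location(large, sim)
                if gain_small < gain_large - tol:
                    return False
    return True


# Made-up embeddings in the first quadrant, so all similarities are non-negative.
embeddings = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.2, 0.8)]
sim = [[cosine(u, v) for v in embeddings] for u in embeddings]

print(facility_location({0, 2}, sim))
print(is_monotone_submodular(sim))
```

<p>The brute-force check is exponential in the number of items, so it is only meant for tiny examples like this one.</p>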
<p><strong>Intuitive Interpretation.</strong>
This problem can be interpreted intuitively and geometrically as minimizing the sum of the distances between each node and its nearest selected node, which is a k-medoids problem and is closely related to k-means clustering.</p>
<p><strong>Solution.</strong>
The authors of this paper use a greedy algorithm proposed by Nemhauser <em>et al.</em> <sup><a id='ref-cite5-1' href='#cite5'>[5]</a></sup>, called Greedy Submodular Maximization (GSM). Its fundamental idea is to greedily select the item with the maximum marginal profit for $k$ iterations, <em>i.e.</em>
$$A\leftarrow A\cup \{\argmax_{v\in V, v\not\in A} f(A\cup \{v\}) - f(A) \}.$$
Here $V$ denotes the ground set $\mathcal X_\mathrm{unlabeled}$. Taking the time spent on calculating $f$ into account, this algorithm has a time complexity of $O((nk)^2)$. The paper says it&rsquo;s $O(n^2+nk)$, but I beg to differ, as that seems to ignore the cost of calculating $f$, which is $\Theta(nm)$ with a pre-calculated similarity matrix, where $m=0,1,\ldots,k-1$ is the size of $A$ in each iteration. I do agree with the authors that the algorithm can be accelerated by techniques such as caching and priority queues.</p>
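<p>The greedy procedure, together with the caching idea mentioned above, can be sketched as follows. This is a hedged illustration rather than the authors' implementation: <code>greedy_facility_location</code> is a hypothetical name, the similarity matrix is made up, and the sketch assumes $f(\emptyset)=0$ with non-negative similarities. It caches, for every item, its best similarity to the set selected so far, so each marginal profit costs $O(n)$ and the whole run costs $O(n^2k)$.</p>

```python
def greedy_facility_location(sim, k):
    """Greedily pick up to k item indices that maximise the facility location score."""
    n = len(sim)
    best = [0.0] * n  # best[i] = max similarity between item i and the selected set
    selected = []
    for _ in range(min(k, n)):
        best_gain, best_item = -1.0, None
        for v in range(n):
            if v in selected:
                continue
            # Marginal profit of v: how much it improves each item's best similarity.
            gain = sum(max(sim[i][v] - best[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_item = gain, v
        selected.append(best_item)
        # Update the cache instead of recomputing f from scratch.
        best = [max(best[i], sim[i][best_item]) for i in range(n)]
    return selected


# Made-up similarities: items 0-2 form one cluster, items 3-5 another.
sim = [
    [1.0, 0.9, 0.7, 0.1, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1, 0.1],
    [0.7, 0.9, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.8, 0.5],
    [0.1, 0.1, 0.1, 0.8, 1.0, 0.8],
    [0.1, 0.1, 0.1, 0.5, 0.8, 1.0],
]

print(greedy_facility_location(sim, 2))
```

<p>On this toy matrix the two picks are the central items of the two clusters (indices 1 and 4), which matches the k-medoids intuition given earlier.</p>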
<h3 id="exemplar-retrieval">Exemplar Retrieval
</h3><p>Exemplar retrieval is intended to find the best in-context exemplars $\mathcal D_\mathrm{context}$ for the given query set $Q$ from $\mathcal X_\mathrm{labeled}$. Previous similarity-based methods usually yield in-context exemplars with redundant information. To reduce such redundancy, the authors formalize exemplar retrieval as a conditional submodular subset selection problem. The purpose of this stage is to &ldquo;obtain a set of exemplars that are not only relevant to the test query but also encompass diverse aspects crucial for aiding the LLM in the target task&rdquo; <sup><a id='ref-cite1-2' href='#cite1'>[1]</a></sup>. To solve it, they use a two-phase method called Submodular Span Summarization (S3), which was published prior to this paper <sup><a id='ref-cite6-1' href='#cite6'>[6]</a></sup>.</p>
<h4 id="phase-1-of-s3">Phase 1 of S3
</h4><p><strong>Problem Definition.</strong>
This phase aims to select a relatively large subset that is relevant to the query set $Q$ but may be redundant.
The original problem might be difficult to solve, so the paper instead solves its dual problem:
$$\min_{A\subseteq V - Q,|A|\ge k_1}f(A|Q),$$
which is a cardinality-constrained submodular minimization problem. The authors approximate $f(A|Q)$ with $m_Q(A) = \sum_{a\in A}f(a|Q)$, which is an upper bound of $f(A|Q)$.</p>
<p><strong>Solution.</strong>
Although the paper doesn&rsquo;t state it clearly, I think the algorithm to solve this problem, given the approximation, is to select the $k_1$ exemplars with the smallest values of $f(a|Q)$ from $V - Q$. This algorithm matches the time complexity $O(k+k\log k_1)$ stated in Appendix B, if we ignore the time for calculating $f$.</p>
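<p>Under that reading, phase 1 reduces to a top-$k_1$ selection, which can be sketched as below. This is an interpretation of the step, not code from the paper: <code>conditional_gain</code> and <code>phase1_select</code> are hypothetical names, $f$ is assumed to be facility location over a made-up similarity matrix, and the query set is a single item for brevity.</p>

```python
import heapq


def conditional_gain(a, query, sim):
    """f({a} | Q) under facility location: the extra coverage a adds on top of Q."""
    return sum(max(row[a] - max(row[q] for q in query), 0.0) for row in sim)


def phase1_select(candidates, query, sim, k1):
    """Keep the k1 candidates with the smallest f({a} | Q), i.e. minimise the bound m_Q."""
    return heapq.nsmallest(k1, candidates, key=lambda a: conditional_gain(a, query, sim))


# Made-up similarities: items 0-2 form one cluster, items 3-5 another.
sim = [
    [1.0, 0.9, 0.7, 0.1, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1, 0.1],
    [0.7, 0.9, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.8, 0.5],
    [0.1, 0.1, 0.1, 0.8, 1.0, 0.8],
    [0.1, 0.1, 0.1, 0.5, 0.8, 1.0],
]
query = [0]                   # the query set Q
candidates = [1, 2, 3, 4, 5]  # V - Q

print(phase1_select(candidates, query, sim, k1=2))
```

<p>A small $f(a|Q)$ means that $a$ adds little beyond what $Q$ already covers, so the candidates kept here are the ones closest to the query, in this toy case the other two members of its cluster.</p>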
<h4 id="phase-2-of-s3">Phase 2 of S3
</h4><p>This phase is intended to select the most representative exemplars from the result of phase 1, and is mathematically the same as <a class="link" href="#exemplar-annotation" >Exemplar Annotation</a>:
$$\max_{A\subset A_Q,|A|\le k_2} f(A),$$
where $A_Q$ is the set of selected exemplars from phase 1.
Optionally, we can apply a knapsack constraint to the problem and solve it with a modified version of GSM.</p>
<h2 id="comments">Comments
</h2><p>The authors propose a learning-free framework named <strong>Div-S3</strong> to select in-context exemplars from unlabeled datasets, and evaluate its effectiveness with ablation experiments across 7 Natural Language Processing (NLP) tasks and 5 LLMs. Although the algorithm doesn&rsquo;t take ordering into consideration, the paper shows that it is insensitive to the order of exemplars.</p>
<p>However, I still have some questions about the paper:</p>
<ul>
<li>In the Exemplar Retrieval stage, what&rsquo;s the point of computing $f(Q)$ against all unlabeled data $\mathcal X_\mathrm{unlabeled}$? Given that $\mathcal X_\mathrm{labeled}$ is a representative subset of $\mathcal X_\mathrm{unlabeled}$, would it be computationally better to calculate $f(Q)$ and $f(a|Q)$ only against $\mathcal X_\mathrm{labeled}\cup Q$?</li>
<li>What&rsquo;s the purpose of dealing with the queries as a whole (query set $Q$), instead of retrieving the best exemplars for each query $q$?</li>
<li>The experiments don&rsquo;t compare Div-S3 with some recent algorithms, especially the learning-based ones.</li>
<li>Would the algorithm perform better on average if we handcrafted a rule to arrange the selected exemplars? Also, in Figure 3, the variances do not seem to be small.</li>
</ul>
<h2 id="references">References
</h2><style>
.bibliography { display: table; font-size: medium; line-height: normal; }
.bib-item { display: table-row; }
.bib-item > :first-child { display: table-cell; padding-right: .5em; font-weight: bold; text-align: right; }
.bib-item > :last-child { display: table-cell; padding-bottom: .5ex; }
</style>
<div class="bibliography"><div id="cite1" class="bib-item">
<span>[1]</span>
<span>L. Kumari, S. Wang, A. Das, T. Zhou, and J. Bilmes, “An End-to-End Submodular Framework for Data-Efficient In-Context Learning,” in Findings of the Association for Computational Linguistics: NAACL 2024, K. Duh, H. Gomez, and S. Bethard, Eds., Mexico City, Mexico: Association for Computational Linguistics, Jun. 2024, pp. 3293–3308. Accessed: Jul. 02, 2024. [Online]. Available: <a class="link" href="https://aclanthology.org/2024.findings-naacl.209" target="_blank" rel="noopener"
>https://aclanthology.org/2024.findings-naacl.209</a><a href="#ref-cite1-1">⤶</a><a href="#ref-cite1-2">⤶</a></span>
</div><div id="cite2" class="bib-item">
<span>[2]</span>
<span>H. Su et al., “Selective Annotation Makes Language Models Better Few-Shot Learners,” presented at The Eleventh International Conference on Learning Representations, Sep. 2022. Accessed: May 30, 2024. [Online]. Available: <a class="link" href="https://openreview.net/forum?id=qY1hlv7gwg" target="_blank" rel="noopener"
>https://openreview.net/forum?id=qY1hlv7gwg</a><a href="#ref-cite2-1">⤶</a></span>
</div><div id="cite3" class="bib-item">
<span>[3]</span>
<span>S. C. H. Hoi, R. Jin, J. Zhu, and M. R. Lyu, “Batch mode active learning and its application to medical image classification,” in Proceedings of the 23rd international conference on Machine learning, in ICML ’06. New York, NY, USA: Association for Computing Machinery, Jun. 2006, pp. 417–424. doi: 10.1145/1143844.1143897.<a href="#ref-cite3-1">⤶</a></span>
</div><div id="cite4" class="bib-item">
<span>[4]</span>
<span>N. Reimers and I. Gurevych, “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.” arXiv, Aug. 27, 2019. Accessed: Jul. 08, 2024. [Online]. Available: <a class="link" href="http://arxiv.org/abs/1908.10084" target="_blank" rel="noopener"
>http://arxiv.org/abs/1908.10084</a><a href="#ref-cite4-1">⤶</a></span>
</div><div id="cite5" class="bib-item">
<span>[5]</span>
<span>G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher, “An analysis of approximations for maximizing submodular set functions—I,” Mathematical programming, vol. 14, pp. 265–294, 1978.<a href="#ref-cite5-1">⤶</a></span>
</div><div id="cite6" class="bib-item">
<span>[6]</span>
<span>L. Kumari and J. Bilmes, “Submodular Span, with Applications to Conditional Data Summarization,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 14, Art. no. 14, May 2021, doi: 10.1609/aaai.v35i14.17465.<a href="#ref-cite6-1">⤶</a></span>
</div></div>
<!--
{{#- And it is formulated as a submodular span optimization problem:
$$\begin{aligned}&\max_{A\subseteq V - Q} |A| \\\\ \mathrm{s.t.}\quad&f(A\cup Q) - f(Q)\le\epsilon\end{aligned},$$
where $\epsilon>0$ is a hyperparameter controlling the relevance level.
This can be interpreted as maximizing the marginal -#}}
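--></description></item><item><title>My usual setup process for a new Windows 11 system</title><link>https://blog.furffisite.link/p/windows-setup/</link><pubDate>Fri, 05 Jul 2024 10:57:02 +0800</pubDate><guid>https://blog.furffisite.link/p/windows-setup/</guid><description><img src="https://files.furffisite.link/blogimg/20240707215654-5c276c9ae1c98340521bf56811f4b765-87c36.jpg" alt="Featured image of post My usual setup process for a new Windows 11 system" /><h2 id="验机">Machine inspection
</h2><p>After finishing the initial Windows setup and reaching the desktop, the first thing to do is inspect the machine, i.e. make sure the laptop's configuration matches what the manufacturer advertised and that it has no hidden defects.</p>
<p>The visual part of the inspection should already be done before powering on: the packaging is intact, the exterior has no scratches, the machine only powers on for the first time when plugged in, and so on.
However, the laptop's configuration cannot be read from its exterior, so once in the system we need software to check the hardware. Such tools include:</p>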
<ul>
<li><a class="link" href="https://www.cpuid.com/softwares/cpu-z.html" target="_blank" rel="noopener"
>CPU-Z</a>: reports the main hardware information (CPU / motherboard / memory / graphics card);</li>
<li><a class="link" href="https://www.aida64.com/downloads" target="_blank" rel="noopener"
>AIDA64</a>: reports all hardware information;</li>
<li><a class="link" href="https://crystalmark.info/en/software/crystaldiskinfo/" target="_blank" rel="noopener"
>CrystalDiskInfo</a>: shows disk information and health status;</li>
<li>...</li>
</ul>
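<p>These utilities are all bundled in two toolboxes well known among DIY enthusiasts, <a class="link" href="https://www.tbtool.cn/" target="_blank" rel="noopener"
>图吧工具箱</a> and <a class="link" href="http://www.kbtool.cn/" target="_blank" rel="noopener"
>卡硬工具箱</a>, so you can also simply download one of those to check the hardware.
On a laptop used for office work these inspection tools serve no purpose afterwards, so I recommend keeping them on a USB drive: when inspecting a new machine, just plug in the drive and run them.</p>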
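<h2 id="清理">Cleanup
</h2><p>On a fresh system, the only things that need cleaning up are the preinstalled software and the startup items:</p>
<ul>
<li>Remove preinstalled software: right-click the Windows logo -&gt; “Installed apps” -&gt; go through the list and uninstall the preinstalled apps you don't need;</li>
<li>Clean up startup items: right-click the Windows logo -&gt; “Task Manager” -&gt; “Startup apps” -&gt; go through the list and disable the startup items you don't need.</li>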
</ul>
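<h2 id="设置-windows">Configuring Windows
</h2><h3 id="硬盘分区">Disk partitioning
</h3><p>Some laptop brands ship with <a class="link" href="https://learn.microsoft.com/zh-cn/windows/security/operating-system-security/data-protection/bitlocker/" target="_blank" rel="noopener"
>BitLocker</a> partition encryption enabled, and popular third-party partition tools such as <a class="link" href="https://www.diskgenius.cn/" target="_blank" rel="noopener"
>DiskGenius</a> do not seem to support BitLocker-encrypted partitions, so here I use the disk management feature built into Windows to manage partitions.
It can be opened directly from the right-click menu of the Windows logo.</p>
<p>In “Disk Management”, right-click -&gt; “Delete Volume” to remove the factory-created partitions that have drive letters (the recovery partitions without drive letters are best left alone), then right-click the Windows partition -&gt; “Extend Volume” or “Shrink Volume” to resize it. The laptop I ordered has a built-in 1 TB SSD, which I split as follows:</p>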
<ul>
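<li>Windows partition (300 GiB): holds the Windows system, applications and development environments;</li>
<li>Data partition (200 GiB): holds study materials, code, photos and other data;</li>
<li>Extra partition (the remaining 451 GiB): holds resources that can be re-downloaded from the internet, such as the Steam game library, pretrained model weights and datasets.</li>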
</ul>
<p>I left no space for Linux because the new laptop has an empty <a class="link" href="https://zh.wikipedia.org/zh-cn/M.2" target="_blank" rel="noopener"
>M.2</a> NVMe slot; I plan to take the SSD out of my old laptop and install it there as the Linux system disk.</p>
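<h3 id="开始菜单">Start menu
</h3><ul>
<li>Clean up the Start menu: click the Windows logo to open the Start menu -&gt; remove the app icons you don't need</li>
<li>Show more items: open Settings -&gt; “Personalization” -&gt; “Start” -&gt; choose “More pins”</li>
<li>Manage the icons to the left of the power button: open Settings -&gt; “Personalization” -&gt; “Start” -&gt; “Folders” -&gt; turn on the icons you want shown to the left of the power button (I chose Settings, File Explorer, Downloads and Personal folder)</li>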
</ul>
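<h3 id="任务栏">Taskbar
</h3><ul>
<li>Left-align the taskbar: open Settings -&gt; “Personalization” -&gt; “Taskbar” -&gt; “Taskbar behaviors” -&gt; “Taskbar alignment” -&gt; “Left”</li>
<li>Turn the search box into a search icon: &hellip; -&gt; “Taskbar” -&gt; “Taskbar items” -&gt; “Search” -&gt; “Search icon only”</li>
<li>Turn off widgets: &hellip; -&gt; “Taskbar items” -&gt; “Widgets” -&gt; set to off</li>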
</ul>
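<h3 id="鼠标指针样式">Mouse pointer style
</h3><p>I personally find the default white pointer unattractive, and on a white background with black text of a similar size to the pointer, it sometimes blends into the text and is hard to spot.</p>
<ul>
<li>Change the mouse pointer style: open Settings -&gt; “Accessibility” -&gt; “Mouse pointer and touch” -&gt; “Mouse pointer style” -&gt; I prefer the fourth option, “Custom” -&gt; pick one of the suggested colors (or choose another color if none of them suits you)</li>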
</ul>
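<h3 id="语言">Language
</h3><ul>
<li>Add an English input method (handy for writing code): open Settings -&gt; “Time &amp; language” -&gt; “Language &amp; region” -&gt; “Add a language” -&gt; search for and select “English (United States)”</li>
<li>Add a Japanese input method (occasionally needed): &hellip; -&gt; “Add a language” -&gt; search for and select “Japanese”</li>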
</ul>
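<h3 id="个人数据存储位置">Personal data storage location
</h3><p>The goal of this step is to move the storage location of some personal data from the system drive to drive D. It only migrates part of the data, though; it cannot do anything about the files in AppData.</p>
<ul>
<li>Open the personal folder -&gt; right-click “Documents / Music / Pictures / Videos / Downloads” -&gt; “Properties” -&gt; “Location” tab -&gt; click “Move” -&gt; choose the target folder on drive D -&gt; “OK”</li>
<li>Open Settings -&gt; “System” -&gt; “Storage” -&gt; “Where new content is saved” -&gt; change the storage location of the middle four items from drive C to drive D</li>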
</ul>
<h2 id="安装软件">Installing software
</h2><h3 id="工具">Tools
</h3><p>This is all I have installed so far; I will add any handy utilities I discover or start using later to this list.</p>
<ul>
<li><a class="link" href="https://github.com/microsoft/PowerToys" target="_blank" rel="noopener"
>PowerToys</a> (<a class="link" href="https://apps.microsoft.com/detail/xp89dcgq3k6vld" target="_blank" rel="noopener"
>Microsoft Store</a>): an open-source collection of Windows utilities;</li>
<li><a class="link" href="https://zh.snipaste.com/" target="_blank" rel="noopener"
>Snipaste</a> (<a class="link" href="https://apps.microsoft.com/detail/9p1wxpkb68kx" target="_blank" rel="noopener"
>Microsoft Store</a>): a screenshot and image-pinning tool;</li>
<li><a class="link" href="https://github.com/0x7c13/Notepads" target="_blank" rel="noopener"
>Notepads App</a> (<a class="link" href="https://apps.microsoft.com/detail/9nhl4nsc67wm" target="_blank" rel="noopener"
>Microsoft Store</a>): a better-looking replacement for Notepad;</li>
<li><a class="link" href="https://potplayer.daum.net/" target="_blank" rel="noopener"
>PotPlayer</a> (<a class="link" href="https://apps.microsoft.com/detail/xp8bsbgqw2dks0" target="_blank" rel="noopener"
>Microsoft Store</a>): a media player;</li>
<li><a class="link" href="https://www.bandisoft.com/bandizip/" target="_blank" rel="noopener"
>Bandizip</a> (<a class="link" href="https://apps.microsoft.com/detail/9p2w3w81sppb" target="_blank" rel="noopener"
>Microsoft Store</a>): a handy archiver;</li>
<li><a class="link" href="https://www.sumatrapdfreader.org/free-pdf-reader" target="_blank" rel="noopener"
>Sumatra PDF</a> (<a class="link" href="https://www.sumatrapdfreader.org/download-free-pdf-viewer" target="_blank" rel="noopener"
>Download</a>): a lightweight PDF reader;</li>
<li><a class="link" href="https://www.zotero.org/" target="_blank" rel="noopener"
>Zotero</a> (<a class="link" href="https://www.zotero.org/download/" target="_blank" rel="noopener"
>Download</a>): a reference manager and reader;</li>
<li><a class="link" href="https://www.mactype.net/" target="_blank" rel="noopener"
>MacType</a>: improved font rendering for Windows.</li>
</ul>
<h3 id="开发环境">Development environment
</h3><p>Most of my development work will still happen on Linux, so installing just these on Windows should be enough.</p>
<ul>
<li><a class="link" href="https://code.visualstudio.com/" target="_blank" rel="noopener"
>Visual Studio Code</a>: code editor;</li>
<li><a class="link" href="https://www.python.org/downloads/" target="_blank" rel="noopener"
>Python</a>: the Python interpreter;
<ul>
<li>Switch pip to a mirror index:
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">pip config <span class="nb">set</span> global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
</span></span></code></pre></td></tr></table>
</div>
</div></li>
</ul>
</li>
<li><a class="link" href="https://scoop.sh/" target="_blank" rel="noopener"
>Scoop</a>: a third-party package manager for Windows;
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-powershell" data-lang="powershell"><span class="line"><span class="cl"><span class="nb">Set-ExecutionPolicy</span> <span class="n">-ExecutionPolicy</span> <span class="n">RemoteSigned</span> <span class="n">-Scope</span> <span class="n">CurrentUser</span>
</span></span><span class="line"><span class="cl"><span class="nb">Invoke-RestMethod</span> <span class="n">-Uri</span> <span class="n">https</span><span class="err">:</span><span class="p">//</span><span class="n">get</span><span class="p">.</span><span class="py">scoop</span><span class="p">.</span><span class="py">sh</span> <span class="p">|</span> <span class="nb">Invoke-Expression</span>
</span></span></code></pre></td></tr></table>
</div>
</div></li>
<li><a class="link" href="https://git-scm.com/" target="_blank" rel="noopener"
>Git</a> + <a class="link" href="https://gohugo.io/" target="_blank" rel="noopener"
>hugo</a>: version control + the tool I write this blog with.
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-powershell" data-lang="powershell"><span class="line"><span class="cl"><span class="n">scoop</span> <span class="n">install</span> <span class="n">git</span> <span class="n">go</span>
</span></span></code></pre></td></tr></table>
</div>
</div></li>
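</ul></description></item><item><title>Migrating an Apple Music library</title><link>https://blog.furffisite.link/p/applemusic-transfer/</link><pubDate>Thu, 23 May 2024 16:01:12 +0800</pubDate><guid>https://blog.furffisite.link/p/applemusic-transfer/</guid><description><img src="https://files.furffisite.link/blogimg/20240523203808-9f4a7d937672d53fbc1a7174037c6f64-581fc.jpg" alt="Featured image of post Migrating an Apple Music library" /><h2 id="背景">Background
</h2><p>When I started university, I figured that the Japanese edition of Apple Music had a larger catalogue and better services overall<sup><a id='ref-cite1-1' href='#cite1'>[1]</a></sup>,
so I subscribed in the Japan region. With the student discount it cost only 580 <strong>yen</strong> a month (about 25 RMB).
Now that graduation is approaching, I have started to doubt whether I can keep this discount as a graduate student. The regular price in Japan is 1080 yen a month (about 50 RMB), which is more than I am willing to pay.
On top of that, gift cards for the Japanese App Store have become hard to find on Taobao lately, and the balance on my Japanese account is running out.
So it is time to switch platforms.</p>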
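<p>However, music platforms in mainland China lean too heavily on social features, which I personally dislike; they feel too noisy<sup><a id='ref-cite2-1' href='#cite2'>[2]</a></sup>.
Apple Music, by comparison, focuses more on the music itself and integrates better with the Apple ecosystem, so in the end I chose to move to Apple Music's mainland China region (the China region for short).</p>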
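<p>I have two Apple accounts, one in the China region and one in the Japan region, so for me “switching regions” means adding every song in the latter's library to the former's.
Since the China region's catalogue is smaller than Japan's, I do not require <strong>everything</strong> to be migrated; copying most of the songs into the target account's library is good enough.</p>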
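<p>The mature tools I could find either did not meet my needs or were paid, so I decided to work out a migration method myself. The method has two steps:</p>
<ol>
<li>Fetch the songs from the source account</li>
<li>Add the songs to the new account</li>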
</ol>
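<p>This requires <strong>some background in programming and web scraping</strong>. This post only describes my approach and workflow, and I have no plans to turn it into general-purpose code; <strong>the scripts here must be adapted to your own situation before they will work</strong>.
The method does not depend on the account's region, so in theory it also applies to migration between other regions and between different accounts in the same region.</p>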
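<h2 id="第一步从源账户获取歌曲">Step 1: Fetch the songs from the source account
</h2><p>After four years my account holds 8441 songs spread across 1820 albums, so migrating album by album makes more sense.</p>
<p>Thanks to the Apple Music web client, finding the relevant API is not hard:</p>
<ol>
<li>Log in to the source account in the <a class="link" href="https://music.apple.com/" target="_blank" rel="noopener"
>Apple Music web client</a> in a browser, open the developer tools, switch to the Network tab and filter for XHR requests.</li>
<li>Click “专辑/Albums” to switch to the albums page; a request beginning with <code>amp-api</code> should then appear in the developer tools: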
<img src="https://files.furffisite.link/blogimg/20240523174431-b5b39700f6d62e42d12f27ec83098955-bb102.png"
loading="lazy"
></li>
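<li>Right-click this request, Copy -&gt; Copy as cURL, then paste it into a terminal to replay the request. It worked for me, which means Apple has not set up any convoluted anti-scraping measures. Nice.</li>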
</ol>
<p>The response of this API is structured roughly as follows:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">&#34;data&#34;</span><span class="p">:</span> <span class="p">[</span><span class="err">...</span><span class="p">],</span>
</span></span><span class="line"><span class="cl"> <span class="nt">&#34;resources&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl"> <span class="nt">&#34;library-albums&#34;</span><span class="p">:</span> <span class="p">{</span><span class="err">...</span><span class="p">},</span>
</span></span><span class="line"><span class="cl"> <span class="nt">&#34;albums&#34;</span> <span class="p">:</span> <span class="p">{</span><span class="err">...</span><span class="p">},</span>
</span></span><span class="line"><span class="cl"> <span class="nt">&#34;library-artists&#34;</span><span class="p">:</span> <span class="p">{</span><span class="err">...</span><span class="p">},</span>
</span></span><span class="line"><span class="cl"> <span class="nt">&#34;artists&#34;</span><span class="p">:</span> <span class="p">{</span><span class="err">...</span><span class="p">}</span>
</span></span><span class="line"><span class="cl"> <span class="p">},</span>
</span></span><span class="line"><span class="cl"> <span class="nt">&#34;meta&#34;</span><span class="p">:</span> <span class="p">{</span><span class="err">...</span><span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>With the API in hand, we can request it in a loop to fetch the complete list:</p>
<ol>
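<li>The offset request parameter specifies the starting position; by changing this value step by step we can fetch the complete album list.</li>
<li>The <code>meta</code> entry at the end of the response JSON contains the total number of albums in the library; for me it is <code>&quot;meta&quot;: {&quot;total&quot;: 1820, ...}</code>, i.e. 1820 entries. The request parameter <code>limit=100</code> means each page holds 100 entries. From these two values, there are 19 pages in total (1820 / 100, rounded up), so 19 requests are needed. I also tried raising <code>limit</code> and found that the API returns an error when <code>limit&gt;100</code>.</li>
<li>Put the curl command copied earlier into a <code>.sh</code> script and wrap it in a loop, for example:</li>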
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">......
</span></span><span class="line"><span class="cl"><span class="k">for</span> i in <span class="o">{</span>0..18<span class="o">}</span><span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="cl"><span class="nv">chunk</span><span class="o">=</span><span class="s2">&#34;</span><span class="si">${</span><span class="nv">i</span><span class="si">}</span><span class="s2">00&#34;</span>
</span></span><span class="line"><span class="cl">curl <span class="s2">&#34;https://amp-api.music.apple.com/.../...&amp;offset=</span><span class="nv">$chunk</span><span class="s2">&amp;...&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span> --compressed <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span> ......
</span></span><span class="line"><span class="cl"> -H <span class="s1">&#39;Sec-Fetch-Site: same-site&#39;</span> <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span> -H <span class="s1">&#39;TE: trailers&#39;</span> <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span> -o <span class="s2">&#34;chunk</span><span class="nv">$chunk</span><span class="s2">.json&#34;</span> <span class="c1"># write the response to a file</span>
</span></span><span class="line"><span class="cl">sleep <span class="m">10</span> <span class="c1"># pause so that rapid-fire requests do not get the account banned</span>
</span></span><span class="line"><span class="cl"><span class="k">done</span>
</span></span></code></pre></td></tr></table>
</div>
</div><ol start="4">
<li>Run the script with bash in a terminal; before long, the information for every album in the library is saved to local JSON files.</li>
</ol>
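<p>The page arithmetic from step 2 can be sketched in Python. This is only an illustration and not part of the original script: <code>page_offsets</code> is a hypothetical helper, and 1820 and 100 are the values quoted above.</p>

```python
import math

def page_offsets(total: int, limit: int) -> list[int]:
    """Return the offset of every page needed to cover `total` items."""
    pages = math.ceil(total / limit)  # 1820 / 100 rounds up to 19 pages
    return [page * limit for page in range(pages)]

offsets = page_offsets(1820, 100)
print(len(offsets), offsets[0], offsets[-1])
```

<p>The last offset, 1800, is the same value the bash loop produces for <code>i=18</code> by appending <code>00</code>.</p>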
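<h2 id="第二步将歌曲添加到新账户">Step 2: add the songs to the new account
</h2><p>As above, start by finding the API that adds an album:</p>
<ol>
<li><strong>Switch to the target account</strong>, open the developer tools, go to the Network tab, and filter for XHR requests.</li>
<li>Pick any album on the home page, add it to the library, and locate the corresponding request in the developer tools: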
<img src="https://files.furffisite.link/blogimg/20240523182750-1598942b4c5c3c5de352b8e1ead90f25-36a7e.png"
loading="lazy"
></li>
<li>As before, copy the curl command into a terminal and try replaying it; it works.</li>
</ol>
<p>Looking at the request URL:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">https://amp-api.music.apple.com/v1/me/library?art[url]=f&amp;format[resources]=map&amp;ids[albums]=1725057905&amp;representation=ids
</span></span></code></pre></td></tr></table>
</div>
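</div><p>The <code>1725057905</code> here corresponds to the name of each property under <code>resources.albums</code> in the JSON files fetched earlier, i.e. each album's ID.
Since the parameter name in the URL is the plural <code>ids</code>, I guessed that several albums might be added at once. I replaced <code>1725057905</code> in the URL with <code>398320584,322934943</code> and resent the request; it succeeded as well, so the guess was right.</p>
<p>With this groundwork the script is easy to write. First use jq to parse the saved response JSON, extract the key of every entry under <code>resources.albums</code>, output the keys separated by commas, and splice them into the URL:</p>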
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="nv">jqs</span><span class="o">=</span><span class="s2">&#34;.resources.albums | keys[</span><span class="si">${</span><span class="nv">section</span><span class="si">}</span><span class="s2">] | [ .[] | tonumber ] | @csv&#34;</span>
</span></span><span class="line"><span class="cl"><span class="nv">ids</span><span class="o">=</span><span class="s2">&#34;</span><span class="k">$(</span>cat chunk<span class="si">${</span><span class="nv">index</span><span class="si">}</span>00.json <span class="p">|</span> jq -r <span class="s2">&#34;</span><span class="nv">$jqs</span><span class="s2">&#34;</span> <span class="k">)</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl"><span class="nv">url</span><span class="o">=</span><span class="s2">&#34;https://amp-api.music.apple.com/.../...&amp;ids%5Balbums%5D=</span><span class="nv">$ids</span><span class="s2">&amp;...&#34;</span>
</span></span></code></pre></td></tr></table>
</div>
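</div><p>Drop this into the earlier curl request and add loops to form the script. Each chunk is sent in sections because a single request that adds 100 albums responds very slowly:</p>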
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span><span class="lnt">17
</span><span class="lnt">18
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">......
</span></span><span class="line"><span class="cl"><span class="k">for</span> index in <span class="o">{</span>0..18<span class="o">}</span><span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="cl"><span class="k">for</span> section in <span class="s1">&#39;:25&#39;</span> <span class="s1">&#39;25:50&#39;</span> <span class="s1">&#39;50:75&#39;</span> <span class="s1">&#39;75:&#39;</span><span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nv">jqs</span><span class="o">=</span><span class="s2">&#34;.resources.albums | keys[</span><span class="si">${</span><span class="nv">section</span><span class="si">}</span><span class="s2">] | [ .[] | tonumber ] | @csv&#34;</span>
</span></span><span class="line"><span class="cl"><span class="nv">ids</span><span class="o">=</span><span class="s2">&#34;</span><span class="k">$(</span>cat chunk<span class="si">${</span><span class="nv">index</span><span class="si">}</span>00.json <span class="p">|</span> jq -r <span class="s2">&#34;</span><span class="nv">$jqs</span><span class="s2">&#34;</span> <span class="k">)</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl"><span class="nv">url</span><span class="o">=</span><span class="s2">&#34;https://amp-api.music.apple.com/v1/me/library?art%5Burl%5D=f&amp;format%5Bresources%5D=map&amp;ids%5Balbums%5D=</span><span class="nv">$ids</span><span class="s2">&amp;representation=ids&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">curl -X POST <span class="s2">&#34;</span><span class="nv">$url</span><span class="s2">&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span> -H <span class="s2">&#34;</span><span class="nv">$ua</span><span class="s2">&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span> -H <span class="s1">&#39;Accept: */*&#39;</span> <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span> ......
</span></span><span class="line"><span class="cl"> -H <span class="s1">&#39;Content-Length: 0&#39;</span> <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span> -H <span class="s1">&#39;TE: trailers&#39;</span>
</span></span><span class="line"><span class="cl"><span class="nb">echo</span>
</span></span><span class="line"><span class="cl">sleep <span class="m">10</span> <span class="c1"># pause so that rapid-fire requests do not get the account banned</span>
</span></span><span class="line"><span class="cl"><span class="k">done</span>
</span></span><span class="line"><span class="cl"><span class="k">done</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>Then run the script and watch the songs from the source account slowly appear in the target account.</p>
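<p>For readers who prefer Python to jq, the batching and URL building can be sketched as follows. This is an illustration under assumptions, not the script I ran: <code>album_id_batches</code> and <code>add_url</code> are hypothetical helpers, and the in-memory <code>chunk</code> stands in for one saved response file.</p>

```python
from urllib.parse import urlencode

BASE = "https://amp-api.music.apple.com/v1/me/library"

def album_id_batches(chunk: dict, size: int = 25) -> list[str]:
    """Split the album IDs of one saved response into comma-joined batches."""
    ids = sorted(chunk["resources"]["albums"])  # jq's `keys` sorts as well
    return [",".join(ids[i:i + size]) for i in range(0, len(ids), size)]

def add_url(ids: str) -> str:
    """Build the add-to-library URL for one batch of IDs."""
    params = {
        "art[url]": "f",
        "format[resources]": "map",
        "ids[albums]": ids,
        "representation": "ids",
    }
    # safe="," keeps the commas between IDs unescaped, as in the bash script
    return BASE + "?" + urlencode(params, safe=",")

# Tiny in-memory stand-in for one saved response (IDs taken from the text)
chunk = {"resources": {"albums": {"398320584": {}, "322934943": {}, "1725057905": {}}}}
for batch in album_id_batches(chunk, size=2):
    print(add_url(batch))
```

<p>The default batch size of 25 mirrors the four sections used in the bash loop; the actual POST request still needs the headers copied from the browser.</p>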
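<h2 id="哪些歌曲没成功迁移">Which songs failed to migrate
</h2><p>Fetch all albums of the target account the same way as in step 1; a small script then shows which songs were not migrated, for example this Python program:</p>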
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt"> 1
</span><span class="lnt"> 2
</span><span class="lnt"> 3
</span><span class="lnt"> 4
</span><span class="lnt"> 5
</span><span class="lnt"> 6
</span><span class="lnt"> 7
</span><span class="lnt"> 8
</span><span class="lnt"> 9
</span><span class="lnt">10
</span><span class="lnt">11
</span><span class="lnt">12
</span><span class="lnt">13
</span><span class="lnt">14
</span><span class="lnt">15
</span><span class="lnt">16
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">json</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">getalbums</span><span class="p">(</span><span class="n">count</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">fn</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">:</span>
</span></span><span class="line"><span class="cl"> <span class="n">albums</span> <span class="o">=</span> <span class="p">{}</span>
</span></span><span class="line"><span class="cl"> <span class="k">for</span> <span class="n">index</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">count</span><span class="p">):</span>
</span></span><span class="line"><span class="cl"> <span class="n">filename</span> <span class="o">=</span> <span class="n">fn</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">index</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"> <span class="n">data</span> <span class="o">=</span> <span class="n">json</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="n">filename</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"> <span class="k">for</span> <span class="n">k</span><span class="p">,</span><span class="n">v</span> <span class="ow">in</span> <span class="n">data</span><span class="p">[</span><span class="s2">&#34;resources&#34;</span><span class="p">][</span><span class="s2">&#34;albums&#34;</span><span class="p">]</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
</span></span><span class="line"><span class="cl"> <span class="n">albums</span><span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">k</span><span class="p">)]</span> <span class="o">=</span> <span class="n">v</span><span class="p">[</span><span class="s2">&#34;attributes&#34;</span><span class="p">][</span><span class="s2">&#34;name&#34;</span><span class="p">]</span> <span class="o">+</span> <span class="s1">&#39; | &#39;</span> <span class="o">+</span> <span class="n">v</span><span class="p">[</span><span class="s2">&#34;attributes&#34;</span><span class="p">][</span><span class="s2">&#34;artistName&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl"> <span class="k">return</span> <span class="n">albums</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">org_album</span> <span class="o">=</span> <span class="n">getalbums</span><span class="p">(</span><span class="mi">19</span><span class="p">,</span> <span class="s2">&#34;chunk</span><span class="si">{}</span><span class="s2">00.json&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">dest_album</span> <span class="o">=</span> <span class="n">getalbums</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span> <span class="s2">&#34;after-chunk</span><span class="si">{}</span><span class="s2">00.json&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">ids</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">org_album</span><span class="p">)</span> <span class="o">-</span> <span class="nb">set</span><span class="p">(</span><span class="n">dest_album</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">ids</span><span class="p">:</span>
</span></span><span class="line"><span class="cl"> <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s2">:&#34;</span><span class="p">,</span><span class="n">org_album</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
</span></span><span class="line"><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">ids</span><span class="p">))</span>
</span></span></code></pre></td></tr></table>
</div>
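</div><p>In total, 338 of my albums failed to migrate, mainly for three reasons:</p>
<ol>
<li>Apple Music's China region has no license for the album;</li>
<li>no identical release exists there, only a different edition;</li>
<li>both the China and Japan regions carry it, but under different IDs, for example albums by RADWIMPS and HOYO-MiX.</li>
</ol>
<p>The latter two can be added back gradually by searching manually; in the first case there is really nothing to be done.</p>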
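<p>Part of the third case could be detected automatically by comparing labels instead of IDs. The sketch below assumes dictionaries shaped like the ones <code>getalbums</code> returns (ID mapped to <code>name | artist</code>) and would need a full album list for the target region, which the scripts above do not fetch. <code>same_title_different_id</code> is a hypothetical helper and the sample IDs and labels are made up. Exact string equality will miss titles that are spelled differently between regions.</p>

```python
def same_title_different_id(missing: dict, dest: dict) -> dict:
    """Map each missing source ID to destination IDs that share its label."""
    by_label = {}
    for album_id, label in dest.items():
        by_label.setdefault(label, []).append(album_id)
    return {album_id: by_label[label]
            for album_id, label in missing.items() if label in by_label}

# Made-up sample data: ID -> "name | artist"
missing = {1: "Album A | Artist X", 2: "Album B | Artist Y"}
dest = {10: "Album A | Artist X", 11: "Album C | Artist Z"}
print(same_title_different_id(missing, dest))
```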
<h2 id="参考资料">References
</h2><style>
.bibliography { display: table; font-size: medium; line-height: normal; }
.bib-item { display: table-row; }
.bib-item > :first-child { display: table-cell; padding-right: .5em; font-weight: bold; text-align: right; }
.bib-item > :last-child { display: table-cell; padding-bottom: .5ex; }
</style>
<div class="bibliography"><div id="cite1" class="bib-item">
<span>[1]</span>
<span><a class="link" href="https://steppark.net/16357088669226.html" target="_blank" rel="noopener"
>订阅 Apple Music 该选哪个区?——中美坡港台日六大地区全对比(第二版) - 向远公园 | Step Park</a><a href="#ref-cite1-1">⤶</a></span>
</div><div id="cite2" class="bib-item">
<span>[2]</span>
<span><a class="link" href="https://sspai.com/post/64477" target="_blank" rel="noopener"
>音乐平台横评:Apple Music、QQ 音乐、网易云、咪咕、Spotify、YouTube Music、Tidal - 少数派</a><a href="#ref-cite2-1">⤶</a></span>
</div></div></description></item><item><title>[Linux] Changing the login-screen lockout time after three wrong passwords</title><link>https://blog.furffisite.link/p/linux-authfail-lock/</link><pubDate>Wed, 20 Mar 2024 13:55:32 +0800</pubDate><guid>https://blog.furffisite.link/p/linux-authfail-lock/</guid><description><img src="https://files.furffisite.link/blogimg/20240321144218-38a3404b3f78e405beab5627f4e45e7b-f5a7b.jpg" alt="Featured image of post [Linux] Changing the login-screen lockout time after three wrong passwords" /><p>TL;DR:</p>
<ol>
<li>Open <code>/etc/security/faillock.conf</code> with root privileges (<code>vim</code> can be swapped for another editor such as <code>nano</code>)</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ sudo vim /etc/security/faillock.conf
</span></span></code></pre></td></tr></table>
</div>
</div><ol start="2">
<li>Uncomment and edit the <code>unlock_time</code> entry; replace 60 with the number of seconds you want (the default is 600)</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="cl"><span class="na">unlock_time</span> <span class="o">=</span> <span class="s">60</span>
</span></span></code></pre></td></tr></table>
</div>
</div><ol start="3">
<li><strong>Or</strong>, set the <code>deny</code> entry to disable lockout on failed attempts</li>
</ol>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="cl"><span class="na">deny</span> <span class="o">=</span> <span class="s">0</span>
</span></span></code></pre></td></tr></table>
</div>
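</div><h2 id="背景">Background
</h2><p>Early this morning, just out of bed and still groggy, I wanted to check on the model I had left training overnight, so I grabbed my laptop, booted it, and typed my password. Half asleep, I got it wrong three times, and this message popped up:</p>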
<center>
<img src="https://files.furffisite.link/blogimg/20240320145104-2443a1678e465bfd578c1d40d23648a7-29fb6.jpg" alt="The account is locked due to 3 failed logins. (10 minutes left to unlock)" style="width:50%;min-width:400px;"/>
</center>
<p>(The login-screen wallpaper was drawn by <a class="link" href="https://www.pixiv.net/users/14541079" target="_blank" rel="noopener"
>さなせ</a>: <a class="link" href="https://www.pixiv.net/artworks/109067533" target="_blank" rel="noopener"
>Silver Wolf</a>.)</p>
<p>Wow! In more than a year of using <a class="link" href="http://endeavouros.com/" target="_blank" rel="noopener"
>EndeavourOS</a>, this is the first time I have seen this message. Still, a 10-minute lock after three wrong attempts is rather harsh; what if I need the machine urgently? How can this setting be changed?</p>
<h2 id="faillock-配置">faillock configuration
</h2><p>Since I did not know which module was responsible for this, the search took a few detours. Along the way I learned that the login screen is provided by a LightDM greeter <sup><a id='ref-cite1-1' href='#cite1'>[1]</a></sup>, login authentication is handled by Pluggable Authentication Modules (PAM) <sup><a id='ref-cite2-1' href='#cite2'>[2]</a></sup><sup><a id='ref-cite3-1' href='#cite3'>[3]</a></sup>, and account lockout is the job of the PAM module <code>pam_faillock.so</code>. The greeter's config file sits in the <code>/etc/lightdm/</code> directory, with a file name that depends on the greeter in use (<code>slick-greeter.conf</code> in my case); the config file of <code>pam_faillock.so</code> is <code>/etc/security/faillock.conf</code><sup><a id='ref-cite4-1' href='#cite4'>[4]</a></sup>.</p>
<p>Open the faillock config file with root privileges in a text editor such as <code>vim</code> or <code>nano</code>:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ sudo vim /etc/security/faillock.conf
</span></span></code></pre></td></tr></table>
</div>
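</div><p>The default config file (which may differ between distributions) already explains every option and what it does; all that is left to do is uncomment the relevant lines and change them to the values we want.</p>
<h3 id="账户锁定相关的配置">Account lockout settings
</h3><p>For a non-shared PC, I think only three options really matter: <code>deny</code>, <code>fail_interval</code> and <code>unlock_time</code>. In one sentence: “When <code>deny</code> is <code>0</code>, the account is never locked no matter how many wrong passwords are entered; otherwise, <code>{deny}</code> consecutive wrong passwords within <code>{fail_interval}</code> seconds lock the account for <code>{unlock_time}</code> seconds.” The defaults are <code>deny=3</code>, <code>fail_interval=900</code> and <code>unlock_time=600</code>.</p>
<p>To guard against brute-force attacks I did not set <code>deny</code> to <code>0</code>; here is my configuration:</p>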
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="cl"><span class="na">deny</span> <span class="o">=</span> <span class="s">3</span>
</span></span><span class="line"><span class="cl"><span class="na">fail_interval</span> <span class="o">=</span> <span class="s">60</span>
</span></span><span class="line"><span class="cl"><span class="na">unlock_time</span> <span class="o">=</span> <span class="s">60</span>
</span></span></code></pre></td></tr></table>
</div>
</div><p>Compared with the default 10 minutes, I find 1 minute a choice that balances security and convenience.</p>
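<p>The one-sentence rule above can be turned into a small Python model to check a configuration on paper. This is a deliberately simplified sketch of that sentence, not the actual logic of <code>pam_faillock</code>; <code>is_locked</code> is a hypothetical function and timestamps are plain seconds.</p>

```python
def is_locked(failures: list[float], now: float, deny: int = 3,
              fail_interval: int = 900, unlock_time: int = 600) -> bool:
    """Simplified model of the quoted rule; NOT pam_faillock's real code."""
    if deny == 0 or not failures:
        return False  # deny = 0 disables lockout entirely
    latest = max(failures)
    # Failures that fall inside the counting window ending at the latest one
    recent = [t for t in failures if fail_interval >= latest - t]
    if deny > len(recent):
        return False
    # Locked until unlock_time seconds have passed since the latest failure
    return unlock_time > now - latest

# Three failures 3 seconds apart, checked with my settings (60 / 60)
print(is_locked([0, 3, 6], now=30, fail_interval=60, unlock_time=60))
print(is_locked([0, 3, 6], now=70, fail_interval=60, unlock_time=60))
```

<p>With these inputs the first check falls inside the lock window and the second falls outside it.</p>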
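<p>Also, faillock stores its failure records in <code>/var/run/faillock</code> by default, and that directory lives on the temporary file system tmpfs; in other words, when an account is locked after repeated wrong passwords, a reboot clears the lock<sup><a id='ref-cite5-1' href='#cite5'>[5]</a></sup>. Pointing the <code>dir</code> option in the config file at a directory on persistent storage (for example <code>/var/lib/faillock</code>) prevents a reboot from clearing it.</p>
<p>I also checked that, by default, remote logins over ssh can trigger the lockout too (this depends on the PAM configuration, of course); while the account is locked, even the correct password returns <code>Permission denied, please try again.</code>.</p>
<h3 id="root-相关的配置">root-related settings
</h3><p>By default, to prevent DoS attacks, faillock does not lock the root account <sup><a id='ref-cite6-1' href='#cite6'>[6]</a></sup>. Enabling <code>even_deny_root</code> lets faillock lock root. <code>root_unlock_time</code> sets a separate lock duration for the root account, and setting it implicitly enables <code>even_deny_root</code>.</p>
<h3 id="日志相关的配置">Logging-related settings
</h3><p>On the command line, running:</p>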
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">$ journalctl --since today -fg pam
</span></span></code></pre></td></tr></table>
</div>
</div><p>shows the PAM-related log entries since the start of today; the <code>faillock</code> command shows the failed-login records the system currently holds, for example:</p>
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span><span class="lnt">5
</span><span class="lnt">6
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl">$ faillock --user furffico
</span></span><span class="line"><span class="cl">furffico:
</span></span><span class="line"><span class="cl">When Type Source Valid
</span></span><span class="line"><span class="cl">2024-03-21 13:16:24 RHOST 192.168.1.101 V
</span></span><span class="line"><span class="cl">2024-03-21 13:16:27 RHOST 192.168.1.101 V
</span></span><span class="line"><span class="cl">2024-03-21 13:16:30 RHOST 192.168.1.101 V
</span></span></code></pre></td></tr></table>
</div>
</div><p>The config file has three logging-related options:</p>
<ul>
<li><code>audit</code>: logs the user name to the system log when the user does not exist.
<div class="highlight"><div class="chroma">
<table class="lntable"><tr><td class="lntd">
<pre tabindex="0" class="chroma"><code><span class="lnt">1
</span><span class="lnt">2
</span><span class="lnt">3
</span><span class="lnt">4
</span></code></pre></td>
<td class="lntd">
<pre tabindex="0" class="chroma"><code class="language-fallback" data-lang="fallback"><span class="line"><span class="cl"># before:
</span></span><span class="line"><span class="cl">sshd[134004]: pam_faillock(sshd:auth): User unknown
</span></span><span class="line"><span class="cl"># after:
</span></span><span class="line"><span class="cl">sshd[136827]: pam_faillock(sshd:auth): User unknown: f
</span></span></code></pre></td></tr></table>
</div>
</div></li>
<li><code>silent</code>: does not print informative messages (I could not find any difference with this enabled).</li>
<li><code>no_log_info</code>: does not write informative messages to the system log.</li>
</ul>
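<p>If the records printed by <code>faillock</code> need further processing, they can be parsed with a few lines of Python. The sketch assumes the column layout of the sample output shown earlier (date, time, type, source, valid flag, with no spaces inside a column); <code>parse_faillock</code> is a hypothetical helper, and other versions of the tool may print a different layout.</p>

```python
from datetime import datetime

SAMPLE = """furffico:
When Type Source Valid
2024-03-21 13:16:24 RHOST 192.168.1.101 V
2024-03-21 13:16:27 RHOST 192.168.1.101 V
2024-03-21 13:16:30 RHOST 192.168.1.101 V
"""

def parse_faillock(output: str) -> list[dict]:
    """Parse record lines; the user line and the header line are skipped."""
    records = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # not a record line
        try:
            when = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # five words, but no timestamp in front
        records.append({"when": when, "type": parts[2],
                        "source": parts[3], "valid": parts[4] == "V"})
    return records

for record in parse_faillock(SAMPLE):
    print(record["when"].isoformat(), record["type"], record["source"])
```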
<h2 id="参考资料">References
</h2><style>
.bibliography { display: table; font-size: medium; line-height: normal; }
.bib-item { display: table-row; }
.bib-item > :first-child { display: table-cell; padding-right: .5em; font-weight: bold; text-align: right; }
.bib-item > :last-child { display: table-cell; padding-bottom: .5ex; }
</style>
<div class="bibliography"><div id="cite1" class="bib-item">
<span>[1]</span>
<span><a class="link" href="https://wiki.archlinux.org/title/LightDM#Greeter" target="_blank" rel="noopener"
>LightDM - ArchWiki</a><a href="#ref-cite1-1">⤶</a></span>
</div><div id="cite2" class="bib-item">
<span>[2]</span>
<span><a class="link" href="https://wiki.archlinux.org/title/PAM" target="_blank" rel="noopener"
>PAM - ArchWiki</a><a href="#ref-cite2-1">⤶</a></span>
</div><div id="cite3" class="bib-item">
<span>[3]</span>
<span><a class="link" href="https://www.redhat.com/sysadmin/pluggable-authentication-modules-pam" target="_blank" rel="noopener"
>An introduction to Pluggable Authentication Modules (PAM) in Linux | Enable Sysadmin</a><a href="#ref-cite3-1">⤶</a></span>
</div><div id="cite4" class="bib-item">
<span>[4]</span>
<span><a class="link" href="https://man.archlinux.org/man/faillock.conf.5" target="_blank" rel="noopener"
>faillock.conf(5) — Arch manual pages</a><a href="#ref-cite4-1">⤶</a></span>
</div><div id="cite5" class="bib-item">
<span>[5]</span>
<span><a class="link" href="https://wiki.archlinux.org/title/Security#Lock_out_user_after_three_failed_login_attempts" target="_blank" rel="noopener"
>Security - ArchWiki</a><a href="#ref-cite5-1">⤶</a></span>
</div><div id="cite6" class="bib-item">
<span>[6]</span>
<span><a class="link" href="https://man.archlinux.org/man/pam_faillock.8.en" target="_blank" rel="noopener"
>pam_faillock(8) — Arch manual pages</a><a href="#ref-cite6-1">⤶</a></span>
</div></div></description></item><item><title>[Paper reading] Unsupervised Learning for Solving the Travelling Salesman Problem</title><link>https://blog.furffisite.link/p/read-papers/utsp/</link><pubDate>Tue, 12 Mar 2024 15:05:23 +0800</pubDate><guid>https://blog.furffisite.link/p/read-papers/utsp/</guid><description><img src="https://files.furffisite.link/blogimg/20240312193735-df25ede3714c69d2ea8b42ba9c39f7c9-755f3.jpg" alt="Featured image of post [Paper reading] Unsupervised Learning for Solving the Travelling Salesman Problem" /><h2 id="论文信息">Paper information
</h2><ul>
<li>Title: Unsupervised Learning for Solving the Travelling Salesman Problem<sup><a id='ref-cite1-1' href='#cite1'>[1]</a></sup></li>
<li>Authors: Yimeng Min, Yiwei Bai, Carla P. Gomes</li>
<li>Venue: NeurIPS 2023</li>
<li>Online: <a class="link" href="https://proceedings.neurips.cc/paper_files/paper/2023/hash/93b8618a9061f8a55825c13ecf28392b-Abstract-Conference.html" target="_blank" rel="noopener"
>https://proceedings.neurips.cc/paper_files/paper/2023/hash/93b8618a9061f8a55825c13ecf28392b-Abstract-Conference.html</a></li>
<li>Code: <a class="link" href="https://github.com/yimengmin/UTSP" target="_blank" rel="noopener"
>https://github.com/yimengmin/UTSP</a></li>
</ul>
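<h2 id="utsp-算法">The UTSP algorithm
</h2><p>The authors argue that an ideal heatmap $\mathcal{H}\in[0,1]^{n\times n}$ should represent the optimal TSP solution, i.e. a Hamiltonian cycle of minimum length. That is:</p>
<ol>
<li>the graph whose adjacency matrix is $\mathcal{H}$ contains exactly one Hamiltonian cycle;</li>
<li>the cycle represented by $\mathcal{H}$ is the shortest, i.e. $$\min_\mathcal{H}\sum^n_{i=1}\sum^n_{j=1} \mathcal{H} _{ij}\cdot d _{ij}$$</li>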
</ol>
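<p>To make the network's output $\mathcal{H}$ satisfy the first condition, the authors design a soft indicator matrix $\mathbb{T}\in[0,1]^{n\times n}$. $\mathbb{T}=[\mathbf p_1|\mathbf p_2|\cdots|\mathbf p_n]$ consists of $n$ column vectors, each of which sums to $1$ ($\sum_{i=1}^n p_{ij} = 1$ for every column $j$). This constraint can be met with a Softmax function or with normalization; the paper uses the former.</p>
<p>The authors then propose the $\mathbb{T}\rightarrow\mathcal{H}$ transformation, which turns the soft indicator $\mathbb{T}$ into a heatmap $\mathcal{H}$ that can be sampled from: $$\mathcal{H} = \sum_{i=1}^{n-1} \mathbf p_i\cdot \mathbf p_{i+1}^T + \mathbf p_n\cdot \mathbf p_1^T$$
The authors also prove that a heatmap built this way contains at least one Hamiltonian cycle, which covers half of the first condition (there is one).</p>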
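<p>To get a feel for the transformation, here is a plain-Python sketch that multiplies each column of $\mathbb{T}$ with the next one, wrapping from the last column back to the first. It is only an illustration written for this note (the authors' implementation is in the repository linked above); <code>heatmap_from_indicator</code> is a hypothetical name, and the example feeds in a hard 0/1 indicator for the tour 0 → 2 → 1 → 0.</p>

```python
def heatmap_from_indicator(T: list[list[float]]) -> list[list[float]]:
    """Sum the outer products of consecutive columns of T, wrapping around."""
    n = len(T)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nxt = (i + 1) % n  # the last column pairs with the first one
        for a in range(n):
            for b in range(n):
                H[a][b] += T[a][i] * T[b][nxt]
    return H

# Column i is the one-hot vector of the city visited at position i:
# position 0 -> city 0, position 1 -> city 2, position 2 -> city 1
T = [[1, 0, 0],
     [0, 0, 1],
     [0, 1, 0]]
for row in heatmap_from_indicator(T):
    print(row)
```

<p>For a hard indicator like this one, $\mathcal{H}$ is exactly the adjacency matrix of the directed tour: the nonzero entries are (0, 2), (2, 1) and (1, 0).</p>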
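<p>For the other half (no more than one), the authors design a loss function that encourages the network to reduce the number of cycles:
$$\mathcal{L}=\lambda _1 \sum^n _{i=1}(\sum^n _{j=1}\mathbb{T} _{ij}-1)^2 + \lambda_2 \sum^n _{i=1}\mathcal{H} _{ii} + \sum^n _{i=1}\sum^n _{j=1} \mathcal{H} _{ij}\cdot d _{ij}$$
The first term encourages every row of $\mathbb T$ to sum to $1$; the second penalizes self-loops in $\mathcal{H}$; the third corresponds to the second condition above, making the weighted sum of all edges in the graph as small as possible. In the ideal case the first two terms are both $0$ and $\mathcal L$ equals the tour length of the optimal TSP solution. With this loss, UTSP pushes the network's output toward the ideal heatmap.</p>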
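<p>The three terms of the loss are easy to evaluate by hand on a tiny instance. The sketch below is a plain-Python illustration, not the authors' training code: <code>utsp_loss</code> is a hypothetical name, the weights 10.0 and 0.1 are arbitrary placeholders for $\lambda_1$ and $\lambda_2$, and the distance matrix is made up.</p>

```python
def utsp_loss(T, H, D, lam1: float, lam2: float) -> float:
    """Row-sum penalty on T + self-loop penalty on H + H-weighted tour length."""
    n = len(T)
    row_term = sum((sum(T[i]) - 1.0) ** 2 for i in range(n))
    self_loop_term = sum(H[i][i] for i in range(n))
    length_term = sum(H[i][j] * D[i][j] for i in range(n) for j in range(n))
    return lam1 * row_term + lam2 * self_loop_term + length_term

# Hard indicator for the tour 0 -> 2 -> 1 -> 0 and its heatmap
T = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
H = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
# Made-up symmetric distances between three cities
D = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
print(utsp_loss(T, H, D, lam1=10.0, lam2=0.1))
```

<p>Here every row of $\mathbb{T}$ sums to 1 and $\mathcal{H}$ has no self-loops, so the loss reduces to the tour length 4 + 5 + 3 = 12.</p>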
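<p>In addition, the paper uses a Scattering Attention GNN (SAG) as the network, narrows the search space with top-k before searching, and uses Heat Map Guided Best-first Local Search as the local search method.</p>
<h2 id="优势与局限性">Strengths and limitations
</h2><p>According to the paper, UTSP's strengths are:</p>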
<ul>
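<li>compared with supervised learning, UTSP needs no labelled data; compared with reinforcement learning, UTSP converges faster (it needs fewer samples);</li>
<li>UTSP computes the loss directly from the heatmap, skipping the sampling that reinforcement learning relies on to obtain rewards;</li>
<li>the architecture guarantees that the output heatmap contains a Hamiltonian cycle;</li>
<li>the network is very lightweight (only two layers and 45k parameters for TSP100);</li>
<li>it shrinks the search space effectively.</li>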
</ul>
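<p>I think UTSP's limitations are:</p>
<ul>
<li>according to the reported experiments, even with its lightweight network UTSP still takes longer than Att-GCRN<sup><a id='ref-cite2-1' href='#cite2'>[2]</a></sup> to generate the heatmap (why?);</li>
<li>the soft indicator is a clever design specific to TSP and only fits problems whose solution is a Hamiltonian cycle, and UTSP needs a loss function carefully designed for the problem. Porting UTSP to other combinatorial optimization problems therefore takes more work than some supervised-learning and reinforcement-learning methods, which can learn the problem's features from labelled data and from rewards respectively.</li>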
</ul>
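<p>A few questions:</p>
<ul>
<li>Before feeding the edge weights into the network, the paper preprocesses them: $w_{ij}=e^{-d_{ij}/\tau}$. Apart from mapping the input into $(0,1]$, what does this achieve compared with feeding $w_{ij}=d_{ij}$ directly?</li>
<li>How much would the experimental results change without the soft indicator?</li>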
</ul>
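<p>For reference, the preprocessing from the first question is a one-liner. The sketch only shows what the mapping does numerically and does not answer the question; <code>edge_weight</code> is a hypothetical name and the distances and values of $\tau$ are placeholders, not the paper's settings.</p>

```python
import math

def edge_weight(d: float, tau: float) -> float:
    """w = exp(-d / tau): distance 0 maps to 1, larger distances approach 0."""
    return math.exp(-d / tau)

for tau in (0.5, 5.0):  # placeholder temperatures
    print(tau, [round(edge_weight(d, tau), 3) for d in (0.0, 0.5, 1.0)])
```

<p>The mapping reverses the ordering (closer cities get larger weights), and a larger $\tau$ flattens the differences between weights.</p>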
<h2 id="相关文献">Related literature
</h2><style>
.bibliography { display: table; font-size: medium; line-height: normal; }
.bib-item { display: table-row; }
.bib-item > :first-child { display: table-cell; padding-right: .5em; font-weight: bold; text-align: right; }
.bib-item > :last-child { display: table-cell; padding-bottom: .5ex; }
</style>
<div class="bibliography"><div id="cite1" class="bib-item">
<span>[1]</span>
<span>Y. Min, Y. Bai, and C. P. Gomes, “Unsupervised Learning for Solving the Travelling Salesman Problem,” in Advances in Neural Information Processing Systems, A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, Eds., Curran Associates, Inc., 2023, pp. 47264–47278. [Online]. Available: <a class="link" href="https://proceedings.neurips.cc/paper_files/paper/2023/file/93b8618a9061f8a55825c13ecf28392b-Paper-Conference.pdf" target="_blank" rel="noopener"
>https://proceedings.neurips.cc/paper_files/paper/2023/file/93b8618a9061f8a55825c13ecf28392b-Paper-Conference.pdf</a><a href="#ref-cite1-1">⤶</a></span>
</div><div id="cite2" class="bib-item">
<span>[2]</span>
<span>Z.-H. Fu, K.-B. Qiu, and H. Zha, “Generalize a Small Pre-trained Model to Arbitrarily Large TSP Instances,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 8, pp. 7474–7482, May 2021, doi: 10.1609/aaai.v35i8.16916.<a href="#ref-cite2-1">⤶</a></span>
</div></div></description></item><item><title>My NAS tinkering journey (4): some upgrades</title><link>https://blog.furffisite.link/p/nas-4/</link><pubDate>Sat, 09 Mar 2024 23:53:36 +0800</pubDate><guid>https://blog.furffisite.link/p/nas-4/</guid><description><img src="https://files.furffisite.link/blogimg/20240310022612-8671164f59f21e5d03de82e41e7073d5-7f2c7.jpg" alt="Featured image of post My NAS tinkering journey (4): some upgrades" /><p>Previously:</p>
<ul>
<li><a class="link" href="https://blog.furffisite.link/p/nas-1" ><em>My NAS tinkering journey (1)</em></a></li>
<li><a class="link" href="https://blog.furffisite.link/p/nas-2" ><em>My NAS tinkering journey (2)</em></a></li>
<li><a class="link" href="https://blog.furffisite.link/p/nas-3" ><em>My NAS tinkering journey (3)</em></a></li>
</ul>
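<p>This post builds on <a class="link" href="https://blog.furffisite.link/p/nas-3" >the self-built NAS from the previous post</a> and upgrades some of its parts; in terms of value for money, the spend does not feel quite worth it.</p>
<h2 id="硬件部分">Hardware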
</h2><table>
<thead>
<tr>