<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>CorpuScript: User Manual and Instructions</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
/* Base Styles */
body {
font-family: 'Roboto', sans-serif;
margin: 0;
padding: 20px;
transition: background-color 0.3s, color 0.3s;
}
h1, h2, h3, h4, h5, h6 {
color: var(--primary-color);
}
h1 {
font-size: 2em;
margin-bottom: 0.5em;
border-bottom: 2px solid var(--secondary-color);
padding-bottom: 0.3em;
}
h2 {
font-size: 1.75em;
margin-top: 1.5em;
margin-bottom: 0.5em;
border-bottom: 1px solid var(--secondary-color);
padding-bottom: 0.3em;
}
h3 {
font-size: 1.5em;
margin-top: 1.2em;
margin-bottom: 0.5em;
}
h4 {
font-size: 1.2em;
margin-top: 1em;
margin-bottom: 0.3em;
}
p {
line-height: 1.6;
margin-bottom: 1em;
}
ul, ol {
margin-left: 20px;
margin-bottom: 1em;
}
code, pre {
background-color: var(--code-bg-color);
color: var(--code-text-color);
padding: 2px 4px;
border-radius: 4px;
font-family: 'Courier New', monospace;
}
pre {
padding: 10px;
overflow-x: auto;
}
a {
color: var(--link-color);
text-decoration: none;
}
a:hover {
text-decoration: underline;
}
hr {
border: none;
border-top: 1px solid var(--secondary-color);
margin: 2em 0;
}
/* Light Theme */
body.light {
--background-color: #F0F0F0;
--text-color: #000000;
--primary-color: #518FBC;
--secondary-color: #325F84;
--code-bg-color: #E1E1E1;
--code-text-color: #000000;
--link-color: #1A73E8;
}
/* Dark Theme */
body.dark {
--background-color: #1E1E1E;
--text-color: #FFFFFF;
--primary-color: #518FBC;
--secondary-color: #3F3F3F;
--code-bg-color: #2D2D2D;
--code-text-color: #FFFFFF;
--link-color: #8AB4F8;
}
/* Apply Theme Variables */
body {
background-color: var(--background-color);
color: var(--text-color);
}
/* Responsive Table of Contents */
.toc {
background-color: var(--code-bg-color);
padding: 15px;
border-radius: 8px;
margin-bottom: 2em;
}
.toc h2 {
margin-top: 0;
}
.toc ul {
list-style: none;
padding-left: 0;
}
.toc li {
margin-bottom: 0.5em;
}
.toc a {
color: var(--link-color);
}
/* Buttons */
.button {
display: inline-block;
padding: 8px 16px;
background-color: var(--primary-color);
color: var(--text-color);
border: none;
border-radius: 4px;
text-decoration: none;
margin-top: 10px;
}
.button:hover {
background-color: var(--secondary-color);
}
/* Code Blocks */
pre {
background-color: var(--code-bg-color);
color: var(--code-text-color);
padding: 15px;
border-radius: 5px;
overflow-x: auto;
}
/* Mobile Responsiveness */
@media (max-width: 600px) {
body {
padding: 10px;
}
h1 {
font-size: 1.5em;
}
h2 {
font-size: 1.3em;
}
h3 {
font-size: 1.1em;
}
h4 {
font-size: 1em;
}
}
</style>
</head>
<body class="light">
<h1>CorpuScript: User Manual and Instructions</h1>
<h2>Introduction</h2>
<p>Welcome to <strong>CorpuScript</strong>, a tool for preprocessing text files for corpus compilation. Whether you're a student, researcher, or language professional, CorpuScript helps you clean and prepare textual data consistently across your entire corpus. This guide walks through its key functions with step-by-step instructions.</p>
<hr>
<div class="toc">
<h2>Table of Contents</h2>
<ol>
<li><a href="#1-loading-files">Loading Files</a></li>
<li><a href="#2-configuring-preprocessing-options">Configuring Preprocessing Options</a>
<ul>
<li><a href="#21-accessing-the-processing-parameters-dialog">Accessing the Processing Parameters Dialog</a></li>
<li><a href="#22-general-tab">General Tab</a></li>
<li><a href="#23-advanced-tab">Advanced Tab</a></li>
<li><a href="#24-applying-the-preprocessing-parameters">Applying the Preprocessing Parameters</a></li>
<li><a href="#25-example-configuring-preprocessing-for-a-specific-task">Example: Configuring Preprocessing for a Specific Task</a></li>
<li><a href="#26-best-practices-for-configuring-preprocessing-parameters">Best Practices for Configuring Preprocessing Parameters</a></li>
</ul>
</li>
<li><a href="#3-processing-files">Processing Files</a></li>
<li><a href="#4-viewing-and-saving-results">Viewing and Saving Results</a></li>
<li><a href="#5-troubleshooting">Troubleshooting</a></li>
<li><a href="#6-additional-features">Additional Features</a></li>
<li><a href="#7-conclusion">Conclusion</a></li>
</ol>
</div>
<hr>
<h2 id="1-loading-files">1. Loading Files</h2>
<p>CorpuScript offers flexible options for loading your text data, whether you’re working with individual files or entire directories. This section guides you through the process of importing your <code>.txt</code> files into the application.</p>
<h3 id="11-loading-individual-files">1.1. Loading Individual Files</h3>
<ol>
<li><strong>Access the Open Files Dialog:</strong>
<ul>
<li><strong>Via Menu:</strong> Click on <code>File > Open Files</code> in the menu bar.</li>
<li><strong>Via Toolbar:</strong> Click the <strong>"Open Files"</strong> icon in the toolbar.</li>
</ul>
</li>
<li><strong>Select Files:</strong>
<ul>
<li>In the dialog that appears, navigate to the location of your <code>.txt</code> files.</li>
<li>Select one or more <code>.txt</code> files by holding the <code>Ctrl</code> key (or <code>Cmd</code> on Mac) and clicking on each file.</li>
</ul>
</li>
<li><strong>Add to File List:</strong>
<ul>
<li>Click <strong>"Open"</strong>.</li>
<li>The selected files will now appear in the <strong>"Selected Files"</strong> list on the left panel of the main window.</li>
</ul>
</li>
</ol>
<h3 id="12-loading-an-entire-directory">1.2. Loading an Entire Directory</h3>
<ol>
<li><strong>Access the Open Directory Dialog:</strong>
<ul>
<li><strong>Via Menu:</strong> Click on <code>File > Open Directory</code>.</li>
<li><strong>Via Toolbar:</strong> Click the <strong>"Open Directory"</strong> icon in the toolbar.</li>
</ul>
</li>
<li><strong>Select Directory:</strong>
<ul>
<li>In the dialog that appears, navigate to the directory containing your <code>.txt</code> files.</li>
</ul>
</li>
<li><strong>Add to File List:</strong>
<ul>
<li>Click <strong>"Select Folder"</strong>.</li>
<li>All <code>.txt</code> files within the chosen directory and its subdirectories will be added to the <strong>"Selected Files"</strong> list.</li>
</ul>
</li>
</ol>
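<p>The directory option above can be pictured with a short Python sketch. It is only an illustration of a recursive <code>.txt</code> search, not CorpuScript's own code; the function name <code>collect_txt_files</code> is hypothetical.</p>

```python
from pathlib import Path


def collect_txt_files(directory):
    """Return every .txt file under `directory`, including subdirectories."""
    root = Path(directory)
    # rglob walks the whole tree; sorting gives a stable, repeatable order.
    return sorted(p for p in root.rglob("*.txt") if p.is_file())
```

<p>Calling <code>collect_txt_files("my_corpus")</code> would return a list of paths comparable to what appears in the <strong>"Selected Files"</strong> list.</p>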
<p><strong>Note:</strong> CorpuScript supports batch processing, allowing you to handle large volumes of text files efficiently.</p>
<hr>
<h2 id="2-configuring-preprocessing-options">2. Configuring Preprocessing Options</h2>
<p>CorpuScript provides a set of customizable preprocessing options, allowing you to tailor the cleaning process to your project's requirements. These options are accessible through the <strong>Processing Parameters</strong> dialog, which is divided into two tabs: <strong>General</strong> and <strong>Advanced</strong>.</p>
<h3 id="21-accessing-the-processing-parameters-dialog">2.1. Accessing the Processing Parameters Dialog</h3>
<p>You can access the <strong>Processing Parameters</strong> dialog through two convenient methods:</p>
<ol>
<li><strong>Via Menu:</strong>
<ul>
<li>Navigate to <code>Settings > Processing Parameters</code> in the menu bar.</li>
</ul>
</li>
<li><strong>Via Toolbar:</strong>
<ul>
<li>Click the <strong>gear icon</strong> located in the toolbar for quick access.</li>
</ul>
</li>
</ol>
<hr>
<h3 id="22-general-tab">2.2. General Tab</h3>
<p>The <strong>General</strong> tab contains a series of checkboxes that enable or disable specific preprocessing tasks. Each option serves a distinct purpose in cleaning and preparing your text data.</p>
<h4 id="221-remove-line-breaks">2.2.1. Remove Line Breaks</h4>
<ul>
<li><strong>Description:</strong> Eliminates all newline characters (<code>\n</code>) from the text, converting multi-line text into a single continuous line.</li>
<li><strong>Use Case:</strong> Ideal for preparing text data that should not contain any line breaks, such as when analyzing continuous narratives or preparing data for models that require uninterrupted text streams.</li>
<li><strong>How to Use:</strong> Simply check the box labeled <strong>"Remove Line Breaks"</strong>. The preprocessing pipeline will automatically remove all line breaks during processing.</li>
</ul>
<h4 id="222-lowercase-conversion">2.2.2. Lowercase Conversion</h4>
<ul>
<li><strong>Description:</strong> Transforms all characters in the text to lowercase, ensuring uniformity in case-sensitive analyses.</li>
<li><strong>Use Case:</strong> Essential for tasks like tokenization, frequency analysis, and other NLP processes where case distinctions are irrelevant or may introduce inconsistencies.</li>
<li><strong>How to Use:</strong> Check the box labeled <strong>"Lowercase Conversion"</strong>. All text will be converted to lowercase before further processing.</li>
</ul>
<h4 id="223-whitespace-normalization">2.2.3. Whitespace Normalization</h4>
<ul>
<li><strong>Description:</strong> Standardizes whitespace by removing redundant spaces, tabs, and other whitespace characters, ensuring consistent spacing throughout the text.</li>
<li><strong>Use Case:</strong> Prevents issues related to inconsistent spacing, which can affect the accuracy of text analysis and processing algorithms.</li>
<li><strong>How to Use:</strong> Enable this option by checking the <strong>"Whitespace Normalization"</strong> box. The pipeline will clean up excess whitespace automatically.</li>
</ul>
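<p>The three basic options above (line breaks, lowercase, whitespace) can be sketched in Python as follows. This is a minimal illustration, not the application's implementation; the function names are hypothetical, and replacing each line break with a space (rather than deleting it outright) is an assumption made here so that words on adjacent lines do not run together.</p>

```python
import re


def remove_line_breaks(text):
    """Join lines into one; each run of line breaks becomes a single space."""
    return re.sub(r"[\r\n]+", " ", text)


def to_lowercase(text):
    """Convert every character to lowercase."""
    return text.lower()


def normalize_whitespace(text):
    """Collapse runs of spaces, tabs, and other whitespace into one space."""
    return re.sub(r"\s+", " ", text).strip()
```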
<h4 id="224-stopword-removal">2.2.4. Stopword Removal</h4>
<ul>
<li><strong>Description:</strong> Removes common, non-informative words (stopwords) such as "the", "and", and "is" from the text.</li>
<li><strong>Use Case:</strong> Reduces noise in text data, enhancing the performance of algorithms by focusing on meaningful words that contribute more significantly to the analysis.</li>
<li><strong>How to Use:</strong> Check the box labeled <strong>"Stopword Removal"</strong>. The preprocessing pipeline will automatically remove stopwords during processing.</li>
<li><strong>List of Stopwords Removed by spaCy:</strong>
<pre><code>
STOP_WORDS = {
"a", "about", "above", "after", "again", "against", "all", "am", "an",
"and", "any", "are", "aren't", "as", "at", "be", "because", "been",
"before", "being", "below", "between", "both", "but", "by", "can't",
"cannot", "could", "couldn't", "did", "didn't", "do", "does", "doesn't",
"doing", "don't", "down", "during", "each", "few", "for", "from",
"further", "had", "hadn't", "has", "hasn't", "have", "haven't", "having",
"he", "he'd", "he'll", "he's", "her", "here", "here's", "hers",
"herself", "him", "himself", "his", "how", "how's", "i", "i'd", "i'll",
"i'm", "i've", "if", "in", "into", "is", "isn't", "it", "it's", "its",
"itself", "let's", "me", "more", "most", "mustn't", "my", "myself",
"no", "nor", "not", "of", "off", "on", "once", "only", "or", "other",
"ought", "our", "ours", "ourselves", "out", "over", "own", "same",
"shan't", "she", "she'd", "she'll", "she's", "should", "shouldn't",
"so", "some", "such", "than", "that", "that's", "the", "their", "theirs",
"them", "themselves", "then", "there", "there's", "these", "they",
"they'd", "they'll", "they're", "they've", "this", "those", "through",
"to", "too", "under", "until", "up", "very", "was", "wasn't", "we",
"we'd", "we'll", "we're", "we've", "were", "weren't", "what", "what's",
"when", "when's", "where", "where's", "which", "while", "who", "who's",
"whom", "why", "why's", "with", "won't", "would", "wouldn't", "you",
"you'd", "you'll", "you're", "you've", "your", "yours", "yourself",
"yourselves"
}
</code></pre>
</li>
</ul>
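<p>As a rough sketch of what stopword filtering does, the Python snippet below drops tokens found in a stopword set. It deliberately uses a small hand-picked subset of the list above and plain whitespace splitting, so it runs without spaCy; the function name and the punctuation handling are assumptions for illustration only.</p>

```python
# Small illustrative subset of the list above; not the full set.
STOP_WORDS = {"the", "and", "is", "a", "of", "to", "on"}


def remove_stopwords(text, stop_words=STOP_WORDS):
    """Drop whitespace-separated tokens whose lowercase form is a stopword."""
    kept = []
    for token in text.split():
        # Ignore surrounding punctuation for the lookup; keep the token as typed.
        core = token.strip(".,;:!?\"()").lower()
        if core not in stop_words:
            kept.append(token)
    return " ".join(kept)
```

<p>With this subset, <code>remove_stopwords("The cat is on the mat.")</code> keeps only <code>cat</code> and <code>mat.</code>.</p>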
<h4 id="225-strip-html-tags">2.2.5. Strip HTML Tags</h4>
<ul>
<li><strong>Description:</strong> Removes all HTML tags from the text, extracting plain text content from web-scraped or marked-up documents.</li>
<li><strong>Use Case:</strong> Crucial when dealing with text data sourced from websites or any documents containing HTML markup, ensuring only the textual content is processed.</li>
<li><strong>How to Use:</strong> Enable this option by checking the <strong>"Strip HTML Tags"</strong> box. All HTML tags will be stripped during preprocessing.</li>
</ul>
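<p>One way to strip markup, shown here as a hedged sketch, is to run the text through Python's standard <code>html.parser</code> module and keep only the text nodes. Whether CorpuScript works this way is not stated in this manual; the class and function names below are hypothetical.</p>

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of a document, ignoring the tags."""

    def __init__(self):
        # convert_charrefs=True turns entities into the characters they stand for.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html_tags(markup):
    """Return the text content of `markup` with all tags removed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)
```

<p>Note that this sketch keeps the contents of script and style elements as text; a fuller version would skip them.</p>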
<h4 id="226-remove-diacritics">2.2.6. Remove Diacritics</h4>
<ul>
<li><strong>Description:</strong> Strips away diacritical marks (accents) from characters, converting them to their base forms (e.g., "é" to "e").</li>
<li><strong>Use Case:</strong> Useful for simplifying text for analysis, especially in languages with diacritics, to ensure uniformity and prevent mismatches in text processing tasks.</li>
<li><strong>How to Use:</strong> Check the box labeled <strong>"Remove Diacritics"</strong>. Characters with diacritics will be converted to their non-diacritic forms during preprocessing.</li>
</ul>
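<p>A common way to remove diacritics, sketched below with Python's standard <code>unicodedata</code> module, is to decompose each character and discard the combining marks. The function name is hypothetical and this is not necessarily how CorpuScript does it.</p>

```python
import unicodedata


def remove_diacritics(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left so the output is in a composed form again.
    return unicodedata.normalize("NFC", stripped)
```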
<h4 id="227-remove-greek-letters">2.2.7. Remove Greek Letters</h4>
<ul>
<li><strong>Description:</strong> Filters out Greek script characters from the text.</li>
<li><strong>Use Case:</strong> Necessary when processing text data that should exclude Greek characters, possibly to focus on specific scripts or languages.</li>
<li><strong>How to Use:</strong> Enable this option by checking the <strong>"Remove Greek Letters"</strong> box. All Greek letters will be removed during preprocessing.</li>
</ul>
<h4 id="228-remove-cyrillic-script">2.2.8. Remove Cyrillic Script</h4>
<ul>
<li><strong>Description:</strong> Filters out Cyrillic script characters from the text.</li>
<li><strong>Use Case:</strong> Similar to removing Greek letters, this is useful when the corpus should exclude Cyrillic script, focusing on other scripts or languages.</li>
<li><strong>How to Use:</strong> Check the box labeled <strong>"Remove Cyrillic Script"</strong> to enable the removal of Cyrillic characters during preprocessing.</li>
</ul>
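<p>Script filtering of this kind is usually done by Unicode range. The Python sketch below covers the Greek and Cyrillic options above; the exact ranges CorpuScript uses are not documented here, so the blocks chosen (Greek and Coptic, Greek Extended, Cyrillic, Cyrillic Supplement) and the function names are assumptions.</p>

```python
import re

# Unicode blocks: Greek and Coptic (U+0370-U+03FF), Greek Extended (U+1F00-U+1FFF).
GREEK = re.compile(r"[\u0370-\u03FF\u1F00-\u1FFF]")
# Unicode blocks: Cyrillic (U+0400-U+04FF), Cyrillic Supplement (U+0500-U+052F).
CYRILLIC = re.compile(r"[\u0400-\u052F]")


def remove_greek(text):
    """Delete every character in the Greek ranges."""
    return GREEK.sub("", text)


def remove_cyrillic(text):
    """Delete every character in the Cyrillic ranges."""
    return CYRILLIC.sub("", text)
```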
<h4 id="229-remove-superscript-and-subscript-characters">2.2.9. Remove Superscript and Subscript Characters</h4>
<ul>
<li><strong>Description:</strong> Removes superscript and subscript characters, which are typically used for annotations, mathematical expressions, or specialized formatting.</li>
<li><strong>Use Case:</strong> Cleans text data by removing characters that may not be relevant to linguistic analysis or could interfere with text processing algorithms.</li>
<li><strong>How to Use:</strong> Enable this option by checking the <strong>"Remove Superscript and Subscript Characters"</strong> box. These characters will be filtered out during preprocessing.</li>
</ul>
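<p>For illustration, superscript and subscript characters can be matched by code point. The sketch below assumes the relevant characters are those in the Unicode "Superscripts and Subscripts" block plus the three Latin-1 superscript digits; the character set CorpuScript actually targets may differ, and the function name is hypothetical.</p>

```python
import re

# U+00B2, U+00B3, U+00B9: superscript 2, 3, 1 (Latin-1 Supplement).
# U+2070-U+209F: the Superscripts and Subscripts block.
SUPER_SUB = re.compile(r"[\u00B2\u00B3\u00B9\u2070-\u209F]")


def remove_super_sub(text):
    """Delete superscript and subscript characters."""
    return SUPER_SUB.sub("", text)
```

<p>If <strong>Normalize Unicode</strong> is also applied, keep in mind that NFKC turns such characters into ordinary digits and letters, so the outcome depends on which step runs first.</p>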
<h4 id="2210-normalize-unicode">2.2.10. Normalize Unicode</h4>
<ul>
<li><strong>Description:</strong> Normalizes the text to a standard Unicode form (specifically NFKC), ensuring consistent character representation across the dataset. Characters that have several equivalent encodings, such as precomposed versus combining-accent forms, are mapped to a single representation.</li>
<li><strong>Use Case:</strong> Essential for maintaining data integrity, especially when dealing with text from multiple sources or languages. Normalization prevents issues such as duplicate representations of the same character, which can adversely affect text analysis and processing tasks.</li>
<li><strong>How to Use:</strong> Check the box labeled <strong>"Normalize Unicode"</strong>. The text will undergo Unicode normalization to NFKC form during preprocessing, so that equivalent characters share a single representation.</li>
<li><strong>Technical Details:</strong>
<ul>
<li><strong>Normalization Form:</strong> NFKC (Normalization Form KC) applies compatibility decomposition followed by canonical composition: it composes characters and also replaces compatibility characters, such as ligatures and full-width forms, with their ordinary equivalents.</li>
<li><strong>Example:</strong> The ligature "ﬁ" (U+FB01) will be converted to the two letters "fi", and full-width characters will be converted to their standard-width counterparts.</li>
</ul>
</li>
</ul>
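<p>NFKC normalization is available in Python's standard library, so the behavior described above can be tried directly. The wrapper name <code>normalize_unicode</code> is hypothetical; <code>unicodedata.normalize</code> is the standard-library call.</p>

```python
import unicodedata


def normalize_unicode(text):
    """Return `text` in Unicode Normalization Form KC."""
    return unicodedata.normalize("NFKC", text)
```

<p>For example, the single-character ligature U+FB01 followed by "le" becomes the four letters "file", and a letter followed by a combining accent is composed into one precomposed character.</p>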
<h4 id="2211-lemmatization">2.2.11. Lemmatization</h4>
<ul>
<li><strong>Description:</strong> Reduces words to their base or dictionary form (lemmas), aiding in linguistic analysis by grouping together different forms of a word.</li>
<li><strong>Use Case:</strong> Improves the accuracy of analyses such as frequency counts, sentiment analysis, and topic modeling by treating different grammatical forms of a word as a single entity.</li>
<li><strong>How to Use:</strong> Check the box labeled <strong>"Lemmatization"</strong>. The preprocessing pipeline will automatically lemmatize all words during processing.</li>
</ul>
<h4 id="2212-sentence-tokenization">2.2.12. Sentence Tokenization</h4>
<ul>
<li><strong>Description:</strong> Splits the text into individual sentences, facilitating sentence-level analyses and processing.</li>
<li><strong>Use Case:</strong> Essential for tasks such as sentiment analysis, syntactic parsing, and any application requiring sentence boundaries.</li>
<li><strong>How to Use:</strong> Enable this option by checking the <strong>"Sentence Tokenization"</strong> box. The text will be divided into sentences during preprocessing.</li>
</ul>
<h4 id="2213-word-tokenization">2.2.13. Word Tokenization</h4>
<ul>
<li><strong>Description:</strong> Divides the text into individual words or tokens, essential for word-level analysis and processing.</li>
<li><strong>Use Case:</strong> Fundamental for most Natural Language Processing (NLP) tasks, including frequency analysis, machine learning models, and more.</li>
<li><strong>How to Use:</strong> Check the box labeled <strong>"Word Tokenization"</strong>. The text will be split into words during preprocessing.</li>
</ul>
<h4 id="2214-remove-bibliographical-references">2.2.14. Remove Bibliographical References</h4>
<ul>
<li><strong>Description:</strong> Automatically identifies and removes in-text bibliographical references (e.g., citations like <code>(Smith, 2020)</code>), cleaning up the text for analysis.</li>
<li><strong>Use Case:</strong> Essential for academic texts and research papers where citations can interfere with textual analysis by introducing non-content elements.</li>
<li><strong>How to Use:</strong> Enable this option by checking the <strong>"Remove Bibliographical References"</strong> box. The preprocessing pipeline will remove all bibliographical references during processing.</li>
<li><strong>Patterns Matched:</strong>
<ul>
<li>Bibliographical references typically follow patterns like <code>(Author, Year)</code>, <code>[1]</code>, or other citation formats.</li>
<li>The module uses regular expressions to match and remove these patterns. For example:
<ul>
<li><code>\(\w+, \d{4}\)</code>: Matches citations like <code>(Smith, 2020)</code>.</li>
<li><code>\[\d+\]</code>: Matches numerical citations like <code>[1]</code>.</li>
<li>Additional patterns can be customized based on the citation style used in the text.</li>
</ul>
</li>
</ul>
</li>
</ul>
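<p>The two example patterns above can be exercised with Python's <code>re</code> module. This is a sketch built only from those two patterns; the tidy-up of leftover spaces and the function name are additions made here for illustration, not documented CorpuScript behavior.</p>

```python
import re

CITATION_PATTERNS = [
    re.compile(r"\(\w+, \d{4}\)"),  # (Smith, 2020)
    re.compile(r"\[\d+\]"),         # [1]
]


def remove_citations(text):
    """Remove in-text citations matching the patterns above."""
    for pattern in CITATION_PATTERNS:
        text = pattern.sub("", text)
    # Tidy the double spaces and space-before-punctuation left behind.
    text = re.sub(r" {2,}", " ", text)
    return re.sub(r" ([.,;:])", r"\1", text)
```

<p>Because <code>\w+</code> matches a single word, a citation with several authors, such as <code>(Smith and Jones, 2020)</code>, is not matched by the first pattern as written.</p>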
<h4 id="2215-remove-page-numbers">2.2.15. Remove Page Numbers</h4>
<ul>
<li><strong>Description:</strong> Detects and removes standalone page numbers that appear isolated on their own lines within the text.</li>
<li><strong>Use Case:</strong> Useful for cleaning up documents that include page numbers inserted manually or automatically, ensuring they do not interfere with text analysis.</li>
<li><strong>How to Use:</strong> Enable this option by checking the <strong>"Remove Page Numbers"</strong> box. The preprocessing pipeline will identify and remove page numbers during processing.</li>
<li><strong>Patterns Matched:</strong>
<ul>
<li>Page numbers typically consist of digits and may be located at the top or bottom of a page.</li>
<li>The module uses regular expressions to match these patterns, such as:
<ul>
<li><code>^\d+$</code>: Matches lines that contain only digits.</li>
<li><code>^\s*\d+\s*$</code>: Matches lines that contain digits possibly surrounded by whitespace.</li>
<li><code>Page\s*\d+</code>: Matches page labels such as "Page 1" or "Page 2".</li>
</ul>
</li>
<li>The first two patterns are anchored to whole lines, so digits inside running text are left untouched. Note that any line consisting solely of a number is treated as a page number.</li>
</ul>
</li>
</ul>
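<p>A line-anchored version of these patterns might look like the Python sketch below. It is an assumption-laden illustration: the patterns are merged into one expression, the "Page N" label is only removed when it sits on its own line (stricter than the third pattern above), and <code>[ \t]</code> replaces <code>\s</code> so that a match cannot spill across line boundaries in multiline mode.</p>

```python
import re

# A line holding only digits, or only a "Page N" label, optionally padded
# with spaces or tabs. The trailing \n? removes the emptied line as well.
PAGE_LINE = re.compile(r"^[ \t]*(?:Page[ \t]*)?\d+[ \t]*$\n?", re.MULTILINE)


def remove_page_numbers(text):
    """Delete lines that consist solely of a page number."""
    return PAGE_LINE.sub("", text)
```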
<hr>
<h3 id="23-advanced-tab">2.3. Advanced Tab</h3>
<p>The <strong>Advanced</strong> tab offers finer control over preprocessing: you can define custom patterns and specify additional characters to remove. This is useful for specialized cleaning requirements that the general options do not cover.</p>
<h4 id="231-custom-regex-filtering">2.3.1. Custom Regex Filtering</h4>
<ul>
<li><strong>Description:</strong> Allows users to define custom regular expressions (regex) to perform advanced text filtering and extraction based on specific patterns.</li>
<li><strong>Use Case:</strong> Enables complex text manipulation tasks, such as extracting specific patterns, removing certain phrases, or any task that requires pattern-based processing not covered by the general options.</li>
<li><strong>How to Use:</strong>
<ol>
<li><strong>Open the Advanced Pattern Builder:</strong>
<ul>
<li>Click the <strong>"Set Pattern"</strong> button within the <strong>Advanced</strong> tab. This opens the <strong>Advanced Pattern Builder</strong> wizard.</li>
</ul>
</li>
<li><strong>Define Your Patterns:</strong>
<ul>
<li><strong>Add a New Pattern:</strong>
<ul>
<li>Click the <strong>"Add Pattern"</strong> button to create a new pattern entry.</li>
</ul>
</li>
<li><strong>Specify Conditions:</strong>
<ul>
<li><strong>Start Condition:</strong> Define the starting point of the pattern you want to match.</li>
<li><strong>End Condition Type:</strong> Choose how the pattern should end. Options include:
<ul>
<li><strong>Single Number:</strong> Ends after a single numeric digit.</li>
<li><strong>Multiple Numbers:</strong> Ends after a specified number of numeric digits.</li>
<li><strong>Specific Word:</strong> Ends when a particular word is encountered.</li>
</ul>
</li>
<li><strong>End Condition:</strong> Specify the exact end condition based on the selected type.</li>
<li><strong>Number Length:</strong> If using <strong>Multiple Numbers</strong>, define the exact number of digits.</li>
</ul>
</li>
<li><strong>Configure Additional Settings:</strong>
<ul>
<li><strong>Case Sensitivity:</strong> Choose whether the pattern matching should be case-sensitive.</li>
<li><strong>Whole Word Matching:</strong> Decide if the pattern should match whole words only.</li>
</ul>
</li>
</ul>
</li>
<li><strong>Test Your Pattern:</strong>
<ul>
<li>Enter sample text in the <strong>Test Input</strong> section to see how your pattern matches and affects the text.</li>
<li>Adjust the pattern as necessary based on the test results.</li>
</ul>
</li>
<li><strong>Save the Pattern:</strong>
<ul>
<li>Once satisfied, save the pattern. It will be applied during preprocessing.</li>
</ul>
</li>
</ol>
</li>
</ul>
<h5 id="example-remove-all-urls">Example: Remove All URLs</h5>
<ul>
<li><strong>Objective:</strong> Remove all URLs from the text.</li>
<li><strong>Pattern Definition:</strong>
<ul>
<li><strong>Start Condition:</strong> <code>http</code></li>
<li><strong>End Condition Type:</strong> <strong>Specific Word</strong></li>
<li><strong>End Condition:</strong> Space character (<code>\s</code>) or end of string</li>
</ul>
</li>
<li><strong>Outcome:</strong> This pattern will match and remove any URL starting with <code>http</code> and ending before a space or the end of the text.</li>
</ul>
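<p>Expressed as a plain regular expression, the pattern in this example amounts to "<code>http</code> followed by everything up to the next whitespace or the end of the text". The Python sketch below shows that reading; how the Advanced Pattern Builder translates its settings internally is not documented here, so treat the expression as an assumption.</p>

```python
import re

# "http" followed by any run of non-whitespace characters.
URL = re.compile(r"http\S*")


def remove_urls(text):
    """Delete everything from 'http' up to the next whitespace."""
    return URL.sub("", text)
```

<p>Two side effects are worth checking in the <strong>Test Input</strong> section: punctuation attached to the end of a URL is removed with it, and any word that merely begins with <code>http</code> is matched as well.</p>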
<h4 id="232-select-characters-to-remove">2.3.2. Select Characters to Remove</h4>
<ul>
<li><strong>Description:</strong> Provides a dialog for selecting specific characters or sequences of characters to remove from the text, offering precise control over unwanted symbols or patterns.</li>
<li><strong>Use Case:</strong> Useful for eliminating specific symbols, emojis, or any other characters that are not handled by other preprocessing options, ensuring that only relevant text data remains.</li>
<li><strong>How to Use:</strong>
<ol>
<li><strong>Open the Character Selection Dialog:</strong>
<ul>
<li>Click the <strong>"Select Characters to Remove"</strong> button within the <strong>Advanced</strong> tab. This opens the <strong>Character Selection</strong> dialog.</li>
</ul>
</li>
<li><strong>Add Characters or Sequences:</strong>
<ul>
<li><strong>Enter Characters:</strong> Type the characters or sequences you wish to remove in the input field.</li>
<li><strong>Include Characters:</strong> Click the <strong>"Include"</strong> button to add them to the removal list.</li>
<li><strong>Example:</strong> To remove emojis, you might enter specific emoji characters or patterns.</li>
</ul>
</li>
<li><strong>Review Selected Items:</strong>
<ul>
<li>The selected characters or sequences will appear in the <strong>"Items to remove"</strong> list below.</li>
</ul>
</li>
<li><strong>Remove Unwanted Selections:</strong>
<ul>
<li>To delete any selected character or sequence from the removal list, select it in the list and click the <strong>"Delete Selected"</strong> button.</li>
</ul>
</li>
<li><strong>Finalize Selections:</strong>
<ul>
<li>Once all desired characters or sequences are listed, click <strong>"OK"</strong> to apply the changes. These characters will be removed during preprocessing.</li>
</ul>
</li>
</ol>
</li>
</ul>
<h5 id="example-remove-digits-and-symbols">Example: Remove Digits and Symbols</h5>
<ul>
<li><strong>Objective:</strong> Remove all numerical digits and specific symbols like <code>#</code> and <code>$</code>.</li>
<li><strong>Steps:</strong>
<ol>
<li>Enter <code>0-9</code>, <code>#</code>, and <code>$</code> in the input field, one entry at a time.</li>
<li>Click <strong>"Include"</strong> after each entry.</li>
<li>Verify that all selected items appear in the list.</li>
<li>Click <strong>"OK"</strong> to apply the removals.</li>
</ol>
</li>
</ul>
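<p>The effect of a removal list can be sketched in Python as a series of literal replacements. This manual does not say whether the dialog expands an entry such as <code>0-9</code> into a range, so the sketch lists the ten digits explicitly; the names <code>remove_items</code> and <code>ITEMS_TO_REMOVE</code> are hypothetical.</p>

```python
import string


def remove_items(text, items):
    """Remove every listed character or sequence from `text`."""
    # Longest items first, so a sequence is removed before any of its parts.
    for item in sorted(items, key=len, reverse=True):
        text = text.replace(item, "")
    return text


# The ten digits written out, plus the two symbols from the example.
ITEMS_TO_REMOVE = list(string.digits) + ["#", "$"]
```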
<hr>
<h3 id="24-applying-the-preprocessing-parameters">2.4. Applying the Preprocessing Parameters</h3>
<p>After configuring your desired preprocessing options in both the <strong>General</strong> and <strong>Advanced</strong> tabs:</p>
<ol>
<li><strong>Confirm Settings:</strong>
<ul>
<li>Review all enabled options to ensure they align with your preprocessing goals.</li>
</ul>
</li>
<li><strong>Apply Parameters:</strong>
<ul>
<li>Click the <strong>"OK"</strong> button at the bottom of the <strong>Processing Parameters</strong> dialog to save and apply your settings.</li>
</ul>
</li>
<li><strong>Start Processing:</strong>
<ul>
<li>With the parameters set, proceed to load your text files and initiate the preprocessing workflow by clicking the <strong>"Process Files"</strong> button in the toolbar.</li>
</ul>
</li>
</ol>
<p><strong>Note:</strong> It's recommended to experiment with different preprocessing configurations on a small subset of your data to observe their effects before processing the entire corpus. This approach helps in fine-tuning the settings for optimal results.</p>
<hr>
<h3 id="25-example-configuring-preprocessing-for-a-specific-task">2.5. Example: Configuring Preprocessing for a Specific Task</h3>
<p><strong>Scenario:</strong> Preparing a corpus for sentiment analysis by removing URLs, converting text to lowercase, removing line breaks, stripping HTML tags, and removing stopwords.</p>
<ol>
<li><strong>Open Processing Parameters:</strong>
<ul>
<li>Click <code>Settings > Processing Parameters</code> or the gear icon in the toolbar.</li>
</ul>
</li>
<li><strong>General Tab Configurations:</strong>
<ul>
<li><strong>Enable Lowercase Conversion:</strong>
<ul>
<li>Check the <strong>"Lowercase Conversion"</strong> box to ensure all text is in lowercase.</li>
</ul>
</li>
<li><strong>Enable Remove Line Breaks:</strong>
<ul>
<li>Check the <strong>"Remove Line Breaks"</strong> box to merge multi-line text into a single line.</li>
</ul>
</li>
<li><strong>Enable Strip HTML Tags:</strong>
<ul>
<li>Check the <strong>"Strip HTML Tags"</strong> box to remove any HTML markup.</li>
</ul>
</li>
<li><strong>Enable Stopword Removal:</strong>
<ul>
<li>Check the <strong>"Stopword Removal"</strong> box to eliminate common, non-informative words.</li>
</ul>
</li>
</ul>
</li>
<li><strong>Advanced Tab Configurations:</strong>
<ul>
<li><strong>Define Custom Regex Pattern to Remove URLs:</strong>
<ol>
<li>Click <strong>"Set Pattern"</strong> to open the <strong>Advanced Pattern Builder</strong>.</li>
<li><strong>Add Pattern:</strong>
<ul>
<li><strong>Start Condition:</strong> <code>http</code></li>
<li><strong>End Condition Type:</strong> <strong>Specific Word</strong></li>
<li><strong>End Condition:</strong> Whitespace character (<code>\s</code>) or end of string.</li>
</ul>
</li>
<li><strong>Test Pattern:</strong>
<ul>
<li>Enter sample text containing URLs to verify the pattern correctly identifies and removes them.</li>
</ul>
</li>
<li><strong>Save Pattern:</strong>
<ul>
<li>Once satisfied, save the pattern to apply it during preprocessing.</li>
</ul>
</li>
</ol>
</li>
</ul>
</li>
<li><strong>Apply Parameters:</strong>
<ul>
<li>Click <strong>"OK"</strong> to save and apply all settings.</li>
</ul>
</li>
<li><strong>Process Files:</strong>
<ul>
<li>Load your text files using <code>File > Open Files</code> or <code>File > Open Directory</code>.</li>
<li>Click the <strong>"Process Files"</strong> button in the toolbar to start preprocessing.</li>
<li>Monitor the progress through the progress bar and status updates.</li>
</ul>
</li>
<li><strong>Review Results:</strong>
<ul>
<li>After processing, review the cleaned text in the <strong>"Processed Text"</strong> tab.</li>
<li>Verify that URLs have been removed, text is in lowercase, HTML tags are stripped, and stopwords are eliminated.</li>
</ul>
</li>
</ol>
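<p>The scenario above can be approximated outside the application with a short Python sketch. This is an illustration only, not CorpuScript's actual implementation: the function names, the tiny stopword list, and the order of the steps are all assumptions. The URL pattern mirrors the start condition <code>http</code> with an end at whitespace or end of string.</p>

```python
import re
from html.parser import HTMLParser

# Hypothetical stopword list for illustration; CorpuScript's own list may differ.
STOPWORDS = {"a", "an", "and", "is", "the", "this", "to"}

# Start condition "http", ending at the next whitespace or at end of string.
# Note that this also matches any other token beginning with "http".
URL_PATTERN = re.compile(r"http\S*")


class _TagStripper(HTMLParser):
    """Collects text content and discards HTML tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Return text with HTML markup removed."""
    stripper = _TagStripper()
    stripper.feed(text)
    stripper.close()  # flush any buffered trailing text
    return "".join(stripper.parts)


def preprocess(text):
    """Apply the four steps of the scenario in an assumed order."""
    text = strip_html(text)            # strip HTML tags
    text = URL_PATTERN.sub(" ", text)  # remove URLs
    text = text.lower()                # lowercase conversion
    tokens = text.split()              # whitespace split also removes line breaks
    kept = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return " ".join(kept)
```

<p>Tags are stripped before URLs are removed so that a URL wrapped in markup does not swallow the closing tag. Tokens with attached punctuation (for example <code>the,</code>) are not treated as stopwords in this simplified version.</p>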
<hr>
<h3 id="26-best-practices-for-configuring-preprocessing-parameters">2.6. Best Practices for Configuring Preprocessing Parameters</h3>
<ol>
<li><strong>Understand Your Data:</strong>
<ul>
<li>Before configuring preprocessing options, analyze your text data to identify common patterns, unwanted characters, and specific cleaning requirements.</li>
</ul>
</li>
<li><strong>Start Simple:</strong>
<ul>
<li>Begin with basic preprocessing steps like lowercase conversion and whitespace normalization to establish a clean foundation.</li>
</ul>
</li>
<li><strong>Incrementally Add Advanced Options:</strong>
<ul>
<li>Introduce advanced options such as custom regex filtering and character removal one at a time, and confirm that each step improves your data quality before adding the next.</li>
</ul>
</li>
<li><strong>Test Configurations:</strong>
<ul>
<li>Apply different preprocessing configurations on a small subset of your data to observe their effects and adjust settings accordingly.</li>
</ul>
</li>
<li><strong>Document Your Settings:</strong>
<ul>
<li>Keep a record of the preprocessing parameters used for each project to ensure reproducibility and facilitate future adjustments.</li>
</ul>
</li>
<li><strong>Leverage Advanced Features:</strong>
<ul>
<li>Use the <strong>Advanced</strong> tab for specialized cleaning tasks that address characteristics specific to your text data.</li>
</ul>
</li>
</ol>
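<p>One way to build the small test subset recommended above is to draw a reproducible random sample of file paths and load only those files. The sketch below is a hypothetical helper, not a CorpuScript feature; the function name and the fixed seed are assumptions.</p>

```python
import random


def sample_subset(paths, k, seed=0):
    """Return up to k paths chosen at random, reproducibly for a given seed."""
    ordered = sorted(paths)       # sort first so the sample does not depend on input order
    rng = random.Random(seed)     # private generator; does not touch global random state
    return rng.sample(ordered, min(k, len(ordered)))
```

<p>Using a fixed seed means the same files are selected each time, so results from different configurations can be compared on identical input.</p>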
<hr>
<h2 id="3-processing-files">3. Processing Files</h2>
<p>Once you've loaded your text files and configured the preprocessing parameters, you're ready to process your corpus. This section outlines the steps to apply the selected preprocessing options to your files.</p>
<h3 id="31-initiating-the-processing-workflow">3.1. Initiating the Processing Workflow</h3>
<ol>
<li><strong>Start Processing:</strong>
<ul>
<li>Click the <strong>"Process Files"</strong> button located in the toolbar or select <code>File > Process Files</code> from the menu.</li>
</ul>
</li>
<li><strong>Monitor Progress:</strong>
<ul>
<li>A <strong>Processing Files</strong> dialog will appear, displaying a progress bar and status updates.</li>
<li>The progress bar indicates the completion percentage of the processing task.</li>
<li>The status label provides real-time information about the current file being processed and estimated time remaining.</li>
</ul>
</li>
<li><strong>Handling Errors:</strong>
<ul>
<li>If any errors occur during processing (e.g., file read/write issues), they will be displayed in the status bar and logged for your reference.</li>
<li>To cancel processing at any time, click the <strong>"Cancel"</strong> button in the dialog.</li>
</ul>
</li>
</ol>
<h3 id="32-concurrent-processing">3.2. Concurrent Processing</h3>
<ul>
<li><strong>Multithreading Support:</strong>
<ul>
<li>CorpuScript uses multithreading to process multiple files simultaneously across your system's CPU cores.</li>
<li>This reduces the total processing time for large corpora.</li>
</ul>
</li>
<li><strong>Resource Management:</strong>
<ul>
<li>The application automatically manages thread pools to prevent system overload, ensuring smooth operation even with extensive datasets.</li>
</ul>
</li>
</ul>
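<p>The general idea of a bounded thread pool can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CorpuScript's internal code: <code>clean_text</code> is a placeholder preprocessing step and <code>process_all</code> is a hypothetical name.</p>

```python
import os
from concurrent.futures import ThreadPoolExecutor


def clean_text(text):
    """Placeholder preprocessing step (hypothetical): lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def process_all(texts, max_workers=None):
    """Process each text in a bounded thread pool, preserving input order."""
    if max_workers is None:
        # Cap the pool at the number of CPU cores to avoid overloading the system.
        max_workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(clean_text, texts))
```

<p>Capping <code>max_workers</code> is what keeps the number of simultaneously processed files bounded, regardless of how many files are queued.</p>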
<hr>
<h2 id="4-viewing-and-saving-results">4. Viewing and Saving Results</h2>
<p>After processing your files, CorpuScript provides options to view the original and processed texts, as well as save the results for future use.</p>
<h3 id="41-viewing-results">4.1. Viewing Results</h3>
<ol>
<li><strong>Accessing the Text Tabs:</strong>
<ul>
<li>Navigate to the <strong>"Original Text"</strong> and <strong>"Processed Text"</strong> tabs located in the main window.</li>
<li><strong>Original Text Tab:</strong>
<ul>
<li>Displays the content of the original, unprocessed text files.</li>
</ul>
</li>
<li><strong>Processed Text Tab:</strong>
<ul>
<li>Shows the text after preprocessing has been applied.</li>
</ul>
</li>
</ul>
</li>
<li><strong>Navigating Between Tabs:</strong>
<ul>
<li>Click on the respective tabs to switch between viewing the original and processed versions of your files.</li>
</ul>
</li>
<li><strong>Search Functionality:</strong>
<ul>
<li>Use the search bar below the text tabs to locate specific terms or phrases within the text.</li>
<li>Click the <strong>"Previous"</strong> and <strong>"Next"</strong> buttons to step through the search results.</li>
</ul>
</li>
</ol>
<h3 id="42-saving-results">4.2. Saving Results</h3>
<ol>
<li><strong>Saving Processed Files:</strong>
<ul>
<li>Click on <code>File > Save Files</code> or use the <strong>"Save Files"</strong> button in the toolbar.</li>
</ul>
</li>
<li><strong>Choose Save Options:</strong>
<ul>
<li><strong>Overwrite Original Files:</strong>
<ul>
<li>Select this option to replace the original <code>.txt</code> files with their processed versions.</li>
<li><strong>Caution:</strong> This action cannot be undone. Back up your original files before overwriting them.</li>
</ul>
</li>
<li><strong>Save to a Different Directory:</strong>
<ul>
<li>Choose this option to save the processed files in a separate location, preserving the original files.</li>
</ul>
</li>
</ul>
</li>
<li><strong>Confirm Save Operation:</strong>
<ul>
<li>A confirmation dialog will appear, asking you to confirm your save preferences.</li>
<li>Review your selection and click <strong>"Yes"</strong> to proceed.</li>
</ul>
</li>
<li><strong>Handling Save Errors:</strong>
<ul>
<li>If any issues arise during the save process (e.g., insufficient permissions), CorpuScript will notify you via a warning message.</li>
<li>Review the error details, address the underlying issue, and attempt to save again if necessary.</li>
</ul>
</li>
</ol>
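<p>The "Save to a Different Directory" behavior can be sketched in Python as shown below. The function name and the shape of its input (a mapping from original path to processed text) are assumptions made for illustration, and UTF-8 output is assumed as well.</p>

```python
from pathlib import Path


def save_processed(processed, output_dir):
    """Write each processed text into output_dir, leaving the originals untouched.

    processed maps an original file path to its processed text.
    Returns a list of (path, error message) pairs for files that could not be saved.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    failures = []
    for original, text in processed.items():
        target = out / Path(original).name  # keep the original file name
        try:
            target.write_text(text, encoding="utf-8")
        except OSError as exc:  # for example, insufficient permissions
            failures.append((str(original), str(exc)))
    return failures
```

<p>Collecting failures instead of stopping at the first one lets the remaining files be saved and the problems be reported together.</p>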
<hr>
<h2 id="5-troubleshooting">5. Troubleshooting</h2>
<p>Encountering issues while using CorpuScript? This section provides solutions to common problems and tips to ensure smooth operation.</p>
<h3 id="51-error-loading-files">5.1. Error Loading Files</h3>
<ul>
<li><strong>Symptom:</strong> Files fail to load or do not appear in the <strong>"Selected Files"</strong> list.</li>
<li><strong>Possible Causes:</strong>
<ul>
<li>Unsupported file format (only <code>.txt</code> files are supported).</li>
<li>Corrupted or unreadable files.</li>
</ul>
</li>
<li><strong>Solutions:</strong>
<ul>
<li>Ensure that only <code>.txt</code> files are being loaded.</li>
<li>Verify the integrity of the files by opening them in a text editor.</li>
<li>Re-download or recover corrupted files if necessary.</li>
</ul>
</li>
</ul>
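<p>The two checks above (file extension and readability) can be scripted before loading a large batch. The helper below is a hypothetical sketch; in particular, it assumes the files are UTF-8 encoded, which may not match your corpus.</p>

```python
from pathlib import Path


def check_loadable(path):
    """Return None if the file looks loadable, otherwise a short reason string."""
    p = Path(path)
    if p.suffix.lower() != ".txt":
        return "unsupported file format"
    try:
        p.read_text(encoding="utf-8")  # assumed encoding
    except (OSError, UnicodeDecodeError) as exc:
        return f"unreadable file: {exc}"
    return None
```

<p>Running this over a directory listing gives a quick list of files that are likely to fail, along with the reason for each.</p>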
<h3 id="52-incorrect-regex-patterns">5.2. Incorrect Regex Patterns</h3>
<ul>
<li><strong>Symptom:</strong> Preprocessing does not remove or alter text as expected when using custom regex patterns.</li>
<li><strong>Possible Causes:</strong>
<ul>
<li>Syntax errors in the regex pattern.</li>
<li>Misconfigured start and end conditions.</li>
</ul>
</li>
<li><strong>Solutions:</strong>
<ul>
<li>Double-check the regex syntax using online regex testers.</li>
<li>Ensure that start and end conditions are correctly defined in the <strong>Advanced Pattern Builder</strong>.</li>
<li>Refer to the <a href="https://www.regular-expressions.info/">Regex Documentation</a> for guidance.</li>
</ul>
</li>
</ul>
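<p>A pattern can also be checked with Python's standard <code>re</code> module before it is entered in the application. The helper below is a hypothetical sketch that reports syntax errors and shows what the pattern would remove from a sample string. Regex dialects differ in places, so a pattern that compiles here is only a first check.</p>

```python
import re


def validate_pattern(pattern, sample):
    """Compile a regex and report what it would remove from a sample string."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # Syntax error: return the sample unchanged along with the error message.
        return {"valid": False, "error": str(exc), "matches": [], "result": sample}
    return {
        "valid": True,
        "error": None,
        "matches": [m.group(0) for m in compiled.finditer(sample)],
        "result": compiled.sub("", sample),
    }
```

<p>For example, calling <code>validate_pattern(r"http\S*", "see http://a.b now")</code> lists <code>http://a.b</code> as the only match, while an unbalanced pattern such as <code>"(unclosed"</code> is reported as invalid.</p>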
<h3 id="53-processing-interruptions">5.3. Processing Interruptions</h3>
<ul>
<li><strong>Symptom:</strong> Processing stops unexpectedly or is too slow.</li>
<li><strong>Possible Causes:</strong>
<ul>
<li>Insufficient system resources (CPU or memory constraints).</li>
<li>Extremely large files causing delays.</li>
</ul>
</li>
<li><strong>Solutions:</strong>
<ul>
<li>Close other applications to free up system resources.</li>
<li>Break down large corpora into smaller batches and process them sequentially.</li>
<li>Monitor system performance to identify bottlenecks.</li>
</ul>
</li>
</ul>
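<p>Splitting a large file list into batches can be done with a few lines of Python. The generator below is a hypothetical helper for preparing batches outside the application; the batch size is yours to choose based on available memory.</p>

```python
def batches(paths, batch_size):
    """Yield successive lists of at most batch_size paths."""
    items = list(paths)
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

<p>Each yielded list can then be loaded and processed on its own before moving on to the next.</p>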
<h3 id="54-save-operation-failures">5.4. Save Operation Failures</h3>
<ul>
<li><strong>Symptom:</strong> Unable to save processed files or overwrite originals.</li>
<li><strong>Possible Causes:</strong>
<ul>
<li>Lack of write permissions in the target directory.</li>
<li>Files are open in another application, preventing overwriting.</li>
</ul>
</li>
<li><strong>Solutions:</strong>
<ul>
<li>Ensure you have the necessary permissions to write to the target directory.</li>
<li>Close any applications that might be accessing the files.</li>
<li>Choose an alternative directory to save the processed files.</li>
</ul>
</li>
</ul>
<h3 id="55-application-crashes-or-freezes">5.5. Application Crashes or Freezes</h3>
<ul>
<li><strong>Symptom:</strong> CorpuScript becomes unresponsive or crashes during operation.</li>
<li><strong>Possible Causes:</strong>
<ul>
<li>Software bugs or incompatibilities.</li>
<li>Corrupted installation files.</li>
</ul>
</li>
<li><strong>Solutions:</strong>
<ul>
<li>Restart CorpuScript and attempt the operation again.</li>
<li>Reinstall the application to ensure all files are intact.</li>
<li>Check for updates that may address known issues.</li>
<li>Contact <a href="mailto:[email protected]">Support</a> with detailed error logs for assistance.</li>
</ul>
</li>
</ul>
<hr>
<h2 id="6-additional-features">6. Additional Features</h2>
<p>CorpuScript includes additional features that support your preprocessing workflow and provide deeper insight into your corpus.</p>
<h3 id="61-detailed-summary-reporting">6.1. Detailed Summary Reporting</h3>
<p>After processing your files, CorpuScript generates a summary report containing the following information about your corpus:</p>
<ul>
<li><strong>Word Frequency Distributions:</strong>
<ul>
<li>Lists the most common words and their occurrence counts.</li>
</ul>
</li>
<li><strong>Sentence and Token Counts:</strong>
<ul>
<li>Provides statistics on the number of sentences and tokens processed.</li>
</ul>
</li>
<li><strong>Type-Token Ratio Analysis:</strong>
<ul>
<li>Measures lexical diversity by dividing the number of unique words (types) by the total number of words (tokens).</li>
</ul>
</li>
<li><strong>Corpus Size Statistics:</strong>
<ul>
<li>Shows the size of your corpus before and after preprocessing.</li>
</ul>
</li>
<li><strong>Applied Preprocessing Parameters Summary:</strong>
<ul>
<li>Lists all the preprocessing options and parameters that were applied.</li>
</ul>
</li>
<li><strong>Processing Time and Performance Metrics:</strong>
<ul>
<li>Details the total time taken to process the corpus and other performance-related information.</li>
</ul>
</li>
</ul>
<p><strong>Accessing the Summary Report:</strong></p>
<ol>
<li>Navigate to the <strong>"Summary Report"</strong> tab located alongside the text tabs.</li>
<li>Review the generated statistics and analyses to gain insights into your corpus.</li>
</ol>
<p><strong>Exporting the Report:</strong></p>
<ol>
<li>Click the <strong>"Export Report"</strong> button within the <strong>"Summary Report"</strong> tab.</li>
<li>Choose your preferred format (<code>.txt</code> or <code>.csv</code>) and select the destination folder.</li>
<li>Click <strong>"Save"</strong> to export the report for future reference or analysis.</li>
</ol>
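<p>To make the report's statistics concrete, the sketch below computes token counts, sentence counts, word frequencies, and the type-token ratio for a single string. The tokenization and sentence-splitting rules here are deliberately naive assumptions and will not necessarily match the numbers CorpuScript reports.</p>

```python
import re
from collections import Counter


def summarize(text, top_n=3):
    """Compute simple corpus statistics for one text."""
    # Naive tokenization (an assumption): lowercase runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    # Naive sentence split on ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    counts = Counter(tokens)
    types = len(counts)  # number of unique words
    return {
        "tokens": len(tokens),
        "types": types,
        "sentences": len(sentences),
        "type_token_ratio": types / len(tokens) if tokens else 0.0,
        "most_common": counts.most_common(top_n),
    }
```

<p>For the text <code>The cat sat. The cat ran!</code> this yields 6 tokens, 4 types, and 2 sentences, giving a type-token ratio of 4/6. Because the ratio tends to fall as texts get longer, it is most meaningful when comparing texts of similar size.</p>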
<h3 id="62-customization-and-flexibility">6.2. Customization and Flexibility</h3>
<p>CorpuScript offers various customization options to adapt to your specific research needs:</p>
<ul>
<li><strong>Save and Load Preprocessing Profiles:</strong>
<ul>
<li>Save your current preprocessing configuration as a profile for future use.</li>
<li>Load existing profiles to quickly apply predefined settings to new projects.</li>
</ul>
</li>
<li><strong>Adjustable Parameters:</strong>
<ul>
<li>Fine-tune preprocessing parameters to suit different research methodologies and corpus types.</li>
</ul>
</li>
<li><strong>Support for Multiple File Formats:</strong>
<ul>
<li>CorpuScript loads only <code>.txt</code> files out of the box. Handling other formats such as <code>.csv</code> and <code>.json</code> requires extending it with custom configurations.</li>
</ul>
</li>
</ul>
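<p>A preprocessing profile is essentially a named set of options, and keeping one as a JSON file is a simple way to record the settings used for a project. The sketch below illustrates the idea; the file format and the option keys are hypothetical and are not CorpuScript's actual profile format.</p>

```python
import json
from pathlib import Path


def save_profile(settings, path):
    """Write a settings dictionary to a JSON file."""
    text = json.dumps(settings, indent=2, sort_keys=True)
    Path(path).write_text(text, encoding="utf-8")


def load_profile(path):
    """Read a settings dictionary back from a JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical option names, chosen to mirror the sentiment-analysis example.
EXAMPLE_PROFILE = {
    "lowercase": True,
    "remove_line_breaks": True,
    "strip_html": True,
    "remove_stopwords": True,
    "regex_pattern": r"http\S*",
}
```

<p>Sorting the keys keeps the file stable between saves, which makes profiles easier to compare or keep under version control.</p>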
<h3 id="63-data-integrity-and-security">6.3. Data Integrity and Security</h3>
<p>CorpuScript includes the following safeguards to protect the integrity of your data:</p>
<ul>
<li><strong>Non-Destructive Processing:</strong>
<ul>
<li>By default, CorpuScript preserves original files, allowing you to retain unaltered data.</li>
</ul>
</li>
<li><strong>Automatic Backups:</strong>
<ul>
<li>Optionally enable automatic backups before processing, safeguarding your data against accidental loss or corruption.</li>
</ul>
</li>
<li><strong>Detailed Logging:</strong>
<ul>
<li>CorpuScript maintains comprehensive logs of all operations, providing an audit trail for reproducibility and debugging purposes.</li>
</ul>
</li>
</ul>
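<p>If you prefer to make backups yourself before overwriting originals, a copy step like the following is enough. This is a hypothetical helper, independent of CorpuScript's own backup option.</p>

```python
import shutil
from pathlib import Path


def backup_file(path, backup_dir):
    """Copy a file into backup_dir and return the path of the copy."""
    src = Path(path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy2 also preserves timestamps where possible
    return dest
```

<p>Files with the same name from different folders would overwrite each other in a single backup directory, so use one backup directory per source folder.</p>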
<h3 id="64-multilingual-support">6.4. Multilingual Support</h3>
<p>CorpuScript is designed to handle text data in multiple languages:</p>
<ul>
<li><strong>Language-Specific Preprocessing:</strong>
<ul>
<li>Customize preprocessing rules to accommodate different languages, accounting for unique linguistic features.</li>
</ul>
</li>
<li><strong>Unicode Compatibility:</strong>
<ul>
<li>Robust Unicode normalization ensures consistent character representation across diverse languages and scripts.</li>
</ul>
</li>
</ul>
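<p>The effect of Unicode normalization can be seen with Python's standard <code>unicodedata</code> module. The example below is illustrative only. Which normalization form CorpuScript applies is not stated here, so NFC is an assumption.</p>

```python
import unicodedata


def normalize_text(text, form="NFC"):
    """Return text in the given Unicode normalization form."""
    return unicodedata.normalize(form, text)


composed = "caf\u00e9"      # precomposed e with acute accent
decomposed = "cafe\u0301"   # plain e followed by a combining acute accent
```

<p>The two strings look identical on screen but compare as different until both are normalized to the same form, which is why unnormalized text can split one word into several entries in a frequency list.</p>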
<h3 id="65-continuous-updates-and-community-support">6.5. Continuous Updates and Community Support</h3>
<p>Stay up-to-date with the latest advancements and receive support when needed:</p>
<ul>
<li><strong>Regular Updates:</strong>
<ul>
<li>CorpuScript is continuously updated to incorporate the latest developments in corpus linguistics and NLP.</li>
</ul>
</li>
<li><strong>Active User Community:</strong>
<ul>
<li>Join the CorpuScript community to share best practices, custom preprocessing recipes, and collaborate on improving the tool.</li>
</ul>
</li>
<li><strong>Support Channels:</strong>
<ul>
<li>Reach out via <a href="mailto:[email protected]">email</a> for personalized assistance and feedback.</li>
</ul>
</li>
</ul>
<hr>
<h2 id="7-conclusion">7. Conclusion</h2>
<p>CorpuScript is a text preprocessing tool built for corpus linguistics. By following this guide, you can load, configure, process, and analyze your textual data to produce consistent corpora for research or professional projects. Use its preprocessing profiles, custom regex patterns, and summary reports to streamline your workflow.</p>
<p>For further assistance or to provide feedback, please contact us at <a href="mailto:[email protected]">[email protected]</a>.</p>
<script>
function setTheme(theme) {
document.body.className = theme;
}
</script>
</body>
</html>