Small corrections #52

Merged 7 commits on Dec 16, 2024
32 changes: 30 additions & 2 deletions README.md
@@ -2,9 +2,15 @@

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7734632.svg)](https://doi.org/10.5281/zenodo.7734632) [![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC_BY--NC_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)

In this repo you find our Text-Fabric dataset of the Samaritan Pentateuch.
This is the [Text-Fabric](https://github.com/annotation/text-fabric) representation of the Samaritan Pentateuch.
The dataset is work in progress, and so far, we have added a number of word features, which you find in the tf folder. The features are similar to those of the Biblia Hebraica Stuttgartensia Amstelodamensis (BHSA), so we refer to the [BHSA feature documentation](https://etcbc.github.io/bhsa/) for more explanation of the features.

For an introduction to the dataset and its features, see the publication
Martijn Naaijer, Christian Canu Højgaard, Stefan Schorch, and Martin Ehrensvärd (2024)
Text-Fabric Dataset of the Samaritan Pentateuch
Research Data Journal for the Humanities and Social Sciences
https://doi.org/10.1163/24523666-bja10051

### The CACCHT project: Creating Annotated Corpora of Classical Hebrew Text

This dataset is developed as part of the CACCHT project, which is a collaboration of Christian Canu Højgaard, Martijn Naaijer, Martin Ehrensvärd, Robert Rezetko, Oliver Glanz, and Willem van Peursen. The goal of CACCHT is to prepare and publish ancient Semitic texts digitally that can be used for research.
@@ -25,7 +31,29 @@ https://doi.org/10.5281/zenodo.7734632

You can also refer to specific versions of the dataset.

### Versions
## Get started

This data can be processed by Text-Fabric.

Text-Fabric will automatically download the SP data.

After installing Text-Fabric, you can start the Text-Fabric browser with this command:

```
text-fabric dt-ucph/sp
```

Alternatively, you can work in a Jupyter notebook and run:

```
from tf.app import use
A = use('dt-ucph/sp')
```

In both cases the data is downloaded and ends up in your home directory, under text-fabric-data.

For a general tutorial to working with Text-Fabric in a Jupyter notebook, we recommend [start](https://nbviewer.jupyter.org/github/etcbc/bhsa/blob/master/tutorial/start.ipynb)
and
[search](https://nbviewer.jupyter.org/github/etcbc/bhsa/blob/master/tutorial/search.ipynb), both of which use the BHSA database of the Hebrew Bible.

## Versions

This repo is work in progress. Before version 2.0, the dataset consisted of the text of Genesis. In 3.0 all morphemes have been added for the entire Samaritan Pentateuch. Parsing of the morphemes (verbal tense, gender etc.) is completed for Genesis only. Morphology will be implemented gradually for Exodus-Deuteronomy. If a feature has not been implemented yet for those books, the values are '?'.

4 changes: 2 additions & 2 deletions app/__checkout__.txt
@@ -1,2 +1,2 @@
v3.3.1
gd34b301c5874a64cf0e65fae72269152c0297ec5
v3.4
g0c9b2fff6448228af93ed6c466ba95e6c0bb3547
2 changes: 1 addition & 1 deletion app/config.yaml
@@ -13,7 +13,7 @@ docs:
interfaceDefaults: {}
provenanceSpec:
corpus: The Samaritan Pentateuch
version: '3.4.1'
version: '3.4.2'
writing: hbo
typeDisplay:
verse:
2 changes: 1 addition & 1 deletion tests/test_general.py
@@ -182,7 +182,7 @@ def test_second_person():
'[TM','[TN','[W','[WN'} for w in F.sp.s('verb') if F.ps.v(w) == 'p2'})

def test_third_person():
assert all({F.g_pfm.v(w) in {'','!J!','!T!'} and F.g_vbe.v(w) in {'[','[H','[HN','[NH','[T','[TH','[W','[WN'}
assert all({F.g_pfm.v(w) in {'','!J!','!T!'} and F.g_vbe.v(w) in {'[','[H','[HN','[N','[NH','[T','[TH','[W','[WN'}
for w in F.sp.s('verb') if F.ps.v(w) == 'p3'})

def test_unknown_person():
2 changes: 2 additions & 0 deletions tf/3.4.1/__checkout__.txt
@@ -0,0 +1,2 @@
v3.4
g0c9b2fff6448228af93ed6c466ba95e6c0bb3547
20 changes: 20 additions & 0 deletions tf/3.4.2/book.tf
@@ -0,0 +1,20 @@
@node
@convertedToTextFabricBy=Martijn Naaijer and Christian Canu Højgaard
@dataset=sp
@datasetName=The Samaritan Pentateuch
@description=book title
@encodedBy=Christian Canu Højgaard and Martijn Naaijer
@licence=Creative Commons Attribution-NonCommercial 4.0 International License
@licenceUrl=http://creativecommons.org/licenses/by-nc/4.0/
@manuscripts=MS Dublin Chester Beatty Library 751 (Gen 1-Deut 32:36) + MS Garizim 1 (Deut 32:36b-34)
@source=Stefan Schorch in colloboration with Evelyn Burkhardt, Ulrike Hirschfelder, Irina Wandrey and József Zsengellér
@valueType=str
@version=3.4.2
@writtenBy=Text-Fabric
@dateWritten=2024-12-13T11:09:07Z

399393 Genesis
Exodus
Leviticus
Numbers
Deuteronomy
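
The `book.tf` listing above follows the plain-text Text-Fabric feature layout: `@`-prefixed header lines, a blank line, then one value per line. A data line can name its node explicitly; a line with only a value applies to the previous node plus one, which is why only `Genesis` carries a node number. The sketch below is a simplified reader for this layout, not the Text-Fabric loader. The function name `parse_tf_node_feature` is hypothetical, and the sketch assumes that node and value are separated by a tab in the actual file and that no node ranges, escapes, or empty values occur.

```python
def parse_tf_node_feature(text):
    """Parse a simplified Text-Fabric node feature file.

    Returns (metadata, values), where values maps node number -> value.
    Hypothetical helper for illustration; ranges and escapes are not handled.
    """
    metadata = {}
    values = {}
    lines = text.splitlines()
    i = 0

    # Header: "@key=value" pairs are metadata; a bare "@node" marks the feature kind.
    while i < len(lines) and lines[i].startswith("@"):
        key, sep, value = lines[i][1:].partition("=")
        metadata[key] = value if sep else True
        i += 1

    # Skip the blank line that separates the header from the data.
    if i < len(lines) and lines[i] == "":
        i += 1

    # Convert values according to the declared @valueType.
    convert = int if metadata.get("valueType") == "int" else str

    node = 0
    for line in lines[i:]:
        if "\t" in line:
            # Explicit node number, assumed to be tab-separated from the value.
            node_str, value = line.split("\t", 1)
            node = int(node_str)
        else:
            # No node given: the value belongs to the next node in sequence.
            node += 1
            value = line
        values[node] = convert(value)

    return metadata, values


sample = (
    "@node\n"
    "@description=book title\n"
    "@valueType=str\n"
    "\n"
    "399393\tGenesis\n"
    "Exodus\n"
    "Leviticus\n"
    "Numbers\n"
    "Deuteronomy\n"
)
metadata, books = parse_tf_node_feature(sample)
```

With this sample, the five book titles land on consecutive nodes starting at 399393. For real work, loading the dataset through Text-Fabric itself remains the intended route.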
202 changes: 202 additions & 0 deletions tf/3.4.2/chapter.tf
@@ -0,0 +1,202 @@
@node
@convertedToTextFabricBy=Martijn Naaijer and Christian Canu Højgaard
@dataset=sp
@datasetName=The Samaritan Pentateuch
@description=chapter number
@encodedBy=Christian Canu Højgaard and Martijn Naaijer
@licence=Creative Commons Attribution-NonCommercial 4.0 International License
@licenceUrl=http://creativecommons.org/licenses/by-nc/4.0/
@manuscripts=MS Dublin Chester Beatty Library 751 (Gen 1-Deut 32:36) + MS Garizim 1 (Deut 32:36b-34)
@source=Stefan Schorch in colloboration with Evelyn Burkhardt, Ulrike Hirschfelder, Irina Wandrey and József Zsengellér
@valueType=int
@version=3.4.2
@writtenBy=Text-Fabric
@dateWritten=2024-12-13T11:09:07Z

399398 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34