[py] Finalize 0.4.0

ambuda-org · Jan 22, 2025 · f3ba416
1 parent 4050c69

Showing 28 changed files with 179 additions and 73 deletions.
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/README.md b/README.md
@@ -81,7 +81,7 @@ cargo add vidyut-lipi --git https://github.com/ambuda-org/vidyut.git
 We recommend using our pre-built linguistic data, which is available as a ZIP file
 [here][zip].
 
-[zip]: https://github.com/ambuda-org/vidyut-py/releases/download/0.3.0/data-0.3.0.zip
+[zip]: https://github.com/ambuda-org/vidyut-py/releases/download/0.4.0/data-0.4.0.zip
 
 For more information, see our [Rust documentation][docs-rs].
 
@@ -124,7 +124,7 @@ We recommend using our pre-built linguistic data, which is available as a ZIP fi
 [here][zip].
 
 [pypi]: https://pypi.org/project/vidyut/
-[zip]: https://github.com/ambuda-org/vidyut-py/releases/download/0.3.0/data-0.3.0.zip
+[zip]: https://github.com/ambuda-org/vidyut-py/releases/download/0.4.0/data-0.4.0.zip
 
 For more information, see our [Python documentation][rtd].
 
@@ -160,7 +160,7 @@ We recommend using our pre-built linguistic data, which is available as a ZIP fi
 [here][zip]. Or if you prefer, you can build this data for yourself:
 
 [nextest]: https://nexte.st/
-[zip]: https://github.com/ambuda-org/vidyut-py/releases/download/0.3.0/data-0.3.0.zip
+[zip]: https://github.com/ambuda-org/vidyut-py/releases/download/0.4.0/data-0.4.0.zip
 
 ```shell
 $ cd vidyut-data

diff --git a/bindings-python/CHANGES.rst b/bindings-python/CHANGES.rst
@@ -4,6 +4,26 @@ releases. That is, versions 0.x.a and 0.x.b will be able to use the same data.
 .. _`Semantic Versioning`: https://semver.org/
 
 
+0.4.0
+-----
+
+Released 2025-01-21.
+
+vidyut.kosha:
+- Include dhatus that never use upasargas. These were previously omitted due
+  to a bug.
+- Add more metadata, including dhatu meanings in Sanskrit, Hindi, and English.
+- Use a more space-efficient storage approach for tinantas.
+
+vidyut.lipi:
+- Add basic support for Grantha pluta.
+
+vidyut.prakriya:
+- Fix some buggy behavior for nāmadhātus.
+- Add `drshya` and `anubandhas` methods to most types.
+- Add `nyap` constructor and bindings.
+
+
 0.3.1
 -----
 

diff --git a/bindings-python/CHECKLIST.md b/bindings-python/CHECKLIST.md
@@ -1,27 +1,35 @@
 Deployment checklist
 ====================
 
-Data (if a major release):
+
+Step 1: Prepare data (if major release)
+---------------------------------------
+
+Check:
 
 - `make create_all_data` passes.
-- Data directory exists with release version.
-  (`zip -r data-VERSION.zip data-VERSION/`)
+- Data directory exists with release version
+  (`cd .../vidyut-latest && zip -r data-VERSION.zip data-VERSION/`)
+
+NOTE: cd *into* vidyut-latest first so that the archive's contents aren't nested.
 
-Version number:
+
+Step 2: Update version number
+-----------------------------
 
 - Increase version number in various files:
     - `pyproject.toml`.
     - `vidyut/docs/source/conf.py`
     - If updating data:
-        - `Makefile`
-        - `bindings-python/README.md`
         - `introduction.rst`
 - Create changelog entry in CHANGES.rst. Use a temporary release date if
   necessary.
 - Grep for previous version and confirm it exists only in comments and
   CHANGES.rst.
 
-Quality:
+
+Step 3: Quality checks
+----------------------
 
 - `make test` passes
 - `make integration_tests` passes
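
Review note: the "grep for previous version" item in Step 2 could be scripted. The following is a minimal sketch under stated assumptions: the helper name `find_version_mentions` is hypothetical and not part of this repository, and it only exempts files by name (here `CHANGES.rst`). It cannot tell whether a hit sits inside a comment, so any hits still need a manual look.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple


def find_version_mentions(
    root: Path, version: str, allowed: Iterable[str] = ("CHANGES.rst",)
) -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for each mention
    of `version` outside the files named in `allowed`.

    Hypothetical helper for illustration; not part of the vidyut tooling.
    """
    allowed_names = set(allowed)
    hits = []
    for file in sorted(root.rglob("*")):
        if not file.is_file() or file.name in allowed_names:
            continue
        try:
            text = file.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # Skip binary or unreadable files.
        for number, line in enumerate(text.splitlines(), start=1):
            if version in line:
                rel = file.relative_to(root).as_posix()
                hits.append((rel, number, line.strip()))
    return hits
```

Calling `find_version_mentions(Path("bindings-python"), "0.3.1")` from the repository root would list leftover mentions to review before tagging the release.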

diff --git a/bindings-python/Cargo.toml b/bindings-python/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "vidyut-python"
-version = "0.3.0"
+version = "0.4.0"
 edition = "2021"
 
 [lib]

diff --git a/bindings-python/pyproject.toml b/bindings-python/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "vidyut"
-version = "0.3.1"
+version = "0.4.0"
 description = "A high-performance Sanskrit toolkit"
 requires-python = ">=3.7"
 classifiers = [

diff --git a/bindings-python/src/kosha/entries.rs b/bindings-python/src/kosha/entries.rs
@@ -256,8 +256,9 @@ impl<'a> From<&PratipadikaEntry<'a>> for PyPratipadikaEntry {
 
 /// An entry in the kosha.
 ///
-/// A `PadaEntry` is a simple dataclass that has one of four types. These types are
-/// constructed by `Kosha` directly, but you can create them yourself if you so choose.
+/// A `PadaEntry` is a simple dataclass that models either a Subanta or a Tinanta. These types
+/// are constructed by `Kosha` directly, and we strongly encourage you not to create them
+/// yourself unless you are building your own `Kosha`.
 ///
 /// The `PadaEntry.Subanta` constructor creates a *subanta*:
 ///
@@ -295,7 +296,7 @@ impl<'a> From<&PratipadikaEntry<'a>> for PyPratipadikaEntry {
 /// .. testcode::
 ///
 ///     from vidyut.kosha import DhatuEntry, PadaEntry
-///     from vidyut.prakriya import Dhatu, Prayoga, Lakara, Purusha, Vacana
+///     from vidyut.prakriya import Dhatu, Prayoga, Lakara, Purusha, Vacana, Gana
 ///
 ///     gam = Dhatu.mula("ga\\mx~", Gana.Bhvadi)
 ///     gam_entry = DhatuEntry(dhatu=gam, clean_text="gam")
@@ -307,7 +308,6 @@ impl<'a> From<&PratipadikaEntry<'a>> for PyPratipadikaEntry {
 ///         vacana=Vacana.Eka)
 ///
 ///     assert pada.lemma == "gam"
-///
 #[pyclass(name = "PadaEntry", get_all, eq, ord)]
 #[derive(Clone, Debug, Eq, Ord, PartialEq, PartialOrd)]
 pub enum PyPadaEntry {

diff --git a/bindings-python/src/prakriya/args.rs b/bindings-python/src/prakriya/args.rs
@@ -123,6 +123,7 @@ py_enum!(
     ]
 );
 
+/// A dhatu's *gaṇa* or major category.
 #[pyclass(name = "Gana", module = "prakriya", eq, eq_int, ord)]
 #[derive(Copy, Clone, Eq, Hash, PartialEq, PartialOrd)]
 pub enum PyGana {
@@ -156,6 +157,7 @@ py_enum!(
     [Bhvadi, Adadi, Juhotyadi, Divadi, Svadi, Tudadi, Rudhadi, Tanadi, Kryadi, Curadi, Kandvadi]
 );
 
+/// A dhatu's *antargaṇa* or minor category.
 #[pyclass(name = "Antargana", module = "prakriya", eq, eq_int, ord)]
 #[derive(Copy, Clone, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
 pub enum PyAntargana {
@@ -184,6 +186,8 @@ py_enum!(
 );
 
 /// The complete list of ordinary *kṛt* pratyayas.
+///
+/// Each pratyaya name is written in the SLP1 encoding scheme.
 #[pyclass(name = "Krt", module = "prakriya", eq, eq_int, ord)]
 #[derive(Copy, Clone, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
 #[allow(non_camel_case_types)]
@@ -876,7 +880,7 @@ pub enum PyLinga {
 
 py_enum!(PyLinga, Linga, [Pum, Stri, Napumsaka]);
 
-/// The prayoga of some tinanta.
+/// The *prayoga* of some tinanta.
 #[pyclass(name = "Prayoga", module = "prakriya", eq, eq_int, ord)]
 #[derive(Copy, Clone, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
 pub enum PyPrayoga {
@@ -909,14 +913,15 @@ py_enum!(PyPurusha, Purusha, [Prathama, Madhyama, Uttama]);
 #[pyclass(name = "DhatuPada", module = "prakriya", eq, eq_int, ord)]
 #[derive(Copy, Clone, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
 pub enum PyDhatuPada {
-    /// *Parasmaipada*.
+    /// *Parasmaipada*, sometimes imprecisely called the "active voice."
     Parasmaipada,
-    /// *Ātmanepada*.
+    /// *Ātmanepada*, sometimes imprecisely called the "middle voice."
     Atmanepada,
 }
 
 py_enum!(PyDhatuPada, DhatuPada, [Parasmaipada, Atmanepada]);
 
+/// A *pratyaya* that creates a new dhatu.
 #[allow(non_camel_case_types)]
 #[pyclass(name = "Sanadi", module = "prakriya", eq, eq_int, ord)]
 #[derive(Copy, Clone, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]

diff --git a/bindings-python/test/integration/test_cheda.py b/bindings-python/test/integration/test_cheda.py
@@ -25,7 +25,7 @@ def test_run_for_word(chedaka, word):
     assert len(entries) == 1
 
     token = entries[0]
-    assert not token.data is not None
+    assert token.data is not None
 
 
 @pytest.mark.parametrize(
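
Review note on the `test_cheda.py` fix above: in Python, `not x is not None` parses as `not (x is not None)`, which is equivalent to `x is None`. The old assertion therefore checked the opposite of what its wording suggests. A small sketch with hypothetical function names (`old_check`, `new_check`) shows the difference:

```python
def old_check(data) -> bool:
    # Parses as `not (data is not None)`, i.e. `data is None`.
    return not data is not None


def new_check(data) -> bool:
    # The corrected condition: true only when data is present.
    return data is not None


if __name__ == "__main__":
    for value in (None, {"lemma": "gam"}):
        print(repr(value), old_check(value), new_check(value))
```

The two checks disagree on every input, so the old test could only pass when `token.data` was missing.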

diff --git a/bindings-python/test/integration/test_kosha.py b/bindings-python/test/integration/test_kosha.py
@@ -25,17 +25,15 @@ def test_basic_tinanta(kosha):
     assert bhavati.vacana == Vacana.Eka
     assert bhavati.lakara == Lakara.Lat
 
-    assert repr(bhavati) == (
-        "PadaEntry.Tinanta(dhatu_entry=DhatuEntry("
-        "dhatu=Dhatu(aupadeshika='BU', gana=Gana.Bhvadi), clean_text='BU'), "
-        "prayoga=Prayoga.Kartari, lakara=Lakara.Lat, purusha=Purusha.Prathama, vacana=Vacana.Eka)"
-    )
-
 
 def test_basic_subanta(kosha):
     entries = kosha.get("devasya")
     entries = [
-        e for e in entries if isinstance(e, PadaEntry.Subanta) and e.linga == Linga.Pum
+        e
+        for e in entries
+        if isinstance(e, PadaEntry.Subanta)
+        and e.linga == Linga.Pum
+        and e.lemma == "deva"
     ]
 
     devasya = entries[0]
@@ -44,25 +42,14 @@ def test_basic_subanta(kosha):
     assert devasya.vibhakti == Vibhakti.Sasthi
     assert devasya.vacana == Vacana.Eka
 
-    assert repr(devasya) == (
-        "PadaEntry.Subanta(pratipadika_entry="
-        "PratipadikaEntry.Basic(pratipadika=Pratipadika(text='deva'), lingas=[Linga.Pum]), "
-        "linga=Linga.Pum, vibhakti=Vibhakti.Sasthi, vacana=Vacana.Eka)"
-    )
-
 
 def test_basic_avyaya(kosha):
     entries = kosha.get("ca")
-    entries = [e for e in entries if isinstance(e, PadaEntry.Subanta)]
+    entries = [e for e in entries if isinstance(e, PadaEntry.Subanta) and e.is_avyaya]
 
     ca = entries[0]
     assert ca.lemma == "ca"
 
-    assert repr(ca) == (
-        "PadaEntry.Subanta(pratipadika_entry="
-        "PratipadikaEntry.Basic(pratipadika=Pratipadika(text='ca', is_avyaya=True), lingas=[Linga.Pum]))"
-    )
-
 
 @pytest.mark.parametrize(
     "word",

diff --git a/bindings-python/test/integration/test_prakriya.py b/bindings-python/test/integration/test_prakriya.py
@@ -62,7 +62,7 @@ def test_unadipatha(all_sutras):
 
 def test_varttikas(all_sutras):
     sutras = [s for s in all_sutras if s.source == Source.Varttika]
-    assert len(sutras) == 101
+    assert len(sutras) == 102
     assert sutras[0] == Sutra(
         source=Source.Varttika,
         code="1.1.33.1",

diff --git a/bindings-python/vidyut/__init__.py b/bindings-python/vidyut/__init__.py
@@ -22,3 +22,19 @@
 from vidyut import vidyut as __mod
 
 __version__ = __mod.__version__
+
+
+def download_data(path):
+    """Downloads Vidyut's linguistic data and saves it to `path`."""
+    from io import BytesIO
+    import urllib.request
+    from zipfile import ZipFile
+    url = (
+        "https://github.com/ambuda-org/vidyut/releases/download/py-0.4.0/data-0.4.0.zip"
+    )
+    print(f"Downloading {url} ...")
+
+    resp = urllib.request.urlopen(url)
+    archive = ZipFile(BytesIO(resp.read()))
+    archive.extractall(path=path)
+    print(f"Complete. (Wrote data to `{path}`)")
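
Review note: the unpacking step of `download_data` can be exercised without network access. The sketch below is an illustration only, not part of vidyut: the helper names `extract_zip_bytes` and `make_demo_zip` are hypothetical, and the archive is a tiny stand-in built in memory. It uses the same standard-library pair as the function above (`io.BytesIO` and `zipfile.ZipFile`).

```python
import tempfile
from io import BytesIO
from pathlib import Path
from typing import List
from zipfile import ZipFile


def extract_zip_bytes(payload: bytes, path: str) -> List[str]:
    """Unpack an in-memory ZIP archive into `path` and return member names.

    Hypothetical helper for illustration; not part of the vidyut API.
    """
    with ZipFile(BytesIO(payload)) as archive:
        archive.extractall(path=path)
        return archive.namelist()


def make_demo_zip() -> bytes:
    """Build a tiny ZIP archive in memory to stand in for the real download."""
    buffer = BytesIO()
    with ZipFile(buffer, "w") as archive:
        archive.writestr("kosha/README.txt", "demo data")
    return buffer.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(extract_zip_bytes(make_demo_zip(), tmp))
        print((Path(tmp) / "kosha" / "README.txt").read_text())
```

Separating the download from the extraction like this would let a unit test cover the extraction logic with a fixture archive, leaving only the `urlopen` call to an integration test.
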
diff --git a/bindings-python/vidyut/docs/source/conf.py b/bindings-python/vidyut/docs/source/conf.py
@@ -18,7 +18,7 @@
 author = "Arun Prasad"
 
 # The full version, including alpha/beta/rc tags
-release = "0.3.1"
+release = "0.4.0"
 
 
 # -- General configuration ---------------------------------------------------

diff --git a/bindings-python/vidyut/docs/source/introduction.rst b/bindings-python/vidyut/docs/source/introduction.rst
@@ -123,18 +123,21 @@ Linguistic data
 ---------------
 
 Vidyut is more interesting when used with our rich linguistic data, which you
-can download here:
+can download like so:
 
-.. code-block:: text
+.. code-block::
+
+    import vidyut
 
-    $ wget https://github.com/ambuda-org/vidyut-py/releases/download/0.3.0/data-0.3.0.zip
-    $ unzip data-0.3.0.zip
+    # `path` is wherever you want to store your data.
+    path = "vidyut-0.4.0"
+    vidyut.download_data(path)
 
 You can use this data like so::
 
     from vidyut.kosha import Kosha
 
-    kosha = kosha("data-0.3.0/kosha")
+    kosha = Kosha("vidyut-0.4.0/kosha")
     for entry in kosha.get("gacCati"):
         print(entry)
 

diff --git a/bindings-python/vidyut/docs/source/tutorial.rst b/bindings-python/vidyut/docs/source/tutorial.rst
@@ -8,13 +8,19 @@ so by showing you how to use Vidyut to build a simple Sanskrit dictionary.
 Setup
 -----
 
-First, install Vidyut and its side data:
+First, install Vidyut:
 
 .. code-block:: text
 
     $ pip install vidyut
-    $ curl -LO https://github.com/ambuda-org/vidyut-py/releases/download/0.3.0/data-0.3.0.zip
-    $ unzip data-0.3.0.zip
+
+Then, install its side data::
+
+    import vidyut
+
+    # `path` is wherever you want to store your data.
+    path = "vidyut-0.4.0"
+    vidyut.download_data(path)
 
 You can confirm that your setup works by trying to load :class:`~vidyut.kosha.Kosha`:
 
@@ -24,7 +30,7 @@ You can confirm that your setup works by trying to load :class:`~vidyut.kosha.Ko
     import sys
     from vidyut.kosha import Kosha
 
-    kosha = Kosha("vidyut-latest/kosha")
+    kosha = Kosha("vidyut-0.4.0/kosha")
 
     query = sys.argv[1]
     for entry in kosha.get(query):
@@ -70,7 +76,7 @@ So, let's fix both of these problems and make our interface a little nicer::
 
 
     def run(query: str):
-        kosha = Kosha("vidyut-latest/kosha")
+        kosha = Kosha("vidyut-0.4.0/kosha")
         entries = get_all(kosha, query)
         for entry in entries:
             display_entry(entry)
@@ -126,7 +132,7 @@ as a first step, let's make this program more human-friendly::
         encoding = detect(query) or Scheme.HarvardKyoto
         slp_query = transliterate(query, encoding, Scheme.Slp1)
 
-        kosha = Kosha("vidyut-latest/kosha")
+        kosha = Kosha("vidyut-0.4.0/kosha")
         entries = get_all(kosha, slp_query)
         for entry in entries:
             display_entry(entry, output_scheme)