
Sanskrit text preparation

sujato edited this page Jul 23, 2021 · 15 revisions

Here is an outline of the steps for preparing a Sanskrit text for translation on Bilara.

  1. Select a source text.
    • Let’s assume our text is the Candrasūtra.
  2. If the text is already on SuttaCentral (SC), identify it by its project and UID.
    • project = sf, UID = sf276
    • If it is not on SC, assign a project and UID.
  3. Add a folder named after the SC UID to the appropriate project in .publication-sources.
    • bilara-data/.publication-sources/sf/sf276
  4. Copy the source file or files to the folder.
    • Keep the original file name: sa_candrasUtra.xml
  5. Make an HTML file from a local copy of the text.
  6. Delete all front and end matter, including metadata etc.
  7. Ensure the HTML file is well-structured with appropriate heading and <p> tags. Occasionally other semantic tags such as lists might be used. Ensure each text is wrapped in <article id='uid'>, and each <h1> is wrapped in <header>.
  8. Add paragraph numbers of the form <p id='sf276:1'>. Remember, headings take the zeroth number, e.g. <h1 id='sf276:0'>.
    • Paragraph increments are usually added to <hX>, <p>, <ul>, <ol>, <dl>. However, do not be rigid about this, especially where flexibility keeps the numbering consistent with the source text.
  9. If data other than regular text content is present, make sure it is wrapped as <span class='reference'>, <span class='comment'>, <span class='variant'>, etc.
  10. Check that any other HTML in the file is well-formed and consistent with SC standards and usages.
  11. Make sure all HTML uses 'single quotes'.
  12. Create segments.
    • Typically, use punctuation as the basis, then refine the segments with an initial reading of the text. It is much more efficient to get the segmenting right now than to fix it later!
  13. Wrap segments in <span class='root'> (and <span class='translation'> if there is one).
  14. Run tidy --doctype html5 --output-html 1 --tidy-mark 0 --quiet 1 --output-encoding utf8 -w 0 --show-warnings 0 -m *.html
    • Fix any errors.
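
Steps 12 and 13 can be roughed out in a short script before the manual reading pass. The following is only a minimal sketch, not part of the Bilara tooling: the names `SEGMENT_BREAK`, `segment` and `wrap_paragraph` are hypothetical, the set of break characters (`/`, `//`, `·`, `:`) is an assumption based on the punctuation seen in this text, and it works on plain paragraph text, so inline markup such as <i> must be handled separately.

```python
import re
from html import escape

# Assumption: a segment ends at a danda-like mark ("/" or "//"), a middle
# dot, or a colon, when that mark is followed by whitespace.
SEGMENT_BREAK = re.compile(r"(?<=[/·:])\s+")


def segment(text: str) -> list[str]:
    """Split plain paragraph text into candidate segments at punctuation."""
    return [part.strip() for part in SEGMENT_BREAK.split(text) if part.strip()]


def wrap_paragraph(uid: str, number: int, text: str) -> str:
    """Wrap each candidate segment in a root span inside a numbered <p>."""
    spans = " ".join(
        f"<span class='root'>{escape(part, quote=False)}</span>"
        for part in segment(text)
    )
    return f"<p id='{uid}:{number}'>{spans}</p>"


if __name__ == "__main__":
    print(wrap_paragraph("sf276", 6, "bhagavān āha //"))
```

The output is a first draft only. Segments still need to be checked against the sense of the text, as step 12 notes.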

These steps will produce an HTML file something like the following.

<!DOCTYPE html>
<html>
<head>
<title></title>
</head>
<body>
<article id='sf276'>
<h1 id='sf276:0'><span class='root'>Candrasūtra</span></h1>
<p id='sf276:1'><span class='root'>evaṃ mayā śrutam</span> <span class='root'>ekasama<i>yaṃ bhagavāñ</i> śrāvastyāṃ viharati jet<i>a</i>v<i>a</i>n<i>a</i> anāthapiṇḍad<i>ā</i>r<i>ā</i>m<i>e /</i></span></p>
<p id='sf276:2'><span class='root'>tena khalu samayena rāhuṇā asurendreṇa sarvaṃ candramaṇḍalam āvṛtam*</span> <i>/</i></p>
<p id='sf276:3'><span class='root'><i>atha</i> yā devatā tasmiṃ<i>ś</i> candramaṇḍala adhyuṣitā sā bhītā trast<i>ā</i> saṃvignā āhṛṣṭaromakūpā yena bhagavāṃs teno<i>pajagāma /</i> upetya bha<i>ga</i>v<i>a</i>tpādau śirasā <i>vanditvaikāṃ</i>te ‘sthād ekāntasthitā sā devatā tasyāṃ velāyāṃ gāthā babhāṣe //</span></p>
<p id='sf276:4'><span class='root'>buddhavīra namas te ‘stu vipramuktāya sarvataḥ<span class='comment'>Ed. bhitā but MS reads bhītā</span></span> <span class='root'>saṃbādhapratipannāsmi tasya me śaraṇaṃ bhava :<span class='comment'>Ed. buddha vīra</span></span></p>
<blockquote class='gatha'>
<p id='sf276:5'><span class='verse-line'><span class='root'>arhantaṃ sugataṃ loke candramāḥ śaraṇaṃ gataḥ</span></span> <span class='verse-line'><span class='root'>rāhoś candramasaṃ muñca buddhā lokānukampakāḥ //</span></span></p>
</blockquote>
<p id='sf276:6'><span class='root'>bhagavān āha //</span></p>
<p><span class='root'>tamonudaṃ taṃ nabhasi prabhākaraṃ virocanaṃ śukla<i>v</i>iśuddhavarcasam*</span> <span class='root'>rāho ś<i>a</i>śāṅkaṃ grasa māntarīkṣe praj<i>ā</i>pr<i>a</i>dīpaṃ drutam utsṛjainam* //</span></p>
<p id='sf276:7'><span class='root'>atha rāhuṇā as<i>u</i>rendreṇa tvaritatvaritaṃ candramaṇḍalam utsṛṣṭam* ⟨/⟩</span> <span class='root'>tataḥ sa<i>ṃ</i>tvaramāṇo ‘sau rāhuś candram avāsṛ<i>jat*</i></span> <span class='root'><i>saṃsvinnagātro vya</i>thitaḥ saṃbhr<i>ānta āturo ya</i>thā //</span></p>
<p id='sf276:8'><span class='root'>adrākṣīd baḍir vairocano <i>rāhuṇā</i> asurendreṇa tvaritatvaritaṃ candr<i>a</i>maṇḍala<i>m utsṛṣṭam* / dṛṣṭvā ca baḍi</i>r gāthāṃ babhāṣe //</span></p>
<p id='sf276:9'><span class='root'>ki<i>ṃ</i> nu sa<i>ṃ</i>tv<i>aramāṇas</i> tv<i>aṃ</i> rāhuś candraṃ vimuñcasi ·</span> <span class='root'>saṃsvinnagātro vyathitaḥ saṃ<i>bhrānta āturo yathā</i> <i>//</i><span class='comment'>Cf. Pelliot Sanskrit bleu 449 Ac: /// ro yathā //</span></span></p>
<p id='sf276:10'><span class='root'><i>rāhur avocat* //</i></span></p>
<p><span class='root'><i>sa</i>ptadhā me sphalen mūrdhā <i>jīvan na sukha</i>m āp<i>nu</i>yāṃ</span> <span class='root'>ta<i>tra buddh</i>ābhigītena muñceyaṃ śaśinaṃ na cet*<span class='comment'>Cf. Pelliot Sanskrit bleu 449 Ac: rāhu prāha // saptadhā me sphal[e] mūrdhā</span></span></p>
<p id='sf276:11'><span class='root'><i>baḍir vairocano ‘vocat* /</i></span> <span class='root'>x x x x x - - - x x x x madarśi<i>nāṃ</i></span> <span class='root'><i>teṣāṃ gāthābhigītena rāhuś candraṃ vimuñcati //</i></span><span class='comment'>Cf. Pelliot Sanskrit bleu 449 Ad: + + + + + .. .. .. .. .. .. .. (bh)i(g)itena muñce</span></p>
<p id='sf276:12'><span class='root'><i>candrasūtraṃ samāptam* //</i></span></p>
</article>
</body>
</html>
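
A file like this can be given a quick mechanical check before it is segmented further. The sketch below is an illustration under stated assumptions rather than an official SC tool: `IdChecker` and `check` are hypothetical names, and the rules it tests (an <article> wrapper with an id, ids of the form uid:n counting up from 0, no double-quoted attributes) are taken from steps 7, 8 and 11 above. Elements without an id are reported only as notes, since the numbering is not meant to be rigid.

```python
import re
from html.parser import HTMLParser


class IdChecker(HTMLParser):
    """Collect the article UID and the ids of elements that usually carry one."""

    NUMBERED = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "ul", "ol", "dl"}

    def __init__(self):
        super().__init__()
        self.uid = None
        self.ids = []
        self.missing = []  # (tag, line number) of elements without an id

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article":
            self.uid = attrs.get("id")
        elif tag in self.NUMBERED:
            if "id" in attrs:
                self.ids.append(attrs["id"])
            else:
                self.missing.append((tag, self.getpos()[0]))


def check(html_text: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    parser = IdChecker()
    parser.feed(html_text)
    problems = []
    if parser.uid is None:
        problems.append("no <article id='...'> wrapper found")
    if re.search(r"=\s*\"", html_text):
        problems.append("attribute with double quotes found")
    expected = 0
    for element_id in parser.ids:
        if element_id != f"{parser.uid}:{expected}":
            problems.append(f"expected {parser.uid}:{expected}, found {element_id}")
        # Resynchronise on the number actually found, so one gap is reported once.
        match = re.fullmatch(r".+:(\d+)", element_id)
        expected = int(match.group(1)) + 1 if match else expected + 1
    for tag, line in parser.missing:
        problems.append(f"note: <{tag}> on line {line} has no id")
    return problems
```

Calling `check(open(path, encoding='utf8').read())` on a prepared file returns the list of findings. For the sample above it would report notes for the two <p> elements that have no id, which may well be intentional.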