
01 add data

Gabi Keane edited this page May 31, 2023 · 9 revisions

00 review

In section 00 start here, we covered initial app setup, installing the application to your local instance of eXist-db, and VSCode syncing.

Directory structures

Could you keep all of your data and code in the root directory of your application? Yes, you could, and your app would still work provided you wrote your file paths correctly. Technically, we could write the whole app in just a handful of files. But what happens when you forget a file name for something you need to edit? Or if you receive an error that doesn’t include a line number on a file that contains 25 different features? What if you make a change to one line of code, and it breaks something else you didn’t anticipate?

Even though directory structure isn’t strictly required for app functionality, organization is important for human readability and developer experience. This principle in software development is part of “separation of concerns”. We will talk in more detail about this concept in later stages, but for now we want to highlight that keeping our data (the XML files, in our case) separate from other pieces of the app lays some important informational groundwork for later.

If you set create_directory_structure to true in your properties before you ran the eXistentializer script, then you already have a data directory.
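If you did not set that option, the directory can be created by hand. A minimal sketch, assuming you run it from the project root; the scratch directory here is only a hypothetical stand-in so the commands can be tried safely:

```shell
set -eu

root=$(mktemp -d)   # hypothetical stand-in for your project root
cd "$root"

# -p makes the command safe to repeat: no error if data already exists.
mkdir -p data
mkdir -p data

ls -d data
```

In a real project you would skip the scratch setup and run only mkdir -p data.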

In VSCode …

Once you’re sure that your sync is set up correctly, you can open a zsh or bash shell directly in VSCode. You can also work directly in the terminal if that’s what you’re comfortable with, but check that your sync is established or expect to rebuild and reinstall your app frequently.


Adding data

If you’re ready to make an edition, you probably already have your data living somewhere else on your filesystem. The best way to move it over is to simply copy it. We’ll create some directories and copy the files over using the shell terminal you just opened.

Our data consists of three subdirectories: hoax_xml, aux_xml, and schemas. hoax_xml contains 36 XML documents, which are TEI-encoded newspaper and periodical articles. Each file contains one article and its metadata. aux_xml contains a prosopography and a gazetteer, which are also TEI-encoded documents. schemas contains the Schematron (.sch) and RELAX NG (.rnc) schemas we validate those documents against. While we could leave schemas out (our app doesn’t use them), validation matters whenever we need to change the data as we develop.

Once you’ve copied the data, remove the .gitkeep file and commit the changes. Git does not track empty directories, and .gitkeep is a conventional placeholder file that works around this. Once the directory has real contents, the placeholder is no longer needed and is best removed.
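The removal and commit might look like the sketch below. It assumes .gitkeep sits directly inside data; the scratch repository, identity, and placeholder file name are hypothetical and exist only so the commands can be run without touching a real project.

```shell
set -eu

# Scratch repository standing in for the project; skip this setup in a real app.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Editor"        # hypothetical identity, local to the scratch repo
git config user.email "editor@example.org"
mkdir data
touch data/.gitkeep
git add data/.gitkeep
git commit -q -m "Track empty data directory"

# The data arrives (one placeholder file here), so .gitkeep has done its job.
mkdir data/hoax_xml
touch data/hoax_xml/example_1804.xml
git rm -q data/.gitkeep     # deletes the file and stages the deletion
git add data                # stages the new data files
git commit -q -m "Add edition data and remove .gitkeep"

git ls-files                # only the data file remains tracked
```

In your own project, only the last four git commands are relevant, run from the project root.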

A note on copying data

You’re done with your markup, right? If you said no, stop here and finish it. If you said yes, no, you’re not. As you begin working on the app, you will find inconsistencies, encoding decisions, and even mistakes you want to fix. If you made a copy of the original data, but you make changes to the app data, soon the two will fall out of sync.

To avoid this, pick one data set (either the original or the app data) to become the Single Source of Truth (SSoT) and make changes only to that data directory. If you find a need to maintain both directories, keep them in sync by copying and writing over the entire secondary directory each time you make a change. However you decide to maintain the SSoT, record the choices and make your entire project team aware of the policy. Consistency and documentation are key!
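Overwriting the entire secondary directory can be sketched as follows. The paths and file names are hypothetical and live in a scratch directory; the point is that deleting the old copy first also removes files that no longer exist in the SSoT.

```shell
set -eu

scratch=$(mktemp -d)
ssot="$scratch/original-data"    # hypothetical: the directory you actually edit
copy="$scratch/app/data"         # hypothetical: the secondary copy, never edited by hand
mkdir -p "$ssot/hoax_xml" "$copy/hoax_xml"
echo "<TEI>corrected</TEI>" > "$ssot/hoax_xml/example_1804.xml"
echo "<TEI>stale</TEI>" > "$copy/hoax_xml/example_1804.xml"
touch "$copy/hoax_xml/deleted_upstream.xml"   # a file since removed from the SSoT

# Replace the whole secondary directory so that removed files disappear too.
rm -rf "$copy"
cp -r "$ssot" "$copy"

# diff -r exits with status 0 only when the two trees are identical.
diff -r "$ssot" "$copy" && echo "in sync"
```

Because rm -rf is destructive, this approach is only safe when the secondary directory really is never edited directly.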

At the command line (VSCode or elsewhere) …

  1. Change to the data directory with cd data from the main project directory.
  2. Make a new subdirectory of the data directory using mkdir your-directory-name. We used hoax_xml for our main data.
  3. Change to the directory you just created with cd your-directory-name. In our case that command looks like cd hoax_xml.
  4. Now we can use cp, the copy command, to copy our data from the directory it currently lives in. For us that looks like: cp ~/Desktop/pr-app/data/hoax_xml/*.xml . (the final single dot is part of the command).

The syntax for copying is cp source-path destination-path. In this case the destination is the current directory, which we represent with a dot. Use the -r switch for recursion if you are also copying subdirectories.

We return to the data directory (cd ..) and repeat the same steps for our aux_xml and schemas directories (adjusting schema-association lines as needed).
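Taken together, the steps above might look like the following for all three directories. This is a sketch under assumptions: the source path and the placeholder files are hypothetical and are created in a scratch directory so nothing real is touched. In practice src would be wherever your data lives now, and the loop would run from your project’s data directory.

```shell
set -eu

# Scratch stand-ins so the sketch can run anywhere; replace with real paths.
scratch=$(mktemp -d)
src="$scratch/original-data"    # hypothetical: where the data lives now
app="$scratch/app"              # hypothetical: the project root
mkdir -p "$src/hoax_xml" "$src/aux_xml" "$src/schemas" "$app/data"
touch "$src/hoax_xml/example_1804.xml" "$src/aux_xml/persons.xml" "$src/schemas/hoax.sch"

cd "$app/data"                  # step 1
for dir in hoax_xml aux_xml schemas; do
  mkdir "$dir"                  # step 2
  cd "$dir"                     # step 3
  cp "$src/$dir"/* .            # step 4: the dot is the current directory
  cd ..                         # back to data for the next directory
done

ls -R "$app/data"
```

The glob here is * rather than *.xml because the schema files carry other extensions. If a source directory has subdirectories of its own, cp -r would be needed instead.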

You may want to open the data files in the eXist-db editor eXide (accessible from the Dashboard at http://localhost:8080) to check that things look correct. If you’re using non-standard character encoding, this is the right time to start troubleshooting.
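A quick command-line check can complement the visual check in eXide. The sketch below assumes your files are meant to be UTF-8 (an assumption, not something the app requires) and uses iconv to flag any file that is not valid UTF-8. The two sample files are hypothetical and are written to a scratch directory.

```shell
set -eu

scratch=$(mktemp -d)
# Octal escapes keep this portable: \303\251 is "é" in UTF-8, \351 is "é" in Latin-1.
printf '<p>caf\303\251</p>\n' > "$scratch/good.xml"
printf '<p>caf\351</p>\n' > "$scratch/bad.xml"

invalid=""
for f in "$scratch"/*.xml; do
  # iconv exits non-zero when the input is not valid UTF-8.
  if ! iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
    invalid="$invalid $(basename "$f")"
  fi
done

echo "Not valid UTF-8:$invalid"
```

To run the same check on real data, point the loop at data/hoax_xml/*.xml instead of the scratch directory.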

Dashboard icons

To our knowledge this eXist-db feature is undocumented, so we’re covering it early. If you would like a custom icon in the eXist-db Dashboard, add a file called icon.png to your application’s root directory, then rebuild and reinstall the application. You can see an example of the default icon (left) and custom icon (right) below.

[Image: default Dashboard icon (left) and custom icon (right)]

At the command line (VSCode or elsewhere) …

  1. git add <filenames>
  2. git commit -m "your commit message here"
  3. git push

Review

In this section, we introduced “separation of concerns” to highlight the importance of consistent data management. We also noted that even well-prepared data reveals its limitations during development, and we offered strategies for keeping the dataset stable and usable. Finally, we covered Dashboard icons and how to add one to your application.

Tree view

Below is a hierarchical view of this stage of the app:

.
├── build
│   └── hoaXed-0.0.1.xar
├── build.xml
├── data
│   ├── aux_xml
│   │   ├── persons.xml
│   │   └── places.xml
│   ├── hoax_xml
│   │   ├── ablackghost_johnbull_1838.xml
│   │   ├── aghost_age_1832.xml
│   │   ├── aghost_douglasjerrold_1847.xml
│   │   ├── aghost_thesatirist_1838.xml
│   │   ├── aghost_weeklytimes_1836.xml
│   │   ├── aghostaghost_cleaves_1836.xml
│   │   ├── aghostathull_leader_1852.xml
│   │   ├── aghostcaught_cabinetnewspaper_1858_edited.xml
│   │   ├── alltheworld_pennysatirist_1838_edited.xml
│   │   ├── anewghost_londonreader_1864.xml
│   │   ├── anotherghost-india_times_1804_edited.xml
│   │   ├── anotherghost_times_1804_edited.xml
│   │   ├── anotherghostcase_englishleader_1864_edited.xml
│   │   ├── anotherstockwellghostcase_leader_1857_edited.xml
│   │   ├── apomeranianghost_pennysatirist_1844.xml
│   │   ├── asubstantialghoststory_peoplesadvocate_1875_edited.xml
│   │   ├── bermondsey_times_1830_edited.xml
│   │   ├── cocklaneghost_householdwords_1852.xml
│   │   ├── fatalcatastrophe_morningchronicle_1804_edited.xml
│   │   ├── ghostbeardevil_pennysatirist_1838_edited.xml
│   │   ├── ghostcutghost_chambers_1889.xml
│   │   ├── ghostlaid_bell_1826_edited.xml
│   │   ├── ghostofhammersmith_morningpost_1804_edited.xml
│   │   ├── hammersmithghost_times_1804.xml
│   │   ├── hampsteadghost_weeklytruesun_1836_edited.xml
│   │   ├── nelsonsghost_leader_1853_edited.xml
│   │   ├── newhammersmith_bell_1825.xml
│   │   ├── notdeadornoghost_weeklytimes_1836.xml
│   │   ├── notwithstanding_times_1804_edited.xml
│   │   ├── parkghost_times_1804.xml
│   │   ├── police_times_1838.xml
│   │   ├── resuscitationofhammersmith_weeklytruesun_1833.xml
│   │   ├── stjamesparkghost_times_1804.xml
│   │   ├── theghost_pennysatirist_1838.xml
│   │   ├── thoughtsonseeingghosts_oddfellow_1841.xml
│   │   └── tompainesghost_theage_1832_edited.xml
│   └── schemas
│       ├── hoax.sch
│       ├── places.sch
│       └── tei_hoax.rnc
├── expath-pkg.xml
├── icon.png
├── modules
│   └── lib.xql
├── pre-install.xq
├── properties.txt
├── repo.xml
└── tests
    ├── test-runner.xq
    └── test-suite.xq

7 directories, 51 files
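One way to check your copy against the tree above is to count the files in each data subdirectory. A sketch, assuming it runs from the project root; the scratch layout below is a hypothetical miniature, so its counts are smaller than the 36, 2, and 3 files shown in the tree.

```shell
set -eu

root=$(mktemp -d)    # hypothetical stand-in for the project root
mkdir -p "$root/data/hoax_xml" "$root/data/aux_xml" "$root/data/schemas"
touch "$root/data/hoax_xml/a_1804.xml" "$root/data/hoax_xml/b_1838.xml" \
      "$root/data/aux_xml/persons.xml" "$root/data/aux_xml/places.xml" \
      "$root/data/schemas/hoax.sch"
cd "$root"

# Print each data subdirectory with the number of regular files it contains.
for dir in data/*/; do
  count=$(find "$dir" -type f | wc -l | tr -d ' ')   # tr strips padding some wc versions add
  echo "$dir $count"
done
```

In a real project, only the loop is needed.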