Wrangling Data

image

'Cattle Drive, c1913', cc William Cresswell

Historical data

image

cc Mayor Dore, 1936, Seattle Municipal Archives

  • is messy
  • is never in the format you want

Note: Fred Gibbs on programming historian poster

Digitization

image

cc Alan Levine

  • how do we go from paper to bits?

Ian Milligan, illusionary order, and OCR image

image

image

Transcription v Text Encoding

  • remember when we transcribed text in HIST2809? What was the goal of transcription?

quick intro to TEI

image

  • tags to identify the semantic content of text
  • XML
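A minimal sketch of what "tags that identify semantic content" buys us, using Python's standard library. The sentence, the names in it, and the helper `tagged_spans` are invented for illustration. The element names use the camelCase spelling of current TEI (`persName`, `placeName`, `date`), which is an assumption about which TEI version is in play.

```python
import xml.etree.ElementTree as ET

# An invented sentence with TEI-style markup (not from a real source).
SAMPLE = (
    "<p><persName>Mary Smith</persName> arrived in "
    "<placeName>Ottawa</placeName> on "
    '<date when="1881-03-04">4 March 1881</date>.</p>'
)


def tagged_spans(xml_text):
    """Return (tag, text) pairs for every tagged span, in document order."""
    root = ET.fromstring(xml_text)
    return [(el.tag, "".join(el.itertext())) for el in root.iter() if el is not root]


for tag, text in tagged_spans(SAMPLE):
    print(f"{tag}: {text}")
```

Because the tags say what each span is, a program can pull out every person, place, or date without guessing from the wording.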

transformable

  • as when we write in md, content is separate from formatting
  • just as HTML can be styled with CSS, TEI XML can be transformed with XSLT stylesheets
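To make the separation of content and formatting concrete, here is a rough sketch in which plain Python stands in for an XSLT stylesheet (it is not XSLT, and a real TEI workflow would normally use one). The sample sentence, the `CSS_CLASS` mapping, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# An invented sentence with TEI-style markup (not from a real source).
SAMPLE = (
    "<p><persName>Mary Smith</persName> arrived in "
    "<placeName>Ottawa</placeName> on "
    '<date when="1881-03-04">4 March 1881</date>.</p>'
)

# Hypothetical mapping from tag name to a CSS class; CSS decides the look.
CSS_CLASS = {"persName": "person", "placeName": "place", "date": "date"}


def to_html(el):
    """Render an element as HTML, wrapping known tags in <span> elements."""
    inner = escape(el.text or "")
    for child in el:
        inner += to_html(child) + escape(child.tail or "")
    if el.tag in CSS_CLASS:
        return f'<span class="{CSS_CLASS[el.tag]}">{inner}</span>'
    if el.tag == "p":
        return f"<p>{inner}</p>"
    return inner


def to_plain_text(el):
    """Drop all markup and keep only the words."""
    return "".join(el.itertext())


root = ET.fromstring(SAMPLE)
print(to_html(root))        # one output: HTML ready for styling
print(to_plain_text(root))  # another output: plain reading text
```

The same marked-up file yields two different outputs, and neither transformation changes the source.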

simple files, multiple uses

  • one file with our markup, our annotations, our scholarly apparatus
  • XSLT files to transform it for our needs
  • open in a browser

specialist tools

a primer for historians

image

frequent tags

<DATE> contains a date in any format.

<EVENT> any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication.

<GEOGNAME> (i.e. geographical name) a name associated with some geographical feature, such as 'Windrush Valley' or 'Mount Sinai'.

<GEOG> (i.e. geographical feature name) a common noun identifying some geographical feature contained within a geographic name, such as 'valley', 'mount', etc.

<OCCUPATION> contains an informal description of a person's trade, profession or occupation.

<PERSNAME> (i.e. personal name) contains a proper noun or proper-noun phrase referring to a person, possibly including any or all of the person's forename, surname, honorific, added names, etc.

<PLACENAME> (i.e. place name) contains an absolute or relative place name.

<ROLENAME> contains a name component which indicates that the referent has a particular role or position in society, such as an official title or rank.

<TIME> contains a phrase defining a time of day in any format.

practice

  • The park has a lovely duck pond.

  • Our butcher sold us some rotten meat yesterday afternoon.

  • Dr. Havingsbury goes to church every Sunday.

  • Mrs. Wellington, who was wearing her Wellington boots, accompanied me on a trip, where we saw a statue of the Duke of Wellington in Wellington, New Zealand near the Wellington's Boot public house, where I hear they serve excellent beef Wellington for lunch. I do adore Mrs. Wellington.
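If you want to check your practice markup by machine, a small sketch like the following may help. It only tests that a sentence is well-formed XML and uses tags from the list above, not whether the tagging is a good interpretation. The function `check_markup`, the `<s>` wrapper, and the sample tagging (one debatable possibility among several) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Tag names from the "frequent tags" list; compared case-insensitively.
ALLOWED = {"DATE", "EVENT", "GEOGNAME", "GEOG", "OCCUPATION",
           "PERSNAME", "PLACENAME", "ROLENAME", "TIME"}


def check_markup(fragment):
    """Return a list of problems found in one marked-up sentence."""
    try:
        # Wrap the sentence in a hypothetical <s> element so it has a single root.
        root = ET.fromstring(f"<s>{fragment}</s>")
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    return [f"unknown tag <{el.tag}>" for el in root.iter()
            if el is not root and el.tag.upper() not in ALLOWED]


# One possible tagging of a practice sentence; others are defensible.
attempt = ("Our <OCCUPATION>butcher</OCCUPATION> sold us some rotten meat "
           "<TIME>yesterday afternoon</TIME>.")
print(check_markup(attempt))                          # no problems reported
print(check_markup("rotten <FOOD>meat</FOOD>"))       # tag not in the list
print(check_markup("<TIME>yesterday</DATE>"))         # mismatched close tag
```

An empty list means the sentence parses and every tag is on the list. Whether 'yesterday afternoon' is better read as a time or a date is still your call.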

Colonial Newspaper Database

  • what tags does the historian use?
  • what does the XSLT turn the XML into?

Next day:

  • not exactly wrangling, but let's talk Pandoc (system for transforming your md into other formats)
  • regular expressions
  • OpenRefine

and if time allows

  • this module's exercises, things to watch out for
  • other business?