Eric Kidd edited this page Dec 8, 2017 · 1 revision

I'm working on a draft data model for substudy. Ideally, this should work with a number of other language-learning tools.

Sample data

{
  "title": "My favorite series episode 1",
  "attachment": {
    "mimeType": "video/mp4",
    "relPath": "favorite_series_01.mp4"
  },
  "syncItems": [
    {
      "span": { "begin": 10.5, "end": 15.9 },
      "syncTracks": [
        {
          "type": "text",
          "language": "fr",
          "text": "Hé, les gars!"
        },
        {
          "type": "text",
          "language": "en",
          "text": "Hey, guys!" 
        }
      ]
    }
  ]
}

Data types

The structure looks like this:

  • MediaFile: An episode, movie or chapter.
    • SyncItem: A subtitle, sentence or small set of sentences that exists in more than one language.
      • SyncTrack: A single version of the SyncItem in a specific language and medium. So this might be "French subtitle", "English subtitle", "image", etc.

Note that I may rename these if I find better names!

For some proposed fields, see below.

Attachment: An attached file.

  • mimeType: String: The MIME type of this file.
  • relPath: PathBuf: The path to this file, relative to some "root".
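
As a rough illustration of how relPath might be used, here is a minimal Rust sketch of Attachment. The resolve helper and the idea of passing the "root" directory in as an argument are assumptions made for this example; the draft does not say how the root is chosen.

```rust
use std::path::{Path, PathBuf};

/// An attached file, mirroring the proposed `Attachment` fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Attachment {
    /// The MIME type of this file, e.g. "video/mp4" (`mimeType` in JSON).
    pub mime_type: String,
    /// The path to this file, relative to some root (`relPath` in JSON).
    pub rel_path: PathBuf,
}

impl Attachment {
    /// Hypothetical helper (not part of the draft): resolve `rel_path`
    /// against a root directory supplied by the caller.
    pub fn resolve(&self, root: &Path) -> PathBuf {
        root.join(&self.rel_path)
    }
}
```

One thing to keep in mind with this approach: Path::join replaces the root entirely when the second path is absolute, so a real implementation would probably want to reject absolute relPath values.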

TimeSpan: A period of time with a beginning and an end.

  • begin: f32: The beginning time.
  • end: f32: The ending time. Must be greater than or equal to begin.
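
The "end must be greater than or equal to begin" rule could be enforced at construction time. The sketch below is one possible way to do that; the new and duration functions are hypothetical names, and treating the values as seconds is an assumption based on the sample data rather than something the draft states.

```rust
/// A period of time with a beginning and an end (assumed to be seconds).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimeSpan {
    begin: f32,
    end: f32,
}

impl TimeSpan {
    /// Build a span, returning `None` when `end` is before `begin`.
    /// A NaN on either side also yields `None`, because the comparison fails.
    pub fn new(begin: f32, end: f32) -> Option<TimeSpan> {
        if end >= begin {
            Some(TimeSpan { begin, end })
        } else {
            None
        }
    }

    /// The beginning time.
    pub fn begin(&self) -> f32 {
        self.begin
    }

    /// The ending time.
    pub fn end(&self) -> f32 {
        self.end
    }

    /// Length of the span; never negative thanks to the check in `new`.
    pub fn duration(&self) -> f32 {
        self.end - self.begin
    }
}
```

Keeping the fields private means every TimeSpan that exists has passed the check, at the cost of needing accessor methods.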

MediaFile: A chapter, episode, movie, etc.

  • title: Option<String>: The title of this media item, if any.
  • attachment: Option<Attachment>: An external audio or video file associated with this MediaFile. May be omitted in the case of book chapters.
  • syncItems: Vec<SyncItem>: Individual synchronized subtitles or sentences.

SyncItem: A synchronized subtitle, sentence or set of sentences. Corresponds to an Anki "note".

  • span: Option<TimeSpan>: An optional time period associated with this item. This will be None if we're just synchronizing two pieces of plain text without any timing information.
  • syncTracks: Vec<SyncTrack>: Different versions of this SyncItem in different languages or media types.

SyncTrack: A specific version of a sentence or subtitle, a "track" if you will.

  • type: String: One of text, image, media, notes, etc.
  • language: Option<String>: The language this item is in, using an ISO 639-1 code when possible, or an ISO 639-2 code otherwise.
  • attachment: Option<Attachment>: Any image, etc., that we want to attach.
  • text: Option<String>: Any associated text.
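
Putting the proposed fields together, the whole hierarchy might look roughly like this in Rust. This is only a sketch using the standard library: JSON serialization is left out, the field track_type stands in for type (which is a Rust keyword), and text_in, text_track and sample_media_file are hypothetical helpers added for the example.

```rust
use std::path::PathBuf;

/// An attached file (`mimeType` and `relPath` in the JSON form).
#[derive(Debug, Clone, PartialEq)]
pub struct Attachment {
    pub mime_type: String,
    pub rel_path: PathBuf,
}

/// A period of time; `end` is expected to be >= `begin`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TimeSpan {
    pub begin: f32,
    pub end: f32,
}

/// One version of a `SyncItem` in a specific language or medium.
#[derive(Debug, Clone, PartialEq)]
pub struct SyncTrack {
    /// `type` in the JSON form: "text", "image", "media", "notes", etc.
    pub track_type: String,
    pub language: Option<String>,
    pub attachment: Option<Attachment>,
    pub text: Option<String>,
}

/// A synchronized subtitle, sentence or set of sentences.
#[derive(Debug, Clone, PartialEq)]
pub struct SyncItem {
    pub span: Option<TimeSpan>,
    pub sync_tracks: Vec<SyncTrack>,
}

impl SyncItem {
    /// Hypothetical helper: the text of the first "text" track in `language`.
    pub fn text_in(&self, language: &str) -> Option<&str> {
        self.sync_tracks
            .iter()
            .find(|t| t.track_type == "text" && t.language.as_deref() == Some(language))
            .and_then(|t| t.text.as_deref())
    }
}

/// A chapter, episode, movie, etc.
#[derive(Debug, Clone, PartialEq)]
pub struct MediaFile {
    pub title: Option<String>,
    pub attachment: Option<Attachment>,
    pub sync_items: Vec<SyncItem>,
}

/// Hypothetical helper: a plain text track with no attachment.
fn text_track(language: &str, text: &str) -> SyncTrack {
    SyncTrack {
        track_type: "text".to_string(),
        language: Some(language.to_string()),
        attachment: None,
        text: Some(text.to_string()),
    }
}

/// Builds a value that roughly mirrors the sample data.
pub fn sample_media_file() -> MediaFile {
    MediaFile {
        title: Some("My favorite series episode 1".to_string()),
        attachment: Some(Attachment {
            mime_type: "video/mp4".to_string(),
            rel_path: PathBuf::from("favorite_series_01.mp4"),
        }),
        sync_items: vec![SyncItem {
            span: Some(TimeSpan { begin: 10.5, end: 15.9 }),
            sync_tracks: vec![
                text_track("fr", "Hé, les gars!"),
                text_track("en", "Hey, guys!"),
            ],
        }],
    }
}
```

With these types, sample_media_file().sync_items[0].text_in("en") looks up the English track of the first item, and a language with no track simply gives None.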