DPE stands for Digital Paper Edit, named after the digital-paper-edit project (also known as autoEdit3).
Digital Paper Edit is an application that makes it faster, easier and more accessible to edit audio and video interviews using automatically generated transcriptions from a Speech To Text (STT) service. The current DPE representation of a transcription is a list of timed word objects and a list of speaker paragraphs:
```js
{
  "words": [
    {
      "end": 0.46, // in seconds
      "start": 0,
      "text": "Hello"
    },
    {
      "end": 1.02,
      "start": 0.46,
      "text": "World"
    },
    ...
  ],
  "paragraphs": [
    {
      "speaker": "SPEAKER_A",
      "start": 0,
      "end": 3
    },
    {
      "speaker": "SPEAKER_B",
      "start": 3,
      "end": 19.2
    },
    ...
  ]
}
```
Keeping paragraphs and words separate as a way of modelling this domain has proven extremely flexible in situations where you need to run alignment on the whole text or on just parts of it.
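As an illustration of that flexibility, here is a minimal sketch, assuming a hypothetical `replaceWordsInRange` helper (not part of DPE) and an `alignedWords` list produced by whatever aligner you use. Only the words in a time range are swapped out, while the paragraphs, which carry just a speaker and a start/end time, are left untouched.

```javascript
// Hypothetical helper (not part of DPE): replace the words that overlap
// [start, end] (in seconds) with a freshly aligned list of words.
// Paragraphs are returned as-is, since they only hold speaker + start/end.
function replaceWordsInRange(transcript, start, end, alignedWords) {
  // Words that end at or before the range start are kept.
  const before = transcript.words.filter((word) => word.end <= start);
  // Words that start at or after the range end are kept.
  const after = transcript.words.filter((word) => word.start >= end);
  // Any word overlapping the range is dropped and is expected to be
  // covered by alignedWords. The input transcript is not mutated.
  return {
    ...transcript,
    words: [...before, ...alignedWords, ...after],
  };
}

// Usage: re-align only the second word of a three-word transcript.
const transcript = {
  words: [
    { start: 0, end: 0.46, text: "Hello" },
    { start: 0.46, end: 1.02, text: "World" },
    { start: 1.02, end: 1.5, text: "again" },
  ],
  paragraphs: [{ speaker: "SPEAKER_A", start: 0, end: 3 }],
};
const updated = replaceWordsInRange(transcript, 0.46, 1.02, [
  { start: 0.5, end: 1.0, text: "Word" },
]);
console.log(updated.words.map((word) => word.text).join(" "));
```

Because paragraphs reference time ranges rather than word indices, they stay valid as long as the re-aligned words keep roughly the same timings.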
Paragraphs are generally generated from the speaker diarization information provided by the Speech To Text service. When this is not available, they can be generated from sentence-ending punctuation (`.`, `?` or `!`) that might be present in the words.
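A minimal sketch of the punctuation fallback, assuming a hypothetical `paragraphsFromPunctuation` function and a hypothetical placeholder speaker label `U_UKN` (neither is defined by DPE). Each paragraph closes after a word whose text ends with `.`, `?` or `!`.

```javascript
// Hypothetical placeholder label for paragraphs built without diarization.
const UNKNOWN_SPEAKER = "U_UKN";

// Group words into paragraphs, closing a paragraph after any word whose
// text ends with sentence-ending punctuation: ".", "?" or "!".
function paragraphsFromPunctuation(words, speaker = UNKNOWN_SPEAKER) {
  const paragraphs = [];
  let current = null;
  for (const word of words) {
    if (current === null) {
      // Open a new paragraph at the start time of its first word.
      current = { speaker, start: word.start, end: word.end };
    }
    // Extend the open paragraph to the end of this word.
    current.end = word.end;
    if (/[.?!]$/.test(word.text)) {
      paragraphs.push(current);
      current = null;
    }
  }
  // Flush any trailing words that did not end with punctuation.
  if (current !== null) {
    paragraphs.push(current);
  }
  return paragraphs;
}

const words = [
  { start: 0, end: 0.46, text: "Hello" },
  { start: 0.46, end: 1.02, text: "World." },
  { start: 1.02, end: 1.3, text: "How" },
  { start: 1.3, end: 1.5, text: "are" },
  { start: 1.5, end: 2, text: "you?" },
  { start: 2, end: 2.4, text: "Bye" },
];
console.log(paragraphsFromPunctuation(words));
```

This simple rule treats every full stop as a sentence end, so abbreviations such as "Dr." also close a paragraph; grouping several sentences per paragraph would need an extra heuristic.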
See these STT adapters for examples of how it can be generated:
- AssemblyAI: assemblyai-to-dpe
- AWS Transcribe: aws-to-dpe
- Google STT: gcp-to-dpe
- IBM Watson STT: an ibmwatson-to-dpe module in PR pietrop/digital-paper-edit-electron#52, not extracted as a separate npm/GitHub repo
- Speechmatics: there's a speechmatics-to-dpe module, but it was not extracted as a separate npm/GitHub repo because of the Speechmatics web portal API deprecation notice
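To show the general shape of such an adapter, here is a sketch against a made-up STT response format (`results`, `word`, `startMs`, `endMs` and `speaker` are assumptions, not any vendor's actual API). It converts times to seconds and opens a new paragraph whenever the diarized speaker changes.

```javascript
// Hypothetical STT response shape (NOT any specific vendor's API):
// { results: [{ word: "Hello", startMs: 0, endMs: 460, speaker: "A" }, ...] }
function sttToDpe(response) {
  const words = [];
  const paragraphs = [];
  for (const item of response.results) {
    // DPE times are in seconds; this made-up format uses milliseconds.
    const start = item.startMs / 1000;
    const end = item.endMs / 1000;
    words.push({ start, end, text: item.word });
    const speaker = `SPEAKER_${item.speaker}`;
    const last = paragraphs[paragraphs.length - 1];
    if (last && last.speaker === speaker) {
      // Same speaker keeps talking: extend the current paragraph.
      last.end = end;
    } else {
      // First word or speaker change: start a new paragraph.
      paragraphs.push({ speaker, start, end });
    }
  }
  return { words, paragraphs };
}

const dpe = sttToDpe({
  results: [
    { word: "Hello", startMs: 0, endMs: 460, speaker: "A" },
    { word: "World", startMs: 460, endMs: 1020, speaker: "A" },
    { word: "Hi", startMs: 3000, endMs: 3500, speaker: "B" },
  ],
});
console.log(JSON.stringify(dpe, null, 2));
```

Real adapters differ mainly in how each vendor exposes word timings and speaker labels; the output shape is the same `{ words, paragraphs }` object.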
There are helper functions you can write to interpolate the paragraphs back with the words, such as dpe-add-words-to-paragraphs.js, or getWordsForParagraph, used in the slate-transcript-editor dpe-to-slate "import" adapter:
```js
/**
 * Returns the words that fall within a paragraph's time range.
 * @param {{start: number, end: number}} currentParagraph a DPE paragraph object with start and end attributes, in seconds
 * @param {Array<{start: number, end: number}>} words a list of word objects with start and end attributes, in seconds
 * @returns a list of the word objects that lie entirely within the given paragraph
 */
const getWordsForParagraph = (currentParagraph, words) => {
  const { start, end } = currentParagraph;
  // A word that straddles the paragraph boundary is not included.
  return words.filter((word) => {
    return word.start >= start && word.end <= end;
  });
};

export default getWordsForParagraph;
```
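Building on that helper, a minimal sketch of interpolating paragraphs with their words might look like the following. `addWordsToParagraphs` is a hypothetical name; this is not the actual dpe-add-words-to-paragraphs.js implementation. A copy of getWordsForParagraph is declared inline so the block runs on its own.

```javascript
// Same filtering rule as getWordsForParagraph above: a word belongs to a
// paragraph only if it lies entirely within the paragraph's start/end.
const getWordsForParagraph = (paragraph, words) =>
  words.filter((word) => word.start >= paragraph.start && word.end <= paragraph.end);

// Hypothetical helper: attach each paragraph's words and their joined text.
function addWordsToParagraphs({ words, paragraphs }) {
  return paragraphs.map((paragraph) => {
    const paragraphWords = getWordsForParagraph(paragraph, words);
    return {
      ...paragraph,
      words: paragraphWords,
      text: paragraphWords.map((word) => word.text).join(" "),
    };
  });
}

const result = addWordsToParagraphs({
  words: [
    { start: 0, end: 0.46, text: "Hello" },
    { start: 0.46, end: 1.02, text: "World" },
    { start: 2.9, end: 3.1, text: "um" }, // straddles the 3s boundary
    { start: 3.1, end: 3.5, text: "Hi" },
    { start: 3.5, end: 4, text: "there" },
  ],
  paragraphs: [
    { speaker: "SPEAKER_A", start: 0, end: 3 },
    { speaker: "SPEAKER_B", start: 3, end: 19.2 },
  ],
});
console.log(result.map((p) => `${p.speaker}: ${p.text}`));
```

Note that with this containment rule a word crossing a paragraph boundary, like "um" above, ends up in neither paragraph; assigning it by start time instead would be one way to avoid losing it.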