Merge pull request #94 from readbeyond/devel
aeneas v1.5.1
readbeyond authored Jul 25, 2016
2 parents faeaff6 + d30ad36 commit d7dbb8c
Showing 154 changed files with 2,064 additions and 1,705 deletions.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -14,6 +14,7 @@ prune docs/build
include CHANGELOG
include LICENSE
recursive-include licenses *
include output/.gitignore
include README.md
include README.rst
include requirements.txt
48 changes: 29 additions & 19 deletions README.md
@@ -2,8 +2,8 @@

**aeneas** is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment).

* Version: 1.5.0.3
* Date: 2016-04-23
* Version: 1.5.1.0
* Date: 2016-07-25
* Developed by: [ReadBeyond](http://www.readbeyond.it/)
* Lead Developer: [Alberto Pettarin](http://www.albertopettarin.it/)
* License: the GNU Affero General Public License Version 3 (AGPL v3)
@@ -87,6 +87,16 @@ which can be installed on any modern OS (Linux, Mac OS X, Windows).

### Installation

All-in-one installers are available for Mac OS X and Windows,
and a Bash script for deb-based Linux distributions (Debian, Ubuntu)
is provided in this repository.
It is also possible to download a VirtualBox+Vagrant virtual machine.
Please see the
[INSTALL file](https://github.com/readbeyond/aeneas/blob/master/wiki/INSTALL.md)
for detailed, step-by-step installation procedures for different operating systems.

The generic OS-independent procedure is simple:

1. Install
[Python](https://python.org/) (2.7.x preferred),
[FFmpeg](https://www.ffmpeg.org/), and
@@ -102,20 +112,16 @@ which can be installed on any modern OS (Linux, Mac OS X, Windows).
pip install aeneas
```

See the
[INSTALL file](https://github.com/readbeyond/aeneas/blob/master/wiki/INSTALL.md)
for detailed, step-by-step procedures for Linux, OS X, and Windows.


## Usage

1. To **check** whether you installed **aeneas** correctly, run:
4. To **check** whether you installed **aeneas** correctly, run:

```bash
python -m aeneas.diagnostics
```

2. Run without arguments to get the **usage message**:

## Usage

1. Run without arguments to get the **usage message**:

```bash
python -m aeneas.tools.execute_task
@@ -131,7 +137,7 @@ for detailed, step-by-step procedures for Linux, OS X, and Windows.
python -m aeneas.tools.execute_task --examples-all
```

3. To **compute a synchronization map** `map.json` for a pair
2. To **compute a synchronization map** `map.json` for a pair
(`audio.mp3`, `text.txt` in
[plain](http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN)
text format), you can run:
@@ -169,7 +175,7 @@ for detailed, step-by-step procedures for Linux, OS X, and Windows.
[documentation](http://www.readbeyond.it/aeneas/docs/)
for details.
4. If you have several tasks to process,
3. If you have several tasks to process,
you can create a **job container**
to batch process them:
@@ -222,12 +228,12 @@ which explains how to use the built-in command line tools.
* Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.)
* Input audio file formats: all those readable by `ffmpeg`
* Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, TSV, TTML, TXT, VTT, XML
* Tested languages: ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
* Confirmed working on languages: ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
* MFCC and DTW computed via Python C extensions to reduce the processing time
* On Linux, eSpeak called via a Python C extension for faster audio synthesis
* Batch processing of multiple audio/text pairs
* Several built-in TTS engine wrappers: eSpeak (default, FLOSS), Festival (FLOSS), Nuance TTS API (commercial)
* Use custom TTS engine wrappers besides the built-in ones
* Default TTS (eSpeak) called via a Python C extension for fast audio synthesis
* A custom, user-provided TTS engine Python wrapper can be used instead of the built-in ones (included example for speect)
* Batch processing of multiple audio/text pairs
* Download audio from a YouTube video
* In multilevel mode, recursive alignment from paragraph to sentence to word level
* Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes
@@ -236,13 +242,14 @@ which explains how to use the built-in command line tools.
* Output an HTML file for fine tuning the sync map manually (`finetuneas` project)
* Execution parameters tunable at runtime
* Code suitable for Web app deployment (e.g., on-demand cloud computing)
* Extensive test suite including 898 unit/integration/performance tests, that run and must pass before each release
## Limitations and Missing Features
* Audio should match the text: large portions of spurious text or audio might produce a wrong sync map
* Audio is assumed to be spoken: not suitable/YMMV for song captioning
* No protection against memory trashing if you feed extremely long audio files
* Audio is assumed to be spoken: not suitable for song captioning, YMMV for CC applications
* No protection against memory trashing if you feed extremely long audio files (>1.5h per single audio file)
* [Open issues](https://github.com/readbeyond/aeneas/issues)
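Reviewer note: the revised limitation names a concrete threshold (more than 1.5 hours per single audio file). A caller could check for it before submitting a task. The sketch below is hypothetical and not part of aeneas; it assumes the caller already knows the duration in seconds (for example from an audio probing tool), and the names `MAX_AUDIO_SECONDS` and `is_too_long` are invented for illustration.

```python
# Hypothetical pre-flight check; nothing here is an aeneas API.
MAX_AUDIO_SECONDS = 1.5 * 60 * 60  # the ">1.5h" figure quoted in the README


def is_too_long(duration_seconds, limit=MAX_AUDIO_SECONDS):
    """Return True if a single audio file exceeds the suggested limit."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return duration_seconds > limit
```

A file flagged this way would presumably need to be split, together with its text, before alignment; the README does not say how.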
@@ -340,6 +347,9 @@ for its asynchronous usage.
**Chris Hubbard** prepared the files for
packaging aeneas as a Debian/Ubuntu `.deb`.
**Daniel Bair**, **Chris Hubbard**, and **Richard Margetts**
packaged the installers for Mac OS X and Windows.
**Firat Ozdemir** contributed the `finetuneas`
HTML/JS code for fine tuning sync maps in the browser.
56 changes: 35 additions & 21 deletions README.rst
@@ -4,8 +4,8 @@ aeneas
**aeneas** is a Python/C library and a set of tools to automagically
synchronize audio and text (aka forced alignment).

- Version: 1.5.0.3
- Date: 2016-04-23
- Version: 1.5.1.0
- Date: 2016-07-25
- Developed by: `ReadBeyond <http://www.readbeyond.it/>`__
- Lead Developer: `Alberto Pettarin <http://www.albertopettarin.it/>`__
- License: the GNU Affero General Public License Version 3 (AGPL v3)
@@ -100,6 +100,16 @@ modern OS (Linux, Mac OS X, Windows).
Installation
~~~~~~~~~~~~

All-in-one installers are available for Mac OS X and Windows, and a Bash
script for deb-based Linux distributions (Debian, Ubuntu) is provided in
this repository. It is also possible to download a VirtualBox+Vagrant
virtual machine. Please see the `INSTALL
file <https://github.com/readbeyond/aeneas/blob/master/wiki/INSTALL.md>`__
for detailed, step-by-step installation procedures for different
operating systems.

The generic OS-independent procedure is simple:

1. Install `Python <https://python.org/>`__ (2.7.x preferred),
`FFmpeg <https://www.ffmpeg.org/>`__, and
`eSpeak <http://espeak.sourceforge.net/>`__
@@ -114,18 +124,14 @@ Installation
pip install numpy
pip install aeneas
See the `INSTALL
file <https://github.com/readbeyond/aeneas/blob/master/wiki/INSTALL.md>`__
for detailed, step-by-step procedures for Linux, OS X, and Windows.
4. To **check** whether you installed **aeneas** correctly, run:

``bash python -m aeneas.diagnostics``

Usage
-----

1. To **check** whether you installed **aeneas** correctly, run:

``bash python -m aeneas.diagnostics``

2. Run without arguments to get the **usage message**:
1. Run without arguments to get the **usage message**:

.. code:: bash
@@ -140,7 +146,7 @@ Usage
python -m aeneas.tools.execute_task --examples
python -m aeneas.tools.execute_task --examples-all
3. To **compute a synchronization map** ``map.json`` for a pair
2. To **compute a synchronization map** ``map.json`` for a pair
(``audio.mp3``, ``text.txt`` in
`plain <http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN>`__
text format), you can run:
@@ -178,7 +184,7 @@ specifies the parameters controlling the I/O formats and the processing
options for the task. Consult the
`documentation <http://www.readbeyond.it/aeneas/docs/>`__ for details.

4. If you have several tasks to process, you can create a **job
3. If you have several tasks to process, you can create a **job
container** to batch process them:

.. code:: bash
@@ -229,17 +235,19 @@ Supported Features
- Input audio file formats: all those readable by ``ffmpeg``
- Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB,
TSV, TTML, TXT, VTT, XML
- Tested languages: ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO,
EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, LAT, LAV, LIT, NLD,
NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
- Confirmed working on languages: ARA, BUL, CAT, CYM, CES, DAN, DEU,
ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, JPN,
LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE,
TUR, UKR
- MFCC and DTW computed via Python C extensions to reduce the
processing time
- On Linux, eSpeak called via a Python C extension for faster audio
synthesis
- Batch processing of multiple audio/text pairs
- Several built-in TTS engine wrappers: eSpeak (default, FLOSS),
Festival (FLOSS), Nuance TTS API (commercial)
- Use custom TTS engine wrappers besides the built-in ones
- Default TTS (eSpeak) called via a Python C extension for fast audio
synthesis
- A custom, user-provided TTS engine Python wrapper can be used instead
of the built-in ones (included example for speect)
- Batch processing of multiple audio/text pairs
- Download audio from a YouTube video
- In multilevel mode, recursive alignment from paragraph to sentence to
word level
@@ -253,15 +261,18 @@ Supported Features
- Execution parameters tunable at runtime
- Code suitable for Web app deployment (e.g., on-demand cloud
computing)
- Extensive test suite including 898 unit/integration/performance
tests, that run and must pass before each release

Limitations and Missing Features
--------------------------------

- Audio should match the text: large portions of spurious text or audio
might produce a wrong sync map
- Audio is assumed to be spoken: not suitable/YMMV for song captioning
- Audio is assumed to be spoken: not suitable for song captioning, YMMV
for CC applications
- No protection against memory trashing if you feed extremely long
audio files
audio files (>1.5h per single audio file)
- `Open issues <https://github.com/readbeyond/aeneas/issues>`__

License
@@ -362,6 +373,9 @@ asynchronous usage.
**Chris Hubbard** prepared the files for packaging aeneas as a
Debian/Ubuntu ``.deb``.

**Daniel Bair**, **Chris Hubbard**, and **Richard Margetts** packaged
the installers for Mac OS X and Windows.

**Firat Ozdemir** contributed the ``finetuneas`` HTML/JS code for fine
tuning sync maps in the browser.

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
1.5.0
1.5.1
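Reviewer note: this commit bumps the same version string in `VERSION` and in the `__version__` attribute of many modules, which is easy to get out of sync. A minimal sketch of a consistency check follows. It is not part of the repository; the helper names `find_version` and `stale_modules` are hypothetical, and it assumes each module declares `__version__ = "..."` at the start of a line with double quotes, as in the hunks shown here.

```python
import re

# Matches a module-level assignment such as: __version__ = "1.5.1"
VERSION_RE = re.compile(r'^__version__\s*=\s*"([^"]+)"', re.MULTILINE)


def find_version(source_text):
    """Return the declared __version__ string, or None if there is none."""
    match = VERSION_RE.search(source_text)
    return match.group(1) if match else None


def stale_modules(expected, sources):
    """Return the sorted names whose __version__ differs from `expected`.

    `sources` maps a file name to its text. A file with no __version__
    at all is also reported, since None never equals `expected`.
    """
    return sorted(
        name for name, text in sources.items()
        if find_version(text) != expected
    )
```

In practice `expected` would be the stripped content of the `VERSION` file and `sources` would be built by reading each `.py` file under `aeneas/`.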
2 changes: 1 addition & 1 deletion aeneas/__init__.py
@@ -13,7 +13,7 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
__version__ = "1.5.0"
__version__ = "1.5.1"
__email__ = "[email protected]"
__status__ = "Production"

2 changes: 1 addition & 1 deletion aeneas/adjustboundaryalgorithm.py
@@ -30,7 +30,7 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
__version__ = "1.5.0"
__version__ = "1.5.1"
__email__ = "[email protected]"
__status__ = "Production"

2 changes: 1 addition & 1 deletion aeneas/analyzecontainer.py
@@ -32,7 +32,7 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
__version__ = "1.5.0"
__version__ = "1.5.1"
__email__ = "[email protected]"
__status__ = "Production"

73 changes: 72 additions & 1 deletion aeneas/audiofile.py
@@ -37,7 +37,7 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
__version__ = "1.5.0"
__version__ = "1.5.1"
__email__ = "[email protected]"
__status__ = "Production"

@@ -116,6 +116,77 @@ class AudioFile(Loggable):
:type logger: :class:`~aeneas.logger.Logger`
"""

FILE_EXTENSIONS = [
u"3g2",
u"3gp",
u"aa",
u"aa3",
u"aac",
u"aax",
u"aiff",
u"alac",
u"amr",
u"ape",
u"asf",
u"at3",
u"at9",
u"au",
u"avi",
u"awb",
u"celt",
u"dct",
u"dss",
u"dvf",
u"eac",
u"flac",
u"flv",
u"gsm",
u"m4a",
u"m4b",
u"m4p",
u"m4v",
u"mid",
u"midi",
u"mkv",
u"mmf",
u"mov",
u"mp2",
u"mp3",
u"mp4",
u"mpc",
u"mpeg",
u"mpg",
u"mpv",
u"msv",
u"oga",
u"ogg",
u"ogv",
u"oma",
u"opus",
u"pcm",
u"qt",
u"ra",
u"ram",
u"raw",
u"riff",
u"rm",
u"rmvb",
u"shn",
u"sln",
u"theora",
u"tta",
u"vob",
u"vorbis",
u"vox",
u"wav",
u"webm",
u"wma",
u"wmv",
u"wv",
u"yuv",
]
""" Extensions of common formats for audio (and video) files. """

TAG = u"AudioFile"

def __init__(self, file_path=None, is_mono_wave=False, rconf=None, logger=None):
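Reviewer note: the hunk adds `FILE_EXTENSIONS` but does not show where it is consumed. One plausible use is a case-insensitive check of a file path against the list. The sketch below is an assumption about usage, not code from aeneas; `has_known_audio_extension` is a hypothetical helper, and the list is abbreviated to a few of the entries in the diff.

```python
import os

# Abbreviated copy of the list added in the diff, for illustration only.
FILE_EXTENSIONS = [u"flac", u"m4a", u"mp3", u"mp4", u"ogg", u"wav"]


def has_known_audio_extension(path, extensions=FILE_EXTENSIONS):
    """Return True if the extension of `path` is in `extensions`.

    The comparison ignores case and the leading dot; a path with no
    extension yields an empty string, which is never in the list.
    """
    extension = os.path.splitext(path)[1]  # e.g. ".MP3", or "" if none
    return extension[1:].lower() in extensions
```

Because the list also covers video containers (`mp4`, `mkv`, `webm`, and so on), a match only suggests that FFmpeg may be able to read the file, not that it contains audio.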
4 changes: 2 additions & 2 deletions aeneas/audiofilemfcc.py
@@ -29,7 +29,7 @@
Copyright 2015-2016, Alberto Pettarin (www.albertopettarin.it)
"""
__license__ = "GNU AGPL v3"
__version__ = "1.5.0"
__version__ = "1.5.1"
__email__ = "[email protected]"
__status__ = "Production"

@@ -134,7 +134,7 @@ def __init__(
self._compute_mfcc_c_extension,
self._compute_mfcc_pure_python,
(),
c_extension=self.rconf[RuntimeConfiguration.C_EXTENSIONS]
rconf=self.rconf
)
self.audio_length = self.audio_file.audio_length
if audio_file_was_none:
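Reviewer note: the call site now passes the whole runtime configuration (`rconf=self.rconf`) instead of a single `c_extension` flag, alongside a C-extension function, a pure Python function, and an argument tuple. The general pattern (try the C path, fall back on Python) can be sketched as follows. This is not the aeneas helper: `run_with_fallback`, the plain-dict `rconf`, the `"c_extensions"` key, and the `(success, result)` return convention are all assumptions made for illustration.

```python
def run_with_fallback(c_function, py_function, args, rconf):
    """Run `c_function` if enabled and working, else `py_function`.

    Assumptions (hypothetical, not taken from the diff):
    - `rconf` is a dict whose "c_extensions" key enables the C path;
    - both functions return a `(success, result)` pair.
    """
    if rconf.get("c_extensions", True):
        try:
            success, result = c_function(*args)
            if success:
                return result
        except Exception:
            # Any failure in the C path (missing module, runtime error)
            # is treated as "not available" and triggers the fallback.
            pass
    success, result = py_function(*args)
    if not success:
        raise RuntimeError("both the C and the pure Python paths failed")
    return result
```

Passing the configuration object rather than one boolean lets the helper consult further settings later without changing every call site, which may be the motivation for this change.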
2 changes: 1 addition & 1 deletion aeneas/cdtw/000_compile_driver.sh
@@ -1,6 +1,6 @@
#!/bin/bash

gcc cdtw_driver.c cdtw_func.c cint.c -o cdtw_driver -lm -Wall -pedantic -std=c99
gcc cdtw_driver.c cdtw_func.c ../cint/cint.c -o cdtw_driver -lm -Wall -pedantic -std=c99



3 changes: 3 additions & 0 deletions aeneas/cdtw/900_clean.sh
@@ -0,0 +1,3 @@
#!/bin/bash

rm -rf build __pycache__ *.so cdtw_driver