readbeyond

aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
Under GNU Affero General Public License v3.0
By readbeyond





Goal

aeneas automatically generates a synchronization map
between a list of text fragments
and an audio file containing the narration of the text.
In computer science this task is known as
(automatically computing a) forced alignment.


For example, given
this text file
and
this audio file,
aeneas determines, for each fragment, the corresponding time interval in the audio file:


1 => [00:00:00.000, 00:00:02.640]
From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]
That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease, => [00:00:09.240, 00:00:11.920]
His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280]
But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.800]
Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680]
Thy self thy foe, to thy sweet self too cruel: => [00:00:25.680, 00:00:31.240]
Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.400]
And only herald to the gaudy spring, => [00:00:34.400, 00:00:36.920]
Within thine own bud buriest thy content, => [00:00:36.920, 00:00:40.640]
And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.640]
Pity the world, or else this glutton be, => [00:00:43.640, 00:00:48.080]
To eat the world's due, by the grave and thee. => [00:00:48.080, 00:00:53.240]
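As a rough illustration of how a listing like the one above could be consumed by another program, the following sketch parses one displayed line into its text and its begin/end times in seconds. The names `parse_sync_line` and `to_seconds` are hypothetical helpers, not part of aeneas, and the `text => [begin, end]` layout is only the presentation used in this README, not one of the output file formats.

```python
import re

# Matches lines such as:
#   From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]
LINE_RE = re.compile(
    r"^(?P<text>.*) => "
    r"\[(?P<begin>\d{2}:\d{2}:\d{2}\.\d{3}), (?P<end>\d{2}:\d{2}:\d{2}\.\d{3})\]$"
)


def to_seconds(timestamp):
    """Convert 'HH:MM:SS.mmm' into a float number of seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_sync_line(line):
    """Return (text, begin_seconds, end_seconds) for one line of the listing."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a sync map line: %r" % line)
    return (
        match.group("text"),
        to_seconds(match.group("begin")),
        to_seconds(match.group("end")),
    )


if __name__ == "__main__":
    sample = "From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]"
    text, begin, end = parse_sync_line(sample)
    print(text, begin, end)
```

Each fragment ends exactly where the next one begins, so a consumer only needs the begin time of every fragment plus the end time of the last one to reconstruct the whole map.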



This synchronization map can be written to a file
in several formats, depending on its application,
for example JSON, SMIL, or TextGrid.



System Requirements, Supported Platforms and Installation
System Requirements

  1. A reasonably recent machine (recommended: 4 GB RAM, 2 GHz 64-bit CPU)

  2. Python 2.7 (Linux, OS X, Windows) or 3.5 or later (Linux, OS X)

  3. FFmpeg

  4. eSpeak

  5. Python packages BeautifulSoup4, lxml, and numpy

  6. Python headers to compile the Python C/C++ extensions (optional but strongly recommended)

  7. A shell supporting UTF-8 (optional but strongly recommended)


Supported Platforms

aeneas has been developed and tested on 64-bit Debian
with Python 2.7 and Python 3.5,
which are the only officially supported platforms at the moment.
Nevertheless, aeneas has been confirmed to work on
other Linux distributions, Mac OS X, and Windows.
See the
PLATFORMS file
for details.


If installing aeneas natively on your OS proves difficult,
you are strongly encouraged to use
aeneas-vagrant,
which provides aeneas inside a virtualized Debian image
running under
VirtualBox
and
Vagrant.
Both can be installed on any modern OS (Linux, Mac OS X, Windows).


Installation

All-in-one installers are available for Mac OS X and Windows,
and a Bash script for deb-based Linux distributions (Debian, Ubuntu)
is provided in this repository.
It is also possible to download a VirtualBox+Vagrant virtual machine.
Please see the
INSTALL file
for detailed, step-by-step installation procedures for different operating systems.


The generic OS-independent procedure is simple:




  1. Install Python (2.7.x preferred), FFmpeg, and eSpeak.

  2. Make sure the following executables can be called from your shell:
    espeak, ffmpeg, ffprobe, pip, and python.

  3. First install numpy with pip and then aeneas (this order is important):

    ```bash
    pip install numpy
    pip install aeneas
    ```

  4. To check whether you installed aeneas correctly, run:

    ```bash
    python -m aeneas.diagnostics
    ```
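
Before running the diagnostics, step 2 of the procedure above can be checked with a few lines of Python. This is only a minimal sketch, assuming Python 3.3 or later (for `shutil.which`); the helper name `find_missing` is hypothetical, and the check complements rather than replaces `python -m aeneas.diagnostics`.

```python
import shutil

# Executables named in step 2 of the installation procedure.
REQUIRED_EXECUTABLES = ("espeak", "ffmpeg", "ffprobe", "pip", "python")


def find_missing(executables=REQUIRED_EXECUTABLES):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in executables if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required executables were found.")
```

An empty result only means that the executables can be located; it says nothing about their versions or about whether the Python packages are installed.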


Usage


  1. Run without arguments to get the usage message:

    ```bash
    python -m aeneas.tools.execute_task
    python -m aeneas.tools.execute_job
    ```

    You can also get a list of live examples
    that you can immediately run on your machine
    thanks to the included files:

    ```bash
    python -m aeneas.tools.execute_task --examples
    python -m aeneas.tools.execute_task --examples-all
    ```




  2. To compute a synchronization map map.json for a pair
    (audio.mp3, text.txt in
    plain
    text format), you can run:

    ```bash
    python -m aeneas.tools.execute_task \
    audio.mp3 \
    text.txt \
    "task_language=eng|os_task_file_format=json|is_text_type=plain" \
    map.json
    ```




(The command has been split into lines with \ for visual clarity;
in production you can have the entire command on a single line
and/or you can use shell variables.)


  3. To compute a synchronization map map.smil for a pair
    (audio.mp3,
    page.xhtml
    containing fragments marked by id attributes like f001),
    you can run:


```bash
python -m aeneas.tools.execute_task \
audio.mp3 \
page.xhtml \
"task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
map.smil
```

As you can see, the third argument (the configuration string)
specifies the parameters controlling the I/O formats
and the processing options for the task.
Consult the
documentation
for details.
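
When the same command is issued from a script, the configuration string can be assembled from a dictionary instead of being typed by hand. The sketch below only mirrors the `key=value|key=value` shape visible in the commands above; the helper names are hypothetical, the rejection of `|` inside keys or values is a conservative assumption of this sketch rather than a documented rule, and the resulting argument list is built but not executed.

```python
import sys


def build_config_string(parameters):
    """Join key=value pairs with '|', as in the commands shown above."""
    for key, value in parameters.items():
        # Assumption of this sketch: refuse characters that would make
        # the joined string ambiguous to split again.
        if "|" in key or "=" in key or "|" in str(value):
            raise ValueError("unsupported character in %r=%r" % (key, value))
    return "|".join("%s=%s" % (key, value) for key, value in parameters.items())


def build_execute_task_argv(audio_path, text_path, parameters, output_path):
    """Build the argument list for `python -m aeneas.tools.execute_task`."""
    return [
        sys.executable, "-m", "aeneas.tools.execute_task",
        audio_path,
        text_path,
        build_config_string(parameters),
        output_path,
    ]


if __name__ == "__main__":
    parameters = {
        "task_language": "eng",
        "os_task_file_format": "json",
        "is_text_type": "plain",
    }
    print(build_execute_task_argv("audio.mp3", "text.txt", parameters, "map.json"))
```

With aeneas installed, the list returned by `build_execute_task_argv` could be passed to `subprocess.run(argv, check=True)`. Passing a list avoids shell quoting problems with the `|` characters in the configuration string.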




  4. If you have several tasks to process,
    you can create a job container
    to batch process them:

    ```bash
    python -m aeneas.tools.execute_job job.zip output_directory
    ```




The file job.zip should contain a config.txt or config.xml
configuration file, providing aeneas
with all the information needed to parse the input assets
and format the output sync map files.
Consult the
documentation
for details.
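
A quick pre-flight check can catch a job container that lacks its configuration file before it is submitted. The following is a minimal sketch using only the standard library; the helper name `find_job_config` is hypothetical, and matching on the file name alone (regardless of the directory inside the archive) is an assumption of this sketch, since the text above does not say where in the archive the file must live. The contents of the configuration file are not validated.

```python
import io
import posixpath
import zipfile

# File names mentioned above as valid job configuration files.
CONFIG_NAMES = ("config.txt", "config.xml")


def find_job_config(container):
    """Return the archive entries whose file name is config.txt or config.xml.

    `container` may be a path to a ZIP file or a binary file-like object.
    """
    with zipfile.ZipFile(container) as archive:
        return [
            name for name in archive.namelist()
            if posixpath.basename(name) in CONFIG_NAMES
        ]


if __name__ == "__main__":
    # Build a tiny in-memory archive with placeholder contents for the demo.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("config.txt", "placeholder, not a real configuration")
        archive.writestr("assets/audio.mp3", b"")
    buffer.seek(0)
    print(find_job_config(buffer))
```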


The
documentation
contains a highly recommended
tutorial
which explains how to use the built-in command line tools.


Documentation and Support

Supported Features

Limitations and Missing Features

A Note on Word-Level Alignment

A significant number of users run aeneas to align audio and text
at the word level (i.e., each fragment is a word).
Although aeneas was not designed with word-level alignment in mind
and the results might be inferior to
ASR-based forced aligners
for languages with good ASR models,
aeneas offers some options to improve
the quality of the alignment at the word level:



If you use the aeneas.tools.execute_task command line tool,
you can add the --presets-word switch to enable MFCC nonspeech masking, for example:


```bash
python -m aeneas.tools.execute_task --example-words --presets-word
python -m aeneas.tools.execute_task --example-words-multilevel --presets-word
```


If you use aeneas as a library, just set the appropriate
RuntimeConfiguration parameters.
Please see the
command line tutorial
for details.


License

aeneas is released under the terms of the
GNU Affero General Public License Version 3.
See the
LICENSE file for details.


Licenses for third party code and files included in aeneas
can be found in the
licenses directory.


No copy rights were harmed in the making of this project.


Supporting and Contributing
Sponsors

Supporting

Would you like to support the development of aeneas?


I accept sponsorships to



Feel free to
get in touch.


Contributing

If you think you found a bug
or you have a feature request,
please use the
GitHub issue tracker
to submit it.


If you want to ask a question
about using aeneas,
your best option is to send an email to the
mailing list.


Finally, code contributions are welcome!
Please refer to the
Code Contribution Guide
for details about the branch policies and the code style to follow.


Acknowledgments

Many thanks to Nicola Montecchio,
who suggested using MFCCs and DTW,
and co-developed the first experimental code
for aligning audio and text.


Paolo Bertasi, who developed the
APIs and Web application for ReadBeyond Sync,
helped shape the structure of this package
for its asynchronous usage.


Chris Hubbard prepared the files for
packaging aeneas as a Debian/Ubuntu .deb.


Daniel Bair prepared the brew formula
for installing aeneas and its dependencies on Mac OS X.


Daniel Bair, Chris Hubbard, and Richard Margetts
packaged the installers for Mac OS X and Windows.


Firat Ozdemir contributed the finetuneas
HTML/JS code for fine-tuning sync maps in the browser.


Willem van der Walt contributed the code snippet
to output a sync map in TextGrid format.


Chris Vaughn contributed the MacOS TTS wrapper.


All the mighty
GitHub contributors,
and the members of the
Google Group.