The MIT License (MIT)
Copyright (c) 2014 CNRS
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
AUTHORS
Hervé Bredin -- http://herve.niderb.fr
# setting up IPython Notebook so that it displays nicely
# you can safely skip this cell...
%pylab inline
pylab.rcParams['figure.figsize'] = (10.0, 5.0)
from pyannote.core.notebook import set_notebook_crop, Segment
set_notebook_crop(Segment(0, 30))
Experiments are performed on The Big Bang Theory subset of the TVD corpus.
Instructions to reproduce this corpus locally are provided on the TVD website.
The only prerequisite is that you acquire DVDs for the first season.
from tvd import TheBigBangTheory
dataset = TheBigBangTheory('/Volumes/data/tvd/')
Experiments are conducted on the first six episodes of the first season of The Big Bang Theory TV series.
Manual speech turn annotations are only available for these six episodes.
sixFirstEpisodes = dataset.episodes[:6]
for episode in sixFirstEpisodes:
    print episode
For illustration purposes, we will only focus on the very first episode of the series.
episode = sixFirstEpisodes[0]
print episode
Once reproduced locally, the TVD corpus provides audio tracks for every episode in every language available on the DVDs.
english = dataset.path_to_audio(episode, language='en')
french = dataset.path_to_audio(episode, language='fr')
print 'English soundtrack:', english
print 'French soundtrack:', french
english = dataset.path_to_subtitles(episode, language='en')
print 'English subtitles:', english
pyannote.parser provides an SRT parser that splits (split=True) subtitles covering multiple dialogue lines from several speakers, and allots to each line a duration proportional to its number of words (duration=True).
from pyannote.parser.srt import SRTParser
subtitles = SRTParser(split=True, duration=True).read(english)
subtitles
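To make the proportional duration rule concrete, here is a minimal plain-Python sketch of the idea. It does not use pyannote.parser and is not its implementation: the function name split_by_word_count and the example subtitle are hypothetical, and words are simply counted by splitting on whitespace.

```python
def split_by_word_count(start, end, lines):
    """Divide the time range [start, end] between dialogue lines,
    giving each line a share proportional to its number of words."""
    # Count whitespace-separated words; treat an empty line as one word
    counts = [max(len(line.split()), 1) for line in lines]
    total = float(sum(counts))
    duration = end - start
    segments = []
    cursor = start
    for line, count in zip(lines, counts):
        span = duration * count / total
        segments.append((cursor, cursor + span, line))
        cursor += span
    return segments

# Hypothetical subtitle: two dialogue lines sharing the range 10s to 14s
example = split_by_word_count(10.0, 14.0, ["Hi.", "How are you?"])
# "Hi." has 1 word out of 4 and gets 1 second; "How are you?" gets 3 seconds
```

Under these assumptions, a line with three times as many words is allotted three times as much of the subtitle's duration.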
While subtitles provide a coarse dialogue transcription with timestamps, transcripts, on the other hand, provide the exact dialogue transcription but no timing information other than chronological order.
transcript = dataset.get_resource('transcript', episode)
transcript
The Big Bang Theory TVD plugin provides a resource called speaker containing manual annotation of the audio track.
manual_annotation = dataset.get_resource('speaker', episode)
manual_annotation
Manual annotation provides all sorts of labels...
all_labels = manual_annotation.labels()
all_labels
... but only speech regions are of interest to us:
labels = [label for label in all_labels if label.startswith('speech_')]
speech_regions = manual_annotation.subset(set(labels))
speech_regions
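The same filtering can be illustrated on plain data, without the corpus at hand. The sketch below assumes an annotation can be pictured as a list of (start, end, label) tuples; the toy segments and the non-speech labels 'music' and 'laughter' are made up for illustration.

```python
# Toy annotation as (start, end, label) tuples (hypothetical values)
toy_annotation = [
    (0.0, 2.5, 'speech_sheldon'),
    (2.5, 4.0, 'music'),
    (4.0, 7.0, 'speech_leonard'),
    (7.0, 9.0, 'laughter'),
]

# Distinct labels, sorted for a stable order
toy_labels = sorted(set(label for _, _, label in toy_annotation))

# Keep only the labels that denote speech
toy_speech_labels = [label for label in toy_labels
                     if label.startswith('speech_')]

# Keep only the segments carrying one of those labels
wanted = set(toy_speech_labels)
toy_speech_regions = [segment for segment in toy_annotation
                      if segment[2] in wanted]
```

Here toy_speech_regions keeps the two speech segments and drops the music and laughter ones.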
Among those speech regions, we focus on the five main characters.
speech_regions.labels()
All other speech turns are marked as OTHER.
main_characters = ['sheldon', 'leonard', 'penny', 'howard', 'raj']
# label[7:] strips the 'speech_' prefix (7 characters) to get the character name
mapping = {label: label[7:].upper() if label[7:] in main_characters else 'OTHER'
           for label in labels}
mapping
reference = speech_regions.translate(mapping)
reference
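As a rough illustration of the relabelling rule, the sketch below applies it to a toy list of (start, end, label) tuples instead of a pyannote annotation. The helper to_reference_label, the toy speech turns and the label 'speech_waitress' are hypothetical.

```python
main_characters = ['sheldon', 'leonard', 'penny', 'howard', 'raj']
prefix = 'speech_'

def to_reference_label(label):
    """Map 'speech_<name>' to '<NAME>' for main characters, 'OTHER' otherwise."""
    name = label[len(prefix):]
    return name.upper() if name in main_characters else 'OTHER'

# Toy speech turns (hypothetical values)
toy_turns = [
    (0.0, 2.5, 'speech_sheldon'),
    (2.5, 4.0, 'speech_waitress'),
    (4.0, 7.0, 'speech_penny'),
]

toy_reference = [(start, end, to_reference_label(label))
                 for start, end, label in toy_turns]
```

With this toy input, the turns end up labelled SHELDON, OTHER and PENNY, in that order.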
There you go: we now have our shiny speaker identification reference!