# fiat 1.0
#
# This data (and the data it refers to) is copyright 2007, 2008 by
# Greg Kochanski, and is licensed in England under
# the Creative Commons Noncommercial-Attribution License.
# Details may be found at http://creativecommons.org/licenses/by-nc/2.0/uk/legalcode .
# You may copy and/or use this file (and referenced files) for noncommercial
# purposes so long as the author is properly acknowledged.
# For commercial licensing, contact Isis Innovation,
# http://www.isis-innovation.com/ .
# COPYRIGHT = Greg Kochanski
# LICENSE_URL = http://creativecommons.org/licenses/by-nc/2.0/uk/legalcode
#
# This file contains metadata describing the "tick1" experiment
# from the ESRC grant "Articulation and Coarticulation in the Lower Vocal Tract"
# with G. Kochanski and J. Coleman as principal investigators.
# Data is courtesy of the UK's Economic and Social Research Council,
# derived from project RES-000-23-1094, 7/2005 through 3/2008.
# When using this data, the appropriate publication to reference is
# DOI: 10.1121/1.2890742, "What Marks the Beat of Speech?",
# G. Kochanski and C. Orphanidou, Journal of the Acoustical Society of America,
# ISSN 0001-4966, Volume 123(5), pages 2780-2791.
#
# This table is in the FIAT data format, defined originally by
# http://dls.physics.ucdavis.edu/fiat/fiat.html . Python implementations
# of modules to read and write this format can be found at
# http://sourceforge.net under the "speechresearch" project, in the
# "gmisclib/fiatio.py" file. http://sourceforge.net/projects/speechresearch
# should lead to the software.
# The format is simply tab-separated columns, with escape sequences
# that begin with percent characters.
#
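# The following is an illustration, not part of the original format definition:
# a minimal sketch of a reader for this table, written only against the
# description above. It assumes columns separated by tabs, '#' comment lines,
# column names taken from the "# TTYPEn = name" header lines, and '%mt'
# standing for an empty field; other percent escapes are not handled. The
# reference implementation is gmisclib/fiatio.py in the speechresearch project.
#
#   import re
#
#   def read_fiat(path):
#       """Read a FIAT table; return a list of {column-name: value} dicts."""
#       columns, rows = [], []
#       ttype = re.compile(r'#\s*TTYPE\d+\s*=\s*(\S+)')
#       with open(path) as fp:
#           for raw in fp:
#               line = raw.rstrip('\n')
#               if line.startswith('#'):
#                   m = ttype.match(line)
#                   if m:
#                       columns.append(m.group(1))   # column names, in order
#                   continue
#               if not line.strip():
#                   continue
#               # '%mt' codes an empty field; data fields are tab-separated.
#               fields = [None if f == '%mt' else f for f in line.split('\t')]
#               rows.append(dict(zip(columns, fields)))
#       return rows
#
# With rows = read_fiat() applied to this file, the metronome utterances can
# be selected with, e.g., [r for r in rows if r['tap_m'] == 'm'].
#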
# This table contains one line for each utterance produced in the experiment.
# Columns are as follows:
#
# TTYPE1 = sex
# Gender of the speaker.
#
# TTYPE2 = speakerID
# A unique identifier for each speaker (experimental subject).
# These identifiers are the same as the "speakerID" identifiers
# in "DBsub.fiat", and can be used to look up some additional
# information about that recording session.
#
# TTYPE3 = d
# Directory which holds that utterance.
#
# TTYPE4 = text
# The unique ID for the text that is spoken.
# The actual text can be looked up under the
# same ID in file DBsent.fiat . In that file,
# the ID is in column "text", and the actual text
# is in the column named "repence".
#
# TTYPE5 = practice
# Is this practice data or not? Practice data was not used
# in the published analysis.
#
# TTYPE6 = tap_m
# Is this "tapping" (tap) data or "metronome" (m) data?
# This column indicates the experimental task. The tapping
# task required the subject to tap their finger along with
# the stressed syllables of the text.
# The metronome task presented the subject with a metronome tick
# in an earphone, and they were asked to speak the sentences
# to the beat of the metronome.
# See the publication above for a more detailed description of
# the experiment.
# WARNING: In some of the tapping data, the finger taps are
# loud enough to be heard in the microphone channel intended
# for speech. Any analysis of that data would have to select
# utterances where the taps are not too loud, remove the taps
# via some noise-subtraction technique, or be carefully designed
# so as not to be affected by the sounds of the taps.
#
# TTYPE7 = bpm
# This is either empty (coded as '%mt') or the metronome rate in
# beats per minute.
#
# TTYPE8 = f
# This is the final component of the pathname to the data.
# Relative to the location of this file, each utterance
# is represented by a directory at d/f.
# It contains several files of interest:
# raw.wav -- the original recording, in Microsoft WAV format.
# It is a two-channel file. One channel contains the
# recorded speech; the other contains either
# metronome ticks or audio from a microphone
# positioned to pick up finger taps. (The subject's finger
# tapped on a hardcover book about 2 cm from the microphone.)
# The finger-tap channel will pick up some speech, but faintly,
# and the speech channel will pick up some finger-tap sounds.
# However, metronome ticks were coupled in electronically and
# are completely isolated from the speech channel.
# ue.lbl -- These are the start and end points of the speech in the
# utterance, automatically generated but checked for accuracy
# by a human. A small amount of silence (probably <100 ms)
# is included within the marked endpoints on either side of the utterance.
# See the above publication for details.
# The data files are in a format suitable for reading by
# the ESPS package Xwaves, and can be read by Wavesurfer
# (circa 2008). Python 2.5 code for reading them is
# available on the above Sourceforge site, in the file
# .../gmisclib/xwaves_lab.py . In brief, the format
# contains a number of header lines of basically useless
# information, then a line consisting of a single hash mark
# ('#'), then two relevant lines. The one containing an
# asterisk in the third field marks the utterance start
# (the time is in the first field). Likewise, the line
# containing '%' marks the end.
# Times are relative to the beginning of the raw.wav files.
# (A parsing sketch appears after the m.dat entry below.)
# raw.tap -- This file contains experimental tick or tap events.
# For the metronome data, it contains the times at which
# metronome ticks occur. For the tapping data, if it
# exists, it lists the times at which the subject's finger
# tapped to mark a stressed syllable.
# This is computed from one of the channels of the raw.wav file,
# but manually checked.
# This file is in the Xwaves label format, the same as ue.lbl.
# m.dat -- This file contains computed tick or tap locations.
# It is meaningful only for metronome data, where it simply
# marks the metronome ticks.
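#
# The following is an illustration, not part of the original header: a sketch
# of reading ue.lbl or raw.tap, using nothing beyond the format description
# above -- header lines, then a line that is just '#', then label lines whose
# first field is a time (presumably in seconds, relative to the start of
# raw.wav) and whose third field is the label ('*' for the utterance start and
# '%' for the end in ue.lbl). The reference reader is gmisclib/xwaves_lab.py
# on the Sourceforge site mentioned above.
#
#   def read_xwaves_labels(path):
#       """Return a list of (time, label) pairs from an Xwaves label file."""
#       events, in_body = [], False
#       with open(path) as fp:
#           for line in fp:
#               if not in_body:
#                   in_body = (line.strip() == '#')   # header ends at the '#' line
#                   continue
#               fields = line.split()
#               if len(fields) >= 3:
#                   events.append((float(fields[0]), fields[2]))
#       return events
#
#   def utterance_endpoints(lbl_path):
#       """Start and end times of the speech, taken from a ue.lbl file."""
#       times = dict((label, t) for (t, label) in read_xwaves_labels(lbl_path))
#       return times['*'], times['%']
#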
# Other files are computed from the raw data, and are preserved for
# convenience; they were used in the "What marks the beat of speech?" paper.
# These files are in the "GPK ASCII Image" format, and are
# readable/writable by code in the speechresearch project
# at http://sourceforge.net , in the files gpkio/read.c,
# gpkio/ascii_read.c, and related code. A Python interface
# is available in gpk_img_python/gpkimgclass.py
# and gpk_img_python/gpk_img.cc (and related files).
# The algorithms used to produce the data below are described in
# the publication referenced above (DOI: 10.1121/1.2890742,
# "What Marks the Beat of Speech?").
#
# irr.dat -- An irregularity measure that separates voiced speech
# from unvoiced. It quantifies speech that is not fully voiced.
# This file is in the "GPK ASCII Image" format; see above.
# loud.dat -- The perceptual loudness.
# This file is in the "GPK ASCII Image" format; see above.
# pdur.dat -- A measure of duration for the current syllable.
# Essentially, it measures how far one can go (in time)
# before the spectrum changes substantially.
# This file is in the "GPK ASCII Image" format; see above.
# rms.dat -- The RMS (intensity or power).
# This file is in the "GPK ASCII Image" format; see above.
# f0.dat -- A standard computation of the speech fundamental frequency.
# This file is in the "GPK ASCII Image" format; see above.
# sss.dat -- A measurement of the average slope of the speech spectrum.
# This file is in the "GPK ASCII Image" format; see above.
#
# So, for instance, the audio for the utterance in the corpus
# with d="nh" and f="nh_rep1_m84"
# is found at nh/nh_rep1_m84/raw.wav . Start and end marks for that
# utterance are at nh/nh_rep1_m84/ue.lbl , et cetera.
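#
# The following is an illustration, not part of the original header: a sketch
# of locating one utterance's audio from a table row and splitting the two
# channels of raw.wav, using the standard-library wave and array modules.
# The read_fiat() helper from the earlier sketch is assumed, as is a
# corpus_root variable naming the directory that contains this file. Which
# channel holds the speech and which holds the ticks/taps is not specified
# here, so inspect before relying on the channel order; samples are assumed
# to be 16-bit and are read in native byte order.
#
#   import os
#   import wave
#   import array
#
#   def load_utterance_audio(corpus_root, row):
#       """Return (sample_rate, channel_0, channel_1) for one table row."""
#       path = os.path.join(corpus_root, row['d'], row['f'], 'raw.wav')
#       with wave.open(path, 'rb') as w:
#           assert w.getnchannels() == 2 and w.getsampwidth() == 2
#           rate = w.getframerate()
#           samples = array.array('h', w.readframes(w.getnframes()))
#       # Interleaved stereo: even indices are one channel, odd the other.
#       return rate, samples[0::2], samples[1::2]
#
# Combined with utterance_endpoints() above (and assuming times in seconds),
# the speech portion of a channel is channel[int(start * rate):int(end * rate)].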
#
# The data used in the above publication have "rep*" in the text field
# and are repetitive speech. Each phrase is repeated 10-15 times
# in succession.
# Files whose text field is of the form "sent" are long lists
# of randomized sentences. These "sent" files were used,
# along with the "rep*" files, in another publication,
# "Testing the Ecological Validity of Repetitive Speech",
# Greg Kochanski and Christina Orphanidou,
# presented at the 2007 International Congress of
# Phonetic Sciences (ICPhS 2007), 6-10 August 2007.
# It is available on the web at http://kochanski.org/gpk/papers/2007/icphs.pdf ,
# http://ora.ouls.ox.ac.uk/objects/uuid:1999c687-49a0-4808-9a50-2f82ab66d96f ,
# or http://tinyurl.com/3u2ba4 .
#
# Files where the text field equals "fox", "king", or "lucky"
# are longer texts that were not used. They are from
# three books by Dr. Seuss (Geisel).
#
#
m ch ch fox 1 tap %mt ch_fox_tap_pr
m ch ch fox 0 tap %mt ch_fox_tap
m ch ch lucky 1 tap %mt ch_lucky_tap_pr
m ch ch lucky 0 tap %mt ch_lucky_tap
m ch ch king 1 tap %mt ch_king_tap_pr
m ch ch king 0 tap %mt ch_king_tap
m ch ch sent 0 %mt %mt ch_sent
m ch ch fox 0 m 84 ch_fox_m84
m ch ch fox 0 m 88 ch_fox_m88
m ch ch fox 0 m 92 ch_fox_m92

etc