- # fiat 1.0
- #
- # This data (and the data it refers to) is copyright 2007, 2008 by
- # Greg Kochanski, and is licensed in England under
- # the Creative Commons Noncommercial-Attribution License.
- # Details may be found at http://creativecommons.org/licenses/by-nc/2.0/uk/legalcode .
- # You may copy and/or use this file (and referenced files) for noncommercial
- # purposes so long as the author is properly acknowledged.
- # For commercial licensing, contact Isis Innovation,
- # http://www.isis-innovation.com/ .
- # COPYRIGHT = Greg Kochanski
- # LICENSE_URL = http://creativecommons.org/licenses/by-nc/2.0/uk/legalcode
- #
- # This file contains metadata describing the "tick1" experiment
- # from ESRC grant "Articulation and Coarticulation in the Lower Vocal Tract"
- # with G. Kochanski and J. Coleman as principal investigators.
- # Data is courtesy of the UK's Economic and Social Research Council,
- # derived from project RES-000-23-1094, 7/2005 through 3/2008.
- # When using this data, the appropriate publication to reference is
- # DOI: 10.1121/1.2890742, "What Marks the Beat of Speech?"
- # G. Kochanski and C. Orphanidou, Journal of the Acoustical Society of America,
- # ISSN 0001-4966, Volume 123(5), pages 2780-2791.
- #
- # This table is in the FIAT data format, defined originally by
- # http://dls.physics.ucdavis.edu/fiat/fiat.html . Python implementations
- # of modules to read and write this format can be found at
- # http://sourceforge.net under the "speechresearch" project, in the
- # "gmisclib/fiatio.py" file. http://sourceforge.net/projects/speechresearch
- # should lead to the software.
- # The format is simply a tab-separated column format, with escape sequences
- # that begin with percent characters.
- #
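The format description above can be sketched as a small reader. This is a minimal sketch, not a replacement for gmisclib/fiatio.py: it assumes header lines start with '#', that column names come from `TTYPEn = name` header lines as in this file, and that fields are separated by tabs. Only the '%mt' (empty) escape mentioned in this file is decoded; any other percent escapes are passed through untouched. The function name `read_fiat_table` is hypothetical.

```python
import re

# Matches header lines such as "# TTYPE1 = sex".
TTYPE_RE = re.compile(r"^#\s*TTYPE(\d+)\s*=\s*(\S+)\s*$")


def read_fiat_table(lines):
    """Return (column_names, rows) from an iterable of text lines.

    Each row is a dict keyed by column name. Assumption: only the
    '%mt' escape (empty field) is handled here.
    """
    names = {}
    raw_rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            match = TTYPE_RE.match(line)
            if match:
                names[int(match.group(1))] = match.group(2)
            continue
        # Data line: tab-separated fields, '%mt' means empty.
        fields = ["" if f == "%mt" else f for f in line.split("\t")]
        raw_rows.append(fields)
    columns = [names[i] for i in sorted(names)]
    rows = [dict(zip(columns, fields)) for fields in raw_rows]
    return columns, rows
```

Typical use would be `columns, rows = read_fiat_table(open(path))`, where `path` points at this table.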
- # This table contains one line per utterance. It describes each
- # utterance produced in the experiment.
- # Columns are as follows:
- #
- # TTYPE1 = sex
- # Gender of the speaker.
- #
- # TTYPE2 = speakerID
- # A unique identifier for each speaker (experimental subject).
- # These identifiers are the same as the "speakerID" identifiers
- # in "DBsub.fiat", and can be used to look up some additional
- # information about that recording session.
- #
- # TTYPE3 = d
- # Directory which holds that utterance.
- #
- # TTYPE4 = text
- # The unique ID for the text that is spoken.
- # The actual text can be looked up under the
- # same ID in file DBsent.fiat . In that file,
- # the ID is in column "text", and the actual text
- # is in the column named "repence".
- #
- # TTYPE5 = practice
- # Is this practice data or not? Practice data was not used
- # in the published analysis.
- #
- # TTYPE6 = tap_m
- # Is this "tapping"(tap) data or "metronome"(m) data?
- # This column indicates the experimental task. The tapping
- # task required the subject to tap their finger along with
- # the stressed syllables of the text.
- # The metronome task presented the subject with a metronome tick
- # in an earphone, and they were asked to speak the sentences
- # to the beat of the metronome.
- # See the publication above for a more detailed description of
- # the experiment.
- # WARNING: In some of the tapping data, the finger taps are
- # loud enough to be heard in the microphone channel intended
- # for speech. Any analysis on that data would have to either
- # select utterances where the taps are not too loud, remove
- # them via some noise subtraction technique, or be carefully designed
- # not to be affected by the sounds of the taps.
- #
- # TTYPE7 = bpm
- # This is either empty (coded as '%mt') or the metronome rate in
- # beats per minute.
- #
- # TTYPE8 = f
- # This is the final component of the pathname to the data.
- # Relative to the location of this file, each utterance
- # is represented by a directory at d/f.
- # It contains several files of interest:
- # raw.wav -- the original recording, in Microsoft WAV format.
- # It is a two-channel file. One channel contains the
- # recorded speech, and the other channel contains either
- # metronome ticks or an audio channel from a microphone
- # positioned to pick up finger taps. (The subject's finger
- # tapped on a hardcover book about 2cm from the microphone.)
- # The finger tap channel will pick up some speech, but faintly,
- # and the speech channel will pick up some finger tap sounds.
- # However, metronome ticks were coupled in electronically and
- # are completely isolated from the speech channel.
- # ue.lbl -- These are the start and end-points of the speech in the
- # utterance, automatically generated but checked for accuracy
- # by a human. A small amount of silence (probably <100ms)
- # is included within the marked endpoints on either side of the utterance.
- # See the above publication for details.
- # The data files are in a format suitable for reading by
- # the ESPS package Xwaves, and can be read by Wavesurfer
- # (circa 2008). Python 2.5 code for reading them is
- # available on the above Sourceforge site, in the file
- # .../gmisclib/xwaves_lab.py . In brief, the format
- # contains a bunch of header lines of basically useless
- # information, then a line consisting of a single hash mark
- # ('#'), then two relevant lines. The one containing an
- # asterisk in the third field marks the utterance start
- # (the time is in the first field). Likewise, the line
- # containing '%' marks the end.
- # Times are relative to the beginning of the raw.wav files.
- # raw.tap -- This file contains experimental tick or tap events.
- # For the metronome data, it contains the times at which
- # metronome ticks occur. For the "tap" data, if it
- # exists, it lists the times at which the subject's finger
- # tapped to mark a stressed syllable.
- # This is computed from one of the channels of the raw.wav file,
- # but manually checked.
- # This file is in the Xwaves label format, same as ue.lbl.
- # m.dat -- This file contains computed tick or tap locations.
- # It is meaningful only for metronome data, where it simply
- # marks the metronome ticks.
- # Other files are computed from the raw data, and are preserved for convenience.
- # These were used in the "What marks the beat of speech?" paper.
- # These files are in the "GPK ASCII Image" format, and are
- # readable/writable by code in the speechresearch project
- # at http://sourceforge.net , in files gpkio/read.c,
- # gpkio/ascii_read.c, and related code. A Python interface
- # is available in gpk_img_python/gpkimgclass.py
- # and gpk_img_python/gpk_img.cc (and related files).
- # The algorithms used to produce the data below are described in
- # the DOI: 10.1121/1.2890742, "What Marks the Beat of Speech?" publication
- # referenced above.
- #
- # irr.dat -- An irregularity measure that separates voiced speech
- # from unvoiced. It quantifies speech that is not fully voiced.
- # This file is in the "GPK ASCII Image" format, see above.
- # loud.dat -- The perceptual loudness.
- # This file is in the "GPK ASCII Image" format, see above.
- # pdur.dat -- A measure of duration for the current syllable.
- # Essentially, it measures how far one can go (in time)
- # before the spectrum changes substantially.
- # This file is in the "GPK ASCII Image" format, see above.
- # rms.dat -- The RMS (intensity or power).
- # This file is in the "GPK ASCII Image" format, see above.
- # f0.dat -- A standard computation of the speech fundamental frequency.
- # This file is in the "GPK ASCII Image" format, see above.
- # sss.dat -- A measurement of the average slope of the speech spectrum.
- # This file is in the "GPK ASCII Image" format, see above.
- #
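The ue.lbl layout described above (header lines, a line holding a single '#', then the start and end marks) can be read with a few lines of code. This is a minimal sketch, not the gmisclib/xwaves_lab.py reader: it assumes fields are separated by whitespace and that the '%' end marker sits in the third field, just as the '*' start marker does. The function name `read_utterance_endpoints` is hypothetical.

```python
def read_utterance_endpoints(lines):
    """Return (start, end) times from the lines of a ue.lbl file.

    Times are the values in the first field, relative to the
    beginning of raw.wav. Either value is None if its mark is missing.
    """
    in_body = False
    start = None
    end = None
    for line in lines:
        text = line.strip()
        if not in_body:
            # Everything up to the lone '#' line is header.
            in_body = (text == "#")
            continue
        fields = text.split()
        if len(fields) < 3:
            continue
        time = float(fields[0])
        label = fields[2]
        if "*" in label:
            start = time   # utterance start mark
        elif "%" in label:
            end = time     # utterance end mark
    return start, end
```

Subtracting the two values gives the marked duration of the utterance, including the short stretch of silence on either side.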
- # So, for instance, the audio for the utterance in the corpus
- # with d="nh" and f="nh_rep1_m84"
- # is found at nh/nh_rep1_m84/raw.wav . Start and end marks for that
- # utterance are at nh/nh_rep1_m84/ue.lbl , et cetera.
- #
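The d/f lookup in the example above can be written as a small helper. This is a sketch under the assumption that `root` is the directory holding this table; the function name `utterance_files` and the dictionary keys are hypothetical, while the file names are the ones listed under the "f" column.

```python
from pathlib import Path


def utterance_files(root, d, f):
    """Build the paths of the main files for one utterance.

    root is the directory that contains this table; d and f are the
    values of the "d" and "f" columns for the utterance.
    """
    base = Path(root) / d / f
    return {
        "audio": base / "raw.wav",      # two-channel recording
        "endpoints": base / "ue.lbl",   # utterance start and end marks
        "events": base / "raw.tap",     # metronome ticks or finger taps
    }
```

For the utterance with d="nh" and f="nh_rep1_m84", `utterance_files(".", "nh", "nh_rep1_m84")["audio"]` points at nh/nh_rep1_m84/raw.wav.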
- # The data used in the above publication have "rep*" in the text field
- # and are repetitive speech. Each phrase is repeated 10-15 times
- # in succession.
- # Files whose text field is in the form "sent" are long lists
- # of randomized sentences. These "sent" files were used,
- # along with the "rep*" files, in another publication,
- # "Testing the Ecological Validity of Repetitive Speech",
- # Greg Kochanski and Christina Orphanidou,
- # presented at the 2007 International Congress of
- # Phonetic Sciences (ICPhS2007), 6-10 August 2007.
- # It is available on the web at http://kochanski.org/gpk/papers/2007/icphs.pdf,
- # http://ora.ouls.ox.ac.uk/objects/uuid:1999c687-49a0-4808-9a50-2f82ab66d96f ,
- # or http://tinyurl.com/3u2ba4 .
- #
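Selecting the utterances behind the published analysis (text IDs of the form "rep*", practice data excluded) might look like the sketch below. It assumes each row is a dict keyed by the column names of this table, and that the "practice" column holds "1" for practice data and "0" otherwise; that coding is inferred from the data rows, where the "_pr" entries carry a 1, and is not stated in the header. The function name `published_rows` is hypothetical.

```python
from fnmatch import fnmatchcase


def published_rows(rows):
    """Keep non-practice rows whose text ID matches "rep*".

    Assumption: practice == "0" means the row is not practice data.
    """
    return [
        row for row in rows
        if fnmatchcase(row["text"], "rep*") and row["practice"] == "0"
    ]
```

The "sent", "fox", "king", and "lucky" rows are dropped by the pattern, so a different filter is needed to reproduce the ICPhS 2007 selection.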
- # Files where the text field equals "fox", "king", and "lucky"
- # are longer texts that were not used. They are from
- # three books by Dr. Seuss (Geisel).
- #
- #
- m ch ch fox 1 tap %mt ch_fox_tap_pr
- m ch ch fox 0 tap %mt ch_fox_tap
- m ch ch lucky 1 tap %mt ch_lucky_tap_pr
- m ch ch lucky 0 tap %mt ch_lucky_tap
- m ch ch king 1 tap %mt ch_king_tap_pr
- m ch ch king 0 tap %mt ch_king_tap
- m ch ch sent 0 %mt %mt ch_sent
- m ch ch fox 0 m 84 ch_fox_m84
- m ch ch fox 0 m 88 ch_fox_m88
- m ch ch fox 0 m 92 ch_fox_m92
- etc