# fiat 1.0
#
# This data (and the data it refers to) is copyright 2007, 2008 by
# Greg Kochanski, and is licensed in England under
# the Creative Commons Noncommercial-Attribution License.
# Details may be found at http://creativecommons.org/licenses/by-nc/2.0/uk/legalcode .
# You may copy and/or use this file (and referenced files) for noncommercial
# purposes so long as the author is properly acknowledged.
# For commercial licensing, contact Isis Innovation,
# http://www.isis-innovation.com/ .
# COPYRIGHT = Greg Kochanski
# LICENSE_URL = http://creativecommons.org/licenses/by-nc/2.0/uk/legalcode
#
# This file contains metadata describing the "tick1" experiment
# from the ESRC grant "Articulation and Coarticulation in the Lower Vocal Tract"
# with G. Kochanski and J. Coleman as principal investigators.
# Data is courtesy of the UK's Economic and Social Research Council,
# derived from project RES-000-23-1094, 7/2005 through 3/2008.
# When using this data, the appropriate publication to reference is
# DOI: 10.1121/1.2890742, "What Marks the Beat of Speech?",
# G. Kochanski and C. Orphanidou, Journal of the Acoustical Society of America,
# ISSN 0001-4966, Volume 123(5), pages 2780-2791.
#
# This table is in the FIAT data format, defined originally by
# http://dls.physics.ucdavis.edu/fiat/fiat.html . Python implementations
# of modules to read and write this format can be found at
# http://sourceforge.net under the "speechresearch" project, in the
# "gmisclib/fiatio.py" file. http://sourceforge.net/projects/speechresearch
# should lead to the software.
# The format is simply tab-separated columns, with escape sequences
# that begin with percent characters.
#
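# The following is an illustration, not part of the original format definition:
# a minimal sketch of a reader for this table, written only against the
# description above. It assumes columns separated by tabs, '#' comment lines,
# column names taken from the "# TTYPEn = name" header lines, and '%mt'
# standing for an empty field; other percent escapes are not handled. The
# reference implementation is gmisclib/fiatio.py in the speechresearch project.
#
#   import re
#
#   def read_fiat(path):
#       """Read a FIAT table; return a list of {column-name: value} dicts."""
#       columns, rows = [], []
#       ttype = re.compile(r'#\s*TTYPE\d+\s*=\s*(\S+)')
#       with open(path) as fp:
#           for raw in fp:
#               line = raw.rstrip('\n')
#               if line.startswith('#'):
#                   m = ttype.match(line)
#                   if m:
#                       columns.append(m.group(1))   # column names, in order
#                   continue
#               if not line.strip():
#                   continue
#               # '%mt' codes an empty field; data fields are tab-separated.
#               fields = [None if f == '%mt' else f for f in line.split('\t')]
#               rows.append(dict(zip(columns, fields)))
#       return rows
#
# With rows = read_fiat() applied to this file, the metronome utterances can
# be selected with, e.g., [r for r in rows if r['tap_m'] == 'm'].
#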
# This table contains one line for each utterance produced in the experiment.
# Columns are as follows:
#
# TTYPE1 = sex
# Gender of the speaker.
#
# TTYPE2 = speakerID
# A unique identifier for each speaker (experimental subject).
# These identifiers are the same as the "speakerID" identifiers
# in "DBsub.fiat", and can be used to look up some additional
# information about that recording session.
#
# TTYPE3 = d
# Directory which holds that utterance.
#
# TTYPE4 = text
# The unique ID for the text that is spoken.
# The actual text can be looked up under the
# same ID in file DBsent.fiat . In that file,
# the ID is in column "text", and the actual text
# is in the column named "repence".
#
# TTYPE5 = practice
# Is this practice data or not? Practice data was not used
# in the published analysis.
#
# TTYPE6 = tap_m
# Is this "tapping" (tap) data or "metronome" (m) data?
# This column indicates the experimental task. The tapping
# task required the subject to tap their finger along with
# the stressed syllables of the text.
# The metronome task presented the subject with a metronome tick
# in an earphone, and they were asked to speak the sentences
# to the beat of the metronome.
# See the publication above for a more detailed description of
# the experiment.
# WARNING: In some of the tapping data, the finger taps are
# loud enough to be heard in the microphone channel intended
# for speech. Any analysis of that data would have to select
# utterances where the taps are not too loud, remove the taps
# via some noise-subtraction technique, or be carefully designed
# so as not to be affected by the sounds of the taps.
#
# TTYPE7 = bpm
# This is either empty (coded as '%mt') or the metronome rate in
# beats per minute.
#
# TTYPE8 = f
# This is the final component of the pathname to the data.
# Relative to the location of this file, each utterance
# is represented by a directory at d/f.
# It contains several files of interest:
# raw.wav -- the original recording, in Microsoft WAV format.
# It is a two-channel file. One channel contains the
# recorded speech; the other contains either
# metronome ticks or audio from a microphone
# positioned to pick up finger taps. (The subject's finger
# tapped on a hardcover book about 2 cm from the microphone.)
# The finger-tap channel will pick up some speech, but faintly,
# and the speech channel will pick up some finger-tap sounds.
# However, metronome ticks were coupled in electronically and
# are completely isolated from the speech channel.
# ue.lbl -- These are the start and end points of the speech in the
# utterance, automatically generated but checked for accuracy
# by a human. A small amount of silence (probably <100 ms)
# is included within the marked endpoints on either side of the utterance.
# See the above publication for details.
# The data files are in a format suitable for reading by
# the ESPS package Xwaves, and can be read by Wavesurfer
# (circa 2008). Python 2.5 code for reading them is
# available on the above Sourceforge site, in the file
# .../gmisclib/xwaves_lab.py . In brief, the format
# contains a number of header lines of basically useless
# information, then a line consisting of a single hash mark
# ('#'), then two relevant lines. The one containing an
# asterisk in the third field marks the utterance start
# (the time is in the first field). Likewise, the line
# containing '%' marks the end.
# Times are relative to the beginning of the raw.wav files.
# (A parsing sketch appears after the m.dat entry below.)
# raw.tap -- This file contains experimental tick or tap events.
# For the metronome data, it contains the times at which
# metronome ticks occur. For the tapping data, if it
# exists, it lists the times at which the subject's finger
# tapped to mark a stressed syllable.
# This is computed from one of the channels of the raw.wav file,
# but manually checked.
# This file is in the Xwaves label format, the same as ue.lbl.
# m.dat -- This file contains computed tick or tap locations.
# It is meaningful only for metronome data, where it simply
# marks the metronome ticks.
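#
# The following is an illustration, not part of the original header: a sketch
# of reading ue.lbl or raw.tap, using nothing beyond the format description
# above -- header lines, then a line that is just '#', then label lines whose
# first field is a time (presumably in seconds, relative to the start of
# raw.wav) and whose third field is the label ('*' for the utterance start and
# '%' for the end in ue.lbl). The reference reader is gmisclib/xwaves_lab.py
# on the Sourceforge site mentioned above.
#
#   def read_xwaves_labels(path):
#       """Return a list of (time, label) pairs from an Xwaves label file."""
#       events, in_body = [], False
#       with open(path) as fp:
#           for line in fp:
#               if not in_body:
#                   in_body = (line.strip() == '#')   # header ends at the '#' line
#                   continue
#               fields = line.split()
#               if len(fields) >= 3:
#                   events.append((float(fields[0]), fields[2]))
#       return events
#
#   def utterance_endpoints(lbl_path):
#       """Start and end times of the speech, taken from a ue.lbl file."""
#       times = dict((label, t) for (t, label) in read_xwaves_labels(lbl_path))
#       return times['*'], times['%']
#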
# Other files are computed from the raw data, and are preserved for
# convenience; they were used in the "What marks the beat of speech?" paper.
# These files are in the "GPK ASCII Image" format, and are
# readable/writable by code in the speechresearch project
# at http://sourceforge.net , in the files gpkio/read.c,
# gpkio/ascii_read.c, and related code. A Python interface
# is available in gpk_img_python/gpkimgclass.py
# and gpk_img_python/gpk_img.cc (and related files).
# The algorithms used to produce the data below are described in
# the publication referenced above (DOI: 10.1121/1.2890742,
# "What Marks the Beat of Speech?").
#
# irr.dat -- An irregularity measure that separates voiced speech
# from unvoiced. It quantifies speech that is not fully voiced.
# This file is in the "GPK ASCII Image" format; see above.
# loud.dat -- The perceptual loudness.
# This file is in the "GPK ASCII Image" format; see above.
# pdur.dat -- A measure of duration for the current syllable.
# Essentially, it measures how far one can go (in time)
# before the spectrum changes substantially.
# This file is in the "GPK ASCII Image" format; see above.
# rms.dat -- The RMS (intensity or power).
# This file is in the "GPK ASCII Image" format; see above.
# f0.dat -- A standard computation of the speech fundamental frequency.
# This file is in the "GPK ASCII Image" format; see above.
# sss.dat -- A measurement of the average slope of the speech spectrum.
# This file is in the "GPK ASCII Image" format; see above.
#
# So, for instance, the audio for the utterance in the corpus
# with d="nh" and f="nh_rep1_m84"
# is found at nh/nh_rep1_m84/raw.wav . Start and end marks for that
# utterance are at nh/nh_rep1_m84/ue.lbl , et cetera.
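#
# The following is an illustration, not part of the original header: a sketch
# of locating one utterance's audio from a table row and splitting the two
# channels of raw.wav, using the standard-library wave and array modules.
# The read_fiat() helper from the earlier sketch is assumed, as is a
# corpus_root variable naming the directory that contains this file. Which
# channel holds the speech and which holds the ticks/taps is not specified
# here, so inspect before relying on the channel order; samples are assumed
# to be 16-bit and are read in native byte order.
#
#   import os
#   import wave
#   import array
#
#   def load_utterance_audio(corpus_root, row):
#       """Return (sample_rate, channel_0, channel_1) for one table row."""
#       path = os.path.join(corpus_root, row['d'], row['f'], 'raw.wav')
#       with wave.open(path, 'rb') as w:
#           assert w.getnchannels() == 2 and w.getsampwidth() == 2
#           rate = w.getframerate()
#           samples = array.array('h', w.readframes(w.getnframes()))
#       # Interleaved stereo: even indices are one channel, odd the other.
#       return rate, samples[0::2], samples[1::2]
#
# Combined with utterance_endpoints() above (and assuming times in seconds),
# the speech portion of a channel is channel[int(start * rate):int(end * rate)].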
#
# The data used in the above publication have "rep*" in the text field
# and are repetitive speech. Each phrase is repeated 10-15 times
# in succession.
# Files whose text field is of the form "sent" are long lists
# of randomized sentences. These "sent" files were used,
# along with the "rep*" files, in another publication,
# "Testing the Ecological Validity of Repetitive Speech",
# Greg Kochanski and Christina Orphanidou,
# presented at the 2007 International Congress of
# Phonetic Sciences (ICPhS 2007), 6-10 August 2007.
# It is available on the web at http://kochanski.org/gpk/papers/2007/icphs.pdf ,
# http://ora.ouls.ox.ac.uk/objects/uuid:1999c687-49a0-4808-9a50-2f82ab66d96f ,
# or http://tinyurl.com/3u2ba4 .
#
# Files where the text field equals "fox", "king", or "lucky"
# are longer texts that were not used. They are from
# three books by Dr. Seuss (Geisel).
#
#
m ch ch fox 1 tap %mt ch_fox_tap_pr
m ch ch fox 0 tap %mt ch_fox_tap
m ch ch lucky 1 tap %mt ch_lucky_tap_pr
m ch ch lucky 0 tap %mt ch_lucky_tap
m ch ch king 1 tap %mt ch_king_tap_pr
m ch ch king 0 tap %mt ch_king_tap
m ch ch sent 0 %mt %mt ch_sent
m ch ch fox 0 m 84 ch_fox_m84
m ch ch fox 0 m 88 ch_fox_m88
m ch ch fox 0 m 92 ch_fox_m92

etc