Untitled

a guest
Nov 17th, 2019
meta_data = """1    1   X   CH  13  Species code
2-3 2   AA  CH  4   Breed of evaluation
Bull's identification information
4-5 2   AA  CH  4   Breed code (alpha code only, no zeros)
6-8 3   AAA CH  119 Country code of ID origin
9-20    12  XX...XX CH      Identification number (registration or eartag)
Sire's identification information
21-22   2   AA  CH  4   Breed code (registration or eartag)
23-25   3   AAA CH  119 Country code of ID origin
26-37   12  XX...XX CH      Identification number (registration or eartag)
Dam's identification information
38-39   2   AA  CH  4   Breed code (alpha code only, no zeros)
40-42   3   AAA CH  119 Country code of ID origin
43-54   12  XX...XX CH      Identification number (registration or eartag)
Maternal grandsire's identification information
55-56   2   AA  CH  4   Breed code (registration or eartag)
57-59   3   AAA CH  119 Country code of ID origin
60-71   12  XX...XX CH      Identification number (registration or eartag)
Bull's dual registration identification information
72-73   2   AA  CH  4 95    Breed code (alpha code only, no zeros)
74-76   3   AAA CH  119 95  Country code of ID origin
77-88   12  XX...XX CH  95  Identification number (registration or eartag)
89-96   8   XX...XX CH  88  Birth date (YYYYMMDD)
97-98   2   XX  CH  130 145 Registry status code
99-128  30  AA...AA CH      Registered name
129-148 20  AA...AA CH  95 145  Short AI name
149-154 6   XXXXXX  CH  83 95 145   Date the bull entered AI (YYYYMM)
155 1   A   CH  87 95 145   Sampling status
156-159 4   XXXX    CH  87 95 145   Sampling controller number
160 1   A   CH  35 94 95 145    Current status code
161-164 4   XXXX    CH  95 145  NAAB bull controller number
165 1   X   CH  95 145  Number of uniform NAAB sire codes assigned in following positions as well as columns 548-567
166-175 10  XXXAAXXXXX  CH  95 145  Primary NAAB code
176-205 30  XXXAAXXXXX  CH  95 145  Secondary NAAB codes (up to 3 additional codes)
Herd with most daughters
206-207 2   XX  CH  5 145   State code
208-209 2   XX  CH  145 County code
210-213 4   XXXX    CH  145 Herd number
214-217 4   XXXX    CH  33 95 145   Number of daughters in herd with most daughters
218-219 2   XX  CH  5 33 145    State with most daughters
220-221 2   XX  CH  33 95 145   Age at first calving (months)
222-224 3   XXX CH  145 Percent daughters with 1st lactation records from management plans (Type of test ≥ 40)
225-227 3   XX.X    CH  145 Inbreeding coefficient of this bull (%)
228-230 3   XX.X    CH  145 Average daughter inbreeding percent
231-233 3   XX.X    CH  145 Expected future inbreeding (%) (EFI)
234-235 2   XX  CH      Reliability of yield (avg. of protein and milk-fat (MF) reliabilities, weighted by current component prices)
236-237 2   XX  CH  33 95   Reliability of PTA daughter pregnancy rate (DPR)
238-242 5   +/-XXXX CSL 33  PTA milk
243-244 2   XX  CH  33  Reliability of PTA MF
245-248 4   +/-XXX  CSL 33 95   PTA fat
249-251 3   +/-.XX  CSL 33  PTA fat percentage
252-253 2   XX  CH  33  Reliability of PTA protein
254-257 4   +/-XXX  CSL 33 95   PTA protein
258-260 3   +/-.XX  CSL 33  PTA protein percent
261-262 2   XX  CH  95  Reliability of PTA productive life (PL)
263-265 3   +/-X.X  CSL 95  PTA PL
266-267 2   XX  CH  95  Reliability of PTA somatic cell score (SCS)
268-270 3   X.XX    CH  95  PTA SCS
271-272 2   XX  CH      Reliability of net merit dollars (NM$)
273-277 5   +/-XXXX CSL 33  Fluid merit dollars (FM$)
278-282 5   +/-XXXX CSL 33  NM$
283-287 5   +/-XXXX CSL 33  Cheese merit dollars (CM$)
288-289 2   AA  CH  39  Net merit percentile
290-292 3   +/-X.X  CSL 95  PTA DPR
293 1   A   CH  156 Interbull usability code for DPR
294-296 3   XXX CH  39 145  Average number of DIM for first-lactation daughters (MF)
297-299 3   XXX CH  39 145  Average number of DIM for first-lactation daughters (protein)
300-302 3   X.XX    CH  39 145  Average age weight of daughters for PL evaluation
303-305 3   XXX CH  149 Pedigree completeness %
306-308 3   XXX CH  39 145  Percent of daughter first-lactation records that are in progress (MF)
309-311 3   XXX CH  39 145  Percent of daughter first-lactation records that are in progress (protein)
312-316 5   XXXXX   CH  33 95   Number of herds (DPR)
317-321 5   XXXXX   CH  33  Number of herds (MF)
322-326 5   XXXXX   CH  33  Number of herds (protein)
327-331 5   XXXXX   CH  33 95 145   Number of herds (PL)
332-336 5   XXXXX   CH  33 95   Number of herds (SCS)
337-341 5   XXXXX   CH  33 95   Number of daughters (DPR)
342-346 5   XXXXX   CH  33  Number of daughters (MF)
347-351 5   XXXXX   CH  33 41   Number of daughters (protein)
352-356 5   XXXXX   CH  95 145  Number of daughters (PL)
357-361 5   XXXXX   CH  95  Number of daughters (SCS)
362 1   X   CH  136 Interbull usability code for SCS
363 1   X   CH  150 Interbull preferred ID code/Clonal evaluation source code
364 1   X   CH  152 Interbull usability code for PL
365-367 3   X.XX    CH  33 145  Average number of lactations per daughter (MF)
368-370 3   X.XX    CH  33 145  Average number of lactations per daughter (protein)
371-373 3   XXX CH      Heterosis coefficient
374-376 3   XXX CH  33 145  Average number of lactations in daughter management group (MF)
377-379 3   XXX CH  33 145  Average number of lactations in daughter management group (protein)
380-381 2   AA  CH  4   Predominate breed for crossbred animals
382-384 3   XX.X    CH  33 95 145   Average standardized DPR
385-389 5   XXXXX   CH  33 145  Average standardized milk
390-393 4   XXXX    CH  33 145  Average standardized fat yield
394-395 2   X.X CH  33 145  Average standardized fat percent
396-400 5   XXXXX   CH  33 145  Average standardized milk (protein)
401-404 4   XXXX    CH  145 Average standardized protein yield
405-406 2   X.X CH  33 145  Average standardized protein percent
407-409 3   XX.X    CH  33 95 145   Average PL of daughters
410-412 3   X.XX    CH  95 145  Average standardized SCS
413-414 2   XX  CH      Number of countries in evaluation
415-417 3   AAA CH  119 Country with most daughters
418-422 5   +/-XXXX CSL 33 145  Daughter yield deviation (DYD) milk PTA milk change (interim summary)
423-426 4   +/-XXX  CSL 33 95 145   DYD fat PTA fat change (interim summary)
427-429 3   +/-.XX  CSL 33 95 145   DYD fat percent
430-434 5   +/-XXXX CSL 33 145  DYD milk (protein)
435-438 4   +/-XXX  CSL 33 145  DYD protein PTA protein change (interim summary)
439-441 3   +/-.XX  CSL 33 95 145   DYD protein percent
442-445 4   +/-XX.X CSL 33 95 145   Daughter deviation for PL
446-449 4   +/-X.XX CH  95 145  Daughter deviation for SCS
450-451 2   XX  CH  154 Percentage of predominate breed for crossbred animals
452-456 5   +/-XXXX CSL 145 Parent average (PA) milk
457-458 2   XX  CH  145 Reliability of PA (MF)
459-462 4   +/-XXX  CSL 95 145  PA fat
463-464 2   XX  CH  145 Reliability of PA (protein)
465-468 4   +/-XXX  CSL 95 145  PA protein
469-470 2   XX  CH  95 145  Reliability of PA (PL)
471-473 3   +/-X.X  CSL 95 145  PA PL
474-475 2   XX  CH  95 145  Reliability of PA (SCS)
476-478 3   X.XX    CH  95 145  PA SCS
479-481 3   XXX CH      Percent of daughters in the US - Genomic bulls with no daughters are reported as 100% US
482 1   X   CH  125 Interbull usability code for yield
483-484 2   AA  CH  145 Herdbook identifier [North American (NA) or international (I-blank)]
485 1   A   CH      Evaluation restriction code (for CDCB and NAAB use only)
486-500 15  00...00 CH      Zeroes: Available for future use
501-504 4   +/-XX.X CSL 95  Daughter deviation for DPR
505-507 3   +/-X.X  CSL 95  PA DPR
508-509 2   XX  CH  95  Reliability of PA DPR
510-514 5   +/-XXXX CSL     PA NM$
515-516 2   XX  CH  95  Reliability of PA NM$
517-520 4   +/-XX.X CSL 95  Sire conception rate (SCR)
521-522 2   XX  CH      Reliability of SCR
523-529 7   XX...XX CH  95  Number of breedings for SCR
Red and White or clonal evaluation source
identification information
530-531 2   AA  CH  4   Breed code (alpha code only, no zeros)
532-534 3   AAA CH  119 Country code of ID origin
535-546 12  XX...XX CH      Identification number (registration or eartag)
547 1   X   CH  160 Genomic indicator code
548-567 20  XXXAAXXXXX  CH  95 145  Continuation of secondary NAAB codes (two codes)
Heifer conception rate (HCR) information
568-571 4   +/-XX.X CSL     PTA HCR
572-573 2   XX  CH      Reliability of PTA HCR
574-578 5   XXXXX   CH      Number of herds (HCR)
579-584 6   XX...XX CH      Number of daughters (HCR)
585 1   A   CH      Interbull usability code for HCR (0 domestic and official, 2 Interbull and official)
Cow conception rate (CCR) information
586-589 4   +/-XX.X CSL     PTA CCR
590-591 2   XX  CH      Reliability of PTA CCR
592-596 5   XXXXX   CH      Number of herds (CCR)
597-602 6   XX...XX CH      Number of daughters (CCR)
603 1   A   CH      Interbull usability code for CCR (0 domestic and official, 2 Interbull and official)
604-607 4   +/-XX.X CSL     PA HCR
608-609 2   XX  CH      Reliability of PA HCR
610-613 4   +/-XX.X CSL     PA CCR
614-615 2   XX  CH      Reliability of PA CCR
616-617 2   XX  CH  162 Type of chip
618-621 4   +/-XX.X CSL     Genomic inbreeding coefficient of this bull (%)
622-625 4   +/-XX.X CSL     Genomic future inbreeding coefficient of this bull (%)
626-630 5   +/-XXXX CSL 33  Grazing Merit dollars (GM$)
Livability information (introduced August 2016)
631-634 4   +/-XX.X CSL 33  PTA livability
635-636 2   XX  CH  33  Reliability of PTA livability
637-641 5   XXXXX   CH  33  Number of herds (livability)
642-647 6   XXXXXX  CH  33  Number of daughters (livability)
648-651 4   +/-XX.X CSL 145 Parent average (livability)
652-653 2   XX  CH  145 Reliability of PA (livability)
Gestation Length information (introduced August 2017)
654-656 3   +/-X.X  CSL 33  PTA Gestation Length
657-658 2   XX  CH  33  Reliability of PTA Gestation Length
659-663 5   XXXXX   CH  33  Number of herds (Gestation Length)
664-669 6   XXXXXX  CH  33  Number of daughters (Gestation Length)
670-672 3   +/-X.X  CSL 145 Parent average (Gestation Length)
673-674 2   XX  CH  145 Reliability of PA (Gestation Length)
Milk fever information (introduced April 2018)
675-678 4   +-XX.X  CSL 33  PTA Milk Fever
679-680 2   XX  CH  33  Reliability of PTA Milk Fever
681-685 5   XXXXX   CH  33  Number of herds (Milk Fever)
686-691 6   XXXXXX  CH  33  Number of daughters (Milk Fever)
692-695 4   +-XX.X  CSL 145 Parent average (Milk Fever)
696-697 2   XX  CH  145 Reliability of PA (Milk Fever)
Displaced abomasum information (introduced April 2018)
698-701 4   +-XX.X  CSL 33  PTA Displaced abomasum
702-703 2   XX  CH  33  Reliability of PTA Displaced abomasum
704-708 5   XXXXX   CH  33  Number of herds (Displaced abomasum)
709-714 6   XXXXXX  CH  33  Number of daughters (Displaced abomasum)
715-718 4   +-XX.X  CSL 145 Parent average (Displaced abomasum)
719-720 2   XX  CH  145 Reliability of PA (Displaced abomasum)
Ketosis information (introduced April 2018)
721-724 4   +-XX.X  CSL 33  PTA Ketosis
725-726 2   XX  CH  33  Reliability of PTA Ketosis
727-731 5   XXXXX   CH  33  Number of herds (Ketosis)
732-737 6   XXXXXX  CH  33  Number of daughters (Ketosis)
738-741 4   +-XX.X  CSL 145 Parent average (Ketosis)
742-743 2   XX  CH  145 Reliability of PA (Ketosis)
Mastitis information (introduced April 2018)
744-747 4   +-XX.X  CSL 33  PTA Mastitis
748-749 2   XX  CH  33  Reliability of PTA Mastitis
750-754 5   XXXXX   CH  33  Number of herds (Mastitis)
755-760 6   XXXXXX  CH  33  Number of daughters (Mastitis)
761-764 4   +-XX.X  CSL 145 Parent average (Mastitis)
765-766 2   XX  CH  145 Reliability of PA (Mastitis)
Metritis information (introduced April 2018)
767-770 4   +-XX.X  CSL 33  PTA Metritis
771-772 2   XX  CH  33  Reliability of PTA Metritis
773-777 5   XXXXX   CH  33  Number of herds (Metritis)
778-783 6   XXXXXX  CH  33  Number of daughters (Metritis)
784-787 4   +-XX.X  CSL 145 Parent average (Metritis)
788-789 2   XX  CH  145 Reliability of PA (Metritis)
Retained placenta information (introduced April 2018)
790-793 4   +-XX.X  CSL 33  PTA Retained placenta
794-795 2   XX  CH  33  Reliability of PTA Retained placenta
796-800 5   XXXXX   CH  33  Number of herds (Retained placenta)
801-806 6   XXXXXX  CH  33  Number of daughters (Retained placenta)
807-810 4   +-XX.X  CSL 145 Parent average (Retained placenta)
811-812 2   XX  CH  145 Reliability of PA (Retained placenta)
Early First Calving (introduced April 2019)
813-816 4   +-XX.X  CSL 33  PTA Early First Calving (introduced April 2019)
817-818 2   XX  CH  33  Reliability of PTA Early First Calving (introduced April 2019)
819-823 5   XXXXX   CH  33  Number of herds (Early First Calving) (introduced April 2019)
824-829 6   XXXXXX  CH  33  Number of daughters (Early First Calving) (introduced April 2019)
830-833 4   +-XX.X  CSL 145 Parent average (Early First Calving) (introduced April 2019)
834-835 2   XX  CH  145 Reliability of PA (Early First Calving) (introduced April 2019)"""

# Split the layout into lines
meta_data = meta_data.split('\n')

# Split each line on tabs, giving a list of lists:
# [[obs11, obs12, ...], [obs21, obs22, ...], ...]
# Note: this only works if the columns in the layout above are tab-separated.
meta_data = [line.split('\t') for line in meta_data]

# Keep only the rows that describe a field (six columns). This drops the
# section-label rows such as "Bull's identification information".
meta_data = [row for row in meta_data if len(row) == 6]

# The field name is the last element of each row
fields = [row[-1] for row in meta_data]

# The first element of each row is the 1-based, inclusive byte range,
# e.g. "819-823".split('-') => ["819", "823"]
pos = [row[0].split('-') for row in meta_data]

for i in range(len(pos)):
    # Single-byte fields list only one number
    if len(pos[i]) == 1:
        # Repeat it so that start == end, e.g. 2*["8"] => ["8", "8"]
        pos[i] = 2*pos[i]
    # Everything is still a string, so convert to ints
    pos[i] = list(map(int, pos[i]))
    # Python slices are 0-indexed and exclude the end, so the 1-based
    # inclusive range [start, end] becomes the slice [start - 1, end)
    pos[i][0] = pos[i][0] - 1

def parse_line(line):
    """Return the stripped value of every field in one fixed-width record."""
    return [line[start:end].strip() for start, end in pos]
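
The parsing steps above can be tried end to end on a small excerpt. What follows is a minimal sketch, not the author's code: it assumes the layout columns are tab-separated, and the three-field `layout` excerpt, the made-up `record` string, and the helper names `build_slices` and `parse_record` are all hypothetical, introduced only to show how each parsed value can be paired with its field name.

```python
# Hypothetical excerpt of the layout, written with explicit tab separators.
layout = (
    "1\t1\tX\tCH\t13\tSpecies code\n"
    "2-3\t2\tAA\tCH\t4\tBreed of evaluation\n"
    "Bull's identification information\n"
    "4-5\t2\tAA\tCH\t4\tBreed code (alpha code only, no zeros)\n"
)


def build_slices(layout_text):
    """Return (field names, 0-based half-open slices) from a layout string."""
    rows = [line.split("\t") for line in layout_text.split("\n")]
    rows = [row for row in rows if len(row) == 6]  # drop label/blank rows
    names = [row[-1] for row in rows]
    slices = []
    for row in rows:
        bounds = [int(n) for n in row[0].split("-")]
        start, end = bounds[0], bounds[-1]  # single-byte field: start == end
        slices.append((start - 1, end))     # 1-based inclusive -> 0-based slice
    return names, slices


def parse_record(line, names, slices):
    """Pair each field name with its stripped value from one record."""
    return [(name, line[s:e].strip()) for name, (s, e) in zip(names, slices)]


names, slices = build_slices(layout)
record = "0HOHO"  # made-up record, not real evaluation data
parsed = parse_record(record, names, slices)
for name, value in parsed:
    print(f"{name}: {value}")
```

The sketch returns a list of (name, value) pairs rather than a dict because the full layout repeats some descriptions (for example "Country code of ID origin"), so a dict keyed by description alone would silently overwrite earlier fields.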