Regional_Push

The one that can properly pull all the second lines and separate the first and the third column

Feb 14th, 2021
# This question is based on a real-life problem and a common situation in data analysis.
# You have a folder (directory) of fasta (DNA sequence) data files here.
# (Note that the folder is a compressed zip file that you need to uncompress.)
# You also have a master spreadsheet (header_changes.csv) here.
# Each fasta file has a header line, but these need to be changed.
# Each row in header_changes.csv pertains to a file, and the first item in that row is the file name.
# Then a comma, then what should be the current header
# (you might want to check that it is correct, and even deliberately insert one with the wrong header to make sure your code can detect that!).
# The third item is the new header name.
# So your job is to replace the headers in all files correctly.
# You might want to make sure that instead of writing over the old files, you save the new files in a new directory.
# You can solve this in Python, bash, or any combination of the two.
# You must include some documentation so that anyone else can run your code easily and understand what is going on.

import os

# Folder that holds the uncompressed fasta files; change this to match your own machine.
FASTA_DIR = 'C:\\Users\\dhaka\\OneDrive\\Desktop\\Semester material\\Data Skills class\\All Homework\\two\\10\\pauls_dna_seqs'


def main():
    # Read the spreadsheet once. A file object can only be iterated a single time,
    # so the rows are stored in a list before the columns are pulled apart.
    with open('header_changes.csv', 'r') as header_changes_knife:
        rows = [line.strip().split(',') for line in header_changes_knife if line.strip()]
    firstColumn = [row[0] for row in rows]  # list of the first column (file names)
    third_column_read = [row[2] for row in rows]  # list of the third column (new headers); not used yet
    my_fasta_opener(firstColumn)  # reads and opens the fasta files named in the first column


def my_fasta_opener(my_list):
    # Print the second line (the first line after the header) of every listed fasta file.
    for my_file in my_list:
        with open(os.path.join(FASTA_DIR, my_file)) as file_open:
            file_open.readline()  # skip the header line
            second_line = file_open.readline()
            print(second_line)


if __name__ == '__main__':
    main()
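The script above only prints the second line of each file. The remaining steps in the task description (check the current header, swap in the new one, save to a new directory) could be sketched roughly as follows. This is a minimal sketch, not a definitive solution: the function names `load_header_changes` and `replace_headers` are hypothetical, and it assumes that header_changes.csv has exactly three comma-separated items per row, that headers in the spreadsheet may or may not carry the leading `>`, and that files missing from the folder should be skipped rather than raise an error.

```python
import csv
import os


def load_header_changes(csv_path):
    """Return {file_name: (expected_old_header, new_header)} read from the CSV."""
    changes = {}
    with open(csv_path, newline='') as handle:
        for row in csv.reader(handle):
            if len(row) < 3:
                continue  # skip blank or malformed rows
            file_name, old_header, new_header = (item.strip() for item in row[:3])
            changes[file_name] = (old_header, new_header)
    return changes


def replace_headers(csv_path, fasta_dir, out_dir):
    """Write re-headered copies into out_dir and return the names of skipped files."""
    os.makedirs(out_dir, exist_ok=True)  # the original files are never overwritten
    skipped = []
    for file_name, (old_header, new_header) in load_header_changes(csv_path).items():
        source_path = os.path.join(fasta_dir, file_name)
        if not os.path.isfile(source_path):
            skipped.append(file_name)  # assumption: a missing file is skipped, not fatal
            continue
        with open(source_path) as source:
            current_header = source.readline().strip()
            sequence = source.read()  # everything after the header line
        # Assumption: compare headers without the leading '>' so either style matches.
        if current_header.lstrip('>').strip() != old_header.lstrip('>').strip():
            skipped.append(file_name)  # header on disk does not match the spreadsheet
            continue
        with open(os.path.join(out_dir, file_name), 'w') as target:
            target.write('>' + new_header.lstrip('>').strip() + '\n')
            target.write(sequence)
    return skipped
```

Calling `replace_headers('header_changes.csv', 'pauls_dna_seqs', 'renamed_seqs')` (the output folder name is made up here) would leave the originals untouched and return the list of files whose header did not match or that could not be found, which covers the "deliberately insert a wrong header" check from the task.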