- $cat file_1
- name_of_sequence C other_information
- name_of_sequence C other_information
- name_of_sequence C other_information
- name_of_sequence D other_information
- ...
- $cat file_2
- name_of_sequence B other_information
- name_of_sequence C other_information
- name_of_sequence C other_information
- name_of_sequence C other_information
- ...
- $cat file_3
- name_of_sequence A other_information
- name_of_sequence A other_information
- name_of_sequence A other_information
- name_of_sequence A other_information
- ...
- Desired .csv/.tsv output:
- taxa A B C D E
- File_1 0 0 3 8 1
- File_2 0 1 3 2 9
- File_3 7 3 9 0 6
- import sys
- import csv
- names = set()   # every taxon (second column) seen in any input file
- files = {}      # file name -> {taxon: count}
- for file_name in sys.argv[1:]:
-     b = files.setdefault(file_name, {})
-     with open(file_name) as fp:
-         for line in fp:
-             x = line.split()
-             if len(x) < 2:
-                 continue  # skip blank or malformed lines
-             names.add(x[1])
-             b[x[1]] = b.get(x[1], 0) + 1
- names = sorted(names)
- # Header row first, then one row of counts per input file.
- grid = [['taxa'] + names]
- for file_name in sys.argv[1:]:
-     data = files[file_name]
-     grid.append([file_name] + [data.get(name, 0) for name in names])
- # newline='' is what the csv module expects for files it writes.
- with open('out.csv', 'w', newline='') as fp:
-     writer = csv.writer(fp)
-     writer.writerows(grid)
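The same counting could also be written with `collections.Counter`. The following is only a minimal sketch, not the original script: the helper names `count_taxa` and `build_grid` and the in-memory `sample` data are made up for illustration, and it assumes the taxon is always the second whitespace-separated column, as in the sample files above.

```python
from collections import Counter


def count_taxa(lines):
    """Count the second whitespace-separated column of each usable line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts[fields[1]] += 1
    return counts


def build_grid(per_file):
    """Turn {label: Counter} into rows: a header, then one row per label."""
    names = sorted(set().union(*per_file.values()))
    grid = [["taxa"] + names]
    for label, counts in per_file.items():
        # A Counter returns 0 for taxa that never occurred in this file.
        grid.append([label] + [counts[name] for name in names])
    return grid


# Hypothetical in-memory stand-ins for the input files.
sample = {
    "file_1": ["seq1 C info", "seq2 C info", "seq3 C info", "seq4 D info"],
    "file_2": ["seq5 B info", "seq6 C info"],
}
grid = build_grid({label: count_taxa(lines) for label, lines in sample.items()})
for row in grid:
    print(row)
```

With real files, `count_taxa` can be handed an open file object directly, since iterating a file yields its lines, and the resulting `grid` can be passed to `csv.writer(...).writerows(grid)` as in the script above.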