filter_eng_cn

a guest
Mar 10th, 2021
"""
1) read and parse the tab-separated csv
2) keep English-only assertions with edge weight = 1
3) save into json format
{"
"""
import csv
import json


def filter_edges(row):
    """Keep rows whose JSON metadata (last column) has a weight of exactly 1."""
    return json.loads(row[-1])["weight"] == 1


def filter_english(row):
    """Keep rows whose source (column 2) and target (column 3) both contain '/en/'."""
    return "/en/" in row[2] and "/en/" in row[3]


def find_start_end(uri):
    """Return slice bounds for the term that follows '/en/' in a node URI."""
    start = uri.find('/en/') + len('/en/')
    end = uri.find('/', start)
    if end == -1:
        # No further segment after the term: slice to the end of the string.
        end = None
    return start, end


def filter_json(filename='assertions.csv'):
    """Write the filtered assertions to JSON, grouped by relation."""
    output_dict = dict()
    # Open the file that was passed in, not a hard-coded path.
    # UTF-8 is an assumption about the input encoding.
    with open(filename, encoding='utf-8', newline='') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter='\t')
        for row in csv_reader:
            # Cheap string test first, so JSON is parsed only for English rows.
            if filter_english(row) and filter_edges(row):
                src_start, src_end = find_start_end(row[2])
                source = row[2][src_start:src_end]
                target_start, target_end = find_start_end(row[3])
                target = row[3][target_start:target_end]
                # Drops the first two characters of the relation column.
                relation = row[1][2:]
                meta_data = row[-1]
                # setdefault creates the per-relation dict on first use.
                output_dict.setdefault(relation, dict())[str((source, target))] = {
                    "source": source, "target": target, "meta_data": meta_data}
    with open("filtered_assertions.json", "w") as write_file:
        json.dump(output_dict, write_file)


def filter_csv(filename='assertions.csv', filtered_path='filtered_assertions.csv'):
    """Copy the matching rows to filtered_path and record how many were kept."""
    row_count = 0
    # Mode 'w' truncates any existing output, and opening the file once
    # avoids reopening it for every matching row.
    with open(filename, encoding='utf-8', newline='') as csv_file, \
            open(filtered_path, 'w', encoding='utf-8', newline='') as filtered_file:
        csv_reader = csv.reader(csv_file, delimiter='\t')
        csv_writer = csv.writer(filtered_file, delimiter='\t')
        for row in csv_reader:
            if filter_english(row) and filter_edges(row):
                csv_writer.writerow(row)
                row_count += 1
    with open("filtered_row_count.txt", 'w') as row_count_file:
        # File objects take str; os.write expects a file descriptor and bytes.
        row_count_file.write(str(row_count))


if __name__ == '__main__':
    filter_csv('assertions.csv')
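
The filtering steps can be tried on a few in-memory rows without the full assertions file. This is only a sketch: the sample rows are invented, the column layout (URI, relation, source, target, JSON metadata) is inferred from how the script indexes each row, and the helper names `keep` and `term` are hypothetical stand-ins for the functions above.

```python
import csv
import io
import json

# Invented sample rows in the assumed five-column, tab-separated layout.
SAMPLE = "\n".join([
    "\t".join(["/a/1", "/r/IsA", "/c/en/cat/n", "/c/en/animal", json.dumps({"weight": 1.0})]),
    "\t".join(["/a/2", "/r/IsA", "/c/en/dog", "/c/en/animal", json.dumps({"weight": 2.0})]),
    "\t".join(["/a/3", "/r/Synonym", "/c/en/cat", "/c/zh/貓", json.dumps({"weight": 1.0})]),
])


def keep(row):
    """Hypothetical combined predicate: both nodes English and weight exactly 1."""
    return ("/en/" in row[2] and "/en/" in row[3]
            and json.loads(row[-1])["weight"] == 1)


def term(uri):
    """Hypothetical helper: the segment right after '/en/' in a node URI."""
    start = uri.find("/en/") + len("/en/")
    end = uri.find("/", start)
    return uri[start:] if end == -1 else uri[start:end]


rows = [r for r in csv.reader(io.StringIO(SAMPLE), delimiter="\t") if keep(r)]
pairs = [(term(r[2]), term(r[3])) for r in rows]
print(pairs)
```

Only the first sample row survives: the second has weight 2 and the third has a non-English target.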