Script

a guest, Jun 26th, 2019
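import json
import re

# Input file: raw text to scan for tokens.
path = 'F:\\wikititle\\wiki3.txt'
with open(path, encoding="utf-8") as src:
    var = src.read()

# Capture 4 letters, an underscore, then 6 or 7 letters, delimited on both
# sides by a non-alphanumeric character. findall returns a list of the
# captured groups.
list2 = re.findall(r"[^a-zA-Z0-9]([a-zA-Z]{4}_[a-zA-Z]{6,7})[^a-zA-Z0-9]", var)

# Drop duplicates while keeping first-seen order.
seen = set()
uniq = [x for x in list2 if x not in seen and not seen.add(x)]

# Mode 'x' creates the output file and raises FileExistsError if it already exists.
with open('F:\\wikititle\\kidsb.txt', 'x') as f:
    f.write(json.dumps(uniq))
print("done")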
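
The matching and de-duplication steps can be tried on an in-memory string without touching the files on F:. The sketch below is an illustration only, not part of the paste: the names TOKEN_RE, unique_tokens and sample are hypothetical, and it swaps the consuming [^a-zA-Z0-9] delimiters for lookarounds. This is an assumption about the intent: with consuming delimiters, a token at the very start or end of the text, or two tokens separated by a single character, would not all be found, because re.findall returns non-overlapping matches.

```python
import re

# Lookarounds assert the boundary without consuming it, so tokens that
# share a single separator character are both found.
TOKEN_RE = re.compile(
    r"(?<![a-zA-Z0-9])[a-zA-Z]{4}_[a-zA-Z]{6,7}(?![a-zA-Z0-9])"
)


def unique_tokens(text):
    """Return matching tokens in first-seen order, without duplicates."""
    # dict preserves insertion order, so fromkeys() de-duplicates in order.
    return list(dict.fromkeys(TOKEN_RE.findall(text)))


# Made-up sample text: one repeated token and one that is too long to match.
sample = "abcd_efghij,wxyz_abcdefg abcd_efghij toolong_abcdefgh"
print(unique_tokens(sample))
```

dict.fromkeys gives the same order-preserving de-duplication as the seen-set list comprehension, in one expression.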