Kotuara

Ozhegov dictionary parser with table-of-contents removal

Jul 9th, 2022
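import requests
from bs4 import BeautifulSoup

file_name = 'dicts/all.txt'


def get_new_words(url, file):
    # Download one letter page and write the text of every <li> element, one per line.
    response = requests.get(url)
    # The script assumes the pages are encoded in windows-1251, so decode explicitly.
    soup = BeautifulSoup(response.content.decode('windows-1251'), features="lxml")
    quotes = soup.find_all('li')
    for quote in quotes:
        file.write(quote.text + '\n')


# Collect the entries from letter pages 1 through 29 into one file.
# The with block closes (and flushes) the file before it is read again below.
with open(file_name, 'w', encoding='utf-8') as file:
    for i in range(1, 30):
        url = 'https://onlinedic.net/ozhegov/letter' + str(i) + '.php'
        get_new_words(url, file)

# Drop the table of contents: skip near-empty lines and lines containing '...'.
with open(file_name, encoding='utf-8') as source, \
        open('dicts/all_changed.txt', 'w', encoding='utf-8') as destination:
    for line in source:
        if len(line) > 2 and '...' not in line:
            destination.write(line)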