- As part of my mission to discover hidden gold in the Steemit stream, I have been writing tools to help sift the nuggets from the mud.
- It turns out there are existing algorithms that can make life easier. One of them is "Readability Scoring".
- What is a Readability Score?
- There are tons of options, from the US "grade level" (e.g. your content could be read by a Grade 5 student) through to highly complex math that is way over my head.
- What I want is something that essentially says "This text would be easily read by half of the audience". It is just one filter out of many, so it does not need to be perfect.
- Python to the rescue
- # these modules help us build the script
- import html2text
- import requests
- import sys
- First, we get the basic imports that will help us build the scaffolding for the script.
- We need to:
- specify the URL
- load content from the specified URL
- remove HTML from the web page that we load
- Then we need to pass it through the function that will give us the readability score. That function lives in the textstat library:
- # this is the important library that actually does the work
- from textstat.textstat import textstat
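- To get a feel for what that library call measures, here is a rough sketch of the Flesch Reading Ease formula (206.835 - 1.015 x words per sentence - 84.6 x syllables per word) using only the standard library. The helper names and the vowel-group syllable counter are my own simplifications, not textstat's code, so the numbers will not match textstat exactly.

```python
import re

def count_syllables(word):
    """Very rough syllable estimate: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def rough_flesch_reading_ease(text):
    """Approximate Flesch Reading Ease; a higher score means easier text."""
    # Naive sentence split on ., ! and ? (assumption: good enough for a sketch)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

easy = "The cat sat on the mat. It was a good day."
hard = "Institutional considerations necessitate comprehensive organizational restructuring immediately."

print(rough_flesch_reading_ease(easy))
print(rough_flesch_reading_ease(hard))
```

- Short sentences with short words score high, and long sentences full of long words score low, which is all a simple filter needs.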
- Without stripping HTML we will confuse the function so we need to ensure even hyperlinks are removed:
- # create the converter that turns HTML into plain text
- h = html2text.HTML2Text()
- # we want to not include even link tags
- h.ignore_links = True
- # function to remove html - could be more robust
- def remove_html(in_text):
-     return h.handle(in_text)
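- If html2text is not installed, a stand-in can be sketched with the standard library's html.parser. This is a different technique from the html2text call above: the class and function names are hypothetical, and it only keeps text nodes while skipping script and style contents.

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collects text nodes and skips anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

def strip_html_stdlib(in_text):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextOnly()
    parser.feed(in_text)
    parser.close()
    # Join with spaces so words from neighbouring tags do not run together
    return " ".join(" ".join(parser.parts).split())

sample = '<p>Hello <a href="https://example.com">world</a></p><script>var x = 1;</script>'
print(strip_html_stdlib(sample))
```

- Joining the pieces with spaces can leave a stray space before punctuation, which should not matter much for word and sentence counts.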
- For simplicity, I am specifying the URL via a command line argument for now. Later it will be part of my Discord bot.
- We read the URL from the argument, then fetch the page from the web using Requests.
- # grab whatever content was specified in the command line
- url = sys.argv[1]
- # we need to strip html tags
- test_string = remove_html(requests.get( url ).text)
- # show what we grabbed
- print( test_string )
- print()
- All that remains is to run it through the test!
- # So how readable is it?
- print( str(textstat.flesch_reading_ease(test_string)) + " /100" )
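- Since the score is meant to be one filter among many, it could be turned into a simple yes/no check along these lines. The function name and the cut-off of 60 are assumptions for illustration (scores in the 60s are commonly described as plain English), and the sample scores are made up rather than measured.

```python
def is_readable(score, threshold=60.0):
    """Return True when a Flesch Reading Ease score meets the cut-off."""
    return score >= threshold

# Made-up scores for illustration only, not measured from real posts
sample_scores = {"post-a": 72.4, "post-b": 41.0, "post-c": 60.0}

# Keep only the posts that pass the readability filter
keepers = [name for name, score in sample_scores.items() if is_readable(score)]
print(keepers)
```

- The threshold is the knob to tune: raise it to keep only very easy text, lower it to let denser posts through.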