# Check that a page is actually an article
?path: /posts/.+

# Required info
body: //article
# og:title is stored in the meta tag's content attribute
title: //head/meta[@property="og:title"]/@content

# Other info
# Parse the date text, then assign the parsed result ($@) to published_date
@datetime(0): //span[@class="date"]
published_date: $@
description: $body//p[1]
cover: $body//img[1]
image_url: $body//img[1]

# Cleanup
@remove: $body//h1[1]
@remove: $body//div[@class="meta"]
@remove: $body//hr
@remove: $body//h4
@remove: $body//div[@class="footer"]