Python Automation Cookbook

How to do it...

  1. Import the feedparser module, as well as datetime, delorean, and requests:
```python
import feedparser
import datetime
import delorean
import requests
```
  1. Parse the feed (it is downloaded automatically) and check when it was last updated. Feed information, such as the title of the feed, is available in the feed attribute:
```python
>>> rss = feedparser.parse('http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml')
>>> rss.updated
'Sat, 02 Jun 2018 19:50:35 GMT'
```
  1. Get the entries published within the six hours before the feed was last updated:
```python
>>> time_limit = delorean.parse(rss.updated) - datetime.timedelta(hours=6)
>>> entries = [entry for entry in rss.entries
...            if delorean.parse(entry.published) > time_limit]
```
  1. There are fewer entries than in the full feed, because some of the returned entries are older than six hours:
```python
>>> len(entries)
10
>>> len(rss.entries)
44
```
  1. Retrieve information about the entries, such as the title. The full URL of each entry is available as link. Explore the information available in this particular feed:
```python
>>> entries[5]['title']
'Loose Ends: How to Live to 108'
>>> entries[5]['link']
'https://www.nytimes.com/2018/06/02/opinion/sunday/how-to-live-to-108.html?partner=rss&emc=rss'
>>> requests.get(entries[5].link)
<Response [200]>
```
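The date filtering in the steps above relies on delorean. As a rough, stdlib-only sketch of the same idea (not the recipe's own code), email.utils.parsedate_to_datetime can parse the RFC 822 dates that RSS feeds use into timezone-aware datetimes. The recent_entries helper and the sample entries below are hypothetical; each entry is assumed to be a dict with a 'published' string, like the feed entries above.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def recent_entries(entries, updated, hours=6):
    """Hypothetical helper: keep entries published within `hours` before `updated`.

    Assumes each entry is a dict with an RFC 822 'published' string.
    """
    # 'Sat, 02 Jun 2018 19:50:35 GMT' parses to an aware datetime in UTC.
    time_limit = parsedate_to_datetime(updated) - timedelta(hours=hours)
    return [
        entry for entry in entries
        if parsedate_to_datetime(entry['published']) > time_limit
    ]


# Made-up sample entries, for illustration only.
sample = [
    {'title': 'Fresh story', 'published': 'Sat, 02 Jun 2018 18:00:00 GMT'},
    {'title': 'Older story', 'published': 'Sat, 02 Jun 2018 09:15:00 GMT'},
    {'title': 'Morning story', 'published': 'Sat, 02 Jun 2018 15:00:00 -0400'},
]

recent = recent_entries(sample, 'Sat, 02 Jun 2018 19:50:35 GMT')
print([entry['title'] for entry in recent])  # → ['Fresh story', 'Morning story']
```

The entry stamped -0400 is kept because 15:00 at -04:00 is 19:00 UTC, after the 13:50:35 UTC cutoff; comparing aware datetimes accounts for the offsets. A date with a -0000 zone comes back as a naive datetime and would raise a TypeError in the comparison, so a real feed may need extra handling.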
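The fields explored in the last step (title, link, published) come from the feed's XML. The following simplified sketch uses only xml.etree.ElementTree on a made-up RSS 2.0 snippet to show where those values live. It is not how feedparser works internally: it handles only plain RSS 2.0 item elements, whereas feedparser also normalizes Atom and other formats. The parse_items helper and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal RSS 2.0 document; real feeds carry many more fields.
RSS_SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/first.html</link>
      <pubDate>Sat, 02 Jun 2018 18:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/second.html</link>
      <pubDate>Sat, 02 Jun 2018 09:15:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Hypothetical helper: return the channel title and a list of item dicts."""
    channel = ET.fromstring(xml_text).find('channel')
    items = []
    for item in channel.findall('item'):
        items.append({
            'title': item.findtext('title'),
            'link': item.findtext('link'),
            # feedparser exposes RSS <pubDate> as 'published'; mirror that name.
            'published': item.findtext('pubDate'),
        })
    return channel.findtext('title'), items


feed_title, items = parse_items(RSS_SAMPLE)
print(feed_title)
for item in items:
    print(item['title'], item['link'])
```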