
How to do it...
- Import the feedparser module, as well as datetime, delorean, and requests:
import feedparser
import datetime
import delorean
import requests
- Parse the feed (it will be downloaded automatically) and check when it was last updated. Feed information, such as the title of the feed, is available in the feed attribute:
>>> rss = feedparser.parse('http://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml')
>>> rss.updated
'Sat, 02 Jun 2018 19:50:35 GMT'
- Get the entries published in the six hours before the feed was last updated:
>>> time_limit = delorean.parse(rss.updated) - datetime.timedelta(hours=6)
>>> entries = [entry for entry in rss.entries if delorean.parse(entry.published) > time_limit]
- The filtered list is shorter than the full list of entries, because some of the returned entries are older than six hours:
>>> len(entries)
10
>>> len(rss.entries)
44
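The same six-hour filter can also be sketched with the standard library alone, which may help when checking the date arithmetic without downloading a feed. The sample entries and the newer_than helper below are hypothetical stand-ins for rss.entries and the list comprehension above, and the sketch assumes the dates are RFC 822 style strings like the one shown for rss.updated:

```python
import datetime
from email.utils import parsedate_to_datetime

# Hypothetical sample data standing in for rss.updated and rss.entries.
feed_updated = 'Sat, 02 Jun 2018 19:50:35 GMT'
sample_entries = [
    {'title': 'Recent story', 'published': 'Sat, 02 Jun 2018 18:00:00 GMT'},
    {'title': 'Borderline story', 'published': 'Sat, 02 Jun 2018 13:50:36 GMT'},
    {'title': 'Old story', 'published': 'Fri, 01 Jun 2018 09:30:00 GMT'},
]


def newer_than(entries, reference, hours=6):
    """Return the entries published less than `hours` before `reference`.

    Both `reference` and each entry's 'published' value are expected to be
    RFC 822 style date strings (an assumption of this sketch).
    """
    time_limit = parsedate_to_datetime(reference) - datetime.timedelta(hours=hours)
    return [
        entry for entry in entries
        if parsedate_to_datetime(entry['published']) > time_limit
    ]


recent = newer_than(sample_entries, feed_updated)
print([entry['title'] for entry in recent])
```

Note that the limit is computed from the time the feed was updated, not from the current time, so the result does not depend on when the code runs.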
- Retrieve information about the entries, such as the title. The full URL of each entry is available as link. Explore the other fields available in this particular feed:
>>> entries[5]['title']
'Loose Ends: How to Live to 108'
>>> entries[5]['link']
'https://www.nytimes.com/2018/06/02/opinion/sunday/how-to-live-to-108.html?partner=rss&emc=rss'
>>> requests.get(entries[5].link)
<Response [200]>
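To explore which fields an entry offers, a small helper along these lines may be useful. It is only a sketch: entry_overview is a hypothetical name, and it assumes entries behave like dictionaries (as the entries[5]['title'] access above suggests), so a plain dict stands in for a real entry here. The 'published' value is made up for illustration:

```python
def entry_overview(entry):
    """Summarize one entry: its title, its link, and the names of its fields."""
    return {
        # .get() avoids an exception when a feed omits a field.
        'title': entry.get('title', '(no title)'),
        'link': entry.get('link'),
        'fields': sorted(entry.keys()),
    }


# Hypothetical entry standing in for entries[5]; real entries usually
# carry more fields, and the set varies from feed to feed.
sample_entry = {
    'title': 'Loose Ends: How to Live to 108',
    'link': 'https://www.nytimes.com/2018/06/02/opinion/sunday/how-to-live-to-108.html?partner=rss&emc=rss',
    'published': 'Sat, 02 Jun 2018 12:00:00 GMT',
}

overview = entry_overview(sample_entry)
print(overview['title'])
print(overview['fields'])
```

With a real feed, the same helper could be called as entry_overview(entries[5]) to list what that particular feed provides.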