tidyRSS is an R package for extracting data from RSS, Atom, and JSON feeds. For geo-type feeds, see the section on changes in version 2 below, or go directly to tidygeoRSS, which is designed for that purpose.
It is easy to use, as it has only one function, tidyfeed(), which takes five arguments. These include an argument that is passed on to httr::GET(), and parse_dates, a logical flag: if parse_dates is TRUE, tidyfeed() will attempt to parse dates using the anytime package. Note that this removes some lower-level control that you may wish to retain over how dates are parsed. See this issue for an example.
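If you would rather keep that control, one option is to call tidyfeed() with parse_dates = FALSE and parse the date strings yourself. The sketch below assumes ISO 8601 timestamps of the kind Atom feeds use (for example 2020-06-01T12:30:00Z). parse_atom_date() is a hypothetical helper, not part of tidyRSS, and feeds that use RFC 822 dates would need a different format string.

```r
# Hypothetical helper, not part of tidyRSS: parse ISO 8601 timestamps such as
# "2020-06-01T12:30:00Z" or "2020-06-01T14:30:00+02:00" with an explicit format,
# rather than letting anytime guess the format.
parse_atom_date <- function(x) {
  x <- sub("\\.[0-9]+", "", x)                         # drop fractional seconds
  x <- sub("Z$", "+0000", x)                           # a trailing "Z" means UTC
  x <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)  # "+02:00" -> "+0200" for %z
  # %z reads the UTC offset; the results are returned in UTC
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
}

# Both inputs describe the same instant, 2020-06-01 12:30:00 UTC
parse_atom_date(c("2020-06-01T12:30:00Z", "2020-06-01T14:30:00+02:00"))
```

After fetching a feed with tidyfeed(..., parse_dates = FALSE), you would apply a helper like this to whichever date column the result contains. Column names differ between feed types, so check names() on the result first.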
It can be installed directly from CRAN with:
install.packages("tidyRSS")
The development version can be installed from GitHub with the remotes package:
remotes::install_github("robertmyles/tidyrss")
Here is how you can get the contents of the R Journal:
library(tidyRSS)
tidyfeed("http://journal.r-project.org/rss.atom")
The biggest change in version 2 is that tidyRSS no longer attempts to parse geo-type feeds into sf tibbles. This functionality has been moved to tidygeoRSS.
XML feeds can be finicky things. If you find one that doesn’t work with tidyfeed(), feel free to create an issue with the URL of the feed that you are trying. Pull requests are welcome if you’d like to try to fix it yourself. For older RSS feeds, some fields will almost never be ‘clean’; that is, they will contain things like newlines (\n) or extra quote marks. Cleaning these in a generic way is more or less impossible, so I suggest you use stringr, strex and/or tools from base R such as gsub() to clean them. This will mainly affect the item_description column of a parsed RSS feed. It will rarely affect Atom feeds and should never be a problem with JSON feeds.
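As an illustration of the base R route, here is a minimal sketch using gsub() and trimws(). clean_feed_text() is a hypothetical helper, not part of tidyRSS, and its patterns only cover the newline and quote-mark cases mentioned above; real feeds may need more.

```r
# Hypothetical helper, not part of tidyRSS: tidy up a messy text field
# (for example item_description) using only base R.
clean_feed_text <- function(x) {
  x <- gsub("[\r\n\t]+", " ", x)        # turn runs of newlines/tabs into a space
  x <- gsub("\"", "", x, fixed = TRUE)  # drop stray double quote marks
  x <- gsub(" {2,}", " ", x)            # collapse repeated spaces
  trimws(x)                             # strip leading and trailing whitespace
}

clean_feed_text("\n  \"A new   release\"\nis out ")
#> [1] "A new release is out"
```

Applied to a parsed feed stored in a (hypothetical) object called feed, that might look like feed$item_description <- clean_feed_text(feed$item_description).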
There are two other related packages that I’m aware of: feedeR and rss. In comparison to feedeR, tidyRSS returns more information from the RSS feed (where the feed provides it), and development on rss seems to have stopped some time ago.
For the schemas used to develop the parsers in this package, see:
I’ve implemented most of the items in the schemas above. The following are not yet implemented:
Atom meta info:
RSS meta info: