This week I was working with some RSS feeds in R. You can parse these feeds directly with packages like XML or xml2, or use the rss package (not on CRAN) or the feedeR package (on CRAN). However, I noticed that both rss and feedeR return lists, which necessitates further work in R to get the data into a tidy format, and that both packages leave interesting data behind. With that in mind, I decided to make tidyRSS, a package for parsing these feeds and returning a tidy data frame to the user.
Overall, it was a great experience making this package and working through Hadley Wickham’s R Packages book as I developed it and submitted it to CRAN. I did (of course) make some basic mistakes, kindly pointed out to me by Bob Rudis (who has an excellent blog and some great R packages, like ggalt).1
So what advice could I give after this process? Here goes:
Testing the package
First, devtools and Hadley’s book are great. There are always little things you forget to do, like committing to git, changing the DESCRIPTION, testing the damn thing (!!). And that latter point is important. I had a little test for the package: the function
tidyfeed() cycled through the database provided with the package (a list of feed urls) and tried each one. I had it working well when I remembered the feeds that I started off with, which come from the Brazilian government. Of course, these feeds didn’t work at all…sigh. I then updated the package by using XML and modifying some little things. I tested it again on a few example feeds and all worked well (or so I thought).
As soon as the package was released on CRAN, Bob left me a bug report on GitHub. The first two feeds he tried didn’t work. Damn, I was embarrassed. Lesson 1: thoroughly check the functions you’ve written.
Submitting to CRAN
I got the impression from Hadley’s book that, while it’s desirable that no ‘NOTES’ are returned from
devtools::check(), it’s not 100% necessary. Judging from my interaction with the CRAN maintainers, it is necessary. The check should look like this afterwards:
R CMD check results 0 errors | 0 warnings | 0 notes
Lesson 2: Better to fix the package until the check gives you no errors, no warnings and no notes.
Making a mess of everything and resubmitting
Well, after the mess pointed out by Bob, I spent a day trying to fix everything so that the package worked well. I’m quite sure it does now, although there is one feed on the list that doesn’t play nice (a BBC feed that returns NULLs for most fields regardless of how I try to get the data).
I updated the package to version 1.0.1 and committed to GitHub. The development version is fine and up-to-date, but as of yet the maintainers at CRAN haven’t accepted the resubmit. It seems a new version within a day is not appreciated.2 But anway, I’ll resubmit next week and hopefully tidyRSS will be a useful addition to R-world.
So how did this mess come about? I think part of it was that I was insanely excited (I have really turned into a neRd) about making a ‘real’ R package and having it available on CRAN. I should have slowed down and taken the time to double-check everything. That I will do next time, and that is lesson 3.
Well, other than the horrendous mistakes I made, making an R package and submitting it to CRAN has been a positive experience. I’ve been involved with packages before, without being the sole author or the submitter. I think it improves you as a programmer, making a package by yourself, having to think of all the possible ways things can go wrong. I’m sure there will be some problems with the package in the future (RSS is really messy), but each bug is actually a chance to learn something an improve, not only the package, but my skills as well. As the quote from the Buddhist Henepola Gunaratana goes, “don’t bury your burden in saintly silence. You have a problem? Great. Rejoice, dive in, and investigate.”
What else can I add? Well, one (pretty funny) nugget of advice comes from a tweet by Bob himself:
So don’t forget that one!