{"id":820,"date":"2010-10-22T17:43:00","date_gmt":"2010-10-22T17:43:00","guid":{"rendered":"https:\/\/www.keenertech.com\/?p=820"},"modified":"2010-10-22T17:43:00","modified_gmt":"2010-10-22T17:43:00","slug":"recipe-parsing-rss-and-atom-feeds","status":"publish","type":"post","link":"https:\/\/staging.keenertech.com\/?p=820","title":{"rendered":"Recipe: Parsing RSS and Atom Feeds"},"content":{"rendered":"\n<p>Sometimes it&#8217;s desirable to be able to ingest a remote RSS or Atom feed in order to make content available within a web application. One of the easiest ways to expand the content offerings of a web site is to incorporate content from other sources. Standards like RSS and Atom were designed precisely to support the syndication of content in this fashion.<\/p>\n\n\n\n<p>The first thing that pops into the heads of developers when this kind of requirement comes up is the dawning realization that they may have to create some really ugly XML-parsing code. It just sounds like one of those dreary, painful programming tasks that occasionally come down the pike.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Problem<\/h3>\n\n\n\n<p>Ingest RSS or Atom feeds and parse the content so that it can be repurposed for the needs of a Rails web application.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Solution<\/h3>\n\n\n\n<p>The HTTParty gem makes it almost trivial to parse both RSS and Atom feeds.&nbsp;<strong>Listing 1<\/strong>&nbsp;shows the Ruby code for the Feed class.<\/p>\n\n\n\n<p><strong>Listing 1: The Feed Class<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">  require 'httparty'\n\n  class Feed\n    include HTTParty\n    format :xml\n\n    def initialize(feed_url)\n      @feed_url = feed_url\n    end\n  \n    def feed_url\n      @feed_url\n    end\n  \n    def url\n      uri = URI.parse(@feed_url)\n      strip_feed_extension(uri.scheme + ':\/\/' + uri.host + uri.path)\n    end\n\n    def latest(params={})\n      response = {}\n      begin\n        response = 
Feed.get(@feed_url)\n      rescue REXML::ParseException =&gt; e\n        RAILS_DEFAULT_LOGGER.warn(\"forum feed parse error: \" + e.message)\n        response[\"feed\"] = \"\"\n      end\n    \n      response[\"feed\"]\n    end\n  \n    private\n  \n      # Remove a trailing .atom or .rss extension (dot escaped, anchored at the end)\n      def strip_feed_extension(uri)\n        str = uri.sub(\/\\.atom\\z\/, '')\n        str.sub(\/\\.rss\\z\/, '')\n      end\n  end\n<\/pre>\n\n\n\n<p>Place the feed.rb file in the lib directory of your Rails application. Then run script\/console to bring up a console.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>&gt; f = Feed.new('http:\/\/www.keenertech.com\/articles.atom')<\/strong><br><strong>&gt; feed = f.latest<\/strong><\/pre>\n\n\n\n<p>That&#8217;s all there is to it. The feed has been parsed already. So, let&#8217;s view some summary information about the feed.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>&gt; feed['title']<\/strong><br>KeenerTech.com<br><strong>&gt; feed['link']['href']<\/strong><br>http:\/\/www.keenertech.com\/articles.atom<\/pre>\n\n\n\n<p>Well, that&#8217;s great, but what about the entries?<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>&gt; entries = feed['entry']<\/strong><br>[ {}, {}, \u2026]<br><strong>&gt; e = entries[0]<\/strong><br><strong>&gt; e['title']<\/strong><br>Leveraging Rails to Build Facebook Apps<br><strong>&gt; e['author']['name']<\/strong><br>David Keener<br><strong>&gt; e['link']['href']<\/strong><br>http:\/\/www.keenertech.com\/articles\/2010\/09\/29\/leveraging-rails-to-build-facebook-apps<br><strong>&gt; e['summary']<\/strong><br>My presentation on \"Leveraging Rails to Build Facebook Apps,\" which I just gave at SunnyConf, is now available online. 
This presentation is a distillation of some of the practical tactics that my development team at MetroStar Systems has used to create highly successful\u2026<\/pre>\n\n\n\n<p>Now, to quote Spider-Man, &#8220;with great power comes great responsibility.&#8221; HTTParty is just using REXML to do the parsing, which isn&#8217;t the speediest parser around, but it&#8217;s more than good enough for most processing tasks.<\/p>\n\n\n\n<p>Still, for performance reasons, you wouldn&#8217;t want to parse a remote XML feed every time a particular web page is requested. So, this is the type of task that demands some form of data caching, whether memcached or simply storing feed data in the database for later use.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Sometimes it&#8217;s desirable to be able to ingest a remote RSS or Atom feed in order to make<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[24,171,172],"class_list":["post-820","post","type-post","status-publish","format-standard","hentry","category-technology","tag-atom","tag-rss","tag-ruby"],"_links":{"self":[{"href":"https:\/\/staging.keenertech.com\/index.php?rest_route=\/wp\/v2\/posts\/820","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/staging.keenertech.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/staging.keenertech.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/staging.keenertech.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/staging.keenertech.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=820"}],"version-history":[{"count":0,"href":"https:\/\/staging.keenertech.com\/index.php?rest_route=\/wp\/v2\/posts\/820\/revisions"}],"wp:attachment":[{"href":"https:\/\/staging.keenertech.com\/index.php?rest_route=%2Fwp%
2Fv2%2Fmedia&parent=820"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/staging.keenertech.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=820"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/staging.keenertech.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=820"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}