Merging Multiple RSS Feeds Using PHP and Caching

// September 8th, 2010 // Programming, Web Development //

I’ve recently been doing some work with RSS feeds and decided I needed a way to combine multiple feeds into a single feed. A quick Google search turned up a few code samples for doing that, but none of them had exactly what I wanted:
  • Object-Oriented PHP Class
  • Aggregate multiple RSS feeds specified by an array
  • Use caching to prevent unnecessary bandwidth consumption
  • Separate caches per feed
  • Specify the maximum size of the combined feed
  • Be able to either directly output the data, or retrieve it as an XML string

As a result, I decided to construct the class myself.

If you just wish to download the code, you can skip to the code download section.

Creating the MergedRSS Class

The first thing I did was create a class, which I called “MergedRSS”.

  <?php
  class MergedRSS {
          private $myFeeds = null;
          private $myTitle = null;
          private $myLink = null;
          private $myDescription = null;
          private $myPubDate = null;
          private $myCacheTime = null;

          // create our Merged RSS Feed
          public function __construct($feeds = null, $channel_title = null,
                                      $channel_link = null, $channel_description = null,
                                      $channel_pubdate = null, $cache_time_in_seconds = 86400) {
                  // set variables
                  $this->myTitle = $channel_title;
                  $this->myLink = $channel_link;
                  $this->myDescription = $channel_description;
                  $this->myPubDate = $channel_pubdate;
                  $this->myCacheTime = $cache_time_in_seconds;

                  // initialize feed variable
                  $this->myFeeds = array();

                  if (isset($feeds)) {
                          if (is_array($feeds)) {
                                  // if it's an array, merge it into our existing array.
                                  $this->myFeeds = array_merge($this->myFeeds, $feeds);
                          } else {
                                  // if it's a single feed url, just push it into the array
                                  $this->myFeeds[] = $feeds;
                          }
                  }
          }
  }
  ?>

Next, I added a couple of functions to my class to retrieve the RSS feeds from both the cache and the web.

          // retrieves contents from a cache file ; returns null on error
          private function __fetch_rss_from_cache($cache_file) {
                  if (file_exists($cache_file)) {
                          // if the file exists, attempt to read it as xml. if it's malformed,
                          //     simplexml_load_file() raises a Warning and returns false, which we map to null.
                          $sxe = simplexml_load_file($cache_file);
                          return ($sxe === false) ? null : $sxe;
                  }
                  return null;
          }

          // retrieves contents of an external RSS feed ; returns null on error
          private function __fetch_rss_from_url($url) {
                  // Create new SimpleXMLElement instance based on the url (third argument true means
                  //     $url is a location to load, not XML data).  If there's an error, return null
                  try {
                          $sxe = new SimpleXMLElement($url, 0, true);
                          return $sxe;
                  } catch (Exception $e) {
                          return null;
                  }
          }
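Malformed XML makes the SimpleXML functions emit PHP warnings, which can end up in the feed output. One way to keep them quiet is to have libxml collect its errors internally. The following is only a sketch: parse_feed_xml() is a hypothetical helper that is not part of MergedRSS, and it assumes the XML has already been read into a string.

```php
<?php
// Hypothetical helper (not part of MergedRSS): parse an XML string quietly.
// Returns a SimpleXMLElement on success, or null if the XML is malformed.
function parse_feed_xml(string $xml): ?SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $sxe = simplexml_load_string($xml);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $sxe === false ? null : $sxe;
}

// Made-up sample input: one well-formed document and one truncated one.
$good = parse_feed_xml('<rss><channel><item><title>Hi</title></item></channel></rss>');
$bad  = parse_feed_xml('<rss><channel>');

echo $good === null ? "good: failed\n" : "good: parsed\n";
echo $bad === null ? "bad: null\n" : "bad: parsed\n";
```

The strict comparison against false matters here, because a SimpleXMLElement for an empty element can itself evaluate to false in a boolean context.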

Next, I needed a storage location and naming convention for my cache files. I ended up storing each feed as a separate cache file inside a directory called “cache”. This required making the cache directory writable by the web server (I used chmod 0777). The reason for separate caches per feed was to let me display a cached copy of a feed in the event of a connectivity issue between my server and one or more feeds.

As for naming the cache file, I decided the easiest way was to replace any character that is not a letter, number or period with an underscore. This not only keeps the file identifiable when viewing the cache folder via SSH or FTP, but also strips the slashes from the URL, so a crafted feed URL cannot point at a path outside of the cache directory. To quickly convert a feed URL into a file name, I created the following function:

          // creates a key for a specific feed url (used for creating friendly file names)
          private function __create_feed_key($url) {
                  return preg_replace('/[^a-zA-Z0-9\.]/', '_', $url) . 'cache';
          }
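To see what the key looks like in practice, here is a stand-alone copy of the same replacement. The free function create_feed_key() and the second sample input are only for illustration; the class itself uses the private method above.

```php
<?php
// Stand-alone copy of the key logic, for illustration only.
function create_feed_key(string $url): string
{
    // Anything that is not a letter, digit or period becomes an underscore.
    return preg_replace('/[^a-zA-Z0-9.]/', '_', $url) . 'cache';
}

$key = create_feed_key('http://www.widgetsandburritos.com/feed/');
echo $key, "\n"; // http___www.widgetsandburritos.com_feed_cache

// A path-like input loses its slashes, so it stays a plain file name.
$tricky = create_feed_key('../../etc/passwd');
echo $tricky, "\n"; // .._.._etc_passwdcache
```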

I also knew I would need to sort the items by their pubDate at some point, so I created a comparison function to assist with that.

          // compares two items based on "pubDate" (newest first)
          private function __compare_items($a,$b) {
                  return strtotime((string) $b->pubDate) - strtotime((string) $a->pubDate);
          }
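As a quick check of the ordering, the same comparison can be run on its own. This is a sketch with two made-up items parsed from an inline string, using an anonymous function in place of the private method.

```php
<?php
// Two sample items with made-up titles and dates, parsed from an inline string.
$channel = simplexml_load_string(
    '<channel>'
    . '<item><title>Older</title><pubDate>Mon, 06 Sep 2010 10:00:00 +0000</pubDate></item>'
    . '<item><title>Newer</title><pubDate>Wed, 08 Sep 2010 10:00:00 +0000</pubDate></item>'
    . '</channel>'
);

// Copy the items into a plain array so usort() can reorder them.
$items = array();
foreach ($channel->item as $item) {
    $items[] = $item;
}

// Newest first: a positive result means $a sorts after $b.
usort($items, function ($a, $b) {
    return strtotime((string) $b->pubDate) - strtotime((string) $a->pubDate);
});

echo (string) $items[0]->title, "\n"; // Newer
```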

At this point, we’re ready to create the main function that makes everything work. To avoid splitting the code up too much, I am going to keep the entire function intact, but I have added comments throughout to make it easier to understand what’s going on.

          // exports the data as a returned value and/or outputted to the screen
          public function export($return_as_string = true, $output = false, $limit = null) {
                  // initialize a combined item array for later
                  $items = array();

                  // loop through each feed
                  foreach ($this->myFeeds as $feed_url) {
                          // determine my cache file name.  for now i assume they're all kept in a directory called "cache"
                          $cache_file = "cache/" . $this->__create_feed_key($feed_url);

                          // determine whether or not I should use the cached version of the xml
                          $use_cache = false;
                          if (file_exists($cache_file)) {
                                  if (time() - filemtime($cache_file) < $this->myCacheTime) {
                                          $use_cache = true;
                                  }
                          }

                          $results = null;
                          if ($use_cache) {
                                  // retrieve cached version
                                  $sxe = $this->__fetch_rss_from_cache($cache_file);
                                  if ($sxe instanceof SimpleXMLElement) { $results = $sxe->channel->item; }
                          } else {
                                  // retrieve updated rss feed
                                  $sxe = $this->__fetch_rss_from_url($feed_url);

                                  if (!($sxe instanceof SimpleXMLElement)) {
                                          // couldn't fetch from the url. grab a cached version if we can
                                          $sxe = $this->__fetch_rss_from_cache($cache_file);
                                          if ($sxe instanceof SimpleXMLElement) { $results = $sxe->channel->item; }
                                  } else {
                                          $results = $sxe->channel->item;
                                          // we need to update the cache file
                                          $sxe->asXML($cache_file);
                                  }
                          }
                          if (isset($results)) {
                                  // add each item to the master item list
                                  foreach ($results as $item) {
                                          $items[] = $item;
                                  }
                          }
                  }

                  // set all the initial, necessary xml data
                  $xml =  "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
                  $xml .= "<rss version=\"2.0\"";
                  $xml .= "\n\txmlns:content=\"http://purl.org/rss/1.0/modules/content/\"";
                  $xml .= "\n\txmlns:wfw=\"http://wellformedweb.org/CommentAPI/\"";
                  $xml .= "\n\txmlns:dc=\"http://purl.org/dc/elements/1.1/\"";
                  $xml .= "\n\txmlns:atom=\"http://www.w3.org/2005/Atom\"";
                  $xml .= "\n\txmlns:sy=\"http://purl.org/rss/1.0/modules/syndication/\"";
                  $xml .= "\n\txmlns:slash=\"http://purl.org/rss/1.0/modules/slash/\">\n";

                  // begin adding channel information
                  $xml .= "<channel>\n";
                  if (isset($this->myTitle)) { $xml .= "\t<title>".$this->myTitle."</title>\n"; }

                  // required for validation
                  $xml .= "\t<atom:link href=\"http://".$_SERVER['HTTP_HOST'].$_SERVER['PHP_SELF'];
                  $xml .= "\" rel=\"self\" type=\"application/rss+xml\" />\n";

                  // more channel information
                  if (isset($this->myLink)) { $xml .= "\t<link>".$this->myLink."</link>\n"; }
                  if (isset($this->myDescription)) { $xml .= "\t<description>".$this->myDescription."</description>\n"; }
                  if (isset($this->myPubDate)) { $xml .= "\t<pubDate>".$this->myPubDate."</pubDate>\n"; }

                  // if there are any items to add to the feed, let's do it
                  if (count($items) > 0) {

                          // sort items
                          usort($items, array($this,"__compare_items"));

                          // if desired, trim the array down to the specified number of items
                          if (isset($limit)) { array_splice($items, intval($limit)); }

                          // now let's convert all of our items to XML
                          foreach ($items as $item) {
                                  $xml .= $item->asXML() ."\n";
                          }
                  }
                  $xml .= "</channel>\n</rss>";

                  // if output is desired print to screen
                  if ($output) { echo $xml; }

                  // if user wants results returned as a string, do so
                  if ($return_as_string) { return $xml; }
          }
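The freshness test at the top of the loop compares the age of the cache file with the configured cache time. Pulled out on its own, it could look like the sketch below. cache_is_fresh() is a hypothetical helper rather than part of the class, and a temporary file stands in for a cached feed.

```php
<?php
// Hypothetical helper: true when $cache_file exists and is younger than $max_age seconds.
function cache_is_fresh(string $cache_file, int $max_age): bool
{
    if (!file_exists($cache_file)) {
        return false;
    }
    return (time() - filemtime($cache_file)) < $max_age;
}

// Demonstrate with a temporary file standing in for a cached feed.
$cache_file = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($cache_file, '<rss/>');

// Just written, so it is well within a one-day cache time.
$fresh = cache_is_fresh($cache_file, 86400);

// Backdate the modification time by two days to simulate a stale cache.
touch($cache_file, time() - 2 * 86400);
clearstatcache(); // file status results are cached within a request
$stale = !cache_is_fresh($cache_file, 86400);

unlink($cache_file);

var_dump($fresh, $stale);
```

A missing file counts as not fresh, which is what sends the loop down the "fetch from the URL" branch the first time a feed is requested.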

So now we have everything we need to retrieve, merge, sort and limit our RSS feeds. We can save this class as mergedrss.php.

Using the MergedRSS Class

In another file called feed.php, we place the following code:

  <?php

  // If there are errors, don't show them.  This will break RSS syntax.
  ini_set('display_errors', 'off');

  include_once("mergedrss.php");

  // place our feeds in an array
  $feeds = array(
          'http://www.widgetsandburritos.com/feed/',
          'http://www.dealofthedaysa.com/feed/',
          'http://www.iambex.com/feed/',
  );

  // set the header type
  header("Content-type: text/xml");

  // set an arbitrary feed date
  $feed_date = date("r", mktime(10,0,0,9,8,2010));

  // Create new MergedRSS object with desired parameters
  $MergedRSS = new MergedRSS($feeds, "My Merged Feed", "http://www.widgetsandburritos.com/",
                             "This is just a sample merged RSS feed", $feed_date);

  // Export the first 10 items to screen
  $MergedRSS->export(false, true, 10);

  // Retrieve the first 5 items as xml code
  $xml = $MergedRSS->export(true, false, 5);

  ?>

The Final Results

You can see the above sample feed at: http://www.widgetsandburritos.com/test/feed.php

Download code as *.zip

David Stinemetze is the Lead Developer and Director of Social Media for San Antonio Web Design, SEO and Hosting firm, Internet Direct.


Responses to “Merging Multiple RSS Feeds Using PHP and Caching”

  1. Brandon Lee says:

    The XML page cannot be displayed
    Cannot view XML input using XSL style sheet. Please correct the error and then click the Refresh button, or try again later.

    ——————————————————————————–

    Only one top level element is allowed in an XML document. Error processing resource ‘http://www.widgetsandburritos.com/test…

    Warning: SimpleXMLElement::__construct() [<a href=’simplexmlelement.–construct’>simplexmlelement….

    • There’s apparently something wrong with my sample feed.php file. Not sure what happened. I’ll take a look at it when I get the chance, and make sure it gets corrected. Thanks.

    • Ok I figured out what happened. One of the feeds that I was referencing went down. I had assumed that when a feed went down, the __fetch_rss_from_url() function would return null. But apparently it was throwing errors. The errors obviously did not use proper XML syntax which caused things to break. So I corrected it this way:

      Changed the __fetch_rss_from_url() function to the following:

              // retrieves contents of an external RSS feed ; returns null on error
              private function __fetch_rss_from_url($url) {
                      // Create new SimpleXMLElement instance based on the url.  If there's an error, return null
                      try {
                              $sxe = new SimpleXMLElement($url, null, true);
                              return $sxe;
                      } catch (Exception $e) {
                              return null;
                      }
              }

      Then added

      ini_set('display_errors', 'off');

      to the beginning of my code. I just figure if there are any errors with any of the feeds for whatever reason, instead of breaking the entire script, just ignore the broken feeds. We can always try again later.

  2. Ian says:

    Thanks for this great tutorial, it really helped me with a project where I needed to merge twitter feeds. Still working on the caching part.

  3. Mike says:

    Hi, I downloaded your nice script, but I’m experiencing some problems with the fetch function.
    PHP returns an error:
    “This page contains the following errors:
    error on line 2 at column 1: Extra content at the end of the document
    Below is a rendering of the page up to the first error.”
    and the page doesn’t display.

    Also, I was wondering about a way to have the script save the XML to a file, so that I can run it from cron on my server.
    I wrote this in place of your echo $xml output:

    // if output is desired, write it to a file
    if ($output) {
    $myFile = "merged.xml";
    $fh = fopen($myFile, 'w') or die("can't open file");
    $stringData = $xml;
    fwrite($fh, $stringData);
    fclose($fh);
    }

    Could you help me solve this?
    Thx a lot.
    Mike

    • Mike,

      XML pages should only have a single “root element”. Check to make sure all tags are encapsulated inside the <rss></rss> tags. Also, if you’re just writing to a file instead of outputting to the screen, try getting rid of this line:

      // set the header type
      header("Content-type: text/xml");
      

      Without that header, PHP no longer serves the response as an XML document and falls back to its default text/html content type.

      Let me know if this helps or not.
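      For the cron use case, one possible approach is to leave export() alone, take the string it returns, and write that to a file from a separate script. This is only a sketch: the $xml string below stands in for the value returned by $MergedRSS->export(true, false), and the target path is a made-up location.

```php
<?php
// Stand-in for the string returned by $MergedRSS->export(true, false).
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
     . '<rss version="2.0"><channel></channel></rss>';

// Hypothetical output path; a real cron job would use a web-readable location.
$target = sys_get_temp_dir() . '/merged.xml';

// Write to a temporary file first, then rename it into place,
// so readers never see a partially written feed.
$tmp = $target . '.tmp';
if (file_put_contents($tmp, $xml, LOCK_EX) === false || !rename($tmp, $target)) {
    error_log("Could not write $target");
    exit(1);
}

echo "Wrote ", strlen($xml), " bytes to $target\n";
```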

  4. Mike says:

    Now it works fine! Thx for your help dude :)

  5. Peter says:

    Hello David,

    really useful php code, many thanks! I was looking at many other solutions, but none was running that smooth!

  6. Jan says:

    Is it somehow possible to identify merged RSS in the result? I mean, I would like to have result look like this:
    Website A: Title of RSS item
    Website B: Title of RSS item

    • Sorry for the delayed response. I was away at SXSW from last Thursday till this Tuesday and have been playing catchup since I’ve gotten back. It’s technically possible. Would require a little bit of data manipulation to do so. If I get a chance, I’ll try to set up an example of this.
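      As a rough idea of the kind of data manipulation involved, each item title can be prefixed with the title of the channel it came from before the items are merged. The sketch below is not part of the downloadable class and uses a made-up inline feed in place of a fetched one.

```php
<?php
// Sample feed with made-up content; a real script would use a fetched feed instead.
$feed = simplexml_load_string(
    '<rss version="2.0"><channel><title>Website A</title>'
    . '<item><title>First post</title></item>'
    . '<item><title>Second post</title></item>'
    . '</channel></rss>'
);

// The channel title identifies the source of every item in this feed.
$source = (string) $feed->channel->title;

$items = array();
foreach ($feed->channel->item as $item) {
    // Prefix each item title with the title of the channel it came from.
    $item->title = $source . ': ' . (string) $item->title;
    $items[] = $item;
}

echo (string) $items[0]->title, "\n"; // Website A: First post
```

      Inside export(), the same assignment could go in the loop that adds each item to the master item list, since the channel title is still available from $sxe at that point.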

