How This Site Works, Part I
posted on july 28, 2004, tag: site
Many years ago, while looking for a way to add somewhat useless features to a certain celebrity fan site I was running in my spare time, I noticed Netscape's new "RSS" feature. In a nutshell, you could subscribe to certain "channels" on their site, and thus be notified of updates to those sections of the website. I created a channel for my website and months later only two people had subscribed. Of course, I thought: who would want to do this? If you had told me then that years later every page on my site would allow subscription and tracking via a newsreader, I wouldn't have believed you. But here we are. RSS (and Atom) are all the rage (excuse the pun) these days, and for good reason.
What Good is One Feed?
This site is not made up of one single page or one single kind of content, so why should its feed be? Before I moved from Movable Type to my own system, I had but one RSS feed that contained the full text of the last 10 entries. While this definitely serves a purpose, what about all the content that doesn't appear on the front page? Even forgetting about secondary sections like projects and photos, what about comments? By default MT didn't offer an RSS feed for anything but the last 10 entries (and, it's worth noting, only the first sentence thereof), forcing you to visit the website to even see if it had other new content. If you were subscribed to the site's RSS feed, you would see when I posted a new entry or made changes, but you wouldn't know if someone had posted a comment. Why get only half the news?
Since RSS and Atom have become more and more popular, many people have started adding additional feeds to their sites. You'll frequently see a link to an RSS file in a sidebar (commonly under a person's equivalent to my "see also" content) and a link to an RSS file for recent comments.
When I set out to write my own back-end, I knew from the beginning that I wanted you to be able to syndicate any page on my site and track it for changes. From the weblog to the photos section, you can literally track any change I make to the content of this site. Subscribe to this individual entry and you'll see people's comments. Subscribe to the photos section and you'll see when I add a new collection. Hell—subscribe to the archives page if you want to and you'll get a full list of every entry on the site and know the moment I make a change to any of them.
Why Should I Visit the Site?
One of the bigger arguments against full-text feeds is that they don't force the reader to actually visit the website. Well, that's just it: I don't want to force you to do anything. I want you to read my content. I want you to give me feedback. I want you to visit my site. But if you're reading the content, haven't I succeeded anyway? Besides, I honestly believe that people do visit sites even if they've subscribed to them in a newsreader. I know I do. There's something to be said for the visual pleasure of seeing designs other than the simple display of your newsreader application.
The goal here isn't to get people to only read your content via a newsreader or only via your website; the goal is to give people options to find and read your content easily.
And, Finally, the Pros and Cons
For the user, the pros are simple: all of your content is easily accessible via syndication, and a reader can track any change to your site. Your visitors can be notified of changes to any section of your website (or any subsection) and are more likely to read your content and (I think) visit your site frequently. It's also damned nice to be able to see when people comment on an entry you're particularly interested in.
On the back-end, or for me, the pros are also simple: everything I add to this site is instantly available to you via both RSS and Atom formats. My system creates an RSS or Atom file automatically if you prepend an rss or atom subdomain to any URL on this site (for example, rss.maniacalrage.net).
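One way the subdomain idea could be wired up is to look at the host name of each request and pick a feed format from it. The snippet below is only a rough Python sketch under that assumption (the site itself is described as PHP, and the function name feed_format and its parameters are made up for illustration); it is not the site's actual code.

```python
from typing import Optional

# The two feed subdomains mentioned in the post.
KNOWN_FORMATS = ("rss", "atom")


def feed_format(host: str, base_domain: str = "maniacalrage.net") -> Optional[str]:
    """Return 'rss' or 'atom' if the host is a feed subdomain, else None.

    Hypothetical helper: it assumes the format is chosen purely from the
    Host header of the incoming request.
    """
    host = host.lower().split(":")[0]  # normalize case and drop any port
    suffix = "." + base_domain
    if not host.endswith(suffix):
        return None  # bare domain or an unrelated host: serve normal HTML
    label = host[: -len(suffix)]
    return label if label in KNOWN_FORMATS else None
```

With something like this in place, the same page-building code can run for every request, and only the final output step needs to know whether to emit HTML, RSS, or Atom.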
There is only one con, but it's a big one. Because of the dynamic nature of this system, when you (or your newsreader) make a request for an RSS or Atom file, I have to create it. These files don't exist before you request them; they're dynamically created and displayed for you on the fly each and every time they're needed.
The issue is that I don't update more than once a day (for the most part), so if your newsreader checks for updates to the index RSS file, you're going to be downloading a full file even when there haven't been any changes. Normally, if the RSS file were static, your reader would receive a 304 status code from my server letting you know that the content hadn't changed and you wouldn't have to bother downloading the whole file. But, because I'm creating these files dynamically, I can't do that. Instead, I send the whole file to your reader each time it requests it, eating up bandwidth for both of us. Or at least, that is, until today.
The Smarter Way™
Today I made a major change to the way RSS and Atom feeds are generated on this site. Before today, feeds were given their equivalent "last updated" value of the very moment you requested them. Starting today, however, they're using the correct date for this value: the date of the last change to any of the content in the feed. For instance: if you subscribe to the main weblog feed (rss or atom dot maniacalrage dot net), the last updated date for the feed will match the date of either the last posted entry or, if newer, the last date an entry appearing in the feed was updated.
This required the addition of a column to my database tables called "updated," which is null by default but receives the current time if I make an edit to any item in the database. When I build the content for RSS and Atom feeds, I keep an array of all posted or updated timestamps. Just before building the last updated part of a feed, I rsort() the array and use the first item. So, just like that, feeds have proper modification dates.
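The timestamp bookkeeping described above can be sketched in a few lines. This is a Python illustration rather than the site's PHP, and the (posted, updated) row shape and the name feed_last_updated are assumptions made for the example; the reverse sort stands in for rsort().

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Hypothetical row shape: (posted, updated), where `updated` stays None
# until the item is edited, mirroring the nullable "updated" column.
Row = Tuple[datetime, Optional[datetime]]


def feed_last_updated(rows: Iterable[Row]) -> Optional[datetime]:
    """Return the newest posted-or-updated timestamp among the feed's items."""
    stamps = []
    for posted, updated in rows:
        stamps.append(posted)
        if updated is not None:
            stamps.append(updated)
    stamps.sort(reverse=True)  # newest first, like rsort() in PHP
    return stamps[0] if stamps else None


rows = [
    (datetime(2004, 7, 20), None),                   # never edited
    (datetime(2004, 7, 10), datetime(2004, 7, 28)),  # older entry, edited later
]
print(feed_last_updated(rows))
```

Note that the edited older entry wins here: the feed's date follows the most recent change of any kind, not just the most recent post.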
But that doesn't solve the dynamic problem. Granted, feeds now state the correct time they were last updated, but that doesn't help anything if you still have to download the whole feed to find out, does it? This is where Alexandre Alapetite's Conditional HTTP Requests in PHP tutorial comes into play.
To put it simply, the function he wrote takes a date you give it (the most current date you have for content, i.e., the first item in my rsort()ed dates array) and compares it to the If-Modified-Since date the HTTP request sends. If your date is newer, it means you've got content changes since the last time your reader requested the feed, so you send the file. If not, though, the function sends back a proper 304 Not Modified status and the reader doesn't have to download the file. No bandwidth wasted, no content missed.
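The comparison described above boils down to something like the following. This is a minimal Python sketch of a conditional GET check, not Alexandre Alapetite's PHP function; the name conditional_response and its return shape are invented for the example, and it assumes last_modified is a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def conditional_response(
    last_modified: datetime,          # must be timezone-aware (assumption)
    if_modified_since: Optional[str],  # raw request header value, if any
    body: str,
) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body): 304 with no body if nothing changed."""
    # HTTP dates have one-second resolution and are expressed in GMT.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since = None
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall back to a full response
    if since is not None and since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)

    if since is not None and last_modified <= since:
        return 304, headers, ""  # the reader's copy is still current
    return 200, headers, body    # new content (or first request): send it all
```

A reader that sends back the Last-Modified value it received last time gets a 304 until the feed's newest timestamp moves forward; a reader that sends no header, or a malformed one, simply gets the full feed.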
This entry is part I in a series focusing on how this site works. Stay tuned for more.
Comments
There are 9 comments, comments are closed
matthew welty on 07/28/2004:
i thoroughly enjoyed learning about how your site works. i look forward to reading more entries in the series. great work.
compuwhiz7 on 07/29/2004:
Very interesting—I can't wait for Part II!
I'm planning on implementing a similar system (that is, an ASP.NET version of the syndicate-every-darn-page concept) on my Web site, but we'll see how far that gets. :)
Rob G on 07/29/2004:
Why don't you have a contact page? Do you hate us? Are we not good enough to merit a response from you via e-mail?
Garrett on 07/29/2004:
Rob G—I've added a link to email me in the about section, in the sidebar (at the top). Thanks for reminding me to do this, and no, I don't hate you.
Rob G on 07/29/2004:
Thanks. Normally when you click someone's name in a comment, it shows their e-mail, but yours just points to http://maniacalrage.net/.
And I like the little blue circle around your message. Maybe you should do it for all the comments, except make any comment not by you a very very light shade of gray (just a little darker than the white background). Or maybe not. I'm bored. Bye.
compuwhiz7 on 07/30/2004:
Rob—I think Garrett just wants to highlight his comments. ;)
I did have one question regarding the CMS in general: what URIs actually exist as physical filepaths? For example, the URI for this entry is /archives/2004/07/howthissite/. Are all of these actual folders within the file structure, or is only /archives/ physical?
Just wondering. :)
Garrett on 07/30/2004:
Yeah, I'm specifically highlighting only my comments... it's not because I'm in love with myself, it just makes it easier for you to see when I answer your comments, etcetera.
And Cory—Part II of this series of entries is coming up and it's all about how that part of my system works. There's even a chart! Oooh!
compuwhiz7 on 07/30/2004:
Comment highlighting has become quite common on the Web-related Weblogs that have custom software... and even a few that don't.
Oh, excellent. I can't wait. :) Ignore my e-mail, then. ;)
Alexandre on 07/30/2004:
I am pleased to inform you that my little function has been updated and now allows the sending of only the new RSS/ATOM articles (don't need to send the whole file) and for people without mod_gzip, it integrates compression on the fly.