Connection not Content

A Blog for MOOCs and Other Animals

#Change11 : A ‘Comment Scraper’ for Aggregating Blog Posts with Comments in a MOOC

Trying to keep track of what’s going on in a MOOC where discussion is distributed over numerous participant blogs can be daunting. An RSS reader such as Google Reader does help, but at any particular time a high proportion of blogs may be inactive or of no interest, and finding the relevant ones can take time. Many blogs provide RSS feeds for the comment threads attached to posts, but the most recent comments appear first and it can be difficult to spot all the scattered comments that belong to one post. Also, I’ve sometimes arrived at a blog post a day or two after a lively and interesting discussion has ended. Maybe I first spotted the post in the MOOC Newsletter and viewed it before it had any comments, and now it’s acquired a lengthy comment thread but the topic is exhausted and folks have moved on! The Newsletter certainly does a great service in highlighting all the new posts in one place, although the role of the additional comments on posts provided via gRSShopper is not so clear.

What I would like is a ‘Comment Scraper’ that aggregates very brief summarised versions of posts and their comment threads as they appear, so that a quick initial impression can be gained of where current MOOC activity is and what it is about. So, invigorated by the DIY spirit engendered by MOOCs, I’ve been developing and experimenting with a program that aggregates brief up-to-date listings from blog RSS feeds. At present my Comment Scraper only works with WordPress blogs but there are enough of these around in a MOOC such as Change11 to prove the concept.

Would such a tool also be useful to other MOOC participants? I wrote the program initially as an exercise in learning Python and it could be made available after some further development, but its action for WordPress blogs (summarised at the end of this post) is not very complicated. In due course I could probably publish aggregated listings somewhere public but this raises other issues. Some MOOC participants may not wish their posts and comments to be presented in a considerably more compressed form than is usual via an RSS news aggregator. As for legalities, I have no idea who ‘owns’ the content of a WordPress feed or the ramifications of publishing ‘munged’ versions!

Here’s an example of the Comment Scraper in action – taken from real blogs but with the real names and text replaced with fictional ones for illustrative purposes and with all links disabled.

* * * #MOOC What I have decided to do about my Learning by Blogger1 on Thu, 19 Jan 2012 * * *
After much thought I have decided to lorem ipsum dolor sit amet, consectetuer ad elit nisi…..

Wow! Well all I can say is tellus sceleris luctus turpis phare enim ad minim…..[Commenter1: Fri, 20 Jan 2012]
So?…..[Commenter2: Sat, 21 Jan 2012]
Hi Commenter1! Yes of course pet pigs should be licensed but dolore magn…..[Blogger1: Sun, 22 Jan 2012]
* * * #MOOC Introduction to Sed Fermentum, Nisl et Iacul by Blogger1 on Thu, 19 Jan 2012 * * *
The first thing to remember is that sed dui odio tristique in viverra sit amet nec odi….

Great post Blogger1! – resonates with me too!! ….[Commenter1: Wed, 01 Feb 2012]
I can’t agree that proin pede arcu gravida quis, porta a, sodales in, dolor…..[Commenter2: Fri, 20 Jan 2012]
Really? It’s well known that dui vel temporibus autem quibusdam tellus. …..[Blogger1: Sun, 22 Jan 2012]
* * * #MOOC Examinations Examined by Blogger2 on Thu, 05 Jan 2012 * * *
There is little doubt that examinations are cum soluta nobis est eligendi cumque nihil cupid…..

* * * #MOOC Finding your Feet in a MOOC by Blogger2 on Sun, 29 Jan 2012 * * *
Don’t be afraid to itaque earum rerum hic tenetur et sapiente delectus, sit aut reiciendis…..

Losing your head can also porro quisquam est, qui dolorem ipsum dolor…..[Commenter3: Wed, 01 Feb 2012]
Thanks Commenter3 but losing my head is not so omnis voluptas assumens….[Blogger2: Wed, 01 Feb 2012]
I would give an arm and a leg to omnis harum quidem stule omnis repel….[Commenter4: Wed, 01 Feb 2012]
* * * #MOOC Where we have Lost Our Way by Blogger3 on Sun, 29 Jan 2012 * * *
One thing I have always said is occaecat et cupidatat non sapiente proident, sunt in culpa…..

Enough said and furthermore I am libero tempore, cum soluta nobis est e….[Commenter4: Wed, 01 Feb 2012]
I don’t have much to say about this except id est laborum et dolorum fug…..[Commenter5: Wed, 01 Feb 2012]
Just sayin’….[Blogger3: Wed, 01 Feb 2012]
* * * #MOOC Learning Theories – No. 37 by Blogger4 on Thu, 05 Jan 2012 * * *
This week’s theory needs no introduction because harum quidem rerum facilis est expedita…..

Thank you so much Blogger4! Now I know that numquam teius temporas…..[Commenter6: Wed, 11 Jan 2012]
Are you serious? This theory is ab illos veritatis et quasi architectos expl…..[Commenter7: Wed, 11 Jan 2012]

The action of the Comment Scraper is fairly straightforward for WordPress blogs. Two RSS feeds are accessed per blog (eg gbl55/ and gbl55/), listing recent postings and comments respectively. Comments are scanned in reverse order (so that the oldest appear first), ignoring ping-backs. If a comment belongs to a post in the postings file, then the first few words of that comment, with the date and user name of the commenter, are selected. This is all added to any other comments for that post under a heading (the lines marked with asterisks above) containing brief details (Title, Author, Date and first few words) of the post itself.
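
For anyone who wants to experiment, the steps just described might be sketched in Python roughly as follows. This is only an illustration and not the Comment Scraper program itself. The function names (`first_words`, `read_items`, `scrape`) are invented for the example. It assumes that each feed has already been downloaded into a string, that a comment belongs to a post when the comment's link starts with the post's link, and that a ping-back can be recognised by text beginning with `[…]` or `[...]`. Real feeds may need different rules.

```python
import xml.etree.ElementTree as ET

# Namespace used by the <dc:creator> element in RSS feeds.
DC = "{http://purl.org/dc/elements/1.1/}"


def first_words(text, limit=60):
    """Collapse whitespace and cut the text to at most `limit` characters."""
    text = " ".join(text.split())
    return text if len(text) <= limit else text[:limit].rstrip() + "....."


def read_items(rss_xml):
    """Yield one dict per <item> of an RSS 2.0 document, in feed order."""
    for item in ET.fromstring(rss_xml).iter("item"):
        yield {
            "title": item.findtext("title", ""),
            "link": item.findtext("link", ""),
            "author": item.findtext(DC + "creator", ""),
            # Keep only the date part, e.g. "Thu, 19 Jan 2012".
            "date": item.findtext("pubDate", "")[:16],
            "text": item.findtext("description", ""),
        }


def scrape(posts_xml, comments_xml):
    """Return a plain-text listing of posts with their comments."""
    posts = {p["link"]: {"post": p, "comments": []}
             for p in read_items(posts_xml)}

    # The comment feed lists the newest comment first, so reverse it
    # to get the oldest first.
    for c in reversed(list(read_items(comments_xml))):
        # Assumption: ping-backs start with an ellipsis in brackets.
        if c["text"].lstrip().startswith(("[…]", "[...]")):
            continue
        # Assumption: a comment link extends the link of its post.
        for link, entry in posts.items():
            if c["link"].startswith(link):
                entry["comments"].append(c)
                break

    lines = []
    for entry in posts.values():
        p = entry["post"]
        lines.append(f"* * * {p['title']} by {p['author']} on {p['date']} * * *")
        lines.append(first_words(p["text"]))
        lines.append("")
        for c in entry["comments"]:
            lines.append(f"{first_words(c['text'])}[{c['author']}: {c['date']}]")
    return "\n".join(lines)
```

Calling `print(scrape(posts_xml, comments_xml))` with the text of the two feeds gives a listing in the same general layout as the example above. Comments whose post has already dropped out of the postings feed are simply skipped.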

Written by Gordon Lockhart

February 4, 2012 at 12:47 pm

Posted in Uncategorized

11 Responses

  1. This is brilliant. Despite all the positive noises about comments, technologies almost always relegate them in importance compared with the posts. Some of my most memorable learning experiences have been reading blog comments, but as you say, they are often long-abandoned fossils. It would be really good to be able to see the state of a discussion at a glance. Any way to incorporate something like this into gRSShopper?


    February 4, 2012 at 3:46 pm

  2. Thanks Mira. Yes, comments can transform a post into something much more than the post by itself – sometimes I even read the comments before the post! Re gRSShopper, I know little about the internals though I think it’s coded in Perl. – Gordon Lockhart


    February 4, 2012 at 4:24 pm

  3. […] – Today, 7:15 […]

  4. Great job, Gordon.
    Comments are so important, and it is fine to see an app for reading them.
    Is it a kind of plugin for a browser or for WordPress? Is it available?


    February 25, 2012 at 8:03 am

  5. Thanks Jaap – I’m now working on a version for Blogger blogs and this should be ready soon. Blogger + WordPress seem to account for a large proportion of change11 blogs so when I’m ready I’ll try publishing the output in some form to see how useful it might be. At present it is just a stand-alone Python program I wrote as an exercise in learning Python. In principle I see no reason why a comment scraper could not be implemented as a browser plugin – but this is beyond my programming skills at present! I’ll certainly make my program available once it’s finished.


    February 25, 2012 at 3:39 pm

  6. I’m now starting to test the Comment Scraper on ‘live’ blogs – follow the page link ‘MOOC Comment Scraper [alpha]’ above. Gordon


    February 29, 2012 at 3:31 pm

  7. […] proportion of blogs may not be of interest or active and finding the relevant ones can take […] [Link] Sat, 04 Feb 2012 12:47:00 +0000 […]

  8. […] role that MOOC participants play in commenting on each other’s blogs. As Mira comments (in a comment) “Some of my most memorable learning experiences have been reading blog comments . . .” […]

  9. […] that brings together brief summarised versions of recent blog posts with their comments (‘A ‘Comment Scraper’ for Aggregating Blog Posts with Comments in a MOOC‘). The idea was to provide a quick impression of current MOOC activity but in principle any […]

  10. […] together brief summarised versions of recent blog posts along with resulting comments (See A ‘Comment Scraper’ for Aggregating Blog Posts with Comments in a MOOC and the update) and FAQ. The idea is to provide nothing more than a quick impression of current […]

  11. […] brings together brief summarised versions of recent blog posts along with resulting comments (See A ‘Comment Scraper’ for Aggregating Blog Posts with Comments in a MOOC, the update, FAQ and an output). The idea is to provide a quick up-to-date impression of posts and […]
