Connection not Content

A Blog for MOOCs and Other Animals

#Change11 : A ‘Comment Scraper’ for Aggregating Blog Posts with Comments in a MOOC

Trying to keep track of what’s going on in a MOOC where discussion is distributed over numerous participant blogs can be daunting. An RSS reader such as Google Reader helps, but at any particular time a high proportion of blogs may be inactive or of no interest, and finding the relevant ones takes time. Many blogs provide RSS feeds for the comment threads attached to posts, but the most recent comments appear first and it can be difficult to spot all the scattered comments that belong to one post. Also, I’ve sometimes arrived at a blog post a day or two after a lively and interesting discussion has ended. Maybe I first spotted the post in the MOOC Newsletter and viewed it before it had any comments; now it has acquired a lengthy comment thread, but the topic is exhausted and folks have moved on! The Newsletter certainly does a great service in highlighting all the new posts in one place, although the role of the additional comments on posts provided via gRSShopper is not so clear.

What I would like is a ‘Comment Scraper’ that aggregates very brief summaries of posts and their comment threads as they appear, so that a quick initial impression can be gained of where current MOOC activity is and what it is about. So, invigorated by the DIY spirit engendered by MOOCs, I’ve been developing and experimenting with a program that aggregates brief up-to-date listings from blog RSS feeds. At present my Comment Scraper only works with WordPress blogs, but there are enough of these around in a MOOC such as Change11 to prove the concept.

Would such a tool also be useful to other MOOC participants? I wrote the program initially as an exercise in learning Python and it could be made available after some further development, but its action for WordPress blogs (summarised at the end of this post) is not very complicated. In due course I could probably publish aggregated listings somewhere public, but this raises other issues. Some MOOC participants may not wish their posts and comments to be presented in a considerably more compressed form than is usual via an RSS news aggregator. As for legalities, I have no idea who ‘owns’ the content of a WordPress feed or what the ramifications of publishing ‘munged’ versions might be!

Here’s an example of the Comment Scraper in action – taken from real blogs but with the real names and text replaced with fictional ones for illustrative purposes and with all links disabled.

* * * #MOOC What I have decided to do about my Learning by Blogger1 on Thu, 19 Jan 2012 * * *
After much thought I have decided to lorem ipsum dolor sit amet, consectetuer ad elit nisi…..

Wow! Well all I can say is tellus sceleris luctus turpis phare enim ad minim…..[Commenter1: Fri, 20 Jan 2012]
So?…..[Commenter2: Sat, 21 Jan 2012]
Hi Commenter1! Yes of course pet pigs should be licensed but dolore magn…..[Blogger1: Sun, 22 Jan 2012]
* * * #MOOC Introduction to Sed Fermentum, Nisl et Iacul by Blogger1 on Thu, 19 Jan 2012 * * *
The first thing to remember is that sed dui odio tristique in viverra sit amet nec odi….

Great post Blogger1! – resonates with me too!! ….[Commenter1: Wed, 01 Feb 2012]
I can’t agree that proin pede arcu gravida quis, porta a, sodales in, dolor…..[Commenter2: Fri, 20 Jan 2012]
Really? It’s well known that dui vel temporibus autem quibusdam tellus. …..[Blogger1: Sun, 22 Jan 2012]
* * * #MOOC Examinations Examined by Blogger2 on Thu, 05 Jan 2012 * * *
There is little doubt that examinations are cum soluta nobis est eligendi cumque nihil cupid…..

* * * #MOOC Finding your Feet in a MOOC by Blogger2 on Sun, 29 Jan 2012 * * *
Don’t be afraid to itaque earum rerum hic tenetur et sapiente delectus, sit aut reiciendis…..

Losing your head can also porro quisquam est, qui dolorem ipsum dolor…..[Commenter3: Wed, 01 Feb 2012]
Thanks Commenter3 but losing my head is not so omnis voluptas assumens….[Blogger2: Wed, 01 Feb 2012]
I would give an arm and a leg to omnis harum quidem stule omnis repel….[Commenter4: Wed, 01 Feb 2012]
* * * #MOOC Where we have Lost Our Way by Blogger3 on Sun, 29 Jan 2012 * * *
One thing I have always said is occaecat et cupidatat non sapiente proident, sunt in culpa…..

Enough said and furthermore I am libero tempore, cum soluta nobis est e….[Commenter4: Wed, 01 Feb 2012]
I don’t have much to say about this except id est laborum et dolorum fug…..[Commenter5: Wed, 01 Feb 2012]
Just sayin’….[Blogger3: Wed, 01 Feb 2012]
* * * #MOOC Learning Theories – No. 37 by Blogger4 on Thu, 05 Jan 2012 * * *
This week’s theory needs no introduction because harum quidem rerum facilis est expedita…..

Thank you so much Blogger4! Now I know that numquam teius temporas…..[Commenter6: Wed, 11 Jan 2012]
Are you serious? This theory is ab illos veritatis et quasi architectos expl…..[Commenter7: Wed, 11 Jan 2012]

The action of the Comment Scraper is fairly straightforward for WordPress blogs. Two RSS feeds are accessed per blog (e.g. gbl55.wordpress.com/feed and gbl55.wordpress.com/comments/feed), listing recent postings and recent comments respectively. Comments are scanned in reverse order (so that the oldest appear first), ignoring ping-backs. If a comment belongs to a post in the postings feed, then the first few words of that comment are selected together with the date and the user name of the commenter. This is added to any other comments for that post under a heading (the lines marked with asterisks above) containing brief details of the post itself: title, author, date and first few words.
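
The steps above can be sketched in Python roughly as follows. This is a minimal illustration rather than the actual Comment Scraper. It makes several assumptions: that each comment's link is the post's permalink followed by `#comment-<id>`, that the author name is carried in a `dc:creator` element, and that ping-backs can be recognised by an excerpt starting with `[...]`. The function names (`scrape`, `snippet`, `short_date`, `looks_like_pingback`) and the 12-word cut-off are hypothetical choices made for the example.

```python
import html
import re
import xml.etree.ElementTree as ET

# Namespaced tag assumed to hold the author name in both feeds (<dc:creator>).
DC_CREATOR = "{http://purl.org/dc/elements/1.1/}creator"


def snippet(text, n_words=12):
    """Strip markup and return the first few words followed by dots."""
    plain = html.unescape(re.sub(r"<[^>]+>", " ", text or ""))
    return " ".join(plain.split()[:n_words]) + "....."


def short_date(pub_date):
    """Keep only the day and date part of an RSS pubDate string."""
    return " ".join((pub_date or "").split()[:4])


def looks_like_pingback(text):
    """Assumed heuristic: ping-back excerpts start with '[...]' or '[…]'."""
    return html.unescape(text or "").lstrip().startswith(("[...]", "[…]"))


def scrape(posts_xml, comments_xml):
    """Return a text listing of posts, each followed by its comments."""
    posts = ET.fromstring(posts_xml).findall("./channel/item")
    comments = ET.fromstring(comments_xml).findall("./channel/item")

    # Key each post by its permalink; dicts preserve the feed order.
    threads = {p.findtext("link", "").rstrip("/"): (p, []) for p in posts}

    # The comments feed lists newest first, so walk it backwards
    # to make the oldest comment appear first in each thread.
    for c in reversed(comments):
        body = c.findtext("description", "")
        if looks_like_pingback(body):
            continue
        # Assumption: comment link = '<post permalink>#comment-<id>'.
        post_link = c.findtext("link", "").split("#")[0].rstrip("/")
        if post_link in threads:
            threads[post_link][1].append(
                f"{snippet(body)}[{c.findtext(DC_CREATOR, '')}: "
                f"{short_date(c.findtext('pubDate'))}]"
            )

    lines = []
    for post, thread in threads.values():
        lines.append(
            f"* * * {post.findtext('title', '')} "
            f"by {post.findtext(DC_CREATOR, '')} "
            f"on {short_date(post.findtext('pubDate'))} * * *"
        )
        lines.append(snippet(post.findtext("description")))
        lines.append("")
        lines.extend(thread)
    return "\n".join(lines)
```

To try this against a real blog, the two feed documents could be fetched with the standard library, for instance `urllib.request.urlopen(url).read()`, and the results passed to `scrape`. Comments whose post has already dropped out of the postings feed are simply skipped, which matches the rule that a comment is only listed if it belongs to a post in that feed.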

Written by Gordon Lockhart

February 4, 2012 at 12:47 pm
