MOOC Scraper Update (3) – (and Hello #FutureEd !)


MOOC Comment Scraper

Experimental Comment Scraping
(Based on ‘la vaca de los sinvaca’ – by José Bogado)


I unleashed my experimental MOOC Comment Scraper on the Rhizomatic Learning MOOC (#rhizo14) run by Dave Cormier from Jan 15th and have been updating it once or twice a day (latest output). The idea behind the Scraper is to get a quick impression of MOOC activity by creating very brief summarised versions of recent blog posts along with their comments. For some reason this type of presentation does not seem to be readily available via feed readers, but I’ve found the Scraper useful, particularly for connectivist-style MOOCs where activity is typically distributed across numerous blogs, some of which may not be active at any one time.

In contrast, my xMOOC experiences (e.g. in a Coursera Philosophy MOOC) suggest that blogging around these ‘instructivist’ MOOCs is not nearly so common. Having joined Cathy Davidson’s ‘History and Future of (Mostly) Higher Education’ (#FutureEd), I watched my introductory spiel sink without trace in the usual enormous and clunky Coursera forum. However, Cathy Davidson herself has reservations about the stereotypical xMOOC, and this particular Coursera MOOC (“…not just a MOOC, it’s a movement.”) does seem less centralised. I’ll be looking out for participant blogs.

Rhizo14 is a good guinea pig for the Scraper, and I appreciate the significant number of participants who actively blog and comment on each other’s posts, generating lively discussions with long comment streams. Some posts have attracted around 30 comments of all types and lengths, and this has helped me squash several bugs in the Scraper program. (A recurring problem is dealing with ragged loose ends when HTML and other ‘hidden’ codes in comments are chopped up.) At present, about 60 WordPress and Blogger blogs are being scanned, and comments are extracted for all posts tagged #rhizo14 over a time ‘window’ of the last 10 days. The participants seem happy to have their comments abbreviated and published in this way, but it would be a simple matter to remove any blog if required.
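To illustrate the ‘ragged loose ends’ problem, here is a minimal Python sketch of one way to abbreviate a comment without cutting through HTML tags or entities. It is not the Scraper’s actual code: the function name `abbreviate_comment`, the helper class and the 80-character limit are all assumptions made for illustration. The idea is to reduce the comment to plain text first and only then truncate, backing off to the last space so that a word is never cut in half.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text content and discards all tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def abbreviate_comment(raw_html, max_chars=80):
    """Return a plain-text, length-limited version of an HTML comment.

    Tags are removed and entities decoded BEFORE truncation, so the cut
    can never land inside '<a href=...>' or '&amp;'.
    """
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()

    # Collapse runs of whitespace (including newlines) to single spaces
    text = " ".join("".join(parser.parts).split())
    if len(text) <= max_chars:
        return text

    # Back off to the last space inside the limit, then mark the cut
    cut = text[:max_chars].rsplit(" ", 1)[0]
    return cut + "..."


if __name__ == "__main__":
    sample = '<p>Great post &amp; <a href="http://example.com">link</a>!</p>'
    print(abbreviate_comment(sample))
    print(abbreviate_comment("<b>one two three four</b>", max_chars=9))
```

Truncating the raw HTML directly would be shorter to write, but it is exactly what leaves half-open tags and broken entities in the output.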

The graph below gives some indication of how commenting in rhizo14 is developing with time. This is no scientific study, particularly for the first few days, when blogs were still being added, no posts were yet old enough to drop out of the time window, and the window itself was being adjusted. However, the period from Jan 23 was more stable, with a constant 10 day window. Both comments and posts seem to have peaked around Jan 30 but, interestingly, even though comment and post numbers have now dropped a little, the average number of comments per post is being maintained at over 5.


KEY:   BLUE = No. of posts. RED = No. of comments
YELLOW = Average Comments per post x 100
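
The three plotted quantities can be computed for any one day along the following lines. This is only a sketch under stated assumptions: the function name `window_stats` and the record layout (a `date` and a `comments` count per post) are hypothetical, and the Scraper’s real data structures are not described here. The figures in the usage example are made up purely to show the arithmetic.

```python
from datetime import date, timedelta


def window_stats(posts, today, window_days=10):
    """Count posts and comments inside a sliding time window.

    `posts` is assumed to be a list of dicts with a 'date' (datetime.date)
    and a 'comments' count; this layout is hypothetical.
    Returns (number of posts, number of comments, average comments per post).
    """
    cutoff = today - timedelta(days=window_days)
    recent = [p for p in posts if cutoff < p["date"] <= today]

    n_posts = len(recent)
    n_comments = sum(p["comments"] for p in recent)
    # Guard against an empty window to avoid dividing by zero
    average = n_comments / n_posts if n_posts else 0.0
    return n_posts, n_comments, average


if __name__ == "__main__":
    # Made-up example data, not real rhizo14 figures
    posts = [
        {"date": date(2014, 2, 1), "comments": 6},
        {"date": date(2014, 1, 30), "comments": 4},
        {"date": date(2014, 1, 10), "comments": 30},  # outside the window
    ]
    n_posts, n_comments, average = window_stats(posts, date(2014, 2, 4))
    # The yellow line is the average scaled by 100 to share the same axis
    print(n_posts, n_comments, average * 100)
```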

I’d be very grateful for any constructive comments or criticism of the Comment Scraper, particularly if you’ve been viewing the output over a period of time. There are several directions in which the Scraper could be developed. More or less output text could be provided, or posts without comments could be identified, but there may be rather more fundamental changes worth making.

How do you rate the Comment Scraper? – please mark out of 10 where:

0 = Useless
5 = Sometimes useful but I rely mainly on other tools
10 = I couldn’t live without it!

However busy you are, please try at the very least to leave your mark out of 10 below so I get some sense of the Scraper’s perceived utility! Thank you!

Written by Gordon Lockhart

February 4, 2014 at 9:19 pm

Posted in Mooc, rhizo14