MOOC Comment Scraper – Update (4)
My MOOC Comment Scraper had a great run during the Rhizo14 MOOC, and was even mentioned by Dave Cormier in his recent presentation (‘Why teach MOOCs – MOOCs as a selfish enterprise (talk at MIT)’)! Judging from the comments I received during Rhizo14, the Scraper could be employed in a variety of situations supporting MOOCs or other online events where it’s useful to aggregate blog posts and comments in abbreviated form. There seems to be an unexplored niche for open aggregation tools that simply abbreviate text, one click away from distributed sources, and don’t attempt to entrap users for commercial purposes!
Use of the Comment Scraper – My own conception of the Scraper seems best suited to cMOOCs, where much or even most discussion is distributed among numerous participant blogs, some of which may be inactive at any particular time. A quick impression of where the latest posts are, how the various discussions are developing and who is involved can be more useful than aggregators that provide considerably more text and require lengthy scrolling.
The current version of the Scraper merely links to a post and its comments, giving very brief details: date, authors, etc. (see sample output). At the expense of some extra text, a more advanced version could supply more detail, such as the Twitter and Facebook identities of post and comment authors. Since individual blogs are the focus of discussion in cMOOCs, it may be counterproductive to allow direct commenting on a page alongside the Scraper output, although ‘meta-comment’ on the cMOOC itself might be useful if the Scraper output were displayed as part of a ‘hub’ website for the MOOC.
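To give a flavour of what extracting ‘very brief details’ involves, here is a minimal sketch using only the Python standard library. The sample feed, the element names and the brief_details helper are my own illustrative assumptions, not the Scraper’s actual code; the structure follows the common WordPress-style RSS 2.0 comments feed, where the commenter’s name appears in a dc:creator element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a WordPress-style RSS 2.0 comments feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Comments for Example Blog</title>
    <item>
      <title>Comment on Week 1 by Alice</title>
      <link>http://example.com/week-1/#comment-12</link>
      <dc:creator>Alice</dc:creator>
      <pubDate>Mon, 03 Feb 2014 10:15:00 +0000</pubDate>
    </item>
    <item>
      <title>Comment on Week 1 by Bob</title>
      <link>http://example.com/week-1/#comment-13</link>
      <dc:creator>Bob</dc:creator>
      <pubDate>Mon, 03 Feb 2014 11:02:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""

# Dublin Core namespace, used by WordPress for comment authors.
DC = "{http://purl.org/dc/elements/1.1/}"

def brief_details(feed_xml):
    """Return a (date, author, link) tuple for each comment item."""
    root = ET.fromstring(feed_xml)
    details = []
    for item in root.iter("item"):
        details.append((
            item.findtext("pubDate", ""),
            item.findtext(DC + "creator", ""),
            item.findtext("link", ""),
        ))
    return details

for date, author, link in brief_details(SAMPLE_FEED):
    print(date, author, link)
```

A real feed would of course be fetched over HTTP rather than embedded as a string, and a fuller version could also pull each comment’s description element for an abbreviated text snippet.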
Potential uses for a Comment Scraper may differ, perhaps considerably, from my own, so I’ve briefly described my approach along with a summary of the program; this might assist a competent programmer in developing their own version for their own purposes. I’m not a particularly competent programmer myself (the Scraper was originally developed as an exercise in learning Python), but if anyone wants the Python source code for non-commercial purposes I will (shortly) make a cleaned-up version available on request.
Privacy, Legal and Other Issues – The Scraper’s output consists almost entirely of other people’s work, scraped from blogs and published without their permission. It’s not really practical to contact every blog author and commenter individually in a MOOC, but I’ve always been willing to exclude any blog or comment at its author’s request. To date I’ve never received such a request, and those who contacted me have always been positive about the use of the Scraper.
I have little understanding of the legal issues involved here and confess I’ve done little to find out. I do not know who ‘owns’ the posts or comments on a proprietary blog, nor the legal status of a ‘remix’ consisting of fragments of text from numerous sources with their authors identified. I suspect it could be a complicated matter – any advice?
Unfortunately, the current version of the Scraper is only compatible with WordPress and Blogger blogs. Between them these define ‘standard’ feed formats that account for a very large proportion of all blogs, but inevitably a small minority are excluded. Clearly, all participants in a MOOC should be represented on an equal footing, regardless of their blog type. It may be possible to make special provision for some other blog types provided RSS feeds are available; if not, comment scraping would seem considerably more difficult to implement.
I did not use the Scraper to collect data in any rigorous way, but it certainly could be used for research purposes, such as studying the rise and fall of posting and commenting in a cMOOC (e.g. the graph I plotted using Rhizo14 data). Again, this raises unexplored issues concerning the analytical use of a Scraper, as there are clear dangers in the misuse of such data, even in statistical form.
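The tally behind a rise-and-fall graph is straightforward once the dates have been scraped. This is a hypothetical sketch, not how my graph was actually produced: it buckets the RFC-822 pubDate strings found in RSS feeds into week numbers relative to a start date, ready for plotting.

```python
from collections import Counter
from datetime import datetime

def posts_per_week(pub_dates, start):
    """Count items per week, given RSS pubDate strings and a start date.

    Returns a Counter mapping week number (0, 1, 2, ...) to item count.
    """
    counts = Counter()
    for text in pub_dates:
        # RSS pubDate format, e.g. "Mon, 03 Feb 2014 10:15:00 +0000".
        when = datetime.strptime(text, "%a, %d %b %Y %H:%M:%S %z")
        week = (when.date() - start).days // 7
        counts[week] += 1
    return counts
```

Running the same tally separately over post dates and comment dates would give the two activity curves, which any plotting library could then draw.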