MOOC Comment Scraper – Update (4)


No MOOCs were harmed by the Scraper. (Image via José Bogado)

My MOOC Comment Scraper had a great run during the Rhizo14 MOOC and was even mentioned by Dave Cormier in his recent presentation (‘Why teach MOOCs – MOOCs as a selfish enterprise (talk at MIT)’)! Judging from the comments I received during Rhizo14, the Scraper could be employed in a variety of situations supporting MOOCs or other online events where it’s useful to aggregate blog posts and comments in an abbreviated form. There seems to be an unexplored niche for open aggregation tools that simply present abbreviated text, one click away from its distributed sources – and don’t attempt to entrap users for commercial purposes!

Use of the Comment Scraper – My own conception of the Scraper seems best suited to cMOOCs. Here much, or even most, of the discussion is distributed among numerous participant blogs, some of which may be inactive at any particular time. A quick impression of where the latest posts are, how various discussions are developing and who is involved can be more useful than aggregators that provide considerably more text and require lengthy scrolling.

The current version of the Scraper merely links to each post and its comments, giving very brief details: date, authors etc. (see sample output). At the expense of some extra text, a more advanced version could supply more detail, such as the Twitter and Facebook identities of post and comment authors. Since individual blogs are the focus of discussion in cMOOCs, it may be counterproductive to allow direct commenting on a page along with the Scraper output, although ‘meta-comment’ on the cMOOC itself might be useful if the Scraper output were displayed as part of a ‘hub’ website for the MOOC.

Potential uses for a Comment Scraper may differ, perhaps considerably, from my own, so I’ve briefly described my approach along with a summary of the program; this might help a competent programmer develop their own version for their own purposes. I’m not a particularly competent programmer myself (the Scraper was originally developed as an exercise in learning Python) but if anyone wants the Python source code for non-commercial purposes I will (shortly) make a cleaned-up version available on request.

Privacy, Legal and Other Issues – The Scraper’s output consists almost entirely of other people’s work, scraped from blogs and published without their permission. It’s not really practical to contact every blog author and commenter in a MOOC individually, but I’ve always been willing to exclude any blogs or comments by any author at their request. To date I’ve never received any such request, and those who contacted me have always been positive about the use of the Scraper.

I have little understanding of the legal issues involved here and confess I’ve done little to find out. I do not know who ‘owns’ the posts or comments in a proprietary blog nor the legal status of a ‘remix’ consisting of fragments of text from numerous sources with authors identified. I suspect it could be a complicated matter – any advice?

Unfortunately, the current version of the Scraper is only compatible with WordPress and Blogger blogs. Together these define ‘standard’ RSS formats that account for a very large proportion of all blogs, but inevitably a small minority are excluded. Clearly, all participants in a MOOC should be represented on an equal footing regardless of their blog type. It may be possible to make special provision for some other blog types provided RSS feeds are available; where they are not, comment scraping would seem to be considerably more difficult to implement.
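As a rough illustration of the feed-based approach (not the Scraper’s actual source), here is a minimal Python sketch that reads an RSS 2.0 comments feed and reduces each comment to author, date, link and a short excerpt. The sample feed, the names `abbreviate` and `summarise_feed`, and the 40-character limit are all assumptions made up for this example. It also assumes every item carries a `pubDate` and names its author in a `dc:creator` element; a real feed may differ, and an Atom feed would need different element names.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Dublin Core namespace, used for <dc:creator> in the sample below.
DC = "{http://purl.org/dc/elements/1.1/}"

# Made-up sample feed for illustration only (not real Rhizo14 data).
SAMPLE_FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Comments for Example Blog</title>
    <item>
      <title>Comment on Week 1 by Alice</title>
      <link>http://example.com/week-1/#comment-1</link>
      <dc:creator>Alice</dc:creator>
      <pubDate>Mon, 03 Feb 2014 10:15:00 +0000</pubDate>
      <description>I really enjoyed this post about learning in a community.</description>
    </item>
  </channel>
</rss>"""


def abbreviate(text, limit=40):
    """Collapse whitespace; cut long text to `limit` characters plus '...'."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + "..."


def summarise_feed(xml_text, limit=40):
    """Return one small dict per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    summaries = []
    for item in root.iter("item"):
        # Assumes pubDate is present and in the usual RFC 822 style.
        published = parsedate_to_datetime(item.findtext("pubDate"))
        summaries.append({
            "author": item.findtext(DC + "creator", default="unknown"),
            "date": published.date().isoformat(),
            "link": item.findtext("link", default=""),
            "excerpt": abbreviate(item.findtext("description", default=""), limit),
        })
    return summaries


if __name__ == "__main__":
    for entry in summarise_feed(SAMPLE_FEED):
        print(entry["date"], entry["author"], entry["link"], entry["excerpt"])
```

Keeping only a link and a one-line excerpt is what keeps the full discussion one click away on the author’s own blog, rather than reproducing it in the aggregated page.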

I did not use the Scraper to collect data in any rigorous way, but it certainly could be used for research purposes such as studying the rise and fall of posting and commenting in a cMOOC (e.g. the graph I plotted using Rhizo14 data). Again, this raises unexplored issues concerning the analytical use of a Scraper, as there are clearly dangers in the misuse of such data even in a statistical form.
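To make the ‘rise and fall’ idea concrete, here is a small hedged sketch of how scraped post or comment dates might be tallied per week before plotting. The function name `weekly_counts` and the dates are invented for illustration; this is not how the Rhizo14 graph was produced.

```python
from collections import Counter
from datetime import date


def weekly_counts(dates):
    """Count items per ISO (year, week), returned in chronological order."""
    counts = Counter()
    for d in dates:
        iso = d.isocalendar()          # (ISO year, ISO week, weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Made-up example dates, not real course data.
    sample = [date(2014, 1, 14), date(2014, 1, 15), date(2014, 1, 21)]
    for (year, week), n in weekly_counts(sample).items():
        print(f"{year}-W{week:02d}: {n}")
```

Only aggregate counts come out of this step, though as noted above even statistical summaries deserve some care.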



Deep Learning in MOOCs

I’ve been following several MOOCs simultaneously, often just lurking, as I’m usually more interested in how MOOCs are developing than in their content. The smallish cMOOC on ‘Rhizomatic Learning – The community is the curriculum’ (Rhizo14) led by Dave Cormier held my attention, partly because I was using it as a test bed for my MOOC Scraper but also because its ‘content’ was largely created by the participants themselves. Cathy Davidson’s very much larger xMOOC ‘History and Future of (Mostly) Higher Education’ (FutureEd) was also fascinating, but in a different way, as she positively encouraged independent activity outside the MOOC – think Incredible Hulk trying to break out of its xMOOC clothes!

On the whole, I’m positive about MOOCs and there are several areas where I think MOOCs can be very effective. Connecting and updating professionals, stimulating the interests of well-motivated lifelong learners and providing educational opportunities where none existed before are a few. I welcome the different MOOC formats that are emerging and I don’t share the usual concerns about dropout rates. Someone close to me with lifelong interests in languages and literature joined an xMOOC on Climate Change and, for the first time in her life, bought a popular science magazine and found it interesting. MOOCs have the power to transform learners, sometimes unexpectedly but usually for the good. Even the removal of pig ignorance can count as education but … everyone needs to be a deep learner at times.


Deep Learning MOOC comic (Kevin Hodgson on Flickr)

During Rhizo14 there was some controversy about the relevance or otherwise of certain French philosophers. ‘Skimmers’ and others may have perfectly good reasons for neglecting them but in deep learning mode you take the time and trouble to read them in whatever detail is necessary to make an informed decision – even if you find French philosophers excruciatingly dull and boring!

Having taught engineering courses at a university for more years than I care to remember, I wonder how MOOCs can deal with deep learning in circumstances where it’s vitally important to demonstrate competence, that is, to understand something all the way through as opposed to having a superficial or ‘working’ knowledge. This is no elitist concern of interest only to PhD students or just Higher Education. A huge number of vocational courses are wholly or partly of this type – an electrician’s understanding of your wiring is just as vital as a brain surgeon’s! Teaching something to someone else is not a bad test of understanding (as many parents find out trying to help their kids with homework!) but what proportion of a MOOC’s participants could begin to teach or demonstrate real competence in the topics they study? For the typical mammoth xMOOC I would guess very few, particularly if they had little prior knowledge of the subject matter. I would also be surprised if many of those gaining current Statements of Accomplishment could demonstrate real understanding. (Anyone want me for a Philosophy 101 tutor on the basis of my Coursera Certificate?)

Deep learning can be very rewarding but it can also be time-consuming, not particularly interesting and hard work – as many budding PhD students find out all too quickly. Encouraging deep learning in MOOCs may not be so problematic given well-educated and motivated participants, as in Rhizo14 and FutureEd, but in the wider world, where education may be prized more as a meal ticket than for its own sake, the traditional training course, ‘taught to the test’, is often viewed by students as little more than an irksome chore unrelated to real life. I’m unsure how MOOCs might be used to improve things, but maybe a crucial first step would be to encourage interaction, almost any type of interaction, between connected participants before expecting anything like deep learning to happen. Rhizo14 certainly encouraged interaction and passionate learning. Interestingly, I now see that several enthusiastic Rhizo14 learners may be passing the ‘teacher test’ by taking over and extending the course themselves – way beyond its nominal six-week period!

