Connection not Content

A Blog for MOOCs and Other Animals

MOOC Scraper Update (3) – (and Hello #FutureEd !)

with 18 comments

MOOC Comment Scraper

Experimental Comment Scraping
(Based on ‘la vaca de los sinvaca’ – by José Bogado)


I unleashed my experimental MOOC Comment Scraper on the Rhizomatic Learning MOOC (#rhizo14) run by Dave Cormier from Jan 15th and have been updating it once or twice a day (latest output). The idea behind the Scraper is to get a quick impression of MOOC activity by creating very brief summarised versions of recent blog posts along with their comments. For some reason this type of presentation does not seem to be readily available via feed readers, but I’ve found the Scraper useful, particularly for connectivist-style MOOCs where activity is typically distributed across numerous blogs, some of which may not be active at any one time.

In contrast, my xMOOC experiences (e.g. in a Coursera Philosophy MOOC) suggest that blogging around these ‘instructivist’ MOOCs is not nearly so common. Having joined Cathy Davidson’s ‘History and Future of (Mostly) Higher Education’ (#FutureEd), I found that my introductory spiel sank without trace in the usual enormous and clunky Coursera forum. However, Cathy Davidson herself has reservations about the stereotypical xMOOC, and this particular Coursera MOOC (“…not just a MOOC, it’s a movement.”) does seem less centralised. I’ll be looking out for participant blogs.

Rhizo14 is a good guinea pig for the Scraper and I appreciate the significant number of participants who actively blog and comment on each other’s posts, generating lively discussions with long comment streams. Some posts have attracted around 30 comments of all types and lengths, and this has facilitated the squashing of several bugs in the Scraper program. (A recurring problem is dealing with ragged loose ends when HTML and other ‘hidden’ codes in comments are chopped up.) At present, about 60 WordPress and Blogger blogs are being scanned and comments extracted for all posts tagged #rhizo14 over a time ‘window’ of the last 10 days. The participants seem happy to have their comments abbreviated and published in this way, but it would be a simple matter to remove any blog if required.
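For anyone curious how the two steps above might look in code, here is a minimal Python sketch. It is not the Scraper's actual source: the function names, the dictionary keys (`title`, `tags`, `published`) and the 120-character limit are all assumptions made for illustration. The first function shows one way to avoid the ‘ragged loose ends’ problem, by stripping markup before shortening rather than after; the second shows a tag-plus-window test for a post.

```python
from datetime import datetime, timedelta
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain text
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def abbreviate(comment_html, max_chars=120):
    """Strip markup first, then shorten, so no tag or entity is cut in half.

    max_chars is an assumed limit, not a value taken from the Scraper.
    """
    parser = _TextOnly()
    parser.feed(comment_html)
    parser.close()  # flush any text still buffered by the parser
    text = " ".join("".join(parser.parts).split())  # collapse whitespace
    if len(text) <= max_chars:
        return text
    # Cut at the last space before the limit so that words stay whole.
    return text[:max_chars].rsplit(" ", 1)[0] + "..."


def in_window(post, tag="rhizo14", days=10, now=None):
    """True if a post carries the tag and falls inside the time window.

    `post` is a hypothetical dict with 'title', 'tags' and 'published' keys.
    """
    now = now or datetime.now()
    labels = [t.lower().lstrip("#") for t in post["tags"]]
    tagged = tag in labels or tag in post["title"].lower()
    return tagged and now - post["published"] <= timedelta(days=days)
```

With these definitions, `abbreviate("one two three four", max_chars=10)` gives `one two...`, and a post tagged `#Rhizo14` that was published five days ago passes `in_window`. Stripping before truncating costs a little extra work per comment but means the cut can never land inside a tag.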

The graph below gives some indication of how commenting in rhizo14 is developing with time. This is no scientific study, particularly for the first few days, when blogs were still being added and no posts were yet old enough to drop out of a time window that was itself being adjusted. However, the period from Jan 23 was more stable, with a constant 10 day window. Both comments and posts seem to have peaked around Jan 30 but, interestingly, even though comment and post numbers have now dropped a little, the average number of comments per post is being maintained at over 5.


KEY:   BLUE = No. of posts. RED = No. of comments
YELLOW = Average Comments per post x 100
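
The three plotted values can be worked out from one day's window as in the short Python sketch below. Again this is an illustration, not the Scraper's own code: the function name and the `comments` key are assumptions, and the data in the usage note is made up.

```python
def window_counts(posts):
    """Return (posts, comments, average comments per post x 100) for one window.

    `posts` is a hypothetical list of dicts, each with a 'comments' list,
    already restricted to the current 10 day window.
    """
    n_posts = len(posts)
    n_comments = sum(len(p["comments"]) for p in posts)
    average = n_comments / n_posts if n_posts else 0.0
    # The average is scaled by 100 only so that it shows up on the same
    # vertical axis as the raw post and comment counts.
    return n_posts, n_comments, average * 100
```

For instance, a made-up window holding two posts with 3 and 8 comments gives 2 posts, 11 comments and a plotted yellow value of 550, i.e. an average of 5.5 comments per post.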

I’d be very grateful for any constructive comments or criticisms of the Comment Scraper, particularly if you’ve been viewing the output over a period of time. There are several directions in which the Scraper could be developed. More or less output text could be provided, or posts without comments could be identified, but there may be rather more fundamental changes worth making.

How do you rate the Comment Scraper? – please mark out of 10 where:

0 = Useless
5 = Sometimes useful but I rely mainly on other tools
10 = I couldn’t live without it!

However busy you are please try at the very least to leave your mark out of 10 below so I get some sense of the Scraper’s perceived utility! Thank you!

Written by Gordon Lockhart

February 4, 2014 at 9:19 pm

Posted in Mooc, rhizo14

18 Responses

  1. Gordon, I like the scraper. Use it to have a quick look and find interesting discussions.
    Watch for not answered or commented posts.
    People do post on Twitter when they post a blogpost. But these Twitter messages are reposted, which is confusing. It makes this question difficult: what is new and what is older?
    I vote for 10.


    February 4, 2014 at 9:53 pm

    • Thanks Jaap ! – yes that’s how I see the Scraper’s usefulness. Also, comments in the feed come latest first so the Scraper has to reverse the time order so they read from old to new for all posts.

      Gordon Lockhart

      February 4, 2014 at 11:46 pm

  2. We have something in common in that we both enrolled in IntroPhil and FutureEd! I am not sure what the comment scraper is for? So I am therefore not sure how it would be useful. In the graph is the date of the comments the day they were made? Why is the yellow multiplied by 100? Best, Susan

    Susan Elgie

    February 4, 2014 at 10:44 pm

    • Thanks for your comments Susan. The Scraper brings together on a single page abbreviated versions of posts and comments associated with a MOOC but I’m still finding out how it’s used – everyone filters information in different ways. A date on the graph refers to when posts and comments were counted over the previous 10 days; ie the graph provides an indication of activity averaged over a 10 day ‘moving window’. The yellow is multiplied by 100 only to make it visible on the graph so in this case the vertical scale should be divided by 100. Hope this helps!

      Gordon Lockhart

      February 5, 2014 at 8:19 am

  3. Right now I have to say 5 out of 10. It took me several visits to figure out that fragments of posts were included. That makes it much more useful than when I thought it was exclusively comments. I will likely check more often now to find gems of blogs not linked in the Facebook group. Will the scraper harvest WordPress tags, or does “#rhizo14” have to be in the actual title?

    Jim Stauffer

    February 5, 2014 at 1:18 am

    • Hi Jim and thanks for your helpful comment. The Scraper is trying to generate a sort of FB type of thread in one place out of blog posts with comments distributed over many places. Yes, the 2nd line in ‘bold’ of the Scraper’s entry for every post is the 1st line of the post itself and the following lines are its comments in date order. It’s difficult to know just where to set the balance though. Maybe the post should get 2 lines and/or maybe the blog title should appear. The Scraper easily recognises rhizo14 as a WP ‘tag’ or ‘category’ or if inserted in the post title – and similar for Blogger (not with Google+ though). However, the appearance of rhizo14 only in the body of a post does not work – I’d wear out my computer searching for it through so many posts 🙂

      Gordon Lockhart

      February 5, 2014 at 4:01 pm

  4. Vote for 10: following #rhizo14 wouldn’t be the same thing without it, and would be a less rich experience. Very useful tool. Thanks Gordon!


    February 7, 2014 at 10:20 pm

  5. My vote is a 10. I use it to keep track of blogs and comments. It helps me to find what I am looking for but also to find what I don’t expect eg a blog I may never have visited. In that way it’s not only a tool for management but also for helping me to make my experience more diverse.
    It would be useful on a MOOC but may not scale well (I don’t know what % of MOOCers blog on huge MOOCs).

    Frances Bell

    February 9, 2014 at 2:52 pm

    • Thanks for your comments Frances – it was something I would have liked (the very much more powerful and sophisticated) gRSShopper to provide during CCK11. Re discovering blogs, the current Scraper ignores all posts without comments but sometimes perfectly relevant posts get no comments at all – maybe it should flag ‘new’ posts.

      In regard to the mammoth xMOOCs, I tried the Scraper on the Edinburgh philosophy MOOC (Coursera) last year but only a handful of participants out of several thousand were blogging (and not a few were the usual suspects! – see ) so no scaling problems. FutureEd may be different as Cathy Davidson is encouraging a more distributed approach.

      Gordon Lockhart

      February 9, 2014 at 4:42 pm

  6. Hi Gordon
    Thank you for your brilliant work. Sure there will be improvements, but for me 100/10 for being v.useful.
    Can it be envisaged for other “courses” ? Would be interested for September 2014.


    February 18, 2014 at 7:43 pm

    • Thanks – the Scraper seems a good match for cMOOCs such as rhizo14 where significant numbers of participant blogs attract lengthy comment threads. New posts are sometimes overlooked by participants so posts without comments are now also included for a couple of days after they first appear.

      The Scraper could probably be applied to any online course where there is a need to aggregate posts and comments from a miscellany of known blogs. When I’ve time I think I’ll write an account for any competent coder who might wish to create their own version. In any case the Python source code for my version will be available when I’ve cleaned it up.

      Gordon Lockhart

      February 18, 2014 at 9:16 pm

  7. Reblogged this on MOOC Madness and commented:
    Keep up with comments on #rhizo14 and #FutureEd blogs (mostly the former)


    February 27, 2014 at 5:07 pm

  8. If I’d actually remembered to use the scraper, I’d go for a 10. I’ll probably use it for post-course gap-fill/catch-up. This round, I spent most of my comment dime on FB, which has drawn out more discussion than G+ and even Twitter. Announcing blog posts in more than one place probably does help participants catch posts they might have missed but it sure clogs up the mail box too, especially when there are more than three. A fair amount of #FutureEd blogging happens on the HASTAC site. Could those be scraped?

    The Stanford MOOCs on the Venture (or something like that) platform had externalizable blogs with RSS feeds built into the platform and an optional randomizer for forum posts to up the odds of a post not sinking unseen. They are now (I think) EdNovo and may have dropped the randomizer. The design kinship to the Coursera LMS is clear but this version seems less clunky.

    I’ve taken to posting links and sharing across and among social media platforms.

    Anyway, the main reason I remembered to check in was to leave you the link for the art history course I just started that has the feel of being different, a touch more connectivist (minus full chaos effect) and possibly more blog friendly. Another point of difference is an emphasis on illustration and design.

    Live!: A History of Art for Artists, Animators and Gamers,


    February 27, 2014 at 6:10 pm

  9. Thanks Vanessa – I originally meant to try the Scraper on FutureEd blogs but Rhizo14 turned out to be such a good guinea pig. That and a few days’ visit by my grandson kept me very busy! I’m now working on a ‘final’ version of the Scraper combining both Blogger and WordPress versions and when that’s done I’m hoping to document the concept in enough detail for a competent programmer to develop their own version. Particular uses that the Scraper could be put to are likely to require modification in different directions but I’ll make my version available when I’ve properly tested it. My version only works on ‘standard’ WP or Blogger posts as together they account for a large proportion of posting. Anything else that’s not used so much could require a very different approach leading to diminishing returns.

    Thanks for the link – I’ll take a look!

    Gordon Lockhart

    March 3, 2014 at 5:51 pm

  10. Scraper was awesome. Including it in presentations today and tomorrow. Hope you don’t mind me STEALING YOUR DATAS

    • Thanks Dave – no, not at all. I have fairly complete daily records if you need any more stuff.

      Gordon Lockhart

      March 13, 2014 at 12:08 pm

  11. […] ran a previous version of the program during the rhizo14 MOOC producing a graph showing (roughly) how commenting developed with time. The first graph illustrated below is similar […]
