Connection not Content

A Blog for MOOCs and Other Animals

#Change11 :: MOOC Comment Scraper (update)

with 13 comments

To recap, I’ve been developing a MOOC Comment Scraper that brings together brief summarised versions of recent blog posts with their comments (‘A ‘Comment Scraper’ for Aggregating Blog Posts with Comments in a MOOC’). The idea was to provide a quick impression of current MOOC activity, but in principle any online event where discussion is distributed over participant blogs could be treated in a similar way.

I’ve found the Scraper helpful myself, but we all work in different ways, so I would like to know how useful such a tool might (or might not) be to other people. Some aspects of design, such as the choice of colours, are easily changed. Others that depend on the properties of the input feeds are less so. Further development could proceed in several different directions (eg as a research tool) – suggestions are welcome!

MOOC Comment Scraper

Scraping a MOOC for Comments (Based on ‘la vaca de los sinvaca’ – by José Bogado)

As an experiment I will attempt to scrape the Change11 MOOC on a daily basis until its completion. The Scraper output will be published here: MOOC Comment Scraper. An RSS feed (tested OK for Google Reader) is provided by this link: Scraper RSS Feed.

I’ve adopted two forms for publication – ‘full’ and ‘abbreviated’ (examples below with links disabled). The full form includes the first line of the latest comments (pingbacks excluded) whereas the abbreviated form omits comment text but retains dates and commenter user names. The full form could be considered to be a derivative work so I’m using this form only with the explicit permission of the blogger or if the blog bears a CC licence permitting derivative works. More full form permissions (currently Jaap, John and Brainysmurf – thanks!) would be very welcome but equally, any request by a blog author not to scrape their blog in any form will be respected.

Full Form:

20 Apr: ‘Scraping Comments off MOOCs – good or evil?’ by gbl55
I’ve decided that comment scraping in the full form could be considered a derivative work and so…

20 Apr: What nonsense! Who cares? Publish and be damned! Everyone else does and wh…(AlterEgo)
21 Apr: Thanks AlterEgo but some MOOCers could be offended by unintended juxtaposition of th..(gbl55)
22 Apr: They might be just as offended by the so-called abbreviated form! Get a lawyer or…(AlterEgo)
22 Apr: I just thought I should draw the line there AlterEgo – no I don’t have any legal training bu…(gbl55)
23 Apr: In my learned opinion and notwithstanding the above, hereupon the party of the fir…(LegalEagle)
30 Apr: Thanks LegalEagle for information on copyright violation and risk of my extradition to the…(gbl55)

Abbreviated Form:

20 Apr: ‘Scraping Comments off MOOCs – good or evil?’ by gbl55
I’ve decided that comment scraping in the full form could be considered a derivative work and so…
Comments by:
 AlterEgo(20 Apr), gbl55(21 Apr), AlterEgo(22 Apr), gbl55(22 Apr), LegalEagle(23 Apr), gbl55(30 Apr)
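
The two forms above could be produced along the following lines. This is only a minimal sketch and not the Scraper's actual code, which is not shown in this post: the `Post` and `Comment` classes, the function names and the 80-character cut-off are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date

# Fixed month names so the output does not depend on the system locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


@dataclass
class Comment:  # hypothetical structure, not taken from the Scraper
    author: str
    posted: date
    text: str


@dataclass
class Post:  # hypothetical structure, not taken from the Scraper
    title: str
    author: str
    posted: date
    text: str
    comments: list[Comment] = field(default_factory=list)


def clip(text: str, limit: int = 80) -> str:
    """Collapse whitespace, then cut long text and mark the cut with an ellipsis."""
    text = " ".join(text.split())
    return text if len(text) <= limit else text[:limit].rstrip() + "…"


def day(d: date) -> str:
    """Format a date as, for example, '20 Apr'."""
    return f"{d.day} {MONTHS[d.month - 1]}"


def header(post: Post) -> list[str]:
    """The two lines shared by both forms: date/title/author and clipped post text."""
    return [f"{day(post.posted)}: ‘{post.title}’ by {post.author}", clip(post.text)]


def render_full(post: Post) -> str:
    """Full form: the start of each comment followed by the commenter's name."""
    lines = header(post) + [""]
    lines += [f"{day(c.posted)}: {clip(c.text)}({c.author})" for c in post.comments]
    return "\n".join(lines)


def render_abbreviated(post: Post) -> str:
    """Abbreviated form: commenter names and dates only, no comment text."""
    names = ", ".join(f"{c.author}({day(c.posted)})" for c in post.comments)
    return "\n".join(header(post) + ["Comments by:", " " + names])
```

Keeping the two renderers separate means the choice between them can be made per blog, for example using the full form only where permission or a suitable CC licence exists.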

Notes on the current implementation:

I’m no expert on RSS or in coding – some of the following may be misinformed!

  • Current operation is experimental – apologies if comments are altered in peculiar ways.
  • Output is derived only from WordPress and Blogger post and comment feeds. Together these account for a large proportion of MOOC blogs. I have yet to look at other types of feed.
  • Output is limited to the contents of current post and comment feeds and placed in order of date of posting – latest posts first.
  • The Scraper does not accumulate posts and comments over time (as a feed reader does) – only the items currently provided by the feeds are available at any one time.
  • The Scraper ignores pingbacks. A flurry of very recent pingbacks can push slightly earlier comments out of a current feed.
  • The Scraper is not smart enough to deal with very complex HTML at the beginning of a post or comment and may even fall back to deleting text. Messages inserted by the Scraper itself are shown in braces, { … }, eg {image} if an image is found.
  • A time limit has been set so that only postings less than about 2 months old are included. The greater the number of active blogs the tighter the limit required to avoid generating reams of output.
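
Several of the notes above (ignoring pingbacks, the age limit, latest-first ordering and the {image} marker) can be sketched roughly as follows. Again this is an illustration under stated assumptions rather than the Scraper's own code: the `Item` class, the function names, the 60-day default and the rule that a pingback is recognised by a leading "[…]" are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from html.parser import HTMLParser


@dataclass
class Item:  # hypothetical: one post or comment taken from a feed
    author: str
    posted: datetime  # naive datetimes are assumed throughout this sketch
    html: str


class _TextExtractor(HTMLParser):
    """Collect visible text, replacing each <img> with an {image} marker."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.parts.append(" {image} ")
        elif tag in ("p", "br", "div", "li"):
            self.parts.append(" ")  # keep words in adjacent blocks apart

    def handle_data(self, data):
        self.parts.append(data)


def to_text(html: str) -> str:
    """Reduce an HTML fragment to plain text with collapsed whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def is_pingback(item: Item) -> bool:
    """Assumed heuristic: pingback excerpts begin with a bracketed ellipsis."""
    return to_text(item.html).startswith(("[…]", "[...]"))


def select(items, now, max_age=timedelta(days=60)):
    """Drop pingbacks and items older than max_age, then sort latest first."""
    cutoff = now - max_age
    kept = [i for i in items if i.posted >= cutoff and not is_pingback(i)]
    return sorted(kept, key=lambda i: i.posted, reverse=True)
```

With many active blogs, passing a smaller `max_age` would be one way to tighten the limit and keep the output manageable.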

Written by Gordon Lockhart

May 1, 2012 at 8:31 pm

Posted in Uncategorized


13 Responses


  1. […] on gbl55.wordpress.com […]

  2. Not just nifty but the apex of nifty, dedicated grinder as my hip hop slam pals would say – btw that’s an ultimate compliment. I totally missed the first mention – chalk it up to reader overflow (and just added the new feed!) and running too many blogs (mostly non mooc). Comment feed topic came up a while back, and just when I’d been thinking about longish comments I post and then lose track of. Very glad you are doing this. If I were actually posting at my more or less designated mooc blog, permission would be a given.

    VanessaVaile

    May 3, 2012 at 4:29 am

    • Compliments appreciated 🙂 I’m inclined towards throwing technology at the chaos and confusion associated with connectivist MOOCs but have difficulty in figuring out the ramifications. I never understand why supermarkets don’t have simple terminals at the entrance where shoppers can find out where things are and go directly there – probably to encourage random wanderings and impulse buys. Maybe something similar applies to MOOCs!
      Gordon

      gbl55

      May 3, 2012 at 10:52 am

      • Terminals at the entrance, like malls. You are here. Wonderful. MOOCs could use those too. Daily tries but does not quite do it. No terminal where someone could type in poetry, programming, curiosity, etc and out would pop Gordon Lockhart.

        VanessaVaile

        May 6, 2012 at 1:00 am

  3. […] gbl55.wordpress.com – Today, 12:52 […]

  4. OK looks like I’ve had a reawakening and am back mooc blogging at Computers Language Writing after a fashion…. so scrape me http://computerslanguagewriting.blogspot.com/ (been blogging elsewhere but unrelated)

    VanessaVaile

    May 7, 2012 at 7:53 pm

    • With pleasure! Comments duly scraped and now showing. Gordon

      gbl55

      May 7, 2012 at 11:33 pm

  5. […] have also been very interested in the work that Gordon Lockhart  has been doing on scraping blog […]

    • I read about your comment scraper back during Change11, Gordon, and was so impressed that I added such a tool to my Wish List for the tech guys who are helping with a course I’m designing. Vanessa Vaile in POTcert just answered my call for a ready-made comment tracker and suggested your scraper. Is it something you can share now? Could we use it in POTcert and beyond – in our own tiny MOOCs and classes? So aren’t you planning on revolutionizing the Web with this? See interesting article: http://knappster.blogspot.com/2012/02/why-hasnt-universal-commentingdiscussio.html

      Cris

      September 2, 2012 at 7:55 pm

      • Thanks Cris – the comment scraper was an interesting experiment but apart from some positive comments from Vanessa and others there was little feedback and it didn’t seem to be used much, so I put it on the back burner (see the MOOC Comment Scraper – FAQ) – but I would be delighted to try again! In its present form it can only scrape comments from WordPress or Blogger blogs (or compatible RSS formats) but I could probably get it going again sometime next week (I’m on vacation now and all the programs are on my home PC) by publishing the results here (see MOOC Comment Scraper) in the same form as I did for Change11. In principle I’m happy to release the programs but in practice I need to do some revising and documentation – if only to make them comprehensible to anyone else! If you’d like to contact me by email my address is iberry dot com at gmail dot com.
        Gordon

        gbl55

        September 3, 2012 at 10:16 am

      • Thanks so much, Gordon. I’ll definitely contact you by email. You are generous to help. Enjoy your vacation!

        Cris

        September 3, 2012 at 2:30 pm

  6. […] (See A ‘Comment Scraper’ for Aggregating Blog Posts with Comments in a MOOC and the update) and FAQ. The idea is to provide nothing more than a quick impression of current MOOC activity […]

  7. […] comments (See A ‘Comment Scraper’ for Aggregating Blog Posts with Comments in a MOOC, the update, FAQ and an output). The idea is to provide a quick up-to-date impression of posts and comments […]

