I’ve been running my new Comment Collector program during the Connected Courses (ccourses) MOOC and updating the output on a daily basis. The idea is to get a quick impression of current MOOC activity by bringing together in one place brief summarised versions of blog posts and their comments. Posts with comments are displayed for 15 days in order of their latest comments while posts without comments are flagged ‘New Post’ and displayed for 3 days. These parameters reflect my own ideas of what might be useful and can easily be changed.
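The display rules above can be sketched in Python, the language the Scraper was originally written in. This is a minimal illustration, not the Collector's actual code. The `Post` record and the function and constant names are hypothetical. Two details are also assumptions: that both windows are measured from the publication date, and that new posts are interleaved with commented posts by "latest activity" (the last comment, or the publication date if there are no comments).

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Values from the post; the constant names are hypothetical.
COMMENTED_WINDOW = timedelta(days=15)  # posts with comments
NEW_POST_WINDOW = timedelta(days=3)    # posts without comments


@dataclass
class Post:
    """Hypothetical record for one aggregated blog post."""
    title: str
    published: datetime
    comment_dates: list = field(default_factory=list)

    def latest_activity(self):
        # Latest comment if there is one, otherwise the publication date.
        return max(self.comment_dates, default=self.published)


def select_for_display(posts, now):
    """Return (label, post) pairs, most recently active first.

    Assumption: both windows are measured from the publication date."""
    shown = []
    for post in posts:
        age = now - post.published
        if post.comment_dates and age <= COMMENTED_WINDOW:
            shown.append(("", post))
        elif not post.comment_dates and age <= NEW_POST_WINDOW:
            shown.append(("New Post", post))
    shown.sort(key=lambda pair: pair[1].latest_activity(), reverse=True)
    return shown
```

In this sketch, changing the display periods only means editing the two constants.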
The Collector currently scans the RSS post and comment feeds of a subset of blogs taken from the list of syndicated blogs. RSS feeds typically carry only the most recent items, so older data can drop out; for this reason the Collector aggregates posts and comments over each 15-day period. Posts intended as ccourses contributions are recognised by a tag placed in the post. There are currently over 230 syndicated blogs listed, but some are inactive or have posts without a recognisable tag in a label, a category or the title. Originally, the Collector recognised only ‘ccourses’ as a tag. This was later altered so that variants such as ‘connectedcourses’ or ‘Connected Course’ were also recognised (not ‘cc’ – ‘cute cats’?), resulting in a significant increase in the number of accepted posts. The Collector works with most WordPress or Blogger blogs. It does not work with some other commenting methods (eg tumblr, G+, FeedBurner etc) or with blogs that lack comment feeds. At present, the Collector scans about 80 blogs with suitable feeds and probably covers the majority of active ccourses bloggers.
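A feed only shows its most recent items, so aggregation means remembering what earlier scans returned. Here is a minimal sketch. It assumes each item is a dict with a unique `id` (for example the feed's guid) and a `published` datetime. `merge_scan` is a hypothetical name, not the Collector's own function.

```python
from datetime import datetime, timedelta


def merge_scan(store, items, now, window=timedelta(days=15)):
    """Merge the items from one feed scan into `store` (a dict keyed by id).

    Items seen again overwrite the stored copy (picking up edits); items that
    have dropped off the feed are kept until they are older than `window`."""
    for item in items:
        store[item["id"]] = item
    cutoff = now - window
    for key in [k for k, v in store.items() if v["published"] < cutoff]:
        del store[key]
    return store
```

In practice the store would be saved to disk between daily runs. This sketch leaves that step out.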
I ran a previous version of the program during the rhizo14 MOOC, producing a graph showing (roughly) how commenting developed over time. The first graph below is similar. It shows the total number of posts (blue) and comments (red) displayed each day (normally in the evening, BST) that were published with recognised ccourses tags over the preceding 15-day period. Again, this is no scientific study. The Collector is experimental and adjustments were made during the 31-day period covered by the graph. This applies particularly to the first few days, when blogs were being added and removed and the aggregation period was shorter than the nominal 15 days. A few blogs were removed because the Collector could not access their apparently valid RSS feeds (for reasons beyond me!). The sudden increase in posts and comments on 25th Sept was caused by increasing the number of recognised tags. From then on, the graph is at least indicative of post and comment activity across the 80 or so blogs being scanned. There is little variation around an average of about 60 posts and 225 comments per 15-day period. For clarity, the average number of comments per post for each period (yellow) is scaled up by a factor of 100.
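Each daily point on the first graph combines a 15-day count with a scaled ratio. Here is a sketch of that calculation. It assumes each aggregated post is a dict with a `published` datetime and a `comments` list of comment datetimes. Both the layout and the name `window_counts` are hypothetical.

```python
from datetime import datetime, timedelta


def window_counts(posts, day, window=timedelta(days=15), scale=100):
    """Return (posts, comments, scaled comments-per-post) for one day's point.

    Only posts published within `window` before `day` are counted, together
    with any of their comments dated on or before `day`."""
    start = day - window
    recent = [p for p in posts if start < p["published"] <= day]
    n_comments = sum(1 for p in recent for c in p["comments"] if c <= day)
    ratio = n_comments / len(recent) if recent else 0.0
    return len(recent), n_comments, ratio * scale
```

Take the rough averages quoted above: 225 comments over 60 posts gives 3.75 comments per post, which plots as 375 on the scaled yellow line.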
The second graph below is an attempt to estimate the distribution of comment counts among all recognised posts (495 in total) over the entire period from Sep 24 to Oct 24. For example, the first point indicates that 19 posts received 1 comment. The zeroth point, for posts with no comments, is omitted because displaying it would have compressed the vertical scale. It would have shown that 78 posts received no comments at all. This seems high, but it includes blogs with at least one recognisable post followed by other posts that may or may not have been intended for ccourses but carried no recognisable tags. The sample is not statistically significant. Even so, a cluster of posts with around 2 comments, and perhaps other clusters, can be discerned, followed by a long tail reaching up to 18 comments on individual posts.
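The distribution in the second graph is essentially a tally of comment counts. Here is a minimal sketch using Python's `collections.Counter`. It assumes a plain list with one comment count per post; the function name and the example counts are hypothetical.

```python
from collections import Counter


def comment_distribution(comment_counts, drop_zero=False):
    """Map each number of comments to how many posts received that many.

    `drop_zero` leaves out the zeroth point, as was done for the graph."""
    dist = Counter(comment_counts)
    if drop_zero:
        dist.pop(0, None)
    return dict(sorted(dist.items()))


# Hypothetical per-post comment counts
counts = [0, 0, 1, 2, 2, 2, 5]
print(comment_distribution(counts))                  # {0: 2, 1: 1, 2: 3, 5: 1}
print(comment_distribution(counts, drop_zero=True))  # {1: 1, 2: 3, 5: 1}
```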
Other quantitative types of analysis are possible and may be useful for research or other purposes. For example, representations of the network of connections created by participants in a MOOC as they comment on each other’s posts could be of interest, maybe along the lines of what Martin Hawksey has done for Twitter. There are other possibilities – ranking people by name in order of number of posts or comments? This seems more questionable than ranking tweets in the same way but where should the line be drawn and why? Advice and suggestions welcome!
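A commenting network of the kind mentioned above could start from simple weighted edges running from each commenter to the author of the post they commented on. This sketch assumes each post has already been reduced to its author plus a list of comment authors. The function name and the example names are invented for illustration.

```python
from collections import Counter


def comment_network(posts):
    """Build weighted, directed edges commenter -> post author.

    `posts` is an iterable of (post_author, [comment_author, ...]) pairs.
    Authors replying on their own posts are ignored."""
    edges = Counter()
    for author, commenters in posts:
        for commenter in commenters:
            if commenter != author:
                edges[(commenter, author)] += 1
    return edges


# Invented example data
posts = [("ann", ["bob", "bob", "ann", "cat"]), ("bob", ["ann"])]
for (src, dst), weight in comment_network(posts).most_common():
    print(f"{src} -> {dst}: {weight}")
```

The weighted edges could then be loaded into a graph-drawing tool.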
Thanks to all ccourses folks who have retweeted and favorited the Collector updates. The rapid turnover of ccourses posts and comments has field-tested the Comment Collector well – sometimes to breaking point! I will keep it running until the formal end of Connected Courses and now that the program is reasonably stable it’s little trouble to continue publishing the output. However, there are several other methods available to ccourses participants for monitoring activity such as the blog aggregator, the forum, the Facebook page etc and I’m unsure to what extent the Comment Collector has a useful or distinct role to play.
As always, comments and suggestions are very welcome but at the very least if you find the Collector useful, please ‘like’ this post so I have some measure of the Collector’s value in the context of ccourses – thanks!
I’m all set to field test my new Comment Collector. I’ve been tracking the acclaimed Connected Courses MOOC using blogs from the syndicated blog list. There are almost 200 blogs listed now and there’s considerable activity and commenting – I can’t keep up! The Collector is set up to process only posts with ccourses as a tag, as a category or in the title [now extended to accept connectedcourses and several variants], indicating that the post is intended as a ccourses contribution. Only a fraction of listed bloggers seem to be using this tag at present and I have not yet included some who are.
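Recognising the tag and its variants can be sketched as follows. Labels and categories are compared after normalising case and separators. The title is searched for the same variants as whole words, and bare ‘cc’ is never accepted. The exact list of variants, the regular expression and the function names are all assumptions for illustration, not the Collector's real rules.

```python
import re

# Hypothetical list of accepted variants, compared after normalisation.
ACCEPTED_TAGS = {"ccourses", "ccourse", "connectedcourses", "connectedcourse"}

# In titles, look for the same variants as whole words; bare 'cc' never matches
# because 'course' must follow it.
TITLE_PATTERN = re.compile(
    r"(?<!\w)#?(?:cc|connected[\s_-]?)courses?(?!\w)", re.IGNORECASE
)


def normalise(tag):
    """Lower-case a tag and drop '#', whitespace, hyphens and underscores."""
    return re.sub(r"[#\s_-]+", "", tag.lower())


def is_ccourses_post(title, tags):
    """True if any label/category, or the title, carries a recognised tag."""
    if any(normalise(t) in ACCEPTED_TAGS for t in tags):
        return True
    return TITLE_PATTERN.search(title) is not None
```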
As a sampler of MOOCs (I’ve stopped using the ‘lurker’ word!), I’ve found the Collector very useful for cMOOCs where activity is distributed among many blogs. Everyone has different aims and objectives however and comments on the usefulness of the Collector or otherwise are very welcome. I will try to keep it up to date as ccourses proceeds.
- The Comment Collector generates brief summaries of many WordPress and Blogger posts and comments by scanning and aggregating RSS feeds. The idea is to highlight centres of activity and discussion rather than aggregate whole blogs. Original posts with full comments are accessible (in new browser tabs) by clicking on the post titles.
- The Collector is experimental and I may change some of the settings as ccourses proceeds. A recent change ordered posts according to the date of their latest comment (rather than date of post) to achieve a ‘Facebook style’ presentation. Also, post summaries now expand as the number of comments grows.
- Currently, the Collector displays new posts with or without comments for 3 days and posts with comments up to 15 days.
- Previously I used a page on this WP blog (eg for rhizo14) to publish output but found that placing output here is more straightforward in view of the HTML coding required by recent changes.
- It’s not practical to ask permission of all MOOC participants who may have fragments of their posts and comments published (and maybe munged!) but I will exclude any blog if requested by the author.
- The Collector amasses a considerable amount of data on the progress of a MOOC. This may be used for statistical purposes (eg the graph produced during the rhizo14 MOOC).
- I have no intention of developing the Collector for any commercial purpose, and programming details are openly available.
I read with interest George Siemens’ recent article on ‘Activating Latent Knowledge Capacity’ and in particular:
The one draw back to networked learning is that while we have managed to advance conversation on the fragmentation of learning so that it is not a cohesive whole created solely by the instructor, we have not yet advanced the process of centring or stitching together fragmented parts into cohesive wholes for individuals.
This has certainly been my experience of networked learning and is particularly true of the mammoth xMOOCs where huge clunky forums can be so overwhelming that a majority of participants just keep away. In connectivist MOOCs, participants are encouraged to use their own blogs and social media for interaction but there’s still a need for ‘defragmentation’ – a means of signposting MOOC activity in one place in ways that are meaningful to the individual. There are of course RSS readers but these focus on the aggregation of blog posts rather than active discussion and interaction between participants. MOOCs sometimes pull together participant blog postings into a single ‘blog hub’ but the resulting presence of duplicate posts can be confusing, particularly if discussion about the same post is fragmented with comments appearing on the hub independently of other comments on the original post.
Interaction between the participants of a MOOC can centre around social media such as Google Plus, Facebook or even Twitter and a dedicated Facebook Group page can be very effective in tracking current activity. In this case, the most recently active threads appear first, often with relevant images and the non-active ones gradually fall into obscurity. This can result in timely and fast-moving forum discussions although the various threads are unlikely to carry the more substantive contributions typical of blog posts. Over-dependence on social media is not without a price. Participants who are not registered for some services will be excluded and there is the inevitable manipulation of users and their data for commercial purposes.
Setting aside the practical problems of implementation, what considerations should apply to the design of a ‘MOOC defragmenter’?
- Give primacy to personal web space – First and foremost, participant web spaces should be recognised and scanned as the major source of data. Currently, most participants are unlikely to have their own web spaces, but setting up a personal blog has never been more straightforward. The trend towards establishing a digital presence on the open web is set to continue as the benefits of controlling one’s own data and digital identity become more widely recognised. Innovations such as ‘Domain of my Own’ demonstrate that owning and managing one’s own slice of the web is not just for the geeks.
- Signpost current activity – Create a concise overall view of current MOOC activity on a single page. Focus on the wood rather than the trees. Activity could be signposted by direct links to participant spaces with older material dropping down in classic Facebook style. Little manipulation of content or additional material is envisaged so that participants are encouraged to move on to whatever fragment of the MOOC they find of particular interest.
- Openness Rules! – Clear information about rules, settings and any other uses made of participant data should be freely available. Selecting output from a miscellany of inputs necessarily involves a set of rules designed to bring MOOC fragments together and output them as a cohesive whole. Rules, however, can be manipulated, commercial advertising being the obvious example. Data could also be collected for academic research or other purposes. MOOCs are complex systems and the rules governing effective defragmentation are also likely to be complex. Some rules may be misunderstood, unacceptable or even detrimental to the interests of some participants.
Returning to the practical, I have experimented with a ‘MOOC Comment Scraper’ that generates brief summaries of WordPress and Blogger comments and posts by scanning the RSS feeds of participant blogs. The latest version was well tested during the excellent Rhizo14 MOOC, and many participants considered it a useful facility (see MOOC Comment Scraper Output – #rhizo14). Further development has now resulted in a ‘Comment Collector’ where output items are ordered by the date of a post’s latest comment rather than the date of the post itself. An example output was derived from a real MOOC, with nonsense text replacing the real content. The presentation could be enhanced in a number of ways, but as an amateur programmer I’m unlikely to produce a really comprehensive MOOC defragmenter! All the same, I’d be pleased to find another MOOC for a field test.
My MOOC Comment Scraper had a great run during the Rhizo14 MOOC and was even mentioned by Dave Cormier in his recent presentation (‘Why teach MOOCs – MOOCs as a selfish enterprise (talk at MIT)’)! Judging from the comments I received during Rhizo14, the Scraper could be employed in a variety of situations supporting MOOCs or other online events where it’s useful to aggregate blog posts and comments in an abbreviated form. There seems to be an unexplored niche for open aggregation tools that simply abbreviate text one click away from distributed sources – and don’t attempt to entrap users for commercial purposes!
Use of the Comment Scraper – My own conception of the Scraper seems best suited to cMOOCs. Here, much or even most of the discussion is distributed among numerous participant blogs, some of which may be inactive at any particular time. A quick impression of where the latest posts are, how various discussions are developing and who is involved can be more useful than aggregators that provide considerably more text and require lengthy scrolling.
The current version of the Scraper merely links to each post and its comments, giving very brief details: date, authors etc. (see sample output). At the expense of some extra text, a more advanced version could supply more detail, such as the Twitter and Facebook identities of post and comment authors. Individual blogs are the focus of discussion in cMOOCs, so it may be counterproductive to allow direct commenting on a page alongside the Scraper output. However, ‘meta-comment’ on the cMOOC itself might be useful if the Scraper output were displayed as part of a ‘hub’ website for the MOOC.
Potential uses for a Comment Scraper may differ, perhaps considerably from my own use, so I’ve briefly described my approach along with a summary of the program and this might assist a competent programmer to develop their own version for their own purposes. I’m not a particularly competent programmer myself (the Scraper was originally developed as an exercise in learning Python) but if anyone wants the Python source code for non-commercial purposes I will (shortly) make a cleaned-up version available on request.
Privacy, Legal and Other Issues – The Scraper’s output consists almost entirely of other people’s work, scraped from blogs and published without their permission. It’s not really practical to contact the authors of all blogs and commenters individually in a MOOC but I’ve always been willing to exclude any blogs or comments by any author on their request. To date I’ve never received any such request and those who contacted me have always been positive about the use of the Scraper.
I have little understanding of the legal issues involved here and confess I’ve done little to find out. I do not know who ‘owns’ the posts or comments in a proprietary blog nor the legal status of a ‘remix’ consisting of fragments of text from numerous sources with authors identified. I suspect it could be a complicated matter – any advice?
Unfortunately, the current version of the Scraper is only compatible with WordPress and Blogger blogs. Together these account for a very large proportion of all blogs and provide RSS feeds in predictable, ‘standard’ formats, but inevitably a small minority of blogs are excluded. Clearly, all participants in a MOOC should be represented on an equal footing regardless of their blog type. It may be possible to make special provision for some other blog types provided RSS feeds are available. If not, comment scraping would seem to be considerably more difficult to implement.
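Part of the WordPress/Blogger dependence comes from knowing where each platform publishes its feeds. The sketch below guesses the conventional default feed locations: `/feed/` and `/comments/feed/` on WordPress, and `/feeds/posts/default` and `/feeds/comments/default` on Blogger. `guess_feeds` is a hypothetical helper. Blogs on custom domains, blogs using FeedBurner redirects and blogs with feeds disabled would still need their feeds entered by hand.

```python
from urllib.parse import urlsplit


def guess_feeds(blog_url):
    """Return (platform, posts_feed, comments_feed) for a blog's home URL.

    Assumption: *.blogspot.com means Blogger; anything else is treated as
    WordPress. Expects a full URL including the scheme."""
    parts = urlsplit(blog_url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path.rstrip('/')}"
    if parts.netloc.endswith(".blogspot.com"):
        return ("blogger",
                base + "/feeds/posts/default",
                base + "/feeds/comments/default")
    return ("wordpress", base + "/feed/", base + "/comments/feed/")
```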
I did not use the Scraper to collect data in any rigorous way, but it certainly could be used for research purposes such as studying the rise and fall of posting and commenting in a cMOOC (eg the graph I plotted using rhizo14 data). Again, this raises unexplored issues concerning the analytical use of a Scraper, as there are clearly dangers in the misuse of such data even in statistical form.
I’ve been following several MOOCs simultaneously, often just lurking, as I’m usually more interested in how MOOCs are developing than in their content. The smallish cMOOC on ‘Rhizomatic Learning – The community is the curriculum’ (Rhizo14) led by Dave Cormier held my attention. This was partly because I was using it as a test bed for my MOOC Scraper, but also because its ‘content’ was largely created by the participants themselves. Cathy Davidson’s very much larger xMOOC ‘History and Future of (Mostly) Higher Education’ (FutureEd) was also fascinating but in a different way, as she positively encouraged independent activity outside the MOOC – think Incredible Hulk trying to break out of its xMOOC clothes!
On the whole, I’m positive about MOOCs and there are several areas where I think MOOCs can be very effective. Connecting and updating professionals, stimulating the interests of well-motivated lifelong learners and providing educational opportunities where none existed before are a few. I welcome the different MOOC formats that are emerging and I don’t share the usual concerns about dropout rates. Someone close to me with lifelong interests in languages and literature joined an xMOOC on Climate Change and, for the first time in her life, bought a popular science magazine and found it interesting. MOOCs have the power to transform learners, sometimes unexpectedly but usually for the good. Even the removal of pig ignorance can count as education but… everyone needs to be a deep learner at times.
During Rhizo14 there was some controversy about the relevance or otherwise of certain French philosophers. ‘Skimmers’ and others may have perfectly good reasons for neglecting them but in deep learning mode you take the time and trouble to read them in whatever detail is necessary to make an informed decision – even if you find French philosophers excruciatingly dull and boring!
Having taught engineering courses at a university for more years than I care to remember, I wonder how MOOCs can deal with deep learning in circumstances where it’s vitally important to demonstrate competence, understanding something all the way through as opposed to a superficial or ‘working’ knowledge? This is no elitist concern of interest only to PhD students or just Higher Education. A huge number of vocational courses are wholly or partly of this type – an electrician’s understanding of your wiring is just as vital as a brain surgeon’s! Teaching something to someone else is not a bad test of understanding (as many parents find out trying to help their kids with homework!) but what proportion of a MOOC’s participants could begin to teach or demonstrate real competence in the topics they study? For the typical mammoth xMOOC I would guess very few, particularly if they had little prior knowledge of the subject matter. I would also be surprised if many of those gaining current Statements of Accomplishment could demonstrate real understanding. (Anyone want me for a Philosophy 101 tutor on the basis of my Coursera Certificate?)
Deep learning can be very rewarding, but it can also be time-consuming, not particularly interesting and hard work – as many budding PhD students find out all too quickly. Encouraging deep learning in MOOCs may not be so problematical given well-educated and motivated participants, as in Rhizo14 and FutureEd. In the wider world, however, education may be prized more as a meal ticket than for its own sake, and the traditional training course, ‘taught to the test’, is often viewed by students as little more than an irksome chore unrelated to real life. I’m unsure how MOOCs might be used to improve things. Maybe a crucial first step would be to encourage interaction, almost any type of interaction, between connected participants before expecting anything like deep learning to happen. Rhizo14 certainly encouraged interaction and passionate learning. Interestingly, I now see that several enthusiastic Rhizo14 learners may be passing the ‘teacher test’ by taking over and extending the course themselves – way beyond its nominal 6-week period!
I unleashed my experimental MOOC Comment Scraper on the Rhizomatic Learning MOOC (#rhizo14) run by Dave Cormier from Jan 15th and have been updating it once or twice a day (latest output). The idea behind the Scraper is to get a quick impression of MOOC activity by creating very brief summarised versions of recent blog posts along with their comments. For some reason this type of presentation does not seem to be readily available via feed readers. I’ve found the Scraper useful, particularly for connectivist-style MOOCs where activity is typically distributed across numerous blogs, some of which may not be active at any one time.
In contrast, my xMOOC experiences (eg in a Coursera Philosophy MOOC) suggest that blogging around these ‘instructivist’ MOOCs is not nearly so common. When I joined Cathy Davidson’s ‘History and Future of (Mostly) Higher Education’ (#FutureEd), my introductory spiel sank without trace in the usual enormous and clunky Coursera forum. However, Cathy Davidson herself has reservations about the stereotypical xMOOC, and this particular Coursera MOOC (“…not just a MOOC, it’s a movement.”) does seem less centralised. I’ll be looking out for participant blogs.
Rhizo14 is a good guinea pig for the Scraper. I appreciate the significant number of participants who actively blog and comment on each other’s posts, generating lively discussions with long comment streams. Some posts have attracted around 30 comments of all types and lengths, which has helped me squash several bugs in the Scraper program. (A recurring problem is dealing with ragged loose ends when HTML and other ‘hidden’ codes in comments are chopped up.) At present, about 60 WordPress and Blogger blogs are being scanned, with comments extracted for all posts tagged #rhizo14 over a time ‘window’ of the last 10 days. The participants seem happy to have their comments abbreviated and published in this way, but it would be a simple matter to remove any blog if required.
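The ‘ragged loose ends’ problem arises when raw HTML is cut part-way through a tag or entity. One way round it is to strip the markup first and only then truncate the plain text at a word boundary. The sketch below does this with Python's standard `html.parser`. This approach is an assumption, not necessarily how the Scraper handles it. `summarise`, `BREAK_TAGS` and the 150-character limit are hypothetical.

```python
from html.parser import HTMLParser

# Tags treated as word breaks when flattening HTML to text (an assumption).
BREAK_TAGS = {"p", "br", "div", "li", "blockquote", "h1", "h2", "h3"}


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML fragment, skipping script/style."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc.
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in BREAK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BREAK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def summarise(html_fragment, limit=150):
    """Strip markup first, then cut the plain text at a word boundary."""
    parser = TextExtractor()
    parser.feed(html_fragment)
    parser.close()
    text = " ".join("".join(parser.chunks).split())
    if len(text) <= limit:
        return text
    return text[:limit].rsplit(" ", 1)[0] + "…"
```

Because truncation happens after the markup has gone, the summary can never end inside a half-cut tag or entity.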
The graph below gives some indication of how commenting in rhizo14 is developing over time. This is no scientific study, particularly for the first few days. During that time blogs were still being added, and no posts were yet old enough to drop out of a time window that was itself being adjusted. However, the period from Jan 23 was more stable, with a constant 10-day window. Both comments and posts seem to have peaked around Jan 30. Interestingly, even though comment and post numbers have now dropped a little, the average number of comments per post has stayed above 5.
I’d be very grateful for any constructive comments or criticisms of the Comment Scraper, particularly if you’ve been viewing the output over a period of time. There are several directions in which the Scraper could be developed. More or less output text could be provided, or posts without comments could be identified, but there may be more fundamental changes worth making.
How do you rate the Comment Scraper? – please mark out of 10 where:
0 = Useless
5 = Sometimes useful but I rely mainly on other tools
10 = I couldn’t live without it!
However busy you are please try at the very least to leave your mark out of 10 below so I get some sense of the Scraper’s perceived utility! Thank you!
The MOOC Comment Scraper brings together brief summarised versions of recent blog posts along with resulting comments (See A ‘Comment Scraper’ for Aggregating Blog Posts with Comments in a MOOC, the update, FAQ and an output). The idea is to provide a quick up-to-date impression of posts and comments relating to a particular MOOC. I’ve experimented with the Comment Scraper on several MOOCs but not surprisingly, the concept works best with the connectivist MOOC style where significant debate and discussion can often be found in the blogs of participants rather than in the centralised forums favoured by most xMOOCs.
The P2PU course run by Dave Cormier, ‘Rhizomatic Learning – The community is the curriculum’, is a good opportunity for further experimentation, and several participant blogs have already appeared with comments. I’m intending to display the Scraper output on the page ‘MOOC Comment Scraper Output – #rhizo14’ and will try to keep it up to date. It’s not practical to seek permission to do this from all authors, but past experience suggests that nobody is too concerned – of course I will exclude any author on request.
I’m not sure how the Scraper should be developed, if at all, so any comments about the design or about errors, omissions etc are very much appreciated. Previously I included an RSS feed on the display page so that the Scraper output could be fed to a reader, but I had no feedback as to whether this was useful – I’d be happy to include it again if required (now included!).
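For anyone wanting to produce a similar output feed, here is a minimal RSS 2.0 sketch using the standard library's `xml.etree.ElementTree`. The entry fields (`title`, `link`, `summary`, `updated`), the function name `build_rss` and the example URLs are hypothetical. Timestamps should be timezone-aware so that `pubDate` gets a proper offset.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, entries):
    """Return an RSS 2.0 document (as a string) for the summarised entries."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        ET.SubElement(item, "description").text = entry["summary"]
        # RSS dates use the RFC 822 style, e.g. "Mon, 20 Jan 2014 00:00:00 +0000"
        ET.SubElement(item, "pubDate").text = format_datetime(entry["updated"])
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    "MOOC Comment Scraper Output",
    "https://example.org/scraper/",  # placeholder URL
    "Latest posts and comments",
    [{"title": "A post", "link": "https://example.org/a",
      "summary": "Brief summary…",
      "updated": datetime(2014, 1, 20, tzinfo=timezone.utc)}],
)
```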