Mar 15, 2006

Miles Berry

Year 6 are just starting to work through a statistical investigation built around the idea of comparing pieces of text. We start off with stories from three different newspapers (it used to be a red-top, the Mail or Express, and a broadsheet, but now it’s just three different tabloids).

Lots of great ideas came from the class (and from some of their parents, as this was open morning) as we brainstormed what differences we might find and how we could measure them. After contemplating setting a large proportion of the school a timed comprehension test, we settled on comparing the vocabulary and the grammar, measured by word length and sentence length respectively. We also brought out the idea of a fair test: we’d need to compare the same story from the three papers. That turned out not to be an easy thing to do today, but the Commonwealth Games came to our rescue.
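For anyone wanting to try the two measures on their own text, here is a minimal sketch in Python of the means we settled on. The function name `text_means` and the rules for what counts as a word or a sentence are my own rough assumptions for illustration, not anything the class agreed on.

```python
import re

def text_means(text):
    """Return (mean word length in letters, mean sentence length in words)."""
    # Assumption: a sentence ends at any run of '.', '!' or '?'.
    # This is a rough rule and will mis-split abbreviations and decimals.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of letters, optionally joined by
    # internal apostrophes or hyphens (e.g. "we'd", "red-top").
    words = re.findall(r"[A-Za-z]+(?:['’-][A-Za-z]+)*", text)
    if not words or not sentences:
        raise ValueError("text contains no words or no sentences")
    # Count letters only, so apostrophes and hyphens do not inflate word length.
    total_letters = sum(len(re.sub(r"[^A-Za-z]", "", w)) for w in words)
    mean_word_length = total_letters / len(words)
    mean_sentence_length = len(words) / len(sentences)
    return mean_word_length, mean_sentence_length

if __name__ == "__main__":
    sample = "The cat sat. It was happy!"
    word_mean, sentence_mean = text_means(sample)
    print(f"mean word length: {word_mean:.2f} letters")
    print(f"mean sentence length: {sentence_mean:.2f} words")
```

Running the same function over the same story from each paper keeps the comparison a fair test, since the counting rules are identical every time.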

We’re heading, of course, in the direction of standard reading age measures like Flesch-Kincaid, and Wikipedia’s links helpfully pointed me to Dave Child’s cool PHP script for working out this and a couple of other measures. Dave has actually made the underlying code public too; the way he works out what a syllable is is quite impressive. It wouldn’t be too difficult to hack something together to display the raw means for word and sentence length as well, which is what my class will be working with for the time being, but a quick e-mail to Dave persuaded him to include these on the output page of his script. This will save so much time on the number-crunching side of the work, and it was so kind of him to build this in for us. I’ll try to post again with some of the results.
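As a footnote on where this is heading, the Flesch-Kincaid grade level combines the same two ideas: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below is in Python and is only an illustration. The syllable counter is a deliberately crude assumption of mine (it counts groups of vowels) and is not the method Dave’s script uses; the function names are hypothetical too.

```python
import re

def count_syllables(word):
    """Crude estimate: one syllable per group of consecutive vowels.

    Assumption for illustration only; it miscounts silent 'e' and
    many other cases, so real tools use more careful rules.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Return the Flesch-Kincaid grade level using the crude counts above."""
    # Same rough splitting assumptions as for the raw means:
    # sentences end at runs of '.', '!' or '?'; words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:['’-][A-Za-z]+)*", text)
    if not words or not sentences:
        raise ValueError("text contains no words or no sentences")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

if __name__ == "__main__":
    print(flesch_kincaid_grade("The cat sat. It was happy!"))
```

Very short, simple texts can give a grade below zero, which is a reminder that the formula was designed for longer passages.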