Data Science in Schools

Sep 06, 2019

Miles Berry

I’ve no doubt that good CS education involves finding some motivating contexts for getting the ideas across, and for pupils to get to grips with programming. Lots of teachers have found their pupils highly engaged through creating games and animations, or through interacting with the real world through physical computing and robotics, or, perhaps more unusually, through algorithmic art or composing music. I think we could make a good case for adding some data science into this mix, getting pupils to do a little visualisation and exploratory data analysis, and through this starting to answer some genuinely interesting questions.

When we wrote the English computing curriculum, we included some explicit references to working with data: 7-11 year olds are taught “collecting, analysing, evaluating and presenting data”, and 11-14 year olds “undertake creative projects that involve selecting, using, and combining multiple applications, preferably across a range of devices, to achieve challenging goals, including collecting and analysing data.” Or at least they’re supposed to. CSTA’s standards go quite a bit further, with a whole strand given over to data and analysis, with a clear sense of progression and ambitious targets for high schoolers like “Create interactive data visualizations” and “use data analysis tools and techniques to identify patterns in data representing complex systems”. I worry that we’ve put so much emphasis on coding that these crucial skills, and the understanding that follows from them, get overlooked in too many schools. It needn’t be this way: there’s plenty of scope for doing this data visualisation and analysis with code.

I’ve been thinking recently about how we can take the foundations / application / implications (that’s roughly computer science, IT and critical digital literacy) model that underpins the English computing curriculum and apply it to related (and some unrelated) subjects, to help promote a broader and more balanced approach to curriculum design. We can use this model for thinking about data science in schools.

If we’re serious about pupils learning data science, then I think we need to lay the foundations with some old school probability and statistics. Typically these are already part of the math curriculum, but there’s so much more we can do here when we let our pupils use computers for this, from simulating dice rolls, through plotting graphs, to calculating summary statistics for some big datasets. All these things can be done by hand (‘unplugged’?), but once pupils have an idea of the techniques, using technology to automate the automatable parts of the process lets them concentrate on selecting and using the right tools, and on making sense of the results. It’s far more interesting and useful to be able to make sense of a scatterplot (for example) than to be able to draw one by hand.
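As one possible illustration of the dice-rolling idea, here is a minimal Python sketch using only the standard library. The function names `roll_two_dice` and `summarise`, the number of rolls and the seed are my own choices for the example rather than anything taken from a curriculum document; pupils could just as well do this in a spreadsheet or a block-based language.

```python
import random
import statistics
from collections import Counter


def roll_two_dice(n_rolls, seed=None):
    """Simulate n_rolls throws of two six-sided dice and return the totals."""
    rng = random.Random(seed)  # a fixed seed makes a classroom demo repeatable
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls)]


def summarise(totals):
    """Return a few summary statistics plus a tally of each total."""
    return {
        "mean": statistics.mean(totals),
        "median": statistics.median(totals),
        "stdev": statistics.pstdev(totals),
        "counts": Counter(totals),
    }


if __name__ == "__main__":
    totals = roll_two_dice(10_000, seed=2019)
    summary = summarise(totals)
    print(f"mean {summary['mean']:.2f}, median {summary['median']}, "
          f"standard deviation {summary['stdev']:.2f}")
    # A rough text histogram: one '#' per percentage point of rolls.
    for total in range(2, 13):
        share = summary["counts"][total] / len(totals)
        print(f"{total:2d} {'#' * round(share * 100)}")
```

Once the simulation is automated, the discussion can move to the interesting questions: why is 7 the most common total, and how many rolls does it take before the pattern settles down?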

I’d also want pupils to apply this knowledge to some interesting problems. In elementary school, I’d look at opinion polls or other surveys as a way into this, perhaps getting pupils to work collaboratively on coming up with good questions (agree / disagree Likert scales are a good starting point) and then exploring what they can learn by slicing the data they collect: is there any difference between boys’ and girls’ enjoyment of school subjects in elementary school (and is there any difference in high school…)? Later on, I’d start looking at time series: weather data is great for this. In the UK we have open-access, month-by-month meteorological data going back over 100 years, and a comparison of temperatures for the last 30 with the previous 70+ makes a persuasive case. Later still, I’d get pupils looking for patterns and relationships in big (or biggish) datasets: sports fans might like to play with accelerometer or GPS data from micro:bits, wearables or phones: can they work out what sport someone was playing from the datafiles (or a visualisation of them)? Could a machine do this? Big, public, anonymised datasets could be linked very powerfully to some social studies topics: what are the links between gender, ethnicity, education and income? Or pupils could learn about text mining techniques and apply these to their study of English: are there quantifiable differences between the vocabulary and grammar of Hemingway and Morrison? Or between Obama and Trump?
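To make the idea of slicing survey data a little more concrete, here is a small Python sketch. The responses are invented purely for illustration (they are not real survey results), and the helper name `mean_by` is hypothetical. Each response records a group, a subject and a 1 to 5 Likert score for the statement “I enjoy this subject”.

```python
from collections import defaultdict
from statistics import mean

# Invented responses, for illustration only: (group, subject, score),
# where score runs from 1 (strongly disagree) to 5 (strongly agree).
RESPONSES = [
    ("girls", "maths", 4),
    ("girls", "maths", 5),
    ("boys", "maths", 3),
    ("boys", "maths", 4),
    ("girls", "art", 3),
    ("girls", "art", 4),
    ("boys", "art", 5),
    ("boys", "art", 4),
]


def mean_by(responses, key):
    """Group responses using key(response) and return the mean score per group."""
    buckets = defaultdict(list)
    for response in responses:
        buckets[key(response)].append(response[2])
    return {name: mean(scores) for name, scores in buckets.items()}


if __name__ == "__main__":
    # Slice one way: by group only.
    print(mean_by(RESPONSES, lambda r: r[0]))
    # Slice another way: by group and subject together.
    print(mean_by(RESPONSES, lambda r: (r[0], r[1])))
```

With this made-up data the overall averages for the two groups are identical, yet slicing by subject as well shows them differing in opposite directions. That is the kind of discovery that can start a good classroom conversation about what a single summary number hides.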

Even more importantly, I’d like pupils to think through some of the implications of collecting and using data as freely as we do. Coming back to my elementary school survey idea: what questions shouldn’t we ask one another? What questions shouldn’t we answer? Does it matter if your name is attached to the answers? In one day at school, how much data does a pupil generate (attendance, grades, cafeteria, accessing the internet, CCTV, online learning, behaviour management, etc…)? What happens to all this data? What could you discover about a pupil if this were all linked together? Does anyone mind? How much do internet service providers, search engines and email services know about a user? What do they use this for? Again, does anyone mind? If big tech firms provide the wonderful services they do for free, how have they got to be some of the most valuable companies in the world? The English computing curriculum includes teaching pupils ‘new ways to protect their online identity and privacy’ – what should we include here?

Some of this certainly should be part of what our pupils learn in their school computing lessons, but lots of it provides ample opportunity for cross-curricular links with math, social studies, civics and even sports! I think we as CS teachers gain so much through showing how relevant coding can be to the other things our pupils study.

Originally published in CSTA’s The Advocate blog
