Big Data in Education – Learning Outcomes

July 3, 2016

The first episode of Big Data in Education introduced the opportunities arising from programmatically collecting and analysing educational data. The second episode detailed the Dimensions of Education Data, the so-called input space of Big Data in Education. As anticipated, this episode talks about learning outcome measurement, namely how to transform learning performance and assessment into indicators to take into account when deploying Big Data techniques in Education.

But if we now know where to collect data, why bother about the output at all? The output space is as important as the input: most supervised Big Data techniques consist in model or pattern discovery, through which it is possible to perform automatic predictions or classifications.

Take for instance the simple function y = f(x) in mathematics: the function f maps the input x to the output y. Model discovery, the core concept of data mining and machine learning, consists in automatically learning the function (model) f given several (x, y) pairs. This makes clear the importance of the output y in order to be able to (machine) learn the models.
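As a minimal sketch of "learning f from (x, y) pairs", here is an ordinary least-squares fit of a straight line using only the Python standard library. The data points are invented for illustration; the hidden function they come from is f(x) = 2x + 1.

```python
# Learn a linear model y = slope * x + intercept from (x, y) pairs
# via the closed-form least-squares solution.

def fit_line(pairs):
    """Return (slope, intercept) of the least-squares line through (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Observations generated by the hidden function f(x) = 2x + 1:
pairs = [(0, 1), (1, 3), (2, 5), (3, 7)]
slope, intercept = fit_line(pairs)
print(slope, intercept)  # the learned model recovers f: 2.0 1.0
```

Given enough (x, y) examples, the algorithm recovers the model without ever being told what f is; this is the same idea that, at larger scale, underlies predicting educational outcomes from learning records.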

But practically speaking, what is 'y'? In the domain of education, it corresponds to a particular educational outcome. That can be a binary variable (yes or no), for example "did student A pass course C?" or "did student A drop out from course C?", or a numerical variable, for example "how much did student A score in course C?". In both cases, in order to apply machine learning to discover useful models from the learning records, the outcome must be quantified into a variable.
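To make this concrete, here is a hypothetical illustration of encoding both kinds of outcome from a single learner record. The record fields and the pass threshold of 5.5 are invented assumptions, not from the article.

```python
# One learner's record for a course (all values are made up for illustration).
record = {"student": "A", "course": "C", "final_score": 7.5, "dropped_out": False}

# Binary outcome y: did the student pass? (assumed pass mark: 5.5)
y_pass = 1 if record["final_score"] >= 5.5 and not record["dropped_out"] else 0

# Numerical outcome y: the score itself, usable as a regression target.
y_score = record["final_score"]

print(y_pass, y_score)  # -> 1 7.5
```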

Formalising educational outcomes into measurable indicators is the trickiest aspect of Learning Analytics. Transposing people's performance into numbers is difficult even for a human evaluator, since there are far too many factors to take into account. Most often, summative assessments like final course grades are mistaken for a good indication of learning success. Those tend to miss several important criteria for evaluating an individual fairly: take for instance relative progress, transversal competencies, commitment or participation. Final grades simply do not take these qualities into account.

A possible alternative that achieves formative assessment and yet remains measurable is the evaluation rubric. Rubrics stand out for being fair and personalised while still producing a quantifiable output. However, they are rather time-consuming: to our knowledge, there is no scalable approach to formative evaluation. The concern with the scalability of assessment practice comes strongly into play in massive educational settings like MOOCs. Given the high number of learners per course, there is a strong need for fast and reliable assessment, but ensuring fast, scalable formative assessment is a real challenge.

A possible solution.

The trade-off between scalable assessment and personalised evaluation must be tackled with unconventional, smart solutions. An example is the following. Let's say there is only one evaluator, who has also been the instructor of the course. This person knows the group of learners quite well and can point out those who deserve a pass and those who don't. Let the evaluator mark a group of successful learners and, at the same time, a group of unsuccessful ones. The marking does not need to be exhaustive, but it should cover a large number of learners. Once those labels are available, it is possible to calculate retroactively which of the Attributes of Learning (including final grades) make a successful or an unsuccessful learner. Moreover, a regression model can be employed to predict future learning performance. The benefit of this approach is that, on the one hand, it is scalable and, on the other hand, it preserves a human evaluator, at least for labelling the training dataset from which the machine learning algorithms will learn.
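The workflow above can be sketched in a few lines. The feature names and values are invented (assume each learner is described by final grade, participation rate and number of forum posts), and a simple nearest-centroid rule stands in for whatever classifier a real system would use: the evaluator labels a small subset, and the model then classifies everyone else.

```python
import math

# Evaluator-labelled training set: (features, label), label 1 = successful.
# Features: [final grade, participation rate, forum posts] -- illustrative only.
labelled = [
    ([8.0, 0.9, 15], 1),
    ([7.5, 0.8, 10], 1),
    ([4.0, 0.2, 1], 0),
    ([5.0, 0.3, 2], 0),
]

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

pos = centroid([f for f, y in labelled if y == 1])
neg = centroid([f for f, y in labelled if y == 0])

def predict(features):
    """Label an unlabelled learner by the nearer class centroid."""
    return 1 if math.dist(features, pos) < math.dist(features, neg) else 0

print(predict([7.8, 0.85, 12]))  # near the successful centroid -> 1
```

The human stays in the loop only for the labelling step; the learned model then scales the evaluation to the rest of the cohort, which is exactly the scalability-with-fairness compromise described above.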

The last example is an idea of how to produce quantifiable indicators for the educational outcome, which can then be used as the output of future learning analytics applications for prediction. It is certainly not the only one applicable: we will discuss more of them in the next episodes of Big Data in Education.

Author Daniele Di Mitri

I am a 26-year-old Learning Technologist. Born in Bari (Italy), I now live in the Netherlands, where I moved to study a master's in Artificial Intelligence and Learning Analytics. I am currently a second-year PhD candidate at the TELI research centre of the Open Universiteit. My expertise is in Machine Learning techniques on learning (big) data. My extended bio at

More posts by Daniele Di Mitri