Big Data in Education – First episode

July 2, 2016

As in many other fields, such as healthcare, retail, telecommunications and natural science, Big Data and Analytics have become the new hype in Education and Learning, under the umbrella name of “Learning Analytics”. As technology becomes ubiquitous and more accessible, and as a growing share of learning time is spent online, for instance in Massive Open Online Courses (MOOCs), vast quantities of data are continuously generated and stored in IT systems. These data offer researchers unprecedented opportunities to analyse and understand many different aspects of learning and education. This data-driven approach is shaking the traditional paradigms of educational research:

“In the traditional model of evidence-gathering and interpretation in education, researchers are independent observers, who pre-emptively create instruments of measurement, and insert these into the educational process in specialized times and places (a pre-test or post-test, a survey, an interview, a focus group). The ‘big data’ approach is to collect data through practice-integrated research. If a record is kept of everything that happens, then it is possible to analyze what happened, ex-post facto. Data collection is embedded. It is on-the-fly and ever-present” (Cope & Kalantzis, 2015).

The real strength of Learning Analytics is, however, not limited to embedded, unobtrusive and seamless data collection. The development of machine learning and data mining techniques, together with big data storage and processing capabilities, makes it possible to go beyond conventional reporting of historical data and to “move into an era where we can predict, with reasonable accuracy, everything from future” (ECAR, 2015).

The group of techniques that leverage learning data to generate predictions and recommendations, and thus to personalise the learning activity, is usually referred to as Predictive Learning Analytics (PLA). The aim of PLA is to empower all educational stakeholders, such as students, teachers or school heads, with the intelligence they need to take informed actions and steer teaching and learning towards the desired outcomes.

Despite its great potential, PLA is a fairly new research field. Existing research has focused primarily on Higher Education problems, above all predicting students’ academic success in order to improve retention and reduce early dropout. The data sources of existing PLA applications are generally limited to Learning Management System (LMS) data and Student Information System (SIS) data (such as grades, demographics and participation). These data are often aggregated, decontextualised and poorly descriptive, which can lead to highly biased predictions and interpretations.

In addition, a common practice is to use final course grades as the measure of “student success”. One can argue that looking only at the results of summative assessment gives a biased picture of the commitment and effort a learner puts into her studies, because final grades do not, for example, take the learner’s progress into account.
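To make the point about progress concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Learner` class, the two helper functions and the scores are invented, and "progress" is taken in the simplest possible sense, as the difference between the last and the first assessment score. It is not a proposed success indicator, only a way to show that two learners with the same final grade can have very different learning trajectories.

```python
from dataclasses import dataclass


@dataclass
class Learner:
    name: str            # hypothetical identifier
    scores: list[float]  # assessment scores in chronological order (0-100)


def final_grade(learner: Learner) -> float:
    """Summative view: only the last assessment counts."""
    return learner.scores[-1]


def progress(learner: Learner) -> float:
    """Progress view (assumed definition): last score minus first score."""
    return learner.scores[-1] - learner.scores[0]


# Invented data, for illustration only.
anna = Learner("anna", [40.0, 55.0, 70.0])
ben = Learner("ben", [72.0, 71.0, 70.0])

for learner in (anna, ben):
    print(learner.name, final_grade(learner), progress(learner))
```

Both learners end with the same final grade, yet one has improved considerably while the other has stagnated. A model trained only on final grades would treat them as identical.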

Therefore, in spite of the strengths that the data-driven approach brings, it is still unclear how PLA should work in the bigger picture. There is, in fact, no systematic model that helps answer two relevant questions:

  1. Which aspects of the learning process need to be taken into account when designing predictive models about learning?
  2. Which success indicators should be taken into account besides the course grades?

We will try to answer those two questions in the next episodes of “Big Data in Education” on



Cope, B., & Kalantzis, M. (2015). Interpreting Evidence-of-Learning: Educational research in the era of big data. Open Review of Educational Research, 2(1), 218-239.

ECAR-ANALYTICS Working Group. The Predictive Learning Analytics Revolution: Leveraging Learning Data for Student Success. ECAR working group paper. Louisville, CO: ECAR, October 7, 2015.

Author Daniele Di Mitri

I am a 26-year-old Learning Technologist. Born in Bari (Italy), I now live in the Netherlands, where I moved to study for a master's in Artificial Intelligence and Learning Analytics. I am currently a second-year PhD candidate at the TELI research centre of the Open Universiteit. My expertise is in applying Machine Learning techniques to learning (big) data. My extended bio at
