dataTEL Challenge

More information on the dataTEL challenge of RecSysTEL will be provided here soon.

In the world of recommender systems, it is common practice to use publicly available datasets from different application domains (e.g. MovieLens, Book-Crossing, or EachMovie) to evaluate recommendation algorithms. These datasets serve as benchmarks for developing new recommendation algorithms and comparing them to existing ones in a given setting.

Such data sets store a representation of the implicit or explicit feedback that users give on candidate items, which the recommender system needs in order to produce recommendations. This feedback can take several forms: in collaborative filtering systems it can be ratings or votes (e.g. whether an item has been viewed or bookmarked), while in content-based recommenders it can be product reviews or simple tags (keywords) that users attach to items. Additional information is also required, such as a unique way to identify who provides the feedback (user ID) and for which item (item ID). The user-rating matrix used in collaborative filtering systems is a well-known example.
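As an illustration, the snippet below is a minimal sketch, in plain Python, of this kind of feedback representation. The user IDs, item IDs, and rating values are purely hypothetical and not a prescribed format; the point is only to show (user ID, item ID, feedback) triples being assembled into the user-rating matrix mentioned above.

```python
# Hypothetical feedback records: who gave the feedback (user ID),
# on which item (item ID), and the feedback itself (here an explicit rating;
# an implicit signal such as "viewed" could be stored instead).
records = [
    ("u1", "lecture-slides-42", 4),
    ("u1", "quiz-7", 5),
    ("u2", "lecture-slides-42", 2),
    ("u3", "quiz-7", 1),
]

# Assemble the records into the user-rating matrix used by collaborative
# filtering: rows are users, columns are items, and None marks items
# the user has given no feedback on.
users = sorted({u for u, _, _ in records})
items = sorted({i for _, i, _ in records})
ratings = {(u, i): r for u, i, r in records}
matrix = [[ratings.get((u, i)) for i in items] for u in users]

for user, row in zip(users, matrix):
    print(user, row)
```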

Although recommender systems are increasingly applied in Technology Enhanced Learning (TEL), this application area still lacks such publicly available and interoperable data sets. Despite the substantial amount of research on recommender systems in TEL, researchers have no comparable, interoperable, and reusable data sets with which to experimentally evaluate the performance of different recommendation algorithms. This leads to awkward experimentation and testing, such as using movie data sets to evaluate educational recommendation algorithms.

To this end, the EATEL SIG on Data-driven Research and Learning Analytics has launched the second dataTEL Challenge. The first edition was successfully organized by the dataTEL Theme Team of the STELLAR Network of Excellence at RecSysTEL 2010 in Barcelona. Related to this, the 1st Workshop on Data Sets for Technology Enhanced Learning was also organized in 2011 at the 2nd STELLAR Alpine Rendez-Vous in La Clusaz, France.

This call for TEL datasets focuses on dataset descriptions and invites research groups to submit descriptions of existing datasets from TEL applications that can be used as input for TEL recommender systems. The collected data sets are expected to facilitate the discussion of the following five core questions:

  1. How can data sets be shared according to privacy and legal protection rules?
  2. How to develop a policy for using and sharing data sets?
  3. How to pre-process data sets to make them suitable for other researchers?
  4. How to define common evaluation criteria for TEL recommender systems? (see the metric sketch after this list)
  5. How to develop overview methods to monitor the performance of TEL recommender systems on data sets?
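To make question 4 more concrete, the snippet below is a minimal sketch of one widely used metric, precision@k, computed for a single learner. The item IDs and the held-out set are hypothetical, and precision@k is only one of many candidate criteria that the submitted data sets could help compare across TEL recommender systems.

```python
# A hypothetical sketch of precision@k as a shared evaluation criterion:
# the fraction of the top-k recommended items that the learner actually
# interacted with in a held-out part of the data set.
def precision_at_k(recommended, relevant, k=5):
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)

# Illustrative item IDs only; in practice both collections would come from a
# submitted TEL data set split into training and test portions.
recommended = ["quiz-7", "lecture-slides-42", "forum-post-3", "video-9", "wiki-1"]
relevant = {"quiz-7", "video-9"}
print(precision_at_k(recommended, relevant))  # 2 of the top 5 are relevant -> 0.4
```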