dataTEL Challenge

In the world of recommender systems, it is common practice to use publicly available datasets from different application environments (e.g. MovieLens, Book-Crossing, or EachMovie) to evaluate recommendation algorithms. These datasets serve as benchmarks for developing new recommendation algorithms and comparing them to existing ones in given settings.

Such data sets store a representation of implicit or explicit feedback from users regarding the candidate items, which allows the recommender system to produce recommendations. This feedback can take several forms. In the case of collaborative filtering systems, for example, it can be ratings or votes (i.e. whether an item has been viewed or bookmarked). In the case of content-based recommenders, it can be product reviews or simple tags (keywords) that users provide for items. Additional information is also required, such as a unique way to identify who provides the feedback (user ID) and for which item (item ID). The user-rating matrix used in collaborative filtering systems is a well-known example.
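As a minimal illustration (not part of the challenge itself), explicit feedback of this kind can be stored as (user ID, item ID, rating) triples and then arranged into a user-rating matrix. The short Python sketch below uses purely hypothetical user and item identifiers and rating values.

    # Illustrative sketch only; user/item IDs and ratings are hypothetical.
    # Explicit feedback as (user ID, item ID, rating) triples.
    ratings = [
        ("user_1", "item_a", 4.0),
        ("user_1", "item_b", 2.0),
        ("user_2", "item_a", 5.0),
    ]

    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})

    # User-rating matrix: rows are users, columns are items,
    # None marks items for which a user gave no feedback.
    matrix = [[None for _ in items] for _ in users]
    for user, item, rating in ratings:
        matrix[users.index(user)][items.index(item)] = rating

In practice such matrices are very sparse, which is one reason why agreeing on common formats and pre-processing steps for shared data sets matters.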

Although recommender systems are increasingly applied in Technology Enhanced Learning (TEL), this application area still lacks publicly available and interoperable data sets. Despite the considerable amount of research on recommender systems in TEL, there are no data sets that would allow the experimental evaluation of different recommendation algorithms in a comparable, interoperable, and reusable way. This leads to awkward experimentation and testing, such as using movie data sets to evaluate educational recommendation algorithms.

To this end, the dataTEL Theme Team of the STELLAR Network of Excellence (http://www.teleurope.eu/pg/groups/9405/datatel/) has launched the first dataTEL Challenge: a call for TEL data sets that invites research groups to submit existing data sets from TEL applications that can be used as input for TEL recommender systems. The collected data sets are expected to facilitate the discussion of the following five core questions:

  1. How can data sets be shared according to privacy and legal protection rules?
  2. How to develop a policy for using and sharing data sets?
  3. How to pre-process data sets to make them suitable for other researchers?
  4. How to define common evaluation criteria for TEL recommender systems?
  5. How to develop overview methods for monitoring the performance of TEL recommender systems on these data sets?

A special dataTEL Cafe event will take place during the RecSysTEL Workshop 2010 in Barcelona to discuss the submitted data sets and these questions, and to facilitate data set sharing within this community. A best dataTEL award will also be given to a TEL data set selected by a specially appointed Scientific Committee.

For the new submission procedure we explicitly request data sets together with their descriptions, and a new submission form has been provided. If a data set cannot be submitted for any reason, a description of the data set alone is also welcome. The page limit for the data set description has been extended to 4 pages.

At the dataTEL Challenge at ECTEL 2010 we will briefly present the collected data sets and discuss what is needed to share them within the scientific community. All submitted data set descriptions will be compiled into an extended TEL data set paper that will be part of the RecSysTEL workshop special issue.

The submission form should be submitted through the EasyChair submission system: http://www.easychair.org/conferences/?conf=datatel2010.

For more details, download the CfP.