Yahoo Releases User Interaction Data for Machine Learning

Earth and server (Suresh Subbaiah, Wikimedia Commons)

14 January 2016. The online company Yahoo is releasing an extensive data set of individual user interactions with some of its popular services to the academic community as raw material for studies of machine learning. The de-identified data sets will be part of Yahoo’s Webscope reference library offered to academic researchers.

The data sets cover individual interactions with Yahoo’s news, sports, finance, movies, and real estate sections, as well as its home page. The collection, says the company, has some 110 billion items accessed by 20 million users from February to May 2015. The entire uncompressed file is estimated at 13.5 terabytes.

Items in the data set are identified by their titles, summary, and key phrases. Data on individuals accessing those items give their gender, age range, and generalized geographic location. Interactions with the items show the user’s local date and time, and some data about the device employed.

Yahoo’s Webscope program provides data sets for academic researchers and students covering computer systems, languages, images, graph and social data, ratings and classification data, as well as competition, advertising, and marketing data. The Webscope databases are part of Yahoo Labs, which conducts research in a range of fields related to the company’s business and services, including advertising, computer science, information and knowledge management, human-computer interactions, and machine learning.

In a statement, the company cites computer scientists planning to use the data sets. Gert Lanckriet at the University of California, San Diego, says, “Access to data sets of this size is essential to design and develop machine learning algorithms and technology that scales to truly ‘big’ data.”

Tom Mitchell at Carnegie Mellon University adds, “Academic researchers everywhere will finally have access to realistic scale data to study how to automatically discover which news articles are of interest to which users, and will be able to compare their methods using this as a shared test case.”
