Difference between revisions of "Extract events from teletext"

From Master Projects
 
(2 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
{{Projectproposal
 
{{Projectproposal
 
|Contact person=Valentina Maccatrozzo
 
|Contact person=Valentina Maccatrozzo
|Project page=http://www.vista-tv.eu/
+
|Master areas=Information Sciences, Computer Science and Communication, Information and Communication Technology, Multimedia, AI and Communication, Internet and Web Technology, Technical Artificial Intelligence, Computational Intelligence and Selforganisation, Knowledge Technology and Intelligent Internet Applications
|Fulfilled=No
+
|Project page=http://vista-tv.eu/
 +
|Fulfilled=Yes
 
}}
 
}}
Teletext services are either disappearing or moving to the web. Those that are moving to the web, update their content, making it clickable and navigable, i.e. standard html content. Besides the standard program guide information, they expose also news. This project has the objective of extracting events from teletext content and matching those news with newspapers articles. Practically speaking you need to build a crawler to extract the content of the teletext website, extract the news from the content and search for those news in other newspapers, for instance using the NYTimes API (http://developer.nytimes.com) or LexisNexis (accessible only from the VU http://academic.lexisnexis.nl/).
+
Teletext services are either disappearing or moving to the web. Those that are moving to the web, update their content, making it clickable and navigable, i.e. standard html content. Besides the standard program guide information, they expose also news. This project has the objective of extracting events from teletext content and matching those with newspapers articles. Practically speaking you need to build a crawler or use another tool (e.g. AlchemyAPI) to extract the content of the teletext website, extract the news from the content and search for those news in other newspapers, for instance using the NYTimes API (http://developer.nytimes.com) or LexisNexis (accessible only from the VU http://academic.lexisnexis.nl/).
 +
 
 +
== Tasks ==
 +
* crawl or extract information from teletext websites
 +
* identify important nouns, e.g. names, locations, etc
 +
* build events with the information extracted
 +
* find other relevant newspaper articles
 +
 
 +
==Tools and Data ==
 +
* NYTimes API
 +
* LexisNexis
 +
* NLP tool, e.g. OpenCalais, AlchemyAPI
 +
 
 +
== Recommended Prior Knowledge ==
 +
* Information Retrieval course
 +
* Web Technologies
 +
* Some experience in building crawlers

Latest revision as of 12:43, 16 December 2013


About Extract events from teletext

  • This project has been fulfilled.
  • This project fits in the following Bachelor programs: (none listed)
  • This project fits in the following master areas: Information Sciences, Computer Science and Communication, Information and Communication Technology, Multimedia, AI and Communication, Internet and Web Technology, Technical Artificial Intelligence, Computational Intelligence and Selforganisation, Knowledge Technology and Intelligent Internet Applications
  • Project website: http://vista-tv.eu/

Description

Teletext services are either disappearing or moving to the web. Those that move to the web update their content, making it clickable and navigable, i.e. standard HTML content. Besides the standard program guide information, they also publish news. The objective of this project is to extract events from teletext content and match them with newspaper articles. In practice, you need to build a crawler or use another tool (e.g. AlchemyAPI) to extract the content of the teletext website, extract the news items from that content, and search for those news items in newspapers, for instance using the NYTimes API (http://developer.nytimes.com) or LexisNexis (accessible only from the VU http://academic.lexisnexis.nl/).
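
As a rough illustration of the extraction step, the sketch below pulls the visible text out of a web teletext page using only the Python standard library. The class name TeletextTextExtractor, the helper extract_text and the sample page markup are assumptions made for this example; real teletext sites will have their own page structure, and fetching the pages (the crawling itself) is left out.

```python
from html.parser import HTMLParser


class TeletextTextExtractor(HTMLParser):
    """Collects visible text fragments, skipping script and style blocks."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse whitespace and keep only non-empty visible fragments.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.lines.append(text)


def extract_text(html):
    """Return the visible text fragments of an HTML page, in document order."""
    parser = TeletextTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.lines


# Hypothetical page markup, invented for this example.
sample_page = """
<html><head><style>pre { color: white; }</style></head>
<body><pre><a href="/101">101</a> Storm hits Amsterdam
<a href="/102">102</a> Elections in Italy</pre></body></html>
"""

print(extract_text(sample_page))
```

Page numbers and headlines come out as separate fragments here; pairing them up depends on the layout of the actual site.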

Tasks

  • crawl or extract information from teletext websites
  • identify important nouns, e.g. names, locations, etc.
  • build events with the information extracted
  • find other relevant newspaper articles
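
The tasks above can be sketched end to end as follows. This is only a toy version under stated assumptions: the capitalized-word regular expression stands in for a real NLP tool, the Event class and the functions extract_entities, build_event and rank_articles are hypothetical names, and the article titles are invented rather than retrieved from a newspaper API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Event:
    headline: str
    entities: set = field(default_factory=set)


def extract_entities(text):
    """Naive stand-in for an NLP tool: runs of capitalized words.

    This also picks up ordinary words at the start of a sentence,
    which a real named-entity recognizer would filter out.
    """
    return set(re.findall(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)*", text))


def build_event(headline):
    """Build a minimal event record from a teletext headline."""
    return Event(headline=headline, entities=extract_entities(headline))


def rank_articles(event, titles):
    """Rank article titles by how many event entities they mention."""
    scored = []
    for title in titles:
        lowered = title.lower()
        score = sum(1 for entity in event.entities if entity.lower() in lowered)
        if score > 0:
            scored.append((score, title))
    # The sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


event = build_event("Storm hits Amsterdam")
# Invented article titles, for illustration only.
titles = [
    "Amsterdam braces for severe storm",
    "Italy votes in general election",
    "Amsterdam opens new museum",
]
print(rank_articles(event, titles))
```

Counting shared entities is a deliberately simple matching criterion; a fuller solution would likely also compare dates and use proper retrieval scoring.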

Tools and Data

  • NYTimes API
  • LexisNexis
  • NLP tool, e.g. OpenCalais, AlchemyAPI
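
As an example of how extracted entities could be turned into a newspaper query, the sketch below builds a request URL for the NYTimes Article Search API. The endpoint and the parameter names (q, api-key, begin_date) are assumed from the public documentation and should be checked against the current developer portal; build_search_url and the placeholder key YOUR_KEY are hypothetical. No request is actually sent.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NYTimes Article Search API; verify before use.
ARTICLE_SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(entities, api_key, begin_date=None, end_date=None):
    """Build a search URL from a set of entity strings.

    Dates are expected as YYYYMMDD strings (an assumption about the API).
    """
    params = {"q": " ".join(sorted(entities)), "api-key": api_key}
    if begin_date:
        params["begin_date"] = begin_date
    if end_date:
        params["end_date"] = end_date
    return ARTICLE_SEARCH_URL + "?" + urlencode(params)


url = build_search_url({"Amsterdam", "Storm"}, "YOUR_KEY", begin_date="20131201")
print(url)
```

Restricting the date range to a window around the teletext item's publication time is one way to keep the candidate set small before matching.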

Recommended Prior Knowledge

  • Information Retrieval course
  • Web Technologies
  • Some experience in building crawlers