Part or all of the course will be held via video conference.
To receive the organizational details, please register for the course in Moodle by April 21!
Lecture: Wednesday 12:15-13:45
Exercise: Wednesday 10:15-11:45
Course Content Description
Retrieving information from large collections, especially the World Wide Web, has become an integral part of information systems for personal and business use cases alike.
The need to organize vast amounts of information for effective retrieval long predates the World Wide Web and even computers. Traditional libraries were the birthplace of many techniques for effectively organizing and retrieving information. The introduction of computers redefined our methods of storing, accessing, and searching for information and gave rise to a new research field: Information Retrieval. The diversity of information on the World Wide Web introduced new retrieval tasks, which triggered the advancement of traditional information retrieval technologies and the creation of new ones.
This course introduces core concepts and technologies of both traditional information retrieval and information retrieval on the Web. On completing the course, participants will know the predominant information retrieval tasks, e.g., Web search and recommendation. They will understand the conceptual requirements of specific retrieval tasks and be able to devise retrieval approaches, consisting of suitable data structures and algorithms, to address these tasks. They will also be able to critically evaluate the strengths and weaknesses of retrieval approaches and to implement prototypes of suitable approaches that solve complex practical information retrieval problems.
The course provides a good foundation for a bachelor's or master's thesis in our group.
Visit https://mt.uni-wuppertal.de/en/teaching/bachelors-and-masters-projects.html for our current thesis proposals.
The lecture will cover the following topics:
- Basics: Background, Documents, Terms, Vocabulary, Inverted Index
- Boolean Retrieval, Positional Retrieval, Tolerant Retrieval
- Efficient Index Construction, Index Compression
- Term Weighting, Relevance Scoring, Ranked Retrieval
- Semantic Text Analysis, Link Analysis
- Complete Retrieval Systems
- Results Visualization and Exploration
- Evaluation of Retrieval Systems
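To give a flavor of the first topics in the list, the following is a minimal sketch of an inverted index with Boolean AND retrieval in Python. It is not course material: the function names (`tokenize`, `build_inverted_index`, `boolean_and`), the whitespace tokenizer, and the three toy documents are all assumptions made up for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_inverted_index(documents):
    """Map each term to the sorted list of IDs of the documents that contain it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def boolean_and(index, *terms):
    """Return the sorted IDs of documents that contain every query term."""
    if not terms:
        return []
    result = set(index.get(terms[0].lower(), []))
    for term in terms[1:]:
        result &= set(index.get(term.lower(), []))
    return sorted(result)


# Hypothetical toy collection, invented for illustration only.
docs = {
    1: "web search engines rank pages",
    2: "libraries organize books for retrieval",
    3: "web retrieval builds on library retrieval",
}
index = build_inverted_index(docs)
print(boolean_and(index, "web", "retrieval"))  # [3]
```

A realistic system would intersect sorted postings lists directly instead of converting them to sets, and would normalize terms far more carefully; the sketch only shows the basic data structure.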
The exercise sessions will focus on individual programming projects (teamwork is possible) that will address complex information retrieval tasks.
Using the programming language Python and presenting the intermediate and final results of the projects are mandatory.
Literature
- An Introduction to Information Retrieval. (free online edition: http://www-nlp.stanford.edu/IR-book/).
C. D. Manning, P. Raghavan and H. Schütze.
Cambridge University Press, Cambridge, England 2009.
- Web Information Retrieval.
S. Ceri, A. Bozzon, M. Brambilla, E. Della Valle, P. Fraternali and S. Quarteroni.
Springer, 2013. ISBN 3642393136.
- Modern Information Retrieval: The Concepts and Technology Behind Search.
R. Baeza-Yates and B. Ribeiro-Neto.
Pearson Education, Ltd., Harlow, England, Addison-Wesley, 2nd edition, 2011. ISBN 9780321416919.
Prerequisites
- Knowledge of at least one object-oriented programming language, preferably Python, is required.
- Python is used in the exercise sessions. For participants who are unfamiliar with Python, a fast-paced introduction to the essentials of the language will be provided.
Examination
- Successful completion of the programming projects (including teaser, intermediate, and final presentations)
- Final exam (oral in case of 1-25 participants, written in case of 26 or more participants)
- Up to 50% of the questions in the final exam may concern the programming project.