Comment |
By completing the course, participants will obtain the knowledge and skills to solve a wide range of applied problems in Natural Language Processing (NLP). To achieve this goal, the participants will get to know successful methods for solving sub-problems, such as text representation, information extraction, text mining, language modeling, and similarity detection. The participants will understand the conceptual requirements of specific NLP tasks and be able to devise approaches to address these tasks in practice. The participants will be able to assess the strengths and limitations of state-of-the-art NLP approaches and to propose solutions for interdisciplinary NLP problems.
The lecture will cover the following topics:
- Introduction
  - Course structure, schedule, projects, requirements, specifics
  - Course topics, motivation
  - Overview of the field
- Text representation
  - Words, sentences, paragraphs, documents
  - Text processing, regular expressions, tokenization, stemming, lemmatization
  - Bag-of-Words, weighting schemes (e.g., tf-idf), information retrieval
  - Minimum edit distance
  - Language models, N-grams, perplexity, information gain, smoothing
  - Word sense, lexical databases, distance measures
- Word embeddings and dense vector representations
  - Vector representation
  - Recap on NLP representations before 2013
  - word2vec, GloVe, fastText
  - Paragraph-Vectors
  - Multi-Sense Embeddings
  - ELMo, USE
- Applications
  - Lexical databases, lexical semantics
  - Word sense disambiguation, semantic similarity
  - Part-of-speech tagging, parsing
  - Word similarity, word dissimilarity, distance measures
  - Text classification
  - Sentiment analysis / evaluation
  - Named entity recognition, information extraction, relation extraction
  - Question answering, chatbots, dialog systems
  - Text summarization
  - Machine translation
  - Fake news detection
  - Plagiarism / paraphrase detection
  - Math retrieval, MathML
  - Automatic detection of political opinions
  - Online harassment detection
  - Collaboration network analysis
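To make two of the text representation topics listed above more concrete, here is a minimal sketch of tf-idf weighting and minimum edit distance in plain Python. It is an illustration only, not official course material: the function names, the toy corpus, the unsmoothed variant idf = log(N / df), and the unit-cost (Levenshtein) edit distance are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = raw count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def edit_distance(a, b):
    """Levenshtein distance with unit cost for insert, delete, substitute."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical toy corpus, tokenized by whitespace.
corpus = [s.split() for s in ["the cat sat", "the dog sat", "the cat ran"]]
weights = tf_idf(corpus)
```

In this toy corpus, a term that occurs in every document (such as "the") receives weight zero, while a term unique to one document receives the highest weight in that document. Library implementations typically use smoothed idf variants and will produce different numbers.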
Participants (teamwork is possible) will carry out an applied research project that addresses complex NLP downstream tasks and subtasks, such as:
- Word similarity
- Document and sentence classification
- Named entity recognition
- Question answering systems
- Text summarization
- Subjectivity classification (objective vs. subjective)
- Sentiment analysis
- Part-of-speech tagging
- Compositional knowledge entailment (entailment, contradiction, neutral)
- Relation extraction and parsing
- Machine translation
- ...
Applications that participants can address in their projects include but are not limited to:
- Plagiarism and paraphrase detection
- Social media analysis
- Fake news identification and classification
- Spell checking
- Detection of political opinions
- Identification of opinion polarity
- Online harassment and bias identification systems
- Collaboration network analysis
|
Prerequisites |
The course is taught in English. Basic knowledge of Python (e.g., branches, loops, object orientation) is required to complete the course. Experience with NumPy, scikit-learn, pandas, and other libraries in the SciPy ecosystem is beneficial but not mandatory. For participants who are unfamiliar with Python, a fast-paced introduction to the essentials of the language will be provided.
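As a rough, unofficial gauge of the Python constructs named above (branches, loops, object orientation), the following sketch counts word bigrams using only the standard library. The class name `BigramCounter` and the sample sentences are hypothetical and chosen for illustration; they do not come from the course.

```python
from collections import Counter


class BigramCounter:
    """Hypothetical helper: counts word bigrams in whitespace-tokenized text."""

    def __init__(self):
        self.counts = Counter()

    def add(self, sentence):
        tokens = sentence.lower().split()
        # Loop over adjacent token pairs.
        for left, right in zip(tokens, tokens[1:]):
            self.counts[(left, right)] += 1

    def most_common(self, n=1):
        # Branch: nothing to report before any text was added.
        if not self.counts:
            return []
        return self.counts.most_common(n)


counter = BigramCounter()
for line in ["the cat sat", "the cat ran"]:
    counter.add(line)
```

Participants who can read and modify a snippet of this kind should be comfortable with the constructs the prerequisites mention.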
|