Call for Shared Task Proposals

Piroska Lendvai¹, Uwe Reichel², Simone Rebora¹ and Moniek Kuijpers¹

Ranking of Social Reading Reviews Based on Richness in Narrative Absorption.

Abstract
Book reviews on social platforms are generated in large quantities by non-specialist avid readers, and contain subjective evaluations pertaining to one’s own reading experience. Social reading reviews often feature an under-researched phenomenon: Narrative Absorption, i.e. the extent to which immersion into the book’s narrative took place during reading. Absorption can be reflected by statements such as ’I was completely hooked’ and pertain to a complexity of dimensions such as attention, emotional engagement, mental imagery, and transportation. Based on a set of user-generated reviews that we manually annotated (cf. Rebora et al. 2020), the detection of reading absorption with NLP approaches has been investigated in e.g. Lendvai, Rebora and Kuijpers (2019), Lendvai et al. (2020).

We work on a pipeline to retrieve and rank absorption-rich user reviews from a large, unlabeled document dump (6+ million reviews in English), in order to allow for the preselection of subsets of the dump that undergo manual annotation. We fine-tuned BERT (Devlin et al., 2018) for a supervised absorption detection task on 16k review sentences absorption-annotated by us (Absorption vs. Nonabsorption), and evaluated it on a held-out dataset of 149 reviews, achieving .75 macro F1 mean (support: 1,011 vs. 3,510 sentences).

Our current focus is to create a model that aggregates sentence-level prediction scores at the document level. To this end, BERT’s sentence-level absorption probabilities were averaged per review and used to train a linear regression model on the full corpus to predict Absorption Richness, defined as the proportion of sentences in a review annotated as expressing absorption. Review-level Absorption Richness regression lowers the prediction error relative to the baseline, defined as the review-level proportion of absorption classifications obtained by taking the argmax of BERT’s logits (mean absolute error of .08 vs. .11 and Spearman correlation of .73 vs. .65, respectively).
The increase of the Spearman’s rank correlation coefficient directly expresses that a review ranking by linear regression predictions corresponds more closely to the ground truth ranking than a ranking solely based on BERT. We utilize the regression model in Absorption-Richness-based document filtering, to facilitate the benchmarking and analysis of social reading reviews in our large document dump.
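The aggregation and evaluation steps described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the toy sentence probabilities and labels are invented, the regression is fitted and scored on the same handful of reviews, and all function names are hypothetical. For a two-class model, taking the argmax of the logits is equivalent to thresholding the absorption probability at 0.5, which is how the baseline is approximated here.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def mae(pred, gold):
    """Mean absolute error between predictions and gold values."""
    return mean(abs(p - g) for p, g in zip(pred, gold))

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Hypothetical toy data: (sentence-level absorption probabilities, gold labels).
reviews = [
    ([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]),
    ([0.4, 0.45, 0.3], [0, 1, 0]),
    ([0.1, 0.2, 0.05, 0.15, 0.3], [0, 0, 0, 0, 0]),
    ([0.95, 0.7, 0.6], [1, 1, 1]),
    ([0.55, 0.2, 0.35, 0.6], [1, 0, 0, 0]),
]

mean_prob = [mean(probs) for probs, _ in reviews]   # regression input
richness = [mean(gold) for _, gold in reviews]      # Absorption Richness (target)
baseline = [mean(1 if p > 0.5 else 0 for p in probs) for probs, _ in reviews]

slope, intercept = fit_line(mean_prob, richness)
regression = [slope * x + intercept for x in mean_prob]

print("MAE      regression vs. baseline:", mae(regression, richness), mae(baseline, richness))
print("Spearman regression vs. baseline:", spearman(regression, richness), spearman(baseline, richness))
```

Sorting reviews by the `regression` values in descending order would then give an absorption-richness ranking of the kind used for document filtering.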

Organization:

¹DH Lab, University of Basel

²Research Institute for Linguistics, Hungarian Academy of Sciences

Dina Wiemann¹, Khan Ozol¹, Natalia Korchagina², Claudio Bonesana³, Anastassia Shaitarova², Fabio Rinaldi³

Named entity recognition for job description mining.

Abstract
In a collaborative project with a major pharma company we explored named entity recognition (NER) strategies applied to job/resume mining tasks. In the project we leveraged advanced NER approaches in order to identify job titles, organization names, and geographical locations, which are essential to job mining tasks such as recruiting, tracking job candidates and job recommendation. This process is currently based on the manual analysis of hundreds of CVs, often with no relevance for a specific position or a profile.
Despite the existence of many commercial providers of similar services, there are no publicly available datasets to evaluate the advertised algorithms. Existing pre-trained NER models, such as the spaCy and Stanford NER models, were trained on blogs, news and media. Their performance drops significantly when applied to sentences taken from resumes, since titles, locations and organization names in a resume are often written in the manner of a heading.
We asked domain experts to manually annotate a reference dataset of free-text job title description extracted from CVs, used it to train a deep-learning model, and compared the results against the reference models mentioned above. We were able to outperform both pre-trained models by a significant margin. Our NER models have been integrated in a prototype system which demonstrates a more dynamic and flexible data analysis compared to baseline commercial solutions.

Organization:
¹Novartis AG, Basel
²University of Zurich
³IDSIA, Dalle Molle Institute for Artificial Intelligence, Lugano

Iuliia Nigmatulina, Tannon Kew, Tanja Samardžić

Swiss German speech-to-text with Kaldi

Abstract

Recent improvements in speech technology enable its increasing use in a range of applications, including chatbots, online speech translation and smart home devices, among others. While speech technology already achieves strong results for standardised languages, for languages without orthography, with high regional variation and limited training resources, such as Swiss German, it remains a considerable challenge. A high degree of dialectal variability combined with a lack of standardisation leads to extremely sparse data that decreases the quality of alignments between the acoustic signal and its labels and, therefore, the final accuracy.

To tackle the challenge of speech-to-text for Swiss German, we built a speech recognition system using an adapted Kaldi toolkit recipe on multi-dialectal speech data from the ArchiMob corpus. The system was separately trained on two types of writing in the target texts: a) an approximate acoustic transcription that provides a close correspondence between labels and the acoustic signal and b) a normalised writing that potentially reduces the lexical variability. We find that the system trained on the normalised transcriptions currently achieves better results in word error rate (40.81% vs. 54.39%) but underperforms the system trained on the acoustic transcriptions on the character level (character error rate) (23.19% vs. 22.19%). We investigate possible improvements of both approaches and present the outcomes.
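The two metrics compared in this abstract, word error rate and character error rate, are both edit distances normalised by the length of the reference; they differ only in the unit being compared. The following minimal sketch shows the computation. The example sentence pair is invented, and this is not the scoring code used with Kaldi or in the experiments.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def error_rate(ref_units, hyp_units):
    """Edit distance divided by the number of reference units."""
    return edit_distance(ref_units, hyp_units) / len(ref_units)

def wer(reference, hypothesis):
    """Word error rate: units are whitespace-separated tokens."""
    return error_rate(reference.split(), hypothesis.split())

def cer(reference, hypothesis):
    """Character error rate: units are single characters, spaces included."""
    return error_rate(list(reference), list(hypothesis))

# Hypothetical example: one missing character makes one whole word wrong.
reference = "ich gehe heute nach hause"
hypothesis = "ich gehe heut nach hause"
print("WER:", wer(reference, hypothesis))
print("CER:", cer(reference, hypothesis))
```

Because a single wrong character counts as a full word error, a system can score comparatively well on the character level and still poorly on the word level, which is one way the diverging WER and CER figures above can arise.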

Organization:
University of Zurich

Anastassia Shaitarova, Lenz Furrer, Fabio Rinaldi

Cross-lingual transfer-learning approach to negation scope resolution.

Detecting instances of negation in text is crucially important for several applications, yet it is often neglected. Several decades of research in automated negation detection have not yet provided a reliable solution, especially in a multilingual context. Negation scope resolution poses particular challenges since identifying the scope of influence of a negation cue in a sentence requires a deeper level of natural language understanding. Little work has been done on negation scope resolution in languages other than English. Meanwhile, transfer learning is in wide use and large multilingual models are available to the public. This paper explores the feasibility of a cross-lingual transfer-learning approach to negation scope resolution. Preliminary experiments with the Multilingual BERT model and data in English, French, and Spanish show solid results with the highest F1-score 84.73 on zero-shot transfer between English and French.

Organization:
University of Zurich

Roberto Navigli

What’s new in multilingual sense embeddings, Word Sense Disambiguation and Semantic Role Labeling

Abstract
Natural Language Processing has seen an explosion of interest in recent years, with many industrial applications relying on key technological developments in the field. However, Natural Language Understanding (NLU) – which requires the machine to get beyond processing strings and involves a deep, semantic level – is particularly challenging due to the pervasive ambiguity of language.
In this talk I will present recent research at the Sapienza NLP group on multilingual NLU, including work on new multilingual sense embeddings, and novel neural approaches to word sense disambiguation and semantic role labeling which scale across languages easily and achieve state-of-the-art performance thanks to the integration of deep learning and explicit knowledge.

Organization:
Sapienza University of Rome

Anya Belz

How Long is a Piece of String: 13 Years of Comparative Evaluation in Natural Language Generation

Abstract
The field of Natural Language Generation (NLG) was a late adopter of the paradigm of comparative and competitive evaluation, not organising its first shared-task competition until 2007. Since then there has been a steady stream of new shared tasks and evaluation studies. But are we in 2020 where we want to be in terms of our knowledge and practice of evaluating automatically generated language? In this talk, I will present an overview of developments in NLG evaluation over the past 13 years, focusing on what aspects of system quality have been prioritised and what evaluation methods have been used, before picking out some topics that have received less attention, and describing some of the exciting developments currently underway, including a multi-lab reproducibility study of NLG evaluation results, and a survey of 20 years’ worth of evaluations in NLG.

Organization:
University of Brighton

Daniel Niklaus

Become a Compliance Hero with automated text analyses.

Abstract
Since the banking crisis of 2007/2008, everything has changed for Compliance & Risk management. Once little more than a pro forma role in the business model of the tax havens Switzerland and Liechtenstein, managing Compliance & Risk has become one of the most critical tasks in banking. From uncovering money-laundering networks to handling Politically Exposed Persons (PEP) the right way, Compliance Managers are challenged and often lost in tons of data across the organization.
In this talk, you will learn how text analysis of unstructured data helps to build the foundation for a graph network of relations to detect suspicious connections and patterns. We will give you insights into the tools we are using, the challenges we face, and how this has changed the work of compliance managers – and why they are today on the path to becoming the heroes of banking.

Organization:
Netlive IT AG, Teufen

Alena Schmickl, Thomas Bögel, Andrea Giovannini and Matthias Reumann.

Combining NLP and domain knowledge induction into knowledge graphs to automate medical coding.

Abstract
Medical coding is a time- and cost-intensive process that constitutes a growing pain point for hospitals due to the large amount of manual work and the increasing shortage of skilled staff, known as coders. A solution that (semi-)automates the processing of cases would improve productivity and, through more accurate coding results, mitigate revenue losses.
We are presenting an application that identifies medical concepts in medical documents and maps them to the corresponding ICD-10 and CHOP codes. The solution leverages a knowledge graph that contains information on coded patient cases based on which the identified codes are evaluated regarding their relevance for the determination of a DRG. This is crucial in the context of training NLP models with data for CHOP codes that are country-specific. The application also proposes other codes including the DRG that were coded for similar patient cases based on the graph. Thereby, concepts wrongfully missing in the documentation or missed by the text analysis can be determined. The option to export selected codes in an XML format enables the integration of the application with a hospital information system.
With our work, we aim at reducing the research gap regarding the digitization of the medical coding process by contributing to the work on automated determination of input codes for the DRG Grouper. We will demonstrate the application functionality, the underlying concepts and the structure of the knowledge graph.
Intended Audience: Our presentation addresses everyone with an interest or background in NLP and Knowledge Graphs.

Organization:
IBM

Manuel Sage ¹, Pietro Cruciata², Raed Abdo¹, Jackie Cheung¹ and Fiona Zhao ¹.

Investigating the Influence of Selected Linguistic Features on Authorship Attribution using German News Articles.

Abstract
In this work, we perform authorship attribution on a new dataset of German news articles. We seek to attribute over 3,700 articles to their five corresponding authors, using four conventional machine learning approaches (naive Bayes, logistic regression, SVM and kNN) and a convolutional neural network. We analyze the effect of character and word n-grams on the prediction accuracy, as well as the influence of stop words, punctuation, numbers, and lowercasing when preprocessing raw text. The experiments show that higher-order character n-grams (n = 5, 6) perform better than lower orders and that word n-grams slightly outperform character n-grams. Combining both in fusion models further improves results up to 92% for SVM. A multilayer convolutional structure allows the CNN to achieve 90.5% accuracy. We found stop words and punctuation to be important features for author identification; removing them leads to a measurable decrease in performance. Finally, we evaluate the topic dependency of the algorithms by gradually replacing named entities, nouns, verbs and eventually all tokens in the dataset according to their POS tags.
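To make the feature types in this abstract concrete, the sketch below extracts character and word n-gram counts and merges them into one feature space. It is an illustration only: the abstract does not say how the fusion models combine the two feature types, so the simple concatenation here is an assumption, and the function names and default orders are hypothetical.

```python
from collections import Counter

def char_ngrams(text, n):
    """Counts of all overlapping character n-grams (punctuation and spaces kept)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def word_ngrams(text, n):
    """Counts of all overlapping word n-grams over whitespace tokens."""
    tokens = text.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def fused_features(text, char_orders=(5, 6), word_orders=(1, 2), lowercase=False):
    """Assumed fusion: character and word n-gram counts side by side in one vector.

    Keys are tagged with their type and order so the two spaces never collide.
    """
    if lowercase:
        text = text.lower()
    features = Counter()
    for n in char_orders:
        for gram, count in char_ngrams(text, n).items():
            features[("char", n, gram)] += count
    for n in word_orders:
        for gram, count in word_ngrams(text, n).items():
            features[("word", n, gram)] += count
    return features

# Hypothetical example sentence; stop words and punctuation are deliberately kept.
sample = "Der Minister sagte, er wolle die Reform nicht stoppen."
features = fused_features(sample)
print(len(features), "distinct features")
print(features.most_common(3))
```

A sparse count vector of this kind could then be passed to any of the conventional classifiers named above, for example after TF-IDF weighting.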

Organization:
¹ McGill University, Montreal
² Polytechnic de Montreal

Matthias Aßenmacher and Christian Heumann.

On the comparability of pre-trained language models.

Abstract
Recent developments in unsupervised representation learning have successfully established the concept of transfer learning in NLP. Instead of simply plugging in static pre-trained representations, end-to-end trainable model architectures are making better use of contextual information through more intelligently designed language modelling objectives. Along with this, larger corpora are used for self-supervised pre-training of models which are afterwards fine-tuned on supervised tasks.
Advances in parallel computing have made it possible to train these models, with their growing capacities, in the same or even less time than previously established models. These developments culminate in new state-of-the-art results being reported with increasing frequency.
Nevertheless, we show that it is not possible to completely disentangle the contributions of the three driving forces to these improvements.
We provide a concise overview of several large pre-trained language models, which achieved state-of-the-art results on different leaderboards in the last two years, and compare them with respect to their use of new architectures and resources. We clarify where the models differ and attempt to gain some insight into the individual contributions of lexical/computational improvements as well as those of architectural changes.
We do not intend to quantify these contributions, but rather see our work as an overview in order to identify potential starting points for benchmark comparisons.

Organization:
Ludwig-Maximilians-Universität München

Paula Reichenberg ¹, Artūrs Vasiļevskis ² and Manuel Herranz ³

The “Multilingual Anonymisation Toolkit for Public Administrations” (MAPA) Project.

Abstract
The European Union’s new ‘Open Data Directive’ aims to stimulate the publishing and sharing of dynamic data by public administrations, thus furthering the development of language technologies, NLP research and translation. However, such data sharing is only possible subject to compliance with the General Data Protection Regulation (GDPR).
For this reason, the European Commission has commissioned the development of a multilingual anonymization toolkit for public administrations.
Pangeanic and Tilde, together with CNRS (www.cnrs.fr), ELDA (www.elra.info/en), the University of Malta (www.um.edu.mt), Vicomtech (www.vicomtech.org) and SEAD (Spanish Agency for Digital Advancement), have been awarded EU funds to develop such an open-source toolkit for all EU languages, able to detect and de-identify personal data (names, addresses, emails, credit card and bank account numbers, etc.). The anonymisation toolkit is based on Named-Entity Recognition (NER) techniques, using neural network approaches. Pre-trained models such as BERT (Devlin et al., 2018) and preprocessing of text using regular expressions are included. The toolkit will provide support to EU public administrations in complying with GDPR requirements, in particular in the health and legal fields.
In this short presentation, Manuel Herranz, CEO of Pangeanic, and Artūrs Vasiļevskis, Head of Machine Translation Solutions at Tilde, will discuss the challenges of the MAPA project, their strategy, the results reached so far and the perspective it opens for public administrations and the industry.
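As a rough illustration of the regular-expression preprocessing mentioned in this abstract, the sketch below replaces email addresses and card-like digit sequences with placeholder tags. The patterns, the tag names and the `deidentify` function are assumptions made for this example and are not taken from the MAPA toolkit. Patterns this simple will both miss real identifiers and flag harmless digit runs, which is one reason such rules are combined with NER models.

```python
import re

# Assumed, deliberately simple patterns; real deployments need far more care.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

def deidentify(text):
    """Replace matches of each pattern with a placeholder tag."""
    text = EMAIL.sub("[EMAIL]", text)
    text = CARD.sub("[CARD]", text)
    return text

# Hypothetical input containing invented personal data.
sample = "Contact anna@example.org, card 4111 1111 1111 1111."
print(deidentify(sample))
```

Names and addresses have no fixed surface form, so they are the part of the task left to the neural NER component.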

Organization:
¹ Hieronymus AG, Zurich
² SIA Tilde
³ Pangea MT, València

Jingyuan Feng¹, Özge Sevgili², Steffen Remus², Eugen Ruppert², Chris Biemann²

Supervised Pun Detection and Location with Feature Engineering and Logistic Regression.

Abstract
Puns, by exploiting ambiguities, are commonly used in literature to achieve a humorous or rhetorical effect. Previous approaches mainly focus on machine learning models or rule-based methods; however, they have not addressed how and why a pun is detected or located. Focusing on this, this paper proposes a system that recognizes and locates English puns and generates a solution accordingly. Given the limited training data and the aim of measuring how relevant a predictor is and the direction of its association, we compile a dataset, use logistic regression, and explore several different techniques as features to assess their influence. Our system outperforms state-of-the-art systems by 3% to 10%, depending on the type of pun and the subtask.

Organization:
¹Technische Universität Hamburg
²Universität Hamburg

Holger Keibel¹, Elisabeth Maier¹, Tobias Christen²

Showcase: Language analytics and semantic search for unknown document varieties.

Abstract
HIBU is a proprietary solution platform based on which Karakun (Basel) builds customer solutions around Enterprise Search and Text Analytics. In this talk, we present a solution by DSwiss (Zürich): high-security digital safes which allow users to store, exchange, but also search any type of documents and other security-relevant data.
The focus will be on the text analytics aspects of the solution developed with HIBU. Since the uploaded data can contain any sort of content, the solution helps users organize their data in two ways: by a hierarchical folder structure and by means of facets (search filters). Some of the default facets are derived from structured metadata such as file format or date, while others are populated dynamically by semantic taggers and classifiers, e.g. semantic document type, or persons and locations mentioned in the document. These filters in particular have proven very useful for document and data retrieval.
We touch on the challenges of analyzing and indexing documents in a highly secure, multiple-encrypted environment and will then discuss joint ongoing work to support the individual needs of users even better: (1) use state-of-the-art neural network architectures to classify and extract more types of information from documents to provide a broader range of filters; (2) personalize the trained models that create the search filters; and (3) add a workflow engine with text-based triggers (e.g. proposing a specific folder when uploading a document).

Organization:
¹Karakun AG, Basel
²DSwiss AG, Zurich

Péter Jeszenszky¹, Burcu Demiray², Carina Steiner¹, Adrian Leemann¹

Towards a regionally representative and socio-demographically diverse resource of Swiss German.

Abstract
When it comes to representing its vast regional diversity, Swiss German is under-resourced for text-to-speech and speech-to-text tasks. Our database enriches existing resources by representing low resource regional varieties and by matching dialect variation to diverse socio-demographic backgrounds.
We plan to compile a database based on two projects. The SDATS [1] (Swiss German Dialects Across Time and Space) project, focusing on language variation and change, collects about 2000 hours of records from 125 survey sites (8 speakers/locality). Local dialects of the respondents, women and men of two age groups, with different professional backgrounds, are recorded. The structured interviews involve prompting certain words and phrases, reading a text previously translated from Standard German to their local dialect, semi-structured speech and spontaneous general interaction with the interviewer. The records come with rich background information (mobility, social networks, personality, attitude etc.), which helps estimate the likelihood of people with a certain background using a certain dialectal variety.
EAR [2] data contains non-intrusive records of spontaneous speech from healthy older individuals, mainly including everyday interactions in Swiss German. We invite EAR participants for SDATS interviews, making it possible to match linguistic variables across the spontaneous EAR records and the structured and spontaneous parts of the SDATS interview with the same person.
We plan to transcribe the data phonetically in an automated fashion and to align the results with Standard German. At the conference, we plan to present the roadmap of data collection, cleaning, matching and analysis, our first results along with sound samples, and future uses of the database.
[1] www.sdats.ch
[2] Luo, M., Schneider, G., Martin, M., & Demiray, B. (2019). Cognitive Aging Effects on Language Use in Real-Life Contexts: A Naturalistic Observation Study.

Organization:
¹Center for the Study of Language and Society, University of Bern
²Department of Psychology, University of Zurich

Albert Weichselbraun, Christian Hauser, Sandro Hörler, Anina Havelka

Deep learning and visual tools for analyzing and monitoring integrity risks.

Abstract
Risks jeopardizing the integrity of an organization are widespread. According to a 2018 study by PricewaterhouseCoopers, almost 40% of Swiss companies have been affected by illegal and unethical behavior, such as embezzlement, cybercrime, corruption, fraud, money laundering and anti-competitive agreements. Although the number of cases within Switzerland is relatively low, the financial impact of these incidents is still above the global average.
The University of Applied Sciences of the Grisons conducts research that applies web intelligence and deep learning to the task of supporting Swiss companies in identifying and mitigating integrity risks. Historical data is used for training an LSTM classifier to recognize national and international media coverage on corruption. Afterwards, we apply transfer learning techniques to the task of adapting the classifier to a wide range of integrity topics such as human rights, labor conditions and sustainability.
The adapted classifier assigns scores to news articles that indicate their relevance to the topic of integrity. Sophisticated visual tools use the annotated documents for (i) tracking and visualizing past integrity management gaps and their respective impacts, (ii) identifying whether organizations have been mentioned positively or negatively in these events, and (iii) leveraging media coverage on upcoming integrity stories for predicting and discovering existing blind spots within a company’s governance.

Organization:
University of Applied Sciences of the Grisons

Manuela Weibel, Muriel Peter

Compiling a Large Swiss German Dialect Corpus

Abstract
The Swiss German Dialect Corpus (Schweizer Mundartkorpus CHMK) is an initiative launched by the Swiss German dictionary Schweizerisches Idiotikon. It is an unbalanced, opportunistic corpus and the largest dialect corpus for Swiss German to date. The corpus will be accessible through a query engine and, in part, as an open-source XML corpus. In this paper we provide an overview of the concept, workflow, and challenges of compiling a corpus for a non-standard linguistic variety.

Organization:
Schweizerisches Idiotikon, Zurich

Branden Chan¹, Stefan Schweter², Timo Möller¹

Exploring German BERT model pre-training from scratch.

Abstract
In this work we provide interesting insights into BERT model pre-training from scratch for German. We experiment with different corpora and subword masking techniques.
The two currently available BERT models for German (from Deepset and DBMDZ) were trained on similar amounts of data (16GB). With the availability of larger corpora such as the OSCAR corpus, which has an uncompressed size of 145GB for German, and the recently introduced whole word masking technique, which is applied in the preprocessing step, we try to answer three research questions: a) does training BERT models on larger corpora significantly improve performance on downstream tasks, b) does using the whole word masking technique improve or harm performance, and c) is language model pre-training loss a reliable predictor of downstream performance?
To answer these questions, we perform an extensive evaluation of our models over the course of pre-training on various downstream tasks such as GermEval 2018 (Fine and Coarse) or GermEval 2014. All trained models will be made publicly available to the research community.
Branden Chan is a Stanford graduate in computational linguistics. He now works for deepset.ai as a machine learning engineer bringing the latest NLP techniques to the industry. He is part of the team that open sourced German BERT and a regular contributor to the transfer learning framework FARM. Currently he is experimenting with German language model pre-training with a range of different architectures.
The intended audience ranges from researchers to developers. Researchers might be interested in our detailed evaluation. Developers might be interested in the integration of our models into the Hugging Face Transformers library and Deepset’s FARM.
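The whole word masking technique referred to in this abstract can be illustrated with a short sketch. It assumes WordPiece-style tokens in which continuation pieces start with `##`, and it leaves out details of the original BERT recipe, such as the 80/10/10 replacement scheme and the fixed masking budget per sequence. The function names, the example tokenisation and the default probability are hypothetical and do not come from the authors' preprocessing code.

```python
import random

def group_whole_words(tokens):
    """Group token indices so that each '##' continuation piece joins the preceding word."""
    groups = []
    for index, token in enumerate(tokens):
        if token.startswith("##") and groups:
            groups[-1].append(index)
        else:
            groups.append([index])
    return groups

def whole_word_mask(tokens, mask_prob=0.15, seed=0, mask_token="[MASK]"):
    """Mask whole words: if a word is selected, all of its pieces are masked together."""
    rng = random.Random(seed)
    masked = list(tokens)
    for group in group_whole_words(tokens):
        if rng.random() < mask_prob:
            for index in group:
                masked[index] = mask_token
    return masked

# Hypothetical WordPiece split of a German sentence with a long compound.
tokens = ["Das", "Donau", "##dampf", "##schiff", "fährt", "heute", "nicht", "."]
print(group_whole_words(tokens))
print(whole_word_mask(tokens, mask_prob=0.5, seed=1))
```

With plain subword masking, a single piece such as `##dampf` could be hidden while its neighbours stay visible, which makes the prediction comparatively easy; masking the whole compound removes that shortcut.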

Organization:
¹deepset, Berlin
²Bayerische Staatsbibliothek München, Digital Library/Munich Digitization Center

Annelen Brunner¹, Ngoc Duyen Tanja Tu¹, Lukas Weimer², Fotis Jannidis²

To BERT or not to BERT — Comparing contextual embeddings in a deep learning architecture for the automatic recognition of four types of speech, thought and writing representation.

Abstract
We present recognizers for four very different types of speech, thought and writing representation (STWR) for German texts. The implementation is based on deep learning with two different contextual embeddings, namely FLAIR embeddings and BERT embeddings. This paper gives an evaluation of our recognizers with a particular focus on the differences in performance we observed between those two embeddings. FLAIR performed best for direct STWR (F1=0.85), BERT for indirect (F1=0.76), reported (F1=0.60) and free indirect (F1=0.59) STWR. Our best recognizers and customized embeddings will be made freely available.

Organization:
¹Institute for the German Language, Leibniz
²Julius-Maximilians-Universität Würzburg

Manuela Hürlimann¹, Malgorzata Anna Ulasik¹, Philippe Schläpfer¹, Fernando Benites de Azevedo E Souza¹, Katsiaryna Mlynchyk¹, Pius von Däniken¹, Flurin Gishamer², Lina Scarborough², Olesya Ogorodnikova³, Tracey Etheridge³, Nitin Kumar, Badrudin Stanicki³, Mark Cieliebak

Speech-to-Text Insights

Abstract
Generating high quality transcripts from spoken dialogues (e.g. meetings or interviews) is not a trivial task. Many different Automatic Speech Recognition (ASR) engines exist, both commercial and open source. Two key tasks need to be solved: partitioning the speech according to the different speakers (Diarization), and recognizing what is being said (Speech-to-Text).
The quality of the resulting transcript and its usability are influenced by many different factors. In this talk we are going to present multiple insights and techniques which can improve the output quality of ASR.
We will address topics such as:
– the recording setting, e.g. which microphone setup is going to give the best results?
– audio preprocessing steps, e.g. how can we remove noise and enhance the audio for better output quality?
– error analysis, e.g. what are typical errors? How can we measure only semantically meaningful errors?
– post-processing, e.g. how can we make transcribed spontaneous speech more legible?
– confidence scoring, e.g. how can we create more reliable confidence scores for the STT and diarization output?
The main goal of our contribution is to present best-practice approaches which can improve both the diarization as well as the transcription quality. Our insights are based on extensive research and experiments, including an evaluation of 10 STT engines and error analysis of more than 70 hours of transcribed speech in German and English.

Organization:
¹ZHAW Zurich University of Applied Sciences
²SpinningBytes, Winterthur
³Propulsion Academy, Zurich

Daniele Puccinelli¹, Sandra Mitrovic¹, Denis Broggini¹, Giancarlo Corti¹, Luca Chiarabini¹, Riccardo Mazza¹, Fabio Rinaldi¹, Andrea Laus²

Enabling conversational-based leadership training through advanced natural language understanding.

Abstract
SkillGym (www.skillgym.com) is a computer-based training system that enables in-role and prospective leaders to develop their communication skills by presenting them with realistic simulations of workplace situations. SkillGym walks the end user through a sequence of videos related to a specific management situation by showing a rich set of alternatives as text boxes. SkillGym also provides extensive feedback, which enables users to review a conversation step by step, and learn the implications of their behavior at each step.
Feedback from SkillGym users praises its engaging training environment. To make simulations even more realistic, our goal is to move from the existing point-and-click interface to a voice-based interface. Achieving this goal requires cutting-edge natural language understanding to interpret the user input in the context of the ongoing flow of the simulated interaction. Our proposed solution is to carry out feature extraction based on the output of a commodity speech-to-text engine so that a dialog state tracker can select the next video based on the user input. Notably, the user must be guided through textual hints to ensure that she provides input that is coherent with the training goals of SkillGym. Moreover, the dialog state tracker must handle all situations where the user input is not aligned with the training goals (e.g. off-topic comments, disambiguation).

Organization:
¹University of Applied Sciences of Southern Switzerland
²Lifelike SA, Chiasso

Matthias Sommer

Twitter Sentiment Analysis in MATLAB.

Abstract
Social media is an integral part of today’s digital life. From reputation management to asset selection models, information from social media can significantly enhance data analytics. In this demo, we will use text analytics in MATLAB to perform sentiment analysis using machine learning and deep learning algorithms, build a web application that uses the model in real time, and share it with others. The demo highlights:
• Accessing the Twitter API from MATLAB to gather tweets containing specific ticker symbols,
• Preprocessing texts to remove unnecessary information,
• Building sentiment analysis models using machine and deep learning algorithms,
• Testing the predictive capability of sentiment scores by comparing them with stock prices, and
• Building and sharing a web app for real-time sentiment analysis.
Intended audience:
• Researchers and developers analyzing customer surveys or reviews on social media to identify pain points and gaps in product/process design
• Financial analysts building asset selection models and predicting financial market trends.

Organization:
MathWorks, Bern

Mascha Kurpicz-Briki

Cultural Differences in Bias? Origin and Gender Bias in Pre-Trained German and French Word Embeddings.

Abstract
Smart applications often rely on training data in the form of text. If there is a bias in that training data, the decisions of the applications might not be fair.
Common training data has been shown to be biased towards different groups of minorities. However, there is no generic algorithm to determine the fairness of training data.
One existing approach is to measure gender bias using word embeddings. Most research in this field has been dedicated to the English language. In this work, we identified that there is a bias towards gender and origin in both German and French word embeddings. In particular, we found that real-world bias and stereotypes from the 18th century are still included in today’s word embeddings. Furthermore, we show that the gender bias in German has a different form from English, and there are indications that bias has cultural differences that need to be considered when analyzing texts and word embeddings in different languages.

Organization:
Berner Fachhochschule, Technik und Informatik

Karolina Zaczynska, Nils Feldhus, Robert Schwarzenberg, Aleksandra Gabryszak, Sebastian Möller

Evaluating German Transformer Language Models with Syntactic Agreement Tests.

Abstract
Pre-trained transformer language models (TLMs) have recently refashioned natural language processing (NLP): Most state-of-the-art NLP models now operate on top of TLMs to benefit from contextualization and knowledge induction. To explain their success, the scientific community conducted numerous analyses. Besides other methods, syntactic agreement tests were utilized to analyse TLMs. Most of the studies were conducted in the English language domain, however. In this work, we analyse TLMs in the German language domain. To this end, we design numerous agreement tasks, some of which consider peculiarities of the German language. Our experimental results show that state-of-the-art German TLMs generally perform well on agreement tasks, but we also identify and discuss syntactic structures that push them to their limits.

Organization:
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Kaiserslautern, Germany

Joseph Cornelius¹, Tilia Ellendorff¹, Nico Colic¹, Lenz Furrer¹, Albert Weichselbraun¹, Raul Rodriguez-Esteban², Philipp Kuntschik¹, Mathias Leddin², Juergen Gottowik², Fabio Rinaldi¹

MedMon: multilingual social media mining for disease monitoring.

Abstract
The MedMon project (“Monitoring of internet resources for pharmaceutical research and development”) is a collaborative InnoSuisse project between the University of Zurich, the University of Applied Sciences of the Grisons, and Roche. The project aims to monitor different social platforms on the internet (e.g., Twitter, Reddit, and medical forums) to assess patients’ perception of their specific disease burden and to discover unmet medical needs.
Bringing together different sources of micro-posts for disease monitoring has the advantage of ensuring a complete picture by integrating information from all source-types. However, all monitored source-types are inherently different, each posing their own challenges for computational processing.
We discuss specific characteristics, advantages and disadvantages of each source-type in the context of automatic medical monitoring. Using the sub-task of personal health mention recognition as an example, we showcase how we addressed these challenges in practice.
Our results give further insights on how to optimally benefit from these multilingual resources and how to integrate them into an efficient model which can be applied in the context of different disease patterns.
Additionally, in the context of this project the academic partners participated in an international challenge about social media mining for health, achieving top results in two tasks, using deep-learning BERT-based models. Specific methods and results will be presented.

Organization:
¹University of Zurich
²Roche, Basel

Ela Pustulka-Hunt, Thomas Hanne, Phillip Gachnang, Pasquale Biafora

FLIE: Form Labelling for Information Extraction.

Abstract
Information extraction from forms is a challenging topic with high practical relevance, in particular for the insurance industry in Switzerland. We have gathered over 20’000 anonymized insurance policies and related documents in German, French, English and Italian and are developing an automated method for information extraction.
Given a user-provided schema, expressed as a list of attributes to be found in an insurance policy, we extract all the relevant information and map it to the attributes. To do that, we first extract the text from PDF and generate the bounding boxes as a CSV. We reconstruct a page, group the text boxes into horizontal groups and columns within groups and annotate the geometry. A small number of policies coming from various insurers and representing various policy types is annotated manually by the user with the desired attribute names, to show us how to map them. Machine learning is used to propagate this annotation in two steps: first, text is tagged as being metadata or data, and in the second step, attribute names are mapped to the extracted text. The accuracy of the first step is now at 88%, based on a corpus of 24 annotated policies. Ongoing work is improving the second step of the mapping. Data extraction uses those annotations to produce the required data for the user.
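The grouping of text boxes into horizontal groups can be illustrated with a minimal sketch. It is not the FLIE implementation: the `Box` class, the `group_into_rows` function, the `tolerance` value and the sample coordinates are all hypothetical, and the rule used here (boxes belong to the same row if their vertical centres are close) is only one plausible way to do the grouping.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A text box with its bounding-box coordinates (hypothetical layout)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def group_into_rows(boxes, tolerance=3.0):
    """Group boxes whose vertical centre lies within `tolerance` of the
    centre of the first box in the current row; sort each row left to right."""
    rows = []
    for box in sorted(boxes, key=lambda b: ((b.y0 + b.y1) / 2, b.x0)):
        centre = (box.y0 + box.y1) / 2
        if rows and abs(centre - rows[-1]["centre"]) <= tolerance:
            rows[-1]["boxes"].append(box)
        else:
            rows.append({"centre": centre, "boxes": [box]})
    return [sorted(row["boxes"], key=lambda b: b.x0) for row in rows]


# Invented sample data: two label/value pairs on two lines of a page.
boxes = [
    Box("Policy no.", 10, 100, 60, 110),
    Box("12345", 80, 101, 120, 111),
    Box("Premium", 10, 130, 55, 140),
    Box("CHF 500", 80, 129, 130, 139),
]
rows = group_into_rows(boxes)
row_texts = [[b.text for b in row] for row in rows]
```

In this toy layout each row pairs a label with its value, which is the kind of structure the metadata/data tagging step could then operate on.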

Organization:
FHNW University of Applied Sciences Northwestern Switzerland

Ahmad Aghaebrahimian, Mark Cieliebak

Company Name Disambiguation.

Abstract
Company Name Disambiguation (CND) is a form of Named Entity Disambiguation where different textual representations of a company name are linked to its formal name. For instance, the company ‘ArcelorMittal SA’ is often referred to as ‘Arcelor Mittal Group’, ‘Mittal Steel’, or simply ‘Mittal Co.’. The task of mapping these surface forms to the same company formal name is known as CND or in a more general term, Named Entity Disambiguation (NED). NED is a crucial task in many Natural Language Processing applications such as entity linking, record linkage, knowledge base construction, or relation extraction, to name a few. It has been shown that parameter-less models for NED do not generalize to other domains very well. On the other hand, parametric learning models do not scale well with a large number of candidate names which is often the case for CND since the number of company formal names usually exceeds hundreds of thousands of instances. Yet another challenge is multilingual NED; while company formal names are often in English, texts and company mentions are in another language which makes string matching impractical.
In this talk, I elaborate on a wide range of techniques we use to tackle these challenges for a proprietary CND system. I will talk about our parameterized and non-parameterized models, string normalization, encoding and disambiguation at scale. Finally, I present the state-of-the-art results we obtained on three publicly available datasets using our CND system.

Organization:
Zurich University of Applied Sciences

Anne Jorstad

Assigning Grant Applications to Reviewers via Text Analysis.

Abstract
The Swiss National Science Foundation normally finds the most appropriate expert reviewer for each grant application by hand. However, this process can be performed more efficiently using text mining.
An application can be represented by the text of its title, keywords, and abstract. Potential reviewers can be defined by similar texts from their publications. We are currently testing a variety of techniques to define the similarity between pairs of texts, followed by an optimization procedure to determine the final matching, given constraints about the number of applications allowed per reviewer.
The biggest challenge is due to the fact that the amount of discriminatory information provided in these texts varies widely between disciplines. Humanities and social sciences texts tend to use standard language vocabulary such as “law” or “urban”, while the hard sciences include very specific terminology like “SARS-CoV-2” or “latent semantic analysis”. And some expressions overlap, but carry different meanings in different fields, such as “family” or “support”, which are generally not meant in the context of “family of algorithms” or “support vector machines”.
We aim to develop a system that will be able to appropriately assign applications to reviewers for funding schemes as multi-disciplinary as Spark (“rapid funding of unconventional ideas”) and as mono-disciplinary as our new Coronavirus call. We note that this algorithm will not be applied for all funding schemes at the SNSF.
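The two stages mentioned in this abstract, text similarity followed by a capacity-constrained matching, can be sketched as follows. This is an illustration under stated assumptions, not the SNSF system: bag-of-words cosine similarity stands in for whichever similarity measures are being tested, a greedy pass stands in for the optimization procedure (it is not guaranteed to find the best overall matching), and the function names, the `capacity` parameter and the sample texts are invented.

```python
import math
from collections import Counter


def cosine(text_a, text_b):
    """Cosine similarity between two texts using raw term counts."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def assign(applications, reviewers, capacity=1):
    """Greedily give each application its most similar reviewer who still
    has capacity, processing pairs from most to least similar."""
    pairs = sorted(
        (
            (cosine(a_text, r_text), a, r)
            for a, a_text in applications.items()
            for r, r_text in reviewers.items()
        ),
        reverse=True,
    )
    load = Counter()
    matching = {}
    for _score, a, r in pairs:
        if a not in matching and load[r] < capacity:
            matching[a] = r
            load[r] += 1
    return matching


# Invented sample data.
applications = {
    "app1": "support vector machines for text classification",
    "app2": "family law in urban communities",
}
reviewers = {
    "rev_ml": "kernel methods support vector machines classification",
    "rev_law": "urban family law and social policy",
}
matching = assign(applications, reviewers)
```

Plain term overlap also makes the difficulty described above visible: words such as “family” or “support” contribute to similarity regardless of the field in which they are used.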

Organization:
Swiss National Science Foundation

Fabio Rinaldi¹, Anne Goehring¹, Corinne Gurtner², John Berezowski², Michele Bodmer², Irene Zuehlke², Celine Faverjon²

Text Mining Technologies for Animal Health Surveillance.

Abstract
We describe the outcomes of a collaborative project between the Vetsuisse faculty of the University of Bern and the Institute of Computational Linguistics of the University of Zurich, aimed at exploiting text mining technologies in the analysis of pathology reports from multiple Swiss veterinary laboratories. An online tool has been developed which allows the dynamic processing of batches of reports for the extraction of relevant signals, which in turn can be used for statistical analysis in epidemiological studies. The process is based on the identification in the reports of terminological items referring to relevant domain concepts. The terminologies used in the project are sourced from several ontological resources. We have also developed a semi-automated process to cross-map our ontological resources through a reference ontology, such as the UMLS.
In a first step, we evaluated the completeness and validity of the necropsy data. In a second step, we combined information extracted from the three necropsy data sources, and investigated factors associated with necropsy submissions at three different levels – “national”, “farm” and “individual” – and according to age, region and time of the year.
An interactive dashboard application enables data exploration. The combined pathology data from several veterinary pathology laboratories can be spatially and temporally displayed for different types of analysis. All aspects of the projects have been assessed for their potential benefits for animal health surveillance.

Organization:
¹University of Zurich
²VetSuisse Bern

Alexandros Paramythis¹, Doris Paramythis¹, Andreas Putzinger²

SmartCC: Combining NLU and semantic business case modelling in customer care support.

Abstract
Customer care (CC) services play a vital role in customer acquisition and retention, yet are notoriously difficult to provide at the level of quality, effectiveness and efficiency that customers have grown to expect. The challenges are numerous: proliferation of support channels, hard to locate product and institutional knowledge, difficulty in maintaining continuity in customer interactions over time, to name but a few.
Contexity’s new product line, SmartCC, is a new approach in this area that empowers customer service workers through active facilitation. Our approach is centered around an AI assistant which “collaborates” with humans, taking over some of the most demanding tasks: analysing and structuring available information, monitoring and analysing live dialogue with customers, identifying user intent and support-case at hand, and intelligently and proactively delivering 360 degree views of the most relevant information, and appropriate next steps.
SmartCC brings together capabilities that stem from the fields of: information retrieval; natural language processing and understanding; semantic reasoning; and, automatic speech recognition. The connecting “glue” between, and at the same time the “orchestration basis” for, the comprising system services are semantic models of CC business cases. Our demo will concentrate on how Natural Language Understanding is driven by the aforementioned semantic models, to identify the status of ongoing customer-agent dialogues.

Organization:
¹Contexity AG, Winterthur
²GRZ IT Center GmbH, Linz

Alexandros Paramythis¹, Doris Paramythis¹, Andreas Putzinger²

Following, understanding, and supporting service-oriented person-to-person communications.

Abstract
Automation in enterprise service provision has proliferated in recent years. In service-based communications, such automation typically takes the form of chatbots or Interactive Voice Response systems of varying sophistication. Despite very significant improvements achieved in the corresponding technologies, recent studies show that in the domain of service-oriented communications, person-to-person interaction is considerably more effective and efficient. This has given rise to a new generation of products that seek to empower humans engaging in such interaction, rather than replace them.
The main prerequisites for providing support during person-to-person communication are: on the one hand, being able to observe the ongoing interaction as it happens, bringing it to a computable form in (near) real time (e.g., through automatic speech recognition); and, on the other hand, being able to semantically interpret utterances in context. The second part specifically entails natural language understanding coupled with a semantic representation of the domain of discourse that can be used for reasoning.
In this presentation we focus on the application of new developments in the fields of natural language processing and ontological domain modeling for the interpretation of dialogue acts, and also for the analysis of domain-specific data (e.g., product documentation), targeted to identifying the pieces of information most relevant to an ongoing person-to-person dialogue in real time.

Organization:
¹Contexity AG, Winterthur
²GRZ IT Center GmbH, Linz

Jan Deriu¹, Katsiaryna Mlynchyk¹, Philippe Schläpfer¹, Alvaro Rodrigo², Dirk von Grünigen¹, Kurt Stockinger¹, Eneko Agirre³, Mark Cieliebak¹

A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation.

Abstract
In this talk, we present a novel methodology to efficiently construct a corpus for natural language interfaces to databases. We introduced an intermediate representation that is based on the logical query plan in a database. With this representation, we inverted the annotation process without losing flexibility in the types of queries that we generate. Furthermore, it allows for fine-grained alignment of the tokens to the operations.
With our method, we created a new corpus for training natural language interfaces to databases, which is publicly available. We also present a state-of-the-art deep learning model, which we train on our data, and show that our corpus is challenging.
In this talk, we present the annotation methodology, which allows for the fast creation of high-quality data, and a deep learning model that can be trained on the created data.

Organization:
¹ZHAW Zurich University of Applied Sciences
²UNED, Madrid
³Euskal Herriko Unibertsitatea/Universidad del País Vasco

Albin Zehe, Julia Arns, Lena Hettinger, Andreas Hotho

HarryMotions – Classifying Relationships in Harry Potter based on Emotion Analysis.

Abstract
Sentiment Analysis has long been a topic of interest in natural language processing and computational literary studies, where it can be used to infer the relationships between fictional characters. Building on the dataset and results of Kim and Klinger (2019), we propose a classifier based on BERT that improves the results reported therein and show that we can use this classifier to determine the relation between characters in Harry Potter novels. Our proposed sentiment classifier yields an F1-score of up to 75% for binary classification of emotions. Aggregating these emotions over novels, we reach an F1-score of up to 68% for the classification of a pair of characters as friendly or unfriendly.

Organization:
University of Wuerzburg

Vani Kanjirangat, Fabio Rinaldi

Biomedical relation extraction with state-of-the-art neural models.

Abstract
Typically text mining systems are based upon the identification of mentions of domain entities of relevance (entity recognition and linking), and the identification of their relationships, such as the role of genes in certain diseases, or protein-protein interactions.
We investigated the efficacy of state-of-the-art neural models for extracting high-quality relations from biomedical abstracts. The transformer models BERT and its biomedical counterpart, BioBERT, were tested as classification models as well as sources of embedding features. The embeddings were then fed as input to other neural models such as Long Short-Term Memory networks (LSTM), Convolutional Neural Networks (CNN) and attention models.
Experiments were conducted on reference datasets such as the CHEMPROT dataset (Chemical-Protein relations) and the CDR dataset (Chemical-Disease relations). Depending on the dataset used, the tasks varied from binary to multi-class classification and intra-sentential to inter-sentential relation spans.
Our research centers on improving relation extraction models by analyzing the features captured by current models. We visualize the attention flow to examine the features that were involved when existing models decided on a relation. Such analyses are important, especially since the black-box nature of neural models is considered a main pitfall that restricts their practical application.

Organization:
IDSIA, Lugano

Janis Goldzycher¹, Isabel Meraner², Martin Volk², Simon Clematide²

Ranking Georeferences for Efficient Crowdsourcing of Toponym Annotations in a Historical Corpus of Alpine Texts.

Abstract
This paper presents a simple method to rank georeference candidates to optimally support the workflow of a citizen science web application for toponym annotation in historical texts. We implement the general idea of efficient crowdsourcing based on human and artificial intelligence working hand in hand. For named entity recognition, we apply recent neural pretraining-based NER tagger methods. For named entity linking to geographical knowledge bases, we report on georeference ranking experiments testing the hypothesis that textual proximity indicates geographic proximity. Simulation results with online reranking that immediately integrates user verification show further improvements.
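The hypothesis that textual proximity indicates geographic proximity suggests a simple ranking rule, sketched below. This is only an illustration and not the method evaluated in the paper: it assumes that some toponyms near the ambiguous mention have already been resolved to coordinates, and it ranks each candidate by its mean great-circle distance to them. The function names, candidate labels and coordinates are invented.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def rank_candidates(candidates, context_coords):
    """Order candidate georeferences by mean distance to the coordinates of
    toponyms that occur close by in the text (closest first)."""
    def mean_distance(item):
        _name, coord = item
        return sum(haversine_km(coord, c) for c in context_coords) / len(
            context_coords
        )

    return [name for name, _ in sorted(candidates.items(), key=mean_distance)]


# Invented sample data: two candidates for one ambiguous toponym, and two
# already-resolved toponyms from the surrounding text.
candidates = {
    "candidate_far": (52.5, 13.4),
    "candidate_alps": (46.6, 8.0),
}
context_coords = [(46.5, 7.9), (46.7, 8.1)]
ranked = rank_candidates(candidates, context_coords)
```

With online reranking, each georeference verified by a user could be appended to `context_coords` before the remaining candidates are ranked again.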

Organization:
¹Institute of Computational Linguistics, University of Zurich
²University of Zurich

Andrei Popescu-Belis¹, Aris Xanthos², Valentin Minder¹, Àlex R. Atrio¹, Gabriel Luthier¹, Antonio Rodriguez²

Interactive Poem Generation: when Language Models support Human Creativity.

Abstract
Neural language models, which are probability distributions over sequences of words or characters, have recently enabled the generation of fluent sentences and even short texts. However, controlling such models in order to convey specific meanings remains difficult. To study how language modeling can be constrained with text-level features, we have designed a system for interactive poem generation, which enables the joint writing of a poem by a human and a machine. The human first selects the intended form of the poem, e.g. a sonnet or a haiku, although internally any number of stanzas and lines is allowed. Using a general-domain neural language model at the character level, trained on French poems, the system generates a first draft respecting the form. The draft can be modulated according to a desired combination of specific topics (e.g. art, love, or nature) by modifying a number of words using topic-specific language models. Similarly, the draft can be modulated in terms of emotions (happiness, sadness, or aversion). To express their creativity and improve the readability of the poem, humans are allowed to edit it at any stage of the creative process. A strategy to improve rhyming patterns is currently being explored. The system has been active since mid-February in the Digital Lyric exhibition. All poems are logged in a database, from which descriptive statistics can be extracted. The system can be demonstrated live at the conference using a large touchscreen.
Bio of the presenter — Andrei Popescu-Belis is professor of computer science at HEIG-VD / HES-SO and a lecturer at EPFL. He is a graduate of the École Polytechnique, with a PhD from the University of Paris-Sud. He has been a researcher in human language technology at the University of Geneva and at the Idiap Research Institute. His interests are in machine translation, information retrieval and human-computer interaction. He has published over 150 refereed papers and edited 12 books/proceedings.
Intended audience — This talk will be of interest to researchers and developers of language technologies, especially those using deep neural language models to generate texts. The talk will also be relevant to those interested in digital humanities and creativity support tools.

Organization:
¹HEIG-VD / HES-SO, Yverdon
²University of Lausanne

Jean Charbonnier, Christian Wartena

Predicting the Concreteness of German Words.

Abstract
Concreteness of words has been measured and used in psycholinguistics for decades. Recently, it has also been used in retrieval and NLP tasks. For English, a number of well-known datasets have been established with average values for perceived concreteness.
We give an overview of available German datasets, their correlation and evaluate prediction algorithms for concreteness of German words. We show that these algorithms achieve similar results as for English datasets. Moreover, we show that for no dataset there are significant differences between a prediction model based on a regression model using word embeddings as features and a prediction algorithm based on word similarity according to the same embeddings.
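A similarity-based predictor of the kind compared here can be sketched as a weighted average over the most similar rated words. This is an assumed formulation for illustration only, not the algorithm evaluated in the paper; the function names, the value of `k`, the two-dimensional toy vectors and the ratings are all invented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_concreteness(word_vec, rated, k=2):
    """Similarity-weighted mean rating of the k rated words whose embeddings
    are closest to `word_vec`; negative similarities get zero weight."""
    neighbours = sorted(
        ((max(cosine(word_vec, vec), 0.0), score) for vec, score in rated.values()),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        # No usable neighbour: fall back to the mean of all known ratings.
        return sum(score for _, score in rated.values()) / len(rated)
    return sum(sim * score for sim, score in neighbours) / total


# Invented toy embeddings and ratings (higher = more concrete).
rated = {
    "Tisch": ((1.0, 0.0), 4.8),
    "Stuhl": ((0.9, 0.1), 4.6),
    "Idee": ((0.0, 1.0), 1.5),
}
prediction = predict_concreteness((0.95, 0.05), rated)
```

The regression alternative would instead fit a model that maps the embedding dimensions directly to the rating, using the same vectors as features.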

Organization:
Hochschule Hannover University of Applied Sciences and Arts

Sandra Mitrović¹, Vani Kanjirangat¹, Denis Broggini¹, Lorenzo Cimasoni², Marco Alberti², Alessandro Antonucci¹, Fabio Rinaldi¹

A conversational recommender system based on neural NLP models.

Abstract
In this project, we focus on conversational recommender systems that allow users to specify their preferences through a sequence of dynamically customized interactions, as contrasted to traditional ones. In particular, we seek to improve an online recommendation platform of Stagend (stagend.com) that aims at finding the most suitable performer (“an item”) for a particular event specified by an event organizer (“a user”). In the first phase, an adaptive, Bayesian methods-based approach was used to sequentially update the model given a new piece of information, e.g. performer’s answer to organizer’s question. However, in a real-time setting, delayed/incomplete interactions (e.g. missing reply), can hamper the system efficiency.
To overcome this issue, and also to avoid unnecessary burden on performer (in cases when the answer is already available in performer’s biography or previous events’ conversations), we investigate the ways of enhancing the Bayesian approach with NLP methods. Specifically, we adopt a question-answering BERT-based approach to either provide a confident automated answer based on the existing information, or to indicate uncertainty and thus, the necessity of contacting the performer. Additionally, given that Stagend operates in multilingual markets, we benchmark different multilingual models such as multilingual BERT and XLM-RoBERTa, as well as compare these with separate language models per each of the target languages (DE + Swiss DE challenge, FR, IT, EN).

Organization:
¹IDSIA, Lugano
²Stagend, Lugano

Jannis Vamvas, Rico Sennrich

X-Stance: A Multilingual Multi-Target Dataset for Stance Detection.

Abstract
We extract a large-scale stance detection dataset from comments written by candidates of elections in Switzerland. The dataset consists of German, French and Italian text, allowing for a cross-lingual evaluation of stance detection. It contains 67 000 comments on more than 150 political issues (targets). Unlike stance detection models that have specific target issues, we use the dataset to train a single model on all the issues. To make learning across targets possible, we prepend to each instance a natural question that represents the target (e.g. «Do you support X?»). Baseline results from multilingual BERT show that zero-shot cross-lingual and cross-target transfer of stance detection is moderately successful with this approach.
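Prepending a natural question to each comment can be sketched as below. The separator string, the function name and the sample texts are assumptions made for illustration; in practice a BERT tokenizer would typically encode the question and the comment as a text pair rather than as one concatenated string.

```python
def build_instance(question, comment, sep=" [SEP] "):
    """Combine a target question and a comment into a single model input.
    The separator is an assumed placeholder, not the dataset's format."""
    return f"{question}{sep}{comment}"


# Invented sample data: one question per target, shared by all its comments.
question = "Do you support X?"
comments = ["Yes, this is long overdue.", "No, the costs are too high."]
instances = [build_instance(question, c) for c in comments]
```

Because the target is expressed in natural language rather than as a fixed label, the same model can be applied to a question it has not seen during training, which is what makes the cross-target evaluation possible.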

Organization:
Department of Computational Linguistics, University of Zurich

Manfred Klenner, Anne Göhring, Michael Amsler

Harmonization sometimes Harms.

Abstract
In this short paper we argue that harmonization is not the preferred way to produce a gold standard in all cases. Neither does a majority-vote-based harmonization produce an appropriate gold standard centroid, nor would a mere centroid be a good basis for training a system that reproduces prototypical user reactions given some understanding task. We discuss these claims in the context of sentiment inference.

Organization:
University of Zurich

Simone Griesser

Psychological Distance in German and English Brand Language of 6 International Brands.

Abstract
Language offers additional insights to sentiment and content. The same content can be described with psychologically close or distant language. According to the Construal-Level Theory (Trope & Liberman, 2010), psychological distance influences decision-making. Analysing the psychological distance in brand language of 8 brands shows that brands psychologically approach customers with their German brand language but psychologically distance themselves from customers with their English brand language. Implications on decision-making and brand positioning are discussed.

Organization:
FHNW University of Applied Sciences and Arts Northwestern Switzerland

Updates

Important Dates