Abstract
Book reviews on social platforms are generated in large quantities by non-specialist avid readers and contain subjective evaluations of the reviewers’ own reading experience. Social reading reviews often feature an under-researched phenomenon: Narrative Absorption, i.e. the extent to which the reader became immersed in the book’s narrative during reading. Absorption is reflected in statements such as ‘I was completely hooked’ and spans several dimensions, such as attention, emotional engagement, mental imagery, and transportation. The detection of reading absorption with NLP approaches has been investigated on a set of user-generated reviews that we manually annotated (cf. Rebora et al. 2020), e.g. in Lendvai, Rebora and Kuijpers (2019) and Lendvai et al. (2020).
We work on a pipeline to retrieve and rank absorption-rich user reviews from a large, unlabeled document dump (6+ million reviews in English), so that subsets of the dump can be preselected for manual annotation. We fine-tuned BERT (Devlin et al., 2018) for a supervised absorption detection task on 16k review sentences that we annotated for absorption (Absorption vs. Nonabsorption), and evaluated it on a held-out dataset of 149 reviews, achieving a mean macro F1 of .75 (support: 1,011 vs. 3,510 sentences).
Our current focus is on a model that aggregates sentence-level prediction scores at the document level. To this end, BERT’s sentence-level absorption probabilities were averaged per review and used to train a linear regression model on the full corpus to predict Absorption Richness, defined as the proportion of sentences in a review that are annotated as expressing absorption. Review-level Absorption Richness regression lowers prediction error relative to the baseline, defined as the review-level proportion of sentences classified as Absorption by taking the argmax of BERT’s logits (Mean Absolute Error of .08 vs. .11 and Spearman correlation of .73 vs. .65 for regression vs. baseline).
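The aggregation step described above can be illustrated with a minimal sketch. It assumes that sentence-level absorption probabilities are already available as plain lists of floats, one list per review; the function names, the 0.5 decision threshold (equivalent to the argmax over two softmax classes), and the toy data are hypothetical and not taken from our pipeline.

```python
import numpy as np


def mean_absorption_probability(sentence_probs):
    """Regression feature: average sentence-level absorption probability of one review."""
    return float(np.mean(sentence_probs))


def argmax_proportion(sentence_probs, threshold=0.5):
    """Baseline: share of sentences classified as Absorption.

    Assumption: for a two-class softmax, taking the argmax of the logits is
    the same as checking whether the Absorption probability exceeds 0.5.
    """
    return float(np.mean(np.asarray(sentence_probs) > threshold))


def fit_linear_regression(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


def mean_absolute_error(y_true, y_pred):
    """Mean absolute difference between gold and predicted values."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_reviews = [[0.9, 0.8, 0.1], [0.2, 0.1, 0.3, 0.4], [0.7, 0.6]]
    toy_gold_richness = [2 / 3, 0.0, 1.0]  # proportion of gold Absorption sentences

    features = [mean_absorption_probability(p) for p in toy_reviews]
    slope, intercept = fit_linear_regression(features, toy_gold_richness)
    regression_pred = [slope * f + intercept for f in features]
    baseline_pred = [argmax_proportion(p) for p in toy_reviews]

    print("MAE regression:", mean_absolute_error(toy_gold_richness, regression_pred))
    print("MAE baseline:  ", mean_absolute_error(toy_gold_richness, baseline_pred))
```

The sketch fits and evaluates on the same toy reviews for brevity; any realistic comparison would evaluate on held-out reviews.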
The increase in Spearman’s rank correlation coefficient indicates that a review ranking based on the linear regression predictions corresponds more closely to the ground-truth ranking than a ranking based solely on BERT. We use the regression model for Absorption-Richness-based document filtering, to facilitate the benchmarking and analysis of social reading reviews in our large document dump.
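As an illustration of the ranking comparison and the filtering step, the following sketch computes Spearman’s rank correlation as the Pearson correlation of average ranks and selects the reviews with the highest predicted Absorption Richness. The helper names and the cut-off parameter `k` are hypothetical; this is one possible reading of rank-based filtering, not a description of our implementation.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(a, b):
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def top_k_reviews(review_ids, predicted_richness, k):
    """Return the ids of the k reviews with the highest predicted richness."""
    order = np.argsort(predicted_richness)[::-1][:k]
    return [review_ids[i] for i in order]
```

Applied to gold and predicted Absorption Richness values, `spearman` gives the kind of rank agreement reported above, and `top_k_reviews` would yield a preselected subset for manual annotation.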