Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/128932
Type: Conference paper
Title: A framework for streamlined statistical prediction using topic models
Author: Glenny, V.; Tuke, J.; Bean, N.; Mitchell, L.
Citation: Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 2019, vol.abs/1904.06941, pp.61-70
Publisher: Association for Computational Linguistics
Issue Date: 2019
ISBN: 9781950737000
Conference Name: 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (7 Jun 2019 - 7 Jun 2019 : Minneapolis, MN)
Statement of Responsibility: Vanessa Glenny, Jonathan Tuke, Nigel Bean, Lewis Mitchell
Abstract: In the Humanities and Social Sciences, there is increasing interest in approaches to information extraction, prediction, intelligent linkage, and dimension reduction applicable to large text corpora. Because approaches in these fields are grounded in traditional statistical techniques, there is a need for frameworks in which advanced NLP techniques such as topic modelling can be incorporated within classical methodologies. This paper provides a classical, supervised, statistical learning framework for prediction from text, using topic models as a data reduction method and the topics themselves as predictors, alongside typical statistical tools for predictive modelling. We demonstrate this framework in a Social Sciences context (applied animal behaviour) and in a Humanities context (narrative analysis). The results show that topic regression models perform comparably to their much less efficient equivalents that use individual words as predictors.
Rights: ©2019 Association for Computational Linguistics
Published version: https://www.aclweb.org/anthology/W19-2508/
Appears in Collections: Aurora harvest 8; Mathematical Sciences publications
