Working Thesis Title
Using Interpretable Machine Learning Methods to Retrieve Articles by Genre and Topic in a Digitised Historical Newspaper Collection
Comprehensive digitisation programmes have made historical newspaper collections physically more accessible, but they have also introduced other problems, such as optical character recognition (OCR) errors and difficulty evaluating the quality, scope, and representativeness of search results. My research will use interpretable machine learning methods to deliver new ways to explore and retrieve articles from the National Library of New Zealand’s Papers Past collection. Using the Papers Past open data for development and testing, the work will bring together aspects of information retrieval, data science, and data visualisation. The intention is that the methods and tools developed as part of this work will make search results more transparent and open up parts of the collection that are less likely to be discovered through existing keyword-based search methods.
Supervisors
Primary supervisor: Dr Christopher Thomson
Co-supervisor: Dr James Williams
Academic History
Master of Applied Data Science (Distinction), University of Canterbury
Bachelor of Management Studies with Honours (First class), University of Waikato