Evaluating Effects of Linguistic Evolution on Classification Models for Banking Documents

Skills Employed
Introduction

Linguistic evolution can be broadly defined as the shift in vocabulary, context, and usage trends of a language within a particular area of interest. In 1968, Weinreich, Labov, and Herzog published the paper "Empirical Foundations for a Theory of Language Change", which breaks this broad area of study into five smaller problems: the constraints problem, the transition problem, the embedding problem, the evaluation problem, and the actuation problem.

In this project, we focused on the transition problem. Specifically, the project studied how the vocabulary of banking documents, such as customer complaints and reviews, evolves over time, and how that evolution correlates with the performance metrics (especially recall) of classification models built on those same documents. The hypothesis we tested was that as the complexity of the language (that is, how difficult it is to read) increases, the performance of the classification models improves (false negatives decrease). The model in this particular case predicts mortgage payment faults. However, we expect the framework to scale to other use cases with a few minor tweaks to the process described here.
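To make the hypothesis concrete, the sketch below shows one way the two quantities could be measured and compared across time periods. It is an illustration under stated assumptions, not the project's actual pipeline: the source does not name a readability metric, so the Flesch-Kincaid grade level is used here as a stand-in, the syllable counter is a rough vowel-group heuristic, and all function names are hypothetical.

```python
import re
from math import sqrt


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level; higher means harder to read.

    Stand-in for the 'complexity' measure, which the source does not specify.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def recall(y_true, y_pred):
    """Recall = TP / (TP + FN); fewer false negatives means higher recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Correlation is undefined when either sequence is constant; return 0.0.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0
```

With these helpers, one could compute a readability score for the documents in each time period, compute the recall of the classifier evaluated on that period, and pass the two per-period series to `pearson`. A positive correlation would be consistent with the hypothesis, although a handful of periods is too few to draw a firm conclusion.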

Methodology

The detailed methodology followed in this project is explained below.



Impact

The model and framework described here were run on a dataset that was partly synthetic and partly de-identified real data, because the project was completed as part of a virtual internship with a banking company (Summer 2020). The framework and the models were then added to the bank's research repository for reuse in similar use cases.

References

The following papers and literature were consulted in designing and executing the solution approach.

Contact Me

Address

Duke University, Durham, NC, USA

Phone Number

+1-2065811905
