Data Exploration with Selection of Representative Regions: Formulation, Axioms, Methods, and Consistency

Published in Mathematics of Operations Research, 2020

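Recommended citation: Estes AS, Ball MO, Lovell DJ. Data Exploration with Selection of Representative Regions: Formulation, Axioms, Methods, and Consistency. To appear in Mathematics of Operations Research.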

Preprint available here.

We present a new type of unsupervised learning problem in which we find a small set of representative regions that approximates a larger data set. These regions may be presented to a practitioner, along with additional information, to help the practitioner explore the data set. An advantage of this approach is that it does not rely on the data having cluster structure. We formally define this problem, and we present axioms that should be satisfied by functions that measure the quality of representatives. We provide a quality function that satisfies all of these axioms. Using this quality function, we construct two methods for finding representatives. We provide convergence results for a general class of methods, and we show that these results apply to several specific methods, including the two methods proposed in this paper. We provide an example of how representative regions may be used to explore a data set.