Kang D, Derhacobian A, Tsuji K, Hebert T, Bailis P, Fukami T, Hashimoto T, Sun Y, Zaharia M (2021) Exploiting Proximity Search and Easy Examples to Select Rare Events. In: NeurIPS Data-Centric AI Workshop 2021. 2021 Dec.
A common problem practitioners face is selecting rare events in a large dataset. Unfortunately, standard techniques, ranging from pre-trained models to active learning, do not leverage the proximity structure present in many datasets and can lead to worse-than-random results. To address this, we propose EZMODE, an algorithm for iteratively selecting rare events in large, unlabeled datasets. EZMODE uses active learning to iteratively train classifiers but, in contrast to standard uncertainty-based techniques, chooses the easiest positive examples to label. EZMODE also leverages proximity structure (e.g., temporal sampling) to find difficult positive examples. We show that EZMODE can outperform baselines by up to 130× on a novel, real-world, 9,000 GB video dataset.
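The sketch below contrasts the two selection ideas described in the abstract. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that a classifier has already produced a positive-class score for every item, that items are indexed in temporal order (e.g., video frames), and that "proximity" means a fixed window of neighbouring indices. All function names and the `window` parameter are hypothetical.

```python
from typing import Iterable, List, Sequence, Set


def uncertainty_select(scores: Sequence[float], labeled: Set[int], k: int) -> List[int]:
    """Baseline uncertainty sampling: the k unlabeled items whose score
    is closest to 0.5 (the classifier's least certain predictions)."""
    pool = [i for i in range(len(scores)) if i not in labeled]
    return sorted(pool, key=lambda i: abs(scores[i] - 0.5))[:k]


def easy_positive_select(scores: Sequence[float], labeled: Set[int], k: int) -> List[int]:
    """Easy-example selection: the k unlabeled items the current
    classifier is most confident are positive."""
    pool = [i for i in range(len(scores)) if i not in labeled]
    return sorted(pool, key=lambda i: scores[i], reverse=True)[:k]


def proximity_candidates(positives: Iterable[int], n: int, window: int,
                         labeled: Set[int]) -> List[int]:
    """Proximity search (assumed temporal): unlabeled indices within
    `window` steps of any known positive. These neighbours may be
    positives that the classifier itself scores poorly."""
    out: Set[int] = set()
    for p in positives:
        for j in range(max(0, p - window), min(n, p + window + 1)):
            if j not in labeled:
                out.add(j)
    return sorted(out)


def select_round(scores: Sequence[float], labeled: Set[int], positives: Iterable[int],
                 k: int, window: int) -> List[int]:
    """One hypothetical labeling round: easy positives first, then
    proximity candidates, with duplicates removed and order preserved."""
    easy = easy_positive_select(scores, labeled, k)
    near = proximity_candidates(positives, len(scores), window, labeled)
    seen: Set[int] = set()
    batch: List[int] = []
    for i in easy + near:
        if i not in seen:
            seen.add(i)
            batch.append(i)
    return batch


if __name__ == "__main__":
    scores = [0.1, 0.95, 0.5, 0.48, 0.9, 0.05]
    labeled = {0}
    print(uncertainty_select(scores, labeled, 2))    # [2, 3]
    print(easy_positive_select(scores, labeled, 2))  # [1, 4]
```

In this toy run, uncertainty sampling picks the ambiguous items 2 and 3, while easy-example selection picks 1 and 4, which are the items most likely to be positive. When positives are rare, a labeling batch chosen the second way is more likely to contain actual positives.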