Data derived from https://github.com/seancepstein/training_labels/tree/main/data