Refactored Convolver, with two key changes for efficiency:

1) The code no longer iterates over -1s to perform convolution. Instead, it stores every kernel value in a new array
which is references as the convolution runs.

2) The frame_index / frame_kernels arrays are no longer a list of NumPy arrays, but a 2D NumPy array. This produced a
big speed up and proved necessary, so we should keep to NumPy arrays + Numba when possible. These arrays are padded
with -1s where masked pixels overlap the frame convolver, but the -1s are not used in the calculation.

ORIGINAL:

psf = 21x21

lsst_solution: 0.23537135124206543
euclid_solution: 0.961646318435669
hst_solution: 3.833004951477051
hst_up_solution: 10.64686131477356
ao_solution: 95.55560088157654

psf = 41x41

JITTED:

psf = 21x21

lsst_solution: 0.0011916160583496094
euclid_solution: 0.0038046836853027344
hst_solution: 0.014577150344848633
hst_up_solution: 0.038819074630737305
ao_solution: 0.3407902717590332

psf = 41x41



Clearly, the jitted PSF convolver is a huge speed up :).

