Improve stability of velocity fits in template metrics #4342
+14
−12
In template metrics, `velocity_above` and `velocity_below` estimate how fast a spike moves along the axon/dendrites by fitting a line where x is time (the position of the peak at each template channel relative to the soma, in ms) and y is distance (template_channel_location - location of soma, in mm). The slope is therefore a velocity in mm/ms: a small slope means the spike takes a long time to move across the probe, and a larger slope means the spike moves fast. Currently, when the peak time is the same (or nearly the same) across the template channels, VelocityFits() tries to fit a vertical line, which has an infinite slope (in practice, the fit fails with NaN because X is ill-conditioned). This is pretty common.

This PR switches the fit to regress `peak_ms` (y) onto `channel_distances` (x) and then takes 1/slope to obtain the same velocities as before. This is more stable and is also justified in the original source: "Because the time difference between the trough of adjacent sites could be 0, to avoid infinite numbers, we calculated the inverse of velocity (ms/mm) instead by fitting a regression line to the time of waveform trough at different sites against the distance of the sites relative to soma" (Jia et al., 2020).

I also centered the data before fitting the regressor. Centering helps stability and removes the need to learn an intercept, which makes the model a bit more robust.
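To make the idea concrete, here is a minimal sketch of the inverse-velocity fit. It is not the code in this PR: the helper name `fit_inverse_velocity` is hypothetical, and it uses plain least squares through the origin on centered data as a stand-in for whatever regressor the library actually uses. It only illustrates why regressing time on distance stays well defined when all peak times coincide.

```python
import numpy as np


def fit_inverse_velocity(channel_distances_mm, peak_ms):
    """Hypothetical helper: regress peak time (ms) on distance (mm).

    Returns (inverse_velocity in ms/mm, velocity in mm/ms).
    Plain least squares is an assumption made for this sketch.
    """
    x = np.asarray(channel_distances_mm, dtype=float)
    y = np.asarray(peak_ms, dtype=float)

    # Center both variables so the fit needs no intercept term.
    x_c = x - x.mean()
    y_c = y - y.mean()

    # Distances along the probe normally differ, so this is non-zero;
    # guard anyway for the degenerate single-position case.
    denom = np.dot(x_c, x_c)
    if denom == 0.0:
        return np.nan, np.nan

    # Slope of the no-intercept least-squares line, in ms/mm.
    inv_velocity = np.dot(x_c, y_c) / denom

    # Identical peak times give a slope of 0 ms/mm, i.e. an
    # unbounded velocity, instead of a failed fit.
    velocity = 1.0 / inv_velocity if inv_velocity != 0.0 else np.inf
    return inv_velocity, velocity


# Peaks arrive 0.05 ms later every 0.02 mm away from the soma.
print(fit_inverse_velocity([0.0, 0.02, 0.04, 0.06], [0.0, 0.05, 0.10, 0.15]))

# All peaks at the same time: the slope is 0 ms/mm and the fit still succeeds.
print(fit_inverse_velocity([0.0, 0.02, 0.04], [0.0, 0.0, 0.0]))
```

With the axes the other way around (distance regressed on time), the second call would have a constant x and no usable slope, which is the failure mode described above.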
Here are the velocities calculated with the current method and with the new method proposed in this PR:

Predicted velocities are consistent between the two methods at lower velocities (center of the plot), and the new method is more stable at higher velocities. The new method is also able to make predictions in many more cases: on this 3 h Neuropixels recording, the proportion of NaNs drops from 0.37 to 0.26. I also manually checked some results where there were NaNs before, and they look sensible.
Overall, I think it's just a sensible change for added stability and better predictions.