-
Notifications
You must be signed in to change notification settings - Fork 7
fix(medcat-trainer): Postgres load_examples fails on unique constraint #299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…t as models.py calls insert twice
| raise MedCATLoadException(f'Failure loading MetaCAT models - {unpacked_model_pack_path}') from exc | ||
|
|
||
| super().save(*args, **kwargs) | ||
| if not is_new: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a lot of Django knowledge.
But to me this doesn't feel quite right.
If the model is new (is_new == True) then this gets initially saved at the start of this method (line 51).
So the rest of the code should run under the assumption that if something is changed, it should be updated rather than fully re-saved.
But the current implementation performs an update if the object already existed before calling this method, but attempts to re-save the entire thing if the object was brand new. To me, the brand new behaviour would still run into the save-twice issue.
But again, not too familiar with Django. And not sure what the suitable option would / should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ha - I should have looked up, this already is in a transaction.... in Django to set a M2M field requires the object to exist in the ORM. so Saving 'twice' is needed for new or otherwise objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you there that it doesnt seem right
Looking at it today: I think at least this is a step forward. I think this PR fixes it for when is_new = True, and leaves the existing flow where is_new=False exactly as it is today
"To me, the brand new behaviour would still run into the save-twice issue." - with the change in this PR if is_new=True then it does an UPDATE statement not insert using this function argument update_fields=['concept_db', 'vocab']). So the is_new behavior should be fixed
For the is_new = False case, I think it's probably still broken, I think it's going to go through this function and call an INSERT, though I dont know how to make that function call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right - I missed the not in the if statement. So the (potential) issue is (like you correctly said) with the not is_new case. The other issue is that in case of is_new the *args and **kwargs aren't being propagated. I don't know whether / what downstream effect that may or may not have. But it's certainly a diverging behaviour compared to the other calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was too eager pressing the merge button.
I've added the args and kwargs on this PR #300 - good spot and I want to keep it all consistent.
tomolopolis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm - 😬 this is likely first of a couple contention errors that might arise with postgres. There's likely some @transactional ctx managers that are needed...
…t as models.py calls insert twice (#299)
This was a long one, for a one line change.
.savegets called twice in models.py. I think for sqlite it doesnt care at all, but postgres fails.This is only seen on load_examples as it calls /api/modelpacks/, but if you use the admin UI it all worked fine as it calls a different API.
I'm not 100% on the actual change if anyone with a bit more django knowledge has opinions. Basically it should just update the foreign keys instead of inserting again.
Bug
Application logs
Postgres logs
After fix