Skip to content

Conversation

@alhendrickson
Copy link
Collaborator

This was a long one, for a one line change.

.save gets called twice in models.py. I think for sqlite it doesnt care at all, but postgres fails.

This is only seen on load_examples as it calls /api/modelpacks/, but if you use the admin UI it all worked fine as it calls a different API.

I'm not 100% on the actual change if anyone with a bit more django knowledge has opinions. Basically it should just update the foreign keys instead of inserting again.

Bug

Application logs

2026-01-20T13:43:18.964957939Z [medcattrainer] Internal Server Error: /api/modelpacks/
2026-01-20T13:43:18.964993604Z [medcattrainer] Traceback (most recent call last):
2026-01-20T13:43:18.964999327Z [medcattrainer]   File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 105, in _execute
2026-01-20T13:43:18.965003517Z [medcattrainer]     return self.cursor.execute(sql, params)
2026-01-20T13:43:18.965007298Z [medcattrainer]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-01-20T13:43:18.965011079Z [medcattrainer]   File "/usr/local/lib/python3.11/site-packages/psycopg/cursor.py", line 97, in execute
2026-01-20T13:43:18.965014962Z [medcattrainer]     raise ex.with_traceback(None)
2026-01-20T13:43:18.965018641Z [medcattrainer] psycopg.errors.UniqueViolation: duplicate key value violates unique constraint "api_modelpack_pkey"
...

2026-01-20T13:43:18.965969644Z [medcattrainer]     raise ex.with_traceback(None)
2026-01-20T13:43:18.965973323Z [medcattrainer] django.db.utils.IntegrityError: duplicate key value violates unique constraint "api_modelpack_pkey"
2026-01-20T13:43:18.966259155Z [medcattrainer] DETAIL:  Key (id)=(13) already exists.
2026-01-20T13:43:18.966264061Z [medcattrainer] ERROR 2026-01-20 13:43:18,956 log.py l:253:Internal Server Error: /api/modelpacks/
2026-01-20T13:43:18.966267842Z [medcattrainer] Traceback (most recent call last):
2026-01-20T13:43:18.966271214Z [medcattrainer]   File "/usr/local/lib/python3.11/site-packages/django/db/backends/utils.py", line 105, in _execute
2026-01-20T13:43:18.966274995Z [medcattrainer]     return self.cursor.execute(sql, params)
2026-01-20T13:43:18.966278470Z [medcattrainer]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-01-20T13:43:18.966282047Z [medcattrainer]   File "/usr/local/lib/python3.11/site-packages/psycopg/cursor.py", line 97, in execute
2026-01-20T13:43:18.966285828Z [medcattrainer]     raise ex.with_traceback(None)
2026-01-20T13:43:18.966289302Z [medcattrainer] psycopg.errors.UniqueViolation: duplicate key value violates unique constraint "api_modelpack_pkey"
2026-01-20T13:43:18.966292675Z [medcattrainer] DETAIL:  Key (id)=(13) already exists.

Postgres logs

20 17:28:36.944 GMT [17471] LOG:  statement: INSERT INTO "api_conceptdb" ("name", "cdb_file", "use_for_training", "create_time", "last_modified", "last_modified_by_id") VALUES ('Example Model Pack_CDB', '/home/api/media/model_pack_vM3qHeH/cdb', true, '2026-01-20 17:28:36.943543+00:00'::timestamptz, '2026-01-20 17:28:36.943562+00:00'::timestamptz, NULL) RETURNING "api_conceptdb"."id"
2026-01-20T17:28:37.303Z | 2026-01-20 17:28:37.303 GMT [17471] LOG:  statement: INSERT INTO "api_vocabulary" ("name", "vocab_file", "create_time", "last_modified", "last_modified_by_id") VALUES ('', '/home/api/media/model_pack_vM3qHeH/vocab', '2026-01-20 17:28:37.302798+00:00'::timestamptz, '2026-01-20 17:28:37.302816+00:00'::timestamptz, NULL) RETURNING "api_vocabulary"."id"
2026-01-20T17:28:37.305Z | 2026-01-20 17:28:37.305 GMT [17471] LOG:  statement: SELECT "api_metacatmodel"."id" FROM "api_metacatmodel" INNER JOIN "api_modelpack_meta_cats" ON ("api_metacatmodel"."id" = "api_modelpack_meta_cats"."metacatmodel_id") WHERE "api_modelpack_meta_cats"."modelpack_id" = 23
2026-01-20T17:28:37.308Z | 2026-01-20 17:28:37.307 GMT [17471] LOG:  statement: INSERT INTO "api_modelpack" ("id", "name", "model_pack", "concept_db_id", "vocab_id", "create_time", "last_modified", "last_modified_by_id") VALUES (23, 'Example Model Pack', 'model_pack_vM3qHeH.zip', 12, 12, '2026-01-20 17:28:37.307383+00:00'::timestamptz, '2026-01-20 17:28:37.307410+00:00'::timestamptz, NULL) RETURNING "api_modelpack"."id"

2026-01-20T17:28:37.308Z | 2026-01-20 17:28:37.308 GMT [17471] ERROR:  duplicate key value violates unique constraint "api_modelpack_pkey"

2026-01-20T17:28:37.308Z | 2026-01-20 17:28:37.308 GMT [17471] DETAIL:  Key (id)=(23) already exists.

After fix

  • It now calls update instead of insert
2026-01-20 17:46:02.316 GMT [19337] LOG:  statement: SELECT "api_metacatmodel"."id" FROM "api_metacatmodel" INNER JOIN "api_modelpack_meta_cats" ON ("api_metaca
tmodel"."id" = "api_modelpack_meta_cats"."metacatmodel_id") WHERE "api_modelpack_meta_cats"."modelpack_id" = 25
2026-01-20 17:46:02.318 GMT [19337] LOG:  statement: UPDATE "api_modelpack" SET "concept_db_id" = 14, "vocab_id" = 14 WHERE "api_modelpack"."id" = 25

@alhendrickson alhendrickson marked this pull request as ready for review January 20, 2026 18:03
raise MedCATLoadException(f'Failure loading MetaCAT models - {unpacked_model_pack_path}') from exc

super().save(*args, **kwargs)
if not is_new:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a lot of Django knowledge.

But to me this doesn't feel quite right.

If the model is new (is_new == True) then this gets initially saved at the start of this method (line 51).
So the rest of the code should run under the assumption that if something is changed, it should be updated rather than fully re-saved.

But the current implementation performs an update if the object already existed before calling this method, but attempts to re-save the entire thing if the object was brand new. To me, the brand new behaviour would still run into the save-twice issue.

But again, not too familiar with Django. And not sure what the suitable option would / should be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha - I should have looked up, this already is in a transaction.... in Django to set a M2M field requires the object to exist in the ORM. so Saving 'twice' is needed for new or otherwise objects

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you there that it doesnt seem right

Looking at it today: I think at least this is a step forward. I think this PR fixes it for when is_new = True, and leaves the existing flow where is_new=False exactly as it is today

"To me, the brand new behaviour would still run into the save-twice issue." - with the change in this PR if is_new=True then it does an UPDATE statement not insert using this function argument update_fields=['concept_db', 'vocab']). So the is_new behavior should be fixed

For the is_new = False case, I think it's probably still broken, I think it's going to go through this function and call an INSERT, though I dont know how to make that function call

Copy link
Collaborator

@mart-r mart-r Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right - I missed the not in the if statement. So the (potential) issue is (like you correctly said) with the not is_new case. The other issue is that in case of is_new the *args and **kwargs aren't being propagated. I don't know whether / what downstream effect that may or may not have. But it's certainly a diverging behaviour compared to the other calls.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was too eager pressing the merge button.

I've added the args and kwargs on this PR #300 - good spot and I want to keep it all consistent.

Copy link
Member

@tomolopolis tomolopolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - 😬 this is likely first of a couple contention errors that might arise with postgres. There's likely some @transactional ctx managers that are needed...

@alhendrickson alhendrickson merged commit 66540b2 into main Jan 21, 2026
10 checks passed
@alhendrickson alhendrickson deleted the fix/medcat-trainer/postgres-load-examples branch January 21, 2026 10:06
tomolopolis pushed a commit that referenced this pull request Jan 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants