Fix notebooks generation flow by asiomchen · Pull Request #76 · EBjerrum/scikit-mol

asiomchen · 2025-05-05T20:32:56Z

So the notebook pairs were broken in many ways and this PR aims to solve that

Makefile command fixed to works with multifolder setup (somehow jupytext does not respect global config from pyproject.toml )
Notebook were formatted with ruff and reexecuted
New pre-commit added - it check that all the pairs exist and are in sync automatically
Notebook are excluded from .gitignore - we need them for docs

And additionally typing issue in parallel,py is fixed

EBjerrum · 2025-05-06T06:52:57Z

Great to tighten up the workflow and coupling, but from the diffs it seems that a lot of spelling and grammar mistakes are reintroduced to the notebooks. @mieczyslaw, which did you spellcheck, the .py or the .ipynb? I have a suspicion that we are syncing the wrong way and overwriting your fixes with this pull-request?

asiomchen · 2025-05-06T06:58:23Z

Oh, right, I didn't think about that and was hoping that whichever is the newer is a good one (but jupytext behaviour is somewhat still a surprise for me).

I suspect @mieczyslaw fixed the .ipynb files, then we should pull them from main and regenerate *.py

mieczyslaw · 2025-05-06T07:15:20Z

If I remember correctly, I corrected everything I found re speelchecks (you would see that in git logs of my merged branch). In my recent branch I wanted to fix import which seemed to be broken (but I suppose ipynb needs to be regenerated from py, basically worth checking in which of two the import is correct after sanitization in class name change) plus I wanted to change the capitalization of one of the notebooks for consistency. With my branch I am happy to reset to the main after this PR is merged (when all works as expected) and do the required changes again. I suppose I am a bit confused about flow - do people write *py files and generate notebooks from them, or other way round? Do we support both ways, or best to keep it one way.

mieczyslaw · 2025-05-06T07:16:59Z

I don't see in this PR any update to dev section of contributing readme - possibly great to update it to the flow for regeneration is clear.

mieczyslaw · 2025-05-06T19:30:56Z

I looked in the git history of my commit and I corrected both py and ipynb files, but not all ipynb unfortuanately - I trusted PyCharm check. Do you want me to look back into changes to spelling/grammar in Py and make sure they are also present in notebooks, so then we can regenerate py from notebooks using the updated machinery in this PR? I think we need to decide whether py or ipynb is the source of truth.

EBjerrum · 2025-05-08T06:34:01Z

I think we may need a mechanism to sync both ways. I find it easier to write the notebooks as notebooks, but I think the .py files should be the official source as it's easier to track genuine changes to the code there. Thus a command to sync from notebook to .py on a single file basis would be handy, whereas the sync from .py to notebooks and rerunning all could work on all files?

EBjerrum · 2025-05-08T06:34:49Z

Definitely we should add some notes in the dev doc for us self to remember.

mieczyslaw · 2025-05-08T08:28:21Z

It would be great to create those processes as nox (https://nox.thea.codes/en/stable/) sessions (we can add later on tests as well plus we can generate py from ipynb and check diff in the pipelines). This way we would have a run in a separate env (to avoid local additions) and trackable changes to py <-> ipynb generation. Happy to help with setting those up.

asiomchen · 2025-05-08T19:12:57Z

Update, the issue with the previous notebooks was the fact that new versions of jupytext preserves old pairs, after manual recreation it is probably fine. I used .py files as a source of truth for the new pairs

Running make sync-notebooks updates the pairs and if the new notebook is created, then the same command will create corresponding .py file. @EBjerrum this command also works both from notebook to script and vice versa.

Only issue is - notebook 13 looks broken - not sure what is happening and my brain activity is on zero for now

EBjerrum · 2025-05-09T06:09:29Z

Great! I'll find some time in the weekend and I can also look into notebook 13 issue then.

EBjerrum · 2025-05-11T09:44:52Z

I think it looks good, and the fix of notebook 13 follows shortly.

mieczyslaw · 2025-05-12T11:26:22Z

docs/notebooks/04_standardizer.ipynb

   "source": [
    "# Molecule standardization\n",
-    "When building machine learning models of molecules, it is important to standardize the molecules. We often don't want different predictions just because things are drawn in slightly different forms, such as protonated or deprotonated carboxylic acids. Scikit-mol provides a very basic standardize transformer based on the molvs implementation in RDKit"
+    "When building machine learning models of molecules, it is important to standardize the molecules. We often don't want different predictions just because things are drawn in slightly different forms, such as protonated or deprotanted carboxylic acids. Scikit-mol provides a very basic standardize transformer based on the molvs implementation in RDKit"


something to update as a small fix later on

mieczyslaw · 2025-05-12T11:27:47Z

docs/notebooks/scripts/03_example_pipeline.py

 print(f"Test score is  :{pipe.score(mol_list_test, y_test):0.2F}")
 # %% [markdown]
-# Nevermind the performance, or the exact value of the prediction, this is for demonstration purposes. We can easily predict on lists of molecules
+# Nevermind the performance, or the exact value of the prediction, this is for demonstration purpures. We can easily predict on lists of molecules


something to fix later on

Ah, sorry, did we reintroduce a lot of the type fixes?

looks like only the two I found, so easy to fix with the future PR, probably not worth creating a separate PR for that

btw, great improvement @asiomchen

asiomchen requested a review from EBjerrum as a code owner May 5, 2025 20:32

asiomchen added 7 commits May 8, 2025 17:06

update dependencies

b468f27

use py files as the source of truth

da0d755

re-pair notebooks

3daa515

add more info about the sync

8aaaa52

run notebooks

046f22b

sync py files

c070f64

update target Python version to 3.9 in ruff.toml

f3fd021

asiomchen force-pushed the bugfix/notebook-pairs branch from 670e282 to f3fd021 Compare May 8, 2025 16:30

EBjerrum merged commit fe0e97d into EBjerrum:main May 11, 2025
10 checks passed

mieczyslaw reviewed May 12, 2025

View reviewed changes

Conversation

asiomchen commented May 5, 2025

Uh oh!

EBjerrum commented May 6, 2025

Uh oh!

asiomchen commented May 6, 2025

Uh oh!

mieczyslaw commented May 6, 2025

Uh oh!

mieczyslaw commented May 6, 2025

Uh oh!

mieczyslaw commented May 6, 2025

Uh oh!

EBjerrum commented May 8, 2025

Uh oh!

EBjerrum commented May 8, 2025

Uh oh!

mieczyslaw commented May 8, 2025

Uh oh!

asiomchen commented May 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

EBjerrum commented May 9, 2025

Uh oh!

EBjerrum commented May 11, 2025

Uh oh!

Uh oh!

mieczyslaw May 12, 2025

Choose a reason for hiding this comment

Uh oh!

mieczyslaw May 12, 2025

Choose a reason for hiding this comment

Uh oh!

EBjerrum May 12, 2025

Choose a reason for hiding this comment

Uh oh!

mieczyslaw May 12, 2025

Choose a reason for hiding this comment

Uh oh!

mieczyslaw May 12, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

asiomchen commented May 8, 2025 •

edited

Loading