Conversation
|
Great to tighten up the workflow and coupling, but from the diffs it seems that a lot of spelling and grammar mistakes are reintroduced to the notebooks. @mieczyslaw, which did you spellcheck, the .py or the .ipynb? I have a suspicion that we are syncing the wrong way and overwriting your fixes with this pull-request? |
|
Oh, right, I didn't think about that and was hoping that whichever is the newer is a good one (but jupytext behaviour is somewhat still a surprise for me). I suspect @mieczyslaw fixed the .ipynb files, then we should pull them from main and regenerate *.py |
|
If I remember correctly, I corrected everything I found re speelchecks (you would see that in git logs of my merged branch). In my recent branch I wanted to fix import which seemed to be broken (but I suppose ipynb needs to be regenerated from py, basically worth checking in which of two the import is correct after sanitization in class name change) plus I wanted to change the capitalization of one of the notebooks for consistency. With my branch I am happy to reset to the main after this PR is merged (when all works as expected) and do the required changes again. I suppose I am a bit confused about flow - do people write *py files and generate notebooks from them, or other way round? Do we support both ways, or best to keep it one way. |
|
I don't see in this PR any update to dev section of contributing readme - possibly great to update it to the flow for regeneration is clear. |
|
I looked in the git history of my commit and I corrected both py and ipynb files, but not all ipynb unfortuanately - I trusted PyCharm check. Do you want me to look back into changes to spelling/grammar in Py and make sure they are also present in notebooks, so then we can regenerate py from notebooks using the updated machinery in this PR? I think we need to decide whether py or ipynb is the source of truth. |
|
I think we may need a mechanism to sync both ways. I find it easier to write the notebooks as notebooks, but I think the .py files should be the official source as it's easier to track genuine changes to the code there. Thus a command to sync from notebook to .py on a single file basis would be handy, whereas the sync from .py to notebooks and rerunning all could work on all files? |
|
Definitely we should add some notes in the dev doc for us self to remember. |
|
It would be great to create those processes as nox (https://nox.thea.codes/en/stable/) sessions (we can add later on tests as well plus we can generate py from ipynb and check diff in the pipelines). This way we would have a run in a separate env (to avoid local additions) and trackable changes to py <-> ipynb generation. Happy to help with setting those up. |
670e282 to
f3fd021
Compare
|
Update, the issue with the previous notebooks was the fact that new versions of jupytext preserves old pairs, after manual recreation it is probably fine. I used .py files as a source of truth for the new pairs Running Only issue is - notebook 13 looks broken - not sure what is happening and my brain activity is on zero for now |
|
Great! I'll find some time in the weekend and I can also look into notebook 13 issue then. |
|
I think it looks good, and the fix of notebook 13 follows shortly. |
| "source": [ | ||
| "# Molecule standardization\n", | ||
| "When building machine learning models of molecules, it is important to standardize the molecules. We often don't want different predictions just because things are drawn in slightly different forms, such as protonated or deprotonated carboxylic acids. Scikit-mol provides a very basic standardize transformer based on the molvs implementation in RDKit" | ||
| "When building machine learning models of molecules, it is important to standardize the molecules. We often don't want different predictions just because things are drawn in slightly different forms, such as protonated or deprotanted carboxylic acids. Scikit-mol provides a very basic standardize transformer based on the molvs implementation in RDKit" |
There was a problem hiding this comment.
something to update as a small fix later on
| print(f"Test score is :{pipe.score(mol_list_test, y_test):0.2F}") | ||
| # %% [markdown] | ||
| # Nevermind the performance, or the exact value of the prediction, this is for demonstration purposes. We can easily predict on lists of molecules | ||
| # Nevermind the performance, or the exact value of the prediction, this is for demonstration purpures. We can easily predict on lists of molecules |
There was a problem hiding this comment.
something to fix later on
There was a problem hiding this comment.
Ah, sorry, did we reintroduce a lot of the type fixes?
There was a problem hiding this comment.
looks like only the two I found, so easy to fix with the future PR, probably not worth creating a separate PR for that
So the notebook pairs were broken in many ways and this PR aims to solve that
Makefilecommand fixed to works with multifolder setup (somehowjupytextdoes not respect global config frompyproject.toml)pre-commitadded - it check that all the pairs exist and are in sync automatically.gitignore- we need them for docsAnd additionally typing issue in
parallel,pyis fixed