Speed up at_addrs with a dict. by rainwoodman · Pull Request #38 · mgedmin/objgraph

rainwoodman · 2018-07-05T18:40:54Z

The old loop was O(mn). This is O(m+n).
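For readers following along, here is a minimal sketch of the two strategies being compared, reading m as the number of requested addresses and n as the number of GC-tracked objects (that reading is an assumption). The function names `at_addrs_scan` and `at_addrs_dict` and the `Node` class are hypothetical, chosen for illustration, and are not objgraph's code. Note that the scan is only O(mn) when `address_set` is a list or other sequence; with a real `set`, each membership test is O(1) on average.

```python
import gc


def at_addrs_scan(address_set):
    """Old strategy: walk every GC-tracked object and test membership."""
    res = []
    for o in gc.get_objects():
        # O(m) per test if address_set is a list, O(1) average if it is a set
        if id(o) in address_set:
            res.append(o)
    return res


def at_addrs_dict(address_set):
    """New strategy: index all tracked objects once, then look up each address."""
    id_to_obj = dict((id(o), o) for o in gc.get_objects())  # O(n) to build
    res = []
    for i in address_set:  # O(m) lookups
        if i in id_to_obj:  # skip addresses with no tracked object
            res.append(id_to_obj[i])
    return res


class Node:
    """Instances of ordinary classes are tracked by the garbage collector."""


nodes = [Node() for _ in range(5)]
wanted = [id(n) for n in nodes[:3]]

by_scan = at_addrs_scan(wanted)
by_dict = at_addrs_dict(wanted)
```

Both calls find the same three objects; the dict variant returns them in the iteration order of `wanted`, while the scan returns them in whatever order the collector lists them.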

mgedmin

It's hard for me to make decisions about this function since I don't use it myself. I expect that for small address sets naive iteration can be faster (and should use less memory), but I don't know what the expected average size of the address set is.

Also the KeyError thing -- perhaps it makes sense, but old behavior was to suppress errors, and changing it this way would probably require a major version bump according to SemVer. Again, I don't know what makes more sense to users, as I'm not one.

mgedmin · 2018-07-09T19:25:17Z

objgraph.py

+    id_to_obj = dict((id(o), o) for o in gc.get_objects())
+
+    for i in address_set:
+        o = id_to_obj[i]


This can raise KeyError.
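A small sketch of when the lookup fails, under the assumption that the address belongs to an object the collector does not track (CPython does not track ints, for example). The variable names are illustrative only.

```python
import gc

# A large int: alive, but not tracked by the garbage collector in CPython.
untracked = 12345678901234567890
addr = id(untracked)

id_to_obj = dict((id(o), o) for o in gc.get_objects())

try:
    id_to_obj[addr]  # plain indexing raises for unknown addresses
    raised = False
except KeyError:
    raised = True

# dict.get() returns None instead, which keeps the old "suppress errors" behavior
found = id_to_obj.get(addr)
```

The same failure would occur for a stale address whose object has already been freed, as long as no tracked object has since been allocated at that address.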

rainwoodman · 2018-07-09T20:02:16Z

Since the dict only stores references and does not copy the objects, the memory it uses should be close to 16 * number of GC objects. The GC itself probably consumes about as much memory as this.

I did not benchmark this in typical use cases (I only ran objgraph on small cases), though I can imagine it is difficult to find cases where O(m+n) is slower than O(mn).
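One way to check this locally is a quick `timeit` comparison. The sketch below is an illustration, not a measurement from this PR: the function names, the `Probe` class, and the sample size of 200 are all assumptions, and the timings depend heavily on how many objects the process has alive.

```python
import gc
import timeit


def at_addrs_scan(addresses):
    # Naive strategy: one pass over all tracked objects with a membership test
    return [o for o in gc.get_objects() if id(o) in addresses]


def at_addrs_dict(addresses):
    # Dict strategy: build an id -> object index, then look up each address
    id_to_obj = {id(o): o for o in gc.get_objects()}
    return [id_to_obj[i] for i in addresses if i in id_to_obj]


class Probe:
    """Plain instances are GC-tracked, so every sampled id can be found."""


probes = [Probe() for _ in range(200)]  # kept alive for the whole run
sample = [id(p) for p in probes]

# Time both strategies with the addresses given as a list and as a set.
for label, addrs in (("list", sample), ("set", set(sample))):
    t_scan = timeit.timeit(lambda: at_addrs_scan(addrs), number=5)
    t_dict = timeit.timeit(lambda: at_addrs_dict(addrs), number=5)
    print(f"{label:>4}: scan {t_scan:.4f}s  dict {t_dict:.4f}s")
```

Passing the addresses as a `set` is worth including because it removes the O(m) membership cost from the naive loop, which is the case where the two strategies should be closest.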

For a stress test, one can probably collect new IDs with debug level of SAVEALL, and run this on the full id list.

mgedmin

Eh, LGTM.

Would you care to add a small changelog note in CHANGES.rst?

klahnakoski · 2018-08-29T12:39:56Z

objgraph.py

-    for o in gc.get_objects():
-        if id(o) in address_set:
-            res.append(o)
+    id_to_obj = dict((id(o), o) for o in gc.get_objects())


I suggest

id_to_obj = {id(o): o for o in gc.get_objects()}

klahnakoski · 2018-08-29T12:43:03Z

objgraph.py

-            res.append(o)
+    id_to_obj = dict((id(o), o) for o in gc.get_objects())
+
+    for i in address_set:


I suggest changing these lines to

return [ id_to_obj[i] for i in address_set if i in id_to_obj ]

👍 except I'd put it all on a single line
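Putting the two suggestions together gives roughly the following. This is a sketch of the reviewed idea, not the code that was merged; the `Probe` class and the sample addresses are made up for illustration.

```python
import gc


def at_addrs(address_set):
    """Return the GC-tracked objects whose id() appears in address_set."""
    id_to_obj = {id(o): o for o in gc.get_objects()}
    # Addresses with no tracked object behind them are silently skipped
    return [id_to_obj[i] for i in address_set if i in id_to_obj]


class Probe:
    """Plain instances are tracked by the garbage collector."""


a, b = Probe(), Probe()
untracked = 3.14          # floats are not GC-tracked in CPython
missing = id(untracked)   # so this address has no entry in the index

result = at_addrs([id(b), missing, id(a), id(b)])
```

Two behavioral differences from the original loop are visible here: the result follows the order of the input addresses rather than the collector's order, and an address that appears twice in the input yields its object twice. Whether either matters depends on how callers use the function.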

rainwoodman added 2 commits July 5, 2018 11:40

Speed up at_addrs with a dict.

09cc5c0

The old loop was O(mn). This is O(m+n).

flake

21c7cb7

mgedmin reviewed Jul 9, 2018


Ignore addresses not backed by objects.

7f5e2e3

lint

775a796

mgedmin reviewed Jul 17, 2018


klahnakoski reviewed Aug 29, 2018


mgedmin added the waiting-for-updated-pr label Oct 29, 2019

rainwoodman wants to merge 4 commits into mgedmin:master from rainwoodman:patch-1


3 participants
