Conversation
The old loop was O(mn). This is O(m+n).
mgedmin
left a comment
There was a problem hiding this comment.
It's hard for me to make decisions about this function since I don't use it myself. I except that for small address sets naive iteration can be faster (and should use less memory), but I don't know what the expected average size of the address set is.
Also the KeyError thing -- perhaps it makes sense, but old behavior was to suppress errors, and changing it this way would probably require a major version bump according to SemVer. Again, I don't know what makes more sense to users, as I'm not one.
| id_to_obj = dict((id(o), o) for o in gc.get_objects()) | ||
|
|
||
| for i in address_set: | ||
| o = id_to_obj[i] |
|
As there is no hard copy, the memory used by the dict should be close to 16 * number of GC objects. GC itself probably consumes as much memory as this. I did not benchmark this in typical use cases (I only ran objgraph on small cases), though I can imagine it is difficult to find cases where O(m+n) is slower than O(mn). For a stress test, one can probably collect new IDs with debug level of |
mgedmin
left a comment
There was a problem hiding this comment.
Eh, LGTM.
Would you care to add a small changelog note in CHANGES.rst?
| for o in gc.get_objects(): | ||
| if id(o) in address_set: | ||
| res.append(o) | ||
| id_to_obj = dict((id(o), o) for o in gc.get_objects()) |
There was a problem hiding this comment.
I suggest
id_to_obj = {id(o): o for o in gc.get_objects()}
| res.append(o) | ||
| id_to_obj = dict((id(o), o) for o in gc.get_objects()) | ||
|
|
||
| for i in address_set: |
There was a problem hiding this comment.
I suggest change these lines to
return [
id_to_obj[i]
for i in address_set
if i in id_to_obj
]There was a problem hiding this comment.
👍 except I'd put it all on a single line
The old loop was O(mn). This is O(m+n).