-
Notifications
You must be signed in to change notification settings - Fork 12
Swap to using 'get_node_ids' that may be sharded or fetch a split, etc #467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
/unit_test_py |
|
/integration_test |
|
/e2e_test |
GiGL Automation@ 17:56:25UTC : 🔄 @ 19:10:59UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 17:56:36UTC : 🔄 |
GiGL Automation@ 17:56:55UTC : 🔄 @ 19:17:55UTC : ✅ Workflow completed successfully. |
|
/unit_test_py |
|
/integration_test |
|
/e2e_test |
GiGL Automation@ 23:31:16UTC : 🔄 @ 24:39:48UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 23:31:19UTC : 🔄 |
GiGL Automation@ 23:31:21UTC : 🔄 @ 24:55:47UTC : ✅ Workflow completed successfully. |
|
/unit_test_py |
|
/integration_test |
GiGL Automation@ 23:32:48UTC : 🔄 @ 24:51:43UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 23:32:52UTC : 🔄 |
|
/unit_test_py |
|
/integration_test |
|
/e2e_test |
GiGL Automation@ 18:21:41UTC : 🔄 @ 19:40:00UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 18:21:49UTC : 🔄 @ 19:43:14UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 18:21:52UTC : 🔄 |
mkolodner-sc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kyle! Left a few small comments, generally LGTM.
In the future, it might be easier to review this if the testing utility changes were moved to a separate PR from the get_node_ids change here.
| # ============================================================================= | ||
|
|
||
| USER: Final[NodeType] = NodeType("user") | ||
| STORY: Final[NodeType] = NodeType("story") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there was a comment prior that we should prefer item type to story to be more generic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
man I can update but is there really no place for fun and whimsy in the world...
| def create_homogeneous_dataset( | ||
| edge_index: torch.Tensor, | ||
| node_features: Optional[torch.Tensor] = None, | ||
| node_feature_dim: int = DEFAULT_HOMOGENEOUS_NODE_FEATURE_DIM, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should node feature dim be inferred from node features?
I see that right now, if it's None, we are creating an empty tensor. I feel like if it's None, this should mean my dataset has no NodeFeatures, and if we want to indicate there's no node features on the machine, an empty tensor should be provided as input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do you think it's worth adding an arg here for edge features while we are making this, as well as node_labels to be similar to the below function?
| node_features: Mapping of NodeType -> feature tensor [num_nodes, feature_dim]. | ||
| If None, creates zero features with the specified dimension. | ||
| node_labels: Mapping of NodeType -> label tensor [num_nodes, 1]. | ||
| If None, creates labels equal to node indices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reusability in other tests, I feel like the None behavior again should be to default to no labels/features, not to auto-populate them.
| return dataset | ||
|
|
||
|
|
||
| def create_heterogeneous_dataset_with_labels( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth specifying in function header that this is ABLP labels, not node classification
Scope of work done
Per discussion in #438, migrating to some
get_node_idswhich is more flexible and can fetch a split and optionally shard, etc.Additionally, breaking out the dataset building utils to their own (tested) file,
tests/test_assets/distributed/test_dataset.pyand then creatingremote_dist_dataset_test.pyas unit tests for RDI.Where is the documentation for this feature?: N/A
Did you add automated tests or write a test plan?
Updated Changelog.md? NO
Ready for code review?: NO