[FEA]: Define new protocol(s) defining the size of an object in GPU memory #646

@vyasr

Is this a duplicate?

Area

cuda.core

Is your feature request related to a problem? Please describe.

This issue is adapted from rapidsai/cudf#9587.

Python's sys module provides the sys.getsizeof function to determine the size of a Python object. This function is not recursive: given a collection like a list, it will not include the memory of each element in the list, so it is only designed to work on a single object at a time. That is a reasonable choice, since the recursive query isn't always well-defined with a single answer (e.g. if the underlying objects have overlapping memory ranges). The behavior of getsizeof when applied to a user-defined class may be customized by overriding the __sizeof__ method.
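For reference, here is a minimal sketch of the host-side behavior described above. The `Packet` class is a made-up example, and exact byte counts are CPython implementation details, so the code only compares sizes rather than printing them.

```python
import sys


class Packet:
    """Toy class (hypothetical) that owns a bytes payload."""

    def __init__(self, payload: bytes):
        self.payload = payload

    def __sizeof__(self) -> int:
        # Report the base object size plus the payload this instance owns.
        return object.__sizeof__(self) + sys.getsizeof(self.payload)


small = Packet(b"x" * 10)
large = Packet(b"x" * 10_000)

# The __sizeof__ override is what sys.getsizeof reports for each instance.
size_diff = sys.getsizeof(large) - sys.getsizeof(small)

# getsizeof is shallow: a list's size covers its own storage (the pointers),
# not the objects it refers to, so these two values are equal.
list_with_small = sys.getsizeof([small])
list_with_large = sys.getsizeof([large])
```

Note that sys.getsizeof also adds garbage-collector overhead on top of what __sizeof__ returns for GC-tracked objects, which is one reason to treat the absolute numbers as approximate.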

Currently, there is no equivalent method for objects backed by GPU memory. CUDA memory is also more complex than host memory in that there are multiple types of memory that an object may be allocated from, such as managed or pinned memory. Various higher-level Python libraries that leverage GPU libraries under the hood would benefit from a standardized approach to querying an object's total GPU memory allocation.

Describe the solution you'd like

It would be nice to define a standard protocol like __cuda_sizeof__ that Python objects could implement to indicate how much GPU memory they use. Ideally, the protocol would return something like a dictionary or a dataclass that could indicate memory usage by type (managed, pinned, etc).

To fully satisfy this need, we will also need to think about how this protocol should behave when one object views a subset of the data owned by another object. For example, what would be the expected behavior for slices? Another case to consider is noncontiguous memory, such as a strided view of an array. In some cases the caller may want to know the total memory of the underlying allocation, while in others the caller may really want to know how much new memory would be allocated by an elementwise copy. We could support both of these using separate protocols, or by using a parametrized protocol. We can also look to existing __sizeof__ implementations on the CPU for prior art.
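To make the view question concrete, here is a rough sketch of what a parametrized protocol could look like. Everything in it is an assumption for illustration: the `CudaMemoryUsage` dataclass, the `allocation` keyword, and the `FakeDeviceArray` stand-in (which allocates no GPU memory) are hypothetical names, not an existing cuda.core API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CudaMemoryUsage:
    """Hypothetical result type: bytes used, broken down by memory kind."""

    device: int = 0
    managed: int = 0
    pinned: int = 0

    @property
    def total(self) -> int:
        return self.device + self.managed + self.pinned


class FakeDeviceArray:
    """Stand-in for a GPU array; only tracks element counts, no real memory."""

    def __init__(self, n_items: int, itemsize: int, base=None):
        self.n_items = n_items
        self.itemsize = itemsize
        self.base = base  # the owning array if this object is a view

    def __getitem__(self, key: slice) -> "FakeDeviceArray":
        # A slice yields a (possibly strided) view that shares the owner's data.
        n = len(range(*key.indices(self.n_items)))
        owner = self.base if self.base is not None else self
        return FakeDeviceArray(n, self.itemsize, base=owner)

    def __cuda_sizeof__(self, *, allocation: bool = False) -> CudaMemoryUsage:
        if allocation:
            # Size of the whole underlying allocation that this object keeps alive.
            owner = self.base if self.base is not None else self
            return CudaMemoryUsage(device=owner.n_items * owner.itemsize)
        # Size an elementwise copy of just this object's elements would need.
        return CudaMemoryUsage(device=self.n_items * self.itemsize)


arr = FakeDeviceArray(1000, itemsize=8)
view = arr[::4]  # strided view over 250 of the 1000 elements

copy_size = view.__cuda_sizeof__().total                  # 250 * 8 = 2000
alloc_size = view.__cuda_sizeof__(allocation=True).total  # 1000 * 8 = 8000
```

The same two answers could instead come from two separate protocols; the keyword argument is just one way to express the choice.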

We would then provide a function cuda.core.getsizeof that would be the canonical implementation of how to use this protocol.
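As a rough illustration of what such a function might do, the sketch below mirrors the shape of sys.getsizeof(obj, default). It assumes the protocol returns a plain dict mapping a memory kind to a byte count; the function body, the dict layout, and the `Buffer` class are all hypothetical, not an actual cuda.core implementation.

```python
_MISSING = object()  # sentinel so that None can be passed as a default


def getsizeof(obj, default=_MISSING):
    """Return obj's GPU memory usage via the (hypothetical) __cuda_sizeof__ protocol."""
    # Look the method up on the type, as Python does for special methods.
    method = getattr(type(obj), "__cuda_sizeof__", None)
    if method is None:
        if default is not _MISSING:
            return default
        raise TypeError(
            f"{type(obj).__name__!r} object does not implement __cuda_sizeof__"
        )
    usage = method(obj)
    # Validate the result: memory kind -> non-negative integer byte count.
    if not isinstance(usage, dict) or not all(
        isinstance(v, int) and v >= 0 for v in usage.values()
    ):
        raise TypeError("__cuda_sizeof__ must return a dict of non-negative ints")
    return usage


class Buffer:
    """Hypothetical object reporting its usage by memory kind."""

    def __init__(self, nbytes: int, kind: str = "device"):
        self.nbytes = nbytes
        self.kind = kind

    def __cuda_sizeof__(self) -> dict:
        return {self.kind: self.nbytes}


usage = getsizeof(Buffer(4096, kind="managed"))

# Not recursive: a list of buffers does not implement the protocol itself,
# so the caller gets the default (or a TypeError when no default is given).
fallback = getsizeof([Buffer(1), Buffer(2)], default={})
```

Like sys.getsizeof, this looks only at the object it is given, so a caller who wants a total over a container has to decide how to combine the per-object results.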

I think the recursive case remains out of scope.

Describe alternatives you've considered

No response

Additional context

No response

Labels

cuda.core: Everything related to the cuda.core module
feature: New feature or request
