Is this a duplicate?
Area
cuda.core
Is your feature request related to a problem? Please describe.
This issue is adapted from rapidsai/cudf#9587.
Python's sys module provides the sys.getsizeof function to determine the size of a Python object. The function is not recursive: given a collection such as a list, it does not include the memory of each element. That is a reasonable choice, since the recursive query does not always have a single well-defined answer (for example, when the underlying objects have overlapping memory ranges), so the function is designed to work on one object at a time. The behavior of getsizeof for a user-defined class may be customized by defining the __sizeof__ method.
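For reference, the host-side behavior described above can be seen with a small sketch. The HostBuffer class and the wrap helper are made up for illustration; only sys.getsizeof and __sizeof__ are real Python features.

```python
import sys


def wrap(obj):
    """Return a one-element list holding obj."""
    return [obj]


small = wrap(b"")
large = wrap(bytes(1_000_000))

# Both lists hold a single reference, so they report the same size
# even though one element is a megabyte larger than the other.
same_size = sys.getsizeof(small) == sys.getsizeof(large)


class HostBuffer:
    """Hypothetical class that owns a bytearray and reports it via __sizeof__."""

    def __init__(self, nbytes):
        self._data = bytearray(nbytes)

    def __sizeof__(self):
        # Base object size plus the payload this object owns.
        return super().__sizeof__() + sys.getsizeof(self._data)


buf = HostBuffer(4096)
reported = sys.getsizeof(buf)  # includes the 4096-byte payload
```

Note that sys.getsizeof may add garbage-collector overhead on top of what __sizeof__ returns, so the two values are not necessarily equal.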
Currently, there is no equivalent method for objects backed by GPU memory. CUDA memory is also more complex than host memory in that an object may be allocated from several types of memory, such as managed or pinned memory. Higher-level Python libraries that build on GPU libraries would benefit from a standardized way to query how much GPU memory an object has allocated.
Describe the solution you'd like
It would be nice to define a standard protocol like __cuda_sizeof__ that Python objects could implement to indicate how much GPU memory they use. Ideally, the protocol would return something like a dictionary or a dataclass that reports memory usage by type (managed, pinned, etc.). To fully satisfy this need, we will also need to decide how the protocol should behave when one object views a subset of the data owned by another object. For example, what would be the expected behavior for slices? Another case to consider is noncontiguous memory, such as a strided view of an array. In some cases the caller may want to know the total memory of the underlying allocation, while in others the caller may want to know how much new memory an elementwise copy would allocate. We could support both using separate protocols or a single parametrized protocol. We can also look to existing __sizeof__ implementations on the CPU for prior art.
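One possible shape for the parametrized variant, as a sketch only: the names CudaMemoryUsage, SizeofMode, and StridedView, and the mode parameter itself, are hypothetical and do not exist in cuda.core. The example uses plain Python bookkeeping (no GPU calls) to show how a strided view could report either the size of the allocation it references or the size an elementwise copy would need.

```python
from dataclasses import dataclass
from enum import Enum


class SizeofMode(Enum):
    """Hypothetical parameter selecting which question the caller is asking."""
    ALLOCATION = "allocation"  # bytes in the underlying allocation
    COPY = "copy"              # bytes an elementwise copy would allocate


@dataclass(frozen=True)
class CudaMemoryUsage:
    """Hypothetical return type: bytes used, broken down by memory type."""
    device: int = 0
    managed: int = 0
    pinned: int = 0

    @property
    def total(self) -> int:
        return self.device + self.managed + self.pinned


class StridedView:
    """Hypothetical 1-D view over a device allocation (bookkeeping only)."""

    def __init__(self, allocation_nbytes: int, itemsize: int, step: int = 1):
        self.allocation_nbytes = allocation_nbytes
        self.itemsize = itemsize
        self.step = step

    def __len__(self) -> int:
        n_items = self.allocation_nbytes // self.itemsize
        # Number of elements visited by range(0, n_items, step).
        return len(range(0, n_items, self.step))

    def __cuda_sizeof__(self, mode: SizeofMode = SizeofMode.ALLOCATION) -> CudaMemoryUsage:
        if mode is SizeofMode.COPY:
            # Only the elements the view actually selects would be copied.
            return CudaMemoryUsage(device=len(self) * self.itemsize)
        return CudaMemoryUsage(device=self.allocation_nbytes)


view = StridedView(allocation_nbytes=1024, itemsize=8, step=4)
whole = view.__cuda_sizeof__()                  # device=1024
copied = view.__cuda_sizeof__(SizeofMode.COPY)  # device=256 (32 elements * 8 bytes)
```

A dataclass keeps the per-type breakdown explicit while still allowing a single total; a plain dictionary would work equally well for this purpose.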
We would then provide a function cuda.core.getsizeof that would be the canonical implementation of how to use this protocol.
I think the recursive case remains out of scope.
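To make the intended usage concrete, here is a minimal sketch of what such a function might look like, assuming the protocol takes no arguments and returns a dictionary keyed by memory type. The getsizeof function below is a stand-in for the proposed cuda.core.getsizeof, which does not exist yet, and DeviceArray is a hypothetical class; the default argument mirrors the one accepted by sys.getsizeof.

```python
def getsizeof(obj, default=None):
    """Hypothetical stand-in for the proposed cuda.core.getsizeof."""
    # Look the method up on the type, as Python does for special methods.
    method = getattr(type(obj), "__cuda_sizeof__", None)
    if method is None:
        if default is not None:
            return default
        raise TypeError(f"{type(obj).__name__} does not implement __cuda_sizeof__")
    return method(obj)


class DeviceArray:
    """Hypothetical object that records how many device bytes it owns."""

    def __init__(self, nbytes):
        self.nbytes = nbytes

    def __cuda_sizeof__(self):
        return {"device": self.nbytes, "managed": 0, "pinned": 0}


usage = getsizeof(DeviceArray(2048))
fallback = getsizeof(object(), default={"device": 0, "managed": 0, "pinned": 0})
# Like sys.getsizeof, this sketch is not recursive: a list of DeviceArray
# objects does not implement the protocol, so its elements are not summed.
```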
Describe alternatives you've considered
No response
Additional context
No response