[FEA]: Define new protocol(s) defining the size of an object in GPU memory

### Is this a duplicate?

- [x] I confirmed there appear to be no [duplicate issues](https://github.com/NVIDIA/cuda-python/issues) for this request and that I agree to the [Code of Conduct](CODE_OF_CONDUCT.md)

### Area

cuda.core

### Is your feature request related to a problem? Please describe.

This issue is adapted from https://github.com/rapidsai/cudf/issues/9587.

Python's `sys` module provides the `sys.getsizeof` function to determine the size of a Python object. The function is not recursive: given a collection such as a `list`, it does not include the memory of each element. That is a reasonable choice, since the recursive query does not always have a single well-defined answer (e.g. if the underlying objects have overlapping memory ranges), so `getsizeof` is designed to work on one object at a time. A user-defined class can customize the result of `getsizeof` by defining a `__sizeof__` method.
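For reference, here is a small host-side illustration of the two behaviors described above. The `Wrapper` class is a made-up example, not part of any library; the snippet only relies on documented CPython behavior of `sys.getsizeof` and `__sizeof__`.

```python
import sys


class Wrapper:
    """Toy class (hypothetical) that owns a bytearray payload."""

    def __init__(self, nbytes):
        self._payload = bytearray(nbytes)

    def __sizeof__(self):
        # Report the object's own size plus the payload it owns.
        return object.__sizeof__(self) + sys.getsizeof(self._payload)


small = Wrapper(0)
large = Wrapper(1_000_000)

# sys.getsizeof consults __sizeof__, so the payload is reflected in the result.
grew = sys.getsizeof(large) - sys.getsizeof(small)

# sys.getsizeof is shallow: a list's size depends on its pointer slots,
# not on the size of the elements those slots refer to.
list_of_small = [bytearray(1) for _ in range(10)]
list_of_large = [bytearray(10_000) for _ in range(10)]
shallow_equal = sys.getsizeof(list_of_small) == sys.getsizeof(list_of_large)
```

Here `grew` is at least one million bytes, while `shallow_equal` is `True` even though the two lists reference very different amounts of memory.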

Currently, there is no equivalent mechanism for objects backed by GPU memory. CUDA memory is also more complex than host memory in that there are multiple types of memory that an object may be allocated from, such as managed or pinned memory. Various higher-level Python libraries that leverage GPU libraries under the hood would benefit from a standardized way to query how much GPU memory an object has allocated.



### Describe the solution you'd like

It would be nice to define a standard protocol like `__cuda_sizeof__` that Python objects could implement to indicate how much GPU memory they use. Ideally, the protocol would return something like a dictionary or a dataclass that could indicate memory usage by type (managed, pinned, etc.).

To fully satisfy this need, we will also need to think about how this protocol should behave when one object views a subset of the data owned by another object. For example, what would be the expected behavior for slices? Another case to consider is noncontiguous memory, such as a strided view of an array. In some cases the caller may want to know the total memory of the underlying allocation, while in others the caller may really want to know how much new memory an elementwise copy would allocate. We could support both by using separate protocols or by using a parametrized protocol. We can also look to existing `__sizeof__` implementations on the CPU for prior art.
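To make the discussion concrete, here is a rough sketch of the parametrized variant. Everything in it is an assumption for illustration: `SizeMode`, `CudaMemoryUsage`, `FakeDeviceArray`, and the `mode` parameter are hypothetical names, no GPU memory is actually allocated, and none of this reflects an existing `cuda.core` API.

```python
from dataclasses import dataclass
from enum import Enum


class SizeMode(Enum):
    """Hypothetical selector for the two questions a caller might ask."""

    ALLOCATION = "allocation"  # bytes of the whole underlying allocation
    COPY = "copy"              # bytes a compact elementwise copy would need


@dataclass(frozen=True)
class CudaMemoryUsage:
    """Hypothetical result type: bytes used, broken down by memory kind."""

    device: int = 0
    managed: int = 0
    pinned: int = 0

    @property
    def total(self):
        return self.device + self.managed + self.pinned


class FakeDeviceArray:
    """Stand-in for a 1-D device array; only bookkeeping, no real allocation."""

    def __init__(self, n_items, itemsize, *, base_items=None):
        self.n_items = n_items
        self.itemsize = itemsize
        # Number of items in the allocation this object owns or views.
        self.base_items = n_items if base_items is None else base_items

    def __getitem__(self, key):
        # Only slices are supported in this sketch; the result is a view.
        start, stop, step = key.indices(self.n_items)
        n_selected = len(range(start, stop, step))
        return FakeDeviceArray(n_selected, self.itemsize, base_items=self.base_items)

    def __cuda_sizeof__(self, mode=SizeMode.ALLOCATION):
        items = self.base_items if mode is SizeMode.ALLOCATION else self.n_items
        return CudaMemoryUsage(device=items * self.itemsize)


arr = FakeDeviceArray(1000, itemsize=4)
view = arr[::4]  # strided view of 250 elements

allocation_bytes = view.__cuda_sizeof__(SizeMode.ALLOCATION).total
copy_bytes = view.__cuda_sizeof__(SizeMode.COPY).total
```

With these assumptions, the strided view reports 4000 bytes for the underlying allocation and 1000 bytes for a compact copy. The same two answers could instead come from two separate dunder methods; the single parametrized method is only one of the options mentioned above.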

We would then provide a function `cuda.core.getsizeof` that would be the canonical implementation of how to use this protocol.
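As a starting point for discussion, a minimal sketch of such a function might look like the following. It mirrors the `sys.getsizeof(obj, default)` signature; the function body, the `_MISSING` sentinel, and the `Buf` class are all assumptions, and the protocol is shown returning a plain dictionary purely for brevity.

```python
_MISSING = object()  # sentinel so that None can be passed as a real default


def getsizeof(obj, default=_MISSING):
    """Hypothetical sketch of a `cuda.core.getsizeof`-style helper.

    Looks up `__cuda_sizeof__` on the type (as Python does for other
    special methods) and returns whatever the protocol returns.
    """
    method = getattr(type(obj), "__cuda_sizeof__", None)
    if method is None:
        if default is not _MISSING:
            return default
        raise TypeError(
            f"{type(obj).__name__!r} object does not implement __cuda_sizeof__"
        )
    return method(obj)


class Buf:
    """Toy object (hypothetical) that claims to hold 1 KiB of device memory."""

    def __cuda_sizeof__(self):
        return {"device": 1024}


usage = getsizeof(Buf())
fallback = getsizeof(object(), default={})
```

Like `sys.getsizeof`, this sketch is shallow: it asks only the object it is given and does not recurse into containers.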

I think the recursive case remains out of scope.


### Describe alternatives you've considered

_No response_

### Additional context

_No response_

[FEA]: Define new protocol(s) defining the size of an object in GPU memory #646
