-
Notifications
You must be signed in to change notification settings - Fork 4.1k
GH-34882: [Python] Binding for FixedShapeTensorType #34883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 7 commits
d69685a
a6292f8
1bdba1d
7c395b0
d27d48f
8e790b4
64e0cd0
48cbeb3
d9ca165
d3530af
e2ce8ba
ee5d25c
f5a5c0c
52f9e7e
f9dee9e
b171d00
c0ec94c
f2d9fe7
8b5dc93
570f086
3dbbe20
223968a
dd8fd31
1ebb829
b2d0453
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3090,6 +3090,69 @@ cdef class ExtensionArray(Array): | |
| return self.storage.to_numpy(**kwargs) | ||
|
|
||
|
|
||
| class FixedShapeTensorArray(ExtensionArray): | ||
| """ | ||
| Concrete class for fixed shape tensor extension arrays. | ||
|
|
||
| Examples | ||
| -------- | ||
| Define the extension type for tensor array | ||
|
|
||
| >>> import pyarrow as pa | ||
| >>> tensor_type = FixedShapeTensorType(pa.int32(), [2, 2]) | ||
|
|
||
| Create an extension array | ||
|
|
||
| >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]] | ||
| >>> storage = pa.array(arr, pa.list_(pa.int32(), 4)) | ||
| >>> pa.ExtensionArray.from_storage(tensor_type, storage) | ||
| <pyarrow.lib.FixedShapeTensorArray object at ...> | ||
| [ | ||
| [ | ||
| 1, | ||
| 2, | ||
| 3, | ||
| 4 | ||
| ], | ||
| [ | ||
| 10, | ||
| 20, | ||
| 30, | ||
| 40 | ||
| ], | ||
| [ | ||
| 100, | ||
| 200, | ||
| 300, | ||
| 400 | ||
| ] | ||
| ] | ||
| """ | ||
|
|
||
| def to_numpy_ndarray(self): | ||
| """ | ||
| Convert fixed shape tensor extension array to a numpy array (with dim+1). | ||
| """ | ||
|
AlenkaF marked this conversation as resolved.
|
||
| np_flat = np.asarray(self.storage.values) | ||
| numpy_tensor = np_flat.reshape((len(self),) + tuple(self.type.shape), | ||
| order='C') | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does reshape incur a copy? Perhaps we should throw if
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It doesn't guarantee to be zero-copy in all cases, but in this simple case of reshaping a 1D array to an nD array, this is always zero copy AFAIK
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What happens in cases where permutation doesn't give C style layout?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A permutation would happen after the reshape, I think? (so the initial reshape is always C contiguous) And that is left to the user for now in this method (which we should document, or add)
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, but permutation would not roundtrip to numpy object and back. I'd prefer to throw NotImplemented if permutation is not trivial (C).
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, actually I need to error if
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
No, the
Yes, and as mentioned before, I think we should certainly add examples how the user can do this (if we decide to not automatically do it in
I wouldn't raise a warning for that, since it's a fact of life that numpy arrays don't have dimension names, so that's just a consequence of calling this method. We can mention that in the docstring, though, to be explicit.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, thank you for confirming the thought I was currently struggling with!
Done: https://github.com/apache/arrow/pull/34957/files
Agree, will make it explicit in the docstrings.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After couple of days to think it over, I would agree with all the comments from Joris at the beginning of the thread.
With this I propose to leave the code as is and maybe open an issue later to discuss changes/features further. I have also added a PR for the documentation of the
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
While that's true for the physical layout it's not necessarily true for the logical layout. If permutation is non-trivial tenor will not be laid out in memory in C-contiguous way. I propose (#34883 (review)) we block converting tensors with non-trivial permutations and add correct logic to handle those later. (Another option is to go via
AlenkaF marked this conversation as resolved.
Outdated
|
||
|
|
||
| return numpy_tensor | ||
|
|
||
| def from_numpy_ndarray(obj): | ||
|
AlenkaF marked this conversation as resolved.
|
||
| """ | ||
| Convert numpy tensors (ndarrays) to a fixed shape tensor extension array. | ||
|
AlenkaF marked this conversation as resolved.
|
||
| """ | ||
| arrow_type = from_numpy_dtype(obj.dtype) | ||
| shape = obj.shape[1:] | ||
| size = obj.size / obj.shape[0] | ||
|
|
||
| return ExtensionArray.from_storage( | ||
| fixed_shape_tensor(arrow_type, shape), | ||
| FixedSizeListArray.from_arrays(obj.flatten(), size) | ||
| ) | ||
|
|
||
|
|
||
| cdef dict _array_classes = { | ||
| _Type_NA: NullArray, | ||
| _Type_BOOL: BooleanArray, | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.