Moved discussion in #462 into this separate issue.
Other PythonCall issues related to NumPy dtypes and NumPy arrays that may be relevant:
- Use "native" types when possible #319
- dtype of NumPy array created from Julia array is object on Julia nightly #439
- Working with type instabilities, coming from PyJulia #441
- numpy functions don't treat Any[...] arrays like Python lists #486
My system details
Julia
julia> versioninfo()
Julia Version 1.12.1
Commit ba1e628ee49 (2025-10-17 13:02 UTC)
Build Info:
Official https://julialang.org release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 48 × AMD EPYC 7V13 64-Core Processor
WORD_SIZE: 64
LLVM: libLLVM-18.1.7 (ORCJIT, znver3)
GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 48 virtual cores)
julia> Pkg.status()
Status `~/temp/Project.toml`
[992eb4ea] CondaPkg v0.2.33
[6099a3de] PythonCall v0.9.28
Python Environment
- python 3.13.9
- numpy 2.3.4
I noticed that PythonCall's to_numpy() and NumPy itself produce different dtype values for nested arrays:
julia> using PythonCall
julia> Py(rand(10)).to_numpy().dtype
Python: dtype('float64')
julia> Py([[1,2,3], [4,5,6]]).to_numpy().dtype
Python: dtype('O')
I would expect the latter to be dtype('int64') to match Python:
>>> import numpy
>>> a = numpy.array([[1,2,3], [4,5,6]])
>>> a.dtype
dtype('int64')
Julia does provide Matrix{Int64}, for which to_numpy() outputs dtype('int64') as I would expect:
julia> [1 2 3; 4 5 6] |> typeof
Matrix{Int64} (alias for Array{Int64, 2})
julia> Py([1 2 3; 4 5 6]).to_numpy().dtype
Python: dtype('int64')
However, I am working on a project where I need to use PythonCall with Vector{Vector{Int64}} instances, and converting them to Matrix{Int64} is not an option.
In Julia, to_numpy() outputs dtype('int64') only for Vector{Int64}, not for Vector{Vector{Int64}} or Vector{Vector{Vector{Int64}}}, regardless of how many levels of nesting there are:
julia> Py([1, 2, 3]).to_numpy().dtype
Python: dtype('int64')
julia> Py([[1,2,3], [4,5,6]]).to_numpy().dtype
Python: dtype('O')
julia> Py([[[1,2,3], [4,5,6]], [[7,8,9], [10,11,12]]]).to_numpy().dtype
Python: dtype('O')
Whereas in Python, dtype('int64') is output for all of the above:
>>> numpy.array([1, 2, 3]).dtype
dtype('int64')
>>> numpy.array([[1,2,3], [4,5,6]]).dtype
dtype('int64')
>>> numpy.array([[[1,2,3], [4,5,6]], [[7,8,9], [10,11,12]]]).dtype
dtype('int64')
Thus NumPy derives .dtype from the innermost elements of the array, whereas to_numpy() derives it not from the innermost elements (i.e. 1, 2, 3, etc.) but from the structure containing them (i.e. Vector{Int64} for the 2nd array and Vector{Vector{Int64}} for the 3rd array).
I expected to_numpy() in PythonCall to behave the same as NumPy, but the conversion does not agree with NumPy on the resulting dtype.
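To make the distinction concrete, here is a toy model in plain Python of the two rules described above. It is only a sketch of the observed behaviour, not the actual implementation in NumPy or PythonCall, and the function names innermost_kind and outermost_kind are hypothetical, made up for this illustration.

```python
from typing import Any


def innermost_kind(value: Any) -> str:
    """Descend through nested lists and name the type of the leaf elements.

    This models the behaviour observed with numpy.array on nested lists.
    """
    while isinstance(value, list):
        if not value:
            return "empty"
        value = value[0]
    return type(value).__name__


def outermost_kind(value: Any) -> str:
    """Name the type of the top-level elements only, without descending.

    This models the behaviour observed with to_numpy() on nested Vectors.
    """
    if isinstance(value, list) and value:
        return type(value[0]).__name__
    return type(value).__name__


flat = [1, 2, 3]
nested = [[1, 2, 3], [4, 5, 6]]

print(innermost_kind(flat), outermost_kind(flat))
print(innermost_kind(nested), outermost_kind(nested))
```

For the flat list both rules see integer elements, which matches the agreement on dtype('int64') for Vector{Int64}. For the nested list the first rule still reaches the integers, while the second only sees a container, which corresponds to the dtype('O') result reported above.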