Make to_numpy conversions consistent with NumPy dtype #719

@marcosdiezgarcia

Moved the discussion from #462 into this separate issue.

Other PythonCall issues related to NumPy dtype and NumPy arrays that may be relevant:

My system details

Julia

julia> versioninfo()
Julia Version 1.12.1
Commit ba1e628ee49 (2025-10-17 13:02 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 48 × AMD EPYC 7V13 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, znver3)
  GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 48 virtual cores)

julia> Pkg.status()
Status `~/temp/Project.toml`
  [992eb4ea] CondaPkg v0.2.33
  [6099a3de] PythonCall v0.9.28

Python Environment

  • python 3.13.9
  • numpy 2.3.4

I noticed that PythonCall's to_numpy() and NumPy itself assign different dtype values to equivalent nested data:

julia> using PythonCall

julia> Py(rand(10)).to_numpy().dtype
Python: dtype('float64')

julia> Py([[1,2,3], [4,5,6]]).to_numpy().dtype
Python: dtype('O')

I would expect the latter to be dtype('int64') to match Python:

>>> import numpy
>>> a = numpy.array([[1,2,3], [4,5,6]])
>>> a.dtype
dtype('int64')

Julia does provide Matrix{Int64}, and for that type to_numpy() outputs dtype('int64') as I would expect:

julia> [1 2 3; 4 5 6] |> typeof
Matrix{Int64} (alias for Array{Int64, 2})
julia> Py([1 2 3; 4 5 6]).to_numpy().dtype
Python: dtype('int64')

I am working on a project where I need to use PythonCall with Vector{Vector{Int64}} instances, and converting them to Matrix{Int64} is not an option.

In Julia, with PythonCall's to_numpy(), dtype('int64') is output only for Vector{Int64}, not for Vector{Vector{Int64}} or Vector{Vector{Vector{Int64}}}, regardless of the nesting depth:

julia> Py([1, 2, 3]).to_numpy().dtype
Python: dtype('int64')

julia> Py([[1,2,3], [4,5,6]]).to_numpy().dtype
Python: dtype('O')

julia> Py([[[1,2,3], [4,5,6]], [[7,8,9], [10,11,12]]]).to_numpy().dtype
Python: dtype('O')

Whereas in Python, dtype('int64') is output for all the above:

>>> numpy.array([1, 2, 3]).dtype
dtype('int64')

>>> numpy.array([[1,2,3], [4,5,6]]).dtype
dtype('int64')

>>> numpy.array([[[1,2,3], [4,5,6]], [[7,8,9], [10,11,12]]]).dtype
dtype('int64')

Thus NumPy derives the value of .dtype from the innermost elements of the array. With to_numpy() in Julia, the value of .dtype is not based on the innermost elements (i.e. 1, 2, 3, etc.) but on the structure that contains them (i.e. Vector{Int64} for the 2nd array, and Vector{Vector{Int64}} for the 3rd array).
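The same contrast can be reproduced in NumPy alone, which may help narrow down where the difference comes from. The sketch below does not call PythonCall: `boxed` is a hypothetical stand-in for a vector of vectors, built by hand as a 1-D object array, and whether to_numpy() constructs its result this way is an assumption on my part.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]

# NumPy walks the nested lists down to the scalars and builds a 2-D array.
dense = np.array(nested)

# Hypothetical stand-in for a vector of vectors: a 1-D array with two
# opaque elements, each of which happens to be a list.
boxed = np.empty(len(nested), dtype=object)
for i, row in enumerate(nested):
    boxed[i] = row

# Rebuilding from plain nested lists restores an integer dtype.
restacked = np.array(boxed.tolist())

print(dense.dtype, dense.shape)
print(boxed.dtype, boxed.shape)
print(restacked.dtype, restacked.shape)
```

In this sketch `dense` and `restacked` are 2-D integer arrays, while `boxed` stays 1-D with dtype('O'), which mirrors the to_numpy() results shown earlier.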

I expected to_numpy() in PythonCall to behave like NumPy here, but the converted arrays do not get the dtype that NumPy would assign.
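One caveat that may matter for any change in behaviour: a Vector{Vector{Int64}} is allowed to be ragged, and NumPy itself does not produce an integer dtype for ragged input. A minimal sketch of that case in plain NumPy follows; the variable names are only illustrative.

```python
import numpy as np

ragged = [[1, 2, 3], [4, 5]]

# Recent NumPy releases raise ValueError for ragged input without an
# explicit dtype; older releases warned and fell back to dtype=object.
try:
    inferred = np.array(ragged)
except ValueError:
    inferred = None

# Asking for dtype=object explicitly works and gives a 1-D array of lists.
explicit = np.array(ragged, dtype=object)

print(inferred)
print(explicit.dtype, explicit.shape)
```

So matching NumPy would presumably mean dtype('int64') only when all inner vectors have the same length.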
