arrayobject.h
, after the header file
Python.h
that is obligatory for all extension
modules. The file arrayobject.h
comes with the NumPy
distribution; depending on where it was installed on your system you
might have to tell your compiler how to find it.
In addition to including arrayobject.h
, the extension
must call import_array()
in its initialization function,
after the call to Py_InitModule()
. This call makes sure
that the module which implements the array type has been imported, and
initializes a pointer array through which the NumPy functions are
called. If you forget this call, your extension module will crash on
the first call to a NumPy function!
PyArrayObject
,
which is an extension of the structure PyObject
. Pointers
to PyArrayObject
can thus safely be cast to
PyObject
pointers, whereas the inverse is safe only if
the object is known to be an array. The type structure corresponding
to array objects is PyArray_Type
.
The structure PyArrayObject
has four elements that are
needed in order to access the array's data from C code:
int nd |
The number of dimensions |
int *dimensions |
A pointer to an array of nd integers, describing
the number of elements along each dimension. |
int *strides |
A pointer to an array of nd integers, describing
the address offset between two successive data elements along each
dimension. Note that strides can also be negative! |
char *data |
A pointer to the first data element of the array. |
The address of a data element can be calculated from its indices and
the data
and strides
pointers. For example,
element [i, j]
of a two-dimensional array has the address
data + i*array->strides[0] + j*array->strides[1]
. Note
that the stride offsets are in bytes, not in storage units of the
array elements. Therefore address calculations must be made in bytes
as well, starting from the data
pointer, which is always
a char
pointer. To access the element, the result of the
address calculation must be cast to a pointer of the required
type. The advantage of this arrangement is that purely structural
array operations (indexing, extraction of subarrays, etc.) do not have
to know the type of the array elements.
Element data types
The type of the array elements is indicated by a type number,
whose possible values are defined as constants in arrayobject.h
:
PyArray_CHAR | char |
PyArray_UBYTE | unsigned char |
PyArray_SBYTE | signed char |
PyArray_SHORT | short |
PyArray_INT | int |
PyArray_LONG | long |
PyArray_FLOAT | float |
PyArray_DOUBLE | double |
PyArray_CFLOAT | float[2] |
PyArray_CDOUBLE | double[2] |
PyArray_OBJECT | PyObject * |
The type number is stored in array->descr->type_num
.
Note that the names of the element type constants refer to the C data
types, not the Python data types. A Python int
is
equivalent to a C long
, and a Python float
corresponds to a C double
. Many of the element types
listed above do not have corresponding Python scalar types
(e.g. PyArray_INT
).
array->strides[i]
is equal
to the size of a single array element times the product of
array->dimensions[j]
for j
up to
i-1
. Arrays that are created from scratch are always
contiguous; non-contiguous arrays are the result of indexing and other
structural array operations.
The main advantage of contiguous arrays is easier handling in C; the
pointer array->data
is cast to the required type and then
used like a C array, without any reference to the stride values. This
is particularly important when interfacing to existing libraries in C
or Fortran, which typically require this standard data layout. A
function that requires input arrays to be contiguous must call the
conversion function PyArray_ContiguousFromObject()
,
described in the section "Accepting input
data from any sequence type"
PyArray_DOUBLE
.
static PyObject * trace(PyObject *self, PyObject *args) { PyArrayObject *array; double sum; int i, n; if (!PyArg_ParseTuple(args, "O!", &PyArray;_Type, &array;)) return NULL; if (array->nd != 2 || array->descr->type_num != PyArray_DOUBLE) { PyErr_SetString(PyExc_ValueError, "array must be two-dimensional and of type float"); return NULL; } n = array->dimensions[0]; if (n > array->dimensions[1]) n = array->dimensions[1]; sum = 0.; for (i = 0; i < n; i++) sum += *(double *)(array->data + i*array->strides[0] + i*array->strides[1]); return PyFloat_FromDouble(sum); }
double
. In many circumstances this is sufficient, but
often, especially in the case of library routines for general use, it
would be preferable to accept input data from any sequence (lists,
tuples, etc.) and to convert the element type to double
automatically where possible.
NumPy provides a function that accepts arbitrary sequence objects as
input and returns an equivalent array of specified type (this is in
fact exactly what the array constructor Numeric.array()
does in
Python code):
PyObject * PyArray_ContiguousFromObject(PyObject *object, int type_num, int min_dimensions, int max_dimensions);The first argument,
object
, is the sequence object from
which the data is taken. The second argument, type_num
,
specifies the array element type (see the table in the section "Element data types". If you want the function to
the select the ``smallest'' type that is sufficient to store the data,
you can pass the special value PyArray_NOTYPE
. The
remaining two arguments let you specify the number of dimensions of
the resulting array, which is guaranteed to be no smaller than
min_dimensions
and no larger than
max_dimensions
, except for the case max_dimensions
== 0
, which means that no upper limit is imposed. If the input
data is not compatible with the type or dimension restrictions, an
exception is raised.
Since the array returned by PyArray_ContiguousFromObject()
is
guaranteed to be contiguous, this function also provides a method of
converting a non-contiguous array to a contiguous one. If the input
object is already a contiguous array of the specified type, it is
passed on directly; there is thus no performance or memory penalty for
calling the conversion function when it is not required.
Using this function, the example from the last section becomes
static PyObject * trace(PyObject *self, PyObject *args) { PyObject *input; PyArrayObject *array; double sum; int i, n; if (!PyArg_ParseTuple(args, "O", &input;)) return NULL; array = (PyArrayObject *) PyArray_ContiguousFromObject(input, PyArray_DOUBLE, 2, 2); if (array == NULL) return NULL; n = array->dimensions[0]; if (n > array->dimensions[1]) n = array->dimensions[1]; sum = 0.; for (i = 0; i < n; i++) sum += *(double *)(array->data + i*array->strides[0] + i*array->strides[1]); Py_DECREF(array); return PyFloat_FromDouble(sum); }Note that no explicit error checking is necessary in this version, and that the array reference that is returned by
PyArray_ContiguousFromObject()
must be destroyed by
calling Py_DECREF()
.
PyObject * PyArray_FromDims(int n_dimensions, int dimensions[n_dimensions], int type_num);The first argument specifies the number of dimensions, the second one the length of each dimension, and the third one the element data type (see the table in the section "Element data types". The array that is returned is contiguous, but the contents of its data space are undefined. There is a second function which permits the creation of an array object that uses a given memory block for its data space:
PyObject * PyArray_FromDimsAndData(int n_dimensions, int dimensions[n_dimensions] int item_type char *data);The first three arguments are the same as for
PyArray_FromDims()
. The fourth argument is a pointer to
the memory block that is to be used as the array's data space. It is
the caller's responsibility to ensure that this memory block is not
freed before the array object is destroyed. With few exceptions (such
as the creation of a temporary array object to which no reference is
passed to other functions), this means that the memory block may
never be freed, because the lifetime of Python objects are
difficult to predict. Nevertheless, this function can be useful in
special cases, for example for providing Python access to arrays in
Fortran common blocks.
PyObject * PyArray_Return(PyArrayObject *array);which returns the array unchanged if it has one or more dimensions, or the appropriate Python scalar object in case of a zero-dimensional array.
DGEMV
. It takes three
arguments: a scalar prefactor, the matrix (a two-dimensional array),
and the vector (a one-dimensional array). The return value is
a one-dimensional array. The input values are checked for
consistency.
In addition to providing an illustration of the functions explained
above, this example also demonstrates how a Fortran routine can be
integrated into Python. Unfortunately, mixing Fortran and C code
involves machine-specific peculiarities. In this example, two
assumptions have been made:
DGEMV
must be called from C as
dgemv_
. Many Fortran compilers apply this rule, but
the C name could also be dgemv
or DGEMV
(or
in principle anything else; there is no fixed standard).
integer
s are equivalent to C long
s, and
Fortran double precision
numbers are equivalent to
C double
s. This works for all systems that I have
personally used, but again there is no standard.
blas
and f2c
.
So here is the code:
static PyObject * matrix_vector(PyObject *self, PyObject *args) { PyObject *input1, *input2; PyArrayObject *matrix, *vector, *result; int dimensions[1]; double factor[1]; double real_zero[1] = {0.}; long int_one[1] = {1}; long dim0[1], dim1[1]; extern dgemv_(char *trans, long *m, long *n, double *alpha, double *a, long *lda, double *x, long *incx, double *beta, double *Y, long *incy); if (!PyArg_ParseTuple(args, "dOO", factor, &input1;, &input2;)) return NULL; matrix = (PyArrayObject *) PyArray_ContiguousFromObject(input1, PyArray_DOUBLE, 2, 2); if (matrix == NULL) return NULL; vector = (PyArrayObject *) PyArray_ContiguousFromObject(input2, PyArray_DOUBLE, 1, 1); if (vector == NULL) return NULL; if (matrix->dimensions[1] != vector->dimensions[0]) { PyErr_SetString(PyExc_ValueError, "array dimensions are not compatible"); return NULL; } dimensions[0] = matrix->dimensions[0]; result = (PyArrayObject *)PyArray_FromDims(1, dimensions, PyArray_DOUBLE); if (result == NULL) return NULL; dim0[0] = (long)matrix->dimensions[0]; dim1[0] = (long)matrix->dimensions[1]; dgemv_("T", dim1, dim0, factor, (double *)matrix->data, dim1, (double *)vector->data, int_one, real_zero, (double *)result->data, int_one); return PyArray_Return(result); }Note that
PyArray_Return()
is not really necessary in this
case, since we know that the array being returned is one-dimensional.
Nevertheless, it is a good habit to always use this function; its
performance cost is practically zero.