Iterating Over Arrays & Array-Traversal Order
In this section, you will learn:
About NumPy’s functions for iterating over an array
That there is more than one valid way for NumPy to perform this operation, which amounts to how NumPy traverses a multidimensional array.
The row-major array traversal methodology, which is utilized by NumPy by default.
NumPy provides valuable tools for iterating over any array, such that each element can be visited in the array, regardless of the array’s shape. For example, recall that Python’s built-in enumerate
function permits us to produce each item in an iterable, along with its index of iteration:
# enumerating the items in an iterable
>>> [i for i in enumerate("abcdef")]
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]
Similarly, NumPy provides the ndenumerate function, which enumerates each element in an N-dimensional array, specifying the N-dimensional index for each element.
>>> import numpy as np
# Demonstrating `np.ndenumerate`.
# A shape-(2, 2, 3) array
>>> x = np.array([[[ 25, 6, -35],
... [ 9, -40, -29]],
...
... [[ -8, 2, -37],
... [ 17, 10, -45]]])
>>> [i for i in np.ndenumerate(x)]
[((0, 0, 0), 25),
((0, 0, 1), 6),
((0, 0, 2), -35),
((0, 1, 0), 9),
((0, 1, 1), -40),
((0, 1, 2), -29),
((1, 0, 0), -8),
((1, 0, 1), 2),
((1, 0, 2), -37),
((1, 1, 0), 17),
((1, 1, 1), 10),
((1, 1, 2), -45)]
See that each triplet of integers specifies the index for the corresponding array element:
>>> x[0, 0, 0]
25
>>> x[0, 0, 1]
6
>>> x[0, 0, 2]
-35
>>> x[0, 1, 0]
9
See the official NumPy documentation for a complete listing of functions that facilitate iterating over arrays. The official documentation also provides a detailed treatment of array iteration, which is far more detailed than is warranted here. Next, we must discuss the default ordering that NumPy uses when traversing a N-dimensional array.
How to Traverse an Array: Row-major (C) vs Column-major (F) Traversal Ordering
Note the order in which np.ndenumerate
iterated over x
. It first traversed the columns within row-0 of sheet-0 of x
, and then it traversed the columns within the row-1 of sheet-0, and so on. What is special about this traversal order? Why, for instance, didn’t it traverse the rows within a given column instead? We can also see that there is not a unique ordering for a reshape
function to adhere to. For example, the following reshape operation could sensibly return either of the
following results:
array([0, 1, 2, 3, 4, 5]).reshape(2, 3) --> array([[0, 1, 2], or array([[0, 2, 4],
[3, 4, 5]]) [1, 3, 5]])
Both arrays are of the appropriate shape and preserve the ordering of the original sequence of numbers, depending on how you traverse them. The left array preserves the ordering of the original data if you traverse the columns within a row, and then proceed to the next row. This is known as row-major ordering. The array on the right preserves the ordering if you traverse the rows within a given column, and then transition to the next column. This is thus referred to as column-major ordering. One ordering is not inherently better than the other. That being said, NumPy always defaults to row-major ordering whenever one of its functions involves array traversal.
Remember This:
NumPy utilizes row-major ordering, as a default, for any operation that requires an array to be traversed.
These two orderings are simple enough to follow for a 2D-array, but how do they manifest in arrays with higher dimensions, where we have to worry not only about rows and columns, but potentially “stacks of sheets with rows and columns” (which would be a 4D array) and so on? The generic rules are as follows:
Row-major ordering (C ordering) {NumPy’s default}: traverse an array by advancing the index of the last axis, first, until the end of that axis is reached, and then advance the index of the second-to last axis, and so on.
Column-major ordering (F ordering): traverse an array by advancing the index of the first axis, first, until the end of that axis is reached, and then advance the index of the second axis, and so on.
Note:
“Row-major” ordering is also referred to as “C-ordering” because this is the traversal method utilized in the C language. “Column-major” ordering, on the other hand, is also referred to as “F-ordering”, because it is used by the Fortran language. NumPy functions, like reshape
allow you to specify either order="C"
(which is the default) or order="F"
to control the order in which an array is traversed; these options thus correspond to row-major and column-major ordering.
To make this more concrete, let’s consider how NumPy reshapes a shape-(24,) array into a shape-(2,3,4) array:
# reshape a shape-(24,) array into a shape-(2,3,4) array
>>> np.arange(2*3*4).reshape(2,3,4)
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
Following NumPy’s default row-major ordering, we can perform this reshaping by following these steps:
Create an empty array of the desired shape: (2, 3, 4).
Start by inserting the 0th element from the input array into the (0, 0, 0) element of the output array.
Advance the index by increasing the index of the last axis, first, and inserting the following element from the input array.
If you reach the end of an axis (axis-2, for instance, only has 4 slots), reset the index for that axis to 0, and advance the index of the preceding axis. Go back to 3.
This traversal process is easier to understand when laid out explicitly:
Reshaping a shape-(24,) array to a shape-(2,3,4) array, using NumPy’s default “row-major” ordering
Input Array Output Array
----------- ---------------
entry: 0 -> entry: (0, 0, 0)
entry: 1 -> entry: (0, 0, 1)
entry: 2 -> entry: (0, 0, 2)
entry: 3 -> entry: (0, 0, 3) *row-0 of sheet 0, filled. go to next row*
entry: 4 -> entry: (0, 1, 0)
entry: 5 -> entry: (0, 1, 1)
entry: 6 -> entry: (0, 1, 2)
entry: 7 -> entry: (0, 1, 3) *row-1 of sheet 0, filled. go to next row*
entry: 8 -> entry: (0, 2, 0)
entry: 9 -> entry: (0, 2, 1)
entry:10 -> entry: (0, 2, 2)
entry:11 -> entry: (0, 2, 3) *row-2 of sheet 0, filled. go to next sheet!*
entry:12 -> entry: (1, 0, 0)
entry:13 -> entry: (1, 0, 1)
entry:14 -> entry: (1, 0, 2)
entry:15 -> entry: (1, 0, 3) *row-0 of sheet 1, filled. go to next row*
entry:16 -> entry: (1, 1, 0)
entry:17 -> entry: (1, 1, 1)
entry:18 -> entry: (1, 1, 2)
entry:19 -> entry: (1, 1, 3) *row-1 of sheet 1, filled. go to next row*
entry:20 -> entry: (1, 2, 0)
entry:21 -> entry: (1, 2, 1)
entry:22 -> entry: (1, 2, 2)
entry:23 -> entry: (1, 2, 3) *row-2 of sheet 1, filled. Done!*
The same process can be extended to reshape one multidimensional array into another multidimensional array of a different shape. The input and output arrays are simply traversed, respectively, according to “row-major” rules. Suppose we want to reshape a shape-(2,3,4) array into a shape-(6,4) array. This process would be carried out as follows:
Reshaping a shape-(2,3,4) into a shape(6,4) array using NumPy’s default “row-major” ordering
Input Array Output Array
---------------- ---------------
entry: (0, 0, 0) -> entry: (0, 0)
entry: (0, 0, 1) -> entry: (0, 1)
entry: (0, 0, 2) -> entry: (0, 2)
entry: (0, 0, 3) -> entry: (0, 3)
entry: (0, 1, 0) -> entry: (1, 0)
...
...
entry: (1, 2, 3) -> entry: (5, 3)
Although this bookkeeping may seem a bit tedious at first glance, you will likely find that you are able to build up enough intuition for row-major ordering, to the point where you never need to write out these tables in full! The ability to reshape an array to adjust the way you can access an array’s data is commonly used in data science applications. Furthermore, understanding how NumPy handles array traversal is critical to understanding more advanced concepts like array-broadcasting and advanced indexing.
Reshape is its own inverse:
According to this discussion, reshape
can effectively “undo” itself: np.arange(10).reshape(5,2).reshape(10)
will return array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
. Take some time to consider why this will always be true, no matter how many intermediary reshapes are performed.