Functions for Creating NumPy Arrays
This section presents standard methods for creating NumPy arrays of varying shapes and contents. NumPy provides a laundry list of functions for creating arrays:
>>> import numpy as np
# creating an array from a Python sequence
>>> np.array([i**2 for i in range(5)])
array([ 0, 1, 4, 9, 16])
# creating an array filled with ones
>>> np.ones((2, 4))
array([[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]])
# creating an array of evenly-spaced points
>>> np.linspace(0, 10, 5)
array([ 0. , 2.5, 5. , 7.5, 10. ])
# creating an array by sampling 10 numbers
# randomly from a mean-1, std-dev-5 normal
# distribution
>>> np.random.normal(1, 5, 10)
array([ 2.549537 , 2.75144951, 0.60031823, 3.75185732, 4.65543858,
0.55779525, 1.15574987, -1.98461337, 5.39771083, -7.81395192])
# creating an array of a specified datatype
>>> np.array([1.5, 3.20, 5.78], dtype=int)
array([1, 3, 5])
Creating Arrays from Python Sequences
You can create an array from a Python list
or tuple
by using NumPy’s array
function. NumPy will interpret the structure of the data it receives to determine the dimensionality and shape of the array. For example, a single list of numbers will be used to create a 1-dimensional array:
# a list of numbers will become a 1D-array
>>> np.array([1., 2., 3.]) # shape: (3,)
array([ 1., 2., 3.])
Nested lists/tuples will be used to construct multidimensional arrays. For example, a “list of equal-length lists of numbers” will lead to a 2-dimensional array; each of the inner-lists comprises a row of the array. Thus a list of two, length-three lists will produce a (2,3)-shaped array:
# a list of lists of numbers will produce a 2D-array
>>> np.array([[1., 2., 3.], [4., 5., 6.]]) # shape: (2, 3)
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
A “list of equal-length lists, of equal-length lists of numbers” creates a 3D-array, and so on. Recall that using repeated concatenation, [0]*3
will produce [0, 0, 0]
. Using this, let’s create two lists, each containing three lists, each containing four zeros; feeding this to np.array
thus produces a 2x3x4 array of zeros:
# A list of lists of lists of zeros creates a 3D-array
>>> np.array([[[0]*4]*3]*2)
array([[[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]],
[[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]]])
You will seldom use lists to form high-dimensional arrays like this. Instead, there are other array-creation functions that are more amendable to generating high-dimensional data, which we will introduce next. For example, we will see that the np.zeros
function is a much more civilized way to create a high-dimensional array of zeros.
Warning!
You actually can create an array from lists of unequal lengths. The resulting array is not an ND-array as it has no well-defined dimensionality. Instead, something called an object-array is produced, which does not benefit from the majority of NumPy’s features. This is a relatively obscure feature of the NumPy library, and should be avoided unless you really know what you’re doing!
Creating Constant Arrays: zeros
and ones
NumPy provides the functions zeros
and ones
, which will fill an array of user-specified shape with 0s and 1s, respectively:
# create a 3x4 array of zeros
>>> np.zeros((3, 4))
array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
# create a shape-(4,) array of ones
>>> np.ones((4,))
array([ 1., 1., 1., 1.])
NumPy provides additional functions for creating constant-valued arrays. Please refer to the official documentation for a complete listing.
Creating Sequential Arrays: arange
and linspace
The arange function allows you to initialize a sequence of integers based on a starting point (inclusive), stopping point (exclusive), and step size. This is very similar to the range
function; however arange
immediately creates this sequence as an array, whereas range
produces a generator.
>>> np.arange(0, 10, 1) # start (included): 0, stop (excluded): 10, step:1
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# supplying one value to `arange` amounts to specifying the stop value
# start=0 and step=1 are then used as defaults
>>> np.arange(10) # equivalent to: start: 0, stop: 10, step:1
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.arange(-5, 6, 2) # start (included): -5, stop (excluded): 6, step:2
array([-5, -3, -1, 1, 3, 5])
The linspace function allows you to generate \(N\) evenly-spaced points within a user-specified interval \([i, j]\) (\(i\) and \(j\) are included in the interval). This is often used to generate a domain of values on which to evaluate a mathematical function (e.g. if you want to the sine function from \(-\pi\) to \(\pi\) on a finely-divided grid).
# generate five evenly-spaced points on the interval [-1, 1]
>>> np.linspace(-1, 1, 5)
array([-1. , -0.5, 0. , 0.5, 1. ])
# generate two evenly-spaced points on the interval [3, 4]
>>> np.linspace(3, 4, 2)
array([ 3., 4.])
# generate 100 evenly-spaced points on the interval [-pi, pi]
>>> np.linspace(-np.pi, np.pi, 100)
array([-3.14159265, ..., 3.14159265])
Numpy has other functions for creating sequential arrays, such as producing an array spaced evenly on a log-scaled interval. See the official documentation for a complete listing.
Creating Arrays Using Random Sampling
Several functions can be accessed from np.random
, which populate arrays of a user-specified shape by drawing randomly from a specified statistical distribution:
# construct a new random number generator
>>> rng = np.random.default_rng()
# create a shape-(3,3) array by drawing its entries randomly
# from the uniform distribution [0, 1)
>>> rng.random((3, 3))
array([[ 0.09542611, 0.13183498, 0.39836068],
[ 0.7358235 , 0.77640024, 0.74913595],
[ 0.37702688, 0.86617624, 0.39846429]])
# create a shape-(5,) array by drawing its entries randomly
# from a mean-0, variance-1 normal (a.k.a. Gaussian) distribution
>>> rng.normal(size=(5,))
array([-1.11262121, -0.35392007, 0.4245215 , -0.81995588, 0.65412323])
There are many more functions to read about that allow you to draw from a wide variety of statistical distributions. This only scratches the surface of random number generation in NumPy.
Creating an Array with a Specified Data Type
Each of the preceding functions used to create an array can be passed a so-called ‘keyword’ argument, dtype
, which instructs NumPy to use a specified data type when producing the contents of the array.
# populate an array using 32-bit floating point numbers
>>> np.array([1, 2, 3], dtype="float32")
array([ 1., 2., 3.], dtype=float32)
# default data type produced by `arange` is 32-bit integers
>>> np.arange(0, 4).dtype
dtype('int32')
# the data type produced by `arange` can be specified otherwise
>>> np.arange(0, 4, dtype="float16")
array([ 0., 1., 2., 3.], dtype=float16)
# generate shape-(4,4) array of 64-bit complex-valued 0s
>>> np.zeros((4, 4), dtype="complex64")
array([[ 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j]], dtype=complex64)
Refer to the official NumPy documentation for the complete list of available array datatypes.
Joining Arrays Together
Similar to Python lists and tuples, NumPy arrays can be concatenated together. However, because NumPy’s arrays can be multi-dimensional, we can choose the dimension along which arrays are joined.
# demonstrating methods for joining arrays
>>> x = np.array([1, 2, 3])
>>> y = np.array([-1, -2, -3])
# stack `x` and `y` "vertically"
>>> np.vstack([x, y])
array([[ 1, 2, 3],
[-1, -2, -3]])
# stack `x` and `y` "horizontally"
>>> np.hstack([x, y])
array([ 1, 2, 3, -1, -2, -3])
A complete listing of functions for joining arrays can be found in the official NumPy documentation. There are also corresponding functions for splitting an array into independent arrays.