StatArray Class

Extends the numpy ndarray class to add extra attributes such as names, and units, and allows us to attach statistical descriptors of the array. The direct extension to numpy maintains speed and functionality of numpy arrays.

import numpy as np
import matplotlib.pyplot as plt
import h5py
from geobipy import DataArray, StatArray, Histogram, Distribution, RectilinearMesh1D


# plt.style.use('seaborn-pastel')

Instantiating a new StatArray class

# Integer
test = StatArray(1, name='1')
assert isinstance(test, StatArray) and test.size ==  1 and test.item() == 0.0, TypeError("da 0")
print(test.summary)
test = StatArray(10, name='10')
assert isinstance(test, StatArray) and test.size ==  10 and np.all(test == 0.0), TypeError("da 1")
print(test.summary)
# tuple/Shape
test = StatArray((2, 10), name='(2, 10)')
assert isinstance(test, StatArray) and np.all(test.shape ==  (2, 10)) and np.all(test == 0.0), TypeError("da 2")
print(test.summary)

test = StatArray([2, 10], name='(2, 10)')
assert isinstance(test, StatArray) and np.all(test ==  [2, 10]), TypeError("da 2")
print(test.summary)

# float
test = StatArray(45.454, name='45.454')
assert isinstance(test, StatArray) and test.size ==  1 and test.item() == 45.454, TypeError("da 3")
print(test.summary)
test = StatArray(np.float64(45.454), name='45.454')
assert isinstance(test, StatArray) and test.size ==  1 and test.item() == 45.454, TypeError("da 4")
print(test.summary)

# array
test = StatArray(np.random.randn(1), name="test", units="$\frac{g}{cc}$")
assert isinstance(test, StatArray) and test.size ==  1, TypeError("da 5")
print(test.summary)

test = StatArray(np.arange(10.0), name="test", units="$\frac{g}{cc}$")
assert isinstance(test, StatArray) and test.size ==  10, TypeError("da 6")
print(test.summary)


test = DataArray(np.arange(10.0), name="test", units="$\frac{g}{cc}$")
test = StatArray(test)
assert isinstance(test, StatArray) and test.size ==  10, TypeError("da 6")
print(test.summary)

# The StatArray can take any numpy function that returns an array as an input.
# The name and units of the variable can be assigned to the StatArray.
StatArray
Name:   1
Address:['0x140b98bd0']
Shape:  (1,)
Values: [0.]
Min:    0.0
Max:    0.0
has_posterior: False

StatArray
Name:   10
Address:['0x140b998d0']
Shape:  (10,)
Values: [0. 0. 0. ... 0. 0. 0.]
Min:    0.0
Max:    0.0
has_posterior: False

StatArray
Name:   (2, 10)
Address:['0x140b9a2d0']
Shape:  (2, 10)
Values: [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
Min:    0.0
Max:    0.0
has_posterior: False

StatArray
Name:   (2, 10)
Address:['0x140b9add0']
Shape:  (2,)
Values: [ 2 10]
Min:    2
Max:    10
has_posterior: False

StatArray
Name:   45.454
Address:['0x10390a350']
Shape:  (1,)
Values: [45.454]
Min:    45.454
Max:    45.454
has_posterior: False

StatArray
Name:   45.454
Address:['0x140b9b150']
Shape:  (1,)
Values: [45.454]
Min:    45.454
Max:    45.454
has_posterior: False

StatArray
Name:   test ($\frac{g}{cc}$)
Address:['0x10390a350']
Shape:  (1,)
Values: [-0.30357293]
Min:    -0.30357293062816576
Max:    -0.30357293062816576
has_posterior: False

StatArray
Name:   test ($\frac{g}{cc}$)
Address:['0x140b9b2d0']
Shape:  (10,)
Values: [0. 1. 2. ... 7. 8. 9.]
Min:    0.0
Max:    9.0
has_posterior: False

StatArray
Name:   test ($\frac{g}{cc}$)
Address:['0x140b9b150']
Shape:  (10,)
Values: [0. 1. 2. ... 7. 8. 9.]
Min:    0.0
Max:    9.0
has_posterior: False

Attaching Prior and Proposal Distributions to a StatArray

The StatArray class has been built so that we may easily attach not only names and units, but statistical distributions too. We won’t go into too much detail about the different distribution

Two types of distributions can be attached to the StatArray.

# Obtain an instantiation of a random number generator.
# This is optional, but is an important consideration for parallel programming.
from numpy.random import Generator
from numpy.random import PCG64DXSM
generator = PCG64DXSM(seed=0)
prng = Generator(generator)

Density = StatArray(10.0, name="test", units="$\frac{g}{cc}$")

Density.prior = Distribution('Uniform', -2.0, 2.0, prng=prng)

We can also attach a proposal distribution

Density.proposal = Distribution('Normal', 0.0, 1.0, prng=prng)
print(Density.summary)
print("Class type of the prior: ",type(Density.prior))
print("Class type of the proposal: ",type(Density.proposal))
StatArray
Name:   test ($\frac{g}{cc}$)
Address:['0x106d683d0' '0x13c904ec0' '0x13de4f810' ... '0x140b7ee40' '0x13de4f870'
 '0x13de4f8d0']
Shape:  (1,)
Values: [10.]
Min:    10.0
Max:    10.0
Prior:
|   Uniform Distribution:
|   Min: -2.0
|   Max: 2.0
Proposal:
|   Normal
|       Mean:0.0
|   Variance:1.0
has_posterior: False

Class type of the prior:  <class 'geobipy.src.classes.statistics.UniformDistribution.Uniform'>
Class type of the proposal:  <class 'geobipy.src.classes.statistics.NormalDistribution.Normal'>

The values in the variable can be evaluated against the prior. In this case, we have 3 elements in the variable, and a univariate Normal for the prior. Therefore each element is evaluated to get 3 probabilities, one for each element.

print(Density.probability(log=False))
0.0

The univariate proposal distribution can generate random samples from itself.

print(Density.propose())
1.1375024404290368

From a sampling stand point we can either sample using only the proposal Or we can only generate samples that simultaneously satisfy the prior.

print(Density.propose(relative=True))
[10.53816627]

We can perturb the variable by drawing from the attached proposal distribution.

Density.perturb()
print(Density.summary)
StatArray
Name:   test ($\frac{g}{cc}$)
Address:['0x106d683d0' '0x13c904ec0' '0x13de4f810' ... '0x140b7ee40' '0x13de4f870'
 '0x13de4f8d0']
Shape:  (1,)
Values: [0.38188467]
Min:    0.38188466718060166
Max:    0.38188466718060166
Prior:
|   Uniform Distribution:
|   Min: -2.0
|   Max: 2.0
Proposal:
|   Normal
|       Mean:0.0
|   Variance:1.0
has_posterior: False

Attaching a Histogram to capture the posterior distribution

The StatArray can perturb itself, evaluate its current probability given its priors and a histogram can be attached to capture its posterior distribution. As an example, lets create a Histogram class with bins generated from the prior.

bins = Density.prior.bins()

Attach the histogram

Density.posterior = Histogram(mesh = RectilinearMesh1D(edges=bins))

In an iterative sense, we can propose and evaluate new values, and update the posterior

for i in range(1000):
    Density.perturb()
    p = Density.probability(log=False)

    if p > 0.0: # This is a simple example!
        Density.update_posterior()
plt.figure()
Density.summaryPlot()
Prior, Proposal, Posterior

Attach a multivariate normal distribution as the prior and proposal

Attach the multivariate prior

mean = np.random.randn(Density.size)
variance = np.ones(Density.size)
Density.prior = Distribution('MvNormal', mean, variance, prng=prng)

Since the prior is multivariate, the appropriate equations are used to evaluate the probability for all elements in the StatArray. This produces a single probability.

print(Density.probability(log=False))
0.3796768180063585

Attach the multivariate proposal

mean = np.random.randn(Density.size)
variance = np.ones(Density.size)
Density.proposal = Distribution('MvNormal', mean, variance, prng=prng)

Perturb the variables using the multivariate proposal.

Density.perturb()
Density.summary

with h5py.File('statarray.h5', 'w') as f:
    Density.createHdf(f, 'statarray', withPosterior=True, add_axis=3)
    Density.writeHdf(f, 'statarray', withPosterior=True, index=0)

with h5py.File('statarray.h5', 'r') as f:
    tmp = StatArray.fromHdf(f, 'statarray', index=0, skip_posterior=False)

with h5py.File('statarray.h5', 'r') as f:
    tmp = StatArray.fromHdf(f, 'statarray', skip_posterior=False)

Basic manipulation

The StatArray contains other functions to perform basic array manipulations

These routines essentially wrap around numpy functions, but the result will have the same name and units, and if any prior or proposal are set, those will be carried through too.

1D example

x = StatArray(-np.cumsum(np.arange(10.0)))
print(x)
[ -0.  -1.  -3. ... -28. -36. -45.]
print(x.insert(i=[0, 9], values=[999.0, 999.0]))
[999.  -0.  -1. ... -36. 999. -45.]
print(x.prepend(999.0))
[999.  -0.  -1. ... -28. -36. -45.]
print(x.prepend([998.0, 999.0]))
[998. 999.  -0. ... -28. -36. -45.]
print(x.append([998.0, 999.0]))
[ -0.  -1.  -3. ... -45. 998. 999.]
print(x.resize(14))
[-0. -1. -3. ... -1. -3. -6.]
print(x.delete([5,8]))
[ -0.  -1.  -3. ... -21. -28. -45.]
print(x.edges())
[  0.5  -0.5  -2.  ... -32.  -40.5 -49.5]
print(x.internalEdges())
[ -0.5  -2.   -4.5 ... -24.5 -32.  -40.5]
print(x.firstNonZero())
1
print(x.lastNonZero())
10
print(x.abs())
[ 0.  1.  3. ... 28. 36. 45.]

2D example

x = StatArray(np.asarray([[0, -2, 3],[3, 0, -1],[1, 2, 0]]))
print(x)
[[ 0 -2  3]
 [ 3  0 -1]
 [ 1  2  0]]
print(x.insert(i=0, values=4))
[[ 4  4  4]
 [ 0 -2  3]
 [ 3  0 -1]
 [ 1  2  0]]
print(x.insert(i=[2, 3], values=5, axis=1))
[[ 0 -2  5  3  5]
 [ 3  0  5 -1  5]
 [ 1  2  5  0  5]]
print(x.insert(i=2, values=[10, 11, 12], axis=1))
[[ 0 -2 10  3]
 [ 3  0 11 -1]
 [ 1  2 12  0]]
print(x.prepend(999))
[[999 999 999]
 [  0  -2   3]
 [  3   0  -1]
 [  1   2   0]]
print(x.prepend([999, 998, 997], axis=1))
[[999 998 997   0  -2   3]
 [999 998 997   3   0  -1]
 [999 998 997   1   2   0]]
print(x.append([[999, 998, 997]]))
[[  0  -2   3]
 [  3   0  -1]
 [  1   2   0]
 [999 998 997]]
print(x.resize([5,5]))
[[ 0 -2  3  3  0]
 [-1  1  2  0  0]
 [-2  3  3  0 -1]
 [ 1  2  0  0 -2]
 [ 3  3  0 -1  1]]
print(x.delete(5))
[ 0 -2  3 ...  1  2  0]
print(x.delete(2, axis=0))
[[ 0 -2  3]
 [ 3  0 -1]]
print(x.firstNonZero(axis=0))
[1 0 0]
print(x.lastNonZero(axis=0))
[3 3 2]
print(x.firstNonZero(axis=1))
[1 0 0]
print(x.lastNonZero(axis=1))
[3 3 2]
print(x.abs())
[[0 2 3]
 [3 0 1]
 [1 2 0]]

Plotting

We can easily plot the StatArray with its built in plotting functions. All plotting functions can take matplotlib keywords

# The simplest is to just plot the array

Density = StatArray(np.random.randn(100),name="Density",units="$\frac{g}{cc}$")
Time = StatArray(np.linspace(0, 100, Density.size), name='Time', units='s')
Depth = StatArray(np.random.exponential(size=Density.size), name='Depth', units='m')
plt.figure()
_ = Density.plot(linewidth=0.5, marker='x', markersize=1.0)
plot StatArray

We can quickly plot a bar graph.

plt.figure()
_ = Density.bar()
plot StatArray

We can scatter the contents of the StatArray if it is 1D

plt.figure()
_ = Density.scatter(alpha=0.7)
plot StatArray

Histogram Equalization

A neat trick with colourmaps is histogram equalization. This approach forces all colours in the images to have an equal weight. This distorts the colour bar, but can really highlight the lower and higher ends of whatever you are plotting. Just add the equalize keyword!

plt.figure()
_ = Density.scatter(alpha=0.7, equalize=True)
plot StatArray

Take the log base(x) of the data

We can also take the data to a log, log10, log2, or a custom number!

plt.figure()
_ = Density.scatter(alpha=0.7,edgecolor='k',log='e') # could also use log='e', log=2, log=x) where x is the base you require
plot StatArray

X and Y axes

We can specify the x axis of the scatter plot.

plt.figure()
_ = Density.scatter(x=Time, alpha=0.7, edgecolor='k')
plot StatArray

Notice that I never specified the y axis, so the y axis defaulted to the values in the StatArray. In this case, any operations applied to the colours, are also applied to the y axis, e.g. log=10. When I take the values of Density to log base 10, because I do not specify the y plotting locations, those locations are similarly affected.

I can however force the y co-ordinates by specifying it as input. In the second subplot I explicitly plot distance on the y axis. In the first subplot, the y axis is the same as the colourbar.

plt.figure()
ax1 = plt.subplot(211)
Density.scatter(x=Time, alpha=0.7, edgecolor='k', log=10)
plt.subplot(212, sharex=ax1)
_ = Density.scatter(x=Time, y=Depth, alpha=0.7, edgecolor='k', log=10)
plot StatArray

Point sizes

Since the plotting functions take matplotlib keywords, I can also specify the size of each points.

s = np.ceil(100*(np.abs(np.random.randn(Density.size))))
plt.figure()
plt.tight_layout()
ax1 = plt.subplot(211)
Density.scatter(x=Time, y=Depth, s=s, alpha=0.7,edgecolor='k', legend_size=2)
plt.subplot(212, sharex=ax1)
#Density.scatter(x=Time, y=Depth, s=s, alpha=0.7,edgecolor='k', sizeLegend=[1.0, 100, 200, 300])
v = np.abs(Density)+1.0
_ = Density.scatter(x=Time, y=Depth, s=s, alpha=0.7,edgecolor='k', legend_size=[1.0, 100, 200, 300], log=10)
plot StatArray

Of course we can still take the log, or equalize the colour histogram

plt.figure()
_ = Density.scatter(x=Time, y=Depth, s=s, alpha=0.7,edgecolor='k',equalize=True,log=10)
plot StatArray

Typically pcolor only works with 2D arrays. The StatArray has a pcolor method that will pcolor a 1D array

plt.figure()
plt.subplot(221)
Density.pcolor()
plt.subplot(222)
Density.pcolor(y=Time)
plt.subplot(223)
Density.pcolor(y=Time, flip=True)
plt.subplot(224)
_ = Density.pcolor(y=Time, log=10, equalize=True)
plot StatArray

We can add grid lines, and add opacity to each element in the pcolor image

This is useful if the colour values need to be scaled by another variable e.g. variance.

plt.figure()
plt.subplot(121)
Density.pcolor(grid=True, cmap='jet')
plt.subplot(122)
a = np.linspace(1.0, 0.0, Density.size)
_ = Density.pcolor(grid=True, alpha=a, cmap='jet')
plot StatArray

We can plot a histogram of the StatArray

plt.figure()
_ = Density.hist(100)
plot StatArray

We can write the StatArray to a HDF5 file. HDF5 files are binary files that can include compression. They allow quick and easy access to parts of the file, and can also be written to and read from in parallel!

with h5py.File('1Dtest.h5','w') as f:
    Density.toHdf(f,'test')

We can then read the StatArray from the file Here x is a new variable, that is read in from the hdf5 file we just wrote.

x = StatArray.fromHdf('1Dtest.h5', 'test')
print('x has the same values as Density? ',np.all(x == Density))
x[2] = 5.0 # Change one of the values in x
print('x has its own memory allocated (not a reference/pointer)? ', id(x) != id(Density))
x has the same values as Density?  True
x has its own memory allocated (not a reference/pointer)?  True

We can also define a 2D array

Density = StatArray(np.random.randn(50,100),"Density","$\frac{g}{cc}$")
Density.summary
"StatArray\nName:   Density ($\\frac{g}{cc}$)\nAddress:['0x14c27bb50']\nShape:  (50, 100)\nValues: [[ 0.43031444  0.16747392 -1.02201373 ...  0.74153721  1.90862961\n  -1.96298093]\n [-1.14462562 -0.35617089 -0.26110976 ... -0.23504484  0.90603323\n   0.24712065]\n [-0.14957468  0.72207578 -0.88557489 ... -0.17089164 -0.17362076\n  -0.63489297]\n ...\n [-1.159864    0.20198017 -1.45020582 ... -0.09188611  0.51429009\n  -2.29873304]\n [ 0.05110857  0.16473781  0.39133135 ... -0.772846    1.47108002\n   0.36929033]\n [ 1.88347735 -0.83146739 -0.92242455 ... -0.48809595  1.23172139\n   1.86073707]]\nMin:    -4.160360372326481\nMax:    3.7432694562093096\nhas_posterior: False\n"

The StatArray Class’s functions work whether it is 1D or 2D

We can still do a histogram

plt.figure()
_ = Density.hist()
plot StatArray

And we can use pcolor to plot the 2D array

plt.figure()
_ = Density.pcolor()
plot StatArray

The StatArray comes with extra plotting options

Here we specify the x and y axes for the 2D array using two other 1D StatArrays

plt.figure()
x = StatArray(np.arange(101),name='x Axis',units = 'mm')
y = StatArray(np.arange(51),name='y Axis',units = 'elephants')
_ = Density.pcolor(x=x, y=y)
plot StatArray

We can plot using a log10 scale, in this case, we have values that are less than or equal to 0.0. Plotting with the log option will by default mask any of those values, and will let you know that it has done so!

plt.figure()
_ = Density.pcolor(x=x,y=y,log=2)
plot StatArray

A neat trick with colourmaps is histogram equalization. This approach forces all colours in the image to have an equal amount. This distorts the colours, but can really highlight the lower and higher ends of whatever you are plotting

plt.figure()
_ = Density.pcolor(x=x, y=y, equalize=True)
plot StatArray

We can equalize the log10 plot too :)

plt.figure()
_ = Density.pcolor(x=x,y=y,equalize=True, log=10)
plot StatArray

We can add opacity to each pixel in the image

a = StatArray(np.random.random(Density.shape), 'Opacity from 0.0 to 1.0')
plt.figure()
ax1 = plt.subplot(131)
ax = Density.pcolor(x=x, y=y, flipY=True, linewidth=0.1, colorbar=False)
plt.subplot(132, sharex=ax1, sharey=ax1)
ax = Density.pcolor(x=x, y=y, alpha=a, flipY=True, linewidth=0.1, colorbar=False)
plt.subplot(133, sharex=ax1, sharey=ax1)
_ = a.pcolor(x=x, y=y, flipY=True)
plot StatArray

If the array potentially has a lot of white space around the edges, we can trim the image

Density[:10, :] = 0.0
Density[-10:, :] = 0.0
Density[:, :10] = 0.0
Density[:, -10:] = 0.0
plt.figure()
plt.subplot(121)
Density.pcolor()
plt.subplot(122)
_ = Density.pcolor(trim=0.0)
plot StatArray

Create a stacked area plot of a 2D StatArray

A = StatArray(np.abs(np.random.randn(13,100)), name='Variable', units="units")
x = StatArray(np.arange(100),name='x Axis',units = 'mm')
plt.figure()
ax1 = plt.subplot(211)
A.stackedAreaPlot(x=x, axis=1)
plt.subplot(212, sharex=ax1)
_ = A.stackedAreaPlot(x=x, i=np.s_[[1,3,4],:], axis=1, labels=['a','b','c'])

plt.show()
plot StatArray

Total running time of the script: (0 minutes 6.330 seconds)

Gallery generated by Sphinx-Gallery