StatArray
Inheritance diagram: ABC -> myObject -> DataArray; ndarray -> DataArray
- class geobipy.src.classes.core.DataArray.DataArray(shape=None, name=None, units=None, verbose=False, **kwargs)
Class extension to numpy.ndarray
This subclass of numpy.ndarray carries extra attributes that describe the parameter it represents. Prior and proposal distributions can also be attached so that the array can be used easily in an MCMC algorithm. Because it subclasses numpy.ndarray, the StatArray has all numpy array methods, and when passed to an in-place numpy function it will return a StatArray. See the Examples section for more information.
StatArray(shape, name=None, units=None, **kwargs)
- Parameters:
shape (int or sequence of ints or array_like or StatArray) –
If shape is int or sequence of ints : gives the shape of the new StatArray, e.g., 2 or (2, 3). All other arguments that can be passed to functions like numpy.zeros or numpy.arange can be used, see Other Parameters.
If shape is array_like : any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence, e.g., StatArray(numpy.arange(10)) will cast the result into a StatArray and will maintain the properties passed through to that function. One could then attach the name, units, prior, and/or proposal through this interface too, e.g., x = StatArray(numpy.arange(10, dtype=int), name='aTest', units='someUnits')
If shape is StatArray : the returned object is a deepcopy of the input. If name and units are specified with this option they will replace those parameters in the copy, e.g., y = StatArray(x, name='anotherTest') will be a deepcopy of x, but with a different name.
name (str, optional) – The name of the StatArray.
units (str, optional) – The units of the StatArray.
dtype (data-type, optional) – The desired data-type for the array. Default is numpy.float64. Only used when shape is int or sequence of ints. The data type could also be a class.
buffer (object exposing buffer interface, optional) – Used to fill the array with data. Only used when shape is int or sequence of ints.
offset (int, optional) – Offset of array data in buffer. Only used when shape is int or sequence of ints.
strides (tuple of ints, optional) – Strides of data in memory. Only used when shape is int or sequence of ints.
order ({'C', 'F', 'A'}, optional) – Specify the order of the array. If order is ‘C’, then the array will be in C-contiguous order (rightmost-index varies the fastest). If order is ‘F’, then the returned array will be in Fortran-contiguous order (leftmost-index varies the fastest). If order is ‘A’ (default), then the returned array may be in any order (either C-, Fortran-contiguous, or even discontiguous), unless a copy is required, in which case it will be C-contiguous. Only used when shape is int or sequence of ints.
- Returns:
out – Extension to numpy.ndarray with additional attributes attached.
- Return type:
- Raises:
TypeError – If name is not a str.
TypeError – If units is not a str.
Notes
When the StatArray is passed through a numpy function, the name and units are maintained in the new object. Any priors or proposals are not kept for two reasons. a) keep computational overheads low, b) assume that a possible change in size or meaning of a parameter warrants a change in any attached distributions.
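The behaviour described in these notes can be illustrated with numpy's standard subclassing hook. The following is a minimal sketch, not geobipy's implementation: the class name LabelledArray is hypothetical, and it carries only a name and units (no priors or proposals) through views and numpy operations by using __array_finalize__.

```python
import numpy as np


class LabelledArray(np.ndarray):
    """Hypothetical minimal ndarray subclass carrying a name and units."""

    def __new__(cls, values, name=None, units=None):
        # View the input data as this subclass and attach the metadata.
        obj = np.asarray(values).view(cls)
        obj.name = name
        obj.units = units
        return obj

    def __array_finalize__(self, obj):
        # Called by numpy for views, slices and results of numpy operations.
        if obj is None:
            return
        # Copy the lightweight metadata only; heavier attached objects
        # (such as distributions) are deliberately not propagated here.
        self.name = getattr(obj, "name", None)
        self.units = getattr(obj, "units", None)


x = LabelledArray(np.arange(10), name="test", units="m")
y = x[2:5] * 2.0  # slicing and arithmetic both keep the metadata
print(type(y).__name__, y.name, y.units)
```

Only cheap attributes are copied in the hook, which mirrors the stated reason for dropping priors and proposals: keeping the overhead of every numpy call low.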
Examples
Since the StatArray is an extension to numpy, all numpy attached methods can be used.
>>> from geobipy import StatArray
>>> import numpy as np
>>> x = StatArray(np.arange(10), name='test', units='units')
>>> print(x.mean())
4.5
If the StatArray is passed to a numpy function that does not return a new instantiation, a StatArray will be returned (as opposed to a numpy array).
>>> np.delete(x, 5)
StatArray([0, 1, 2, 3, 4, 6, 7, 8, 9])
However, if you pass a StatArray to a numpy function that is not in-place, i.e. one that creates new memory, the return type will be a numpy array and NOT a StatArray subclass.
>>> np.append(x, [3, 4, 5])
array([0, 1, 2, ..., 3, 4, 5])
See also
geobipy.src.classes.statistics.Distribution
For possible prior and proposal distributions
- Bcast(world, root=0)
Broadcast the StatArray to every rank in the MPI communicator.
- Parameters:
world (mpi4py.MPI.Comm) – The MPI communicator over which to broadcast.
root (int, optional) – The rank from which to broadcast. Default is 0 for the master rank.
- Returns:
out – The broadcast StatArray on every rank in the MPI communicator.
- Return type:
- classmethod Irecv(source, world, ndim=None, shape=None, dtype=None)
- Isend(dest, world, ndim=None, shape=None, dtype=None)
- Scatterv(starts, chunks, world, axis=0, root=0)
Scatter variable lengths of the StatArray using MPI
Takes the StatArray and gives each core in the world a chunk of the array.
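One common way to build the starts and chunks arrays that Scatterv expects is to split the length of the distributed axis as evenly as possible over the ranks. The helper below is a hypothetical illustration using plain numpy (the name even_chunks is not part of geobipy); the number of ranks would normally come from world.size.

```python
import numpy as np


def even_chunks(n, n_ranks):
    """Split n items over n_ranks as evenly as possible.

    Returns (starts, chunks), each a 1D integer array of length n_ranks.
    """
    base, extra = divmod(n, n_ranks)
    chunks = np.full(n_ranks, base, dtype=int)
    chunks[:extra] += 1                   # the first `extra` ranks get one more item
    starts = np.cumsum(chunks) - chunks   # starting index of each chunk
    return starts, chunks


starts, chunks = even_chunks(10, 4)
print(starts, chunks)
```

With these arrays, rank r would receive the elements starts[r] to starts[r] + chunks[r] - 1 along the distributed axis.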
- Parameters:
starts (array of ints) – 1D array of ints with size equal to the number of MPI ranks. Each element gives the starting index for a chunk to be sent to that core. e.g. starts[0] is the starting index for rank = 0.
chunks (array of ints) – 1D array of ints with size equal to the number of MPI ranks. Each element gives the size of a chunk to be sent to that core. e.g. chunks[0] is the chunk size for rank = 0.
world (mpi4py.MPI.Comm) – The MPI communicator over which to Scatterv.
axis (int, optional) – This axis is distributed amongst ranks.
root (int, optional) – The MPI rank to Scatterv from. Default is 0.
- Returns:
out – The StatArray distributed amongst ranks.
- Return type:
- abs()
Take the absolute value. In-place operation.
- Returns:
out – Absolute value
- Return type:
- property address
- property addressof
- append(values, axis=0)
Append to a StatArray
Appends values to the end of a StatArray. Be careful with repeated calls to this method, as each call reallocates memory and can be slow.
- Parameters:
values (scalar or array_like) – Numbers to append
- Returns:
out – Appended StatArray
- Return type:
- argmax_multiple_to_nan(axis=0)
Perform the numpy argmax function on the StatArray but optionally mask multiple max values as NaN.
- Parameters:
nan_multiple (bool) – If multiple locations contain the same max value, mask as nan.
- Returns:
out – Array of indices into the array. It has the same shape as self.shape with the dimension along axis removed.
- Return type:
ndarray of floats
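The idea of masking tied maxima can be sketched with plain numpy. This is an assumption about the intent of the method rather than its actual code, and the function name argmax_ties_to_nan is hypothetical. The result is a float array because NaN cannot be stored in an integer array, which is consistent with the stated return type.

```python
import numpy as np


def argmax_ties_to_nan(a, axis=0):
    """argmax along axis, with NaN wherever the maximum occurs more than once."""
    a = np.asarray(a)
    idx = np.argmax(a, axis=axis).astype(float)
    # Count how many entries along the axis equal the maximum.
    n_max = np.sum(a == np.max(a, axis=axis, keepdims=True), axis=axis)
    return np.where(n_max > 1, np.nan, idx)


a = np.array([[1, 5],
              [3, 5],
              [2, 0]])
result = argmax_ties_to_nan(a, axis=0)
print(result)
```

In this example the first column has a unique maximum in row 1, while the second column has its maximum twice and is therefore masked.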
- bar(x=None, i=None, **kwargs)
Plot the StatArray as a bar chart.
The values in self are the heights of the bars. The axes are labelled automatically if x is a geobipy.StatArray.
- Parameters:
x (array_like or StatArray, optional) – The horizontal locations of the bars
i (sequence of ints, optional) – Plot the ith indices of self, against the ith indices of x.
- Returns:
ax
- Return type:
matplotlib.axes.Axes
See also
matplotlib.pyplot.bar
For additional keyword arguments you may use.
- property bounds
- centred_grid_nodes(spacing)
Generates grid nodes centred over bounds
- Parameters:
bounds (array_like) – bounds of the dimension
spacing (float) – distance between nodes
- copy(order='C')
Return a copy of the array.
- Parameters:
order ({'C', 'F', 'A', 'K'}, optional) – Controls the memory layout of the copy. ‘C’ means C-order, ‘F’ means F-order, ‘A’ means ‘F’ if a is Fortran contiguous, ‘C’ otherwise. ‘K’ means match the layout of a as closely as possible. (Note that this function and numpy.copy() are very similar but have different default values for their order= arguments, and this function always passes sub-classes through.)
See also
numpy.copy
Similar function with different default behavior
numpy.copyto
Notes
This function is the preferred method for creating an array copy. The function numpy.copy() is similar, but it defaults to using order ‘K’, and will not pass sub-classes through by default.
Examples
>>> x = np.array([[1,2,3],[4,5,6]], order='F')
>>> y = x.copy()
>>> x.fill(0)
>>> x
array([[0, 0, 0],
       [0, 0, 0]])
>>> y
array([[1, 2, 3],
       [4, 5, 6]])
>>> y.flags['C_CONTIGUOUS']
True
For arrays containing Python objects (e.g. dtype=object), the copy is a shallow one. The new array will contain the same object which may lead to surprises if that object can be modified (is mutable):
>>> a = np.array([1, 'm', [2, 3, 4]], dtype=object)
>>> b = a.copy()
>>> b[2][0] = 10
>>> a
array([1, 'm', list([10, 3, 4])], dtype=object)
To ensure all elements within an object array are copied, use copy.deepcopy:
>>> import copy
>>> a = np.array([1, 'm', [2, 3, 4]], dtype=object)
>>> c = copy.deepcopy(a)
>>> c[2][0] = 10
>>> c
array([1, 'm', list([10, 3, 4])], dtype=object)
>>> a
array([1, 'm', list([2, 3, 4])], dtype=object)
- createHdf(h5obj, name, shape=None, add_axis=None, fillvalue=None, **kwargs)
Create the Metadata for a StatArray in a HDF file
Creates a new group in a HDF file under h5obj. A nested hierarchy will be created, e.g., myName/data, myName/prior, and myName/proposal. This method can be used in an MPI parallel environment; in that case, a) the HDF file must have been opened with the mpio driver, and b) createHdf must be called collectively, i.e., called by every core in the MPI communicator that was used to open the file. To create large amounts of empty space before writing to it in parallel, the nRepeats parameter extends the memory in the first dimension.
- Parameters:
h5obj (h5py.File or h5py.Group) – A HDF file or group object to create the contents in.
myName (str) – The name of the group to create.
withPosterior (bool, optional) – Include the creation of space for any attached posterior.
nRepeats (int, optional) – Inserts a first dimension into the shape of the StatArray of size nRepeats. This can be used to extend the available memory of the StatArray so that multiple MPI ranks can write to their respective parts in the extended memory.
fillvalue (number, optional) – Initializes the memory in file with the fill value
Notes
This method can be used in serial and in MPI. As an MPI example, given 10 MPI ranks, each with a length-10 array, it is faster to create one empty 10x10 array and have each rank write its own row than to create 10 separate length-10 arrays, because the overhead of creating the file metadata becomes significant if it is done too many times.
Example
>>> from geobipy import StatArray
>>> from mpi4py import MPI
>>> import h5py
>>> world = MPI.COMM_WORLD
>>> x = StatArray(4, name='test', units='units')
>>> x[:] = world.rank
>>> # This is a collective open of data in the file (fName is the path of the HDF file)
>>> f = h5py.File(fName, 'w', driver='mpio', comm=world)
>>> # Collective creation of space (padded by number of mpi ranks)
>>> x.createHdf(f, 'x', nRepeats=world.size)
>>> world.barrier()
>>> # In a non collective region, we can write to different sections of x in the file
>>> # Fake a non collective region
>>> def noncollectivewrite(x, file, world):
...     # Each rank carries out this code, but it's not collective.
...     x.writeHdf(file, 'x', index=world.rank)
>>> noncollectivewrite(x, f, world)
>>> world.barrier()
>>> f.close()
- delete(i, axis=None)
Delete elements
- Parameters:
i (slice, int or array of ints) – Indicate which sub-arrays to remove.
axis (int, optional) – The axis along which to delete the subarray defined by i. If axis is None, i is applied to the flattened array.
- Returns:
out – Deepcopy of StatArray with deleted entry(ies).
- Return type:
- diff(axis=-1)
- edges(min=None, max=None, axis=-1)
Get the midpoint values between elements in the StatArray
Returns a StatArray of length size(self) + 1 containing the midpoints between each element. The first and last elements are projected edges based on the difference between the first two and the last two elements in self:
edges[0] = self[0] - 0.5 * (self[1] - self[0])
edges[-1] = self[-1] + 0.5 * (self[-1] - self[-2])
If min and max are given, the outermost edges are fixed to those values and not calculated.
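The projection rule for the outer edges can be sketched for the 1D case with plain numpy. The function name edges_1d and its lo and hi arguments are hypothetical stand-ins for the min and max parameters; this is an illustration of the formulas, not geobipy's code.

```python
import numpy as np


def edges_1d(centres, lo=None, hi=None):
    """Midpoint edges of a 1D array, with projected (or fixed) outer edges."""
    c = np.asarray(centres, dtype=float)
    d = np.diff(c)
    e = np.empty(c.size + 1)
    e[1:-1] = c[:-1] + 0.5 * d                          # midpoints between neighbours
    e[0] = c[0] - 0.5 * d[0] if lo is None else lo      # projected or fixed left edge
    e[-1] = c[-1] + 0.5 * d[-1] if hi is None else hi   # projected or fixed right edge
    return e


e = edges_1d([1.0, 2.0, 4.0])
e_fixed = edges_1d([1.0, 2.0, 4.0], lo=0.0, hi=10.0)
print(e, e_fixed)
```

Note that the input needs at least two elements, since the outer edges are projected from the first and last differences.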
- Parameters:
min (float, optional) – Fix the leftmost edge to min.
max (float, optional) – Fix the rightmost edge to max.
axis (int, optional) – Compute edges along this dimension if > 1D.
- Returns:
out – Edges of the StatArray
- Return type:
- firstNonZero(axis=0, invalid_val=-1)
Find the indices of the first non zero values along the axis.
- Parameters:
axis (int, optional) – Axis along which to find first non zeros.
invalid_val (int, optional) – When no non zero value is found along the axis, return this index.
- Returns:
out – Indices of the first non zero values.
- Return type:
array_like
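A common numpy idiom for this kind of search combines argmax on a boolean mask with a check that any non zero value exists. The sketch below assumes that behaviour; the function name first_nonzero is hypothetical and the code is not taken from geobipy.

```python
import numpy as np


def first_nonzero(a, axis=0, invalid_val=-1):
    """Index of the first non zero entry along axis, or invalid_val if none."""
    mask = np.asarray(a) != 0
    # argmax returns the first True; it also returns 0 when there is no True,
    # so the any() test is needed to tell the two cases apart.
    return np.where(mask.any(axis=axis), mask.argmax(axis=axis), invalid_val)


a = np.array([[0, 0, 3],
              [0, 2, 0],
              [0, 5, 1]])
first = first_nonzero(a, axis=0)
print(first)
```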
- fit_mixture(mixture_type='gaussian', log=None, mean_bounds=None, variance_bounds=None, k=[1, 5], tolerance=0.05)
Uses Gaussian mixture models to fit the histogram.
Starts at the minimum number of clusters and adds clusters until the BIC decreases by less than the tolerance.
- Parameters:
nSamples
log
mean_bounds
variance_bounds
k (ints) – Two ints with starting and ending # of clusters
tolerance
- classmethod fromHdf(grp, name=None, index=None, **kwargs)
Read the StatArray from a HDF group
Given the HDF group object, read the contents into a StatArray.
- Parameters:
h5obj (h5py._hl.group.Group) – A HDF group object to read the contents from.
index (slice, optional) – If the group was created using the nRepeats option, index specifies the index’th entry from which to read the data.
- gaussianMixture(clusterID, trainPercent=75.0, covType=['spherical'], plot=True)
Use a Gaussian mixture model to classify the data. clusterID is the initial assignment of the rows to their clusters.
- getNameUnits()
Get the name and units
Gets the name and units attached to the StatArray. Units, if present, are surrounded by parentheses.
- Returns:
out – String containing name(units).
- Return type:
str
- hasLabels()
- property hasPosterior
- property hasPrior
- property hasProposal
- hist(bins=10, **kwargs)
Plot a histogram of the StatArray
Plots a histogram, estimates the mean and standard deviation and overlays the PDF of a normal distribution with those values, if density=1.
See also
geobipy.plotting.hist
For geobipy additional arguments
matplotlib.pyplot.hist
For histogram related arguments
Example
>>> from geobipy import StatArray
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> x = StatArray(np.random.randn(1000), name='name', units='units')
>>> plt.figure()
>>> x.hist()
>>> plt.show()
- index(values)
Find the index of values.
Assumes that self is monotonically increasing!
- Parameters:
values (scalar or array_like) – Find the index of these values.
- Returns:
out – Indices into self.
- Return type:
ints
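For a monotonically increasing array, numpy.searchsorted is one standard way to locate values. Whether index uses it internally, and whether it returns insertion points or cell indices, is not stated here, so the following is only an illustrative sketch with plain numpy.

```python
import numpy as np

# A monotonically increasing array, as the method assumes.
x = np.array([0.0, 1.0, 2.0, 3.0])
values = np.array([1.5, 3.0])

# For each value, the index at which it would be inserted to keep x sorted.
idx = np.searchsorted(x, values)
print(idx)
```

The binary search behind searchsorted is only valid on sorted input, which is why the monotonicity assumption matters.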
- insert(i, values, axis=0)
Insert values
- Parameters:
i (int, slice or sequence of ints) – Object that defines the index or indices before which values is inserted.
values (array_like) – Values to insert into arr. If the type of values is different from that of arr, values is converted to the type of arr. values should be shaped so that arr[...,obj,...] = values is legal.
axis (int, optional) – Axis along which to insert values. If axis is None then arr is flattened first.
- Returns:
out – StatArray after inserting a value.
- Return type:
- interleave(other)
Interleave two arrays together like zip
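Interleaving "like zip" can be sketched with strided assignment in numpy. This is an assumption about the intended result for two 1D arrays of equal length; the free function below is hypothetical and is not the method's actual code.

```python
import numpy as np


def interleave(a, b):
    """Interleave two equal-length 1D arrays: a0, b0, a1, b1, ..."""
    a = np.asarray(a)
    b = np.asarray(b)
    out = np.empty(a.size + b.size, dtype=np.result_type(a, b))
    out[0::2] = a   # even positions come from a
    out[1::2] = b   # odd positions come from b
    return out


z = interleave([1, 3, 5], [2, 4, 6])
print(z)
```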
- internalEdges(axis=-1)
Get the midpoint values between elements in the StatArray
Returns a StatArray of length size(self) + 1 containing the midpoints between each element.
- Returns:
out – Edges of the StatArray
- Return type:
- isRegular(axis=-1)
Checks that the values change regularly
- Returns:
out – Is regularly changing.
- Return type:
bool
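One plausible reading of "change regularly" is that consecutive differences are all equal within floating point tolerance. The sketch below tests that with plain numpy; the function name is_regular and the use of numpy.allclose with its default tolerances are assumptions.

```python
import numpy as np


def is_regular(x):
    """True if consecutive differences of a 1D array are all (nearly) equal."""
    d = np.diff(np.asarray(x, dtype=float))
    return bool(np.allclose(d, d[0]))


regular = is_regular([0.0, 0.5, 1.0, 1.5])
irregular = is_regular([0.0, 1.0, 3.0])
print(regular, irregular)
```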
- kMeans(nClusters, standardize=False, nIterations=10, plot=False, **kwargs)
Perform K-Means clustering on the StatArray
- property label
- lastNonZero(axis=0, invalid_val=-1)
Find the indices of the last non zero values along the axis.
- Parameters:
axis (int) – Axis along which to find last non zeros.
- Returns:
out – Indices of the last non zero values.
- Return type:
array_like
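Searching from the end can be sketched by flipping a boolean mask along the axis and converting the position found back to an index in the original order. The function name last_nonzero is hypothetical, and this is an illustration with plain numpy rather than geobipy's implementation.

```python
import numpy as np


def last_nonzero(a, axis=0, invalid_val=-1):
    """Index of the last non zero entry along axis, or invalid_val if none."""
    mask = np.asarray(a) != 0
    # Position of the first True in the flipped mask, mapped back to the
    # index in the original ordering.
    last = mask.shape[axis] - 1 - np.flip(mask, axis=axis).argmax(axis=axis)
    return np.where(mask.any(axis=axis), last, invalid_val)


a = np.array([[0, 0, 3],
              [0, 2, 0],
              [0, 5, 1]])
last = last_nonzero(a, axis=0)
print(last)
```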
- property n_posteriors
- property name
- nanmax()
- nanmin()
- normalize(axis=None)
Normalize to range 0 - 1.
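Normalizing to the range 0 to 1 is usually done with min-max scaling. The sketch below shows that formula with plain numpy for the whole array (the axis=None case); the function name is hypothetical and a constant array, which would divide by zero, is not handled.

```python
import numpy as np


def normalize(x):
    """Min-max scale an array so that its values span 0 to 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


n = normalize([2.0, 4.0, 6.0])
print(n)
```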
- pad(N)
Copies the properties of a StatArray including all priors or proposals, but pads everything to the given size
- Parameters:
N (int) – Size to pad to.
- Returns:
out – Padded StatArray
- Return type:
- pcolor(x=None, y=None, **kwargs)
Create a pseudocolour plot of the StatArray. Uses pcolormesh internally for speed.
If the arguments x and y are geobipy.StatArray classes, the axes can be automatically labelled. Can take any other matplotlib arguments and keyword arguments e.g. cmap etc.
- Parameters:
x (1D array_like or StatArray, optional) – Horizontal coordinates of the values edges.
y (1D array_like or StatArray, optional) – Vertical coordinates of the values edges.
alpha (scalar or array_like, optional) – If alpha is scalar, it behaves like the standard matplotlib alpha and the opacity is applied to the entire plot. If array_like, each pixel is given an individual alpha value.
log ('e' or float, optional) – Take the log of the colour to a base. ‘e’ if log = ‘e’, and a number e.g. log = 10. Values in c that are <= 0 are masked.
equalize (bool, optional) – Equalize the histogram of the colourmap so that all colours have an equal amount.
nbins (int, optional) – Number of bins to use for histogram equalization.
xscale (str, optional) – Scale the x axis? e.g. xscale = ‘linear’ or ‘log’
yscale (str, optional) – Scale the y axis? e.g. yscale = ‘linear’ or ‘log’.
flipX (bool, optional) – Flip the X axis
flipY (bool, optional) – Flip the Y axis
grid (bool, optional) – Plot the grid
noColorbar (bool, optional) – Turn off the colour bar, useful if multiple plotting routines are used on the same figure.
trim (bool, optional) – Set the x and y limits to the first and last non zero values along each axis.
classes (dict, optional) – A dictionary containing three entries. classes[‘id’] : array_like of same shape as self containing the class id of each element in self. classes[‘cmaps’] : list of matplotlib colourmaps. The number of colourmaps should equal the number of classes. classes[‘labels’] : list of str. The length should equal the number of classes. If classes is provided, alpha is ignored if provided.
- Returns:
ax
- Return type:
matplotlib.axes.Axes
See also
matplotlib.pyplot.pcolormesh
For additional keyword arguments you may use.
- plot(x=None, i=None, axis=0, **kwargs)
Plot self against x
If x and y are StatArrays, the axes are automatically labelled.
- Parameters:
x (array_like or StatArray) – The abscissa
i (sequence of ints, optional) – Plot the ith indices of self, against the ith indices of x.
axis (int, optional) – If self is 2D, plot values along this axis.
log ('e' or float, optional) – Take the log of the colour to a base. ‘e’ if log = ‘e’, and a number e.g. log = 10. Values in c that are <= 0 are masked.
xscale (str, optional) – Scale the x axis? e.g. xscale = ‘linear’ or ‘log’.
yscale (str, optional) – Scale the y axis? e.g. yscale = ‘linear’ or ‘log’.
flipX (bool, optional) – Flip the X axis
flipY (bool, optional) – Flip the Y axis
labels (bool, optional) – Plot the labels? Default is True.
- Returns:
ax
- Return type:
matplotlib.axes.Axes
See also
matplotlib.pyplot.plot
For additional keyword arguments you may use.
- prepend(values, axis=0)
Prepend to a StatArray
Prepends numbers to a StatArray. Do not use this too often, as it is quite slow.
- Parameters:
values (scalar or array_like) – A number to prepend.
- Returns:
out – StatArray with prepended values.
- Return type:
- property range
- rescale(a, b)
Rescale to the interval (a, b)
- Parameters:
a (float) – Lower limit
b (float) – Upper limit
- Returns:
out – Rescaled Array
- Return type:
- reset_posteriors()
- resize(new_shape)
Resize a StatArray
Resize a StatArray but copy over any attached attributes
- Parameters:
new_shape (int or tuple of ints) – Shape of the resized array
- Returns:
out – Resized array.
- Return type:
See also
numpy.resize
For more information.
- rolling(numpyFunction, window=1)
- scatter(x=None, y=None, i=None, **kwargs)
Create a 2D scatter plot.
Create a 2D scatter plot. If the y values are not given, the colours are used instead. If the arrays x, y, and c are geobipy.StatArray classes, the axes are automatically labelled. Any other matplotlib arguments and keyword arguments, e.g. markersize, can also be passed.
- Parameters:
x (1D array_like or StatArray) – Horizontal locations of the points to plot
c (1D array_like or StatArray) – Colour values of the points
y (1D array_like or StatArray, optional) – Vertical locations of the points to plot, if y = None, then y = c.
i (sequence of ints, optional) – Plot a subset of x, y, c, using the indices in i.
See also
geobipy.plotting.Scatter2D
For additional keyword arguments you may use.
- smooth(a)
- stackedAreaPlot(x=None, i=None, axis=0, labels=[], **kwargs)
Create stacked area plot where column elements are stacked on top of each other.
- Parameters:
x (array_like or StatArray) – The abscissa.
i (sequence of ints, optional) – Plot a subset of x, y, c, using the indices in i.
axis (int) – Plot along this axis, stack along the other axis.
labels (list of str, optional) – The labels to assign to each column.
colors (matplotlib.colors.LinearSegmentedColormap or list of colours) – The colour used for each column.
xscale (str, optional) – Scale the x axis? e.g. xscale = ‘linear’ or ‘log’.
yscale (str, optional) – Scale the y axis? e.g. yscale = ‘linear’ or ‘log’.
- Returns:
ax
- Return type:
matplotlib.axes.Axes
See also
matplotlib.pyplot.stackplot
For additional keyword arguments you may use.
- standardize(axis=None)
Standardize by subtracting the mean and dividing by the standard deviation.
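The standardization described here is the usual z-score. The sketch below applies it to the whole array (the axis=None case) with plain numpy; the function name is hypothetical, and the use of the population standard deviation (numpy's default, ddof=0) is an assumption.

```python
import numpy as np


def standardize(x):
    """Subtract the mean and divide by the standard deviation (ddof=0)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


z = standardize([1.0, 2.0, 3.0, 4.0])
print(z.mean(), z.std())
```

After standardizing, the values have zero mean and unit standard deviation up to floating point rounding.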
- strip_nan()
- property summary
Write a summary of the StatArray
- Parameters:
out (bool) – Whether to return the summary or print to screen
- Returns:
out – Summary of StatArray
- Return type:
str, optional
- property units
- update_posterior(**kwargs)
Adds the current values of the StatArray to the attached posterior.
- property values
- verbose()
Explicit print of every element
- writeHdf(h5obj, name, index=None, **kwargs)
Write the values of a StatArray to a HDF file
Writes the contents of the StatArray to an already created group in a HDF file under h5obj. This method can be used in an MPI parallel environment; in that case, the HDF file must have been opened with the mpio driver. Unlike createHdf, writeHdf does not have to be called collectively: each rank can call writeHdf independently, so long as the ranks do not try to write to the same index.
- Parameters:
h5obj (h5py._hl.files.File or h5py._hl.group.Group) – A HDF file or group object to write the contents to.
myName (str) – The name of the group to write to. The group must have been created previously.
withPosterior (bool, optional) – Include writing any attached posterior.
index (int, optional) – If the group was created using the nRepeats option, index specifies the index’th entry at which to write the data
Example
>>> from geobipy import StatArray
>>> from mpi4py import MPI
>>> import h5py
>>> world = MPI.COMM_WORLD
>>> x = StatArray(4, name='test', units='units')
>>> x[:] = world.rank
>>> # This is a collective open of data in the file (fName is the path of the HDF file)
>>> f = h5py.File(fName, 'w', driver='mpio', comm=world)
>>> # Collective creation of space (padded by number of mpi ranks)
>>> x.createHdf(f, 'x', nRepeats=world.size)
>>> world.barrier()
>>> # In a non collective region, we can write to different sections of x in the file
>>> # Fake a non collective region
>>> def noncollectivewrite(x, file, world):
...     # Each rank carries out this code, but it's not collective.
...     x.writeHdf(file, 'x', index=world.rank)
>>> noncollectivewrite(x, f, world)
>>> world.barrier()
>>> f.close()