StatArray

digraph inheritanced8e76283df { bgcolor=transparent; rankdir=LR; size="8.0, 12.0"; "ABC" [fillcolor=white,fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans",fontsize=10,height=0.25,shape=box,style="setlinewidth(0.5),filled",tooltip="Helper class that provides a standard way to create an ABC using"]; "DataArray" [URL="../core/DataArray.html#geobipy.src.classes.core.DataArray.DataArray",fillcolor=white,fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans",fontsize=10,height=0.25,shape=box,style="setlinewidth(0.5),filled",target="_top",tooltip="Class extension to numpy.ndarray"]; "ndarray" -> "DataArray" [arrowsize=0.5,style="setlinewidth(0.5)"]; "myObject" -> "DataArray" [arrowsize=0.5,style="setlinewidth(0.5)"]; "StatArray" [URL="#geobipy.src.classes.statistics.StatArray.StatArray",fillcolor=white,fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans",fontsize=10,height=0.25,shape=box,style="setlinewidth(0.5),filled",target="_top",tooltip="Class extension to numpy.ndarray"]; "DataArray" -> "StatArray" [arrowsize=0.5,style="setlinewidth(0.5)"]; "myObject" [URL="../core/myObject.html#geobipy.src.classes.core.myObject.myObject",fillcolor=white,fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans",fontsize=10,height=0.25,shape=box,style="setlinewidth(0.5),filled",target="_top"]; "ABC" -> "myObject" [arrowsize=0.5,style="setlinewidth(0.5)"]; "ndarray" [fillcolor=white,fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans",fontsize=10,height=0.25,shape=box,style="setlinewidth(0.5),filled",tooltip="ndarray(shape, dtype=float, buffer=None, offset=0,"]; }
class geobipy.src.classes.statistics.StatArray.StatArray(shape=None, name=None, units=None, verbose=False, **kwargs)

Class extension to numpy.ndarray

This subclass to a numpy array contains extra attributes that can describe the parameters it represents. One can also attach prior and proposal distributions so that it may be used in an MCMC algorithm easily. Because this is a subclass to numpy, the StatArray contains all numpy array methods and when passed to an in-place numpy function will return a StatArray. See the example section for more information.

StatArray(shape, name=None, units=None, **kwargs)

Parameters:
  • shape (int or sequence of ints or array_like or StatArray) –

    • If shape is int or sequence of ints : give the shape of the new StatArray e.g., 2 or (2, 3). All other arguments that can be passed to functions like numpy.zeros or numpy.arange can be used, see Other Parameters.

    • If shape is array_like : any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence. e.g., StatArray(numpy.arange(10)) will cast the result into a StatArray and will maintain the properies passed through to that function. One could then attach the name, units, prior, and/or proposal through this interface too. e.g., x = StatArray(numpy.arange(10,dtype=numpy.int), name='aTest', units='someUnits')

    • If shape is StatArray : the returned object is a deepcopy of the input. If name and units are specified with this option they will replace those parameters in the copy. e.g., y = StatArray(x, name='anotherTest') will be a deepcopy copy of x, but with a different name.

  • name (str, optional) – The name of the StatArray.

  • units (str, optional) – The units of the StatArray.

  • dtype (data-type, optional) – The desired data-type for the array. Default is numpy.float64. Only used when shape is int or sequence of ints. The data type could also be a class.

  • buffer (object exposing buffer interface, optional) – Used to fill the array with data. Only used when shape is int or sequence of ints.

  • offset (int, optional) – Offset of array data in buffer. Only used when shape is int or sequence of ints.

  • strides (tuple of ints, optional) – Strides of data in memory. Only used when shape is int or sequence of ints.

  • order ({'C', 'F', 'A'}, optional) – Specify the order of the array. If order is ‘C’, then the array will be in C-contiguous order (rightmost-index varies the fastest). If order is ‘F’, then the returned array will be in Fortran-contiguous order (leftmost-index varies the fastest). If order is ‘A’ (default), then the returned array may be in any order (either C-, Fortran-contiguous, or even discontiguous), unless a copy is required, in which case it will be C-contiguous. Only used when shape is int or sequence of ints.

Returns:

out – Extension to numpy.ndarray with additional attributes attached.

Return type:

StatArray

Raises:
  • TypeError – If name is not a str.

  • TypeError – If units is not a str.

Notes

When the StatArray is passed through a numpy function, the name and units are maintained in the new object. Any priors or proposals are not kept for two reasons. a) keep computational overheads low, b) assume that a possible change in size or meaning of a parameter warrants a change in any attached distributions.

Examples

Since the StatArray is an extension to numpy, all numpy attached methods can be used.

>>> from geobipy import StatArray
>>> import numpy as np
>>> x = StatArray(arange(10), name='test', units='units')
>>> print(x.mean())
4.5

If the StatArray is passed to a numpy function that does not return a new instantiation, a StatArray will be returned (as opposed to a numpy array)

>>> delete(x, 5)
StatArray([0, 1, 2, 3, 4, 6, 7, 8, 9])

However, if you pass a StatArray to a numpy function that is not in-place, i.e. creates new memory, the return type will be a numpy array and NOT a StatArray subclass

>>> append(x,[3,4,5])
array([0, 1, 2, ..., 3, 4, 5])

See also

geobipy.src.classes.statistics.Distribution

For possible prior and proposal distributions

property address
property addressof
centred_grid_nodes(spacing)

Generates grid nodes centred over bounds

Parameters:
  • bounds (array_like) – bounds of the dimension

  • spacing (float) – distance between nodes

copy(order='C')

Return a copy of the array.

Parameters:

order ({'C', 'F', 'A', 'K'}, optional) – Controls the memory layout of the copy. ‘C’ means C-order, ‘F’ means F-order, ‘A’ means ‘F’ if a is Fortran contiguous, ‘C’ otherwise. ‘K’ means match the layout of a as closely as possible. (Note that this function and numpy.copy() are very similar but have different default values for their order= arguments, and this function always passes sub-classes through.)

See also

numpy.copy

Similar function with different default behavior

numpy.copyto

Notes

This function is the preferred method for creating an array copy. The function numpy.copy() is similar, but it defaults to using order ‘K’, and will not pass sub-classes through by default.

Examples

>>> x = np.array([[1,2,3],[4,5,6]], order='F')
>>> y = x.copy()
>>> x.fill(0)
>>> x
array([[0, 0, 0],
       [0, 0, 0]])
>>> y
array([[1, 2, 3],
       [4, 5, 6]])
>>> y.flags['C_CONTIGUOUS']
True

For arrays containing Python objects (e.g. dtype=object), the copy is a shallow one. The new array will contain the same object which may lead to surprises if that object can be modified (is mutable):

>>> a = np.array([1, 'm', [2, 3, 4]], dtype=object)
>>> b = a.copy()
>>> b[2][0] = 10
>>> a
array([1, 'm', list([10, 3, 4])], dtype=object)

To ensure all elements within an object array are copied, use copy.deepcopy:

>>> import copy
>>> a = np.array([1, 'm', [2, 3, 4]], dtype=object)
>>> c = copy.deepcopy(a)
>>> c[2][0] = 10
>>> c
array([1, 'm', list([10, 3, 4])], dtype=object)
>>> a
array([1, 'm', list([2, 3, 4])], dtype=object)
copyStats(other)

Copy statistical properties from other to self

[extended_summary]

Parameters:

other ([type]) – [description]

createHdf(h5obj, name, shape=None, withPosterior=True, add_axis=None, fillvalue=None, upcast=True)

Create the Metadata for a StatArray in a HDF file

Creates a new group in a HDF file under h5obj. A nested heirarchy will be created e.g., myName/data, myName/prior, and myName/proposal. This method can be used in an MPI parallel environment, if so however, a) the hdf file must have been opened with the mpio driver, and b) createHdf must be called collectively, i.e., called by every core in the MPI communicator that was used to open the file. In order to create large amounts of empty space before writing to it in parallel, the nRepeats parameter will extend the memory in the first dimension.

Parameters:
  • h5obj (h5py.File or h5py.Group) – A HDF file or group object to create the contents in.

  • myName (str) – The name of the group to create.

  • withPosterior (bool, optional) – Include the creation of space for any attached posterior.

  • nRepeats (int, optional) – Inserts a first dimension into the shape of the StatArray of size nRepeats. This can be used to extend the available memory of the StatArray so that multiple MPI ranks can write to their respective parts in the extended memory.

  • fillvalue (number, optional) – Initializes the memory in file with the fill value

Notes

This method can be used in serial and MPI. As an example in MPI. Given 10 MPI ranks, each with a 10 length array, it is faster to create a 10x10 empty array, and have each rank write its row. Rather than creating 10 separate length 10 arrays because the overhead when creating the file metadata can become very cumbersome if done too many times.

Example

>>> from geobipy import StatArray
>>> from mpi4py import MPI
>>> import h5py
>>> world = MPI.COMM_WORLD
>>> x = StatArray(4, name='test', units='units')
>>> x[:] = world.rank
>>> # This is a collective open of data in the file
>>> f = h5py.File(fName,'w', driver='mpio',comm=world)
>>> # Collective creation of space(padded by number of mpi ranks)
>>> x.createHdf(f, 'x', nRepeats=world.size)
>>> world.barrier()
>>> # In a non collective region, we can write to different sections of x in the file
>>> # Fake a non collective region
>>> def noncollectivewrite(x, file, world):
>>>     # Each rank carries out this code, but it's not collective.
>>>     x.writeHdf(file, 'x', index=world.rank)
>>> noncollectivewrite(x, f, world)
>>> world.barrier()
>>> f.close()
create_posterior_hdf(grp, add_axis, fillvalue, upcast)
delete(i, axis=None)

Delete elements

Parameters:
  • i (slice, int or array of ints) – Indicate which sub-arrays to remove.

  • axis (int, optional) – The axis along which to delete the subarray defined by obj. If axis is None, obj is applied to the flattened array.

Returns:

out – Deepcopy of StatArray with deleted entry(ies).

Return type:

StatArray

classmethod fromHdf(grp, name=None, index=None, skip_posterior=False, posterior_index=None)

Read the StatArray from a HDF group

Given the HDF group object, read the contents into a StatArray.

Parameters:
  • h5obj (h5py._hl.group.Group) – A HDF group object to write the contents to.

  • index (slice, optional) – If the group was created using the nRepeats option, index specifies the index’th entry from which to read the data.

property hasPosterior

Check that the StatArray has an attached posterior.

Returns:

out – Has an attached posterior.

Return type:

bool

property hasPrior

Check that the StatArray has an attached prior.

Returns:

out – Has an attached prior.

Return type:

bool

property hasProposal

Check that the StatArray has an attached proposal.

Returns:

out – Has an attached proposal.

Return type:

bool

insert(i, values, axis=0)

Insert values

Parameters:
  • i (int, slice or sequence of ints) – Object that defines the index or indices before which values is inserted.

  • values (array_like) – Values to insert into arr. If the type of values is different from that of arr, values is converted to the type of arr. values should be shaped so that arr[...,obj,...] = values is legal.

  • axis (int, optional) – Axis along which to insert values. If axis is None then arr is flattened first.

Returns:

out – StatArray after inserting a value.

Return type:

StatArray

mahalanobis()
property n_posteriors
overlay_on_posteriors(overlay, ax, **kwargs)
pad(N)

Copies the properties of a StatArray including all priors or proposals, but pads everything to the given size

Parameters:

N (int) – Size to pad to.

Returns:

out – Padded StatArray

Return type:

StatArray

perturb(i=slice(None, None, None), relative=False, imposePrior=False, log=False)

Perturb the values of the StatArray using the attached proposal

The StatArray must have had a proposal set using StatArray.setProposal()

Parameters:
  • i (slice or int or sequence of ints, optional) – Index or indices of self that should be perturbed.

  • relative (bool) – Update the StatArray relative to the current values or assign the new samples to the StatArray.

Raises:

TypeError – If the proposal has not been set

plot_posteriors(**kwargs)

Plot the posteriors of the StatArray.

Parameters:

ax (matplotlib axis or list of ax, optional) – Must match the number of attached posteriors. * If not specified, subplots are created vertically.

property posterior

Returns the posterior if available.

posteriors_from_hdf(grp, index)
property prior

Returns the prior if available.

priorDerivative(order, i=None)

Get the derivative of the prior.

Parameters:

order (int) – 1 or 2 for first or second order derivative

probability(log, x=None, i=None, active=None)

Evaluate the probability of the values in self using the attached prior distribution

Parameters:
  • arg1 (array_like, optional) – Will evaluate the probability of the numbers in the arg1 using the prior attached to self

  • i (slice or int or sequence of ints, optional) – Index or indices of self that should be evaluated.

Returns:

numpy.float

Return type:

out

Raises:

TypeError – If the prior has not been set

property proposal

Returns the prior if available.

proposal_derivative(order, i=None)

Get the derivative of the proposal.

Parameters:

order (int) – 1 or 2 for first or second order derivative

propose(i=slice(None, None, None), relative=False, imposePrior=False, log=False)

Propose new values using the attached proposal distribution

Parameters:
  • i (ints, optional) – Only propose values for these indices.

  • relative (bool, optional) – Use the proposal distribution as a relative change to the parameter values.

  • imposePrior (bool, optional) – Continue to propose new values until the prior probability is non-zero or -infinity.

  • log (bool, required if imposePrior is True.) – Use log probability when imposing the prior.

Returns:

out – Valuese generated from the proposal.

Return type:

array_like

reset_posteriors()
resize(new_shape)

Resize a StatArray

Resize a StatArray but copy over any attached attributes

Parameters:

new_shape (int or tuple of ints) – Shape of the resized array

Returns:

out – Resized array.

Return type:

StatArray

See also

numpy.resize

For more information.

property summary

Write a summary of the StatArray

Parameters:

out (bool) – Whether to return the summary or print to screen

Returns:

out – Summary of StatArray

Return type:

str, optional

summaryPlot(**kwargs)

Creates a summary plot of the StatArray with any attached priors, proposal, or posteriors.

update_posterior(**kwargs)

Adds the current values of the StatArray to the attached posterior.

writeHdf(h5obj, name, withPosterior=True, index=None)

Write the values of a StatArray to a HDF file

Writes the contents of the StatArray to an already created group in a HDF file under h5obj. This method can be used in an MPI parallel environment, if so however, the hdf file must have been opened with the mpio driver. Unlike createHdf, writeHdf does not have to be called collectively, each rank can call writeHdf independently, so long as they do not try to write to the same index.

Parameters:
  • h5obj (h5py._hl.files.File or h5py._hl.group.Group) – A HDF file or group object to write the contents to.

  • myName (str) – The name of the group to write to. The group must have been created previously.

  • withPosterior (bool, optional) – Include writing any attached posterior.

  • index (int, optional) – If the group was created using the nRepeats option, index specifies the index’th entry at which to write the data

Example

>>> from geobipy import StatArray
>>> from mpi4py import MPI
>>> import h5py
>>> world = MPI.COMM_WORLD
>>> x = StatArray(4, name='test', units='units')
>>> x[:] = world.rank
>>> # This is a collective open of data in the file
>>> f = h5py.File(fName,'w', driver='mpio',comm=world)
>>> # Collective creation of space(padded by number of mpi ranks)
>>> x.createHdf(f, 'x', nRepeats=world.size)
>>> world.barrier()
>>> # In a non collective region, we can write to different sections of x in the file
>>> # Fake a non collective region
>>> def noncollectivewrite(x, file, world):
>>>     # Each rank carries out this code, but it's not collective.
>>>     x.writeHdf(file, 'x', index=world.rank)
>>> noncollectivewrite(x, f, world)
>>> world.barrier()
>>> f.close()
write_posterior_hdf(grp, index=None)