StatArray
digraph inheritanced8e76283df { bgcolor=transparent; rankdir=LR; size="8.0, 12.0"; "ABC" [fillcolor=white,fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans",fontsize=10,height=0.25,shape=box,style="setlinewidth(0.5),filled",tooltip="Helper class that provides a standard way to create an ABC using"]; "DataArray" [URL="../core/DataArray.html#geobipy.src.classes.core.DataArray.DataArray",fillcolor=white,fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans",fontsize=10,height=0.25,shape=box,style="setlinewidth(0.5),filled",target="_top",tooltip="Class extension to numpy.ndarray"]; "ndarray" -> "DataArray" [arrowsize=0.5,style="setlinewidth(0.5)"]; "myObject" -> "DataArray" [arrowsize=0.5,style="setlinewidth(0.5)"]; "StatArray" [URL="#geobipy.src.classes.statistics.StatArray.StatArray",fillcolor=white,fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans",fontsize=10,height=0.25,shape=box,style="setlinewidth(0.5),filled",target="_top",tooltip="Class extension to numpy.ndarray"]; "DataArray" -> "StatArray" [arrowsize=0.5,style="setlinewidth(0.5)"]; "myObject" [URL="../core/myObject.html#geobipy.src.classes.core.myObject.myObject",fillcolor=white,fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans",fontsize=10,height=0.25,shape=box,style="setlinewidth(0.5),filled",target="_top"]; "ABC" -> "myObject" [arrowsize=0.5,style="setlinewidth(0.5)"]; "ndarray" [fillcolor=white,fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans",fontsize=10,height=0.25,shape=box,style="setlinewidth(0.5),filled",tooltip="ndarray(shape, dtype=float, buffer=None, offset=0,"]; }- class geobipy.src.classes.statistics.StatArray.StatArray(shape=None, name=None, units=None, verbose=False, **kwargs)
Class extension to numpy.ndarray
This subclass to a numpy array contains extra attributes that can describe the parameters it represents. One can also attach prior and proposal distributions so that it may be used in an MCMC algorithm easily. Because this is a subclass to numpy, the StatArray contains all numpy array methods and when passed to an in-place numpy function will return a StatArray. See the example section for more information.
StatArray(shape, name=None, units=None, **kwargs)
- Parameters:
shape (int or sequence of ints or array_like or StatArray) –
If shape is int or sequence of ints : give the shape of the new StatArray e.g.,
2
or(2, 3)
. All other arguments that can be passed to functions like numpy.zeros or numpy.arange can be used, see Other Parameters.If shape is array_like : any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence. e.g.,
StatArray(numpy.arange(10))
will cast the result into a StatArray and will maintain the properies passed through to that function. One could then attach the name, units, prior, and/or proposal through this interface too. e.g.,x = StatArray(numpy.arange(10,dtype=numpy.int), name='aTest', units='someUnits')
If shape is StatArray : the returned object is a deepcopy of the input. If name and units are specified with this option they will replace those parameters in the copy. e.g.,
y = StatArray(x, name='anotherTest')
will be a deepcopy copy of x, but with a different name.
name (str, optional) – The name of the StatArray.
units (str, optional) – The units of the StatArray.
dtype (data-type, optional) – The desired data-type for the array. Default is numpy.float64. Only used when shape is int or sequence of ints. The data type could also be a class.
buffer (object exposing buffer interface, optional) – Used to fill the array with data. Only used when shape is int or sequence of ints.
offset (int, optional) – Offset of array data in buffer. Only used when shape is int or sequence of ints.
strides (tuple of ints, optional) – Strides of data in memory. Only used when shape is int or sequence of ints.
order ({'C', 'F', 'A'}, optional) – Specify the order of the array. If order is ‘C’, then the array will be in C-contiguous order (rightmost-index varies the fastest). If order is ‘F’, then the returned array will be in Fortran-contiguous order (leftmost-index varies the fastest). If order is ‘A’ (default), then the returned array may be in any order (either C-, Fortran-contiguous, or even discontiguous), unless a copy is required, in which case it will be C-contiguous. Only used when shape is int or sequence of ints.
- Returns:
out – Extension to numpy.ndarray with additional attributes attached.
- Return type:
- Raises:
TypeError – If name is not a str.
TypeError – If units is not a str.
Notes
When the StatArray is passed through a numpy function, the name and units are maintained in the new object. Any priors or proposals are not kept for two reasons. a) keep computational overheads low, b) assume that a possible change in size or meaning of a parameter warrants a change in any attached distributions.
Examples
Since the StatArray is an extension to numpy, all numpy attached methods can be used.
>>> from geobipy import StatArray >>> import numpy as np >>> x = StatArray(arange(10), name='test', units='units') >>> print(x.mean()) 4.5
If the StatArray is passed to a numpy function that does not return a new instantiation, a StatArray will be returned (as opposed to a numpy array)
>>> delete(x, 5) StatArray([0, 1, 2, 3, 4, 6, 7, 8, 9])
However, if you pass a StatArray to a numpy function that is not in-place, i.e. creates new memory, the return type will be a numpy array and NOT a StatArray subclass
>>> append(x,[3,4,5]) array([0, 1, 2, ..., 3, 4, 5])
See also
geobipy.src.classes.statistics.Distribution
For possible prior and proposal distributions
- property address
- property addressof
- centred_grid_nodes(spacing)
Generates grid nodes centred over bounds
- Parameters:
bounds (array_like) – bounds of the dimension
spacing (float) – distance between nodes
- copy(order='C')
Return a copy of the array.
- Parameters:
order ({'C', 'F', 'A', 'K'}, optional) – Controls the memory layout of the copy. ‘C’ means C-order, ‘F’ means F-order, ‘A’ means ‘F’ if a is Fortran contiguous, ‘C’ otherwise. ‘K’ means match the layout of a as closely as possible. (Note that this function and
numpy.copy()
are very similar but have different default values for their order= arguments, and this function always passes sub-classes through.)
See also
numpy.copy
Similar function with different default behavior
numpy.copyto
Notes
This function is the preferred method for creating an array copy. The function
numpy.copy()
is similar, but it defaults to using order ‘K’, and will not pass sub-classes through by default.Examples
>>> x = np.array([[1,2,3],[4,5,6]], order='F')
>>> y = x.copy()
>>> x.fill(0)
>>> x array([[0, 0, 0], [0, 0, 0]])
>>> y array([[1, 2, 3], [4, 5, 6]])
>>> y.flags['C_CONTIGUOUS'] True
For arrays containing Python objects (e.g. dtype=object), the copy is a shallow one. The new array will contain the same object which may lead to surprises if that object can be modified (is mutable):
>>> a = np.array([1, 'm', [2, 3, 4]], dtype=object) >>> b = a.copy() >>> b[2][0] = 10 >>> a array([1, 'm', list([10, 3, 4])], dtype=object)
To ensure all elements within an
object
array are copied, use copy.deepcopy:>>> import copy >>> a = np.array([1, 'm', [2, 3, 4]], dtype=object) >>> c = copy.deepcopy(a) >>> c[2][0] = 10 >>> c array([1, 'm', list([10, 3, 4])], dtype=object) >>> a array([1, 'm', list([2, 3, 4])], dtype=object)
- copyStats(other)
Copy statistical properties from other to self
[extended_summary]
- Parameters:
other ([type]) – [description]
- createHdf(h5obj, name, shape=None, withPosterior=True, add_axis=None, fillvalue=None, upcast=True)
Create the Metadata for a StatArray in a HDF file
Creates a new group in a HDF file under h5obj. A nested heirarchy will be created e.g., myName/data, myName/prior, and myName/proposal. This method can be used in an MPI parallel environment, if so however, a) the hdf file must have been opened with the mpio driver, and b) createHdf must be called collectively, i.e., called by every core in the MPI communicator that was used to open the file. In order to create large amounts of empty space before writing to it in parallel, the nRepeats parameter will extend the memory in the first dimension.
- Parameters:
h5obj (h5py.File or h5py.Group) – A HDF file or group object to create the contents in.
myName (str) – The name of the group to create.
withPosterior (bool, optional) – Include the creation of space for any attached posterior.
nRepeats (int, optional) – Inserts a first dimension into the shape of the StatArray of size nRepeats. This can be used to extend the available memory of the StatArray so that multiple MPI ranks can write to their respective parts in the extended memory.
fillvalue (number, optional) – Initializes the memory in file with the fill value
Notes
This method can be used in serial and MPI. As an example in MPI. Given 10 MPI ranks, each with a 10 length array, it is faster to create a 10x10 empty array, and have each rank write its row. Rather than creating 10 separate length 10 arrays because the overhead when creating the file metadata can become very cumbersome if done too many times.
Example
>>> from geobipy import StatArray >>> from mpi4py import MPI >>> import h5py
>>> world = MPI.COMM_WORLD
>>> x = StatArray(4, name='test', units='units') >>> x[:] = world.rank
>>> # This is a collective open of data in the file >>> f = h5py.File(fName,'w', driver='mpio',comm=world) >>> # Collective creation of space(padded by number of mpi ranks) >>> x.createHdf(f, 'x', nRepeats=world.size)
>>> world.barrier()
>>> # In a non collective region, we can write to different sections of x in the file >>> # Fake a non collective region >>> def noncollectivewrite(x, file, world): >>> # Each rank carries out this code, but it's not collective. >>> x.writeHdf(file, 'x', index=world.rank) >>> noncollectivewrite(x, f, world)
>>> world.barrier() >>> f.close()
- create_posterior_hdf(grp, add_axis, fillvalue, upcast)
- delete(i, axis=None)
Delete elements
- Parameters:
i (slice, int or array of ints) – Indicate which sub-arrays to remove.
axis (int, optional) – The axis along which to delete the subarray defined by obj. If axis is None, obj is applied to the flattened array.
- Returns:
out – Deepcopy of StatArray with deleted entry(ies).
- Return type:
- classmethod fromHdf(grp, name=None, index=None, skip_posterior=False, posterior_index=None)
Read the StatArray from a HDF group
Given the HDF group object, read the contents into a StatArray.
- Parameters:
h5obj (h5py._hl.group.Group) – A HDF group object to write the contents to.
index (slice, optional) – If the group was created using the nRepeats option, index specifies the index’th entry from which to read the data.
- property hasPosterior
Check that the StatArray has an attached posterior.
- Returns:
out – Has an attached posterior.
- Return type:
bool
- property hasPrior
Check that the StatArray has an attached prior.
- Returns:
out – Has an attached prior.
- Return type:
bool
- property hasProposal
Check that the StatArray has an attached proposal.
- Returns:
out – Has an attached proposal.
- Return type:
bool
- insert(i, values, axis=0)
Insert values
- Parameters:
i (int, slice or sequence of ints) – Object that defines the index or indices before which values is inserted.
values (array_like) – Values to insert into arr. If the type of values is different from that of arr, values is converted to the type of arr. values should be shaped so that
arr[...,obj,...] = values
is legal.axis (int, optional) – Axis along which to insert values. If axis is None then arr is flattened first.
- Returns:
out – StatArray after inserting a value.
- Return type:
- mahalanobis()
- property n_posteriors
- overlay_on_posteriors(overlay, ax, **kwargs)
- pad(N)
Copies the properties of a StatArray including all priors or proposals, but pads everything to the given size
- Parameters:
N (int) – Size to pad to.
- Returns:
out – Padded StatArray
- Return type:
- perturb(i=slice(None, None, None), relative=False, imposePrior=False, log=False)
Perturb the values of the StatArray using the attached proposal
The StatArray must have had a proposal set using StatArray.setProposal()
- Parameters:
i (slice or int or sequence of ints, optional) – Index or indices of self that should be perturbed.
relative (bool) – Update the StatArray relative to the current values or assign the new samples to the StatArray.
- Raises:
TypeError – If the proposal has not been set
- plot_posteriors(**kwargs)
Plot the posteriors of the StatArray.
- Parameters:
ax (matplotlib axis or list of ax, optional) – Must match the number of attached posteriors. * If not specified, subplots are created vertically.
- property posterior
Returns the posterior if available.
- posteriors_from_hdf(grp, index)
- property prior
Returns the prior if available.
- priorDerivative(order, i=None)
Get the derivative of the prior.
- Parameters:
order (int) – 1 or 2 for first or second order derivative
- probability(log, x=None, i=None, active=None)
Evaluate the probability of the values in self using the attached prior distribution
- Parameters:
arg1 (array_like, optional) – Will evaluate the probability of the numbers in the arg1 using the prior attached to self
i (slice or int or sequence of ints, optional) – Index or indices of self that should be evaluated.
- Returns:
numpy.float
- Return type:
out
- Raises:
TypeError – If the prior has not been set
- property proposal
Returns the prior if available.
- proposal_derivative(order, i=None)
Get the derivative of the proposal.
- Parameters:
order (int) – 1 or 2 for first or second order derivative
- propose(i=slice(None, None, None), relative=False, imposePrior=False, log=False)
Propose new values using the attached proposal distribution
- Parameters:
i (ints, optional) – Only propose values for these indices.
relative (bool, optional) – Use the proposal distribution as a relative change to the parameter values.
imposePrior (bool, optional) – Continue to propose new values until the prior probability is non-zero or -infinity.
log (bool, required if imposePrior is True.) – Use log probability when imposing the prior.
- Returns:
out – Valuese generated from the proposal.
- Return type:
array_like
- reset_posteriors()
- resize(new_shape)
Resize a StatArray
Resize a StatArray but copy over any attached attributes
- Parameters:
new_shape (int or tuple of ints) – Shape of the resized array
- Returns:
out – Resized array.
- Return type:
See also
numpy.resize
For more information.
- property summary
Write a summary of the StatArray
- Parameters:
out (bool) – Whether to return the summary or print to screen
- Returns:
out – Summary of StatArray
- Return type:
str, optional
- summaryPlot(**kwargs)
Creates a summary plot of the StatArray with any attached priors, proposal, or posteriors.
- update_posterior(**kwargs)
Adds the current values of the StatArray to the attached posterior.
- writeHdf(h5obj, name, withPosterior=True, index=None)
Write the values of a StatArray to a HDF file
Writes the contents of the StatArray to an already created group in a HDF file under h5obj. This method can be used in an MPI parallel environment, if so however, the hdf file must have been opened with the mpio driver. Unlike createHdf, writeHdf does not have to be called collectively, each rank can call writeHdf independently, so long as they do not try to write to the same index.
- Parameters:
h5obj (h5py._hl.files.File or h5py._hl.group.Group) – A HDF file or group object to write the contents to.
myName (str) – The name of the group to write to. The group must have been created previously.
withPosterior (bool, optional) – Include writing any attached posterior.
index (int, optional) – If the group was created using the nRepeats option, index specifies the index’th entry at which to write the data
Example
>>> from geobipy import StatArray >>> from mpi4py import MPI >>> import h5py
>>> world = MPI.COMM_WORLD
>>> x = StatArray(4, name='test', units='units') >>> x[:] = world.rank
>>> # This is a collective open of data in the file >>> f = h5py.File(fName,'w', driver='mpio',comm=world) >>> # Collective creation of space(padded by number of mpi ranks) >>> x.createHdf(f, 'x', nRepeats=world.size)
>>> world.barrier()
>>> # In a non collective region, we can write to different sections of x in the file >>> # Fake a non collective region >>> def noncollectivewrite(x, file, world): >>> # Each rank carries out this code, but it's not collective. >>> x.writeHdf(file, 'x', index=world.rank) >>> noncollectivewrite(x, f, world)
>>> world.barrier() >>> f.close()
- write_posterior_hdf(grp, index=None)