rcatool.stats package
Submodules
rcatool.stats.ASoP module
ASoP - Analyzing Scales of Precipitation
Reference: Klingaman et al (2017) https://www.geosci-model-dev.net/10/57/2017/
Authors: Petter Lind Created: Spring 2019 Updates: May 2020
- rcatool.stats.ASoP.asop(data, keepdims=False, axis=0, bins=None, thr=None, return_bins=False)[source]
Calculate ASoP parameters.
- Parameters:
data (array) – 2D or 1D array of data. All data points are collectively used in the asop calculation unless ‘keepdims’ is True. Then calculation is performed along zeroth axis (expected time dimension).
keepdims (boolean) – If data is 2d (time in third dimension) and keepdims is set to True, the calculation is applied along the dimension defined by the axis argument (default 0) and returns a 2d array of asop components. If set to False (default), all values are pooled before the calculation.
axis (int) – The axis over which to apply the calculation if keepdims is set to True. Default is 0.
bins (list/array) – Defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths. If bins is None, the edges are calculated automatically as Klingaman bins (see the function bins_calc in this module).
thr (float) – Value of threshold if thresholding data. Default None.
return_bins (boolean) – If set to True (default False), bins that have been used in the calculation are returned.
- Returns:
Cfactor (array) – data array with relative contribution per bin to the total mean.
FCfactor (array) – data array with relative contribution per bin independent of the total mean.
bins_ret (array) – If return_bins is True, the array of bin edges is returned.
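The two returned factors can be illustrated with a minimal plain-Python sketch. It assumes the definitions of Klingaman et al. (2017): the actual contribution of a bin is the precipitation falling in that bin divided by the total number of time steps, and the fractional contribution is the actual contribution divided by the sum over all bins. The function name asop_sketch and the user-supplied bin edges are hypothetical; this is not rcatool's implementation.

```python
from bisect import bisect_right


def asop_sketch(values, bin_edges):
    """Return (actual, fractional) contributions per bin for a 1D series.

    Hypothetical helper for illustration only; bin_edges must be increasing
    and, as in the docs above, the rightmost edge is included.
    """
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    for v in values:
        idx = bisect_right(bin_edges, v) - 1
        if v == bin_edges[-1]:
            idx = n_bins - 1  # include the rightmost edge in the last bin
        if 0 <= idx < n_bins:
            sums[idx] += v
    # Actual contribution: binned precipitation per time step (dry steps count).
    actual = [s / len(values) for s in sums]
    total = sum(actual)
    # Fractional contribution: share of each bin, sums to one when total > 0.
    fractional = [a / total for a in actual] if total > 0 else [0.0] * n_bins
    return actual, fractional


c, fc = asop_sketch([0.0, 1.0, 3.0, 6.0], [0.5, 2.0, 4.0, 8.0])
```

With these definitions the actual contributions add up to the mean of all binned values, while the fractional contributions describe only the shape of the distribution.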
- rcatool.stats.ASoP.bins_calc(n, bintype='Klingaman')[source]
Calculates bins with edges according to Eq. 1 in Klingaman et al. (2017); https://www.geosci-model-dev.net/10/57/2017/
- Parameters:
n (array/list) – 1D array or list with bin numbers
bintype (str) – The type of bins to be calculated; ‘Klingaman’ (see reference) or ‘exponential’ for exponential bins.
- Returns:
bn – 1D array of bin edges
- Return type:
array
rcatool.stats.arithmetics module
Functions for various arithmetic calculations.
Created: Autumn 2016 Authors: Petter Lind & David Lindstedt
rcatool.stats.bootstrap module
Bootstrapping
Functions for bootstrap calculations.
Authors: Petter Lind Created: Autumn 2016 Updates: May 2020
- rcatool.stats.bootstrap.block_bootstr(data, block=5, nrep=500, nproc=1)[source]
Calculate block bootstrap samples.
This is a block bootstrap function, converted from R into Python, based on: http://stat.wharton.upenn.edu/~buja/STAT-541/time-series-bootstrap.R
- Parameters:
data (list/array) – 1D data array on which to perform the block bootstrap.
block (int) – the block length to be used. Default is 5.
nrep (int) – the number of resamples produced in the bootstrap. Default is 500.
nproc (int) – Number of processors, default 1. If larger than 1, multiple processors are used in parallel through the multiprocessing module.
- Returns:
arrBt – 2D array with bootstrap samples; rows are the samples, columns the values.
- Return type:
Array
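As an illustration of the resampling idea, here is a minimal moving-block bootstrap in plain Python. It is a sketch under assumptions: blocks are drawn with uniformly random start positions and concatenated until the original length is reached. The name block_bootstrap_sketch and the seed argument are hypothetical, and the details may differ from block_bootstr and from the R code it is based on.

```python
import random


def block_bootstrap_sketch(data, block=5, nrep=500, seed=None):
    """Return nrep resamples of data, each built from contiguous blocks.

    Hypothetical helper for illustration; rows are samples, columns values.
    """
    n = len(data)
    if not 1 <= block <= n:
        raise ValueError("block must be between 1 and len(data)")
    rng = random.Random(seed)
    n_blocks = -(-n // block)  # ceiling division
    last_start = n - block     # last valid block start position
    samples = []
    for _ in range(nrep):
        resample = []
        for _ in range(n_blocks):
            start = rng.randint(0, last_start)
            resample.extend(data[start:start + block])
        samples.append(resample[:n])  # trim to the original length
    return samples


samples = block_bootstrap_sketch(list(range(20)), block=5, nrep=3, seed=1)
```

Keeping values in contiguous blocks preserves short-range autocorrelation within each block, which is the reason to prefer it over resampling single values for time series.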
rcatool.stats.climateindex module
Climate indices
Functions for various climate index calculations.
Authors: Petter Lind & David Lindstedt Created: Autumn 2016 Updates: May 2020
- rcatool.stats.climateindex.RRpX(data, percentile, thr=None, axis=0, keepdims=False)[source]
RRpX mm, total amount of precipitation above the percentile threshold pX; RR ≥ pX mm: Let RRij be the daily precipitation amount on day i in period j. Sum the precipitation for all days where RRij ≥ pX mm.
- Parameters:
data (array) – 1D/2D data array, with time steps on the zeroth axis (axis=0).
percentile (int) – Percentile that defines the threshold.
thr (float/int) – Pre-thresholding of data to do calculation for wet days/hours only.
keepdims (boolean) – If False (default) calculation is performed on all data collectively, otherwise for each timeseries on each point in 2d space. ‘Axis’ then defines along which axis the timeseries are located.
- Returns:
RRpx – 1D/2D array with calculated RRpX indices.
- Return type:
list/array
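The definition above can be sketched for a single time series in plain Python. The sketch assumes that wet time steps are those with values greater than or equal to thr and that the percentile uses linear interpolation between order statistics; both are assumptions, and rrpx_sketch and percentile_linear are hypothetical names, not part of rcatool.

```python
def percentile_linear(values, q):
    """Percentile q (0-100) with linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def rrpx_sketch(precip, percentile, thr=None):
    """Sum of precipitation on time steps at or above the percentile value."""
    # Optional pre-thresholding keeps wet time steps only (assumed >= thr).
    wet = [v for v in precip if thr is None or v >= thr]
    if not wet:
        return 0.0
    px = percentile_linear(wet, percentile)
    return sum(v for v in wet if v >= px)


total = rrpx_sketch([0.0, 0.2, 1.0, 2.0, 3.0, 4.0, 10.0], 50, thr=1.0)
```

Pre-thresholding changes the result because the percentile is then taken over wet time steps only.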
- rcatool.stats.climateindex.RRtX(data, thr, axis=0, keepdims=False)[source]
RRtX mm, total amount of precipitation above the threshold ‘thr’.
- Parameters:
data (array) – 1D/2D data array, with time steps on the zeroth axis (axis=0).
thr (int) – Threshold above which data is summed.
keepdims (boolean) – If False (default) calculation is performed on all data collectively, otherwise for each timeseries on each point in 2d space. ‘Axis’ then defines along which axis the timeseries are located.
- Returns:
RRtx – 1D/2D array with calculated RRtX indices.
- Return type:
list/array
- rcatool.stats.climateindex.Rxx(data, thr=1.0, axis=0, normalize=False, keepdims=False)[source]
Rxx mm, count of time units (days, hours, etc.) when precipitation ≥ xx mm: Let RRij be the precipitation amount on time unit i in period j. Count the number of time units where RRij ≥ xx mm.
- Parameters:
data (array) – 1D/2D data array, with time steps on the zeroth axis (axis=0).
thr (float/int) – Threshold to be used; e.g. 10 for R10, 20 for R20, etc. Default 1.0.
axis (int) – Along which axis to calculate Rxx. Defaults to 0.
normalize (boolean) – If True (default False) the counts are normalized by total number of time units in each array/grid point. Returned values will then be fractions.
keepdims (boolean) – If False (default) calculation is performed on all data collectively, otherwise for each timeseries on each point in 2d space. ‘Axis’ then defines along which axis the timeseries are located.
- Returns:
Rxx – 1D/2D array with calculated Rxx indices.
- Return type:
list/array
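A minimal plain-Python sketch of the counting rule, including the pooled (keepdims=False) and per-grid-point (keepdims=True) behaviours described above. The input layout (a list of time steps, each a list of grid-point values) and the name rxx_sketch are assumptions made for illustration.

```python
def rxx_sketch(data, thr=1.0, normalize=False, keepdims=False):
    """Count time units with values >= thr.

    data is a list of time steps, each a list of grid-point values
    (hypothetical layout, time on the zeroth axis).
    """
    if keepdims:
        # One time series per grid point.
        series = [list(column) for column in zip(*data)]
    else:
        # Pool all values into a single series.
        series = [[v for row in data for v in row]]
    result = []
    for ts in series:
        count = sum(1 for v in ts if v >= thr)
        result.append(count / len(ts) if normalize else count)
    return result if keepdims else result[0]


field = [[0.0, 12.0], [11.0, 15.0], [0.5, 3.0]]
r10_pooled = rxx_sketch(field, thr=10)
r10_per_point = rxx_sketch(field, thr=10, keepdims=True)
```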
- rcatool.stats.climateindex.SDII(data, thr=1.0, axis=0, keepdims=False)[source]
SDII, Simple precipitation intensity index: Let RRwj be the daily precipitation amount on wet days, w (RR ≥ 1mm) in period j. SDII is the mean of RRwj over all wet days in the period.
- Parameters:
data (list/array) – 2D array.
thr (float/int) – threshold for wet events (wet days/hours etc)
axis (int) – The axis along which the calculation is applied (default 0).
keepdims (boolean) – If data is 2d (time in third dimension) and keepdims is set to True, the calculation is applied to the zeroth axis (time) and returns a 2d array of SDII values. If set to False (default), all values are pooled before the calculation.
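For a single time series, the index reduces to the mean over wet time steps. The sketch below assumes that wet means a value greater than or equal to thr and returns NaN when there are no wet time steps; sdii_sketch is a hypothetical name.

```python
def sdii_sketch(precip, thr=1.0):
    """Mean precipitation over wet time steps (values >= thr)."""
    wet = [v for v in precip if v >= thr]
    return sum(wet) / len(wet) if wet else float("nan")


intensity = sdii_sketch([0.0, 0.5, 1.0, 3.0, 8.0])
```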
- rcatool.stats.climateindex.cdd(data, thr=1.0, periods=array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60]), maxper=False, axis=0, keepdims=False)[source]
Calculate the Consecutive Dry Days index (CDD).
- Parameters:
data (array) – 1D/2D daily precipitation data array in mm.
thr (float) – Value of threshold to define dry day. Default 1 mm.
periods (list/array) – Array of lengths of dry periods to consider; e.g. [1, 3, 10, 14, 21, 30] computes the frequency of dry periods with lengths 1-3 days, 3-10 days, etc. The left edge of each interval is included, the right edge is not. The default covers 1 to 60 days in 1-day increments.
maxper (boolean) – If set to True, the longest CDD period is also returned, placed at the last position of the returned array. Default False.
axis (int) – Along which axis to calculate cdd. Defaults to 0.
keepdims (boolean) – If False (default) calculation is performed on all data collectively, otherwise for each timeseries on each point in 2d space. ‘Axis’ then defines along which axis the timeseries are located.
- Returns:
dlen (list) – list with lengths of each dry day event in timeseries
cdd (list/array) – 1D/2D array with frequencies of cdd intervals. For intervals where none exist, positions are set to NaN. Length of returned array (along computed ‘axis’) is equal to length of ‘periods’ list/array minus 1.
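The two return values can be sketched for one time series as follows. The sketch assumes that a dry day has precipitation strictly below thr and that the frequencies are raw counts per interval; both are assumptions, and the function names are hypothetical rather than rcatool's own.

```python
import math


def dry_spell_lengths(precip, thr=1.0):
    """Lengths of consecutive runs of dry days (values < thr)."""
    lengths, run = [], 0
    for v in precip:
        if v < thr:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)  # series ends inside a dry spell
    return lengths


def cdd_sketch(precip, thr=1.0, periods=(1, 3, 10)):
    """Count dry spells per half-open interval [periods[i], periods[i+1])."""
    lengths = dry_spell_lengths(precip, thr)
    counts = [0] * (len(periods) - 1)
    for n in lengths:
        for i in range(len(periods) - 1):
            if periods[i] <= n < periods[i + 1]:
                counts[i] += 1
                break
    # Empty intervals are reported as NaN, as described above.
    return lengths, [float(c) if c else math.nan for c in counts]


dlen, cdd = cdd_sketch([0, 0, 5, 0, 2, 0, 0, 0, 0], thr=1.0)
```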
- rcatool.stats.climateindex.ehi(data, thr_95, axis=0, keepdims=False)[source]
Calculate Excessive Heat Index (EHI).
- Parameters:
data (list/array) – 1D/2D array of daily temperature timeseries
thr_95 (float) – 95th percentile daily mean value from climatology
axis (int) – The axis along which the calculation is applied (default 0).
keepdims (boolean) – If data is 2d (time in third dimension) and keepdims is set to True, the calculation is applied to the zeroth axis (time) and returns a 2d array of calculated statistics. If set to False (default), all values are pooled before the calculation.
- Returns:
EHI – Excessive heat index
- Return type:
float
- rcatool.stats.climateindex.extr_hotdays_calc(data, thr_p95)[source]
Calculate number of extreme hotdays.
Return days with mean temperature above the 95th percentile of climatology.
- Parameters:
data (array) – 1D-array of temperature input timeseries
thr_p95 (float) – 95th percentile daily mean value from climatology
- rcatool.stats.climateindex.hotdays_calc(data, thr_p75)[source]
Calculate number of hotdays.
Return days with mean temperature above the 75th percentile of climatology.
- Parameters:
data (array) – 1D-array of temperature input timeseries
thr_p75 (float) – 75th percentile daily mean value from climatology
rcatool.stats.convolve module
- rcatool.stats.convolve.convolve_fft(array, kernel, boundary='fill', fill_value=0, crop=True, return_fft=False, fft_pad=True, psf_pad=False, interpolate_nan=False, quiet=False, ignore_edge_zeros=False, min_wt=0.0, normalize_kernel=False, allow_huge=True, fftn=<function fftn>, ifftn=<function ifftn>)[source]
Convolve an ndarray with an nd-kernel. Returns a convolved image with shape = array.shape. Assumes kernel is centered.
convolve_fft differs from scipy.signal.fftconvolve in a few ways:
It can treat NaN values as zeros or interpolate over them.
inf values are treated as NaN.
(optionally) It pads to the nearest 2^n size to improve FFT speed.
Its only valid mode is ‘same’ (i.e. the same shape array is returned).
It lets you use your own fft, e.g., pyFFTW <http://pypi.python.org/pypi/pyFFTW> or pyFFTW3 <http://pypi.python.org/pypi/PyFFTW3/0.2.1>, which can lead to performance improvements, depending on your system configuration. pyFFTW3 is threaded, and therefore may yield significant performance benefits on multi-core machines at the cost of greater memory requirements. Specify the fftn and ifftn keywords to override the default, which is numpy.fft.fft and numpy.fft.ifft.
- Parameters:
array (numpy.ndarray) – Array to be convolved with kernel.
kernel (numpy.ndarray) – Will be normalized if normalize_kernel is set. Assumed to be centered (i.e., shifts may result if your kernel is asymmetric).
boundary ({'fill', 'wrap'}, optional) – A flag indicating how to handle boundaries:
‘fill’: set values outside the array boundary to fill_value (default)
‘wrap’: periodic boundary
interpolate_nan (bool, optional) – The convolution will be re-weighted assuming NaN values are meant to be ignored, not treated as zero. If this is off, all NaN values will be treated as zero.
ignore_edge_zeros (bool, optional) – Ignore the zero-pad-created zeros. This will effectively decrease the kernel area on the edges but will not re-normalize the kernel. This parameter may result in ‘edge-brightening’ effects if you’re using a normalized kernel.
min_wt (float, optional) – If ignoring NaN / zeros, force all grid points with a weight less than this value to NaN (the weight of a grid point with no ignored neighbors is 1.0). If min_wt is zero, then all zero-weight points will be set to zero instead of NaN (which they would be otherwise, because 1/0 = nan).
normalize_kernel (function or boolean, optional) – If specified, this is the function to divide kernel by to normalize it. e.g., normalize_kernel=np.sum means that kernel will be modified to be: kernel = kernel / np.sum(kernel). If True, defaults to normalize_kernel = np.sum.
fft_pad (bool, optional) – Default on. Zero-pad image to the nearest 2^n.
psf_pad (bool, optional) – Default off. Zero-pad image to be at least the sum of the image sizes (in order to avoid edge-wrapping when smoothing).
crop (bool, optional) – Default on. Return an image of the size of the largest input image. If the images are asymmetric in opposite directions, will return the largest image in both directions. For example, if an input image has shape [100,3] but a kernel with shape [6,6] is used, the output will be [100,6].
return_fft (bool, optional) – Return the fft(image)*fft(kernel) instead of the convolution (which is ifft(fft(image)*fft(kernel))). Useful for making PSDs.
fftn (functions, optional) – The fft and inverse fft functions. Can be overridden to use your own ffts, e.g. an fftw3 wrapper or scipy’s fftn, e.g. fftn=scipy.fftpack.fftn
ifftn (functions, optional) – The fft and inverse fft functions. Can be overridden to use your own ffts, e.g. an fftw3 wrapper or scipy’s fftn, e.g. fftn=scipy.fftpack.fftn
complex_dtype (np.complex, optional) – Which complex dtype to use. numpy has a range of options, from 64 to 256.
quiet (bool, optional) – Silence warning message about NaN interpolation.
allow_huge (bool, optional) – Allow huge arrays in the FFT? If False, will raise an exception if the array or kernel size is >1 GB.
- Raises:
ValueError – If the array is bigger than 1 GB after padding, will raise this exception unless allow_huge is True.
See also
convolve – Convolve is a non-fft version of this code. It is more memory efficient and for small kernels can be faster.
- Returns:
default – array convolved with kernel. If return_fft is set, returns fft(array) * fft(kernel). If crop is not set, returns the image, but with the fft-padded size instead of the input size.
- Return type:
ndarray
- rcatool.stats.convolve.fft_prep(array, kernel, fill_value, boundary='fill', psf_pad=False, fft_pad=True)[source]
Prepare data array and kernel for fft computation.
- Parameters:
boundary ({'fill', 'wrap'}, optional) –
A flag indicating how to handle boundaries:
’fill’: set values outside the array boundary to fill_value (default)
’wrap’: periodic boundary
fft_pad (bool, optional) – Default on. Zero-pad image to the nearest 2^n
psf_pad (bool, optional) – Default off. Zero-pad image to be at least the sum of the image sizes (in order to avoid edge-wrapping when smoothing)
- rcatool.stats.convolve.filtering(data, wgts, mode='valid', dim=1, axis=None, fft=False, fftn=<function fftn>, ifftn=<function ifftn>)[source]
1D and 2D filtering procedures.
Filters 1D or 2D input data with user-defined weights. Set fft to True to use the fast Fourier transform, which speeds up the filtering of large data.
- Parameters:
data (array) – Data to be filtered.
wgts (array/list) – The weights (kernel) to be used in the filtering.
mode (str) – String indicating the size of output (see https://docs.scipy.org/doc/scipy/reference/signal.html)
dim (int) – If 1 one-dimensional filtering is performed and if ‘axis’ is also set, 1D-filtering is applied along this axis. If dim=2 two-dimensional filtering is applied.
fft (boolean) – Set True to use fast Fourier transform in the 2D filtering.
- Returns:
data_conv – Convolved data
- Return type:
array
- rcatool.stats.convolve.kernel_gen(n, ktype='square', kfun='mean')[source]
Function to create a kernel, i.e. a moving window (box or disk) with side/radius equal to ‘n’.
- Parameters:
n (int) – Side/radius of square/disk of smoothing window.
ktype (string) – The type of box; ‘square’ (default) or ‘disk’.
kfun (string) – The function ‘kfun’ applied to each sub-sample within the moving window. Either ‘mean’ (default) or ‘sum’.
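A rough plain-Python sketch of the two window shapes follows. It assumes that a square kernel has n points per side, that a disk kernel of radius n lives on a (2n+1) by (2n+1) grid, and that ‘mean’ normalizes the weights to sum to one while ‘sum’ leaves them at one. These conventions and the name kernel_sketch are assumptions, not taken from kernel_gen.

```python
def kernel_sketch(n, ktype="square", kfun="mean"):
    """Build a square (side n) or disk (radius n) window as nested lists."""
    if ktype == "square":
        mask = [[1.0] * n for _ in range(n)]
    elif ktype == "disk":
        # Keep grid points whose distance from the center is at most n.
        mask = [
            [1.0 if dx * dx + dy * dy <= n * n else 0.0 for dx in range(-n, n + 1)]
            for dy in range(-n, n + 1)
        ]
    else:
        raise ValueError("ktype must be 'square' or 'disk'")
    if kfun == "mean":
        total = sum(sum(row) for row in mask)
        mask = [[v / total for v in row] for row in mask]
    elif kfun != "sum":
        raise ValueError("kfun must be 'mean' or 'sum'")
    return mask


box = kernel_sketch(3)                   # 3x3 moving-average window
disk = kernel_sketch(1, "disk", "sum")   # plus-shaped window of ones
```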
- rcatool.stats.convolve.lanczos_filter(window, cutoff, cutoff_2=None, ftype='lowpass')[source]
Calculate weights for a Lanczos filter (low-pass by default; see ftype).
- Parameters:
window (int) – The length of the filter window.
cutoff (float) – The cutoff frequency in inverse time steps.
cutoff_2 (float) – The second cutoff frequency in inverse time steps. Only used if ftype is ‘bandpass’
ftype (str) – The type of cutoff filtering: ‘lowpass’, ‘highpass’ or ‘bandpass’.
- Returns:
wgts – Array with calculated weights.
- Return type:
vector
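For the low-pass case, the weights can be sketched with the standard Lanczos formula: the ideal low-pass response multiplied by a sigma (sinc) smoothing factor. The sketch assumes an odd window length and applies no normalization; how lanczos_filter treats the window length, normalization and the ‘highpass’ and ‘bandpass’ types is not shown here. The function name is hypothetical.

```python
import math


def lanczos_lowpass_weights_sketch(window, cutoff):
    """Low-pass Lanczos weights for an odd window length.

    cutoff is the cutoff frequency in inverse time steps.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    half = (window - 1) // 2
    n = half + 1
    weights = []
    for k in range(-half, half + 1):
        if k == 0:
            weights.append(2.0 * cutoff)  # central weight
            continue
        # Ideal low-pass filter response at lag k.
        ideal = math.sin(2.0 * math.pi * cutoff * k) / (math.pi * k)
        # Lanczos sigma factor that damps the Gibbs oscillations.
        sigma = math.sin(math.pi * k / n) / (math.pi * k / n)
        weights.append(ideal * sigma)
    return weights


wgts = lanczos_lowpass_weights_sketch(5, 0.1)
```

The resulting weights are symmetric around the central value, so they can be applied as a convolution kernel along the time axis.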
rcatool.stats.event_duration module
Event Duration Analysis (EDA) of Precipitation
Author: Petter Lind Created: Fall 2020 Updates:
- rcatool.stats.event_duration.eda(data, keepdims=False, axis=0, thr=0.1, duration_bins=None, event_statistic='amount', statistic_bins=None, dry_events=False, dry_bins=None)[source]
Calculate event duration statistics
- Parameters:
data (array) – 2D or 1D array of data. All data points are collectively used in the event duration calculation unless ‘keepdims’ is True. Then calculation is performed along zeroth axis (expected time dimension).
keepdims (boolean) – If data is 2d (time in third dimension) and keepdims is set to True, the calculation is applied along the dimension defined by the axis argument (default 0) and returns an array of event duration statistics for each point. If set to False (default), all values are pooled before the calculation.
axis (int) – The axis over which to apply the calculation if keepdims is set to True. Default is 0.
event_statistic (str) – The statistic to calculate for each event; choices are ‘amount’, ‘mean int’ or ‘max int’.
duration_bins (list/array) – Defines the bin edges for event durations, including the rightmost edge, allowing for non-uniform bin widths.
statistic_bins (list/array) – Defines the bin edges for event statistic (amount/mean/max), including the rightmost edge, allowing for non-uniform bin widths.
thr (float) – Value of threshold to identify start/end of events. Default 0.1.
dry_events (bool) – If set to True, duration of dry intervals will be calculated. ‘dry_bins’ must then be provided.
- Returns:
eda_arr – data array with frequency of event statistic (amount, mean, max) per duration bin.
- Return type:
array
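The event identification step can be sketched in plain Python. The sketch assumes that an event is a run of consecutive time steps with values greater than or equal to thr and computes, per event, the duration and the three statistics named above. It does not reproduce the binning into duration_bins and statistic_bins, and find_events_sketch is a hypothetical name.

```python
def find_events_sketch(series, thr=0.1):
    """Split a 1D series into events: runs of consecutive values >= thr."""
    events, current = [], []
    for v in series:
        if v >= thr:
            current.append(v)
        elif current:
            events.append(current)
            current = []
    if current:
        events.append(current)  # series ends inside an event
    return [
        {
            "duration": len(e),
            "amount": sum(e),
            "mean int": sum(e) / len(e),
            "max int": max(e),
        }
        for e in events
    ]


events = find_events_sketch([0.0, 0.5, 1.5, 0.0, 0.0, 2.0, 0.05])
```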
rcatool.stats.pdf module
Probability distributions
Authors: Petter Lind Created: Spring 2015 Updates: May 2020
- rcatool.stats.pdf.freq_int_dist(data, keepdims=False, axis=0, bins=10, thr=None, density=True, ci=False, bootstrap=False, nmc=500, block=1, ci_level=95, nproc=1)[source]
Calculate frequency-intensity distributions.
- Parameters:
data (array) – 2D or 1D array of data. All data points are collectively used in the frequency-intensity calculation unless ‘keepdims’ is True. Then calculation is performed along the dimension defined by axis argument (default 0).
keepdims (boolean) – If data is 2d (time in third dimension) and keepdims is set to True, the calculation is applied along the dimension defined by the axis argument (default 0) and returns a 2d array of freq-int dists. If set to False (default), all values are pooled before the calculation.
axis (int) – The axis over which to apply the calculation if keepdims is set to True. Default is 0.
bins (int/list/array) – If an int, it defines the number of equal-width bins in the given range (10, by default). If a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths. If bins is set to ‘None’ they will be automatically calculated.
thr (float) – Value of threshold if thresholding data. Default None.
density (boolean) – If True (default) then the value of the probability density function at each bin is returned, otherwise the number of samples per bin.
bootstrap (boolean) – Whether to use block bootstrap to calculate the confidence interval.
nmc (int/float) – Number of bootstrap samples to use.
block (int/float) – Size of block to use in block bootstrap
ci_level (int/float) – The confidence interval level to use (e.g. 95 or 99, representing the 95% and 99% levels).
nproc (int) – Number of processes to use in bootstrap calculation. Default 1.
- Returns:
pdf (array) – data array with size len(bins)-1 with counts/probabilities
ci (dict) – data dictionary with confidence level for each bin; keys ‘min_levels’/‘max_levels’ with corresponding values. If bootstrap is False, then None values are returned.
- rcatool.stats.pdf.perkins_skill_score(p_mod, p_obs, axis=0)[source]
Calculate the Perkins Skill Score (PSS).
- Parameters:
p_mod (list/array) – 1d or 2d array with the frequency of values (probability) in each bin from the model. The probabilities should sum to one over all bins; whether they do depends on how the pdf was calculated (bins of unit width give a total probability of one).
p_obs (list/array) – 1d or 2d array with the frequency of values (probability) in each bin from the observations. The same requirement on the sum of probabilities applies.
axis (int) – If data is 2d, the PSS will be calculated along this axis. Default axis is zero.
- Returns:
pss – Returns Perkins Skill Score, single float or array with floats.
- Return type:
float/array
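The score itself is the sum over bins of the smaller of the two probabilities, so identical distributions score one and non-overlapping distributions score zero. A minimal 1D sketch in plain Python (the name pss_sketch is hypothetical, and the 2d/axis handling of perkins_skill_score is not reproduced):

```python
def pss_sketch(p_mod, p_obs):
    """Perkins Skill Score for two 1D binned probability distributions."""
    if len(p_mod) != len(p_obs):
        raise ValueError("p_mod and p_obs must use the same bins")
    # Overlap of the two distributions, bin by bin.
    return sum(min(m, o) for m, o in zip(p_mod, p_obs))


score = pss_sketch([0.2, 0.5, 0.3], [0.1, 0.6, 0.3])
```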
- rcatool.stats.pdf.prob_of_exceed(data, keepdims=False, axis=0, thr=None, pctls_levels=None)[source]
Calculates probability of exceedance distributions.
- Parameters:
data (array) – 2D or 1D array of data. All data points are collectively used in the calculation unless ‘keepdims’ is True. Then calculation is performed along zeroth axis.
pctls_levels ('default', None or array/list) – If set to ‘default’, probability levels of exceedance are defined by a set of percentiles ranging from 0-100 and calculated from input data. If an array or list, these levels (between 0 and 100) will be used instead. Default is None in which case input data is merely ranked from 0 to 1.
keepdims (boolean) – If data is 2d (time in third dimension) and keepdims is set to True, the calculation is applied to the zeroth axis (time) and returns a 2d array of exceedance distributions. If set to False (default), all values are pooled before the calculation.
axis (int) – The axis over which to apply the calculation if keepdims is set to True. Default is 0.
thr (float) – Value of threshold if thresholding data. Default None.
- Returns:
eop – probability of exceedance array.
- Return type:
array
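For the default case (pctls_levels=None), where the data are simply ranked, an empirical exceedance curve can be sketched as below. The plotting position 1 - i/n for the i-th smallest value is one common convention and is an assumption here; prob_of_exceed may rank the data differently. The name exceedance_sketch is hypothetical.

```python
def exceedance_sketch(values, thr=None):
    """Return (sorted values, empirical probability of exceeding each value)."""
    kept = sorted(v for v in values if thr is None or v >= thr)
    n = len(kept)
    # Fraction of the sample strictly above the i-th smallest value (no ties).
    probs = [1.0 - (i + 1) / n for i in range(n)]
    return kept, probs


levels, probs = exceedance_sketch([3.0, 1.0, 4.0, 2.0])
```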
rcatool.stats.precipitation_index module
- rcatool.stats.precipitation_index.precip_amount_survival_fraction(data, pctls, keepdims=False)[source]
Calculate the frequency distribution, normalize by total precipitation, and sum the fractions. It answers the question ‘what fraction of total precipitation occurs beyond the top p percentile of days in a period?’, where p is any percentile of interest.
rcatool.stats.sal module
SAL module
Functions for calculation of SAL statistics.
Based on Wernli et al 2008 http://journals.ametsoc.org/doi/abs/10.1175/2008MWR2415.1
Created: Spring 2018 Authors: Petter Lind & David Lindstedt
- rcatool.stats.sal.A_stat(data, refdata)[source]
Calculate the amplitude component (A).
- Parameters:
data (arrays) – 2D data arrays to be compared, where refdata is the reference data e.g. observations.
refdata (arrays) – 2D data arrays to be compared, where refdata is the reference data e.g. observations.
- Returns:
A – The calculated amplitude component
- Return type:
float
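Following Wernli et al. (2008), the amplitude component is the difference between the domain-mean values of the two fields, normalized by their average, which bounds A between -2 and 2. A minimal sketch for fields given as nested lists (the name a_stat_sketch is hypothetical, and any masking or missing-value handling in A_stat is not reproduced):

```python
def a_stat_sketch(data, refdata):
    """Amplitude component A from the domain means of two 2D fields."""
    def domain_mean(field):
        flat = [v for row in field for v in row]
        return sum(flat) / len(flat)

    d_mod = domain_mean(data)
    d_ref = domain_mean(refdata)
    # Normalized difference of domain means; undefined if both means are zero.
    return (d_mod - d_ref) / (0.5 * (d_mod + d_ref))


a = a_stat_sketch([[2.0, 4.0], [2.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])
```

Positive values indicate that the data field overestimates the domain-mean amount relative to the reference.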
- rcatool.stats.sal.L_stat(data, data_label, refdata, refdata_label)[source]
Function to determine the location component (L). It consists of two components, L1 and L2. L1 measures the normalized distance between the centers of mass of the modelled and observed fields. L2 considers the averaged distance between the center of mass of the total field and the individual field objects.
- Parameters:
data (arrays) – 2D data arrays to be compared, where refdata is the reference data e.g. observations.
refdata (arrays) – 2D data arrays to be compared, where refdata is the reference data e.g. observations.
data_label (arrays) – Arrays with labeled objects. Returned from the label() function.
refdata_label (arrays) – Arrays with labeled objects. Returned from the label() function.
- Returns:
L1, L2, L – Dictionary with the calculated location components L1 and L2 as well as its composite L (L1 + L2).
- Return type:
dictionary
- rcatool.stats.sal.S_stat(data, data_label, refdata, refdata_label, obj_prop=True, lsmask=None)[source]
Function to calculate the structure component (S). The basic idea is to compare the volume of the normalized precipitation objects. This property captures information of the geometrical characteristics (size and shape) and how they differ between model (M) and reference (O).
- Parameters:
data (arrays) – 2D data arrays to be compared, where refdata is the reference data e.g. observations.
refdata (arrays) – 2D data arrays to be compared, where refdata is the reference data e.g. observations.
data_label (arrays) – Arrays with labeled objects. Returned from the label() function.
refdata_label (arrays) – Arrays with labeled objects. Returned from the label() function.
obj_prop (boolean) – If True, individual object (rainfall area) properties are calculated and returned.
lsmask (array) – Land/sea mask (2d boolean array) to characterize identified objects as land, ocean or coastal objects.
- Returns:
S (float) – The calculated structure component.
area_props/ref_area_props (dictionary) – Dictionary containing properties of identified objects in data and refdata respectively.
- rcatool.stats.sal.remove_large_objects(segments, max_size)[source]
Remove large objects based on the maximum size limit defined by ‘max_size’.
- Parameters:
segments (array) – Array with labeled objects. Returned from the label() function.
max_size (int) – Maximum size (number of grid points)
- Returns:
out – The segments array with too large objects removed.
- Return type:
array
- rcatool.stats.sal.run_sal_analysis(data, refdata, thr_type, thr_val, obj_prop=True, obj_lower_size_limit=None, obj_upper_size_limit=None, smoothening_data_level=None, land_sea_mask=None, write_to_file=False, filename=None, nproc=1)[source]
Run the SAL analysis on the two data sets ‘data’ and ‘refdata’, where the latter is supposed to represent the ‘truth’.
- Parameters:
data/refdata (arrays) – 2D data arrays with zeroth dimension representing time steps. Both data sets must have the same dimension sizes, i.e. both in time and space.
thr_type (string) – Type of threshold to use. See ‘threshold’ function for more information.
thr_val (float/int) – Value of threshold.
obj_prop (boolean) – If True (default), a number of object area properties are returned for each of the identified objects. See ‘S_stat’ function for more information.
obj_lower_size_limit (int) – If set, all objects with an area (number of connected grid points) lower than this value are removed from the analysis.
obj_upper_size_limit (int) – If set, all objects with an area (number of connected grid points) greater than this value are removed from the analysis.
smoothening_data_level (int) – If set, the number of grid points along the side of a moving window used to smooth the data arrays. The mean value within the window is calculated.
land_sea_mask (array/None) – If set, land_sea_mask must be a 2d boolean array with same dimension as input data. The land/sea-mask is then used to identify objects as either land (1), ocean (0) or coastal (2) in the object properties dictionary. Thus, mask only used if obj_prop=True. N.B. Mask must have True for ocean points and False for land points.
write_to_file (boolean) – Whether to write results to disk.
filename (str) – Name of file for writing to disk.
nproc (int) – Number of processors to use in calculation (default 1). If larger than 1, the multiprocessing module is used to distribute the calculation in the time dimension.
- Returns:
out_dict (dictionary) – Dictionary with calculated SAL statistics and area properties (if obj_prop is set to True).
nc (file) – If ‘write_to_file’ is True, results are written to disk in a netcdf file.
- rcatool.stats.sal.sal_calc(tstep, data, refdata, thr_t, thr_v, obj_prop=True, olsl=None, ousl=None, smlvl=None, land_sea_mask=None)[source]
Perform the SAL calculation using the S, A, L functions.
- rcatool.stats.sal.threshold(data, thr_type, value)[source]
Function to calculate the threshold to be used to identify objects.
- Parameters:
data (array) – 2D data array.
thr_type (string) – Type of threshold. Can be either “S” for specified (any absolute value), “F” for a fraction (between 0 and 1) of the maximum value, and “P” for a percentile (95 for the 95th percentile etc).
value (int/float) – The corresponding value based on the chosen threshold type.
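The three threshold types can be sketched in plain Python as follows. The percentile branch uses linear interpolation between sorted values, which is an assumption, as is the name threshold_sketch; the sketch is not rcatool's implementation.

```python
def threshold_sketch(field, thr_type, value):
    """Return the object threshold for a 2D field given as nested lists."""
    flat = sorted(v for row in field for v in row)
    if thr_type == "S":
        # Specified: the value is used as an absolute threshold.
        return float(value)
    if thr_type == "F":
        # Fraction (0 to 1) of the field maximum.
        return value * flat[-1]
    if thr_type == "P":
        # Percentile (0 to 100) of all values, linearly interpolated.
        pos = (len(flat) - 1) * value / 100.0
        lo = int(pos)
        hi = min(lo + 1, len(flat) - 1)
        return flat[lo] + (flat[hi] - flat[lo]) * (pos - lo)
    raise ValueError("thr_type must be 'S', 'F' or 'P'")


field = [[0.0, 1.0], [2.0, 4.0]]
thr_fraction = threshold_sketch(field, "F", 0.25)
```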