rcatool.stats package

Submodules

rcatool.stats.ASoP module

ASoP - Analyzing Scales of Precipitation

Reference: Klingaman et al. (2017) https://www.geosci-model-dev.net/10/57/2017/

Authors: Petter Lind. Created: Spring 2019. Updates: May 2020.

rcatool.stats.ASoP.asop(data, keepdims=False, axis=0, bins=None, thr=None, return_bins=False)

Calculate ASoP parameters.

Parameters:
  • data (array) – 2D or 1D array of data. All data points are used collectively in the ASoP calculation unless ‘keepdims’ is True, in which case the calculation is performed along the axis given by ‘axis’ (by default the zeroth axis, expected to be the time dimension).

  • keepdims (boolean) – If the data are 2D in space (with time as a third dimension) and keepdims is True, the calculation is applied along the dimension defined by the axis argument (default 0) and a 2D array of ASoP components is returned. If False (default), all values are assembled collectively before the calculation.

  • axis (int) – The axis over which to apply the calculation if keepdims is set to True. Default is 0.

  • bins (list/array) – Defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths. If bins is None, Klingaman bins are calculated automatically with the function bins_calc in this module.

  • thr (float) – Value of threshold if thresholding data. Default None.

  • return_bins (boolean) – If set to True (default False), bins that have been used in the calculation are returned.

Returns:

  • Cfactor (array) – data array with relative contribution per bin to the total mean.

  • FCfactor (array) – data array with relative contribution per bin independent of the total mean.

  • bins_ret (array) – If return_bins is True, the array of bin edges is returned.
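The two contribution factors can be illustrated with a minimal NumPy sketch. It is not the rcatool implementation: the function name `asop_contributions` is hypothetical, and it assumes that Cfactor is the sum of the values in each bin divided by the number of values (so the factors add up to the mean) and that FCfactor is the same sum divided by the total amount (so the factors add up to one).

```python
import numpy as np


def asop_contributions(values, bins):
    """Hypothetical sketch of the ASoP contribution factors (not rcatool's code)."""
    values = np.asarray(values, dtype=float).ravel()
    # Sum of the values falling in each bin (the rightmost edge is included)
    amount_per_bin, _ = np.histogram(values, bins=bins, weights=values)
    # Contribution of each bin to the total mean; sums to the data mean
    # when all values lie inside the bin range
    cfactor = amount_per_bin / values.size
    # Fractional contribution, independent of the total mean; sums to one
    fcfactor = amount_per_bin / values.sum()
    return cfactor, fcfactor


c, fc = asop_contributions([1.0, 2.0, 3.0, 4.0], bins=[0.0, 2.5, 5.0])
print(c, fc)  # [0.75 1.75] [0.3 0.7]
```

Comparing FCfactor between model and observations isolates differences in the shape of the distribution, while Cfactor also reflects differences in the mean.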

rcatool.stats.ASoP.bins_calc(n, bintype='Klingaman')

Calculates bins with edges according to Eq. 1 in Klingaman et al. (2017); https://www.geosci-model-dev.net/10/57/2017/

Parameters:
  • n (array/list) – 1D array or list with bin numbers

  • bintype (str) – The type of bins to be calculated; ‘Klingaman’ (see reference) or ‘exponential’ for exponential bins.

Returns:

bn – 1D array of bin edges

Return type:

array
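As a sketch of what the Klingaman bins look like, the snippet below assumes that Eq. 1 has the form b_n = exp(ln 0.005 + sqrt(n (ln 120 − ln 0.005)² / 59)), which gives edges running from 0.005 (n = 0) to 120 (n = 59). This form is an assumption to be checked against the reference; `klingaman_bins` is a hypothetical name, and the 'exponential' bin type is not reproduced.

```python
import numpy as np


def klingaman_bins(n):
    """Hypothetical sketch of Eq. 1 in Klingaman et al. (2017), assumed form."""
    n = np.asarray(n, dtype=float)
    log_range_sq = (np.log(120.0) - np.log(0.005)) ** 2
    # Edges grow with sqrt(n) in log space, so bins widen with intensity
    return np.exp(np.log(0.005) + np.sqrt(n * log_range_sq / 59.0))


edges = klingaman_bins(np.arange(100))
print(edges[0], edges[59])  # approximately 0.005 and 120.0
```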

rcatool.stats.arithmetics module

Functions for various arithmetic calculations.

Created: Autumn 2016. Authors: Petter Lind & David Lindstedt.

rcatool.stats.arithmetics.run_mean(x, N, mode='valid')

Calculate running mean

Return the running mean of the data vector x, where N is the window size. The mode keyword argument determines how the edges are handled; see numpy.convolve for details.
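Since the description points to numpy.convolve, a running mean can be sketched as a convolution with a normalized box of ones. This is an assumption about how run_mean works, and `running_mean` is a hypothetical name.

```python
import numpy as np


def running_mean(x, N, mode="valid"):
    """Running mean via convolution with a length-N box of equal weights (sketch)."""
    window = np.ones(N) / N  # N equal weights that sum to one
    return np.convolve(np.asarray(x, dtype=float), window, mode=mode)


print(running_mean([1, 2, 3, 4, 5], 3))  # ≈ [2, 3, 4]
```

With mode='valid' the output is N − 1 elements shorter than x; 'same' keeps the input length at the cost of edge values that average over fewer real data points.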

rcatool.stats.bootstrap module

Bootstrapping

Functions for bootstrap calculations.

Authors: Petter Lind. Created: Autumn 2016. Updates: May 2020.

rcatool.stats.bootstrap.block_bootstr(data, block=5, nrep=500, nproc=1)

Calculate block bootstrap samples.

This is a block bootstrap function, ported from R to Python, based on: http://stat.wharton.upenn.edu/~buja/STAT-541/time-series-bootstrap.R

Parameters:
  • data (list/array) – 1D data array on which to perform the block bootstrap.

  • block (int) – the block length to be used. Default is 5.

  • nrep (int) – the number of resamples produced in the bootstrap. Default is 500.

  • nproc (int) – Number of processors, default 1. If larger than 1, multiple processors are used in parallel via the multiprocessing module.

Returns:

arrBt – 2D array with bootstrap samples; rows are the samples, columns the values.

Return type:

array
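For orientation, the snippet below is a generic moving-block bootstrap in NumPy: random block start positions are drawn, consecutive blocks are concatenated, and each resample is trimmed to the original length. It is not a port of the R script referenced above; `block_bootstrap` is a hypothetical name, and the `seed` argument and the single-process design are illustrative additions.

```python
import numpy as np


def block_bootstrap(data, block=5, nrep=500, seed=None):
    """Moving-block bootstrap of a 1D series (sketch); returns shape (nrep, n)."""
    data = np.asarray(data)
    n = data.size
    if not 1 <= block <= n:
        raise ValueError("block must be between 1 and len(data)")
    rng = np.random.default_rng(seed)
    nblocks = int(np.ceil(n / block))
    # Random start of each block; every block must fit inside the series
    starts = rng.integers(0, n - block + 1, size=(nrep, nblocks))
    # Expand each start into `block` consecutive indices, join, trim to length n
    idx = (starts[:, :, None] + np.arange(block)).reshape(nrep, -1)[:, :n]
    return data[idx]


samples = block_bootstrap(np.arange(20), block=5, nrep=100, seed=0)
print(samples.shape)  # (100, 20)
```

Resampling whole blocks keeps the autocorrelation within each block intact, which is why the block length should be comparable to the decorrelation time of the series.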

rcatool.stats.climateindex module

Climate indices

Functions for various climate index calculations.

Authors: Petter Lind & David Lindstedt. Created: Autumn 2016. Updates: May 2020.

rcatool.stats.climateindex.RRpX(data, percentile, thr=None, axis=0, keepdims=False)

RRpX mm, total amount of precipitation above the percentile threshold pX (RR ≥ pX mm). Let RRij be the daily precipitation amount on day i in period j; sum the precipitation over all days where RRij ≥ pX mm.

Parameters:
  • data (array) – 1D/2D data array, with time steps on the zeroth axis (axis=0).

  • percentile (int) – Percentile that defines the threshold.

  • thr (float/int) – Pre-thresholding of data to do calculation for wet days/hours only.

  • keepdims (boolean) – If False (default) calculation is performed on all data collectively, otherwise for each timeseries on each point in 2d space. ‘Axis’ then defines along which axis the timeseries are located.

Returns:

RRpx – 1D/2D array with calculated RRpXX indices.

Return type:

list/array
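A minimal sketch of the RRpX calculation for the keepdims=False case, assuming that pre-thresholding with thr drops values below thr before the percentile is computed. `rr_percentile_total` is a hypothetical name.

```python
import numpy as np


def rr_percentile_total(data, percentile, thr=None):
    """Sum of values >= the given percentile of the (optionally wet-only) data."""
    values = np.asarray(data, dtype=float).ravel()
    if thr is not None:
        values = values[values >= thr]  # assumed: keep wet days/hours only
    px = np.percentile(values, percentile)  # threshold pX in data units (e.g. mm)
    return values[values >= px].sum()


print(rr_percentile_total(np.arange(1, 101), 90))  # 955.0
```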

rcatool.stats.climateindex.RRtX(data, thr, axis=0, keepdims=False)

RRtX mm, total amount of precipitation above the threshold ‘thr’.

Parameters:
  • data (array) – 1D/2D data array, with time steps on the zeroth axis (axis=0).

  • thr (int) – Threshold above which data are summed.

  • keepdims (boolean) – If False (default) calculation is performed on all data collectively, otherwise for each timeseries on each point in 2d space. ‘Axis’ then defines along which axis the timeseries are located.

Returns:

RRtx – 1D/2D array with calculated RRtXX indices.

Return type:

list/array
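The same idea with a fixed threshold, sketched for both keepdims settings. Whether values equal to thr are included is an assumption here (≥ is used, matching the RRpX definition); `rr_threshold_total` is a hypothetical name.

```python
import numpy as np


def rr_threshold_total(data, thr, axis=0, keepdims=False):
    """Sum of values >= thr, over all data or per time series along `axis`."""
    data = np.asarray(data, dtype=float)
    masked = np.where(data >= thr, data, 0.0)  # zero out values below thr
    if keepdims:
        return masked.sum(axis=axis)  # one total per grid point
    return masked.sum()  # one total for all data pooled
```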

rcatool.stats.climateindex.Rxx(data, thr=1.0, axis=0, normalize=False, keepdims=False)

Rxx mm, count of time units (days, hours, etc.) when precipitation ≥ xx mm. Let RRij be the precipitation amount on time unit i in period j; count the number of time units where RRij ≥ xx mm.

Parameters:
  • data (array) – 1D/2D data array, with time steps on the zeroth axis (axis=0).

  • thr (float/int) – Threshold to be used; e.g. 10 for R10, 20 for R20, etc. Default 1.0.

  • axis (int) – Along which axis to calculate Rxx. Defaults to 0

  • normalize (boolean) – If True (default False) the counts are normalized by total number of time units in each array/grid point. Returned values will then be fractions.

  • keepdims (boolean) – If False (default) calculation is performed on all data collectively, otherwise for each timeseries on each point in 2d space. ‘Axis’ then defines along which axis the timeseries are located.

Returns:

Rxx – 1D/2D array with calculated Rxx indices.

Return type:

list/array
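A minimal sketch of the counting, including the normalize option. `rxx_count` is a hypothetical name, and treating keepdims=False as "flatten everything" is an assumption based on the parameter description.

```python
import numpy as np


def rxx_count(data, thr=1.0, axis=0, normalize=False, keepdims=False):
    """Number (or fraction) of time units with values >= thr."""
    data = np.asarray(data, dtype=float)
    if not keepdims:
        data, axis = data.ravel(), 0  # pool all values together
    count = np.sum(data >= thr, axis=axis)
    if normalize:
        return count / data.shape[axis]  # fraction of all time units
    return count


print(rxx_count([[0, 2], [5, 3], [1, 3]], keepdims=True))  # [2 3]
```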

rcatool.stats.climateindex.SDII(data, thr=1.0, axis=0, keepdims=False)

SDII, Simple precipitation intensity index: Let RRwj be the daily precipitation amount on wet days w (RR ≥ thr, 1 mm by default) in period j. If W is the number of wet days in j, then SDIIj = ΣRRwj / W.

Parameters:
  • data (list/array) – 2D array.

  • thr (float/int) – threshold for wet events (wet days/hours etc)

  • axis (int) – The axis along which the calculation is applied (default 0).

  • keepdims (boolean) – If the data are 2D in space (with time as a third dimension) and keepdims is True, the calculation is applied along the axis given by ‘axis’ (default 0, time) and a 2D array of SDII values is returned. If False (default), all values are assembled collectively before the calculation.
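The SDII definition translates directly into NumPy. This sketch (`sdii_sketch` is a hypothetical name) assumes wet means ≥ thr and returns NaN at points without any wet time unit; rcatool may handle that case differently.

```python
import numpy as np


def sdii_sketch(data, thr=1.0, axis=0, keepdims=False):
    """Mean amount on wet time units (value >= thr): sum(RRwj) / W."""
    data = np.asarray(data, dtype=float)
    if not keepdims:
        data, axis = data.ravel(), 0
    wet = data >= thr
    wet_sum = np.where(wet, data, 0.0).sum(axis=axis)
    n_wet = wet.sum(axis=axis)
    # Points without any wet time unit get NaN instead of a division by zero
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(n_wet > 0, wet_sum / n_wet, np.nan)


print(float(sdii_sketch([0, 2, 4, 0.5])))  # 3.0
```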

rcatool.stats.climateindex.cdd(data, thr=1.0, periods=array([1, 2, ..., 60]), maxper=False, axis=0, keepdims=False)

Calculate the Consecutive Dry Days index (CDD).

Parameters:
  • data (array) – 1D/2D daily precipitation data array in mm.

  • thr (float) – Value of threshold to define dry day. Default 1 mm.

  • periods (list/array) – Array of dry-period lengths defining the interval edges; e.g. [1, 3, 10, 14, 21, 30] computes the frequency of dry periods with lengths 1-3 days, 3-10 days, etc. The left interval edge is included, the right is not. By default, periods runs from 1 to 60 days in 1-day increments.

  • maxper (boolean) – If True, the longest CDD period is placed at the last position of the returned array. Default False.

  • axis (int) – Along which axis to calculate cdd. Defaults to 0

  • keepdims (boolean) – If False (default) calculation is performed on all data collectively, otherwise for each timeseries on each point in 2d space. ‘Axis’ then defines along which axis the timeseries are located.

Returns:

  • dlen (list) – list with lengths of each dry day event in timeseries

  • cdd (list/array) – 1D/2D array with frequencies of cdd intervals. Intervals with no dry periods are set to NaN. The length of the returned array (along the computed ‘axis’) equals the length of ‘periods’ minus 1.
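A sketch of the two steps for a single time series: find the lengths of dry spells (runs below thr), then count them per interval with the left edge included and the right edge excluded. The names are hypothetical, "dry" is assumed to mean value < thr, and the counts are absolute (not relative) frequencies.

```python
import numpy as np


def dry_spell_lengths(data, thr=1.0):
    """Lengths of consecutive runs where the value is below thr (dry days)."""
    dry = np.asarray(data, dtype=float) < thr
    # Pad with wet days so every run has a start (+1) and an end (-1) in the diff
    edges = np.diff(np.concatenate(([0], dry.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return ends - starts


def cdd_frequencies(dlen, periods):
    """Count dry spells per interval [periods[k], periods[k+1]); NaN where none."""
    periods = np.asarray(periods)
    idx = np.searchsorted(periods, dlen, side="right") - 1  # left edge included
    idx = idx[(idx >= 0) & (idx < periods.size - 1)]       # drop out-of-range spells
    counts = np.bincount(idx, minlength=periods.size - 1).astype(float)
    counts[counts == 0] = np.nan
    return counts


dlen = dry_spell_lengths([0, 0, 5, 0, 5, 0, 0, 0])
print(dlen, cdd_frequencies(dlen, [1, 2, 4]))  # [2 1 3] [1. 2.]
```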

rcatool.stats.climateindex.ehi(data, thr_95, axis=0, keepdims=False)

Calculate Excessive Heat Index (EHI).

Parameters:
  • data (list/array) – 1D/2D array of daily temperature timeseries

  • thr_95 (float) – 95th percentile daily mean value from climatology

  • axis (int) – The axis along which the calculation is applied (default 0).

  • keepdims (boolean) – If the data are 2D in space (with time as a third dimension) and keepdims is True, the calculation is applied along the axis given by ‘axis’ (default 0, time) and a 2D array of calculated statistics is returned. If False (default), all values are assembled collectively before the calculation.

Returns:

EHI – Excessive heat index

Return type:

float

rcatool.stats.climateindex.extr_hotdays_calc(data, thr_p95)

Calculate number of extreme hot days.

Return days with mean temperature above the 95th percentile of climatology.

Parameters:
  • data (array) – 1D-array of temperature input timeseries

  • thr_p95 (float) – 95th percentile daily mean value from climatology

rcatool.stats.climateindex.hotdays_calc(data, thr_p75)

Calculate number of hot days.

Return days with mean temperature above the 75th percentile of climatology.

Parameters:
  • data (array) – 1D-array of temperature input timeseries

  • thr_p75 (float) – 75th percentile daily mean value from climatology

rcatool.stats.climateindex.tropnights_calc(data)

Calculate number of tropical nights.

Return days with minimum temperature not below 20 degrees C.

Parameters:

data (array) – 1D array of minimum temperature time series in kelvin (K)
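Because the input is in kelvin, the 20 °C limit must be converted before comparing. A minimal sketch (`tropical_nights` is a hypothetical name):

```python
import numpy as np

T_TROPICAL_K = 273.15 + 20.0  # 20 °C expressed in kelvin


def tropical_nights(tmin_kelvin):
    """Number of days whose minimum temperature is not below 20 °C."""
    return int(np.sum(np.asarray(tmin_kelvin, dtype=float) >= T_TROPICAL_K))


print(tropical_nights([290.0, 295.0, 300.0]))  # 2
```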

rcatool.stats.convolve module

rcatool.stats.convolve.convolve_fft(array, kernel, boundary='fill', fill_value=0, crop=True, return_fft=False, fft_pad=True, psf_pad=False, interpolate_nan=False, quiet=False, ignore_edge_zeros=False, min_wt=0.0, normalize_kernel=False, allow_huge=True, fftn=<function fftn>, ifftn=<function ifftn>)

Convolve an ndarray with an nd-kernel. Returns a convolved image with shape = array.shape. Assumes kernel is centered.

convolve_fft differs from scipy.signal.fftconvolve in a few ways:

  • It can treat NaN values as zeros or interpolate over them.

  • inf values are treated as NaN

  • (optionally) It pads to the nearest 2^n size to improve FFT speed.

  • Its only valid mode is ‘same’ (i.e. the same shape array is returned)

  • It lets you use your own fft, e.g., pyFFTW <http://pypi.python.org/pypi/pyFFTW> or pyFFTW3 <http://pypi.python.org/pypi/PyFFTW3/0.2.1>, which can lead to performance improvements, depending on your system configuration. pyFFTW3 is threaded, and therefore may yield significant performance benefits on multi-core machines at the cost of greater memory requirements. Specify the fftn and ifftn keywords to override the defaults, numpy.fft.fftn and numpy.fft.ifftn.

Parameters:
  • array (numpy.ndarray) – Array to be convolved with kernel

  • kernel (numpy.ndarray) – Will be normalized if normalize_kernel is set. Assumed to be centered (i.e., shifts may result if your kernel is asymmetric)

  • boundary ({'fill', 'wrap'}, optional) –

    A flag indicating how to handle boundaries:

    • ‘fill’: set values outside the array boundary to fill_value (default)

    • ‘wrap’: periodic boundary

  • interpolate_nan (bool, optional) – The convolution will be re-weighted assuming NaN values are meant to be ignored, not treated as zero. If this is off, all NaN values will be treated as zero.

  • ignore_edge_zeros (bool, optional) – Ignore the zero-pad-created zeros. This will effectively decrease the kernel area on the edges but will not re-normalize the kernel. This parameter may result in ‘edge-brightening’ effects if you’re using a normalized kernel

  • min_wt (float, optional) – If ignoring NaN / zeros, force all grid points with a weight less than this value to NaN (the weight of a grid point with no ignored neighbors is 1.0). If min_wt is zero, then all zero-weight points will be set to zero instead of NaN (which they would be otherwise, because 0/0 = NaN).

  • normalize_kernel (function or boolean, optional) – If specified, this is the function to divide kernel by to normalize it. e.g., normalize_kernel=np.sum means that kernel will be modified to be: kernel = kernel / np.sum(kernel). If True, defaults to normalize_kernel = np.sum.

  • fft_pad (bool, optional) – Default on. Zero-pad image to the nearest 2^n

  • psf_pad (bool, optional) – Default off. Zero-pad image to be at least the sum of the image sizes (in order to avoid edge-wrapping when smoothing)

  • crop (bool, optional) – Default on. Return an image of the size of the largest input image. If the images are asymmetric in opposite directions, will return the largest image in both directions. For example, if an input image has shape [100,3] but a kernel with shape [6,6] is used, the output will be [100,6].

  • return_fft (bool, optional) – Return the fft(image)*fft(kernel) instead of the convolution (which is ifft(fft(image)*fft(kernel))). Useful for making PSDs.

  • fftn (function, optional) – The forward FFT function. Can be overridden to use your own FFTs, e.g. an fftw3 wrapper or scipy’s fftn, e.g. fftn=scipy.fftpack.fftn

  • ifftn (function, optional) – The inverse FFT function. Can be overridden to use your own inverse FFTs, e.g. an fftw3 wrapper or scipy’s ifftn, e.g. ifftn=scipy.fftpack.ifftn

  • complex_dtype (np.complex, optional) – Which complex dtype to use. numpy has a range of options, from 64 to 256.

  • quiet (bool, optional) – Silence warning message about NaN interpolation

  • allow_huge (bool, optional) – Allow huge arrays in the FFT? If False, will raise an exception if the array or kernel size is >1 GB

Raises:

ValueError – If the array is bigger than 1 GB after padding and allow_huge is False.

See also

convolve

Convolve is a non-fft version of this code. It is more memory efficient and for small kernels can be faster.

Returns:

default – array convolved with kernel. If return_fft is set, returns fft(array) * fft(kernel). If crop is not set, returns the image with the FFT-padded size instead of the input size.

Return type:

ndarray
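The 'wrap' boundary corresponds to the periodic convolution that FFTs compute naturally. The 1D sketch below shows the core idea, including why a centered kernel must be shifted so its center sits at index 0; it omits padding, NaN handling and cropping, and `fft_convolve_wrap_1d` is a hypothetical name, not part of rcatool.

```python
import numpy as np


def fft_convolve_wrap_1d(array, kernel):
    """Circular ('wrap') convolution of a 1D array with a centered, odd-length kernel."""
    array = np.asarray(array, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    n, k = array.size, kernel.size
    # Embed the kernel in an array of the data length and move its center to
    # index 0, so the result is not shifted ("assumes kernel is centered")
    kpad = np.zeros(n)
    kpad[:k] = kernel
    kpad = np.roll(kpad, -(k // 2))
    # Multiplication in Fourier space equals periodic convolution in real space
    return np.fft.ifft(np.fft.fft(array) * np.fft.fft(kpad)).real


box = np.ones(3) / 3
print(fft_convolve_wrap_1d([0, 0, 1, 0, 0], box))  # ≈ [0, 1/3, 1/3, 1/3, 0]
print(fft_convolve_wrap_1d([1, 0, 0, 0, 0], box))  # ≈ [1/3, 1/3, 0, 0, 1/3], wraps
```

The second call shows the wrap-around: signal at the left edge leaks into the rightmost element. Zero-padding (fft_pad, psf_pad) is what prevents this in the 'fill' mode.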

rcatool.stats.convolve.fft_prep(array, kernel, fill_value, boundary='fill', psf_pad=False, fft_pad=True)

Prepare data array and kernel for fft computation.

Parameters:
  • boundary ({'fill', 'wrap'}, optional) –

    A flag indicating how to handle boundaries:

    • ‘fill’: set values outside the array boundary to fill_value (default)

    • ‘wrap’: periodic boundary

  • fft_pad (bool, optional) – Default on. Zero-pad image to the nearest 2^n

  • psf_pad (bool, optional) – Default off. Zero-pad image to be at least the sum of the image sizes (in order to avoid edge-wrapping when smoothing)

rcatool.stats.convolve.filtering(data, wgts, mode='valid', dim=1, axis=None, fft=False, fftn=<function fftn>, ifftn=<function ifftn>)

1D and 2D filtering procedures.

Filters input data, both 1D and 2D, with user-defined weights. Set fft to True to use the fast Fourier transform (FFT), which speeds up the calculation for large data.

Parameters:
  • data (array) – Data to be filtered.

  • wgts (array/list) – The weights (kernel) to be used in the filtering.

  • mode (str) – String indicating the size of output (see https://docs.scipy.org/doc/scipy/reference/signal.html)

  • dim (int) – If dim=1, one-dimensional filtering is performed; if ‘axis’ is also set, the 1D filtering is applied along that axis. If dim=2, two-dimensional filtering is applied.

  • fft (boolean) – Set True to use the fast Fourier transform in the 2D filtering.

Returns:

data_conv – Convolved data

Return type:

array
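The dim=1 case with ‘axis’ set can be sketched by convolving every 1D slice along that axis. This illustrates the idea only; `filter_1d_along_axis` is a hypothetical name and rcatool may use a different routine internally.

```python
import numpy as np


def filter_1d_along_axis(data, wgts, axis=0, mode="valid"):
    """Convolve every 1D slice along `axis` with the weights `wgts`."""
    wgts = np.asarray(wgts, dtype=float)
    # np.convolve flips the weights, which matters only for asymmetric kernels
    return np.apply_along_axis(lambda x: np.convolve(x, wgts, mode=mode), axis, data)


data = np.arange(10.0).reshape(5, 2)  # two time series of length 5
print(filter_1d_along_axis(data, [0.5, 0.5]))  # two-point mean of each column
```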

rcatool.stats.convolve.kernel_gen(n, ktype='square', kfun='mean')

Function to create a kernel, i.e. a moving window (box or disk) with side/radius equal to ‘n’.

Parameters:
  • n (int) – Side/radius of the square/disk smoothing window.

  • ktype (string) – The type of box; ‘square’ (default) or ‘disk’.

  • kfun (string) – The function ‘kfun’ applied to each sub-sample within the moving window. Either ‘mean’ (default) or ‘sum’.
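A sketch of how such kernels can be built. Assumptions, not taken from rcatool: the disk kernel has shape (2n+1, 2n+1) and includes cells whose center lies within radius n, and 'mean' is realized by normalizing the weights to sum to one. `make_kernel` is a hypothetical name.

```python
import numpy as np


def make_kernel(n, ktype="square", kfun="mean"):
    """Square (side n) or disk (radius n, shape 2n+1) moving-window kernel."""
    if ktype == "square":
        kernel = np.ones((n, n))
    elif ktype == "disk":
        y, x = np.ogrid[-n:n + 1, -n:n + 1]
        kernel = (x**2 + y**2 <= n**2).astype(float)  # 1 inside the circle
    else:
        raise ValueError("ktype must be 'square' or 'disk'")
    if kfun == "mean":
        kernel = kernel / kernel.sum()  # weights sum to 1 -> window average
    elif kfun != "sum":
        raise ValueError("kfun must be 'mean' or 'sum'")
    return kernel
```

Convolving a field with the 'mean' kernel returns the moving-window average; with 'sum' it returns the moving-window total.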

rcatool.stats.convolve.lanczos_filter(window, cutoff, cutoff_2=None, ftype='lowpass')

Calculate weights for a Lanczos filter (low-pass, high-pass or band-pass, as set by ftype).

Parameters:
  • window (int) – The length of the filter window.

  • cutoff (float) – The cutoff frequency in inverse time steps.

  • cutoff_2 (float) – The second cutoff frequency in inverse time steps. Only used if ftype is ‘bandpass’

  • ftype (str) – The type of cutoff filtering: ‘lowpass’, ‘highpass’ or ‘bandpass’.

Returns:

wgts – Array with calculated weights.

Return type:

vector
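For reference, a widely used formulation of low-pass Lanczos weights in the style of Duchon (1979) is sketched below: the ideal sinc response is multiplied by a sigma factor that damps Gibbs oscillations. It is not necessarily identical to rcatool's implementation, covers the low-pass case only, and `lanczos_lowpass_weights` is a hypothetical name.

```python
import numpy as np


def lanczos_lowpass_weights(window, cutoff):
    """Low-pass Lanczos weights (odd `window`, cutoff in inverse time steps)."""
    order = ((window - 1) // 2) + 1
    nwts = 2 * order + 1
    w = np.zeros(nwts)
    n = nwts // 2
    w[n] = 2.0 * cutoff  # central weight
    k = np.arange(1.0, n)
    sigma = np.sin(np.pi * k / n) * n / (np.pi * k)  # Lanczos sigma factor
    firstfactor = np.sin(2.0 * np.pi * cutoff * k) / (np.pi * k)  # ideal response
    w[n - 1:0:-1] = firstfactor * sigma
    w[n + 1:-1] = firstfactor * sigma
    return w[1:-1]  # drop the zero end points


wgts = lanczos_lowpass_weights(61, 1.0 / 30.0)  # e.g. damp periods < 30 steps
```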

rcatool.stats.event_duration module

Event Duration Analysis (EDA) of Precipitation

Author: Petter Lind. Created: Fall 2020.

rcatool.stats.event_duration.eda(data, keepdims=False, axis=0, thr=0.1, duration_bins=None, event_statistic='amount', statistic_bins=None, dry_events=False, dry_bins=None)

Calculate event duration statistics

Parameters:
  • data (array) – 2D or 1D array of data. All data points are used collectively in the event duration calculation unless ‘keepdims’ is True, in which case the calculation is performed along the axis given by ‘axis’ (by default the zeroth axis, expected to be the time dimension).

  • keepdims (boolean) – If the data are 2D in space (with time as a third dimension) and keepdims is True, the calculation is applied along the dimension defined by the axis argument (default 0) and a 2D array of event duration statistics is returned. If False (default), all values are assembled collectively before the calculation.

  • axis (int) – The axis over which to apply the calculation if keepdims is set to True. Default is 0.

  • event_statistic (str) – The statistic to calculate for each event; choices are ‘amount’, ‘mean int’ or ‘max int’.

  • duration_bins (list/array) – Defines the bin edges for event durations, including the rightmost edge, allowing for non-uniform bin widths.

  • statistic_bins (list/array) – Defines the bin edges for event statistic (amount/mean/max), including the rightmost edge, allowing for non-uniform bin widths.

  • thr (float) – Value of threshold to identify start/end of events. Default 0.1.

  • dry_events (bool) – If True, durations of dry intervals are calculated; ‘dry_bins’ must then be provided.

  • dry_bins (list/array) – Bin edges for the durations of dry intervals; required if dry_events is True.

Returns:

eda_arr – data array with frequency of event statistic (amount, mean, max) per duration bin.

Return type:

array
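The core of an event duration analysis for a single time series can be sketched as: detect consecutive runs at or above thr, record each run's duration and total amount, then build a joint histogram. The ≥ comparison, the function name `wet_events` and the bin edges are assumptions for illustration.

```python
import numpy as np


def wet_events(data, thr=0.1):
    """Durations and total amounts of consecutive runs with values >= thr."""
    x = np.asarray(data, dtype=float)
    wet = (x >= thr).astype(int)
    edges = np.diff(np.concatenate(([0], wet, [0])))
    starts = np.flatnonzero(edges == 1)  # first time step of each event
    ends = np.flatnonzero(edges == -1)   # one past the last time step
    amounts = np.array([x[s:e].sum() for s, e in zip(starts, ends)])
    return ends - starts, amounts


dur, amt = wet_events([0, 1, 2, 0, 0, 3, 0, 1, 1, 1])
# Joint frequency of event amount per duration bin (example bin edges)
eda_arr, _, _ = np.histogram2d(dur, amt, bins=[[1, 2, 3, 4], [0, 2, 4]])
```

Replacing the sum with the mean or maximum over each run gives the 'mean int' and 'max int' statistics.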

rcatool.stats.pdf module

Probability distributions

Authors: Petter Lind. Created: Spring 2015. Updates: May 2020.

rcatool.stats.pdf.freq_int_dist(data, keepdims=False, axis=0, bins=10, thr=None, density=True, ci=False, bootstrap=False, nmc=500, block=1, ci_level=95, nproc=1)

Calculate frequency-intensity distributions.

Parameters:
  • data (array) – 2D or 1D array of data. All data points are used collectively in the frequency-intensity calculation unless ‘keepdims’ is True, in which case the calculation is performed along the dimension defined by the axis argument (default 0).

  • keepdims (boolean) – If the data are 2D in space (with time as a third dimension) and keepdims is True, the calculation is applied along the dimension defined by the axis argument (default 0) and a 2D array of frequency-intensity distributions is returned. If False (default), all values are assembled collectively before the calculation.

  • axis (int) – The axis over which to apply the calculation if keepdims is set to True. Default is 0.

  • bins (int/list/array) – If an int, it defines the number of equal-width bins in the given range (10, by default). If a sequence, it defines the bin edges, including the rightmost edge, allowing for non-uniform bin widths. If bins is set to ‘None’ they will be automatically calculated.

  • thr (float) – Value of threshold if thresholding data. Default None.

  • density (boolean) – If True (default) then the value of the probability density function at each bin is returned, otherwise the number of samples per bin.

  • bootstrap (boolean) – Whether to use block bootstrap to calculate confidence intervals.

  • nmc (int/float) – Number of bootstrap samples to use.

  • block (int/float) – Size of block to use in block bootstrap

  • ci_level (int/float) – The confidence interval level to use (eg 95, 99 representing 95%, 99% levels)

  • nproc (int) – Number of processes to use in bootstrap calculation. Default 1.

Returns:

  • pdf (array) – data array with size len(bins)-1 with counts/probabilities

  • ci (dict) – data dictionary with confidence levels for each bin; keys ‘min_levels’/‘max_levels’ with corresponding values. If bootstrap is False, None values are returned.
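For the simplest case (keepdims=False, no bootstrap), a frequency-intensity distribution is a histogram of all pooled values. A sketch, where `freq_int` is a hypothetical name and thresholding with ≥ thr is an assumption:

```python
import numpy as np


def freq_int(data, bins=10, thr=None, density=True):
    """Histogram of all values pooled together (keepdims=False case, no bootstrap)."""
    values = np.asarray(data, dtype=float).ravel()
    values = values[np.isfinite(values)]
    if thr is not None:
        values = values[values >= thr]  # assumed thresholding convention
    pdf, edges = np.histogram(values, bins=bins, density=density)
    return pdf, edges


rng = np.random.default_rng(1)
pdf, edges = freq_int(rng.gamma(0.8, 5.0, size=(1000, 4)), bins=20, thr=0.1)
print(np.sum(pdf * np.diff(edges)))  # ≈ 1.0, a density integrates to one
```

Note that with density=True the values are probability densities, so they sum to one only after multiplying by the bin widths; this matters for the Perkins Skill Score below.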

rcatool.stats.pdf.perkins_skill_score(p_mod, p_obs, axis=0)

Calculate the Perkins Skill Score (PSS).

Parameters:
  • p_mod (list/array) – 1D or 2D array with the frequency of values (probability) in each bin from the model. The probabilities over all bins must sum to one; whether they do depends on how the PDF was calculated (e.g. a density computed on bins of unit width sums to one).

  • p_obs (list/array) – 1D or 2D array with the frequency of values (probability) in each bin from the observations, normalized in the same way as p_mod.

  • axis (int) – If data is 2d, the PSS will be calculated along this axis. Default axis is zero.

Returns:

pss – Returns Perkins Skill Score, single float or array with floats.

Return type:

float/array
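The Perkins Skill Score is commonly defined (Perkins et al., 2007) as the sum over bins of the smaller of the two probabilities, i.e. the overlap of the two distributions: 1 for identical and 0 for non-overlapping distributions. A one-line sketch, with `pss` as a hypothetical name:

```python
import numpy as np


def pss(p_mod, p_obs, axis=0):
    """PSS = sum over bins of min(p_mod, p_obs); 1 means identical distributions."""
    return np.sum(np.minimum(p_mod, p_obs), axis=axis)


print(pss([0.2, 0.5, 0.3], [0.3, 0.4, 0.3]))  # ≈ 0.9
```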

rcatool.stats.pdf.prob_of_exceed(data, keepdims=False, axis=0, thr=None, pctls_levels=None)

Calculates probability of exceedance distributions.

Parameters:
  • data (array) – 2D or 1D array of data. All data points are used collectively in the exceedance calculation unless ‘keepdims’ is True, in which case the calculation is performed along the zeroth axis.

  • pctls_levels ('default', None or array/list) – If set to ‘default’, probability levels of exceedance are defined by a set of percentiles ranging from 0-100 and calculated from input data. If an array or list, these levels (between 0 and 100) will be used instead. Default is None in which case input data is merely ranked from 0 to 1.

  • keepdims (boolean) – If the data are 2D in space (with time as a third dimension) and keepdims is True, the calculation is applied along the zeroth axis (time) and a 2D array of exceedance distributions is returned. If False (default), all values are assembled collectively before the calculation.

  • axis (int) – The axis over which to apply the calculation if keepdims is set to True. Default is 0.

  • thr (float) – Value of threshold if thresholding data. Default None.

Returns:

eop – probability of exceedance array.

Return type:

array

rcatool.stats.precipitation_index module

rcatool.stats.precipitation_index.precip_amount_survival_fraction(data, pctls, keepdims=False)

Calculate the frequency distribution, normalize by total precipitation, and sum the fractions. It answers the question ‘what fraction of total precipitation occurs beyond the top p percentile of days in a period?’, where p is any percentile of interest.

rcatool.stats.precipitation_index.ranked_cumsum(data, axis=0, keepdims=False)

Rank the data and calculate the cumulative sum, starting from the highest values.
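Both ideas can be sketched for a 1D series: sort from highest to lowest and accumulate, and compute the share of the total that falls on time units above a given percentile. The names are hypothetical, and using a strict > for "beyond the percentile" is an assumption.

```python
import numpy as np


def ranked_cumsum_1d(data):
    """Cumulative sum of the values sorted from highest to lowest."""
    return np.cumsum(np.sort(np.asarray(data, dtype=float).ravel())[::-1])


def fraction_beyond_percentile(data, pctl):
    """Fraction of the total amount on time units above the pctl percentile."""
    values = np.asarray(data, dtype=float).ravel()
    thr = np.percentile(values, pctl)
    return values[values > thr].sum() / values.sum()


print(ranked_cumsum_1d([3, 1, 2]))                        # [3. 5. 6.]
print(fraction_beyond_percentile(np.arange(1, 101), 90))  # 955/5050 ≈ 0.189
```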

rcatool.stats.sal module

SAL module

Functions for calculation of SAL statistics.

Based on Wernli et al 2008 http://journals.ametsoc.org/doi/abs/10.1175/2008MWR2415.1

Created: Spring 2018. Authors: Petter Lind & David Lindstedt.

rcatool.stats.sal.A_stat(data, refdata)

Calculate the amplitude component (A).

Parameters:
  • data (arrays) – 2D data arrays to be compared, where refdata is the reference data e.g. observations.

  • refdata (arrays) – 2D data arrays to be compared, where refdata is the reference data e.g. observations.

Returns:

A – The calculated amplitude component

Return type:

float
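In Wernli et al. (2008), the amplitude component is the normalized difference of the domain-averaged fields, A = (D(R_mod) − D(R_obs)) / (0.5 (D(R_mod) + D(R_obs))), ranging from −2 to 2 with 0 for a perfect match. A sketch with a hypothetical name, using nanmean as the domain average:

```python
import numpy as np


def amplitude_component(data, refdata):
    """A = (D(R_mod) - D(R_obs)) / (0.5 * (D(R_mod) + D(R_obs))), D = domain mean."""
    d_mod = np.nanmean(data)
    d_obs = np.nanmean(refdata)
    return (d_mod - d_obs) / (0.5 * (d_mod + d_obs))


obs = np.ones((10, 10))
print(amplitude_component(2 * obs, obs))  # ≈ 0.667; A lies in [-2, 2]
```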

rcatool.stats.sal.L_stat(data, data_label, refdata, refdata_label)

Function to determine the location component (L). It consists of two components, L1 and L2. L1 measures the normalized distance between the centers of mass of the modelled and observed fields. L2 considers the averaged distance between the center of mass of the total field and the individual field objects.

Parameters:
  • data (arrays) – 2D data arrays to be compared, where refdata is the reference data e.g. observations.

  • refdata (arrays) – 2D data arrays to be compared, where refdata is the reference data e.g. observations.

  • data_label (arrays) – Arrays with labeled objects. Returned from the label() function.

  • refdata_label (arrays) – Arrays with labeled objects. Returned from the label() function.

Returns:

L1, L2, L – Dictionary with the calculated location components L1 and L2 as well as their composite L = L1 + L2.

Return type:

dictionary
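The first location component can be sketched from the precipitation-weighted centers of mass. Following Wernli et al. (2008), the distance is normalized by the largest distance within the domain; taking the grid diagonal for that distance is an assumption here, the L2 part is not shown, and the names are hypothetical.

```python
import numpy as np


def center_of_mass(field):
    """Precipitation-weighted mean (row, col) position of a 2D field."""
    field = np.asarray(field, dtype=float)
    rows, cols = np.indices(field.shape)
    total = field.sum()
    return np.array([(rows * field).sum() / total, (cols * field).sum() / total])


def location_l1(data, refdata):
    """L1 = |x(R_mod) - x(R_obs)| / d, d = largest distance within the domain."""
    ny, nx = np.shape(data)
    d = np.hypot(ny - 1, nx - 1)  # grid diagonal, assumed to be the domain size
    return np.linalg.norm(center_of_mass(data) - center_of_mass(refdata)) / d
```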

rcatool.stats.sal.S_stat(data, data_label, refdata, refdata_label, obj_prop=True, lsmask=None)

Function to calculate the structure component (S). The basic idea is to compare the volume of the normalized precipitation objects. This property captures information about the geometrical characteristics (size and shape) and how they differ between model (M) and reference (O).

Parameters:
  • data (arrays) – 2D data arrays to be compared, where refdata is the reference data e.g. observations.

  • refdata (arrays) – 2D data arrays to be compared, where refdata is the reference data e.g. observations.

  • data_label (arrays) – Arrays with labeled objects. Returned from the label() function.

  • refdata_label (arrays) – Arrays with labeled objects. Returned from the label() function.

  • obj_prop (boolean) – If True, individual object (rainfall area) properties are calculated and returned.

  • lsmask (array) – Land/sea mask (2d boolean array) to characterize identified objects as land, ocean or coastal objects.

Returns:

  • S (float) – The calculated structure component.

  • area_props/ref_area_props (dictionary) – Dictionary containing properties of identified objects in data and refdata respectively.
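Following Wernli et al. (2008), each object's scaled volume is its total precipitation divided by its maximum, the field value V is the precipitation-weighted mean of these, and S is the normalized difference of V between model and reference. The sketch below assumes integer label arrays with 0 as background (as produced e.g. by scipy.ndimage.label); the names are hypothetical.

```python
import numpy as np


def weighted_scaled_volume(field, labels):
    """V = sum_n(R_n * V_n) / sum_n(R_n), with V_n = R_n / max(R in object n)."""
    field = np.asarray(field, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels[labels > 0])  # label 0 is assumed to be background
    r_n = np.array([field[labels == i].sum() for i in ids])
    r_max = np.array([field[labels == i].max() for i in ids])
    v_n = r_n / r_max  # scaled volume of each object
    return np.sum(r_n * v_n) / np.sum(r_n)


def structure_component(data, data_label, refdata, refdata_label):
    """S = (V(R_mod) - V(R_obs)) / (0.5 * (V(R_mod) + V(R_obs)))."""
    v_mod = weighted_scaled_volume(data, data_label)
    v_obs = weighted_scaled_volume(refdata, refdata_label)
    return (v_mod - v_obs) / (0.5 * (v_mod + v_obs))
```

Positive S indicates objects that are too large or too flat compared with the reference; negative S indicates objects that are too small or too peaked.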

rcatool.stats.sal.distfunc(x)

Calculate distances

rcatool.stats.sal.remove_large_objects(segments, max_size)

Remove large objects based on the maximum size limit defined by ‘max_size’.

Parameters:
  • segments (array) – Array with labeled objects. Returned from the label() function.

  • max_size (int) – Maximum size (number of grid points)

Returns:

out – The segments array with too large objects removed.

Return type:

array
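Object sizes in a label array can be counted with numpy.bincount, which makes the removal a single masked assignment. This sketch assumes non-negative integer labels with 0 as background and treats objects strictly larger than max_size as too large; `drop_large_objects` is a hypothetical name.

```python
import numpy as np


def drop_large_objects(segments, max_size):
    """Set all objects with more than `max_size` grid points to background (0)."""
    out = np.array(segments, copy=True)
    sizes = np.bincount(out.ravel())  # number of grid points per label
    too_large = sizes > max_size
    too_large[0] = False              # never treat the background as an object
    out[too_large[out]] = 0
    return out


seg = np.array([[1, 1, 0], [2, 0, 0], [2, 2, 2]])
print(drop_large_objects(seg, 3))  # object 2 (4 points) is removed
```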

rcatool.stats.sal.run_sal_analysis(data, refdata, thr_type, thr_val, obj_prop=True, obj_lower_size_limit=None, obj_upper_size_limit=None, smoothening_data_level=None, land_sea_mask=None, write_to_file=False, filename=None, nproc=1)

Run the SAL analysis on the two data sets ‘data’ and ‘refdata’, where the latter is supposed to represent the ‘truth’.

Parameters:
  • data/refdata (arrays) – Time series of 2D fields (i.e. 3D arrays), with the zeroth dimension representing time steps. Both data sets must have the same dimension sizes, in both time and space.

  • thr_type (string) – Type of threshold to use. See ‘threshold’ function for more information.

  • thr_val (float/int) – Value of threshold.

  • obj_prop (boolean) – If True (default), a number of object area properties are returned for each of the identified objects. See ‘S_stat’ function for more information.

  • obj_lower_size_limit (int) – If set, all objects with an area (number of connected grid points) smaller than this value are removed from the analysis.

  • obj_upper_size_limit (int) – If set, all objects with an area (number of connected grid points) larger than this value are removed from the analysis.

  • smoothening_data_level (int) – If set, the number of grid points along the side of a moving window used to smooth the data arrays; the mean value within the window is calculated.

  • land_sea_mask (array/None) – If set, land_sea_mask must be a 2D boolean array with the same spatial dimensions as the input data. The land/sea mask is then used to identify objects as land (1), ocean (0) or coastal (2) in the object properties dictionary; it is therefore only used if obj_prop=True. N.B. The mask must be True for ocean points and False for land points.

  • write_to_file (boolean) – Whether to write results to disk.

  • filename (str) – Name of file for writing to disk.

  • nproc (int) – Number of processors to use in the calculation. Default 1. If larger than 1, the multiprocessing module is used to distribute the calculation over the time dimension.

Returns:

  • out_dict (dictionary) – Dictionary with calculated SAL statistics and area properties (if obj_prop is set to True).

  • nc (file) – If ‘write_to_file’ is True, results are written to disk in a netcdf file.

rcatool.stats.sal.sal_calc(tstep, data, refdata, thr_t, thr_v, obj_prop=True, olsl=None, ousl=None, smlvl=None, land_sea_mask=None)

Perform the SAL calculation using the S, A, L functions.

rcatool.stats.sal.threshold(data, thr_type, value)

Function to calculate the threshold to be used to identify objects.

Parameters:
  • data (array) – 2D data array.

  • thr_type (string) – Type of threshold. Can be either “S” for specified (any absolute value), “F” for a fraction (between 0 and 1) of the maximum value, and “P” for a percentile (95 for the 95th percentile etc).

  • value (int/float) – The corresponding value for the chosen threshold type.
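The three threshold types map onto simple NumPy operations. A sketch, assuming NaNs are ignored and the percentile is taken over all grid points; `object_threshold` is a hypothetical name.

```python
import numpy as np


def object_threshold(data, thr_type, value):
    """Threshold for object identification: 'S' absolute, 'F' fraction of max, 'P' percentile."""
    if thr_type == "S":
        return float(value)                        # specified absolute value
    if thr_type == "F":
        return value * np.nanmax(data)             # fraction of the field maximum
    if thr_type == "P":
        return np.nanpercentile(data, value)       # e.g. 95 for the 95th percentile
    raise ValueError("thr_type must be 'S', 'F' or 'P'")
```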

rcatool.stats.sal.write_to_disk(ddict, nt, fname, attrs)

rcatool.stats.t_test module

rcatool.stats.t_test.ttest_1d(data1, data2, alpha)

Function for calculating the t-test for two independent samples
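For orientation, the classic two-sample Student's t statistic with pooled variance is sketched below. Whether ttest_1d pools the variances (rather than using Welch's unequal-variance form) and how it applies alpha is not stated above, so both are assumptions; `two_sample_t` is a hypothetical name.

```python
import numpy as np


def two_sample_t(data1, data2):
    """Student's t statistic and degrees of freedom with pooled variance."""
    a = np.asarray(data1, dtype=float)
    b = np.asarray(data2, dtype=float)
    na, nb = a.size, b.size
    # Pooled variance from the unbiased (ddof=1) sample variances
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    t = (a.mean() - b.mean()) / np.sqrt(sp2 * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2


t, dof = two_sample_t([1, 2, 3], [4, 5, 6])
```

The difference is significant at level alpha if |t| exceeds the two-sided critical value of the t distribution with dof degrees of freedom; scipy.stats.ttest_ind(a, b, equal_var=True) returns the same statistic together with its p-value.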

Module contents