tdigest-0.2.1: On-line accumulation of rank-based statistics

Safe HaskellNone
LanguageHaskell2010

Data.TDigest.Tree.Internal

Contents

Description

Internals of TDigest.

Tree implementation is based on Adams’ Trees Revisited by Milan Straka http://fox.ucw.cz/papers/bbtree/bbtree.pdf

Synopsis

Documentation

data TDigest (compression :: Nat) #

TDigest is a tree of centroids.

compression is a 1/δ. The greater the value of compression the less likely value merging will happen.

Constructors

Node !Size !Mean !Weight !Weight !(TDigest compression) !(TDigest compression)

Tree node

Nil

Empty tree

Instances
KnownNat comp => Reducer Double (TDigest comp) #

Both cons and snoc are insert

Instance details

Defined in Data.TDigest.Tree.Internal

Methods

unit :: Double -> TDigest comp #

snoc :: TDigest comp -> Double -> TDigest comp #

cons :: Double -> TDigest comp -> TDigest comp #

Show (TDigest compression) # 
Instance details

Defined in Data.TDigest.Tree.Internal

Methods

showsPrec :: Int -> TDigest compression -> ShowS #

show :: TDigest compression -> String #

showList :: [TDigest compression] -> ShowS #

KnownNat comp => Semigroup (TDigest comp) # 
Instance details

Defined in Data.TDigest.Tree.Internal

Methods

(<>) :: TDigest comp -> TDigest comp -> TDigest comp #

sconcat :: NonEmpty (TDigest comp) -> TDigest comp #

stimes :: Integral b => b -> TDigest comp -> TDigest comp #

KnownNat comp => Monoid (TDigest comp) # 
Instance details

Defined in Data.TDigest.Tree.Internal

Methods

mempty :: TDigest comp #

mappend :: TDigest comp -> TDigest comp -> TDigest comp #

mconcat :: [TDigest comp] -> TDigest comp #

KnownNat comp => Binary (TDigest comp) #

TDigest isn't compressed after de-serialisation, but it can be still smaller.

Instance details

Defined in Data.TDigest.Tree.Internal

Methods

put :: TDigest comp -> Put #

get :: Get (TDigest comp) #

putList :: [TDigest comp] -> Put #

NFData (TDigest comp) #

TDigest has only strict fields.

Instance details

Defined in Data.TDigest.Tree.Internal

Methods

rnf :: TDigest comp -> () #

HasHistogram (TDigest comp) Maybe # 
Instance details

Defined in Data.TDigest.Tree.Internal

But*, the benefit vs. code explosion is not yet worth.

totalWeight :: TDigest comp -> Weight #

Total count of samples.

>>> totalWeight (tdigest [1..100] :: TDigest 5)
100.0

size :: TDigest comp -> Int #

minimumValue :: TDigest comp -> Mean #

Center of left-most centroid. Note: may be different than min element inserted.

>>> minimumValue (tdigest [1..100] :: TDigest 3)
1.0

maximumValue :: TDigest comp -> Mean #

Center of right-most centroid. Note: may be different than max element inserted.

>>> maximumValue (tdigest [1..100] :: TDigest 3)
99.0

combineDigest :: KnownNat comp => TDigest comp -> TDigest comp -> TDigest comp #

insertCentroid :: forall comp. KnownNat comp => Centroid -> TDigest comp -> TDigest comp #

node :: Mean -> Weight -> TDigest comp -> TDigest comp -> TDigest comp #

Constructor which calculates size and total weight.

balanceR :: Mean -> Weight -> TDigest comp -> TDigest comp -> TDigest comp #

Balance after right insertion.

balanceL :: Mean -> Weight -> TDigest comp -> TDigest comp -> TDigest comp #

Balance after left insertion.

node' :: Int -> Mean -> Weight -> Weight -> TDigest comp -> TDigest comp -> TDigest comp #

Alias to Node

singNode :: Mean -> Weight -> TDigest comp #

Create singular node.

combinedCentroid :: Mean -> Weight -> Mean -> Weight -> Centroid #

Add two weighted means together.

threshold #

Arguments

:: Double

total weight

-> Double

quantile

-> Double

compression (1/δ)

-> Double 

Calculate the threshold, i.e. maximum weight of centroid.

compress :: forall comp. KnownNat comp => TDigest comp -> TDigest comp #

Compress TDigest.

Reinsert the centroids in "better" order (in original paper: in random) so they have opportunity to merge.

Compression will happen only if size is both: bigger than relMaxSize * comp and bigger than absMaxSize.

forceCompress :: forall comp. KnownNat comp => TDigest comp -> TDigest comp #

Perform compression, even if current size says it's not necessary.

toMVector #

Arguments

:: KnownNat comp 
=> TDigest comp

t-Digest

-> ST s (MVector s (Centroid, Double))

return also a "space left in the centroid" value for "shuffling"

relMaxSize :: Int #

Relative size parameter. Hard-coded value: 25.

absMaxSize :: Int #

Absolute size parameter. Hard-coded value: 1000.

debugPrint :: TDigest comp -> IO () #

Output the TDigest tree.

validate :: TDigest comp -> Either String (TDigest comp) #

Check various invariants in the TDigest tree.

insert #

Arguments

:: KnownNat comp 
=> Double

element

-> TDigest comp 
-> TDigest comp 

Insert single value into TDigest.

insert' #

Arguments

:: KnownNat comp 
=> Double

element

-> TDigest comp 
-> TDigest comp 

Insert single value, don't compress TDigest even if needed.

For sensibly bounded input, it makes sense to let TDigest grow (it might grow linearly in size), and after that compress it once.

singleton :: KnownNat comp => Double -> TDigest comp #

Make a TDigest of a single data point.

tdigest :: (Foldable f, KnownNat comp) => f Double -> TDigest comp #

Strict foldl' over Foldable structure.

>>> :set -XDataKinds