histc
computes the histogram of a simple series of data
Syntax
Heights = histc(Data)
Heights = histc(Data, nbins)
Heights = histc(Data, -binsWidth)
Heights = histc(Data, binsAlgo)
Heights = histc(Data, binsEdges)
Heights = histc(Data, binsValues [, "discrete"])
Heights = histc(Data, , Options)
Heights = histc(Data, .. , Options)
[Heights, jokers] = histc(Data, ..)
[Heights, jokers, bins] = histc(Data, ..)
[Heights, jokers, bins, inBin] = histc(Data, ..)
Arguments
Input arguments
 Data
vector, matrix or hypermatrix of encoded integers, decimal numbers, complex numbers, polynomials, or texts. Sparse-encoded matrices are accepted.
Data must have at least 2 components. histc([],..) returns [] for every output argument.
Numerical Data may include infinite (Inf) or NaN values. However, NaN values are never binned in the histogram; infinite values can be binned only in categorial histograms.
Textual Data may include empty texts "" or extended-ASCII or UTF-8 characters.
 Binning:
histc allows defining the set of histogram bins in several ways, depending on the Data type and on the need. Two major binning types / histogram modes can be used:
continuous contiguous ranging bins: this is meaningful when Data values are sortable. This is the case for encoded integers, decimal numbers, and texts. histc() continuously bins complex numbers considering only their real parts. Any number with either a real or imaginary part set to %nan, -%inf, or +%inf is excluded from bins and from the histogram. For sparse-encoded Data, the zero value is not taken into account to define the whole binning range.
In this case, bins are defined by their edges. For a given bin, any data value lying between the bin's edges belongs to it.
discrete / categorial binning mode: this can be used for any Data type. It is the only binning mode available for polynomial data. A categorial bin, aka category, is defined by its value: any data belongs to the bin if its value is equal to the bin's value.
Any Data or bin's value being NaN is discarded before computing the categorial histogram.
 (default)
When no binning specification is provided:
For integers, decimal, or complex numbers, the "sqrt" binning algorithm is used. See the binsAlgo description below for more information.
For texts and polynomials, the histogram is computed in "discrete" mode, with as many bins as there are distinct data entries.
 nbins
single positive integer: required number of contiguous bins of equal widths covering the whole range of non-infinite Data values. This binning specification can't be used for text Data.
 binsWidth
Single decimal number > 0 specifying the width of all bins. Its opposite -binsWidth < 0 must be provided as input (to avoid confusion with nbins, which is already a single positive number).
 binsAlgo
Single text word among the ones described below. These automatic binning modes can be used for encoded integers, decimal, or complex numbers. None of them can be used for texts or polynomial data.
For these 3 modes, the whole range of data values is divided into nB bins of equal widths. nB is set according to the chosen algorithm as follows.
"sqrt": nB is set to the square root of the number Nvalid of valid data in Data, in such a way that there are as many bins as the average number of counts in bins. The vertical average relative resolution of the histogram, 1 count / nB counts = 1/nB, is then similar to the horizontal one, binWidth/range = (range/nB)/range = 1/nB.
However, for encoded integers data, if the data range dR = max(Data) - min(Data) + 1 is narrower than nB, nB is then set to dR, thus setting the bins width to 1. Bins are then automatically centered on integer values in the range.
"freediac": Freedman-Diaconis binning criterion: nB = round(strange(Data)/binsWidth) with binsWidth = 2*iqr(Data)*Nvalid^(-1/3).
"sturges": Sturges binning criterion: nB = ceil(1 + log2(Nvalid)).
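As a rough illustration of the three automatic rules, the following Scilab sketch computes the number of bins by hand for the 19-value sample used in the Examples section. It does not call histc(). The rounding applied to the "sqrt" rule is an assumption of this sketch, and the Freedman-Diaconis width is written with its usual negative exponent, Nvalid^(-1/3).

```scilab
// Hand computation of nB for the three automatic binning rules (sketch).
data = [1 1 1 2 2 3 4 4 5 5 5 6 6 7 8 8 9 9 9];
Nvalid = size(data, "*");                 // 19 valid values (no NaN, no Inf)

// "sqrt": rounding to the nearest integer is an assumption of this sketch
nB_sqrt = round(sqrt(Nvalid));            // sqrt(19) = 4.36 -> 4

// "sturges"
nB_sturges = ceil(1 + log2(Nvalid));      // 1 + 4.25 = 5.25 -> 6

// "freediac": bin width from the interquartile range, then number of bins
binsWidth   = 2 * iqr(data) * Nvalid^(-1/3);
nB_freediac = round(strange(data) / binsWidth);

disp([nB_sqrt nB_sturges nB_freediac])
```

The first two values agree with the 4 and 6 bins reported for this sample in the Examples section. The third one depends on how iqr() estimates the quartiles, so no value is quoted here.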
 binsEdges
Vector of values sorted in strictly increasing order (without duplicates). N bins edges define N-1 bins. For encoded integers Data, binsEdges can be decimal numbers. For complex numbers Data, decimal numbers are expected in binsEdges: only the distribution of real parts is considered.
First bin: Any non-infinite Data component belonging to the closed interval [binsEdges(1), binsEdges(2)] belongs to the first bin and is counted in Heights(1).
Next bins #i>1: Any non-infinite Data component belonging to the semi-open interval ]binsEdges(i), binsEdges(i+1)] belongs to the bin #i and is counted in Heights(i).
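The interval convention (first bin closed on both sides, next bins open on the left) can be reproduced by hand. The sketch below is not the histc() implementation: it only applies the rule stated above to the data and edges of the "outsiders" case of the Examples section, with the local variable names be, inBin and Heights chosen for illustration.

```scilab
// Manual binning with the [e1,e2], ]e2,e3], ... convention (sketch).
data = [1 1 1 2 2 3 4 4 5 5 5 6 6 7 8 8 9 9 9];
be   = [2 5.5 7];                 // 3 edges -> 2 bins
nB   = size(be, "*") - 1;

inBin = zeros(data);              // 0 means "out of bins"
inBin((data >= be(1)) & (data <= be(2))) = 1;        // first bin: closed interval
for k = 2:nB
    inBin((data > be(k)) & (data <= be(k+1))) = k;   // next bins: ]be(k), be(k+1)]
end

Heights = zeros(1, nB);
for k = 1:nB
    Heights(k) = length(find(inBin == k));
end
disp(Heights)                     // counts per bin: 8 and 3
```

The value 2 falls in the first bin because its left edge is included, while 5.5 would fall in the first bin and not in the second one.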
Marginal bins:
For numerical and text Data, the first and/or the last binsEdges components may be set to collect and count in marginal bins all non-infinite Data components remaining in the left and right wings of the complete histogram:
Left wing: set binsEdges(1) = -%inf, or binsEdges(1) = "". Data entries such that Data < binsEdges(2) are counted in Heights(1). The actual bins(1) edge is set to min(Data).
Right wing: set binsEdges($) = %inf, or binsEdges($) = "~~" (for texts in standard ASCII, ascii(126)=="~" is the last printable character). Data entries such that Data > binsEdges($-1) are counted in Heights($). The actual bins($) edge is set to max(Data).
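A minimal sketch of how infinite marginal edges translate into actual edges, assuming numerical data: each infinite edge is replaced by the corresponding bound of the non-infinite, non-NaN data. The variable names are illustrative only, and histc() is not called.

```scilab
// Replacing infinite marginal edges by the actual data bounds (sketch).
data = [1 1 1 2 2 3 4 4 5 5 5 6 6 7 8 8 9 9 9];
be   = [-%inf 3 5 7 %inf];        // both wings requested

bins    = be;
regular = data(~isinf(data) & ~isnan(data));   // values eligible for binning
if bins(1) == -%inf then
    bins(1) = min(regular);       // left wing closes on min(Data)
end
if bins($) == %inf then
    bins($) = max(regular);       // right wing closes on max(Data)
end
disp(bins)                        // bins is now [1 3 5 7 9]
```

This matches the bins vector [1 3 5 7 9] reported for the same input in the Examples section.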
 binsValues
For polynomial Data or when the "discrete" option is used, binsValues provides values whose occurrences in Data must be counted.
Duplicates and %nan values are removed from binsValues beforehand. binsValues may include some %inf values. However, for encoded integers Data, any %inf value is removed before processing.
Components of binsValues may be unsorted: the order of binsValues components is kept as is in the Heights output vector.
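The categorial rule can be sketched by hand as follows. This is an illustration, not the histc() code: in particular, keeping the first occurrence of a duplicated bin value is an assumption of this sketch, since the text above only states that duplicates are removed and that the order is kept.

```scilab
// Categorial counting with unsorted bin values (sketch).
data       = ["a" "c" "a" "a" "b" "c"];
binsValues = ["c" "a" "c"];       // unsorted, with one duplicate

// Remove duplicates, keeping the order of first occurrences (assumption)
vals = binsValues(1);
for k = 2:size(binsValues, "*")
    if and(vals <> binsValues(k)) then
        vals = [vals binsValues(k)];
    end
end

// One count per remaining bin value, in the order of vals
Heights = zeros(1, size(vals, "*"));
for k = 1:size(vals, "*")
    Heights(k) = length(find(data == vals(k)));
end
disp(vals)                        // "c" then "a"
disp(Heights)                     // 2 occurrences of "c", 3 of "a"
```

The entry "b" matches no bin value: in histc() terms it would be reported through the jokers output rather than through Heights.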
 Options
Options is either a vector of textual flags, or equivalently a single word of comma-separated concatenated flags, or both. All flags are case-insensitive and can be specified in any order.
Examples: The following options specifications are equivalent:
["discrete" "countsNorm" "normWith: Out Inf"], or
["countsNORM" "NORMwith: inf out" "Discrete"], or
["normWith: INF OUT", "discrete, countsNorm"], or simply
"discrete,countsNorm,normWith: inf out".
 "discrete"
This flag must be used when a discrete / categorial histogram is required. Then, the vector provided as argument #2, with at least 2 components, sets bins values instead of bins edges (the default).
Presently, polynomial Data are always processed in a categorial way. The "discrete" flag then looks useless. However, in a future release, polynomials could become sortable. Using the "discrete" flag does no harm and would avoid future back-compatibility issues.
 Histogram scale:
"counts": This mode is the default one: whatever each bin's width, the height of the bin is equal to the number of Data components falling in it.
"countsNorm": Whatever each bin's width and position, the height of the bin is equal to the relative number of Data components falling in it, over all counted components. Then, unless the "normWith:.." option is used, the cumulated bins heights is equal to 1: sum(Heights)==1.
"density": The area of each bin is equal to the number of Data components falling in it. This scaling mode is meaningless and ignored in case of categorial histogram.
"densityNorm": The area of each bin is equal to the relative number of Data components falling in it. Then, unless the "normWith:.." option is used, the whole area of the histogram is equal to 1. This scaling mode is meaningless and ignored in case of categorial histogram.
 "normWith:.."
When the "countsNorm" or "densityNorm" option is used, it is possible to provide additional information about which components of Data out of bins should be considered for the total number N of counts over which the normalization is computed.
After the "normWith:" option's header, a space-separated list of case-insensitive flags can be provided in any order. If several concurrent flags are provided, only the last specified one is taken into account. Flags that are irrelevant for the given Data type are ignored. Available flags and their relative priorities are described below.
Examples: "normWith: all", "normWith: out inf", "normWith: Nan inf", "normWith: rightout inf", etc.
"all": All components of Data are considered: N = size(Data,"*"). If "all" is used, all other "normWith:.." options are ignored.
"out": All Data out of bins that are not NaN or Inf or "" are counted. If Data is sparse-encoded, zeros remain excluded unless the option "normWith: zeros" is used. If "out" is used, "leftout" and "rightout" options are ignored.
"leftout": As with "out", but only for Data < binsEdges(1). This flag is ignored in discrete/categorial mode.
"rightout": As with "out", but only for Data > binsEdges($). This flag is ignored in discrete/categorial mode.
"NaN": NaN data are counted, in addition to other ones.
"Inf": Inf data are counted, in addition to other ones. In discrete/categorial mode, Inf values are not specific and are processed as other ones. This flag is then ignored.
"zeros": If Data is sparse-encoded, by default only non-zero elements are considered (otherwise, zeros are not specific and are processed as other values). Nevertheless, it's possible to take them into account in the normalization by using this "normWith: zeros" flag. Using this flag does not credit the Heights of the bin covering the zero value (if any).
"empty": "" empty texts in Data are counted, in addition to other ones.
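To make the effect of the "normWith:" flags concrete, the sketch below computes the normalization total N by hand for the "outsiders" case of the Examples section (edges [2 5.5 7], 8 and 3 counts in bins, 3 values left out and 5 right out). The variable names are illustrative and histc() is not called.

```scilab
// Normalization totals N for several "normWith:" choices (sketch).
href      = [8 3];                // counts in the two bins
Nleftout  = 3;                    // data < binsEdges(1)
Nrightout = 5;                    // data > binsEdges($)

Nin        = sum(href);                     // default: binned data only -> 11
N_leftout  = Nin + Nleftout;                // "normWith: leftout"      -> 14
N_rightout = Nin + Nrightout;               // "normWith: rightout"     -> 16
N_out      = Nin + Nleftout + Nrightout;    // "normWith: out"          -> 19

disp(href / Nin)                  // "countsNorm" heights, summing to 1
disp(href / N_out)                // heights normalized over all 19 values
```

With the default total, the heights sum to 1. With any "normWith:" flag, the missing part of the sum is the share of the special values that were added to N.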
Results
 Heights
vector of decimal numbers whose values depend on the histogram scaling mode set with each dedicated option. See the description of the Histogram scale options above. In brief:
"counts" mode: Heights(i) is the number of Data components equal to the bins(i) value (categorial), or belonging to the ]bins(i), bins(i+1)] interval (continuous histogram; the first interval is closed on both sides).
"countsNorm" mode: Heights(i) is as for "counts", divided by the total number N of considered Data components. N is the sum of counts in all bins, plus possibly the number of counts of some special jokers values (%inf, %nan, 0, ""), according to the normWith: option used.
In continuous mode, statistical densities may be returned in the vector Heights instead of integer numbers of counts. Let's call counts(i) the number of counts in the bin #i defined by its edges. Then:
In "density" mode: Heights(i) is set such that the area of the bin is equal to its population: Heights(i) * (binsEdges(i+1) - binsEdges(i)) == counts(i).
In "densityNorm" mode: the "density" results are divided by the total number N of considered counts (see "countsNorm").
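The two density scales follow directly from the counts and the bin widths. The sketch below applies the relations above by hand to bins of unequal widths (edges [2 5.5 7] with 8 and 3 counts, as in the Examples section). Names are illustrative.

```scilab
// From counts to "density" and "densityNorm" heights (sketch).
counts = [8 3];
be     = [2 5.5 7];
widths = diff(be);                        // [3.5 1.5]

density     = counts ./ widths;           // area of each bin == its count
densityNorm = density / sum(counts);      // default N: binned data only

disp(density .* widths)                   // gives back the counts
disp(sum(densityNorm .* widths))          // total area of the histogram
```

With the default normalization the total area equals 1. It becomes smaller than 1 as soon as a "normWith:" flag adds out-of-bins values to N.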
 jokers
Row vector of 1 to 5 decimal numbers indicating the frequency of special values in Data. Let's define the following numbers:
Nnan: number of NaN objects in Data.
Ninf: number of Inf objects in Data.
Nzeros: number of null values in Data.
Nempty: number of empty texts "" in Data.
Nleftout: number of Data components not equal to -%inf nor to "", such that Data < binsEdges(1).
Nrightout: number of Data components not equal to %inf such that Data > binsEdges($).
Nout: number of Data components out of bins, non-infinite, not being NaN, not being empty text "", and for sparse Data: not equal to zero.
In unnormalized "counts" and "density" histogram scales, jokers returns the integer counts of special values.
In normalized "countsNorm" and "densityNorm" histogram scales, jokers returns countsNorm frequencies of special values.
Then, according to the Data type and the continuous or categorial histogram mode, jokers is made of the following:
 Encoded integers:
  continuous: [Nleftout, Nrightout]
  categorial: [Nout]
 Decimal or complex numbers, full or sparse:
  continuous: [Nleftout, Nrightout, Nzeros, Nnan, Ninf]
  categorial: [Nout, 0, Nzeros, Nnan, Ninf]
 Polynomials: [Nout, 0, 0, Nnan, Ninf]
 Texts:
  continuous: [Nleftout, Nrightout, Nempty]
  categorial: [Nout, 0, Nempty]
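For decimal data in continuous mode, the five jokers can be counted by hand as sketched below. The data are the 19-value sample of the Examples section, completed here with three NaN and one infinite value of each sign (an assumption of this sketch). histc() is not called.

```scilab
// Counting the jokers [Nleftout, Nrightout, Nzeros, Nnan, Ninf] by hand (sketch).
core = [1 1 1 2 2 3 4 4 5 5 5 6 6 7 8 8 9 9 9];
data = [%nan, -%inf, core, %nan, %nan, %inf];
be   = [2 4.5 7];

regular   = ~isnan(data) & ~isinf(data);              // neither NaN nor Inf
Nleftout  = length(find(regular & (data < be(1))));   // 1 1 1
Nrightout = length(find(regular & (data > be($))));   // 8 8 9 9 9
Nzeros    = length(find(data == 0));
Nnan      = length(find(isnan(data)));
Ninf      = length(find(isinf(data)));

jokers = [Nleftout Nrightout Nzeros Nnan Ninf];
disp(jokers)                                          // 3 5 0 3 2
```

Infinite values are counted in Ninf only: they are excluded from Nleftout and Nrightout, in line with the definitions above.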
 bins
Row vector of bins edges or of bins values actually used to build the histogram. histc() allows using many semi-automatic or automatic binning modes for which no explicit binsEdges or binsValues vector, or only an incomplete one, is provided as input.
 Continuous binning mode:
The actual binsEdges is returned in bins. It has one more component than Heights (position of the closing edge). For encoded integers, decimal numbers, and complex numbers Data, bins is of decimal type. For text Data, bins is of type text as well.
When marginal bins are required (see the binsEdges description), bins(1) and bins($) return the actual boundaries of the whole binning range used.
 Discrete categorial mode:
For polynomial Data, or for other Data types used with the "discrete" option: if no explicit binsValues vector is provided, histc() sets it to unique(Data)(:)' and returns it as bins.
 inBin
Array of decimal integers having the sizes of Data. If Data is sparse-encoded, inBin is so as well. inBin(i,j) returns the index of the bin which Data(i,j) belongs to. If the value of Data(i,j) is out of bins, inBin(i,j)=0. Otherwise, Data(i,j) increments the Heights(inBin(i,j)) counts by one unit.
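The relation between inBin and Heights can be checked by hand, and inBin can also serve to extract the data of a given bin. The sketch below uses the membership vector quoted in the Examples section for the edges [2 5.5 7]. It does not call histc(), and the variable names are illustrative.

```scilab
// Rebuilding the counts from inBin, then selecting one bin's data (sketch).
data  = [1 1 1 2 2 3 4 4 5 5 5 6 6 7 8 8 9 9 9];
inBin = [0 0 0 1 1 1 1 1 1 1 1 2 2 2 0 0 0 0 0];
nB    = max(inBin);

Heights = zeros(1, nB);
for k = find(inBin > 0)                        // skip out-of-bins entries
    Heights(inBin(k)) = Heights(inBin(k)) + 1; // one unit per binned value
end
disp(Heights)                                  // 8 and 3

inSecondBin = data(inBin == 2);                // values 6 6 7
disp(inSecondBin)
```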
Examples
with decimal numbers:
data = [1 1 1 2 2 3 4 4 5 5 5 6 6 7 8 8 9 9 9];
N = size(data,"*")    // ==19

// Default binning; "sqrt": sqrt(19) => 4. .. => 4 bins
[h, j, b, i] = histc(data)
// expected: h = [6 5 3 5] = href
// expected: b = [1 3 5 7 9]    bins edges
// expected: i = [1 1 1 1 1 1 2 2 2 2 2 3 3 3 4 4 4 4 4]    data memberships to bins
histc(data, , "countsNorm")     // Expected: href/N
histc(data, , "density")        // Expected: href/2, 2 being the bins width
histc(data, , "densityNorm")    // Expected: href/N/2

// Automatic Sturges binning
[h, j, b, i] = histc(data, "sturges")
// h = [5 1 5 2 1 5]
// b = [3 7 11 15 19 23 27] / 3
// i = [1 1 1 1 1 2 3 3 3 3 3 4 4 5 6 6 6 6 6]

// Explicit bins edges, with marginal bins
data = [1 1 1 2 2 3 4 4 5 5 5 6 6 7 8 8 9 9 9];
be = [-%inf 3 5 7 %inf];
[href, j, b, i] = histc(data, be)
// href = [6 5 3 5]  => sum N = 19
// b = [1 3 5 7 9]   // bins completed with actual data bounds
// i = [1 1 1 1 1 1 2 2 2 2 2 3 3 3 4 4 4 4 4]
histc(data, be, "countsNorm")   // href/N
histc(data, be, "density")      // href/2   bins width = 2: see b
histc(data, be, "densityNorm")  // href/N/2

// Explicit bins edges, with outsiders
data = [1 1 1 2 2 3 4 4 5 5 5 6 6 7 8 8 9 9 9];  // still the same
be = [2, 5.5, 7];                                // Bins edges (2 bins)
[href, jref, b, i] = histc(data, be)
// href = [8 3]
// jref = [3 5 0 0 0] = [leftout, rightout, ..]
// i = [0 0 0 1 1 1 1 1 1 1 1 2 2 2 0 0 0 0 0]
histc(data, be, "countsNorm")                       // href / 11
histc(data, be, "countsNorm, normWith: leftout")    // href / 14
histc(data, be, "countsNorm, normWith: rightout")   // href / 16
histc(data, be, "countsNorm, normWith: out")        // href / 19
histc(data, be, "density")                          // href ./ diff(be)
histc(data, be, "densityNorm")                      // href ./ diff(be) / 11
histc(data, be, "densityNorm, normWith: leftout")   // href ./ diff(be) / 14
histc(data, be, "densityNorm, normWith: rightout")  // href ./ diff(be) / 16
histc(data, be, "densityNorm, normWith: all");      // href ./ diff(be) / 19

// With Nan and Inf values
data = [1 1 1 2 2 3 4 4 5 5 5 6 6 7 8 8 9 9 9];
data = [%nan %inf, data, %nan %nan %inf];
N = size(data,"*");     // 24
be = [2, 4.5, 7];       // Set bins edges (2 bins)
[href, jref, b, iref] = histc(data, be)
// href = [5 6]
// jref = [3 5 0 3 2];  // continuous mode: jokers = [leftout, rightout, zeros, nan, inf]
// iref = [0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 2 0 0 0 0 0 0 0 0]    memberships
[h, j] = histc(data, be, "countsNorm")                         // Expected: href/11, jref/11
[h, j] = histc(data, be, "countsNorm, normWith: nan")          // Expected: href/14, jref/14
[h, j] = histc(data, be, "countsNorm, normWith: inf")          // Expected: href/13, jref/13
[h, j] = histc(data, be, "countsNorm, normWith: inf nan")      // Expected: href/16, jref/16
[h, j] = histc(data, be, "countsNorm, normWith: leftout nan")  // Expected: href/17, jref/17
[h, j] = histc(data, be, "countsNorm, normWith: rightout inf") // Expected: href/18, jref/18
[h, j] = histc(data, be, "countsNorm, normWith: out inf")      // Expected: href/21, jref/21
[h, j] = histc(data, be, "countsNorm, normWith: all")          // Expected: href/24, jref/24

// Normalized densities over a bins width = 2.5 (see be)
[h, j] = histc(data, be, "densityNorm")                         // Expected: href/11/2.5, jref/11
[h, j] = histc(data, be, "densityNorm, normWith: nan")          // Expected: href/14/2.5, jref/14
[h, j] = histc(data, be, "densityNorm, normWith: inf")          // Expected: href/13/2.5, jref/13
[h, j] = histc(data, be, "densityNorm, normWith: inf nan")      // Expected: href/16/2.5, jref/16
[h, j] = histc(data, be, "densityNorm, normWith: leftout nan")  // Expected: href/17/2.5, jref/17
[h, j] = histc(data, be, "densityNorm, normWith: rightout inf") // Expected: href/18/2.5, jref/18
[h, j] = histc(data, be, "densityNorm, normWith: all")          // Expected: href/24/2.5, jref/24
with texts:
histc(["a" "c" "a" "a" "b" "c"])    // [3 1 2]

t = [
 "c" "n" "h" "i" "b" "i" "f" "i" "p" "l" "p" "d" "f" "i" "l"
 "b" "m" "e" "o" "o" "f" "p" "o" "h" "f" "h" "h" "c" "k" "o"
 "p" "f" "k" "a" "j" "o" "j" "d" "h" "h" "n" "m" "o" "l" "n"
 "h" "b" "o" "l" "j" "n" "o" "i" "g" "i" "a" "a" "j" "d" "p"
 ];

// With default discrete bins
[h,j,b,i] = histc(t)
// h = [3 3 2 3 1 5 1 7 6 4 2 4 2 4 8 5]
// b = ["a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p"]
iref = [
  3 14  8  9  2  9  6  9 16 12 16  4  6  9 12
  2 13  5 15 15  6 16 15  8  6  8  8  3 11 15
 16  6 11  1 10 15 10  4  8  8 14 13 15 12 14
  8  2 15 12 10 14 15  9  7  9  1  1 10  4 16
 ];

// With given discrete bins WITHOUT "" bins
t2 = t;
t2([7 13 19 26 32 39 43]) = "";
// --> t2 =
//  c n h   b i f i p l p d f i l
//  b m e o o f   o h f h h c k o
//  p   k a   o j d h     m o l n
//  h b o l j n o   g i a a j d p
//
// b = '' a b c d e f g h i j k l m n o p
// h =  7 3 3 2 3 1 4 1 6 4 3 2 4 2 3 8 4
[h, j, b, i] = histc(t2, ["a" "e" "i" "o"], "discrete")
// h = [3 1 4 8];  N = 16
// j = [37 0 7] = [out, 0, #""]
// i = [                            // memberships
//   0 0 0 0 0 3 0 3 0 0 0 0 0 3 0
//   0 0 2 4 4 0 0 4 0 0 0 0 0 0 4
//   0 0 0 1 0 4 0 0 0 0 0 0 4 0 0
//   0 0 4 0 0 0 4 0 0 3 1 1 0 0 0
//  ];

// With continuous and marginal bins: "" <=> -inf, "~~" <=> Inf (regular ascii)
[h,j,b,i] = histc(t, ["" "c" "e" "g" "i" "k" "m" "~~"])
// h = [8 4 6 13 6 6 17]
// j = [0 0 0]
// i = [                            // memberships
//   1 7 4 4 1 4 3 4 7 6 7 2 3 4 6
//   1 6 2 7 7 3 7 7 4 3 4 4 1 5 7
//   7 3 5 1 5 7 5 2 4 4 7 6 7 6 7
//   4 1 7 6 5 7 7 4 3 4 1 1 5 2 7
//  ];

// Continuous bins. Data WITH ""
// t2: same matrix as above, with
// b = '' a b c d e f g h i j k l m n o p
// h =  7 3 3 2 3 1 4 1 6 4 3 2 4 2 3 8 4
binsEdges = ["e" "f" "g" "h" "i" "j"];
[href, jref, b, i] = histc(t2, binsEdges)
// href = [5 1 6 4 3];  N = sum(href) = 19
// jref = [11 23 7];    [leftout rightout ""]
[h,j,b,i] = histc(t2, binsEdges, "countsNorm,normWith: leftout")
// h = href / (N+jref(1)),  j = jref / (N+jref(1))
[h,j,b,i] = histc(t2, binsEdges, "countsNorm,normWith: rightout")
// h = href / (N+jref(2)),  j = jref / (N+jref(2))
[h,j,b,i] = histc(t2, binsEdges, "countsNorm,normWith: out");
// h = href / sum([N jref(1:2)]),  j = jref / sum([N jref(1:2)])
[h,j,b,i] = histc(t2, binsEdges, "countsNorm,normWith: empty")
// h = href / (N+jref(3)),  j = jref / (N+jref(3))
[h,j,b,i] = histc(t2, binsEdges, "countsNorm,normWith: out empty")
// h = href / sum([N jref]),  j = jref / sum([N jref])
[h,j,b,i] = histc(t2, binsEdges, "countsNorm,normWith: all")
// h = href / sum([N jref]),  j = jref / sum([N jref])
with polynomials:
histc([%z 2+%z %z])                                       // [2 1]
histc([%z 2+%z %z],, "countsnorm")                        // [2 1] / 3
histc([%z 2+%z %z %nan],, "countsnorm")                   // [2 1] / 3
histc([%z 2+%z %z %nan],, "countsnorm, normWith: Nan")    // [2 1] / 4

// Data order is kept:
histc([2+%z %z %z ]) == [1 2]
See also
 histplot — plot a histogram
 bar3d — 3D representation of a histogram
 bar — bar histogram
 barh — horizontal display of bar histogram
 plot2d2 — 2D plot (step function)
 dsearch — distribute, locate and count elements of a matrix or hypermatrix in given categories
 members — count (and locate) in an array each element or row or column of another array
 grep — find matches of a string in a vector of strings
 strcmp — compare character strings
 isnan — check for "Not a Number" entries
 isinf — tests for infinite elements
History
Version  Description 
5.5.0  histc() introduced 
6.1.0  histc() reforged:
