# dsearch

distribute, locate and count elements of a matrix or hypermatrix in given categories

### Syntax

[i_bin [,counts [,outside]]] = dsearch(X, bins ) [i_bin [,counts [,outside]]] = dsearch(X, bins , pm )

### Arguments

- X
matrix or hypermatrix of reals, encoded integers, or texts: The entries to categorize. Complex numbers and polynomials are not supported.

- bins
row or column vector defining categories, of same type as

`X`

(for encoded integers in`X`

,`bins`

may be decimals).- Discrete case (pm="d"):
`bins`

are distinct values to which`X`

entries must be identified. If`X`

is numeric (reals or encoded integers),`bins`

must be sorted in strictly increasing order. - Continuous case (default, pm="c"):
`bins`

are bounds of contiguous intervals:`I1 = [bins(1), bins(2)]`

,`I2 = (bins(2), bins(3)]`

,...,`In = (bins(n), bins(n+1)]`

. Note that entries from`X`

just equal to bins(1) are included in`I1`

. The values in`bins`

must be in strictly increasing order: bins(1) < bins(2) < ... < bins(n). For text processing, the case-sensitive lexicographic order is considered.

- Discrete case (pm="d"):
- pm
"c" (continuous, default) or "d" (discrete): processing mode. In continuous mode,

`bins`

defines the bounds of contiguous intervals considered as categories. In discrete mode,`bins`

provides the values to which entries from`X`

must be identified.- i_bin
matrix or hypermatrix with same sizes than

`X`

:`i_bin(k)`

is the index of the category to which`X(k)`

belongs. If`X(k)`

belongs to none of the categories,`i_bin(k) = 0`

- counts
- number of X entries in respective bins.
Continuous case (pm="c"): counts(i) elements of

`X`

belong to the interval`Ik`

as defined above (see the`bins`

parameter). Elements of`X`

just equal to bins(1) are included in counts(1).`counts`

has the size of`bins`

- 1Discrete case (pm="d"):

`counts(i)`

elements of`X`

are equal to`bins(i)`

- outside
Total number of X entries belonging to none of the

`bins`

.

### Description

For each `X(i)`

entry, `dsearch`

locates the value `bins(j)`

or the interval `(bins(j), bins(j+1)]`

defined by `bins`

and containing or equal to `X(i)`

. Then it returns `i_bin(i) = j`

or 0 whether no bin equals or contains it. (the first interval includes `bins(1)`

). The population of each bin is returned through `counts`

. The total number of unbinned entries is returned in `outside`

(therefore `outside = sum(bool2s(i_bin==0))`

).

`dsearch(..)`

can be overloaded.
The default pm="c" option can be used to compute the empirical histogram of a function given a dataset.

### Examples

// DISCRETE values of TEXT // ----------------------- i = grand(4,6,"uin",0,7)+97; T = matrix(strsplit(ascii(i),1:length(i)-1), size(i)); T(T=="f") = "#"; T(T=="a") = "AA"; T bins = [ strsplit(ascii(97+(7:-1:0)),1:7)' "AA"] [i_bin, counts, outside] = dsearch(T, bins, "d") // BINNING TEXTS in LEXICOGRAPHIC INTERVALS // ---------------------------------------- // generating a random matrix of text nL = 3; nC = 5; L = 3; s = ascii(grand(1,nL*nC*L,"uin",0,25)+97); T = matrix(strsplit(s, L:L:nL*nC*L-1), nL, nC); // generating random bins bounds L = 2; nC = 6; s = ascii(grand(1,nC*L,"uin",0,25)+97); bins = unique(matrix(strsplit(s, L:L:nC*L-1), 1, nC)) T [i_bin, counts, outside] = dsearch(T, bins)

In the following example, we consider 3 intervals `I1 = [5,11]`

,
`I2 = (11,15]`

and `I3 = (15,20]`

.
We are looking for the location of the entries of `X = [11 13 1 7 5 2 9]`

in these intervals.

[i_bin, counts, outside] = dsearch([11 13 1 7 5 2 9], [5 11 15 20])

Displayed output:

-->[i_bin, counts, outside] = dsearch([11 13 1 7 5 2 9], [5 11 15 20]) outside = 2. counts = 4. 1. 0. i_bin = 1. 2. 0. 1. 1. 0. 1.

Indeed,

X(1)=11 is in the interval I1, hence i_bin(1)=1.

X(2)=13 is in the interval I2, hence i_bin(2)=2.

X(3)=1 belongs to none of defined intervals, hence i_bin(3)=0.

X(4)=7 is in the interval I1, hence i_bin(4)=1.

...

There are four X entries (5, 7, 9 and 11) in I1, hence counts(1)=4.

There is only one X entry (13) in I2, hence counts(2)=1.

There is no X entry in I3, hence counts(3)=0.

There are two X entries (i.e. 1, 2) which belong to none of defined intervals, hence outside=2.

// Numbers in DISCRETE categories (having specific values) // ------------------------------ [i_bin, counts, outside] = dsearch([11 13 1 7 5 2 9], [5 11 15 20],"d" )

displays

-->[i_bin, counts, outside] = dsearch([11 13 1 7 5 2 9], [5 11 15 20], "d" ) outside = 5. counts = 1. 1. 0. 0. i_bin = 2. 0. 0. 0. 1. 0. 0.

Indeed,

X(1)=11 is in the set

`bins`

at position #2, hence i_bin(1)=2.X(2)=13 is not in the set

`bins`

, hence i_bin(2)=0....

X(7)=9 is not in the set

`bins`

, hence i_bin(7)=0.There is only one entry X (i.e. 5) equal to 5, hence counts(1)=1.

There are no entries matching

`bins(4)`

, hence counts(4)=0.There are five X entries (i.e. 13, 1, 7, 2, 9) which are not in the set

`bins`

, hence outside=5.

Numbers in `bins`

must be in increasing order, whatever is the processing mode (continuous or discrete).
If this is not the case, an error is generated:

-->dsearch([11 13 1 7 5 2 9], [2 1]) !--error 999 dsearch : the array s (arg 2) is not well ordered -->dsearch([11 13 1 7 5 2 9], [2 1],"d") !--error 999 dsearch : the array s (arg 2) is not well ordered

### Advanced Examples

In the following example, we compare the empirical histogram of uniform random numbers in [0,1) with the uniform distribution function. To perform this comparison, we use the default search algorithm based on intervals (pm="c"). We generate X as a collection of m=50 000 uniform random numbers in the range [0,1). We consider the n=10 values equally equally spaced values in the [0,1] range and consider the associated intervals. Then we count the number of entries in X which fall in the intervals: this is the empirical histogram of the uniform distribution function. The expectation for counts/m is equal to 1/(n-1).

m = 50000 ; n = 10; X = grand(m, 1, "def"); bins = linspace(0, 1, n)'; [i_bin, counts] = dsearch(X, bins); e = 1/(n-1)*ones(1, n-1); scf() ; plot(bins(1:n-1), counts/m, "bo"); plot(bins(1:n-1), e', "r-"); legend(["Experiment", "Expectation"]); xtitle("Uniform random numbers", "X", "P(X)");

In the following example, we compare the histogram of binomially distributed random numbers with the binomial probability distribution function B(N,p), with N=8 and p=0.5. To perform this comparison, we use the discrete search algorithm based on a set (pm="d").

N = 8 ; p = 0.5; m = 50000; X = grand(m,1,"bin",N,p); bins = (0:N)'; [i_bin, counts] = dsearch(X, bins, "d"); Pexp = counts/m; Pexa = binomial(p,N); scf() ; plot(bins, Pexp, "bo"); plot(bins, Pexa', "r-"); xtitle("Binomial distribution B(8,0.5)","X","P(X)"); legend(["Experiment","Expectation"]);

In the following example, we use piecewise Hermite polynomials to interpolate a dataset.

// define Hermite base functions function y=Ll(t, k, x) // Lagrange left on Ik y=(t-x(k+1))./(x(k)-x(k+1)) endfunction function y=Lr(t, k, x) // Lagrange right on Ik y=(t-x(k))./(x(k+1)-x(k)) endfunction function y=Hl(t, k, x) y=(1-2*(t-x(k))./(x(k)-x(k+1))).*Ll(t,k,x).^2 endfunction function y=Hr(t, k, x) y=(1-2*(t-x(k+1))./(x(k+1)-x(k))).*Lr(t,k,x).^2 endfunction function y=Kl(t, k, x) y=(t-x(k)).*Ll(t,k,x).^2 endfunction function y=Kr(t, k, x) y=(t-x(k+1)).*Lr(t,k,x).^2 endfunction x = [0 ; 0.2 ; 0.35 ; 0.5 ; 0.65 ; 0.8 ; 1]; y = [0 ; 0.1 ;-0.1 ; 0 ; 0.4 ;-0.1 ; 0]; d = [1 ; 0 ; 0 ; 1 ; 0 ; 0 ; -1]; X = linspace(0, 1, 200)'; i_bin = dsearch(X, x); // plot the curve Y = y(i_bin).*Hl(X,i_bin) + y(i_bin+1).*Hr(X,i_bin) + d(i_bin).*Kl(X,i_bin) + d(i_bin+1).*Kr(X,i_bin); scf(); plot(X,Y,"k-"); plot(x,y,"bo") xtitle("Hermite piecewise polynomial"); legend(["Polynomial","Data"]); // NOTE : it can be verified by adding these ones : // YY = interp(X,x,y,d); plot2d(X,YY,3,"000")

### See also

### History

Version | Description |

5.5.0 | Extension to hypermatrices, encoded integers, and text. |

Report an issue | ||

<< Search and sort | Search and sort | find >> |