Scilab Online Help
2024.1.0 - Russian


groupcounts

returns the number of elements for each group

Syntax

g = groupcounts(t, groupvars)
g = groupcounts(t, groupvars, groupbins)

g = groupcounts(t, groupvars, Name, Value)
g = groupcounts(t, groupvars, groupbins, Name, Value)

Arguments

t

table or timeseries object.

groupvars

specifies the variable or variables used to form the groups.

data type expected: vector of strings containing the variable names, or vector of indices giving the positions of those variables in the table/timeseries.

groupbins

indicates how the data are grouped: by data interval or by time interval.

data type expected: vector of doubles containing the data interval, a datetime, a duration, a calendarDuration, or one of the available string values: "year", "month", "day", "hour", "minute", "second", "monthname" and "dayname". Default value: "none".

When groupvars contains several variables of different types, groupbins can be a cell of the same size as groupvars; each element of the cell is then applied to the corresponding variable.

Name, Value (optional)

Name: 'IncludeEmptyGroups', Value: boolean (default value: %f): when %f, only the combinations of groups present in the table t are returned. When %t, the result also contains the empty groups.

Name: 'IncludePercentGroups', Value: boolean (default value: %f): when %t, also returns the percentage of data in each group; the percentages sum to 100.

Name: 'IncludedEdge', Value: 'left' or 'right' (default value: 'left'): this option must be used only if groupbins is specified as a data interval (e.g. groupbins = [0 2 4]). When IncludedEdge is equal to 'left', data are placed in the groups [groupbins(1), groupbins(2)), [groupbins(2), groupbins(3)), ..., [groupbins(n-1), groupbins(n)]; the last edge groupbins(n) is included. When IncludedEdge is equal to 'right', the intervals are [groupbins(1), groupbins(2)], (groupbins(2), groupbins(3)], ..., (groupbins(n-1), groupbins(n)]; in this case, the first edge groupbins(1) is included. The values in groupbins must be in strictly increasing order.

g

table object.

Description

From the selected columns in the input table/timeseries, groupcounts returns a table containing the data combinations found and their number of occurrences.

g = groupcounts(t, groupvars) creates a table where each row corresponds to a unique group of data present in t(:, groupvars). For each group, the number of occurrences is given.

g = groupcounts(t, groupvars, groupbins) extracts data from t(:, groupvars); these data are then grouped according to groupbins, either by time interval (one year, one hour, ...) or by data interval. The "IncludedEdge" option can be added to specify which bounds are included (left or right) when groupbins is a data interval. For example, if groupbins = [0 2 4] and "IncludedEdge" = "left", the intervals created are [0, 2), [2, 4]. If "IncludedEdge" = "right", the intervals are [0, 2], (2, 4].
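The edge rule can be checked by hand. The following is a minimal sketch in plain Scilab, not the implementation of groupcounts: bin_counts is a hypothetical helper written only for this illustration, and the values and edges are those used in the IncludedEdge example of this page.

```scilab
// Hypothetical helper (not part of Scilab): counts the values of v that
// fall in each interval defined by consecutive edges.
function counts = bin_counts(v, edges, includedEdge)
    n = size(edges, "*") - 1;          // number of intervals
    counts = zeros(1, n);
    for k = 1:n
        lo = edges(k);
        hi = edges(k + 1);
        if includedEdge == "left" then
            // [lo, hi), except the last interval, which is closed: [lo, hi]
            inBin = (v >= lo) & ((v < hi) | ((k == n) & (v == hi)));
        else
            // (lo, hi], except the first interval, which is closed: [lo, hi]
            inBin = ((v > lo) | ((k == 1) & (v == lo))) & (v <= hi);
        end
        counts(k) = size(find(inBin), "*");
    end
endfunction

x1    = [-0.5 1.5 -1.5 -0.5 1.5];
edges = [-1.5 -0.5 0.5 1.5];

leftCounts  = bin_counts(x1, edges, "left");   // [-1.5,-0.5) [-0.5,0.5) [0.5,1.5]
rightCounts = bin_counts(x1, edges, "right");  // [-1.5,-0.5] (-0.5,0.5] (0.5,1.5]
disp(leftCounts)    // 1 2 2
disp(rightCounts)   // 3 0 2
```

The value -0.5 sits exactly on an edge, so it moves from the second interval to the first when the included edge changes from left to right.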

g = groupcounts(..., "IncludeEmptyGroups", Value) returns all possible combinations of groups, including the empty ones, if Value is %t. By default, IncludeEmptyGroups is set to %f.
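To see what an empty group is, the combinations can be enumerated by hand. This is a sketch in plain Scilab under simplifying assumptions: it treats each distinct value as its own group (no binning), it does not call groupcounts, and the variable names ux, u2, counts and nonEmpty are chosen for this illustration only.

```scilab
x  = ["a"; "b"; "b"; "c"; "a"];
x2 = [2.5; 3.5; 2.5; 3.5; 2.5];

ux = unique(x);     // distinct values of x:  "a", "b", "c"
u2 = unique(x2);    // distinct values of x2: 2.5, 3.5

// counts(i, j) = number of rows where x == ux(i) and x2 == u2(j)
counts = zeros(size(ux, "*"), size(u2, "*"));
for i = 1:size(ux, "*")
    for j = 1:size(u2, "*")
        counts(i, j) = size(find((x == ux(i)) & (x2 == u2(j))), "*");
    end
end

disp(counts)
// rows: a, b, c; columns: 2.5, 3.5
//   2   0
//   1   1
//   0   1

// Number of combinations actually present in the data
nonEmpty = size(find(counts > 0), "*");   // 4 of the 6 possible combinations
```

Here ("a", 3.5) and ("c", 2.5) never occur: these are the empty groups, which by the description above appear in the result only when IncludeEmptyGroups is %t.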

g = groupcounts(..., "IncludePercentGroups", Value) also returns the percentage of data in each group if Value is %t. By default, IncludePercentGroups is set to %f.
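The counts and percentages can be reproduced by hand for a single grouping variable. This is a minimal sketch in plain Scilab that does not call groupcounts; the names u, counts and percent are chosen for this illustration, and the layout of the table actually returned by groupcounts is not shown here.

```scilab
x = ["a"; "b"; "b"; "c"; "a"];

u = unique(x);                       // distinct groups: "a", "b", "c"
counts = zeros(size(u, "*"), 1);
for i = 1:size(u, "*")
    // number of rows of x belonging to group u(i)
    counts(i) = size(find(x == u(i)), "*");
end

// share of each group in percent; the shares sum to 100
percent = 100 * counts / sum(counts);

disp(counts)    // 2; 2; 1
disp(percent)   // 40; 40; 20
```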

Examples

Group table with groupvars

rand("seed", 0)
x = ["a"; "b"; "b"; "c"; "a"];
x1 = floor(rand(5, 1) * 5) - 1.5;
x2 = -floor(rand(5, 1) * 5) + 0.5;
A = table(x, x1, x2, "VariableNames", ["x", "x1", "x2"])

G = groupcounts(A, "x")
G = groupcounts(A, "x", "IncludePercentGroups", %t)

Group table with groupvars and groupbins

rand("seed", 0)
x = ["a"; "b"; "b"; "c"; "a"];
x1 = floor(rand(5, 1) * 5) - 1.5;
x2 = -floor(rand(5, 1) * 5) + 0.5;
A = table(x, x1, x2, "VariableNames", ["x", "x1", "x2"])

G = groupcounts(A, "x2", [-5 0 5])
G = groupcounts(A, ["x1", "x2"], [-5 0 5])

// groupbins is a cell
G = groupcounts(A, ["x1", "x2"], {[-5 0 5], [-4 -2 0]})

// With IncludeEmptyGroups
G = groupcounts(A, ["x1", "x2"], [-5 0 5], "IncludeEmptyGroups", %t)

rand("seed", 0)
// with duration
timestamp = hours([1 3 2 2 3])';
x = ["a"; "b"; "b"; "c"; "a"];
x1 = floor(rand(5, 1) * 5) - 1.5;
x2 = -floor(rand(5, 1) * 5) + 0.5;
A = timeseries(timestamp, x, x1, x2, "VariableNames", ["timestamp", "x", "x1", "x2"])

G = groupcounts(A, "timestamp", "hour")

G = groupcounts(A, "timestamp", hours(2))

rand("seed", 0)
// with datetime
dt = datetime(2023,[5 3:2:10]', 1);
x = ["a"; "b"; "b"; "c"; "a"];
x1 = floor(rand(5, 1) * 5) - 1.5;
x2 = -floor(rand(5, 1) * 5) + 0.5;
A = timeseries(dt, x, x1, x2, "VariableNames", ["dt", "x", "x1", "x2"])

G = groupcounts(A, "dt", "monthname")

// With IncludeEmptyGroups
G = groupcounts(A, "dt", "monthname", "IncludeEmptyGroups", %t)

// groupbins is a calendarDuration
groupcounts(A, "dt", calmonths(2))

With IncludedEdge

rand("seed", 0)
x = ["a"; "b"; "b"; "c"; "a"];
x1 = floor(rand(5, 1) * 5) - 1.5;
x2 = [2.5; 3.5; 2.5; 3.5; 2.5];
A = table(x, x1, x2, "VariableNames", ["x", "x1", "x2"])

// IncludedEdge is equal to 'left' by default
// intervals created: [-1.5, -0.5), [-0.5, 0.5), [0.5, 1.5] (last right edge included)
// x1 = [-0.5 1.5 -1.5 -0.5 1.5]
// Goal: Find for each value of x1 the interval to which it belongs
// -0.5 in [-0.5 0.5), 1.5 in [0.5, 1.5], -1.5 in [-1.5, -0.5), -0.5 in [-0.5 0.5), 1.5 in [0.5, 1.5]
G = groupcounts(A, "x1", [-1.5 -0.5 0.5 1.5])

// IncludedEdge is equal to 'right'
// intervals created: [-1.5, -0.5], (-0.5, 0.5], (0.5, 1.5] (first left edge included)
// x1 = [-0.5 1.5 -1.5 -0.5 1.5]
// -0.5 in [-1.5, -0.5], 1.5 in (0.5, 1.5], -1.5 in [-1.5, -0.5], -0.5 in [-1.5, -0.5], 1.5 in (0.5, 1.5]
G = groupcounts(A, "x1", [-1.5 -0.5 0.5 1.5], "IncludedEdge", "right")

// groupvars contains ["x", "x2"]
// groupbins = {"none", [2.5 3 3.5]}, "none" will be applied on "x", [2.5, 3, 3.5] on "x2"
// If IncludedEdge is equal to "left"
// x = ["a" "b" "b" "c" "a"] and x2 = [2.5 3.5 2.5 3.5 2.5] => [[2.5, 3) [3, 3.5] [2.5, 3) [3, 3.5] [2.5, 3)]

G = groupcounts(A, ["x", "x2"], {"none", [2.5 3 3.5]}, "IncludedEdge", "left")

// If IncludedEdge is equal to "right"
// x = ["a" "b" "b" "c" "a"] and x2 = [2.5 3.5 2.5 3.5 2.5] => [[2.5, 3] (3, 3.5] [2.5, 3] (3, 3.5] [2.5, 3]]
G = groupcounts(A, ["x", "x2"], {"none", [2.5 3 3.5]}, "IncludedEdge", "right")

See also

  • table — create a table from variables
  • timeseries — create a timeseries - table with time as index
  • varfun — apply a function to each column of the table/timeseries
  • rowfun — apply a function to each row of the table/timeseries
  • groupsummary — create groups in table or timeseries and apply functions to variables within groups
  • pivot — create a pivoted table providing a summary of data.

History

Version    Description
2024.0.0   Introduced in Scilab.

Copyright (c) 2022-2024 (Dassault Systèmes)
Copyright (c) 2017-2022 (ESI Group)
Copyright (c) 2011-2017 (Scilab Enterprises)
Copyright (c) 1989-2012 (INRIA)
Copyright (c) 1989-2007 (ENPC)
with contributors
Last updated: Mon Jun 17 17:55:12 CEST 2024