groupcounts
returns the number of elements for each group
Syntax
g = groupcounts(t, groupvars)
g = groupcounts(t, groupvars, groupbins)
g = groupcounts(t, groupvars, Name, Value)
g = groupcounts(t, groupvars, groupbins, Name, Value)
Arguments
- t
table or timeseries object.
- groupvars
specifies the variable used to form the groups.
data type expected: vector of strings containing the variable names or vector of indices corresponding to the positions of the variable names in the table/timeseries.
- groupbins
indicates how the data is grouped, by data interval or time interval.
data type expected: vector of doubles containing the data interval, datetime, duration, calendarDuration or available string values: "year", "month", "day", "hour", "minute", "second", "monthname" and "dayname". Default value: "none".
When groupvars contains several variables, groupbins can be a cell of the same size as groupvars; each element of the cell is then applied to the corresponding variable.
- Name, Value (optional)
Name: 'IncludeEmptyGroups', Value: boolean (default value: %f): when %f, only the combinations of groups present in the table t are returned. When %t, the result also contains the empty groups.
Name: 'IncludePercentGroups', Value: boolean (default value: %f): when %t, returns the percentage of data in each group; the percentages sum to 100.
Name: 'IncludedEdge', Value: 'left' or 'right' (default value: 'left'): this option must be used only if groupbins is specified as a data interval (e.g. groupbins = [0 2 4]). When IncludedEdge is 'left', data are placed in the groups [groupbins(1), groupbins(2)), [groupbins(2), groupbins(3)), ..., [groupbins(n-1), groupbins(n)]; the last edge groupbins(n) is included. When IncludedEdge is 'right', the intervals are [groupbins(1), groupbins(2)], (groupbins(2), groupbins(3)], ..., (groupbins(n-1), groupbins(n)]; in this case the first edge groupbins(1) is included. The values in groupbins must be in strictly increasing order.
- g
table object.
Description
From the selected columns in the input table/timeseries, groupcounts returns a table containing the data combinations found and their number of occurrences.
g = groupcounts(t, groupvars) creates a table where each row corresponds to a unique group of data present in t(:, groupvars). For each group, the number of occurrences is given.
g = groupcounts(t, groupvars, groupbins) extracts data from t(:, groupvars) and groups them according to groupbins, either by time interval (one year, one hour, ...) or by data interval. The "IncludedEdge" option can be added to specify which bounds are included (left or right) when groupbins is a data interval. For example, if groupbins = [0 2 4] and "IncludedEdge" = "left", the intervals created are [0, 2), [2, 4]. If "IncludedEdge" = "right", the intervals are [0, 2], (2, 4].
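The edge rule described above can be sketched in plain Scilab. The helper bin_index below is hypothetical: it is not part of Scilab and is not how groupcounts is implemented. It only reproduces the stated interval rule for groupbins = [0 2 4]. Returning 0 for values outside all intervals is a choice made for this sketch and says nothing about what groupcounts does with such values.

```scilab
// Hypothetical helper (not a Scilab function): returns, for each value of v,
// the index of the interval it falls into, following the IncludedEdge rule.
function idx = bin_index(v, edges, side)
    n = size(edges, "*");
    idx = zeros(v);                      // 0 means "outside every interval" (sketch-only convention)
    for k = 1:n-1
        lo = edges(k);
        hi = edges(k+1);
        if side == "left" then
            inbin = (v >= lo & v < hi);  // [lo, hi)
            if k == n-1 then
                inbin = inbin | (v == hi);   // last right edge is included
            end
        else
            inbin = (v > lo & v <= hi);  // (lo, hi]
            if k == 1 then
                inbin = inbin | (v == lo);   // first left edge is included
            end
        end
        idx(inbin) = k;
    end
endfunction

v = [0 1 2 3 4];
edges = [0 2 4];
disp(bin_index(v, edges, "left"))   // 1 1 2 2 2 : intervals [0, 2), [2, 4]
disp(bin_index(v, edges, "right"))  // 1 1 1 2 2 : intervals [0, 2], (2, 4]
```

The only value that changes group between the two modes is the interior edge 2: it opens the second interval with "left" and closes the first one with "right".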
g = groupcounts(..., "IncludeEmptyGroups", Value) returns all possible combinations of groups, including the empty ones, if Value is %t. By default, IncludeEmptyGroups is set to %f.
g = groupcounts(..., "IncludePercentGroups", Value) returns the percentage of data in each group if Value is %t. By default, IncludePercentGroups is set to %f.
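As a rough illustration of what the percentage means, the following sketch computes counts and percentages by hand for the column x used in the examples. It relies only on basic Scilab functions and assumes the percentage is 100 * count / total, which is consistent with the statement that the percentages sum to 100. It does not reproduce the layout of the table returned by groupcounts.

```scilab
// Manual count and percentage per group, for illustration only
x = ["a"; "b"; "b"; "c"; "a"];
groups = unique(x);                       // "a", "b", "c"
counts = zeros(size(groups, "*"), 1);
for k = 1:size(groups, "*")
    counts(k) = size(find(x == groups(k)), "*");   // number of rows in group k
end
pct = 100 * counts / sum(counts);         // percentages, summing to 100
disp(counts)   // 2 2 1
disp(pct)      // 40 40 20
```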
Examples
Group table with groupvars
rand("seed", 0)
x = ["a"; "b"; "b"; "c"; "a"];
x1 = floor(rand(5, 1) * 5) - 1.5;
x2 = -floor(rand(5, 1) * 5) + 0.5;
A = table(x, x1, x2, "VariableNames", ["x", "x1", "x2"])

G = groupcounts(A, "x")
G = groupcounts(A, "x", "IncludePercentGroups", %t)
Group table with groupvars and groupbins
rand("seed", 0)
x = ["a"; "b"; "b"; "c"; "a"];
x1 = floor(rand(5, 1) * 5) - 1.5;
x2 = -floor(rand(5, 1) * 5) + 0.5;
A = table(x, x1, x2, "VariableNames", ["x", "x1", "x2"])

G = groupcounts(A, "x2", [-5 0 5])
G = groupcounts(A, ["x1", "x2"], [-5 0 5])

// groupbins is a cell
G = groupcounts(A, ["x1", "x2"], {[-5 0 5], [-4 -2 0]})

// With IncludeEmptyGroups
G = groupcounts(A, ["x1", "x2"], [-5 0 5], "IncludeEmptyGroups", %t)

// with duration
rand("seed", 0)
timestamp = hours([1 3 2 2 3])';
x = ["a"; "b"; "b"; "c"; "a"];
x1 = floor(rand(5, 1) * 5) - 1.5;
x2 = -floor(rand(5, 1) * 5) + 0.5;
A = timeseries(timestamp, x, x1, x2, "VariableNames", ["timestamp", "x", "x1", "x2"])

G = groupcounts(A, "timestamp", "hour")
G = groupcounts(A, "timestamp", hours(2))

// with datetime
rand("seed", 0)
dt = datetime(2023,[5 3:2:10]', 1);
x = ["a"; "b"; "b"; "c"; "a"];
x1 = floor(rand(5, 1) * 5) - 1.5;
x2 = -floor(rand(5, 1) * 5) + 0.5;
A = timeseries(dt, x, x1, x2, "VariableNames", ["dt", "x", "x1", "x2"])

G = groupcounts(A, "dt", "monthname")

// With IncludeEmptyGroups
G = groupcounts(A, "dt", "monthname", "IncludeEmptyGroups", %t)

// groupbins is a calendarDuration
groupcounts(A, "dt", calmonths(2))
With IncludedEdge
rand("seed", 0)
x = ["a"; "b"; "b"; "c"; "a"];
x1 = floor(rand(5, 1) * 5) - 1.5;
x2 = [2.5; 3.5; 2.5; 3.5; 2.5];
A = table(x, x1, x2, "VariableNames", ["x", "x1", "x2"])

// IncludedEdge is equal to 'left' by default
// intervals created: [-1.5, -0.5), [-0.5, 0.5), [0.5, 1.5] (last right edge included)
// x1 = [-0.5 1.5 -1.5 -0.5 1.5]
// Goal: find for each value of x1 the interval to which it belongs
// -0.5 in [-0.5, 0.5), 1.5 in [0.5, 1.5], -1.5 in [-1.5, -0.5), -0.5 in [-0.5, 0.5), 1.5 in [0.5, 1.5]
G = groupcounts(A, "x1", [-1.5 -0.5 0.5 1.5])

// IncludedEdge is equal to 'right'
// intervals created: [-1.5, -0.5], (-0.5, 0.5], (0.5, 1.5] (first left edge included)
// x1 = [-0.5 1.5 -1.5 -0.5 1.5]
// -0.5 in [-1.5, -0.5], 1.5 in (0.5, 1.5], -1.5 in [-1.5, -0.5], -0.5 in [-1.5, -0.5], 1.5 in (0.5, 1.5]
G = groupcounts(A, "x1", [-1.5 -0.5 0.5 1.5], "IncludedEdge", "right")

// groupvars contains ["x", "x2"]
// groupbins = {"none", [2.5 3 3.5]}: "none" is applied to "x", [2.5 3 3.5] to "x2"
// If IncludedEdge is equal to "left"
// x = ["a" "b" "b" "c" "a"] and x2 = [2.5 3.5 2.5 3.5 2.5] => [[2.5, 3) [3, 3.5] [2.5, 3) [3, 3.5] [2.5, 3)]
G = groupcounts(A, ["x", "x2"], {"none", [2.5 3 3.5]}, "IncludedEdge", "left")

// If IncludedEdge is equal to "right"
// x = ["a" "b" "b" "c" "a"] and x2 = [2.5 3.5 2.5 3.5 2.5] => [[2.5, 3] (3, 3.5] [2.5, 3] (3, 3.5] [2.5, 3]]
G = groupcounts(A, ["x", "x2"], {"none", [2.5 3 3.5]}, "IncludedEdge", "right")
See also
- table — create a table from variables
- timeseries — create a timeseries - table with time as index
- varfun — apply a function to each column of the table/timeseries
- rowfun — apply a function to each row of the table/timeseries
- groupsummary — create groups in table or timeseries and apply functions to variables within groups
- pivot — create a pivoted table providing a summary of data.
History
Version | Description |
2024.0.0 | Introduced in Scilab. |