Scilab Website | Contribute with GitLab | Mailing list archives | ATOMS toolboxes
Scilab Online Help
2024.1.0 - Français


groupsummary

create groups in table or timeseries and apply functions to variables within groups

Syntax

g = groupsummary(t, groupvars)
g = groupsummary(t, groupvars, method)
g = groupsummary(t, groupvars, method, inputvar)

g = groupsummary(t, groupvars, groupbins)
g = groupsummary(t, groupvars, groupbins, method)
g = groupsummary(t, groupvars, groupbins, method, inputvar)

g = groupsummary(..., Name, Value)

Arguments

t

table or timeseries object

groupvars

specifies the variable used to form the groups.

data type expected: vector of strings containing the variable names or vector of indices corresponding to the positions of the variable names in the table/timeseries.

groupbins

indicates how the data is grouped, by data interval or time interval.

data type expected: vector of doubles containing the data interval, datetime, duration, calendarDuration or available string among: "year", "month", "day", "hour", "minute", "second", "monthname" and "dayname".

Depending on the type of the variables contained in groupvars, groupbins will be a cell of the same size as groupvars, where each element will be applied to each variable.

method

function to apply with prototype: y = f(x), z = f(x, ..., y)

String, method name. Available values: "sum", "mean", "median", "mode", "var" (variance), "std" (standard deviation), "min", "max", "range", "nummissing", "numunique, "nnz", "all" (all previous computations) and user function.

inputvar

variable names or variable indices, the method will be apply only on this specified variables.

If the method has multiple inputs, the variable names or indices will be contained in a cell: {[x1, .., xN], [y1, ..., yN]} if the method expects two input arguments

Name, Value

Name: 'IncludeEmptyGroups', Value: boolean (default value: %f): returns only the combinations of groups present in the table t. When %t, the result contains also empty group.

Name: 'IncludedEdge', Value: 'left' or 'right' (default value: 'left'): this option must use only if groupbins is specified (i.e groupbins = [0 2 4]). When IncludedEdge is equal to 'left', data will be contained in the groups [groupbins(1), groupbins(2)), [groupbins(2), groupbins(3)), ..., [groupbins(n-1), groupbins(n)]. The last groupbins(n) is included. If IncludedEdge is equal to 'right', the intervals will be: [groupbins(1), groupbins(2)], (groupbins(2), groupbins(3)], ... (groupbins(n-1), groupbins(n)]. In this case, the first groupbins(1) is included. The values in groupbins must be in strictly increasing order.

g

table object.

Description

The groupsummary function provides a convenient way to group variables of the table and apply functions to variables within groups.

g = groupsummary(t, groupvars) creates a table where each row corresponds to a unique group of data present in t(:, groupvars).

g = groupsummary(t, groupvars, method): groupvars specifies the variables used to group data. The method will be applied to each variable of t except those contained in the groupvars.

g = groupsummary(t, groupvars, groupbins, ...) extracts data from t(:, groupvars); these data are then grouped according to groupbins, either group by time interval (one year, one hour, ...) or group by data interval.The "IncludedEdge" option can be added to specify the included bounds (left or right) in the case where groupbins is a data interval. For example, if groupbins = [0 2 4] and "IncludedEdge" = "left", then the intervals created are [0 2), [2 4]. If "IncludeEdge" = "right", the intervals are [0 2], (2, 4].

The IncludeEmptyGroups option allows to return all possible combinations.

Examples

g = groupsummary(t, groupvars)

Countries = ["France"; "France"; "France"; "France"; "Spain"; "Spain"; "Italy"; "Italy"];
Cereals = ["wheat"; "maize"; "sunflower"; "rapeseed"; "wheat"; "maize"; "wheat"; "sunflower"];
Production = [20; 18; 10; 5; 14; 7; 12; 9];
t = table(Countries, Cereals, Production, "VariableNames", ["Countries", "Cereals", "Production"])

g = groupsummary(t, "Countries")

v = groupsummary(t, groupvars, method)

function y=f(x)
    y = x'*x;
endfunction

rand("seed", 0)
x = ["a"; "b"; "b"; "c"; "a"];
x1 = floor(rand(5,1)*5)-1.5;
x2 = -floor(rand(5,1)*5)+0.5;
t = table(x, x1, x2, "VariableNames", ["x", "x1", "x2"])

v = groupsummary(t, "x", f)

v = groupsummary(t, "x", "sum")
v = groupsummary(t, "x", "all")
v = groupsummary(t, "x", {min, max})

v = groupsummary(t, groupvars, method, inputvar)

rand("seed", 0)
x = ["a"; "b"; "b"; "c"; "a"];
x1 = floor(rand(5,1)*5)-1.5;
x2 = -floor(rand(5,1)*5)+0.5;
t = table(x, x1, x2, "VariableNames", ["x", "x1", "x2"])

v = groupsummary(t, "x", "sum", "x1")

// With timeseries
timestamp = hours([1 2 2 3 3])';
ts = timeseries(timestamp, x1, x2, "VariableNames", ["timestamp", "x1", "x2"])

v = groupsummary(ts, "timestamp", sum, "x1")

function z=f(x, y)
    z = x' * y;
endfunction

rand("seed", 0)
x = ["a"; "b"; "b"; "c"; "a"];
x1 = floor(rand(5,1)*5)-1.5;
x2 = -floor(rand(5,1)*5)+0.5;
x3 = [1:5]';
x4 = 2 * x3;
t = table(x, x1, x2, x3, x4, "VariableNames", ["x", "x1", "x2", "x3", "x4"])

v = groupsummary(t, "x", f, {["x1", "x3"], ["x2", "x4"]})

rand("seed", 0)
d = [datetime(2023, 1, 1): caldays(15): datetime(2023, 12, 31)]';
s = size(d, "*");
v1 = floor(rand(s, 1) * 5) + 1;
v2 = strsplit(ascii(floor(rand(s, 1)*3)+65));
v3 = strsplit(ascii(floor(rand(s, 1)*3)+65));
ts = timeseries(d, v1, v2, v3, "VariableNames", ["Time", "value", "string1", "string2"])

function r=findA(x)
    ind = find(x == "A");
    r = sum(ind)
endfunction

function r=findB(x)
    ind = find(x == "B");
    r = sum(ind)
endfunction

function r=findC(x)
    ind = find(x == "C");
    r = sum(ind)
endfunction

g = groupsummary(ts, "Time", "month", findA, "string1")
g = groupsummary(ts, "Time", "month", {findA, findB, findC}, "string1")
g = groupsummary(ts, "Time", "month", {findA, findB, findC}, ["string1", "string2"])

v = groupsummary(t, groupvars, groupbins, method, inputvar)

rand("seed", 0)
x1 = floor(rand(5,1)*5)-1.5;
x2 = -floor(rand(5,1)*5)+0.5;
x3 = [1:5]';
x4 = 2 * x3;
t = table(x1, x2, x3, x4, "VariableNames", ["x1", "x2", "x3", "x4"])

v = groupsummary(t, ["x1", "x2"], {[-2 -0.5 0.5 2], [-4 -2 0 2]}, "sum", "x3")

// With IncludeEmptyGroups
v = groupsummary(t, ["x1", "x2"], {[-2 -0.5 0.5 2], [-4 -2 0 2]}, "sum", "x3", "IncludeEmptyGroups", %t)

With IncludedEdge = "right"

rand("seed", 0)
x1 = floor(rand(5,1)*5)-1.5;
x2 = [2.5; 3.5; 2.5; 3.5; 2.5];
x3 = [1:5]';
t = table(x1, x2, x3, "VariableNames", ["x1", "x2", "x3"])

// By default, IncludedEdge is equal to "left"
// intervals created: [-1.5, -0.5), [-0.5, 0.5), [0.5, 1.5] (last left edge included)
// x1 = [-0.5 1.5 -1.5 -0.5 1.5]
// Goal: Find for each value of x1 the interval to which it belongs
// -0.5 in [-0.5 0.5), 1.5 in [0.5, 1.5], -1.5 in [-1.5, -0.5), -0.5 in [-0.5 0.5), 1.5 in [0.5, 1.5]
// sum is applied on the values of "x3" contained in this groups
v = groupsummary(t, "x1", [-1.5 -0.5 0.5 1.5], "sum", "x3")

// IncludedEdge = "right"
// intervals created for x1: [-1.5, -0.5], (-0.5, 0.5], (0.5, 1.5] (first left edge included)
// and x2: [2.5 3], (3 3.5] (first left edge included)
v = groupsummary(t, ["x1", "x2"], {[-1.5 -0.5 0.5 1.5], [2.5 3 3.5]}, "sum", "x3", "IncludedEdge", "right")

See also

  • table — create a table from variables
  • timeseries — create a timeseries - table with time as index
  • groupcounts — returns the number of elements for each group
  • varfun — apply a function to each column of the table/timeseries
  • rowfun — apply a function to each row of the table/timeseries
  • pivot — create a pivoted table providing a summary of data.

History

VersionDescription
2024.0.0 Introduction in Scilab.
Report an issue
<< groupcounts Timeseries/Table isregular >>

Copyright (c) 2022-2024 (Dassault Systèmes)
Copyright (c) 2017-2022 (ESI Group)
Copyright (c) 2011-2017 (Scilab Enterprises)
Copyright (c) 1989-2012 (INRIA)
Copyright (c) 1989-2007 (ENPC)
with contributors
Last updated:
Mon Jun 17 17:52:33 CEST 2024