Indexing a DataFrame - Maple Programming Help


Indexing a DataFrame

 

Calling Sequence

Parameters

Description

Selecting and Rearranging Columns

Selecting Values with a Boolean DataSeries or DataFrame

Selecting and Rearranging Rows and Columns

Reassigning Columns

Modifying a DataFrame Using a Boolean DataSeries or a Boolean DataFrame

Calling Sequence

df[col]

df[bool]

df[row, colb]

df[col] := value

df[bool] := value

df[row, colb] := value

Parameters

df - a DataFrame object

col - a label, numeric index, range, list, or rtable

bool - a Boolean DataSeries or DataFrame

colb - a label, numeric index, range, list, rtable, or Boolean DataSeries

row - a label, numeric index, range, list, rtable, or Boolean DataSeries

value - value to assign to the entries of the DataFrame specified by the indices

Description

• 

In order to select, rearrange, or reassign entries from a DataFrame object, you can use indexing.

• 

The rules for indexing are very similar to those for indexing a DataSeries object. These rules are explained on the page Indexing a DataSeries.

• 

If you use indexing anywhere other than the left-hand side of an assignment statement, you will obtain a subset or a rearrangement of the data in a DataFrame object. This will never change the DataFrame object itself. If you index a DataFrame on the left-hand side of an assignment statement, this reassigns some entries of the DataFrame.

• 

The explanations below will refer to the following DataFrame object:

df := DataFrame(Matrix(3, 5, (i, j) -> 2*i-j), rows = [a, b, c], columns = [A, B, C, D, E]);

df :=
     A    B    C    D    E
a    1    0   -1   -2   -3
b    3    2    1    0   -1
c    5    4    3    2    1

(1)

Selecting and Rearranging Columns

• 

If you index a DataFrame by a single index that is not a Boolean DataFrame or DataSeries, you select columns. This is in contrast to, for example, a Matrix, where a single index selects rows.

Selecting a Single Column

• 

In order to select a single column from a DataFrame object, you can index the DataFrame with the corresponding column label or a nonzero integer.

• 

Such a column is returned as a DataSeries object. This is the same DataSeries object that is part of the original DataFrame: modifying it later will change the DataFrame.

• 

Indexing with a positive integer i that is at most the number of columns of the DataFrame will return the ith column of the DataFrame.

df[2];

a    0
b    2
c    4

(2)
• 

Indexing with a negative integer -i, where i is at most the number of columns of the DataFrame, will return the ith column from the right.

df[-2];

a   -2
b    0
c    2

(3)
• 

Indexing with a label that is not of one of the forms above will yield the column corresponding to that label.

df[C];

a   -1
b    1
c    3

(4)
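The sharing between a selected column and its DataFrame can be sketched as follows. This is a minimal illustration on a separate scratch DataFrame, here called dfs (a hypothetical name, chosen so that df itself is left unchanged for the later examples). It assumes that a single entry of a DataSeries can be reassigned by label, as described on the page Indexing a DataSeries.

```maple
# A small scratch DataFrame, separate from df (dfs is a hypothetical name)
dfs := DataFrame(Matrix(2, 2, (i, j) -> 10*i + j), rows = [a, b], columns = [A, B]);

# colB is the DataSeries stored inside dfs, not a copy of it
colB := dfs[B];

# Reassign one entry of the DataSeries (assumed to follow the DataSeries indexing rules)
colB[a] := 0;

# Because the column is shared, this entry of dfs is expected to have changed as well
dfs[a, B];
```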

Selecting a Range of Columns

• 

In order to select a contiguous range of columns from a DataFrame object, you can index the DataFrame with a range, for example, df[a .. b].

• 

Such a range of columns is returned as a new DataFrame object. The columns of this new object, however, are the same DataSeries objects that make up the original DataFrame that was indexed. Thus, later modifying these columns will modify the original DataFrame.

• 

If the left-hand side of the range is missing, it will be taken to be the first column. If the right-hand side is missing, it will be taken to be the last column. In particular, if both are missing, as in df[ .. ], you obtain a copy of the full DataFrame.

df[ .. ];

     A    B    C    D    E
a    1    0   -1   -2   -3
b    3    2    1    0   -1
c    5    4    3    2    1

(5)
• 

Otherwise, the left-hand and right-hand sides are each interpreted as a single-column index, following the rules for selecting a single column given above.

df[D .. ];

     D    E
a   -2   -3
b    0   -1
c    2    1

(6)

df[B .. -3];

     B    C
a    0   -1
b    2    1
c    4    3

(7)

Selecting Arbitrary Columns

• 

In order to create a DataFrame with arbitrarily selected and reordered columns, you can index it with a list or Vector (or another 1-dimensional rtable, such as an Array). Each entry of such a list or Vector is interpreted according to rules similar to what is listed above for a single index. Details follow in the next few items.

• 

Such a subset of columns is returned as a new DataFrame object. The columns of this new object, however, are the same DataSeries objects that make up the original DataFrame that was indexed. Thus, later modifying these columns will modify the original DataFrame.

• 

If all entries of col are integers that are in absolute value at most equal to the number of columns of df, then each entry represents a position. The columns at these positions are gathered up into a new DataFrame. Zero entries in col are replaced with columns that are filled with undefined. (These are considered missing values.)

df[[1, 2, 4]];

     A    B    D
a    1    0   -2
b    3    2    0
c    5    4    2

(8)

df[<-3, -2, 2>];

     C    D    B
a   -1   -2    0
b    1    0    2
c    3    2    4

(9)
• 

Otherwise, all entries of col are considered to be labels. Any entry that occurs as a column label in df is replaced by that column. Any other value is replaced with a column of undefined values. Note that a single value that is not an integer, or is not in the appropriate range, changes the interpretation to this mechanism.

df[[B, D, C]];

     B    D    C
a    0   -2   -1
b    2    0    1
c    4    2    3

(10)

df[[D, E, F, 2]];

     D    E          1          2
a   -2   -3  undefined  undefined
b    0   -1  undefined  undefined
c    2    1  undefined  undefined

(11)
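The treatment of zero entries in a list of positions, described above, is not illustrated by the preceding examples. A minimal sketch, assuming df is still in its initial state, might look like this; the output is not shown because the label given to the filler column is not stated on this page.

```maple
# Positions 1 and 2 select columns A and B. The 0 between them is not a
# valid position, so that slot is expected to be a column of undefined values.
df[[1, 0, 2]];
```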

Selecting Values with a Boolean DataSeries or DataFrame

• 

You can select part of a DataFrame by indexing it with a DataSeries or DataFrame of which all the entries are equal to true, false, or FAIL, the three Boolean constants.

• 

It is a little more efficient if the selected datatype for the indexing DataSeries is specified as either truefalse or truefalseFAIL, rather than the default of anything. However, this is not necessary. The same holds for each of the columns of the indexing DataFrame.
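As a minimal sketch of the datatype remark above, the following builds a Boolean DataSeries with an explicit datatype and uses it as an index. The name dsb_typed is hypothetical, and the use of the datatype option of the DataSeries constructor for this purpose is an assumption based on the description above.

```maple
# Hypothetical name; datatype = truefalse declares that only true and false occur
dsb_typed := DataSeries([true, false, true], labels = [a, b, c], datatype = truefalse);

# Selects the rows labeled a and c, just as an untyped Boolean DataSeries would
df[dsb_typed];
```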

Selecting with a Boolean DataSeries

• 

In order to select some of the rows of a DataFrame df, you can use a Boolean DataSeries dsb. This is done by matching the labels of dsb to the row labels of df, and returning a new DataFrame df_result consisting of those rows of df that match a label in dsb, where the corresponding value in dsb is true. All rows occur in the order they were in df.

dsb := DataSeries([true, FAIL, false, true, true], labels = [c, b, 3, a, coconut]);

dsb :=
c          true
b          FAIL
3          false
a          true
coconut    true

(12)

df[dsb];

     A    B    C    D    E
a    1    0   -1   -2   -3
c    5    4    3    2    1

(13)
  

The values for the labels a and c in dsb are true, so these rows are returned. The value for the label coconut is also true, but it does not occur in df, so it is discarded. Note also that, while the label c precedes a in dsb, it's the other way around in df, and that is reflected in the result.

  

Using Boolean DataSeries is particularly useful because you can create them by performing comparison operations on DataSeries. You can also combine Boolean DataSeries by using logical operators. This is illustrated in the following examples.

dsb2 := df[5] <~ 0;

dsb2 :=
a    true
b    true
c    false

(14)

dsb3 := isprime~(df[1]);

dsb3 :=
a    false
b    true
c    true

(15)

dsb4 := dsb2 xor dsb3;

dsb4 :=
a    true
b    false
c    true

(16)

df[dsb4];

     A    B    C    D    E
a    1    0   -1   -2   -3
c    5    4    3    2    1

(17)
  

You can put the previous few computations on a single line, too:

df[df[5] <~ 0 xor isprime~(df[1])];

     A    B    C    D    E
a    1    0   -1   -2   -3
c    5    4    3    2    1

(18)

Selecting with a Boolean DataFrame

• 

If you index a DataFrame df with a Boolean DataFrame dfb, then Maple determines for each column of df if there is a column in dfb with the same label. It then determines for each row of df if there is a row in dfb with the same label. Finally, it creates a new DataFrame df_result with the same rows and columns as df, and for each entry, if there is a row and column with the same label in dfb, it determines if the corresponding value in dfb is true. If this is the case, the entry is copied from df to df_result. If the row label is missing from dfb, or the column label is missing from dfb, or the corresponding value in dfb is false or FAIL, then Maple fills that entry with the number 0.

dfb := DataFrame(Matrix(3, 5, (i, j) -> not (isprime(i) xor isprime(j))), rows=[b,c,d], columns=[A,B,D,E,F]);

dfb :=
       A      B      D      E      F
b   true  false  false   true  false
c  false   true   true  false   true
d  false   true   true  false   true

(19)

df[dfb];

     A    B    C    D    E
a    0    0    0    0    0
b    3    0    0    0   -1
c    0    4    0    2    0

(20)

Selecting and Rearranging Rows and Columns

• 

If you index a DataFrame with two indices, like df[row, colb], the first index will be interpreted as selecting some of the rows and the second as selecting some of the columns.

• 

For both of these indices, you can choose any of the forms discussed above, except for a Boolean DataFrame: selecting a single entry, a range of entries, a subset of entries using Boolean DataSeries objects, or a potentially reordered subset using a list (or Vector or Array or DataSeries).

• 

If both indices select a single entry, then the value corresponding to that entry is returned. If one of the two selects a single entry and the other selects a subset or range, the result is returned as a DataSeries. (This applies to both subsets of columns and subsets of rows.) If both indices select a subset or range, Maple returns a DataFrame.

• 

What follows are a few examples. First, we obtain a single value by using two indices, one selecting a single row and the other a single column. You can use labels or positions.

df[2, B];

2

(21)
• 

Using .. as the first index means the second index selects columns.

df[.., 3];

a   -1
b    1
c    3

(22)

df[3];

a   -1
b    1
c    3

(23)
• 

Using .. as the second index allows for selecting a single row, returned as a DataSeries.

df[3, ..];

A    5
B    4
C    3
D    2
E    1

(24)
• 

If you specify the row using a list, even if it's a single row, you get a DataFrame back, rather than a DataSeries.

df[[3], ..];

     A    B    C    D    E
c    5    4    3    2    1

(25)
• 

You can mix arbitrary selection formats; for example, Boolean DataSeries row selection with list column selection.

df[dsb2, [3, 1, 2]];

     C    A    B
a   -1    1    0
b    1    3    2

(26)
• 

You can also do it the other way around: use a Boolean DataSeries for column selection and an Array for row selection.

dsbc := DataSeries([true, false, FAIL, true, true], 'labels' = [A, F, E, D, C]);

dsbc :=
A    true
F    false
E    FAIL
D    true
C    true

(27)

df[[-1, -3], dsbc];

     A    C    D
c    5    3    2
a    1   -1   -2

(28)

Reassigning Columns

• 

If you use an indexed DataFrame as the left-hand side of an assignment, as in df[col] := value, you can modify some columns in a DataFrame.

• 

The index col determines which columns are overwritten, according to the same rules as for selecting and rearranging columns explained above.

• 

The labels of the DataFrame are never modified.

• 

For each type of indexing other than for modifying a single entry, there is special treatment of value if it is a Vector, an Array, a Matrix, a DataSeries, or a DataFrame. This is described below. Two-dimensional Arrays are treated like Matrices and one-dimensional Arrays are treated like Vectors; for the sake of brevity, they are left out of the discussion below.
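The calling sequence df[row, colb] := value is listed above but not illustrated on this page. The following is a minimal sketch of its simplest use, modifying a single entry. It assumes that a row index and a column index together address one entry on the left-hand side of an assignment just as they do when selecting; dft is a hypothetical scratch DataFrame, used so that df is left untouched.

```maple
# Scratch DataFrame with a hypothetical name, so that df is not modified
dft := DataFrame(Matrix(2, 3, (i, j) -> i + j), rows = [a, b], columns = [A, B, C]);

# Assumed: the row label b and the column label C together address one entry
dft[b, C] := 0;

# Inspect the result; only the entry in row b, column C is expected to differ
dft;
```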

Modifying a single column

• 

Let n be the number of columns of df. This case applies if col is a nonzero integer between -n and n, or a column label of df.

• 

If value is a Matrix or DataFrame and it has a single row or column, then Maple will interpret it as a row or column Vector and process it as explained below. Otherwise, an error is raised.

• 

If value is a Vector or a DataSeries, then it needs to have exactly the same number of elements as there are rows in df. The entries are matched in order. If value is a DataSeries, its labels are ignored.

• 

Otherwise, each entry in the selected column is assigned value.

df[2] := 3;

df[2] := 3

(29)

df;

     A    B    C    D    E
a    1    3   -1   -2   -3
b    3    3    1    0   -1
c    5    3    3    2    1

(30)

df[C] := <2, 4, 5>;

df[C] :=
[ 2 ]
[ 4 ]
[ 5 ]

(31)

df;

     A    B    C    D    E
a    1    3    2   -2   -3
b    3    3    4    0   -1
c    5    3    5    2    1

(32)

df[-2] := Matrix([3, -1, 2]);

df[-2] :=
[ 3  -1  2 ]

(33)

df;

     A    B    C    D    E
a    1    3    2    3   -3
b    3    3    4   -1   -1
c    5    3    5    2    1

(34)

Modifying a range of columns

• 

Let n be the number of columns of df. This case applies if col is of the form a .. b, where a and b are either missing (so that we have .. b, a .. , or .. ) or nonzero integers between -n and n, or labels of df.

• 

If value is a Matrix or DataFrame and it has the same number of rows as df and as many columns as are referenced by the range col, then the entries are matched in order and overwritten. Otherwise, it needs to have a single row or column, and Maple will interpret it as a Vector and process it as explained below. If value is a DataFrame, its labels are ignored.

• 

If value is a Vector or a DataSeries, then it needs to have exactly as many entries as df has rows. Every column in the range is overwritten with these values. They are matched to the rows in order. If value is a DataSeries, its labels are ignored.

• 

Otherwise, each entry in the range is assigned value.

df[D .. -1] := 3;

df[D .. -1] := 3

(35)

df;

     A    B    C    D    E
a    1    3    2    3    3
b    3    3    4    3    3
c    5    3    5    3    3

(36)

df[C .. -2] := <1, 2, -2>;

df[C .. -2] :=
[  1 ]
[  2 ]
[ -2 ]

(37)

df;

     A    B    C    D    E
a    1    3    1    1    3
b    3    3    2    2    3
c    5    3   -2   -2    3

(38)

Modifying arbitrary columns

• 

This case applies if col is a Vector, a list, or an Array.

• 

As in the case of selecting or rearranging entries, explained above, the values in col are interpreted as positions if all of them are integers between -n and n, where n is the number of columns of df. Otherwise, the values in col are interpreted as labels. Both zero positions and missing labels are ignored.

• 

If value is a Matrix or a DataFrame object and it has exactly as many rows as df does, and as many columns as col has entries, then for all i and j, the ith entry of column col[j] of df is overwritten with value[i, j].

  

Otherwise, value needs to have a single row or column, and Maple will interpret it as a Vector and process it as explained below.

• 

If value is a Vector or a DataSeries object, then it must have exactly as many entries as df has rows. For each column that is overwritten, Maple assigns value[i] to the ith entry of that column.

• 

Otherwise, value is assigned to each entry in each column that is overwritten.

df[<C, B>] := <4, 2, 1>;

df[<C, B>] :=
[ 4 ]
[ 2 ]
[ 1 ]

(39)

df;

     A    B    C    D    E
a    1    4    4    1    3
b    3    2    2    2    3
c    5    1    1   -2    3

(40)

df[[2, 3]] := df[[4, 1]];

df[[2, 3]] :=
     D    A
a    1    1
b    2    3
c   -2    5

(41)

df;

     A    B    C    D    E
a    1    1    1    1    3
b    3    2    3    2    3
c    5   -2    5   -2    3

(42)

Modifying a DataFrame Using a Boolean DataSeries or a Boolean DataFrame

• 

This section describes how to use a calling sequence of the form df[bool] := value, where bool is a DataSeries or DataFrame of which all the entries are equal to true, false, or FAIL, the three Boolean constants.

• 

It is a little more efficient if the selected datatype for the indexing DataSeries is specified as either truefalse or truefalseFAIL, rather than the default of anything. However, this is not necessary. The same holds for each of the columns of the indexing DataFrame.

• 

There is special treatment of value if it is a Vector, an Array, a Matrix, a DataSeries, or a DataFrame. This is described below. Two-dimensional Arrays are treated like Matrices and one-dimensional Arrays are treated like Vectors; for the sake of brevity, they are left out of the discussion below.

Modifying a DataFrame using a Boolean DataSeries

• 

This case applies if bool is a Boolean DataSeries object. It selects rows to overwrite based on the index. In all cases, the rows of df that would occur in df[bool] are overwritten and the other rows are not.

• 

Consider the case where value is a Matrix. If it has the same number of columns as df, then for each i, if the ith row is overwritten, it is overwritten with values from the ith row of value. This means that value should typically have at least as many rows as df does (though strictly speaking, it only needs as many rows as the position of the last row that would occur in df[bool]).

  

Otherwise, value needs to have a single row or column, and Maple will interpret it as a Vector and process it as explained below.

• 

Consider the case where value is a Vector. For each i, if the ith row is overwritten, all its entries will be overwritten with value[i]. This means that value should typically have at least as many entries as df has rows (though strictly speaking, it only needs as many entries as the position of the last row that would occur in df[bool]).

• 

Consider the case where value is a DataFrame. In this case, value needs to have the same number of columns as df. If a row is overwritten and the row label does not occur as a row label in value, then it is filled with 0. If the row label does occur in value, then Maple copies the entries of that row, in order, to df. The column labels are ignored.

• 

Finally, consider the case where value is a DataSeries object. In this case, if a row is overwritten and the row label occurs as a label in value, all entries of that row in df are overwritten with the corresponding entry in value. If the row label does not occur, the entries are overwritten with 0.

• 

In all other cases, each entry of df that would occur in df[bool] is overwritten with value.

dsb;

c          true
b          FAIL
3          false
a          true
coconut    true

(43)

df[dsb] := <5, 4, 3>;

df[dsb] :=
[ 5 ]
[ 4 ]
[ 3 ]

(44)

df;

     A    B    C    D    E
a    5    5    5    5    5
b    3    2    3    2    3
c    3    3    3    3    3

(45)

df2 := DataFrame(Matrix(4, 5, (i, j) -> i + 2*j), rows = [a, c, d, b], columns = [C, D, F, A, B]);

df2 :=
     C    D    F    A    B
a    3    5    7    9   11
c    4    6    8   10   12
d    5    7    9   11   13
b    6    8   10   12   14

(46)

dsb2;

a    true
b    true
c    false

(47)

df[dsb2] := df2;

df[dsb2] :=
     C    D    F    A    B
a    3    5    7    9   11
c    4    6    8   10   12
d    5    7    9   11   13
b    6    8   10   12   14

(48)

df;

     A    B    C    D    E
a    3    5    7    9   11
b    6    8   10   12   14
c    3    3    3    3    3

(49)

df[dsb] := 5;

df[dsb] := 5

(50)

df;

     A    B    C    D    E
a    5    5    5    5    5
b    6    8   10   12   14
c    5    5    5    5    5

(51)
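The examples above do not cover the case where value is a DataSeries. The following is a minimal sketch of that case on a separate scratch DataFrame; the names dfv, sel, and val are hypothetical, and the expected result in the comments is derived by hand from the rules stated above rather than taken from a Maple session.

```maple
# Scratch DataFrame (hypothetical name), so that df is left unchanged
dfv := DataFrame(Matrix(3, 2, (i, j) -> i + j), rows = [a, b, c], columns = [A, B]);

# Rows a and c are selected for overwriting
sel := DataSeries([true, false, true], labels = [a, b, c]);

# The replacement values are matched by label; there is no entry for c
val := DataSeries([10, 20], labels = [a, b]);

dfv[sel] := val;

# Expected by the rules above:
#   row a: every entry becomes 10 (label a occurs in val)
#   row b: unchanged (not selected by sel)
#   row c: every entry becomes 0 (label c does not occur in val)
dfv;
```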

Modifying a DataFrame using a Boolean DataFrame

• 

This case applies if bool is a Boolean DataFrame object. It selects entries to overwrite based on the index. In all cases, the entries of df that would be copied into df[bool] are overwritten and the other entries are not.

• 

Consider the case where value is a Matrix. If it has a single row or column, Maple will interpret it as a Vector and process it as explained below. Otherwise, for each i and j, if df[i, j] is overwritten, it is overwritten with value[i, j]. This means that value should typically have at least as many rows and columns as df does (though strictly speaking, it only needs as many rows as the position of the last row, and as many columns as the position of the last column, that would be used to provide entries in df[bool]).

• 

Consider the case where value is a Vector. For each i and j, if df[i, j] is overwritten, it is overwritten with value[i].  This means that value should typically have at least as many entries as df has rows (though strictly speaking, it is only necessary that it has as many entries as the position of the last row that would be used to provide entries in df[bool]).

• 

Consider the case where value is a DataFrame. If an entry is to be overwritten, then if its row and column label do not occur as a row and column label in value, respectively, it is overwritten with 0. If the row and column label do occur, the entry is overwritten with the entry of value that has the corresponding row and column label.

• 

Finally, consider the case where value is a DataSeries object. In this case, if an entry is to be overwritten, then if its row label does not occur as a label in value, it is overwritten with 0. If the row label does occur, the entry is overwritten with the entry of value that has the corresponding label.

• 

In all other cases, each entry of df that would occur in df[bool] is overwritten with value.

df[dfb] := 23;

df[dfb] := 23

(52)

df;

     A    B    C    D    E
a    5    5    5    5    5
b   23    8   10   12   23
c    5   23    5   23    5

(53)

df2;

     C    D    F    A    B
a    3    5    7    9   11
c    4    6    8   10   12
d    5    7    9   11   13
b    6    8   10   12   14

(54)

df[not~ dfb] := df2;

df[not~ dfb] :=
     C    D    F    A    B
a    3    5    7    9   11
c    4    6    8   10   12
d    5    7    9   11   13
b    6    8   10   12   14

(55)
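The effect of this last assignment on df is not displayed. The sketch below shows how to inspect it, together with the entries one would expect to change. These expectations are worked out by hand from the rules for a DataFrame value stated above, assuming that all the assignments on this page were executed in order; they are not a captured Maple output.

```maple
# Display df after the assignment df[not~ dfb] := df2
df;

# Entries selected by not~ dfb that lie inside df, with their expected new values:
#   df[b, B] <- df2[b, B] = 14
#   df[b, D] <- df2[b, D] = 8
#   df[c, A] <- df2[c, A] = 10
#   df[c, E] <- 0   (df2 has no column labeled E)
# Row a has no counterpart in dfb, so it is expected to stay unchanged.
```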

See Also

DataFrame

DataFrame,constructor

DataFrame,Guide

DataSeries