The Calculation Block is used to modify the Colums of a data set. Existing Columns can be discarded or renamed, and new Columns can be added based on calculations or transformations on existing columns.
Block Connection Stubs
Editing output columns
The control for the calculation block allows you to modify which columns exist, with what names, and in what order at the output of the block. New columns that are defined using calculations on columns available from the input (or other calculated columns within the calculation block) can be added.
The following buttons and controls allow for the definition of what columns will exist on the output of the calculation block;
Using functions and expressions in Datamartist
As a general rule, when a function used by Datamartist exists in MS Excel, Datamartist uses the same syntax. In this way, users of Excel should find many of the functions and expressions very familiar.
Expressions are used in Datamartist in a number of blocks, but all follow the same syntax and in most cases use the same expression editor;
Using the expression editor
The expression can simply be typed into the expression pane, or by double clicking on either column names in the column picker, or functions or operators in the function reference pane, the expression will be inserted at the last location of the cursor.
[Colname] Columns are refered too using square brackets. Both columns from the input of the block, and columns that are created within the block can be used. In cases where an output column has the same name as an input column use [Original.ColName] to explicitly refer to the input column.
Datamartist is case insensitive
Datamartist does not take into account upper or lower case when comparing strings and this is also true for column names- COL1 is considered to have the same name as col1 and cOl1 and cOL1.
"string" string constants should be enclosed in quotes within an expression. The & operator is used to combine strings, just as in MS Excel.
Example: [First_Name] & " " & [Last_Name] would return the full name with a space between the first and last name if the block data contained the columns First_Name and Last_Name
Datamartist supports a number of date functions.
Because of its flexible data typing, dates are all treated as date times, and can be operated on by adding and subracting either whole or decimal days.
- 01/01/2007 - 1 = 31/12/2006
- 01/01/2007 + 0.5 = 01/01/2007 12:00:00 PM
- 01/01/2007 + 15 = 16/01/2007
- 01/01/2007 - 10/01/2006 = 92
The date format in the V1.0 version (this will become more configurable in future versions) is based on the Windows default culture setting. I.E. mm/dd/yyyy vs dd/mm/yyyy
Datamartist supports boolean logic using the same method as MS Excel, namely through the use of a series of functions;
- IF( Boolean Expression, Result if True, Result if False)
- AND(Boolean Expression 1,Boolean Expression 2,...)
- OR(Boolean Expression 1,Boolean Expression 2,...)
- NOT(Boolean Expression)
Data type flexibility
The Datamartist expression engine performs data type conversion in a similar fashion to spreadsheet engines, converting if possible rather than throwing an error. This makes it extreemly flexible particularly when dealing with data quality issues. Some examples:
- LEFT("test",3) returns "tes"
- LEFT("test","3") returns "tes" as well, alghough strictly speaking the LEFT function requires an integer.
- LEFT("test","3rd") returns an error
- LEFT("test",LEFT("3rd",1)) returns "tes"
- "What is 3 times 5? It is " & 3*5 returns "What is 3 times 5? It is 15"
- "Hello" + 3 returns an error. Hello cannot be converted to a number.
- "Hello" & 3 returns "Hello3". 3 can be converted to a string.
Data storage flexiblity in regards to data type.
Not only can expressions use data type flexibly, any given column in a datamartist data set can contain data of any data type. A column can hold strings, numbers, dates and boolean values. This is obviously useful when loading data from multiple data sources that might have different data types to start, and then cleaning them. For example, merging two data sets, where in one data set the field [Survey_Response] contains "Yes to survey" "No to survey" and another where the field is a boolean value, can be resolved with a nested IF statement- without any data type errors.