The join block is used to create a join between two tables. It allows joins on multiple columns, and can be configured to provide all of the standard join types. (Inner, outer, left right). To illustrate its operation, we'll use two simple text files:
- Departments.txt - which contains a list of departments in a company
- Employees.txt - which contains a list of employees in a company
Every employee can belong to one department, and the two tables can be joined together on the DEPT_ID field which is present in both. To do the join, drag and drop a join block onto the canvas containing the two import blocks, and connect one to each input of the join block. By default, if a single column name in each table match this column will be configured as the joining column;
Once the block has run, you will notice that the venn diagram to the left will show the number of rows for each of the possible join results. When joining two tables, there are three possible sets of data:
- Rows from table 1 that do not join with any rows in table 2. (In this case, departments that do not have any employees)
- Rows that are the result of a successful join. (Employee rows joined to department rows.)
- Rows from table 2 that do not join with any rows in table 1. (In this case, employees that have a department code that does not match any code in the department table)
Depending on what you are trying to achieve, any one of these sets, or any combination of them may be the desired output for the block. To configure what you want, simply check all the sets that you want included in the output. The venn diagram will show in green which rows will be included.
Additionally, the block has three separate outputs for each of the components, to make it possible to define and output, and additionally use non-joined or joined data sets for other transformations on the canvas.