TIBCO Spotfire：A Comprehensive Primer（Second Edition）

上QQ阅读APP看书，第一时间看更新

Understanding the basic row/column structure of a data table

Before we get started with some examples, a basic understanding of the row/column structure of a Spotfire data table is essential for data analysis and report authoring.

The columns in the dataset represent how the information has been categorized. They exist even if there is no data. Most people these days are familiar with Microsoft Excel. When you start a new spreadsheet, one of the first things I suspect you to do is decide what types of information you are going to add; for example, using First Name, Last Name, and Department in column headings in a simple human resources spreadsheet.

Once you have structured your spreadsheet with column headings, you begin to add the actual information, row by row. Your columns don't usually change in number or description, but your rows grow and shrink in number, and corrections might be made to the information at any time.

One important distinction between Spotfire data tables and spreadsheet worksheets is the way in which the visual layout of the rows in a spreadsheet is represented. You might, for instance, not repeat a department value on a spreadsheet until it changes; you might merge cells to improve the look and feel. You cannot do that with Spotfire data tables. If the department column value for the first three rows is Marketing, then Marketing must be repeated in each row.

The following diagram shows the structures of Excel and Spotfire data together. On the left is how you might represent data in Excel—with merged row headers. On the right is how data must be structured in Spotfire:

If a data cell is not defined in the underlying data, Spotfire regards it as (Empty) and represents this fact within the data table. (Empty) and null are different ways of representing the same concept in Spotfire.

In Spotfire, data is filtered and selected through references to column names. For example, you could filter the Department column to show all the records for Marketing. The answer might be zero rows or several million rows, depending on how many records have Marketing in the Department column.

If you want to make a calculation in a data table, such as sale amount minus cost amount, Spotfire will apply the calculation across all rows. It's not possible (like in Excel) to have a per-row calculation or formula. However, it is possible to group values and perform calculations between rows using Spotfire's OVER calculation functions, which are explained in detail in Chapter 7, Source Data is Never Enough.

The fact that the calculation is applied over the entire column is a key distinction between data tables and spreadsheets. It might seem like a limitation in a data table, but the discipline of that structural integrity ultimately allows you to create very powerful analyses. There are ways to change the structure of a data table into new forms to support a particular calculation requirement. These are also covered in Chapter 7, Source Data is Never Enough.

Reports and visualizations are built around column names. For example, you might want to create a simple sales by region visualization. What you are doing is putting the sales column against the region column and asking Spotfire's visualization engine to populate the chart or graph with whatever row values are present beneath those columns at any given time.

This section was a very brief introduction to the way that data is structured in Spotfire. We have covered how data is stored in rows and columns and that each row and column must have a value so that Spotfire can work with it. As a reminder, we cannot reference individual "cells" within a Spotfire data table, at least not in the way that we can in Excel and other similar tools.