Many business decisions require analysis of data from transactional, operational, and other data sources. For example, an SEO analyst may want to compare two versions of a web page to determine which leads to more conversions (commonly referred to as A/B testing or split testing). The analyst may wish to explore the relationship between page version A, page version B, and weekday to determine whether any combination of these results in greater conversions. To perform this analysis, the analyst must aggregate sales and website data, then classify it into two categories: dimensions and measures. The analyst must then calculate and compare summary measures for each combination of dimension values to determine whether a statistically significant difference exists.
The above is a very simple example of multidimensional analysis. It is also an example of analytics-oriented processing, a term used to describe activities such as gathering data, classifying and summarizing it, performing calculations on it and, finally, analyzing and presenting the results.
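To make the "statistically significant difference" step concrete, here is a minimal sketch of one common way to compare two conversion rates: a two-proportion z-test. This is not something Flow prescribes; the function name and the visit and conversion totals are made up for illustration.

```python
import math

def two_proportion_z_test(conv_a, visits_a, conv_b, visits_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conv_a / visits_a
    rate_b = conv_b / visits_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conv_a + conv_b) / (visits_a + visits_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical totals for page versions A and B (illustration only).
z, p = two_proportion_z_test(conv_a=120, visits_a=2400, conv_b=156, visits_b=2400)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (conventionally below 0.05) suggests the difference between the two versions is unlikely to be due to chance alone.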
Flow is a platform for building and running workflows that automate analytics-oriented processes. The Flow platform provides end-to-end analytics-oriented processing via three primary components: datasets, hypercubes, and workflows (see sidebar).
In this post, I'll provide a simple example of multidimensional analysis using Flow. I'll show a step-by-step walkthrough that loads a set of data, builds a hypercube, performs summary calculations and, finally, produces several pivot table results using the summary calculations. This example will also provide a basic illustration of analytics-oriented processing.
Note - The data used in this example is one of the sample datasets available to registered users of Flow. To try this example, just add the sample data to your Flow account then follow the steps outlined below.
Using Flow to Perform Multidimensional Analysis
Some Definitions - Datasets, Measures, Dimensions, and Hypercubes
A dataset is simply a collection of related data. Analytical datasets often consist of one or more columns organized into one or more rows. A column represents a named set of data elements of a particular data type (for example, string, integer, or date). Each column element holds a value allowed by the column's data type, and each row contains exactly one value for each column.
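As a rough illustration of that definition (and not of how Flow stores datasets internally), a dataset can be sketched in Python as a schema of typed columns plus a list of rows. The column names and values below are hypothetical.

```python
from datetime import date

# Column name -> data type (hypothetical schema, for illustration only).
schema = {"Name": str, "Visits": int, "Recorded": date}

# Each row holds exactly one value per column.
rows = [
    {"Name": "home", "Visits": 120, "Recorded": date(2017, 1, 2)},
    {"Name": "pricing", "Visits": 45, "Recorded": date(2017, 1, 2)},
]

def validate(rows, schema):
    """Check that every row has one value of the right type for each column."""
    for row in rows:
        if row.keys() != schema.keys():
            return False
        if not all(isinstance(row[col], typ) for col, typ in schema.items()):
            return False
    return True

def column(rows, name):
    """Return all values of one named column, in row order."""
    return [row[name] for row in rows]

print(validate(rows, schema), column(rows, "Visits"))
```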
Measures and Dimensions
A measure, also referred to as a fact, represents an observed, derived, or recorded quantity such as a count, distance, or ratio.
A dimension is a property that can be used to group and classify measures. Dimensions are often date/time, geographic, or demographic values; for example, day, year, city, state, or gender. When naming dimensions, it is good practice to:
- Use meaningful singular nouns or present tense verbs
- Use names that are descriptive and self-documenting
- Use names that are easily distinguishable
Performing multidimensional analysis directly on two-dimensional datasets quickly becomes complex, because every combination of dimension values has to be tracked and aggregated. Flow provides a hypercube data structure that handles the underlying complexity of organizing and managing measures and dimensions. Hypercubes also facilitate and optimize any computational operations applied to measures.
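To give a feel for what a hypercube does, here is a toy sketch in Python. It is only an assumption about the general idea, not Flow's actual implementation: each cell is keyed by one member value per dimension and holds the summed measures. The class name, dimension names, and rows are hypothetical.

```python
from collections import defaultdict

class Hypercube:
    """Toy cube: cells keyed by one member per dimension, holding summed measures."""

    def __init__(self, dimensions, measures):
        self.dimensions = list(dimensions)
        self.measures = list(measures)
        # Each new cell starts with every measure at zero.
        self.cells = defaultdict(lambda: dict.fromkeys(self.measures, 0))

    def add(self, row):
        """Add one flat row's measures into the cell for its dimension members."""
        key = tuple(row[d] for d in self.dimensions)
        for m in self.measures:
            self.cells[key][m] += row[m]

    def total(self, measure, **fixed):
        """Sum a measure over all cells that match the fixed dimension members."""
        return sum(
            cell[measure]
            for key, cell in self.cells.items()
            if all(key[self.dimensions.index(d)] == v for d, v in fixed.items())
        )

# Hypothetical rows, for illustration only.
cube = Hypercube(["Weekday", "Page"], ["Visit", "Conversion"])
for row in [
    {"Weekday": "Mon", "Page": "A", "Visit": 100, "Conversion": 5},
    {"Weekday": "Mon", "Page": "B", "Visit": 110, "Conversion": 9},
    {"Weekday": "Tue", "Page": "A", "Visit": 90, "Conversion": 4},
    {"Weekday": "Mon", "Page": "A", "Visit": 20, "Conversion": 1},
]:
    cube.add(row)

print(cube.total("Visit", Page="A"))
```

Fixing one or more dimensions (here `Page="A"`) and summing over the rest is the basic "slice" operation that summary calculations build on.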
The Sample Data
Our sample data is intended to represent a simple (and admittedly contrived) set of A/B test data. It contains five columns: Visit, Conversion, Day, Year, and Site. Two of these are measures (Visit and Conversion) and three are dimensions (Day, Year, and Site). Each dimension has a set of unique member values; for example, Year has the member values 2016 and 2017, and Site has the member values A and B.
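For readers who think in code, the sketch below shows how rows of this shape could be summarized into a simple Year-by-Site pivot of conversion rates in plain Python. The rows are hypothetical stand-ins, not the actual Flow sample data, and `pivot_rate` is a name I made up for this illustration.

```python
from collections import defaultdict

# Hypothetical rows in the shape described above; NOT the actual Flow sample data.
rows = [
    {"Day": "Monday", "Year": 2016, "Site": "A", "Visit": 200, "Conversion": 10},
    {"Day": "Monday", "Year": 2016, "Site": "B", "Visit": 200, "Conversion": 16},
    {"Day": "Tuesday", "Year": 2017, "Site": "A", "Visit": 100, "Conversion": 4},
    {"Day": "Tuesday", "Year": 2017, "Site": "B", "Visit": 100, "Conversion": 7},
    {"Day": "Monday", "Year": 2017, "Site": "A", "Visit": 100, "Conversion": 6},
]

def pivot_rate(rows, row_dim, col_dim):
    """Conversion rate (Conversion / Visit) for each row_dim x col_dim cell."""
    totals = defaultdict(lambda: [0, 0])  # (row, col) -> [visits, conversions]
    for r in rows:
        cell = totals[(r[row_dim], r[col_dim])]
        cell[0] += r["Visit"]
        cell[1] += r["Conversion"]
    # Assumes every cell has at least one visit, as in the rows above.
    return {key: conv / visits for key, (visits, conv) in totals.items()}

table = pivot_rate(rows, "Year", "Site")
for (year, site), rate in sorted(table.items()):
    print(year, site, f"{rate:.1%}")
```

Swapping `"Year"` for `"Day"` gives a different pivot over the same measures, which is the kind of regrouping the dimensions make possible.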
Primary Components of Flow
Dataset - A dataset is a self-describing, generic data container designed to hold data from any external source.
Hypercube - A data container designed to hold, and provide optimized access to, multidimensional data.
Workflow - A user-defined series of steps that perform analytics-oriented processing.
Note - The term workflow, as used in the context of Flow, requires a bit more explanation. In Flow, a workflow is a high-level functional language optimized for analytics-oriented processing. Workflow actions, functions, and expressions provide a level of abstraction beyond lower-level languages such as R, SQL, or Python. This layer of abstraction enables non-technical end users to perform analytics-oriented processing via a series of user-configured actions.
Sample A/B Test Data