5/29/2017 - A Basic Introduction to Multidimensional Analysis Using Flow

Overview


Many business decisions require analysis of data from transactional, operational, and other data sources. For example, an SEO analyst may want to compare two versions of a web page to determine which leads to more conversions (commonly referred to as A/B testing or split testing). The analyst may wish to explore the relationship between page version A, page version B, and weekday to determine if any combination of these result in greater conversions. To perform this analysis, the analyst must aggregate sales and website data then group it into two categories: dimensions and measures. The analyst must then calculate and compare summary measures for each combination of these to determine if a statistically significant difference exists.

The above is very simple example of multidimensional analysis. It is also an example of analytics-oriented processing, a term used to describe analytics oriented activities such as gathering data, classifying and summarizing it, performing calculations on it and, finally, analyzing and presenting the results.

Flow is a platform for building and running workflows that automate analytics-oriented processes. The Flow platform provides end-to-end analytics-oriented processing via three primary components: datasets, hypercubes, and workflows (see sidebar).

In this post, I'll provide a simple example of multidimensional analysis using Flow. I'll show a step-by-step walk through that loads a set of data, builds a hypercube, performs summary calculations and, finally, produces several pivot table results using the summary calculations. This example will also provide a basic illustration of analytics-oriented processing.

Note - The data used in this example is one of the sample datasets available to registered users of Flow. To try this example, just add the sample data to your Flow account then follow the steps outlined below.

Using Flow to Perform Multi-Dimensional Analysis

Some Definitions - Datasets, Measures, Dimensions, and Hypercubes

Datasets

A dataset is simply a collection of related data. Analytical datasets often consist of one or more columns grouped into one or more rows. A column represents a named subset of data elements of a particular data type (for example string, integer, or date). Each column element contains a data point value allowed by its data type, and each row contains one of each column.

Measures and Dimensions

A measure, also referred to as a fact, represents an observed, derived, or recorded quantity such as a count, distance, or ratio.

A dimension is a property that can be used to group and classify measures. Dimensions are often date/time, geographic, or demographic values. For example, day, year, city, state, and male or female. When naming dimensions, it is good practice to:

  • Use meaningful singular nouns or present tense verbs
  • Use names that are descriptive and self-documenting
  • Use names that are easily distinguishable

Hypercubes

Performing multidimensional analysis on two-dimensional data sets is exceedingly complex. Flow provides a hypercube data structure that handles the underlying complexity of organizing and managing measures and dimensions. Hypercubes also facilitate and optimize any computational operations applied to measures.

The Sample Data

Our sample data is intended to represent a simple (and admittedly contrived) set of A/B test data. It contains five data points: Visit, Conversion, Day, Year, and Site. There are two measures: Visit and Conversion, and three dimensions: Day, Year, and Site. Also, each of our dimensions has a set of unique members values; for example, Year has the member values 2016 and 2017, and Site has member values A and B.

Primary Components of Flow

Dataset - A dataset is a self-describing, generic data container designed to hold data from any external source.

Hypercube - A data container designed to hold, and provide optimized access to, multidimensional data.

Workflow - A user defined series of steps that perform analytics-oriented processing.

Note - The term workflow, as used in the context of Flow, requires a bit more explanation. In Flow, a workflow is a high-level functional language optimized for analytics-oriented processing. Workflow actions, functions, and expressions provide a level of abstraction beyond lower level languages such as R, SQL or Python. This layer of abstraction enables non-technical end users to perform analytics-oriented processing via a series of user- configured actions.

Sample A/B Test Data

MeasuresDimensions
VisitConversionDayYearSite
350 233 1 2016 A
91 57 1 2016 B
1612822016A
938 426 2 2016 B
542 122 3 2016 A
18918132016B
348 5 4 2016 A
69324442016B
672 624 5 2016 A
954 193 5 2016 B
71437162016A
705 144 6 2016 B
886 375 7 2016 A
423572016B
376 351 1 2017 A
1061512017B
124 26 2 2017 A
222 76 2 2017 B
1528032017A
442 56 3 2017 B
76411442017A
296 62 4 2017 B
34734452017A
387 153 5 2017 B
821 619 6 2017 A
56847862017B
568 422 7 2017 A
3257872017B
12,7835,912

Getting Started With Flow

If you already have an account click here to login, otherwise, you can sign up for an account here.

Walk-Through: A Four-Step Analytics Workflow


In this hands-on example, I'll walk through the development of a four step workflow that will do the following:

  • Load the sample dataset.
  • Build a hypercube from the sample dataset.
  • Apply hypercube expressions to compute sums, averages, and also find the maximum number of Conversions for any given Site, Day, or Year or combination there of.
  • Create and save a pivot table report result.

To get started, you'll need to add the sample data to your Flow user account.

Add Sample Data


After logging into Flow, you'll first need to add the "Sample AB Test Data" dataset to your Flow account. Click on the down arrow down arrow button in the top menu bar then click on the sample data sample data icon to open the Add Sample Dataset dialog, as follows:

sample dataset

Click the ADD link next to the Hypercube Sample Data entry. The sample data will be added to your account.

Open Cloud Connect


Flow provides an integrated development environment for building analytics-oriented workflows called Cloud Connect.

Cloud Connect is a ClickOnce deployed desktop application for building and managing workflows. To launch Cloud Connect, open the Workflows menu in the left sidebar of the Flow portal then click the Launch Cloud Connect, see below:

workflow menu

Depending upon your browser, you may need to add an extension to enable ClickOnce and launch Cloud Connect. Click on the View Cloud Connect Requirements link beneath the Launch Cloud Connect button to learn the requirements for your particular browser.

If this is your first time using Cloud Connect, it may require a minute or so to install. Once the installation has completed, the application will open automatically.

Add a New Workflow


To get started, we'll need to add a new workflow. From the top menu, click the Add Workflow button. The Add New Workflow dialog will appear as shown below:

add workflow dialog

Enter the information as shown above then click OK. The new workflow will appear in the Workflowslist on the left-side of your screen. To open the workflow for editing, simply double click its name. Alternatively, you can right-click the name and then select Open from the context menu.

Add Workflow Steps


From the Workflows list, open the workflow for editing by double-clicking its name or right-clicking the name then selecting Open.

open workflow

Step 1 - Add a Load Data Workflow Step

Add a Load Dataset step by clicking on the actions menu menu and selectingLoad Dataset from the drop-down menu, the Load Dataset Dialog will display as follows:

load dataset

Click OK to add the Load Dataset workflow step.

The Cloud Connect development environment provides a run-time engine that allows users to run workflow steps and view results. To run the workflow now. Click the run button run button in Workflow Menu, the following prompt will appear:

run all dialog

Click Yes or press Enter to run the workflow. When the workflow completes, the Sample AB Test Data dataset will display under the Working Data tab, see below:

ab sample data display

Step 2 - Add a Build Hypercube Step

With our working data loaded into memory, we can now add a Build Hypercube step. From the Actions drop-down menu, select Hypercubes then Build Hypercube. The Build Hypercube dialog box will appear as shown below.

build hypercube dialog

Select Hypercube Dimensions

  • From the Working Data drop down list, select the Sample AB Test Data dataset,a check box list of possible dimensions will be displayed.
  • Add dimensions for Site, Visit, and Year by checking the box next to each datapoint name.
  • Enter AB Test Cube in the Hypercube Name text box.
  • Click OK to add the workflow step.

After clicking OK, a Build Hypercube workflow step will be added to the workflow Action List.

Run the workflow again from the start by clicking the run button button in Workflow Menu bar and answering Yes when prompted. The AB Test Cube will be created and displayed under the Working Data tab, as shown below:

ab test cube

Note - The Working Data tab displays a flattened, or two-dimensional, view of a hypercube to make it easier to read.

Step 3 - Add a Hypercube Expression Evaluation Workflow Step

At this point, we've loaded our Sample AB Test Data dataset (a dataset loaded into Flow is called working data) and built a hypercube from it. We now want to compute summary values for specified measures in the AB Test Cube. We can specify measures using Hypercube Expressions. Flow provides an extensive library of functions and operations for computing summary values across hypercube dimensions. We are going to add an Expression Evaluation step that will sum the Visit and Conversion measures across each of their dimensions.

From the Actions drop-down menu, select Expression Builder, the Expression Builder dialog will load, see below:

expression builder

Steps required to add an expression. Expression Example: Max of Conversions

  • Select the AB Test Cube collection from drop-down list in Expression Builder's Working Data box.
  • Check the Hypercube check box next to the Collection drop-down list.
  • In the Select Expression and Operation Type box, Select the Stat expression type from the Expression Type drop-down list.
  • Next, select the ArithmeticMean statistical operation from the Operation drop-down list.
  • Within the Build Expression box, under Input 1, use the Datapoint select list to select the Conversion datapoint.
  • In the Result box, enter a name for the expression result, in this case: Max Conversion
  • Click the Add Expression button to add the expression to the list of expressions that will be evaluated when this workflow step runs.

When you are done adding expressions, click OK. The Expression Evaluation workflow step will be added to the workflow steps list.

Run the workflow again and view the results. Click the run button button in the workflow menu bar and answer Yes to the prompt. When the workflow completes, the working data tab will display a flattened view of hypercube which includes our expression evaluation results, see below:

hypercube data

Here are the workflow steps we have added thus far:

workflow step display

In just three steps we've loaded our data, created a hypercube, and computed summary values for a set of measures.

We now want to view and/or share the results of our effort. To accomplish this, we'll add a Result Step to our workflow. A Result Step creates and updates workflow results, which include datasets, reports, and charts, among others. Results can be used to create dashboards, delivered to internal or external users via email or subscription, or as input to other workflows or external applications.

Step 4 - Add a Pivot Table Result Step

We will add a workflow step that generates a Pivot Table Result showing Total Visits by Day and Site.

From the Results drop-down menu, select the Pivot Table menu item to display the Pivot Table Result dialog box, as shown below:

pivot table result

Steps required to add a Pivot Table Result

  • Select the Hypercube that will provide data to the pivot table, in this case AB Test Cube
  • Select a column dimension from theColumn Data drop-down list, in this case Site
  • Select a row dimension from the Row Data drop-down list, in this case Day
  • Now select a hypercube value (these are the values computed by our expressions) to display from the Cell Data drop-down list
  • Once you have selected the row, column, and cell a preview of the pivot table is displayed.
  • To add the pivot table result step, enter a result name, title, and optionally description, then click OK

Pivot Table Result Viewed in the Flow Portal

pivot table

Summary


This post provides a basic, hands-on introduction to multidimensional analysis using Flow. A brief introduction to multidimensional analysis and analytics-oriented processing was given using the analysis of A/B testing web pages as an example. A definition datasets, measures, dimensions, and hypercubes is provided as background. Finally, a hands-on, step-by-step working example of a four-step analytics workflow is developed.