Exporting Einstein Analytics Datasets

The Summer '20 release of Einstein Analytics introduced a great new feature that lets you export datasets from Einstein Analytics to Amazon S3. This sounds like a simple thing, but it answers a very common question on the forums, and prior to this release there really wasn’t a good solution. Check out the release notes and documentation for more details.

When I set this up for the first time, a few things were unclear. Hopefully this article will clear them up! Once you have the basics figured out, the feature is easy to use and seems to scale nicely (I exported 146,000+ rows with no issues).

Preparing AWS

There are a couple of steps that you need to take on the AWS side of things before you begin.

  1. You will need to get your AWS user account set up correctly
  2. You will need to prepare a bucket and several folders in S3

Amazon User Credentials

First, you need an AWS user that has permissions to access S3.  
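As a rough illustration of what "permissions to access S3" could look like, here is a minimal sketch that builds an IAM policy document scoped to a single export bucket. The action list (`s3:ListBucket`, `s3:GetObject`, `s3:PutObject`) is my assumption about what an export connector would need, not something taken from the Salesforce documentation, and `build_export_policy` is a hypothetical helper name. The bucket name is the one created later in this article.

```python
import json


def build_export_policy(bucket: str) -> dict:
    """Return an IAM policy document limited to one export bucket.

    The actions listed here are an assumption, not a documented
    requirement of the Einstein Analytics connector.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListExportBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading and writing apply to the objects inside it.
                "Sid": "ReadWriteExportObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the IAM policy editor.
    print(json.dumps(build_export_policy("my-ea-exports"), indent=2))
```

If the connector fails to save with a policy like this, widening the permissions temporarily is a quick way to find out whether an action is missing.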

Next, you will need to generate access keys.  Einstein Analytics will use these access keys to authenticate with AWS.  

  1. Sign into the AWS Console
  2. Navigate to your user in the IAM Management Console
  3. Click on Security credentials and then click the Create access key button. Note that you can only view the secret key once, right after the keys are generated. Store it in a safe place, otherwise you’ll have to generate new keys.

S3 Setup

Next you will need to create a location for your Einstein Analytics exports to be stored.  You need one bucket and a folder for every dataset that you want to export.  

  • Sign into the AWS Console
  • Navigate to the S3 Management Console
  • Click Create Bucket
  • Give your bucket a name. Note that the name cannot contain capital letters or spaces. I called mine my-ea-exports.
Create Bucket and name it.
  • On the options page, I turned on default encryption
Enable bucket encryption
  • On the permission page, I left the default to block all public access.
Block all public access on bucket
  • Next, I created a folder to hold all my datasets. This step is optional. I called my folder datasets. In a moment, we will set up the S3 connector in Einstein Analytics and use my-ea-exports/datasets as the folder in the connector.
  • Finally, create a folder for each dataset that you want to export. You will need these folders when you set up the recipe in Einstein Analytics. Here’s how my final bucket and folders looked.
Folder structure in S3

I now have everything set up to export two datasets (Accounts and Cases) via one connector (using my-ea-exports/datasets as a destination).
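If you would rather script this layout than click through the console, the sketch below shows one way to do it. The bucket-name check follows the general S3 naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit). The helper names are hypothetical, the folder names Accounts and Cases are assumptions based on my two datasets, and `s3_client` is assumed to be a `boto3.client("s3")` instance that you create yourself.

```python
from __future__ import annotations

import re

# General S3 bucket naming rules: 3-63 characters, lowercase letters,
# digits, dots and hyphens, starting and ending with a letter or digit.
BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if the name passes the basic S3 naming rules above."""
    return BUCKET_NAME.fullmatch(name) is not None


def folder_keys(base_folder: str, dataset_folders: list[str]) -> list[str]:
    """Return the keys for the base folder and one folder per dataset.

    S3 has no real folders; the console represents one as a zero-byte
    object whose key ends in "/".
    """
    base = base_folder.strip("/")
    keys = [f"{base}/"]
    keys += [f"{base}/{name.strip('/')}/" for name in dataset_folders]
    return keys


def create_folders(s3_client, bucket: str, keys: list[str]) -> None:
    """Create the folder placeholder objects.

    s3_client is assumed to be boto3.client("s3").
    """
    for key in keys:
        s3_client.put_object(Bucket=bucket, Key=key)
```

Calling `folder_keys("datasets", ["Accounts", "Cases"])` gives the three keys that match the layout above, which you can then pass to `create_folders` together with the bucket name.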

Einstein Analytics Configuration

Now that we have all the AWS setup complete, there are a few things that need to be configured in Einstein Analytics.

Enabling Beta Features

There are two beta features that you need to enable in your org.  

  • Navigate to Analytics Settings in Setup
  • Turn on Enable Amazon S3 output connection and Use the new version of data prep (Beta)
Enable beta features on Analytics Settings Setup page
  • Click Save

Configuring the S3 Output Connector

Now we need to configure the output connector in Einstein Analytics.  If you’ve configured Data Sync before, either with your local org, external orgs or other external connectors, you’ll find this in the same place.  

  1. Navigate to the Data Manager in Analytics Studio
  2. Click on Connect and then the Connect to Data button
  3. Click on the Output Connections tab and then click Add Connection
  4. Click Amazon S3 Output Connector
  5. Fill out the connection information.
  • Connection Name: Label for your connection
  • Developer Name: API Name of your connection (must follow Salesforce API naming conventions)
  • Description: A description of what your connector is for
  • Secret Key: Secret Access Key from the AWS access key you generated earlier
  • Region Name: US West(Oregon). I really struggled with this one. The connector wants a very specific region name, in a format that I haven’t seen anywhere else. My bucket was in US West (Oregon). I’ve seen AWS use official names like us-west-2 and Regions.US_WEST_2 for this region, but none of these worked. After some trial and error, I figured out that the connector wanted US West(Oregon). Notice that there is no space between West and (Oregon). Also note that both the UI and the documentation list this field as optional, but I could not get the connection to work unless I specified a region in this exact format.
  • Folder Path: my-ea-exports/datasets. Recall that you created a folder for each dataset. The folder path here must not include the folder name of the dataset you plan to export.
  • Access Key: Access Key ID from the AWS access key you generated earlier
Connector setup page

As soon as you get the connector to save and test successfully you are ready to start building your export using Data Prep.
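The two fields that tripped me up can be sanity-checked before you paste them into the form. This is only a sketch with hypothetical helper names. The only region value I have confirmed is US West(Oregon); that other regions follow the same "no space before the parenthesis" pattern is purely my assumption.

```python
import re


def connector_region_name(console_name: str) -> str:
    """Remove the whitespace before the opening parenthesis.

    "US West (Oregon)" becomes "US West(Oregon)". Only that one value
    is confirmed to work; other regions are an untested assumption.
    """
    return re.sub(r"\s+\(", "(", console_name.strip())


def check_folder_path(folder_path: str, dataset_folders) -> list:
    """Return a list of problems found in a connector Folder Path."""
    problems = []
    parts = [p for p in folder_path.strip("/").split("/") if p]
    if not parts:
        problems.append("folder path is empty")
    elif parts[-1] in dataset_folders:
        # The dataset folder is chosen later, on the recipe output node.
        problems.append(
            f"path ends with dataset folder '{parts[-1]}'; remove it"
        )
    return problems


if __name__ == "__main__":
    print(connector_region_name("US West (Oregon)"))
    print(check_folder_path("my-ea-exports/datasets", ["Accounts", "Cases"]))
```

An empty list from `check_folder_path` means the path stops at the parent folder, which is what the connector expects.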

Exporting your dataset using Data Prep

I’m super excited about Data Prep, the future of dataflows and data recipes in Einstein Analytics. With Summer '20, we have access to a beta version of Data Prep, and one of its many new features is the ability to export datasets to S3. Now that we have finished our configuration on AWS and in Einstein Analytics, we are ready to build the recipe to export the data.

From within Data Manager, click on Dataflows & Recipes, Recipes, and the down arrow next to Create with Data Prep so you can select Create with Data Prep (Beta).

Now, select the datasets you want to export.  I selected my datasets called AccountDataset and CaseDataset.  Add an output node for each of these.  For each output node you need to specify the following:

  • Write to: Output Connection
  • Connection Name: Select the name of the connector we just configured
  • Object Name: The folder that you want to export to.  Note that the first time you configure this screen, the Object Name will be disabled for a moment.  I originally thought something was wrong.  However, the UI is just making a connection to AWS to get a list of folders to populate the drop down list for you.
Output node configuration

Now you can save and run your recipe.

Quick side note: The Data Prep editor will let you build all of the above with an object synced from Salesforce as a data source. However, every time I tried this, I got an error while running the recipe. It seems to be a bug in the beta version of the Data Prep editor. So for now, I would only recommend exporting actual datasets and avoiding synced Salesforce objects as a direct source.

Checking Results

Check your S3 bucket. You should have three files in each folder: a CSV with the actual data, a JSON file with the metadata you would need to import this data back into Einstein Analytics, and an empty file whose name tells you whether the export was successful.

Results in S3 folder
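If you want to check the results from a script rather than the console, something along these lines could work. It is a sketch: the helper names are hypothetical, `s3_client` is assumed to be a `boto3.client("s3")` instance, and because I am not documenting the exact file names here, the code groups objects by extension only and treats anything that is neither CSV nor JSON as the status marker.

```python
def list_keys(s3_client, bucket, prefix):
    """Return the object keys under a prefix.

    s3_client is assumed to be boto3.client("s3"). A single
    list_objects_v2 call is enough for a folder with a few files.
    """
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]


def summarize_export(keys):
    """Group the keys of one dataset folder by the role they play."""
    summary = {"data": [], "metadata": [], "status": []}
    for key in keys:
        if key.endswith("/"):
            continue  # skip the folder placeholder object itself
        lower = key.lower()
        if lower.endswith(".csv"):
            summary["data"].append(key)
        elif lower.endswith(".json"):
            summary["metadata"].append(key)
        else:
            # Assumption: whatever is left is the empty status file.
            summary["status"].append(key)
    return summary
```

A folder that exported cleanly should end up with one entry in each of the three groups.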

Wrapping up

In this article we’ve looked at everything needed to export datasets from Einstein Analytics to Amazon S3. We started by preparing credentials and a bucket in AWS, and then turned on the beta features, set up the connection and built a data recipe in Salesforce. This is a great addition to the data transformation capabilities of Einstein Analytics.

Use cases at Halosight

At Halosight we’ve already put this feature to use.  Halosight is an augmented analytics solution that analyzes unstructured data stored in Salesforce and surfaces insights gained from our Natural Language Processing pipeline in Einstein Analytics.  For example, if you wanted Halosight to analyze your cases, we would do the following:

  1. Look at all the unstructured data attached to your cases in Salesforce (case comments, chatter, notes, descriptions, email, LiveChat transcripts).  
  2. Extract useful insights from that data (Entity mentions and attributes; Events including the who, what, when, where and why of the event; Sentiment; and HaloIQ scores).  
  3. Combine these insights with your case structured data (Case Origin, Case Reason, Status) and present the findings in Einstein Analytics.  

We call this final set of data HaloData™ (HaloData™ = Structured Data + Insights from Unstructured Data).  Several of our beta customers have asked us to also make HaloData™ available to load into their data warehouse.  This new feature in Einstein Analytics allows us to do this without having to do any extra engineering work. One of the many benefits of building our application on the Salesforce platform!

Beta “issues”

I put the word “issues” in quotes because export to S3 and Data Prep are both still in beta. I was able to use these features successfully, but I did run into a few things that would be nice to get fixed or enhanced before these features become generally available.

  1. Region field on the connector setup page.  At a minimum some documentation needs to be provided on the allowed region names.
  2. Support for synced Salesforce objects.  The data prep UI will let you select a synced Salesforce object as the data source in the recipe, but the recipe always errored out for me when I did this.
  3. Improved UI for the object name field on the output connector.  The first time I used this, I was sure that it was broken because the field was disabled while it was retrieving folder names from S3.  I didn’t notice when the field did enable itself after a moment.  To be clear, everything works here and once I knew what was happening, I had no trouble using this.  I just think it could be clearer to the end user.  
  4. An option to put column names in a header line in the CSV.  The generated CSV does not contain column names.  If you want to know the column names, you can look them up in the JSON that got generated.  However, that isn’t very useful unless you plan on loading this data right back into Einstein Analytics.
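Until there is a header option, the last item can be worked around after the fact. The sketch below prepends a header line built from the JSON metadata. It assumes the JSON follows the Einstein Analytics external data metadata layout, with column names under `objects[0].fields[].name`, and that the CSV columns appear in the same order as those fields. I have not verified either assumption against every export, and the function names are hypothetical.

```python
import csv
import io
import json


def column_names(metadata_json: str) -> list:
    """Read the column names from the exported metadata JSON.

    Assumes the layout {"objects": [{"fields": [{"name": ...}, ...]}]}.
    """
    meta = json.loads(metadata_json)
    return [field["name"] for field in meta["objects"][0]["fields"]]


def add_header(csv_text: str, metadata_json: str) -> str:
    """Return the CSV text with a header row prepended."""
    names = column_names(metadata_json)
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    for row in rows:
        # Fail loudly if the metadata does not match the data.
        if len(row) != len(names):
            raise ValueError(
                f"expected {len(names)} columns, found {len(row)}"
            )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return out.getvalue()
```

For large exports you would want to stream the rows instead of holding them all in memory, but for a quick load into a data warehouse staging area this is usually enough.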