Monday, March 27, 2023

Synchronize your Salesforce and Snowflake data to speed up your time to insight with Amazon AppFlow


This post was co-written with Amit Shah, Principal Consultant at Atos.

Customers across industries seek meaningful insights from the data captured in their Customer Relationship Management (CRM) systems. To achieve this, they combine their CRM data with a wealth of information already available in their data warehouse, business systems, or other software as a service (SaaS) applications. One widely used approach is getting the CRM data into your data warehouse and keeping it up to date through frequent data synchronization.

Integrating third-party SaaS applications is often challenging and requires significant effort and development. Developers need to understand the application APIs, write implementation and test code, and maintain the code for future API changes. Amazon AppFlow, a low-code/no-code AWS service, addresses this challenge.

Amazon AppFlow is a fully managed integration service that enables you to securely transfer data between SaaS applications like Salesforce, SAP, Zendesk, Slack, and ServiceNow, and AWS services like Amazon Simple Storage Service (Amazon S3) and Amazon Redshift, in just a few clicks. With Amazon AppFlow, you can run data flows at enterprise scale at the frequency you choose: on a schedule, in response to a business event, or on demand.

In this post, we focus on synchronizing your data from Salesforce to Snowflake (on AWS) without writing code. This post walks you through the steps to set up a data flow to handle full and incremental data loads using an example use case.

Solution overview

Our use case involves the synchronization of the Account object from Salesforce into Snowflake. In this architecture, you use Amazon AppFlow to filter and transfer the data to your Snowflake data warehouse.

You can configure Amazon AppFlow to run your data ingestion in three different ways:

  • On demand – You can manually run the flow through the AWS Management Console, API, or SDK call.
  • Event-driven – Amazon AppFlow can subscribe and listen to change data capture (CDC) events from the source SaaS application.
  • Scheduled – Amazon AppFlow can run schedule-triggered flows based on a pre-defined schedule rule. With scheduled flows, you can choose either full or incremental data transfer:
    • With full transfer, Amazon AppFlow transfers a snapshot of all records at the time of the flow run from the source to the destination.
    • With incremental transfer, Amazon AppFlow transfers only the records that have been added or changed since the last successful flow run. To determine the incremental delta of your data, Amazon AppFlow requires you to specify a source timestamp field that instructs it how to identify new or updated records.

We use the on-demand trigger for the initial load of data from Salesforce to Snowflake, because it helps you pull all the records, irrespective of their creation time. After the on-demand run, we configure a scheduled trigger with incremental transfer to synchronize data with Snowflake periodically. With this approach, Amazon AppFlow pulls the records based on a chosen timestamp field from the Salesforce Account object periodically, based on the time interval specified in the flow.

The Account_Staging table is created in Snowflake to act as temporary storage that can be used to identify the data change events. Then the permanent table (Account) is updated from the staging table by running a SQL stored procedure that contains the incremental update logic. The following figure depicts the various components of the architecture and the data flow from the source to the target.

The data flow contains the following steps:

  1. First, the flow is run with on-demand and full transfer mode to load the full data into Snowflake.
  2. The Amazon AppFlow Salesforce connector pulls the data from Salesforce and stores it in the Account Data S3 bucket in CSV format.
  3. The Amazon AppFlow Snowflake connector loads the data into the Account_Staging table.
  4. A scheduled task, running at regular intervals in Snowflake, triggers a stored procedure.
  5. The stored procedure starts an atomic transaction that loads the data into the Account table and then deletes the data from the Account_Staging table.
  6. After the initial data is loaded, you update the flow to capture incremental updates from Salesforce. The flow trigger configuration is changed to scheduled, to capture data changes in Salesforce. This enables Snowflake to get all updates, deletes, and inserts from Salesforce at the configured intervals.
  7. The flow uses the configured LastModifiedDate field to determine incremental changes.
  8. Steps 3, 4, and 5 are run again to load the incremental updates into the Snowflake Account table.

Prerequisites

To get started, you need the following prerequisites:

  • A Salesforce user account with sufficient privileges to install connected apps. Amazon AppFlow uses a connected app to communicate with Salesforce APIs. If you don’t have a Salesforce account, you can sign up for a developer account.
  • A Snowflake account with sufficient permissions to create and configure the integration, external stage, table, stored procedures, and tasks.
  • An AWS account with access to AWS Identity and Access Management (IAM), Amazon AppFlow, and Amazon S3.

Set up the Snowflake configuration and Amazon S3 data

Complete the following steps to configure Snowflake and set up your data in Amazon S3:

  1. Create two S3 buckets in your AWS account: one for holding the data coming from Salesforce, and another for holding error files.

A best practice when creating your S3 bucket is to make sure you block public access to the bucket so that your data isn’t accessible by unauthorized users.

  2. Create an IAM policy named snowflake-access that allows listing the bucket contents and reading S3 objects inside the bucket.

Follow the instructions for steps 1 and 2 in Configuring a Snowflake Storage Integration to Access Amazon S3 to create an IAM policy and role. Replace the placeholders with your S3 bucket names.

  3. Log in to your Snowflake account and create a new warehouse called SALESFORCE and a database called SALESTEST (a minimal DDL sketch follows the file format example below).
  4. Specify the format in which data will be available in Amazon S3 for Snowflake to load (for this post, CSV):
USE DATABASE SALESTEST;
CREATE or REPLACE file format my_csv_format
type = csv
field_delimiter = ","
skip_header = 1
null_if = ('NULL', 'null')
empty_field_as_null = true
compression = gzip;
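
For step 3 above, a minimal sketch of the warehouse and database DDL might look like the following. The object names match the walkthrough; the warehouse size and auto-suspend settings are assumptions you can adjust:

-- Small warehouse for the load (sizing and suspend settings are assumptions)
CREATE WAREHOUSE IF NOT EXISTS SALESFORCE
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;

-- Target database used throughout this post
CREATE DATABASE IF NOT EXISTS SALESTEST;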

  5. Amazon AppFlow uses the Snowflake COPY command to move data using an S3 bucket. To configure this integration, follow steps 3–6 in Configuring a Snowflake Storage Integration to Access Amazon S3.

These steps create a storage integration with your S3 bucket, update the IAM role with your Snowflake account and user details, and create an external stage.
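
For reference, the objects produced by those steps look roughly like the following sketch. The integration name, stage name, role ARN, and bucket path below are placeholders, not values from this post; use the names and ARNs from your own setup:

USE DATABASE SALESTEST;
USE SCHEMA PUBLIC;

-- Placeholder integration name, role ARN, and bucket; substitute your own values
CREATE STORAGE INTEGRATION salesforce_s3_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account-id>:role/<snowflake-access-role>'
  STORAGE_ALLOWED_LOCATIONS = ('s3://<your-account-data-bucket>/');

-- External stage that Amazon AppFlow uses when loading data into Snowflake
CREATE STAGE salesforce_stage
  STORAGE_INTEGRATION = salesforce_s3_integration
  URL = 's3://<your-account-data-bucket>/'
  FILE_FORMAT = my_csv_format;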

This completes the integration setup in Snowflake. In the next section, you create the required database objects in Snowflake.

Create schemas and procedures in Snowflake

In your Snowflake account, complete the following steps to create the tables, stored procedures, and tasks for implementing the use case:

  1. In your Snowflake account, open a worksheet and run the following DDL scripts to create the Account and Account_Staging tables:
CREATE or REPLACE TABLE ACCOUNT_STAGING (
ACCOUNT_NUMBER STRING NOT NULL,
ACCOUNT_NAME STRING,
ACCOUNT_TYPE STRING,
ANNUAL_REVENUE NUMBER,
ACTIVE BOOLEAN NOT NULL,
DELETED BOOLEAN,
LAST_MODIFIED_DATE STRING,
primary key (ACCOUNT_NUMBER)
);

CREATE or REPLACE TABLE ACCOUNT (
ACCOUNT_NUMBER STRING NOT NULL,
ACCOUNT_NAME STRING,
ACCOUNT_TYPE STRING,
ANNUAL_REVENUE NUMBER,
ACTIVE BOOLEAN NOT NULL,
LAST_MODIFIED_DATE STRING,
primary key (ACCOUNT_NUMBER)
);

  2. Create a stored procedure in Snowflake to load data from the staging table into the Account table:
CREATE or REPLACE procedure sp_account_load()
returns varchar not null
language sql
as
$$
begin
Start transaction;
merge into ACCOUNT using ACCOUNT_STAGING
on ACCOUNT.ACCOUNT_NUMBER = ACCOUNT_STAGING.ACCOUNT_NUMBER
when matched AND ACCOUNT_STAGING.DELETED=TRUE then delete
when matched then UPDATE SET
ACCOUNT.ACCOUNT_NAME = ACCOUNT_STAGING.ACCOUNT_NAME,
ACCOUNT.ACCOUNT_TYPE = ACCOUNT_STAGING.ACCOUNT_TYPE,
ACCOUNT.ANNUAL_REVENUE = ACCOUNT_STAGING.ANNUAL_REVENUE,
ACCOUNT.ACTIVE = ACCOUNT_STAGING.ACTIVE,
ACCOUNT.LAST_MODIFIED_DATE = ACCOUNT_STAGING.LAST_MODIFIED_DATE
when NOT matched then
INSERT (
ACCOUNT.ACCOUNT_NUMBER,
ACCOUNT.ACCOUNT_NAME,
ACCOUNT.ACCOUNT_TYPE,
ACCOUNT.ANNUAL_REVENUE,
ACCOUNT.ACTIVE,
ACCOUNT.LAST_MODIFIED_DATE
)
values(
ACCOUNT_STAGING.ACCOUNT_NUMBER,
ACCOUNT_STAGING.ACCOUNT_NAME,
ACCOUNT_STAGING.ACCOUNT_TYPE,
ACCOUNT_STAGING.ANNUAL_REVENUE,
ACCOUNT_STAGING.ACTIVE,
ACCOUNT_STAGING.LAST_MODIFIED_DATE
) ;

Delete from ACCOUNT_STAGING;
Commit;
end;
$$
;

This stored procedure determines whether the data contains new records that need to be inserted or existing records that need to be updated or deleted. After a successful run, the stored procedure clears any data from your staging table.
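
Before wiring the procedure to a task, you can optionally invoke it once by hand to confirm it compiles and runs; it simply merges whatever is currently in the staging table:

-- Manual test run of the merge procedure
CALL sp_account_load();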

  3. Create a task in Snowflake to trigger the stored procedure. Make sure the time interval for this task is more than the time interval configured in Amazon AppFlow for pulling the incremental changes from Salesforce. The time interval should be sufficient for the data to be processed.
-- This task runs at minute 5 of every hour (America/Los_Angeles time zone)
CREATE OR REPLACE TASK TASK_ACCOUNT_LOAD
WAREHOUSE = SALESFORCE
SCHEDULE = 'USING CRON 5 * * * * America/Los_Angeles'
AS
call sp_account_load();

  4. Provide the required permissions to run the task and resume the task:
  • As soon as the task is created, it will be in a suspended state, so you need to resume it manually the first time:
ALTER TASK TASK_ACCOUNT_LOAD RESUME;

  • If the role that is assigned to you doesn’t have the proper access to resume or execute tasks, you need to grant the EXECUTE TASK privilege to that role:
GRANT EXECUTE TASK, EXECUTE MANAGED TASK ON ACCOUNT TO ROLE SYSADMIN;
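
To confirm the task is resumed and to check its run history, you can use the following queries; the TASK_HISTORY table function shows recent and upcoming executions:

USE DATABASE SALESTEST;

-- Confirm the task exists and its state is 'started'
SHOW TASKS LIKE 'TASK_ACCOUNT_LOAD';

-- Inspect recent and scheduled executions of the task
SELECT name, state, scheduled_time, completed_time, error_message
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(TASK_NAME => 'TASK_ACCOUNT_LOAD'))
ORDER BY scheduled_time DESC
LIMIT 10;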

This completes the Snowflake part of the configuration and setup.

Create a Salesforce connection

First, let’s create a Salesforce connection that can be used by Amazon AppFlow to authenticate and pull records from your Salesforce instance. On the AWS console, make sure you are in the same Region where your Snowflake instance is running.

  1. On the Amazon AppFlow console, choose Connections in the navigation pane.
  2. From the list of connectors, select Salesforce.
  3. Choose Create connection.
  4. For Connection name, enter a name of your choice (for example, Salesforce-blog).
  5. Leave the rest of the fields as default and choose Continue.
  6. You’re redirected to a sign-in page, where you need to log in to your Salesforce instance.
  7. After you allow Amazon AppFlow access to your Salesforce account, your connection is successfully created.

Create a Snowflake connection

Complete the following steps to create your Snowflake connection:

  1. On the Connections menu, choose Snowflake.
  2. Choose Create connection.
  3. Provide information for the Warehouse, Stage name, and Bucket details fields.
  4. Enter your credential details.
  5. For Region, choose the same Region where Snowflake is running.
  6. For Connection name, name your connection Snowflake-blog.
  7. Leave the rest of the fields as default and choose Connect.

Create a stream in Amazon AppFlow

Now you create a flow in Amazon AppFlow to load the data from Salesforce to Snowflake. Complete the following steps:

  1. On the Amazon AppFlow console, choose Flows in the navigation pane.
  2. Choose Create flow.
  3. On the Specify flow details page, enter a name for the flow (for example, AccountData-SalesforceToSnowflake).
  4. Optionally, provide a description for the flow and tags.
  5. Choose Next.

  6. On the Configure flow page, for Source name, choose Salesforce.
  7. Choose the Salesforce connection we created in the previous step (Salesforce-blog).
  8. For Choose Salesforce object, choose Account.
  9. For Destination name, choose Snowflake.
  10. Choose the newly created Snowflake connection.
  11. For Choose Snowflake object, choose the staging table you created earlier (SALESTEST.PUBLIC.ACCOUNT_STAGING).

  12. In the Error handling section, provide your error S3 bucket.
  13. For Choose how to trigger the flow, select Run on demand.
  14. Choose Next.

  15. Select Manually map fields to map the fields between your source and destination.
  16. Choose the fields Account Number, Account Name, Account Type, Annual Revenue, Active, Deleted, and Last Modified Date.
  17. Map each source field to its corresponding destination field.
  18. Under Additional settings, leave Import deleted records unchecked (default setting).

  19. In the Validations section, add validations for the data you’re pulling from Salesforce.

Because the schema for the Account_Staging table in the Snowflake database has a NOT NULL constraint for the fields Account_Number and Active, records containing a null value for these fields should be ignored.

  20. Choose Add Validation to configure validations for these fields.
  21. Choose Next.

  22. Leave everything else as default, proceed to the final page, and choose Create Flow.
  23. After the flow is created, choose Run flow.

When the flow run completes successfully, it will bring all records into your Snowflake staging table.

Verify data in Snowflake

The data will be loaded into the Account_Staging table. To verify that the data is loaded in Snowflake, complete the following steps:

  1. Validate the number of records by querying the ACCOUNT_STAGING table in Snowflake (see the example queries after this list).
  2. Wait for your Snowflake task to run based on the configured schedule.
  3. Verify that all the data is transferred to the ACCOUNT table and the ACCOUNT_STAGING table is truncated.
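
As a simple sketch of the checks above, you can run the following queries; adjust the database or schema if you created the tables elsewhere:

USE DATABASE SALESTEST;

-- Step 1: record count landed in the staging table by the flow run
SELECT COUNT(*) FROM ACCOUNT_STAGING;

-- Step 3: after the task runs, the data should be in ACCOUNT and the staging table should be empty
SELECT COUNT(*) FROM ACCOUNT;
SELECT COUNT(*) FROM ACCOUNT_STAGING;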

Configure an incremental data load from Salesforce

Now let’s configure an incremental data load from Salesforce:

  1. On the Amazon AppFlow console, select your flow, and choose Edit.
  2. Go to the Edit configuration step and change the trigger to Run flow on schedule.
  3. Set the flow to run every 5 minutes, and provide a start date of Today, with a start time in the future.
  4. Choose Incremental transfer and choose the LastModifiedDate field.
  5. Choose Next.
  6. In the Additional settings section, select Import deleted records.

This ensures that deleted records from the source are also ingested.

  7. Choose Save and then choose Activate flow.

Now your flow is configured to capture all incremental changes.

Test the solution

Log in to your Salesforce account, and edit any record in the Account object.

Within 5 minutes or less, a scheduled flow will pick up your change, write the changed record into your Snowflake staging table, and trigger the synchronization process.
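
On the Snowflake side, you can check that the edited account made it through the merge, for example by looking at the most recently modified rows. This is a sketch that assumes LastModifiedDate arrives as an ISO 8601 timestamp string, which sorts correctly as text:

-- Most recently modified accounts (assumes ISO 8601 timestamp strings)
SELECT ACCOUNT_NUMBER, ACCOUNT_NAME, LAST_MODIFIED_DATE
FROM SALESTEST.PUBLIC.ACCOUNT
ORDER BY LAST_MODIFIED_DATE DESC
LIMIT 5;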

You can see the details of the run, including the number of records transferred, on the Run History tab of your flow.

Clean up

Clean up the resources in your AWS account by completing the following steps:

  1. On the Amazon AppFlow console, choose Flows in the navigation pane.
  2. From the list of flows, select the flow AccountData-SalesforceToSnowflake and delete it.
  3. Enter delete to delete the flow.
  4. Choose Connections in the navigation pane.
  5. Choose Salesforce from the list of connectors, select Salesforce-blog, and delete it.
  6. Enter delete to delete the connector.
  7. On the Connections page, choose Snowflake from the list of connectors, select Snowflake-blog, and delete it.
  8. Enter delete to delete the connector.
  9. On the IAM console, choose Roles in the navigation pane, then select the role you created for Snowflake and delete it.
  10. Choose Policies in the navigation pane, select the policy you created for Snowflake, and delete it.
  11. On the Amazon S3 console, search for the data bucket you created, choose Empty to delete the objects, then delete the bucket.
  12. Search for the error bucket you created, choose Empty to delete the objects, then delete the bucket.
  13. Clean up resources in your Snowflake account:
  • Delete the task TASK_ACCOUNT_LOAD:
ALTER TASK TASK_ACCOUNT_LOAD SUSPEND;
DROP TASK TASK_ACCOUNT_LOAD;

  • Delete the stored procedure sp_account_load:
DROP procedure sp_account_load();

  • Delete the tables ACCOUNT_STAGING and ACCOUNT:
DROP TABLE ACCOUNT_STAGING;
DROP TABLE ACCOUNT;
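
If the file format, external stage, storage integration, database, and warehouse were created only for this walkthrough, you can drop them as well. The stage and integration names below are the placeholder names used in the earlier sketch; substitute the names you actually created:

DROP FILE FORMAT IF EXISTS my_csv_format;
DROP STAGE IF EXISTS salesforce_stage;                          -- use your stage name
DROP STORAGE INTEGRATION IF EXISTS salesforce_s3_integration;   -- use your integration name
DROP DATABASE IF EXISTS SALESTEST;
DROP WAREHOUSE IF EXISTS SALESFORCE;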

Conclusion

In this post, we walked you through how to integrate and synchronize your data from Salesforce to Snowflake using Amazon AppFlow. This demonstrates how you can set up your ETL jobs without having to learn new programming languages, by using Amazon AppFlow and your familiar SQL. This is a proof of concept, but you can go further and handle edge cases like the failure of Snowflake tasks, or explore how incremental transfer behaves by making multiple changes to a Salesforce record within the scheduled time interval.

For more information on Amazon AppFlow, visit Amazon AppFlow.


About the authors

Ramesh Ranganathan is a Senior Partner Solutions Architect at AWS. He works with AWS customers and partners to provide guidance on enterprise cloud adoption, application modernization, and cloud-native development. He is passionate about technology and enjoys experimenting with AWS Serverless services.

Kamen Sharlandjiev is an Analytics Specialist Solutions Architect and Amazon AppFlow expert. He’s on a mission to make life easier for customers who are facing complex data integration challenges. His secret weapon? Fully managed, low-code AWS services that can get the job done with minimal effort and no coding.

Amit Shah is a cloud-based modern data architecture expert, currently leading the AWS Data Analytics practice at Atos. Based in Pune, India, he has 20+ years of experience in data strategy, architecture, design, and development. He’s on a mission to help organizations become data-driven.
