Salesforce Data Cloud Data Ingestion

Data Cloud Topology

In a related article, I talked about the initial Data Cloud Setup and Provisioning Models that you can configure for Data Cloud.

Once you’ve made that decision, the next step is to understand the high-level data ingestion process of Data Cloud.

Data Ingestion Process Overview

In a nutshell:

  1. Data Cloud can be connected to a wide variety of data sources.
    These sources are connected through Data Streams and, once ingested, they create Data Source Objects.
  2. The Data Source Objects go through transformation, validation and normalisation steps in the Data Cloud Data Lake.
  3. Once normalised, the Data Source Objects automatically create a Data Lake Object, and these types of objects can be mapped to the SF Data Model.
  4. Data Spaces act as partitions, so you can decide how to organise the ingested data and divide it by region/brand/purpose, etc.
  5. Once normalised data is in a Data Space, it can be mapped to the Salesforce Data Model to create a Data Model Object. This is the final version of the data that is leveraged for insights, data models, segments, etc.

Salesforce Data Cloud Objects

There are 3 levels of objects that are created and leveraged in Salesforce Data Cloud Ingestion. Here is a breakdown of each type:

Data Source Object (DSO)

  • Stage 1: raw data, created when a data source is first ingested and a Data Stream is configured.
  • Format: multi-format (e.g: CSV, JSON, etc.); the original schema is preserved; multi-sourced (Mulesoft, Cloud Storage, etc.).
  • Purpose: the original data source and files/transient data of the organization, needed so a Data Lake copy can be created for later mapping.
  • Location: not physically stored. A staging or temporary view of the original, raw data, before it's created physically in the Data Lake as a DLO.

Data Lake Object (DLO)

  • Stage 2: automatically created as a locally stored copy of the DSO, including transformations, validation, normalisation, etc.
  • Format: schema is enforced; categorised into Data Cloud object types (Profile, Engagement, Other); Parquet-formatted Iceberg tables.
  • Purpose: the normalised, validated and transformed version of the DSO so it can be mapped to the SF Data Model. Objects cannot be used unless they're mapped to the Data Model.
  • Location: physically stored in the Data Lake, in its assigned Data Space. This is the core "copy" of the ingested data; data only exists at the DLO level in Data Cloud.

Data Model Object (DMO)

  • Stage 3: DLOs are mapped to a SF 360 Data Model object, creating a harmonized grouping of data that is mapped to the canonical Data Model of Salesforce.
  • Format: a harmonized, grouped view of the DLO, mapped to a data model object; the Type Category is inherited from the mapped DLO; Unified Individuals and Insights are DMOs.
  • Purpose: the mapped version of the DLO so it can be leveraged in Data Cloud features: insights, segments, identity resolution, etc.
  • Location: not physically stored. A DMO is just a "view" of the DLO in the Data Lake, not a copy of the DLO itself. It is not materialised.

Data Sources

There are many data sources you can ingest into Data Cloud. They can be, for instance:

  • A Salesforce Marketing Cloud Account
  • Your Commerce Cloud Instance
  • Salesforce Service Cloud or Sales Cloud Orgs
  • Events from a mobile app
  • META Ads
  • An SFTP Connector
  • Azure Storage
  • API batching
  • A specific Data Set within a Salesforce Marketing Cloud Personalization Account

OOTB Connectors

To make the data ingestion process as easy as possible, Data Cloud comes with out-of-the-box connectors for many Data Sources.

The connectors can be categorised into 3 main types:

  1. Salesforce Connectors
    Usual native Salesforce apps such as Salesforce Marketing Cloud, Service Cloud, etc.
  2. Connector Service
    These connectors leverage SDKs and Ingestion API to expand the sources you can connect (e.g: Mulesoft Anypoint Connector).
  3. Third-Party Connectors
    These integrate Third-Party sources for data to flow into or out of Data Cloud.

Data Bundles

Data Bundles are data sets that import pre-defined objects to be mapped into the SF Data Model easily.

For example, there are Data Bundles for Salesforce Marketing Cloud that include engagement data objects (e.g: sends, clicks, opens, etc).

This way, you don’t have to manually decide which objects and datasets to ingest into Data Cloud and how to best normalise them to map them later on.

Salesforce CRM Data Source Options

One of the core Data Sources you will want to ingest into Data Cloud is a Salesforce Org (e.g: Sales Cloud, Service Cloud, Commerce Cloud, Loyalty, etc).

For these, there are different ingestion options and best practices to keep in mind:

  • 1:1 CRM to DC Org Connection
    You may connect 1 Data Cloud Org to –> 1 SF CRM Org.
    Use Case: the provisioning model is to host Data Cloud on a stand-alone Org, which is then connected to the CRM Org of the organization, with a single line of business and data.
  • N:1 CRM to DC Org Connection
    You can connect N SF CRM Orgs to –> 1 Data Cloud Org.
    Use Case: aggregated data from multiple business lines, CRM orgs or brands is consolidated into one Data Cloud Org.
  • 1:N CRM to DC Org Connection
    You may also connect 1 single SF CRM Org to –> multiple Data Cloud Orgs.
    Use Case: the data from one single Salesforce CRM Org needs to be connected but organised by specific governance criteria (brand, region).

Salesforce Marketing Cloud Data Source Options

The second most common core data source you’ll want to leverage when using Data Cloud is, of course, Salesforce Marketing Cloud.

For Marketing Cloud, there are different ingestion options and an important drawback to mention:

  • 1:1 SFMC to DC Org Connection
    You can connect 1 SFMC Account to –> 1 Data Cloud Org.
    Use Case: Data Cloud is connected to the Enterprise parent level of an SFMC instance so data can be activated and used in all the child Business Units.
  • 1:N SFMC to DC Org Connection
    You can also connect 1 SFMC Account to –> multiple Data Cloud Orgs.
    Use Case: data from different Data Cloud Orgs can be activated in a single SFMC instance and/or in different Business Units, at the regional level, for example.
  • N:1 SFMC to DC Org Connection (currently not available)
    This is the current drawback of this connection type. At the moment, Data Cloud does not allow N:1 connections where a single Data Cloud Org is connected to –> more than one SFMC instance.

MC Personalisation Data Source Options

Finally, another of the most typical data sources you’ll want to connect is MC Personalisation. For this connector, here are the different options available:

  • 1:1 IS Account to Data Cloud Org
    You can connect 1 IS Account to –> 1 Data Cloud Org, and you can also decide whether to connect only 1 Data Set or multiple Data Sets from that IS Account.
    Use Case: only Data Set 1 from the IS Account is to be ingested and activated in the Data Cloud Org, for governance, brand or regional reasons.
  • N:1 IS Accounts to Data Cloud Org
    You may also connect multiple IS Accounts to –> 1 Data Cloud Org, and it’s possible to connect only specific Data Sets from each IS Account.
    Use Case: Data Set 02 from IS Account 1 and Data Set 01 from IS Account 2 are connected to the same Data Cloud Org 1.
  • 1:N IS Account to Multiple Data Cloud Orgs
    Out of the same IS Account, you can connect different Data Sets to –> multiple Data Cloud Orgs.
    However, you cannot connect the same Data Set to –> multiple Data Cloud Orgs.

Data Spaces

Data Cloud Data Spaces are partitions available within Data Cloud so you can organise your ingested data however works best for your organization, without having to purchase multiple Data Cloud Orgs.

Data Spaces can follow a logical pattern such as, for example:

  • By brand
  • By region
  • By purpose

Data Spaces require an additional add-on license. The current limit is 50 Data Spaces, or only 1 Data Space in a Developer Org.

Also, keep in mind that:

  • Every Salesforce Data Cloud Org is created with a “default” Data Space. All your Data Lake Objects will be associated to this default Data Space until you create and configure new ones.
  • Identity Resolution, Data Models, Insights, Einstein Studio AI Models, Data Actions, etc are built within the context of a Data Space.
  • You may associate the same Data Lake Object with multiple Data Spaces for different purposes.
  • Users can only manage and view data within their assigned Data Space.

Creating and Editing Data Spaces

Once you’ve purchased the additional add-on license to have more Data Spaces, follow these steps to create and edit Data Spaces:

  1. Go to Data Cloud Setup > Data Management > Data Spaces
  2. Click New and give a unique name to the Data Space
  3. Configure a Data Space prefix, starting with 1 letter and up to 3 alphanumeric characters.
    E.g: DS01
    Note: once you set the prefix, you cannot change it later. The prefix is used to distinguish between objects mapped in different Data Spaces.
  4. To edit an existing Data Space, go to Data Cloud Setup > Data Management > Data Spaces and click on the data space you’d like to edit.

Adding Data to Data Spaces

Follow these steps to add data to a specific Data Space:

  1. Go to Data Cloud > tab “Data Spaces”
  2. Click on the Data Space you want to add data to
  3. Click Add Data and select the Data Lake Objects
  4. Before finishing, you can decide whether to apply specific Filters to the DLO or just assign it without filters.
    Note: filters typically take the format Object > Column > Operator > Value and can be used, for instance, to segregate data based on brand, region, etc.
    E.g:
    Case > Contact ID > EQUALS > Brand_2

Removing data from Data Spaces

You may decide to remove Data Lake Objects (DLOs) from a given Data Space. To do so, follow these steps:

  1. Data Cloud > tab “Data Spaces”
  2. Select your data space
  3. Search for the DLO and click on the dropdown menu
  4. Click on Delete

Privacy and Consent

Probably the most important aspect of all the data you want to ingest: how to protect and honour customer privacy, data subject rights and consent management.

Data Subject Rights

Data Cloud allows you to honour the data subject rights of your customers, such as those defined by the famous European Union’s General Data Protection Regulation (GDPR).

There are 3 main Data Subject Rights that Data Cloud manages:

  1. Data Deletion or Right To Be Forgotten (RTBF)
  2. Data Access and Export (Portability)
  3. Restriction of Processing (RofP)

To manage all these data subject rights requests, you always need to use the Data Cloud Consent API.

The Data Cloud Consent API reads and writes to the Profile of a customer in Data Cloud. It does so by accessing and updating org-wide consent data (e.g: links between records, values of consent flags, etc).

These are the actions supported by the Consent API, as documented by Salesforce:

  • ShouldForget: the Right To Be Forgotten, permanently deleting PII (Personally Identifiable Information) data and all related records.
  • Processing: restricts the use of Data Cloud processes (e.g: insights, query, segmentations) to process the customer data.
  • Portability: used to allow the customer to have their Data Cloud data exported.

Data Subject Rights Request

To process a Data Subject Rights request, you need to use the Individual ID as the parameter that identifies the record in the Consent API.

If your Unified Individual for a customer named “David” is made up of 3 source versions of David:
Name: David 1
Individual ID: 001

Name: David 2
Individual ID: 002

Name: David 3
Individual ID: 003

Then, you will have to process a Consent API request with each Individual ID: 001, then 002 and finally, 003.
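As an illustrative sketch, here is a minimal Python example that submits a Right To Be Forgotten request for each of the three Individual IDs above. The host, API version and endpoint path are placeholders and should be checked against the official Data Cloud Consent API reference:

import requests

ACCESS_TOKEN = "REPLACE_WITH_OAUTH_TOKEN"            # placeholder token
INSTANCE_URL = "https://your-org.my.salesforce.com"  # placeholder host

# The three source Individual IDs that make up the Unified Individual "David"
individual_ids = ["001", "002", "003"]

for individual_id in individual_ids:
    # Illustrative Consent API call: one request per Individual ID,
    # using the shouldForget (Right To Be Forgotten) action.
    response = requests.post(
        f"{INSTANCE_URL}/services/data/v60.0/consent/action/shouldForget",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        params={"ids": individual_id},
    )
    response.raise_for_status()
    print(individual_id, "request accepted:", response.status_code)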

Data Deletion Request

To process a Right To Be Forgotten Data Subject Rights request, you will have to use the Consent API.

This request deletes the individual record of the customer from the Individual DMO and all the related DMOs. This means that all the relevant DMOs must be related to the Individual DMO.

Data Deletion Requests are processed again after 30, 60 and 90 days to make sure that a full deletion is executed.

Restriction of Processing Request

To process a Restriction of Processing request, you also need to use the Consent API.

This request can be processed for Individual and Unified Individual profiles. It restricts all data processing for the Individual and Unified Individual profiles within 24 hours.

Data Portability Request

To process a Data Portability request, you also have to go via the Consent API.

This request can be processed for Individual profiles. The request triggers an export of all the customer data stored in Salesforce Data Cloud for that Individual.

The export is a CSV file sent to the pre-defined AWS S3 bucket. The process may take up to 15 days.

Data Streams

Data Streams are the initial step for data ingestion in Data Cloud. In order to connect any data source, you need to go to Data Cloud and click on Data Streams to create your first one.

Below is an overview of some of the most important Data Streams currently available in Salesforce Data Cloud ingestion:

Salesforce CRM Data Stream

  • Ingest data from CRM across clouds (Service Cloud, Sales Cloud, Loyalty, Commerce Cloud, Omnichannel Inventory, etc).
  • Native, seamless integration thanks to SF proprietary APIs.
  • Pre-Made Data Bundles already mapped to the DC Data Model.

Salesforce Marketing Cloud Data Stream

  • Bring SFMC Data into Data Cloud.
  • You may ingest data from any Data Extension in SFMC, as well as all engagement data (e.g: clicks, opens), etc.
  • Native, seamless integration thanks to SF proprietary APIs.
  • Pre-Made Data Bundles already mapped to the DC Data Model.

Salesforce Commerce Cloud Data Stream

  • Bring Commerce Cloud Data into Data Cloud.
  • You may ingest Order, Related Customer and Catalog Data.
  • Native, seamless integration thanks to SF proprietary APIs.
  • Pre-Made Data Bundles already mapped to the DC Data Model.

Web SDK Data Stream

  • SDK/Tag to collect real-time web events and bring them into Data Cloud.
  • Unified SDK with Personalization enables you to both collect and action the data using the same tag.

Mobile SDK Data Stream

  • Mobile SDK to collect all mobile events, transactions, behaviours, etc and bring them into Data Cloud.
  • Unified SDK with Marketing Cloud enables you to both collect and action the data in push notifications, Journey Builder, in-app messages, etc.

Ingestion API Data Stream

  • Both Bulk and Streaming APIs available.
  • Use this data stream to send data from any application to Data Cloud.
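As a rough sketch of a streaming ingestion call in Python (the tenant host, connector name "website_events" and object name "runner_profiles" are hypothetical, and the exact endpoint shape should be verified against the Ingestion API documentation):

import requests

ACCESS_TOKEN = "REPLACE_WITH_DATA_CLOUD_TOKEN"           # placeholder token
TENANT_URL = "https://your-tenant.c360a.salesforce.com"  # placeholder host

# Illustrative payload: one record for a hypothetical "runner_profiles" object
# defined in an Ingestion API connector called "website_events".
payload = {
    "data": [
        {"customer_id": "001", "email": "david@example.com", "loyalty_tier": "Gold"}
    ]
}

response = requests.post(
    f"{TENANT_URL}/api/v1/ingest/sources/website_events/runner_profiles",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    json=payload,
)
response.raise_for_status()
print("Accepted:", response.status_code)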

Ingestion Personalization Data Stream

  • Native Integration to bring data from MC Personalization into Data Cloud.
  • You may ingest anonymous and known data.
  • Multiple options available with Data Sets: 1:1, N:1, 1:N.
  • Allows ingestion of all types of events in MC Personalization.

Ingestion Amazon S3 Data Stream

  • Allows you to ingest any S3 bucket data of any system into Data Cloud.
  • Automatic detection of data types, delimiters, date times, etc.
  • You may transform ingested data.
  • Custom refresh schedule (every 5, 10 or 30 minutes, hourly, daily, weekly, monthly).
  • Wildcard strings available for date time and file name matching.
  • Compression available (ZIP, GZ).
  • High-water mark for updates to only new records.
  • Seamless User Interface and quick integration.
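For instance, a hypothetical S3 file name pattern could use a wildcard so each daily extract is picked up by the same data stream (the file name is illustrative):
E.g:
orders_export_*.csv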

Ingestion Google Cloud Storage Data Stream

  • Allows you to ingest data from any system via Google Cloud Storage.
  • Automatic detection of data types, delimiters, date times, etc.
  • You may transform ingested data.
  • Custom refresh schedule (hourly, weekly, monthly).
  • Wildcard strings available for date time and file name matching.
  • High-water mark for updates to only new records.
  • Seamless User Interface and quick integration.

Ingestion Mulesoft Data Stream

  • Native Integration to bring data from Mulesoft into Data Cloud.
  • Bulk and Streaming APIs for ingestion.
  • Mulesoft expands connectivity with +250 OOTB connectors.
  • Easily build custom integrations.
  • API First Strategy and reduced time to value for use cases.

BYOL Data Federation

BYOL stands for Bring Your Own Lake.

The BYOL Data Federation is a capability of Salesforce Data Cloud that allows you to get direct access to your Partner Data.

Let’s see how this works:

General Data Storage in Data Cloud

Data Source > Data Stream > DSO > DLO

As we’ve explained above, when you want to ingest data from a Data Source, a Data Stream is configured in Data Cloud. This creates a temporary, non-materialised object called a Data Source Object (DSO).

When this happens, a Data Lake Object is automatically created off-core, on the Data Lake of Data Cloud, as soon as you configure the DSO. The DLO is a reference to that physically stored, real data.

That physical data is stored in the Data Cloud Data Lake, which is not on Salesforce core; it’s a partner data lake.

BYOL Partner Data Storage in Data Cloud

Ok, so that’s the usual behaviour for most data ingestion.

However, you may have important business data stored in a partner Data Lake, such as Snowflake, where there is already a “real, physical copy” of your company data. There is no need to create a data ingestion, DSO, etc. and store it again in Data Cloud (e.g: due to governance, security, etc).

What BYOL means is that you can directly query the data stored in Snowflake, without having to create a copy in Data Cloud.

The access is near real-time and zero-copy.

Zero-copy means the data is never copied into Data Cloud: you query it directly from Snowflake, and you can then map it, harmonize it and use it in insights, etc. as if it were native to Data Cloud.

Here are the 4 BYOL Data Federation partners currently available:

  • Amazon Redshift
  • Databricks
  • Google BigQuery
  • Snowflake

Unstructured Data in Data Cloud

Unstructured data is data that does not have a consistent and specific format. It is therefore data that cannot be stored in a relational database.

Examples of unstructured data:

  • an audio file
  • chat transcripts of sales agents’ conversations with customers
  • a PDF file
  • Knowledge Articles

How can you leverage this type of data, you might ask?

Well, Salesforce Data Cloud allows you to ingest unstructured data so that you can lay the foundation for your generative AI, analytics, etc.

The recommended steps for a Use Case using unstructured data are:

  1. External, unstructured data is connected to Data Cloud. There are 2 ways:
    external blob storage –> this creates a UDLO (Unstructured Data Lake Object)
    data stream from a DC connector –> this creates a structured Data Lake Object
  2. Configure a search index for the UDMOs or DMOs.
    This search index allows you to search and query that data.
  3. Then, you perform vector search queries from different apps such as Tableau, Prompt Builder or Einstein Copilot.

Data Ingestion Timings

Apart from what data to ingest, it’s important to decide how much data to ingest and how often this will happen.

Below is a table with the current Data Ingestion Timings, as shared by Salesforce documentation here:
https://help.salesforce.com/s/articleView?id=sf.c360_a_data_stream_schedule.htm&type=5

B2C Commerce

| Data Stream | Data Type | Refresh Mode | Refresh Schedule | Lookback |
|---|---|---|---|---|
| BundleProduct | Other | Full Refresh | Daily | 30 days |
| GoodsProduct | Other | Full Refresh | Daily | 30 days |
| MasterProduct | Other | Full Refresh | Daily | 30 days |
| ProductCatalog | Other | Full Refresh | Daily | 30 days |
| ProductCategory | Other | Full Refresh | Daily | 30 days |
| ProductCategoryProduct | Other | Full Refresh | Daily | 30 days |
| ProductOption | Other | Full Refresh | Daily | 30 days |
| ProductOptionValue | Other | Full Refresh | Daily | 30 days |
| ProductProductOption | Other | Full Refresh | Daily | 30 days |
| SalesOrder | Engagement | Upsert | Hourly | 30 days |
| SalesOrderCustomer | Profile | Full Refresh | Hourly | 30 days |

Cloud File Storage – AWS S3, GCS, Azure

| Data Stream | Data Type | Refresh Mode | Refresh Schedule | Lookback |
|---|---|---|---|---|
| AWS S3 | N/A | Upsert or Full Refresh | 5 min + | None |
| GCS | N/A | Upsert or Full Refresh | 5 min + | None |
| Azure | N/A | Upsert or Full Refresh | 5 min + | None |

Salesforce Marketing Cloud

| Data Stream | Data Type | Refresh Mode | Refresh Schedule | Lookback |
|---|---|---|---|---|
| Standard | Standard | Upsert or Full Refresh | Daily | 90 days |
| Standard Data Engagements | Standard Data Engagements | Upsert or Full Refresh | Hourly | 90 days |
| Data Extension Full Extract | Data Extension Full Extract | Upsert or Full Refresh | Daily | 90 days |
| Data Extension Delta Extract | Data Extension Delta Extract | Upsert or Full Refresh | Hourly | 90 days |

Salesforce CRM

| Data Stream | Refresh Mode | Refresh Schedule | Lookback | Details |
|---|---|---|---|---|
| SF CRM | Full Refresh | Every other week | No Limit | On top of the bi-weekly refresh, a full refresh also runs when: a Data Stream is created; a column is added or removed; >= 600k deletion records are detected |
| SF CRM | Upsert (incremental) | Every 10 mins | No Limit | Starts after a full refresh; if a field is deleted, it's processed in the next incremental refresh; refresh times may vary |

Salesforce MC Personalization

| Data Stream | Data Type | Refresh Mode | Refresh Schedule | Lookback |
|---|---|---|---|---|
| Users | Profile | Upsert | 15 mins | 0 days |
| Catalog Events | Events / Engagement | Insert | 2 mins | 0 days |
| Cart & CartLineItem events | Events / Engagement | Insert | 2 mins | 0 days |
| Order & OrderLineItem Events | Events / Engagement | Insert | 2 mins | 0 days |
| Custom engagement events | Events / Engagement | Insert | 2 mins | 0 days |

Ingestion API

| Data Stream | Data Type | Refresh Mode | Refresh Schedule | Lookback |
|---|---|---|---|---|
| Bulk Ingest API | Batch | Upsert | Daily, Weekly, Monthly | N/A |
| Streaming Ingest API | Streaming | Upsert | 15 mins | N/A |

Mobile and Web SDK

| Data Stream | Data Type | Refresh Schedule |
|---|---|---|
| Engagement SDK | Mobile | User Profiles: Hourly; Engagement: 15 mins |
| Salesforce Interactions SDK | Web | User Profiles: Hourly; Engagement: 15 mins |

Mulesoft (ingestion API)

| Data Stream | Data Type | Refresh Schedule |
|---|---|---|
| Ingestion API Connector | External System | 15 minutes |

Data Object Type Categories

When you set up a Data Stream in Data Cloud, you are required to select an Object Type Category for that Data Stream.

There are 3 Object Type Categories:

    1. PROFILE
      This data is geared towards segments, populations, etc. Any data you will want to segment by, use as the base for your segmentation, etc.
      Examples:
      Profile Attributes
      Party Identification
    2. ENGAGEMENT
      This category covers time-oriented data. An Event Time field is required for this category.
      Examples:
      Order
      Case
    3. OTHER
      This type of data does not fit into either of the other two categories. It can support either of them; for instance, engagement-like data that does not have an immutable date field.
      Examples:
      Price Book
      Catalog
      Lookups

Data Field Types

See below a detailed list of the currently available Data Field Types for Data Streams and data ingestion:

  • Text
    Stores a string of any type of text. It accepts both single-byte and multibyte characters, when supported by the locale.
    - Do not use colons (:) or single quotes (') in field values.
    - For text fields like file name, directory name, etc., ensure ASCII is used; UTF-8 might lead to errors.
    - Strings that contain only quotes "" or contain no value are treated as "empty".
  • Boolean
    Accepted values are: "true", "false", blank.
    This type of field cannot be used as a Primary Key, Record Modified field, etc.
  • Email
    Stores an email address value.
    It works exactly like the Text field type in Data Cloud, so it accepts any text value that can be inserted into a Text. Data Cloud will not validate the format of this field type's value.
  • Number
    Stores number values with a fixed scale of 18 and a precision of 38.
    Precision = total number of digits, irrespective of the decimal point location.
    Scale = total number of digits to the right of the decimal point.
    Important: a NULL value is stored if a number value is non-numerical or does not fit into the field's allowed range.
  • Percent
    Stores a percentage value. It works like the Number data type, accepting only numeric values.
    - A Percent data type can be used as a Primary Key, but not as an engagement date field or internal organization field.
    - Formula functions that accept numeric values can also take percent values.
    - As it's modeled on the Number data type, values such as 25% or 19.2% are not accepted.
  • Phone
    Stores a phone number.
    It works exactly like the Text field type in Data Cloud, so it accepts any text value that can be inserted into a Text. Data Cloud will not validate the format of this field type's value.
    - A Phone data type can be used as a Primary Key, but not as an engagement date field.
    - Formula functions that accept Text data type values can also take Phone data type values.
  • URL
    Stores URL values.
    It works exactly like the Text field type in Data Cloud, so it accepts any text value that can be inserted into a Text.
    - Data Cloud will not validate, parse or interpret the URL value.
    - A URL data type can be used as a Primary Key, but not as an engagement date field.
    - Formula functions that accept Text data type values can also take URL data type values.
  • Date
    Stores a calendar date value, excluding the time and/or time zone parts.
    If the data source includes a time or time zone part, the time part is ignored when ingested.
    - If a value cannot be parsed as a date, a NULL value is stored.
    - Example of dropping the time part: 2024-11-21 10:30:00 UTC --> 2024-11-21
  • Datetime
    Stores a calendar date and time of day.
    For the value to be valid, the date must include both the time and time zone parts. If these are not included, 00:00:00 UTC is used as the default value.
    - If a value cannot be parsed as a datetime, a NULL value is stored.
    - Abbreviated time zones such as CST are not valid; they are not supported by the ISO 8601 standard.
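Because Datetime values need an explicit time zone offset (abbreviations like CST are not valid), it can help to normalise timestamps before ingestion. A minimal Python sketch, with an illustrative raw value:

from datetime import datetime, timezone

# Hypothetical raw timestamp coming from a source system without an offset
raw = "2024-11-21 10:30:00"

# Parse it and attach an explicit UTC offset so it can be ingested as a valid Datetime
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)

print(parsed.isoformat())  # 2024-11-21T10:30:00+00:00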

Formula Fields

During the Data Stream configuration process, you might want to add extra fields.

Formula Fields are custom fields you add to the Data Stream, either hard-coded or derived from the values of other fields.

Formulas available follow this statement pattern:
CONCAT(value1, value2, …)
ISEMPTY(value)
IF(condition,resultIfTrue,resultIfFalse)
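For example, a hypothetical formula that builds a full name and falls back to a default when the last name is empty (the field names are illustrative, and exact quoting rules should be checked in the formula field reference):
IF(ISEMPTY(LastName), "Unknown", CONCAT(FirstName, " ", LastName))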

How to create a Formula Field

  1. Click on New Data Stream
  2. On that page, click New Formula Field
  3. Provide the Field Label (this is the display name)
  4. Provide the Field API name (the programmatic reference)
  5. Provide the Field Return Type (e.g: boolean, text, phone, number, percent, etc)
  6. Use the Syntax Editor to write your formula
  7. Validate both the Syntax and the Return Values using the Formula Testing Panel

Source Fields

It might be the case that you configured a new Data Stream and later on, you need to add new source fields from the raw Data Source.

You can do so at any time. Follow these steps:

  1. Go to the tab “Data Streams” and select your Data Stream
  2. Click on Add Source Fields
  3. For GCS and AWS S3 data streams, you can decide whether to Add Fields Manually or Add Discovered Fields.
  4. If the fields are added manually, enter the name of the file where the source schema is and click Verify. Data Cloud will show you all the available source fields.
  5. For Discovered Fields, Data Cloud will show you a Field Last Detected column so you can assess when a field was last detected and whether it’s relevant for you.
  6. Review the suggested Data Type for each of the fields you’re adding.
  7. Save.

Primary Keys and Data Lineage Fields

When working with Data Cloud, it’s crucial to work with the correct Primary Keys for each Data Source.

Deciding the PKs is part of the Data Discovery phase that should be carried out initially, before you even activate Data Cloud.

A Primary Key is the smallest field (or combination of fields) that uniquely identifies a row in an object.

E.g:
Name: John
Customer ID: 001

Name: John
Customer ID: 002

Were it not for the field Customer ID, which differs for each row, it would be impossible to know which John we’re talking about.

Primary Keys can also be Composite Keys, which means you identify unique rows using more than one column or field, combined into a single concatenated unique field.

E.g:
OrderID: 001
OrderLineItem: PFG

OrderID: 001
OrderLineItem: TDE

Composite Key: OrderID_OrderLineItem –> 001_PFG

If we’re looking at the Orders and different products (normally called “Line Items”) that John bought, the field OrderID would not be enough for us to uniquely identify an item.

We need the extra OrderLineItem field as Primary Key, so the Composite Key is a combination of the OrderID and the OrderLineItem fields, concatenated.

How to choose a Primary Key

Here are some best practices and things to avoid when choosing a Primary Key field:

  • Uniqueness:
    The PK must be a unique value.
  • Minimality:
    Try to use as few fields as possible. The more fields/values, the more confusing it can get.
  • Stability:
    If the PK changes over time, it may lead to issues. Choose a type of value that will be permanent.
  • Broad Scope:
    If you sell books and your PK is BookID, but you might also sell videogames in the future, BookID doesn’t seem like the best field for a Primary Key.
  • Data Available at time of entry:
    If, in an airline database, your Primary Key is a Composite Key of both FlightID and Departure_Date but the last field does not get populated when a client first registers, you’ll have issues.
  • Special Value Cases:
    Always ask about all the possible values that the field can get. E.g: are there flights without a FlightID value? Is it possible for a book not to have a BookID in our data set?
  • Not-Nullable Fields:
    The field or fields you choose to use as a PK must be not-nullable, so they always contain a value.
  • Composite Keys:
    If there is no single field that uniquely identifies a row, you may concatenate (with a formula) the values from two fields to create what’s called a Composite Key.
    E.g:
    CustomerID_EmailAddress
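    For instance, a hypothetical formula field for that composite key (field names are illustrative):
    CONCAT(CustomerID, "_", EmailAddress)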

Data Lineage Fields

When working in Data Cloud, you will have many levels of sources, connected objects, etc. Data Cloud comes with some fields that help identify the source or lineage of the data:

  1. Data Source
    This is a reference ID, also called external source ID, that identifies the name of the source.
    E.g: SFMC_6650912
    This means the source is SFMC and the Org ID is the number concatenated.
  2. Data Source Object
    This is a reference ID to identify which object the data comes from.
    E.g: SFMC_6650912_Lead
    This means the data is from SFMC, the Org ID is 6650912 and the source object is Lead.
  3. Internal Organization
    This is a reference ID to the Business Unit or any other organizational unit ID that the source has.
    In our example of SFMC, it would be the Business Unit MID.

Data Transforms

There are 2 main methods for Data Transformation in Salesforce Data Cloud:

  1. Batch Transformation
  2. Streaming Transformation

Batch Data Transforms

Batch Data Transforms allow you to set up an ongoing series of data operations to update and transform the data of a target Data Lake Object (DLO).

  • It works with a Visual Editor where you have nodes and drag and drop features to set up your transforms.
  • It allows you to perform a Full Refresh of the data, in bulk.
  • Once the first bulk run is over, it can either run Manually or On a Schedule.
  • It can have both a DLO or a DMO as source.
  • It accepts multiple source objects (e.g: joins).

To Create a Batch Data Transform:

  1. Go to the tab “Data Transforms” from Data Cloud
  2. Click New
  3. Select Batch Data Transform
  4. Use the visual editor to create your transform

Streaming Data Transforms

Unlike Batch Data Transforms, Streaming Data Transforms allow you to transform data in near real-time, as a continuous streaming process.

Streaming Data Transforms work like this:

  1. Read a record in source Data Lake Object 1.
  2. Reshape or transform the record data.
  3. Write one or more records to Data Lake Object 2.

Visually, from left to right, the order of steps can be explained like this:
Data Stream –> Source DLO –> Data Transform –> Target DLO

  • Streaming Data Transforms work with one record at a time.
  • They can only be used with DLOs, not DMOs.
  • They transform the data as it gets ingested.

To create a Streaming Data Transform, follow these steps:

  1. First, create a Target DLO: Data Lake Objects tab > New
  2. Enter Name and API Name.
  3. Select Category for the DLO.
  4. Once Active, go to Data Transforms tab > New
  5. Fill in required fields (e.g: label, target DLO, etc).
  6. In the Expression window, insert the SQL statement of your transformation.
  7. Click on Check Syntax to validate and save your data transform.
  8. The Streaming Data Transform starts immediately after being saved.
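As an illustrative sketch only (the DLO API names, field names and exact SQL function support below are hypothetical and should be verified in your org), the SQL statement of a streaming transform could look roughly like this:

SELECT
  src.order_id__c AS order_id__c,
  UPPER(src.country_code__c) AS country_code__c,
  CONCAT(src.order_id__c, '_', src.line_item_id__c) AS order_line_key__c
FROM Orders_Raw__dll AS src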

Summary – Data Cloud Ingestion

Data Cloud Data Ingestion is probably one of the most crucial aspects of using Data Cloud.

Here’s a recap of everything you need to account for when setting it up:

  1. Define Data Strategy
    Consider consent, data spaces, data sources, Primary Keys, connectors needed, etc.
  2. Select Data Sources
    Configure connectors or authenticate a new data source file, etc.
  3. Create Data Streams
    Choose a pre-defined bundle or select objects manually.
  4. Confirm Data Source Object Schema
    Data types, Primary keys, API names, etc.
  5. Apply Data Transforms
    Create Formula fields if needed, hardcoded or derived.
  6. Assign Data Space for the target DLO
    Assign a Data Space and decide if filters are needed based on data strategy.
  7. Configure Updates to Data Stream
    Refresh mode, schedule, etc.