Free Data Engineering on Microsoft Azure (DP-203) Questions and Answers

Question 1

You are designing the folder structure for an Azure Data Lake Storage Gen2 container.
Users will query the data by using several services, including Azure Databricks and Azure Synapse Analytics serverless SQL pools. The data will be secured by subject area. Most queries will include data from the current year or the current month.
Which folder structure should you recommend to support fast queries and simplified folder security?

Accepted Answer

/{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.csv

Answer

The suggested folder organization, /{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}..., is ideal for both quick queries and streamlined security. Placing `SubjectArea` and `DataSource` at the top allows for easy application of security controls (like POSIX ACLs or RBAC) at a high level, securing entire data domains. Partitioning data by `YYYY/MM/DD` (year, month, day) enables query engines to efficiently prune data, significantly speeding up queries that filter by time, especially for current month/year data, as specified in the requirements.
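
As a rough illustration of why this layout helps, the sketch below builds paths in the accepted format and then selects one month of data by path prefix alone. The names `build_path` and `month_prefix` and the sample values ("Sales", "CRM", "orders") are hypothetical, chosen only for this example; real query engines do their own partition pruning.

```python
from datetime import date


def build_path(subject_area: str, data_source: str, file_data: str, day: date) -> str:
    """Build a path in the accepted layout:
    /{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.csv
    """
    y, m, d = f"{day:%Y}", f"{day:%m}", f"{day:%d}"
    return f"/{subject_area}/{data_source}/{y}/{m}/{d}/{file_data}_{y}_{m}_{d}.csv"


def month_prefix(subject_area: str, data_source: str, year: int, month: int) -> str:
    """Folder prefix that covers exactly one month of one data source."""
    return f"/{subject_area}/{data_source}/{year:04d}/{month:02d}/"


# Hypothetical sample files: two days in March 2024 and one in February 2024.
paths = [
    build_path("Sales", "CRM", "orders", date(2024, 3, 1)),
    build_path("Sales", "CRM", "orders", date(2024, 3, 2)),
    build_path("Sales", "CRM", "orders", date(2024, 2, 28)),
]

# A "current month" query only has to look under one folder prefix.
prefix = month_prefix("Sales", "CRM", 2024, 3)
march_files = [p for p in paths if p.startswith(prefix)]
print(march_files)
```

Because the subject area is the first path segment, a single permission set on `/Sales` covers every data source and date folder beneath it.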

Question 2

You must design a dedicated SQL pool for Azure Synapse Analytics that meets the following requirements:
✑ Can provide a record of an employee from a specific time.
✑ Keeps up with employee details.
✑ Reduces the complexity of the query.
How should the employee data be modeled?

Accepted Answer

as a Type 2 slowly changing dimension (SCD) table

Answer

A Type 2 slowly changing dimension (SCD) table is the appropriate modeling technique. It tracks historical changes to dimension attributes, such as employee details, by inserting a new row for each change while preserving the previous versions. This lets you retrieve an employee's record as it appeared at any specific point in time, fulfilling the requirements to 'provide a record of an employee from a specific time' and 'keeps up with employee details'. Query complexity stays low because all versions live in one table, so a point-in-time lookup is a single filter on the row's validity dates rather than a join against a separate history table.
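
The Type 2 pattern can be sketched in a few lines of Python. This is only an in-memory illustration, not how a SQL pool stores data; the field names (`valid_from`, `valid_to`) and the helpers `apply_change` and `as_of` are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class EmployeeVersion:
    employee_id: int
    title: str
    valid_from: date            # first day this version applies
    valid_to: Optional[date]    # None marks the current version


def apply_change(history: List[EmployeeVersion], employee_id: int,
                 new_title: str, change_date: date) -> None:
    """Type 2 update: close the current row, then append a new one."""
    for row in history:
        if row.employee_id == employee_id and row.valid_to is None:
            row.valid_to = change_date
    history.append(EmployeeVersion(employee_id, new_title, change_date, None))


def as_of(history: List[EmployeeVersion], employee_id: int,
          when: date) -> Optional[EmployeeVersion]:
    """Return the version that was valid on the given date, if any."""
    for row in history:
        if (row.employee_id == employee_id
                and row.valid_from <= when
                and (row.valid_to is None or when < row.valid_to)):
            return row
    return None


history: List[EmployeeVersion] = []
apply_change(history, 1, "Analyst", date(2022, 1, 1))
apply_change(history, 1, "Senior Analyst", date(2023, 6, 1))

print(as_of(history, 1, date(2022, 12, 31)).title)  # Analyst
print(as_of(history, 1, date(2024, 1, 1)).title)    # Senior Analyst
```

The point-in-time lookup is one range filter on the validity dates, which is the reason this model keeps queries simple.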

Question 3

You have an enterprise-wide Azure Data Lake Storage Gen2 account. The data lake is accessible only through an Azure virtual network named VNET1.
You are building a SQL pool in Azure Synapse that will use data from the data lake.
Your company has a sales team. All members of the sales team are in an Azure Active Directory group named Sales. POSIX controls are used to give the Sales group access to the files in the data lake.
You plan to load data into the SQL pool every hour.
You must ensure that the SQL pool can load the sales data from the data lake.
What steps should you take?

Accepted Answer

All of the above

Answer

To securely load data from Azure Data Lake Storage Gen2 into an Azure Synapse SQL pool within a VNET, all the listed steps are necessary. First, create a managed identity (A) for the SQL pool, providing it with an Azure Active Directory identity. Next, add this managed identity to the 'Sales' group (B), which has POSIX access to the data lake files, granting the SQL pool the necessary permissions. Finally, use the managed identity as the credentials for the data load process (C) to ensure secure and authorized access to the data lake. Therefore, 'All of the above' is the correct solution.
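
For the last step, one common way to load with a managed identity is the Synapse `COPY INTO` statement with `CREDENTIAL = (IDENTITY = 'Managed Identity')`. The question does not say which load mechanism is used, so treat the sketch below as an assumption. The helper `build_copy_statement`, the table name, and the storage URL are hypothetical placeholders; the code only assembles the statement text and does not connect to anything.

```python
def build_copy_statement(table: str, source_url: str) -> str:
    """Assemble a COPY INTO statement that authenticates with the
    workspace managed identity instead of a key or SAS token.

    Illustration only: values are interpolated without validation,
    so do not pass untrusted input.
    """
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_url}'\n"
        "WITH (\n"
        "    FILE_TYPE = 'CSV',\n"
        "    CREDENTIAL = (IDENTITY = 'Managed Identity')\n"
        ");"
    )


# Hypothetical table and path, not taken from the question.
statement = build_copy_statement(
    "dbo.SalesStaging",
    "https://examplelake.dfs.core.windows.net/sales/2024/03/",
)
print(statement)
```

The load succeeds only if the managed identity already has access to the files, which is what adding it to the Sales group provides.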

Question 4

You have an Azure Data Lake Storage Gen2 container that holds 100 TB of data.
You must ensure that the data in the container is available for read workloads in a secondary region if the primary region has an outage. The solution must minimize costs.
Which type of data redundancy should you use?

Accepted Answer

read-access geo-redundant storage (RA-GRS)

Answer

To keep data available for read workloads in a secondary region during a primary region outage while minimizing costs, read-access geo-redundant storage (RA-GRS) is the best fit. RA-GRS replicates your data to a secondary region and provides read access to that secondary copy at all times. Geo-redundant storage (GRS) and geo-zone-redundant storage (GZRS) also replicate to a secondary region, but the secondary copy cannot be read unless a failover takes place. Read-access geo-zone-redundant storage (RA-GZRS) does allow reads from the secondary region, but it adds zone redundancy in the primary region and costs more than RA-GRS.

Question 5

You plan to set up an Azure Data Lake Storage Gen2 account.
You must ensure that the data lake remains accessible if a data center in the primary Azure region fails. The solution must minimize costs.
Which replication type should you use for the storage account?

Accepted Answer

zone-redundant storage (ZRS)

Answer

To keep the data lake accessible if a data center in the primary Azure region fails, while minimizing costs, use zone-redundant storage (ZRS). ZRS synchronously replicates your data across three Azure availability zones within a single region, so the loss of one data center does not interrupt access. Locally redundant storage (LRS) is cheaper, but it keeps every copy in a single data center and therefore does not meet the requirement. Geo-redundant options such as GRS or GZRS replicate data to a second region, which costs more and is not needed here.
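
The reasoning behind Questions 4 and 5 is the same: filter the redundancy options by the stated requirement, then take the cheapest one that remains. The sketch below encodes that idea. The `cost_rank` values are an assumed relative ordering for illustration, not real prices, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redundancy:
    name: str
    cost_rank: int            # assumed relative ordering, NOT actual pricing
    zone_failure_ok: bool     # stays accessible if one primary data center fails
    secondary_readable: bool  # secondary region readable without a failover


OPTIONS = [
    Redundancy("LRS", 1, False, False),
    Redundancy("ZRS", 2, True, False),
    Redundancy("GRS", 3, False, False),
    Redundancy("RA-GRS", 4, False, True),
    Redundancy("GZRS", 5, True, False),
    Redundancy("RA-GZRS", 6, True, True),
]


def cheapest(need_zone_failure_ok: bool, need_secondary_read: bool) -> str:
    """Return the lowest-ranked option that satisfies both requirements."""
    candidates = [
        o for o in OPTIONS
        if (o.zone_failure_ok or not need_zone_failure_ok)
        and (o.secondary_readable or not need_secondary_read)
    ]
    return min(candidates, key=lambda o: o.cost_rank).name


print(cheapest(need_zone_failure_ok=False, need_secondary_read=True))   # Question 4: RA-GRS
print(cheapest(need_zone_failure_ok=True, need_secondary_read=False))   # Question 5: ZRS
```

Both results depend only on orderings that hold regardless of exact pricing: RA-GRS ranks below RA-GZRS, and ZRS ranks below the geo-zone-redundant options.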
