FREE Data Engineering on Microsoft Azure (DP-203) Questions and Answers
You are designing the folder structure for an Azure Data Lake Storage Gen2 container.
Users will query the data by using several services, including Azure Databricks and Azure Synapse Analytics serverless SQL pools. The data will be secured by subject area. Most queries will include data from the current year or the current month.
Which folder structure should you recommend to support fast queries and simplified folder security?
There is an important reason to put the date at the end of the directory structure. With POSIX permissions, you can easily lock down specific regions or subject matters to particular users or groups. With the date structure in front, restricting a security group to only the UK data or to certain planes would require a separate permission on numerous directories under every hour directory. Putting the date structure first would also cause the number of directories to grow exponentially over time.
In IoT workloads, a great deal of data can land in the data store, spanning numerous products, devices, organizations, and customers. The directory layout must be planned in advance for the sake of organization, security, and efficient processing of the data by downstream consumers. The following layout could serve as a general template:
{Region}/{SubjectMatter(s)}/{yyyy}/{mm}/{dd}/{hh}/
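As a minimal sketch of how a writer process might produce paths that follow this template, the Python below formats a region, a subject, and a timestamp into the layout. The function name build_lake_path and the sample values are hypothetical and chosen only for illustration; they are not part of any Azure SDK.

```python
from datetime import datetime


def build_lake_path(region: str, subject: str, ts: datetime) -> str:
    """Return a folder path with the security scopes first and the date parts last."""
    # Region and subject lead the path so that a single permission on
    # "{region}/{subject}" covers every year/month/day/hour folder beneath it.
    return f"{region}/{subject}/{ts:%Y}/{ts:%m}/{ts:%d}/{ts:%H}/"


# Hypothetical example values: UK region, Planes subject, 7 March 2024 at 09:00.
example = build_lake_path("UK", "Planes", datetime(2024, 3, 7, 9))
print(example)  # UK/Planes/2024/03/07/09/
```

Because the date components come last, a query for the current month only has to read the folders under one {yyyy}/{mm} prefix per subject.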
You need to design a dedicated SQL pool in Azure Synapse Analytics that meets the following requirements:
✑ Can return an employee record from a given point in time.
✑ Maintains the latest employee information.
✑ Minimizes query complexity.
How should you model the employee data?
A Type 2 SCD supports versioning of dimension members. Because the source system often does not store versions, the data warehouse load process detects and manages changes in the dimension table. In this case, the dimension table must use a surrogate key to provide a unique reference to a particular version of the dimension member. The table also includes columns that define the date range of validity of each version, such as StartDate and EndDate, and possibly a flag column, such as IsCurrent, that makes it easy to filter for current dimension members.
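The Type 2 pattern described above can be illustrated with a small in-memory sketch. This is an assumption-laden illustration in Python, not how a dedicated SQL pool loads data: the EmployeeRow class, the apply_change and as_of helpers, and the sample employee are all hypothetical, and the field names simply mirror StartDate, EndDate, and IsCurrent.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class EmployeeRow:
    employee_sk: int            # surrogate key: unique per version
    employee_id: str            # business key from the source system
    title: str                  # the tracked attribute
    start_date: date            # StartDate (inclusive)
    end_date: Optional[date]    # EndDate (exclusive); None means still open
    is_current: bool            # IsCurrent flag


def apply_change(rows: List[EmployeeRow], employee_id: str,
                 title: str, effective: date) -> List[EmployeeRow]:
    """Expire the current version, if any, and append a new current version."""
    updated = []
    for row in rows:
        if row.employee_id == employee_id and row.is_current:
            row = replace(row, end_date=effective, is_current=False)
        updated.append(row)
    next_sk = max((r.employee_sk for r in rows), default=0) + 1
    updated.append(EmployeeRow(next_sk, employee_id, title, effective, None, True))
    return updated


def as_of(rows: List[EmployeeRow], employee_id: str, when: date) -> Optional[EmployeeRow]:
    """Return the version that was valid on the given date, or None."""
    for row in rows:
        if (row.employee_id == employee_id and row.start_date <= when
                and (row.end_date is None or when < row.end_date)):
            return row
    return None


# Hypothetical history: one promotion on 1 June 2023.
dim = apply_change([], "E100", "Analyst", date(2022, 1, 1))
dim = apply_change(dim, "E100", "Senior Analyst", date(2023, 6, 1))
print(as_of(dim, "E100", date(2022, 12, 31)).title)  # Analyst
print([r.title for r in dim if r.is_current])        # ['Senior Analyst']
```

The point-in-time lookup needs only a range filter on one table, and the current record needs only the flag, which is why this model keeps query complexity low.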
You build a data warehouse in an Azure Synapse Analytics dedicated SQL pool.
Analysts write a complex SELECT query that contains multiple JOIN and CASE statements to transform data for use in inventory reports. The inventory reports will use the data and additional WHERE parameters that depend on the report. The reports will be produced once daily.
You need to implement a solution to make the dataset available for the reports. The solution must minimize query times.
What actions should you take?
Materialized views for dedicated SQL pools in Azure Synapse provide a low-maintenance way to get fast performance for complex analytical queries without any query changes.
When result set caching is enabled, the dedicated SQL pool automatically caches query results in the user database for repeated use. Subsequent executions of the same query can then get results directly from the persisted cache, so no recomputation is needed. Result set caching improves query performance and reduces compute resource usage. In addition, queries that use a cached result set do not use any concurrency slots and therefore do not count against existing concurrency limits.
You plan to implement an Azure Data Lake Storage Gen2 account.
You need to ensure that the data lake remains accessible if a data center in the primary Azure region fails. The solution must minimize costs.
Which type of replication should you use for the storage account?
Zone-redundant storage (ZRS) copies your data synchronously across three Azure availability zones in the primary region.
You have an Azure Synapse Analytics workspace named WS1 that contains an Apache Spark pool named Pool1.
You plan to create a database named DB1 in Pool1.
You need to ensure that when tables are created in DB1, they are immediately available to the built-in serverless SQL pool as external tables.
Which format should you use for the tables in DB1?
A serverless SQL pool can automatically synchronize metadata from Apache Spark. A serverless SQL pool database is created for each database that exists in serverless Apache Spark pools. For each Spark external table that is based on Parquet or CSV and located in Azure Storage, an external table is created in the serverless SQL pool database.
You have an Azure Data Lake Storage Gen2 container that contains 100 TB of data.
You need to ensure that the data in the container is available for read workloads in a secondary region if an outage occurs in the primary region. The solution must minimize costs.
Which type of data redundancy should you use?
Geo-redundant storage (with GRS or GZRS) replicates your data to another physical location in the secondary region to protect against regional outages.
However, that data can be read only if the customer or Microsoft initiates a failover from the primary to the secondary region. When you enable read access to the secondary region (RA-GRS or RA-GZRS), your data is always available to be read, even if the primary region becomes unavailable.
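The reasoning behind both redundancy questions (meet the availability requirement, then minimize cost) can be sketched as a small lookup. The capability flags below reflect how the Azure Storage redundancy options are defined, but the cost_rank values are only an assumed relative ordering for illustration, not real pricing, and the Redundancy type and cheapest function are hypothetical names.

```python
from typing import NamedTuple, Optional


class Redundancy(NamedTuple):
    name: str
    zone_redundant: bool        # copies spread across availability zones in the primary region
    geo_replicated: bool        # copy kept in a secondary region
    secondary_readable: bool    # secondary can be read without a failover
    cost_rank: int              # ASSUMED relative ordering only, not actual prices


OPTIONS = [
    Redundancy("LRS", False, False, False, 1),
    Redundancy("ZRS", True, False, False, 2),
    Redundancy("GRS", False, True, False, 3),
    Redundancy("RA-GRS", False, True, True, 4),
    Redundancy("GZRS", True, True, False, 5),
    Redundancy("RA-GZRS", True, True, True, 6),
]


def cheapest(zone: bool = False, geo: bool = False,
             read_secondary: bool = False) -> Optional[str]:
    """Return the lowest-ranked option that satisfies every requested capability."""
    matches = [
        o for o in OPTIONS
        if (not zone or o.zone_redundant)
        and (not geo or o.geo_replicated)
        and (not read_secondary or o.secondary_readable)
    ]
    return min(matches, key=lambda o: o.cost_rank).name if matches else None


# Data center failure in the primary region, lowest cost:
print(cheapest(zone=True))            # ZRS
# Read workloads in a secondary region during a primary outage, lowest cost:
print(cheapest(read_secondary=True))  # RA-GRS
```

Under these assumptions, the two scenarios resolve to ZRS and RA-GRS respectively, which matches the explanations given for each question.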
You have an enterprise-wide Azure Data Lake Storage Gen2 account. The data lake is accessible only through an Azure virtual network named VNET1.
You are building a SQL pool in Azure Synapse that will use data from the data lake.
Your company has a sales team. All members of the sales team are in an Azure Active Directory group named Sales. POSIX controls are used to give the Sales group access to the files in the data lake.
You plan to load data into the SQL pool every hour.
You need to ensure that the SQL pool can load the sales data from the data lake.
What steps should you take?
The managed identity grants permissions to the dedicated SQL pools in the workspace.
Managed identity for Azure resources is a feature of Azure Active Directory. The feature provides Azure services with an automatically managed identity in Azure AD.