Friday 2 August 2019

Azure Data Factory : Pipelines, Activities, Datasets, Linked Services

Pipeline : A data factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task.

Activities : The activities in a pipeline define the actions to perform on your data. For example, a copy activity copies data from a source data store to a sink data store.
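
To make this concrete, here is a minimal sketch of creating and running a pipeline with the azure-mgmt-datafactory Python SDK. Every identifier below (subscription, resource group, factory, service principal, pipeline name) is a placeholder, and exact class and parameter names vary a little between SDK versions, so treat it as a sketch rather than copy-paste code.

    from azure.identity import ClientSecretCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import PipelineResource, WaitActivity

    # Placeholder identifiers - replace with your own values.
    subscription_id = "<subscription-id>"
    rg_name = "<resource-group>"
    df_name = "<data-factory-name>"

    credential = ClientSecretCredential(
        tenant_id="<tenant-id>",
        client_id="<client-id>",
        client_secret="<client-secret>")
    adf_client = DataFactoryManagementClient(credential, subscription_id)

    # A pipeline is a named, logical grouping of activities;
    # here the grouping contains a single Wait activity.
    wait_activity = WaitActivity(name="WaitTenSeconds", wait_time_in_seconds=10)
    pipeline = PipelineResource(activities=[wait_activity])
    adf_client.pipelines.create_or_update(rg_name, df_name, "DemoPipeline", pipeline)

    # Trigger a run of the pipeline and check its status.
    run = adf_client.pipelines.create_run(rg_name, df_name, "DemoPipeline", parameters={})
    print(adf_client.pipeline_runs.get(rg_name, df_name, run.run_id).status)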

Dataset : A dataset is a named view of data that simply points to or references the data you want to use in your activities as inputs and outputs. Datasets identify data within different data stores, such as tables, files, folders, and documents. For example, an Azure Blob dataset specifies the blob container and folder in Blob storage from which the activity should read the data.
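
As a sketch, an Azure Blob dataset that points at a particular container and folder could look like the following with the same Python SDK. The linked service name "AzureStorageLinkedService" is an assumed name (it is created in the next section), the path and file name are placeholders, and dataset parameters differ slightly between SDK versions; adf_client, rg_name and df_name come from the pipeline sketch above.

    from azure.mgmt.datafactory.models import (
        DatasetResource, AzureBlobDataset, LinkedServiceReference)

    # Points at blobs under adf-demo/input in the storage account behind the
    # "AzureStorageLinkedService" linked service (created in the next section).
    blob_dataset = DatasetResource(properties=AzureBlobDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference",
            reference_name="AzureStorageLinkedService"),
        folder_path="adf-demo/input",
        file_name="input.txt"))

    adf_client.datasets.create_or_update(
        rg_name, df_name, "InputBlobDataset", blob_dataset)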

Linked Service : Before you create a dataset, you must create a linked service to link your data store to the data factory. Linked services are much like connection strings: they define the connection information Data Factory needs to connect to external resources.
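
Here is a matching sketch of the Azure Storage linked service that the blob dataset above refers to. The connection string is a placeholder (in practice it would come from Key Vault or configuration), and the class names again assume a recent azure-mgmt-datafactory version.

    from azure.mgmt.datafactory.models import (
        LinkedServiceResource, AzureStorageLinkedService, SecureString)

    # The connection string is what Data Factory uses at runtime to reach the account.
    storage_conn = SecureString(
        value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>")

    storage_ls = LinkedServiceResource(
        properties=AzureStorageLinkedService(connection_string=storage_conn))
    adf_client.linked_services.create_or_update(
        rg_name, df_name, "AzureStorageLinkedService", storage_ls)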

Here is a sample scenario. To copy data from Blob storage to a SQL database, you create two linked services: Azure Storage and Azure SQL Database. Then, create two datasets: Azure Blob dataset (which refers to the Azure Storage linked service) and Azure SQL Table dataset (which refers to the Azure SQL Database linked service). The Azure Storage and Azure SQL Database linked services contain connection strings that Data Factory uses at runtime to connect to your Azure Storage and Azure SQL Database, respectively. The Azure Blob dataset specifies the blob container and blob folder that contains the input blobs in your Blob storage. The Azure SQL Table dataset specifies the SQL table in your SQL database to which the data is to be copied.
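
Put together, the scenario above might look roughly like this, reusing the storage linked service and blob dataset from the earlier sketches. The SQL connection string, table name and resource names are placeholders, and the AzureSqlTableDataset/SqlSink parameters vary a little across SDK versions.

    from azure.mgmt.datafactory.models import (
        LinkedServiceResource, AzureSqlDatabaseLinkedService, SecureString,
        DatasetResource, AzureSqlTableDataset, LinkedServiceReference,
        CopyActivity, DatasetReference, BlobSource, SqlSink, PipelineResource)

    # 1. Azure SQL Database linked service (placeholder connection string).
    sql_ls = LinkedServiceResource(properties=AzureSqlDatabaseLinkedService(
        connection_string=SecureString(
            value="Server=tcp:<server>.database.windows.net;Database=<db>;"
                  "User ID=<user>;Password=<password>;")))
    adf_client.linked_services.create_or_update(
        rg_name, df_name, "AzureSqlDatabaseLinkedService", sql_ls)

    # 2. Azure SQL Table dataset - the table the data is copied into.
    sql_dataset = DatasetResource(properties=AzureSqlTableDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference",
            reference_name="AzureSqlDatabaseLinkedService"),
        table_name="dbo.EmployeeTarget"))
    adf_client.datasets.create_or_update(
        rg_name, df_name, "OutputSqlDataset", sql_dataset)

    # 3. Copy activity: read from the blob dataset, write to the SQL dataset.
    copy_activity = CopyActivity(
        name="CopyBlobToSql",
        inputs=[DatasetReference(type="DatasetReference",
                                 reference_name="InputBlobDataset")],
        outputs=[DatasetReference(type="DatasetReference",
                                  reference_name="OutputSqlDataset")],
        source=BlobSource(),
        sink=SqlSink())

    # 4. Pipeline grouping the copy activity, then trigger a run.
    adf_client.pipelines.create_or_update(
        rg_name, df_name, "CopyBlobToSqlPipeline",
        PipelineResource(activities=[copy_activity]))
    adf_client.pipelines.create_run(
        rg_name, df_name, "CopyBlobToSqlPipeline", parameters={})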



Friday 26 July 2019

Integration Runtime


Azure Integration Runtime :

The Integration Runtime (IR) is the compute infrastructure that Azure Data Factory uses to provide the following data integration capabilities across different network environments (a short SDK sketch of creating an IR follows this list):

  • Data Flow: Execute a Data Flow.
  • Data movement: Copy data between data stores in a public network and data stores in a private network (on-premises or virtual private network).
  • Activity dispatch: Dispatch and monitor transformation activities running on a variety of compute services such as Azure Databricks, Azure HDInsight, Azure Machine Learning, Azure SQL Database, SQL Server, and more.
  • SSIS package execution: Execute SQL Server Integration Services (SSIS) packages in a managed Azure compute environment.
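
As a rough sketch, a self-hosted integration runtime (used for the private-network data movement above) can be created and its registration keys fetched with the same Python SDK. adf_client, rg_name and df_name are the placeholders from the first sketch, the IR name is made up, and operation names may differ slightly between SDK versions.

    from azure.mgmt.datafactory.models import (
        IntegrationRuntimeResource, SelfHostedIntegrationRuntime)

    # Create (or update) a self-hosted IR definition in the factory.
    ir = IntegrationRuntimeResource(properties=SelfHostedIntegrationRuntime(
        description="IR for on-premises data stores"))
    adf_client.integration_runtimes.create_or_update(
        rg_name, df_name, "OnPremSelfHostedIR", ir)

    # The auth keys are what you paste into the IR installer on the on-premises node.
    keys = adf_client.integration_runtimes.list_auth_keys(
        rg_name, df_name, "OnPremSelfHostedIR")
    print(keys.auth_key1)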

Types of Integration Runtime :

  • Azure
  • Self-hosted
  • Azure-SSIS

An Azure integration runtime is capable of the following (see the sketch after this list):
  • Running Data Flows in Azure.
  • Running the Copy activity between cloud data stores.
  • Dispatching the following transform activities in a public network: Databricks Notebook/Jar/Python activity, HDInsight Hive activity, HDInsight Pig activity, HDInsight MapReduce activity, HDInsight Spark activity, HDInsight Streaming activity, Machine Learning Batch Execution activity, Machine Learning Update Resource activity, Stored Procedure activity, Data Lake Analytics U-SQL activity, .NET custom activity, Web activity, Lookup activity, and Get Metadata activity.
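
By default these run on the auto-resolve Azure IR, but a linked service (and therefore the activities that use it) can also be pinned to a specific Azure IR through its connect_via reference. The sketch below assumes an Azure IR named "WestEuropeAzureIR" already exists in the factory; the names, connection string and SDK parameters are placeholders as before.

    from azure.mgmt.datafactory.models import (
        LinkedServiceResource, AzureStorageLinkedService, SecureString,
        IntegrationRuntimeReference)

    # Pin the linked service to a specific Azure IR instead of the auto-resolve one.
    pinned_ls = LinkedServiceResource(properties=AzureStorageLinkedService(
        connection_string=SecureString(
            value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"),
        connect_via=IntegrationRuntimeReference(
            type="IntegrationRuntimeReference",
            reference_name="WestEuropeAzureIR")))
    adf_client.linked_services.create_or_update(
        rg_name, df_name, "PinnedStorageLinkedService", pinned_ls)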