9 Integrating with Azure Data Lake Analytics

 

This chapter covers

  • Using Azure Cognitive Services to enhance data
  • Building user-defined functions using Visual Studio and C#
  • Connecting to remote data sources

In the last chapter, you learned how to use Azure Data Lake Analytics (ADLA) to build reusable objects. You also used C# to enhance, and sometimes replace, the functions of SQL. In this chapter, you’ll build on that work by adding features that improve your U-SQL scripts. You’ll use the Data Lake store to serve assembly files for ADLA jobs, and you’ll run Azure PowerShell and U-SQL scripts to modify the ADLA and Data Lake environments. You’ll add new data extraction classes to ADLA, along with C# functions for modifying data, and you’ll connect to external providers to pull in even more data with minimal effort. All of this extensibility rests on the compiled nature of ADLA jobs.

The ADLA cluster compiles each submitted U-SQL script into a .NET application and runs it as a new ADLA job, producing a fresh set of code that executes on the cluster nodes assigned to that job. Because every script goes through this compilation step, external code libraries can be pulled into the job at compile time. The compiler already references SQL and .NET assemblies in every job, which is what lets jobs call so many C# and SQL functions. Adding your own custom assembly works the same way: you register the assembly in the Data Lake and reference it in your script, as in the sketch that follows.
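Here’s a minimal U-SQL sketch of that pattern. The database name, assembly path, and extractor class are placeholders rather than objects used later in the chapter; the sections that follow walk through building and registering real assemblies.

// Script 1: register a custom assembly file stored in the Data Lake store (run once).
USE DATABASE master;                          // placeholder database
CREATE ASSEMBLY IF NOT EXISTS [Custom.Formats]
FROM "/Assemblies/Custom.Formats.dll";        // placeholder path to the uploaded DLL

// Script 2: reference the registered assembly in a job script, then use its classes.
REFERENCE ASSEMBLY [Custom.Formats];

@rows =
    EXTRACT Id int,
            Payload string
    FROM "/Raw/input.json"
    USING new Custom.Formats.JsonExtractor();  // placeholder custom extractor

OUTPUT @rows
TO "/Staging/output.csv"
USING Outputters.Csv();

The registration script only needs to run when the assembly file changes; after that, any job script can reference the assembly by name.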

9.1 Processing unstructured data

9.1.1 Azure Cognitive Services

9.1.2 Managing assemblies in the Data Lake

9.1.3 Image data extraction with Advanced Analytics

9.2 Reading different file types

9.2.1 Adding custom libraries with a Catalog

9.2.2 Creating a catalog database

9.2.3 Building the U-SQL DataFormats solution

9.2.4 Code folders

9.2.5 Using custom assemblies

9.3 Connecting to remote sources

9.3.1 External databases

9.3.2 Credentials

9.3.3 Data Source