chapter eight

8 U-SQL for complex analytics

 

This chapter covers:

  • Creating reusable data access objects with U-SQL views, table-valued functions, and tables
  • Staging data for reuse with U-SQL tables
  • Using window functions for aggregation queries
  • Adding custom inline C# functions to U-SQL scripts

In the last chapter, you learned how to create an Azure Data Lake Analytics (ADLA) account and how to build and run simple jobs. In this chapter, you’ll build on that knowledge by writing more complex queries. Because U-SQL scripts compile into C# programs, you can use many features of C# within U-SQL expressions.

  • You’ll compare methods of structuring U-SQL for reuse, including creating indexed data stores in the U-SQL Catalog.
  • You’ll use features of the C# language to replace and extend features of SQL.
  • You’ll learn how reusing the outputs of previous jobs lets you plan for repeated use of U-SQL scripts and benefit from earlier executions.

Let’s get started by prepping some data for repeated use.

8.1  Data Lake Analytics Catalog

ADLA uses its attached Data Lake storage for more than reading and writing files during the current U-SQL job. It also offers a structured interface, a database catalog, for querying reusable rowsets. The catalog provides several features that improve the experience of creating and running ADLA jobs, including simpler U-SQL queries, simpler data access, and data that is loaded once for reuse. The following subsections cover each of these.

8.1.1  Simplifying U-SQL queries

8.1.2  Simplifying data access

8.1.3  Loading data for reuse
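The practical difference between reading through a view and staging data in a table can be modeled in a few lines of plain Python. This is not U-SQL and not how ADLA is implemented internally; it is a minimal sketch, assuming a hypothetical `ExtractCounter` that stands in for an expensive EXTRACT over raw files. A view stores only its query, so every read re-scans the source, while a staged table runs the query once at load time and serves the stored rows afterward.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class ExtractCounter:
    """Hypothetical stand-in for an EXTRACT over raw files; counts its runs."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self.calls = 0

    def __call__(self) -> List[Row]:
        self.calls += 1
        return [dict(r) for r in self._rows]


class View:
    """Stores only the query; every read re-executes it against the source."""

    def __init__(self, query: Callable[[], List[Row]]) -> None:
        self._query = query

    def read(self) -> List[Row]:
        return self._query()


class StagedTable:
    """Runs the query once at load time and serves the stored rows afterward."""

    def __init__(self, query: Callable[[], List[Row]]) -> None:
        self._rows = query()

    def read(self) -> List[Row]:
        return list(self._rows)


# Hypothetical sample data standing in for raw request logs.
raw = [
    {"region": "east", "duration": 120},
    {"region": "west", "duration": 45},
    {"region": "east", "duration": 30},
]
extract = ExtractCounter(raw)


def long_requests() -> List[Row]:
    # The shared query: keep extracted rows with a duration over 40.
    return [r for r in extract() if r["duration"] > 40]


view = View(long_requests)
for _ in range(3):
    view.read()
view_calls = extract.calls  # source scanned once per read

extract.calls = 0
table = StagedTable(long_requests)
for _ in range(3):
    table.read()
table_calls = extract.calls  # source scanned only when the table was loaded

print(view_calls, table_calls)  # prints "3 1"
```

The trade-off is freshness: the staged table does not see files that arrive after it was loaded and must be reloaded, whereas the view always reflects the current source files.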

8.2  Window functions
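A window function computes a value for each row from a set of related rows (the window) without collapsing those rows the way GROUP BY does. To make that behavior concrete outside of ADLA, the following plain Python sketch models two common forms: `ROW_NUMBER() OVER (PARTITION BY region ORDER BY duration DESC)` and `SUM(duration) OVER (PARTITION BY region)`. It illustrates the semantics only; the column names and sample rows are hypothetical, and this is not how the U-SQL engine evaluates windows.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical sample rows.
rows: List[Row] = [
    {"region": "east", "query": "a", "duration": 120},
    {"region": "west", "query": "b", "duration": 45},
    {"region": "east", "query": "c", "duration": 30},
    {"region": "east", "query": "d", "duration": 300},
    {"region": "west", "query": "e", "duration": 60},
]


def apply_windows(data: List[Row]) -> List[Row]:
    """Model ROW_NUMBER() and SUM() OVER (PARTITION BY region)."""
    # Group rows into partitions by the PARTITION BY column.
    partitions: Dict[Any, List[Row]] = defaultdict(list)
    for r in data:
        partitions[r["region"]].append(r)

    result: List[Row] = []
    for part in partitions.values():
        # SUM(duration) OVER (PARTITION BY region): one total per partition,
        # repeated on every row of that partition.
        total = sum(r["duration"] for r in part)
        # ROW_NUMBER() ... ORDER BY duration DESC: number rows within the partition.
        ordered = sorted(part, key=lambda r: r["duration"], reverse=True)
        for n, r in enumerate(ordered, start=1):
            result.append({**r, "row_num": n, "region_total": total})
    # Unlike GROUP BY, every input row is still present in the output.
    return result


out = apply_windows(rows)
for r in out:
    print(r["region"], r["query"], r["row_num"], r["region_total"])
```

Keeping only the rows where `row_num` is 1 would give the longest request per region, while the original rows and their per-region totals remain available side by side.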

8.3  Local C# functions

8.4  Exercises

8.4.1  Exercise 1

8.4.2  Exercise 2

8.5  Summary