8 U-SQL for complex analytics

This chapter covers

  • Creating reusable data access objects with U-SQL views, table-valued functions, and tables
  • Staging data for reuse with U-SQL tables
  • Using window functions for aggregation queries
  • Adding custom inline C# functions to U-SQL scripts

In the last chapter, you learned how to create an Azure Data Lake Analytics (ADLA) account and how to build and run simple jobs. In this chapter, you’ll build on that knowledge by writing more complex queries. Because U-SQL scripts compile into C# programs, you can use many C# features within U-SQL expressions.

  • You’ll compare methods of structuring U-SQL for reuse, including creating indexed data stores in the U-SQL Catalog.
  • You’ll use C# language features to replace and extend features of SQL.
  • You’ll see how to reap benefits from previous jobs by reusing their outputs and by running the same U-SQL scripts repeatedly.
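
To make the point about C# features inside U-SQL expressions concrete, here is a minimal sketch. It is not one of the book's listings: the rowset names, column names, sample values, and output path are hypothetical and chosen only for illustration. It builds a small inline rowset and then calls ordinary C# members (string.ToUpper, DateTime.Parse, and the conditional operator) in the SELECT clause.

```usql
// Hypothetical inline rowset, so the sketch needs no input file.
@people =
    SELECT *
    FROM (VALUES
            ("ada lovelace", "1815-12-10"),
            ("alan turing", "1912-06-23")
         ) AS T(name, born);

// Each SELECT expression below is a C# expression.
@result =
    SELECT name.ToUpper() AS upperName,                       // C# string method
           DateTime.Parse(born).Year AS birthYear,            // .NET DateTime parsing
           (name.Length > 11 ? "long" : "short") AS nameSize  // C# conditional operator
    FROM @people;

// Assumed output location in the attached Data Lake store.
OUTPUT @result
TO "/output/csharp_expressions.csv"
USING Outputters.Csv(outputHeader: true);
```

The same pattern should carry over to rowsets extracted from files: any column can be passed through a C# expression, as long as the expression's result type is one that U-SQL supports for columns.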

Let’s get started by prepping some data for repeated use.

Tip

You can find the code listings in the GitHub repository for this book at https://github.com/rnuckolls/azure_storage.

8.1 Data Lake Analytics Catalog

ADLA uses the attached Data Lake store for more than reading and writing files. It offers a structured interface for querying reusable rowsets via a database catalog. The catalog provides a few features for creating and running ADLA jobs, including the following:

8.1.1 Simplifying U-SQL queries

8.1.2 Simplifying data access

8.1.3 Loading data for reuse

8.2 Window functions

8.3 Local C# functions

8.4 Exercises

8.4.1 Exercise 1