chapter four

4 Integrate your data with connectors and pipelines

This chapter covers

Creating datasources with connectors to ingest data into Fusion
Configuring parsers to extract data from various formats
Creating and configuring index pipelines to transform and enrich data
Taking facets further with range facets
Testing and troubleshooting configurations with the index workbench
Creating schedules and running index jobs to keep data fresh

Now that we’ve begun to see what Fusion can do with search, and with analytics and business intelligence to come in later chapters, it is a good time to start thinking about the variety of data you will want to work with.

This chapter focuses primarily on Fusion’s connectors, parsers, scheduler, and index pipelines. You will learn how to create datasources in Fusion that use connectors to pull in data from source systems, and how to keep that data fresh by running index jobs according to schedules you define. As we go through the various types of source systems Fusion can connect to, I hope you will begin thinking of all the data you have spread throughout your enterprise and how you can bring it together to create value for your business.

4.1 Data integration overview

4.2 Ingest diverse data formats with parsers

4.2.1 Apache Tika parser stage

4.3 Common connectors

4.3.1 Other available connectors

4.4 Site search with the Web connector

4.4.1 Create a site search app

4.4.2 Keep data fresh with job triggers and schedules

4.5 Building a data app with the JDBC connector

4.5.1 The JDBC connector

4.5.2 Install a JDBC database

4.5.3 Create a MovieLens Fusion app

4.5.4 Testing configurations with the index workbench

4.5.5 Index job

4.6 ETL with index pipelines

4.6.1 Integrating external data at index time with the REST query index stage

4.6.2 Schema and datatypes

4.6.3 Range facets

4.6.4 Completing your data app