Chapter 4. Extracting intelligence from content

 

This chapter covers

  • Architecture for integrating various types of content
  • A more detailed look at blogs, wikis, and message boards
  • A working example of extracting intelligence from unstructured text
  • Extracting intelligence from different types of content

In this chapter, content is any item that has text associated with it. This text can take the form of a title and a body, as in articles; keywords associated with a classification term; questions and answers on message boards; or a simple title attached to a photo or video. Content can be developed professionally by the site provider, created by users (commonly known as user-generated content), or harvested from external sites via web crawling.[1]

[1] Web crawling is covered in chapter 6.

Content is the fundamental building block for developing applications. This chapter provides background on integrating and analyzing content in your application. It is worth working through the example developed in section 4.3, which illustrates how intelligence can be extracted by analyzing content.

4.1. Content types and integration

4.2. The main CI-related content types

4.3. Extracting intelligence step by step

4.4. Simple and composite content types

4.5. Summary

4.6. Resources