Chapter 4. Extracting intelligence from content

This chapter covers

Architecture for integrating various types of content
A more detailed look at blogs, wikis, and message boards
A working example of extracting intelligence from unstructured text
Extracting intelligence from different types of content

Content, as used in this chapter, is any item that has text associated with it. The text can take the form of a title and a body, as with articles; keywords associated with a classification term; questions and answers on message boards; or simply a title attached to a photo or video. Content can be produced professionally by the site provider, contributed by users (commonly known as user-generated content), or harvested from external sites via web crawling.¹

¹ Web crawling is covered in chapter 6.
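To make this definition concrete, here is a minimal Python sketch of a single record type that can hold each kind of content described above. The names `ContentItem`, `ContentSource`, and `text()` are hypothetical and do not come from this book's code. The sketch assumes that every item has at least a title, and that the body and keywords are optional.

```python
from dataclasses import dataclass, field
from enum import Enum


class ContentSource(Enum):
    """Hypothetical labels for where a content item came from."""
    PROFESSIONAL = "professional"      # written by the site provider
    USER_GENERATED = "user_generated"  # contributed by site users
    CRAWLED = "crawled"                # harvested from external sites


@dataclass
class ContentItem:
    """Any item that has text associated with it (hypothetical model)."""
    title: str
    source: ContentSource
    body: str = ""
    keywords: list[str] = field(default_factory=list)

    def text(self) -> str:
        """Join every non-empty text field into one string for analysis."""
        parts = [self.title, self.body, *self.keywords]
        return " ".join(p for p in parts if p)


# An article has a title and a body.
article = ContentItem("Collective intelligence", ContentSource.PROFESSIONAL,
                      body="Users add value.")
# A photo may carry nothing but a title.
photo = ContentItem("Sunset over the bay", ContentSource.USER_GENERATED)
# A classification term is described by its keywords.
term = ContentItem("Databases", ContentSource.PROFESSIONAL,
                   keywords=["sql", "index"])

print(article.text())  # Collective intelligence Users add value.
print(photo.text())    # Sunset over the bay
print(term.text())     # Databases sql index
```

Treating every item as "something with text" lets the same downstream analysis run over articles, photos, and classification terms alike. Only the amount of available text differs from one type to the next.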

Content is the fundamental building block for developing applications. This chapter provides background on integrating content into your application and analyzing it. Working through the example developed in section 4.3 will be helpful, because it illustrates how intelligence can be extracted by analyzing content.

4.1. Content types and integration

4.2. The main CI-related content types

4.3. Extracting intelligence step by step

4.4. Simple and composite content types

4.5. Summary

4.6. Resources