11 Creating an authorship identification program

 

This chapter covers

  • Writing an authorship identification program using top-down design
  • Learning about refactoring code and why you would do it

In chapter 7, we learned about problem decomposition and top-down design when we wrote our Spelling Suggestions program. Here, we’re going to take top-down design to the next level and solve a much larger problem. We’re still doing the same thing as in chapter 7: dividing a problem into subproblems, and further dividing those subproblems into sub-subproblems as needed. And, just like before, we’re looking to design functions with a small number of parameters that return a meaningful and useful result to their caller. It’s also a good sign if we’re able to design functions that are called by multiple other functions—that helps reduce code repetition!
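To make those design goals concrete, here is a minimal sketch of what a small top-down design can look like in Python. This is not the program we build in this chapter; every function name here (strip_punctuation, get_words, average_length, distinct_ratio, summarize) is hypothetical and chosen only for illustration. Notice that each function takes a single parameter, returns a useful value, and that get_words is called by two different functions.

```python
import string


def strip_punctuation(word):
    """Return word in lowercase with leading/trailing punctuation removed."""
    return word.lower().strip(string.punctuation)


def get_words(text):
    """Return a list of the nonempty, cleaned words in text."""
    words = []
    for token in text.split():
        cleaned = strip_punctuation(token)
        if cleaned != '':
            words.append(cleaned)
    return words


def average_length(text):
    """Return the average length of the words in text (0.0 if none)."""
    words = get_words(text)          # shared helper
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


def distinct_ratio(text):
    """Return (number of distinct words) / (total words), or 0.0 if none."""
    words = get_words(text)          # same helper, reused
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def summarize(text):
    """Top-level function: combine the results of the subproblems."""
    return [average_length(text), distinct_ratio(text)]
```

Reading from the bottom up, summarize is the top-level problem, average_length and distinct_ratio are its subproblems, and get_words and strip_punctuation are sub-subproblems. Because both subproblems call get_words, the word-cleaning logic lives in one place rather than being repeated.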

We’re including this chapter because we want to provide a more authentic example than the Spelling Suggestions problem we solved in chapter 7. We hope the example here is motivating and feels like a real problem that you could imagine wanting to solve yourself.

In this chapter, we’re going to write a program that tries to identify the unknown author of a mystery book. It’ll be an example of a program that uses artificial intelligence (AI) to make a prediction. We couldn’t resist the opportunity to include an AI example in a book about programming with AI!

11.1 Authorship identification

11.2 Authorship identification using top-down design

11.3 Breaking down the process subproblem

11.3.1 Figuring out the signature for the mystery book

11.4 Summary of our top-down design

11.5 Implementing our functions

11.5.1 clean_word

11.5.2 average_word_length

11.5.3 different_to_total

11.5.4 exactly_once_to_total

11.5.5 split_string

11.5.6 get_sentences

11.5.7 average_sentence_length

11.5.8 get_phrases

11.5.9 average_sentence_complexity

11.5.10 make_signature

11.5.11 get_all_signatures

11.5.12 get_score

11.5.13 lowest_score

11.5.14 process_data

11.5.15 make_guess