5 Advanced data management

 

This chapter covers

  • Using mathematical and statistical functions
  • Utilizing character functions
  • Looping and conditional execution
  • Writing your own functions
  • Aggregating and reshaping data

In chapter 3, we reviewed the basic techniques used for managing datasets in R. In this chapter, we’ll focus on advanced topics. The chapter is divided into three basic parts. In the first part, we’ll take a whirlwind tour of R’s many functions for mathematical, statistical, and character manipulation. To give this section relevance, we’ll begin with a data management problem that can be solved using these functions. After covering the functions themselves, we’ll look at one possible solution to the data management problem.

Next, we’ll cover how to write your own functions to accomplish data management and analysis tasks. First, we’ll explore ways of controlling program flow, including looping and conditional statement execution. Then we’ll investigate the structure of user-written functions and how to invoke them once they have been created.

Finally, we’ll look at ways of aggregating and summarizing data, along with methods of reshaping and restructuring datasets. When aggregating data, you can use any appropriate built-in or user-written function to perform the summarization, so the topics covered in the first two parts of the chapter apply directly here.
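As a small preview of how these pieces fit together, the following sketch aggregates a dataset once with a built-in function and once with a user-written one. The data frame scores, its columns, and the function spread are hypothetical names made up for illustration; they aren't part of the chapter's data management challenge.

```r
# Hypothetical data: math scores for students in two classes
scores <- data.frame(
  class = c("A", "A", "A", "B", "B", "B"),
  math  = c(80, 90, 100, 60, 70, 95)
)

# User-written summary function (hypothetical): the spread of a
# numeric vector, defined here as its maximum minus its minimum
spread <- function(x) {
  max(x) - min(x)
}

# Summarize math by class with a built-in function ...
avg_by_class <- aggregate(math ~ class, data = scores, FUN = mean)

# ... and with the user-written function
spread_by_class <- aggregate(math ~ class, data = scores, FUN = spread)

print(avg_by_class)
print(spread_by_class)
```

Both calls have the same form; only the FUN argument changes. That is why the built-in functions and function-writing skills covered earlier in the chapter carry over directly to aggregation.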

5.1 A data management challenge

5.2 Numerical and character functions

5.2.1 Mathematical functions

5.2.2 Statistical functions

5.2.3 Probability functions

5.2.4 Character functions

5.2.5 Other useful functions

5.2.6 Applying functions to matrices and data frames

5.2.7 A solution for the data management challenge

5.3 Control flow

5.3.1 Repetition and looping

5.3.2 Conditional execution

5.4 User-written functions