Preface

I’ve been fascinated by data for a long time. When I was an undergrad in electrical engineering, I discovered digital signal processing and gravitated toward it. I found out that music, video, photos, and lots of other things could be viewed as data, and that computation was creating and enhancing those emotional experiences. I thought that was the coolest thing ever.

Over time, I continued to be excited by new aspects of data. The last few years exposed me to social data and big data. Big data was especially challenging for me intellectually. I had previously learned to look at data from a statistician’s point of view, where new types of data “only” called for new mathematical methods. That wasn’t simple, but at least I had been trained for it, and there was a wealth of resources to tap into. Big data, on the other hand, was about system-level innovations and new ways of programming. I wasn’t trained for that, and, more important, I wasn’t alone: practical knowledge of handling big data was something of a black art. This was true of many of the tools and techniques for scaling data processing, including caching (for example, memcached), replication, sharding, and, of course, MapReduce/Hadoop. I spent those years getting up to speed on many of these skills.