
Practical Data Science with Hadoop and Spark

Designing and Building Effective Analytics at Scale

Paperback, English, 2017, ISBN 9780134024141

Summary

This book provides a unique perspective on applying data science with Hadoop. It explains what data science with Hadoop involves and where it applies in business, then dives into the details with a hands-on tutorial and a showcase of real-world use cases. The authors bring together the practical knowledge students need to do real, useful data science with Hadoop.

Specifications

ISBN-13: 9780134024141
Language: English
Binding: Paperback


Table of Contents

Foreword xiii
Preface xv
Acknowledgments xxi
About the Authors xxiii

Part I: Data Science with Hadoop—An Overview 1

Chapter 1: Introduction to Data Science 3
    What Is Data Science? 3
    Example: Search Advertising 4
    A Bit of Data Science History 5
    Becoming a Data Scientist 8
    Building a Data Science Team 12
    The Data Science Project Life Cycle 13
    Managing a Data Science Project 18
    Summary 18

Chapter 2: Use Cases for Data Science 19
    Big Data—A Driver of Change 19
    Business Use Cases 21
    Summary 29

Chapter 3: Hadoop and Data Science 31
    What Is Hadoop? 31
    Hadoop’s Evolution 37
    Hadoop Tools for Data Science 38
    Why Hadoop Is Useful to Data Scientists 46
    Summary 51

Part II: Preparing and Visualizing Data with Hadoop 53

Chapter 4: Getting Data into Hadoop 55
    Hadoop as a Data Lake 56
    The Hadoop Distributed File System (HDFS) 58
    Direct File Transfer to Hadoop HDFS 58
    Importing Data from Files into Hive Tables 59
    Importing Data into Hive Tables Using Spark 62
    Using Apache Sqoop to Acquire Relational Data 65
    Using Apache Flume to Acquire Data Streams 74
    Manage Hadoop Work and Data Flows with Apache Oozie 79
    Apache Falcon 81
    What’s Next in Data Ingestion? 82
    Summary 82

Chapter 5: Data Munging with Hadoop 85
    Why Hadoop for Data Munging? 86
    Data Quality 86
    The Feature Matrix 93
    Summary 106

Chapter 6: Exploring and Visualizing Data 107
    Why Visualize Data? 107
    Creating Visualizations 112
    Using Visualization for Data Science 121
    Popular Visualization Tools 121
    Visualizing Big Data with Hadoop 123
    Summary 124

Part III: Applying Data Modeling with Hadoop 125

Chapter 7: Machine Learning with Hadoop 127
    Overview of Machine Learning 127
    Terminology 128
    Task Types in Machine Learning 129
    Big Data and Machine Learning 130
    Tools for Machine Learning 131
    The Future of Machine Learning and Artificial Intelligence 132
    Summary 132

Chapter 8: Predictive Modeling 133
    Overview of Predictive Modeling 133
    Classification Versus Regression 134
    Evaluating Predictive Models 136
    Supervised Learning Algorithms 140
    Building Big Data Predictive Model Solutions 141
    Example: Sentiment Analysis 145
    Summary 150

Chapter 9: Clustering 151
    Overview of Clustering 151
    Uses of Clustering 152
    Designing a Similarity Measure 153
    Clustering Algorithms 154
    Example: Clustering Algorithms 155
    Evaluating the Clusters and Choosing the Number of Clusters 157
    Building Big Data Clustering Solutions 158
    Example: Topic Modeling with Latent Dirichlet Allocation 160
    Summary 163

Chapter 10: Anomaly Detection with Hadoop 165
    Overview 165
    Uses of Anomaly Detection 166
    Types of Anomalies in Data 166
    Approaches to Anomaly Detection 167
    Tuning Anomaly Detection Systems 170
    Building a Big Data Anomaly Detection Solution with Hadoop 171
    Example: Detecting Network Intrusions 172
    Summary 179

Chapter 11: Natural Language Processing 181
    Natural Language Processing 181
    Tooling for NLP in Hadoop 184
    Textual Representations 187
    Sentiment Analysis Example 189
    Summary 193

Chapter 12: Data Science with Hadoop—The Next Frontier 195
    Automated Data Discovery 195
    Deep Learning 197
    Summary 199

Appendix A: Book Web Page and Code Download 201

Appendix B: HDFS Quick Start 203
    Quick Command Dereference 204

Appendix C: Additional Background on Data Science and Apache Hadoop and Spark 209
    General Hadoop/Spark Information 209
    Hadoop/Spark Installation Recipes 210
    HDFS 210
    MapReduce 211
    Spark 211
    Essential Tools 211
    Machine Learning 212

Index 213
