Authors

Ofer Mendelevitch, Casey Stella, Douglas Eadline

Practical Data Science with Hadoop and Spark


Designing and Building Effective Analytics at Scale

Paperback, English, 2017, ISBN 9780134024141


Summary

This book offers a distinctive perspective on applying data science with Hadoop. It first explains what data science with Hadoop involves and where it has practical business applications. It then goes into the details, with a hands-on tutorial and a showcase of real-world use cases. The authors bring together the practical knowledge students need to do real, useful data science with Hadoop.

Specifications

ISBN-13: 9780134024141

Language: English

Binding: Paperback

Publisher: Pearson Education

Main category: Databases, Computers and IT


Table of Contents

Foreword
Preface
Acknowledgments
About the Authors

Part I: Data Science with Hadoop - An Overview
  Chapter 1: Introduction to Data Science
    What Is Data Science?
    Example: Search Advertising
    A Bit of Data Science History
    Becoming a Data Scientist
    Building a Data Science Team
    The Data Science Project Life Cycle
    Managing a Data Science Project
    Summary
  Chapter 2: Use Cases for Data Science
    Big Data - A Driver of Change
    Business Use Cases
    Summary
  Chapter 3: Hadoop and Data Science
    What Is Hadoop?
    Hadoop’s Evolution
    Hadoop Tools for Data Science
    Why Hadoop Is Useful to Data Scientists
    Summary

Part II: Preparing and Visualizing Data with Hadoop
  Chapter 4: Getting Data into Hadoop
    Hadoop as a Data Lake
    The Hadoop Distributed File System (HDFS)
    Direct File Transfer to Hadoop HDFS
    Importing Data from Files into Hive Tables
    Importing Data into Hive Tables Using Spark
    Using Apache Sqoop to Acquire Relational Data
    Using Apache Flume to Acquire Data Streams
    Manage Hadoop Work and Data Flows with Apache Oozie
    Apache Falcon
    What’s Next in Data Ingestion?
    Summary
  Chapter 5: Data Munging with Hadoop
    Why Hadoop for Data Munging?
    Data Quality
    The Feature Matrix
    Summary
  Chapter 6: Exploring and Visualizing Data
    Why Visualize Data?
    Creating Visualizations
    Using Visualization for Data Science
    Popular Visualization Tools
    Visualizing Big Data with Hadoop
    Summary

Part III: Applying Data Modeling with Hadoop
  Chapter 7: Machine Learning with Hadoop
    Overview of Machine Learning
    Terminology
    Task Types in Machine Learning
    Big Data and Machine Learning
    Tools for Machine Learning
    The Future of Machine Learning and Artificial Intelligence
    Summary
  Chapter 8: Predictive Modeling
    Overview of Predictive Modeling
    Classification Versus Regression
    Evaluating Predictive Models
    Supervised Learning Algorithms
    Building Big Data Predictive Model Solutions
    Example: Sentiment Analysis
    Summary
  Chapter 9: Clustering
    Overview of Clustering
    Uses of Clustering
    Designing a Similarity Measure
    Clustering Algorithms
    Example: Clustering Algorithms
    Evaluating the Clusters and Choosing the Number of Clusters
    Building Big Data Clustering Solutions
    Example: Topic Modeling with Latent Dirichlet Allocation
    Summary
  Chapter 10: Anomaly Detection with Hadoop
    Overview
    Uses of Anomaly Detection
    Types of Anomalies in Data
    Approaches to Anomaly Detection
    Tuning Anomaly Detection Systems
    Building a Big Data Anomaly Detection Solution with Hadoop
    Example: Detecting Network Intrusions
    Summary
  Chapter 11: Natural Language Processing
    Natural Language Processing
    Tooling for NLP in Hadoop
    Textual Representations
    Sentiment Analysis Example
    Summary
  Chapter 12: Data Science with Hadoop - The Next Frontier
    Automated Data Discovery
    Deep Learning
    Summary

Appendix A: Book Web Page and Code Download
Appendix B: HDFS Quick Start
  Quick Command Reference
Appendix C: Additional Background on Data Science and Apache Hadoop and Spark
  General Hadoop/Spark Information
  Hadoop/Spark Installation Recipes
  HDFS
  MapReduce
  Spark
  Essential Tools
  Machine Learning
Index
