Hadoop in Practice, Second Edition

Year of publication: 2014
Author: Alex Holmes

Publisher: Manning
ISBN: 9781617292224
Language: English

Format: ePub
Quality: Originally digital (eBook)
Interactive table of contents: Yes
Number of pages: 377

Description: Hadoop in Practice, Second Edition provides over 100 tested, instantly useful techniques that will help you conquer big data using Hadoop. This revised edition covers changes and new features in the Hadoop core architecture, including MapReduce 2. Brand-new chapters cover YARN and integrating Kafka, Impala, and Spark SQL with Hadoop. You'll also get new and updated techniques for Flume, Sqoop, and Mahout, all of which have seen major new versions recently. In short, this is the most practical, up-to-date coverage of Hadoop available anywhere.
preface
acknowledgments
about this book
about the cover illustration
Part 1 Background and fundamentals
1. Hadoop in a heartbeat
1.1. What is Hadoop?
1.1.1. Core Hadoop components
1.1.2. The Hadoop ecosystem
1.1.3. Hardware requirements
1.1.4. Hadoop distributions
1.1.5. Who’s using Hadoop?
1.1.6. Hadoop limitations
1.2. Getting your hands dirty with MapReduce
1.3. Summary
2. Introduction to YARN
2.1. YARN overview
2.1.1. Why YARN?
2.1.2. YARN concepts and components
2.1.3. YARN configuration
Technique 1 Determining the configuration of your cluster
2.1.4. Interacting with YARN
Technique 2 Running a command on your YARN cluster
Technique 3 Accessing container logs
Technique 4 Aggregating container log files
2.1.5. YARN challenges
2.2. YARN and MapReduce
2.2.1. Dissecting a YARN MapReduce application
2.2.2. Configuration
2.2.3. Backward compatibility
Technique 5 Writing code that works on Hadoop versions 1 and 2
2.2.4. Running a job
Technique 6 Using the command line to run a job
2.2.5. Monitoring running jobs and viewing archived jobs
2.2.6. Uber jobs
Technique 7 Running small MapReduce jobs
2.3. YARN applications
2.3.1. NoSQL
2.3.2. Interactive SQL
2.3.3. Graph processing
2.3.4. Real-time data processing
2.3.5. Bulk synchronous parallel
2.3.6. MPI
2.3.7. In-memory
2.3.8. DAG execution
2.4. Chapter summary
Part 2 Data logistics
3. Data serialization—working with text and beyond
3.1. Understanding inputs and outputs in MapReduce
3.1.1. Data input
3.1.2. Data output
3.2. Processing common serialization formats
3.2.1. XML
Technique 8 MapReduce and XML
3.2.2. JSON
Technique 9 MapReduce and JSON
3.3. Big data serialization formats
3.3.1. Comparing SequenceFile, Protocol Buffers, Thrift, and Avro
3.3.2. SequenceFile
Technique 10 Working with SequenceFiles
Technique 11 Using SequenceFiles to encode Protocol Buffers
3.3.3. Protocol Buffers
3.3.4. Thrift
3.3.5. Avro
Technique 12 Avro’s schema and code generation
Technique 13 Selecting the appropriate way to use Avro in MapReduce
Technique 14 Mixing Avro and non-Avro data in MapReduce
Technique 15 Using Avro records in MapReduce
Technique 16 Using Avro key/value pairs in MapReduce
Technique 17 Controlling how sorting works in MapReduce
Technique 18 Avro and Hive
Technique 19 Avro and
3.4. Columnar storage
3.4.1. Understanding object models and storage formats
3.4.2. Parquet and the Hadoop ecosystem
3.4.3. Parquet block and page sizes
Technique 20 Reading Parquet files via the command line
Technique 21 Reading and writing Avro data in Parquet with Java
Technique 22 Parquet and MapReduce
Technique 23 Parquet and Hive/Impala
Technique 24 Pushdown predicates and projection with Parquet
3.4.4. Parquet limitations
3.5. Custom file formats
3.5.1. Input and output formats
Technique 25 Writing input and output formats for CSV
3.5.2. The importance of output committing
3.6. Chapter summary
4. Organizing and optimizing data in HDFS
4.1. Data organization
4.1.1. Directory and file layout
4.1.2. Data tiers
4.1.3. Partitioning
Technique 26 Using MultipleOutputs to partition your data
Technique 27 Using a custom MapReduce partitioner
4.1.4. Compacting
Technique 28 Using filecrush to compact data
Technique 29 Using Avro to store multiple small binary files
4.1.5. Atomic data movement
4.2. Efficient storage with compression
Technique 30 Picking the right compression codec for your data
Technique 31 Compression with HDFS, MapReduce, Pig, and Hive
Technique 32 Splittable LZOP with MapReduce, Hive, and Pig
4.3. Chapter summary
5. Moving data into and out of Hadoop
5.1. Key elements of data movement
5.2. Moving data into Hadoop
5.2.1. Roll your own ingest
Technique 33 Using the CLI to load files
Technique 34 Using REST to load files
Technique 35 Accessing HDFS from behind a firewall
Technique 36 Mounting Hadoop with NFS
Technique 37 Using DistCp to copy data within and between clusters
Technique 38 Using Java to load files
5.2.2. Continuous movement of log and binary files into HDFS
Technique 39 Pushing system log messages into HDFS with Flume
Technique 40 An automated mechanism to copy files into HDFS
Technique 41 Scheduling regular ingress activities with Oozie
5.2.3. Databases
Technique 42 Using Sqoop to import data from MySQL
5.2.4. HBase
Technique 43 HBase ingress into HDFS
Technique 44 MapReduce with HBase as a data source
5.2.5. Importing data from Kafka
Technique 45 Using Camus to copy Avro data from Kafka into HDFS
5.3. Moving data out of Hadoop
5.3.1. Roll your own egress
Technique 46 Using the CLI to extract files
Technique 47 Using REST to extract files
Technique 48 Reading from HDFS when behind a firewall
Technique 49 Mounting Hadoop with NFS
Technique 50 Using DistCp to copy data out of Hadoop
Technique 51 Using Java to extract files
5.3.2. Automated file egress
Technique 52 An automated mechanism to export files from HDFS
5.3.3. Databases
Technique 53 Using Sqoop to export data to MySQL
5.3.4. NoSQL
5.4. Chapter summary
Part 3 Big data patterns
6. Applying MapReduce patterns to big data
6.1. Joining
Technique 54 Picking the best join strategy for your data
Technique 55 Filters, projections, and pushdowns
6.1.1. Map-side joins
Technique 56 Joining data where one dataset can fit into memory
Technique 57 Performing a semi-join on large datasets
Technique 58 Joining on presorted and prepartitioned data
6.1.2. Reduce-side joins
Technique 59 A basic repartition join
Technique 60 Optimizing the repartition join
Technique 61 Using Bloom filters to cut down on shuffled data
6.1.3. Data skew in reduce-side joins
Technique 62 Joining large datasets with high join-key cardinality
Technique 63 Handling skews generated by the hash partitioner
6.2. Sorting
6.2.1. Secondary sort
Technique 64 Implementing a secondary sort
6.2.2. Total order sorting
Technique 65 Sorting keys across multiple reducers
6.3. Sampling
Technique 66 Writing a reservoir-sampling InputFormat
6.4. Chapter summary
7. Utilizing data structures and algorithms at scale
7.1. Modeling data and solving problems with graphs
7.1.1. Modeling graphs
7.1.2. Shortest-path algorithm
Technique 67 Find the shortest distance between two users
7.1.3. Friends-of-friends algorithm
Technique 68 Calculating FoFs
7.1.4. Using Giraph to calculate PageRank over a web graph
Technique 69 Calculate PageRank over a web graph
7.2. Bloom filters
Technique 70 Parallelized Bloom filter creation in MapReduce
7.3. HyperLogLog
7.3.1. A brief introduction to HyperLogLog
Technique 71 Using HyperLogLog to calculate unique counts
7.4. Chapter summary
8. Tuning, debugging, and testing
8.1. Measure, measure, measure
8.2. Tuning MapReduce
8.2.1. Common inefficiencies in MapReduce jobs
Technique 72 Viewing job statistics
8.2.2. Map optimizations
Technique 73 Data locality
Technique 74 Dealing with a large number of input splits
Technique 75 Generating input splits in the cluster with YARN
8.2.3. Shuffle optimizations
Technique 76 Using the combiner
Technique 77 Blazingly fast sorting with binary comparators
Technique 78 Tuning the shuffle internals
8.2.4. Reducer optimizations
Technique 79 Too few or too many reducers
8.2.5. General tuning tips
Technique 80 Using stack dumps to discover unoptimized user code
Technique 81 Profiling your map and reduce tasks
8.3. Debugging
8.3.1. Accessing container log output
Technique 82 Examining task logs
8.3.2. Accessing container start scripts
Technique 83 Figuring out the container startup command
8.3.3. Debugging OutOfMemory errors
Technique 84 Force container JVMs to generate a heap dump
8.3.4. MapReduce coding guidelines for effective debugging
Technique 85 Augmenting MapReduce code for better debugging
8.4. Testing MapReduce jobs
8.4.1. Essential ingredients for effective unit testing
8.4.2. MRUnit
Technique 86 Using MRUnit to unit-test MapReduce
8.4.3. LocalJobRunner
Technique 87 Heavyweight job testing with the LocalJobRunner
8.4.4. MiniMRYarnCluster
Technique 88 Using MiniMRYarnCluster to test your jobs
8.4.5. Integration and QA testing
8.5. Chapter summary
Part 4 Beyond MapReduce
9. SQL on Hadoop
9.1. Hive
9.1.1. Hive basics
9.1.2. Reading and writing data
Technique 89 Working with text files
Technique 90 Exporting data to local disk
9.1.3. User-defined functions in Hive
Technique 91 Writing UDFs
9.1.4. Hive performance
Technique 92 Partitioning
Technique 93 Tuning Hive joins
9.2. Impala
9.2.1. Impala vs. Hive
9.2.2. Impala basics
Technique 94 Working with text
Technique 95 Working with Parquet
Technique 96 Refreshing metadata
9.2.3. User-defined functions in Impala
Technique 97 Executing Hive UDFs in Impala
9.3. Spark SQL
9.3.1. Spark 101
9.3.2. Spark on Hadoop
9.3.3. SQL with Spark
Technique 98 Calculating stock averages with Spark SQL
Technique 99 Language-integrated queries
Technique 100 Hive and Spark SQL
9.4. Chapter summary
10. Writing a YARN application
10.1. Fundamentals of building a YARN application
10.1.1. Actors
10.1.2. The mechanics of a YARN application
10.2. Building a YARN application to collect cluster statistics
Technique 101 A bare-bones YARN client
Technique 102 A bare-bones ApplicationMaster
Technique 103 Running the application and accessing logs
Technique 104 Debugging using an unmanaged application master
10.3. Additional YARN application capabilities
10.3.1. RPC between components
10.3.2. Service discovery
10.3.3. Checkpointing application progress
10.3.4. Avoiding split-brain
10.3.5. Long-running applications
10.3.6. Security
10.4. YARN programming abstractions
10.4.1. Twill
10.4.2. Spring
10.4.3. REEF
10.4.4. Picking a YARN API abstraction
10.5. Chapter summary
Appendix A: Installing Hadoop and friends
A.1. Code for the book
A.2. Recommended Java versions
A.3. Hadoop
A.4. Flume
A.5. Oozie
A.6. Sqoop
A.7. HBase
A.8. Kafka
A.9. Camus
A.10. Avro
A.11. Apache Thrift
A.12. Protocol Buffers
A.13. Snappy
A.14. LZOP
A.15. Elephant Bird
A.16. Hive
A.17. R
A.18. RHadoop
A.19. Mahout
index
bonus chapters available online
11. Integrating R and Hadoop for statistics and more
11.1. Comparing R and MapReduce integrations
11.2. R fundamentals
11.3. R and streaming
11.3.1. Streaming and map-only R
Technique 105 Calculate the daily mean for stocks
11.3.2. Streaming, R, and full MapReduce
Technique 106 Calculate the cumulative moving average for stocks
11.4. RHadoop—a simple integration of client-side R and Hadoop
Technique 107 Calculating CMA with RHadoop
11.5. Chapter summary
12. Predictive analytics with Mahout