Web Crawling and Data Mining with Apache Nutch

Year: 2013
Authors: Dr. Zakir Laliwala, Abdulbasit Shaikh
Publisher: Packt Publishing
ISBN: 978-1-78328-685-0
Language: English
Format: PDF/EPUB/MOBI
Quality: Born-digital (eBook)
Pages: 136
Description:

In Detail

Apache Nutch helps you create your own search engine and customize it to your needs. You can integrate Apache Nutch with your existing application, and with other components such as Apache Hadoop, Eclipse, and MySQL.
"Web Crawling and Data Mining with Apache Nutch" shows you the steps needed to crawl webpages for your application and use them to make searching in your application more efficient. You will create your own search engine and be able to improve your application's page ranking in search results.
The book starts with the basics of crawling webpages for your application. You will learn to deploy Apache Solr on a server containing data crawled by Apache Nutch, and to perform sharding with Apache Nutch using Apache Solr.
You will integrate your application with databases such as MySQL, HBase, and Accumulo, and also with Apache Solr, which is used as the searcher.
With this book, you will gain the skills needed to create your own search engine. You will also perform link analysis and scoring, which help improve the ranking of your application's pages.

What you will learn from this book

Carry out web crawling for your application
Make searching in your application efficient by integrating it with Apache Solr
Integrate your application with different databases for data storage purposes
Run your application in a cluster environment by integrating it with Apache Hadoop
Perform crawling operations from the Eclipse IDE instead of the command line
Create your own plugin in Apache Nutch
Integrate Apache Solr with Apache Nutch, and deploy Apache Solr on Apache Tomcat
Apply sharding on Apache Tomcat to get good results from Apache Solr while searching

Approach

This book is a user-friendly guide that covers all the necessary steps and examples related to web crawling and data mining using Apache Nutch.

Who this book is written for

"Web Crawling and Data Mining with Apache Nutch" is aimed at data analysts, application developers, web mining engineers, and data scientists. It is a good starting point for those who want to learn how web crawling and data mining are applied in the current business world. Some prior knowledge of web crawling and data mining is an added benefit.
Preface
Chapter 1: Getting Started with Apache Nutch

Introduction to Apache Nutch
Installing and configuring Apache Nutch
Installation dependencies
Verifying your Apache Nutch installation
Crawling your first website
Installing Apache Solr
Integration of Solr with Nutch
Crawling your website using the crawl script
Crawling the Web, the CrawlDb, and URL filters
InjectorJob
GeneratorJob
FetcherJob
ParserJob
DbUpdaterJob
Invertlinks
Indexing with Apache Solr
Parsing and parse filters
Webgraph
Loops
LinkRank
ScoreUpdater
A scoring example
The Apache Nutch plugin
The Apache Nutch plugin example
Modifying plugin.xml
Describing dependencies with the ivy module
The Indexer extension program
The Scoring extension program
Using your plugin with Apache Nutch
Compiling your plugin
Understanding the Nutch Plugin architecture
Chapter 2: Deployment, Sharding, and AJAX Solr with Apache Nutch
Deployment of Apache Solr
Introduction of deployment
Need of Apache Solr deployment
Setting up Java Development Kit
Setting up Tomcat
Setting up Apache Solr
Running Solr on Tomcat
Sharding using Apache Solr
Introduction to sharding
Use of sharding with Apache Nutch
Distributing documents across shards
Sharding Apache Solr indexes
Single cluster
Splitting shards with Apache Nutch
Cleaning up with Apache Nutch
Splitting cluster shards
Checking statistics of sharding with Apache Nutch
The final test with Apache Nutch
Working with AJAX Solr
Architectural overview of AJAX Solr
Applying AJAX Solr on Reuters' data
Running AJAX Solr
Chapter 3: Integration of Apache Nutch with Apache Hadoop and Eclipse
Integrating Apache Nutch with Apache Hadoop
Introducing Apache Hadoop
Installing Apache Hadoop and Apache Nutch
Downloading Apache Hadoop and Apache Nutch
Setting up Apache Hadoop with the cluster
Installing Java
Downloading Apache Hadoop
Configuring SSH
Disabling IPv6
Installing Apache Hadoop
Required ownerships and permissions
The configuration required for Hadoop_HOME/conf/*
Formatting the HDFS filesystem using the NameNode
Setting up the deployment architecture of Apache Nutch
Installing Apache Nutch
Key points of the Apache Nutch installation
Starting the cluster
Performing crawling on the Apache Hadoop cluster
Configuring Apache Nutch with Eclipse
Introducing Apache Nutch configuration with Eclipse
Installation and building Apache Nutch with Eclipse
Crawling in Eclipse
Chapter 4: Apache Nutch with Gora, Accumulo, and MySQL
Introduction to Apache Accumulo
Main features of Apache Accumulo
Introduction to Apache Gora
Supported data stores
Use of Apache Gora
Integration of Apache Nutch with Apache Accumulo
Configuring Apache Gora with Apache Nutch
Setting up Apache Hadoop and Apache ZooKeeper
Installing and configuring Apache Accumulo
Testing Apache Accumulo
Crawling with Apache Nutch on Apache Accumulo
Integration of Apache Nutch with MySQL
Introduction to MySQL
Benefits of integrating MySQL with Apache Nutch
Configuring MySQL with Apache Nutch
Crawling with Apache Nutch on MySQL
Index