Webbots, Spiders, and Screen Scrapers, 2nd Edition

Year: 2012
Author: Michael Schrenk
Publisher: No Starch Press
ISBN: 978-1-59327-397-2
Language: English
Format: PDF
Quality: Born-digital (eBook)
Interactive table of contents: Yes
Number of pages: 392
Description: There's a wealth of data online, but sorting and gathering it by hand can be tedious and time-consuming. Rather than click through page after endless page, why not let bots do the work for you?

Webbots, Spiders, and Screen Scrapers will show you how to create simple programs with PHP/CURL to mine, parse, and archive online data to help you make informed decisions. Michael Schrenk, a highly regarded webbot developer, teaches you how to develop fault-tolerant designs, how best to launch and schedule the work of your bots, and how to create Internet agents that:
- Send email or SMS notifications to alert you to new information quickly
- Search different data sources and combine the results on one page, making the data easier to interpret and analyze
- Automate purchases, auction bids, and other online activities to save time

Sample projects for automating tasks like price monitoring and news aggregation will show you how to put the concepts you learn into practice.

This second edition of Webbots, Spiders, and Screen Scrapers includes tricks for dealing with sites that are resistant to crawling and scraping, writing stealthy webbots that mimic human search behavior, and using regular expressions to harvest specific data. As you discover the possibilities of web scraping, you'll see how webbots can save you precious time and give you much greater control over the data available on the Web.

Fundamental Concepts and Techniques
Chapter 1 : What’s in It for You?
Uncovering the Internet’s True Potential
What’s in It for Developers?
What’s in It for Business Leaders?
Final Thoughts

Chapter 2 : Ideas for Webbot Projects
Inspiration from Browser Limitations
A Few Crazy Ideas to Get You Started
Final Thoughts

Chapter 3 : Downloading Web Pages
Think About Files, Not Web Pages
Downloading Files with PHP’s Built-in Functions
Introducing PHP/CURL
Installing PHP/CURL
LIB_http
Final Thoughts

Chapter 4 : Basic Parsing Techniques
Content Is Mixed with Markup
Parsing Poorly Written HTML
Standard Parse Routines
Using LIB_parse
Useful PHP Functions
Final Thoughts

Chapter 5 : Advanced Parsing with Regular Expressions
Pattern Matching, the Key to Regular Expressions
PHP Regular Expression Types
Learning Patterns Through Examples
Regular Expressions of Particular Interest to Webbot Developers
When Regular Expressions Are (or Aren’t) the Right Parsing Tool
Final Thoughts

Chapter 6 : Automating Form Submission
Reverse Engineering Form Interfaces
Form Handlers, Data Fields, Methods, and Event Triggers
Unpredictable Forms
Analyzing a Form
Final Thoughts

Chapter 7 : Managing Large Amounts of Data
Organizing Data
Making Data Smaller
Thumbnailing Images
Final Thoughts

Projects
Chapter 8 : Price-Monitoring Webbots
The Target
Designing the Parsing Script
Initialization and Downloading the Target
Further Exploration

Chapter 9 : Image-Capturing Webbots
Example Image-Capturing Webbot
Creating the Image-Capturing Webbot
Further Exploration
Final Thoughts

Chapter 10 : Link-Verification Webbots
Creating the Link-Verification Webbot
Running the Webbot
Further Exploration

Chapter 11 : Search-Ranking Webbots
Description of a Search Result Page
What the Search-Ranking Webbot Does
Running the Search-Ranking Webbot
How the Search-Ranking Webbot Works
The Search-Ranking Webbot Script
Final Thoughts
Further Exploration

Chapter 12 : Aggregation Webbots
Choosing Data Sources for Webbots
Example Aggregation Webbot
Adding Filtering to Your Aggregation Webbot
Further Exploration

Chapter 13 : FTP Webbots
Example FTP Webbot
PHP and FTP
Further Exploration

Chapter 14 : Webbots That Read Email
The POP3 Protocol
Executing POP3 Commands with a Webbot
Further Exploration

Chapter 15 : Webbots That Send Email
Email, Webbots, and Spam
Sending Mail with SMTP and PHP
Writing a Webbot That Sends Email Notifications
Further Exploration

Chapter 16 : Converting a Website into a Function
Writing a Function Interface
Final Thoughts

Advanced Technical Considerations
Chapter 17 : Spiders
How Spiders Work
Example Spider
LIB_simple_spider
Experimenting with the Spider
Adding the Payload
Further Exploration

Chapter 18 : Procurement Webbots and Snipers
Procurement Webbot Theory
Sniper Theory
Testing Your Own Webbots and Snipers
Further Exploration
Final Thoughts

Chapter 19 : Webbots and Cryptography
Designing Webbots That Use Encryption
A Quick Overview of Web Encryption
Final Thoughts

Chapter 20 : Authentication
What Is Authentication?
Example Scripts and Practice Pages
Basic Authentication
Session Authentication
Final Thoughts

Chapter 21 : Advanced Cookie Management
How Cookies Work
PHP/CURL and Cookies
How Cookies Challenge Webbot Design
Further Exploration

Chapter 22 : Scheduling Webbots and Spiders
Preparing Your Webbots to Run as Scheduled Tasks
The Windows XP Task Scheduler
The Windows 7 Task Scheduler
Non-calendar-based Triggers
Final Thoughts

Chapter 23 : Scraping Difficult Websites with Browser Macros
Barriers to Effective Web Scraping
Overcoming Webscraping Barriers with Browser Macros
Final Thoughts

Chapter 24 : Hacking iMacros
Hacking iMacros for Added Functionality
Further Exploration

Chapter 25 : Deployment and Scaling
One-to-Many Environment
One-to-One Environment
Many-to-Many Environment
Many-to-One Environment
Scaling and Denial-of-Service Attacks
Creating Multiple Instances of a Webbot
Managing a Botnet
Further Exploration

Larger Considerations
Chapter 26 : Designing Stealthy Webbots and Spiders
Why Design a Stealthy Webbot?
Stealth Means Simulating Human Patterns
Final Thoughts

Chapter 27 : Proxies
What Is a Proxy?
Proxies in the Virtual World
Why Webbot Developers Use Proxies
Using a Proxy Server
Types of Proxy Servers
Final Thoughts

Chapter 28 : Writing Fault-Tolerant Webbots
Types of Webbot Fault Tolerance
Error Handlers
Further Exploration

Chapter 29 : Designing Webbot-Friendly Websites
Optimizing Web Pages for Search Engine Spiders
Web Design Techniques That Hinder Search Engine Spiders
Designing Data-Only Interfaces
Final Thoughts

Chapter 30 : Killing Spiders
Asking Nicely
Building Speed Bumps
Setting Traps
Final Thoughts

Chapter 31 : Keeping Webbots out of Trouble
It’s All About Respect
Copyright
Trespass to Chattels
Internet Law
Final Thoughts

Appendix : PHP/CURL Reference
Creating a Minimal PHP/CURL Session
Initiating PHP/CURL Sessions
Setting PHP/CURL Options
Executing the PHP/CURL Command
Closing PHP/CURL Sessions

Appendix : Status Codes
HTTP Codes
NNTP Codes

Appendix : SMS Gateways
Sending Text Messages
Reading Text Messages
A Sampling of Text Message Email Addresses