Python - HTML scraper
About
NOTE: This page is a daughter page of: Python
Sometimes you just want a snippet of information from a webpage. There are various libraries for "scraping" HTML; here is a really simple example for when you just want a dump of all the HTML.
requests library
First, install the "requests" library with:
$ sudo easy_install pip
$ pip install requests
Then create file basic_scraper.py:
#!/usr/bin/env python
# Basic script to print the content of a webpage.
# Thanks to: http://docs.python-guide.org/en/latest/scenarios/scrape/
# To install this library run:
# $ pip install requests
import requests
page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
print(page.content)
Lastly, run with:
$ python basic_scraper.py
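The script above prints whatever comes back, including error pages, and can wait indefinitely on a slow server. A slightly more defensive variant might look like the sketch below. The helper name fetch_html and the 10-second default timeout are assumptions made for illustration; they are not part of the requests library or the original example.

```python
#!/usr/bin/env python
# Sketch: fetch a page and return its HTML as text, failing loudly on errors.
import requests


def fetch_html(url, timeout=10):
    """Return the decoded HTML of `url` (hypothetical helper, not part of requests)."""
    # timeout is in seconds; the value 10 is an arbitrary assumption.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of returning an error page.
    response.raise_for_status()
    # .text is the body decoded to str; .content (used above) is the raw bytes.
    return response.text
```

Calling print(fetch_html('http://econpy.pythonanywhere.com/ex/001.html')) should then behave like the basic script, except that a failed request raises an exception rather than printing an error page.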
NOTE: If any type of login/authentication is required this will not work. You'll probably need a library like "selenium" for that.
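One narrow exception: if a site uses plain HTTP Basic authentication rather than a login form, requests can send the credentials itself through its auth parameter. The sketch below assumes that case only. The helper name fetch_with_basic_auth is hypothetical, and the username and password are placeholders you would supply. For form-based or JavaScript-driven logins, the selenium advice above still applies.

```python
#!/usr/bin/env python
# Sketch: fetch a page protected by HTTP Basic authentication.
import requests


def fetch_with_basic_auth(url, username, password, timeout=10):
    """Hypothetical helper: GET `url` using HTTP Basic authentication."""
    # requests builds the Authorization header from the (username, password) tuple.
    response = requests.get(url, auth=(username, password), timeout=timeout)
    # Raise requests.HTTPError on failures such as 401 Unauthorized.
    response.raise_for_status()
    return response.text
```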
Links
- Scrape using lxml and requests - The example I used. Read on and it shows you how to isolate key values.
- Selenium - Getting Started - Gives an example of how the selenium library can help you log in to a page (for pages which require authentication to get in).