Python - HTML scraper

From NoskeWiki


NOTE: This page is a daughter page of: Python

Sometimes you just want a snippet of info from a webpage. There are various libraries for "scraping" HTML; here's a really simple example if you just want a dump of all the HTML.

requests library

First, install the "requests" library with pip (if pip itself is missing, "python -m ensurepip --upgrade" should install it):

$ pip install requests

Then create a file containing:

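#!/usr/bin/env python
# Basic script to print the content of a webpage.
# Thanks to:
# To install these libraries run:
#  $ pip install requests

import requests

# Put the address of the page you want between the quotes.
page = requests.get('')
page.raise_for_status()  # Stop with an error if the server returned a 4xx/5xx status.
print(page.text)  # .text is the HTML decoded to a string; .content is the raw bytes.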
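Since the script dumps the whole page, pulling out just one snippet means parsing that HTML. Below is a minimal sketch using only the standard library's html.parser, assuming the snippet you want is the page's <title>; TitleParser and extract_title are hypothetical names, and a dedicated parsing library may suit bigger jobs better.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title>...</title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <TITLE> matches too.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Entities such as &amp; are already converted to characters here.
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the page title with surrounding whitespace stripped, or None."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None


if __name__ == "__main__":
    sample = "<html><head><title> My Page </title></head><body>Hi</body></html>"
    print(extract_title(sample))
```

With the requests script, extract_title(page.text) would return the title of the downloaded page.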

Lastly, run with:

$ python
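If installing "requests" is not an option, the standard library's urllib.request can produce a similar dump. This is a minimal sketch, not a replacement for the requests script: fetch_html is a hypothetical helper, the URL is read from the command line, and falling back to UTF-8 when the server declares no charset is an assumption.

```python
#!/usr/bin/env python
# Standard-library-only sketch: print the HTML of a page without installing requests.
import sys
import urllib.request


def fetch_html(url, timeout=10):
    """Download url and return its body decoded to a string."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset from the Content-Type header; assumption: default to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the page URL as the first command-line argument.
    print(fetch_html(sys.argv[1]))
```

Saved under any name you like, it would be run as the python command followed by the script name and the page URL.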

NOTE: If any type of login/authentication is required, this will not work; you'll probably need a library like "selenium" for that.