Python - HTML scraper

From NoskeWiki
Revision as of 13:17, 18 December 2019 by NoskeWiki (talk | contribs)

About

NOTE: This page is a daughter page of: Python


Sometimes you just want a snippet of info from a webpage. There are various libraries for "scraping" HTML; here's a really simple example for when you just want a dump of all the HTML.


requests library

First, install the "requests" library with pip (most current Python installs already include pip, so the old "easy_install" step is no longer needed):

$ python -m pip install requests

Then create a file named basic_scraper.py:

#!/usr/bin/env python
# Basic script to print the content of a webpage.
# Thanks to: http://docs.python-guide.org/en/latest/scenarios/scrape/
# To install these libraries run:
#  $ pip install requests

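import requests

# Fetch the page; the timeout stops the script hanging on an unresponsive server.
page = requests.get('http://econpy.pythonanywhere.com/ex/001.html', timeout=10)
page.raise_for_status()  # Raise an exception on a 4xx/5xx response.
print(page.text)  # .text is the decoded HTML; .content is the raw bytes.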

Lastly, run with:

$ python basic_scraper.py
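
If you only want one snippet rather than the whole dump, Python's built-in html.parser module can pull it out. The following is a minimal sketch, assuming the snippet you want is the page's <title> text. The names TitleParser, extract_title and sample_html are made up for illustration and are not part of any library; in practice you would pass in the HTML text fetched by the script instead of the sample string.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Start collecting text once a <title> tag opens.
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        # Stop collecting when the </title> tag closes.
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(html):
    """Return the stripped <title> text, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# Made-up sample HTML standing in for a downloaded page.
sample_html = "<html><head><title> Example page </title></head><body><p>Hi</p></body></html>"
print(extract_title(sample_html))
```

For anything more involved than a single tag, a dedicated parsing library is likely to be less work than extending this class.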


NOTE: If any type of login/authentication is required, this script will not work as is; you'll probably need a library like "selenium" for that.
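
One exception worth knowing about: for plain HTTP basic authentication (the browser's username/password pop-up, as opposed to a login form), the "requests" library may be enough on its own via its auth parameter. The sketch below is an assumption-laden illustration: the URL and credentials are hypothetical, and it only builds the request without sending it, so you can see the header that would go out.

```python
import requests

# Hypothetical URL and credentials, for illustration only.
url = "http://example.com/protected.html"
request = requests.Request("GET", url, auth=("user", "pass"))

# prepare() builds the request without sending anything over the network,
# so the generated Authorization header can be inspected.
prepared = request.prepare()
print(prepared.headers["Authorization"])

# To actually fetch the page you would instead call:
#   page = requests.get(url, auth=("user", "pass"), timeout=10)
```

Sites that log you in through an HTML form, cookies or JavaScript are a different matter and are where a browser-driving tool becomes useful.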


Links