I'm struggling to find a generic approach to detecting a form in HTML and then submitting it. When the page structure is known in advance for a given page, we of course have several options:
-- Selenium/Webdriver (by filling in the fields and 'clicking' the button; a rough sketch follows after the Requests example below)
-- Determining the form of the POST query manually, then reconstructing it with urllib2 directly:
import urllib
import urllib2
import lxml.html as LH  # for parsing the response afterwards

url = "http://apply.ovoenergycareers.co.uk/vacancies/#results"
params = urllib.urlencode([('field_36', 73), ('field_37', 76)])  # remaining field tuples omitted here
response = urllib2.urlopen(url, params)
or with Requests:
import requests
r = requests.post("http://apply.ovoenergycareers.co.uk/vacancies/#results",
                  data={'field_keywords': 'Manager'})  # the field name here is a guess
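And for completeness, a rough sketch of the Selenium option mentioned above (the field name and the submit-button selector are guesses and would have to be read off the actual form markup for each site):

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://apply.ovoenergycareers.co.uk/vacancies/")

# the field name and the submit-button selector below are assumptions
search_box = driver.find_element_by_name("field_keywords")
search_box.send_keys("Manager")
driver.find_element_by_css_selector("input[type=submit]").click()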
But although most forms boil down to a POST request, a few input fields and a submit button, they vary greatly in their implementation under the hood. When the number of pages to scrape gets into the hundreds, it isn't feasible to define a custom form-filling approach for each one.
My understanding is that Scrapy's main added value is its ability to follow links. I presume that this would also include links ultimately arrived at via form submission. Can this ability then be used to build a generic approach to "following" a form submission?
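From the Scrapy docs it looks like FormRequest.from_response might be the relevant mechanism, since it reads the form out of the fetched page and keeps the existing field values, so only the fields to override need to be named. A minimal sketch of what I have in mind (the field name and the results selector are assumptions):

import scrapy
from scrapy.http import FormRequest

class VacancySpider(scrapy.Spider):
    name = "vacancies"
    start_urls = ["http://apply.ovoenergycareers.co.uk/vacancies/"]

    def parse(self, response):
        # from_response() parses the form and keeps its default values
        # (so the dropdowns stay as they are); only the search box is
        # overridden. The field name "field_keywords" is a guess.
        yield FormRequest.from_response(
            response,
            formdata={"field_keywords": "Manager"},
            callback=self.parse_results,
        )

    def parse_results(self, response):
        # the results page can now be scraped like any other followed link;
        # the selector below is just a placeholder
        for title in response.css(".vacancy-title::text").extract():
            yield {"title": title}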
CLARIFICATION: In the case of a form with several dropdown menus, I will typically be leaving these at their default value, and only filling in the search term input field. So locating this field and 'filling it in' is ultimately the main challenge here.
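To make that concrete, the kind of generic "find the text input and fill it in" step I'm after would look something like the following, using lxml to inspect the form and Requests to submit it. The "first text-like input wins" heuristic and the assumption that the first form on the page is the search form are exactly the parts I'm unsure will generalise:

import lxml.html
import requests

def submit_search(url, term):
    page = requests.get(url)
    doc = lxml.html.fromstring(page.text)
    doc.make_links_absolute(url)          # so a relative form action becomes absolute
    form = doc.forms[0]                   # assumption: the first form is the search form

    # start from the form's current values, so dropdowns keep their defaults
    data = dict(form.fields)

    # put the search term into the first text-like input we find
    for inp in form.inputs:
        if getattr(inp, 'type', None) in ('text', 'search'):
            data[inp.name] = term
            break

    return requests.post(form.action or url, data=data)

response = submit_search("http://apply.ovoenergycareers.co.uk/vacancies/", "Manager")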