ScraperProject
class scraper_toolkit.ScraperProject.ScraperProject(domain: str)
Handle the page fetching, HTML parsing, and exporting of a web scraping project.
Parameters: domain – Prefix to be added to scraped URLs that are missing the domain.

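The role of the domain parameter can be pictured with a small standalone sketch. It uses the standard library's urljoin and a hypothetical helper named with_domain; this is an assumption about the general idea, not the way ScraperProject necessarily implements it.

```python
from urllib.parse import urljoin


def with_domain(domain: str, url: str) -> str:
    """Hypothetical helper: return `url` with `domain` prefixed if it lacks one."""
    # urljoin resolves relative URLs against the base and
    # returns absolute URLs unchanged.
    return urljoin(domain, url)


print(with_domain("https://example.com", "/products/42"))
print(with_domain("https://example.com", "https://other.org/page"))
```

A scraped link such as /products/42 would gain the prefix, while a link that already carries its own domain would pass through unchanged.
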
add_selector(selector: Union[str, Selector], attribute: str = None, name: str = None, post_processing: Callable = None)
Add the given selector to the loaded CSS selectors.
Parameters:
- selector – CSS selector as a string or a Selector object.
- attribute – HTML attribute of the element to store.
- name – Optional name for the parsed attribute, useful for creating the header row when exporting as a CSV file.
- post_processing – Optional function called on the parsed attribute before it is stored. Useful for cleaning up and splitting data.

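As a rough illustration of what a post_processing callable might look like, the sketch below defines two hypothetical functions, split_tags and strip_price. It assumes the callable receives the parsed attribute as a single string, which the reference above does not state explicitly.

```python
from typing import List


def split_tags(raw: str) -> List[str]:
    """Hypothetical callable: split a comma-separated value and trim each part."""
    # Empty fragments (e.g. from ",,") are dropped.
    return [part.strip() for part in raw.split(",") if part.strip()]


def strip_price(raw: str) -> float:
    """Hypothetical callable: turn a string like ' $1,299.00' into a float."""
    return float(raw.strip().lstrip("$").replace(",", ""))


print(split_tags(" python, scraping ,, csv "))
print(strip_price(" $1,299.00\n"))
```

Under that assumption, either function could be passed as post_processing=split_tags or post_processing=strip_price when adding a selector.
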
add_selectors(selectors: List[Selector])
Add multiple CSS selectors to the loaded selectors.
Parameters: selectors – List of Selector objects.

export_to_csv(csv_path: pathlib.Path, encoding: str = 'UTF-8', write_header: bool = True)
Export parsed data to a CSV file.
Parameters:
- csv_path – Path where the CSV file will be saved.
- encoding – CSV file encoding. Default is UTF-8.
- write_header – If true, write a header row to the CSV file using the “name” keys in the provided data.

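The effect of the encoding and write_header parameters can be sketched with the standard csv module. The function export_rows below is a hypothetical stand-in, not the library's code, and it assumes the parsed data is a list of dictionaries keyed by each selector's name.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def export_rows(rows: List[Dict[str, str]], csv_path: Path,
                encoding: str = "UTF-8", write_header: bool = True) -> None:
    """Hypothetical stand-in for a CSV export step."""
    if not rows:
        return  # nothing to write, so no file is created
    fieldnames = list(rows[0].keys())
    # newline="" lets the csv module control line endings itself.
    with csv_path.open("w", encoding=encoding, newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()  # header row built from the name keys
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "items.csv"
    export_rows([{"title": "Widget", "price": "9.99"}], out)
    lines = out.read_text(encoding="UTF-8").splitlines()

print(lines)
```

With write_header=False, the same call would write only the data row.
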