Parser

class scraper_toolkit.components.Parser.Parser(html: str)

Parse HTML for specific elements or attributes.

Parameters: html – HTML to parse, as a string.
add_selector(selector: Union[str, scraper_toolkit.components.Selector.Selector] = None, attribute: str = None, name: str = None, post_processing: Callable = None)

Add the given selector to the list of loaded CSS selectors.

Parameters:
  • selector – CSS selector as a string or a Selector type object.
  • attribute – HTML attribute of the element to store.
  • name – Optional name for the parsed attribute, useful for creating the header row when exporting as a CSV file.
  • post_processing – Optional function called on the parsed attribute before it is stored. Useful for cleaning up and splitting data.
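
As an illustration of the post_processing parameter, the sketch below defines two clean-up callables of the kind described above. The function names clean_price and split_tags, the sample inputs, and the selector and attribute names in the trailing comment are all hypothetical. It also assumes the callable receives the parsed attribute as a string and that its return value is what gets stored, which the documentation above suggests but does not state outright.

```python
from typing import List


def clean_price(raw: str) -> float:
    """Hypothetical post_processing callable: convert a price string to a float."""
    # Strip surrounding whitespace, then drop the leading currency symbol
    # and any thousands separators before converting.
    cleaned = raw.strip().lstrip("$").replace(",", "")
    return float(cleaned)


def split_tags(raw: str) -> List[str]:
    """Hypothetical post_processing callable: split a comma-separated string."""
    # Trim each fragment and discard empty ones.
    return [part.strip() for part in raw.split(",") if part.strip()]


# Assumed usage, following the add_selector signature documented above
# (selector and attribute names are made up; not executed here):
# parser.add_selector("div.product", attribute="data-price", name="price",
#                     post_processing=clean_price)
```

Keeping such callables as small, pure functions makes them easy to check in isolation before wiring them into a selector.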
parse()

Parse the HTML using the loaded CSS selectors and append each matching element to self.parsed as a dictionary.
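
To make the shape of the result more concrete, here is a sketch that assumes self.parsed ends up as a list of dictionaries keyed by the name given to each selector. That key layout, the sample rows, and the helper rows_to_csv are assumptions for illustration and are not part of the library. It shows how rows of this shape could be written to CSV with a header row using only the standard library.

```python
import csv
import io
from typing import Dict, List


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize a list of dictionaries to CSV text with a header row."""
    if not rows:
        return ""
    buffer = io.StringIO()
    # Header names are taken from the keys of the first row.
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical stand-in for the contents of `parsed` after calling parse().
parsed = [
    {"title": "Widget", "price": "9.99"},
    {"title": "Gadget", "price": "19.50"},
]

csv_text = rows_to_csv(parsed)
```

Under this assumption, the name passed to add_selector would become the column header, which is why naming selectors is useful when exporting.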