src.spider module

SiteCrawler uses Crawler to instantiate Producers and Consumers specifically designed to generate links and read their content.

class src.spider.Spider(url: str, session: ClientSession, crawler: Crawler, max_links: int = 100)

Bases: object

async crawl() → Registry

async classmethod create(url: str, session: ClientSession, max_links: int = 100) → Spider

results(extract_text: bool = False) → List[str]

Retrieve either raw results of crawling, or extract the text.

Parameters:

extract_text (bool) – Whether to extract text from the results. Only valid if the spider was used to fetch URLs.

Returns:

A list of results.

Return type:

List[str]
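
Example:

The signatures above suggest a call sequence of create, then crawl, then results. The sketch below walks through that sequence with stand-in classes, because the real implementation is not shown in this reference. FakeSpider, FakeCrawler, the run method, the canned pages, and the use of a plain dict in place of Registry are all hypothetical; only the method names and parameters of create, crawl, and results are taken from the documentation. The tag stripping used for extract_text is likewise an assumption about what "extract the text" might mean.

```python
import asyncio
import re
from typing import Any, Dict, List


class FakeCrawler:
    """Hypothetical stand-in for Crawler: serves canned pages, no network."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages

    async def run(self, max_links: int) -> Dict[str, str]:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        # Keep at most max_links pages, in insertion order.
        return dict(list(self.pages.items())[:max_links])


class FakeSpider:
    """Stand-in that mirrors the documented Spider interface (not the real class)."""

    def __init__(self, url: str, session: Any, crawler: FakeCrawler,
                 max_links: int = 100) -> None:
        self.url = url
        self.session = session
        self.crawler = crawler
        self.max_links = max_links
        self._registry: Dict[str, str] = {}  # assumed shape of Registry

    @classmethod
    async def create(cls, url: str, session: Any,
                     max_links: int = 100) -> "FakeSpider":
        # __init__ cannot be awaited, so an async factory is a common way
        # to do asynchronous setup before building the instance.
        pages = {url: "<p>Hello</p>", url + "/about": "<p>About</p>"}
        return cls(url, session, FakeCrawler(pages), max_links)

    async def crawl(self) -> Dict[str, str]:
        self._registry = await self.crawler.run(self.max_links)
        return self._registry

    def results(self, extract_text: bool = False) -> List[str]:
        raw = list(self._registry.values())
        if not extract_text:
            return raw
        # Assumed behaviour: strip markup and surrounding whitespace.
        return [re.sub(r"<[^>]+>", "", html).strip() for html in raw]


async def demo():
    # session=None is a placeholder; the real create expects a ClientSession.
    spider = await FakeSpider.create("https://example.com", session=None,
                                     max_links=1)
    await spider.crawl()
    return spider.results(), spider.results(extract_text=True)


raw, text = asyncio.run(demo())
```

With the real class, the session argument would be an open ClientSession, and results would only be meaningful after crawl has been awaited.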

async src.spider.main()