src.spider module

SiteCrawler uses Crawler to instantiate Producers and Consumers: the Producers generate links, and the Consumers read the content behind those links.
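The producer/consumer split described above can be illustrated with a small, self-contained sketch built on `asyncio.Queue`. It does not show how Crawler, Producers, or Consumers are actually implemented. The in-memory `SITE` dictionary (standing in for real HTTP fetches), the `produce`, `consume`, and `crawl` functions, and the `workers` count are all hypothetical. The `max_links` cap mirrors the parameter documented on Spider.

```python
import asyncio
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical in-memory site used instead of real HTTP requests:
# path -> (page content, links found on that page).
SITE: Dict[str, Tuple[str, List[str]]] = {
    "/": ("home", ["/a", "/b"]),
    "/a": ("page a", ["/b", "/c"]),
    "/b": ("page b", ["/"]),
    "/c": ("page c", []),
}


async def produce(start: str, queue: asyncio.Queue, max_links: int) -> None:
    """Producer: discover links breadth-first and enqueue each new one once."""
    seen = {start}
    pending = deque([start])
    while pending:
        path = pending.popleft()
        await queue.put(path)
        for link in SITE[path][1]:
            # Stop discovering new links once max_links distinct links are known.
            if link not in seen and len(seen) < max_links:
                seen.add(link)
                pending.append(link)


async def consume(queue: asyncio.Queue, results: Dict[str, str]) -> None:
    """Consumer: read the content behind each queued link."""
    while True:
        path = await queue.get()
        results[path] = SITE[path][0]
        queue.task_done()


async def crawl(start: str, max_links: int = 100, workers: int = 3) -> Dict[str, str]:
    queue: asyncio.Queue = asyncio.Queue()
    results: Dict[str, str] = {}
    consumers = [asyncio.create_task(consume(queue, results)) for _ in range(workers)]
    await produce(start, queue, max_links)
    await queue.join()  # wait until every queued link has been consumed
    for task in consumers:
        task.cancel()  # consumers loop forever, so stop them explicitly
    await asyncio.gather(*consumers, return_exceptions=True)
    return results


print(asyncio.run(crawl("/")))
```

In a real crawler, the producer and consumers would await network I/O, which is where several concurrent consumers pay off. Here they only touch an in-memory dictionary.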

class src.spider.Spider(url: str, session: ClientSession, crawler: Crawler, max_links: int = 100)

Bases: object

async crawl() → Registry: …

async classmethod create(url: str, session: ClientSession, max_links: int = 100) → Spider
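`create` is an async classmethod that returns a Spider. This is a common pattern when construction needs to `await` something, because `__init__` cannot be a coroutine. The source does not say what `create` does internally. The sketch below assumes it builds and initializes the Crawler before calling the constructor. `FakeCrawler`, its `start` method, and `FakeSpider` are hypothetical stand-ins, and `session` is an opaque placeholder for the aiohttp `ClientSession` in the real signature.

```python
import asyncio


class FakeCrawler:
    """Hypothetical stand-in for Crawler whose setup requires an await."""

    def __init__(self, session: object, max_links: int) -> None:
        self.session = session
        self.max_links = max_links
        self.ready = False

    async def start(self) -> None:
        await asyncio.sleep(0)  # placeholder for real async setup work
        self.ready = True


class FakeSpider:
    """Hypothetical spider illustrating the async-factory pattern."""

    def __init__(self, url: str, session: object, crawler: FakeCrawler, max_links: int = 100) -> None:
        # __init__ stays synchronous and only stores already-built collaborators.
        self.url = url
        self.session = session
        self.crawler = crawler
        self.max_links = max_links

    @classmethod
    async def create(cls, url: str, session: object, max_links: int = 100) -> "FakeSpider":
        # Do the awaitable setup here, then hand the finished pieces to __init__.
        crawler = FakeCrawler(session, max_links)
        await crawler.start()
        return cls(url, session, crawler, max_links)


spider = asyncio.run(FakeSpider.create("https://example.com", object(), max_links=10))
print(spider.crawler.ready, spider.max_links)  # True 10
```

This keeps the constructor cheap and synchronous, and callers obtain a fully initialized object with a single `await Cls.create(...)`.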

results(extract_text: bool = False) → List[str]

Retrieve the raw results of crawling, or the text extracted from them.

Parameters: extract_text (bool) – Whether to extract text from the results. Only valid if the spider was used to fetch URLs.
Returns: A list of raw results, or of their extracted text when extract_text is True.
Return type: List[str]
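The source does not say how `results()` extracts text. A minimal sketch, assuming the raw results are HTML strings, uses the standard-library `html.parser` to collect visible text and skip `<script>` and `<style>` contents. `_TextExtractor`, `extract_text`, and the standalone `results` function are hypothetical and only mirror the documented behaviour of returning either the raw list or its extracted text.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside a script or style element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def results(raw_pages: List[str], extract: bool = False) -> List[str]:
    """Hypothetical mirror of Spider.results(): raw pages or their extracted text."""
    return [extract_text(page) for page in raw_pages] if extract else list(raw_pages)


pages = ["<html><head><style>p{}</style></head><body><h1>Hi</h1><p>there</p></body></html>"]
print(results(pages, extract=True))  # ['Hi there']
```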

async src.spider.main()