- class CB_IPO.scrape
This class creates an object that can scrape for various SEC filings and their attributes
- url_info
The url to be used for scraping the site
- __init__()
Instantiates a scraper object, with a default search space of recent S-1 filings
- set_page(page_number)
Sets the page number of the search results to open, and updates url_info accordingly
- Parameters:
page_number (int) – The page number in search results to be opened
- Returns:
A string of the modified url
- Return type:
str
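The kind of URL rewrite that set_page describes can be sketched with the standard library alone. This is only an illustration: the helper name `with_page`, the `page` query parameter, and the example.com address are assumptions, and the actual layout of url_info is not documented here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page_number: int) -> str:
    """Return a copy of url whose (assumed) 'page' query parameter is page_number."""
    if page_number < 1:
        raise ValueError("page_number must be at least 1")
    parts = urlsplit(url)
    # Keep every existing parameter except a previous 'page' value.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    params.append(("page", str(page_number)))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical search URL, not the one the scraper really uses.
new_url = with_page("https://example.com/search?forms=S-1&page=1", 3)
```

Replacing the parameter instead of appending a second one keeps repeated calls from producing a URL with conflicting page values.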
- reset_url()
Resets the url to search only for the most recent S-1 forms, and updates url_info accordingly
- Returns:
A string of the original url
- Return type:
str
- set_search_date(start_date, end_date)
Modifies the search to look between two specific dates, and updates url_info accordingly
- Parameters:
start_date (str) – Start of the date range, in YYYY-MM-DD format
end_date (str) – End of the date range, in YYYY-MM-DD format
- Returns:
A string of the modified url
- Return type:
str
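Because both dates are passed as strings, it can help to check them before calling set_search_date. The sketch below is a caller-side helper, not part of CB_IPO; the name `parse_date_range` is hypothetical, and whether the library performs a similar check itself is not stated.

```python
from datetime import date, datetime
from typing import Tuple


def parse_date_range(start_date: str, end_date: str) -> Tuple[date, date]:
    """Validate two YYYY-MM-DD strings and return them as date objects."""
    parsed = []
    for text in (start_date, end_date):
        # strptime raises ValueError for anything it cannot read as Y-m-d.
        value = datetime.strptime(text, "%Y-%m-%d").date()
        # strptime tolerates "2023-1-5", so insist on the zero-padded form.
        if value.isoformat() != text:
            raise ValueError(f"{text!r} is not in YYYY-MM-DD format")
        parsed.append(value)
    start, end = parsed
    if start > end:
        raise ValueError("start_date must not be after end_date")
    return start, end


start, end = parse_date_range("2023-01-01", "2023-03-31")
```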
- edgar_scrape(num)
Finds the names, dates, and types of forms filed by different companies
- Parameters:
num (int) – The number of entities to be scraped from a page
- Returns:
A list of company names, a list of filing dates, and a set of filing types in chronological order
- Return type:
- Raises:
ValueError – If num is greater than 100
- generate_df(num_entries=100, num_pages=1)
Scrapes the names, dates, and types of forms filed by different companies and collects them in a dataframe
- Parameters:
num_entries (int) – The number of entities to be scraped from a page
num_pages (int) – The number of pages to be scraped
- Returns:
A dataframe of the companies scraped and the dates they filed
- Return type:
pandas.DataFrame
- Raises:
ValueError – If num_entries is greater than 100
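Since num_entries is capped at 100, a larger result set has to be spread over several pages. The sketch below shows one way a caller might choose the two arguments for a target row count. The helper `plan_request` is hypothetical, and it assumes that generate_df scrapes num_entries rows from each of num_pages pages, as the parameter descriptions suggest.

```python
import math

MAX_PER_PAGE = 100  # documented limit: larger values raise ValueError


def plan_request(total_rows: int) -> tuple:
    """Return (num_entries, num_pages) covering at least total_rows rows."""
    if total_rows < 1:
        raise ValueError("total_rows must be at least 1")
    num_entries = min(total_rows, MAX_PER_PAGE)
    num_pages = math.ceil(total_rows / num_entries)
    return num_entries, num_pages


small = plan_request(40)    # fits on a single page
large = plan_request(250)   # needs three full pages; trim the surplus afterwards
```

For 250 rows the plan requests 3 pages of 100, so the caller would drop the last 50 rows of the resulting dataframe.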
- add_forms(forms_list)
Updates the query to include certain form types, and updates url_info accordingly
- Parameters:
forms_list (list) – List of form type strings to search for
- Returns:
A tuple of the string for the new url, and the appended forms
- Return type:
tuple
- get_anums(cik, num)
Scrapes accession numbers from a page for a given CIK
- Parameters:
cik (int) – The CIK (Central Index Key) id for a company
num (int) – The number of entities to be scraped from a page
- Returns:
A list of accession numbers for the filings associated with a CIK
- Return type:
list
- get_refs(cik, num)
Finds the filing reference numbers and the company name for a given CIK
- Parameters:
cik (int) – The cik id for a company
num (int) – The number of entities to be scraped from a page
- Returns:
A list of filing reference numbers and the name of the company associated with a cik
- Return type:
- create_links(cik, num)
Finds the links to the XBRL versions of filings for a given CIK
- Parameters:
cik (int) – The cik id for a company
num (int) – The number of entities to be scraped from a page
- Returns:
A list of links to filings and the name of the company associated with a cik
- Return type:
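To illustrate how a CIK and an accession number relate to a location on EDGAR, the sketch below builds the archive folder URL for one filing. It relies on the public EDGAR conventions (accession numbers written as 10-2-6 digit groups, and archive folders named after the CIK and the accession number without dashes). Whether create_links returns links of exactly this form is not stated here, and the helper name and sample values are made up.

```python
import re

# Accession numbers are written as 10 digits, 2 digits, 6 digits.
ACCESSION_RE = re.compile(r"\d{10}-\d{2}-\d{6}")


def filing_folder_url(cik: int, accession: str) -> str:
    """Build the EDGAR archive folder URL for one filing (illustrative helper)."""
    if not ACCESSION_RE.fullmatch(accession):
        raise ValueError(f"unexpected accession number format: {accession!r}")
    # The folder name is the accession number with its dashes removed.
    folder = accession.replace("-", "")
    return f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{folder}/"


# Placeholder CIK and accession number, not a real filing.
url = filing_folder_url(1234567, "0001234567-24-000001")
```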
- scrape_xbrl(link)
Finds the account values for elements such as total assets, liabilities, and net income in a filing
- Parameters:
link (str) – Link to the XBRL document for a 10-K filing
- Returns:
A dictionary of financial statement elements such as total assets, liabilities, and net income
- Return type:
dict
- Raises:
Exception – If the scraped Assets, Liabilities, and Equity do not satisfy the accounting equation (Assets = Liabilities + Equity)
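The consistency check behind this exception is the accounting equation, Assets = Liabilities + Equity. A minimal sketch of such a check follows; the function name and the relative tolerance are assumptions, and the comparison the library actually performs may differ.

```python
import math


def satisfies_accounting_equation(
    assets: float, liabilities: float, equity: float, rel_tol: float = 1e-6
) -> bool:
    """Return True when assets equal liabilities plus equity, within rel_tol."""
    return math.isclose(assets, liabilities + equity, rel_tol=rel_tol)


balanced = satisfies_accounting_equation(1000.0, 600.0, 400.0)
unbalanced = satisfies_accounting_equation(1000.0, 600.0, 300.0)
```

A small tolerance avoids false failures when the scraped values are floats that have been scaled or rounded.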
- calculate_ratios(financials)
Calculates financial ratios relevant to balance sheets and income statements
- Parameters:
financials (dict) – A dictionary containing Asset and Liability information, as formatted by scrape_xbrl()
- Returns:
A dictionary of financial ratios relating to profitability, liquidity, and leverage
- Return type:
dict
- Raises:
ValueError – If financials is improperly formatted and doesn’t contain the requisite values
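As a rough picture of what a ratio calculation over such a dictionary involves, the sketch below computes three common ratios and raises ValueError when a required value is missing. The key names, the ratio names, and the function `basic_ratios` are all hypothetical; the keys that scrape_xbrl() really produces and the ratios that calculate_ratios() really returns are not listed in this reference.

```python
REQUIRED_KEYS = ("Assets", "Liabilities", "Equity", "NetIncome")  # assumed names


def basic_ratios(financials: dict) -> dict:
    """Compute a few leverage and profitability ratios from balance sheet totals."""
    missing = [key for key in REQUIRED_KEYS if key not in financials]
    if missing:
        raise ValueError(f"financials is missing: {', '.join(missing)}")
    assets = financials["Assets"]
    liabilities = financials["Liabilities"]
    equity = financials["Equity"]
    net_income = financials["NetIncome"]
    return {
        "debt_to_equity": liabilities / equity,   # leverage
        "debt_ratio": liabilities / assets,       # leverage
        "return_on_assets": net_income / assets,  # profitability
    }


# Made-up figures for illustration only.
ratios = basic_ratios(
    {"Assets": 1000.0, "Liabilities": 600.0, "Equity": 400.0, "NetIncome": 50.0}
)
```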
- summarize_10k(link, flag='raw')
Creates a dataframe that summarizes information scraped from the 10-K filing
- Parameters:
link (str) – Link to the XBRL document for a 10-K filing
flag (str) – The summary type: ‘raw’, ‘liquidity’, ‘debt’, etc.
- Returns:
A dataframe summarizing the information scraped from the 10-K filing, according to flag
- Return type:
pandas.DataFrame