Methods

class CB_IPO.scrape

This class creates an object that can scrape SEC filings and their attributes

url_info: The URL to be used for scraping the site

__init__(): Instantiates a scraper object, with a default search space of recent S-1 filings

set_page(page_number)

Sets the page of search results to open and updates url_info accordingly

Parameters: page_number (int) – The page number in search results to be opened
Returns: A string of the modified URL
Return type: str
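As a rough illustration of what changing the page of a search URL involves, the sketch below rewrites a page query parameter using only the standard library. The URL, the parameter name page, and the helper with_page are hypothetical; the source does not state how url_info is structured, so this is not the library's implementation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical search URL; the real value of url_info may look different.
EXAMPLE_URL = "https://example.com/search?forms=S-1&page=1"


def with_page(url, page_number):
    """Return a copy of url whose 'page' query parameter equals page_number."""
    if page_number < 1:
        raise ValueError("page_number must be at least 1")
    parts = urlsplit(url)
    # Keep every existing parameter except a previous page value.
    query = [(key, value) for key, value in parse_qsl(parts.query) if key != "page"]
    query.append(("page", str(page_number)))
    return urlunsplit(parts._replace(query=urlencode(query)))


new_url = with_page(EXAMPLE_URL, 3)
```

Returning a new string rather than editing the URL in place mirrors the documented behaviour of returning the modified URL.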

reset_url()

Resets the URL to search only for the most recent S-1 forms and updates url_info accordingly

Returns: A string of the original URL
Return type: str

set_search_date(start_date, end_date)

Restricts the search to filings between two specific dates and updates url_info accordingly

Parameters:

start_date (str) – Start of the date range, in YYYY-MM-DD format
end_date (str) – End of the date range, in YYYY-MM-DD format

Returns:

A string of the modified URL

Return type:

str
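Because both dates are passed as strings, it can help to check them before building a query. The sketch below is one way to do that with the standard library. The helper parse_range is hypothetical, and the source does not say whether set_search_date performs any validation of its own.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # the YYYY-MM-DD layout named in the parameter list


def parse_range(start_date, end_date):
    """Parse two date strings and return them as (start, end) date objects."""
    # strptime raises ValueError for inputs such as "2023-13-01" or "01/02/2023".
    start = datetime.strptime(start_date, DATE_FORMAT).date()
    end = datetime.strptime(end_date, DATE_FORMAT).date()
    if start > end:
        raise ValueError("start_date must not be after end_date")
    return start, end


start, end = parse_range("2023-01-01", "2023-03-31")
span_days = (end - start).days
```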

edgar_scrape(num)

Finds the names, dates, and types of forms filed by different companies

Parameters: num (int) – The number of entities to be scraped from a page
Returns: A list of company names, a list of filing dates, and a set of filing types in chronological order
Return type: tuple
Raises: ValueError – If num is greater than 100
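The documented return shape (two lists and a set) and the limit of 100 can be pictured with a small stand-in that works on rows already in memory. The helper take_entries and the sample rows are made up for illustration; no scraping happens here, and the real method's internals are not described in the source.

```python
MAX_PER_PAGE = 100  # limit stated in the Raises entry above


def take_entries(rows, num):
    """Split the first num (name, date, form_type) rows into names, dates, and form types."""
    if num > MAX_PER_PAGE:
        raise ValueError(f"num must not exceed {MAX_PER_PAGE}")
    selected = rows[:num]
    names = [name for name, _, _ in selected]
    dates = [filed for _, filed, _ in selected]
    form_types = {form for _, _, form in selected}  # a set, so duplicates collapse
    return names, dates, form_types


# Placeholder rows, not real filings.
sample_rows = [
    ("Company A", "2023-03-01", "S-1"),
    ("Company B", "2023-02-27", "S-1/A"),
    ("Company C", "2023-02-20", "S-1"),
]
names, dates, form_types = take_entries(sample_rows, 2)
```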

generate_df(num_entries=100, num_pages=1)

Builds a dataframe of the names, dates, and types of forms filed by different companies, scraped from one or more pages of search results

Parameters:

num_entries (int) – The number of entities to be scraped from a page
num_pages (int) – The number of pages to be scraped

Returns:

A dataframe of the companies scraped and the dates they filed

Return type:

pandas.DataFrame

Raises:

ValueError – If num_entries is greater than 100

add_forms(forms_list)

Extends the query to include the given form types and updates url_info accordingly

Parameters: forms_list (list) – List of strings naming the form types to search for
Returns: A tuple of the new URL string and the appended forms
Return type: tuple

get_anums(cik, num)

Scrapes accession numbers from a page for a given CIK

Parameters:

cik (int) – The CIK (Central Index Key) of a company
num (int) – The number of entities to be scraped from a page

Returns:

A list of accession numbers relating to filings for a CIK

Return type:

list

get_refs(cik, num)

Finds the filing reference numbers and the company name for a given CIK

Parameters:

cik (int) – The CIK of a company
num (int) – The number of entities to be scraped from a page

Returns:

A list of filing reference numbers and the name of the company associated with a CIK

Return type:

tuple

create_links(cik, num)

Finds the links to XBRL versions of filings for a given CIK

Parameters:

cik (int) – The CIK of a company
num (int) – The number of entities to be scraped from a page

Returns:

A list of links to filings and the name of the company associated with a CIK

Return type:

tuple
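For orientation, EDGAR's public archive conventionally stores a filing's documents in a folder named by the company's CIK and the accession number with its dashes removed. The sketch below builds such a folder URL under that assumption. The helper filing_folder_url and the sample identifiers are made up, and the source does not say whether create_links constructs its links this way.

```python
import re

# Accession numbers are conventionally written as 10 digits, 2 digits, 6 digits.
ACCESSION_PATTERN = re.compile(r"\d{10}-\d{2}-\d{6}")


def filing_folder_url(cik, accession_number):
    """Build the archive folder URL for one filing (assumed layout, see lead-in)."""
    if not ACCESSION_PATTERN.fullmatch(accession_number):
        raise ValueError(f"unexpected accession number format: {accession_number!r}")
    folder = accession_number.replace("-", "")
    # int() drops any leading zeros from the CIK.
    return f"https://www.sec.gov/Archives/edgar/data/{int(cik)}/{folder}/"


# Placeholder identifiers, not a real filing.
url = filing_folder_url(1234567, "0001234567-23-000001")
```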

scrape_xbrl(link)

Finds the account values for elements such as total assets, liabilities, and net income in a filing

Parameters: link (str) – Link to the XBRL version of a 10-K filing
Returns: A dictionary of financial statement elements such as total assets, liabilities, and net income
Return type: dict
Raises: Exception – If the scraped Assets, Liabilities, and Equity fail the Accounting Equation
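The Accounting Equation referred to above is Assets = Liabilities + Equity. A minimal sketch of such a check follows. The helper name, the tolerance, and the choice of ValueError (a subclass of Exception) are assumptions for illustration and are not taken from the library.

```python
import math


def check_accounting_equation(assets, liabilities, equity, rel_tol=1e-6):
    """Raise ValueError unless assets equals liabilities + equity within rel_tol."""
    # A small relative tolerance absorbs rounding in reported figures.
    if not math.isclose(assets, liabilities + equity, rel_tol=rel_tol):
        raise ValueError(
            f"Assets ({assets}) != Liabilities ({liabilities}) + Equity ({equity})"
        )
    return True


# Made-up figures that satisfy the equation.
balanced = check_accounting_equation(100.0, 60.0, 40.0)
```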

calculate_ratios(financials)

Calculates financial ratios relevant to balance sheets and income statements

Parameters: financials (dict) – A dictionary containing Asset and Liability information, as formatted by scrape_xbrl()
Returns: A dictionary of financial ratios relating to profitability, liquidity, and leverage
Return type: dict
Raises: ValueError – If financials is improperly formatted and doesn’t contain the requisite values
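To make the three ratio families concrete, the sketch below computes one textbook ratio of each kind from a plain dictionary. The key names, the function sketch_ratios, and the choice of ratios are assumptions for illustration; the source does not list the keys produced by scrape_xbrl() or the ratios calculate_ratios() returns.

```python
# Hypothetical key names; the real dictionary layout is not documented here.
REQUIRED_KEYS = ("Assets", "Liabilities", "CurrentAssets", "CurrentLiabilities", "NetIncome")


def sketch_ratios(financials):
    """Return one profitability, one liquidity, and one leverage ratio."""
    missing = [key for key in REQUIRED_KEYS if key not in financials]
    if missing:
        raise ValueError(f"financials is missing: {', '.join(missing)}")
    assets = financials["Assets"]
    current_liabilities = financials["CurrentLiabilities"]
    if assets == 0 or current_liabilities == 0:
        raise ValueError("Assets and CurrentLiabilities must be non-zero")
    return {
        "return_on_assets": financials["NetIncome"] / assets,  # profitability
        "current_ratio": financials["CurrentAssets"] / current_liabilities,  # liquidity
        "debt_to_assets": financials["Liabilities"] / assets,  # leverage
    }


# Made-up figures for illustration only.
example = {
    "Assets": 200.0,
    "Liabilities": 120.0,
    "CurrentAssets": 80.0,
    "CurrentLiabilities": 40.0,
    "NetIncome": 20.0,
}
ratios = sketch_ratios(example)
```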

summarize_10k(link, flag='raw')

Creates a dataframe that summarizes information scraped from a 10-K filing

Parameters:

link (str) – Link to the XBRL version of a 10-K filing
flag (str) – The type of summary to produce, such as ‘raw’, ‘liquidity’, or ‘debt’

Returns:

A dataframe summarizing the information scraped from the 10-K filing, as selected by flag

Return type:

pandas.DataFrame