Methods

class CB_IPO.scrape

This class creates an object that can scrape for various SEC filings and their attributes

url_info

The url to be used for scraping the site

__init__()

Instantiates a scraper object, with a default search space of recent S-1 filings

set_page(page_number)

Changes the page of search results being opened, and updates url_info accordingly

Parameters:

page_number (int) – The page number in search results to be opened

Returns:

A string of the modified url

Return type:

str

reset_url()

Resets the url to search only for the most recent S-1 forms, and updates url_info accordingly

Returns:

A string of the original url

Return type:

str

set_search_date(start_date, end_date)

Modifies the search to look between two specific dates, and updates url_info accordingly

Parameters:
  • start_date (str) – Start of the date range, in YYYY-MM-DD format

  • end_date (str) – End of the date range, in YYYY-MM-DD format

Returns:

A string of the modified url

Return type:

str
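
Because both arguments are plain strings, it can help to check them before building a query. The following is a minimal sketch and not part of CB_IPO: validate_date_range and parse_ymd are hypothetical helpers that confirm both strings are in YYYY-MM-DD format and that the start does not fall after the end.

```python
from datetime import datetime


def parse_ymd(text):
    """Hypothetical helper: parse a strict, zero-padded YYYY-MM-DD string."""
    # strptime raises ValueError for malformed or impossible dates.
    parsed = datetime.strptime(text, "%Y-%m-%d").date()
    # strptime tolerates missing zero padding ("2023-1-5"), so compare
    # against the canonical ISO form to reject such input.
    if parsed.isoformat() != text:
        raise ValueError(f"{text!r} is not in YYYY-MM-DD format")
    return parsed


def validate_date_range(start_date, end_date):
    """Hypothetical helper (not part of CB_IPO): validate a date range."""
    if parse_ymd(start_date) > parse_ymd(end_date):
        raise ValueError("start_date must not be after end_date")
    return start_date, end_date
```

The validated pair could then be passed on unchanged, for example as set_search_date(*validate_date_range('2023-01-01', '2023-03-31')).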

edgar_scrape(num)

Finds the names, dates, and types of forms filed by different companies

Parameters:

num (int) – The number of entities to be scraped from a page

Returns:

A list of company names, a list of filing dates, and a set of filing types in chronological order

Return type:

tuple

Raises:

ValueError – If num is greater than 100

generate_df(num_entries=100, num_pages=1)

Scrapes the names and filing dates of different companies across one or more pages of results and collects them into a dataframe

Parameters:
  • num_entries (int) – The number of entities to be scraped from a page

  • num_pages (int) – The number of pages to be scraped

Returns:

A dataframe of the companies scraped and the dates they filed

Return type:

pandas.DataFrame

Raises:

ValueError – If num_entries is greater than 100
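
To illustrate how the two parameters interact, here is a rough sketch of gathering rows across several result pages using only the documented set_page() and edgar_scrape() calls. It does not claim to reflect how generate_df is implemented internally. FakeScraper is a hypothetical offline stand-in for a CB_IPO.scrape object, collect_rows is a hypothetical helper, and page numbering starting at 1 is an assumption.

```python
class FakeScraper:
    """Hypothetical stand-in for CB_IPO.scrape so the sketch runs offline."""

    def __init__(self):
        self.page = 1

    def set_page(self, page_number):
        self.page = page_number
        # Placeholder url; the real url_info format is not shown here.
        return f"https://example.invalid/search?page={page_number}"

    def edgar_scrape(self, num):
        if num > 100:
            raise ValueError("num must not exceed 100")
        names = [f"Company {self.page}-{i}" for i in range(num)]
        dates = ["2023-01-01"] * num
        return names, dates, {"S-1"}


def collect_rows(scraper, num_entries=100, num_pages=1):
    """Hypothetical helper: gather (name, date) rows across pages."""
    if num_entries > 100:
        raise ValueError("num_entries must not exceed 100")
    rows = []
    # Assumption: result pages are numbered starting at 1.
    for page in range(1, num_pages + 1):
        scraper.set_page(page)
        names, dates, _forms = scraper.edgar_scrape(num_entries)
        rows.extend(zip(names, dates))
    return rows


rows = collect_rows(FakeScraper(), num_entries=2, num_pages=3)
print(len(rows))  # 6 rows: 2 entries from each of 3 pages
```

With a real scraper object, the same rows could be handed to pandas.DataFrame(rows, columns=[...]) to obtain a table.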

add_forms(forms_list)

Updates the query to include certain form types, and updates url_info accordingly

Parameters:

forms_list (list) – List of strings for forms to search for

Returns:

A tuple of the string for the new url, and the appended forms

Return type:

tuple

get_anums(cik, num)

Scrapes accession numbers from a page when given a cik

Parameters:
  • cik (int) – The cik id for a company

  • num (int) – The number of entities to be scraped from a page

Returns:

A list of accession numbers relating to files for a cik

Return type:

list

get_refs(cik, num)

Finds the reference numbers for filings and company name for a given cik

Parameters:
  • cik (int) – The cik id for a company

  • num (int) – The number of entities to be scraped from a page

Returns:

A list of filing reference numbers and the name of the company associated with a cik

Return type:

tuple

Finds the links for xbrl versions of filings for a given cik

Parameters:
  • cik (int) – The cik id for a company

  • num (int) – The number of entities to be scraped from a page

Returns:

A list of links to filings and the name of the company associated with a cik

Return type:

tuple

scrape_xbrl(link)

Finds the account value for elements like total assets, liabilities, and net income in a filing

Parameters:

link (str) – link to an xbrl for a 10-K filing

Returns:

A dictionary of financial statement elements such as total assets, liabilities, and net income

Return type:

dict

Raises:

Exception – If the scraped Assets, Liabilities, and Equity fail the Accounting Equation
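
The Accounting Equation referred to here is Assets = Liabilities + Equity. A minimal sketch of such a check follows. The function name and the relative tolerance are assumptions made for illustration, since the source does not say whether the library compares the values exactly or within a margin.

```python
import math


def satisfies_accounting_equation(assets, liabilities, equity, rel_tol=1e-6):
    """Hypothetical check: Assets should equal Liabilities + Equity.

    rel_tol is an assumed allowance for rounding in reported figures.
    """
    return math.isclose(assets, liabilities + equity, rel_tol=rel_tol)


balanced = satisfies_accounting_equation(1000.0, 600.0, 400.0)
unbalanced = satisfies_accounting_equation(1000.0, 600.0, 300.0)
print(balanced, unbalanced)  # True False
```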

calculate_ratios(financials)

Calculates financial ratios relevant to balance sheet and income statements

Parameters:

financials (dict) – A dictionary containing Asset and Liability information, as formatted by scrape_xbrl()

Returns:

A dictionary of financial ratios relating to profitability, liquidity, and leverage

Return type:

dict

Raises:

ValueError – If financials is improperly formatted and doesn’t contain the requisite values
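
As an illustration of the kind of computation involved, the sketch below derives one profitability, one liquidity, and one leverage ratio from a dictionary and raises ValueError when a required value is absent. The key names, the output names, and the choice of ratios are assumptions. The actual keys produced by scrape_xbrl() and the ratios returned by calculate_ratios() may differ.

```python
# Hypothetical key names; the real dictionary keys may differ.
REQUIRED_KEYS = (
    "Assets",
    "Liabilities",
    "Equity",
    "NetIncome",
    "CurrentAssets",
    "CurrentLiabilities",
)


def sketch_ratios(financials):
    """Hypothetical helper: compute three standard ratios from a dict."""
    missing = [key for key in REQUIRED_KEYS if key not in financials]
    if missing:
        raise ValueError("financials is missing: " + ", ".join(missing))
    return {
        # Profitability: net income earned per unit of assets.
        "return_on_assets": financials["NetIncome"] / financials["Assets"],
        # Liquidity: short-term assets available per unit of short-term debt.
        "current_ratio": financials["CurrentAssets"]
        / financials["CurrentLiabilities"],
        # Leverage: total liabilities relative to shareholder equity.
        "debt_to_equity": financials["Liabilities"] / financials["Equity"],
    }


example = {
    "Assets": 1000.0,
    "Liabilities": 600.0,
    "Equity": 400.0,
    "NetIncome": 50.0,
    "CurrentAssets": 300.0,
    "CurrentLiabilities": 150.0,
}
ratios = sketch_ratios(example)
```

For the example values, the current ratio is 300 / 150 = 2.0 and the debt-to-equity ratio is 600 / 400 = 1.5.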

summarize_10k(link, flag='raw')

Creates a dataframe that summarizes information scraped from the 10-K filing

Parameters:
  • link (str) – link to an xbrl for a 10-K filing

  • flag (str) – String indicating the summary type, such as ‘raw’, ‘liquidity’, or ‘debt’

Returns:

A dataframe summarizing the information scraped from the 10-K filing

Return type:

pandas.DataFrame