Comparable company analysis (CCA) is one of the most common valuation techniques in banking. It estimates the enterprise value (market cap + debt + minority interest - cash) of a private company using valuation multiples derived from the enterprise values of similar public companies. Enterprise value is used to assess company value in situations such as M&A advisory, fairness opinions, IPOs, and restructurings.
The simplified steps to comparable company analysis are as follows:
- Select companies similar to your target to form your 'peer universe' or 'comparables universe'
- Calculate the enterprise value of each company in your comparables universe
- Choose a valuation multiple for this universe, such as EV/Revenue or EV/EBITDA
- Calculate the mean or median multiple across the universe
- Multiply the mean or median multiple by the corresponding metric of your target company (usually revenue or EBITDA) to get its implied enterprise value
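The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the function names and the dictionary keys (`market_cap`, `debt`, `minority_interest`, `cash`, `ebitda`) are hypothetical, and the EV definition is the one given in the first paragraph.

```python
from statistics import mean, median

def enterprise_value(market_cap, debt, minority_interest, cash):
    """EV = market cap + debt + minority interest - cash."""
    return market_cap + debt + minority_interest - cash

def implied_enterprise_value(peers, target_metric, metric="ebitda", use_median=True):
    """Apply the peer universe's median (or mean) EV/metric multiple to the target.

    Step 1 (choosing the peers) is left to the analyst; `peers` is a list of
    dicts with hypothetical keys market_cap, debt, minority_interest, cash,
    and the chosen metric (e.g. "revenue" or "ebitda").
    """
    # Steps 2-3: each peer's EV, then its EV/metric multiple
    multiples = [
        enterprise_value(p["market_cap"], p["debt"], p["minority_interest"], p["cash"]) / p[metric]
        for p in peers
    ]
    # Step 4: summarize the universe with the median or mean multiple
    center = median(multiples) if use_median else mean(multiples)
    # Step 5: multiply by the target's own metric
    return center * target_metric

peers = [
    {"market_cap": 800, "debt": 200, "minority_interest": 0, "cash": 100, "ebitda": 100},    # EV 900, 9.0x
    {"market_cap": 1000, "debt": 300, "minority_interest": 50, "cash": 150, "ebitda": 100},  # EV 1200, 12.0x
    {"market_cap": 500, "debt": 100, "minority_interest": 0, "cash": 100, "ebitda": 50},     # EV 500, 10.0x
]
print(implied_enterprise_value(peers, target_metric=40))  # median 10.0x * 40 -> 400.0
```

The median is often preferred over the mean because a single outlier multiple can move the mean substantially. Peers with negative EBITDA produce meaningless EV/EBITDA multiples and would normally be excluded before this step.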
In this project, I attempt to predict enterprise value more accurately than comparable company analysis does. In the quandl_api and intrinio_api notebooks, I gather each company's name, stock ticker, assets, revenue, sector, and enterprise value from the Quandl and Intrinio APIs. I then clean and organize the data and, in the models notebook, run regression and random forest models to predict each company's enterprise value. The models have the following form:
Enterprise value ~ Assets + Revenue + Sector
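Because enterprise value, assets, and revenue are log-transformed before fitting (see the transformation step later in this README), the regression that is actually estimated can be written roughly as follows. This is a sketch under two assumptions the README does not spell out: natural logs, and sector entered as indicator (dummy) variables with one of the $S$ sectors as the omitted baseline.

$$
\log(\mathrm{EV}_i) = \beta_0 + \beta_1 \log(\mathrm{Assets}_i) + \beta_2 \log(\mathrm{Revenue}_i) + \sum_{s=2}^{S} \gamma_s \, \mathbf{1}[\mathrm{Sector}_i = s] + \varepsilon_i
$$

Predictions from this form are on the log scale, so they would need to be exponentiated to get back to enterprise values in dollars.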
Next, I create the Company class in comparables_analysis.py and instantiate a Company object for each company in my dataset. I use this class to run a pseudo-comparables analysis whose results serve as a baseline for the regression and random forest.
Having conducted comparables analysis myself, I know the process can be improved: it is time-consuming, entirely dependent on the analyst's assumptions, and uses very few data points. A 2011 paper by the Securities Litigation and Consulting Group, which tries to replace comparables analysis with a regression on only a company's EBITDA or revenue, piqued my interest. I try to improve on their work by using more companies, more variables (assets and sector), and more models (random forest).
In this script, I start with a list of tickers I downloaded from the internet. The list came with a lot of unnecessary text, so I had to clean it up before it was usable.
Once this dataframe is in the proper form, I loop through the tickers column and call Quandl's API for the revenue and assets of each company. The result is shown below with some familiar companies (and the example row I needed to start the loop).
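The loop can be sketched as below. This is a hypothetical illustration rather than the project's code: `fetch` stands in for whatever wrapper makes the actual Quandl request, and `collect_fundamentals` is an invented name. Collecting rows in a list and building the dataframe once at the end (e.g. with `pd.DataFrame(rows)`) is one way to avoid needing a placeholder row to start the loop.

```python
def collect_fundamentals(tickers, fetch):
    """Call fetch(ticker) for each ticker; keep successes and record failures.

    `fetch` is a hypothetical function that returns a dict with "revenue"
    and "assets" keys (e.g. a thin wrapper around an API request).
    """
    rows, failed = [], []
    for ticker in tickers:
        try:
            data = fetch(ticker)
            rows.append({"ticker": ticker, "revenue": data["revenue"], "assets": data["assets"]})
        except Exception:  # broad on purpose in this sketch: network errors, unknown tickers, missing fields
            failed.append(ticker)
    return rows, failed

# Usage with a stand-in fetcher; the unknown ticker raises KeyError and is skipped
fake_api = {"AAA": {"revenue": 10.0, "assets": 20.0}, "BBB": {"revenue": 5.0, "assets": 7.0}}
rows, failed = collect_fundamentals(["AAA", "ZZZ", "BBB"], lambda t: fake_api[t])
print(rows)
print(failed)  # ['ZZZ']
```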
For this script, I use Intrinio's API (and many, many keys for that API) to get the enterprise value and sector of each company in the dataset. I used multiple API keys because Intrinio limits daily requests per key and 10-minute email addresses are very easy to create. In a process very similar to the one in quandl_api.py, I loop through the tickers in my dataframe and pull the appropriate data from the API. Below is an example of the final output of this script.
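Now I have a combined and cleaned dataframe (above). The final step before running models on it is a small amount of transformation. Plotting enterprise value, assets, and revenue shows that all three have a heavy right skew: a handful of very large companies sit far out in the tail. Linear regression does not require the independent variables themselves to be normally distributed, but it does assume normally distributed errors, and fitting raw, heavily skewed dollar values tends to produce skewed residuals and lets the largest companies dominate the fit. To address this, I log-transform all three variables, which gives each one a distribution much closer to Gaussian.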
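A minimal sketch of the transformation with a before/after skewness check, using NumPy and simulated lognormal data in place of the real columns; the helper names are hypothetical. A log is only defined for positive values: enterprise value can be negative (when cash exceeds market cap plus debt and minority interest) and revenue can be zero, so such rows have to be dropped or handled separately first. The README does not say how this was done.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based skewness: E[(x - mean)^3] / E[(x - mean)^2]^1.5."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d**3) / np.mean(d**2) ** 1.5)

def log_transform(values):
    """Natural log of strictly positive values; refuses zero or negative inputs."""
    values = np.asarray(values, dtype=float)
    if np.any(values <= 0):
        raise ValueError("log transform requires strictly positive values")
    return np.log(values)

# Simulated, right-skewed stand-in for a column such as assets
rng = np.random.default_rng(0)
assets = rng.lognormal(mean=6.0, sigma=1.5, size=2000)
log_assets = log_transform(assets)

print(sample_skewness(assets))      # large and positive: heavy right tail
print(sample_skewness(log_assets))  # close to zero after the transform
```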
The final dataset covers over 1,400 companies and includes each company's ticker, name, enterprise value, assets, revenue, and sector. I split the data into training and test samples, train linear regression and random forest models on the training data, and use these models to predict the enterprise value of the companies in the test sample. Below is a screenshot of my predictions dataframe.
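The regression half of this step can be sketched with plain NumPy: a random train/test split, an ordinary least squares fit of log enterprise value on log assets, log revenue, and sector dummies, and predictions converted back to dollars with `exp`. The data are simulated, and the 80/20 split, coefficients, and all names are assumptions. The random forest is not shown; scikit-learn's `RandomForestRegressor` is a common choice, though the README does not say which library the project used.

```python
import numpy as np

def design_matrix(log_assets, log_revenue, sectors, levels):
    """Intercept, two log-scale features, and one dummy per non-baseline sector."""
    sectors = np.asarray(sectors)
    cols = [np.ones(len(sectors)), log_assets, log_revenue]
    cols += [(sectors == level).astype(float) for level in levels[1:]]  # levels[0] is the baseline
    return np.column_stack(cols)

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated stand-in for the real dataset (~1,400 companies)
rng = np.random.default_rng(42)
n = 1400
sectors = rng.choice(["Energy", "Health", "Tech"], size=n)
log_assets = rng.normal(7.0, 1.5, n)
log_revenue = rng.normal(6.0, 1.5, n)
effect = {"Energy": 0.0, "Health": 0.2, "Tech": 0.8}  # hypothetical sector effects
sector_effect = np.array([effect[s] for s in sectors])
log_ev = 0.5 + 0.6 * log_assets + 0.4 * log_revenue + sector_effect + rng.normal(0, 0.3, n)

# Random 80/20 train/test split
idx = rng.permutation(n)
test, train = idx[: n // 5], idx[n // 5 :]

levels = sorted(set(sectors.tolist()))
X = design_matrix(log_assets, log_revenue, sectors, levels)
coef = fit_ols(X[train], log_ev[train])

# Predictions are on the log scale; exponentiate to get enterprise value back
predicted_ev = np.exp(X[test] @ coef)
actual_ev = np.exp(log_ev[test])
print(np.round(coef, 2))  # intercept, log assets, log revenue, Health, Tech
```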
Finally, we arrive at the comparables analysis. In this script I first create a Company class that holds all of the data I've gathered for each company, and give it two methods: get_comparables() and get_ccv(). get_comparables() finds the 5 companies within the target's sector that are closest to it in revenue, and get_ccv() averages the enterprise values of those companies to produce the target's comparable company valuation. This method has its limits, and actual comparable company analysis is more nuanced, but this is my rough approximation of it. I use these methods to obtain the comparable company valuation for each company in my predictions dataframe, which gives me a baseline to compare my models against. Below is the final dataframe with all of the data and predictions in it.
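A minimal sketch of how such a class could look, assuming a dataclass with hypothetical field names; the real comparables_analysis.py may differ. Following the description, get_comparables() keeps companies in the same sector, ranks them by absolute revenue difference, and returns the closest five. Excluding the target itself from its own peer set is an assumption. get_ccv() averages the enterprise values of those peers.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional

@dataclass
class Company:
    ticker: str
    sector: str
    revenue: float
    enterprise_value: float

    def get_comparables(self, universe: List["Company"], k: int = 5) -> List["Company"]:
        """The k same-sector companies (excluding self) closest to this one in revenue."""
        peers = [c for c in universe if c.sector == self.sector and c.ticker != self.ticker]
        peers.sort(key=lambda c: abs(c.revenue - self.revenue))
        return peers[:k]

    def get_ccv(self, universe: List["Company"], k: int = 5) -> Optional[float]:
        """Comparable company valuation: mean enterprise value of the k comparables."""
        peers = self.get_comparables(universe, k)
        return mean(c.enterprise_value for c in peers) if peers else None

# Usage: F is too far away in revenue, G is in another sector
target = Company("T", "Tech", 100.0, 1200.0)
universe = [
    target,
    Company("A", "Tech", 90.0, 900.0),
    Company("B", "Tech", 110.0, 1100.0),
    Company("C", "Tech", 150.0, 1500.0),
    Company("D", "Tech", 50.0, 500.0),
    Company("E", "Tech", 300.0, 3000.0),
    Company("F", "Tech", 1000.0, 10000.0),
    Company("G", "Energy", 100.0, 99999.0),
]
print(target.get_ccv(universe))  # mean of A-E enterprise values -> 1400.0
```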
I am extremely happy with the results I've obtained. The comparable company valuation had a mean absolute percentage error of 1,175.3%, the regression 62.0%, and the random forest 71.5%. I expected the random forest to pick up interaction effects between sector and assets/revenue, but a simple regression outperformed it. These results are in line with the Securities Litigation and Consulting Group paper mentioned above, which ran a regression on revenue alone in an attempt to beat comparable company valuation and reported a mean absolute percentage error of 31.7% for the regression versus 11,641.8% for comparable company analysis.
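Percentage errors like these are typically computed as a mean absolute percentage error. The sketch below assumes that definition (the README does not give its exact formula) and assumes model predictions have already been converted back from the log scale; the function name is hypothetical. Because each error is divided by the actual enterprise value, a few companies with small enterprise values can dominate the average, which may help explain figures above 1,000%.

```python
import numpy as np

def mean_absolute_percentage_error(actual, predicted):
    """Mean of |predicted - actual| / |actual|, expressed in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)) * 100)

print(mean_absolute_percentage_error([100, 200], [150, 100]))  # 50.0
# One badly over-predicted small company dominates: 1000% and 0% average to 500%
print(mean_absolute_percentage_error([10, 1000], [110, 1000]))  # 500.0
```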