I recently compared several private-company databases, as my one-year subscription to CB Insights was coming to an end. I casually signed up for free trials of similar services, without any intention of doing the thorough comparison I ended up doing. But here I am, having created a 26-column spreadsheet dissecting PitchBook, PrivCo, DataFox, mattermark, and CB Insights, and I would like to share what I found.
- If you need detailed deal terms and VC vitals, go for PitchBook.
- PrivCo has lots of revenue and other financial data for startups, even small ones.
- CB Insights has the most balanced offering with an aesthetically pleasing interface.
- mattermark’s free company info is great. Free employee counts are here.
- DataFox is good for identifying companies in a specific business/technology area.
As for myself, I ended up choosing DataFox and Tracxn (more about Tracxn below).
I wasn’t quite sure what I was looking for when I started this mini research project, but the more I looked into the five companies, the clearer it became that what I needed was a way to find companies in a field I was not familiar with. I needed a database that would help me generate an exhaustive list of meaningful players in a field that I might not know the right term to describe. Compared to that, in-depth individual company information was nice to have but secondary. Hence the chart on top. High prices range from $10,000 to $20,000 or so per year, and low is around $6,000. Since everyone knows CrunchBase and I still use it from time to time, it’s included as a reference.
Where did Tracxn come from? Though not really a database company, Tracxn has over 100 researchers in India and can create a custom report on demand in 3 days. You can order as many reports as you like for a mere $2,000 per month, and you only have to commit month by month. What a world we live in.
Before going into details about each database, you may want to ask:
“Does any database have enough companies and is their data accurate?”
For me, the answer was yes for all five of them.
- mattermark claims to have the highest number of companies at 1.4 million, while CB Insights has the lowest at 180,000. During my one year of subscription, however, I never felt CB Insights lacked coverage. I also looked up ten startups, ranging from a tiny one with only a couple of employees to ones with funding over $100 million. Every database had all ten of them.
- Accuracy is also similar across the board, though some had more proprietary or analytical information than others. That is, depth may differ, but accuracy doesn’t.
For me, accuracy only needs to be “good enough.” I mainly consult for medium-to-large companies to identify startups for investment and/or business development. Once my client has enough interest, we would meet with the company and directly ask questions. For this purpose, a database needs to be good enough for filtering, and all of them were.
Anyway, below is my take on each one.
- PitchBook: awesome if you are in finance
Based in Seattle, PitchBook was founded in 2007 and has a staff of over 400, by far the largest among the five companies. According to their own database, their revenue was $31 million in 2015.
Their strength is in valuation and detailed investment terms. As in the example below, it shows not only pre/post valuation but also liquidation preference multiple, anti-dilution provision (ratchet), participation, dividend rate, voting rights, and so on. I felt a little uncomfortable looking at all those naked numbers, as if I were peeking into someone’s bedroom.
Companies are not the only ones whose financial identity is brutally dissected by PitchBook. For VC funds, it shows IRR and dry powder (money left for investment), in addition to usual things like fund size and portfolio companies.
If you regularly need to find deal terms or VC performance, PitchBook is a must-have database.
PitchBook also has eerily specific employee counts for each company, like 138 and 142, not 140. They also claim to have revenue figures, but not many. Searching for companies with more than $100,000 in revenue returned 12,383 results. This is only about 1.8% of PitchBook’s total coverage of 695,000 companies. Of course, many startups do not have revenues, but probably (hopefully) more than 1.8% make at least $100,000.
For me, PitchBook’s biggest flaw was their lack of searchability by keyword. They do offer keyword search functionality, but I had to use pretty much the exact keyword they had assigned to each company. For example, I couldn’t find any keyword whose results included both Wealthfront and Betterment, well-known front-runners among automated investment advisors. (The sales guy told me they had fixed this particular issue after I told him about it, but that was not the point.) Usually, this kind of problem is resolved by finding one of them and looking up its competitors. PitchBook has another problem here: it lists so many companies as “comparable companies” that there is more noise than signal. For Wealthfront, there were 245 “comparable companies” and for Betterment, 133. Obviously, these are not intended for identifying competitors. There must be a use case where knowing an average multiple for hundreds of tangentially related companies is crucial, but that’s not anywhere in my future.
- PrivCo: Revenues, revenues, revenues.
PrivCo is based in New York with about 30 employees and a dozen or so outsourced researchers. Of the five databases, this is the only one that did not offer a free trial period. I had to call a salesperson and awkwardly ask him to do multiple searches over a web conference.
PrivCo really shines in its coverage of revenue figures. Of the 864,000 companies in their database, about 750,000 come with some revenue information. That’s roughly 87%. It could be multi-year balance sheets plus profit and loss statements, or just one revenue number from three years ago. Any number is useful, as private companies’ revenues are often hard to come by.
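The revenue-coverage shares quoted for PitchBook and PrivCo are simple ratios, and they are easy to re-check. The sketch below uses only the counts given in the text; the `share` helper and the `coverage` dictionary are names made up for illustration. Note that the two ratios are not strictly comparable: the PitchBook count comes from a "more than $100,000 revenue" filter, while the PrivCo count is "some revenue information" and is itself approximate.

```python
# Counts as quoted in the text; PrivCo's 750,000 is an approximate figure.
coverage = {
    "PitchBook": (12_383, 695_000),  # companies with > $100,000 revenue / total
    "PrivCo": (750_000, 864_000),    # companies with some revenue info / total
}


def share(part: int, total: int) -> float:
    """Return part/total expressed as a percentage."""
    return 100.0 * part / total


for name, (with_revenue, total) in coverage.items():
    print(f"{name}: {share(with_revenue, total):.1f}% of {total:,} companies")
```

Run as written, this should print about 1.8% for PitchBook and about 86.8% for PrivCo.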
PrivCo’s salesperson said they dig into such esoteric information sources as divorce filings. Hats off to the effort.
What PrivCo is bad at is, again, search by keywords. It’s worse than PitchBook: you have to follow their classification, which has more than 20,000 categories. While 20,000 sounds like a lot, when it came to what I was looking for, each category ended up too broad. For example, for computer or mobile games, there were only three categories that might be applicable:
- Technology > Software > Online Video Game Development
- Internet > Internet Services > Online Games
- Media > Education > Educational Games
The good thing is that the competitors of each company are hand-picked by PrivCo’s analysts, and the lists are often concise and to the point.
- CB Insights: fun and pretty
CB Insights is led by a social-marketing-savvy founder, Anand Sanwal. The company has 70+ people, and raised $10M last November.
Their site is visually highly pleasant; information is presented in a way that’s intuitive and fun. If PitchBook and PrivCo were Windows applications, CB Insights would be an OS X app. Look at how Uber’s funding history is visualized:
If you hover over each circle, that part gets magnified, kind of like how the Dock gets magnified on OS X.
CB Insights also gives you great insight by compiling job-opening data. You can see the trend of job-opening numbers by week, and understand the startup’s momentum. They also analyze the job listings by type (competency) and level, and even include snippets of individual listings. You can get a good idea of where the startup stands in terms of growth trajectory and strategy. CB Insights does not include employee counts, but you can go to mattermark for that.
CB Insights has another quirky data point: Tech stack. Is the company using Hadoop, Docker, and/or Python? CB Insights covers that. As software engineers are attracted to “hot” technologies, and the world is eaten by software (TM), the stack information could be a useful way to gauge the company’s savviness. Alternatively, if you are in enterprise software sales, this information will definitely be valuable.
One problem, albeit small, lies in keyword search (again). There are two weaknesses. The smaller one is that multiple keywords are joined only by OR, though you can use + (must include the word), - (exclude the word), and “” (phrase). The bigger issue is that a keyword has to be in the company description almost verbatim. Combining those two problems by AND, it becomes vexing: when you look for “IoT platform,” the description must contain those two words together in that order; you can’t use “IoT” and “platform” separately either, as the two words will be joined by OR and you get lots of noise.
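To make the trade-off concrete, here is a toy model of the search behavior described above. It is emphatically not CB Insights’ implementation: the `parse`, `contains`, `matches`, and `search` functions, the three company names, and their descriptions are all invented for illustration. The only assumptions taken from the text are that plain keywords are OR-ed, that + and - mean “must include” and “exclude,” that quotes make a phrase, and that terms must appear verbatim in the description.

```python
import re

# A query token is an optional +/- sign followed by a "quoted phrase" or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def parse(query):
    """Split a query into (required, excluded, optional) lowercase terms."""
    required, excluded, optional = [], [], []
    buckets = {"+": required, "-": excluded, "": optional}
    for sign, phrase, word in TOKEN.findall(query.lower()):
        buckets[sign].append(phrase or word)
    return required, excluded, optional


def contains(description, term):
    """Verbatim, whole-word match of a term (or phrase) in a description."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, description.lower()) is not None


def matches(description, query):
    required, excluded, optional = parse(query)
    if any(contains(description, t) for t in excluded):
        return False
    if not all(contains(description, t) for t in required):
        return False
    # Plain terms are OR-ed: a single hit is enough.
    return not optional or any(contains(description, t) for t in optional)


# Hypothetical companies and descriptions, invented for this example.
descriptions = {
    "Alpha": "An IoT platform for smart factories",
    "Beta": "A platform for connecting IoT sensors",
    "Gamma": "A payments platform for restaurants",
}


def search(query):
    return [name for name, text in descriptions.items() if matches(text, query)]


print(search('"IoT platform"'))  # ['Alpha']
print(search("IoT platform"))    # ['Alpha', 'Beta', 'Gamma']
```

In this toy data the phrase query misses Beta, which is clearly relevant but words things differently, while the two separate words pull in Gamma, which has nothing to do with IoT. That is the too-narrow-or-too-noisy bind in miniature.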
The list of competitors for each company is usually good, just like PrivCo’s. As such, once you identify at least one company in a specific field, you can find others daisy-chain style. I’m not being sarcastic here. This is a legitimate way to use any company database for now.
Even if you have no intention of signing up for any database, CB Insights’ free email newsletter is worth subscribing to. As a matter of fact, full of insightful and/or funny number-crunching results, the newsletters are a lot more entertaining than the paid content. They remind me of OkCupid’s big data analysis blog, OkTrends, which became inactive in 2014 to my great chagrin. CB Insights also offers free online seminars, and they are good too.
CB Insights’ crawlers are fast as well. Company news is listed on the same day it comes out.
During my one year of subscription, I was impressed by how much progress they made in functionality and UI. There should be more to come in the future.
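The daisy-chain approach amounts to a breadth-first walk over competitor lists, starting from one company you already know. Here is a minimal sketch, assuming you have copied the competitor lists into a dictionary by hand; the `daisy_chain` function, the `max_hops` cutoff, and the placeholder companies A through E are all hypothetical and do not correspond to any database’s API.

```python
from collections import deque


def daisy_chain(seed, competitors, max_hops=2):
    """Breadth-first walk over competitor lists, starting from one known company.

    Returns a dict mapping each discovered company to its distance (in hops)
    from the seed. Companies more than max_hops away are not included.
    """
    found = {seed: 0}
    queue = deque([seed])
    while queue:
        company = queue.popleft()
        if found[company] == max_hops:
            continue  # do not expand beyond the hop limit
        for rival in competitors.get(company, []):
            if rival not in found:
                found[rival] = found[company] + 1
                queue.append(rival)
    return found


# Placeholder competitor lists, invented for this example.
competitors = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}

print(daisy_chain("A", competitors))  # {'A': 0, 'B': 1, 'C': 1, 'D': 2}
```

The hop limit matters in practice: with short, hand-picked lists a couple of hops stays on topic, whereas lists with hundreds of loosely related entries would flood the result after a single hop.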
You might have correctly guessed that I miss CB Insights. I do. It’s just that their beautiful bells and whistles feel overpriced.
- mattermark: free employee count
mattermark has 40+ employees in San Francisco and has so far raised $10 million. They announced recently that they made most of the individual company information available for free. If you want to search by criteria, or access growth-related data like historical employee numbers, web traffic, and mobile downloads, you still have to pay $6,000 a year.
I did try their paid features, and again, I wasn’t impressed by their keyword searchability, though it was a lot better than PrivCo’s or PitchBook’s. For example, when I searched by “investment management” (+raised more than $50 million, +last funding event was after August 1, 2014), I only got Wealthfront, but not Betterment. I’m not trying to be mean here, but this is after trying many keywords and keyword pairs. (mattermark also assigns keywords to each company, but the keywords the two had were B2B, B2C, Analytics, Banking, Finance, and Mobile. Obviously, these are not specific to robo-advisors.) mattermark lists “similar companies” for each company, either “strong” or “moderate.” Wealthfront’s “strong” list has 11 companies, including Betterment. This is good.
On mattermark, you can view companies grouped by incubator batches. With other databases, you could do the same thing by searching by investor name and funding date, but it’s a nice touch.
mattermark also has a very fine-grained search interface, where you can combine AND and OR as you like. But plain keyword search wasn’t as robust as I would have liked. For example, if I searched for “3D” (+raised more than $50M, +last funding date after August 1, 2014) it showed 14 companies including Magic Leap, but if I used “virtual reality” instead of “3D,” I only got Jaunt and CCP Games.
mattermark’s employee count information is as eerily specific as PitchBook’s. The two databases’ numbers are not exactly the same, but very close. mattermark’s salesperson said they got these numbers from “social data.” Maybe ADP is now called social data. I don’t know. For a handful of startups whose employee counts I actually knew, mattermark’s (and PitchBook’s) numbers were not exactly correct but within a “good enough” range (e.g., 30 for an actual 45).
As for venture capital, mattermark only has basic information such as industry focus, investment stage, and location.
mattermark’s interface is clean and it is a great free resource particularly for employee numbers.
- DataFox: AI/human powered clustering
DataFox is a 20+ person company in San Francisco started by Stanford alums.
Let’s start with the bad. Their UI made me reminisce about the days when Ballmer was Microsoft’s CEO. Functionality is a bit rough around the edges, euphemistically speaking (in other words, buggy). Information for each company seems not as deep as what’s available on other databases. Their crawler is slow (their sales guy told me I could ask them to speed up the crawler for specific companies). It doesn’t have much information about investors (basically just portfolio company names), and there is no way to search for acquisition deals.
But I decided to move from CB Insights to DataFox, because it has several ways to enhance keyword-based search.
For example, if I type in “artificial intelligence,” it will automatically suggest “artificial intelligence computer vision” and “artificial intelligence machine learning,” with a number of companies in each. I can choose either OR or AND as a way to join search conditions.
DataFox also has an interesting feature called Public Lists, each of which is a group of companies in a specific field. There are over 10,000 Public Lists created either by DataFox’s analysts or users. Some are generic like “Social Gaming” or “CyberSecurity 500.” There are more fine-grained lists like “Internet of Things – Applications (Vertical),” “Blockchain Technology,” or “Stanford GSB 2010.” You can follow these lists and get news alerts about the companies in them.
Another feature is “conference,” where they populate the lists of exhibitors and speakers for a specific conference. If the conference you are interested in is not there, you can ask DataFox to make one.
DataFox makes it easier for users to add more information to the database with input forms embedded in various places as well. You can also make your own data field so that you could use it like Salesforce for investment, albeit in a rudimentary way.
DataFox, mattermark, and CB Insights all claim they incorporate machine learning, but their AI (or probably any AI) is yet to completely replace humans. I guess we are still in a supervised-learning phase for company discovery, and DataFox’s crowdsourcing features could be one way to build a better predictive model.
- appendix 1 – Tracxn: smart Indians galore
Tracxn is more like a Gartner of startups than a database, though they do offer information on over 100,000 companies, such as location, founding year, business area, funding, investors, and funding history. But its discovery capability is beyond rudimentary: if you search for “artificial intelligence,” it returns 6,815 results; adding “raised more than $20 million” narrows them down to a whopping 724.
Instead, Tracxn offers super quick research on demand. You can ask a specific question, and they will put together a report within 3 days. They do this with over 100 researchers in India who have worked at global tech or consulting companies. One caveat is that their research is based only on information culled from the internet and is not particularly eye-opening. But if you want quick data gathering as a starting point for your research, they are quick and cheap. As mentioned above, you can order an unlimited number of custom reports for $2,000/month/seat.
In addition, they publish 16 reports on a specific sector every month, and those are included in the subscription. No wonder traditional consulting firms are trying to evolve to specialize and differentiate.
- appendix 2 – Quid: AI visualization
Quid is an AI-based visualization engine that lets users map out complicated concepts. They ingest over 300,000 news and blog sources, patent filings from around the world, and information on 44,000 VC-funded startups. It sounds like overkill for what I need, but I do want to try it sometime.
It has been a couple of weeks since my DataFox subscription started, and I was out of town half the time. Now that I have started to actually use it, I’m a bit worried by their cranky interface. It feels like they are trying to do too many things at once and are spread too thin to pay attention to their UI/UX. Or they just do not have enough devs. Either way, I believe DataFox is differentiated with good ROI. Since I have committed for a year, I selfishly hope more people sign up with DataFox, create useful Public Lists, and give the company more resources to make the site better.