You can say this for Diffbot: It’s not shying away from a big challenge.
The San Francisco startup, which just closed $10 million in Series A investment, wants to scrape all the data on the web (all of it) and put it into a structured format, making it useful for all sorts of business purposes.
And perhaps more to the point, the five-year-old company became profitable last year, according to founder and chief executive Mike Tung. Of course, that isn’t really verifiable from the outside, but it’s worth noting, especially as far larger companies are struggling to find a good business model for AI or cognitive computing or whatever the next name for this self-teaching technology will be.
The company claims big customers including Amazon, DuckDuckGo, and Salesforce, which uses the service in its Radian6 “social listening” business.
Any large consumer products company that wants to gauge interest, comments, or complaints about its products might take a look at Diffbot because it does more than search what people are saying on their Twitter feeds. It looks at comment threads on Reddit and in customer support forums, basically everywhere on the web.
“You’d be surprised. In some of these niche hardware forums people write novel-like descriptions of their impressions of the product,” Tung told Fortune.
By structuring all that wildly unstructured data, Diffbot makes it searchable and thus useful.
Small companies can get started for free. Big companies pay based on the volume of data they need to access.
The new funding round was led by Chinese internet giant Tencent and Felicis Ventures, with contributions from Amplify Partners, Valor Capital, and Bill Lee, an early investor in Tesla and SpaceX.
Note: This story was updated at 1:47 p.m. EST with a more complete list of the latest investors.