
Ask HN: Is Common Crawl used exhaustively by any search engine?

8 points

by n1xis10t

17 hours ago

2 comments


Common Crawl contains about 300 billion pages, and downloading all of it in extracted-text format would take only about 816 TB compressed. I think a search engine built on the full corpus would be more comprehensive than Bing, and possibly pretty similar to Google. The only CC-based search engines I know of use a tiny fraction of what is available. Do you know of any that use the whole thing?
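
For a rough sense of scale, here is a back-of-envelope sketch using only the two figures above (300 billion pages, 816 TB compressed). It assumes decimal terabytes (10^12 bytes), and the 1 Gbit/s link speed is purely an illustrative assumption, not something from Common Crawl. The function names are made up for this example.

```python
# Back-of-envelope estimate from the figures in the post.
# Assumptions (not from the post): 1 TB = 10**12 bytes, and a
# hypothetical sustained 1 Gbit/s download link.

PAGES = 300 * 10**9            # about 300 billion pages
TOTAL_BYTES = 816 * 10**12     # about 816 TB compressed extracted text


def avg_bytes_per_page(total_bytes: int, pages: int) -> float:
    """Average compressed size of one page's extracted text, in bytes."""
    return total_bytes / pages


def download_days(total_bytes: int, link_bits_per_second: float) -> float:
    """Days needed to transfer total_bytes over a link of the given speed."""
    seconds = total_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    per_page = avg_bytes_per_page(TOTAL_BYTES, PAGES)
    days = download_days(TOTAL_BYTES, 1e9)
    print(f"average compressed text per page: {per_page:.0f} bytes")
    print(f"download time at 1 Gbit/s: {days:.1f} days")
```

Under those assumptions the average works out to roughly 2.7 KB of compressed text per page, and pulling the whole thing over a single 1 Gbit/s link would take on the order of two and a half months, so storage and bandwidth alone do not look like the blocker.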
