Common Crawl is an organization I believe librarians should be following closely, along with its subsequent spin-off projects. It would be great for library and information service professionals to be involved with some of these projects. I could also see government and educational institutions funding research proposals that analyze some of the data.
“A nonprofit called Common Crawl is now using its own web crawler and making a giant copy of the web that it makes accessible to anyone. The organization offers up more than 5 billion web pages, available for free so that researchers and entrepreneurs can try things otherwise possible only for those with access to resources on the scale of Google’s.”
via Mashable | Free Database of the Entire Web May Spawn the Next Google.