This Grad Student Hacked Semantic Search To Be Better Than Google | Co.Labs


Full Post

Google may be the dominant search engine, but it’s far from ideal. One major problem: How do you search for things you don’t know exist?

Using Google’s own experimental algorithms, a graduate student may have built a solution: a search engine that allows you to add and subtract search terms for far more intuitive results.

The new search engine, ThisPlusThat.Me, looks for context clues among the terms. For instance: Entering the arithmetic search “Paris – France + Italy” gives the top result as “Rome,” but if I search the same thing in Google, I’ll get directions between Paris and Italy, restaurants in France and Italy, and a depressing Yahoo Answers thread on whether Italy is in Paris (or vice versa). “Rome,” on the other hand, is an association you, a human, would make (I want This, without That but including Those)–and the engine makes that decision based on each answer’s semantic value compared to your search.

Until now, search has been stuck in a paradigm of literal matching, unable to break into conceptual associations and guessing what you mean when you search. There’s a reason Amazon and Netflix have scored points for their item suggestions: They’re thinking how you think.

The engine, created by Astrophysics PhD candidate Christopher Moody, uses Google’s own open-source word2vec algorithm research to take the terms you searched for and rank the query results by relevance, just like a normal search–except the rankings are based on “vector distances” that make a lot more human sense. So in the above example, other results could have been, say, Napoleon or wine–both have ties with the above search terms, but within the context of City – Country + Other Country, Rome is the vector that has the closest “distance.”
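To make the “Paris – France + Italy” arithmetic concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are hand-picked toy values (roughly “capital-ness,” “France-ness,” “Italy-ness”), not real word2vec output, and the names `TOY_VECTORS`, `cosine` and `solve` are made up for this illustration. It is not Moody's code; it only shows the general idea of adding and subtracting term vectors and ranking the remaining terms by similarity.

```python
import math

# Toy vectors, hand-picked for illustration only. Real word2vec vectors are
# learned from a corpus and typically have hundreds of dimensions.
TOY_VECTORS = {
    "Paris":    [1.0, 1.0, 0.0],
    "France":   [0.0, 1.0, 0.0],
    "Italy":    [0.0, 0.0, 1.0],
    "Rome":     [1.0, 0.0, 1.0],
    "Napoleon": [0.2, 0.9, 0.3],
    "wine":     [0.0, 0.6, 0.6],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def solve(plus, minus, vectors=TOY_VECTORS):
    """Add the `plus` vectors, subtract the `minus` vectors, rank the rest."""
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in plus:
        target = [t + v for t, v in zip(target, vectors[word])]
    for word in minus:
        target = [t - v for t, v in zip(target, vectors[word])]
    # Leave the query terms themselves out of the candidate list.
    exclude = set(plus) | set(minus)
    scored = [(cosine(target, vec), word)
              for word, vec in vectors.items() if word not in exclude]
    return sorted(scored, reverse=True)

ranking = solve(plus=["Paris", "Italy"], minus=["France"])
print(ranking[0][1])  # Rome
```

With these toy values, Napoleon and wine still appear in the ranking, just further down, which mirrors the article's point that they are related to the query but less close than Rome.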

All the word2vec algorithm needs is an appropriate corpus of data to build its word relations on: Moody used Wikipedia’s corpus as a vocabulary and relational base–an obvious advantage in size, but it also had the added benefit of “canonicalizing” terms (is it Paris the city, or Paris from the Trojan War? In Wikipedia, the first is “Paris” and the second “Paris_(mythology)”). But millions of search-and-replaces in Wiki’s 42 GB of text were intensive, so Moody used Hadoop’s Map functions to fan those search-and-replaces out to several nodes.
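The canonicalizing step can be pictured as a map function applied to each line of text independently, which is what makes it easy to spread across nodes. The sketch below is an assumption-laden illustration, not the actual pipeline: it assumes the text uses wiki-style links such as `[[Paris (mythology)|Paris]]`, and the names `WIKI_LINK` and `canonicalize` are hypothetical. Python's built-in `map` stands in for a distributed map job.

```python
import re

# Assumed link syntax: [[Target]] or [[Target|display text]].
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def canonicalize(line):
    """Replace each wiki link with its canonical title, spaces as underscores."""
    return WIKI_LINK.sub(
        lambda m: m.group(1).strip().replace(" ", "_"),
        line,
    )

lines = [
    "[[Paris]] is the capital of [[France]].",
    "[[Paris (mythology)|Paris]] was a prince of [[Troy]].",
]

# Each line is processed on its own, so the work could be split across
# many machines; here the built-in map simply runs it locally.
canonical = list(map(canonicalize, lines))
for line in canonical:
    print(line)
```

Because no line depends on any other, the same function could in principle be handed to any map-style framework without changes.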

A search query then spits out an 8 GB table of vectors with varying distances; Moody tried out a few data search systems before settling on Numexpr to find the term with the closest vector distance.
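Finding “the term with the closest vector distance” boils down to scanning a table of vectors and keeping the smallest distances. Here is a rough pure-Python sketch of that scan. It is a stand-in for illustration only: it does not use Numexpr, the table holds made-up toy values, and `squared_distance` and `nearest` are hypothetical names. A real system would rely on a compiled array library to get through gigabytes of vectors.

```python
import heapq

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(query, table, k=1):
    """Return the k terms in `table` whose vectors lie closest to `query`."""
    return heapq.nsmallest(
        k, table, key=lambda term: squared_distance(query, table[term])
    )

# Toy table of term vectors (illustrative values only).
table = {
    "Rome":     [1.0, 0.0, 1.0],
    "wine":     [0.0, 0.6, 0.6],
    "Napoleon": [0.2, 0.9, 0.3],
}

closest = nearest([0.9, 0.1, 1.0], table, k=2)
print(closest)  # ['Rome', 'wine']
```

Using `heapq.nsmallest` avoids sorting the whole table when only the top few results are needed.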

via This Grad Student Hacked Semantic Search To Be Better Than Google | Co.Labs | code + community.

8 Years Later, Google’s Book Scanning Crusade Ruled ‘Fair Use’ | Wired.com


Full Post

Eight years after a group of authors and publishers sued Google for scanning more than 20 million library books without the permission of rights holders, a federal judge has ruled that the web giant’s sweeping book project stayed within the bounds of U.S. copyright law.

On Thursday morning, U.S. Circuit Judge Denny Chin dismissed a lawsuit from the Authors Guild, ruling that Google’s book scans constituted fair use under the law. Though Google scanned those 20 million books in full and built a web service, Google Books, that lets anyone search the digital texts, users can only view “snippets” of a book if the rights holder hasn’t given approval.

“In my view, Google Books provides significant public benefits,” the ruling reads. “It advances the progress of the arts and sciences, while maintaining respectful consideration for the rights of authors and other creative individuals, and without adversely impacting the rights of copyright holders.”

In a statement sent to WIRED, a Google spokesperson said the company was “absolutely delighted” with the ruling. “As we have long said, Google Books is in compliance with copyright law and acts like a card catalog for the digital age giving users the ability to find books to buy or borrow.”

Michael Boni, a partner with Boni & Zack, the law firm representing the Authors Guild, did not immediately respond to a phone message seeking comment. Nor did the Authors Guild. But the Guild has told other news outlets it will appeal the decision.

“We disagree with and are disappointed by the court’s decision today. This case presents a fundamental challenge to copyright that merits review by a higher court,” reads a statement sent to GigaOm. “Google made unauthorized digital editions of nearly all of the world’s valuable copyright-protected literature and profits from displaying those works. In our view, such mass digitization and exploitation far exceeds the bounds of the fair use defense.”

The ruling comes two years after Judge Chin rejected a $125 million settlement between Google, the Authors Guild, and the Association of American Publishers, which was also part of the original lawsuit against the web giant. After complaints over the settlement from outside organizations such as the Internet Archive and Google rivals such as Microsoft, Chin ruled the deal would give Google a de facto monopoly over so-called “orphan books,” scanned texts whose rights holders had not come forward to claim their share of the revenues Google would make from its book scanning endeavor.

A year after this ruling, the publishers agreed to another settlement with Google, and this one was not subject to approval from the court. Chin allowed the authors’ case to continue as a class action, but an appeals court reversed this decision and told Chin to rule on the copyright issue.

Though Google limits how much book text you can view online, and though it doesn’t display ads on pages describing books it does not have rights to, the company, as the court explained, can still use its service to draw people to its websites and make money in other ways. But this commercial gain doesn’t necessarily mean copyright infringement. Google Books, the judge ruled, doesn’t “negatively impact the market for books.”

On the contrary, Chin said, Google Books feeds the market for books. “A reasonable factfinder could only find that Google Books enhances the sales of books to the benefit of copyright holders,” the ruling reads. “Google Books provides a way for authors’ works to become noticed, much like traditional in-store book displays.”

via 8 Years Later, Google’s Book Scanning Crusade Ruled ‘Fair Use’ | Wired Business | Wired.com.

Classrooms Go High-Tech With Google Play for Education | PCMag.com


Google is making it more convenient for schools around the country to integrate tablets and educational apps into the classroom.

The search giant on Wednesday officially launched tablets running Google Play for Education, a version of the Google Play app store specifically designed for K-12 schools in the U.S.

Read: Classrooms Go High-Tech With Google Play for Education | News & Opinion | PCMag.com.

News: Education & Technology, Librarianship


Education & Technology

With MakerBot Academy, the 3-D Printing Movement Aims for Schools | AllThingsD
The company announced on Tuesday an initiative to begin seeding its Replicator 3-D printing machines inside of K-12 schools across the U.S. The effort comes in partnership with DonorsChoose.org, a site that allows public school teachers to make online requests for classroom projects, which are then backed by a Kickstarter-like funding drive.

Twitter goes for the masses with new storytelling feature | CNET
Twitter excels in capturing the “moment” as events happen, but it isn’t great at telling a story. With custom timelines, the company hopes to lure a broader audience by giving it coherent narratives rather than just the raw materials.

Librarianship

How Iran Uses Wikipedia To Censor The Internet | BuzzFeed
A new study from the University of Pennsylvania’s Annenberg School claims that Wikipedia might hold the key to understanding how Iran censors, and controls, the internet. The answer, in four words: with a heavy hand.

News: Education & Technology, Librarianship


Education & Technology

The LA Times Trolls Innocent Teachers | TechCrunch
The once-respectable LA Times is leveraging its dwindling platform to attack individual teachers under the guise of data transparency. The editorial board won a court case allowing them to use a highly contentious, self-designed algorithm to rank the best and worst teachers in the Los Angeles Unified School District. Neither the suicide of one of the shamed teachers nor the widespread criticism of the statistical methods has aroused the editorial board’s better judgment.

Google Earth Tour Builder lets you tell stories through maps | Engadget
Google has used Earth and Maps to tell tales of unfolding tragedies and soldiers fighting for our country. Now it’s opening up those tools to the public, allowing users to build what they’re calling “Tours” through Google Earth. Tour Builder was released in honor of Veterans Day and it allows users to create narratives tied to points on a map. More Google news: Google Quick Actions Let Users Act on Emails Without Opening Them | Mashable, Your Face and Name Will Appear in Google Ads Starting Today | Gizmodo, and Apple maps: how Google lost when everyone thought it had won | theguardian

Librarianship

Super Searcher Tips | Mary Ellen Bates


Google ordered to remove Max Mosley orgy pictures | theguardian.com


Decision in French court comes after former head of Formula One said that showing images breaches his privacy.

The important consideration in this story is the following snip:

The decision is a setback to Google as it tries to defend a global stance that the search engine is merely a platform that delivers links to content and it should not be responsible for policing them.

Although Google can delete images on its website, it cannot prevent others reposting them, resulting in a constant game of catch-up.

In a statement, Google said the court’s request would require it to build a new software filter to continuously catch new versions of the posted images and remove them.

“This is a troubling ruling with serious consequences for free expression and we will appeal it,” said Google’s associate general counsel Daphne Keller in a statement.

via Google ordered to remove Max Mosley orgy pictures | Technology | theguardian.com.

Google Maps becoming more context-aware and ‘emotional’ | CNET


Snip

SAN FRANCISCO — For Google, the map of the future is taking everything it knows about you and the world and plotting it in real-time as you move through your life.

“We can build a whole new map for every context and every person,” said Bernhard Seefeld, product management director for Google Maps, speaking at the GigaOm Roadmap 2013 conference. “It’s a specific map nobody has seen before, and it’s just there for that moment to visualize the data.”

Like the early days of map making that told stories of discovery and created more of an emotional connection with the unfolding world, Google wants to build what Seefeld called “emotional maps that reflect our real life connections and peek into the future and possibly travel there.”

Google’s context-aware maps will require refining and extending the underlying map data, and combining it with the kind of personal data from applications that powers Google Now, the company’s personal digital assistant technology.

Read more: Google Maps becoming more context-aware and ‘emotional’ | Internet & Media – CNET News.

Expanding your site to more languages | Google Webmaster Help | YouTube


The Battle of Facebook and Google+ [Infographic] | Social Annex
