Archive for category Search

Umbria – Market Intelligence from Blogs

FORTUNE has an article (“Blogging for Dollars”) that covers Umbria, a company based here in Colorado that tracks what bloggers are saying about its clients (aka mining blogs for market intelligence).

Economically, this market is finally starting to take shape. The ideas and attempts have been out there for a few years, but consumer companies have been on the fence about whether the blogosphere is worth listening in on. Until recently, that is. Umbria claims they’ll have $2M in revenue this year and will be profitable next year, but the overall market for this kind of service is still only $20M according to the article (Intelliseek has about a third of that market).

Technologically, Umbria also sounds pretty interesting. They claim to have a competitive edge in automating most of the process:

Umbria’s solution is entirely software-based. [Umbria's] competitors also meet with clients to interpret the data and suggest strategic responses. “Ultimately we rely on both technology and humans for analysis,” says Max Kalehoff, marketing director for BuzzMetrics [another Umbria competitor]. “Umbria takes an extremely automated approach.”

Umbria’s technology sounds like a pipeline of parsers that generates features that in turn drive product and sentiment classifiers (and those drive reporting):

Every few hours Umbria sends an application called a spider out over the web to scour the blogosphere for postings about the firm’s clients, most of which are big consumer companies, such as Electronic Arts, SAP, and Sprint. By analyzing keywords in blogs, Umbria can classify each citation thematically. In the case of Sprint, for example, Umbria’s software can tell whether a blogger is talking about customer service, the company’s advertisements, or a particular calling plan.
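
To make the keyword idea concrete, here is a minimal sketch of thematic classification by keyword counting. It is my own guess at the general shape of such a step, not Umbria's method: the `THEME_KEYWORDS` lexicon and the `classify_theme` function are hypothetical, and a real system would presumably use far richer features.

```python
import re
from typing import Optional

# Hypothetical theme lexicon; the keywords are illustrative, not Umbria's.
THEME_KEYWORDS = {
    "customer service": {"support", "representative", "hold", "complaint"},
    "advertising": {"ad", "ads", "commercial", "billboard"},
    "calling plan": {"plan", "minutes", "contract", "rate"},
}

def classify_theme(post: str) -> Optional[str]:
    """Return the theme whose keywords occur most often, or None if none occur."""
    tokens = re.findall(r"[a-z']+", post.lower())
    best_theme, best_hits = None, 0
    for theme, keywords in THEME_KEYWORDS.items():
        hits = sum(1 for token in tokens if token in keywords)
        if hits > best_hits:  # ties keep the theme listed first
            best_theme, best_hits = theme, hits
    return best_theme

print(classify_theme("I was on hold for an hour before a representative took my complaint"))
```

Counting raw keyword hits is the simplest possible choice; it ignores negation and context, which is presumably where the hand-tuning comes in.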

Another big challenge is to decipher what’s on a blogger’s mind. To figure out whether an opinion is strong or tepid, for example, it helps to know that “awesome” is a stronger endorsement than “pretty cool,” and that “shoddy” is less damning than “abominable.” Umbria has several employees with Ph.D.s in linguistics and artificial intelligence who are forever tweaking the software to make it better at categorizing opinions.
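
A graded opinion lexicon of the kind described could be sketched as below. The four phrases come from the article; the numeric scores, the `INTENSITY` table and the `opinion_strength` function are assumptions of mine, chosen only so that the ordering matches the quote.

```python
# Hypothetical strength scores on a -1.0..1.0 scale; the values are illustrative only.
INTENSITY = {
    "awesome": 0.9,
    "pretty cool": 0.4,
    "shoddy": -0.5,
    "abominable": -0.9,
}

def opinion_strength(text: str) -> float:
    """Average the scores of every lexicon phrase found in the text (0.0 if none)."""
    lowered = text.lower()
    scores = [score for phrase, score in INTENSITY.items() if phrase in lowered]
    return sum(scores) / len(scores) if scores else 0.0

print(opinion_strength("The new handset is awesome"))
print(opinion_strength("Their coverage map is shoddy"))
```

Plain substring matching keeps the sketch short, but it would also fire on words that merely contain a lexicon phrase, so a real implementation would want proper tokenization.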

I can’t help thinking that more manual tweaking goes into each client’s setup than this description lets on, but still, I’m glad they’re seeing success, and I bet those linguists are having fun with the blogosphere, even if they have to do a bit of slumming to come up with their rules:

The software can also estimate the author’s age and gender. Elongated spellings (“soooooooo”), multiple exclamation marks (!!!), and acronyms such as POS (“parent over shoulder”) suggest a teenage female member of Generation Y (born after 1979). The blogger is probably a teenage boy if a posting is rife with hip-hop terminology such as “aight” (translation: “all right”) and “true dat” (“I agree!”).
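
The surface cues in that passage are easy to extract with regular expressions. The sketch below only counts the cues; the `style_features` function and its feature names are hypothetical, and turning the counts into an age or gender estimate would need a trained classifier that I am not attempting here.

```python
import re

def style_features(post: str) -> dict:
    """Count the surface cues mentioned in the article for one posting."""
    return {
        # Words containing a letter repeated three or more times, e.g. "soooo"
        "elongated_words": len(
            re.findall(r"\b[a-z]*([a-z])\1{2,}[a-z]*\b", post, flags=re.IGNORECASE)
        ),
        # Runs of two or more exclamation marks
        "exclamation_runs": len(re.findall(r"!{2,}", post)),
        # The one acronym the article cites; a real list would be longer
        "acronyms": len(re.findall(r"\bPOS\b", post)),
        # The two slang terms the article cites
        "slang_terms": len(
            re.findall(r"\b(?:aight|true dat)\b", post, flags=re.IGNORECASE)
        ),
    }

print(style_features("I am soooooooo excited!!! POS, gotta go"))
```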

There you have it, you don’t even have to know the language to have your voice heard by the people who want to sell you more stuff. Now that’s power. On one side of that function, at least.

Yahoo == IR talent magnet | The tip of the iceberg

Article in NY Times today, Yahoo is wooing I.B.M. Technical Talent:

Yahoo plans to announce Thursday that it is recruiting scientists who pioneered an advanced search-engine technology at I.B.M.’s Silicon Valley research laboratory.

Prabhakar Raghavan, a computer scientist who once led the Clever effort, joined Yahoo last week as head of research. He left I.B.M. in 2000 to become a vice president and chief scientist at Verity Inc., a maker of search and retrieval software for corporations; he was later named chief technical officer.

Yahoo offers one of the best opportunities to explore new ideas in search, Mr. Raghavan said.

One area that will be pursued is new search technologies related to digital media.

It’s been fun to watch Google being forced from the position of category killer to more-or-less evenly matched contestant over the last year or two. There’s a mind-boggling amount of innovation happening in search, which is levelling the playing field for new entrants, but even the stuff we’re seeing now is only the beginning. Search, and other modes of information retrieval, will become even more ubiquitous and integrated than they are now, and we’ll wonder how an OS like Windows without integrated search ever came to dominate a market. The desktop market itself may go away (yes, I’ve been reading Paul Graham’s book Hackers and Painters, which contains a great 2001 essay on server-based software that is still relevant and engaging, as are his many other essays).

Search is poised to become the great collective memory, and new research being brought to market in real services, along with the availability of public APIs, will speed progress toward that reality. But it won’t be just the extent of information covered by search that will grow, but also the interconnectivity of search services and, most importantly, new modes of retrieving information (the only mode now in widespread use is keyword search, which is as old as computer science itself, or much older if you count manual versions such as file cabinets, card catalogs and other manually compiled indexes). I don’t see any reason why search shouldn’t aim to duplicate in software all of the modes in which humans retrieve information in their own brains (by context, by association and so on) or from others, by interactive question answering or guided discovery.

Yahoo! briefly launches … Feedsterati?

Steve Rubel and Niall Kennedy are reporting on a Yahoo RSS search service which was briefly public this morning. Seems to combine feed search (not just blogs, apparently, but other feed content, too, like Feedster) and several ranking options (date, relevance, and popularity). I’m curious about the popularity ranking, but I’d guess the initial version will resemble a Technorati-like tally of incoming links.
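
For what it’s worth, a tally of incoming links is only a few lines of code. This is a sketch of my guess, not of anything Yahoo or Technorati has published; the `rank_by_inbound_links` function and the single-letter site names are hypothetical.

```python
from collections import Counter

def rank_by_inbound_links(links):
    """Rank link targets by how many distinct sources link to them.

    links: iterable of (source, target) pairs. Repeated pairs count once
    and self-links are ignored. Ties are broken alphabetically by target.
    """
    distinct_pairs = set(links)
    tally = Counter(
        target for source, target in distinct_pairs if source != target
    )
    return sorted(tally.items(), key=lambda item: (-item[1], item[0]))

links = [("a", "c"), ("b", "c"), ("a", "c"), ("a", "b"), ("c", "c")]
print(rank_by_inbound_links(links))
```

Counting distinct sources rather than raw links is a design choice on my part: it keeps one prolific linker from inflating a target’s score.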

Greg Linden wonders whether the small blog/feed search engines will survive the entry of the giants into the field:

… it is good for a startup to see the entry of a big company into its area since it attracts attention and legitimizes the field … but competing directly against these giants is scary if you have no differentiator.

While the small players have driven innovation and broad acceptance of concepts like link popularity and tagging, they continue to struggle with scalability. Also, the most compelling products to come out of the blog search startups, while they’ve been exciting and even revolutionary from a user’s point of view, have not been technologically deep in the sense of difficult to duplicate by the search giants. There have been exceptions, of course, but no really deep technology is in evidence among those services that have made the biggest splashes (Technorati, Bloglines, Flickr, del.icio.us).

So, when a search giant comes in with equal-or-better features, scalability, and a huge engineering team that can relatively quickly merge ideas emerging from the programming part of the blogosphere into the vast search toolkit that the giants already have, that might just cast a bit of a cloud over the little guys.

Having said that, I believe there will continue to be a place for the little guys in the blog search ecosystem. They’re the real innovators and they have their ears to the ground. And even at the break-neck speed at which Yahoo and Google have been rolling out features lately, an army of little guys can still cover a lot more ground than the two giants in the search for the next cool thing that will make users’ lives (even) better.

See blogs near you on Google Earth with Blogdigger Local

Greg Gershman has built a cool application of Google Earth. You can jump from Blogdigger Local search results to Google Earth and see markers for all of the blogs in your geo neighborhood. The result looks something like this:

(That’s Greg’s image. Don’t have Google Earth running here, waiting for the OS X version. Impatiently.) Blogdigger seems to have found its niche with Blogdigger Local, and it’s a good one.

SearchEngineWatch joins the link counting fray

Danny Sullivan is skeptical about the accuracy of Google’s and Yahoo’s results counts, used by Tristan Louis in two studies, which concluded that Yahoo has better coverage of blogs than Google, which in turn has better coverage than Technorati. Danny posted an email conversation with Tristan about his study. It’s a little hard to follow the lines of argument, but it’s well worth reading because it illuminates the difficulties in getting a handle on index size, and especially blog coverage, by the search giants.

Danny, from his exchange with Tristan:

Also, Google did say “of about” with the numbers it reports. That’s not an accident. They’re saying that this is an estimate. But no disagreement with me. If you put up a count, it would be nice if the count was as accurate as possible. Google’s have come under question.

Hmm. From what I’ve seen in Tristan’s data and my own testing, it’s Yahoo’s counts that ought to come under question, specifically for link: queries.

Danny to Tristan again:

The link: command is completely different than the site: command. The link command tells you nothing about the size of the index. As for a confirmation that all links aren’t reported, this past blog post from SEW gives you confirmation and this page on Google mentions links are only a sampling of what Google knows although this other Google page fails to make this clear.

link: and site: are very different, that’s true enough. And maybe the link command doesn’t tell you much about the size of an index, but if link collection methods are similar between Yahoo and Google (and why wouldn’t they be? It’s a relatively easy part of the whole game), then the counts ought to be similar. But they’re not, not by a long shot.
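
One simple way to check that expectation is to compute the ratio of the two engines’ reported counts for each blog; if collection methods really were similar, the ratios should cluster near 1. The sketch below assumes the counts have already been collected by hand. The `count_ratios` function is hypothetical, and the numbers in the usage example are placeholders, not figures from Tristan’s dataset or from my own testing.

```python
def count_ratios(counts):
    """Return the Yahoo/Google ratio of reported link: counts for each blog.

    counts: {blog: {"google": int, "yahoo": int}}
    Blogs with a zero Google count are skipped to avoid division by zero.
    """
    return {
        blog: c["yahoo"] / c["google"]
        for blog, c in counts.items()
        if c["google"] > 0
    }

# Placeholder values for illustration only; not real measurements.
sample = {
    "blog-one.example": {"google": 100, "yahoo": 400},
    "blog-two.example": {"google": 0, "yahoo": 5},
}
print(count_ratios(sample))
```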

By the way, a big thanks to Tristan for posting his studies and kicking off this discussion. Most of us don’t take the time to do analysis of that depth to support our opinions, let alone post the entire method and dataset so others can reproduce it, shoot holes in it, or go off on tangents from it.

(I stumbled onto Danny’s post via John Battelle)

What’s up with Yahoo’s link count estimates?

Dave Sifry is chiming in on some analysis done by Tristan Louis about how well Google, Yahoo and Technorati are covering the blogosphere. Briefly, here’s what Tristan did: He ran link: queries on Google, Yahoo and Technorati for the blogs in the Technorati Top 100 and recorded the number of results reported by each search engine. For example, taking BoingBoing, the first blog on that list:
Read the rest of this entry »

Fallows on getting answers

Great column on the state of search by James Fallows in today’s New York Times (online version here), entitled “Enough Keyword Searches, Just Answer My Question”. Fallows doesn’t mince words.
Read the rest of this entry »

The nature of blog search

The new Technorati is beautiful. The UI is beautifully conceived and lavishly rendered, and completes the integration of tags and photos with search that Technorati has been working on for some time. It strikes me as the first of its generation of blog search engines that has fully grown up to be what it wants to be, and the UI implementation is head and shoulders above its peers. And yet, when you use it, you have the feeling of opening the door to an overstuffed closet. There’s a lot of stuff that comes tumbling at you.

The presentation reflects some real qualities of the blogosphere: In aggregate, the blogosphere is noisy, diverse, urgent, in-your-face, gah! Technorati gets across the busy-ness of the blogosphere of the last few hours, where bloggers continuously decant their paragraphs and photographs into the teeming “world live web”, as Technorati used to call it. Is this the best way to do blog search? Should blog search be a megaphone or an earphone? Should it be an amplifier, a repeater, a filter, or a tuner? Some of each? Something else entirely? A purple frog?

Blog Search news roundup

Peter Caputa has a concise roundup of blog search news from this week.

Google’s “Secret Lab”? Ho-hum.

Henk van Ess makes a dramatic show of scooping a story about a Google “Secret Lab”, which consists of an army of students worldwide who rate Google search results and new features using an eval UI. Ho-hum. I’m not steeped enough in Googlemania to know whether this is some kind of scandal, e.g. whether Google has claimed that it doesn’t use human raters or whatever. Every search company needs something like this, appropriately scaled for its content and audience, of course.

This flash movie shows some screenshots of what’s purported to be the Google eval UI. More or less what you’d expect, but not as nice as some others I’ve seen …
