Wednesday, February 6, 2008

A Microsoft secret plan for Yahoo's open source?

By Jeff Gould (Peerstone Research / Interop News)
One of the most surprising things about Microsoft's bid for Yahoo is that if successful it will make Microsoft one of the two or three largest users of open source software in the world. Google is certainly the largest. The National Security Agency may or may not be second (only the spooks know for sure), but if it is then by my estimate that would make Yahoo the third.

As Microsoft execs have been quick to point out, the combined entity will have two of everything, that is to say, two home pages, two e-mail services, two instant messaging services, two search engines, two ad platforms, and so forth. There is an obvious business argument for unifying some of these properties sooner rather than later, for example the ad platforms. These all-important revenue drivers are the tools advertisers use to manage the ads they buy on these sites and the associated keywords that drive customers to them. Both Yahoo and Microsoft have recently launched major new upgrades in this area – Panama for Yahoo, and AdCenter for Microsoft – but continue to trail Google's pioneering AdWords platform in performance and market share. Panama is built on Yahoo's open source stack, which includes MySQL, PHP and FreeBSD among other goodies, while AdCenter is an all-Microsoft affair.

Since the ad platforms must surely be combined at some point in order to offer advertisers easy access to the combined audience of the two sites, the question immediately arises: how will this combination happen? I suspect the smart folks in Redmond and Sunnyvale will find there is more than one way to skin this particular cat. Some bloggers, like Duncan Riley at TechCrunch, have concluded that Microsoft will dump both the Yahoo search engine and the Panama ad platform in favor of their Microsoft equivalents. I'm not so sure. I find it hard to believe that Microsoft would fork over $45 billion just to turn around and clean house. If that were the case, they'd be better off spending the money on buying third-party traffic for their existing sites. And while Yahoo is certainly a fabulous name – its home page is still the most visited page on the web, with 2 billion visits per month in the U.S. alone – I don't think the brand alone is worth the gross national product of Angola, which is what Microsoft is offering.

Over and above the brand, there must be something about Yahoo's technology that Microsoft thinks is extraordinarily valuable. Could this be Microsoft's "If you can't beat 'em, join 'em" moment with respect to open source? Cynics will argue: "Oh but it's the engineers Ballmer wants, not the code." But seriously folks, how likely is it that Microsoft will be able to separate the two? Imagine a bunch of ex-Yahoo coders showing up for their first day of work at Microsoft. What are they going to say? "Hey Steve, whistle us up some Visual Studio and a little .Net, and we'll get right at that new search optimization algorithm you asked for." I don't think so. The people up north know a thing or two about software talent management, and if they're offering big bucks to bring the Yahoo engineers on board, then they're sure as heck not going to drive them away right from the get-go by trying to force an alien technology on them.

If Microsoft thinks some of the intellectual property Yahoo has built with open source tools is worth a pot of gold, what does that technology consist of exactly? In last month's conference call Yahoo President Susan Decker described some of it as follows:

" Although we haven’t elaborated on this much publicly, on the back end we have made a major investment in open source development of grid computing which provides a substantially greater scalability at fast iteration on core technologies. This is already dramatically impacting our competitiveness in algorithmic search and advertising. For example, in some cases we have an order of magnitude 10x improvement in indexing speed. This has been a multi-year project and we’re on track to have our future Search and advertising systems built on the new infrastructure, positioning us well for acceleration in iteration and experiments that are likely to lead to significant future product enhancements."

Translation: Yahoo – having finally remembered that it was located in Silicon Valley rather than Hollywood – has spent the last year or two desperately trying to catch up with Google by pumping hundreds of millions of dollars into an ambitious new architecture for search and ad placement based on massively parallel compute farms and open source clusterware. Might the results of this effort be one of the reasons Microsoft is so eager to acquire Yahoo? Only the inner circle at Redmond knows for sure. Be that as it may, one of the most interesting pieces of this architecture is something called Hadoop, which is an Apache-sponsored open source implementation of Google's famous MapReduce and the Google File System. Hadoop creator Doug Cutting (who named the software for his child's stuffed elephant) was also the lead developer of the Lucene and Nutch open source search engine tools. He now works for Yahoo, as do most of the active contributors to the project.

So what exactly does MapReduce (or Hadoop) do? Basically it acts as a kind of cluster operating system for executing simple but massively parallel computations (such as indexing) on massive data sets (such as the results of a web crawl). MapReduce automatically parcels out parallel computations and data to thousands of cheap and possibly unreliable computers, and hides the complexity of doing this from the programmers who write the algorithms that do the actual desired work (e.g. indexing web pages).
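
To make that concrete, here is a minimal sketch in Python of the map, shuffle and reduce steps applied to indexing. It is not Hadoop's or Google's actual API: the function names (map_page, shuffle, reduce_word, build_index), the sample URLs and the thread pool standing in for a cluster are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_page(page):
    """Map step: emit one (word, url) pair per distinct word on a page."""
    url, text = page
    return [(word, url) for word in sorted(set(text.lower().split()))]


def shuffle(mapped_outputs):
    """Group emitted URLs by word, as a framework would do between the two steps."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for word, url in pairs:
            groups[word].append(url)
    return groups


def reduce_word(word, urls):
    """Reduce step: collapse the URLs seen for one word into a sorted list."""
    return word, sorted(set(urls))


def build_index(pages, workers=4):
    # Hypothetical stand-in: a local thread pool plays the role of the cluster.
    # A real framework would also handle data placement and machine failures.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_page, pages))
    groups = shuffle(mapped)
    return dict(reduce_word(word, urls) for word, urls in groups.items())


# Made-up sample data standing in for the results of a web crawl.
pages = [
    ("a.example/1", "open source search"),
    ("b.example/2", "open source grid computing"),
]
index = build_index(pages)
```

The programmer only writes map_page and reduce_word. Everything inside build_index is the part a system like MapReduce or Hadoop takes over, which is why the same two small functions can in principle run on one laptop or on thousands of machines.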

Interestingly enough, Hadoop – unlike MapReduce, as far as I know – uses Java (think of the other Apache Java projects such as the celebrated Tomcat JSP engine). For an interesting commentary on Hadoop's strategic importance to Yahoo, check out Tim O'Reilly's blog post from last August. Along the same lines, check out the job ads Yahoo is currently running for grid software engineers experienced in Java as well as the more traditional C++, Perl and Python.

To be sure, this Hadoop stuff doesn't exactly sound like .Net. But I bet it could run on a Microsoft operating system just fine. In fact, I'm willing to bet it already does. After all, all the other important kinds of open source middleware do. My point is simply this. If Yahoo has just plowed a fairly large fortune into rebuilding its core platforms using advanced new clusterware that just so happens to duplicate in open source the proprietary heart of Google, then maybe this is something that Mr. Ballmer has noticed and has taken a certain interest in. He may even have judged it to be of sufficient strategic value to Microsoft to justify a little premium on the price he has offered for the company that developed it.

What an irony it will be if Microsoft ends up using Yahoo’s open source software platform as a stick to beat Google. Only time will tell. But with billions of dollars riding on the outcome, it could be that Microsoft is willy-nilly about to become the world's largest lab experiment in practical software interoperability.

Reader Comments (8)

Hmm, it's always possible, but I doubt it for a couple of reasons:
1. Microsoft at least claims that their ad system is amazing, and it's just not being used.
2. I can't imagine MS risking their investment in Windows server by practically admitting that open source works better for them (or at least makes it easier to get a big, usable prototype running).
3. The theory I've read that makes the most sense is that it's all about the subscribers. MS likely wants the eyes and the content more than the actual code.

It's unlikely they'd immediately dump all of the Yahoo code, but I would guess they would work their ass off to port everything or convert users to the new system.

February 6, 2008 | Unregistered Commenter Andrew

I agree with Andrew in that it's the subscribers and the Yahoo name Microsoft are after. I wouldn't be surprised to see some of that technology rebranded to try and take Microsoft products into new areas (Windows gridServer2010 anyone?)

February 7, 2008 | Unregistered Commenter Ben

Don't forget that open source projects can be relicensed by their copyright holders. If Yahoo has the sole copyright on some of these projects, Microsoft would be able to put out new versions as proprietary software or something 'less' open source such as under their shared source licenses...

February 7, 2008 | Unregistered Commenter Just me

What does this mean for Zimbra (which Yahoo acquired not long ago)?

February 7, 2008 | Unregistered Commenter Anonymous

Hmmm... I find this post interesting... I agree and I wrote about it at the beginning of this week.

http://www.devspace.com/doku.php?id=research:microsoft_yahoo_deal

February 7, 2008 | Unregistered Commenter Christian Gross

I think you're way off the mark. Of course they're going to mention keeping the engineering staff - they are what make tech companies what they are. And Yahoo aren't an MS shop, are they? They won't all be that happy with this kinda deal.

And ... Microsoft HATE 'open source'. They want nothing to do with it - anything they say is either straight-out lies or marketing FUD. At most they might create extra FUD about patents or other 'IP' issues out there in free-software land, but I think that's pushing it.

I'm sure the deal is only about beating Google. As you say in the opening - they are nothing if not persistent. But this time they can't just break the law to get what they want.

To "Just me": Well, according to the article much of the stuff is BSD licensed. They can just appropriate that however they want even if there are multiple copyright holders.

February 8, 2008 | Unregistered Commenter Michael

Michael, you are only half right. Microsoft don't hate 'open source' as a whole, what they hate is the GPL that prevents them from embracing, extending and extinguishing.

That's why they have been submitting their own licenses to OSI for approval, to muddy the water so they can claim open source involvement without having to actually open anything.

Yahoo's open source projects do not use the GPL. If M$ can retain control of the code (and copyrights) while crowing about their 'open source' credentials, then they will have furthered their goal of diluting the very definition of 'open' to work more in their favor (much like their redefining of 'open' and 'standard' in the OOXML circus).

February 8, 2008 | Unregistered Commenter gonzo

Jeff has posted a follow-up article. See the sequel.

February 11, 2008 | Registered Commenter Interop Systems
