I'm not sold on WinFS.

First off, let me say that I am very impressed with the work the WinFS team has done and that they've already won a great deal of respect from me.  That said, I'm not 100% sure I see the need for their product.  This opinion is certainly not set in stone, but from what I've seen and what I understand so far... I'm not sold.

Why is that?  Well, maybe it's because I watch videos like this one and I think... why is that different from what we already do?  You can tell that these guys are super fired up about the fact that they've reached parity with the basic Windows filesystem.  And I'm sure it wasn't easy, and as a developer I was impressed.  But as a user I thought, “Great.  You have parity with the current system.  Now what are you offering me that makes your effort worthwhile?”

And that's where WinFS runs into trouble, in my mind.  I see it as a great technological feat, but one with few or no rewards.  I think everything I saw so far from a user perspective can be done today, without WinFS.  Some of it hasn't been done... but it could be.  I sat here watching the video with some other members of my team and several times one of us would say “Looks like our stuff,” or “Yeah, we could do that.”

The way I see it, the fundamental difference between WinFS and the Windows Desktop Search “platform” is that they embed the actual file data inside the database.  Whereas, we have a database with all of the item's data, and then we have a reference or URL to the item (be it a file, an Outlook e-mail or contact, something from Thunderbird, etc). 
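
A minimal sketch of that reference-based model, under stated assumptions: the `IndexEntry` class, the property names, and the sample URLs are all hypothetical and only illustrate the idea of "metadata plus a pointer". They are not the actual WDS schema or URL formats.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One indexed item: its metadata plus a pointer back to the real item."""
    url: str  # where the item actually lives (hypothetical URL formats below)
    properties: dict = field(default_factory=dict)


# The index holds metadata only; the items stay in their own stores.
index = [
    IndexEntry("file:///C:/docs/report.doc", {"kind": "document", "author": "Tom Jones"}),
    IndexEntry("mapi://mailbox/inbox/42", {"kind": "email", "from": "Tom Jones"}),
    IndexEntry("mapi://mailbox/contacts/7", {"kind": "contact", "name": "Tom Jones"}),
]


def query(entries, **criteria):
    """Return the URLs of entries whose properties match every criterion."""
    return [
        e.url for e in entries
        if all(e.properties.get(k) == v for k, v in criteria.items())
    ]


emails = query(index, kind="email")
```

The caller gets back URLs rather than the items themselves, which is the one real difference from a store that embeds the item data.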

The other differences are just implementation details.  They implement their types as CLR-types.  That's pretty cool, but I don't see why we couldn't do that in theory.

But as a user AND as a developer, I don't really care where the actual data is stored.  I don't care if it's in a file under some folder.  I don't care if it's a MAPI item stored in a PST or on a server.  All I care about is that I can query the system for this data regardless of where or what it is, and sort, group, view, or manipulate those results however I choose.

So in the video, they demonstrate an app that displays all your “stuff” on a timeline, and divides it into bands based on tags or properties.  You could do that today using our WDS APIs.  They show creating “relationships” between a group of photos.  We do that today in Vista beta 1 with tagging.

They show an X1-like app that lets them filter results.  So they do a search for all the meetings he's had in the last 30 days.  We can do that today.  Then he gets a list of all the attendees from that list (meetings from the last 30 days).  We can do that (even if it's not completely exposed in our UI so far) - and Vista does it all over the place.  Then he saves the query.  We can do that today.

Now, the next thing he does is cool, and it's my favorite part of the demo.  He starts a new query for emails, and filters the From field by the results of that original query (people he's met with in the last 30 days).  So now he has a list of emails from people he's met with in the last 30 days, using the saved query from before as a source for the From filter.  Sweet.

But again, I think this is something we could do without WinFS.  In fact, I've already started some discussions around here about different ways that we could accomplish these same goals.

The key is that WinFS and our index are both databases.  And we have access to the same metadata that they do.  The only difference I see is that when a program asks us for the result file or item, we give them a pointer to it, instead of actually returning the item directly.  That... and we can index things that aren't on the local hard drive. 

We can index objects without having to consume them entirely. 

We can go out to the object and say “Give me all the useful information you can about yourself” - and then remember it.  If something changes, it can tell us.  Why would I care if the object actually lives in the database?  Why not let Outlook store data where it wants... in PSTs or on an Exchange server?  With something like WinFS... you're trying to get Outlook to use the WinFS way of storing items.  And if I want to index e-mail that's on a server, my guess is that the server has to have its own WinFS implementation for it to federate queries to.  But what if my server doesn't run WinFS?  Can I plug in a protocol handler and have WinFS pull just the metadata to my local machine?  What if my e-mail server runs Unix and can't run WinFS?

For all I know, they have good answers to those questions.

I just haven't heard them addressed yet.  And it's not like they're pressed for time... It sounds like they have plenty of time left to convince me that all my team's work is going to be obsolete =)


Feedback

# re: I'm not sold on WinFS.

I think you're missing the big point about winFS - it's not just about the searching. From a user's perspective, it quite probably is - you've got your files somewhere, you simply search for it (with relationships) and you've got your files. Not much different from MSN search.

The big difference is the storage platform. Let's look at contact information. If we use MSN search, I can search for all items classified as contacts that match "Matt Ellis" somehow. The returned results could be Outlook contacts, MSN messenger contacts, Jabber, ICQ, gmail, etc. And there might be a Matt Ellis contact for each of these programs. MSN search has no way of knowing whether the Outlook Matt Ellis is the same as the MSN Messenger Matt Ellis. Also, the MSN search will return back a URL/reference to the actual contact item. If it's a file, then it'll be a url, which a program could at least navigate to. But that program won't know what format the file is. If it's not a url (like Outlook email is surfaced in MSN search right now) then a program calling into MSN search wouldn't have a clue how to navigate to the item to display or parse it.

Contrast this with WinFS. WinFS will have a "Contact" schema which will be an open format that all programs can consume and understand. Let's assume that all the email and IM programs store their contacts in WinFS. Now we've only got one Matt Ellis contact, and any program can access this data. This is great from the user's perspective, and brilliant from an integration program's perspective - I can integrate with MSN Messenger, Outlook and so on.

And since you can define your own schema, I can now use WinFS to be my data store. If I'm writing an RSS reader (as just about everyone should/does these days) I don't have to worry about the format of my data store (xml files? sql server database? proprietary binary file format?) and just create a bunch of schema-tised items in a WinFS store. And now that data's fully searchable to boot.

I don't know how they expect to solve the problem of storing and syncing stuff with servers and non-winFS stores. That's a very interesting problem...

MSN desktop search is a fantastic program - I use it daily. But WinFS is much more than just searching. It's a unifying data store for all of your data. I think it's the best part of the Longhorn pillars as was, and thought it a crying shame that it got ditched from Vista. I'm very glad to see that it's back on track!

Cheers
Matt
9/1/2005 12:41 AM | Matt Ellis

# re: I'm not sold on WinFS.

But that's my point... Windows Desktop Search (it's not called MSN DS!) is not just about searching either!

Your contacts example, while it sounds great in theory, is a bit unrealistic. Why? Because we could do that today - but no one does.

The problem is that you'd need to convince every program with the concept of Contacts to use the WinFS definition of a Contact. That means rewriting MSN Messenger, Outlook, OE, AOL/IM, Trillian, Communicator, Hotmail, Gmail, Thunderbird, Eudora, etc...

But they could do that today. They could store all their contact data in a common file location. OR they could store their contact data in a common database. Either way, we can index it.

As for programs knowing what to do with a Contact object... EVERY item in Windows Desktop Search has a URL. Including Outlook items. Any program calling into WDS can ask for the MAPI URL of an e-mail item, and launch it (or perform any verb that the MAPI Protocol Handler implements properly). So I can get a URL for any item, regardless of whether it's a file, contact, message, whatever... and I can query Windows to say, "What can I do with this object?"

The main issue I see is data formats. Just because you have this wonderful database with its schemas... doesn't mean that everyone is going to agree on file formats overnight. For example, I believe WinFS has a Music type. But the data stored can be WMA, MP3, AAC, WAV, Ogg, FLAC, etc.

Just because it's stored in WinFS doesn't mean it's going to suddenly be magically interpreted by apps everywhere. 9/1/2005 9:15 AM | Brandon Paddock

# WinFS: From a Windows Desktop Search developer at Microsoft: I'm not sold on WinFS

Check out: http://geekswithblogs.net/bpaddock/archive/2005/08/29/51550.aspx
  9/2/2005 10:31 AM | Michael Herman (Parallelspace)

# re: I'm not sold on WinFS.

Isn't it so, that with WinFS, the data to search is never out of sync with the index, but with the Vista search, you can have an index which is out of sync with the actual data? Because in WinFS, the index IS the data, and in the Vista search, you have to wait till the item is re-indexed? 9/2/2005 12:04 PM | Frans Bouma

# re: I'm not sold on WinFS.

That depends. On XP, by default we have "Back-off" logic that will queue up items to be indexed until your computer is idle.

However, on faster machines you can disable our back-off policies and have the index always be up-to-date. That's because the filesystem gives us notifications anytime a file is added/edited/deleted/etc. The same happens with protocol handlers, which have the ability to tell us when items have been updated. 9/2/2005 12:06 PM | Brandon Paddock
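
The back-off behaviour described in this reply can be sketched roughly as follows. This is a toy model under stated assumptions, not the WDS implementation: the `Indexer` class, its method names, and the `machine_idle` flag are all hypothetical.

```python
from collections import deque


class Indexer:
    """Toy indexer: change notifications arrive, indexing may be deferred."""

    def __init__(self, backoff_enabled=True):
        self.backoff_enabled = backoff_enabled
        self.pending = deque()  # items waiting to be (re)indexed
        self.indexed = []       # items indexed so far, in order

    def notify_changed(self, url, machine_idle):
        """Called when a store reports that an item was added or edited."""
        self.pending.append(url)
        # Without back-off, index right away; with it, wait for idle time.
        if not self.backoff_enabled or machine_idle:
            self.drain()

    def drain(self):
        """Index everything that is currently queued."""
        while self.pending:
            self.indexed.append(self.pending.popleft())
```

With back-off on, the index lags the data only while the machine is busy; with it off, every notification is processed immediately.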

# re: I'm not sold on WinFS.

Okay I hear what you are saying...

However as far as I can see the only way you can have access to 'the same meta data' as WinFS is if you have a schema and schema extensibility model too, in which case you are building something very similar to WinFS.

Sure, using WinFS I can find all documents where 'Brandon Paddock' is the author, and I could using my favourite desktop search tool too. Why? Because 'Brandon Paddock' is a string in the Author property of the document, so the index can nab it.

But can you find all documents by people who work in your Office? That isn't mentioned in the document, but it is mentioned on all the Person entities who work in your office.

I fail to see how an index can do that without having a schema like WinFS to allow it to understand relationships between data and as a result do a JOIN, but I would love you to enlighten me! ;)
9/6/2005 3:58 AM | Alex James

# re: I'm not sold on WinFS.


Ultimately you both build indexes and have a schema. You both want to speed up similar scenarios. Yours works today and theirs promises to be more complete.

Ultimately it comes down to execution, and which target market you focus on. You're focussed on the end user, they're focussed on the developer. You say "what app wouldn't want to write their own PST and IFilter and store provider?" and they say "why not make it easier to build apps that store and share data".

In other words, I don't see the conundrum. You should focus on what users actually want. Optimize their queries, find their info wherever it lives. They should focus on an efficient store and good developer APIs. Between the two of you, try to cover all the managed/unmanaged scenarios, because from what I've seen so far you're both incomplete.
9/6/2005 9:50 AM | JD

# re: I'm not sold on WinFS.

Alex -

Regarding the "Find all documents by people who work in your Office" query...

I think that IS possible. Why? Because in either case (WinFS or WDS) you're assuming that the contact metadata includes something that says "This contact works in my office". This could be an "Office" tag, or it could mean that the Work Address for this contact is One Microsoft Way.

Either way, both WinFS and WDS would know that information.

The difference is, where they have a simple JOIN statement... we would have two queries.

Query A is simply: "All my contacts at One Microsoft Way".

Query B is: "All documents where the Author field matches one of the unique values from the Name column from Query A"

Even better, our first query doesn't need to return all the results. It just needs to enumerate all of the unique values that exist in the Name column filtered by the Address field.

Again... a somewhat different means to the same end.
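
A rough sketch of the two-query idea, using plain in-memory lists in place of a real index. The helper names `unique_values` and `where_in` and the sample records are hypothetical, chosen only to show the shape of Query A and Query B.

```python
contacts = [
    {"name": "Tom Jones", "address": "One Microsoft Way"},
    {"name": "John Smith", "address": "One Microsoft Way"},
    {"name": "Sam Someguy", "address": "Elsewhere"},
]
documents = [
    {"title": "Spec", "author": "Tom Jones"},
    {"title": "Notes", "author": "Sam Someguy"},
    {"title": "Plan", "author": "John Smith"},
]


def unique_values(rows, column, **filters):
    """Query A: enumerate the distinct values of one column under a filter."""
    return {
        row[column] for row in rows
        if all(row.get(k) == v for k, v in filters.items())
    }


def where_in(rows, column, allowed):
    """Query B: keep rows whose column value is one of the allowed values."""
    return [row for row in rows if row[column] in allowed]


names = unique_values(contacts, "name", address="One Microsoft Way")
by_colleagues = where_in(documents, "author", names)
titles = [d["title"] for d in by_colleagues]
```

Note that Query A only has to produce the set of names, not full contact records, which matches the point about enumerating unique values.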

I touched on this in the original post. It's the section where I said I thought what they did was really cool... BUT I still think we can do it too. 9/6/2005 2:41 PM | Brandon Paddock

# re: I'm not sold on WinFS.

Brandon,

So essentially you are using IN instead of JOIN?

i.e.

SELECT * FROM DOCUMENTS WHERE AUTHOR IN (SELECT DISTINCT(NAME) FROM CONTACT WHERE ADDRESS = 'One Microsoft Way')

instead of

SELECT D.* FROM DOCUMENTS D JOIN CONTACT C ON D.AUTHOR = C.NAME
WHERE C.ADDRESS = 'One Microsoft Way'
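
For anyone who wants to compare the two forms, here is a small sketch using Python's built-in `sqlite3` module. The table contents are made up for illustration, and SQLite stands in for whatever store is actually being queried. One thing it shows: the two forms are not always row-for-row identical, because the JOIN repeats a document once per matching contact row while the IN form does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (name TEXT, address TEXT);
    CREATE TABLE documents (title TEXT, author TEXT);
    INSERT INTO contact VALUES
        ('Tom Jones', 'One Microsoft Way'),
        ('Tom Jones', 'One Microsoft Way'),   -- same name stored twice
        ('Sam Someguy', 'Elsewhere');
    INSERT INTO documents VALUES
        ('Spec', 'Tom Jones'),
        ('Notes', 'Sam Someguy');
""")

# IN form: each document appears at most once.
in_rows = conn.execute("""
    SELECT title FROM documents
    WHERE author IN (SELECT DISTINCT name FROM contact
                     WHERE address = 'One Microsoft Way')
""").fetchall()

# JOIN form: one output row per (document, matching contact) pair.
join_rows = conn.execute("""
    SELECT d.title FROM documents d
    JOIN contact c ON d.author = c.name
    WHERE c.address = 'One Microsoft Way'
""").fetchall()
```

If contact names are unique, both queries return the same rows; otherwise the JOIN needs a DISTINCT to match.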

Fine, that is what I do in Base4 when a query spans multiple databases too.

Base4 is aiming to be a kind of mini WinFS, but I am aiming to support complete virtualization of data sources, so that a table (like Contact) doesn't need to be owned by Base4. It just needs to be registered. From what I have seen, I don't think they are doing this in WinFS.

I get the feeling that is half way between what you are doing and what MS are doing? 9/7/2005 2:39 PM | Alex James

# re: I'm not sold on WinFS.

I agree with James. WinFS is all about the platform: it offers almost no user experience of its own and expects your application to leverage it and its services (such as sync, which Windows Search doesn't have). WinFS is a data storage platform with integrated services and a Windows file system interface, not a simple search solution.

One can argue that he/she can use search to emulate fundamental relational operations, but what's the point? No one uses Internet + search to replace their SQL databases; the same holds for WinFS vs. Windows Search. 9/7/2005 2:47 PM | Andrei

# re: I'm not sold on WinFS.

Well, remember I work at MS as well ;)

The way a developer could do this TODAY using WDS is to use two queries. So the first query is something like SELECT "FullName" FROM "MyIndex" WHERE CONTAINS("DocCompany", '"microsoft*"') AND CONTAINS("System.PerceivedType", 'contact')

Or something like that.

Then you'd issue a second query like SELECT "DocTitle" FROM "MyIndex" WHERE CONTAINS("FullName",'contact names here')

So your UI would have to take the results, plug them into the new query like "Tom Jones" OR "John Smith" OR "Sam Someguy"

something along those lines.
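
The "plug the results into the new query" step could look roughly like this. It is a sketch under stated assumptions: the query text is copied from the example above rather than checked against the WDS documentation, the function names are hypothetical, and the quote handling is a simple placeholder, not the real escaping rules.

```python
# Query template copied from the example above; not verified against WDS.
TEMPLATE = 'SELECT "DocTitle" FROM "MyIndex" WHERE CONTAINS("FullName", \'{}\')'


def or_phrases(names):
    """Join names into a quoted OR expression: "A" OR "B" OR "C"."""
    # Assumption: embedded double quotes are simply dropped.
    return " OR ".join('"{}"'.format(n.replace('"', "")) for n in names)


def second_query(names):
    """Build the follow-up query text from the first query's results."""
    # Assumption: single quotes are doubled, as in most SQL dialects.
    return TEMPLATE.format(or_phrases(names).replace("'", "''"))


query_text = second_query(["Tom Jones", "John Smith", "Sam Someguy"])
```

For a long list of names, the generated string grows with every result of the first query, which is one practical cost of doing the join in the UI layer.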

That gets you the same results that WinFS does with its JOIN operation. But we don't care if the Contacts or the Email are in the WinFS store. They can be in Outlook, OE, Thunderbird, whatever... so long as someone writes a protocol handler (PH) for the store. Same goes for the emails.

But what I described above is something a developer could do TODAY with our public APIs. Down the road, in *theory* it's *conceivable* that we could do something in the indexer or data layer to further facilitate this kind of query... if it were desired. 9/7/2005 2:58 PM | Brandon Paddock

# re: I'm not sold on WinFS.

Shooting back up to the 30,000 foot level ... At the end of the day, desktop search (or any search) and relational storage of unstructured data are attempted solutions to the same problem, that of exponential growth in unstructured data volume.

The difference is one (search) is a temporary fix which in the end is defeated by the fact that it doesn't solve the problem, it just makes it easier to work with in the short term.

Logically (business related) unstructured data volume should not grow any faster than structured data volume (or roughly the rate of business growth, 10% year on year). This is clearly not the case however, because at the moment, unstructured data is growing at exponential rates ... why? Mostly because of duplication caused by non-unified storage.

When you send an attachment in email, you copy it twice .. original on the file system, copy in sent items, copy in the recipient's inbox (and that's if you only send it to one person). If they edit it and send it back ... voila three more copies (minimum).

A good searching application will find all these copies (go ahead try it with desktop search today, you will get at least the one in My Documents and the one in sent items). I suppose that is better than not finding it at all (which until recently was the option).

WinFS holds out the possibility of a solution by suggesting that we all (at least within the one organisation) will be working from the same (single) object. This (if delivered) would actually go towards solving the problem of exponential proliferation of unstructured data rather than just making it less painful for the time being.

Of course WinFS will have a searching capability with all the same capabilities as desktop search (and more) and of course you can do most of them today, but that after all is the point ... desktop search is an interim solution and at most a specification (for testing the best UI) for the ultimate solution to the problem, relational storage for unstructured data. 9/7/2005 3:37 PM | DG

# re: I'm not sold on WinFS.

That's an interesting take DG - and I think you're right. Once you assume that lots of people are using WinFS (such as an entire company), and you start leveraging it across the network and not just on a local machine... you have something pretty powerful.

So... it's not so much that I don't think WinFS would be great if everyone used it and it reached its full potential. But rather, I'm concerned that WinFS might rely *too much* on developers building their solutions entirely around WinFS.

Certainly, the existence of this technology and the determination behind it suggest that it has huge potential. Perhaps in 5 years I'll look back and say, "What would I do without WinFS?"

But when I see a 2007 release date and consider that it will take at *least* a couple years for full developer/user adoption (just look at .NET) - I'm inclined to think... let's do as much of this as we can NOW. Or at least, SOON.

Of course, I work on a team that competes with the likes of Google, Copernic, and X1... So for us it pays to think fast.

So hopefully that explains my thinking a bit :) 9/7/2005 3:45 PM | Brandon Paddock

# re: I'm not sold on WinFS.

You're completely right about the timeframe. WinFS missed its mark completely, and no one is going to want to build themselves around it unless it proves better than what they are using now.

But who's going to build on the WDS? Expose data to it, definitely. But build apps upon it?
9/8/2005 2:18 PM | JD

# re: I'm not sold on WinFS.

FolderShare and Viapoint already have. And I have a feeling you'll see a lot more (from inside and outside MS) very soon :) 9/8/2005 2:30 PM | Brandon Paddock

# re: I'm not sold on WinFS.

I guess you weren't the only one. Any comments now that WinFS is dead? 6/27/2006 6:30 PM | PaulG

News

The views expressed within my blog are my own - and are not in any way indicative of those of the company I work for, Microsoft, or its employees.
