Seadragon and Photosynth Media Resources
by Nate Lawrence
Fields: Original Description, URL, Title, Notes, Date, Time of Day, Media Type, Duration, Relevant Transcription, Event, Venue, Location, Host, Main Speaker, Seadragon, Photosynth
http://www.youtube.com/watch?v=0ra5tp7K--I Seadragon Tech Demo 2005 02 12 Video Demonstration You are about to see a demonstration of Seadragon technology browsing 800 high resolution images from the Library of Congress' map collection. The performance you will see corresponds to about a 500 kilobit per second connection between the Seadragon client and server. Notice how smoothly Seadragon can zoom and pan on this vast collection of visual information. There are no pauses and no abrupt scale changes. The only noticeable effect is a gradual focusing as the fine image details are received across the network. Seadragon completely changes the experience of viewing large collections or large images on a computer screen. This beautiful 1899 view of Oakland, California is 118 megapixels in size. To achieve this smoothness, Seadragon does not need to transmit the entire file across the network. It sends only the details you can currently see and does this so efficiently that it feels like you are looking through a real window onto a massive wall of visual information. The images are stored in the new ISO standard JPEG2000 format and, while other companies offer JPEG2000 client-server systems, only Seadragon allows you to smoothly move through thousands of images and billions of pixels. Note, too, that this collection is not one enormous image. Each of these 800 scanned images is a separate object on the client and can be rearranged based on attributes of the image, such as date, image type, or color. Seadragon is the subject of more than a dozen patents, currently pending, for computer and mobile device applications. This same collection can be smoothly browsed on a small screen, over a mobile connection. Seadragon is a revolutionary technology. It is, quite simply, a better way to browse and view visual information on any device, over any network. Seadragon Website Launch Ballard, Washington, USA David Gedye Seadragon
http://phototour.cs.washington.edu/ Photo Tourism 2006 03 01 Video Presentation, Paper Searching for a particular image of a well-photographed object using conventional tools often results in a large number of images that are not ordered in an intuitive way. Finding the exact picture you want can mean browsing through page after page of thumbnails. How can we organize such large photo collections in a more intuitive way? In this project, we present a novel system for registering large sets of photos and exploring them in a 3D browser. Our system discovers the relative positions of the cameras used to take each photograph, situates the photos in 3D space, and provides intuitive controls for exploring the scene and finding interesting photographs. Our system takes a collection of photos of the same scene as input. We first find keypoints in each of the input images, then match keypoints between each pair of images. Next, we run an iterative bundle adjustment procedure to estimate the parameters of each camera and the positions of the observed 3D points. Once the photos have been registered, they can be browsed using our photo exploration interface. Our system provides standard controls for moving around a 3D scene. In addition, when the user selects a photograph, the virtual camera is smoothly brought into alignment with that photo.
Information about the photograph appears in the information pane on the left. Our system provides several intuitive ways to select new photos. One is to select an object. The user can highlight a region of the current photo and the system automatically finds a good photo of the selection and smoothly moves the virtual camera to the new photo. During transitions, we use a simple plane-based morph to provide context as the camera moves. A thumbnail pane along the bottom of the screen shows other photos of the selected object. When the user moves the mouse over a thumbnail, that photo is displayed in the main view, projected onto a planar approximation to the selected object. Here, the user selects a thumbnail to see a different view of the statue. We also provide tools for viewing the scene at different scales. The user can step back from the scene with the 'Zoom out' tool. This finds photos that display a larger area of the scene. The 'Show me similar images' tool finds images of the scene with scale and orientation similar to that of the current photo. The 'Zoom in' button finds details, showing the user what parts of the scene can be viewed at a higher resolution. Here, the user selects a photo of the bas relief in the upper left and the browser zooms in to the more detailed photo. Our second example uses a set of photos taken by one person over the course of two days. We registered the photographs and reconstructed line segments as well as points. We can align the reconstructed model with the satellite image to situate it in a geo-referenced coordinate system. We render the scene using the reconstructed line segments. We also project blurred, partially transparent versions of the photos onto the scene to convey more information with a non-photo-realistic look. An overhead map is displayed in the upper right. The user can select a photo using the map. Here, the user selects a building to see a photograph of it. For this data set, we can also move left and right along a row of building facades. We provide geometric controls for this type of interaction. For each photograph, we precompute a 'left' and 'right' neighbor, based on the projected motion of the points observed by the images. We also precompute a 'step back' image so the user can quickly view more of the scene. In this example, we explore photos of the Notre Dame Cathedral in Paris, downloaded from the web. The user can select regions in the point cloud to find images of an object. Our system also allows users to annotate photos. These annotations are automatically transferred to new images. Here, the user labels several regions of the current photo. As each region is labelled, we transfer the annotation to the other photos. The transferred annotations are highlighted in the thumbnail pane. So as not to cover the photos, we'll hide the panes and use the hotkey to step to the next photo in the sequence. As we move to each photograph, the annotations appear. Our system uses simple heuristics to determine if an annotated region is occluded, as in this example where one region is hidden. We can also transfer annotations from other sources, such as annotated images on Flickr. In this scene, we've also added several other annotations by hand. Our annotation transfer algorithm is sensitive to scale. If we look at photographs taken at different scales, we see different annotations. Next, we explore a set of photos of Half Dome in Yosemite National Park, gathered from the web.
If the user finds a viewpoint they like, our system makes it easy to find images taken from a similar viewpoint. By selecting the 'Lock the camera' option, we can generate a slideshow where an object remains fixed in the view. Now we unlock the camera. We can also register historical imagery, such as this photograph of Half Dome taken by Ansel Adams in 1960. Here's our estimate of where Ansel was standing when he took this photograph. Here, we compare the photograph to a synthetic rendering, from the same location. The whiteboard has been manually added for clarity. Our final example is a scene created from about eighty photos of a walk along the Great Wall of China. We organized about twenty of the photos, seen here, into a slideshow. We have experimented with an alternative morphing technique that creates a mesh from the 3D point cloud which is used as an impostor for the true scene geometry. This method often works well for nearby viewpoints, but creates artifacts in cases where the matching fails. We hope you have enjoyed our 3D photo tours.
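[Compiler's note: the Photo Tourism narration above outlines the reconstruction pipeline in three steps: find keypoints in every photo, match keypoints between image pairs, then run bundle adjustment to recover camera parameters and 3D points. The sketch below is only a rough illustration of the matching front end, using OpenCV; the function names are mine, not the authors', and the full incremental bundle adjuster is only noted in a comment.]

```python
# Minimal sketch of the front end described above: detect keypoints in every photo,
# then match keypoints between each pair of images. This is an illustration only;
# the real system used SIFT matching plus a full incremental bundle adjustment.
import itertools
import cv2

def detect_keypoints(paths):
    """Detect keypoints and descriptors in each input photo."""
    sift = cv2.SIFT_create()  # ORB_create() is a fallback if SIFT is unavailable
    features = {}
    for path in paths:
        image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        keypoints, descriptors = sift.detectAndCompute(image, None)
        features[path] = (keypoints, descriptors)
    return features

def match_pairs(features, ratio=0.75, min_matches=20):
    """Match keypoints between every pair of images, keeping well-connected pairs."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pair_matches = {}
    for (a, (_, desc_a)), (b, (_, desc_b)) in itertools.combinations(features.items(), 2):
        candidates = matcher.knnMatch(desc_a, desc_b, k=2)
        good = [m[0] for m in candidates
                if len(m) == 2 and m[0].distance < ratio * m[1].distance]
        if len(good) >= min_matches:  # enough shared structure to link the two photos
            pair_matches[(a, b)] = good
    return pair_matches

# From these pairwise matches, the system described above runs an iterative bundle
# adjustment to estimate each camera's pose and the 3D point positions; a nonlinear
# least-squares solver (e.g. SciPy's least_squares) would fill that role.
```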
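[Compiler's note: the narration also describes transferring an annotation drawn on one registered photo to the other photos that see the same part of the scene. One way to picture this, assuming the camera poses and 3D points from the reconstruction are already available: gather the 3D points whose projections fall inside the annotated region, then reproject them into every other registered image. The helper names below are hypothetical, and the occlusion heuristics mentioned in the video are omitted.]

```python
import numpy as np

def project(points_3d, camera_matrix, rotation, translation):
    """Project Nx3 world points into pixel coordinates for one registered camera."""
    cam = rotation @ points_3d.T + translation.reshape(3, 1)   # world -> camera frame
    pix = camera_matrix @ cam
    return (pix[:2] / pix[2]).T                                # Nx2 pixel coordinates

def transfer_annotation(box, points_3d, source_cam, target_cams):
    """Carry a rectangular annotation (xmin, ymin, xmax, ymax) into other photos."""
    xmin, ymin, xmax, ymax = box
    uv = project(points_3d, *source_cam)
    inside = ((uv[:, 0] >= xmin) & (uv[:, 0] <= xmax) &
              (uv[:, 1] >= ymin) & (uv[:, 1] <= ymax))
    region = points_3d[inside]            # 3D points covered by the annotated region
    if len(region) == 0:
        return {}
    transferred = {}
    for name, cam in target_cams.items():
        uv_t = project(region, *cam)
        # The bounding box of the reprojected points approximates the region in this
        # photo; the real system also checks whether the region is occluded here.
        transferred[name] = (uv_t[:, 0].min(), uv_t[:, 1].min(),
                             uv_t[:, 0].max(), uv_t[:, 1].max())
    return transferred
```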
TechFest 2006, SIGGRAPH 2006 Seattle, Washington, USA Noah Snavely Photosynth: Early Work http://www.microsoft.com/msft/speech/FY06/MundieFAM2006.mspx Long Cycle Innovation: Enabling New Ways To Search and Browse 2006 07 27 CRAIG MUNDIE: What I'd like to do now is have Blaise Aguera y Arcas come out. He's an architect who joined our Live Labs Group through an acquisition earlier this year. Blaise, great to see you. BLAISE AGUERA Y ARCAS: Thanks. CRAIG MUNDIE: And what I'm going to show you is that, while we do long-cycle innovation, we really are also focused on how we can take great technologies and blend them together with these things we developed over a much longer period of time through our research assets to develop very compelling products, and as part of the live capability be able to accelerate their availability in the marketplace -- so that they complement the long-cycle delivery platforms we've got in the case of the PC or Office or the basic Windows Mobile technologies, or even television technologies. It allows us to add value on top of those things on a much more accelerated basis. So, Blaise, let's talk about what you've got going here. What is that big doc they see? BLAISE AGUERA Y ARCAS: Up on the screen now is our dowry. So this is some technology that I brought to Microsoft from my start-up, just acquired at the end of 2005. So this is Seadragon technology. It's a method for interacting with very large volumes of visual information very rapidly. So these are mostly cell phone pictures, although we have a few here that are really large, like this map that is in the 100-megapixel range. This is an experience that one can have over an ordinary broadband or even narrowband connection. A thin cell pipe can do this. CRAIG MUNDIE: So this is some technology that allows us to take essentially pictures of any size from something you take as a low-res picture on your cell phone to something that is in fact entire documents that are represented as a single image of ultimately the resolution necessary to read. I think you're going to show them one of those now? BLAISE AGUERA Y ARCAS: Right. So -- well, this is actually another document type now, which is all of "Bleak House," the entire book. Every column is a chapter. And you can see this is not an image. This is real text. So this is the kind of technology that we expect is going to be really changing quite a number of things at Microsoft in the coming years. We've actually done this on cell phones as well. CRAIG MUNDIE: So this is a new model of navigation. You just zoom around in a two-dimensional space in this case. So what else do you think we can do with this? BLAISE AGUERA Y ARCAS: Well, so, a couple of months after the acquisition -- this is now only about four months ago -- we were -- our acquisition I should mention was driven by technical fellow Gary Flake, who founded Live Labs, the idea of Live Labs being to really shorten the innovation cycle dramatically and to bring a lot of the interesting things happening in Microsoft Research very quickly to prototype and to market. CRAIG MUNDIE: Right. BLAISE AGUERA Y ARCAS: So a couple of months after that acquisition I saw an amazing demo of some research at Microsoft Research at Tech Fest, which is the fair for those things. Can we go to video? What these guys had done is develop a system that allows you to take a bunch of images -- in this case these are images tagged "Trevi Fountain."
Here they're mined from Flickr, so there are lots of cameras, lots of times of day, times of year. CRAIG MUNDIE: So Flickr is a Web site -- has nothing to do with Microsoft -- where people put their photos up there, and then the community can tag them. So you can put it in and say it's Trevi Fountain. Somebody else comes along, says, “Oh, I know what that is -- that's Trevi Fountain.” And they're adding tags to these pictures. So here they basically took them. And -- BLAISE AGUERA Y ARCAS: Well, what they're able to do is figure out from those images alone what the three-dimensional model was of the Trevi Fountain. That's what's being shown on the screen now. Each of those triangles is the location of a particular camera. So you can simultaneously solve for the geometry of what you're looking at, as well as where the camera was. CRAIG MUNDIE: So even though none of those people knew each other, none of the pictures were taken at the same time or from the same place -- BLAISE AGUERA Y ARCAS: Right, they're cell phone pictures. CRAIG MUNDIE: They were able to take them all up and make a 3-D model of that fountain. BLAISE AGUERA Y ARCAS: Exactly. CRAIG MUNDIE: So what did that inspire you to do? BLAISE AGUERA Y ARCAS: Well, of course when I first saw this one of the first things I wanted to do was put it together with our stuff. And this is the result. This is only after about four months of work, so you'll excuse me if it crashes. But this is a collection of images that have been synthesized together using that technology. These are a few hundred images of St. Peter's Basilica in Rome, taken by one of our guys in Italy a few weeks ago. And -- CRAIG MUNDIE: So he just wandered around St. Peter's and took a bunch of pictures? BLAISE AGUERA Y ARCAS: He wandered around, took a bunch of these pictures. These white boxes are where those pictures were taken. And so let's zoom around. He went up to the top of the cupola. And here's that picture from above. He took these images from the top. And you can see what's happening here is that all these images are being registered together in 3-D, and they give you an experience that's almost game-like of moving around in the space. All these are places where we stood and took shots in the center, and what that shot was taken of. We can move around from image to image in this way. So it's sort of halfway between a game and a slide show. CRAIG MUNDIE: So, the 3-D model, which was synthesized by the machine from all the pictures, produces the navigation metaphor. And the Seadragon technology allows us to stream the entire collection of photos to you, just hooking them all together seamlessly, so you're operating within that 3-D space. BLAISE AGUERA Y ARCAS: Exactly. So this is pretty interesting technology just for collections of images like this. But where we really think this comes into its own is when one thinks about what can happen when we take this technology and deploy it at Web scale, which we'll be doing. So this technology with canned environments we'll be releasing as a technical preview in the fall. And the scaling up to the Web we'll see. But the idea is if we can incorporate this into the Web crawler for images, then we can build up organically a three-dimensional model of the entire world. And that model is built entirely out of those photos in an unstructured way. It can incorporate everything from satellite and aerial photographs that give you a sense of what cities look like from high above down to street-level shots and down to close-ups. 
And it's one of those really revolutionary kind of paradigm-shifting things. We believe that this can give you a new way of interacting not only with images, but with the information behind it. CRAIG MUNDIE: And a new way to get information and to discover things. That's fabulous. Thanks a lot. Thanks for sharing with us. BLAISE AGUERA Y ARCAS: Thanks so much, Craig. (Applause.)
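[Compiler's note: both the 2005 demo and this exchange stress that Seadragon streams only the part of the image pyramid that is visible at the current zoom, so bandwidth is bounded by the screen rather than by the source image. The back-of-the-envelope sketch below uses a generic power-of-two tile pyramid; the tile size, level numbering, and function name are assumptions for illustration, not Seadragon's actual format.]

```python
import math

def visible_tiles(image_w, image_h, viewport, screen_w, screen_h, tile=256):
    """Return (level, column, row) for the tiles needed to draw one viewport.

    The pyramid halves the image at each level below full resolution, so the
    client only ever requests roughly a screenful of pixels at a time.
    """
    vx, vy, vw, vh = viewport                       # viewport in source-image pixels
    max_level = math.ceil(math.log2(max(image_w, image_h)))
    scale = min(vw / screen_w, vh / screen_h)       # source pixels per screen pixel
    level = max_level - max(0, math.floor(math.log2(max(scale, 1.0))))
    shrink = 2 ** (max_level - level)               # downsampling factor at this level
    x0, y0 = int(vx // (tile * shrink)), int(vy // (tile * shrink))
    x1 = int((vx + vw - 1) // (tile * shrink))
    y1 = int((vy + vh - 1) // (tile * shrink))
    return [(level, c, r) for r in range(y0, y1 + 1) for c in range(x0, x1 + 1)]

# Viewing a 1000x1000-pixel window of a ~120-megapixel scan (comparable to the
# Oakland map in the 2005 demo) touches only a handful of 256-pixel tiles.
tiles = visible_tiles(12_000, 10_000, viewport=(4000, 3000, 1000, 1000),
                      screen_w=1000, screen_h=1000)
print(len(tiles), "tiles requested")
```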
Financial Analyst Meeting 2006 Microsoft Corporation Redmond, Washington, USA Craig Mundie Blaise Agüera y Arcas Photosynth: Introduction http://vimeo.com/1865306 Take The Photosynth Tour Video Promotion In Narnia there's this idea of a world that connects all other worlds and there are these shallow pools that you can dive into and then you're in another world and then you dive back out... I've been reading a lot of Narnia to my son lately. You can think about what we're doing as making that Wood Between the Worlds that connects the worlds of photos from many many different web pages into a woven whole. Live Labs really started as a vision for getting products out there quickly, doing rapid prototypes - new ideas, new technologies - and releasing them rapidly and responding rapidly. We decided we'd try to prototype some things, so we took some photo collection and eventually something clicked that looked very compelling. The people in Live Labs - the SeaDragon team - got really fired up and they've been working day and night to turn this into something that really scales up to all of the world's photographs. Live Labs wants to advance the state of the art for the internet, both in terms of building better products, but also in terms of doing better research. Photosynth, in some ways, is a poster child for the sort of things that we want to do within Live Labs. It is a really fine example of a project that has really interesting potential in terms of its scientific impact, its technology impact, its product impact, its business impact, and its, you know, strategy for how we think the internet will evolve. Photosynth is also a really great story in terms of how lots of different teams can come together in a short period of time, do something that's really groundbreaking. It would not have been possible without participation from MSR, from the Pics team, from University of Washington, from Virtual Earth. All of these different teams were instrumental in being able to, to, to produce the technology preview that we have today. We started with some collections that I had originally gathered on my travels and what you do is you have the computer figure out how the photos relate to each other. You're basically flown into a three dimensional world which is full of photographs and you can click on individual photos that look very interesting or you can take, like, an automated tour where someone has prescripted it so they basically show you the things - the historic buildings you want or the good aspects of the motorcycle that you're trying to buy. So you can do it, either more like a slide show or more like a computer game. And so it's very much like floating in a world full of images and people generally figure it out without any instruction. One web page that has a picture of a hotel that I once stayed in - I can now dive into that picture and be there. And if I see, ah, some other venue that's interesting on that town square, I can dive into that picture, come out on, on the webpage that that image took place in. The interesting part, of course, is what could happen if all of my images were indexed with everybody else's? If I just have one particular shot of a place - I don't know exactly where it is - I could actually find out where that was taken, who else photographed it? I could walk around that environment, look at it from other angles. 
Once we can take all of the world's photos and organize them in 3D, we can now navigate the streets of Paris and then go into a Photosynth tour of a particular handbag store in Paris, right? Or preview a restaurant and see what the chefs are doing when they're cooking. The ultimate dream is that we're going to merge the real world and the virtual world into a kind of experience where it's totally seamless. So the idea of using your own photo collection as a kind of entry point into the world, ah, every one of your photos that's taken in an environment with any kind of recognizable landmarks becomes like a wormhole that you can jump through and uh, and use to explore everybody else's photos who, ah, who's also been there and taken pictures in that environment. That's, that's a pretty powerful idea. Live Labs isn't necessarily about what Live Labs is capable of doing. It's more about what Live Labs can do in conjunction with other people from both within Microsoft and outside of Microsoft in terms of partnerships to do things that in some sense are bigger than any one group.
Smith Tower Blaise Agüera y Arcas, Rick Szeliski, Gary Flake http://vimeo.com/1865536 Dive Into The World of Photosynth You can think about what Photosynth does as linking images together. Whenever images are taken in a common environment, it's as if you form a hyperlink between them. And, and so now if you think about the emergent network of hyperlinks between images that, that can, that can be built by a crawler, say, ah, going out and searching the whole, ah, the whole web, it's a very powerful idea. Here's a shot of Saint Peter's Basilica. We're looking at it where we can navigate through hundreds of photos. The fun thing happens when we arrange all of these guys into a common three dimensional environment. Here's a point cloud: a model that's been reconstructed from all of those images. Let's turn all of the images on so we can see where they all ended up. You see this kind of complicated picture of lots of photos in their own planes inside that model. Let's go dive in and find the photo that we were looking at. And now we can move back and forth among different photos like this... just moving from side to side. These white boxes that are now appearing on the screen are showing where photos were taken. So, for example, if we want a close up over here, click on that and we see that everything is registered perfectly with the three dimensional model. So you can imagine a technology like this one with many people's photos being registered simultaneously becoming like a three dimensional map or a universe. We have a three dimensional reconstruction of the environment and we can also, of course, look at those photos individually. And then from there we can navigate around the space either via photos or via the entire environment. This is all of them turned on simultaneously which is kind of fun. If we want to look at other images similar to the one that we're looking at right now we can do this trick. Now we've grouped, close to the center of the screen, all of the images that shared a lot of context with that image that we were just looking at before. These are nearly identical shots. Here's, for example, a close up of this clock. Looking at similar shots, we see that the clock also occurred in a number of other photos like this one. So this gives you a way of grouping and navigating between images using the image content without any kind of tagging having taken place beforehand - no hand intervention. This shows you how we can zoom on different parts of the image. And, ah, as we zoom, only the necessary data for that particular part is, is coming in over the network. This is all of the images that had this same content anywhere in them, so... here's another image of the same museum, another image. And you can see the registration happen in real time as we go back and forth between those images. Here, we're moving back and forth among neighboring images - so images that share some content. So this gives you a kind of neighbor tour. It gives you a rapid way of navigating around inside that space. If you had an image like this one somewhere on the web and you wanted to know what's in one of those murals, another photo would just be discoverable like that. This photo could have come from somewhere else entirely. It certainly gives you a way of looking at other perspectives on something, or close ups, or what's around the corner - based on a starting image. Let's say that this close up is on a web page that talks about this particular scene.
You could dive in and then dive back out at that web page. And so it gives you a way of linking contextually across different places in the web where the image content actually lives. This long-standing dream of augmented reality where the computer will tell you about the world - the real world that you're immersed in - will finally be delivered with this kind of Photosynth technology. We're gonna see a collision of the real world and the virtual world that'll create this incredible experience that people can go and visit and really get a sense of what it's like to see things they've never seen before.
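[Compiler's note: the walkthrough above repeatedly groups "images that shared a lot of context" with no tagging, purely from the reconstruction. One way to picture that, assuming each reconstructed 3D point records which photos observed it: two photos are neighbors when they see many of the same points. The names below are illustrative only, not Photosynth's API.]

```python
from collections import Counter
from itertools import combinations

def build_covisibility(point_observations):
    """point_observations: for each 3D point, the set of photo ids that observe it.

    Returns a Counter mapping photo pairs to the number of shared points, which is
    the "shared content" used to pick neighbors and visually similar shots.
    """
    shared = Counter()
    for photos in point_observations:
        for a, b in combinations(sorted(photos), 2):
            shared[(a, b)] += 1
    return shared

def similar_photos(photo, shared, top_k=5):
    """Rank other photos by how many reconstructed points they share with `photo`."""
    scores = Counter()
    for (a, b), n in shared.items():
        if a == photo:
            scores[b] = n
        elif b == photo:
            scores[a] = n
    return scores.most_common(top_k)

# Toy example: three photos of a clock tower plus one unrelated close-up.
observations = [{"clock1", "clock2"}, {"clock1", "clock2", "clock3"},
                {"clock2", "clock3"}, {"mural"}]
shared = build_covisibility(observations)
print(similar_photos("clock2", shared))   # clock1 and clock3 rank highest
```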
Blaise Agüera y Arcas, Rick Szeliski http://channel9.msdn.com/shows/Going+Deep/PhotoSynth-What-How-Why/ Photosynth: How, What, Why 2006 07 28 Video Interview Going Deep: Photosynth Charles Torre Photosynth: Process, History http://on10.net/blogs/laura/4187/ Photosynth: A global 3D world on your PC! Laura Foy Adam Sheppard Photosynth http://uwtv.org/programs/displayevent.aspx?rid=8282&fid=1483 Photo Tourism and Photosynth: UW CSE, Microsoft Research, and Microsoft Live Labs Create a Winner 2006 10 30 Video Lectures Annual Industrial Affiliates Meeting Paul G. Allen Center, University of Washington Ed Lazowska Noah Snavely, Rick Szeliski, Blaise Agüera y Arcas http://www.youtube.com/watch?v=DqxDjLrCCSk Photosynth demo at Web 2.0 Gary Flake and Blaise Agüera y Arcas giving a demo of Photosynth. Completely blew me away. 2006 11 09 O’Reilly Web 2.0 Summit Palace Hotel: Gold Ballroom San Francisco, California, USA Gary Flake, Blaise Agüera y Arcas http://vimeo.com/1866494 The Making of Photosynth Video Promotion|Mini Documentary It's very exciting to feel like we're really at the crest of this wave with the Photosynth project. A lot of things have come together. There's the basic research in computer vision coming from Microsoft Research and the University of Washington and there's this hunger and entrepreneurial spirit that we, that we certainly had at SeaDragon and that Live Labs also has, really as a, as a startup inside Microsoft. Photosynth is the combination of two technologies. One of them is SeaDragon and Photo Tourism is a project that Noah Snavely did as a graduate student at UW. So at the University of Washington side, Noah and I had been playing with ways of, ah, taking photographs of a scene from different camera viewpoints and trying to create smooth camera transitions between those two camera viewpoints. In parallel with this, Rick Szeliski at Microsoft Research was capturing large data sets from his own travels and he was also interested in creating 3D models and we kind of joined forces and that was the beginning of a very fruitful collaboration. When I first saw, kind of, the first 3D reconstruction which is, um, of a town square in Prague, I got really excited and, you know, this is kinda my, my first major project that I've been working on at UW. Noah was able to download this set of images and automatically reconstruct the 3D model that we could visualize, ah, in 3D and this, this was reconstructed purely from photographs on the internet. On the SeaDragon part was, of course, Blaise who envisioned this world in which you wouldn't have to think about resolution anymore. This is 'Bleak House'. It's the entire book, or if we zoom in deeply we see this kind of progressive resolution. That's real text. It's not, it's not an image of text which I'm proving by zooming in like that. This would be a, a terapixel image if we had actually digitized it or... and were rendering it that way. We have an institute that's at the intersection between lots of these different groups and so being able to straddle multiple worlds, I think it helps us to think about problems in an entirely new and different way. Photosynth - it's been such a pleasure to see evolve from week to week, getting simpler and easier for people to use, but also we keep discovering new ways to up the "Wow!" factor. Making those photos really paste onto the environment in a much richer way and make that environment be about more than just a point cloud, but really something more game-like. 
In the long term I'm hopeful that this will be a new visual medium just like photography and video where people can basically create photosynths in surprising ways that no one has done before. People are going to enjoy and share their photographs in a much more creative and immersive way. Finding relationships between photos will allow you to explore them in the same way that you can just navigate through the web but in a much more visually compelling way. If you want to take one of your photographs and see, "Who else took photographs from this scene?" "What does this scene look like from different viewpoints?", you can now do that. But you can also transfer information between photographs. If you do a search on a photo service for the tag "Rome", for example, then about 80% of the pictures that you find there are liable to register with other pictures that have something rigid in them and that really raises the possibility of these massive interconnected environments of Rome in 3D, built of hundreds of thousands of users' photographs. We're doing things that are almost like mash-ups in the browser in real time. This is an example of where the whole is really greater than the sum of the parts. I'm very excited about where this is going in the future 'cause I think it's an example of how research can make the bridge from an academic setting where we did it at the University of Washington and Microsoft Research directly into products at Microsoft and it's the new structure of Live Labs that enables that to happen. We have the university, we have Microsoft Research helping to do some fundamental work, then we have Live Labs prototyping all these things and bringing them out to the market so that we can actually bring them to the public. I can't wait to see what happens with this.
Blaise Agüera y Arcas, Steve Seitz, Noah Snavely, Rick Szeliski, Gary Flake http://www.podtech.net/scobleshow/technology/1219/the-best-demo-at-web-20-summit-microsofts-%20photosynth Demo of the year: Microsoft's Gary Flake shows off Photosynth 2006 11 10 Robert Scoble - The ScobleShow So, who are you? Gary Flake - Microsoft My name is Gary Flake. I'm a technical fellow at Microsoft and founder and head of 'Live Labs'. Robert Scoble - The ScobleShow 'Windows Live.' Gary Flake - Microsoft No, no, no, 'Live Labs.' Robert Scoble - The ScobleShow I heard. Gary Flake - Microsoft Yeah, it's just Microsoft Live Labs. Robert Scoble - The ScobleShow So, you are the hit of the O'Reilly Web 2.0 Summit conference this week? Gary Flake - Microsoft Well, I am glad to hear you say that. Robert Scoble - The ScobleShow Well, I arrived this morning and like ten people came up to me, did you see Photosynth? Gary Flake - Microsoft That's good; I am glad to hear that. Robert Scoble - The ScobleShow So, that's from Live Labs? Gary Flake - Microsoft Yup, from Live Labs. Robert Scoble - The ScobleShow Okay and you are going to show it to me? Gary Flake - Microsoft Yeah, so, it's right here. So, let me, let me -- before we dive into this, let me give you a little bit of back story. The project is actually the merger of two different technologies that were developed independently. One was this Seadragon Technology, which came from a company that I acquired back in January. What Seadragon does is, it's basically a Client Server Technology that allows for very efficient streaming of high-resolution, big, big chunks of data, but in a very efficient manner. That's one piece. The other piece was a breakthrough piece of Machine Vision Research that was done in collaboration between University of Washington and MSR. And the researchers there -- one was a graduate student, by the name of Noah Snavely; his professor was Steve Seitz, and our researcher within MSR is - I'm blinking on people's names - Rick Szeliski. Thank you. Anyhow, when we saw what that research team had done in terms of this project that they called 'Photo-tourism', it was doing some very beautiful things in terms of stitching photos together automatically and especially relating them to one another. What really got us exciting was -- excited was, when we thought about combining that with the Seadragon Technology in combination, it came together as sort of a complete solution for -- how would you make this as a web service, where you had gigapixels of data that was remote on the server, but you want to give a very fluid environment. So, I'm actually, I'm going to start off in a home position right now. Robert Scoble - The Scoble Show Let me just set down my tripods. So, I can have... Gary Flake - Microsoft Yeah, you can steady. Robert Scoble - The ScobleShow Get a nice steady image for you. Actually, hold on just a second. I'm ready now. Gary Flake - Microsoft So, what we're going to do is we're going to fly around first, just to kind of get an overall perspective. And, what you see here is a Point Cloud that is constructed by taking a look at all of the photos and finding out what points they have in common. So, that's where the Point Cloud comes from. I'll do that one more time, just so that you can get the bird's eye view, I'll pause it. You could see that it constructs a rough approximation of a 3D model that stays. Now, we can continue flying down and drop ourselves into the middle here and basically navigate the space.
Now, I'm just going to pick an arbitrary spot here and start walking around the square a little bit. And in fact, I can... Robert Scoble - The ScobleShow That way - this is taken from one image? Gary Flake - Microsoft No, no. I'm sorry. These are hundreds of photos that were taken by a photographer... Robert Scoble - The ScobleShow All right. Gary Flake - Microsoft ...and what we've done is, using that back-end technology that was built by University of Washington and Microsoft Research, we figured out how all those different photos spatially relate to one another to effectively create a 3D Model of the world, and then how those photos actually relate to that 3D Model. So, you're looking at the Point Cloud, which represents the 3D Model, and then now you're looking at an individual photo, and if I want to... Robert Scoble - The ScobleShow And, how many photos -- if I wanted to go out to the Golden Gate Bridge and do one of these or something like that, how many photos do you need to really do this well? Gary Flake - Microsoft Well, right now -- it depends on how big the space is. Right now we're looking at a rather big space and we've probably got on the order of about 200 photos right here, I am guessing. And so, the larger the space, the more photos that you need; however, dozens, 200s, 2,000s of photos actually do nice things, and so.... Robert Scoble - The ScobleShow How do these get in? Did somebody upload them with your tool? Gary Flake - Microsoft No, so it's actually wider way to net (ph). So, one of our program managers went on vacation. He took a whole bunch of photos. We dropped that into our servers to allow the algorithms to crunch on them and that is the step-work. It figures out how they spatially relate to one another; the Point Cloud is one of the outputs of that process. How the photos physically relate to that Point Cloud is another output. And, it's from that, that's all the information we need, in order to reconstruct this 3D environment. Robert Scoble - The ScobleShow So, how long did it take to put 200 photos in, like this guy did? What do you guys think? You guys have estimates? Gary Flake - Microsoft It's a long time, I don't know whether I want to go into the details. It's about eight to ten hours of CPU time. Robert Scoble - The ScobleShow Eight to ten hours. Gary Flake - Microsoft Yeah. So, it's currently -- it can be an expensive operation. We know that there are lots of ways of actually speeding it up and doing much faster things with it. So, I'm just going to, kind of, walk around a little bit here, and we can take a look, and for all these things -- remember I mentioned the Seadragon Technology before which is about really efficiently transmitting big objects over tight pipes, well, you are seeing it right here because we can zoom in and get a whole lot of detail, which is pretty fun. We can continue to step back, and step back, and step back, and so this gives a really immersive experience with respect to exploring a space. And, just maintaining that visual continuity has a really nice effect of helping the user to maintain context, where they are. The context switches are no longer hard; they are actually quite soft. So, take a look at this, I'm just going to tour around here, walk around, I can go back, I can occasionally step back, do a whole bunch of things like that.
We call this thing right here splatter mode, where, basically, when we go into splatter mode, it takes the current image that you are on in the 3D setting, puts it into the center, and ordered around it, spiraling out from the center, are photos that are visually related to that one. So, you can see we have a whole bunch of photos here that sort of capture part of that square and we can pick a different one and see a different perspective. For any of these photos, we can do that nice Seadragon eyes thing that I showed you before -- we're basically - now we're zooming in, and getting a lot of data all at once. Again, these are like 8 megapixel images. Now we can go in, we can go out... Robert Scoble - The ScobleShow And are these coming over the wire live or are these locally stored? Gary Flake - Microsoft For the purpose of this kiosk, we set this up so it's a local store, because we are sharing bandwidth with a lot of other people, but the experience that you just saw can be had by anyone using a broadband connection. So, I'm not showing you anything that isn't available today just by having a broadband connection. Robert Scoble - The ScobleShow Wow, and I can download this now, right. Gary Flake - Microsoft You can download this now, so if you go to labs at live.com/photos, and that's where the Add effect (ph) is downloaded. Robert Scoble - The ScobleShow Can I upload my own photos to it? Gary Flake - Microsoft We're not there yet and the... Robert Scoble - The ScobleShow So, it takes eight hours of process or so... Gary Flake - Microsoft Yeah, what we're working on right now is a number of things. One is a longer-term story, in terms of how we want this to overlap with lots of different products, and what's the true potential of this here because whenever people look at this, automatically, there is a whole bunch of use cases that they come up with. I see this, and I want to take my vacation photos, and I want to see how they relate to everyone's vacation photos. Someone else looks at this and they say, 'Oh my gosh!, it's the new way of selling real estate'. Someone else says, 'I want to take the photos of my child, as they've grown up, and watch them grow up in my house.' So, there are so many different ways of how this could evolve, that instead of taking the shortest path, or like the quickest product, we really want to take a slightly more thoughtful path to a better product. Robert Scoble - The ScobleShow Okay. Gary Flake - Microsoft So, that's our approach; so clearly, what's in the cards is, we had to figure out how do we make the community of photos available for many people, and I think, that's when a lot of the real mind boggling potential of this will come through. So, I'm also, what I'm going to do right now is, I'm going to show you a different collection, okay? So, right now, we're diving into the artist studio of a guy by the name of Barry Fegan (ph), and he's an artist local to Seattle, and he was kind enough to let us just come in to his studio and watch him work. And so, we can take a look at his studio, walk around, see what's going on. There are all sorts of things going on, so I'm just kind of taking a gander at what's going on in the studio, and we've come full circle now. But, now I'm going to turn back and notice that, as I move the mouse over, this indicates that some of these photos we actually have in much higher resolution; they are in fact registered in the same process.
So, I can take a look at this, step back or go back in, and then zoom in and get just an amazing amount of rich detail. Robert Scoble - The ScobleShow Wow! Gary Flake - Microsoft And so, again, like, Zoan, tell me how big are these scans of these photos? Zoan 80 megapixels. Gary Flake - Microsoft Yeah you are looking at 80... Robert Scoble - The ScobleShow Now, who are these guys behind you? Gary Flake - Microsoft Oh, these guys want to introduce themselves (Voice Overlap). Robert Scoble - The ScobleShow Well, they can't hear because you've got to introduce them because you've got the mike. Gary Flake - Microsoft Oh! We have Adam Sheppard here, he is a group program manager that heads up a lot of product planning and program management within Live Labs. We have Jonathan Duggy (ph), who is a program manager working specifically on Photosynth. We have Blaise Agüera y Arcas, who is an architect within Live Labs. Robert Scoble - The ScobleShow Excellent. Gary Flake - Microsoft Thank you. Robert Scoble - The ScobleShow Thanks. Awesome stuff, guys. Gary Flake - Microsoft Yeah, so you're looking now at an 80 megapixel image, streamed over. We can now step back and see it in its context again. I'm going to zoom back in just for a second, and we can continue to walk around, and occasionally we might say, oh, here's another photo over here that I want to dive into, and again, 80 megapixels. Now, you're getting the idea of that. Okay, we can do this for an artist studio. Imagine if we did this for a major museum, what sort of experience could you have walking through the museum, seeing the 100 megapixel version of something that you could only really get the -- a really suitable experience, if it was done in person, but now I can look at 80 megapixel objects over my DSL. Robert Scoble - The ScobleShow Right. Gary Flake - Microsoft That sort of thing. Again, there's just really, really rich detail throughout this whole collection, and a sense of spatial continuity that's preserved. I'm going to go back to -- towards Gary Fagan, and see if I can navigate a little bit closer, and see what's going on with him. And here, we see a little bit of what's going on as he's working, and this is really neat, because you start to see things that look a little bit like time-lapse photography, because he's sitting there working and painting on his artwork, and each time we go through a transition, again, we're preserving this spatial continuity of all the different photos. So for fun if you're interested, I'm going to do a couple more collections. This one is St. Peter's Vatican, and this collection is really nice, because from this collection we can get a sense of depth that doesn't really exist in the other collections. So, we started from afar, we work our way in, we're going further, further, and so now you're seeing that you can capture orders of magnitude of depth within one experience, and navigate through that experience in a fairly seamless way. That's incredibly compelling in terms of how you can connect the whole thing together. So, now I'm going to do something that's really unusual here, we're taking a little bit of a 3D, I'm sorry, a 360° tour of this. I'm now going to fly above, see what's going on here. I can take a look at what's on these different pieces, I'm going to keep moving, I'm looking for something special here, hold on. Okay, I want to get to the other side, all right.
So, we have a little feature here that we call frusta, where it allows you to see where the photographers were standing when they took the photo, and what photos they were taking. So you see here, I highlight this, and I'm seeing where the camera was pointed, and what they're taking the photo of. Well notice up over here, and over here, there's some people that would -- some shots that were taken from up high. So, we've now navigated this space, like taking a bird's eye view stepping back, seeing that there's a photographer up there, and now I'm going to click on this, and get that view of above. Now, I'm going to turn off the frusta by clicking on this camera icon, and now we can just see some of these other images that are from the higher level view. So one more collection I want to show, and I think this will kind of do it all justice. Robert Scoble - The ScobleShow Can I get to all these collections online when I download this? Gary Flake - Microsoft We've put four of them up initially, so this one is not available, but I believe the other ones that I've shown are. Robert Scoble - The ScobleShow Okay. Gary Flake - Microsoft So this is Grassi Lakes, and what's really -- actually let me -- I want to get to a particular spot here, so I'm looking, I want to see these climbers in action. I'm going to go ahead and pick this one, and then I'm going to go into 3D view, which I'm going to do right now. Now what's really interesting is, again you have this sense of time lapse photography, because you could see the progress of the climbers. Now, even though the camera has moved a little bit, we're preserving the spatial continuity in terms of what the climbers were going through, and now we can actually take a longer, larger view of what it was they were climbing, and how it spatially relates to the rest of the world. So again we're given entirely new experiences as to what's possible with respect to photos, and being able to see that context more seamlessly and more continuously. So you've been suspiciously silent, what does that mean, is that bad or good? Robert Scoble - The ScobleShow Stunned. Gary Flake - Microsoft Oh good, I'm glad to hear that, and again... Robert Scoble - The ScobleShow You stunned me into silence. Gary Flake - Microsoft That's great, that's good, I've never known you to be silent, so this is -- I consider this (Inaudible) renowned accomplishment. Anyhow, you see what's going on here, there's just -- again, lots of rich information, lots of fun. We think that there's a lot of potential here, we obviously haven't thought through all the different use cases, and the reason why we put this out seven months after starting it was because we wanted to actually start something -- the dialogue between our team and the rest of the world to figure out what were the more compelling things to do. Robert Scoble - The ScobleShow Very cool. Gary Flake - Microsoft Thank you. Robert Scoble - The ScobleShow Very cool guys. This rocks. Gary Flake - Microsoft Thanks.
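[Compiler's note: the "frusta" view Flake demonstrates draws, for each registered photo, the pyramid from the camera position out through the image corners. Given a camera pose and intrinsics of the kind the reconstruction produces, the corners of that pyramid at a chosen depth can be computed as below; this is a generic illustration, not the viewer's actual rendering code, and the example intrinsics are made up.]

```python
import numpy as np

def frustum_corners(camera_matrix, rotation, position, width, height, depth=1.0):
    """World-space corners of a photo's viewing frustum at a chosen depth.

    `rotation` maps world to camera coordinates; `position` is the camera center
    in world coordinates, as a reconstruction would provide them.
    """
    k_inv = np.linalg.inv(camera_matrix)
    corners_px = np.array([[0, 0, 1], [width, 0, 1],
                           [width, height, 1], [0, height, 1]], dtype=float)
    rays_cam = k_inv @ corners_px.T            # pixel corners back-projected to rays
    rays_cam = rays_cam / rays_cam[2]          # normalize so camera-frame z == 1
    rays_world = rotation.T @ rays_cam         # rotate ray directions into world frame
    return position.reshape(1, 3) + depth * rays_world.T   # four corner points

# Hypothetical 8-megapixel shot (3264x2448) with an identity pose at the origin.
K = np.array([[2800.0, 0, 1632], [0, 2800.0, 1224], [0, 0, 1]])
print(frustum_corners(K, np.eye(3), np.zeros(3), 3264, 2448))
```

Drawing lines from the camera position to these four corners gives the white box-and-pyramid overlay described in the demo.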
Robert Scoble Gary Flake, Adam Sheppard, Jonathan Dughi, Blaise Agüera y Arcas http://blogs.msdn.com/billcrow/archive/2006/11/20/photosynth.aspx Bill Crow's Digital Imaging & Photography Blog: Photosynth 2006 11 20 Web Log Entry --- Bill Crow http://on10.net/blogs/jesse/Photosynth-and-Seadragon-offer-a-glimpse-at-the-next-great-UI/ Photosynth and Seadragon offer a glimpse at the next great UI 2007 01 07 Channel 10: Jesse Lewin Smith Tower? Or is this at CES? Jesse Lewin http://on10.net/Blogs/bgauth/photosynth-with-bryan-ressler/ Photosynth with Bryan Ressler 2007 02 26 Channel 10: Benjamin G Benjamin Gauthey Bryan Ressler http://www.ted.com/index.php/talks/blaise_aguera_y_arcas_demos_photosynth.html Blaise Aguera y Arcas demos Photosynth ( The TED TALK ) 2007 03 10 Video Lecture What I'm going to show you first, as quickly as I can, is some foundational work, some new technology that we brought to Microsoft as part of an acquisition almost exactly a year ago. This is Seadragon. And it's an environment in which you can either locally or remotely interact with vast amounts of visual data. We're looking at many, many gigabytes of digital photos here and kind of seamlessly and continuously zooming in, panning through the thing, rearranging it in any way we want. And it doesn't matter how much information we're looking at, how big these collections are or how big the images are. Most of them are ordinary digital camera photos, but this one, for example, is a scan from the Library of Congress, and it's in the 300 megapixel range. It doesn't make any difference because the only thing that ought to limit the performance of a system like this one is the number of pixels on your screen at any given moment. It's also very flexible architecture. This is an entire book, an example of non-image data. This is Bleak House by Dickens. Every column is a chapter. To prove to you that it's really text, and not an image, we can do something like so, to really show that this is a real representation of the text; it's not a picture. Maybe this is a kind of an artificial way to read an e-book. I wouldn't recommend it. This is a more realistic case. This is an issue of The Guardian. Every large image is the beginning of a section. And this really gives you the joy and the good experience of reading the real paper version of a magazine or a newspaper, which is an inherently multi-scale kind of medium. We've also done a little something with the corner of this particular issue of The Guardian. We've made up a fake ad that's very high resolution -- much higher than you'd be able to get in an ordinary ad -- and we've embedded extra content. If you want to see the features of this car, you can see it here. Or other models, or even technical specifications. And this really gets at some of these ideas about really doing away with those limits on screen real estate. We hope that this means no more pop-ups and other kind of rubbish like that -- shouldn't be necessary. Of course, mapping is one of those really obvious applications for a technology like this. And this one I really won't spend any time on, except to say that we have things to contribute to this field as well. But those are all the roads in the U.S. superimposed on top of a NASA geospatial image. So let's pull up, now, something else. This is actually live on the Web now; you can go check it out. This is a project called Photosynth, which really marries two different technologies. 
One of them is Seadragon and the other is some very beautiful computer vision research done by Noah Snavely, a graduate student at the University of Washington, co-advised by Steve Seitz at U.W. and Rick Szeliski at Microsoft Research. A very nice collaboration. And so this is live on the Web. It's powered by Seadragon. You can see that when we kind of do these sorts of views, where we can dive through images and have this kind of multi-resolution experience. But the spatial arrangement of the images here is actually meaningful. The computer vision algorithms have registered these images together, so that they correspond to the real space in which these shots -- all taken near Grassi Lakes in the Canadian Rockies -- were taken. So you see elements here of stabilized slide-show or panoramic imaging, and these things have all been related spatially. I'm not sure if I have time to show you any other environments. There are some that are much more spatial. I would like to jump straight to one of Noah's original data-sets -- and this is from an early prototype of Photosynth that we first got working in the summer -- to show you what I think is really the punchline behind this technology, the Photosynth technology. And it's not necessarily so apparent from looking at the environments that we've put up on the website. We had to worry about the lawyers and so on. This is a reconstruction of Notre Dame Cathedral that was done entirely computationally from images scraped from Flickr. You just type Notre Dame into Flickr, and you get some pictures of guys in t-shirts, and of the campus and so on. And each of these orange cones represents an image that was discovered to belong to this model. And so these are all Flickr images, and they've all been related spatially in this way. And we can just navigate in this very simple way. (Applause) You know, I never thought that I'd end up working at Microsoft. It's very gratifying to have this kind of reception here. (Laughter) I guess you can see this is lots of different types of cameras: it's everything from cell phone cameras to professional SLRs, quite a large number of them, stitched together in this environment. And if I can, I'll find some of the sort of weird ones. So many of them are occluded by faces, and so on. Somewhere in here there are actually a series of photographs -- here we go. This is actually a poster of Notre Dame that registered correctly. We can dive in from the poster to a physical view of this environment. What the point here really is is that we can do things with the social environment. This is now taking data from everybody -- from the entire collective memory of, visually, what the Earth looks like -- and link all of that together. All of those photos become linked together, and they make something emergent that's greater than the sum of the parts. You have a model that emerges of the entire Earth. Think of this as the long tail to Stephen Lawler's Virtual Earth work. And this is something that grows in complexity as people use it, and whose benefits become greater to the users as they use it. Their own photos are getting tagged with meta-data that somebody else entered. If somebody bothered to tag all of these saints and say who they all are, then my photo of Notre Dame Cathedral suddenly gets enriched with all of that data, and I can use it as an entry point to dive into that space, into that meta-verse, using everybody else's photos, and do a kind of a cross-modal and cross-user social experience that way. 
And of course, a by-product of all of that is immensely rich virtual models of every interesting part of the Earth, collected not just from overhead flights and from satellite images and so on, but from the collective memory. Thank you so much. (Applause) Chris Anderson: Do I understand this right? That what your software is going to allow, is that at some point, really within the next few years, all the pictures that are shared by anyone across the world are going to basically link together? BAA: Yes. What this is really doing is discovering. It's creating hyperlinks, if you will, between images. And it's doing that based on the content inside the images. And that gets really exciting when you think about the richness of the semantic information that a lot of those images have. Like when you do a web search for images, you type in phrases, and the text on the web page is carrying a lot of information about what that picture is of. Now, what if that picture links to all of your pictures? Then the amount of semantic interconnection and the amount of richness that comes out of that is really huge. It's a classic network effect. CA: Blaise, that is truly incredible. Congratulations. BAA: Thanks so much.
TED 2007 Monterey Conference Center Monterey, California, USA Chris Anderson http://www.microsoft.com/presspass/exec/techfellow/Flake/05-08-2007MSNSAS.mspx Gary Flake: Next Generation Platform 2007 05 08 GARY FLAKE: So I have to apologize, the technical people here did me a little bit of a disservice by setting up my presentation first. So I'll have to do a little bit of driving in real-time here. I think I have the deck now, there we go. Now give me my stream back. There we go. Thank you very much. So I'm combating something here, we're at a natural point for a nice siesta. I imagine everyone is just kind of like, it's time to take a stretch. And I'm going to talk about some things that I think are pretty exciting and interesting. They may require a bit of thought, because I'm going to present what I think are some big ideas in a new way. But because I'm asking you to make such a big intellectual commitment, in order to get a payoff, I wanted to give you a little bit of a payoff in advance. So what I'm showing here is one-half of the technology that you saw this morning with integration of Silverlight and Seadragon. And so what we're looking at right now is about 1,000 photos. These are, in total, and actually it's more than just photos, there's a lot of mixed media here. There's about 16 gigabytes of data. And the interesting thing about this is, even though there's about 1,000 objects here, they are actually all independently maneuverable, so I could fly in and fly out, and pan around, and dive in deeply, and have a lot of fun with that; or I could lay it all out on the grid; or if I wanted to I could position it on the surface of a sphere. Not that this is necessarily a good way of looking at information, but it goes a long way towards showing what this technology can do. What is really exciting about this is that we believe that there are entirely new forms of advertising, new forms of media, that will emerge from this. So let me just show you what I just did. I just hit a button that basically made about 1,000 objects disappear off the screen, and a couple hundred appear prominently on screen. And you saw a version of a demo like this today as integrated with Silverlight. Now you're seeing it integrated with the larger Seadragon backend, and now we get to see the sort of infinite zoom advertising capabilities that were shown earlier, but done in a context of there being a lot more information here. So I'm going to go and take a look at the earth, dive in, take a look at low level streets. And, again, this is all one data set that I'm looking at, and if this had been stored remotely, or locally, it really doesn't matter. That's the thing about the technology that is somewhat special. So the last thing I want to show you on this before I go right on to the presentation is that this isn't just about images, we also handle a multitude of data types, including TrueType Fonts. So what you're looking at right now is the book "Bleak House" in its entirety. We can step back, see that the chapters are, in fact, columns. I'm not suggesting that you would actually want to read a book in this way, but to appreciate the amount of information that is contained just on this one little piece, this one snapshot, is actually quite incredible. And just to prove to you that this is, in fact, TrueType Fonts -- all this stuff is independently positionable, and I can rotate it around, and whatever, and we can step back out. OK, so you get the idea. And that was the whole point of that.
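[Compiler's note: the opening demo lays the same thousand objects out on a grid and then on the surface of a sphere; since every object is independently positionable, each layout is just a set of per-object coordinates. The small sketch below illustrates the two layouts; the Fibonacci spiral is a common way to spread points evenly on a sphere, and nothing here is taken from the actual Seadragon code.]

```python
import math

def grid_layout(n, columns=40, spacing=1.0):
    """Place n items on a regular grid, one (x, y) position per item."""
    return [((i % columns) * spacing, (i // columns) * spacing) for i in range(n)]

def sphere_layout(n, radius=10.0):
    """Spread n items roughly evenly over a sphere using a Fibonacci spiral."""
    golden = math.pi * (3.0 - math.sqrt(5.0))        # golden angle in radians
    points = []
    for i in range(n):
        y = 1.0 - 2.0 * (i + 0.5) / n                # runs from +1 down to -1
        r = math.sqrt(1.0 - y * y)
        theta = golden * i
        points.append((radius * r * math.cos(theta), radius * y,
                       radius * r * math.sin(theta)))
    return points

print(grid_layout(1000)[:3])     # first few grid positions
print(sphere_layout(1000)[:3])   # first few sphere positions
```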
With that as a little bit of inspiration, I would like now to dive into the presentation. What I want is to spend the bulk of my time talking about today, and afterwards we will actually go into I'm hoping that's presentation mode. No. I have to apologize, I can't actually yes, there we go. I'm going to talk today about the Internet, and why it's important, and the modest claim that I'm going to make is, despite all of the hype that has been somewhat omnipresent with the Internet or the past couple of years, we are still fundamentally undervaluing the total proposition that it represents to society, and all the various industries that connect to it. So I'm going to first give you a sense of what my motivation is. Whenever you're talking about, whether you're an individual, or a company, or a society, you know, what's important and what's not important. It's important to really have this sense of navigational north, where do we want to go as a society, or an industry, or otherwise. And having this sense of navigational north allows us to sort of focus on that which is important or not important. So that's my motivation, I want to share with you that which I think is important about the Internet, and why it is fundamentally different. So I'm going to talk about this in primarily four different pieces. We're going to talk about four things that have been arguably over-hyped in the past, and we're going to put them together, and show how, in fact, the combination may arguably be undervalued. Following the bulk of this presentation, I'm going to then dive into some additional demos that are built off of the one that you just saw, and I'll get a chance to actually show you what I think are things that point to entirely new advertising models, new advertising products that will introduce entirely new value to merchants and consumers. So let's start with long tail, and I will try to keep the technical things brief, and also keep it fun at the same time. So long tail is something that is often used to characterize a collection of things in terms of how they relate to one another. And we're used to talking about things like the head of the distribution, or the tail of the distribution. Intuitively, you can think about the head as being the relatively small number of large things, and the tail as being the relatively large number of small things. So some examples, in the animal kingdom, we have a small number of whales, whales are very big. In the animal kingdom, we also have a vast number of insects, I'm showing an ant here, and what is very surprising, and the reason why we call this a long tail is that oftentimes the total weight or magnitude of the small things is bigger than the large things. That's pretty counterintuitive when you think about it, because it's the big things that are visible to us, but the small things that are invisible. So this not only applies to the animal kingdom, it also applies, say, to music, where we might have Britney Spears as sort of like the whale of music, I mean no disrespect there, my Uncle Harry here as representing one instance of the tail of the music industry, someone who is unsigned, or can't actually get his song heard, or maybe only is heard by people downloading free MP3s, that sort of thing. And the point is that there are more Harrys than there are Britney Spears. And depending on how we architect the Internet, we may, in fact, find them to be more visible or adding more mass to the total value. This also exists for the Internet as a whole. 
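A toy illustration of the "weight of the tail" claim above, not from the talk: under a Zipf-like popularity distribution, the combined mass of the many small items can exceed that of the few big ones. The item count and exponent here are arbitrary assumptions.

```python
# Toy long-tail illustration: the 100 biggest items versus everything else.
def zipf_sizes(n_items, exponent=1.0):
    # size of the k-th most popular item, proportional to 1 / k**exponent
    return [1.0 / k ** exponent for k in range(1, n_items + 1)]

sizes = zipf_sizes(1_000_000)
head = sum(sizes[:100])    # the 100 biggest items (the "whales")
tail = sum(sizes[100:])    # the other 999,900 items (the "ants")
print(f"head mass: {head:.2f}, tail mass: {tail:.2f}, tail/head: {tail / head:.2f}")
```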
Yahoo is an example of something that's in the head of the distribution. PunkRockKnitters.com in the tail. And, again, this is sort of separating the mainstream big things versus the very small. The subtlety here is that a long tail is actually a macroscopic phenomenon, it is something that emerges from the behaviors of many smaller things coming together. And in particular what a long tail is about is the barriers to entry dropping across the board. So what we found with the Internet is that barriers to entry have been dropping, creating more participation, yielding a blurring of the lines between the big guys and the small guys in a variety of domains. We see some of these domains are inclusive of content, commerce, communication, code, and some of the examples here really help make the point. Digital photography has made everyone an amateur photographer that can actually, you know, if you have a hit and miss ratio of 1,000 bad photos to one good one, now with digital photography you can actually get the occasional good photo and share it. The other examples around content are numerous, how we're just lowering the barriers to participation by allowing many people to produce lots of things. Commerce is a similar story around eBay and v-stores, and paid search, and Yahoo stores, and Office Live, and all of these other technologies have basically turned a set of industries where the barrier to entry was thousands of dollars a month, even if just for a T-1 line, to something that is often free, or maybe $5 a month. That kind of democratization is pretty exciting. Similar story around communication, where we have e-mail, VOIP, instant messaging, social networks basically increasing the efficiency with which connections are made. And I have a little personal story about this, I actually met my wife about eight or nine years ago on Match.com, and this was a time when you had to lie about how you met. And now we can actually say with no embarrassment, in fact a little bit of pride, that we met on Match.com, because we were ahead of the curve. But eight years ago, we literally had the cover story of, oh, we actually met in a bookstore, and that whole thing. So, again, we see elements of the Internet that are not mainstream becoming mainstream. Finally, I would just call out code as another example, because we're seeing in a fundamental way that software development practice is changing. It used to be that in order to experiment on how to build a better search engine, you had to build a vast infrastructure just to get into the game. And now with the advent of Web APIs, and Web services, and other things, we have the ability of piecing together these components in such a way that an individual can actually experiment on the infrastructure that formerly had only existed for big corporations or big universities. The real point, more power to more people, lower barriers to entry for everything. Second part, network effects. We defined a network effect as just a phenomenon whereby the value of the network increases as a function of the number of participants. This is the typical definition that is used, and we often speak about the common examples here being telephones, instant messaging, e-mail, and other things. Well, this background suggests that new value is only a function of the number of participants. It turns out that the value of a network can, in fact, increase as a function of the amount of participation of a constant number of participants.
And so what we're finding is that as more people come online, as those people who are online are doing more things, the value of the network can increase as well. So individuals are generating, of course, user generated content, and we find that exciting, but it's not something that everyone online does, it's just something that some people do. But there's also metadata like tags and ratings. So if you are engaged in making a music playlist, or if you are just giving a thumbs up or a thumbs down to a movie, or saying a product review is helpful or not, those things are adding to the communal pool of knowledge. And even implicit actions, such as hyperlinks, and clicks through those hyperlinks, are adding more value to the ecosystem in terms of what we can do with that data. Thinking about this more broadly, we had mentioned before content, commerce, communication and code; each of these things is adding more value in the context of a larger catalogue of these things. They become more valuable with metadata, and more valuable because of the activity log associated with usage of those things. So as the Internet matures, and we see it evolve into something that looks a little bit more like a mirror of the physical world, we start to see elements of the physical world becoming instrumented online. So who we know is, in some sense, captured by social networking. What we know is captured in some sense by the knowledge that is embedded online. Where we live by maps, and imagery, and a demonstration that I'll show you in a couple of minutes called Photosynth will start to instrument the reality of what we see. Third of four points, Web search. So as a background, our thoughts around Web search as an industry are sometimes a little bit backwards, I'll claim. Let me get to that statement in a couple of moments and frame it just a little bit. We're used to thinking about Web search in a number of different flavors. There's paid, or sponsored search, organic Web search, vertical search, multimedia, the hidden Web, all these other different things that we think about as being parts of Web search, and we focus today mostly on Web pages, on a text query, and a text result set. My claim is that in some sense this is confusing the means and the end. Search is not an end in itself. It is a means towards an end. The end itself is discovery, that is the goal. We wish to discover things. Search is the way by which we discover new things. And so if you think about it in terms of this broader framing of the problem, then discovery needs to be demand driven. So I do a search as an example of discovery to find something that's of interest to me. It may also be something where we wish to have some serendipity. I may want to be pleasantly surprised. It may make sense to do it based on who I am as an individual. Now, more participation in this sort of ecosystem of search implies that there's more to be discovered, because as we mentioned before, as more people are generating user generated content, and there's more tagging, and more reviews and things like that, the communal pool becomes greater. But with more stuff, there's a risk that the value of search drops, because if you have more things there, and maybe the new things that are being added are of lesser or questionable value, then you have something of a needle in a haystack problem. How do I find the piece of information that is there, but now harder to find because of the big pool of information?
So there is a really compelling intersection between content, commerce, and community in terms of how all these things connect and intersect with one another. In fact, I'll claim that in combination each makes the other better and more valuable. So content becomes a means of expressing interest. If I like to consume information about baseball, and about mountain biking, and about high technology products as well as trends, then that helps describe me as a person in terms of what my likes or dislikes are. So merely the availability of a greater pool of content helps to describe me. Commerce is something where we find there are implicit quality filters that are in place, because of reputation, endorsement, and simple market signals; things that sell well over long periods of time typically do so because they have value to third parties. Things that don't sell well typically don't sell well because they lack that sort of value relative to the other options on the market. Those are valuable information signals that help inform what is valuable and what's not valuable. A community in this larger view of the world actually becomes something in which we get something of a community filter that helps to sift and sort, and vet quality things from the things that maybe should be ignored. The real point here is that, as search gets better, and these technologies in different domains, such as content, commerce, and community, mature, as more people participate, the better search gets at helping to ease the barriers for participation. So, the fourth of four points here. "The Innovator's Dilemma." So let me just give a quick background, or a definition of the innovator's dilemma. I know this is probably familiar to this audience, but I just want to define my terms for a moment. It's normally characterized by a pattern, and that pattern is: the first in an industry focuses on a small number of large, high margin customers. Later on, late arrivals have to focus on lower margin customers, because that is what is available to pursue. The late arrivals learn efficiencies, because they compensate for lack of margin with scale. Meanwhile, competition increases for all participants and margins shrink. The established companies rarely learn the efficiencies that younger companies grew up with. And the late arrivals win, because they can make optimizations and apply them to the head, meaning the smaller number of big customers. So that is a pattern that has been talked about in a variety of domains, it has been used successfully to analyze a number of different industries; Clayton Christensen, as I'm sure many people here know, has made a name for himself with a number of books talking about this phenomenon. The dilemma, the reason why we call it the innovator's dilemma, is that oftentimes the first in an industry, the innovator, must eventually destroy or redefine their own business before someone else does. And some of the more telling examples of this in the past would include Cray being killed by Silicon Graphics, Silicon Graphics being wedged by Sun, Sun being pressured by the PCs, and maybe who knows what the relationship will be between PCs and cell phones. So what gets really interesting, though, is when you talk about what the innovator's dilemma means as applied to the online world. So let's do a side-by-side comparison between the online and the offline world.
Offline there are huge startup costs to get into business; you might have to build a shop, put up a storefront. Online the startup costs are diminishing, it's becoming easier each and every day, and in some cases they've dropped to zero. Offline the aggregate size of the tail of your business is limited by physics, simple geography in terms of how many people you can connect to at a physical setting; that's limited. Whereas, online the aggregate size of your tail business can potentially be unbounded, or in fact, only in some sense bounded by the size of the world. Offline more business usually implies more employees; to put it another way, you have to work harder. Online more business may not require more employees, it just means that you have to work smarter. Which leads us to the fact that a quality product usually implies high touch in the offline world, but in the online world a quality product might, in fact, be a better algorithm. Innovation iteration in the offline world often follows product and business cycles, which can take years to unwind and unfold. In fact, arguably the innovator's dilemma with respect to the transition of the different hardware paradigms is something that has taken decades to unfold. Whereas, online the innovation iteration patterns follow a data flow cycle. In fact, we've already seen multiple generations of different companies pioneering an industry, having that industry be redefined out from under them, and then moving on to the next thing. So the real point to all of this is that when you consider how innovation happens in the offline world and online world, in the online world there's fundamentally a cross cycle, if you will, that is much more rapid. It's a much more liquid, fluid environment in which change happens. So we've now just considered four different pieces. I would now like to show you that the whole is greater than the sum of the parts when we put these things together, and that it is the combination that is, in fact, undervalued. So to recap, we talked about long tails, which was fundamentally about more power to more people, lower barriers to entry for everything. We talked about network effects, which is creating new value as a function of the number of participants, and the participation that they bring. We talked about Web search as being fundamentally about discovery, a means of discovery that gets better as more people participate, and also as their depth of participation increases. And we talked about Internet speed, or the innovator's dilemma, as being something which captures the non-physical aspects of the Internet, and how it increases the cross-cycle of change. The interesting thing is that there's a very compelling way of putting all these pieces together. So if lowering barriers to participation is fundamentally about simplified authoring and participation, and if network effects are about extracting new value from pooling combinations of different data sets, and data together in one place, then we can clearly see that the green circle follows from the red circle. That is, the more people participate, the greater the pool of data there is to collect and put in one place, and the greater the potential value. Now, as that pool increases and as there's more value that can be extracted, we have more ways of enhancing modes of discovery, and as we have the ability of enhancing more forms of discovery, that in turn feeds our ability to connect it back to the first red circle, in simplifying participation.
So this is a very abstract characterization, let me ground it with a couple of examples. And I forgot to mention the fourth point, rapid iteration rapidly iterating innovation cycles, making this reinforcement something of a snowball effect, or a virtuous cycle, in terms of how it all connects. So let's just talk about some simple examples of how we see this thing playing out. For music, users now are motivated, even if you don't play a musical instrument, to author, create, publish play lists of that which you like and dislike. That collection of information about user taste forms a collective pool from which you can tease out the pattern of, if you're this sort of person that likes this type of music, you'll have a tendency to like that music. And so teasing out those patterns, sifting out the cues that help inform what are the patterns that exist at the forests if you're looking at the level of the trees. That's fundamentally what's happening here. In turn, we can use that value, that new knowledge that's teased out, to facilitate new forms of discovery, which, in fact, encourages people to, once again, have greater engagement, and greater participation for authoring these types of things. We can use Wikipedia as an example, where fundamentally Wikipedia started out as nothing more than a very streamlined and simple way for multiple people to collaborate with one another, and author an entry, or edit an entry. That, in turn, created something that had greater value on the whole than it did individually. So we see that the collection of Wikipedia entries that make up the whole thing is more valuable together than they are apart. This, in turn, facilitates the ability of individuals to discover new things about the world, which, again, inspires them to be more proactive in interacting with that system. I'm going to show you something called Photosynth in just a moment, and this is yet another example. What I'm going to show you is that in the old way of looking at digital photos we had someone taking a digital photo, and the amount of personal utility that they got out of that photo was something that was just a function of their interest, and what they did independently. However, I'm going to show you how we can take a collection of photos and make an entirely new digital artifact, where the whole is greater than the sum of the parts, where there is entirely new, and non-linear value from pooling together lots of digital photos. This, in turn, allows people to explore and experience photos in an entirely new way, which, again, facilitates new forms of discovery. So let's get to the demos. OK. So the first thing I'm going to show you is a tour of St. Peter's. So what we're looking at here is approaching in to St. Peter's from afar, and I'm just going to walk through some photos and start maybe just walking around different portions of this. And some of you may have figured out what's going on already. We have a whole bunch of photos, but those photos are, in fact, organized in such a way that they are laid out relative to one another in a way that preserves the spatial relationship between the individual photos, effectively creating something of a 3-D virtual environment automatically from the collection of photos. Now, I have to emphasize this experience here was created automatically, with no human intervention. 
We basically took a couple hundred photos, we processed them with an algorithm, and the algorithms basically came up with an output that consisted of two things: one, how did the individual photos relate to one another; and two, what is a three-dimensional model of the universe in which these photos live. So, let me show you a couple things. I want to take a quick flyover, and you can see St. Peters from when I initially did a sort of flyover, and there we go. So, that is St. Peter's, and that is the 3-D model that was actually teased out from the individual photos. I've also just turned on the feature of Photosynth where the cameras of the photographers and their position where they're at are now actually displayed. So, these cones here represent photographers, where they were standing, and I can say, OK, where was this photographer standing, and I click there, and I see. And I can also make out, oh, there was someone up high; where were they? Oh, they were taking a picture up there. And then I can navigate around. And again experience, collections of photos in a way that no one has ever really done before. And I'm also noting that there's a photographer that was standing up here. Let's go back up here, and we can look out and see the entire plaza that makes up St. Peter's, and dive back in. So, let me call out the special things about this. So, you have photos. They have value to you. There exists a whole universe of photos that are either out in sort of a communal pool or other people currently have that they hold privately. The value that you get by putting them together is actually greater than the sum of the individual values, because you can tease out how they relate to one another. So, instead of thinking about what my vacation photos actually look like, we now have the ability of seeing my vacation photos in the context of your vacation photos in the context of the historical photos that existed. And this gets really exciting, and when you start thinking about all the different use cases. We have done this on interiors of houses, we have done this for the surface of mars, we have done this for space stations in the sky, we have done this for multiple locations that include both manmade artifacts, as well as natural landmarks. And in every case we are surprised by the new value that we've seen being able to explore a space in quite this way. And so that is a new way of experiencing photos. Now, to change gears just a little bit, I would like to show you how this same technology can serve as a way of surfacing an entirely new form of advertisement. So, what we have here is something that's not unlike what I just showed you. I have the Photosynth of the interior of a commercial space, shown here, and on the left I have some Web content that happens to correspond to it. So, we're in a place now called Fixed Design. And I can walk into the store, and I can take a look and see, yes, they seem to have a lot of products here related to bathrooms and kitchens and things like that. Now, what's interesting is that I just walk close to this region over here that has lots of faucets. And you notice that this content over here actually changed to correspond with it. And I can -- you know, if I'm interested, I might say, oh, let me take a look at this faucet. And when I do that, of course, the corresponding content, Web content over here changed in order to be consistent with that. 
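The talk does not say how the store demo is wired up; the following is a minimal, hypothetical sketch of one way the behavior could work, mapping the synth's virtual-camera position to the nearest tagged product region and surfacing that region's web content. The region names, coordinates, and URLs below are invented for illustration.

```python
import math

# Hypothetical product regions tagged in the synth's 3D coordinate frame:
# (name, (x, y, z) center of the region, URL of the matching web content).
PRODUCT_REGIONS = [
    ("faucets",  (2.0, 0.0, 5.0), "https://example.com/faucets"),
    ("sinks",    (6.5, 0.0, 3.0), "https://example.com/sinks"),
    ("lighting", (1.0, 2.5, 8.0), "https://example.com/lighting"),
]

def content_for_camera(camera_position):
    # Return the web content associated with the product region closest
    # to the current virtual-camera position in the reconstructed space.
    def dist(region):
        _, center, _ = region
        return math.dist(camera_position, center)
    name, _, url = min(PRODUCT_REGIONS, key=dist)
    return name, url

# As the viewer walks toward the faucet display, the sidebar content updates.
print(content_for_camera((2.2, 0.1, 4.8)))   # -> ('faucets', 'https://example.com/faucets')
```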
And I can again work my way around the space, navigate around, take a step back, look, and I say, oh, okay, maybe there's this sink over here, I can take a look at that. The whole point is we've talked before about technologies like Second Life for creating virtual environments for how people will want to interact with the world. This is something that maybe it's properly referred to as one and a half life. You know, if first life is the physical world that we interact with, Second Life is the virtual gaming world, this is somewhere in between where we have something of a synthetic world that's been created, but it is, in fact, married, it's consistent with physical reality, but also has the ability of being combined with the Internet and the Web content that's already there. So, for example, just thinking about the possibilities for what you can do for connecting the dots between a physical storefront and an online storefront, because keep in mind what we've effectively done here is created the ability to place hyperlinks from your physical store into your online store, and to connect the dots. So, every time you've ever been in a store, and you wish to have the ability of pulling out a rich set of information that was supplemental to the individual products, we now have the ability of surfacing that, giving you the best of both worlds. So, I will now switch gears again, going back to the presentation. You've got to love this; my favorite feature. So, again I had mentioned that Photosynth is another example of this phenomenon whereby, as I said before, users take digital photos, the collection of photos creates something of a model of the world, either something that is narrowly encapsulating a small piece of the world, or something maybe where we can get more aspirational and think about doing this for a significant part of the world. This in turn facilitates people to discover information in entirely new ways, and really kind of bootstraps the whole phenomenon of helping people to discover information. Even more longer term, we have the potential of seeing a similar phenomenon with respect to science, because people want to publish scientific statements. The body of scientific knowledge that's encoded online, that actually facilitates new forms of discovery to be realized, which feeds the whole thing. And this is just to show that this isn't merely about products, this isn't merely about advertisement; it's really profound and even speaks to a day in which things like the scientific method actually change. So, at this point there's a really interesting question that might be on some of your heads: How do we connect the past and the present and the future together in a way where we can sort of rationalize some of the trends that I've been forecasting here to the history of a company like Microsoft, the state of the Internet as it exists today? And I have a rather bold claim to make. And I think before this presentation, if I had made it, you would have not been too motivated to buy the argument, but I think now I may have laid a little bit more groundwork to help make the case. My claim is that things that are often held as encapsulating the Web 2.0 philosophy, or that which is important about the Web, are actually quite aligned and consistent with some of the fundamental values that have been Microsoft's core for decades. So let me talk about this. 
When we talk about long tails, the real point, the thing that made them important was more power to more people, lowering the barriers of participation. Well, we as Microsoft have really helped push forward a couple of different revolutions of this type with bringing out ubiquitous computing, putting a desktop -- you know, aspiring to put a computer on every desktop, and prioritizing things like desktop publishing, Visual Basic and [Microsoft] Excel macros, and other technologies that really helped democratize participation to bring more people into the fold.
PICNIC '07 Westergasfabriek Amsterdam, The Netherlands Monique van Dusseldorp Photosynth: Process http://www.pbs.org/kcet/wiredscience/video/86-photosynth.html PBS Wired Science: Episode 101 2007 10 03 Video Interview (TV Show) Wired Science Episode 1 Ziya Tong http://www.aiga.org/content.cfm/video-next-2007-blaise-aguera-y-arcas The Prehistory of the Metaverse 2007 10 13 KURT ANDERSEN Our next speaker is - he's done an extraordinary thing in a very short period of time. He started off, only a few years ago at the beginning of this century, as a Princeton Physics BA and Applied Math PhD. He has applied computer power in various interesting ways to neuroscience and history, among other fields. Three years ago, he founded a company called Seadragon to develop software to deal with massive amounts of visual information. Microsoft bought his company in 2006 and now employs him and his team in their sort of futuristic lab. He was the architect for what I believe was the first great product of Seadragon in his work, which is called Photosynth. It's the kind of software, and this just doesn't do it justice, you have to see it, that "takes a large collection of photos of a place or an object, analyzes them for similarities, and then displays those photos in a reconstituted three dimensional space." It's extraordinary. It is an unquestionable leap into the next - I think of that funny a cappella song in the 2020 sequence about our wacky Jetson's future. Well, this fellow makes you think it's closer than you think. I present to you, Blaise Agüera y Arcas. BLAISE AGÜERA Y ARCAS Thank you very much for that very kind introduction. It's a little bit sobering to follow Alex Steffen. The things that we're doing, that my team and I are doing, I think, are really not very important compared to those issues. But maybe there is some relevance in dematerializing that we can bring to that. What I'd like to talk about today is a little more than just Photosynth the tech preview, which probably some of you have seen, since it's been on the web now for some months. A lot of us grew up reading science fiction. And in particular, if you're my age, you might have grown up reading a lot of Cyberpunk in the '80s and then the early '90s. And there was this idea in the air that when computers began to connect together and network in a very real sense, we would start to see cyberspace. We would start to see a kind of mirror of reality behind the screen, and then when the web came along, it was sort of a surprise. You know, it was so simple. It's just text documents with some underlined words and hyperlinks. And it was both very simple and very beautiful, and very within reach. It was surprisingly within reach and that's why it changed everything so quickly, I think more quickly than anybody really realized was going to happen, because it was so much simpler and because it made use of things that we already had and we already knew how to deal with. And as the years passed, those kinds of ideas about, you know, virtual reality visors, and goggles, and all sorts of things like this, kind of faded away. And cyberspace, these days, has come to mean mostly just being online, I suppose. But a lot of what has motivated us and a lot of what we do is about bringing back certain aspects of that original vision, in thinking about what's missing, what pieces are missing, and what we can do to bring some of that original vision back, I suppose. In Ballard, on the waterfront in Seattle - and that's where we are now.
This is the Smith Tower. When it was being built, I think in 1914, I think it was the tallest building outside of Manhattan at the time. So we're not on the main campus in Redmond. We were actually going to move here originally before Microsoft bought us and part of the deal was we'd still get to move in here and keep a certain amount of independence. And it's worked out really beautifully so far. But having said that, what we came to Microsoft with is actually much less than what we have now. And a lot of that much less is so because of collaborations that we've been making with Microsoft Research and with the University of Washington, which Microsoft Research works with very closely. This is a story - well, let me not jump ahead of myself. Let's have a quick look first at what Seadragon did, and I don't want to spend too much time of this one, but this is the Seadragon engine, and some of you have probably seen this before in demos online. But the idea is that you have, in this case, quite a large number of, not a very large number, just a few hundred, photos, and these are in the six to eight-mega pixel range for the most part. Some of them are larger panoramas. We're accessing them all very fluidly. It almost doesn't matter whether you're looking at these locally or remotely. That is, whether the contents is on the hard drive or whether it's somewhere out there in the cloud on the web. And the reason for that is that the way we think about interacting with documents on computers nowadays is, I think, a little bit broken in the sense that we think about an application that opens a document that means reading the whole thing into memory. And then if it's too big, well, it takes a long time. And if it's too big to show on the screen all at once then we have all of these scroll bars and other sorts of affordances that let you look at pieces of it. But you're not exploiting the fact that either you can see many objects at low resolution, or one object at fairly low resolution, or you can see a part of an object at very high resolution. But you can never see the entire object simultaneously at very high resolution. The amount of information you can see at any given moment is constrained by the number of pixels on your screen. So why not think about documents as being sources of information that you interact with and this is something we've seen in mapping software. For example, we've seen it in online mapping software in the past few years. But we really propose to bring that kind of capability in a very generic way to all sorts of documents and all sorts of interactions. This is one that I always enjoy showing off. It's "Bleak House" by Dickens. Every column is a chapter. And I'm not really saying that this is a good way to read an e-book, but it does show that the architecture is very generic. You can use these kinds of techniques to interact with documents whether there're images, or text, or more complex things. This is an issue of The Guardian that shows off some of the advertising possibilities for things like this. I think if we took this kind of approach we could do away with things like popup ads that are irritating and that are there because of the idea of limited screen real estate, which is really a glass ceiling, of course. So in this case, we've taken a real issue of The Guardian, and it's nice to read with this kind of experience because things like newspapers and magazines really are inherently multiscaled experiences. You have headlines, and small type, and so on. 
And in this case we've added a fictional ad that has some even smaller type than you can fit into the print publication, and even smaller type, and even smaller type, and we can see what the CO2 emissions are of this car. The reason that I think this kind of approach is really interesting for thinking about rich content on the web, is that there's a real barrier to clicking on a banner ad. For example, if you're on a webpage and you have some dancing figurine, you're not going to click on it because that's going to take you out of the context of that webpage or that environment. It's going to force you to go somewhere else, or it's going to pop something up. There isn't the sense of continuity about the way we interact with information typically in computers that we get in real life. Here, we can just look more closely and do it in a continuous and natural way, and back out anytime we want. All right. So this was just a toy mapping application, which, I think, we won't spend time on. But one thing that I want to point out, it's a really interesting fact, that our typical computer, even a laptop like this one, a fairly modest laptop, has a hidden resource that we don't often think about, which is the graphics processing unit, the GPU. Almost all computers, nowadays, have one and by many measures it has a lot more power than the CPU does. It's almost like the tail that is about to wag the dog. And yet, aside from games, we don't really see very much use of the GPU, yet. And I think this is something that is really going to change quite dramatically in the near future. So what I'm showing you here is just that we can do all the same kind of Seadragon stuff in 3D and that was a capability that we just got for free because we're using the graphics hardware that way, with the same kind of richness of content and so on. Now, this was just a toy. Back when it was Seadragon we didn't have a good use for the 3D. And then we saw this. This was about, I don't know, about a month or two months after the acquisition of Seadragon, and when I saw it for the first time it just blew my socks off. It was a project by a graduate student, Noah Snavely. He was at the University of Washington. He was co-advised by Steve Seitz at UW and Rick Szeliski at Microsoft Research. Maybe it's worth actually just saying parenthetically, because people outside of computer science research don't really know this, I think. Microsoft Research is really a phenomenal place. It's about 15 years old and it's really become like the Bell Labs of computer science, nowadays. It's very different from what one usually thinks about when you think about Microsoft. At the last SIGGRAPH Conference, the big graphics conference that happens every year, somewhere between a quarter and a third of all the papers had a Microsoft Research coauthor, which is staggering, I think, a staggering number. Anyway, so at the MSR technology fair in February 2006 we saw Photo Tourism and this is the idea behind Photo Tourism. Noah took collections of photos. And in this case, this is a collection of a few hundred photos of Notre Dame Cathedral that he mined from Flickr and he made a three-dimensional reconstruction of Notre Dame Cathedral based entirely on those photos. I'll show this live in a moment. But that cloud of points that you're seeing in the background is Notre Dame in 3D reconstructed entirely from the photos and all of these little orange cones are where all the pictures were taken from.
Those are reconstructions of the positions of the cameras when all of those shots were taken. What you're seeing over here is the frustum or the viewing cone of one of those cameras. Photo Tourism had two different kinds of reactions, one from insiders in computer vision and one from everybody else. And from everybody else it was, holy shit, you can do that? It was just that nobody knew that it was possible to do this. And within computer vision the reaction was a little different. They also thought it was brilliant, but they thought it was brilliant for a different reason, because the idea of reconstructing 3D from images is actually one that has quite a lot of history. Just like the web, I think, just like the text documents, and the idea of hyperlinks, and so on, it has an academic history. It's been going on for many years. The first steps along that path happened in the early '80s, and so in that sense, Noah was really using off the shelf techniques to do his 3D reconstructions. The clever thing is that instead of trying to go for what has always been thought of as the holy grail in reconstruction of 3D from images, which is to do something like make a game level, make a Quake-like game level out of nothing but photos and video, instead, he and his advisors realized that there's a lot of value in the photos. And in fact, maybe the photos are more valuable than a Doom-like game level that has Notre Dame Cathedral in it. That the photos are actually what matters, and that by putting the photos into context, relating them to each other and connecting them, and building a user interface that's all about relating the photos to each other, you actually have something a lot more powerful than just a 3D reconstruction. And it has all sorts of interesting implications. So I don't want to spend too much time, I see I only have 12 minutes and 29 seconds left, so I don't want to spend too much time on the algorithm, but I do want to give you a bit of appreciation for what's actually going on here. So what you do with images, to start with, is you first do something called feature extraction, then you match those features to each other across images, and then you do 3D reconstruction based on the feature matches. The feature extraction technique we use was developed by David Lowe, at the University of British Columbia. He first showed it off, I think, in 1999. Again, it was the evolution of a lot of work going back to the '80s. These are pictures from a paper of his of two objects, a plastic frog and a plastic truck. And he learned the features, or he had his computer learn a set of features in these two objects, and then he put them into this cluttered environment, and they're partially obscured, and they're upside down, and so on, and the algorithm finds the objects. This is really the key to being able to do reconstruction of scenes from objects. You find features in the photos and then you find where those features reoccur in other photos even if they're tilted or at different scales or different angles. This is just another example of the same thing. And in this case, these photos that he used to learn features were taken from different points of view than this photo. But all of the occurrences of those features in the big photo are found. This is the same thing. When we do it, those are, I don't know how visible it is, but those are lots of little boxes, each representing a feature in this image of the space shuttle.
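The pipeline sketched here, find features in each photo and then match them between pairs of photos, can be illustrated with off-the-shelf tools. The snippet below is a generic sketch using OpenCV's SIFT detector and Lowe's ratio test, not the actual Photo Tourism or Photosynth code; the file paths and the ratio threshold are assumptions.

```python
import cv2

def match_features(path_a, path_b, ratio=0.75):
    # Detect scale- and rotation-invariant features in each image and
    # describe each one with a 128-number descriptor (SIFT, after Lowe).
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    kp_a, desc_a = sift.detectAndCompute(img_a, None)
    kp_b, desc_b = sift.detectAndCompute(img_b, None)

    # For each feature in A, find its two nearest descriptors in B and keep
    # the match only if the best is clearly better than the second best
    # (Lowe's ratio test), which discards ambiguous matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    candidates = matcher.knnMatch(desc_a, desc_b, k=2)
    good = [m for m, n in candidates if m.distance < ratio * n.distance]

    # Return matched pixel coordinates: the 2D observations that a later
    # reconstruction step would explain with 3D geometry.
    return [(kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt) for m in good]
```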
We've done some collaborations in Photosynth with NASA around doing reconstruction of the space shuttle and they've been really interested in using it for doing things like tracking the way the heat tiles get damaged over time and so on. So you take little features out of the images and each feature, each of those little squares that you are seeing, gets what's called a descriptor. And the descriptor is a series of numbers that describes what that feature is in a way that you hope is unique to that feature but that's robust to different points of view in different cameras. In other words, these numbers that correspond to what's in this box, and this, by the way, in David Lowe's original, was just a series of 128 numbers. The hope is that those 128 numbers are going to be very similar for different points of view on this same feature from taken from different cameras and by different users and so on. Okay. We'll skip the geeky stuff. You take a bunch of pictures, you get all the features in each picture, and then you match, and then you do a 3D reconstruction. So matching is just the process of taking all of the features that occur in all these images and finding their closest neighbors in other images, and therefore, figuring out when you have pairs of images that seem like they're looking at the same thing. And now, because you have features and you know where those features occurred in all of those images, you can solve a big system of equations that says well, where did those features have to be in three-dimensional space in order to land at the spots in the imaging plane where they landed on all those photos. Okay. So think about it this way, what you're solving for, what you're trying to figure out, is two things simultaneously. For each feature where was it in 3D space? We know where it was in 2D space in each one of the photos, but where is it in 3D space consistent with having landed at that 2D spot on all the photos, and therefore, where would all the cameras have to have been in order for them to end up there? And that's what it is. Well, when we first saw this, we immediately put our Catalan designer, who designed our cool Seadragon logo, on the job of figuring out what the Photosynth logo was going to be and came up with that. The reason that this seemed like such an interesting marriage was because all of this is about taking large corpuses of images and figuring out their relationships to each other. And in case the connection isn't totally obvious by now, of course, for dealing with large corpuses of images remotely, if they're very high resolution, and if there are many of them, and if you want to access them instantly, Seadragon is a very nice technology for doing this, and we were just waiting for the right way to do things like spatially arrange photos relative to each other. And this was just the prefect marriage. All right. So let me stop with all the PowerPoint stuff and pull up - now, this is live from the web. So any of you who have your laptops open, the wireless in here is pretty crap, so please close them so I get all the bandwidth I can. (AUDIENCE LAUGHTER) This is one of my favorite environments that's already up right now on the web. It's Gary Faigin's studio. He's the NPR art commentator in Seattle. When I click this button on this environment, we get something that looks a lot like a Seadragon view of all these images. And, if the network's willing, we see things refined in the Seadragon way. Some of you don't have your laptops closed yet. 
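The "big system of equations" described above is typically solved as a nonlinear least-squares problem (bundle adjustment): jointly adjust every camera pose and every 3D point so that each point, projected through each camera that saw it, lands where the corresponding feature was observed. Below is a minimal sketch assuming a simple pinhole camera with a single known focal length; real systems estimate more camera parameters and exploit the sparsity of the problem.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(params, n_cams, n_pts, observations, focal):
    # params packs one 6-vector per camera (rotation vector + translation)
    # followed by one 3-vector per point; observations is a list of
    # (camera_index, point_index, u, v) feature detections in pixels.
    cams = params[: 6 * n_cams].reshape(n_cams, 6)
    pts = params[6 * n_cams :].reshape(n_pts, 3)
    res = []
    for cam_i, pt_j, u, v in observations:
        rvec, t = cams[cam_i, :3], cams[cam_i, 3:]
        p = Rotation.from_rotvec(rvec).apply(pts[pt_j]) + t   # world -> camera frame
        u_hat = focal * p[0] / p[2]                           # pinhole projection
        v_hat = focal * p[1] / p[2]
        res.extend([u_hat - u, v_hat - v])
    return np.asarray(res)

# Jointly refine all cameras and all points from some initial guess x0:
# result = least_squares(reprojection_residuals, x0,
#                        args=(n_cams, n_pts, observations, focal))
```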
And then we can navigate these guys in 3D. Now, in addition to taking a bunch of snapshots around Gary Faigin's studio, what we also did was to take really high-resolution scans of the artworks and throw in (INAUDIBLE). And that means in some of these images we can, in fact, in all of the images on the walls, we can go down and look at the stitches in the canvas. So these are 80-megapixel images, at least. And that's true of all of these charcoal sketches on the walls and so on. Let's go have a look at that one. AUDIENCE MEMBER (INAUDIBLE QUESTION). BLAISE AGÜERA Y ARCAS Sorry. AUDIENCE MEMBER (INAUDIBLE). BLAISE AGÜERA Y ARCAS Yeah. Okay. You're asking about the cloud of points, right? So that's right. If we zoom out here, and we look at what's going on, this cloud of points that's been reconstructed is, in fact, all of those features that have been solved for in 3D based on that collection of photos. So that's the 3D model. You can see that Gary's hair over here really got quite a number of features. I suppose because it's got so many objects in it. Okay. So this shows you, I think, why it's interesting, why there are interesting sorts of vertical applications for this kind of technology. You can imagine all sorts of fun things happening with it. For example, let's see, let's pull up, oops, it doesn't work for me? Okay. Let's pull up this guy. This is an interesting commercial application, I think. It shouldn't really be installing at the moment of the talk, but it's okay. This is a small shop in downtown Seattle that sells kitchen and bath things. And what we did here is to take a bunch of photos inside the shop and hook it up to this web sidebar. So as you move through the environment you get things like ecommerce showing up on the left-hand side. We imagine that this is going to be a really interesting way to do things like take stores that have invested a lot of work in design in the physical sense, but that, maybe, don't have the resources to go and hire web design firms and so on to go and design custom sites for themselves, but still want to have an online presence and exploit all of the physicality of their assets, of what they have. It allows the authoring to be done entirely with a digital camera. So I think that's a very interesting application. And there are a bunch of other really interesting applications along these lines. But the real power of this stuff comes from what we get to do when lots of people take photos and start to synth them together. And when people are allowed to synth their own photos and also take their own synths and start to combine them and match them up with synths from other people. Now, I'm just pulling up Noah's original demo. I'm sorry, I should really be wrapping up, but I want to show this last thing, this last thing working. This is Noah's original data set. Sorry, it's a very early build of Photosynth. We had this working in that same summer and we didn't have the user interface worked out yet. So you can see this craziness of white boxes. But anyway, that's all the photos of Notre Dame Cathedral culled from Flickr. And it included a wide range of photos. Everything from extreme close-ups of little gargoyles in the archways, all the way out to, let's see, keep going, this is all the way across the river. And there are some interesting things in here I can't resist showing.
One of the things that happens when you start to match photos with each other is that you get - let's see, yeah, the UI really hadn't been worked out, at this point, very well yet. Here we go. This is a poster of Notre Dame and, in fact, it's recurrent because it's a poster of Notre Dame on Notre Dame. But we connect from the poster to the cathedral itself. And that's really interesting when you think about the sort of hyperlinks that you can make physically by taking pictures of environments, and putting them up on walls in other environments, and that becomes kind of like a wormhole or tunnel. So the kind of authoring possibilities and the sorts of things that you can make with this are really pretty interesting. For me, for my money, this is really the first time that I've seen something that I think is convincingly a platform that we can really build the Metaverse on, and it has that same sort of deceptive quality that the web did. It uses off the shelf parts; it uses nothing but the corpus of digital photos and other kinds of visual assets that we've been so busy collecting and putting online. You know, the materials are all there already and, in some sense, all we have to do is start to put it all together. And suddenly very large parts of the earth's surface, and in particular, the most interesting parts of it, for us as creations of culture, will start to knit together and start to connect, and it'll be something emergent and it will come from many people's images. So that's coming. Photosynth community is going to be released sometime next year, in the first half of next year, I think. So we're really looking forward to seeing what happens around it. What people do with it. Thank you very much. I'm not sure if there's time. (APPLAUSE) I know I was supposed to leave five minutes, but I left five seconds instead. KURT ANDERSEN Sure. You've got three seconds now. No. Beam me up. That is - every time I see it, it makes me feel stoned. So um, have designers or artists yet had access to the authoring, sufficient to be able to figure out applications that they might do with this? BLAISE AGÜERA Y ARCAS Well, I think probably not enough yet. But we're still refining. We did a preview of the front end of the viewer, of course, but we only had a few canned environments to put up. And so we haven't yet released the tools that actually let you knit things together. Developing those and making that really work robustly has been a lot of what the past year has been about, so soon, soon. KURT ANDERSEN I mean, it just seems like it is ready, as amazing a set of tools as it is, it's ready for 100 flowers to bloom, to discover its power in ways that I certainly can't predict. BLAISE AGÜERA Y ARCAS We hope so. And I don't think we can predict them either. KURT ANDERSEN From Microsoft's point of view, I mean, you made the Bell Labs comparison; there's all kinds of things Bell Labs did that they had no idea how they were going to become important five, ten, twenty years later. Do you have a sense, beyond going into a store and having the ecommerce available on it, of what kind of big practical applications there might be here? BLAISE AGÜERA Y ARCAS When I make these web comparisons, I guess, one of the things that I'm trying to say is that it's a little bit like, you know, if you were a VC and I were at a startup company called Mosaic, or something like this, and I wanted to make a pitch to you: well, what's the practical application, you know, of hyperlinked text documents in 1989/1990?
KURT ANDERSEN Right. BLAISE AGÜERA Y ARCAS You know, you can come up with a few things but it would look a little bit Jules Verne-ish. And you'd come up with some things and they'd be true, but they'd also be so much less than the reality of what that's become. It's created an entire ecosystem. And I think in a similar way, this is about creating an ecosystem in which all sorts of things can happen, some of which we can predict and some of which we can't. You know, when you start to hyperlink photos together and you start to create something that looks like the web, but in a geographic or geospatial form, you know, 1,000 flowers will bloom. I think that's a - KURT ANDERSEN Yeah. And the Notre Dame example is so amazing when it becomes this easily accessible global network of thousands of millions of images. When you mentioned the Photosynth community going online early next year, is that what that will enable? BLAISE AGÜERA Y ARCAS Yes. KURT ANDERSEN Uh huh. And the only sad and disappointing part of this, I'm sure, for most of the people in this audience, is that it's only available on Windows. (AUDIENCE BOOING) BLAISE AGÜERA Y ARCAS Well, actually - KURT ANDERSEN Will that change? BLAISE AGÜERA Y ARCAS I know that this is a really sore point, especially with the design audience. KURT ANDERSEN Yeah. BLAISE AGÜERA Y ARCAS And you should - well, I can't make announcements right now about exactly what we're doing. But - KURT ANDERSEN There's reason for hope? BLAISE AGÜERA Y ARCAS There's absolutely reason for hope. I will say this, we're really interested. (AUDIENCE CHEERING) KURT ANDERSEN Is this, in developing this and making it better and perfecting it, the focus of your work, or is there some next thing that in 2009 you will be able to show and blow everyone's socks off in a new fashion? BLAISE AGÜERA Y ARCAS Well, I hope, I hope so. Yeah. I have a few different projects that I'm overseeing right now. Live Labs has a bunch of things in the pipe that are very, very interesting. So I'm hoping that me and other people from Live Labs are going to have quite a few interesting things to show over the next few years.
Next: AIGA Design Conference Kurt Andersen http://grail.cs.washington.edu/projects/mvscpc/ Multi-View Stereo for Community Photo Collections 2007 10 14 Video Demonstration, Paper ICCV 2007 Rio de Janeiro, Brasil Michael Goesele, Noah Snavely, Brian Curless, Hugues Hoppe, Steve Seitz Photosynth: Future http://en.sevenload.com/videos/xNyNIRT-Blaise-Aguera-y-Arcas-Photosynth High Order Bit: Photosynth 2007 11 07 Web 2.0 Expo Berliner Congress Center Berlin, Germany Photosynth: Process, Early Work http://on10.net/blogs/nuric/Sultanahmed-Photosynth-Collections/ Sultanahmed Square Photosynth Collections 2008 01 04 Nuri Cankaya Photosynth: Prototype In Action http://www.youtube.com/view_play_list?p=2BFE7D2A85AB03DF&playnext=1&playnext_from=PL TechFest IIT Mumbai 2008 01 25 In 2004, Blaise founded a company, SeaDragon, to develop ideas in scalable architectures and new ones for interacting with massive visual information. Microsoft bought SeaDragon at the beginning of 2006. Since joining Microsoft Live Labs, Blaise became the architect for Photosynth as well as Seadragon. Outside Microsoft, Blaise has applied computational techniques in a variety of fields, including neuroscience and history. This talk centres on two technologies, SeaDragon and Photosynth. Here, we will explore what will happen when these new capabilities converge with the collective effects of Web 2.0. During the lecture, if you have any questions, please raise your hands. I now invite on stage, Blaise Aguera y Arcas. (Applause) As a token of our appreciation, Sir, please accept a bouquet. (Laughs) That's very kind; thank you. Thank you so much. I'm, I'm really, um... I'm really very flattered by how many of you have come to hear me, and uh... uh... thank you for having me out. It's, uh... it's my first time in India, um, and, uh, I'm really looking forward to spending, you know, a couple of days here; I wish it could be longer. So, um, I, I suspect that a lot of you are here because of the, the TED video which was seven minutes. I'm going to try, now, to, to give you, um, well a much more extended and much, uh, more technical presentation about some of the aspects of, uh... uh... of Photosynth, especially the computational vision parts, but, uh, but also maybe we can spend a bit more time, throughout, really talking about, about the implications of some of this technology, its origins, and, uh, and where it's going. So these, of course, are the, are the logos of, uh, um, Seadragon on the, on the upper left and Photosynth on the, on the right. (BREAK IN AVAILABLE RECORDING) ... because this is a, a new organization inside Microsoft, and one that, uh, that, that's quite interesting. It, it started, really just before... (Let's try and get rid of this, uh, message) Uh, Live... Live Labs was founded just before, uh, just before Microsoft acquired my company, uh, in 2005, and it's grown quite quickly to about a hundred people. It's... the idea behind Live Labs is that it's, it's a combination of, um, of research and, um, and development that's, uh, that's a bit unique in, in Microsoft. As, as many of you know, uh, you know, Microsoft, of course, is very well known for, for all of its, uh, uh, operating system and office products type stuff, um, and, uh, that's all well and good. Uh... There is also Microsoft Research, which, uh, which I, I personally find, um, uh, more exciting, I suppose. Uh, I... Microsoft Research is, is a much less public part of Microsoft but it's, uh, uh, it's much more like an acedemic department. 
And, uh, I, I don't know how well known this is, but, for example, at the last, uh, SIGGRAPH, which is the big graphics conference that's held in August of every year, uh, something like, uh, um, a quarter, maybe... maybe even as many as a third of the accepted papers at this conference had a Microsoft Research co-author, which is an absolutely astonishing number. So, a lot of, a lot of my, um, a lot of collaborators and a lot of people that I had known in, in various ways, uh, long before I worked at Microsoft are, are, are in MSR. Uh, but, of course, being more like an academic department, uh, and, and publishing so much, uh, it's, it's also, um... there was also a bit of impedance mismatch between, uh, between Research and all the product groups. Uh, if you go to a product from the product groups... uh, you know, Word, for example, is not exactly releasing, uh, all of this, uh, um, kind of cutting edge research. (BREAK IN AVAILABLE RECORDING) This is Seadragon. So, uh, I... I won't spend too long because I, I know that many of you have, uh, have seen this before, but maybe I'll show you a couple of new aspects of it. Um... Seadragon is a, is a multiresolution system for interacting with visual information. This is a, a client and server technology, and what that means is that it's, uh, it has, it has... it's a system with two ends. One, one end is, uh, is, is on a server and that actually is trivial. And the more interesting part of it is, is client software that allows access to this visual information. And when I, when I talk about visual information, I'm talking, both, about, uh, ordinary, ordinary digital images, for example, like this one. This is a, a six or eight megapixel image of one of the, um, Angkor Wat temples in Cambodia, I believe. Um... And, and also, um, and also more complicated kinds of objects. So, somewhere around here there's a, there's a very high resolution image and this is, this is, uh, this is by now a very old collection, so, uh... This is a, an image of about 300 megapixels from the United States Library of Congress. Um, and as, as you can see, the interesting thing... (Applause) No! Please... please, don't applaud. Ask questions, but, uh, but, but don't clap. (Laughter) Um, the interesting thing... so the thing to notice, of course, is that, is that the responsiveness of the software is the same, whether we're looking at a, at a, uh, at an ordinary digital camera image or at a very large image like this. And the reason for that is very simple. Uh, it's, it's because, it's not because of anything magic that the software is doing, but rather because of, uh, of a real mistake that I think is being made in the way, uh, the way images are normally dealt with on the computer. Normally, um... well I, I guess the way ??? images ??? from the very beginning is a kind of raster-based system in which you store all of the pixels in the image, starting at the upper left and kind of going in reading order until you get to the bottom right. And, uh, that's, uh, a ridiculous way of storing an image because it means that you don't know what the, what the bottom right of the picture looks like until almost the end of the entire image stream. And, uh, if the image is, uh, is very large then that could be a very long image stream and you, you might have to wait for a very long time, especially if the source of that imagery is over a narrow bandwidth connection.
Now, if you're actually interacting with an image on a screen of a, of a, of a PC, you don't need all of that information. You can either see an entire image at low resolution all at once or you can zoom in and you can see only a part of it at high resolution. You can't see the whole thing at high resolution at once. So the, the trick is very simple. It's just a matter of, uh, of decomposing the image into what's called the multiresolution pyramid and this is a very old idea, uh, which means that, that you, you have, um, tiles of information about the image at low resolution and then you, you break it up spatially into smaller tiles at, at, uh, at higher resolution. In fact, let me show you what these tiles look like. I've just turned on a switch that, that as, as it brings tiles in from the server will, um, will actually color these tiles with random colors. And you can see that as those tiles come in, only the tiles are coming in that were actually... that we actually need as we, as we zoom in on different, on different, uh, parts of this, of this little world. So, it's simply a dialogue. Instead of, instead of being, uh, um, instead of being some... instead of loading an entire image at once when we need it, we're only loading the parts of it that we need as we navigate. There's a continual dialogue happening between the client and the server. If you, if you think about this kind of approach to applications in general, I think it's a very powerful approach because it, it means that you're no longer limited, uh, in terms of how, uh, how much detail you can access; it's absolutely unlimited. Uh, the total amount of bandwidth, the total amount of memory, and the total amount of processing power that you need is constrained by the number of pixels on the display and that's it. And this is also a very powerful idea if you think about mismatches between the size of the imagery and the size of the display that you're working with. Uh... On the web, of course, we, we tend to only see images that are about 400 pixels across, at most, because, because that's about how big things need to be in order to fit on the screen, but if you're, if you're browsing using, uh, say, an iPhone, or, or a small device like this, then the screen, the screen is no longer the right size to deal with those images and so you start to have to, have to zoom, and Apple implemented zoom in a very beautiful way with the iPhone. Uh, but of course, they haven't changed... by, by doing that they haven't changed anything basic about the way the images are actually represented out there in the cloud. And if you, uh, if you instead do a more systemic change in terms of how those images are represented, then suddenly you're in a world where you can access the same kinds of images, the same kinds of visual information on very small screens or on enormous screens the size of the wall, uh, on low bandwidth connections, on high bandwidth connections; it doesn't matter. You've, you've broken all of those dependencies between, uh, memory, bandwidth, and so on. Um, okay, let's see, now I can... (um, I'd better turn off, um, turn off my special mode). This is, uh, so I put a couple of other kinds of content in this, in this, in this little, uh, collection, which I could show you. Uh... One of them is, um, content which is not, in fact, based on data, but based on calculation. Many of you will recognize this object; it's a Mandelbrot set. It's a mathematical object.
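A minimal sketch (in Python, not from the talk) of the tile dialogue just described: a power-of-two pyramid in which the viewer asks only for the tiles that intersect the current viewport at the zoom level it is displaying. The tile size, function names, and example numbers are illustrative assumptions, not Seadragon's actual implementation.

import math

TILE_SIZE = 256  # pixels per tile edge; a common choice, assumed here

def pyramid_levels(width, height):
    """Levels in a power-of-two pyramid whose coarsest level fits in one tile."""
    return max(1, math.ceil(math.log2(max(width, height) / TILE_SIZE)) + 1)

def visible_tiles(img_w, img_h, view_x, view_y, view_w, view_h, zoom):
    """Yield (level, column, row) for the tiles needed to draw the viewport.
    The viewport is given in full-resolution image pixels; zoom is screen
    pixels per image pixel. Work is bounded by the screen, not the image."""
    levels = pyramid_levels(img_w, img_h)
    level = min(levels - 1, max(0, levels - 1 + math.floor(math.log2(zoom))))
    scale = 2 ** (levels - 1 - level)      # image pixels per pixel at this level
    x0, y0 = int(view_x // (TILE_SIZE * scale)), int(view_y // (TILE_SIZE * scale))
    x1 = int((view_x + view_w) // (TILE_SIZE * scale))
    y1 = int((view_y + view_h) // (TILE_SIZE * scale))
    for row in range(y0, y1 + 1):
        for col in range(x0, x1 + 1):
            yield (level, col, row)

# A 300-megapixel scan viewed fully zoomed out touches only ~20 coarse tiles:
print(list(visible_tiles(20000, 15000, 0, 0, 20000, 15000, zoom=1280 / 20000)))

Each tile request is the client's side of the continual client-server dialogue; panning or zooming simply changes which (level, column, row) triples are asked for next.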
So this is an example of something that's being computed, rather than accessed over the, over the, uh, the internet. And uh, so this is just a little, a little game or a toy but it's, it's, it's designed to show you how this, this multiresolution approach is, is geared toward all sorts of situations, even where, um, where the, where the content you're looking at is not, uh, data-based. This is, um, this is the text of a book. It's, uh, it's a very famous, a very famous book, 'Bleak House' by Charles Dickens. And, uh, this... every, every column is a chapter. Now, I'm, I'm doing this, I'm zooming in on a, on a letter like this in order to show you, in order to prove to you, if you will, that, that it's... this is not just an image of the entire book of 'Bleak House' because if this were an image, then, um, it would have to be, um, several terapixels in order to produce this sort of resolution. So you know it's not, it's not an image. That doesn't fit on any, any, uh, on, on any hard disk that, that I know of. Um... (Error message appears in the middle of Seadragon's window) Blaise: (Laughs) (Laughter) Blaise: Alright. Sorry. Um... So what's going on here is that, is that we're changing representations at different scales. (Um... This is demoware, obviously.) When, when we're, when we're zoomed very far out of this book, uh, we're, we're dealing in a representation that doesn't represent the individual characters of text because, at this scale, when, when all you can see is, is less than a pixel for every character, it's not efficient to represent text as ASCII anymore. At this point it's, it's more like an image-based representation. Whereas if we start zooming in and we have an 'a' fill up the entire screen, at this point it's no longer efficient to represent that 'a' as pixels; we could just be sending one byte to represent the entire character 'a'. So there are changes of representation at different scales in order to maintain that sense of efficiency across scales. And, and what kinds of representations we pick at what scales depends on, um, on what the data is and on what you're, what you're doing with it. (BREAK IN AVAILABLE RECORDING) ... and it's more pleasant to read this way than it is in the, in the usual web format. We've also played some games with this, uh, with, with this issue of The Guardian by r... for example, replacing an ad with a, uh, with a very high resolution ad that's higher resolution than you would be able to see in, in a, in a print format. And, um, we've also made it a multiresolution ad, so this is just, you can just look... play interesting games like this. Um... other, other ca... other cars perhaps this fictional car manufacturer might have manufactured and even the technical specifications of the car and so on. Um... So you can do things with this that, that simulate physical media, uh, but also that go beyond simulating physical media. This is, of course, the most obvious sort of thing you can do with a multiresolution representation of visual information like this. It's a, it's a map of the world. Uh... It's, uh... I'm, I'm cheating here; it's not, it's not, um, uh, it's not a real mapping application. Uh... It's just a layout of a series of tiles. In, in three dimensions you can sort of see what we're doing there. Um... but the idea here is that there's a high resolution NASA image of a, of a, (the U.S. Space Agency) a high resolution image of the Earth. And, uh, and layered on top of that, uh, within the U.S. are a series of tiles of, um, of street maps.
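Because the Mandelbrot set in this demo is generated rather than fetched, a tile of it can be computed on demand for any rectangle of the complex plane. A small escape-time sketch in Python (the resolution, iteration limit, and example coordinates are arbitrary choices, not the demo's):

def mandelbrot_tile(x_min, x_max, y_min, y_max, width=64, height=64, max_iter=200):
    """Escape-time counts for one tile of the Mandelbrot set.
    Zooming in is just asking for a smaller rectangle at the same tile size,
    so there is no stored image and no limit to the available detail."""
    rows = []
    for j in range(height):
        y = y_min + (y_max - y_min) * j / (height - 1)
        row = []
        for i in range(width):
            x = x_min + (x_max - x_min) * i / (width - 1)
            c = complex(x, y)
            z = 0j
            count = 0
            while abs(z) <= 2.0 and count < max_iter:
                z = z * z + c
                count += 1
            row.append(count)
        rows.append(row)
    return rows

# Requesting a deeper "tile" is just passing a smaller rectangle:
tile = mandelbrot_tile(-0.7485, -0.7445, 0.1000, 0.1040)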
Now, uh, since we have, since we have the time, it's just interesting to have a, to have a quick look at what's going on here because it really is quite interesting. This representation of street maps and this way of doing street maps is very different from, uh, from the sort of thing that you see on the web, uh, in a number of ways. Um... One, one way is, of course, that when you're looking at street maps on the web, you, uh, you have discrete levels of zoom, typically, meaning that there are only certain zoom scales that are allowed. Uh... maybe in some of the fancier mapping applications, they use Flash or Silverlight or something to provide, uh, at least a continuous transition between those discrete levels of zoom, but it's all discrete. And here, it's continuous. But the other thing is that, uh, if you look at, at conventional mapping software on the, on the web, you'll find that, um, that the streets of different, of different types are, um, are only present at certain zoom scales and they disappear at other zoom scales. So, for example, if we're, if you really, if you look very closely at a map online, then you'll see all of the streets, but if you zoom out to here, you'll stop seeing the small streets and you'll see only the larger streets. And then out here you'll stop seeing even the ??? streets, and see only the highways, and so on. Now one of the things that you might notice in this demo is that every street is visible at, at every scale. Well, I'm sorry that the, the overhead projector is doing, is, is, uh, is, saturating this image a little bit which is, uh, preventing you from getting the full effect, but all of these little country roads are, are visible, uh, at least on my screen, uh, even, even out at this resolution, um, and that's a very nice thing, because... (Error message appears in the middle of Seadragon's window) Blaise: Sorry. ...that's a very nice thing because it means that, it means that you don't have any discontinuities in the available information, uh, across scales. There, there're some interesting tricks to doing that and they're tricks that I think are more broadly applicable than just street maps. I'll spend just a moment describing what's happening here. Uh... now, in general, if you think about mapping and streets, there're two obvious approaches to, to, to rendering a line, uh, a line drawing of streets on a map. One of them is to use the physical scale of the roads, meaning that, uh, meaning that, that the widths of these, um, of these streets are exactly their physical width in metres and if I were to superimpose an aerial photograph I would see that, I would see it overlap precisely with this image. If you do that then when we zoom out to this level, you would see nothing but white on the map because, at this point, even the largest streets would be, um, about a thirtieth of a pixel wide. Okay, so physical scaling only works when you're zoomed in. Now if, on the other hand, you're using the kind of approach that normal mapping software uses, and you say, "Okay, we're going to forget about metres and instead we're going to draw lines in pixels, and I'll say that a, a small street is one pixel wide, and a, an arterial street is two pixels wide, and a highway is three pixels wide, and so on," then, um, then at this scale it would look black instead of looking white. It would suggest a, a, a complete mess of streets; you wouldn't be able to make anything out.
So what's going on here, instead, for those of you who are, uh, mathematically minded in the audience, is that it's using a property of streets, which is that they, uh, have a, a, a fractal dimension. Okay, they're fractals, just like the Mandelbrot set that, uh, that I showed earlier. And, uh, what, what fractals, a property of fractals is that they have power-law scaling. So these streets also have power-law scaling in their width which is designed to invert their natural power-law scaling relationship to each other. I don't know if I'm making any sense to anybody in the room. (Laughs) You can think about it somewhat like this: As I zoom out, the streets are simultaneously getting thicker, but they're getting thicker more slowly than, uh, than, than my scale is making them get thinner. In other words they're getting thinner, but they're getting thinner more slowly than they ought to. And so it gives you the illusion that you're zooming in and out of a physical image. I mean, if I simply zoomed in and out of this image, I'm sure all of you would think that this really is just a, just a picture: just a pixelated image just like the other ones, but it's not. Uh... Something very different is going on as I zoom, and in particular, the different kinds of streets are behaving differently. The, the, the large highways are shrinking more slowly than the small roads and so on. So there are all sorts of tricks like this... I, I only show you this to give you a sense of, some of the, some of the sorts of games that you can play when scale is continuous and when you're allowed to play with, uh, with multiple scales of data however you want. Alright, so all of these demos that I've been showing you of Seadragon have actually all been based on this same collection of, of, uh, a couple of thousand objects and, um, they're all taking advantage of hardware acceleration on the, on the, uh, PC which is something that has been given to us by the gaming industry. Uh... It's, it's, uh, the story of how the internet and how computers have evolved over the past, um, few decades is very very interesting and often, uh, innovations are driven by, um, by very strange market forces. In this particular case, the market force that has driven the development of, uh, of graphics hardware on commodity computers has been the gaming industry. Uh... 3D games keep on pushing, pushing graphics and making them more realistic and more, and more high fidelity and so on, but there aren't many applications out there now that make use of that graphics hardware in interesting ways, other than games, and one of the things that we're trying to do is to, is to change that. (BREAK IN AVAILABLE RECORDING) ... across photos. In other words, you take, you take, uh, features from different photos, that are presumed to be of the same, of the same thing in, in the original three dimensional space and you, you find those correspondences. Then, based on all those correspondences, we do a 3... we do a 3D reconstruction. So let's take, let's take a look at, um, at first, the, uh, feature matching. This is a very brief history of how, how features in images developed. The, the, the first, um, the first important step in this was in 1981, so quite a long time ago now, by Hans Moravec. And he made something called a corner detector that he used for, uh, for stereo algorithms. Those of you who have done any projects in robotics, uh, will know something about what I'm talking about.
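The street-width behaviour described a moment ago can be written down compactly. One plausible formalization (the notation and exponent are assumptions for illustration; the talk does not give the exact rule):

\[
  w_{\mathrm{screen}}(s) \;=\; w_0\, s^{\gamma}, \qquad 0 < \gamma < 1,
\]

where $s$ is the zoom scale in screen pixels per metre and $w_0$ is a road's physical width. Pure physical rendering is $\gamma = 1$ (roads vanish when zoomed out); constant pixel widths are $\gamma = 0$ (the map turns into a black mess). With $0 < \gamma < 1$ the drawn width shrinks as you zoom out, but more slowly than physical scaling would dictate, so in map units ($w_{\mathrm{screen}}/s = w_0\, s^{\gamma-1}$) the roads effectively thicken and every street stays visible at every scale; giving major and minor road classes different $w_0$ or $\gamma$ makes highways shrink more slowly than country lanes, as in the demo.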
Uh, stereo is when you have two different cameras and they're offset by a small distance and you're trying to figure out something about the third dimension by comparing those two images with each other. If you want to do that, then there are various approaches to it and one of the best is to use, uh, to use correspondences of features in those two images. Uh... Moravec made, uh, a series of, of uh, filters, of ???-matching filters that could find good candidates for, um, for matching across images... things that remain relatively invariant. Uh... This was improved and developed on over the next several years. Uh... I think the most important step in this evolution probably comes, um, in 1995 when we go from just detecting features in images - just, uh, just points - to starting to have a descriptor connected with those features. And, uh, in the beginning that descriptor was just the window of pixels nearby. Uh... and, uh, and then these... and then, and then this description improved beyond just the pixels, but the point of those descriptors... (BREAK IN AVAILABLE RECORDING) ...[sub]tract adjacent levels of that pyramid. So that we get, uh, so we can get something that's like, uh, a differential operator as a function of scale on the image. Okay, so, uh, we now have a three dimensional function of, of space over the image, x and y, as well as scale - sigma. Okay, so, uh, so this, um, this is what I mean by scale space. Uh... It's, it's a Gaussian of, uh, x and y with standard deviation sigma, convolved with the image. Alright, and this is, this is the differential operator that you get when you, when you, when you subtract the adjacent levels of that pyramid. Now what you do is you find in that three dimensional function of x, y, and sigma, local extrema. In other words, uh, you find a pixel that's surrounded, or, or a voxel, I suppose I should say, that's surrounded by, um, values that are either all greater than or all less than it. Right, and we find the precise location of that extremum. Um... Alright, I, I, I fear that, I fear that if, if we go into too much depth here, we're, we're not going to get to the good stuff. Um... Let me try and give you a feeling for what's going on. We now look nearby - near that, uh, near that extremum at the, uh, at the scale that's been defined by, by the extremum. A.., at the, and we look, and we look at the gradients of the, uh, of the intensity channel, uh, of the image nearby. Okay, so we're looking at only the directions of the gradients. Uh, the reason we look only at the directions of the gradients, not at the magnitude, is that this is one of many levels of invariance that's built into the SIFT algorithm. If an image is, is more exposed than another one, we don't want that to have any effect on the algorithm. Okay, so an overexposed or underexposed image should result in the same, uh, in, in the same thing. So we, we throw away sca..., we throw away, uh, spatial scale by finding these extrema in scale space, we throw away scaling in the intensity by throwing away the magnitude of the gradient, and, and then we, what we do is we come up with, um, with a, a description of what the environment of that extremum looks like, in terms of the directions of the ma..., of the, um, of the gradient vectors in the image. Uh, and, uh...
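The scale-space construction the speaker is describing is the standard one from Lowe's SIFT detector; written out in the textbook form (not a transcription of the slide):

\[
  L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad
  G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}}\, e^{-(x^{2}+y^{2})/2\sigma^{2}},
\]
\[
  D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma),
\]

so $D$ is the difference of adjacent levels of the Gaussian pyramid, which approximates a differential operator in scale. Keypoints are the local extrema of $D$ in the three-dimensional $(x, y, \sigma)$ volume, i.e. samples whose value is greater than (or less than) all of their neighbours in space and in scale.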
The particulars are that you just, uh, you, you bin, uh, you bin the environment of that maximum into a, uh, into a, into a coarse grid and, uh, within each, within each grid, you make a, a histogram of about eight or so different orientations. And, um, and it's ac..., it's, it's the set of all of those, um, values of those histograms that, that give you your feature vector. Alright, so, if nothing else you'll understand that there's quite a lot of black magic involved in this. I, I don't think that when David Lowe, um, designed the idea of SIFT he did it by, by doing any kind of, um, mathematical derivation of something or numerical ???, it's just, it's his intuition about what kinds of invariances should be built into images to make things work. But, uh, it's intuition that he developed over the course of many years working on computer vision problems. And, uh... so what's the punchline? Well it works very well. Um... Here, here's, here's some pictures from, uh, from one of, from his 2004 paper. So this is not quite academic, but, but I want to, I want to try and give you a sense of what's actually going on here. He, he took his feature, uh, his feature algorithm and ran it on these five images. Okay, so this is an image of a, of a, uh, an Indian long house and these are some other images, uh, that are taken of parts of this long house from other points of view. So, for example, this is an image of the, of the totem pole that we see over here but taken head on, so, uh, it, it differs by more than thirty degrees from the original image. And, uh, the algorithm finds lots of features at different scales on this totem pole and is able to, is able to correlate those features with points on the, on, on the larger image and all these boxes over here are showing you where those features were, were correctly matched. Okay, here it is when we do it. This is an image of the Space Shuttle, uh, that, uh, we did, we did a very interesting project a few months ago, uh, with the U.S. Space, Space Agency, um, documenting the, the launch pad of the Space Shuttle (some of you may have seen it online). We find features, we extract feature vectors in the way that I've just described for each of those features. Okay, and the properties of those feature vectors attempt to be invariant to ???, resolution changes, cropping and contrast changes, illumination changes, image rotation, and to some degree, uh, three dimensional rotation of the point of view as well. Those describe edge locations and edge orientations. That's what, what we mean by, by the, the orientations of the mag??? to the gradients. And we hypothesize that if you have two feature vectors that are very close, in the Euclidean sense, to each other then they are very likely to come from the same feature out there in the real world even if they're in different images. So, having done that for one image, we find, we find a, an array of, uh, of features from that image, we do the same thing for a series of other images, and then we go to the matching stage. In the matching stage you just take all of those features and put them into some sort of a feature index... a, a, a database or a data structure of some kind that's able to efficiently find nearest neighbors. And, uh, uh, something like a, like a ??? tree would be a classic, classic structure that lets you do this. And then you look up the nearest neighbors for each descriptor. Okay, this is the, this is the ??? Powerpoint part.
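A minimal sketch of the matching stage just described, using a k-d tree of descriptor vectors and the ratio test from Lowe's 2004 paper. The descriptors are assumed to already exist as fixed-length vectors (128-dimensional here); SciPy's cKDTree stands in for whatever index Photosynth actually uses, and the 0.8 threshold is the commonly cited value, not necessarily theirs.

import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs matching rows of desc_a to rows of desc_b.
    A match is kept only if the nearest neighbour is clearly better than the
    second nearest, which discards ambiguous matches on repeated texture."""
    tree = cKDTree(desc_b)
    dists, idxs = tree.query(desc_a, k=2)       # two nearest neighbours each
    matches = []
    for i in range(len(desc_a)):
        if dists[i, 0] < ratio * dists[i, 1]:
            matches.append((i, int(idxs[i, 0])))
    return matches

# Random vectors standing in for real SIFT descriptors from two photos:
a = np.random.rand(500, 128).astype(np.float32)
b = np.random.rand(600, 128).astype(np.float32)
print(len(match_descriptors(a, b)))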
Alright so what that means is that, uh, I, I don't norm..., yeah, I, I, I stole these slides from somebody; I don't normally do animations and things in Powerpoint. Okay, so what this means is that we, we have feature vectors from different images that have now been correlated with each other and once that's done then you can do 3D reconstruction because, um, you can't generally take an arbitrary collection of 3D points and have them look one way from, from one point of view and have them look an arbitrary other way from another point of view. Okay, whenever you have more than about five or so points in 3D and they're imaged from two different directions then you have a system of equations that's complete or overcomplete and you can solve it. So, and you, and what you're doing is you're solving, simultaneously, for the three dimensional positions of all of those points and also for the camera positions and camera orientations - where those were. 'cause it's just a big nonlinear set of equations that you solve; that's, that's all there is to it. Uh... you, you do it, you can do this using ??? or a variety of other kinds of nonlinear equation solving techniques. You can do it incrementally, meaning that you start with, uh, say just two images and then you start adding images, assuming that you're on the right track, you, you have a, uh, you have a nice, uh, incremental solution that way. And then you estimate geometry and you get, you get, um, the 3D positions of all those points. And this is why, uh, when we look in Photosynth at, uh, at a, at a model in 3D, what you see is a cloud of points. Right? You don't, you don't see a, a real three dimensional model; you see a cloud of points. Each of those points that you see in the point cloud is, uh, is a feature that was identified in at least two images and which has been solved for in three dimensions. (BREAK IN AVAILABLE RECORDING) Um... Now, of course, I mean the, the thing with, with, with the algorithms in Photosynth is that, uh, you can only, you can only see what was photographed and you're, you're never going to get a 3D reconstruction of anything that wasn't in the photos. So, uh, you know, if you want to get an environment, um, thoroughly then you'll need, you'll need quite a few. Uh, that's especially true for indoor environments. It's one of the, uh, one of the things that we don't really notice about digital cameras these days is how narrow the field of view is. So if you try to capture any kind of, uh, any kind of inside environment, you'll find that even, even ??? will require probably about a hundred shots just to get, just to get anything. (Question being asked outside of microphone's range) Okay the question is, "Are we using Silverlight for the GUI of the app... (...on the client side) ... on the a..., on the client's application?" and, uh, I'm, I'm sorry to say that the answer is, "No." Uh... Now, we do have, um, we do have, uh, the basic Seadragon capabilities going into this next refresh of Silverlight which is very exciting, but, um, now, so you can do, you can do the scaling and the zooming and so on, now in Silverlight. But, um, what Silverlight, and, and for that matter, Flash also, both lack, at the moment, is 3D. Uh, and in particular, they both lack the ability to, to projectively texture, uh, images. So, until, until that's possible to do, it'll be very hard to make a, a, to make a Silverlight, uh, version of the real 3D viewer.
What, what the 3D viewer is, right now, is, uh, an ActiveX control, so it's all, uh, native code - binary code. (Question extension) Well, it's, it's an ActiveX control, uh, just like the Flash player itself is, or like Silverlight is, so you go to the webpage and you click 'Yes' and, and then it, it works in the webpage, but it, but it is a binary of a couple of megabytes that ends up getting installed, yeah. Um... And you can uninstall it in the usual ways. It works in both, uh, uh, Internet Explorer and Firefox as well. Uh, if I, uh, if I had my way we would also have, by now, a, a Mac port, by the way, uh, and for that matter a Linux port as well, but, uh, our, our, our development resources have been quite, have been quite limited and we're really much more concerned, right now, about just getting it out there in some form or other. So we, we supported both IE and Firefox because the underlying, uh, hardware is the same. They're both using Direct3D. The, the problem with things like, uh, making the Mac port is that it requires an OpenGL implementation as well; it's a very different architecture - a lot more work.
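The "big nonlinear set of equations" mentioned earlier in this talk, solved simultaneously for the 3D points and the cameras, is usually posed as minimizing total reprojection error. This is the standard bundle-adjustment objective, given here as a textbook formulation rather than Photosynth's exact internals:

\[
  \min_{\{R_j,\, t_j,\, f_j\},\; \{X_i\}} \;\;
  \sum_{(i, j) \in \mathcal{O}}
  \bigl\lVert\, \pi\!\bigl(f_j,\; R_j X_i + t_j\bigr) - x_{ij} \,\bigr\rVert^{2},
\]

where $X_i$ are the 3D point positions, $R_j$, $t_j$, $f_j$ are the rotation, translation, and focal length of camera $j$, $\pi$ is perspective projection onto the image plane, $x_{ij}$ is the observed image position of point $i$ in photo $j$, and $\mathcal{O}$ is the set of observations. The incremental strategy described in the talk starts from a two-photo solution and re-runs this minimization as each new photo is added.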
IIT Techfest 2008 Bombay, India ? Photosynth: Introduction, Process http://isohunt.com/download/50141740/CSI+NY+Admissions.torrent CSI NY Season 4, Episode 18, 'Admissions' 2008 04 30 Video (TV Show) 00:10:40 - 00:11:05 | 00:20:20 - 00:20:38 | 00:24:10 - 00:25:38 | 00:27:35 - 00:28:43 CSI NY Episode 'Admissions' http://research.microsoft.com/apps/video/default.aspx?id=103718 Interactive 3D Building Modeling 2008 05 01 00:43:20 - 00:55:55 Virtual Earth and Location Summit 2008 Building 99, Lecture Room 1919 Redmond, Washington Drew Steedly http://research.microsoft.com/apps/video/default.aspx?id=103721 Towards Reconstructing the World from Photos on the Internet 00:00:00 - 00:32:40 Evelyne Viegas Steve Seitz Photosynth: History http://www.cs.washington.edu/homes/snavely/projects/skeletalset/ Skeletal Sets for Efficient Structure from Motion 2008 06 25 Images, Paper Noah Snavely, Steve Seitz, Rick Szeliski http://research.microsoft.com/en-us/um/redmond/events/fs2008/agenda_tue.aspx Stitching the World and Embracing Real Life with Virtual Earth: Intelligent Webs of Photos (By Accident and By Design), Geo-Positioned Media within the Context of Virtual Earth 2008 07 29 Video Lecture, Slides Microsoft Research Faculty Summit 2008 Blaise Agüera y Arcas, Bill Chen http://phototour.cs.washington.edu/findingpaths/ Finding Paths through the World's Photos 2008 08 12 http://on10.net/blogs/nic/ShutterSpeed-Episode-02/ ShutterSpeed Episode 2 2008 08 16 00:01:25 - 00:05:20 Nic Fillingham Photosynth: Tangential http://www.microsoft.com/presspass/press/2008/aug08/08-20photosynth08pr.mspx Gary Flake: What Is Photosynth? 2008 08 20 Video Demonstration, Press Release Photosynth Release To Web Photosynth: News Sound Bites http://on10.net/blogs/nic/ShutterSpeed-EP04-The-Photosynth-Team/ ShutterSpeed Episode 4: The Photosynth Team Blaise Agüera y Arcas, Drew Steedly, Scott Fynn http://channel9.msdn.com/posts/Dan/Blaise-Aguera-y-Arcas-The-technology-behind-Photosynth/ Blaise Aguera y Arcas: The technology behind Photosynth Dan Fernandez http://channel9.msdn.com/posts/Dan/Drew-Steedly-and-Joshua-Podolak-on-Photosynth/ Drew Steedly and Joshua Podolak on Photosynth Drew Steedly, Joshua Podolak Photosynth: Future, Process http://channel8.msdn.com/Posts/PhotoSynth-created-by-STUDENT/ Microsoft PhotoSynth created by student at University of Washington Max Zuckerman Rick Szeliski, Steve Seitz, Noah Snavely http://on10.net/blogs/laura/PhotoSynth/ How To Synth Video Tutorial|Overview Gasworks Park, Marlow Harris' House Photosynth: Instruction http://blog.seattlepi.com/microsoft/archives/146692.asp Microsoft Photosynth, a guided tour Videos, Article Photosynth Launch Nick Eaton http://www.crunchgear.com/2008/08/20/photosynth-its-here-its-awesome-and-now-its-yours/ Photosynth!
It’s here, it’s awesome, and now it’s yours for free Devin Coldewey Ken Reppart http://www.foxbusiness.com/search-results/m/20727948/vacation-pictures-get-upgrade.htm Vacation Pictures Get Upgrade 2008 08 21 New York City, New York, USA | Seattle, Washington, USA Brian Sullivan http://www.pcworld.com/article/157205/navigating_photos_in_3d_with_photosynth.html Navigating Photos in 3D with Photosynth 2008 09 04 Video, Article Danny Allen http://vimeo.com/1894219 Appearance stabilization in PhotoTourism 2008 10 06 Pravin Bhat http://www.vexcel.com/geospatial/geosynth/pdfs/GeoSynth1009-ltr-WEB.pdf GeoSynth Datasheet 2008 10 29 Brochure GeoSynth Launch Vexcel Corporation Boulder, Colorado, USA http://www.microsoft.com/iplicensing/productDetail.aspx?productTitle=Microsoft+GeoSynth Microsoft Intellectual Property Licensing: Microsoft Geosynth http://virtualearth4gov.spaces.live.com/Blog/cns!369B39F890CE30C1!2807.entry Microsoft Announces GeoSynth™ Availability GEOINT 2008 Gaylord Opryland Nashville, Tennessee, USA http://research.microsoft.com/apps/pubs/default.aspx?id=75479 Interactive 3D Architectural Modeling from Unordered Photo Collections 2008 12 01 Paper Drew Steedly, Rick Szeliski http://isohunt.com/download/108941169/CSI+Miami+Head+Case.torrent CSI Miami Season 7, Episode 12, 'Head Case'
2009 01 12 00:12:03 - 00:14:14 CSI Miami Episode 'Head Case' Photosynth: Direct3D Viewer http://www.thearf.org/assets/rethink-09-video Gary Flake: Ghosts of the Internet – Past, Present and Future 2009 04 01 ARF Re:think 2009 Marriott Marquis New York City, New York, USA http://www.microsoft.com/presspass/press/2009/may09/05-07PhotosynthVEPR.mspx Microsoft Photosynth Integrates Into Virtual Earth, Marking Commercial Availability of the 3-D Photo Technology 2009 05 07 Video Press Release Photosynth Commercial Availability Photosynth: New Feature http://www.calit2.net/events/popup.php?id=1564 Cyberspace Arriving, Cal. IT2 2009 05 22 Guest Lecture Calit2 Auditorium, Atkinson Hall San Diego, California, USA Photosynth: History, Future http://www.youtube.com/watch?v=6cnP8FqRoPI Cyberspace Arriving, TEDx Dublin 2009 06 12 (Audience clapping) Since, uh, this past summer I've... I've been the... the architect of, uh, Bing Maps (which actually wasn't called Bing Maps over the summer but that's basically what it is now). Uh, it's, uh... it's Microsoft's, um, uh, mapping software and, um, if you like, uh, virtual reality and the real world software as well; Virtual Earth is part of that, uh... part of that group too. Uh, I'm going to be talking mostly about Photosynth. Uh, this is a... this is work that began with a project called "Photo Tourism", uh, which is a... a... a very... a very nice, ah, Industry-Academia collaboration, that, uh... that happened, uh, about three years ago between the University of Washington and, uh, Microsoft Research. Um, Noah Snavely was a grad student advised by, uh, by these two people: by Steve Seitz and... and Rick Szeliski: one of them at MSR, one at UW. Uh, and that... and that was... that was his, uh... his thesis work. What Noah had done was, uh, to take sets of photos, uh, from... from flickr and reconstruct three dimensional environments from those photos. So this is, uh... this is one of the data sets that, uh... that he actually used. It's, uh, Notre... pic... five hundred pictures of Notre Dame Cathedral. And, uh, from those photos alone, without any additional information, um, using... using, uh, his techniques we were able to, uh, take all of those photos and figure out those orange, uh... those orange pyramids on the ground are where the photos were actually taken from. And, uh, that point cloud... that three dimensional, um, kind of sketchy reconstruction of Notre Dame is done also entirely from the photos. So this is, uh, this is this inverse problem called computer vision and, um, it has a very long history. Uh... So this is, this is the sort of thing that gets very different sorts of reactions from people, uh, inside the field of Computer Vision and outside. Um, outside it's, uh, perceived as being, uh, incredible. This is... this is something that most people don't, uh... don't know is possible. Uh, but, uh, I would argue that we're now really coming into the maturity of that... of that field and what's, uh... what's incredible now is going to be, uh, absolutely commonplace, uh, within... within five years. Um, it's... it's an inverse problem and, uh... it's an inverse problem to graphics and what I mean by that is, uh, in the same sense that optical character recognition is the inverse problem to printing characters on the screen, for example, or that, uh, speech recognition is the inverse problem to speech synthesis. The inverse problem is always a lot harder than the forward problem. The forward problem is generative and the inverse problem is, uh...
is ambiguous. So the way the algorithm works for doing this, um, in... in brief, is, uh: you take a set of photos and extract features from those photos and what those features look like are points of interest, uh, scattered across the... the places on those photos where there are things happening in the pixels nearby. And then, um, when you have the same feature detected in multiple photos, those are matched, and, uh... and then those matched... those matched features form a kind of, uh, a graph or a network and, um... and now you go into the reconstruction phase. And the reconstruction phase is all about assuming that each one of those features that is matched across multiple photos corresponds to the same three dimensional point in space, x, y, and z. And, uh, we have, uh... so we have now lots and lots of unknowns. There's x, y, and z for every matched feature as well as x, y, and z for every camera because that's not known from the beginning either (and the focal length and the angle). And, um... and you... you just, uh, keep on moving the points around and moving the cameras around until, uh, what's called the reconstruction error drops close to zero (the reconstruction error being just, "Well, if I have the x, y, and z for a point and I'm... and I'm proposing certain positions and orientations for the cameras then I know where, on the imaging plane, those points ought to be observed and I know where they actually were observed, too." So the... the difference... those... those lines... the lengths of all those line segments, um, correspond to an error. So you just keep moving the points, keep moving the cameras till the errors drop to zero and you're done.). So, uh, this is, uh, um... this is how... how these feature points actually, uh, look. This... this is from a... a photo of the space shuttle and another set of, uh... of... of feature points. And now when you match those two you... you get these, uh, these correspondences. So, uh, the basic idea here is that the... the... this matching business works to the extent that the description of those feature points is, uh, is invariant (meaning doesn't change) based on what kind of camera you use, what time of day it was, what angle you took it from. So it should be invariant to all of those things. And yet at the same time, maximally informative about what it actually was in the world... what feature that were... that was in the world. And those two things are in tension with each other. If you make something that's too invariant then it can't distinguish, uh... it can't distinguish different features in the world from each other. On the other hand, if it's too specific, then, uh, even a slight change in viewpoint is going to break it so you really have to balance those two things and there's some black art in doing that the right way. Um, now, you take, so you take all of those feature points from these photos, you match them by putting them in a, into a, into an index and looking for nearest neighbors, you go to the 3D reconstruction (I think I've already... I've already described, uh, at least in as much depth as I can here, how that works). Uh, so now I'm going to, uh... I'm going to show you what we did with, uh... what we did with, um, uh... with this technology in Photosynth which was first released, uh, publicly in August of this past year. We made a website, uh, that, uh... called photosynth.com, and, uh, it's... it's a little bit like, uh, uh...
like flickr or like any of these other sorts of, uh, media sites, uh, social media sites in which people can contribute, uh, photos and then those can get viewed by others, but with a... with a twist - with a difference. And, uh, the difference is that, um, in order to... in order to view a synth, uh, originally we required you to download, uh, a control that runs in the browser... an ActiveX control. And, um, the reason we... the reason we did that, in addition to the fact that you need the ActiveX control to view things in 3D, is also because doing all of the... all of the math of reconstructing the relationships among all of those photos - that takes a lot of... a lot of compute. And, uh, we a.., we actually do this on your own computer as you're uploading the photos. And we're... we're sort of exploiting the fact that, uh, at least outside of Japan, upload of... of data is extremely slow. And so while it's uploading it's... it's, uh, it's... it's using your machine to... to do the computer vision at the same time so it's a big distributed computation project as well. So, um, a synth, what does a synth actually look [like] - well there, so there's a synth featured on the front page, of course, and, um, I'll, uh... I'll pull up one of, uh... one of my favorite ones. So this is... this is, uh, um, uh... this is our, uh, nanny, Jessie, and, uh, it's... it's, uh, one of... it's... it's... it was at the time, I think, uh, a very unusual example of a human subject being synthed. So, uh, when... when you... when you synth and you... and you take photos of something from different angles, uh, Photosynth figures that out and it's able to, to reconst... to... to figure out that there was... that there was an... an object there - something of interest - that you took from multiple angles and that's when this halo appears. So, uh, when I drag the halo around, I... I get a sort of sketchy three dimensional representation of Jessie and, um, each of those... each of those segments corresponds to a photo taken from a different point of view. So I can move around to different points of view. Uh, I took some close up photos of her tattoos and, uh... uh, the other thing - the other technology that... that Photosynth incorporates is, uh... is, uh, Seadragon which was, uh, our... our little startup. And one of the things that Seadragon lets you do is experience, uh, multiresolution content on the web, uh, in... in extreme detail and maybe more detail than you really want. (Audience laughter) Uh, but... but this is... it's, uh... this is one of those things, you know? Our... our... our digital cameras, uh, even ones in phones nowadays, are five or eight megapixels and that's an extraordinary amount of information. It's much more than you can usually see on, uh... on a... on a photo sharing site without using something like Seadragon - some sort of multiresolution imaging technology. Now one of the... one of the premises... one of the things that I think really captured people's imagination about Photo Tourism and Photosynth when we first started talking about it in public was, uh, this idea, of using collective intelligence to map out the world. We know, if we... if we... if we could reconstruct Notre Dame Cathedral by going to flickr, then is it the case that, uh, large parts of the physical world are already there, uh, on the... on the web just waiting to be reconstructed in 3D using the... using the photos that are already available? Um, and this...
this idea got a bunch of us very very excited but the results, um, I would say, are mixed... they're mixed. The... the results are mixed. Um, this is, um... this is why. So, if... if we take, uh, a famous place like Piazza Navona and we do a regular, uh, image, uh... image search on the web, you get tons of... so 88,000 hits on... on, uh... on Bing Images, um, (roughly the same on Google) (laughs) and, uh, but the... of course many of these, um... many of these images are... are less... are less than a megapixel; they tend to be quite small. Uh, and if you find images on the web they... they often will be re... be reduced to a very small size (because they're not using Seadragon) and... and that's... and that means that they're... they're not, uh... they're not all that useful for reconstructing. But worse than that, um, the images are all taken from the same places. So, you get, uh, you get better images if you go to a website like flickr. Uh, so these are not, uh, pictures that are just on the web at large but on... explicitly on photo sharing sites so you get fewer... fewer images (29,000) on flickr but when you compare with the entire web at 88,000, 29,000 is quite a lot, especially since most of these are really large and beautiful images. And this is just one photo sharing site, so when you... when you add up all of the big photo sharing sites, you get... you get huge... huge numbers and very high quality. However, um, the... the connectivity patterns among those photos are the... are the somewhat dismaying part. Uh... Well, 'dismaying', I don't know. It's... it's... it's interesting. So this is a... an example of a graph of connections between photos of the Ta Prohm temple in Cambodia. So each of those circles is a... a... represents a photo and I've drawn, here, an edge between two photos when they share, uh, feature points, in other words when they're... when they're viewing something that cou... that... that's, ah... that's the same in 3D as... as a... as a neighboring photo. And, uh, what you find is that, um, there... there is... in the case of Ta Prohm, there is, in fact, one, uh, what's called a mega connected component, meaning one set of photos that's all connected and that... and that... that's fairly large, so both of these big clusters and this stuff over here and these things all connect together. This is, actually, uh, some images of a particular place in the front of the temple that everybody takes and another place at the back of the temple that everybody takes. Right? And this is exactly what you find. People are not very, uh, imaginative, I suppose, and they take the same picture again and again and again and again and the only thing that changes is which face is in front. Um, uh, to... and this is to the point where... where there have been some... some people not so facetiously suggesting cameras that consist of nothing but a GPS, uh, and a compass, ah, and... with no lens that, you know, if you... if you go somewhere and you click, it'll just, you know, look up, uh, you know, whoever took, um... And I think if you... if you did... if you did the small twist where you take whoever your travelling companion was and you insert their photo into the center, then you'd be done. That's... that... that'd be it. Um, so, uh... so that's... and that's what you get; you get these very very peaky distributions. If you think about a... a... statistical distribution over space - over x and y and orientation and so on, over the entire earth, where are photos taken? Of what?
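The connectivity graph described above - photos as nodes, with an edge whenever two photos share matched feature points - can be analysed with a few lines of code to pull out that one large ("mega") connected component. A plain union-find sketch in Python; the function name and toy data are illustrative only, not from the talk:

def connected_components(num_photos, edges):
    """edges: (i, j) pairs of photos that share matched features.
    Returns components (sets of photo indices), largest first."""
    parent = list(range(num_photos))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    groups = {}
    for node in range(num_photos):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=len, reverse=True)

# Toy example: photos 0-3 overlap one another; 4 and 5 only overlap each other.
print(connected_components(6, [(0, 1), (1, 2), (2, 3), (4, 5)]))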
It's a distribution that has very very high peaks, uh, in... in a desert... an otherwise... in a... in a... in a desert of zeros. There are, uh... there are interesting things, of course, which we get by looking at, uh, many many images of exactly the same thing like when... when there are fig trees in the Angkor Wat temple complex that grow up and then are cut back and damage that happens to the stones and you want to see this at every possible time of day and, you know, you use a slider to look at its evolution over time, you can do that sort of thing. But you... but you can't really wander around the Angkor Wat temple complex based on flickr photos; that doesn't really work out. Now, on the other hand, one of the really nice things about... about launching a service that's explicitly about people creating synths (and, uh, and they're starting to really become a very active community of people doing this now) is that, uh, people photograph differently when they... when they use this thing. They... they take photos to synth. On the front page of synths of Notre Dame (of which there are quite a few, uh, even after those first three weeks) there was... there was already one synth that was way better than what we could reconstruct from flickr. And all it took was, uh... was one person, um, photographing it sort of systematically. So this was, uh... this is one person just, you know, taking all sorts of... kinds of different views and closeups and that's the point cloud. Um, so it... the, um... the... we've... we like to tell the story, always, about you know the wisdom of crowds and how the crowd does better than the individual and so on, but in this case the individual does a lot better than the crowd. Um, although, uh, a crowd of individuals might do better... better still so that's... that's one of the directions that we're taking Photosynth next. People synth all sorts of stuff, like there... there... there's this little community of people syn... you know, synthing, ah, fruit and so on. Uh, so... so people synth all kinds of stuff but, um, especially among the top favorites, a great many synths are geolocatable. So, I'm greying out, now, the ones that are not geolocatable. And by geolocatable I mean correspond to a particular place on earth at a reasonable scale, um, you know, that's... that's not... that's not too different from... from a map and, um, and that are not private environments; they're not somebody's bedroom. If they're indoor spaces, they're, uh... they're... they're public galleries and things like this. Uh, so when you look at a breakdown of those top hundred synths you find some really, uh... some really bizarre stuff, uh, that I... I certainly didn't anticipate. Uh, so art is very popular, which is very nice. About 10% of these top hundred synths were of... were of... of, uh... of... of public sculptures and things like this. Lots and lots of archaeological sites, so by now, uh, every, uh... every famous archaeological site has been synthed, um, most of them many times. Uh, lots of beaches and coastlines, and this one is especially intriguing, uh, lots of aerial synths. So, people do it by taking photos from small planes, uh, by, uh... by taking panoramas from very high places... from skyscrapers and... and, uh, ah... and overlooking canyons, they do it, uh, uh, using cameras attached to kites. Uh, it's... it's pretty... it's pretty cool what people are doing. And, uh, this is just a typical example of one of these aerial synths. So this is, uh...
this is, uh, a little resort peninsula in Croatia in a place where, ah, none of the mapping sites, I... I might... I might add, have particularly good vector data. This is... and this is just one person's, uh, aerial synth of Primosten from above, ah, which has entire islands that are not even visible in the... the... in the map. So this is... this is just a... a... a... a view inside that synth, and as this guy flew over, um, the town he took... he took aerial images that are, um, more detailed and of higher quality than any, than any of the aerial imagery... imagery on any of the various mapping sites on the internet now. So, uh, so this is... I think this is really an interesting, um, idea and... and trend of having even... even images from the air - even images of very large parts of the earth getting, uh, getting crowd-sourced like that. Alright, so um, I'm going to show one last, uh, thing before stepping down which is, um, a video of, uh... of some of our newer, uh, techniques, for, uh... for doing reconstruction. So, uh, this is... this is just a point cloud, now, that I'm showing you and I've... I've just paused it. It's of Kelvingrove, uh, Art Gallery and, uh, these are... these diamonds in the sky over here are... are the, um... are the cameras (I think this was done with a helicopter). And what you're s... what you're seeing down here is just points. Okay, so it's just... it's just, ah, the Photosynth point cloud augmented with stereo techniques. So that's the... that's the... the level of... of, uh... of three dimensional reconstruction of... of things on the earth that... that one person can, uh... can do in, um, uh... in... in ten minutes, uh, from a helicopter. Um, and a... again, this is just the point cloud, of course; if we... if we, you know... really going to the next step and doing... doing complete 3D reconstructions is obviously something that we're very interested in... in doing as well. But I'm... I'm... I'm hoping that this is the kind of thing that, ah, over the next, uh, five years, uh, is... is going to... is going to really take, um, a lot of our physical experience of the world visually, and create, uh, an online version of it - a mirror world version of it. Uh, and, of course, that lets you... that gives you all sorts of s.., interesting super powers. Right? It lets you, uh, it lets you teleport yourself anywhere in the world you want, uh, it lets you, um, experience augmented reality by... by, um... by giving you a... a model against which your... your camera can, uh... can match its visual experience so that... so that things can... this semantic information can be added on top. Um... Really, all of this is about, ah, extending our... our abilities and, extending our... extending our senses, extending our, um, our... our perception and our understanding of the physical world around us. Uh, I'll end there. Thank you very very much. (Audience clapping)
TEDx Dublin (Independently Organized TED TALKs) Science Gallery, Trinity College Dublin, Ireland http://channel9.msdn.com/shows/Going+Deep/Expert-to-Expert-Gur-Kimchi-Inside-Bing-Maps/ Expert to Expert: Inside Virtual Earth 2009 06 17 Gur Kimchi Photosynth: Tangential, Future http://grail.cs.washington.edu/rome/dense.html Dense Modelling 2009 07 29 Images Yasutaka Furukawa http://grail.cs.washington.edu/rome/ Building Rome in a Day Yasutaka Furukawa, Noah Snavely, Steve Seitz, Rick Szeliski http://research.microsoft.com/en-us/um/redmond/groups/ivm/PlanarStereo/ Piecewise Planar Stereo for Image-based Rendering 2009 09 27 Video Presentation, Paper, Images http://grail.cs.washington.edu/projects/interior/ Reconstructing Building Interiors from Images, Steve Seitz, Rick Szeliski Video, Paper, Slides Yasutaka Furukawa, Steve Seitz, Rick Szeliski http://vimeo.com/6951006 Photosynth Overhead View 2009 10 07 http://microsoftpdc.com/Sessions/VTL30 Managing Development to Inspire Innovation and Create Great User Experiences 2009 11 17 Video Lecture + Demonstration, Slides PDC '09 Los Angeles Convention Center Los Angeles, California, USA http://microsoftpdc.com/Sessions/VTL05 A New Approach to Exploring Information on the Web 2009 11 18 http://blogs.msdn.com/photosynth/archive/2009/12/01/photosynths-in-bing-maps-dive-in.aspx Photosynths in Bing Maps. Dive in! 2009 12 02 Bing Maps Beta Unveiling Photosynth: Bing Maps Integration http://www.youtube.com/watch?v=XN3Zq2dJcMo Bing Maps Demo Video Demonstration (Partial) http://news.cnet.com/8301-13860_3-10408665-56.html Bing's iPhone plans (and more) Video Demonstration + Interview Ina Fried http://channel9.msdn.com/posts/LauraFoy/First-Look-Streetside-in-Bing-Maps/ First Look: Streetside in Bing Maps Chris Pendleton http://www.microsoft.com/presspass/features/2009/dec09/12-02BingMapUpdates.mspx Bing Maps Rolls Out Enhanced Aerial and Street-Level Views Video, Press Release Blaise Agüera y Arcas, Stefan Weitz, Drew Steedly http://www.geekbrief.tv/brief-671-photosynth Brief 671 | Photosynth 2009 12 12 Video Interview, Tutorial GeekBrief.tv October Road Trip Microsoft… Cali Lewis Photosynth: Insights for Success http://www.confectious.net/thinking/archives/2010/01/msr-social-comp-1.html MSR Social Computing: Blaise Agüera y Arcas 2010 01 11 Shorthand Notes Social Computing Symposium 2010 NYU Interactive Telecommunications Program New York, New York, USA http://video.dld-conference.com/watch/NtvHrCY Search 2010 01 25 Digital Life Design 2010 HVB Forum Munich, Germany Jochen Wegner Seadragon: Bing Maps http://www.ted.com/talks/blaise_aguera.html Blaise Aguera y Arcas demos augmented-reality maps 2010 02 10 TED 2010 http://www.microsoft.com/presspass/presskits/bing/videoGallery.aspx?contentID=BingMapsFlickr Bing Maps Streetside Innovations http://www.cs.washington.edu/homes/furukawa/ Towards Internet-scale Multi-view Stereo 2010 02 23 Video Preview http://www.microsoft.com/presspass/presskits/cloud/videogallery.aspx The Cloud: Exciting New Possibilities 2010 03 04 00:21:40 - 00:31:55 Microsoft on Cloud Computing Steve Ballmer http://channel9.msdn.com/posts/LauraFoy/TechFest-2010-Microsoft-ICE-Image-Composite-Editor/ TechFest 2010: Microsoft ICE - Image Composite Editor 2010 03 10 00:01:00 - 00:04:55 TechFest 2010 Matt Uyttendaele Photosynth: Future Panorama Integration http://research.microsoft.com/en-us/um/redmond/groups/ivm/ICE/ Buttery-Smooth Gigapixel Panoramas: ICE Is Now Synthy 2010 03 18 Panorama Support in Photosynth Photosynth: Panorama Integration 
http://where2conf.com/where2010/public/schedule/detail/11340 The Map as an Information Ecology 2010 03 31 9:00am Video Lecture + Demonstration O'Reilly Where 2.0 Conference San Jose Marriott: Ballroom III - VI San Jose, California, USA
http://photosynth.ning.com/video/augmented-reality-event-2010 Bing Maps and Augmented Reality 2010 06 02 2:00pm Augmented Reality Event 2010 Santa Clara Convention Center Santa Clara, California, USA
http://www.businessweek.com/magazine/content/10_33/b4191039675158.htm Tech Innovator: Blaise Agüera y Arcas The architect of Bing Maps and Mobile says phones in the near future will automatically identify landmarks and deliver key information about them 2010 08 05 News Article Dina Bass
http://online.wsj.com/article/SB10001424052748704361504575552661462672160.html Taking on Google by Learning From Ants: Blaise Agüera y Arcas, the Mind Behind Bing Maps 2010 11 06 Nick Wingfield
http://blogs.msdn.com/photosynth/archive/2010/12/15/coming-soon-to-photosynth-gorgeous-mobile-panos.aspx Coming soon: Gorgeous Mobile Panos 2010 12 15 Video Preview, Commentary Josh Lowenstein
http://gizmodo.com/5769752 The Future of 2D Photographs Is 3D Video 2011 02 24 Video Demonstration + Commentary Matt Buchanan Sudipta Sinha
http://channel9.msdn.com/posts/TechFest-2011-3D-Scanning-with-a-regular-camera-or-phone TechFest 2011: 3D Scanning with a regular camera or phone! 2011 03 09 TechFest 2011 Sudipta Sinha, Johannes Kopf
http://www.technologyreview.com/computing/37021/ 3-D Models Created by a Cell Phone 2011 03 23 Johannes Kopf
http://channel9.msdn.com/events/MIX/MIX11/RES06 New Technologies for Immersive Content Creation 2011 04 14 00:00:05 - 00:15:15 MIX '11 Eric Stollnitz
http://www.youtube.com/watch?v=6BbuPPOVXQo Introducing Microsoft Photosynth Mobile Panorama App for iOS 2011 04 18 Photosynth Mobile Panorama App Launch Blaise Agüera y Arcas, Eric Bennett, Kenneth Bedsted Photosynth: Mobile Panorama Integration
http://channel9.msdn.com/posts/Mobile-Photosynth-Panorama-App Mobile Photosynth Panorama App 2011 04 20 Video Demonstration, Interview Larry Larsen Eric Bennett
http://www.youtube.com/watch?v=4X9u4JG9H6E Read/Write World (Where 2.0 2011 Delivery) Where 2.0 2011
http://kcts9.org/education/science-cafe/building-collective-digital-world Building a Collective Digital World 2011 05 03 Video Demonstration + Commentary Queen Anne Science Café
http://www.youtube.com/watch?v=NZ3ga9Yb240&hd=1 Read/Write World: RML Presentation (ARE 2011 Delivery) 2011 05 18 PowerPoint Slideshow, Commentary Augmented Reality Event 2011 Avi Bar-Zeev
http://www.youtube.com/watch?v=tojFFlCUhIs Read/Write World (ARE 2011 Delivery)
http://blip.tv/webvisions/webvisions-2011-blaise-aguera-y-arcas-read-write-world-5293245 Read/Write World (WebVisions 2011 Delivery) 2011 05 26 Video Lecture + Preview WebVisions Portland 2011 Portland, Oregon, USA
http://america.ecomm.ec/2011/democratizing-locative-media.php Read/Write World: Democratizing Locative Media 2011 06 28 Video Not Posted Yet eComm 2011 San Francisco Airport Marriott
http://vimeo.com/25884372 Read/Write World: RML Presentation (Seattle Augmented Reality Meetup Delivery) 2011 06 29 Seattle Augmented Reality Meetup June 2011 Microsoft at The Bravern 2
http://vimeo.com/30013812 Mapping the Augmented City 2011 09 16 PICNIC Festival 2011
http://www.youtube.com/watch?v=nD9Ilu1rj-k&hd=1 Read/Write World (AtlanTech 2011 Delivery) 2011 10 07 AtlanTech Dinner Paris, France
Read/Write World (GeoLoco 2011) 2011 11 03 GeoLoco 2011
http://www.youtube.com/watch?v=9uvCtWZotZs Software Guyver A bit of fun for the Seadragon researcher. =) 2009 01 28 Video Tribute | Parody Microsoft Live Labs Out of the Box Week Microsoft Live Labs Blaise Agüera y Arcas, David Gedye Seadragon: Heroes Photosynth: Heroes