6th Machine Learning 15minutes! The machine learning service "Azure Machine Learning" (plus a little Microsoft Cognitive Toolkit (CNTK))
佐藤 直生 (@satonaoki)
Security & Management
Media & CDN
VS Team Services
Analytics & IoT
Data Lake Store
VM Image Gallery & VM Depot
Datacenter Infrastructure (38 Regions, 30 Online)
Cortana Intelligence Suite
Big Data Stores
Machine Learning and Analytics
Data Lake Analytics
(R Server and Spark)
Dashboards & Visualizations
Sensors and devices
Azure Machine Learning
AI, Machine Learning and Deep Learning
Microsoft & Machine Learning Answering questions with experience
Microsoft Research formed
Which email is junk?
Bing maps launches
What’s the best way home?
Skype Translator launches
What is that person saying?
Azure Machine Learning GA
What will happen next?
Bing search launches
Which searches are most relevant?
What does that motion “mean”?
Machine learning is pervasive throughout Microsoft products.
Fully managed cloud service for building and operationalizing ML models
Best in Class Algorithms + R
Deploy in minutes
Simple drag, drop and connect interface for Data Science. No need for programming for common tasks.
Built-in collection of best of breed algorithms. Support for R and popular CRAN packages.
Operationalize models with a single click. Monetize in Machine Learning Marketplace.
No software to install, no hardware to manage, and one portal to view and update.
Model is now a web service that is callable
Blobs and Tables
Relational DB (Azure SQL DB)
Integrated development environment for Machine Learning
Azure Machine Learning Studio
Visual editor for composing, testing, refining, and deploying machine-learning models
Includes hundreds of modules
Includes common algorithms for classification, regression, and more
Supports numerous input formats
Supports R and Python
Machine learning for the masses
The Machine Learning Process
From "Introduction to Microsoft Azure" by David Chappell
Azure Machine Learning Algorithms
Deploying as a Web Service
A button click in ML Studio deploys a model as a Web service and provides sample code for calling it in three languages
Azure ML Studio
Azure Machine Learning Demo
Titanic Dataset – to predict survivors
Create a ML experiment (model)
Publish it as a web service
Cortana Intelligence Gallery
Azure Machine Learning Documentation
Machine learning algorithm cheat sheet
Azure ML Studio
Microsoft Cognitive Toolkit
Deep learning at Microsoft
Microsoft Cognitive Services
ImageNet: Microsoft 2015 ResNet
All 5 of Microsoft's entries took 1st place this year: ImageNet classification, ImageNet localization, ImageNet detection, COCO detection, and COCO segmentation
Microsoft’s historic speech breakthrough
Microsoft 2016 research system for conversational speech recognition
5.9% word-error rate
enabled by CNTK’s multi-server scalability
[W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, G. Zweig: “Achieving Human Parity in Conversational Speech Recognition,” https://arxiv.org/abs/1610.05256]
Microsoft Translator launching Neural Network based translations for all its speech languages
The Microsoft Cognitive Toolkit (CNTK)
CNTK is production-ready: State-of-the-art accuracy, efficient, and scales to multi-GPU/multi-server.
CNTK expresses (nearly) arbitrary neural networks by composing simple building blocks into complex computational networks, supporting relevant network types and applications.
“CNTK is Microsoft’s open-source, cross-platform toolkit for learning and evaluating deep neural networks.”
open-source model inside and outside the company
created by Microsoft Speech researchers (Dong Yu et al.) in 2012, “Computational Network Toolkit”
open-sourced (CodePlex) in early 2015
on GitHub since Jan 2016 under permissive license
Python support since Oct 2016 (beta), rebranded as “Cognitive Toolkit”
used by Microsoft product groups; but virtually all code development is out in the open
external contributions e.g. from MIT and Stanford
Linux, Windows, Docker, cuDNN 5; next: CUDA 8
Python and C++ API (beta; C#/.Net on roadmap)
Microsoft Cognitive Toolkit
Microsoft Cognitive Toolkit Demo
Jupyter Notebook on Azure Notebooks
Microsoft Cognitive Services
From faces to feelings, allow your apps to understand images and video
Hear and speak to your users by filtering noise, identifying speakers, and understanding intent
Give your apps a human side
Process text and learn how to recognize what users want
Tap into rich knowledge amassed from the web, academia, or your own data
Access billions of web pages, images, videos, and news with the power of Bing APIs
Why Microsoft Cognitive Services?
Integrate into the language and platform of your choice
Breadth of offerings helps you find the right API for your app
Roll your own with REST APIs
Simple to add: just a few lines of code required
Built by experts in their field from Microsoft Research, Bing, and Azure Machine Learning
Quality documentation, sample code, and community support
GET A KEY
And More ...
Azure VM (IaaS) N-Series (NVIDIA Tesla K80 GPU)
Data Science Virtual Machine (Linux/Windows)
Spark for Azure HDInsight
R Server for HDInsight
Microsoft R Server
Why this Slide:
It shows we have a very broad platform. It is about BOTH IaaS and PaaS, and about how these work together. It shows that we continue to lead in world class IT capabilities and that there’s really nothing missing.
We have already seen how the Azure Platform is IaaS + PaaS, but I want you to understand that this is a huge number of capabilities: IT building blocks, if you will.
You can provision every one of these blocks anytime, self-service, anywhere in the world, 24x7. You pay for what you use, you can get more or less anytime, and you can fully automate everything…
DON’T spend too much time on this slide. You are going to DEMO (aren’t you!!!)… DON’T go through each block…
Transition to NEXT Slide: Make the build go backwards to show JUST IaaS and then you will go to the demo to show it.
© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Microsoft has been working on machine learning for over two decades. We formed Microsoft Research back in 1991 to tackle internally the tough problems that we’re enabling you to tackle yourselves today.
When we think of learning from experience – past data + human input – a great example is Hotmail. Back in 1997, external email was a relatively new concept. There wasn’t a lot to go on in terms of what email the customer wants and what they do not. With the rise of email, also came spam – and lots of it. Some of those issues were easy – like Nigerian princes we learned pretty quickly don’t give away their fortunes to strangers. But what about “free offer” – maybe that free offer is something the customer always wanted. Maybe it’s something they’d never want. But that’s where the “human input” part comes in as data is being collected – that takes the form of the actual user of the email service saying “yes, this is junk” or “no, I want this” and then the data scientist learning in aggregate and making tweaks to the underlying model in response.
And we kept going with that learning – relying on past data and human input to solve problems like the best way home, which search results are most meaningful to the user and one of the toughest ones to tackle with Kinect. Kinect’s past data was all in the lab – we didn’t have a product in market that captured user input and translated that to active game play so we had to make up the variables. But that only takes us so far. The researchers told me that one thing they didn’t consider was people answering the phone while playing. This happens a lot – and Kinect at first was translating this as a wild motion in the game play – essentially crashing people’s cars or any number of unintended consequences. That was the human input we rely on, which allowed us to learn quickly and adjust the underlying model to ensure that answering the phone would not be considered part of the game moving forward.
Skype Translator is another huge machine learning problem. Think of all the ways a person who is speaking English can pronounce the same word: tom-A-to or tom-AH-to. Both map to the same word in French, so Skype has to adjust quickly to ensure all the millions of variables are considered.
But what about using all this learning to predict what’s next? Many of the same algorithms running behind the scenes of our products in market today are available within Azure ML, allowing you to take your own past data and learn from it what will happen in the future for your business.
Azure Machine Learning is a cloud-based predictive-analytics service that offers a streamlined experience for data scientists of all skill levels. It's accompanied by the Azure Machine Learning Studio (ML Studio), which is a browser-based tool that provides an easy to use, drag-and-drop interface for building machine-learning models. It comes with a library of time-saving experiments and features best-in-class algorithms developed and tested in the real world by Microsoft businesses such as Bing. And its built-in support for R and Python means you can build custom scripts to customize your model. Once you've built and trained your model in the ML Studio, you can easily expose it as a Web service that is consumable from a variety of programming languages, or share it with the community by placing it in the Cortana Intelligence Gallery.
ML Studio simplifies machine learning by providing a drag-and-drop model in which you build a workflow. With ML Studio and the rich assortment of modules it offers for modeling a workflow, you can often build sophisticated models without writing a single line of code. However, it also allows you to insert R and Python code anywhere in the workflow, which gives you a great deal of flexibility in what you can model.
ML starts with data, which can come from a variety of sources. The data typically needs to be "cleaned" before it is used, and ML Studio includes modules to help with the cleaning. (Examples of cleaning include removing rows with missing data, replacing missing data algorithmically, removing duplicate rows, and removing rows containing "outliers." In practice, cleaning the data can be very time-intensive and often consumes 50% of the time required to build the model.) Once the data is ready, you select an algorithm and "train" the model by allowing it to iterate over the data and find patterns in it. After that comes scoring and evaluating the model, which tells you how well the model is able to predict outcomes. All of this is performed visually in ML Studio. Once the model is ready, a few button clicks deploy it as a Web service so it can be called from client apps.
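The clean, train, score, and evaluate steps described above can be sketched in a few lines of plain Python. This is only an illustration of the process, not what ML Studio runs internally: the data is made up, the "model" is a hypothetical one-feature threshold classifier, and the function names (`clean`, `train`, `score`, `evaluate`) are chosen here for readability.

```python
import random

def clean(rows):
    """Drop rows with missing values and exact duplicate rows."""
    seen, cleaned = set(), []
    for row in rows:
        if any(value is None for value in row):
            continue  # missing data
        if row in seen:
            continue  # duplicate row
        seen.add(row)
        cleaned.append(row)
    return cleaned

def train(rows):
    """Pick the cut-off on the feature that best separates the training labels."""
    best_threshold, best_correct = None, -1
    for threshold, _ in rows:
        correct = sum((x >= threshold) == bool(y) for x, y in rows)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def score(threshold, rows):
    """Predict a label (0 or 1) for every row."""
    return [int(x >= threshold) for x, _ in rows]

def evaluate(predictions, rows):
    """Fraction of predictions that match the known labels."""
    hits = sum(p == y for p, (_, y) in zip(predictions, rows))
    return hits / len(rows)

# Made-up data: (feature, label) pairs, including one missing value
# and one duplicate row that the cleaning step removes.
raw = [(1.0, 0), (2.0, 0), (None, 1), (2.0, 0), (7.0, 1), (8.0, 1),
       (3.0, 0), (9.0, 1), (6.5, 1), (1.5, 0)]

data = clean(raw)
random.Random(0).shuffle(data)           # fixed seed for repeatability
split = int(0.7 * len(data))
train_rows, test_rows = data[:split], data[split:]

model = train(train_rows)
accuracy = evaluate(score(model, test_rows), test_rows)
print(f"threshold={model}, test accuracy={accuracy:.2f}")
```

Holding back part of the data for evaluation is what makes the accuracy figure meaningful: the model is judged on rows it did not see during training.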
ML Studio provides canned implementations of 25 of the classic algorithms used in machine learning. It divides them into four categories.
Anomaly detection is the identification of items, events, or observations which do not conform to an expected pattern or other items in a dataset. A classic example is examining a dataset representing banking transactions and detecting potentially fraudulent transactions in that group.
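As a rough illustration of the banking example, the sketch below flags transaction amounts that sit far from the rest of the data. It uses a simple statistical rule (the modified z-score, based on the median) as a stand-in; it is not one of the anomaly detection modules in ML Studio, and the amounts and the 3.5 cutoff are assumptions made for the example.

```python
from statistics import median

def flag_anomalies(amounts, cutoff=3.5):
    """Return the indices of amounts whose modified z-score exceeds the cutoff."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread at all, so nothing can be flagged
    return [i for i, a in enumerate(amounts)
            if 0.6745 * abs(a - med) / mad > cutoff]

# Made-up transaction amounts; the seventh one is clearly out of pattern.
amounts = [25.0, 40.0, 32.5, 28.0, 35.0, 30.0, 4200.0, 27.5]
print(flag_anomalies(amounts))  # prints [6]
```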
Regression algorithms seek to establish and quantify relationships between variables. By establishing a relationship between a dependent variable and one or more independent variables, regression analysis can enable the value of a dependent variable to be predicted given a set of inputs with a quantifiable accuracy. A great example can be seen at https://how-old.net/#, a site that lets you upload a photo and then guesses your age with uncanny accuracy. The site uses Azure Machine Learning and combines classic regression with advanced image recognition.
The purpose of classification algorithms is to identify the category to which an observation belongs based on training data consisting of observations which have already been classified (assigned to a category). A great example is determining whether an e-mail belongs to the "spam" category or the "not-spam" category.
Clustering seeks to group a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters).
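To make the idea of clustering concrete, here is a minimal sketch of k-means on one-dimensional points in plain Python. The points, the starting centers, and the function name `kmeans_1d` are assumptions for the example; a real experiment would normally use a library or an ML Studio module rather than hand-written code.

```python
def kmeans_1d(points, centers, iterations=10):
    """Group 1-D points around the given starting centers (Lloyd's algorithm)."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, clusters = kmeans_1d(points, centers=[0.0, 6.0])
print(centers)   # two centers, close to 1.0 and 5.0
print(clusters)
```

Points in the same cluster end up closer to their own center than to the other one, which is the "more similar to each other" property described above.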
The Azure ML Cheat Sheet helps you pick the right algorithm for a model, even if you're not a trained data scientist. For example, if you want to use a set of input values to predict an output value from a continuous set of values (e.g., a person's age), use linear regression. But if you're more interested in the distribution of the output, you might use fast forest quantile regression instead. An example of when you would use the latter is using growth charts to assess child development: "Abby's height is in the 10% quantile of the heights of kids her age." Classification algorithms, by contrast, are used to predict a value from a discrete set of values -- for example, classifying an e-mail as spam or not spam.
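The difference between predicting a value and asking where a value sits in a distribution can be shown with a small sketch. The first function fits an ordinary least-squares line; the second computes a simple empirical quantile position, which only illustrates the question quantile regression answers and is not the fast forest quantile regression algorithm itself. All ages and heights are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def share_below(value, population):
    """Fraction of the population strictly below the given value."""
    return sum(v < value for v in population) / len(population)

# Linear regression: predict a continuous value (height in cm) from age.
ages = [2, 3, 4, 5, 6]
heights = [87, 93, 99, 105, 111]
slope, intercept = fit_line(ages, heights)
print(slope * 7 + intercept)            # predicted height at age 7

# Quantile view: where does one child sit among children of the same age?
heights_same_age = [100, 102, 104, 105, 106, 107, 108, 110, 112, 115]
print(share_below(102, heights_same_age))  # 0.1, i.e. the 10% quantile
```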
Once deployed as a Web service, a model can be used with simple REST calls over HTTP. This enables developers to build "smart apps" that get their intelligence from ML. In the next lab, students will build and train an ML model, deploy it as a Web service, and then write a client app that uses it.
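A client call might look roughly like the sketch below, which builds (but does not send) an HTTP POST request using only the Python standard library. The URL, the API key, the column names, and the exact JSON layout are assumptions for illustration; the real endpoint, key, and request schema come from the API help page of the deployed service and from the sample code ML Studio generates.

```python
import json
import urllib.request

def build_scoring_request(url, api_key, column_names, rows):
    """Build a POST request for a scoring endpoint (payload shape is assumed)."""
    body = {
        "Inputs": {
            "input1": {"ColumnNames": column_names, "Values": rows},
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )

# Placeholder endpoint, key, and Titanic-style columns (all hypothetical).
request = build_scoring_request(
    "https://example.invalid/score",
    "YOUR_API_KEY",
    ["Pclass", "Sex", "Age"],
    [["3", "male", "22"]],
)
print(request.get_method(), request.full_url)

# To actually call a real service, send the request and parse the reply:
#   with urllib.request.urlopen(request) as response:
#       result = json.loads(response.read())
```

Keeping the request construction separate from the network call makes it easy to inspect or test the payload before pointing it at a live endpoint.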
Microsoft Cognitive Services uses CNTK for several tasks in speech, vision, and text-based deep learning applications; the Microsoft customer service bot and the face detection framework are two examples.
Machine translation problems are very amenable to solutions based on deep learning. Skype Translator makes efficient use of such technologies.
Cortana learns acoustic models using CNTK for scalable training on very large datasets.
Bing uses the toolkit for relevance and ranking.
Bing Ads uses the toolkit to recognize products in advertiser-provided images, in addition to other prediction and relevance problems.
CNTK is used for object recognition in augmented reality applications.
CNTK was pioneered within Microsoft Research, where it continues to extend the deep learning technology frontier.
Microsoft researchers made history by reaching human parity in conversational speech recognition, achieving a word-error rate of 5.9%, enabled by CNTK’s multi-server scalability. The reference provided has the details of this breakthrough work.
The model training task on other toolkits would have taken months at a time. With Cognitive Toolkit the team was able to cut down the training time to a week allowing for rapid iterations and model tuning.
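For readers unfamiliar with the metric quoted above, word-error rate is commonly computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal plain-Python version using word-level edit distance; the sentences are made up, and the scoring setup used in the cited paper may differ in details such as text normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)

reference = "what is that person saying"
print(word_error_rate(reference, "what is the person saying"))  # 0.2
print(word_error_rate(reference, "what is that person saying"))  # 0.0
```

On this scale, 5.9% means roughly six word errors for every hundred words in the reference transcript.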
The toolkit is tested in production settings for accuracy, efficiency, and scalability in a multi-GPU/multi-server environment.
The toolkit is designed to bring Lego-like extensibility, allowing for the expression of arbitrary NN ……. Several use cases have been built using the toolkit for speech, image, and text, and for combinations of these data types.
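The building-block idea can be illustrated without any toolkit at all. The sketch below composes small functions (a dense layer and a ReLU activation) into a larger network in plain Python. It deliberately does not use the CNTK API: the names `dense`, `relu`, and `sequential` and all the weights are hypothetical, chosen only to show how simple blocks chain into a computational network.

```python
def dense(weights, bias):
    """Building block: a fully connected layer with fixed weights."""
    def layer(inputs):
        return [sum(w * x for w, x in zip(row, inputs)) + b
                for row, b in zip(weights, bias)]
    return layer

def relu(inputs):
    """Building block: element-wise rectified linear activation."""
    return [max(0.0, v) for v in inputs]

def sequential(*blocks):
    """Compose blocks so that each one feeds the next."""
    def network(inputs):
        for block in blocks:
            inputs = block(inputs)
        return inputs
    return network

# A tiny two-layer network assembled from the blocks above.
net = sequential(
    dense([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),
    relu,
    dense([[1.0, 2.0]], [0.1]),
)
print(net([2.0, 1.0]))  # a single output value, approximately 2.1
```

Because every block has the same shape (a function from a list of numbers to a list of numbers), new block types can be added without changing `sequential`.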
(note: may no longer be up-to-date)
In the spirit of open source, Microsoft Cognitive Toolkit incorporates several easy-to-follow tutorials available from our GitHub site. We are constantly listening to you while adding tutorials. We appreciate your feedback.
Browse the source by clicking on the links, create a fork to modify the code on your own, or express your like by giving us a star.
To run the tutorials, please go to notebook.azure.com/library/cntkbeta2, then clone and run the notebooks. The notebooks run under your own virtual environment, so you need to sign in with a Microsoft account email ID (outlook.com, hotmail.com, or live.com).
You may choose any one of the notebooks and step through the tutorial by clicking on the “run-cell” button or using “shift+enter” keys.
Note that the cells are blocks of active code. You can play around with the parameters to explore how the toolkit works. The Azure instances you run are open to anyone and everyone at no cost.
© 2016 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Microsoft Build 2016
Some features, such as how-old.net and CaptionBot, are powered by Microsoft Cognitive Services.
How-old.net is an app that predicts the age of the individuals in a picture.
On the right-hand side you can see that Microsoft CaptionBot is able to automatically derive a linguistically rich, accurate textual description of the image.