Tuesday, March 21, 2017

Introduce yourself to Microsoft Azure Machine Learning

Objective

Upon completing this lab, you will have hands-on experience with the following functions and concepts related to Azure Machine Learning:
  • Creating and logging in to a free Azure Machine Learning Workspace
  • Creating, modifying, and saving an experiment with ML Studio
    • Running an experiment
    • Using sample datasets in an experiment
    • Browsing modules to use in an experiment
    • Using the search function to find modules for use in an experiment
    • Modifying and configuring properties of modules in an experiment
  • Visualizing and exploring data in ML Studio
    • Exploring summary statistics about datasets and features
    • Visualizing features with scatterplots, boxplots, and histograms
    • Visualizing relationships between features with scatterplots and boxplots
    • Visualizing predictive model results
  • Creating a basic predictive model in ML Studio
    • Splitting data into a training dataset and a test dataset
    • Training a Linear Regression model
    • Testing a trained model
    • Evaluating model performance

Scenario

This lab serves as an introduction to creating a predictive model with Azure Machine Learning.

Business Case 

Consumers often evaluate similar products using specific metrics of interest to them. In the auto industry, miles per gallon (MPG) always comes up as an important metric for consumers. How do manufacturers know what an acceptable MPG will be for the vehicle they are producing? Using advanced analytics, auto manufacturers can use vehicle attributes and MPG from similar automobiles already in the market in order to determine if the model rolling off the assembly line has a competitive MPG rating. 

For this lab, you will be working with a dataset that includes various information about automobiles from the 1970s and early 1980s. The dataset includes attributes like miles per gallon (MPG), horsepower, acceleration, weight, etc. The lab will use a linear regression algorithm to predict an acceptable MPG for an automobile. 

Linear Regression predicts a single numeric value from one or more independent variables. It does this by fitting a representative line, or function, to the observed data. That line or function can then be used to predict values for new input data.
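
To make the idea of fitting a line concrete, here is a minimal sketch in Python of simple linear regression with a single input variable, using the closed-form least-squares solution. It is not how ML Studio implements its Linear Regression module. The function names `fit_line` and `predict` and the data points are hypothetical and chosen only for illustration; the values are not taken from the lab's MPG dataset.

```python
# Minimal sketch of simple linear regression (one input variable),
# fitted with the closed-form least-squares solution.
# All names and data here are illustrative assumptions.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Apply the fitted line to a new input value."""
    return slope * x + intercept


# Made-up points that lie exactly on the line y = 40 - 0.1 * x.
horsepower = [50, 100, 150, 200]
mpg = [35.0, 30.0, 25.0, 20.0]

slope, intercept = fit_line(horsepower, mpg)
predicted = predict(slope, intercept, 120)  # prediction for a new input
```

With real data the points will not fall exactly on a line, and the fitted line is the one that minimizes the sum of squared differences between predicted and actual values.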

Virtual Machines

  1. TR22 Base Windows 10

Exercise 1 : Set up your Azure Account

In this exercise you will:
  • Set up your Azure Account

Scenario:  To perform this lab, you must have an Azure account set up that you can modify.  To set up this account, use the promotional code visible in the Content tab of the lab interface.  This exercise will walk you through the steps for redeeming the code.

Note:  If you already have an Azure subscription (MSDN/Internal) that you can use for this hands-on lab, you can skip this exercise.

  1. Sign In
    If necessary, sign in using the following credentials:  Admin, password:  Passw0rd!  On the Network prompt, click No.
  2. Obtain Microsoft Account
    You will need a Microsoft account (@outlook.com or @live.com, etc).  This account must NOT have an Azure subscription associated with it.  If you do not have an appropriate Microsoft account, please acquire one before continuing this lab. You can obtain an account from the following site:  http://www.microsoft.com/en-us/account.
  3. Open Site (Azure Pass)
    Open the Edge or IE browser, and navigate to http://microsoftazurepass.com.
  4. Submit Promo Code
    Choose from the country drop-down “United States”.  Enter the promotional code (given to you in the lab Content tab) in the Promo Code field.  Click on the Submit button.
  5. Complete Account Request
    Click on the Sign in button to enter your MSA account (@outlook.com, @live.com, etc.).  Follow any additional instructions to complete the process.
Congratulations!

You have successfully:
  • Set up your Azure Account

Click Continue to advance to the next exercise.

Exercise 2 : Create a Workspace and Experiment

In this exercise you will:
  • Create/Access an Azure Machine Learning Workspace
  • Create a blank experiment

Scenario:  To get started, you will need to create and log in to a free Azure Machine Learning workspace. A workspace is like an all-inclusive development environment with the tools to create, manage, and publish machine learning models.

  1. Sign In
    If necessary, sign in using the following credentials:  Admin, password:  Pass@w0rd!  On the Network prompt, click No.
  2. Open ML Studio
    Go to the ML Studio website by typing http://studio.azureml.net in the address bar.
  3. Sign In to ML Studio
    Click Sign In on the top right corner of the web page.
  4. Enter Credentials
    Enter the email address and password associated with your Microsoft ID, and click the Sign In button.
  5. Close Welcome Video
    If a Welcome video is displayed upon logging in (this usually happens on the first login), click the X at the top right of the video to close it.
  6. Close Samples
    If the Microsoft Samples dialogue box is displayed (usually displays on the first login), go ahead and close it by choosing the X in the top right corner of the pane.
  7. View Environment
    You are now logged into the free workspace associated with your Microsoft ID.
  8. Create Blank Experiment
    Click the NEW button in the bottom left corner of the page.   Make sure EXPERIMENT is highlighted in the NEW dialogue window, and click the Blank Experiment pane. You are now in the ML Studio.
    Create a Blank Experiment

    Next, we will create our first experiment. An experiment is a collection of data, tasks, and machine learning algorithms that make up a model.
  9. View Canvas
    Notice the Canvas in the center of the screen. This is where you will drag and drop modules and string them together to create a data flow for your experiment.
  10. View Navigation Icons
    The Navigation icons on the far left of the site allow you to browse back to your Workspace.
  11. View Modules Pane
    The Modules pane down the left side of the Canvas lists the individual components that make up your Experiment.
  12. View Properties
    The Properties pane down the right side of the Canvas is where you will configure the properties of the different Modules used in your Experiment.
  13. Edit Title
    At the top of the Canvas, highlight and delete the text that reads Experiment created on…, and replace it with Lab - Intro to Azure Machine Learning.
Congratulations!

You have successfully:
  • Created/Accessed an Azure Machine Learning Workspace
  • Created a blank experiment

Click Continue to advance to the next exercise.

Exercise 3 : Explore and Visualize Data

In this exercise you will:
  • Input sample data
  • Explore the input data

Scenario:  Azure Machine Learning offers several ways to connect to and import data. For this lab, we will work with one of the sample datasets included with Azure Machine Learning.

  1. View Saved Data Sets
    On the Modules panel, click Saved Datasets and then Samples.   This expands all of the sample datasets included in ML Studio.
  2. Select Data
    Scroll until you find MPG data for various automobiles.  Click on the MPG dataset and notice the description also shows up at the bottom of the Properties pane.
  3. Add Dataset to Canvas
    Click and drag the MPG dataset onto the Canvas.
  4. Port
    Notice at the bottom of the MPG dataset module on the Canvas, there is a small circle called a port. Ports on the top of modules are called input ports, and ports on the bottom of modules are output ports. These ports are used to connect modules to one another and to provide a menu of additional options for the module.
  5. Visualize Output Port
    Click the output port at the bottom of the MPG dataset module, and select Visualize from the menu that is displayed.
    Explore the Input Data

    A common task in any advanced analytics workflow is to analyze and profile the data you are working with. The following set of steps highlights some of the ways we can explore and visualize the data we just imported.
  6. View Visualization
    The resulting dialogue box provides the number of rows and columns in the dataset as well as the first 100 rows and first 100 columns in the dataset with a histogram for each column.
  7. Highlight Column
    Click anywhere in the first column, MPG, to highlight the column.
  8. View MPG Information
    Notice on the right side of the dialogue box, there is now information in the Statistics pane and Visualizations pane about MPG (you might need to use the horizontal scroll bar in the dialogue box to scroll all the way to the right if Statistics and Visualizations are not visible).
  9. Edit Visualization
    In the Visualizations pane, change the compare to dropdown box from None to Horsepower.
  10. View Results
    Notice the histogram changed to a ScatterPlot comparing MPG to Horsepower.
  11. Edit Visualization
    Next, change the compare to dropdown option from Horsepower to Model.
  12. View Results
    Notice the resulting chart is now a MultiboxPlot with an MPG boxplot displayed for each of the values in the Model column.
  13. Close Visualization
    Click the X in the top right corner of the Visualize dialogue box to return to the Canvas.
Congratulations!

You have successfully:
  • Input sample data
  • Explored the input data
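
For readers who want to see what this kind of column summary amounts to outside ML Studio, here is a minimal sketch using Python's standard `statistics` module. The values are hypothetical stand-ins for an MPG column, not rows from the sample dataset, and the set of statistics shown is an assumption about what is useful to inspect rather than an exact copy of the Statistics pane.

```python
import statistics

# Hypothetical MPG values for illustration only
# (not taken from the lab's sample dataset).
mpg = [18.0, 15.0, 26.0, 32.0, 24.0, 21.0]

# Basic summary statistics for a single numeric column.
summary = {
    "count": len(mpg),
    "mean": statistics.mean(mpg),
    "median": statistics.median(mpg),
    "min": min(mpg),
    "max": max(mpg),
    "stdev": statistics.stdev(mpg),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```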

Click Continue to advance to the next exercise.

Exercise 4 : Create a Simple Predictive Model

In this exercise you will:
  • Split input data into Train and Test Data Sets
  • Train a Predictive Model
  • Test the Predictive Model
  • Evaluate the test results

Scenario:  Now that we have explored our data, we are ready to create a predictive model. The first thing we will do is split the original dataset into two datasets: one will be used for training a model, and the other will be used for testing it. By testing the model on a separate (test) dataset, we can assess the accuracy of our trained model.
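
The split itself can be sketched in a few lines of Python. This is only an illustration of the idea, assuming a shuffled split with a fixed seed; it is not the Split Data module's implementation. The helper name `split_rows` is hypothetical, and the row identifiers are stand-ins. The row count (392) and the 0.75 fraction are the ones used in the steps of this exercise.

```python
import random


def split_rows(rows, fraction, seed=0):
    """Shuffle rows, then split them into (first, second) by the given fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatable splits
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in row identifiers; the lab's dataset has 392 rows.
rows = list(range(392))
train, test = split_rows(rows, 0.75)

print(len(train), len(test))  # prints "294 98"
```

Every row ends up in exactly one of the two outputs, so the model is never tested on a row it was trained on.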

  1. Initiate Search (Split)
    In the search box at the top of the Modules pane, type the word split. Notice the list of modules has been filtered to show only those relevant to the search term.
    Split Input Data into Train and Test Data Sets

  2. Add Module to Canvas (Split Data)
    Click and drag the Split Data module onto the Canvas anywhere under the MPG dataset.  Notice the Split Data module has 1 input port and 2 output ports. The Properties pane displays values that can be modified for this module.  There is also a description of the module at the bottom of the Properties pane with a (more help…) link.  A page will open with more details about the module and its configurable properties when this link is clicked.
  3. Join Modules (MPG to Split Data)
    Click and drag the output port from the MPG dataset module to the input port of the Split Data module.
  4. Edit Properties
    In the Properties pane, type 0.75 in the Fraction of rows in the first output dataset textbox.  This configures the module to split 75% of the input rows to the left output port, and 25% of the input rows to the right output port.
  5. Run Experiment
    Click RUN at the bottom of the Canvas. 
  6. View Results
    The experiment will now execute each module in order, beginning with the first module in the workflow. When the experiment is done executing, the words Finished running will display in the top right corner of the Canvas. Notice the Split Data module has a green check mark indicating it completed successfully.
  7. Visualize
    Click the left output port on the Split Data module, and select Visualize from the menu that is displayed.
  8. View Results
    Notice only 294 of the original 392 rows (75%) have been routed to the left output port. The remaining 98 rows (25%) have been routed to the right output port.
  9. Close Visualization
    Click the X in the top right corner to close the Visualize dialogue box.
  10. Initiate Search (Train)
    Type train in the search box at the top of the Modules pane.
    Train a Predictive Model

    Next, you will use a common Linear Regression algorithm to train a model that will predict an automobile’s MPG.
  11. Add Module (Train Model)
    Find the Train Model module, and click and drag it onto the Canvas below the Split Data module.
  12. Join Modules (Split Data to Train)
    Connect the left output port from the Split Data module to the right input port on the Train Model module.
  13. Launch Column Selector
    In the Properties pane, click the Launch column selector button.  This launches the Select Column dialogue box. Here, we will select the column we want the model to predict.
  14. Select Column (MPG)
    In the Available Columns list, select MPG.  Click the > button to move MPG to the Selected Columns list.
  15. Save Selection
    Click the checkmark to save the selection and close the dialogue box.
  16. Clear Search
    Clear the search box in the Modules pane.
  17. View Regression Modules
    In the Modules pane, find and click to expand Machine Learning, and then click Initialize Model, and then click Regression.
  18. Add Module (Linear Regression)
    Click and drag the Linear Regression module onto the Canvas just above and to the left of the Train Model module.  Connect the output port of the Linear Regression module to the left input port of the Train Model module.  You might notice there are several parameters that can be modified in the Properties pane for the Linear Regression module. For this lab, we will use the defaults.
  19. Run Experiment
    Click RUN at the bottom of the Canvas to run the experiment and train the model.  The model will be trained to predict the MPG column using the other fields in the dataset with the Linear Regression algorithm.
  20. Initiate Search (Score)
    In the search box at the top of the Modules pane, type the word score.
    Test the Predictive Model

    Next, we will use the test dataset we created to test our newly trained model. This will be accomplished using our new model to predict the MPG for each row in the test dataset.
  21. Add Module (Score Model)
    Find the Score Model module, and click and drag it onto the Canvas under the Train Model module. 
  22. Join Modules (Train to Score)
    Connect the output port on the Train Model module to the left input port on the Score Model module.
  23. Join Modules (Split Data to Score)
    Connect the right output port on the Split Data module to the right input port on the Score Model module.
  24. Run Experiment
    Click RUN at the bottom of the Canvas to run the experiment and score the test dataset with the trained Linear Regression model.
  25. Visualize
    After the experiment has finished running, click the output port on the Score Model module and select Visualize from the displayed menu.
  26. Select Column (Scored Labels)
    In the list of columns, scroll to the right until Scored Labels is visible, and click Scored Labels to select it.  The Scored Labels column represents the predicted MPG for each row in the test dataset. Notice the Statistics pane and histogram in the Visualizations pane on the right side of the Visualize dialogue box.
  27. Change Compare To Value
    In the Visualizations pane, change the compare to dropdown option to MPG.  The resulting ScatterPlot compares the Scored Labels (predicted MPG) with the actual MPG for each row in the test dataset.
  28. Close Visualization
    Click the X in the top right corner to close the Visualize dialogue box.
  29. Initiate Search (Evaluate)
    In the search box at the top of the Modules pane, type the word evaluate.
    Evaluate the Test Results

    Finally, we will evaluate how well the model performed against the test dataset using a set of standard metrics for measuring regression model performance.
  30. Add Module (Evaluate Model)
    Find the Evaluate Model module, and click and drag it onto the Canvas below the Score Model module.  Connect the output port on the Score Model module to the left input port on the Evaluate Model module.
  31. Run Experiment
    Click RUN at the bottom of the Canvas to run the experiment and evaluate the scored results from the trained Linear Regression model.
  32. Visualize
    When the experiment has finished running, click the output port on the Evaluate Model module and select Visualize from the displayed menu. The columns and values in the Visualize dialogue box represent common metrics for evaluating the performance of a Linear Regression model. The metrics are calculated using the results of the Score Model module. Many of the metrics are based on the Error, which is the difference between the Scored Labels (predicted value) and the actual values.
  33. Close Visualization
    Click the X in the top right corner to close the Visualize dialogue box.
  34. Sign Out
    Sign out of your workspace by clicking the profile picture at the top right of the page and selecting Sign Out from the displayed menu.
Congratulations!

You have successfully:
  • Split input data into Train and Test Data Sets
  • Trained a Predictive Model
  • Tested the Predictive Model
  • Evaluated the test results
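
As a rough illustration of error-based metrics like those described in the Visualize step, the sketch below computes three that are widely used for regression: mean absolute error, root mean squared error, and the coefficient of determination. The function name `regression_metrics` and the actual and predicted values are hypothetical, and the exact set of metrics shown by the Evaluate Model module may differ.

```python
import math


def regression_metrics(actual, predicted):
    """Compute common regression metrics from actual and predicted values."""
    # Error = predicted value minus actual value, per row.
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Made-up values for illustration only (not results from the lab).
actual = [20.0, 25.0, 30.0, 35.0]
predicted = [22.0, 24.0, 29.0, 37.0]

metrics = regression_metrics(actual, predicted)
```

Lower error values and a coefficient of determination closer to 1 indicate predictions that track the actual values more closely.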

Click Continue to close and finalize this lab.
