
Using LibRealSense and OpenCV to stream RGB and Depth Data


Introduction 

In this document I will show you how to use LibRealSense and OpenCV to stream RGB and depth data. This article assumes you have already downloaded and installed both LibRealSense and OpenCV, and have them set up properly on Ubuntu. I will be working on Ubuntu 16.04 with the Eclipse Neon IDE, though earlier versions will most likely work fine; Neon just happens to be the version of Eclipse I was using when this sample was created.

In this article I assume that the reader:

  1. Is somewhat familiar with using the Eclipse IDE. The reader should know how to open Eclipse and create a brand new empty C++ project.
  2. Is familiar with C++.
  3. Knows how to get around Linux.
  4. Knows what GitHub is and how to at least download a project from a GitHub repository.

In the end you will have a nice starting point to build upon when creating your own LibRealSense / OpenCV applications.

Conventions 

LRS = LibRealSense. I get tired of writing it out. It’s that simple. So, if you see LRS, you know what it means.

Software Requirements 

Supported Cameras 

  • RealSense R200

In theory all the RealSense cameras (R200, F200, SR300) should work with this code sample; however, it was only tested with the R200.

Setting up the Eclipse Project 

As mentioned, I’m going to assume that the reader is already familiar with opening Eclipse and creating a brand new empty C++ project.

What I would like to show you is the various C++ header and linker settings I used for creating my Eclipse project.

Header file includes 

The following image shows which header directories I’ve included. If you followed the steps for installing LRS, you should have your LibRealSense header files in the proper location. The same goes for OpenCV.

Header file includes

Library file includes 

This image shows the libraries that are needed at runtime: one LRS library and three OpenCV libraries. Again, I’m assuming you have already set up LRS and OpenCV properly.

Library file includes

The main.cpp source code file contents 

Here is the source code for the example application.

/////////////////////////////////////////////////////////////////////////////

// License: Apache 2.0. See LICENSE file in root directory.

// Copyright(c) 2016 Intel Corporation. All Rights Reserved.

//

//

//

/////////////////////////////////////////////////////////////////////////////

// Authors
// * Rudy Cazabon
// * Rick Blacker
//
// Dependencies
// * LibRealSense
// * OpenCV
//
/////////////////////////////////////////////////////////////////////////////
// This code sample shows how you can use LibRealSense and OpenCV to display
// both an RGB stream as well as Depth stream into two separate OpenCV
// created windows.
//
/////////////////////////////////////////////////////////////////////////////

#include <librealsense/rs.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/highgui.hpp>

using namespace std;
using namespace rs;


// Window size and frame rate
int const INPUT_WIDTH      = 320;
int const INPUT_HEIGHT     = 240;
int const FRAMERATE        = 60;

// Named windows
char WINDOW_DEPTH[] = "Depth Image";   // char arrays, not char* const: assigning a string
char WINDOW_RGB[]   = "RGB Image";     // literal to char* is ill-formed in C++11


context      _rs_ctx;
device&      _rs_camera = *_rs_ctx.get_device( 0 );
intrinsics   _depth_intrin;
intrinsics  _color_intrin;
bool         _loop = true;


// Initialize streaming. Returns true if a camera was found and its depth and color streams were enabled and started.

bool initialize_streaming( )
{
       bool success = false;
       if( _rs_ctx.get_device_count( ) > 0 )
       {
             _rs_camera.enable_stream( rs::stream::color, INPUT_WIDTH, INPUT_HEIGHT, rs::format::rgb8, FRAMERATE );
             _rs_camera.enable_stream( rs::stream::depth, INPUT_WIDTH, INPUT_HEIGHT, rs::format::z16, FRAMERATE );
             _rs_camera.start( );

             success = true;
       }
       return success;
}




/////////////////////////////////////////////////////////////////////////////
// If the left mouse button was clicked on either image, stop streaming and close windows.
/////////////////////////////////////////////////////////////////////////////
static void onMouse( int event, int x, int y, int, void* window_name )
{
       if( event == cv::EVENT_LBUTTONDOWN )
       {
             _loop = false;
       }
}


/////////////////////////////////////////////////////////////////////////////
// Create the depth and RGB windows, set their mouse callbacks.
// Required if we want to create a window and have the ability to use it in
// different functions
/////////////////////////////////////////////////////////////////////////////
void setup_windows( )
{
       cv::namedWindow( WINDOW_DEPTH, 0 );
       cv::namedWindow( WINDOW_RGB, 0 );

       cv::setMouseCallback( WINDOW_DEPTH, onMouse, WINDOW_DEPTH );
       cv::setMouseCallback( WINDOW_RGB, onMouse, WINDOW_RGB );
}


/////////////////////////////////////////////////////////////////////////////
// Called every frame; gets the data from the streams and displays it using OpenCV.
/////////////////////////////////////////////////////////////////////////////
bool display_next_frame( )
{

       _depth_intrin       = _rs_camera.get_stream_intrinsics( rs::stream::depth );
       _color_intrin       = _rs_camera.get_stream_intrinsics( rs::stream::color );


       // Create depth image
       cv::Mat depth16( _depth_intrin.height,
                                  _depth_intrin.width,
                                  CV_16U,
                                  (uchar *)_rs_camera.get_frame_data( rs::stream::depth ) );

       // Create color image
       cv::Mat rgb( _color_intrin.height,
                            _color_intrin.width,
                            CV_8UC3,
                            (uchar *)_rs_camera.get_frame_data( rs::stream::color ) );

       // Scale the 16-bit depth values (in millimeters) into the 8-bit 0-255
       // range that imshow() can display; 1000 mm and beyond saturate to 255.
       cv::Mat depth8u = depth16;
       depth8u.convertTo( depth8u, CV_8UC1, 255.0/1000 );

       imshow( WINDOW_DEPTH, depth8u );
       cvWaitKey( 1 );

       cv::cvtColor( rgb, rgb, cv::COLOR_BGR2RGB );
       imshow( WINDOW_RGB, rgb );
       cvWaitKey( 1 );

       return true;
}

/////////////////////////////////////////////////////////////////////////////
// Main function
/////////////////////////////////////////////////////////////////////////////
int main( ) try
{
       rs::log_to_console( rs::log_severity::warn );

       if( !initialize_streaming( ) )
       {
             std::cout << "Unable to locate a camera"<< std::endl;
             rs::log_to_console( rs::log_severity::fatal );
             return EXIT_FAILURE;
       }

       setup_windows( );

       // Loop until someone left clicks on either of the images in either window.
       while( _loop )
       {
             if( _rs_camera.is_streaming( ) )
                    _rs_camera.wait_for_frames( );

             display_next_frame( );
       }


       _rs_camera.stop( );
       cv::destroyAllWindows( );


       return EXIT_SUCCESS;

}
catch( const rs::error & e )
{
       std::cerr << "RealSense error calling "<< e.get_failed_function() << "("<< e.get_failed_args() << "):\n    "<< e.what() << std::endl;
       return EXIT_FAILURE;
}
catch( const std::exception & e )
{
       std::cerr << e.what() << std::endl;
       return EXIT_FAILURE;
}

Source code explained 

Overview 

The structure is pretty simplistic: a single source file containing everything we need for the sample, with the header includes at the top. Because this is a sample application, we are not going to worry too much about “best practices” in defensive software engineering. Yes, we could have better error checking, but the goal here is to make this sample application as easy to read and comprehend as possible.

Constants 

Here you can see various constant values for the width, height, and framerate: basic values that dictate the size of the image we want to stream, the size of the window we display the stream in, and the framerate we want. After that we have two string constants, which are used for naming our OpenCV windows.

Global variables 

While I’m not a fan of global variables per se, in a streaming app such as this I don’t mind bending the rules a little bit. And while simple streaming such as what is in this sample app may not be resource intensive, other things we could bring to the app could be. So if we can squeeze out any performance now, it could be beneficial down the road.

  • _rs_ctx is the LRS context used to return a device (camera). Notice here that we are hard coding getting the first device. There are ways to detect all devices; however, that is out of scope for this article.
  • _rs_camera is the RealSense device (camera) that we are streaming from.
  • _depth_intrin is an LRS intrinsics object that contains information about the current depth frame. In this case we are mostly interested in the size of the image.
  • _color_intrin is an LRS intrinsics object that contains information about the current color frame. In this case we are mostly interested in the size of the image.
  • _loop is simply used to know when to stop processing images. Initially set to true, it is set to false when a user clicks on an image in an OpenCV window.

I want to point out that _depth_intrin and _color_intrin are not strictly necessary. They are not the product of calculations of any type; they are simply used for collecting intrinsic data in the display_next_frame( ) function, making the code easier to read when creating the OpenCV Mat objects. They are global so we don’t have to create these two variables every single frame.

Functions 

main(…)

Obviously, as the name implies, this is the main function. We don’t need any command line parameters, so I’ve chosen not to include any. The first thing that happens shows how you can use LRS to log to the console; here we ask LRS to print any warnings. Next we initialize the camera streams by calling initialize_streaming( ). If no camera is found, we print a message and exit. After that we make a call to setup_windows( ). At this point everything is set up and we can begin streaming, which is done in the while loop: while _loop is true, we check whether the camera is streaming and, if so, wait for the next set of frames, then call display_next_frame( ) to display them.

Once _loop has been set to false, we fall out of the while loop, stop the camera and tell OpenCV to close all its windows. At this point, the app will then quit.

initialize_streaming(…)

This is where we initially set up the camera for streaming. We will have two streams, one depth and one color. The images will be the size specified in the constants; we must also specify the format of each stream and the framerate. For future expansion it might be better to add some kind of error checking/handling here. However, to keep things simple, we have chosen not to do anything fancy and assume the happy path.

setup_windows(…)

This is a pretty easy function to understand. We tell OpenCV to create two new named windows, using the string constants WINDOW_DEPTH and WINDOW_RGB for the names. Once we have created them, we associate the mouse callback function onMouse with each window.

onMouse(…)

onMouse is triggered any time a user clicks on the body of a window, specifically where the image is being displayed. We use this function as an easy way to stop the application. All it does is check whether the event was a left button click; if so, it sets the Boolean flag _loop to false, which causes the code to exit the while loop in the main function.

display_next_frame(…)

This function is responsible for displaying the LRS data in the OpenCV windows. We start off by getting the intrinsic data from the camera. Next we create the depth and RGB OpenCV Mat objects, specifying their dimensions and format and assigning their buffers to the current frame data of the corresponding camera stream: the depth Mat object gets the camera’s depth data, and the color Mat object gets the camera’s color data.

The next thing we do is create a new Mat object, depth8u, and convert it to perform a scaling into the 0-255 range required by OpenCV’s imshow() function, which cannot display 16-bit depth images. The scale factor 255.0/1000 maps depths from 0 to 1000 millimeters onto 0-255; anything farther than a meter saturates to 255.

Once we have converted the depth image, we display it using the OpenCV function imshow. We tell it which named window to use via the WINDOW_DEPTH constant and give it the depth image. cvWaitKey( 1 ) tells OpenCV to pause briefly to allow other processing, such as key presses, to take place. After the depth window we move on to the color/RGB window. cvtColor swaps the Mat rgb object’s channel order from the camera’s RGB to the BGR order OpenCV expects for display. Once that has completed, we show the image and call cvWaitKey again.

Wrap up 

In this article, I’ve attempted to show you just how easy it is to stream data from a RealSense camera using the LibRealSense open source library and display it into a window using OpenCV. While this sample is simple, it does help form a base application from which you can create more complex applications using OpenCV.


OpenCL™ Drivers and Runtimes for Intel® Architecture


What to Download

By downloading a package from this page, you accept the End User License Agreement.

Installation has two parts:

  1. Intel® SDK for OpenCL™ Applications Package
  2. Driver and library(runtime) packages

The SDK includes components to develop applications. Usually the driver/runtime package is also installed on a development machine for testing. For deployment you can pick the package that best matches the target environment.

The illustration below shows some example install configurations. 

 

SDK Packages

Please note: a GPU/CPU driver package or CPU-only runtime package is required in addition to the SDK to execute applications.

Standalone:

Suite: (also includes driver and Intel® Media SDK)

 

 

Driver/Runtime Packages Available

GPU/CPU Driver Packages

CPU-only Runtime Packages  

Deprecated 

 


Intel® SDK for OpenCL™ Applications 2016 R2 for Linux* (64 bit)

This is a standalone release for customers who do not need integration with the Intel® Media Server Studio (MSS). It provides components to develop OpenCL applications for Intel processors.

Visit https://software.intel.com/en-us/intel-opencl to download the version for your platform. For details check out the Release Notes.

Intel® SDK for OpenCL™ Applications 2016 R2 for Windows* (64 bit)

This is a standalone release for customers who do not need integration with the Intel® Media Server Studio (MSS). The Windows* graphics driver contains the driver and runtime library components necessary to run OpenCL applications. This package provides components for OpenCL development.

Visit https://software.intel.com/en-us/intel-opencl to download the version for your platform. For details check out Release Notes.


OpenCL™ 2.0 GPU/CPU driver package for Linux* (64-bit)

The Intel intel-opencl-r3.0 (SRB3) Linux driver package provides access to the GPU and CPU components of these processors:

  • Intel® 5th, 6th or 7th Generation Core™
  • Intel Pentium J4000 and Intel Celeron J3000
  • Intel® Xeon® v4, or Intel® Xeon® v5 Processors with Intel® Graphics Technology (if enabled by OEM in BIOS and motherboard)

Installation instructions.

Intel has validated this package on CentOS 7.2 for the following 64-bit kernels.

  • Linux 4.7 kernel patched for OpenCL 2.0

Supported OpenCL devices:

  • Intel Graphics (GPU)
  • CPU

For detailed information please see the driver package Release Notes.

 

The intel-opencl-2.0-2.0 driver for Linux is an intermediate release preceding Intel® SDK for OpenCL™ Applications 2016 R2 for Linux*. It provides access to the general-purpose, parallel compute capabilities of Intel® graphics for OpenCL applications as a standalone package.

Intel has validated the intel-opencl-2.0-2.0 driver on CentOS 7.2 for the following 64-bit kernels.

  • Linux 4.4 kernel patched for OpenCL 2.0

Supported OpenCL devices:

  • Intel Graphics (GPU)
  • CPU

For detailed information please see the Release Notes.

For Linux drivers covering earlier platforms such as 4th Generation Core please see the versions of Media Server Studio in the Driver Support Matrix.


OpenCL™ Driver for Intel® Iris™ and Intel® HD Graphics for Windows* OS (64-bit and 32-bit)

The Intel® Graphics driver includes components needed to run OpenCL* and Intel® Media SDK applications on processors with Intel® Iris™ Graphics or Intel® HD Graphics on Windows* OS.

You can use the Intel Driver Update Utility to automatically detect and update your drivers and software.  Using the latest available graphics driver for your processor is usually recommended.


See also Identifying your Intel® Graphics Controller.

Supported OpenCL devices:

  • Intel Graphics (GPU)
  • CPU

For the full list of Intel® Architecture processors with OpenCL support on Intel Graphics under Windows*, refer to the Release Notes.

 


OpenCL™ Runtime for Intel® Core™ and Intel® Xeon® Processors

This runtime software package adds OpenCL CPU device support on systems with Intel Core and Intel Xeon processors.

Supported OpenCL devices:

  • CPU

Latest release (16.1.1)

Previous Runtimes (16.1)

Previous Runtimes (15.1):

For the full list of supported Intel Architecture processors, refer to the OpenCL™ Runtime Release Notes.

 


 Deprecated Releases

Note: These releases are no longer maintained or supported by Intel

OpenCL™ Runtime 14.2 for Intel® CPU and Intel® Xeon Phi™ Coprocessors

This runtime software package adds OpenCL support to Intel Core and Xeon processors and Intel Xeon Phi coprocessors.

Supported OpenCL devices:

  • Intel Xeon Phi Coprocessor
  • CPU

Available Runtimes

For the full list of supported Intel Architecture processors, refer to the OpenCL™ Runtime Release Notes.

What's New? - Intel® VTune™ Amplifier XE 2017 Update 1


Intel® VTune™ Amplifier XE 2017 performance profiler

A performance profiler for serial and parallel performance analysis. Overview, training, support.

New for the 2017 Update 1! (Optional update unless you need...)

As compared to the 2017 initial release:

  • Support for the Average Latency metric in the Memory Access analysis based on the driverless collection
  • Support for locator hardware event metrics for the General Exploration analysis results in the Source/Assembly view that enable you to filter the data by a metric of interest and identify performance-critical code lines/instructions
  • Command line summary report for the HPC Performance Characterization analysis extended to show metrics for CPU, Memory and FPU performance aspects including performance issue descriptions for metrics that exceed the predefined threshold. To hide issue descriptions in the summary report, use a new report-knob show-issues option.
  • Summary view of the General Exploration analysis extended to explicitly display measures for the hardware metrics: Clockticks vs. Pipeline Slots
  • GPU Hotspots analysis extended to detect the hottest computing tasks bound by GPU L3 bandwidth
  • PREVIEW: New Full Compute event group added to the list of predefined GPU hardware event groups collected for Intel® HD Graphics and Intel® Iris™ Graphics. This group combines metrics from the Overview and Compute Basic presets and allows you to see all detected GPU stalled/idle issues in the same view.
  • Support for hotspot navigation and filtering of stack sampling analysis data by the Total type of values in the Source/Assembly view

Resources

  • Learn (“How to” videos, technical articles, documentation, …)
  • Support (forum, knowledgebase articles, how to contact Intel® Premier Support)
  • Release Notes (pre-requisites, software compatibility, installation instructions, and known issues)

Contents

File: vtune_amplifier_xe_2017_update1.tar.gz

Installer for Intel® VTune™ Amplifier XE 2017 for Linux* Update 1 

File: VTune_Amplifier_XE_2017_update1_setup.exe

Installer for Intel® VTune™ Amplifier XE 2017 for Windows* Update 1 

File: vtune_amplifier_xe_2017_update1.dmg

Installer for Intel® VTune™ Amplifier XE 2017 - OS X* host only Update 1 

* Other names and brands may be claimed as the property of others.

Microsoft, Windows, Visual Studio, Visual C++, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

Running Intel® Parallel Studio XE Analysis Tools on Clusters with Slurm* / srun


Since HPC applications target high performance, users are interested in analyzing the runtime performance of such applications. In order to get a representative picture of that performance and behavior, it can be important to gather analysis data at the same scale as regular production runs. Doing so, however, would imply that shared-memory-focused analysis types run on each individual node of the job in parallel. This might not be in the user’s best interest, especially since the behavior of a well-balanced MPI application should be very similar across all nodes. Therefore, users need the ability to run individual shared-memory-focused analysis types on subsets of MPI ranks or compute nodes.

 

There are multiple ways to achieve this, e.g. through

  1. Separating environments for different ranks through the MPI runtime arguments
  2. MPI-library-specific environments for analysis tool attachment, like “gtool” for the Intel® MPI Library
  3. Batch scheduler parameters that allow separating the environments for different MPI ranks

 

In this article, we want to focus on the third option by using the Slurm* workload manager, which allows us to stay independent of the MPI library implementation being utilized.

The Slurm batch scheduler comes with a job submission utility called srun. A very simple srun job submission could look like the following:

$ srun ./my_application

Now, attaching analysis tools from the Intel® Parallel Studio XE suite, such as Intel® VTune™ Amplifier XE, Intel® Inspector XE, or Intel® Advisor XE, could look like the following:

$ srun amplxe-cl -c hotspots -r my_result_1 -- ./my_application

The downside of this approach, however, is that the analysis tool - VTune in this case – will be attached to each individual MPI rank. Therefore, the user will get at least as many result directories as there are shared memory nodes within the run.

If the user is only interested in analyzing a subset of MPI ranks or shared memory nodes, they can leverage srun’s multiple program configuration. To do so, the user creates a separate configuration file that defines which MPI ranks will be analyzed:

$ cat > srun_config.conf << EOF
0-98    ./my_application
99      amplxe-cl -c hotspots -r my_result_2 -- ./my_application
100-255 ./my_application
EOF

As one can see from this example configuration, the user runs the target application across 256 MPI ranks, where only the 100th MPI process (i.e., rank #99) will be analyzed with VTune while all other ranks remain unaffected.

Now, the user can execute srun leveraging the created configuration file by using the following command:

$ srun --multi-prog ./srun_config.conf

This way, only one result directory for rank #99 will be created.
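Ranges in the configuration file make it just as easy to sample more than one rank. For example, a sketch of a configuration (the result directory names here are hypothetical) that profiles the first rank of every 64-rank block in the same 256-rank run:

```
0       amplxe-cl -c hotspots -r my_result_r0   -- ./my_application
1-63    ./my_application
64      amplxe-cl -c hotspots -r my_result_r64  -- ./my_application
65-127  ./my_application
128     amplxe-cl -c hotspots -r my_result_r128 -- ./my_application
129-191 ./my_application
192     amplxe-cl -c hotspots -r my_result_r192 -- ./my_application
193-255 ./my_application
```

Run with srun --multi-prog as before; each sampled rank then writes its own result directory.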


MeritData Speeds Up a Big Data Platform


Being able to analyze massive quantities of data is more important than ever before in today’s data-driven world. Chinese company MeritData helps its customers explore and exploit the value in their data using data analysis algorithms and other powerful tools for data processing, mining, and visualization.

To keep at the top of its game, MeritData has to ensure its data mining algorithms are as efficient as possible. And to do that, MeritData turned to Intel.  Intel worked with MeritData’s algorithm engineers to optimize the company’s multiple data mining algorithms using Intel® Data Analytics Acceleration Library (Intel® DAAL) and Intel® Math Kernel Library (Intel® MKL). The result was average performance improvements ranging from 3x all the way to 14x.

“Through close collaboration with Intel engineers, we adopted the Intel® Data Analytics Acceleration Library and Intel® Math Kernel Library for algorithm optimization in our big data analysis platform (Tempo*),” explained Jin Qiang, data mining algorithm architect at MeritData. “The performance―and customers’ experience―is improved significantly. We really appreciate the collaboration with Intel, and are looking forward to more collaboration.”

Get the whole story in our new case study.

An Artificial Intelligence Primer for Developers


Download Document [PDF 453K]

Computer scientists have been pursuing Artificial Intelligence (AI) for over 60 years. While the term has meant different things over the decades, recent advances have brought us closer to achieving machine intelligence than ever before.

Developers who are just learning about AI and machine learning will have many questions, from “What can I do with AI?” or “Why would AI be a programming solution?” to “What is necessary to say my program is learning?” and “What level of machine interaction is necessary to make it seem intelligent?” or even “Does my program need to appear intelligent to be intelligent?”

Introduction

Artificial intelligence (AI) is both a problem and a solution. It is a rapidly growing field of inquiry that could solve deep complex problems such as medical diagnosis or undersea mining. It can also give us fun solutions such as valued competition in video games. Developers not only develop the intelligence but also help mold it to solve problems.

Artificial Intelligence (AI) in the World

AI is a truly massive revolution in computing. It is fundamental in all kinds of computing fields, such as gaming, robotics, medicine, transportation, and Internet of Things (IoT). And it’s happening at a depth that will transform society in much the same ways as the industrial revolution, the technical revolution, and the digital revolution altered every aspect of daily life.

Even though AI has been a part of computing for many decades, the prospects of what AI can do, of what we can do with AI, still have us at the beginning of the field. Here are a few examples of what it will enable as the technology matures.

AI will accelerate how we answer large-scale problems that would otherwise take months, years, or decades to resolve. Medical treatments such as drugs or other interventions will be personalized at the level of an individual’s DNA. Intelligent assistants will forestall mistakes and open new opportunities by providing real-time guidance about the world around us. In commerce, it will be much easier to detect—and in some cases even eliminate—fraud.

AI will unleash new scientific discovery. No longer restricted by human biology and cognitive methods, scientists will be able to mine new insights in the realms of the deep sea and space, the animal and insect kingdoms, particle physics, mysteries of the brain, and more.

AI will augment our human capabilities. A new symbiosis between human and machine will expand our capacity, so that medical diagnoses can be more precise, legal counsel can encompass the entire history of case law, and other services will achieve unprecedented levels of accuracy.

AI will remove the burden of tedious or dangerous tasks, such as driving, firefighting, and mining. We are already seeing the early stages of this field with autonomous cars.

Artificial Intelligence Improved with Intel

At Intel, the ideas of AI are not tied to levels of capability. Rather than seeing AI as the end result, the ability to define human understanding, Intel sees AI as a computational tool for solving human problems. Rather than defining what it takes for an intelligence to be human, or what minimum tests must be passed to attain a threshold of “intelligence,” AI runs in a cycle of Sense, Reason, Act, Adapt. The input (Sense) is analyzed and a result formulated (Reason). Based on this, the proper action is chosen (Act), and the results are then used to improve how input is gathered and selected and how the calculations on it are made (Adapt).

Rather than go into the different ways to determine whether a machine has a human level of intelligence, the four-step cycle used at Intel is all you need to guide your programming toward an AI solution. In addition to the methodology, Intel of course offers computing technologies that speed up the complex calculations AI requires.

What is Artificial Intelligence?

It may be good to call AI a solution, but the question still hides in the background: how do we know when it’s intelligent? A number of tests have been developed to “tell” whether an AI can exhibit intelligent behavior indistinguishable from a human’s. The most famous of these is the Turing Test. In daily use cases, though, whether an intelligence is equivalent to our own is an academic point.

While that question matters for understanding what intelligence means, there is a more practical consideration: if the AI solves our problem, that is what is important. So no matter the technique we use to create and harness AI, or the scope to which AI is employed, the intelligence must be able to sense, reason, and act, then adapt based on experience.

Sensing requires the AI to identify and/or recognize meaningful concepts or objects in the midst of vast pools of data. Is this a tumor or normal tissue? Is this a stoplight or a neon sign, and if it’s a stoplight, is it green, yellow, or red?

Reasoning requires that the AI understands a larger context and make a plan to achieve a goal. If the goal is to make a differential diagnosis, then the machine must consider the patient’s reported symptoms, DNA profile, medical history, and environmental influence in addition to the findings from imaging and lab tests. If the goal is to avoid a vehicle collision, the AI must calculate the likelihood of a crash based on vehicle behaviors, proximity, speed, and road conditions.

Acting means that the AI either recommends or directly initiates the best course of action to maximize the desired outcome. Based on a diagnosis, it may recommend or perform a treatment protocol. Based on vehicle and traffic analyses, it may brake, accelerate, or prepare safety mechanisms.

Finally, we must be able to adapt algorithms (both within the AI and as part of the computing system the AI resides in) at each phase based on experience, retraining them to be ever more intelligent in their inferences. Healthcare algorithms should be re-trained to detect disease with more accuracy, better grasp context, and improve treatment based on previous outcomes. Autonomous vehicle algorithms should be re-trained to recognize more blind spots, factor new variables into the context, and adjust actions based on previous incidents.

Today, the greatest ability lies in the “sense” phase, while progress continues to be made in both reasoning and action. The majority of techniques used involve mathematical or statistical algorithms, including regression, decision trees, graph theory, classification, clustering, and many more. However, an emerging algorithm in deep learning is growing rapidly, harnessing deep neural networks that simulate the basic function of neurons in the human brain.

The Market for Artificial Intelligence

It might be easy to see that there are huge areas that benefit from AI. Cancer research, space exploration, and self-driving vehicles are a few fields. But they seem so overwhelming that a person getting started in AI might feel like they can’t contribute, let alone make a difference. But AI is used in places you don’t expect, which is why it’s such a broad, useful tool. You can work on non-player characters in games, on predictive route-finding applications, even on sheepherding robots. There is no limit to the possibilities of AI.

Business Interest in Artificial Intelligence

In previous years, businesses have not been as willing to invest in AI because of the large research costs involved. But this has changed. Business leaders, from CTOs to CFOs to CEOs, recognize the utility and even necessity of AI as a solution.

In 2014, more than USD 300 million was invested in AI startups, up 20 times from USD 15 million in 2010,¹ and the global robotics and AI markets are estimated to grow to USD 153 billion by 2020. The market for AI systems in healthcare alone is estimated to grow from USD 633 million in 2014 to more than USD 6 billion by 2021,¹ and by 2020, autonomous software will participate in 5 percent of all economic transactions.² Companies are putting more and more effort into AI R&D and products; your input will only help.

But What Field Can I Work In?

It’s understandable to see the amount of interest in AI and how opportunities are growing. But what kinds of fields are actually using AI? Maybe you aren’t interested in gaming, and maybe you want to grow in a field that isn’t in academic research. What else can you do? Here is a small list of fields and how AI is growing in them:

Healthcare

  • Image analysis – Medical startups are pursuing technology that will help read X-rays, MRIs, CAT scans, and more.
  • Dulight* – This is a wearable that identifies food, money, and more for the visually impaired.

Automotive

  • Self-driving cars – AI helps autonomous cars recognize road signs, people, and other vehicles.
  • Infotainment – Improved speech recognition helps drivers better engage with music, maps, and more.

Industrial

  • Repairs and maintenance – AI systems can anticipate repairs and improve preventative maintenance.
  • Precision agriculture – AI can help improve food production with efficient fertilization methods and time-to-market.
  • Sales and time-to-market – AI can predict which products will be sold faster or in more volumes in different areas at different times of year, and which times it would be more efficient to keep them in stock or have them drop-shipped to customers.

Sports

  • Performance optimization – AI systems can help coach athletes’ conditioning and nutrition, and improve their skills.
  • Injury prevention – Better equipment design, improved play calling, and even predicting rule changes needed for player safety.

Even from this brief, incomplete list you can see how many opportunities are available. AI can be used to improve lives in so many ways. How is up to you.

Artificial Intelligence – Driven by Intel

Intel is not merely invested in the growth of AI, we are committed to fueling the AI revolution. AI is a top priority for Intel and we’re committed to leading the charge, both through our own R&D and through acquisitions. Our innovation and integration of capabilities into the CPU, driven by Moore’s Law, will continue to deliver the best possible results for performance, efficiency, density, and cost effectiveness. Also, we have a long history of successfully executing technology shifts driven by groundbreaking technologies, including breakthroughs in memory, graphics, I/O, and wireless, and we have in place today the toolbox and unique capabilities needed for the transformation to AI.

First, Intel is compressing the AI innovation cycle in bold new ways. We’ve acquired the best deep-learning talent and technology on the planet, Nervana*, which will not only accelerate AI data ingestion and model building, but also deliver a substantial training performance gain versus GPUs next year through integrating the Nervana technology into the CPU.

Second, as AI becomes pervasive in applications from datacenters to the IoT, Intel has the unique, complete portfolio to deliver end-to-end AI solutions.

Finally, Intel has the experience of successfully leading past transformations from the client/server model, to server virtualization, to the rise of the cloud.

Intel can offer crucial technologies to drive the AI revolution, but we must work together as an industry – and as a society – to achieve the ultimate potential of AI. To that end, Intel leads the charge for open data exchanges and initiatives, easier-to-use tools, training to broaden the talent pool, and equal access to intelligent technology. We entered a partnership with Data.gov in an open data initiative. An open car collaboration with BMW will reduce duplicated effort and accelerate innovation, with society playing a key role.

Intel is committed to compressing the innovation cycle from conception to deployment of ever more intelligent, robust, and cooperative AI agents, through breakthroughs in data ingestion and the building, training, and deployment of models. These AI capabilities will be driven by a portfolio of powerful technology solutions.

Conclusion

AI is rapidly transforming industries and is an increasingly important source of competitive advantage. To maintain a leadership position in your field, this is the best time to begin integrating AI into your products, services, and your business processes. Visit the Intel Software Developer Zone for Artificial Intelligence: https://software.intel.com/en-us/machine-learning to get started today.

Notes

  1. Clark, Jack. “I'll Be Back: The Return of Artificial Intelligence,” Bloomberg Technology, February 2015. http://www.bloomberg.com/news/articles/2015-02-03/i-ll-be-back-the-return-of-artificial-intelligence
  2. Gartner Press Release. “Gartner Reveals Top Predictions for IT Organizations and Users for 2016 and Beyond,” October 6, 2015. http://gartner.com/newsroom/id/3143718

Preparing for the 2016 HPC Developer conference Python Lab


Here are the steps you need to perform to prepare for the HPC Developer Conference Python Lab.

As part of Python profiling we will be using Intel® VTune™ Amplifier 2017. You can get a free evaluation copy of VTune Amplifier using the following link: https://software.intel.com/en-us/intel-vtune-amplifier-xe/try-buy

Click on either Windows* trial or Linux* trial

You also need a version of Python running on your system. We currently support Python 2.7 and Python 3.5. If you would like to use the Intel version you can get a free copy using the following link: https://software.intel.com/en-us/intel-distribution-for-python

Click on Download free.
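Once Python is installed, a quick sanity check from the shell confirms that the interpreter on your PATH matches the supported versions. This is a minimal sketch; substitute python for python3 if you installed a 2.7 build.

```shell
# Report the interpreter's major.minor version and whether it matches
# the versions supported for the lab (Python 2.7 or 3.5).
ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
case "$ver" in
  2.7|3.5) echo "lab-supported version: $ver" ;;
  *)       echo "found Python $ver (lab supports 2.7 and 3.5)" ;;
esac
```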


What's New? Intel® Threading Building Blocks 2017 Update 2


The updated version contains several bug fixes compared to the previous Intel® Threading Building Blocks (Intel® TBB) 2017 release. You can find information about the new features of previous releases under the following links.

Obsolete

Removed the long-outdated support for Xbox* consoles.

Bugs fixed:

  • Fixed the issue with task_arena::execute() not being processed when the calling thread cannot join the arena.
  • Fixed dynamic memory allocation replacement failure on macOS* 10.12.
  • Fixed dynamic memory allocation replacement failures on Windows* 10 Anniversary Update.
  • Fixed emplace() method of concurrent unordered containers to not require a copy constructor.

You can download the latest Intel TBB version from http://threadingbuildingblocks.org and https://software.intel.com/en-us/articles/intel-tbb


Intel® Deep Learning SDK Release Notes

Recipe: Building NAMD on Intel® Xeon® and Intel® Xeon Phi™ Processors


Purpose

This recipe describes a step-by-step process for getting, building, and running the NAMD (Scalable Molecular Dynamics) code on Intel® Xeon Phi™ processors and Intel® Xeon® E5 processors for better performance.

Introduction

NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecule systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.

NAMD is distributed free of charge with source code. You can build NAMD yourself or download binaries for a wide variety of platforms. Find the details below of how to build on Intel® Xeon Phi™ processor and Intel® Xeon® E5 processors and learn more about NAMD at http://www.ks.uiuc.edu/Research/namd/

Building NAMD on Intel® Xeon® Processor E5-2697 v4 (BDW) and Intel® Xeon Phi™ Processor 7250 (KNL)

  1. Download the latest NAMD source code (Nightly Build) from this site: http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD
  2. Download fftw3 from this site: http://www.fftw.org/download.html
    • Version 3.3.4 is used in this run
  3. Build fftw3:
    1. cd <path>/fftw3.3.4
    2. ./configure --prefix=$base/fftw3 --enable-single --disable-fortran CC=icc
    3. make CFLAGS="-O3 -xMIC-AVX512 -fp-model fast=2 -no-prec-div -qoverride-limits" clean install
       (use -xMIC-AVX512 for KNL or -xCORE-AVX2 for BDW)
  4. Download charm++* version 6.7.1
  5. Build multicore version of charm++:
    1. cd <path>/charm-6.7.1
    2. ./build charm++ multicore-linux64 iccstatic --with-production "-O3 -ip"
  6. Build BDW:
    1. Modify the Linux-x86_64-icc.arch to look like the following:
      NAMD_ARCH = Linux-x86_64
      CHARMARCH = multicore-linux64-iccstatic
      FLOATOPTS = -ip -xCORE-AVX2 -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
      CXX = icpc -std=c++11 -DNAMD_KNL
      CXXOPTS = -static-intel -O2 $(FLOATOPTS)
      CXXNOALIASOPTS = -O3 -fno-alias $(FLOATOPTS) -qopt-report-phase=loop,vec -qopt-report=4
      CXXCOLVAROPTS = -O2 -ip
      CC = icc
      COPTS = -static-intel -O2 $(FLOATOPTS)
    2.  ./config Linux-x86_64-icc --charm-base <charm_path> --charm-arch multicore-linux64-iccstatic --with-fftw3 --fftw-prefix <fftw_path> --without-tcl --charm-opts --verbose
    3. gmake -j
  7. Build KNL:
    1. Modify the arch/Linux-KNL-icc.arch to look like the following:
      NAMD_ARCH = Linux-KNL
      CHARMARCH = multicore-linux64-iccstatic
      FLOATOPTS = -ip -xMIC-AVX512 -O3 -g -fp-model fast=2 -no-prec-div -qoverride-limits -DNAMD_DISABLE_SSE
      CXX = icpc -std=c++11 -DNAMD_KNL
      CXXOPTS = -static-intel -O2 $(FLOATOPTS)
      CXXNOALIASOPTS = -O3 -fno-alias $(FLOATOPTS) -qopt-report-phase=loop,vec -qopt-report=4
      CXXCOLVAROPTS = -O2 -ip
      CC = icc
      COPTS = -static-intel -O2 $(FLOATOPTS)
    2. ./config Linux-KNL-icc --charm-base <charm_path> --charm-arch multicore-linux64-iccstatic --with-fftw3 --fftw-prefix <fftw_path> --without-tcl --charm-opts --verbose
    3. gmake -j
  8. Change the kernel setting for KNL: “nmi_watchdog=0 rcu_nocbs=2-271 nohz_full=2-271”
  9. Download apoa and stmv workloads from here: http://www.ks.uiuc.edu/Research/namd/utilities/
  10. Change next lines in *.namd file for both workloads:
    	numsteps         1000
            outputtiming     20
            outputenergies   600
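The workload edits in step 10 can be scripted. The sed sketch below is demonstrated on a throwaway sample file; in practice, point $f at the apoa1.namd or stmv.namd file (and back up the original first).

```shell
# Apply the numsteps/outputtiming/outputenergies settings with sed,
# shown here on a temporary sample file.
f=$(mktemp)
printf 'numsteps 500\noutputtiming 100\noutputenergies 100\n' > "$f"
sed -i -e 's/^numsteps .*/numsteps         1000/' \
       -e 's/^outputtiming .*/outputtiming     20/' \
       -e 's/^outputenergies .*/outputenergies   600/' "$f"
cat "$f"
```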

Run NAMD workloads on Intel® Xeon® Processor E5-2697 v4 and Intel® Xeon Phi™ Processor 7250

Run BDW (ppn = 72):

           $BIN +p $ppn apoa1/apoa1.namd +pemap 0-($ppn-1)

Run KNL (ppn = 136, MCDRAM in flat mode, similar performance in cache mode):

           numactl -m 1 $BIN +p $ppn apoa1/apoa1.namd +pemap 0-($ppn-1)
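Note that 0-($ppn-1) above is shorthand for the core map; in an actual shell script the upper bound needs arithmetic expansion. A dry-run sketch for the KNL invocation, with $BIN standing in for the path to your namd2 binary:

```shell
# Build the +pemap range from the process count and print the command
# that would be run (drop the echo to execute for real).
ppn=136
pemap="0-$((ppn - 1))"
echo numactl -m 1 '$BIN' +p "$ppn" apoa1/apoa1.namd +pemap "$pemap"
```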

Performance results reported in Intel® Salesforce repository

(ns/day; higher is better):

Workload | Intel® Xeon® Processor E5-2697 v4 (ns/day) | Intel® Xeon Phi™ Processor 7250 (ns/day) | KNL vs. 2S BDW (speedup)
stmv     | 0.45 | 0.55 | 1.22x
apoa1    | 5.5  | 6.18 | 1.12x

Systems configuration:

Processor | Intel® Xeon® Processor E5-2697 v4 (BDW) | Intel® Xeon Phi™ Processor 7250 (KNL)
Stepping | 1 (B0) | 1 (B0) Bin1
Sockets / TDP | 2S / 290W | 1S / 215W
Frequency / Cores / Threads | 2.3 GHz / 36 / 72 | 1.4 GHz / 68 / 272
DDR4 | 8x16 GB 2400 MHz (128 GB) | 6x16 GB 2400 MHz
MCDRAM | N/A | 16 GB Flat
Cluster/Snoop Mode/Mem Mode | Home | Quadrant/flat
Turbo | On | On
BIOS | GRRFSDP1.86B0271.R00.1510301446 | GVPRCRB1.86B.0010.R02.1608040407
Compiler | ICC-2017.0.098 | ICC-2017.0.098
Operating System | Red Hat* Enterprise Linux* 7.2 (3.10.0-327.e17.x86_64) | Red Hat Enterprise Linux 7.2 (3.10.0-327.22.2.el7.xppsl_1.4.1.3272._86_64)


Intel® Advisor 2017 Update 1 What’s new


We’re pleased to announce a new version of the Vectorization Assistant tool: Intel® Advisor 2017 Update 1. For details about downloads, terms, and conditions, please refer to the Intel® Parallel Studio XE 2017 program site.

Below are highlights of the new functionality in Intel Advisor 2017 Update 1.

Cache-aware roofline modeling: To enable this preview feature, set the environment variable ADVIXE_EXPERIMENTAL=roofline before launching the Intel Advisor.
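For example, on Linux the preview can be enabled for the current shell session before starting the tool (advixe-gui and advixe-cl are the usual Advisor launchers; adjust the names if your installation differs):

```shell
# Enable the cache-aware roofline preview for this shell session.
export ADVIXE_EXPERIMENTAL=roofline
echo "ADVIXE_EXPERIMENTAL=$ADVIXE_EXPERIMENTAL"
# advixe-gui &    # then launch Intel Advisor from the same shell
```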

Analysis workflow:

  • Intel® Math Kernel Library (Intel® MKL) support: Intel Advisor results now show Intel MKL function calls.
  • Improved FLOPs analysis performance.
  • Decreased Survey analysis overhead for the Intel® Xeon Phi™ processor.
  • New category for instruction mix data: compute with memory operands.
  • Finalize button for “no-auto-finalize” results, when the result is finalized on a separate machine.
  • MPI support in the command line dialog box.

Recommendations:

  • Recommendations display in Refinement Reports.
  • New recommendation: Vectorize call(s) to virtual method.
  • Cached recommendations in result snapshots to speed up display.

Memory analysis:

  • Ability to track refinement analysis progress: you can stop collection once every site has executed at least once.

Get Intel Advisor and more information

Visit the product site, where you can find videos and tutorials. Register for Intel® Parallel Studio XE 2017 to download the whole bundle, including Intel Advisor 2017 update 1.

Understanding and Harnessing the Capabilities of Intel® Xeon Phi™ Processor (Code Named Knights Landing) Lab - HPC Developer Conference 2016


At the 2016 HPC Developer Conference in Salt Lake City, we will be running a lab entitled Understanding and Harnessing the Capabilities of Intel® Xeon Phi™ Processor (Code Named Knights Landing). To maximize the benefit from this lab, we are asking all attendees to meet certain requirements and are providing some recommendations.

Requirements:

  • A laptop or similar portable computer with wireless connectivity.
  • A modern SSH client installed on the laptop, such as PuTTY* 0.66 or later for Windows*, iTerm2* for OS X*, or OpenSSH* 5.3 for Linux*.
  • Firewall configuration to allow SSH functionality.

Recommended:

  • VNC client/viewer program such as VNC Viewer* installed on the laptop for running graphical applications remotely.
  • Intel® Parallel Studio XE Cluster Edition 2017 installed on the laptop to use the tools locally.
  • Basic familiarity with Intel® C/C++ Compiler, Intel® Trace Analyzer and Collector, Intel® VTune™ Amplifier XE, and Intel® Advisor.
    • These are components of Intel® Parallel Studio XE and will be used in the lab.

We expect all attendees to have completed the required steps before the lab begins; we will not provide support for installing the required tools during the lab. We will provide access to an Intel-owned cluster for use during the lab, along with assistance in connecting to and using it.

HPC Applications for Supercomputing 2016

Tutorial: Intel® IoT Gateway, Industrial Oil & Gas Pressure Sensor, and AWS* IoT


In this use case tutorial we'll use an Intel® NUC and Intel® IoT Gateway Developer Hub to interface an industrial fluid/gas pressure sensor to AWS* IoT running in the Amazon Web Services* Cloud. The application software we'll develop will control the pressure sensor and continuously transmit pressure measurements to AWS IoT where the data will be stored, processed and evaluated in the cloud.

The pressure sensor is an Omega* PX409-USBH which is a high speed industrial pressure transducer with a USB interface. The sensor is available in a number of different sensing configurations and pressure ranges from vacuum to 5,000 PSI. For this tutorial we'll use model PX409-150GUSBH which is a gage pressure transducer with a measurement range of 0 to 150 PSI. It's designed to connect to piping using a ¼-18 NPT threaded pipe fitting.

We'll connect the sensor to a pressurized water line with a secondary gauge for visual inspection of the pressure. Figure 1 shows the sensor configuration and fittings. (1) is the pressure sensor; (2) is the sensor USB cable that will connect to the Intel® NUC; (3) is a secondary gauge for comparison purposes; (4) and (5) are inlet and outlet connections along with control valves for changing the pressure.

Figure 1. The Omega PX409-USBH industrial pressure sensor

Setup Prerequisites

  1. Intel® NUC powered up and connected to a LAN network with Internet access and a development laptop or workstation for logging into the Intel® NUC.
  2. Intel® IoT Gateway Developer Hub running on the Intel® NUC and software updates applied.
  3. Package "packagegroup-cloud-aws" installed on the Intel® NUC.
  4. An active account on Amazon Web Services and familiarity with the AWS console, AWS IoT, and AWS Elasticsearch Service.

Connect Pressure Sensor to the Intel® NUC

  1. Connect the pressure sensor USB cable into the USB port on the front of the Intel® NUC. After connecting the sensor the Intel® NUC will automatically assign a TTY device such as /dev/ttyUSB0 or /dev/ttyUSB1. The exact name will vary depending on whether you’ve connected other USB devices to the Intel® NUC.
  2. Find the sensor’s device name by logging into the Intel® NUC over the LAN network using ssh. You’ll need to know the Intel® NUC’s IP address assigned on the LAN. For this example the IP address is 192.168.22.100 and the login name is gwuser and password gwuser.
ssh gwuser@192.168.22.100

gwuser@WR-IDP-9C99:~$  ls /dev/ttyUSB*
/dev/ttyUSB0

Here we see only one USB device, named /dev/ttyUSB0 – that’s the pressure sensor. Next we’ll run some verification tests to confirm the sensor is communicating with the Intel® NUC.

  1. Use the screen utility to connect directly to the USB port and manually issue commands. The commands you type won’t be visible – only the results of the commands will display.


    gwuser@WR-IDP-9C99:~$  sudo screen /dev/ttyUSB0 115200
  2. Type in ENQ and hit Enter. You should receive a response that looks like this:
USBPX2
1.0.13.0826
0.0 to 150.0 PSI G

When you see this response it confirms that the pressure sensor is communicating with the Intel® NUC through the USB port. Exit screen by typing these commands: Control-A, Control-\, y.
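For scripted deployments, the device lookup from the earlier ls step can be automated. A sketch that picks the first USB serial device, assuming the pressure sensor is the only USB serial adapter attached:

```shell
# Pick the first /dev/ttyUSB* entry, if any, as the sensor device.
dev=$(ls /dev/ttyUSB* 2>/dev/null | head -n 1)
if [ -n "$dev" ]; then
  msg="sensor device: $dev"
else
  msg="no USB serial device found"
fi
echo "$msg"
```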

Configure AWS* IoT and Node-RED*

Node-RED* is a visual tool for building Internet of Things applications. It’s pre-installed on the Intel® NUC as part of the Intel® IoT Gateway Developer Hub.

  1. Log into your AWS account and navigate to the AWS IoT console.
  2. Create a new device (thing) named Intel_NUC and a Policy named PubSubToAnyTopic that allows publishing and subscribing to any MQTT topic.
  3. Create and activate a new Certificate and download the private key file, certificate file, and root CA file (available here) to your computer.
  4. Attach the Intel_NUC device and PubSubToAnyTopic policy to the new certificate.
  5. While logged into the Intel® NUC via ssh as gwuser, create the directory /home/gwuser/awsiot and then use SFTP or SCP to copy the downloaded private key file, certificate file and root CA files from your workstation to the /home/gwuser/awsiot directory on the Intel® NUC.

Create Node-RED* Sensor Flow

In the Intel® IoT Gateway Developer Hub click on the Sensors icon and then the Program Sensors button. This will open the Node-RED canvas. If you get another login prompt use the same gwuser / gwuser login credentials you used when logging into the Intel® IoT Gateway Developer Hub.

You will see a default Node-RED flow for a RH-USB sensor. We’re not using that type of sensor here so you can either delete that flow or disable it by double-clicking the Interval node and setting Repeat to none followed by clicking Done. If you don’t delete the flow, select all the elements of the flow using the mouse and drag them down lower on the screen to open up more room at the top. Click the Deploy button to save and activate the changes.

First we’ll build a flow that continuously reads the pressure sensor and displays the pressure readings in the Intel® IoT Gateway Developer Hub. Follow these steps:

  1. Drag the following types of nodes from the list on the left side onto the canvas and arrange them like shown in Figure 2: (1) inject (input), (2) function (function), (3) serial (output), (4) serial (input), (5) function (function), (6) chart tag (function), (7) mqtt (output).
  2. A couple of the names will automatically change when you drop them on the canvas: inject will change to timestamp, and function will change to blank. Use the mouse to draw lines between the nodes so they look like Figure 2.

Figure 2. Initial Node-RED flow with nodes and connections

  3. Next, double-click on each node corresponding to the numbered callouts and set the node parameters as shown in Figure 3. You may need to move the nodes around to maintain a clean layout.
  4. For the serial nodes use the /dev/ttyUSBn device name corresponding to your pressure sensor (we're using /dev/ttyUSB0).
  5. The serial port name and parameters are set by clicking the pencil icon next to Add new serial-port… and setting the values as shown in item 3A of Figure 3, then clicking Add.
  6. When you're done configuring the nodes the flow should look like Figure 4.

Figure 3. Node configuration details

Figure 4. Configured Node-RED flow and pressure data display

  7. Click the Deploy button to deploy and activate the flow, then click your browser’s refresh button to refresh the entire web page. You should now see a live pressure gauge in the upper part of the Intel® IoT Gateway Hub as shown in Figure 4. You can apply water pressure to the sensor assembly and you’ll see the pressure readings increase and decrease as the pressure varies.

Connect Intel® NUC to AWS IoT

  1. Drag a mqtt output node onto the Node-RED canvas and then double-click it.
  2. In the Server pick list select Add new mqtt-broker… and then click the pencil icon to the right. In the Connection tab, set the Server field to your AWS IoT endpoint address which will look something like aaabbbcccddd.iot.us-east-1.amazonaws.com. You can find the endpoint address by using the AWS CLI command aws iot describe-endpoint on your workstation.
  3. Set the Port to 8883 and checkmark Enable secure (SSL/TLS) connection, then click the pencil icon to the right of Add new tls-config…
  4. In the Certificate field enter the full path and filename to your certificate file, private key file, and root CA file that you copied earlier into the /home/gwuser/awsiot directory. For example, the Certificate path might look like /home/gwuser/awsiot/1112223333-certificate.pem.crt and the Private Key path might look like /home/user/awsiot/1112223333-private.pem.key. The CA Certificate might look like /home/gwuser/awsiot/VeriSign-Class-3-Public-Primary-Certification-Authority-G5.pem.
  5. Checkmark Verify server certificate and leave Name empty.
  6. Click the Add button and then click the second Add button to return to the main MQTT out node panel.
  7. Set the Topic to nuc/pressure, set QoS to 1, and set Retain to false.
  8. Set the Name field to Publish to AWS IoT and then click Done.

Drag another function node onto the Node-RED canvas. Double-click to edit the node and set the Name to Format JSON. Edit the function code so it looks like this:

msg.payload = {
  pressure: Number(msg.payload),
  timestamp: Date.now()
};
return msg;
  1. Click Done to save the function changes.
  2. Draw a wire from the output of the Extract Data node to the input of Format JSON, and another wire from the output of Format JSON to the input of Publish to AWS IoT. These changes will convert the pressure reading into a JSON object and send it to AWS IoT.
  3. Click the Deploy button to deploy and activate the changes. The finished flow should look like Figure 5.

Figure 5. Finished flow with connection to AWS IoT

Back in the AWS IoT console, start the MQTT Client and subscribe to the topic nuc/pressure. You should see messages arriving once a second containing live pressure readings. Vary the pressure on the sensor and observe the values increasing and decreasing.

Recording and Visualizing Pressure History in AWS* IoT

Now that live pressure data is arriving in AWS IoT, a variety of additional data processing can be performed in the AWS cloud. Here we’ll send the data into Elasticsearch where it can be searched and visualized on dashboards.

  1. In the AWS console, navigate to the Elasticsearch Service and provision an Elasticsearch cluster with a domain name of nucdata. You can initially make it a small cluster with one instance.
  2. Set the security access policy to your Internet access preferences and also add a policy for principal "AWS": "*" to access "Action": "ESHttpPut". Wait for the cluster provisioning to complete and the Domain status to change to Active.
  3. Use curl or a REST tool to create an Elasticsearch index named nucs using the Endpoint URI listed in the AWS Elasticsearch cluster console. When creating the index, include a type named nuc with two properties: pressure of type float and timestamp of type date.
  4. In the AWS IoT console, create a Rule named Record_Pressure.
  5. Set the Description to Record pressure readings to Elasticsearch, set the Attribute to pressure,timestamp and set the Topic Filter to nuc/pressure. Leave Condition blank.
  6. In the action section choose an action of Send the message to a search index cluster (Amazon Elasticsearch Service) and select nucdata for the Domain name. The Endpoint will be filled in automatically.
  7. Set the ID to ${newuuid()}, set the Index to nucs, and set the Type to nuc.
  8. For the Role name, click Create a new role and set the role name to aws_iot_elasticsearch.
  9. Click Add action and then click Create to create the rule. This rule will take data from the nuc/pressure MQTT topic and send it into Elasticsearch where it can be searched and viewed.
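The index creation in step 3 can be sketched as a curl request. The endpoint below is a placeholder for your cluster's Endpoint URI, and the mapping mirrors the two properties named above; the echo makes this a dry run (remove it to actually send the request):

```shell
# Placeholder endpoint - substitute the Endpoint URI from the AWS
# Elasticsearch cluster console.
ENDPOINT="https://search-nucdata-example.us-east-1.es.amazonaws.com"
# Mapping for the "nucs" index with type "nuc" and its two properties.
payload='{"mappings":{"nuc":{"properties":{"pressure":{"type":"float"},"timestamp":{"type":"date"}}}}}'
echo curl -X PUT "$ENDPOINT/nucs" -H 'Content-Type: application/json' -d "$payload"
```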

To search and view data:

  1. Navigate to your Elasticsearch cluster Kibana URI and create an index pattern named nucs using timestamp as the Time-field name. You should see pressure and timestamp in the fields list.
  2. Navigate to the Kibana Discover tab and enable auto-refresh at a 5 second interval. You should see roughly 5 new data records every 5 seconds – these are the pressure readings coming from the NUC.
  3. Navigate to the Visualize tab and create a Line chart time series visualization using X and Y parameters shown in Figure 6.
  4. Now vary the actual pressure on the sensor and watch the time series graph in Kibana. Figure 6 shows a live pressure cycle starting at 0 PSI, jumping to 82 PSI, incrementally stepping down to 20 PSI, ramping up to 76 and then 82 PSI, dropping abruptly to 0 PSI, stepping up to 55 PSI for a short period, and then stepping up to 82 PSI.

Figure 6. Live graph of pressure reading data in AWS

vHost User Client Mode in Open vSwitch* with DPDK


This article describes the concept of vHost User client mode, how it can be tested, and the benefits the feature brings to Open vSwitch* (OVS) with the Data Plane Development Kit (DPDK). This article was written with OVS users in mind who wish to know more about the feature and for users who may be configuring a virtualized OVS DPDK setup that uses vHost User ports as the guest access method for virtual machines (VMs) running in QEMU.

Note: vHost User client mode in OVS with DPDK is available on both the OVS master branch and the 2.6 release branch. Users can download the OVS master branch as a zip here or the 2.6 branch as a zip here. Installation steps for the master branch can be found here. 2.6 installation steps can be found here.

vHost User Client Mode

vHost User client mode was introduced in DPDK v16.07 to address a limitation in the DPDK, whereby if the vHost User backend (DPDK-based application such as OVS with DPDK) crashes or is restarted, VMs with DPDK vHost User ports cannot re-establish connectivity with the backend and are essentially rendered useless from a networking perspective. vHost User client mode solves this problem.

The vHost User standard uses a client-server model. The server creates and manages the vHost User sockets and the client connects to the sockets created by the server. Before the introduction of this feature, the only configuration used by OVS-DPDK had it acting as the server and QEMU* acting as the client. Figure 1 shows this configuration.

Figure 1.Typical Open vSwitch* with the Data Plane Development Kit (DPDK) configuration using OVS-DPDK vHost server mode and QEMU* using vHost client mode. The direction of the arrow indicates the client connecting to the server.

With this default configuration, if OVS-DPDK was reset the sockets would be destroyed, and when relaunched, connectivity could not be re-established with the VM. With client mode, OVS-DPDK acts as the vHost client and QEMU acts as the server. In this configuration, when the switch crashes, the sockets still exist as they are managed by QEMU. This allows OVS-DPDK to relaunch and reconnect to these sockets and thus resume normal connectivity. Figure 2 shows this configuration.

Figure 2.Open vSwitch* with the Data Plane Development Kit (DPDK) configuration using OVS-DPDK vHost client mode and QEMU* using vHost server mode. The direction of the arrow indicates the client connecting to the server.

As seen in Table 1, OVS-DPDK supports this feature by offering two different types of vHost User ports. The first, dpdkvhostuser operates in the standard server mode. The second, dpdkvhostuserclient operates in client mode as the name suggests.

Table 1. Types of vHost User ports offered by Open vSwitch* with the Data Plane Development Kit (DPDK) and their respective modes

Port Name | Uses vHost Mode | Requires QEMU Mode
dpdkvhostuser | Server | Client
dpdkvhostuserclient | Client | Server

The ability to reconnect is a very useful feature; it reduces the impact of catastrophic failure of the switch as the VMs connected to the switch do not need to be rebooted upon switch failure. It also makes other tasks more lightweight; for example, maintenance tasks such as updating the switch requires rebooting the switch, but no longer requires the VMs connected to the switch to be rebooted as well.

Test Environment

The following describes how to set up an OVS-DPDK configuration with one physical dpdk port and one vHost User dpdkvhostuserclient port, and QEMU in vHost server mode. Next, the steps to demonstrate the reconnect capability are described.

Figure 3 shows the test environment configuration.

Figure 3. Open vSwitch* with the Data Plane Development Kit (DPDK) configuration using OVS-DPDK vHost client mode and QEMU* using vHost server mode. The direction of the arrows denotes the flow of traffic. The tcpdump tool is being used on the guest to monitor incoming traffic on the eth0 interface.

Table 2 shows the hardware and software components used for this setup:

Table 2. Hardware and software components

Processor | Intel® Xeon® processor E5-2695 v3 @ 2.30 GHz
Kernel | 4.2.8-200
OS | Fedora* 22
QEMU* | v2.7.0
Data Plane Development Kit | v16.07
Open vSwitch* | 62f0430e903ad29bdde17bd8e8aa814198fac890

Configuration Steps

First, build OVS with DPDK as described in the installation docs.

Configure the switch as described in the Test Environment section, with one physical dpdk port and one vHost User dpdkvhostuserclient port. Add a flow that directs traffic from the physical port (1) to the vHost User port (2):

ovs-vsctl add-br br0

ovs-vsctl set Bridge br0 datapath_type=netdev

ovs-vsctl add-port br0 dpdk0

ovs-vsctl set Interface dpdk0 type=dpdk

ovs-vsctl add-port br0 vhost0

ovs-vsctl set Interface vhost0 type=dpdkvhostuserclient

ovs-ofctl add-flow br0 in_port=1,action=output:2

Set the location of the socket:

ovs-vsctl set Interface vhost0 options:vhost-server-path="/tmp/sock0"

Logs similar to the following should be printed:

VHOST_CONFIG: vhost-user client: socket created, fd: 28

VHOST_CONFIG: failed to connect to /tmp/sock0: No such file or directory

VHOST_CONFIG: /tmp/sock0: reconnecting...

Launch QEMU in server-mode:

qemu-system-x86_64 -cpu host -enable-kvm -m 4096M -object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/hugepages,share=on -numa node,memdev=mem -mem-prealloc -drive file=/images/image.qcow2 -chardev socket,id=char0,path=/tmp/sock0,server -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -nographic

Logs similar to the following should be printed:

QEMU waiting for connection on: disconnected:unix:/tmp/sock0,server

VHOST_CONFIG: /tmp/sock0: connected

The important part of the command above is to include “,server” as part of the path argument of the chardev configuration.

Once the VM has booted successfully, test the connection between dpdk0 and vhost0. Run the tcpdump tool on the vHost interface to monitor incoming traffic. Send traffic to dpdk0 (for example, using a traffic generator); it should be switched to vhost0 and captured by the tool:

[root@localhost ~]# tcpdump -i eth0

tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes

09:57:50.276877 IP 2.2.2.2.0 > 1.1.1.1.0: Flags [], seq 0:6, win 0, length 6

09:57:50.779559 IP 2.2.2.2.0 > 1.1.1.1.0: Flags [], seq 0:6, win 0, length 6

To test client mode reconnection, simply reset the switch and verify connectivity is re-established by continually monitoring the tcpdump instance, and verify traffic is once again switched to the VM after a brief loss of connectivity during the reset.

This is just one of many ways the reconnect capability can be tested. For instance, it can also be tested in the reverse direction by generating traffic (for example, a ping in the VM), and verifying it reaches the physical port. The DPDK pdump tool is a useful way to monitor traffic on physical ports in OVS. Instructions for configuring and using pdump with OVS can be found in the OVS-DPDK documentation mentioned earlier in this article, or in the article DPDK Pdump in Open vSwitch* with DPDK.

Conclusion

In this article, we described and showed how OVS-DPDK vHost User ports can be configured in client mode, allowing reestablishment of connectivity upon a reset of the switch. We have demonstrated one method of how to test this feature and suggested another.

Additional Information

For more details on the DPDK vHost library, refer to the DPDK documentation.

For more information on configuring vHost User in Open vSwitch, refer to INSTALL.DPDK.

Have a question? Feel free to follow up with the query on the Open vSwitch discussion mailing thread.

To learn more about OVS with DPDK, check out the following videos and articles on Intel® Developer Zone and Intel® Network Builders University.

QoS Configuration and usage for Open vSwitch* with DPDK

Open vSwitch with DPDK Architectural Deep Dive

DPDK Open vSwitch: Accelerating the Path to the Guest

DPDK Pdump in Open vSwitch* with DPDK

vHost User NUMA Awareness in Open vSwitch* with DPDK

About the Author

Ciara Loftus is a network software engineer with Intel. Her work is primarily focused on accelerated software switching solutions in user space running on Intel® architecture. Her contributions to OVS with DPDK include the addition of vHost User ports, vHost User client ports, NUMA-aware vHost User, and DPDK v16.07 support.


Intel® Software Guard Extensions Tutorial Series: Part 6, Dual Code Paths


In Part 6 of the Intel® Software Guard Extensions (Intel® SGX) tutorial series, we set aside the enclave to address an outstanding design requirement that was laid out in Part 2, Application Design: provide support for dual code paths. We want to make sure our Tutorial Password Manager will function on hosts both with and without Intel SGX capability. Much of the content in this part comes from the article, Properly Detecting Intel® Software Guard Extensions in Your Applications.

You can find the list of all of the published tutorials in the article Introducing the Intel® Software Guard Extensions Tutorial Series.

There is source code provided with this installment of the series.

All Intel® Software Guard Extensions Applications Need Dual Code Paths

First it’s important to point out that all Intel SGX applications must have dual code paths. Even if an application is written so that it should only execute if Intel SGX is available and enabled, a fallback code path must exist so that you can present a meaningful error message to the user and then exit gracefully.

In short, an application should never crash or fail to launch solely because the platform does not support Intel SGX.

Scoping the Problem

In Part 5 of the series we completed the first version of our application enclave and tested it by hardcoding the enclave support on. That was done by setting the _supports_sgx flag in PasswordManagerCoreNative.cpp.

PasswordManagerCoreNative::PasswordManagerCoreNative(void)
{
	_supports_sgx= 1;
	adsize= 0;
	accountdata= NULL;
	timer = NULL;
}

Obviously, we can’t leave this on by default. The convention for feature detection is that features are off by default and turned on if they are detected. So our first step is to undo this change and set the flag back to 0, effectively disabling the Intel SGX code path.

PasswordManagerCoreNative::PasswordManagerCoreNative(void)
{
	_supports_sgx= 0;
	adsize= 0;
	accountdata= NULL;
	timer = NULL;
}

However, before we get into the feature detection procedure, we’ll give the console application that runs our test suite, CLI Test App, a quick functional test by executing it on an older system that does not have the Intel SGX feature. With this flag set to zero, the application will not choose the Intel SGX code path and thus should run normally.

Here’s the output from a 4th generation Intel® Core™ i7 processor-based laptop, running Microsoft Windows* 8.1, 64-bit. This system does not support Intel SGX.

CLI Test App

What Happened?

Clearly we have a problem even when the Intel SGX code path is explicitly disabled in the software. This application, as written, cannot execute on a system without Intel SGX support. It didn’t even start executing. So what’s going on?

The clue in this case comes from the error message in the console window:

System.IO.FileNotFoundException: Could not load file or assembly ‘PasswordManagerCore.dll’ or one of its dependencies. The specified file could not be found.

Let’s take a look at PasswordManagerCore.dll and its dependencies:

Additional Dependencies

In addition to the core OS libraries, we have dependencies on bcrypt.lib and EnclaveBridge.lib, which will require bcrypt.dll and EnclaveBridge.dll at runtime. Since bcrypt.dll comes from Microsoft and is included in the OS, we can reasonably assume its dependencies, if any, are already installed. That leaves EnclaveBridge.dll.

Examining its dependencies, we see the following:

Additional Dependencies

This is the problem. Even though we have the Intel SGX code path explicitly disabled, EnclaveBridge.dll still has references to the Intel SGX runtime libraries. All symbols in an object module must be resolved as soon as it is loaded. It doesn’t matter if we disable the Intel SGX code path: undefined symbols are still present in the DLL. When PasswordManagerCore.dll loads, it resolves its undefined symbols by loading bcrypt.dll and EnclaveBridge.dll, the latter of which, in turn, attempts to resolve its undefined symbols by loading sgx_urts.dll and sgx_uae_service.dll. The system we tried to run our command-line test application on does not have these libraries, and since the OS can’t resolve all of the symbols it throws an exception and the program crashes before it even starts.

These two DLLs are part of the Intel SGX Platform Software (PSW) package, and without them Intel SGX applications written using the Intel SGX Software Development Kit (SDK) cannot execute. Our application needs to be able to run even if these libraries are not present.

The Platform Software Package

As mentioned above, the runtime libraries are part of the PSW. In addition to these support libraries, the PSW includes:

  • Services that support and maintain the trusted compute block (TCB) on the system
  • Services that perform and manage certain Intel SGX operations such as attestation
  • Interfaces to platform services such as trusted time and the monotonic counters

The PSW must be installed by the application installer when deploying an Intel SGX application, because Intel does not offer the PSW for direct download by end users. Software vendors must not assume that it will already be present and installed on the destination system. In fact, the license agreement for Intel SGX specifically states that licensees must re-distribute the PSW with their applications.

We’ll discuss the PSW installer in more detail in a future installment of the series covering packaging and deployment.

Detecting Intel Software Guard Extensions Support

So far we’ve focused on the problem of just starting our application on systems without Intel SGX support, and more specifically, without the PSW. The next step is to detect whether or not Intel SGX support is present and enabled once the application is running.

Intel SGX feature detection is, unfortunately, a complicated procedure. For a system to be Intel SGX capable, four conditions must be met:

  1. The CPU must support Intel SGX.
  2. The BIOS must support Intel SGX.
  3. In the BIOS, Intel SGX must be explicitly enabled or set to the “software controlled” state.
  4. The PSW must be installed on the platform.

Note that the CPUID instruction, alone, is not sufficient to detect the usability of Intel SGX on a platform. It can tell you whether or not the CPU supports the feature, but it doesn’t know anything about the BIOS configuration or the software that is installed on a system. Relying solely on the CPUID results to make decisions about Intel SGX support can potentially lead to a runtime fault.

To make feature detection even more difficult, examining the state of the BIOS is not a trivial task and is  generally not possible from a user process. Fortunately the Intel SGX SDK provides a simple solution: the function sgx_enable_device will both check for Intel SGX capability and attempt to enable it if the BIOS is set to the software control state (the purpose of the software control setting is to allow applications to enable Intel SGX without requiring users to reboot their systems and enter their BIOS setup screens, a particularly daunting and intimidating task for non-technical users).

The problem with sgx_enable_device, though, is that it is part of the Intel SGX runtime, which means the PSW must be installed on the system in order to use it. So before we attempt to call sgx_enable_device, we must first detect whether or not the PSW is present.

Implementation

With our problem scoped out, we can now lay out the steps that must be followed, in order, for our dual-code path application to function properly. Our application must:

  1. Load and begin executing even without the Intel SGX runtime libraries.
  2. Determine whether or not the PSW package is installed.
  3. Determine whether or not Intel SGX is enabled (and attempt to enable it).

Loading and Executing without the Intel Software Guard Extensions Runtime

Our main application depends on PasswordManagerCore.dll, which depends on EnclaveBridge.dll, which in turn depends on the Intel SGX runtime. Since all symbols need to be resolved when an application loads, we need a way to prevent the loader from trying to resolve symbols that come from the Intel SGX runtime libraries. There are two options:

Option #1: Dynamic Loading      

In dynamic loading, you don’t explicitly link the library in the project. Instead you use system calls to load the library at runtime and then look up the names of each function you plan to use in order to get the addresses of where they have been placed in memory. Functions in the library are then invoked indirectly via function pointers.

Dynamic loading is a hassle. Even if you only need a handful of functions, it can be a tedious process to prototype function pointers for every function that is needed and get their load address, one at a time. You also lose some of the benefits provided by the integrated development environment (such as prototype assistance) since you are no longer explicitly calling functions by name.

Dynamic loading is typically used in extensible application architectures (for example, plug-ins).

Option #2: Delayed-Loaded DLLs

In this approach, you dynamically link all your libraries in the project, but instruct Windows to do delayed loading of the problem DLL. When a DLL is delay-loaded, Windows does not attempt to resolve symbols that are defined by that DLL when the application starts. Instead it waits until the program makes its first call to a function that is defined in that DLL, at which point the DLL is loaded and the symbols get resolved (along with any of its dependencies). What this means is that a DLL is not loaded until the application needs it. A beneficial side effect of this approach is that it allows applications to reference a DLL that is not installed, so long as no functions in that DLL are ever called.

When the Intel SGX feature flag is off, that is exactly the situation we are in so we will go with option #2.

You specify the DLL to be delay-loaded in the project configuration for the dependent application or DLL. For the Tutorial Password Manager, the best DLL to mark for delayed loading is EnclaveBridge.dll as we only call this DLL if the Intel SGX path is enabled. If this DLL doesn’t load, neither will the two Intel SGX runtime DLLS.

We set the option in the Linker -> Input page of the PasswordManagerCore.dll project configuration:

Password Manager
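For builds driven from the command line rather than the IDE, the same setting can be expressed as MSVC linker options; this is a sketch assuming the project's EnclaveBridge.dll name, and it requires linking against the delay-load helper library delayimp.lib:

```text
/DELAYLOAD:EnclaveBridge.dll  delayimp.lib
```

With this in place, sgx_urts.dll and sgx_uae_service.dll are only pulled in if a function in EnclaveBridge.dll is actually called.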

After the DLL is rebuilt and installed on our 4th generation Intel Core processor system, the console test application works as expected.

CLI Test App

Detecting the Platform Software Package

Before we can call the sgx_enable_device function to check for Intel SGX support on the platform, we first have to make sure that the PSW package is installed because sgx_enable_device is part of the Intel SGX runtime. The best way to do this is to actually try to load the runtime libraries.

We know from the previous step that we can’t just dynamically link them because that will cause an exception when we attempt to run the program on a system that does not support Intel SGX (or have the PSW package installed). But we can’t rely on delay-loaded DLLs either: delayed loading can’t tell us if a library is installed, because if it isn’t, the application will still crash! That means we must use dynamic loading to test for the presence of the runtime libraries.

The PSW runtime libraries should be installed in the Windows system directory so we’ll use GetSystemDirectory to get that path, and limit the DLL search path via a call to SetDllDirectory. Finally, the two libraries will be loaded using LoadLibrary. If either of these calls fail, we know the PSW is not installed and that the main application should not attempt to run the Intel SGX code path.

Detecting and Enabling Intel Software Guard Extensions

Since the previous step dynamically loads the PSW runtime libraries, we can just look up the symbol for sgx_enable_device manually and then invoke it via a function pointer. The result will tell us whether or not Intel SGX is enabled.

Implementation

To implement this in the Tutorial Password Manager we’ll create a new DLL called FeatureSupport.dll. We can safely dynamically link this DLL from the main application since it has no explicit dependencies on other DLLs.

Our feature detection will be rolled into a C++/CLI class called FeatureSupport, which will also include some high-level functions for getting more information about the state of Intel SGX. In rare cases, enabling Intel SGX via software may require a reboot, and in rarer cases the software enable action fails and the user may be forced to enable it explicitly in their BIOS.

The class declaration for FeatureSupport is shown below.

typedef sgx_status_t(SGXAPI *fp_sgx_enable_device_t)(sgx_device_status_t *);


public ref class FeatureSupport {
private:
	UINT sgx_support;
	HINSTANCE h_urts, h_service;

	// Function pointers

	fp_sgx_enable_device_t fp_sgx_enable_device;

	int is_psw_installed(void);
	void check_sgx_support(void);
	void load_functions(void);

public:
	FeatureSupport();
	~FeatureSupport();

	UINT get_sgx_support(void);
	int is_enabled(void);
	int is_supported(void);
	int reboot_required(void);
	int bios_enable_required(void);

	// Wrappers around SGX functions

	sgx_status_t enable_device(sgx_device_status_t *device_status);

};

Here are the low-level routines that check for the PSW package and attempt to detect and enable Intel SGX.

int FeatureSupport::is_psw_installed()
{
	_TCHAR *systemdir;
	UINT rv, sz;

	// Get the system directory path. Start by finding out how much space we need
	// to hold it.

	sz = GetSystemDirectory(NULL, 0);
	if (sz == 0) return 0;

	systemdir = new _TCHAR[sz + 1];
	rv = GetSystemDirectory(systemdir, sz);
	if (rv == 0 || rv > sz) {
		delete[] systemdir;
		return 0;
	}

	// Set our DLL search path to just the System directory so we don't accidentally
	// load the DLLs from an untrusted path.

	if (SetDllDirectory(systemdir) == 0) {
		delete[] systemdir;
		return 0;
	}

	delete[] systemdir; // No longer need this

	// Need to be able to load both of these DLLs from the System directory.

	if ((h_service = LoadLibrary(_T("sgx_uae_service.dll"))) == NULL) {
		return 0;
	}

	if ((h_urts = LoadLibrary(_T("sgx_urts.dll"))) == NULL) {
		FreeLibrary(h_service);
		h_service = NULL;
		return 0;
	}

	load_functions();

	return 1;
}

void FeatureSupport::check_sgx_support()
{
	sgx_device_status_t sgx_device_status;

	if (sgx_support != SGX_SUPPORT_UNKNOWN) return;

	sgx_support = SGX_SUPPORT_NO;

	// Check for the PSW

	if (!is_psw_installed()) return;

	sgx_support = SGX_SUPPORT_YES;

	// Try to enable SGX

	if (this->enable_device(&sgx_device_status) != SGX_SUCCESS) return;

	// If SGX isn't enabled yet, perform the software opt-in/enable.

	if (sgx_device_status != SGX_ENABLED) {
		switch (sgx_device_status) {
		case SGX_DISABLED_REBOOT_REQUIRED:
			// A reboot is required.
			sgx_support |= SGX_SUPPORT_REBOOT_REQUIRED;
			break;
		case SGX_DISABLED_LEGACY_OS:
			// BIOS enabling is required
			sgx_support |= SGX_SUPPORT_ENABLE_REQUIRED;
			break;
		}

		return;
	}

	sgx_support |= SGX_SUPPORT_ENABLED;
}

void FeatureSupport::load_functions()
{
	fp_sgx_enable_device = (fp_sgx_enable_device_t)GetProcAddress(h_service, "sgx_enable_device");
}

// Wrappers around SDK functions so the user doesn't have to mess with dynamic loading by hand.

sgx_status_t FeatureSupport::enable_device(sgx_device_status_t *device_status)
{
	check_sgx_support();

	if (fp_sgx_enable_device == NULL) {
		return SGX_ERROR_UNEXPECTED;
	}

	return fp_sgx_enable_device(device_status);
}

Wrapping Up

With these code changes, we have integrated Intel SGX feature detection into our application! It will execute smoothly on systems both with and without Intel SGX support and choose the appropriate code branch.

As mentioned in the introduction, there is sample code provided with this part for you to download. The attached archive includes the source code for the Tutorial Password Manager core, including the new feature detection DLL. Additionally, we have added a new GUI-based test program that automatically selects the Intel SGX code path, but lets you disable it if desired (this option is only available if Intel SGX is supported on the system).

SGX Code Branch

The console-based test program has also been updated to detect Intel SGX, though it cannot be configured to turn it off without modifying the source code.

Coming Up Next

We’ll revisit the enclave in Part 7 in order to fine-tune the interface. Stay tuned!

[Infographic] Artificial Intelligence - The Next Big Revolution in Computing

Amazing Video Experiences Enable Game-changing Sports Calls


As entertainment, sports, fashion, science, food, and other info-junkies, people are fast turning to video as a quick, easy way to stay informed and connected about the things they care about. Whether viewing via the internet, TV, or mobile devices, video is a part of everyday life. And how do you make it an excellent experience for millions of viewers?

Video solution providers are vying in this space to deliver real-time, reliable content available everywhere, and at high-quality with brilliant colors and immersive experiences—all with an efficiency that allows room for profits, innovation, growth, and more reach. And sports is the perfect place to show this evidence.

Open the hood to see how it’s done, and you’ll see it’s all driven by computing—from data centers to encoders/decoders, to edge devices. This is where Intel® Xeon® processors with built-in media accelerators best fit the bill for performance, coupled with Intel’s media software tools to speed video transcoding, deliver efficient/high-density streaming, and help video solution vendors build competitive features for their products and services. Video acceleration is fast growing with game-changing results in the world of sports—below is a key example of just how.

Innovation by Design, Technology provides Video Replay Advantage

We all remember viscerally that one play where the referee got the game-changing call wrong and our favorite team lost the competition. It’s human error. It happens. But it doesn’t need to anymore. Now, new solutions with the latest Intel technologies can help resolve challenging referee decision moments with fast, high-quality specialized video replay systems. 

Slomo.tv, a producer of instant replay servers, created a family of videoReferee* systems that provide instant high-quality video replays from up to 18 cameras direct to referee viewing systems, allowing them to join the sports competition sidelines.

Built with Intel® Xeon® processors (E3-1500 v5) for extreme processing power, and optimized by Intel® Media Server Studio software for fast, high-quality video transcoding helped the company transform its solutions, which are being used around the world.

In addition to the server and video cameras placed around the sports play area, the system includes a monitor and an easy-to-use control panel that any referee can learn in less than an hour. Video can be reviewed in Quad mode simultaneously from 4 cameras at different angles, in slow motion, or using zoom function for objective, error-free gameplay analysis.

Helping Referees across a Diversity of Sports

Sports organizations like Kontinental Hockey League; basketball leagues in Korea, Russia, and Lithuania; and even canoe racing at the Rio Olympics in 2016 took notice of videoReferee—using it to help get their sports judging right. The Fédération Internationale de Football Association (FIFA) is also testing similar systems for possible use in its worldwide football (known as soccer in some parts of the world) competitions.

What makes slomo.tv’s solution different from those offered by other video solution providers and manufacturers is how it manages video compression. By using Intel’s Media Server Studio (a tool suite that provides the Intel® Media SDK, runtimes, and advanced analysis tools for media, video, and broadcast application development) to optimize, accelerate, and compress its video streams, the company sees flexibility and an efficiency advantage over traditional hardware-only designs for managing that function.

Slomo.tv CTO Igor Vitiorets says, “Without the Intel® Media SDK <in Intel Media Server Studio>, we could not have created our innovative video replay and server products now in use worldwide, as it was the cornerstone for our software development and innovation.”

Today, instant replays are at the center of sports. What could be ahead? Providing fans and sports consumers a more in-depth, insightful view of key sports decision-making processes than is available today, or even more immersive virtual reality views on-the-spot.

VR video

 

Delivering Immersive Video Experiences with Real-time 4K HEVC Streaming

See how Wowza, Rivet VR and Intel created a 360-degree view of a live concert delivered via 4K streaming made possible by Intel technologies. 
Video | Article

 


Learn More

  • intel.com/visualcloud
     
  • Media developers: Try out the Intel Media Server Studio free Community Edition: makebettercode/mediaserverstudio
     
  • See slomo.tv sports video judging. And check out another innovative slomo.tv server, RED ARROW, which can simultaneously provide 4 channels of recording, 4 channels of search, and 2 channels of playback with six 4K physical video ports, all in 4K 50/60p, built with Intel Xeon processors and optimized by Intel Media Server Studio.

Xeon-MSS-slomotv

 

 
 

OVS-DPDK Datapath Classifier


Overview

This article describes the design and implementation of the datapath classifier – aka dpcls – in Open vSwitch* (OVS) with Data Plane Development Kit (DPDK). It presents flow classification and the caching techniques, and also provides greater insight on functional details with different scenarios.

Virtual switches are drastically different from traditional hardware switches. Hardware switches employ ternary content-addressable memory (TCAM) for high-speed packet classification and forwarding. Virtual switches, on the other hand, use multiple levels of caches and flow caching to achieve higher forwarding rates. OVS is an OpenFlow-compliant virtual switch that can handle advanced network virtualization use cases, resulting in longer packet-processing pipelines with reduced hypervisor resources. To achieve higher forwarding rates, OVS stores the active flows in caches.

The OVS user space datapath uses DPDK for fast-path processing. DPDK includes poll mode drivers (PMDs) that achieve line-rate packet processing by bypassing the kernel stack for receive and send. OVS-DPDK has three tiers of look-up tables/caches. The first-level table works as an Exact Match Cache (EMC); as the name suggests, the EMC cannot do wildcard matching. The dpcls is the second-level table and works as a tuple space search (TSS) classifier. Although it is implemented with hash tables (i.e., subtables), it works as a wildcard matching table. Implementation details and examples are given in the following sections. The third-level table is the ofproto classifier table, whose content is managed by an OpenFlow-compliant controller. Figure 1 depicts the three-tier cache architecture in OVS-DPDK.


Figure 1. OVS three-tier cache architecture

An incoming packet traverses multiple tables until a match is found. In the case of a match, the appropriate forwarding action is executed. In real-world virtual switch deployments handling a few thousand flows, EMC is quickly saturated. Most of the packets will then be matched against the dpcls, so its performance becomes a critical aspect for the overall switching performance.

Classifier Deep Dive

A classifier is a collection of rules or policies, and packet classification is about categorizing a packet based on a set of rules. The OVS second-level table uses a TSS classifier for packet classification, which consists of one hash table (tuple) for each kind of match in use. In a TSS classifier implementation, a tuple is defined for each combination of field length, and the resulting set is called a “tuple space.” Since each tuple has a known set of bits in each field, by concatenating these bits in order, a hash key can be derived, which can then be used to map filters of that tuple into a hash table.

A TSS classifier with some flow matching packet IP fields (e.g., Src IP and Destination IP only) is represented as one hash table (tuple). If the controller inserts a new flow with a different match (e.g., Src and Dst MAC address), it will be represented as a second hash table (tuple). Searching a TSS classifier requires searching each tuple until a match is found. While the lookup complexity of TSS is far from optimal, it is still efficient compared to decision tree classification algorithms. In decision tree algorithms, each internal node represents a decision, which has a binary outcome. The algorithm starts by performing the test at the root of the decision tree and, based on the outcome, branches to one of the children, continuing until a leaf node is reached and the output is produced. The worst-case complexity is therefore the height of the decision tree. In the case of a TSS classifier with “N” subtables, the worst-case complexity is O(N), and much of the overhead is in hash computation. Though the TSS classifier has high lookup complexity, it still fares better than decision trees in the following ways.

Tuple Space Search vs. Decision Tree Classification

  1. With a few hundred thousand active parallel flows, the controller may add and remove flows often. This is inefficient with decision trees, as node insertion and, above all, deletion are costly operations that can consume significant CPU cycles. Hash tables, by contrast, require far fewer CPU cycles for both insertions and deletions.
  2. TSS has O(N) memory and time complexity. In the worst case, a lookup makes as many memory accesses as there are hash tables, and the number of hash tables can grow to the number of rules in the database; even so, this compares favorably with decision trees.
  3. TSS generalizes to an arbitrary number of packet header fields.

With a few dozen hash tables around, a classifier lookup may have to traverse the subtables one by one until a match is found. The flow cache entries in the hash tables are unique and non-overlapping; hence, the first match is the only match, and the search can terminate on a match. The order of subtables in the classifier is random, and the tables are created and destroyed at runtime. Figure 2 depicts the packet flow through dpcls with multiple hash tables/subtables.


Figure 2. Packet handling by dpcls on an EMC miss

dpcls Optimization Using Subtables Ranking

In OVS 2.5 (the long-term support (LTS) branch), a classifier instance is created per PMD thread. For each lookup, the active subtables are traversed until a match is found. On average, a dpcls lookup has to visit N/2 subtables for a hit, where “N” is the number of active subtables. Though a single hash table lookup is inherently fast, performance degrades because of the expensive hash computation that precedes each hash table lookup.

To address this issue, OVS 2.6 implements a ranking of the subtables based on the count of hits. Moreover, a dpcls instance is created per ingress port. This comes from the observation that in practice there is a strong correlation between the traffic coming from an ingress port and one or a small subset of subtables that it hits. The ranking sorts the subtables so that the most-hit subtables are prioritized and ranked higher, which improves performance by reducing the average number of subtables that need to be searched in a lookup.

Figure 3 depicts the creation of multiple dpcls instances for the corresponding ingress ports. In this case, there are three ingress ports: two physical ports (DPDK_1, DPDK_2) and one vHost-user port for VM_1. The DPDK_1 dpcls, DPDK_2 dpcls, and VM_1 dpcls instances are created for the DPDK_1 port, the DPDK_2 port, and the VM_1 vHost-user port, respectively. Each dpcls manages the packets coming from its corresponding port. For example, the packets from the vHost-user port of VM_1 are processed by PMD thread 1. A hash key is computed from the header fields of the ingress packet and a lookup is done on the first-level cache EMC 1 using the hash key. On a miss, the packet is handled by VM_1 dpcls.


Figure 3. OVS-DPDK classifier in OVS 2.6 with dpcls instance per ingress port

How Hash Tables are Used for Wildcard Matching

Here we will discuss how hash tables are used to implement wildcard matching.  

Let’s assume the controller installs a flow where the rule – referred as Rule #1 from here on – is for example:

Rule #1: Src IP = “21.2.10.*”

so the match must occur on the first 24 bits, and the remaining 8 bits are wildcarded. This rule could be installed by an OVS command, such as:

$ ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=21.2.10.1/24,actions=output:2

With the above rule inserted, packets with Src IP like “21.2.10.5” or “21.2.10.123” shall match the rule.

When a packet like “21.2.10.5” is received there will be a miss on both EMC and dpcls (see Figure 1). A matching flow will be found in the ofproto classifier. A learning mechanism will cache the found flow into dpcls and EMC. Below is a description of the details of this use case with a focus on dpcls internals only.

Store a Rule into a Subtable

Before storing the wildcarded Rule #1, we first create a corresponding “Mask #1” by considering the wildcarded bits of the rule. Each bit of the mask is set to 1 where a match is required on that bit position; otherwise, it is 0. So in this case, Mask #1 will be “0xFF.FF.FF.00.”

A new hash-table “HT 1” will then be instantiated as a new subtable.

The mask is applied to the rule (see Figure 4) and the resulting value is given as an input to the hash function. The hash output will point to the subtable location where Rule #1 could be stored (we’re not considering collisions here, that’s outside the scope of this document).


Figure 4. Insert Rule #1 in to HT 1
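The masking and insertion steps can be modeled as follows. This is an illustrative Python sketch, not OVS source; `prefix_to_mask`, `ip`, and the dict standing in for HT 1 are my own names.

```python
# Illustrative model: derive Mask #1 from the /24 prefix length, apply it
# to Rule #1's source IP, and store the flow in HT 1 keyed by the masked
# value (a plain dict stands in for the subtable's hash table).

def prefix_to_mask(prefix_len):
    """Build a 32-bit netmask from a CIDR prefix length, e.g. 24 -> 0xFFFFFF00."""
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF

def ip(s):
    """Pack a dotted-quad string into a 32-bit integer key."""
    a, b, c, d = (int(x) for x in s.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

mask1 = prefix_to_mask(24)                      # 0xFFFFFF00, i.e. FF.FF.FF.00
ht1 = {}                                        # subtable for rules with Mask #1
ht1[ip("21.2.10.0") & mask1] = "Rule #1 -> output:2"
```

In real OVS the masked value feeds a hash function whose output selects the bucket; the dict lookup above plays the same role without modeling collisions.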

HT 1 will collect Rule #1 and any other “similar” rule (i.e., with the same fields and the same wildcard mask). For example it could store a further rule, like:

Rule #1A: Src IP = “83.83.83.*”

because this rule specifies Src IP – and no other field – and its mask is equal to Mask #1.

Please note that in case we want to insert a new rule with different fields and/or a different mask, we will need to create a new subtable.

Lookup for Packet #1

Packet #1, with Src IP = 21.2.10.99, arrives for processing. It is looked up in the only existing hash table, HT 1 (see Figure 5).

Mask #1 of the corresponding HT 1 is applied on the Src IP address field and hash is computed thereafter to find a matching entry in the subtable. In this case the matching entry is found => hit.


Figure 5. Lookup on HT 1 for ingress Packet #1
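The lookup step can be modeled in a few lines. This is illustrative only, not OVS source; the `ip` helper and the dict standing in for HT 1 are my own.

```python
# Illustrative model: apply HT 1's mask (FF.FF.FF.00) to Packet #1's
# source IP and probe the subtable with the masked key: a hit.

def ip(s):
    """Pack a dotted-quad string into a 32-bit integer key."""
    a, b, c, d = (int(x) for x in s.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

MASK1 = 0xFFFFFF00                              # HT 1's mask (/24)
ht1 = {ip("21.2.10.0") & MASK1: "Rule #1"}      # HT 1 holding Rule #1

packet1 = ip("21.2.10.99")
flow = ht1.get(packet1 & MASK1)                 # masked key is 21.2.10.0 -> hit
```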

Insert a Second Rule

Now let’s assume the controller adds a second wildcard rule – for simplicity, still on the source IP address – with netmask 16, referred as Rule #2 from here on.

Rule #2: Src IP = “7.2.*.*”

$ ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=7.2.21.15/16,actions=output:2

With the above inserted rule, packets with Src IP like “7.2.15.5” or “7.2.110.3” would match the rule.

Based on the wildcarded bits in the last 2 bytes, a proper “Mask #2” is created, in this case: “0xFF.FF.00.00.” (see Figure 6)

Note that HT 1 can store rules only with mask “0xFF.FF.FF.00.” That means that a new subtable HT 2 must be created to store Rule #2.

Mask #2 is applied to Rule #2 and the result is an input for the hash computation. The resulting value will point to a HT 2 location where Rule #2 will be stored.


Figure 6. Insert Rule #2 in to HT 2

Lookup for Packet #2

Packet #2, with Src IP = 7.2.45.67, arrives for processing.

As we have 2 active subtables (in this case, HT 1 and HT 2), a lookup shall be performed by repeating the search on each hash table.

We start by searching into HT 1 where the mask is “0xFF.FF.FF.00.” (see Figure 7).

The table’s mask will be applied to Packet #2, and a hash key will then be computed to retrieve a matching entry from HT 1.


Figure 7. Lookup on HT 1 for ingress Packet #2

The outcome is a miss, as we find no matching entry in HT 1.

We continue our search (see Figure 8) on the next subtable – HT 2 – and proceed in a similar manner.

Now we will use the HT 2 mask, which is “0xFF.FF.00.00.”


Figure 8. Lookup on HT 2 for ingress Packet #2

The outcome is successful because we find an entry that matches Packet #2.
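The whole two-subtable search can be modeled as follows. This is an illustrative Python sketch, not OVS source; `classify`, the `ip` helper, and the probe log are my own additions to make the miss-then-hit sequence visible.

```python
# Illustrative model: Packet #2 is checked against each active subtable in
# turn: a miss on HT 1 (mask FF.FF.FF.00), then a hit on HT 2 (mask
# FF.FF.00.00).

def ip(s):
    """Pack a dotted-quad string into a 32-bit integer key."""
    a, b, c, d = (int(x) for x in s.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

subtables = [
    (0xFFFFFF00, {ip("21.2.10.0") & 0xFFFFFF00: "Rule #1"}),   # HT 1
    (0xFFFF0000, {ip("7.2.0.0") & 0xFFFF0000: "Rule #2"}),     # HT 2
]

def classify(packet_key):
    probes = []                                 # record hit/miss per subtable
    for mask, table in subtables:
        flow = table.get(packet_key & mask)
        probes.append("miss" if flow is None else "hit")
        if flow is not None:
            return flow, probes
    return None, probes

flow, probes = classify(ip("7.2.45.67"))        # Packet #2
```

Masked with FF.FF.FF.00, Packet #2 yields 7.2.45.0, which is not in HT 1; masked with FF.FF.00.00 it yields 7.2.0.0, which matches Rule #2 in HT 2.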

A Scenario with Multiple Input Ports

This use case demonstrates the classifier behavior on an Intel® Ethernet Controller X710 NIC card which features four input 10G ports. On each port a dedicated PMD thread will process the incoming traffic. For simplicity Figure 9 shows just pmd60 and pmd63 as the PMD threads for port0 and port3, respectively. Figure 9 shows the details:

  1. The dpcls content after processing two packets with Src IP addresses [21.2.10.99] and [7.2.45.67].
  2. The EMC content after processing the two packets [21.2.10.99] and [7.2.45.67].
  3. pmd63 processing packets from port 3. The figure shows the content of the tables after processing the packet with IP [5.5.5.1].
  4. The new packet [21.2.10.2] will not find a match in the EMC of pmd60; instead it will find a match in the Port 0 Classifier. A new entry will also be added to the pmd60 EMC.
  5. The new packet [5.5.5.8] will get a miss on the pmd63 EMC; however, it will find a match in the Port 3 Classifier. A new entry will then be added to the pmd63 EMC.


Figure 9. dpcls with multiple PMD threads in OVS-DPDK

Conclusion

In this article, we have described the working of the user space classifier with different test cases and also demonstrated how different tables in OVS-DPDK are set up. We’ve also discussed the shortcomings of the classifier in the OVS 2.5 release and how this is improved in OVS 2.6. A follow-up blog on the OVS-DPDK classifier will discuss the code flow, classifier bottlenecks, and ways to improve the classifier performance on Intel® architecture.

For Additional Information

For any questions, feel free to follow up on the Open vSwitch discussion mailing list.

Videos and Articles

To learn more about OVS with DPDK, check out the following videos and articles on Intel® Developer Zone and Intel® Network Builders University.

Open vSwitch with DPDK Architectural Deep Dive

DPDK Open vSwitch: Accelerating the Path to the Guest

The Design and Implementation of Open vSwitch

Tuple Space Search

To learn more about the Tuple Space Search:

V. Srinivasan, S. Suri, and G. Varghese. Packet Classification Using Tuple Space Search. In Proc. of SIGCOMM, 1999.

About the Authors

Bhanuprakash Bodireddy is a Network Software Engineer with Intel. His work is primarily focused on accelerated software switching solutions in user space running on Intel® architecture. His contributions to OvS-DPDK include usability documentation, Keep-Alive feature, and improving the datapath Classifier performance.

Antonio Fischetti is a Network Software Engineer with Intel. His work is primarily focused on accelerated software switching solutions in user space running on Intel® architecture. His contributions to OVS with DPDK are mainly focused on improving the datapath Classifier performance.

DPDK/NFV DevLab Trip Report: July 11, 2016


The Intel® Developer Zone Data Plane Development Kit (DPDK) DevLab was designed to both improve platform knowledge and deepen the interest of new and current networking virtualization and DPDK developers. It was a full-day event at the Intel Santa Clara Campus, with hosted presentations, demos, and hands-on training for the developers in attendance.

There were ten presentations by experts from Intel and Berkeley, two hands-on sessions, three in-class demos, and four independent software vendor demos so participants could learn from architects and experts from industry, academia, and Intel.

This report contains videos and PowerPoint slides that capture the day’s presentations. You can use them to learn, review, and get involved in the DevLab. The two hands-on sessions from the DevLab are not posted here because the reader will not have access to the hardware set up at the lab, and hence they would be of limited value. Updates from Intel® Network Builders University and the DPDK open source community are included so you can refer to these resources outside of Intel Developer Zone for your learning.

Table of Contents

Software Defined Infrastructure/Network Function Virtualization/ONP Ingredients
DPDK Overview and Core APIs
DPDK API and Virtual Infrastructure
DPDK and Virtio
Open vSwitch* with DPDK: Architecture and Performance
BESS - A Virtual Switch Tailored for NFV
Intel® VTune™ and Performance Optimizations
DPDK Performance Benchmarking
DPDK Open Source Community Update
Intel Network Builders University

Software Defined Infrastructure/Network Function Virtualization/ONP Ingredients

This presentation starts with an overview of Software Defined infrastructure (SDI) and describes how Software Defined Networking (SDN) and Network Function Virtualization (NFV) come together to achieve a flexible and scalable independent software framework: OpenStack*. Intel® Open Network Platform (Intel® ONP), which is based on Open Platform for NFV (OPNFV), is introduced here. OPNFV aims to provide a reference architecture for NFV and SDN deployments in the real world. Presented by Sujata Tibrewala and Ashok Emani.

View slides

DPDK Overview and Core APIs

This presentation centers on DPDK design and how it is used. In addition to an overview of DPDK, Network Function Virtualization (NFV), Vector Packet Processing (VPP)/Fast Data I/O (FD.io), and the new Transport Layer Development Kit (TLDK) used in current deployments are discussed. Further, the presentation shows how DPDK is used within Virtual Network Function (VNF)/NFV systems to accelerate these cloud applications, including how DPDK is used to improve the performance of Cisco’s routing software. Presented by Keith Wiles.

View slides

DPDK API and Virtual Infrastructure

This presentation showcases DPDK API virtualization support and how it opens up multiple network interfaces that can be used to deliver packets from the physical Network Interface Card (NIC) to a VM/VNF in the NFV setup. In addition, the presentation lists various virtual devices with available Poll Mode Drivers in the DPDK API, and delivers insights into how to properly build your NFVi from the beginning. Presented by Rashmin Patel.

View slides

DPDK and Virtio

The presentation starts with an overview of Virtio and how it is used with DPDK and in a VNF/NFV cloud. A simple example of how to use Virtio APIs follows, and the presentation finishes with the design of VNF/NFV software with respect to how these layers combine into a cloud product. Presented by Keith Wiles.

View slides

Open vSwitch with DPDK: Architecture and Performance

This presentation covers Open vSwitch (OVS), a production-quality, multilayer virtual switch that supports SDN control semantics via the OpenFlow protocol and its OVSDB management interface. Native OVS performance is not sufficient to support Telco NFV use cases, so the presentation further shows how DPDK is integrated into native OVS to boost performance. The presentation specifically covers OVS multilevel table support, vhost multi-queue, and related features used with DPDK to achieve maximum performance. The presentation ends with benchmark results on OVS for the most common use cases. Presented by Irene Liew.

View slides

BESS - A Virtual Switch Tailored for NFV

This presentation discusses Berkeley Extensible Software Switch (BESS), an extensible platform for rapid development of software switches. BESS allows you to implement a fully customizable packet processing data path. In this session, we present some technical details of BESS and then demonstrate how to implement a custom virtual switch in just 30 minutes. Presented by Joshua Reich and Sangjin Han.

View BESS Intro slides

View BESS Walkthrough slides

Intel® VTune™ and Performance Optimizations

This presentation is a tutorial about performance optimization best practices and includes a demo with a link to a do-it-yourself cookbook called “Profiling DPDK Code with Intel® VTune™ Amplifier.” Role-playing sessions are used, with the audience acting as various building blocks of a CPU pipeline. It emphasizes the thought process for analysis of Non-Uniform Memory Access (NUMA) affinity, followed by a discussion of microarchitecture optimizations with VTune. The presentation concludes with information about how viewers can replicate the demo shown with VTune profiling DPDK micro benchmarks and identify hotspots in their own applications. Presented by Muthurajan Jayakumar (M Jay).

View slides

DPDK Performance Benchmarking

This presentation describes the standard process for performing high-throughput networking performance benchmarking tests using the DPDK Layer 3 forwarding (l3fwd) sample application workload. This includes hardware and software configurations for performance optimization and tuning. The session is also a tutorial on reading DPDK performance reports produced by the Intel® NPG PMA team posted on http://cat.intel.com (Note: NDA required) and for performing some essential platform performance tuning. Presented by Georgii Tkachuk.

View slides

DPDK Open Source Community Update

This presentation describes the history of the DPDK open source community. It describes the increasing level of multi-architecture support now available in DPDK, including information on the number of contributions and main contributors to DPDK releases. It further explains how new members can contribute and provides links to more information. Presented by Tim O’Driscoll.

View slides

Intel Network Builders University

This presentation gives an overview of the Intel Network Builders University. Network Builders University is an NFV/SDN Training Program for Network Builders Partners and end users. Presented by George Ranallo.

View slides

For More Information

For more information about topics in this article visit the Intel® Developer Zone's Networking site and Intel® Network Builders University. If you're in the San Francisco Bay area, check out the Out Of The Box Network Developers Meetup.
