
Parallel STL: Parallel Algorithms in Standard Template Library


The C++17 standard adds support for multi-threaded and vectorized execution of STL algorithms. Intel® C++ Compiler 18.0 Beta and above supports Parallel STL. The beauty of the STL is that data storage (STL containers) is abstracted from the operations performed on the data (STL algorithms) through the concept of STL iterators. Regardless of which container a developer chooses for an application, most operations, such as traversing the container, sorting, and so on, are common. For instance, consider two different STL containers:

std::vector<int> a(N);
std::unordered_map<int, int> b(N);

One STL algorithm that can traverse both of the above containers is std::for_each:

std::for_each(a.begin(), a.end(), [&](auto &c){ std::cout << c << "\n"; });
std::for_each(b.begin(), b.end(), [&](auto &c){ std::cout << c.first << " " << c.second << "\n"; });

However, the above STL algorithm is single-threaded, while modern processors are multi-core with SIMD units in each core. To use the silicon more efficiently, the operation performed by the STL algorithm needs to be multi-threaded and vectorized. The Parallel STL (PSTL) feature of the C++17 standard provides different execution policies that control how an algorithm runs. The Parallel STL implementation in the Intel Compiler lives in the pstl namespace. The four execution policies that the Intel Compiler implements for PSTL are:

Execution Policy              Description
pstl::execution::seq          Single-threaded, scalar
pstl::execution::unseq        Single-threaded, vectorized
pstl::execution::par          Multi-threaded, scalar
pstl::execution::par_unseq    Multi-threaded, vectorized
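
As a quick illustration, the same algorithm call can be switched between these four modes simply by changing its first argument (a minimal sketch, assuming Intel's <pstl/execution> and <pstl/algorithm> headers; the function name is for illustration only):

#include <pstl/execution>
#include <pstl/algorithm>
#include <vector>

void sort_all_ways(std::vector<int> &v) {
    std::sort(pstl::execution::seq,       v.begin(), v.end()); // single-threaded, scalar
    std::sort(pstl::execution::unseq,     v.begin(), v.end()); // single-threaded, vectorized
    std::sort(pstl::execution::par,       v.begin(), v.end()); // multi-threaded, scalar
    std::sort(pstl::execution::par_unseq, v.begin(), v.end()); // multi-threaded, vectorized
}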

To evaluate the Intel Compiler's PSTL implementation, please refer to the Getting Started article. Parallel STL is also explained, with simple examples, in an Intel Parallel Universe Magazine article.

Does traditional STL coexist with Parallel STL implementation?

Yes, they coexist. The traditional STL implementation is in std namespace while the Parallel STL implementation is in pstl namespace.

Does the Intel Compiler's PSTL implementation coexist with Microsoft's or GNU's PSTL implementation?

Yes, they coexist. Microsoft's PSTL implementation is under the std::experimental::parallel namespace and GNU's PSTL implementation is under the __gnu_parallel namespace.

Which threading model is used for Parallelism?

The Intel Compiler's PSTL implementation uses the Intel® Threading Building Blocks (Intel® TBB) runtime, GNU's PSTL implementation uses the OpenMP runtime, and Microsoft's PSTL implementation uses native threads.

PSTL Threading:

Consider a simple sorting example using std::sort() to start with:

#include<iostream>
#include<algorithm>
#include<vector>
#include<chrono>
#include<cstdlib>
#include<ctime>
#define N 99999999
int main(){
        std::chrono::time_point<std::chrono::system_clock> timer_start, timer_stop;
        srand(time(NULL));
        std::vector<int> myvec1(N), myvec2(N);
        for(int i = 0; i < N; i++)
              myvec1[i] = myvec2[i] = rand();
        //Sorting the content of the vector using the STL algorithm
        timer_start = std::chrono::system_clock::now();
        std::sort(myvec1.begin(), myvec1.end(), [&](int j, int k){ return(j>k); });
        timer_stop = std::chrono::system_clock::now();
        std::chrono::duration<double> elapsed_seconds = timer_stop - timer_start;
        std::cout<<"Standard STL: Time taken in seconds is "<<elapsed_seconds.count()<<" seconds \n";
        return 0;
}

Enabling multi-threading using Parallel STL:

#include<iostream>
#include<pstl/execution>
#include<pstl/algorithm>
#define TBB_PREVIEW_GLOBAL_CONTROL 1
#include<tbb/global_control.h>
#include<vector>
#include<chrono>
#include<cstdlib>
#include<ctime>
#define N 99999999
int main(){
        //Limit the number of TBB worker threads to 2
        tbb::global_control c(tbb::global_control::max_allowed_parallelism, 2);
        std::chrono::time_point<std::chrono::system_clock> timer_start, timer_stop;
        srand(time(NULL));
        std::vector<int> myvec1(N), myvec2(N);
        for(int i = 0; i < N; i++)
              myvec1[i] = myvec2[i] = rand();
        //Sorting the content of the vector using the Parallel STL algorithm
        timer_start = std::chrono::system_clock::now();
        std::sort(pstl::execution::par, myvec1.begin(), myvec1.end(), [&](int j, int k){ return(j>k); });
        timer_stop = std::chrono::system_clock::now();
        std::chrono::duration<double> elapsed_seconds = timer_stop - timer_start;
        std::cout<<"Parallel STL: Time taken in seconds is "<<elapsed_seconds.count()<<" seconds \n";
        return 0;
}

Does just enabling multi-threading in STL algorithm work in every case?

Not necessarily. In the above case, the vector is broken down into mutually exclusive chunks that are handed to individual threads for sorting, so two threads never access the same location. That need not be the case with every algorithm. For instance, consider a Naïve Bayes supervised classification algorithm, which is based on Bayes' theorem. The example involves training the program with a census dataset from the UCI Machine Learning Repository. The dataset has 14 attributes for each person (such as age, sex, capital gain, and so on) and his/her annual salary, recorded as either <=50K or >50K. Each attribute can have multiple values. The data structure used to hold the learnt model is:

std::vector<std::vector<std::unordered_map<std::string, int> > >

The outermost vector is of size two, one for each annual salary category (<=50K and >50K). The inner vector holds the 14 attributes, and for each attribute the unordered_map stores the attribute value and its number of occurrences as a <key, value> pair. The main computation happens in the loop below:

	for_each(dataset.begin(), dataset.end(), [&](std::string &s) {
		size_t start = 0, end = 0, ques = 0, index = 0;
		for (size_t num_of_cols = 0; num_of_cols < 15; num_of_cols++) {
			end = s.substr(start, s.length()).find(',');
			std::string newString = s.substr(start, end);
			ques = newString.find('?');
			if (ques != std::string::npos)
				continue;
			//Windowing for certain numeric fields with continuous values
			switch (num_of_cols) {
			case 0:
				if (newString.find("<=50K") != std::string::npos)
					index = 0;
				else
					index = 1;
				break;
			case 1:
				newString = std::to_string((atoi(newString.c_str()) / 10) * 10);
				break;
			case 3:
				newString = std::to_string((atoi(newString.c_str()) / 10000) * 10000);
				break;
			case 11:
				newString = std::to_string((atoi(newString.c_str()) / 1000) * 1000);
				break;
			case 12:
				newString = std::to_string((atoi(newString.c_str()) / 1000) * 1000);
				break;
			case 13:
				newString = std::to_string((atoi(newString.c_str()) / 10) * 10);
				break;
			default: break;
			}
			std::pair<std::unordered_map<std::string, int>::iterator, bool> p =
				(iter[index].begin())[num_of_cols].insert(std::pair<std::string, int>(newString, 1));
			if (!p.second) {
				p.first->second++;
			}
			start = start + end + 1;
		}
		});
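
For reference, the learnt-model container described above can be set up as follows (a minimal sketch only; the sizes follow the dataset description of two salary categories and 14 attributes, and the variable name model is illustrative):

#include <vector>
#include <unordered_map>
#include <string>

// model[salary_category][attribute] maps an attribute value to its occurrence count
std::vector<std::vector<std::unordered_map<std::string, int> > > model(
        2, std::vector<std::unordered_map<std::string, int> >(14));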

To enable parallelism using Parallel STL, try adding the execution policy pstl::execution::par to the above for_each algorithm as shown below:

for_each(pstl::execution::par, dataset.begin(), dataset.end(), [&](std::string &s) {
….
});

When this program is executed in multi-threaded mode, it crashes. Debugging the program points straight to the insert() call of the STL unordered_map.

 

The insert() of the STL unordered_map is not thread safe, so when two threads try to concurrently insert values into the unordered_map, the program errors out. Intel® TBB offers a thread-safe equivalent of the STL unordered_map, tbb::concurrent_unordered_map, which supports the same interfaces the STL container offers. Modify the data structure in the program to replace unordered_map with concurrent_unordered_map as shown below:

From:

std::vector<std::vector<std::unordered_map<std::string, int> > >

To:

std::vector<std::vector<tbb::concurrent_unordered_map<std::string, int> > >

With the above modification, multiple threads can concurrently insert into the map, and the program demonstrates a 2x performance improvement with 2 TBB threads. But checking the output file reveals that, although the program ran successfully and faster, the frequencies of occurrence of the individual attribute values are recorded incorrectly. This is because incrementing the occurrence count of an attribute value that already exists in the map is not atomic, which results in a data race. This can be fixed by changing the data structure as follows:

From:

std::vector<std::vector<tbb::concurrent_unordered_map<std::string, int> > >
.
.
auto p = (iter[index].begin())[num_of_cols].insert(std::pair<std::string, int>(newString, 1));
if (!p.second) {
	p.first->second++;
}

To:

std::vector<std::vector<tbb::concurrent_unordered_map<std::string, tbb::atomic<int> > > >
.
.
auto p = (iter[index].begin())[num_of_cols].insert(std::pair<std::string, int>(newString, 1));
if (!p.second) {
	p.first->second.fetch_and_increment();
}

With these changes, the code runs faster with multiple threads without compromising accuracy. The key lesson from this exercise is to watch out for the need for concurrent containers and for potential data races.
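
To see the pattern in isolation, here is a small, self-contained sketch of counting value frequencies from multiple threads with a concurrent map and atomic counters (illustrative only; it assumes the same Intel PSTL headers and classic Intel TBB headers used elsewhere in this article):

#include <pstl/execution>
#include <pstl/algorithm>
#include <tbb/concurrent_unordered_map.h>
#include <tbb/atomic.h>
#include <vector>
#include <string>
#include <iostream>

int main() {
    std::vector<std::string> values = {"a", "b", "a", "c", "b", "a"};
    tbb::concurrent_unordered_map<std::string, tbb::atomic<int> > counts;

    std::for_each(pstl::execution::par, values.begin(), values.end(),
                  [&](const std::string &v) {
        // insert() is thread safe; fetch_and_increment() makes the count update atomic
        auto p = counts.insert(std::make_pair(v, tbb::atomic<int>()));
        p.first->second.fetch_and_increment();
    });

    for (auto &kv : counts)
        std::cout << kv.first << ": " << kv.second << "\n";
    return 0;
}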

PSTL Vectorization:

Consider the example of searching for an integer in a std::vector:

#include<vector>
#include<algorithm>
#include<iostream>
#include<chrono>
#include<stdlib.h>
#define N1 999999999
#ifdef PSTL
#include"pstl/algorithm"
#include"pstl/execution"
#endif
#ifdef GNU_PSTL
#include"parallel/algorithm"
#include<omp.h>
#endif
using namespace std;
std::vector<long long>::iterator mysearch(long long n1, std::vector<long long> &n2) {
#ifdef PSTL
        return find(pstl::execution::unseq, n2.begin(), n2.end(), n1);
#elif defined(GNU_PSTL)
        return __gnu_parallel::find(n2.begin(), n2.end(), n1);
#else
        return find(n2.begin(), n2.end(), n1);
#endif
}
int main(int argc, char *argv[]){
        long long num_to_search;
        if(argc < 2)
        {
                std::cout<<"Enter the number to searched as command line argument range [0 - 999999999]\n";
                return 0;
        }
        else
                num_to_search = atoi(argv[1]);
        static long long p;
        std::vector<long long> myvec(N1);
        std::chrono::time_point<std::chrono::system_clock> timer_start, timer_end;
        timer_start = std::chrono::system_clock::now();
        #ifdef PSTL
                generate(pstl::execution::unseq, myvec.begin(), myvec.end(), [&]() { return p++; });
        #elif defined(GNU_PSTL)
                omp_set_num_threads(1);
                __gnu_parallel::generate(myvec.begin(), myvec.end(), [&]() { return p++; });
        #else
                generate(myvec.begin(), myvec.end(), [&]() { return p++; });
        #endif
        timer_end = std::chrono::system_clock::now();
        std::chrono::duration<double> elapsed_seconds = timer_end - timer_start;
        std::cout<<"Time taken by generate algorithm is "<<elapsed_seconds.count()<<"\n";
        timer_start = std::chrono::system_clock::now();
        auto result = mysearch(num_to_search, myvec);
        timer_end = std::chrono::system_clock::now();
        elapsed_seconds = timer_end - timer_start;
        std::cout<<"Time taken by find algorithm is "<<elapsed_seconds.count()<<"\n";
        if(result != myvec.end())
                std::cout<<"Found the element "<<*result<<", p = "<<p<<"\n";
        else
                std::cout<<"Element not found, p = "<<p<<"\n";
        return 0;
}

The Intel Compiler auto-vectorizes the code, and the vectorized code performs better than the GCC 5.1 generated binary. Please download the attached code samples to evaluate and compare the PSTL implementations provided by different compiler vendors. The PSTL-specific code path and the auto-vectorized code path perform the same in this case. In general, the Intel Compiler has very good vectorization heuristics to identify different code patterns and vectorize them when it is safe to do so. For instance, consider the histogram loop pattern shown below:

#include<vector>
#include<algorithm>
#include<iostream>
#include<chrono>
#include<stdlib.h>
#define N1 9999999
#ifdef PSTL
#include"pstl/algorithm"
#endif
#ifdef GNU_PSTL
#include<parallel/algorithm>
#include<omp.h>
#endif
using namespace std;
int main(int argc, char *argv[]){
        std::vector<long long> hist(10);
        fill(hist.begin(), hist.end(), 0);
        std::vector<long long> myvec(N1);
        std::cout<<"---------------------\n";
        std::chrono::time_point<std::chrono::system_clock> timer_start, timer_end;
        timer_start = std::chrono::system_clock::now();
        #ifdef PSTL
                generate(pstl::execution::unseq, myvec.begin(), myvec.end(), std::rand);
        #elif defined(GNU_PSTL)
                omp_set_num_threads(1);
                __gnu_parallel::generate(myvec.begin(), myvec.end(), std::rand);
        #else
                generate(myvec.begin(), myvec.end(), std::rand);
        #endif
        timer_end = std::chrono::system_clock::now();
        std::chrono::duration<double> elapsed_seconds = timer_end - timer_start;
        std::cout<<"Time taken by generate algorithm is "<<elapsed_seconds.count()<<"\n";
        timer_start = std::chrono::system_clock::now();
        #ifdef PSTL
                for_each(pstl::execution::unseq, myvec.begin(), myvec.end(), [&](long long &p){ hist[(p%4)]++; });
        #elif defined(GNU_PSTL)
                __gnu_parallel::for_each(myvec.begin(), myvec.end(), [&](long long &p){ hist[(p%4)]++; });
        #else
                for_each(myvec.begin(), myvec.end(), [&](long long &p){ hist[(p%4)]++; });
        #endif
        timer_end = std::chrono::system_clock::now();
        elapsed_seconds = timer_end - timer_start;
        std::cout<<"Time taken by for_each algorithm is "<<elapsed_seconds.count()<<"\n";
        for_each(hist.begin(), hist.end(), [&](const long long &q){ std::cout<<q<<"\n"; });
        return 0;
}

When the pstl::execution::unseq execution policy is used with the Intel Compiler, vectorization is forced using #pragma omp simd (the SIMD pragma from OpenMP 4.0). One important point to remember is that with #pragma omp simd the compiler's vectorization heuristics do not perform the usual data dependency and data flow analysis; they simply follow the user's directive and vectorize. So always exercise caution when using it. For instance, in the above program (p%4) results in the values 0,1,2,3 in the SIMD register when targeting SSE (no duplicate values), but when targeting AVX the SIMD register holds 0,1,2,3,0,1,2,3. When hist[p%4] is executed in vectorized mode, this creates a data race. The Intel® AVX-512 instruction set supports a conflict detection instruction (vpconflict) that checks for conflicts (duplicates), if any, in the SIMD register. Please try the above example with the attached build script and compare the performance of the code generated by the Intel Compiler and the GNU compiler.
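
To make the hazard concrete, the scalar loop that the unseq policy effectively asks the compiler to vectorize looks roughly like the sketch below (illustrative only, not the code the compiler actually generates):

#include <vector>
#include <iostream>
#include <cstdlib>
#include <cstddef>

int main() {
    std::vector<long long> myvec(9999999);
    for (auto &v : myvec) v = std::rand();

    long long hist[4] = {0, 0, 0, 0};
    // Forcing vectorization skips the compiler's dependence analysis; several
    // SIMD lanes can map to the same bin, so increments may be lost.
    #pragma omp simd
    for (std::size_t i = 0; i < myvec.size(); i++)
        hist[myvec[i] % 4]++;

    for (long long h : hist) std::cout << h << "\n";
    return 0;
}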


Designing Scalable IoT Architectures


Designing for the Internet of Things is challenging. The technology is rapidly changing, and architecting for these situations can be complex. This article will discuss both design considerations for IoT and new methods in creating a robust network using Intel® processors.

Latency, Bandwidth and Reliability

Design practices for Internet of Things (IoT) devices are changing. It used to be that developers just watched processes from afar, but now we control them in real-time. A result of this change has been an increase in IoT network complexity. For IoT devices that depend on Internet access, this can result in several challenges when it comes to network paths to cloud servers: high latencies, low bandwidths, and decreased reliability.

These trends have led to new topologies in IoT networks, such as Fog Computing (a network layer below the cloud). Deploying cloud elements closer to the edge of the network (or even onsite) reduces latencies while also preventing bandwidth bottlenecks. In order to achieve these goals, edge networks and Fog Computing require high-performance computing resources, as well as high-speed storage and networking.

Scalable Design

The challenge for IoT is twofold:

1) Design scalable and reliable devices

2) Architect flexible cloud elements with the lowest possible latency, the highest bandwidth, and the best reliability.

IoT Design with Intel® Processors

Intel supplies a wide range of processor products that allow IoT designers to scale both hardware and software to meet these design goals (flexible, scalable, and reliable). Many of these processor families also have integrated GPUs, which offer extra processing resources.

There are four main product families:

  1. Intel® Quark™ processor
  2. Intel® Core™ processor
  3. Intel Atom® processor
  4. Intel® Xeon® processor


Figure 1 Designing to Scale

Early Big Data and Current IoT Architectures

Early big data architectures were based on sensors with networking capabilities. These accessed the Internet and transmitted data into cloud applications for later retrieval and analysis.

Current IoT architectures evolved into networks that either forward data in near real-time (to generate event-based responses) or function as sensor-actor networks.

About Sensors

Sensors are devices that detect or measure a physical property (temperature, humidity, light, etc.). Controllers receive input from sensors and initiate actions. These actions usually include using an actor or actuator to adjust or maintain desired outputs of specific processes. Let's consider, for example, a plant watering system based on sensors. The moisture sensor measures the water saturation of soil, and if that level falls below a certain threshold, a controller initiates an action to open a water valve.

Evolution of IoT Networks
Figure 2 Evolution of IoT Networks

Evolving IoT Networks

Figure 2 illustrates how latency becomes a significant issue when the IoT network becomes more complex (specifically, the Sensor-Actor Real-Time Data Model). IoT designs must take into account two things: 1) a rapidly progressing network of sensors and 2) systems acting upon the network.  Integrating fog architectures into existing IoT networks helps to reduce latency issues, bringing cloud elements closer to the edge!

As IoT evolves and more sophisticated applications are designed, the entire end-to-end IoT chain will need even more computing resources, while still requiring power consumption optimization. This need for processing power is constantly growing, meaning that designers need to account for some extra headroom for future software upgrades.

Migrating the cloud elements to the edge network or to the LAN (Figure 3) reduces network latencies accordingly. The real-time data path to the ON-PREMISES DATACENTER bypasses the access network, and results in the bandwidth and reliability benefits of LAN.

Comparison of different types of IoT Networks
Figure 3 Comparison of different types of IoT Networks

IoT Network Stacks

An increase in network complexity, along with growing demand for IoT, has resulted in the exponential growth of complex network stacks. Network stacks now not only need to handle IoT protocols, they also must account for security, encryption, and independent processors that handle additional tasks.

IoT Network Architecture

When planning architecture for an IoT network, it’s important to consider the downstream processing of the network. Let's consider, for example, a smart building where a sensor is linked to a lighting appliance. The appliance may be part of a larger building application. The smart building may also be part of a smart city network. In this case, you would want to consider that data is not only being passed locally, but also being transferred to a larger building network, and ultimately to a much larger city network. 

Application Demands

As sensors grow in complexity and their implementation becomes widespread, it’s important to ensure processors account for additional demands beyond network connectivity. Sensors now produce increasingly large data sets, and digital sensors that use GPIO or analog connections generate large volumes of data that must be moved and managed in real time. It’s important to scale independent microcontroller and bus interfaces in system designs to meet application demands. For example, Fog Node or edge computing will be needed as LIDAR, radar, ultrasound, and video (vision) sensors are added in order to keep up with real-time computing applications.

Autonomous Systems

Autonomous control and adaptive learning control systems should be accounted for in current or future IoT system designs. Implementation of autonomous systems is becoming more widespread. As emerging technologies continue to progress, being able to scale a design for future use is just as advantageous as the capabilities the design offers today. Smart homes, connected cars, artificial intelligence, and embedded deep learning are coming soon to the marketplace.

IoT Power and Performance with an Intel® Processor Family

Intel offers four families of processors that make achieving low latency, high bandwidth, and increased reliability possible, all without increasing power consumption or affecting performance. The Internet of Things is a fast-growing and complex system with many design considerations, such as latency issues or ISP bottlenecks, both of which can be addressed with Intel® processors. Moving big data computing to the edge (and within LAN Fog Nodes) increases onsite computing resources and sensor capability, frees up bandwidth, and increases reliability in IoT networks.


Intel® Distribution for Python 2017 Update 3 Readme


Intel® Distribution for Python powered by Anaconda gives you ready access to tools and techniques for high performance to supercharge all your Python applications on modern Intel platforms. Whether you are a seasoned high-performance developer or a data scientist looking to speed up your workflows, the Intel Distribution for Python powered by Anaconda delivers an easy-to-install, performance-optimized Python experience to meet even your most demanding requirements.

The Intel® Distribution for Python 2017 Update 3 for Linux*, Windows*, and macOS* packages are now ready for download. The Intel® Distribution for Python is available as a stand-alone product and as part of the Intel® Parallel Studio XE.

New in this release:

  • Updates to several modules for improved stability and performance

Refer to the Intel® Distribution for Python Release Notes for more details.

Contents:

  • Intel® Distribution for Python 2017 Update 3 for Linux*
    • File: l_python2_pu3_2017.3.053.tgz

      A file containing the complete product installation for Python 2.7 on Linux (x86-64bit/Intel® Xeon Phi™ coprocessor development)

    • File: l_python3_pu3_2017.3.052.tgz

      A File containing the complete product installation for Python 3.5 on Linux (x86-64bit/Intel® Xeon Phi™ coprocessor development)

  • Intel® Distribution for Python 2017 Update 3 for Windows*
    • File: w_python27_pu3_2017.3.052.exe

      A file containing the complete product installation for Python 2.7 on Windows (x86-64bit development)

    • File: w_python35_pu3_2017.3.052.exe

      A file containing the complete product installation for Python 3.5 on Windows (x86-64bit development)

  • Intel® Distribution for Python 2017 Update 3 for macOS*
    • File: intelpython27-2017.3.053.tgz

      A file containing the complete product installation for Python 2.7 on macOS (x86-64bit development)

    • File: intelpython35-2017.3.053.tgz

      A file containing the complete product installation for Python 3.5 on macOS (x86-64bit development)

BigDL: Bring Deep Learning to the Fingertips of Big Data Users and Data Scientists


Big data and analytics play a central role in today’s smart and connected world, and are continuously driving the convergence of big data, analytics, and machine learning/deep learning. We open sourced BigDL, a distributed deep learning library for Apache Spark*, in December 2016, for the very purpose of uniting the deep learning community and the big data community. The rest of this article provides an overview of recent enhancements available in the BigDL 0.1.0 release (as well as in the upcoming 0.1.1 release).

  • Python* Support
     

    Python* is one of the most widely used languages in the big data and data science community, and BigDL provides full support for Python APIs (using Python 2.7), based on PySpark since its 0.1.0 release; this allows users to use deep learning models in BigDL together with existing Python libraries (for example, NumPy and Pandas), which automatically run in a distributed fashion to process large volumes of data across Hadoop*/Spark clusters. For instance, we can create the LeNet-5 model, a classic convolutional neural network, using the BigDL Python API as follows:

    def build_model(class_num):
        model = Sequential()
        model.add(Reshape([1, 28, 28]))
        model.add(SpatialConvolution(1, 6, 5, 5).set_name('conv1'))
        model.add(Tanh())
        model.add(SpatialMaxPooling(2, 2, 2, 2).set_name('pool1'))
        model.add(Tanh())
        model.add(SpatialConvolution(6, 12, 5, 5).set_name('conv2'))
        model.add(SpatialMaxPooling(2, 2, 2, 2).set_name('pool2'))
        model.add(Reshape([12 * 4 * 4]))
        model.add(Linear(12 * 4 * 4, 100).set_name('fc1'))
        model.add(Tanh())
        model.add(Linear(100, class_num).set_name('score'))
        model.add(LogSoftMax())
        return model

    In addition, we continue to improve Python support in BigDL; the upcoming BigDL 0.1.1 release will add Python 3.5 support, as well as support for users to automatically deploy their customized Python dependencies across YARN* clusters.

  • Notebook Integration
     

    With full Python API support in BigDL, data scientists and analysts can now explore their data using powerful notebooks (such as the Jupyter Notebook) in a distributed fashion across the cluster, combining Python libraries, Spark SQL / DataFrames and MLlib, deep learning models in BigDL, as well as interactive visualization tools. For instance, the Jupyter Notebook tutorial contained in BigDL 0.1.0 demonstrates how we can evaluate the prediction result of a text classification model (using both accuracy and confusion matrix) as follows:

    predictions = trained_model.predict(val_rdd).collect()
    
    def map_predict_label(l):
        return np.array(l).argmax()
    def map_groundtruth_label(l):
        return l[0] - 1
    
    y_pred = np.array([ map_predict_label(s) for s in predictions])
    
    y_true = np.array([map_groundtruth_label(s.label) for s in val_rdd.collect()])
    acc = accuracy_score(y_true, y_pred)
    print "The prediction accuracy is %.2f%%"%(acc*100)
    
    cm = confusion_matrix(y_true, y_pred)
    cm.shape
    df_cm = pd.DataFrame(cm)
    plt.figure(figsize = (10,8))
    sn.heatmap(df_cm, annot=True,fmt='d');

    Figure 1

  • TensorBoard* Support
     

    TensorBoard* is a suite of web applications for inspecting and understanding deep learning program runs and graphs, and BigDL 0.1.0 provides support for visualizations using TensorBoard (as well as inline plotting libs such as Matplotlib* within the notebook). First, a BigDL program can be configured to generate summary information for training and/or validation, as illustrated below (using Python APIs):

    optimizer = Optimizer(...)
    ...
    log_dir = 'mylogdir'
    app_name = 'myapp'
    train_summary = TrainSummary(log_dir=log_dir, app_name=app_name)
    val_summary = ValidationSummary(log_dir=log_dir, app_name=app_name)
    optimizer.set_train_summary(train_summary)
    optimizer.set_val_summary(val_summary)
    ...
    trainedModel = optimizer.optimize()

    After we start to run the BigDL program, the train and validation summaries are saved to the specified log directory (under mylogdir/myapp in the example above); after that, we can use TensorBoard to visualize the behaviors of the BigDL program, including the Loss and Throughput curves under the SCALARS tab (as illustrated below).

    Figure 2

    We can also use TensorBoard to visualize weights, bias, gradientWeights, and gradientBias under the DISTRIBUTIONS and HISTOGRAMS tabs (as illustrated below). 

    Figure 3

    Figure 4

  • Better RNN Support
     

    Recurrent neural networks (RNN) are powerful models for analyzing speech, text, time series, sensor data, and so on. The BigDL 0.1.0 release provides comprehensive support for RNN, including different variants of long short-term memory, such as gated recurrent unit (GRU), LSTM with peephole, and dropout in recurrent neural networks. For instance, we can create a simple LSTM model (using the Python API) as follows:

    model = Sequential()
    model.add(Recurrent()
                 .add(LSTM(embedding_dim, 128)))
    model.add(Select(2, -1))
    model.add(Linear(128, 100))
    model.add(Linear(100, class_num))

We have seen major advancements in deep learning in recent years; while the deep learning community continues to push the technology envelope, BigDL helps make these breakthroughs more accessible and convenient to use for data scientists and data engineers (who are not necessarily experts in deep learning technologies). We continue to work on enhancements in BigDL beyond the 0.1 release (for example, support for reading/writing TensorFlow models, Convolutional Neural Network (CNN) implementations for 3D images, recursive nets, and so on), so that big data users can continue to use familiar tools and infrastructure to build their deep learning-powered analytics applications.

Deploying BigDL on Microsoft’s Azure* Data Science Virtual Machine


Automated Installation of BigDL Using Deploy to Azure*

To make it easier to deploy BigDL, we created a “Deploy to Azure” button on top of the Linux* (Ubuntu*) edition of the Data Science Virtual Machine (DSVM). This button encapsulates all the necessary installation steps to create a new Azure* DSVM and installs BigDL after the virtual machine (VM) is provisioned.

Azure Virtual Machines provide a mechanism to automatically run a script during post provisioning when using Azure Resource Manager (ARM) templates. On Github*, we have published the Azure Resource Manager (ARM) template and the script to install BigDL on the DSVM for Linux (Ubuntu) when creating the VM on Azure. 

Deploy to Azure

Clicking the Deploy to Azure button takes the user to the Azure portal wizard, leads them through the VM creation process, and automatically executes the necessary script to install/configure BigDL so that it is ready for use once the VM is successfully provisioned. The user can directly run /opt/BigDL/run_notebooks.sh to start a Jupyter* notebook server to execute the samples.

Note: It may take as long as 10 minutes to fully provision DSVM—perfect time for a coffee break!

Please note: For ease of use, we suggest selecting the password option rather than the SSH option in the DSVM provisioning prompt.

Figure 1

For completeness, we also provide below the manual, step-by-step installation procedure, in case you already have a DSVM (Ubuntu) instance or just want to understand the details of what the automated steps above do.

Manual Installation of BigDL on the DSVM

Provisioning DSVM

Before you start, you need to provision the Microsoft Data Science Virtual Machine for Linux (Ubuntu) by visiting the Azure product detail page and following the directions in the VM creation wizard.

Figure 2

Figure 3

When DSVM is configured, make a note of its public IP address or DNS name; you will need it to connect to DSVM via your connect tool of choice.  The recommended tool for text interface is SSH or Putty. For the graphical interface, Microsoft* recommends an X Client called X2GO*.

Note: You may need to configure your proxy server correctly if your network administrators require all connections to go through your network proxy. The only session type supported by default on DSVM is Xfce*.

Building Intel’s BigDL

Change to root and clone BigDL from Github; switch to released branch-0.1:

sudo -s

     cd /opt

     git clone https://github.com/intel-analytics/BigDL.git

     cd BigDL

     git checkout branch-0.1

Building BigDL with Spark* 2.0:

     $ bash make-dist.sh -P spark_2.0

If successful, you should see the following messages:

Figure 4

Examples of DSVM Configuration Steps to Run BigDL

Switch to Python* 2.7.

     $ source /anaconda/bin/activate root

Confirm Python* version.

     $ python --version

Figure 5

Install Python Packages

     $ /anaconda/bin/pip install wordcloud
     $ /anaconda/bin/pip install tensorboard

Creating Script Files to Run Jupyter* Notebook and TensorBoard*

In the directory where you cloned the BigDL library (/opt/BigDL), create a script, run_notebook.sh, with the following content:

#begin run_notebook.sh
#!/bin/bash
#setup paths
BigDL_HOME=~/BigDL

#this is needed for MSFT DSVM
export PYTHONPATH=${BigDL_HOME}/pyspark/dl:${PYTHONPATH}
#end MSFT DSVM-specific config

#use local mode or cluster mode
#MASTER=spark://xxxx:7077
MASTER="local[4]"
PYTHON_API_ZIP_PATH=${BigDL_HOME}/dist/lib/bigdl-0.1.0-python-api.zip
BigDL_JAR_PATH=${BigDL_HOME}/dist/lib/bigdl-0.1.0-jar-with-dependencies.jar
export PYTHONPATH=${PYTHON_API_ZIP_PATH}:${PYTHONPATH}
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --notebook-dir=~/notebooks  --ip=* "

source ${BigDL_HOME}/dist/bin/bigdl.sh

${SPARK_HOME}/bin/pyspark \
    --master ${MASTER} \
    --driver-cores 5  \
    --driver-memory 10g  \
    --total-executor-cores 8  \
    --executor-cores 1  \
    --executor-memory 10g \
    --conf spark.akka.frameSize=64 \
  --properties-file ${BigDL_HOME}/dist/conf/spark-bigdl.conf \
    --py-files ${PYTHON_API_ZIP_PATH} \
    --jars ${BigDL_JAR_PATH} \
    --conf spark.driver.extraClassPath=${BigDL_JAR_PATH} \
    --conf spark.executor.extraClassPath=bigdl-0.1.0-jar-with-dependencies.jar
# end of run_notebook.sh
-----

chmod +x run_notebook.sh

In the same BigDL directory, create start_tensorboard.sh with the following content:

#begin start_tensorboard.sh
PYTHONPATH=/anaconda/lib/python2.7/site-packages:$PYTHONPATH
/anaconda/lib/python2.7/site-packages/tensorboard/tensorboard --logdir=/tmp/bigdl_summaries
#end start_tensorboard.sh

Please note that ‘/anaconda/lib/python2.7/site-packages/’ is installation-dependent and may change in future releases of DSVM. Thus, if these instructions do not work for you out of the box, you may need to update this path.

Figure 6

Note the URL at the end of the log http://10.0.2.4:6006. Open your DSVM browser with it to see the TensorBoard pane.

Launching a Text Classification Example

Execute run_notebook.sh and start_tensorboard.sh via bash commands from different terminals:

       $bash run_notebook.sh
       $bash start_tensorboard.sh

Open two browser tabs, one for text_classification.ipynb and another for TensorBoard.

Navigate to the text_classification example:

http://localhost:YOUR_PORT_NUMBER/notebooks/pyspark/dl/example/tutorial/simple_text_classification/text_classfication.ipynb# —Check location of sample.

Run the notebook. This will take a few minutes. In the end, you will see a loss graph like this one:

Figure 7

Your TensorBoard may look like this for the Text Classification example.

Figure 8

Automating the Installation of BigDL on DSVM

Azure Virtual Machines provide a mechanism to automatically run a script during post provisioning when using Azure Resource Manager (ARM) templates. On Github, we published the ARM template and the script to install BigDL on the DSVM for Linux (Ubuntu) when creating the VM on Azure.  On the same Github directory there is also a Deploy to Azure button that takes the user to the Azure portal wizard, leads them through the VM creation, and automatically executes the above script to install/configure BigDL so that it is ready for use once the VM is successfully provisioned. The user can directly run /opt/BigDL/run_notebooks.sh to start a Jupyter notebook server to execute the samples.

Conclusion

In this blog post, we demonstrated that in just a few small steps one can take advantage of the Intel BigDL library running on Apache Spark* to execute deep learning jobs on Microsoft’s Data Science Virtual Machine. BigDL continues to evolve and enjoys solid support from the open-source community as well as from Intel’s dedicated software engineering team.

Resources

Appendix

Installing and configuring Spark 1.6 for legacy code implementation:

Installing Spark 1.6.1 (alongside Spark 2.0):

          Install Spark 1.6.1 from http://spark.apache.org/downloads.html (select release 1.6.1 and download).

          cd Downloads
          tar -xzf spark-1.6.1-bin-hadoop2.6.tgz

Move the directory from the download location to where Spark is stored on the system.

Figure 9

To switch back to the Python 3.5 environment:

     $source activate py35 (for Python 3.5)

To install Python packages in the Python 3.5 environment:

     $sudo /anaconda/envs/py35/bin/conda install xxxx (for Python 3.5 env)

(Do the same for pip installs.)

Installing BigDL on the Data Science Virtual Machine for Linux (CentOS*):

To run BigDL on DSVM CentOS* edition, first you need to install Maven* on the DSVM before compiling BigDL.

Installing Maven. Note that on CentOS-based Linux, instead of Ubuntu's apt-get, you need to use yum to install new packages:

Figure 10

DSVM’s default JAVA_HOME environmental variable points to an empty directory, "/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64". You need to change it to another already existing one that contains the Java* 8 installation:

   export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64"

Check that Maven is installed correctly:

   $ mvn -v

Figure 11

After this, you should be able to run a build on BigDL following the steps in the main section above. 

Introduction to pyDAAL


This paper shows how the Python* API of the Intel® Data Analytics Acceleration Library (Intel® DAAL) tool works. First, we explain how to manipulate data using the pyDAAL programming interface and then show how to integrate it with Python data manipulation/math APIs. Finally, we demonstrate how to use pyDAAL to implement a simple linear regression solution for a prediction problem.

Data Science is a recent field that brings together concepts from several other areas, such as data mining, data analysis, data modeling, data prediction, data visualization, and so on. The need to perform such tasks as quickly as possible has become the main issue in today's data solutions. With that in mind, Intel DAAL is a highly optimized library whose goal is to provide a full solution for data analytics targeting today's highly parallel systems, such as Intel® Xeon Phi™ processors.


Intel DAAL delivers solutions for many steps of a data analytics pipeline, such as pre-processing, data transformations, dimensionality reduction, data modeling, prediction, and several drivers for reading and writing in most of the common data formats. A summary of all features inside the library can be seen in Figure 1.

Figure 1. Main algorithms delivered by Intel® Data Analytics Acceleration Library

As can be seen in Figure 1, all APIs are compatible with C++, Java*, and Python* (a recent addition available from version 2017 beta). Many of the algorithms implemented inside the tool can be executed in 3 main modes:

  • Batch: in this mode, the processing occurs in a serial way, e.g., the training algorithm is executed in a single node sequentially;
  • Distributed: as the name suggests, in this processing mode, the dataset must be split and distributed among the computing nodes. The algorithm then calculates partial solutions and, at the last step, unifies them; and
  • Online: in this processing mode, the data is considered as being a continuous stream. The processing occurs by building incremental models, and, at the end, building a full model from the partial models.

More on the processing modes, together with additional details on Data Management and how to use pyDAAL to implement a simple Linear Regression solution for a prediction problem are covered in this whitepaper.

Source available on GitHub

 

Why & When Deep Learning Works: Looking Inside Deep Learning


Ronny Ronen
The Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI)1

In recent years, Deep Learning has emerged as the leading technology for accomplishing a broad range of artificial intelligence tasks (LeCun et al. (2015); Goodfellow et al. (2016)). Deep learning is the state-of-the-art approach across many domains, including object recognition and identification, text understanding and translation, question answering, and more. In addition, it is expected to play a key role in many new usages deemed almost impossible before, such as fully autonomous driving.

While the ability of Deep Learning to solve complex problems has been demonstrated again and again, there is still a lot of mystery as to why it works, what it is really capable of accomplishing, and when it works (and when it does not). Such an understanding is important for both theoreticians and practitioners, in order to know how such methods can be utilized safely and in the best possible manner. An emerging body of work has sought to develop some insights in this direction, but much remains unknown. The general feeling is that Deep Learning is still by and large "black magic": we know it works, but we do not truly understand why. This lack of knowledge disturbs scientists and is a cause for concern for developers: would you let an autonomous car be driven by a system whose mechanisms and weak spots are not fully understood?

The Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI) has been heavily supporting Machine Learning and Deep Learning research from its foundation in 2012. We have asked six leading ICRI-CI Deep Learning researchers to address the challenge of “Why & When Deep Learning works”, with the goal of looking inside Deep Learning, providing insights on how deep networks function, and uncovering key observations on their expressiveness, limitations, and potential.

The output of this challenge call was quite impressive, resulting in five papers that address different facets of deep learning. These papers summarize the researchers’ ongoing recent work published in leading conferences and journals as well as new research results made especially for this compilation. These different facets include a high-level understanding of why and when deep networks work (and do not work), the impact of geometry on the expressiveness of deep networks, and making deep networks interpretable.

Understanding of why and when deep networks work (and do not work)

  1. Naftali Tishby and Ravid Schwartz-Ziv in Opening the Black Box of Deep Neural Networks via Information study Deep Networks by analyzing their information-theoretic properties, looking at what information on the input and output each layer preserves, and suggest that the network implicitly attempts to optimize the Information-Bottleneck (IB) tradeoff between compression and prediction, successively, for each layer. Moreover, they show that the stochastic gradient descent (SGD) epochs used to train such networks have two distinct phases for each layer: fast empirical error minimization, followed by slow representation compression. They then present a new theoretical argument for the computational benefit of the hidden layers.
     
  2. Shai Shalev-Shwartz, Ohad Shamir and Shaked Shamma in Failures of Gradient-Based Deep Learning attempt to gain a deeper understanding of the difficulties and limitations associated with common approaches and algorithms. They describe four families of problems for which some of the commonly used existing algorithms fail or suffer significant difficulty, illustrate the failures through practical experiments, and provide theoretical insights explaining their source and suggest remedies to overcome the failures that lead to performance improvements.
     
  3. Amnon Shashua, Nadav Cohen, Or Sharir, Ronen Tamari, David Yakira and Yoav Levine in Analysis and Design of Convolutional Networks via Hierarchical Tensor Decompositions analyze the expressive properties of deep convolutional networks. Through an equivalence to hierarchical tensor decompositions, they study the expressive efficiency and inductive bias of various architectural features in convolutional networks (depth, width, pooling geometry, inter-connectivity, overlapping operations etc.). Their results shed light on the demonstrated effectiveness of convolutional networks, and in addition, provide new tools for network design.

    The impact of geometry on the expressiveness of deep networks
     
  4. Nathan Srebro, Behnam Neyshabur, Ryota Tomioka and Ruslan Salakhutdinov in Geometry of Optimization and Implicit Regularization in Deep Learning argue that the optimization methods used for training neural networks play a crucial role in generalization ability of deep learning models, through implicit regularization. They demonstrate that generalization ability is not controlled simply by network size, but rather by some other implicit control. Then, by studying the geometry of the parameter space of deep networks and devising an optimization algorithm attuned to this geometry, they demonstrate how changing the empirical optimization procedure can improve generalization performance.

    Interpretability of deep networks
     
  5. Shie Mannor, Tom Zahavy and Nir Baram in Graying the black box: Understanding DQNs present a methodology and tools to analyze Deep Q-networks (DQNs) in a non-blind matter. They propose a new model, the Semi Aggregated Markov Decision Process (SAMDP), and an algorithm that learns it automatically. Using these tools they reveal that the features learned by DQNs aggregate the state space in a hierarchical fashion, explaining its success. Moreover, they are able to look into the network to understand and describe the policies learned by DQNs for three different Atari2600 games and suggest ways to interpret, debug and optimize deep neural networks in reinforcement learning.

References

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT Press, 2016.

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553): 436–444, 2015.

1 This work was done with the support of the Intel Collaborative Research institute for Computational Intelligence (ICRI-CI). This paper is the preface part of the ’Why & When Deep Learning works looking inside Deep Learning’ ICRI-CI paper bundle.

Vector (SIMD) Function ABI


Vector Function Application Binary Interface

adapted from version of November 2015 by

Xinmin Tian, Hideki Saito, Sergey Kozhukhov, Kevin B. Smith,
Robert Geva, Milind Girkar and Serguei V. Preis
Intel® Mobile Computing and Compilers

Please see attachment.

 


The Evil within the Comparison Functions


Perhaps readers remember my article titled "Last line effect". It describes a pattern I once noticed: in most cases programmers make an error in the last line of similar text blocks. Now I want to tell you about a new interesting observation. It turns out that programmers tend to make mistakes in functions comparing two objects. This statement looks implausible; however, I'll show you a great number of examples of errors that may be shocking to a reader. So here is a new piece of research; it will be quite amusing and scary.

Problematics

Here is my statement: programmers quite often make mistakes in rather simple functions that are meant to compare two objects. This claim is based on the experience of our team in checking a large number of open source projects in C, C++ and C#.

The functions we are going to consider here are IsEqual, Equals, Compare, AreEqual and so on, or overloaded operators such as == and !=.

I noticed that when writing articles I very often come across errors related to comparison functions. I decided to explore this question in detail and examined the base of errors we have found. I searched the base for functions containing the words Cmp, Equal, Compare and such. The result was very impressive and shocking.

In fact this story is similar to the one we had when writing the article "Last line effect". Similarly, I noticed an anomaly and decided to explore it more carefully. Unfortunately, unlike the aforementioned article, I don't know how to bring statistics here and which figures to provide. Perhaps, later I'll come up with a solution involving statistics. At this point I am guided by intuition and can only share my feelings: there are a lot of errors in the comparison functions, and I am sure you will get the same feeling when you see the huge number of truly impressive examples.

Psychology

For a moment let's go back to the article "Last line effect". By the way, if you haven't read it, I suggest taking a break and looking at it. There is a more detailed analysis of this topic: "The last line effect explained"

In general, we can conclude that the cause of the errors in the last lines is related to the fact that the developer has already mentally moved on to the next lines/tasks instead of focusing on the completion of the current fragment. As a result, when writing similar blocks of text, there is a higher probability that a programmer will make an error in the last one.

I believe that in the case of writing a comparison function, a developer often doesn't really focus on it, considering it to be too trivial. In other words, he writes the code automatically, without thinking it over. Otherwise, it is not clear how one can make an error like this:

bool IsLuidsEqual(LUID luid1, LUID luid2)
{
  return (luid1.LowPart == luid2.LowPart) &&
         (luid2.HighPart == luid2.HighPart);
}

PVS-Studio analyzer detected this error in the code of RunAsAdmin Explorer Shim (C++) project: V501 There are identical sub-expressions to the left and to the right of the '==' operator: luid2.HighPart == luid2.HighPart RAACommon raacommonfuncs.cpp 1511

A typo. In the second line it should be: luid1.HighPart == luid2.HighPart.

The code is very simple. Apparently, the simplicity of code spoils everything. A programmer immediately thinks of the task of writing such a function as standard and uninteresting. He instantly thinks of the way to write the function and just has to implement the code. This is a routine, but unfortunately inevitable, step before writing more important, complex and interesting code. He is already thinking about the new task... and as a result makes an error.

In addition, programmers rarely write unit tests for such functions. Again, the simplicity of these functions discourages it. It seems that it would be too much to test them, as these functions are simple and repetitive. A person has written hundreds of such functions in his life, can he make an error in another one? Yes, he can, and he does.
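
For what it's worth, even a one-line test would have caught the typo shown above; a minimal sketch (assuming the Windows LUID type and the IsLuidsEqual function from the fragment above):

#include <windows.h>
#include <cassert>

void TestIsLuidsEqual()
{
  // Same LowPart, different HighPart: a correct comparison must return false,
  // while the buggy luid2.HighPart == luid2.HighPart version returns true.
  LUID a = { 1, 10 };
  LUID b = { 1, 20 };
  assert(!IsLuidsEqual(a, b));
}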

I would also like to note that we aren't talking about the code of students who are just learning to program. We are talking about bugs in the code of such projects as GCC, Qt, GDB, LibreOffice, Unreal Engine 4, CryEngine V, Chromium, MongoDB, Oracle VM Virtual Box, FreeBSD, WinMerge, the CoreCLR, MySQL, Mono, CoreFX, Roslyn, MSBuild, etc. It's all very serious.

We are going to have a look at so many diverse examples that it would be scary to sleep at night.

Erroneous Patterns in Comparison Functions

All errors in comparison functions will be divided into several patterns. In the article we'll be talking about errors in projects in C, C++ and C#, but it makes no sense to separate these languages, as most of the patterns are similar for different languages.

Pattern: A < B, B > A

Very often in the comparison functions there is a need to make such checks:

  • A < B
  • A > B

Sometimes programmers think that it is more elegant to use the same operator <, but to switch the variables.

  • A < B
  • B < A

However, due to the inattentiveness, we get such checks:

  • A < B
  • B > A

In fact, one and the same comparison is done twice here. Perhaps, it's not clear what it is about here, but we'll get to the practical examples and it'll all become clearer.

string _server;
....
bool operator<( const ServerAndQuery& other ) const {
  if ( ! _orderObject.isEmpty() )
    return _orderObject.woCompare( other._orderObject ) < 0;

  if ( _server < other._server )
    return true;
  if ( other._server > _server )
    return false;
  return _extra.woCompare( other._extra ) < 0;
}

PVS-Studio analyzer detected this error in the code of MongoDB (C++): V581 The conditional expressions of the 'if' operators situated alongside each other are identical. Check lines: 44, 46. parallel.h 46

This condition:

if ( other._server > _server )

Will always be false, as the same check was done two lines before. Correct code variant:

if ( _server < other._server )
  return true;
if ( other._server < _server )
  return false;

This error was detected in the code of Chromium project (C++):

enum ContentSettingsType;
struct EntryMapKey {
  ContentSettingsType content_type;
  ...
};

bool OriginIdentifierValueMap::EntryMapKey::operator<(
    const OriginIdentifierValueMap::EntryMapKey& other) const {
  if (content_type < other.content_type)
    return true;
  else if (other.content_type > content_type)
    return false;
  return (resource_identifier < other.resource_identifier);
}

PVS-Studio warning: V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 61, 63. browser content_settings_origin_identifier_value_map.cc 61

That was a C++ example; now it's C#'s turn. The next error was found in the code of IronPython and IronRuby (C#).

public static int Compare(SourceLocation left,
                          SourceLocation right) {
  if (left < right) return -1;
  if (right > left) return 1;
  return 0;
}

PVS-Studio warning (C#): V3021 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains method return. This means that the second 'if' statement is senseless. SourceLocation.cs 156

I think no explanation is needed here.

Note. For C# there was just one example of an error, but for C++ there were two. In general, there will be fewer bugs in the C# code than in C/C++. But I do not recommend rushing to the conclusion that C# is much safer. The thing is that the PVS-Studio analyzer has only recently learned to check C# code, and we have simply checked fewer projects written in C# than in C and C++.
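
One way to avoid this whole class of mistakes is to express the ordering through std::tie, so that each member appears exactly once on each side and the operands cannot be swapped by hand; a small sketch with a hypothetical struct:

#include <tuple>

struct Point { int x, y, z; };

// Lexicographic ordering: std::tie compares the members pairwise, so there is
// no opportunity to repeat a member or flip the comparison direction.
bool operator<(const Point &a, const Point &b)
{
  return std::tie(a.x, a.y, a.z) < std::tie(b.x, b.y, b.z);
}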

Pattern: a Member of the Class is Compared with itself

The comparison functions usually consist of successive comparisons of structure/class members. Such code tends to become erroneous when a member of the class ends up being compared with itself. I can specify two subtypes of errors.

In the first case, a programmer forgets to specify the name of the object and writes in the following way:

return m_x == foo.m_x &&
       m_y == m_y &&            // <=
       m_z == foo.m_z;

In the second case, the same name of the object is written:

return zzz.m_x == foo.m_x &&
       zzz.m_y == zzz.m_y &&    // <=
       zzz.m_z == foo.m_z;

Let's take a closer look at practical examples of this pattern. Pay attention that incorrect comparison often occurs in the last block of similar code blocks, which reminds us of the "last line effect" again.

The error is found in the code of Unreal Engine 4 (C++) project:

bool
Compare(const FPooledRenderTargetDesc& rhs, bool bExact) const
{
  ....
  return Extent == rhs.Extent
      && Depth == rhs.Depth
      && bIsArray == rhs.bIsArray
      && ArraySize == rhs.ArraySize
      && NumMips == rhs.NumMips
      && NumSamples == rhs.NumSamples
      && Format == rhs.Format
      && LhsFlags == RhsFlags
      && TargetableFlags == rhs.TargetableFlags
      && bForceSeparateTargetAndShaderResource == rhs.bForceSeparateTargetAndShaderResource
      && ClearValue == rhs.ClearValue
      && AutoWritable == AutoWritable;           // <=
}

PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '==' operator: AutoWritable == AutoWritable rendererinterface.h 180

The code of Samba (C) project:

static int compare_procids(const void *p1, const void *p2)
{
  const struct server_id *i1 = (struct server_id *)p1;
  const struct server_id *i2 = (struct server_id *)p2;

  if (i1->pid < i2->pid) return -1;
  if (i2->pid > i2->pid) return 1;
  return 0;
}

PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '>' operator: i2->pid > i2->pid brlock.c 1901

The code of MongoDB (C++) project:

bool operator==(const MemberCfg& r) const {
  ....
  return _id==r._id && votes == r.votes &&
         h == r.h && priority == r.priority &&
         arbiterOnly == r.arbiterOnly &&
         slaveDelay == r.slaveDelay &&
         hidden == r.hidden &&
         buildIndexes == buildIndexes;        // <=
}

PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '==' operator: buildIndexes == buildIndexes rs_config.h 101

The code of Geant4 Software (C++) project:

inline G4bool G4FermiIntegerPartition::
operator==(const G4FermiIntegerPartition& right)
{
  return (total == right.total &&
          enableNull == enableNull &&          // <=
          partition == right.partition);
}

PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '==' operator: enableNull == enableNull G4hadronic_deex_fermi_breakup g4fermiintegerpartition.icc 58

The code of LibreOffice (C++) project:

class SvgGradientEntry
{
  ....
  bool operator==(const SvgGradientEntry& rCompare) const
  {
    return (getOffset() == rCompare.getOffset()
            && getColor() == getColor()          // <=
            && getOpacity() == getOpacity());    // <=
  }
  ....
}

PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '==' operator: getColor() == getColor() svggradientprimitive2d.hxx 61

The code of Chromium (C++) project:

bool FileIOTest::MatchesResult(const TestStep& a,
                               const TestStep& b) {
  ....
  return (a.data_size == a.data_size &&             // <=
          std::equal(a.data, a.data + a.data_size, b.data));
}

PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '==' operator: a.data_size == a.data_size cdm_file_io_test.cc 367

The code of FreeCAD (C++) project:

bool FaceTypedBSpline::isEqual(const TopoDS_Face &faceOne,
                               const TopoDS_Face &faceTwo) const
{
  ....
  if (surfaceOne->IsURational() !=
      surfaceTwo->IsURational())
    return false;
  if (surfaceTwo->IsVRational() !=         // <=
      surfaceTwo->IsVRational())           // <=
    return false;
  if (surfaceOne->IsUPeriodic() !=
      surfaceTwo->IsUPeriodic())
    return false;
  if (surfaceOne->IsVPeriodic() !=
      surfaceTwo->IsVPeriodic())
    return false;
  if (surfaceOne->IsUClosed() !=
      surfaceTwo->IsUClosed())
    return false;
  if (surfaceOne->IsVClosed() !=
      surfaceTwo->IsVClosed())
    return false;
  if (surfaceOne->UDegree() !=
      surfaceTwo->UDegree())
    return false;
  if (surfaceOne->VDegree() !=
      surfaceTwo->VDegree())
    return false;
  ....
}

PVS-Studio warning: V501 There are identical sub-expressions 'surfaceTwo->IsVRational()' to the left and to the right of the '!=' operator. modelrefine.cpp 780

The code of Serious Engine (C++) project:

class CTexParams {
public:

  inline BOOL IsEqual( CTexParams tp) {
    return tp_iFilter     == tp.tp_iFilter &&
           tp_iAnisotropy == tp_iAnisotropy &&             // <=
           tp_eWrapU      == tp.tp_eWrapU &&
           tp_eWrapV      == tp.tp_eWrapV; };
  ....
};

PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '==' operator: tp_iAnisotropy == tp_iAnisotropy gfx_wrapper.h 180

The code of Qt (C++) project:

inline bool qCompare(QImage const &t1, QImage const &t2, ....)
{
  ....
  if (t1.width() != t2.width() || t2.height() != t2.height()) {
  ....
}

PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '!=' operator: t2.height() != t2.height() qtest_gui.h 101

The code of FreeBSD (C) project:

static int
compare_sh(const void *_a, const void *_b)
{
  const struct ipfw_sopt_handler *a, *b;

  a = (const struct ipfw_sopt_handler *)_a;
  b = (const struct ipfw_sopt_handler *)_b;
  ....
  if ((uintptr_t)a->handler < (uintptr_t)b->handler)
    return (-1);
  else if ((uintptr_t)b->handler > (uintptr_t)b->handler) // <=
    return (1);

  return (0);
}

PVS-Studio warning: V501 There are identical sub-expressions '(uintptr_t) b->handler' to the left and to the right of the '>' operator. ip_fw_sockopt.c 2893

The code of Mono (C#) project:

static bool AreEqual (VisualStyleElement value1,
                      VisualStyleElement value2)
{
  return
    value1.ClassName == value1.ClassName && // <=
    value1.Part == value2.Part &&
    value1.State == value2.State;
}

PVS-Studio warning: V3001 There are identical sub-expressions 'value1.ClassName' to the left and to the right of the '==' operator. ThemeVisualStyles.cs 2141

The code of Mono (C#) project:

public int ExactInference (TypeSpec u, TypeSpec v)
{
  ....
  var ac_u = (ArrayContainer) u;
  var ac_v = (ArrayContainer) v;
  ....
  var ga_u = u.TypeArguments;
  var ga_v = v.TypeArguments;
  ....
  if (u.TypeArguments.Length != u.TypeArguments.Length) // <=
    return 0;

  ....
}

PVS-Studio warning: V3001 There are identical sub-expressions 'u.TypeArguments.Length' to the left and to the right of the '!=' operator. generic.cs 3135

The code of MonoDevelop (C#) project:

Accessibility DeclaredAccessibility { get; }
bool IsStatic { get; }

private bool MembersMatch(ISymbol member1, ISymbol member2)
{
  if (member1.Kind != member2.Kind)
  {
    return false;
  }

  if (member1.DeclaredAccessibility !=          // <=1
      member1.DeclaredAccessibility             // <=1
   || member1.IsStatic != member1.IsStatic)     // <=2
  {
    return false;
  }

  if (member1.ExplicitInterfaceImplementations().Any() ||
      member2.ExplicitInterfaceImplementations().Any())
  {
    return false;
  }

  return SignatureComparer
    .HaveSameSignatureAndConstraintsAndReturnTypeAndAccessors(
       member1, member2, this.IsCaseSensitive);
}

PVS-Studio warning: V3001 There are identical sub-expressions 'member1.IsStatic' to the left and to the right of the '!=' operator. CSharpBinding AbstractImplementInterfaceService.CodeAction.cs 545

The code of Haiku (C++) project:

int __CORTEX_NAMESPACE__ compareTypeAndID(....)
{
  int retValue = 0;
  ....
  if (lJack && rJack)
  {
    if (lJack->m_jackType < lJack->m_jackType)           // <=
    {
      return -1;
    }
    if (lJack->m_jackType == lJack->m_jackType)          // <=
    {
      if (lJack->m_index < rJack->m_index)
      {
        return -1;
      }
      else
      {
        return 1;
      }
    }
    else if (lJack->m_jackType > rJack->m_jackType)
    {
      retValue = 1;
    }
  }
  return retValue;
}

PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '<' operator: lJack->m_jackType < lJack->m_jackType MediaJack.cpp 783

Just below there is exactly the same error. As I understand, in both cases a programmer forgot to replace lJack with rJack.

The code of CryEngine V (C++) project:

bool
CompareRotation(const Quat& q1, const Quat& q2, float epsilon)
{
  return (fabs_tpl(q1.v.x - q2.v.x) <= epsilon)
      && (fabs_tpl(q1.v.y - q2.v.y) <= epsilon)
      && (fabs_tpl(q2.v.z - q2.v.z) <= epsilon)     // <=
      && (fabs_tpl(q1.w - q2.w) <= epsilon);
}

PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '-' operator: q2.v.z - q2.v.z entitynode.cpp 93

Pattern: Evaluating the Size of a Pointer Instead of the Size of the Structure/Class

This type of error occurs in programs written in C and C++ and is caused by incorrect use of the sizeof operator. The error consists in evaluating not the size of the object, but the size of the pointer. Example:

T *a = foo1();
T *b = foo2();
x = memcmp(a, b, sizeof(a));

Instead of the size of the T structure, the size of the pointer gets evaluated. The size of the pointer depends on the data model used, but usually it is 4 or 8. As a result, more or fewer bytes of memory get compared than the structure occupies.

Correct variant of the code:

x = memcmp(a, b, sizeof(T));

or

x = memcmp(a, b, sizeof(*a));
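
To make the difference concrete, here is a minimal, self-contained sketch; the structure T and its fields are purely illustrative, and the exact sizes assume a typical 64-bit data model:

#include <cstdio>
#include <cstring>

// Illustrative structure; any non-trivial type shows the same effect.
struct T {
    long long id;
    double    values[4];
};

int main() {
    T a = {}, b = {};
    T *pa = &a, *pb = &b;

    // On a typical 64-bit data model sizeof(pa) is 8, while sizeof(T) is 40.
    std::printf("sizeof(pa) = %zu, sizeof(T) = %zu\n", sizeof(pa), sizeof(T));

    // Bug: only the first sizeof(pa) bytes of the structures are compared.
    int wrong = std::memcmp(pa, pb, sizeof(pa));

    // Correct: the whole structure is compared.
    int right = std::memcmp(pa, pb, sizeof(*pa));   // or sizeof(T)

    return (wrong == 0 && right == 0) ? 0 : 1;
}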

Now let's move on to the practical part. Here is how such a bug looks in the code of CryEngine V (C++):

bool
operator==(const SComputePipelineStateDescription& other) const
{
  return 0 == memcmp(this, &other, sizeof(this));
}

PVS-Studio warning: V579 The memcmp function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. graphicspipelinestateset.h 58

The code of Unreal Engine 4 project (C++):

bool FRecastQueryFilter::IsEqual(
  const INavigationQueryFilterInterface* Other) const
{
  // @NOTE: not type safe, should be changed when
  // another filter type is introduced
  return FMemory::Memcmp(this, Other, sizeof(this)) == 0;

}

PVS-Studio warning: V579 The Memcmp function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. pimplrecastnavmesh.cpp 172

Pattern: Repetitive Arguments of Cmp(A, A) Type

Comparison functions usually call other comparison functions. At the same time, one of the possible errors is passing a reference/pointer to the same object twice. Example:

x = memcmp(A, A, sizeof(T));

Here the object A is compared with itself, which, of course, makes no sense.

We'll start with an error, found in the debugger GDB (C):

static int
psymbol_compare (const void *addr1, const void *addr2,
                 int length)
{
  struct partial_symbol *sym1 = (struct partial_symbol *) addr1;
  struct partial_symbol *sym2 = (struct partial_symbol *) addr2;

  return (memcmp (&sym1->ginfo.value, &sym1->ginfo.value,    // <=
                  sizeof (sym1->ginfo.value)) == 0
          && sym1->ginfo.language == sym2->ginfo.language
          && PSYMBOL_DOMAIN (sym1) == PSYMBOL_DOMAIN (sym2)
          && PSYMBOL_CLASS (sym1) == PSYMBOL_CLASS (sym2)
          && sym1->ginfo.name == sym2->ginfo.name);
}

PVS-Studio warning: V549 The first argument of 'memcmp' function is equal to the second argument. psymtab.c 1580

The code of CryEngineSDK project (C++):

inline bool operator != (const SEfResTexture &m) const
{
  if (stricmp(m_Name.c_str(), m_Name.c_str()) != 0 ||   // <=
      m_TexFlags != m.m_TexFlags ||
      m_bUTile != m.m_bUTile ||
      m_bVTile != m.m_bVTile ||
      m_Filter != m.m_Filter ||
      m_Ext != m.m_Ext ||
      m_Sampler != m.m_Sampler)
    return true;
  return false;
}

PVS-Studio warning: V549 The first argument of 'stricmp' function is equal to the second argument. ishader.h 2089

The code of PascalABC.NET (C#):

private List<string> enum_consts = new List<string>();
public override bool IsEqual(SymScope ts)
{
  EnumScope es = ts as EnumScope;
  if (es == null) return false;
  if (enum_consts.Count != es.enum_consts.Count) return false;
  for (int i = 0; i < es.enum_consts.Count; i++)
    if (string.Compare(enum_consts[i],
                       this.enum_consts[i], true) != 0)
      return false;
  return true;
}

PVS-Studio warning: V3038 The 'enum_consts[i]' argument was passed to 'Compare' method several times. It is possible that other argument should be passed instead. CodeCompletion SymTable.cs 2206

I'll give some explanation here. The error is in the actual arguments of the Compare function:

string.Compare(enum_consts[i], this.enum_consts[i], true)

The thing is that enum_consts[i] and this.enum_consts[i] are the same thing. As I understand, a correct call should be like this:

string.Compare(es.enum_consts[i], this.enum_consts[i], true)

or

string.Compare(enum_consts[i], es.enum_consts[i], true)

Pattern: Repetitive Checks A==B && A==B

Quite a common error in programming is when the same check is done twice. Example:

return A == B &&
       C == D &&   // <=
       C == D &&   // <=
       E == F;

Two variants are possible in this case. The first is quite harmless: one comparison is redundant and can simply be removed. The second is worse: some other variables were meant to be compared, but the programmer made a typo.

In any case, such code deserves close attention. Let me scare you a little more and show that this error can be found even in the code of the GCC compiler (C):

static bool
dw_val_equal_p (dw_val_node *a, dw_val_node *b)
{
  ....
  case dw_val_class_vms_delta:
    return (!strcmp (a->v.val_vms_delta.lbl1,
                     b->v.val_vms_delta.lbl1)
            && !strcmp (a->v.val_vms_delta.lbl1,
                        b->v.val_vms_delta.lbl1));
  ....
}

PVS-Studio warning: V501 There are identical sub-expressions '!strcmp(a->v.val_vms_delta.lbl1, b->v.val_vms_delta.lbl1)' to the left and to the right of the '&&' operator. dwarf2out.c 1428

The function strcmp is called twice with the same set of arguments.

The code of Unreal Engine 4 project (C++):

FORCEINLINE
bool operator==(const FShapedGlyphEntryKey& Other) const
{
  return FontFace == Other.FontFace
      && GlyphIndex == Other.GlyphIndex   // <=
      && FontSize == Other.FontSize
      && FontScale == Other.FontScale
      && GlyphIndex == Other.GlyphIndex;  // <=
}

PVS-Studio warning: V501 There are identical sub-expressions 'GlyphIndex == Other.GlyphIndex' to the left and to the right of the '&&' operator. fontcache.h 139

The code of Serious Engine project (C++):

inline BOOL CValuesForPrimitive::operator==(....)
{
  return (
 (....) &&
 (vfp_ptPrimitiveType == vfpToCompare.vfp_ptPrimitiveType) &&
 ....
 (vfp_ptPrimitiveType == vfpToCompare.vfp_ptPrimitiveType) &&
 ....
);

PVS-Studio warning: V501 There are identical sub-expressions '(vfp_ptPrimitiveType == vfpToCompare.vfp_ptPrimitiveType)' to the left and to the right of the '&&' operator. worldeditor.h 580

The code of Oracle VM Virtual Box project (C++):

typedef struct SCMDIFFSTATE
{
  ....
  bool  fIgnoreTrailingWhite;
  bool  fIgnoreLeadingWhite;
  ....
} SCMDIFFSTATE;
/* Pointer to a diff state. */

typedef SCMDIFFSTATE *PSCMDIFFSTATE;

/* Compare two lines */
DECLINLINE(bool) scmDiffCompare(PSCMDIFFSTATE pState, ....)
{
  ....
  if (pState->fIgnoreTrailingWhite    // <=
   || pState->fIgnoreTrailingWhite)   // <=
    return scmDiffCompareSlow(....);
  ....
}

PVS-Studio warning: V501 There are identical sub-expressions 'pState->fIgnoreTrailingWhite' to the left and to the right of the '||' operator. scmdiff.cpp 238

Pattern: Incorrect Use of the Value, Returned by memcmp Function

The memcmp function returns the following values of int type:

  • < 0 - buf1 less than buf2;
  • 0 - buf1 identical to buf2;
  • > 0 - buf1 greater than buf2;

Please note that '> 0' can be any number, not only 1: 2, 3, 100, 256, 1024, 5555, 65536 and so on. This means that the result cannot be stored in a variable of type char or short. The high bits can be lost, which might violate the logic of program execution.

Also this means that the result cannot be compared with constants 1 or -1. In other words, it is wrong to write this:

if (memcmp(a, b, sizeof(T)) == 1)
if (memcmp(x, y, sizeof(T)) == -1)

Correct comparisons:

if (memcmp(a, b, sizeof(T)) > 0)
if (memcmp(a, b, sizeof(T)) < 0)

The danger of this code is that it may work successfully for a long time. The errors may start showing up when moving to a new platform or when the compiler version changes.
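
If the result does need to be stored in a narrow type or compared with -1/0/1, one safe option is to normalize the sign first. The helper below is only a sketch of this idea, not code from any of the projects above:

#include <cstddef>
#include <cstring>

// Collapse whatever int memcmp returned into -1, 0 or +1, so the value can be
// stored in a narrow type or compared with -1/0/1 without losing information.
static int normalized_cmp(const void* a, const void* b, std::size_t n) {
    const int r = std::memcmp(a, b, n);
    return (r > 0) - (r < 0);
}

int main() {
    const char x[] = "abc";
    const char y[] = "abd";
    // Guaranteed to be -1 here, no matter what raw value memcmp produced.
    return normalized_cmp(x, y, sizeof(x)) == -1 ? 0 : 1;
}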

The code of ReactOS project (C++):

HRESULT WINAPI CRecycleBin::CompareIDs(....)
{
  ....
  return MAKE_HRESULT(SEVERITY_SUCCESS, 0,
   (unsigned short)memcmp(pidl1->mkid.abID,
                          pidl2->mkid.abID,
                          pidl1->mkid.cb));
}

PVS-Studio warning: V642 Saving the 'memcmp' function result inside the 'unsigned short' type variable is inappropriate. The significant bits could be lost breaking the program's logic. recyclebin.cpp 542

The code of Firebird project (C++):

SSHORT TextType::compare(ULONG len1, const UCHAR* str1,
ULONG len2, const UCHAR* str2)
{
  ....
  SSHORT cmp = memcmp(str1, str2, MIN(len1, len2));

  if (cmp == 0)
    cmp = (len1 < len2 ? -1 : (len1 > len2 ? 1 : 0));
  return cmp;
}

PVS-Studio warning: V642 Saving the 'memcmp' function result inside the 'short' type variable is inappropriate. The significant bits could be lost breaking the program's logic. texttype.cpp 338

The code of CoreCLR project (C++):

bool operator( )(const GUID& _Key1, const GUID& _Key2) const
  { return memcmp(&_Key1, &_Key2, sizeof(GUID)) == -1; }

PVS-Studio warning: V698 Expression 'memcmp(....) == -1' is incorrect. This function can return not only the value '-1', but any negative value. Consider using 'memcmp(....) < 0' instead. sos util.cpp 142

The code of OpenToonz project (C++):

bool TFilePath::operator<(const TFilePath &fp) const
{
  ....
  char differ;
  differ = _wcsicmp(iName.c_str(), jName.c_str());
  if (differ != 0)
    return differ < 0 ? true : false;
  ....
}

PVS-Studio warning: V642 Saving the '_wcsicmp' function result inside the 'char' type variable is inappropriate. The significant bits could be lost, breaking the program's logic. tfilepath.cpp 328

Pattern: Incorrect Check of Null References

This error pattern is typical for C# programs. Sometimes in comparison functions programmers perform a type cast with the help of the as operator. The error is that the programmer inadvertently checks against null not the new reference, but the original one. Let's take a look at a synthetic example:

ChildT foo = obj as ChildT;
if (obj == null)
  return false;
if (foo.zzz()) {}

The check if (obj == null) protects against the situation when the obj variable contains a null reference. However, there is no protection against the case when the as operator returns a null reference. The correct code should be like this:

ChildT foo = obj as ChildT;
if (foo == null)
  return false;
if (foo.zzz()) {}

Typically, this error occurs due to the programmer's negligence. Similar bugs are possible in programs in C and C++, but I haven't found such a case in our error collection.
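
For completeness, here is what the same mistake would look like in C++ with dynamic_cast; the classes are invented solely for the illustration:

#include <iostream>

struct Base    { virtual ~Base() = default; };
struct Derived : Base { int value = 42; };

// Wrong: the check protects only against a null argument, not against a
// failed cast, so 'd' may still be nullptr when it is dereferenced.
bool equalsWrong(const Base* obj) {
    const Derived* d = dynamic_cast<const Derived*>(obj);
    if (obj == nullptr)
        return false;
    return d->value == 42;
}

// Correct: checking the result of the cast also covers obj == nullptr.
bool equalsRight(const Base* obj) {
    const Derived* d = dynamic_cast<const Derived*>(obj);
    if (d == nullptr)
        return false;
    return d->value == 42;
}

int main() {
    Base b;
    Derived d;
    std::cout << equalsRight(&b) << ' ' << equalsRight(&d) << '\n';  // prints: 0 1
    return 0;
}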

The code of MonoDevelop project (C#):

public override bool Equals (object o)
{
  SolutionItemReference sr = o as SolutionItemReference;
  if (o == null)
    return false;
  return (path == sr.path) && (id == sr.id);
}

PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'o', 'sr'. MonoDevelop.Core SolutionItemReference.cs 81

The code of CoreFX (C#):

public override bool Equals(object comparand)
{
  CredentialHostKey comparedCredentialKey =
                                  comparand as CredentialHostKey;

  if (comparand == null)
  {
    // This covers also the compared == null case
    return false;
  }

  bool equals = string.Equals(AuthenticationType,
        comparedCredentialKey.AuthenticationType, ....
  ....
}

PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'comparand', 'comparedCredentialKey'. CredentialCache.cs 4007

The code of Roslyn project (C#):

public override bool Equals(object obj)
{
  var d = obj as DiagnosticDescription;

  if (obj == null)
    return false;

  if (!_code.Equals(d._code))
    return false;
  ....
}

PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'obj', 'd'. DiagnosticDescription.cs 201

The code of Roslyn (C#):

protected override bool AreEqual(object other)
{
  var otherResourceString = other as LocalizableResourceString;
  return
    other != null &&
    _nameOfLocalizableResource ==
      otherResourceString._nameOfLocalizableResource &&
    _resourceManager == otherResourceString._resourceManager &&
    _resourceSource == otherResourceString._resourceSource &&
    ....
}

PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'other', 'otherResourceString'. LocalizableResourceString.cs 121

The code of MSBuild project (C#):

public override bool Equals(object obj)
{
   AssemblyNameExtension name = obj as AssemblyNameExtension;
   if (obj == null)  // <=
   {
     return false;
   }
   ....
}

PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'obj', 'name'. AssemblyRemapping.cs 64

The code of Mono project (C#):

public override bool Equals (object o)
{
  UrlMembershipCondition umc = (o as UrlMembershipCondition);
  if (o == null)                                      // <=
    return false;

  ....

  return (String.Compare (u, 0, umc.Url, ....) == 0); // <=
}

PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'o', 'umc'. UrlMembershipCondition.cs 111

The code of Media Portal 2 project (C#):

public override bool Equals(object obj)
{
  EpisodeInfo other = obj as EpisodeInfo;
  if (obj == null) return false;
  if (TvdbId > 0 && other.TvdbId > 0)
    return TvdbId == other.TvdbId;
  ....
}

PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'obj', 'other'. EpisodeInfo.cs 560

The code of NASA World Wind project (C#):

public int CompareTo(object obj)
{
  RenderableObject robj = obj as RenderableObject;
  if(obj == null)                                 // <=
    return 1;
  return this.m_renderPriority.CompareTo(robj.RenderPriority);
}

PVS-Studio warning: V3019 Possibly an incorrect variable is compared to null after type conversion using 'as' keyword. Check variables 'obj', 'robj'. RenderableObject.cs 199

Pattern: Incorrect Loops

In some functions, collections of items are compared. Of course, different variants of loops are used for the comparison. If a programmer writes the code inattentively, it's easy to mix something up, as it happens with comparison functions. Let's look at a few of these situations.

The code of Trans-Proteomic Pipeline (C++):

bool Peptide::operator==(Peptide& p) {
  ....
  for (i = 0, j = 0;
       i < this->stripped.length(), j < p.stripped.length();
       i++, j++) {
  ....
}

PVS-Studio warning: V521 Such expressions using the ',' operator are dangerous. Make sure the expression is correct. tpplib peptide.cpp 191

Note that the comma operator is used in the condition. The code is clearly incorrect, because the condition written to the left of the comma is ignored. That is, the condition on the left is evaluated, but its result is not used in any way.

The code of Qt project (C++):

bool equals( class1* val1, class2* val2 ) const
{
  ...
  size_t size = val1->size();
  ...
  while ( --size >= 0 ){
    if ( !comp(*itr1,*itr2) )
      return false;
    itr1++;
    itr2++;
  }
  ...
}

PVS-Studio warning: V547 Expression '-- size >= 0' is always true. Unsigned type value is always >= 0. QtCLucene arrays.h 154

The code of CLucene project (C++):

class Arrays
{
  ....
   bool equals( class1* val1, class2* val2 ) const{
     static _comparator comp;
     if ( val1 == val2 )
       return true;
     size_t size = val1->size();
     if ( size != val2->size() )
       return false;
     _itr1 itr1 = val1->begin();
     _itr2 itr2 = val2->begin();
     while ( --size >= 0 ){
       if ( !comp(*itr1,*itr2) )
         return false;
       itr1++;
       itr2++;
     }
   return true;
  }
  ....
}

PVS-Studio warning: V547 Expression '-- size >= 0' is always true. Unsigned type value is always >= 0. arrays.h 154
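
A correct variant of such a countdown with an unsigned counter tests the value before it is decremented, for example (a sketch, not the library's actual fix):

#include <cstddef>
#include <vector>

// With an unsigned counter, 'size-- > 0' iterates exactly 'size' times,
// whereas '--size >= 0' is always true and never terminates the loop.
bool allEqual(const std::vector<int>& v1, const std::vector<int>& v2) {
    if (v1.size() != v2.size())
        return false;
    std::size_t size = v1.size();
    auto it1 = v1.begin();
    auto it2 = v2.begin();
    while (size-- > 0) {
        if (*it1 != *it2)
            return false;
        ++it1;
        ++it2;
    }
    return true;
}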

The code of Mono project (C#):

public override bool Equals (object obj)
{
  ....
  for (int i=0; i < list.Count; i++) {
    bool found = false;
    for (int j=0; i < ps.list.Count; j++) {     // <=
      if (list [i].Equals (ps.list [j])) {
        found = true;
        break;
      }
    }
    if (!found)
      return false;
  }
  return true;
}

PVS-Studio warning: V3015 It is likely that a wrong variable is being compared inside the 'for' operator. Consider reviewing 'i' corlib-net_4_x PermissionSet.cs 607

Apparently, there is a typo here, and the variable j instead of i should be used in the nested loop:

for (int j=0; j < ps.list.Count; j++)

Pattern: A = getA(), B = GetA()

Quite often in the comparison functions a programmer has to write code of this kind:

if (GetA().x == GetB().x && GetA().y == GetB().y)

Intermediate variables are used to reduce the size of the conditions or for optimization:

Type A = GetA();
Type B = GetB();
if (A.x == B.x && A.y == B.y)

But inadvertently, a person sometimes makes a mistake and initializes temporary variables with the same value:

Type A = GetA();
Type B = GetA();

Now let's take a look at these errors in the code of real applications.

The code of LibreOffice project (C++):

bool CmpAttr(
  const SfxPoolItem& rItem1, const SfxPoolItem& rItem2)
{
  ....
  bool bNumOffsetEqual = false;
  ::boost::optional<sal_uInt16> oNumOffset1 =
        static_cast<const SwFmtPageDesc&>(rItem1).GetNumOffset();
  ::boost::optional<sal_uInt16> oNumOffset2 =
        static_cast<const SwFmtPageDesc&>(rItem1).GetNumOffset();

  if (!oNumOffset1 && !oNumOffset2)
  {
    bNumOffsetEqual = true;
  }
  else if (oNumOffset1 && oNumOffset2)
  {
    bNumOffsetEqual = oNumOffset1.get() == oNumOffset2.get();
  }
  else
  {
    bNumOffsetEqual = false;
  }
  ....
}

PVS-Studio warning: V656 Variables 'oNumOffset1', 'oNumOffset2' are initialized through the call to the same function. It's probably an error or un-optimized code. Check lines: 68, 69. findattr.cxx 69

The code of Qt project (C++):

AtomicComparator::ComparisonResult
IntegerComparator::compare(const Item &o1,
                           const AtomicComparator::Operator,
                           const Item &o2) const
{
  const Numeric *const num1 = o1.as<Numeric>();
  const Numeric *const num2 = o1.as<Numeric>();

  if(num1->isSigned() || num2->isSigned())
  ....
}

PVS-Studio warning: V656 Variables 'num1', 'num2' are initialized through the call to the same function. It's probably an error or un-optimized code. Consider inspecting the 'o1.as < Numeric > ()' expression. Check lines: 220, 221. qatomiccomparators.cpp 221

Pattern: Sloppy Copying of the Code

A large number of the errors cited previously can be called consequences of sloppy Copy-Paste. They fell under certain categories of erroneous patterns, and I decided it would be logical to describe them in the corresponding sections. However, there are several errors that have clearly appeared because of sloppy code copying, but which I have no idea how to classify. That's why I collected these errors here.

The code of CoreCLR project (C++):

int __cdecl Compiler::RefCntCmp(const void* op1, const void* op2)
{
  ....
  if (weight1)
  {
    ....
    if (varTypeIsGC(dsc1->TypeGet()))
    {
      weight1 += BB_UNITY_WEIGHT / 2;
    }
    if (dsc1->lvRegister)
    {
      weight1 += BB_UNITY_WEIGHT / 2;
    }
  }

  if (weight1)
  {
    ....
    if (varTypeIsGC(dsc2->TypeGet()))
    {
      weight1 += BB_UNITY_WEIGHT / 2;       // <=
    }
    if (dsc2->lvRegister)
    {
      weight2 += BB_UNITY_WEIGHT / 2;
    }
  }
  ....
}

PVS-Studio warning: V778 Two similar code fragments were found. Perhaps, this is a typo and 'weight2' variable should be used instead of 'weight1'. clrjit lclvars.cpp 2702

The function was long, so it is shortened for the article. If we examine the code of the function, we'll see that a part of the code was copied, but in one fragment the programmer forgot to replace the variable weight1 with weight2.

The code of WPF samples by Microsoft project (C#):

public int Compare(GlyphRun a, GlyphRun b)
{
  ....
  if (aPoint.Y > bPoint.Y)      // <=
  {
    return -1;
  }
  else if (aPoint.Y > bPoint.Y) // <=
  {
    result = 1;
  }
  else if (aPoint.X < bPoint.X)
  {
    result = -1;
  }
  else if (aPoint.X > bPoint.X)
  {
    result = 1;
  }
  ....
}

PVS-Studio warning: V3003 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 418, 422. txtserializerwriter.cs 418

The code of PascalABC.NET project (C#):

public void CompareInternal(....)
{
  ....
  else if (left is int64_const)
    CompareInternal(left as int64_const, right as int64_const);
  ....
  else if (left is int64_const)
    CompareInternal(left as int64_const, right as int64_const);
  ....
}

PVS-Studio warning: V3003 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 597, 631. ParserTools SyntaxTreeComparer.cs 597

The code of SharpDevelop project (C#):

public int Compare(SharpTreeNode x, SharpTreeNode y)
{
  ....
  if (typeNameComparison == 0) {
    if (x.Text.ToString().Length < y.Text.ToString().Length)
      return -1;
    if (x.Text.ToString().Length < y.Text.ToString().Length)
      return 1;
  }
  ....
}

PVS-Studio warning: V3021 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains method return. This means that the second 'if' statement is senseless NamespaceTreeNode.cs 87

The code of Coin3D (C++):

int
SbProfilingData::operator == (const SbProfilingData & rhs) const
{
  if (this->actionType != rhs.actionType) return FALSE;
  if (this->actionStartTime != rhs.actionStopTime) return FALSE;
  if (this->actionStartTime != rhs.actionStopTime) return FALSE;
  ....
}

PVS-Studio warning: V649 There are two 'if' statements with identical conditional expressions. The first 'if' statement contains function return. This means that the second 'if' statement is senseless. Check lines: 1205, 1206. sbprofilingdata.cpp 1206

The code of Spring (C++):

bool operator < (const aiFloatKey& o) const
  {return mTime < o.mTime;}
bool operator > (const aiFloatKey& o) const
  {return mTime < o.mTime;}

PVS-Studio warning: V524 It is odd that the body of '>' function is fully equivalent to the body of '<' function. assimp 3dshelper.h 470

And here is the last, particularly interesting code fragment that PVS-Studio analyzer found in MySQL project (C++).

static int rr_cmp(uchar *a,uchar *b)
{
  if (a[0] != b[0])
    return (int) a[0] - (int) b[0];
  if (a[1] != b[1])
    return (int) a[1] - (int) b[1];
  if (a[2] != b[2])
    return (int) a[2] - (int) b[2];
  if (a[3] != b[3])
    return (int) a[3] - (int) b[3];
  if (a[4] != b[4])
    return (int) a[4] - (int) b[4];
  if (a[5] != b[5])
    return (int) a[1] - (int) b[5]; // <=
  if (a[6] != b[6])
    return (int) a[6] - (int) b[6];
  return (int) a[7] - (int) b[7];
}

PVS-Studio warning: V525 The code containing the collection of similar blocks. Check items '0', '1', '2', '3', '4', '1', '6' in lines 680, 682, 684, 689, 691, 693, 695. sql records.cc 680

Most likely, the programmer wrote the first comparison, then the second, and got bored. So he copied a text block to the clipboard:

if (a[1] != b[1])
  return (int) a[1] - (int) b[1];

And pasted it into the text of the program as many times as he needed. Then he changed the indexes, but made a mistake in one place and got an incorrect comparison:

if (a[5] != b[5])
  return (int) a[1] - (int) b[5];

Note. I discuss this error in more detail in my mini-book "The Ultimate Question of Programming, Refactoring, and Everything" (see a chapter "Don't do the compiler's job").

Pattern: Equals Method Incorrectly Processes a Null Reference

In C# the accepted practice is to implement Equals methods in such a way that they correctly handle the situation when a null reference is passed as an argument. Unfortunately, not all methods are implemented according to this rule.

The code of GitExtensions (C#):

public override bool Equals(object obj)
{
  return GetHashCode() == obj.GetHashCode(); // <=
}

PVS-Studio warning: V3115 Passing 'null' to 'Equals(object obj)' method should not result in 'NullReferenceException'. Git.hub Organization.cs 14

The code of PascalABC.NET project (C#):

public override bool Equals(object obj)
{
  var rhs = obj as ServiceReferenceMapFile;
  return FileName == rhs.FileName;
}

PVS-Studio warning: V3115 Passing 'null' to 'Equals' method should not result in 'NullReferenceException'. ICSharpCode.SharpDevelop ServiceReferenceMapFile.cs 31

Miscellaneous Errors

The code of G3D Content Pak project (C++):

bool Matrix4::operator==(const Matrix4& other) const {
  if (memcmp(this, &other, sizeof(Matrix4) == 0)) {
    return true;
  }
  ...
}

PVS-Studio warning: V575 The 'memcmp' function processes '0' elements. Inspect the 'third' argument. graphics3D matrix4.cpp 269

One closing parenthesis is misplaced. As a result, the number of bytes to compare is evaluated by the expression sizeof(Matrix4) == 0. The size of any class is greater than 0, which means that the result of the expression is 0. Thus, 0 bytes get compared.

Correct variant:

if (memcmp(this, &other, sizeof(Matrix4)) == 0) {

The code of Wolfenstein 3D project (C++):

inline int operator!=( quat_t a, quat_t b )
{
  return ( ( a.x != b.x ) || ( a.y != b.y ) ||
           ( a.z != b.z ) && ( a.w != b.w ) );
}

PVS-Studio warning: V648 Priority of the '&&' operation is higher than that of the '||' operation. math_quaternion.h 167

Apparently, in one fragment the && operator was accidentally written instead of ||.

The code of FlightGear project (C):

static int tokMatch(struct Token* a, struct Token* b)
{
  int i, l = a->strlen;
  if(!a || !b) return 0;
  ....
}

PVS-Studio warning: V595 The 'a' pointer was utilized before it was verified against nullptr. Check lines: 478, 479. codegen.c 478

If we pass NULL as the first argument to the function, we'll get a null pointer dereference, although the programmer wanted the function to return 0.

The code of WinMerge project (C++):

int TimeSizeCompare::CompareFiles(int compMethod,
                                  const DIFFITEM &di)
{
  UINT code = DIFFCODE::SAME;
  ...
  if (di.left.size != di.right.size)
  {
    code &= ~DIFFCODE::SAME;
    code = DIFFCODE::DIFF;
  }
  ...
}

PVS-Studio warning: V519 The 'code' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 79, 80. Merge timesizecompare.cpp 80

The code of ReactOS project (C++):

#define IsEqualGUID(rguid1, rguid2) \
  (!memcmp(&(rguid1), &(rguid2), sizeof(GUID)))

static int ctl2_find_guid(....)
{
  MSFT_GuidEntry *guidentry;
  ...
  if (IsEqualGUID(guidentry, guid)) return offset;
  ...
}

PVS-Studio warning: V512 A call of the 'memcmp' function will lead to underflow of the buffer 'guidentry'. oleaut32 typelib2.c 320

A pointer is passed here as the first argument. As a result, the address of the pointer gets evaluated, which makes no sense.

Correct variant:

if (IsEqualGUID(*guidentry, guid)) return offset;

The code of IronPython and IronRuby project (C#):

public static bool Equals(float x, float y) {
  if (x == y) {
    return !Single.IsNaN(x);
  }
  return x == y;
}

PVS-Studio warning: V3024 An odd precise comparison: x == y. Consider using a comparison with defined precision: Math.Abs(A - B) < Epsilon. FloatOps.cs 1048

It's not clear what the point of the special check against NaN is here. If the condition (x == y) is true, it means that both x and y are different from NaN, because NaN isn't equal to any value, including itself. It seems that the check against NaN is simply unnecessary, and the code can be shortened to:

public static bool Equals(float x, float y) {
  return x == y;
}

The code of Mono project (C#):

public bool Equals (CounterSample other)
{
  return
    rawValue         == other.rawValue         &&
    baseValue        == other.counterFrequency &&   // <=
    counterFrequency == other.counterFrequency &&   // <=
    systemFrequency  == other.systemFrequency  &&
    timeStamp        == other.timeStamp        &&
    timeStamp100nSec == other.timeStamp100nSec &&
    counterTimeStamp == other.counterTimeStamp &&
    counterType      == other.counterType;
}

PVS-Studio warning: V3112 An abnormality within similar comparisons. It is possible that a typo is present inside the expression 'baseValue == other.counterFrequency'. System-net_4_x CounterSample.cs 139

How Do these Programs Work at all?

Looking through all the errors, it seems miraculous that these programs generally work at all. Indeed, the comparison functions perform a very important and responsible task in a program.

There are several explanations of why these programs work despite these errors:

  1. In a lot of functions, only a part of the object is compared incorrectly. The partial comparison is enough for most of the tasks in this program.
  2. There are no situations (yet) in which the function works incorrectly. For example, this applies to the functions that aren't protected from null pointers, or those where the result of the memcmp function call is placed into a variable of char type. The program is simply lucky.
  3. The reviewed comparison function is used very rarely or not used at all.
  4. Who said that the program is working? A lot of programs really do something wrong!

Recommendations

I have demonstrated how many errors can be found in comparison functions. It follows that the correctness of these functions should be checked with unit tests by all means.

It is really necessary to write unit-tests for the comparison operators, for Equals functions and so on.

I am quite sure that before reading this article many programmers believed that unit tests for such functions are extra work and won't detect any errors anyway: comparison functions are just so simple at first glance... Well, now I have shown the horrors that can hide in them.
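
As a minimal illustration of what such a test can look like, the sketch below (the Point type and the chosen properties are only an example) checks the basic properties that any operator< used for ordering must satisfy: irreflexivity, antisymmetry, and consistency with operator==:

#include <cassert>
#include <vector>

// Example type with a hand-written operator< of the kind discussed above.
struct Point {
    int x, y;
    bool operator<(const Point& o) const {
        if (x < o.x) return true;
        if (o.x < x) return false;
        return y < o.y;
    }
    bool operator==(const Point& o) const { return x == o.x && y == o.y; }
};

int main() {
    const std::vector<Point> samples = {
        {0, 0}, {0, 1}, {1, 0}, {1, 1}, {-1, 2}, {2, -1}
    };

    for (const Point& a : samples) {
        assert(!(a < a));                      // irreflexivity
        for (const Point& b : samples) {
            assert(!(a < b && b < a));         // antisymmetry
            if (a == b)
                assert(!(a < b) && !(b < a));  // consistency with ==
        }
    }
    return 0;
}

Even such simple property checks over a handful of sample values would likely have caught many of the copy-paste errors shown above.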

Code reviews and using static analysis tools would also be a great help.

Conclusion

In this article we mentioned a large number of big-name projects that are developed by highly qualified experts. These projects are thoroughly tested using different methodologies. Still, that didn't stop PVS-Studio from finding errors in them. This shows that PVS-Studio can become a nice complement to other methodologies used to improve the quality and reliability of code.

Nightdive turns games of the past into a bright future…virtually


Many game companies open up an office space, get a development team together to work in that office, grind away for a couple of years to create a new intellectual property (IP), then put the product up for sale through retail outlets and digital-distribution sites, such as Steam. Hopefully, profit follows, so they can do it all over again.

Nightdive Studios, on the other hand, took a drastically different path, and its website reveals that core mission: “Bringing lost and forgotten gaming treasures back from the depths…”

By acquiring the rights to already-released games, updating them to work on contemporary platforms, and offering the revamped games through direct-distribution outlets, Nightdive can avoid having to lease office space, and it doesn’t need to employ dozens of local employees to facilitate the work. The development company operates a virtual office environment, which means the people involved in updating and coding the games don’t need to move from their respective countries, or even their homes. All of that contributes to Nightdive’s profits, which the studio uses to, indeed, do it all over again…and again…and again.

A shocking trip

Nightdive was founded in late 2012 by Stephen Kick, now Nightdive’s CEO. Back then, Kick was a character artist with Sony Online Entertainment, but was getting a little tired of making games for others. He decided to embark on a world trip to find new inspiration, and, like many travelers, he brought some games with him — in this case, some classics from his youth.

Stephen Kick
Stephen Kick. Image Credit: Nightdive Studios.

"One night, I was playing — or attempting to play — System Shock 2, and I couldn’t get the game running,” Kick explains. “I went online, attempted to purchase the game (on GOG.com), and I discovered there was no legal way to commercially buy the product. So, I did some digging, and discovered that the IP had been transferred to an insurance company after Looking Glass Studios had gone out of business. I approached [the insurance company] about digitally re-releasing the game on GOG, Steam, and other digital platforms, and that was pretty much the birth of Nightdive Studios."

Kick says the success with the System Shock 2 re-release was the first step for the newborn company, but it quickly led to “finding other games that were lost to time,” and following the same procedure to bring them back to market. As the classic song goes, “Everything old is new again,” and Nightdive is proving that to be quite true with its retro games. The studio has over 100 products in its catalog, available on Steam, GOG, and Humble Bundle’s Humble Store, including The 7th Guest, Shadow Man, Space Rogue, and the Wizardry series.

"Our inspiration really lies in all the games that we grew up with and that we remember fondly," Kick says, "And our desire to replay those games, preserve them for future generations to enjoy, and just to continue, I guess, the stewardship of making sure these games are available for everybody to play again."

Out of the fog

In March 2017, Nightdive brought out its latest release: Turok 2: Seeds of Evil. This first-person shooter debuted in 1998 on the Nintendo 64 console, courtesy of Acclaim Entertainment, and was ported to Windows a year later. Nightdive has already released its Turok 2 update on PC, and is also working on a port to the Xbox One console.

Split-screen multiplayer action in Turok 2
Split-screen multiplayer action in Turok 2. Image Credit: Nightdive Studios

One of the goals of Nightdive's update was for Turok 2 to be playable on almost any PC, enabling players on a wide variety of systems to enjoy a stable game with high visual fidelity.

"It’s interesting…we worked in cooperation with Intel, using their toolsets; Intel provides a variety of different software tools to optimize your game performance," says Larry Kuperman, Nightdive’s director of business development. "One of the things we found with the Intel set, we were able to make sure that [Turok 2] would play on the widest spectrum of computers available, so that if you wanted to fire up Turok on your laptop on the way home, it would play smoothly."

Another change Kuperman points out has to do with the game’s viewing distance. Because of the constraints of the processors in the late ’90s, the original game developers used fog to limit the distance the player could see ahead, which enabled them to provide highly detailed graphics at a relatively short distance. However, nearly 20 years on, with the increase in CPU power and video-card capability, distance-limiting fog is no longer needed.

Larry Kuperman
Larry Kuperman. Image Credit: Nightdive Studios

"We were able to roll back the fog, and give the game a whole new visual treatment,” Kuperman explains. “These are not games that are intended to compete with the highest-end, highest-requirement games out there, but, visually, they’re certainly appealing."

Another Nightdive development team is working on a reboot of System Shock. Nightdive has managed to acquire full rights to the game, so the studio is rebuilding it from the ground up using the Unreal Engine.

"The ultimate goal for us acquiring the license,” Kick says, “is to be able to reintroduce the franchise to the current generation of gamers. That really kicked off around the end of June [2016], when we launched our Kickstarter. We were able to raise 150% of our goal for a total of $1.35 million in order to faithfully reboot the first game in the series."

Their virtual reality

Nightdive’s virtual office environment means that the studio has people all around the world working on projects. As Kick explains, this means that development happens on pretty much a 24/7 basis, with tools (such as GitHub, JIRA, and Slack) enabling collaboration and communication across the team. Software enables managers to track each person’s contribution to make sure everyone is generating what they need to for the project. Kick bemoans some of the tradeoffs — such as the lack of in-office socializing and camaraderie — but Kuperman counters that the distributed office means there are no complaints that a co-worker cracks his knuckles or plays her music too loudly.

Kuperman feels that this is a great time to be in game development, with changes to the creation process enabling end-to-end benefits. With crowdfunding platforms, such as Kickstarter and Fig, it’s easier for a studio to work on a project without needing to make a deal (and share future revenue) with a publisher. Game engines, such as Unity and Unreal, are incredibly powerful, but also free to use until you start selling the product you’ve created. And there are a bunch of digital-sales platforms on which to retail a product, so a developer can self-publish quite easily. Even if the developer opts to work with a publisher to bring a product to market, Kuperman says there are still benefits from those tools.

"A developer can be relatively self-sufficient and come to the publisher, saying 'Look at what I’ve produced so far. Is this something that you’d be interested in?' So you have all those things out there — you have a very robust ecosystem for games development now."

How F5 Networks Profiles for Success


When Seattle-based F5 Networks, Inc. needed to amp up its BIG-IP DNS* solution for developers, it got help from Intel.

Business users expect their applications to be fast, secure, and always available. Anything less is unacceptable. That’s why F5 gives the developers who build those applications the tools they need to deliver maximum speed, security, and availability.

The company’s BIG-IP DNS improves the performance and availability of applications by sending users to the closest or best-performing physical, virtual, or cloud environment. It also hyperscales and secures developers’ domain name service (DNS) infrastructure from distributed denial of service (DDoS) attacks and delivers a real-time domain name system security extensions (DNSSEC) solution that protects against hijacking.

“Intel® VTune™ Amplifier helped us identify potential performance bottlenecks in the design and engineering of our high-performance networking systems,” explained James Hendergart, strategic initiatives director for F5 Networks. “We worked with the Intel VTune Amplifier team for about a month. They were very responsive to our needs, adding the capability to run Intel VTune Amplifier remotely and in headless environments. It was a great collaboration between Intel and F5.”

Get the whole story in our new case study.

Accelerate Deep Learning Inference with Intel® Processor Graphics


Introduction

This paper introduces Intel software tools recently made available to accelerate deep learning inference in edge devices (such as smart cameras, robotics, autonomous vehicles, etc.) incorporating Intel® Processor Graphics solutions across the spectrum of Intel SOCs. In particular, this paper covers Intel’s Deep Learning Deployment Toolkit (available via the Intel® Computer Vision SDK Beta) and how these tools help developers increase the performance, and perhaps even more importantly the performance per watt, of AI inference in their products. The paper also introduces the underlying Compute Library for Deep Neural Networks (clDNN), a library of neural network kernel optimizations written in OpenCL and available in open source.

Target audience: Software developers, platform architects, data scientists, and academics seeking to maximize deep learning performance on Intel® Processor Graphics.

Note: Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) are used interchangeably in this paper. The larger field is artificial intelligence. This article focuses on the machine learning piece of AI, or more specifically on the multi-layered neural network form of machine learning called deep learning.

Background on AI and the Move to the Edge

Artificial Intelligence, or AI, has been a domain of research with fits and starts over the last 60 years. AI activity has increased significantly in the last 5 years with the availability of large data sources, growth in compute engines, and modern algorithm development based on neural networks. Machine learning, and its many-layered form, deep learning, is propelling AI into all parts of modern life as it is applied to varied usages, from computer vision, identification, and classification to natural language processing and forecasting. These base-level tasks help to optimize decision-making in many areas of life.

As data scientist Andrew Ng noted, AI is the next electricity: “Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I don’t think AI will transform in the next several years.”

This wave of AI work began in the cloud, running on servers. While AI usage in the cloud continues to grow quickly, there is a trend to perform AI inference on the edge. This trend toward devices performing machine learning locally, rather than relying solely on the cloud, is driven by the need for lower latency, persistent availability, and lower costs, and by privacy concerns. We are moving toward the day when devices from phones and PCs to cars, robots, and drones to embedded devices like refrigerators and washing machines will all have AI embedded in them. As Andrew Ng pointed out, companies in all industries are figuring out their AI strategy. Additionally, the field of AI is rapidly changing, with novel topologies being introduced on a weekly basis. This requires product developers to design for the flexibility to modify the AI software in their products frequently.

Intel® Processor Graphics as a Solution for AI Inference on the Edge

Intel Processor Graphics (Intel® HD Graphics, Intel® Iris™ Graphics and Intel® Iris™ Pro Graphics) provides a good balance of fixed function acceleration with programmability to deliver good performance/power across the emerging AI workloads with the flexibility to allow customers to adopt the latest AI topologies. Specifically, Intel Processor Graphics provides the characteristics of:

Ubiquity – Intel Processor Graphics, as part of Intel’s SOCs, has already shipped in more than a billion devices ranging from servers to PCs to embedded devices. This makes it a widely available engine for running machine learning algorithms.

Scalability – As AI becomes embedded in every product, the design points of power and performance will vary greatly. Intel Processor Graphics is available in a broad set of power/performance offerings from Intel Atom processors, Intel® Core™ processors, and Intel® Xeon® processors.

Leadership in Media – More than 70% of internet traffic is video. One of the top usages for AI in devices will be computer vision. Along with compute for AI, encoding, decoding, and processing video will be employed concurrently. Intel® Quick Sync Video technology is based on the dedicated media capabilities of Intel Processor Graphics to improve the performance and power efficiency of media applications, specifically speeding up functions like decode, encode, and video processing. See the Intel Quick Sync Video page to learn more. When developers use the Intel® Media SDK or Intel® Media Server Studio, an API provides access to these media capabilities and to hardware-accelerated codecs for Windows* and Linux*.

Powerful and Flexible Instruction Set Architecture (ISA) – The Instruction Set Architecture (ISA) of the Processor Graphics SIMD execution units is well suited to deep learning. The ISA offers rich data type support for 32-bit FP, 16-bit FP, 32-bit integer, and 16-bit integer, with SIMD multiply-accumulate instructions. At theoretical peak, these operations can complete on every clock for every execution unit. Additionally, the ISA offers rich sub-register region addressing to enable efficient cross-lane sharing for optimized convolution implementations, or efficient horizontal scan-reduce operations. Finally, the ISA provides efficient memory block loads to quickly load data tiles for optimized convolution or optimized generalized matrix multiply implementations.

Memory architecture – When using discrete graphics acceleration for deep learning, input and output data have to be transferred from system memory to discrete graphics memory on every execution; this has the double cost of increased latency and power. Intel Processor Graphics is integrated on-die with the CPU. This integration enables the CPU and Processor Graphics to share system memory, the memory controller, and portions of the cache hierarchy. Such a shared memory architecture can enable efficient input/output data transfer and even “zero copy” buffer sharing. Additionally, Intel has SKU offerings with additional package-integrated eDRAM.

Intel’s Deep Learning Deployment Toolkit

To utilize the hardware resources of Intel Processor Graphics easily and effectively, Intel provides the Deep Learning Deployment Toolkit, available via the Intel Computer Vision SDK. This toolkit takes a trained model and tailors it to run optimally for specific endpoint device characteristics. In addition, it delivers a unified API to integrate inference with application logic.

The Deep Learning Deployment Toolkit comprises two main components: the Model Optimizer and the Inference Engine (Figure 1).  

Figure 1: Model flow through the Deep Learning Deployment Toolkit

Model Optimizer is a cross-platform command line tool that performs static model analysis and adjusts deep learning models for optimal execution on end-point target devices. In detail, the Model Optimizer:

  • Takes as input a trained network in a framework specific format (for example from the Caffe* framework)
  • Performs horizontal and vertical fusion of the network layers
  • Prunes unused branches in the network
  • Quantizes weights
  • Produces as output an Internal Representation (IR) of the network - a pair of files that describe the whole model:
    • Topology file - an XML file that describes the network topology
    • Trained data file - a .bin file that contains the weights and biases binary data

The produced IR is used as an input for the Inference Engine.

Inference Engine is a runtime that delivers a unified API to integrate the inference with application logic. Specifically it:

  • Takes as input an IR produced by the Model Optimizer
  • Optimizes inference execution for target hardware
  • Delivers inference solution with reduced footprint on embedded inference platforms.

The Deep Learning Deployment Toolkit can optimize inference for running on different hardware units such as the CPU and GPU, and will support FPGAs in the future. For acceleration on the CPU it uses the MKL-DNN plugin – the deep neural network domain of the Intel® Math Kernel Library, which includes the functions necessary to accelerate the most popular image recognition topologies. It is planned to add FPGA support using a plugin for the Intel® Deep Learning Inference Accelerator. For the GPU, the Deep Learning Deployment Toolkit has clDNN – a library of OpenCL kernels. The next section explains how clDNN helps to improve inference performance.

Compute Library for Deep Neural Networks (clDNN)

clDNN is a library of kernels to accelerate deep learning on Intel® Processor Graphics. Based on OpenCL, these kernels accelerate many of the common function calls in the popular topologies (AlexNet*, VGG*, GoogleNet*, ResNet*, Faster-RCNN*, SqueezeNet* and FCN* are supported today with more being added). To give developers the greatest flexibility and highest achievable performance Intel is delivering:

1) The full library as open source, so developers and customers can use existing kernels as models to build upon, or create their own hardware-specific kernels for running deep learning.

2) Compute extensions to expose the full hardware capabilities to developers.

During network compilation, clDNN breaks the workflow optimizations into three stages, described below.

Figure 2: Model flow from topology creation to execution

Network Compilation and the 3 Stages of clDNN

Stage 1:  Network Level

Fusing is one of the most efficient ways to optimize graphs in DL. In clDNN, we have created two ways to perform fusing – one more automated way for running on a single accelerator (naive inference client), and a second for a more experienced data scientist to tune a network to run across multiple accelerators (set of fused primitives). In more detail:

  • Naive inference client – you have a workload and want it to run on one accelerator. In this case the user can ask clDNN to perform fusing during network compilation.
  • Set of fused primitives – in this approach, a user who is experienced in tuning models does the graph compilation with pattern matching in the application to balance the work across various accelerators. For this approach we expose already-fused primitives.

Currently clDNN supports three fused primitives: convolution with activation, fully connected with activation, and deconvolution with activation. Additional fusions are in development.
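
The benefit of fusing can be seen without any clDNN specifics: a fused convolution-plus-activation applies the activation while the convolution result is still in registers, instead of writing an intermediate buffer that a separate activation pass has to re-read. The plain C++ sketch below (a 1D convolution, not clDNN code) illustrates the difference; out is assumed to hold in.size() - w.size() + 1 elements.

// Plain C++ illustration of why fusing helps (not clDNN code).
// Unfused: conv writes an intermediate buffer, the activation re-reads it.
// Fused:   the activation is applied while the conv result is still in registers.
#include <vector>
#include <algorithm>

void conv_then_relu(const std::vector<float>& in, const std::vector<float>& w,
                    std::vector<float>& out) {               // out.size() == in.size() - w.size() + 1
    std::vector<float> tmp(out.size());                      // extra memory traffic
    for (size_t i = 0; i + w.size() <= in.size(); ++i) {
        float acc = 0.f;
        for (size_t k = 0; k < w.size(); ++k) acc += in[i + k] * w[k];
        tmp[i] = acc;                                         // write intermediate result
    }
    for (size_t i = 0; i < out.size(); ++i)                   // second pass over memory
        out[i] = std::max(tmp[i], 0.f);
}

void conv_relu_fused(const std::vector<float>& in, const std::vector<float>& w,
                     std::vector<float>& out) {
    for (size_t i = 0; i + w.size() <= in.size(); ++i) {
        float acc = 0.f;
        for (size_t k = 0; k < w.size(); ++k) acc += in[i + k] * w[k];
        out[i] = std::max(acc, 0.f);                          // activation fused into the same pass
    }
}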

Another part of the network-level optimizations is the padding implementation. Choosing OpenCL buffers as data storage requires handling padding either by adding conditions inside the kernels or by providing a buffer with a frame around the input data. The first approach would consume the full register budget, constraining the registers available to the convolution kernels and negatively impacting performance.

Experiments have shown that adding a properly aligned frame around the buffers gives better performance, when it is done as follows:

Consider a network with two primitives A and B, where B requires padding equal to 2:

Figure 3: Padding Example

This requires adding a frame with size 2x2:

To add the frame, we insert a reorder primitive:

and fuse it with the A primitive.
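
In plain C++ terms the trade-off looks like the sketch below (illustrative, not clDNN code): either every read inside the kernel is guarded by a bounds check, or a reorder step copies the input once into a larger zero-filled buffer so the convolution loop can index unconditionally.

// Plain C++ illustration of the padding trade-off (not clDNN code).
#include <vector>

// Option 1: conditions inside the kernel - every read is guarded.
float read_guarded(const std::vector<float>& in, int width, int height, int x, int y) {
    if (x < 0 || y < 0 || x >= width || y >= height) return 0.f;   // branch per access
    return in[y * width + x];
}

// Option 2: "reorder" the input once into a buffer with a frame of `pad`
// zeros around it; the convolution loop can then index without any checks.
std::vector<float> add_frame(const std::vector<float>& in, int width, int height, int pad) {
    const int pw = width + 2 * pad, ph = height + 2 * pad;
    std::vector<float> padded(static_cast<size_t>(pw) * ph, 0.f);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            padded[(y + pad) * pw + (x + pad)] = in[y * width + x];
    return padded;   // in clDNN this copy is fused into the producing primitive
}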

Stage 2: Memory Level

As soon as the topology is defined and data is provided, the network is ready to compile. The first step of network compilation is the determination of the activation layout. In DNNs, the data stored in hidden layers is defined as 4D memory chunks. In clDNN, the layout description is defined with four letters:

  • B - number of images in the batch
  • F - number of feature maps or channels
  • X - spatial or width
  • Y - spatial or height

Figure 4: Example of a memory chunk

Figure 5: For most cases the optimal layout is BFYX

If the data type is half precision (fp16), the batch size is greater than or equal to 32, and the convolutions use the split parameter (a depth split as in AlexNet* convolutions), then the clDNN layout is YXFB.

Figure 6: YXFB layout
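
The layout names describe how a 4D activation index (b, f, y, x) is linearized in memory. A small illustrative C++ helper makes the difference concrete: in bfyx the spatial x coordinate varies fastest, while in yxfb the batch index varies fastest, which is what makes it attractive for fp16 at batch sizes of 32 and above.

// Index arithmetic behind the BFYX and YXFB activation layouts (illustrative).
#include <cstddef>

// bfyx: x is the fastest-changing dimension, batch the slowest.
inline size_t index_bfyx(size_t b, size_t f, size_t y, size_t x,
                         size_t F, size_t Y, size_t X) {
    return ((b * F + f) * Y + y) * X + x;
}

// yxfb: batch is the fastest-changing dimension, so at larger batch sizes
// neighboring work items touch consecutive images of the same feature map.
inline size_t index_yxfb(size_t b, size_t f, size_t y, size_t x,
                         size_t F, size_t X, size_t B) {
    return ((y * X + x) * F + f) * B + b;
}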

During memory-level optimization, after kernels for every primitive have been chosen, clDNN runs weight optimizations, which transform user-provided weights into a form suitable for the chosen kernel. Weights for convolutions are stored in:

Figure 7: Weights for convolutions in IS_IYX_OSV16

 

For fully connected primitives, depending on the data type (fp16/fp32), weights can be transformed into one of the following:

Figure 8: memory layouts for optimized fully connected primitives

Stage 3: Kernel Level

To enable modern topologies in an efficient way on Intel® Processor Graphics, a focus on the convolution implementation is needed. To do this, clDNN uses output blocks, which enable each thread on Intel Processor Graphics to compute more than one output at a time. The size of the block depends on the convolution stride size. If the block size is greater than the stride, then clDNN uses shuffle technology to reuse weights and inputs within the neighborhood. This approach yields 85% of peak performance on AlexNet* convolution kernels. All reads and writes use the more efficient block_read/block_write functions. A similar approach is applied to achieve high efficiency in the deconvolution and pooling primitives.
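
As a simplified illustration of the output-blocking idea (the real kernels are OpenCL and additionally use subgroup shuffles and block reads), the C++ sketch below has each "work item" produce a small tile of outputs, so every weight it loads is reused across the whole tile.

// Simplified illustration of output blocking for a 1D convolution (not the real kernel).
// Assumes in.size() >= out.size() + w.size() - 1.
#include <vector>

constexpr int BLOCK = 4;   // outputs computed per "work item"

void conv_blocked(const std::vector<float>& in, const std::vector<float>& w,
                  std::vector<float>& out) {
    const int K = static_cast<int>(w.size());
    const int N = static_cast<int>(out.size());
    for (int base = 0; base < N; base += BLOCK) {        // one iteration ~ one work item
        float acc[BLOCK] = {};
        for (int k = 0; k < K; ++k) {
            const float wk = w[k];                        // each weight loaded once...
            for (int j = 0; j < BLOCK && base + j < N; ++j)
                acc[j] += wk * in[base + j + k];          // ...and reused for BLOCK outputs
        }
        for (int j = 0; j < BLOCK && base + j < N; ++j)
            out[base + j] = acc[j];
    }
}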

Performance Numbers

The Intel Iris Pro Graphics provides more peak performance and the Intel HD Graphics provides more performance/watt.

Details:

Batch 1, FP16

Intel® HD Graphics 530 (blue) configuration: Intel® Core™ i5-6500 CPU @ 3.20GHz, Intel® HD Graphics 530, fixed frequency - 1000 MHz, CentOS 7.2 kernel 4.2, OpenCL driver: Intel SRB 4.1, Memory: 2x8GB DDR4 2133

Intel® Iris® Pro Graphics 580 (orange) configuration: Intel® Core™ i7-6770HQ CPU @ 2.60GHz, Intel® Iris® Pro Graphics 580, fixed frequency - 950 MHz, CentOS 7.2 kernel 4.2, OpenCL driver: Intel SRB 4.1, Memory: 2x4GB DDR4 2133

Topologies: AlexNet*, VGG16-FACE*

Memory Bandwidth vs Compute

In topologies with memory-bound sequences (like AlexNet*), we can increase the batch size, reusing weights across multiple batches to gain higher images/second performance. But for topologies that are compute bound (like VGG16-FACE*), even with a single image on the input, we see little benefit from larger batch sizes.

The systems used for these measurements are configured in the same way as in the previous pair of benchmarks.
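
A rough back-of-the-envelope calculation shows why batching helps the memory-bound case. For a fully connected layer, the weights are read roughly once per batch (assuming they are streamed once), so raising the batch size raises the compute done per byte of weight traffic; this helps only while the layer is memory bound. The sketch below uses made-up layer sizes purely for illustration.

// Back-of-the-envelope arithmetic intensity for a fully connected layer
// (sizes are illustrative, not measured from any particular network).
#include <cstdio>
#include <initializer_list>

int main() {
    const double in_features  = 9216;     // hypothetical FC layer
    const double out_features = 4096;
    const double weight_bytes = in_features * out_features * 2.0;  // fp16 weights

    for (int batch : {1, 8, 32}) {
        const double flops = 2.0 * in_features * out_features * batch;  // MAC = 2 flops
        // Weights are read once per batch; activations are comparatively small.
        const double flops_per_byte = flops / weight_bytes;
        std::printf("batch %2d: ~%.1f flops per weight byte\n", batch, flops_per_byte);
    }
    return 0;
}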

Power Efficiency

In some power-constrained workloads, it can be more important to maximize performance per watt than absolute performance. Since dynamic power scales linearly with frequency but quadratically with voltage, lowering the frequency (and with it the voltage) improves GPU performance per watt roughly linearly. Intel HD Graphics can show a better FPS/watt ratio when running at lower frequency in lower power states. Different Intel processor products also offer different leakage and power behavior. For example, the 6th and 7th generation Intel “Y skus”, such as the Intel® Core™ m7-6Y75 processor with Intel® HD Graphics 515, provide lower peak performance but more performance per watt. Through the combination of selecting the right Intel SoC across a wide range of power and performance points and choosing the appropriate frequency, the developer can tune to a broad range of workloads and power envelopes.

Conclusion

AI is becoming pervasive, driven by the huge advancements in machine learning and particularly deep learning over the last few years. All devices on the edge are moving toward implementing some form of AI, increasingly performed locally due to cost, latency and privacy concerns. Intel Processor Graphics provides a good solution to accelerate deep learning workloads. This paper described the Model Optimizer, the Inference Engine, and the clDNN library of optimized CNN kernels that are available to help developers deliver AI-enabled products to market.

For more information or to get started, download the tools or libraries from the links below:  

Appendix A: List of Primitives in the clDNN Library

Compute Library for Deep Neural Networks (clDNN) is middleware for accelerating DNN inference on Intel® HD and Iris™ Pro Graphics. This project includes CNN primitive implementations for Intel GPUs with C and C++ interfaces.

The clDNN library implements the following set of primitives:

  • Compute Primitives
    • Convolution
    • Deconvolution
    • Fully connected (inner product)
    • Element-Wise
  • Pooling
    • average
    • maximum
    • ROI pooling
  • Normalization
    • LRN across/within channel
    • Normalize
    • Batch-Normalization
  • Activation
    • rectified linear unit (ReLU)
  • Auxiliary
    • Crop
    • Concatenation
    • Simpler NMS
    • Prior box
    • Detection output
    • Reorder
  • Softmax

With this primitive set, the user can build and execute the most common image recognition, semantic segmentation, and object detection network topologies, such as:

  • AlexNet*
  • GoogleNet*
  • ResNet*
  • VGG16-FACE*
  • Faster-RCNN*
  • FCN*

Tall-and-Skinny and Short-and-Wide Optimizations for QR and LQ Decompositions


Intel® Math Kernel Library (Intel® MKL) 2017 update 3 and later versions provide optimized functionality for calculating QR decompositions of tall-and-skinny (TS) matrices, and for calculating LQ decompositions of short-and-wide (SW) matrices.

New routines have been added to Intel MKL to allow for the calculations of QR and LQ factorizations using the TS/SW modifications described above for appropriate matrix sizes. These routines are generalized for all sizes (i.e. they will also work on matrices that are not TS/SW, as they include paths to return to the generic routines when the matrix size is not sufficiently TS/SW). Details of the new routines and parameter specifications can be found in the Intel MKL Developer Reference (https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation). The routines to reference are listed below:

QR Decomposition

  • New TS/SW routines: ?geqr, ?gemqr
  • Generic routines: ?geqrf, ?ormqr (real), ?unmqr (complex)

LQ Decomposition

  • New TS/SW routines: ?gelq, ?gemlq
  • Generic routines: ?gelqf, ?ormlq (real), ?unmlq (complex)

 

A general overview of the TSQR algorithm is provided in the attached TSKB_QRLQ.pdf file. In addition, this PDF provides example code that calls the QR decomposition of a matrix using the new TSQR routines.
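
For a quick orientation before opening the PDF, a minimal C++ sketch of calling the new routines through the LAPACKE interface is shown below. It assumes the tsize = -1 workspace-query convention documented for ?geqr; consult the Intel MKL Developer Reference for the authoritative parameter descriptions.

// Minimal sketch: QR of a tall-and-skinny matrix with the new ?geqr/?gemqr routines.
// Link against Intel MKL; error handling is omitted for brevity.
#include <mkl.h>
#include <vector>

int main() {
    const lapack_int m = 100000, n = 16;                       // tall-and-skinny
    std::vector<double> A(static_cast<size_t>(m) * n, 1.0);    // column-major input

    // Workspace query: with tsize = -1, the required size of T is returned in tquery[0].
    double tquery[5];
    LAPACKE_dgeqr(LAPACK_COL_MAJOR, m, n, A.data(), m, tquery, -1);
    lapack_int tsize = static_cast<lapack_int>(tquery[0]);
    std::vector<double> T(tsize);

    // Factor A = Q * R; R is stored in the upper triangle of A, Q in compact form in A and T.
    LAPACKE_dgeqr(LAPACK_COL_MAJOR, m, n, A.data(), m, T.data(), tsize);

    // Apply Q to another m x n matrix C without forming Q explicitly.
    std::vector<double> C(static_cast<size_t>(m) * n, 1.0);
    LAPACKE_dgemqr(LAPACK_COL_MAJOR, 'L', 'N', m, n, n,
                   A.data(), m, T.data(), tsize, C.data(), m);
    return 0;
}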

The following charts show the speedup of DGEQR compared to DGEQRF. Performance results of ?GELQ compared to ?GELQF show a similar speedup and thus are not displayed here.

The first chart shows these speedups on an Intel® Xeon® CPU E5-2699 v4 processor, and the second on an Intel® Xeon Phi™ 7250 processor.

 

Tutorial: Unlock Intel® GPU capabilities with Intel OpenCL™ Extensions


Download tutorial code here.

Based on an IWOCL 2017 tutorial Unlock Intel GPUs for High Performance Compute, Media and Computer Vision.  

 

Introduction

Intel provides many extensions to the Khronos OpenCL™ standard to help you utilize the full range of hardware capabilities.  

  • Subgroups
  • Video Motion Estimation (VME)
  • VEBox

These extensions are not standalone.  They build upon each other.

 

The tutorial code focuses on subgroups, VME, and VEBox.  Image processing and sharing extensions are also used in the tutorial code as solution components.

For more information on Intel extensions: https://software.intel.com/en-us/articles/opencl-intel-graphics-extensions

 

Subgroups

Intel subgroups:

  • are a subset of a work group
  • are equal in size to the SIMD width (8, 16, or 32)
  • run in the same hardware thread of the EU
  • share thread resources (including register space)
  • execute together

Intel subgroup functions add

  • barrier, broadcast, reduce, scan 
  • shuffle
  • block read/write

More info: Spec
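
To make this concrete, the snippet below holds an illustrative OpenCL C kernel in a C++ string literal; the host code that builds and launches it is omitted. It is a sketch that assumes the cl_intel_subgroups built-ins behave as documented in the extension spec, and it also assumes the work-group size equals the subgroup size.

// Illustrative kernel source using cl_intel_subgroups built-ins (sketch only).
// It sums the values owned by one subgroup using a block read and shuffles,
// with no local memory or work-group barriers.
static const char* kSubgroupSumKernel = R"CLC(
#pragma OPENCL EXTENSION cl_intel_subgroups : enable

__kernel void subgroup_sum(__global const uint* in, __global uint* out)
{
    // One coalesced block read serves the whole subgroup
    // (assumes work-group size == subgroup size).
    uint v = intel_sub_group_block_read(in + get_group_id(0) * get_max_sub_group_size());

    // Tree reduction via shuffles between lanes of the same hardware thread.
    for (uint offset = get_max_sub_group_size() / 2u; offset > 0u; offset /= 2u)
        v += intel_sub_group_shuffle_down(v, 0u, offset);

    if (get_sub_group_local_id() == 0u)
        out[get_group_id(0)] = v;
}
)CLC";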

 

Video Motion Estimation (VME)

Intel Gen GPUs accelerate the search for motion in video.  This is a core codec component but can also be used in a wide range of applications from custom bitrate control to computer vision.

 

VEBox

Intel GPUs contain a specialized IP block designed for video enhancement operations.

 

For more info:

 

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos

 

2017 Intel® Level Up Contest Closed


Intel® Level Up Contest

Thank you for your interest in the 2017 Intel® Level Up Game Dev Contest. The contest closed on May 9, 2017.

 


Modern VR is ‘like the dog who catches a car but doesn’t know what to do with it’


Kim Pallister, VR expert at Intel, and sci-fi author Austin Grossman at GamesBeat Summit 2017

The modern virtual reality market is new, but the idea of virtual worlds has existed in fiction for decades.

Austin Grossman is a sci-fi author who has written books such as the techno-thriller You: A Novel. He also helped write games like the PC classics Deus Ex and System Shock, as well as the more recent Dishonored series. At the GamesBeat Summit today in Berkeley, California, Grossman discussed in an on-stage interview with Kim Pallister, VR expert at Intel, how sci-fi stories from our past can tell us about the future of virtual reality — and how we’re struggling to deal with it.

Grossman brought up novels like Snow Crash and Ready Player One, which both featured VR social spaces. These ideas used to be science fiction, but modern virtual reality devices and online games are making them closer to reality. But Grossman says that we’re like the dog who catches a car but doesn’t know what to do with it.

In novels, Grossman said, VR isn’t just an entertainment experience that people use for 10 minutes at a time. It is an integral part of society that people use for work as much as play. It’s also a tool used to escape from dystopian nightmares. In Ready Player One, many people are living in ghettos of skyscrapers made of trailer park homes. Its protagonist spends as much time in virtual reality as possible, using it to have access to things he doesn’t have in the real world: friends, education, and adventures.

This could present a danger to our VR future. What if people use the coming virtual worlds to escape the real one? Could we potentially forsake the planet and our ties to it in favor of a more palpable digital illusion?

So, the future of VR presented in fiction could be an unsettling one. But fiction hasn’t gotten everything right. In The Matrix, people need to be in pods or other constrictive devices to be connected to virtual worlds (and that’s besides the fact that most humans were imprisoned and having their energy sucked out by evil robots). But we aren’t using neural interfaces.

“It’s a wonderful thing that we got wrong,” Grossman said. Actual VR has players moving around. He says that this makes VR more exciting and less of a terrifying dystopia.

Grossman noted that world-building is the key skill needed for making enjoyable VR experiences. Making a world for a novel takes him two or three years of planning. But for modern virtual reality games, more work goes into designing and programming the experience. Less attention is given to narrative, characters, and history. These are the things that make people fall in love with a fictional world and want to live in it.

Licensing IP is kind of a cheat, Grossman says. It gives you an immediate world that audiences love. VR designers need to make new worlds of their own. The recent Star Trek: Bridge Crew is a good example of this. Beyond the gameplay, people enjoy the game just because it lets them be in Star Trek.

Virtual reality has the potential to change people and how they relate to each other, forcing us to interact with others in unique ways. Grossman noted that he also looks forward to having VR teach him. He anticipates full-body tracking, since a VR program could then teach him how to dance. That certainly sounds more pleasant than having machine overlords plugging us into a placating VR world while they suck energy from our imprisoned bodies.
