
Developer Success Stories Library


Intel® Parallel Studio XE | Intel® System Studio | Intel® Media Server Studio

Intel® Advisor | Intel® Computer Vision SDK | Intel® Data Analytics Acceleration Library

Intel® Distribution for Python* | Intel® Inspector XE | Intel® Integrated Performance Primitives

Intel® Math Kernel Library | Intel® Media SDK | Intel® MPI Library | Intel® Threading Building Blocks

Intel® VTune™ Amplifier

 


Intel® Parallel Studio XE


Altair Creates a New Standard in Virtual Crash Testing

Altair advances frontal crash simulation with help from Intel® Software Development products.


CADEX Resolves the Challenges of CAD Format Conversion

Parallelism Brings CAD Exchanger* software dramatic gains in performance and user satisfaction, plus a competitive advantage.


Envivio Helps Ensure the Best Video Quality and Performance

Intel® Parallel Studio XE helps Envivio create safe and secured code.


ESI Group Designs Quiet Products Faster

ESI Group achieves up to 450 percent faster performance on quad-core processors with help from Intel® Parallel Studio.


F5 Networks Profiles for Success

F5 Networks amps up its BIG-IP DNS* solution for developers with help from
Intel® Parallel Studio and Intel® VTune™ Amplifier.


Fixstars Uses Intel® Parallel Studio XE for High-speed Renderer

As a developer of services that use multi-core processors, Fixstars has selected Intel® Parallel Studio XE as the development platform for its lucille* high-speed renderer.


Golaem Drives Virtual Population Growth

Crowd simulation is one of the most challenging tasks in computer animation―made easier with Intel® Parallel Studio XE.


Lab7 Systems Helps Manage an Ocean of Information

Lab7 Systems optimizes BioBuilds™ tools for superior performance using Intel® Parallel Studio XE and Intel® C++ Compiler.


Massachusetts General Hospital Achieves 20X Faster Colonoscopy Screening

Intel® Parallel Studio helps optimize key image processing libraries, reducing compute-intensive colon screening processing time from 60 minutes to 3 minutes.


Moscow Institute of Physics and Technology Rockets the Development of Hypersonic Vehicles

Moscow Institute of Physics and Technology creates faster and more accurate computational fluid dynamics software with help from Intel® Math Kernel Library and Intel® C++ Compiler.


NERSC Optimizes Application Performance with Roofline Analysis

NERSC boosts the performance of its scientific applications on Intel® Xeon Phi™ processors up to 35% using Intel® Advisor.


Nik Software Increases Rendering Speed of HDR by 1.3x

By optimizing its software for Advanced Vector Extensions (AVX), Nik Software used Intel® Parallel Studio XE to identify hotspots 10x faster and enabled end users to render high dynamic range (HDR) imagery 1.3x faster.


Novosibirsk State University Gets More Efficient Numerical Simulation

Novosibirsk State University boosts a simulation tool’s performance by 3X with Intel® Parallel Studio, Intel® Advisor, and Intel® Trace Analyzer and Collector.


Pexip Speeds Enterprise-Grade Videoconferencing

Intel® analysis tools enable a 2.5x improvement in video encoding performance for videoconferencing technology company Pexip.


Schlumberger Parallelizes Oil and Gas Software

Schlumberger increases performance for its PIPESIM* software by up to 10 times while streamlining the development process.


Ural Federal University Boosts High-Performance Computing Education and Research

Intel® Developer Tools and online courseware enrich the high-performance computing curriculum at Ural Federal University.


Walker Molecular Dynamics Laboratory Optimizes for Advanced HPC Computer Architectures

Intel® Software Development tools increase application performance and productivity for a San Diego-based supercomputer center.


Intel® System Studio


CID Wireless Shanghai Boosts Long-Term Evolution (LTE) Application Performance

CID Wireless boosts performance for its LTE reference design code by 6x compared to the plain C code implementation.


GeoVision Gets a 24x Deep Learning Algorithm Performance Boost

GeoVision turbo-charges its deep learning facial recognition solution using Intel® System Studio and Intel® Computer Vision SDK.


NERSC Optimizes Application Performance with Roofline Analysis

NERSC boosts the performance of its scientific applications on Intel® Xeon Phi™ processors up to 35% using Intel® Advisor.


Daresbury Laboratory Speeds Computational Chemistry Software 

Scientists get a speedup to their computational chemistry algorithm from Intel® Advisor’s vectorization advisor.


Novosibirsk State University Gets More Efficient Numerical Simulation

Novosibirsk State University boosts a simulation tool’s performance by 3X with Intel® Parallel Studio, Intel® Advisor, and Intel® Trace Analyzer and Collector.


Pexip Speeds Enterprise-Grade Videoconferencing

Intel® analysis tools enable a 2.5x improvement in video encoding performance for videoconferencing technology company Pexip.


Schlumberger Parallelizes Oil and Gas Software

Schlumberger increases performance for its PIPESIM* software by up to 10 times while streamlining the development process.


Intel® Computer Vision SDK


GeoVision Gets a 24x Deep Learning Algorithm Performance Boost

GeoVision turbo-charges its deep learning facial recognition solution using Intel® System Studio and Intel® Computer Vision SDK.


Intel® Data Analytics Acceleration Library


MeritData Speeds Up a Big Data Platform

MeritData Inc. improves performance—and the potential for big data algorithms and visualization.


Intel® Distribution for Python*


DATADVANCE Gets Optimal Design with 5x Performance Boost

DATADVANCE discovers that Intel® Distribution for Python* outpaces standard Python.
 


Intel® Inspector XE


CADEX Resolves the Challenges of CAD Format Conversion

Parallelism Brings CAD Exchanger* software dramatic gains in performance and user satisfaction, plus a competitive advantage.


Envivio Helps Ensure the Best Video Quality and Performance

Intel® Parallel Studio XE helps Envivio create safe and secured code.


ESI Group Designs Quiet Products Faster

ESI Group achieves up to 450 percent faster performance on quad-core processors with help from Intel® Parallel Studio.


Fixstars Uses Intel® Parallel Studio XE for High-speed Renderer

As a developer of services that use multi-core processors, Fixstars has selected Intel® Parallel Studio XE as the development platform for its lucille* high-speed renderer.


Golaem Drives Virtual Population Growth

Crowd simulation is one of the most challenging tasks in computer animation―made easier with Intel® Parallel Studio XE.


Schlumberger Parallelizes Oil and Gas Software

Schlumberger increases performance for its PIPESIM* software by up to 10 times while streamlining the development process.


Intel® Integrated Performance Primitives


JD.com Optimizes Image Processing

JD.com Speeds Image Processing 17x, handling 300,000 images in 162 seconds instead of 2,800 seconds, with Intel® C++ Compiler and Intel® Integrated Performance Primitives.


Tencent Optimizes an Illegal Image Filtering System

Tencent doubles the speed of its illegal image filtering system using SIMD Instruction Set and Intel® Integrated Performance Primitives.


Tencent Speeds MD5 Image Identification by 2x

Intel worked with Tencent engineers to optimize the way the company processes millions of images each day, using Intel® Integrated Performance Primitives to achieve a 2x performance improvement.


Walker Molecular Dynamics Laboratory Optimizes for Advanced HPC Computer Architectures

Intel® Software Development tools increase application performance and productivity for a San Diego-based supercomputer center.


Intel® Math Kernel Library


GeoVision Gets a 24x Deep Learning Algorithm Performance Boost

GeoVision turbo-charges its deep learning facial recognition solution using Intel® System Studio and Intel® Computer Vision SDK.

 


MeritData Speeds Up a Big Data Platform

MeritData Inc. improves performance―and the potential for big data algorithms and visualization.


Qihoo360 Technology Co. Ltd. Optimizes Speech Recognition

Qihoo360 optimizes the speech recognition module of the Euler platform using Intel® Math Kernel Library (Intel® MKL), speeding up performance by 5x.


Intel® Media SDK


NetUP Gets Blazing Fast Media Transcoding

NetUP uses Intel® Media SDK to help bring the Rio Olympic Games to a worldwide audience of millions.


Intel® Media Server Studio


ActiveVideo Enhances Efficiency

ActiveVideo boosts the scalability and efficiency of its cloud-based virtual set-top box solutions for TV guides, online video, and interactive TV advertising using Intel® Media Server Studio.


Kraftway: Video Analytics at the Edge of the Network

Today’s sensing, processing, storage, and connectivity technologies enable the next step in distributed video analytics, where each camera itself is a server. Kraftway* video software platforms can encode up to three 1080p60 streams at different bit rates with close to zero CPU load.


Slomo.tv Delivers Game-Changing Video

Slomo.tv's new video replay solutions, built with the latest Intel® technologies, can help resolve challenging game calls.


SoftLab-NSK Builds a Universal, Ultra HD Broadcast Solution

SoftLab-NSK combines the functionality of a 4K HEVC video encoder and a playout server in one box using technologies from Intel.


Vantrix Delivers on Media Transcoding Performance

HP Moonshot* with HP ProLiant* m710p server cartridges and Vantrix Media Platform software, with help from Intel® Media Server Studio, provide a cost-effective solution that delivers more streams per rack unit while consuming less power and space.


Intel® MPI Library


Moscow Institute of Physics and Technology Rockets the Development of Hypersonic Vehicles

Moscow Institute of Physics and Technology creates faster and more accurate computational fluid dynamics software with help from Intel® Math Kernel Library and Intel® C++ Compiler.


Walker Molecular Dynamics Laboratory Optimizes for Advanced HPC Computer Architectures

Intel® Software Development tools increase application performance and productivity for a San Diego-based supercomputer center.


Intel® Threading Building Blocks


CADEX Resolves the Challenges of CAD Format Conversion

Parallelism Brings CAD Exchanger* software dramatic gains in performance and user satisfaction, plus a competitive advantage.


Johns Hopkins University Prepares for a Many-Core Future

Johns Hopkins University increases the performance of its open-source Bowtie 2* application by adding multi-core parallelism.


Pexip Speeds Enterprise-Grade Videoconferencing

Intel® analysis tools enable a 2.5x improvement in video encoding performance for videoconferencing technology company Pexip.


Quasardb Streamlines Development for a Real-Time Analytics Database

To deliver first-class performance for its distributed, transactional database, Quasardb uses Intel® Threading Building Blocks (Intel® TBB), Intel’s C++ threading library for creating high-performance, scalable parallel applications.


University of Bristol Accelerates Rational Drug Design

Using Intel® Threading Building Blocks, the University of Bristol helps slash calculation time for drug development—enabling a calculation that once took 25 days to complete to run in just one day.


Walker Molecular Dynamics Laboratory Optimizes for Advanced HPC Computer Architectures

Intel® Software Development tools increase application performance and productivity for a San Diego-based supercomputer center.


Intel® VTune™ Amplifier


CADEX Resolves the Challenges of CAD Format Conversion

Parallelism Brings CAD Exchanger* software dramatic gains in performance and user satisfaction, plus a competitive advantage.


F5 Networks Profiles for Success

F5 Networks amps up its BIG-IP DNS* solution for developers with help from
Intel® Parallel Studio and Intel® VTune™ Amplifier.


GeoVision Gets a 24x Deep Learning Algorithm Performance Boost

GeoVision turbo-charges its deep learning facial recognition solution using Intel® System Studio and Intel® Computer Vision SDK.

 


Nik Software Increases Rendering Speed of HDR by 1.3x

By optimizing its software for Advanced Vector Extensions (AVX), Nik Software used Intel® Parallel Studio XE to identify hotspots 10x faster and enabled end users to render high dynamic range (HDR) imagery 1.3x faster.


Walker Molecular Dynamics Laboratory Optimizes for Advanced HPC Computer Architectures

Intel® Software Development tools increase application performance and productivity for a San Diego-based supercomputer center.


 


An Approach to Parallel Processing with Unity*


Idea Behind This Project

The idea behind this project was to provide a demonstration of parallel processing in gaming with Unity* and how to perform gaming-related physics using this game engine. In this domain, realism is important as an indicator of success. In order to mimic the actual world, many things need to happen at the same time, which requires parallel processing. Two different applications were created and then compared to a single-threaded application run on a single core.

The first application was developed to run on a multi-threaded CPU, and the second to perform physics calculations on the GPU. To demonstrate the results of these techniques, the developed application presented schools of fish which were created utilizing a flocking algorithm.

Flocking Algorithm

Most flocking algorithms rely on three rules:


Figure 1. Description of the three flocking rules (source: http://www.red3d.com/cwr/boids/).

What is a Flock?

In this case, a flock was defined as a school of fish. Each fish was calculated to “swim” within a school if it was within a certain distance from any other fish in the school. Members of a school will not act as individuals, but only as members of a flock, sharing the same parameters such as speed and direction.


Figure 2. A flock containing four fish.

Complexity

The complexity of this algorithm is O(n²), where n is the number of fish. To update the movement of a single fish, the algorithm needs to look at every other fish in the environment in order to know whether the fish can: 1) remain in a school; 2) leave a school; or 3) join a new school. It is possible that a single fish could “swim” by itself for a time, until it has an opportunity to join a new school. This check must be executed for each of the n fish.

The algorithm is as follows:

  For each fish (n):
    Look at every other fish (n):
      If that fish is close enough:
        Apply the rules: Cohesion, Alignment, Separation.

Implementation of the Flocking Algorithm Using C#

To apply the rules for each fish, a Calc function was created, which takes one parameter: the index of the current fish in the environment.

Data is stored inside two buffers that represent the state of each fish. The buffers are used alternately for reading and writing; two are needed so that the previous state of each fish stays in memory and can be used to calculate its next state. Before every frame, the current Read buffer is read in order to update the scene.
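
A minimal C# sketch of this double-buffering scheme is shown below. The class name, the readState/writeState fields, and the Calc stub are illustrative assumptions rather than the project's exact code.

using UnityEngine;

// fishState is the per-fish state struct shown in the "State of Fish" section below.
public class FishManager : MonoBehaviour {
    fishState[] readState;   // previous frame: read-only while calculating
    fishState[] writeState;  // next frame: written by Calc

    void Update() {
        for (int i = 0; i < readState.Length; i++)
            Calc(i);                          // reads readState, writes writeState

        // Swap the buffers so the freshly written states become the next frame's read states.
        fishState[] tmp = readState;
        readState = writeState;
        writeState = tmp;
    }

    void Calc(int index) { /* apply the flocking rules; sketched further below */ }
}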


Figure 3. Functional flow block diagram.

State of Fish

The state of each fish contains the following:

struct fishState {
	public float speed;
	public Vector3 position, forward;
	public Quaternion rotation;
}


Figure 4. A fish with its fishState.

The variable forward contains the direction the fish is facing.

The variable rotation is a quaternion, representing a 3D rotation, which allows the fish to rotate to face the direction it is aiming.

Flocking Algorithm

The complete flocking algorithm applies these three rules to every fish on every frame.
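
Continuing the FishManager sketch above, Calc might combine the three rules as follows. The 1.0 cohesion weight and the 0.3 separation distance echo defaults mentioned later in this article; the alignment and separation weights, the rotation speed, and the Neighbor helper (sketched in the next section) are illustrative assumptions.

void Calc(int index) {
    const float cohesionWeight = 1f, alignmentWeight = 0.5f, separationWeight = 0.8f;
    const float minDistance = 0.3f, rotationSpeed = 2f;

    fishState me = readState[index];
    Vector3 center = Vector3.zero, heading = Vector3.zero, avoid = Vector3.zero;
    int neighbors = 0;

    // O(n^2): every fish looks at every other fish.
    for (int j = 0; j < readState.Length; j++) {
        if (j == index) continue;
        fishState other = readState[j];
        if (!Neighbor(me, other)) continue;        // close enough and heading the same way?

        neighbors++;
        center  += other.position;                 // cohesion: steer toward the flock's center
        heading += other.forward;                  // alignment: share the flock's direction
        if (Vector3.Distance(me.position, other.position) < minDistance)
            avoid += me.position - other.position; // separation: avoid collisions
    }

    fishState next = me;
    if (neighbors > 0) {
        Vector3 toCenter = center / neighbors - me.position;
        next.forward = (me.forward
                        + cohesionWeight   * toCenter
                        + alignmentWeight  * (heading / neighbors)
                        + separationWeight * avoid).normalized;
    }
    next.position = me.position + next.forward * me.speed * Time.deltaTime;
    next.rotation = Quaternion.Slerp(me.rotation,
                                     Quaternion.LookRotation(next.forward),
                                     rotationSpeed * Time.deltaTime);
    writeState[index] = next;
}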

Neighbor State

Once the flocking algorithm was executed, each fish was identified as either being a member of a school or not.

The Neighbor function takes as parameters the distance between two fish and the forward direction of each fish. The idea was to make the behavior more realistic: if the distance between two fish was small enough, and the two fish were traveling in the same direction, there was a possibility that they could merge into a school; if they were not traveling in the same direction, they were less likely to merge. This merging behavior was created using a piecewise quadratic function and the dot product of the forward vectors.
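
One possible shape for that test is sketched below in C#. The exact quadratic, and using the "-n" neighbor distance as the upper bound, are illustrative assumptions.

// In the project, the cheaper Call pre-check on the x-positions runs before this function.
const float maxNeighborDistance = 1.5f;   // the "-n" command-line parameter

bool Neighbor(fishState a, fishState b) {
    // Dot product of the forward vectors: 1 when the fish swim the same way, -1 when opposed.
    float dot = Vector3.Dot(a.forward.normalized, b.forward.normalized);

    // Quadratic falloff: the allowed distance shrinks quickly as the headings diverge.
    float t = Mathf.Clamp01((dot + 1f) * 0.5f);
    float allowed = maxNeighborDistance * t * t;

    return Vector3.Distance(a.position, b.position) < allowed;
}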


Figure 5. Representation of the mathematical function.

The distance between two fish must be smaller than the maximum distance, which is dynamically changed based on the dot product of the forward vectors.

Before calling the Neighbor function, which is relatively expensive, another function is called first: Call. The Call function tells the algorithm whether the Neighbor function needs to run at all, by checking whether two fish are close enough to have any chance of being in the same flock. It compares only the x-positions of the two fish, because x is the widest dimension of the environment and therefore the one along which fish are spread furthest apart.

Update of the State

If a fish is alone, it moves forward in a certain direction and speed. However, if a fish has neighbors, it will need to adapt its direction and speed to the flock's direction and speed.

Speed always changes linearly, for smoothness; a fish never jumps from one speed to another without a transition.

There is a defined environment. Fish are not permitted to swim beyond the dimensional limits of that environment. If a fish collides with a boundary, the fish is deflected back inside the defined environment.


Figure 6. Flocking behavior.

If a fish is about to swim out of bounds, the fish is given a new random direction and speed in order to remain inside the defined environment.


Figure 7. Boundary behavior.

It is also necessary to check if a fish is about to collide with a rock. The algorithm must calculate if a fish’s next position will be inside a rock. If so, the fish will avoid the rock in a similar fashion as avoiding a boundary.
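
The boundary and rock checks can be sketched as follows. The tank half-extents, the rocks array, and the speed range are illustrative fields; the rock structure (a position plus a size) mirrors the one described in the GPU section.

struct rockState { public Vector3 position; public float size; }

// True if the candidate position leaves the tank or ends up inside a rock.
bool WouldCollide(Vector3 next) {
    if (Mathf.Abs(next.x) > halfLength ||
        Mathf.Abs(next.y) > halfHeight ||
        Mathf.Abs(next.z) > halfDepth)
        return true;

    foreach (rockState rock in rocks)
        if (Vector3.Distance(next, rock.position) < rock.size)
            return true;

    return false;
}

// Give the fish a new random direction, biased back toward the center, and a new speed.
void Deflect(ref fishState f) {
    f.forward = (Random.insideUnitSphere - f.position.normalized).normalized;
    f.speed   = Random.Range(minSpeed, maxSpeed);
}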

Once the states have been calculated, all of the fish can be made to “swim,” along with making any required rotations. The next state of each fish is then updated with new direction, speed, position, and rotation variables (for n fish). This occurs for every new frame update.

For example, each fish's forward direction, scaled by its speed, is added to its position so that it “swims” through the environment.

Integration Inside Unity*

The main programming component in Unity is the GameObject. To a GameObject you can add components such as scripts, colliders, textures, and materials in order to customize the object and make it behave as desired. These objects are then convenient to access from a C# script: each public field in a script creates a slot in the editor onto which you can drop any object that matches the required type.

C# scripts were used to create flocking behavior.

Import Assets Into Unity

  1. Click Assets, click Import package, and then click Custom package.
  2. Click on All.
  3. Click on Import.

Next, drag and drop the Main scene from the Project tab to the Hierarchy tab. Right click on the default scene and select “Remove Scene.”

All game objects needed to run the application, along with attached scripts, are ready to run. The only missing part is the rock model, which must be added manually.

Download “Yughues Free Rocks” from the Unity Asset Store. The Asset Store can be accessed within Unity (or by using this link: http://u3d.as/64P). Once downloaded, the Import Unity Package window appears (Figure 8). Choose “Rock 01” and import it.

Before using the rock model, an adjustment is needed because the default model's scale is too large. The scale factor in the import settings of its mesh should be set to 0.0058. Once a rock is added to the scene with a scale of 1, it then matches the scale of a unit 3D sphere, which is used as the collider for the object.


Figure 8. Pop-up window: Import Unity Package.

Drag and drop the Rock 01 prefab to the Rock field, inside the main script contained by the fish Manager object.


Figure 9. Inspector tab of an imported object – change of the scale.

Initializing the Application

The initialization of the application is done inside the “Start” function of the main script, which is called once at start-up. This initialization executes the following steps (sketched in code after the list):

  1. Creating a 2-dimensional array of fishState and creating as many arrays of 4x4 matrices as are needed.
  2. Every fish is given a random position, forward direction, and speed. Their rotation is calculated considering their other properties.
  3. The fishState image of each fish and the corresponding TRS matrix are initialized.
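
A hedged sketch of such a Start function, reusing the illustrative fields from the earlier sketches and assuming one Matrix4x4 array per chunk of at most 1,023 instances (the DrawMeshInstanced limit discussed later):

void Start() {
    readState  = new fishState[fishCount];
    writeState = new fishState[fishCount];

    // As many Matrix4x4 arrays as needed, each holding at most 1,023 TRS matrices.
    int chunks = (fishCount + 1022) / 1023;
    matrices = new Matrix4x4[chunks][];
    for (int c = 0; c < chunks; c++)
        matrices[c] = new Matrix4x4[Mathf.Min(1023, fishCount - c * 1023)];

    for (int i = 0; i < fishCount; i++) {
        fishState f;
        // Random position inside the tank, random heading and speed.
        f.position = new Vector3(Random.Range(-halfLength, halfLength),
                                 Random.Range(-halfHeight, halfHeight),
                                 Random.Range(-halfDepth,  halfDepth));
        f.forward  = Random.onUnitSphere;
        f.speed    = Random.Range(minSpeed, maxSpeed);
        // Rotation derived from the other properties.
        f.rotation = Quaternion.LookRotation(f.forward);

        readState[i]  = f;
        writeState[i] = f;
        matrices[i / 1023][i % 1023] = Matrix4x4.TRS(f.position, f.rotation, Vector3.one);
    }
}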

The application can be built as an .exe file. In this case, some parameters can be changed if launched from the command line prompt using the following arguments:

  • “-t”: Height of the allowed area for the fish (the “tank”).
  • “-f”: Number of fish to display.
  • “-r”: Number of rocks to add to the scene.
  • “-n”: Maximum distance for two fish to be neighbors (i.e., interact together).
  • “-s”: Displays a simpler scene with a visible tank (see images below).
  • “-m”: Mode to launch the application. 0: CPU Single-thread; 1: CPU Multi-thread; 2: GPU.

The tank size and the neighbor distance are unit-less parameters. For comparison, the smallest distance at which fish are allowed to swim side by side without colliding is 0.3.

For example, the command “-t 6 -f 1200 -n 1.5 -m 1 -s” will launch a no-rocks, multithreaded CPU application with 1,200 fish, a maximum neighbor distance of 1.5, and a tank height of 6.
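
One way to read these switches from inside Unity is System.Environment.GetCommandLineArgs(); the parsing below is a simplified sketch, and the fields it fills in are illustrative names.

void ParseArgs() {
    string[] args = System.Environment.GetCommandLineArgs();
    for (int i = 0; i < args.Length; i++) {
        switch (args[i]) {
            case "-s": simpleScene      = true;                      break; // flag, no value
            case "-t": tankHeight       = float.Parse(args[i + 1]);  break;
            case "-f": fishCount        = int.Parse(args[i + 1]);    break;
            case "-r": rockCount        = int.Parse(args[i + 1]);    break;
            case "-n": neighborDistance = float.Parse(args[i + 1]);  break;
            case "-m": mode             = int.Parse(args[i + 1]);    break; // 0: single-thread, 1: multi-thread, 2: GPU
        }
    }
}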

The depth and length of the environment depend on the height. These coefficients are stored and can be changed in the code to alter the look of the scene.


Figure 10. Simpler Scene.


Figure 11. Fish area to interact with another fish.


Figure 12. Scene with underwater effects.

Constants which can be changed to tweak the behavior of the fish:

  • Speed: Maximum speed for a fish, rotation speed, speed to add to a fish when it nears a boundary.
  • Velocity: Parameters related to the three flocking rules: cohesion, alignment, and separation. (Cohesion is set to 1 by default. It is not recommended that this setting be changed.)
  • Dimension: Determines the depth and length of the environment. These parameters are calculated based on the given height.

How to Draw the Instances

The DrawMeshInstanced function of Unity is used to display a fish. This allows the drawing of N instances of the same fish. Each fish is represented by a mesh and a material. To use this function, three parameters are required: a mesh, a material, and an array of a 4x4 matrix.

Each 4x4 matrix is configured as a Translation, Rotation, and Scaling (TRS) matrix. The TRS matrix is rebuilt after each update of the fish states, using the updated positions and rotations as parameters. The global scale variable is the same for every fish, so they can all be resized if needed (in this case, the factor is 1).

The mesh has been previously resized and rotated in Blender to avoid any mismatch.

Inside each script, Unity’s Update function is used to refresh the states for each frame.

DrawMeshInstanced provides good performance, but has a limit of 1,023 instances drawn per call. This means that in the case of more than 1,023 fish in an environment, this function has to be called more than once. The array containing the matrices to display the fish must be split to create chunks no larger than 1,023 items. Each chunk is then updated in the Calc function, which is called several times.

Several calls will be made to both the DrawMeshInstanced and Calc functions to update and then display all the fish for each frame.
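
Roughly, the per-frame draw then looks like the sketch below; fishMesh, fishMaterial, scale, and the matrices array are the illustrative fields assumed earlier, and the material must have GPU instancing enabled.

void LateUpdate() {
    for (int c = 0; c < matrices.Length; c++) {
        // Rebuild the TRS matrix of every fish in this chunk from its updated state.
        for (int j = 0; j < matrices[c].Length; j++) {
            fishState f = readState[c * 1023 + j];
            matrices[c][j] = Matrix4x4.TRS(f.position, f.rotation, Vector3.one * scale);
        }
        // One DrawMeshInstanced call per chunk of up to 1,023 instances.
        Graphics.DrawMeshInstanced(fishMesh, 0, fishMaterial, matrices[c], matrices[c].Length);
    }
}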


Figure 13. Splitting the array of matrices – calculation of variables.

The variables are calculated: nbmat represents the number of arrays of matrices that the application is using; rest represents the number of fish in the last matrix.

Each fish is updated using two indices: the first selects the matrix within the array, and j is the fish's index inside the current chunk. Both are needed in order to update the right matrix in the array shown above.

Additional Features

Underwater effects

There are different assets that were added to this Unity project in order to create a number of underwater effects. Unity provides built-in packages, including water models, for example, that were used for this project. There are also many textures and associated materials (“skins”), which may be applied to any object. All of these (and more) can be found in the Unity Asset Store.

Caustics - Light reflections and shadows

Caustics lighting effects were added to the underwater scene of this project. A “projector” (a Unity object) is required for caustics; it projects the caustic effect onto the scene. The projected caustic texture is cycled at a set frequency (Hz), which creates the effect of moving caustics.

Blur

A blur was added to the underwater scene. If the camera is below the surface of the water, a progressive blur is enabled and displayed, and the background of the scene changes to blue (the default background is a skybox). Additionally, the fog setting was enabled inside Unity (Window > Lighting > Other Settings, with the Fog box checked).

Moving the camera

A script was added to the camera object in order to move inside the scene using the keyboard and the mouse. This provides controls similar to a first-person shooter game. The directional keys may be used to move forward/backward or strafe left/right. The mouse allows for moving up/down, along with turning the camera to point left/right.

transform.Rotate (0, rotX, 0);

The move variables represent the directional-key input, while the rot* variables represent the mouse orientation. Modifying the transform of the object that holds the script (in this case the camera) rotates and translates it in the scene.
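
A minimal version of such a camera script, using Unity's default input axes (the speed constants are illustrative, and a full controller would also clamp the pitch):

using UnityEngine;

public class CameraMove : MonoBehaviour {
    public float moveSpeed = 5f;
    public float lookSpeed = 2f;

    void Update() {
        // Directional keys: move forward/backward and strafe left/right.
        float moveX = Input.GetAxis("Horizontal") * moveSpeed * Time.deltaTime;
        float moveZ = Input.GetAxis("Vertical")   * moveSpeed * Time.deltaTime;
        transform.Translate(moveX, 0f, moveZ);

        // Mouse: turn left/right and look up/down.
        float rotX = Input.GetAxis("Mouse X") * lookSpeed;
        float rotY = Input.GetAxis("Mouse Y") * lookSpeed;
        transform.Rotate(0f, rotX, 0f);   // yaw, as in the fragment above
        transform.Rotate(-rotY, 0f, 0f);  // pitch
    }
}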

Building an .exe file

As previously mentioned, there is the possibility of building an .exe file to change the parameters of the application without changing the source code. To do so, follow these steps:

  1. Click Edit, then Project Settings, then Quality.
  2. In the Quality tab, scroll down to Other, and find V Sync Count.
  3. Change the V Sync Count setting to “Don’t Sync.” This lets the application display more than 60fps, if possible.
  4. Click File, then Build and Run to build the .exe file.

Note: instead of using Build and Run, you may go to Build Settings in order to choose a specific platform (e.g., Microsoft Windows, Linux*, Mac*).

Coding Differences: CPU vs. GPU

CPU

There is only one difference between coding for a single-threaded and a multi-threaded application: how the Calc function is called. In either case, the Calc function is critical to the execution time, as it is called n times for each frame.

Single-threaded

Coding for a single-threaded application is accomplished in a classic way, through a “for loop” as shown here:
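
A minimal sketch of that loop, reusing the fishCount and Calc names assumed in the earlier sketches:

// Single-threaded: every fish is updated one after another on the main thread.
for (int i = 0; i < fishCount; i++) {
    Calc(i);
}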

Multi-threaded

Coding for a multi-threaded application is accomplished by using the Parallel.For method. Parallel.For splits multiple calls of a function across different threads and executes them in parallel; each thread handles a chunk of the calls. Application performance depends, of course, on the number of available CPU cores.
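
The equivalent multi-threaded call, using System.Threading.Tasks.Parallel.For, might look like this:

using System.Threading.Tasks;

// Multi-threaded: the runtime splits the index range into chunks and runs them on
// worker threads; each index still calls the same Calc function as before.
Parallel.For(0, fishCount, i => {
    Calc(i);
});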

GPU

Compute shader

GPU processing is accomplished in a similar way to CPU multi-threading. By moving the process-heavy Calc function to the GPU, which has far more cores than a CPU, faster results may be expected. To do this, a “shader” is used and executed on the GPU. Traditional shaders add graphical effects to a scene; for this project a “compute shader” was used instead, which runs general-purpose calculations on the GPU. The compute shader was coded in HLSL (High-Level Shader Language) and reproduces the behavior of the Calc function (e.g., speed, position, direction), but without the rotation calculations.

The CPU, using the Parallel.For function, calls the UpdateStates function for each fish to calculate its rotation and create the TRS matrices before drawing each fish. The rotation of the fish is calculated using the Unity function Slerp, of the “Quaternion” class.

Adaptation of the code to the compute shader

Although the main idea is to move the Calc function loop to the GPU, there are some additional points to consider: random number generation and the need for data to be exchanged with the GPU.

The biggest difference between the Calc function for the CPU and the compute shader for the GPU is random number generation. In the CPU, an object from the Unity Random class is used to generate random numbers. In the compute shader, NVidia* DX10 SDK functions are used.

Data needs to be exchanged between the CPU and GPU.

Some parameters of the application, like the number of fish or rocks, are wrapped either inside vectors of floats or in a single float. For example, a Vector3 from C# in the CPU will match the memory mapping of a float3 in HLSL on the GPU.

Fish-state data (fishState) in the Read/Write buffers and rock-state data (s_Rock) in a third buffer on the CPU must be exposed as three distinct ComputeBuffers to the compute shader on the GPU. For example, a Quaternion on the CPU matches the memory mapping of a float4 on the GPU (a Quaternion is a structure containing four floats). The Read/Write buffers are declared as RWStructuredBuffer<State> in the compute shader on the GPU. The same applies to the structure describing rocks on the CPU, which uses a float for the size of each rock and a vector of three floats for each rock’s position.

On the CPU, the RunShader function creates ComputeBuffer states and calls the GPU to execute its compute shader at the beginning of each frame.

Once the ComputeBuffer states are created on the CPU, they are set to match their associated buffers states on the GPU (for example, the Read Buffer on the CPU is associated with “readState” on the GPU). The two empty buffers are then initialized with fish-state data, the compute shader is executed, and the Write buffer is updated with the data from its associated ComputeBuffer.

On the CPU, the Dispatch function sets up the threads on the GPU and launches them. nbGroups represents the number of groups of threads to be executed on the GPU. In this case, each group contains 256 threads (a group of threads cannot contain more than 1,024 threads).
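
The CPU side of that dispatch might look roughly like the sketch below. The kernel name, the per-frame buffer creation, and the stride calculation are assumptions based on the description above; "readState" and "writeState" match the buffer names mentioned earlier, and the rock buffer is omitted for brevity.

void RunShader() {
    int kernel = computeShader.FindKernel("CSMain");                               // assumed kernel name
    int stride = System.Runtime.InteropServices.Marshal.SizeOf(typeof(fishState)); // bytes per fish state

    ComputeBuffer readBuffer  = new ComputeBuffer(fishCount, stride);
    ComputeBuffer writeBuffer = new ComputeBuffer(fishCount, stride);
    readBuffer.SetData(readState);      // previous states go to the GPU
    writeBuffer.SetData(writeState);

    computeShader.SetBuffer(kernel, "readState",  readBuffer);
    computeShader.SetBuffer(kernel, "writeState", writeBuffer);

    // 256 threads per group on the GPU, so launch enough groups to cover every fish.
    int nbGroups = (fishCount + 255) / 256;
    computeShader.Dispatch(kernel, nbGroups, 1, 1);

    // Copy the results back so the CPU can compute rotations and build the TRS matrices.
    writeBuffer.GetData(writeState);

    readBuffer.Release();
    writeBuffer.Release();
}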

On the GPU, the “numthreads” property must correspond to the number of threads established on the CPU; in this project, “int index = 16*8*8/4” provides 256 threads. The index of each thread is mapped to the index of a fish, and each thread updates the state of that fish.

Results


Figure 14. Results for a smaller number of fish.

The results show that for fewer than 500 fish, both the single-threaded and multi-threaded CPU versions performed better than the GPU version. This may be due to the data exchanges between the CPU and GPU that take place every frame.

When the number of fish reached 500, the performance for the single-threaded CPU diminished compared to the multi-threaded CPU and GPU (CPU ST = 164fps vs. CPU MT = 295fps and GPU = 200fps). When the number of fish reached 1,500, the performance of the multi-threaded CPU diminished (CPU ST = 23fps and CPU MT = 88fps vs. GPU = 116fps). This may be because of the larger number of cores inside the GPU.

For 1,500 fish and above, in all cases the GPU outperformed both the single-threaded and multi-threaded CPUs.


Figure 15. Results for a larger number of fish.

Although in all cases the GPU performed better than both instances of the CPU, the results show that 1,500 fish provided for the best overall GPU performance (116fps). As more fish were added, GPU performance degraded. Even so, at 2,000 fish, only the GPU performed better than 60fps, and at 2,500 fish, better than 30fps. The GPU finally degraded below 30fps at approximately 6,500 fish.

The most likely reason the GPU’s performance degraded with larger numbers of fish is the complexity of the algorithm. For example, for 10,000 fish there were 10,000², or 100 million, iterations per frame for each fish to interact with every other fish.

Profiling the application highlighted several critical points. The function that calculated the distance between fish was expensive, and the Neighbor function was slow due to the dot product. Replacing the Neighbor call with a plain distance test (distance smaller than the maximum distance) would improve performance somewhat. It would mean, however, that any two fish that swim close to each other would then be forced to swim in the same direction.

Another way to improve performance would be to attack the O(n²) complexity of the algorithm itself. It is possible that an alternate way of sorting the fish could yield an improvement.

(Consider two fish: f1 and f2. When the Calc function is called for f1, f1’s Neighbor state will be calculated for f2. This Neighbor state value could be stored and used later when the Calc function is called for f2.)

Hardware Used for this Benchmark


Figure 16. Hardware used to run the tests.

Get Big: Solving the Distribution Dilemma for Indie Gamers


Independent game developers face an important decision when selecting a distribution channel. While many devs plan to simply push their game to Steam*, a multichannel distribution model makes more sense. Using more than one channel takes a bit of additional work, but you could reach a far bigger audience and potentially make a lot more money.

Figure 1: Steam* is the #1 choice for most indie gamers, but it's not the only one.

Physical boxes and shareware days

For independent game developers, distributing physical, boxed games to brick-and-mortar retailers was often prohibitively expensive. One early workaround was the concept of shareware titles, such as Doom* from id Software, which launched the first-person-shooter genre, courtesy of a free executable with a small footprint. Players could download the first 10 levels before purchasing the entire title, and demand was so intense on the first day that servers were overwhelmed. Players were encouraged to distribute the shareware version freely, and customers eventually bought over one million copies of the full game.

Figure 2: Doom* from id Software was released in 1993 as downloadable shareware.

Early online distribution services, such as GameLine*, for the Atari* 2600, and Stardock Central*, lacked any kind of marketing assistance or title curation, and had other distribution issues. In 2004, the Valve Corporation launched the Steam platform—and a revolution.

Steam soon became the largest digital distributor of games for PCs. The advantages are obvious, as Gabe Newell, creator of Steam, explained to RockPaperShotgun.com. “The worst days . . . were the cartridge days for the NES (Nintendo Entertainment System*). It was a huge risk—you had all this money tied up in silicon in a warehouse somewhere, and so you’d be conservative in the decisions you felt you could make, very conservative in the IPs you signed, your art direction would not change, and so on. Now it’s the opposite extreme . . . there’s no shelf-space restriction.”

Distribution platforms

Figure 3: Distribution platforms to choose from or to include in an all-of-the-above approach.

Multiple sites now centralize purchasing and downloading digital content. Some platforms also serve as digital rights management systems (DRMs) to control the use, modification, and distribution of games and to handle in-game purchases, the keys to unlock content, and more. The three main models are:

  • Proprietary systems run by large publishers (such as Electronic Arts Inc.*, Ubisoft*, and Tencent*), which allow them to sell direct and to aggregate user information.
  • Retail systems that sell third-party titles and third-party DRMs. Examples: Green Man Gaming*, Humble Bundle*, and GamersGate*.
  • Digital distribution platforms selling third-party titles and proprietary DRMs. Table 1 shows the page visits of leading platforms.

Table 1: Top distribution platforms ranked by page visits (Source: Newzoo Q2’17: Global Games Market).

Platform | Web Address | Number of Games | Total Monthly Visits
Steam* | www.steampowered.com | 14,000 | 163,000,000
Humble Bundle* | www.humblebundle.com | 5,000 | 41,600,000
GOG | www.gog.com | 2,000 | 19,000,000
itch.io | www.itch.io | 63,000 | 10,100,000
Green Man Gaming* | www.greenmangaming.com | 7,500 | 6,200,000
GamersGate* | www.gamersgate.com | 6,000 | 2,000,000
OnePlay* | www.oneplay.com | 2,000 | 127,000

Make it a true partnership

With so many options available, it can be difficult to choose a distribution partner. Some key questions to answer before settling on a partner are:

  • What is your business model? If you are relying on in-game purchases, you’ll need a strong DRM system to manage those microtransactions.
  • Is your game free or fee-based? Pricing is a tough choice you should make early in your developer’s journey—find more information in our separate article “Get Ready: Pricing Your Indie Game”.
  • Who is your target audience? If you’re focused on a narrow niche, you may not want to risk getting lost on the largest distribution platforms. Look for sites dedicated to your target audience.
  • What devices are your potential customers using? If you are releasing a mobile puzzle-game, focus on sites that distribute such titles.
  • What channels are your potential customers using? Find out what site(s) your target audience relies on.

Direct distribution can still work

Alarmed at the thought of losing revenue to an online distributor, some indie devs might be thinking about distributing their game’s installation package by themselves. The average split for selling through a retailer is 70/30, but can vary depending on the platform and the leverage of the developer. Some sites even offer a Pay What You Want option or allow customers to direct some of the money to charity.

To keep more than 70 percent of the revenue for yourself, you can hook up with a full-stack digital commerce platform, such as Fastspring*, which enables global subscriptions and payments for online, mobile, and in-app experiences. Or you can set up your own digital store using the tools at Binpress*.

“Don’t just rely on distributors to sell your game for you . . . There is still significant money to be made from direct sales,” writes Paul Kilduff-Taylor, part of the team at Mode 7 in Oxford, United Kingdom. When setting up your own distribution channel, you’ll need a reliable payment provider, a clear, optimized website, and you’ll have to work hard to drive potential customers to your site with a good marketing plan, Kilduff-Taylor advises.

Use your own efforts to augment a complete, multichannel distribution strategy. “To have a decent success on the PC with a downloadable game, you’ll need to be on every major portal,” Kilduff-Taylor said.

Don’t stop with Steam

Steam controls a significant portion of the PC game market space, and by late 2017 the service had over 275 million active users—growing at 100,000 per week, according to SteamSpy.com. In 2017 an estimated 6,000+ games were released on Steam.

One of its key attractions is the Humble Indie Bundle, which curates indie titles, giving smaller games a chance to shine. The SteamDirect FAQ lays out some issues you should be aware of. Be sure to emphasize the points that make your game unique and be organized with your marketing efforts—with a plan, collateral pieces, press information, and a compelling trailer all ready to go. Trailers are a key ingredient, and producers such as Kert Gartner (see “Kert Gartner Explains the Genius Behind Mixed-Reality VR Game Trailers”) are highly sought after. Make sure your game stands out, or you could get lost in the daily release avalanche.

Patrick DeFreitas, software partner marketing manager at Intel, advises indies to consider distributing through multiple channels. “Many indie developers on the PC gaming side see Steam as the be-all and end-all for distribution. They believe that if they have their title on Steam, they’re good to go. But it’s important to consider additional digital and retail distribution channels to get your title out there.”

Secondary retailers and channels focus on curating high-quality games that are compelling to their base and may be able to perfectly match your title to their followers. DeFreitas also points out that some retailers may do a better job in a single region. “They’re all looking for a portfolio of titles that they can sell through their channels,” he said. “At the end of the day, you could potentially end up as an indie developer with a dozen different channels where you are selling your titles directly to consumers worldwide.”

Data gathering aids decision making

Investigate the statistics the various distribution channels can gather for you. Over time, you should have plenty of data to analyze as you determine sales trends, response to promotions, geographical strengths, and buyer personas. Steam is so big that third-party sites, such as SteamCharts.com, have sprung up providing data snapshots. At SteamAnalyst.com you can find out what in-game purchases are trending. Google Analytics* can be paired with Steam data to analyze your Steam Store page or Community Hub for anonymized data about your traffic sources and visitor behavior.

Figure 4: Steam* conducts monthly surveys to help guide your decision making (source: Steampowered.com).

Be sure to take advantage of data collection opportunities, so you can develop and perfect the player personas in your target audience. The more you know about your sales and your customers the better are the decisions you can make about additional distribution choices.

Figure 5: Third-party sites such as SteamCharts.com offer continual snapshots of Steam* data (source: Steamcharts.com)

Multiplatform releases boost incomes, headaches

Releasing a title across mobile devices, consoles, and PC operating systems is a good way to boost your income flow but probably not a good choice for most beginners with a single title. Learning the ropes for so many different systems all at once is a big challenge. Game engines such as Unity* software and Unreal* offer ways to reach multiple platforms from the same code base, but be prepared to make a big investment in testing and quality control. You might want to concentrate on making the very best PC game you are capable of, rather than extending yourself across every available platform.

Bundling for fun

Getting into an original equipment manufacturer bundle is a great way to jump-start distribution; you develop more of a business-to-business model, and the bundler handles much of the promotion. Instead of trying to stand out from dozens of titles released around the same time, you only compete with the handful of titles in the bundle. Reddit* maintains a good overview of the current bundles for sale, and a list of sites that offer game bundles. IndieGameBundles.com* keeps a similar list completely devoted to indies.

Writing at venturebeat.com*, Joe Hubert says, “You don’t enter into a bundle in hopes of retiring on a nice island. You enter a bundle for the residual influence it has on your game. (The exposure outweighs the low price-point of the sale.) Your game will get eyeballs, lots and lots of eyeballs, to look at your game, see what it’s about, and recognize it in the future.”

HumbleBundle.com offers an FAQ to help guide you through their submission steps. Fanatical.com* starts their process with an email, while Green Man Gaming has an online form.

You can also contact publishers and bundlers at shows and events as part of your own networking. Major gaming sites and magazines can steer you toward hot, new platforms. Chat up fellow devs for their takes on distribution trends as well.

Once you get into a bundle, you may be expected to participate with your own marketing efforts. The good news is that you’ll have more news and information to fill up your social networking feeds. For more information about promotion strategies and marketing deliverables, check out the articles “Get Noticed: Attending Your First Event as an Indie Game Developer” and “Get Big: Approaching Industry Influencers for Indie Game Developers.”

Several organizations host annual contests for indie game developers. The Independent Games Festival offers cash prizes and publicity, while the Game Development World Championship offers trips to Finland and Sweden, and visits to top game studios. These are also great marketing bullets for your promotional materials. Also be on the lookout for contests that offer help in distributing your game in a bundle, or as a stand-alone title. The Intel® Level Up Game Developer Contest, for example, puts your game in front of Green Man Gaming. Check the PixelProspector.com* site for its updated list of contests to enter.

The power of good distribution

Bastion*, an action role-playing game from indie developer Supergiant Games, was nearly sunk by a troubled preview version at a Game Developers Conference (GDC). When they brought a playable version to the Penny Arcade Expo, however, it started picking up awards. Crucially, this led to Warner Bros. Interactive Entertainment publishing it on Microsoft Xbox*. It was next ported to Microsoft Windows* PC on Steam, and a browser game was created for Google Chrome*. It sold at least 500,000 copies in one year.

Figure 6: Bastion* overcame early obstacles to become available in multiple versions.

When Dustforce* was included in Humble Indie Bundle 6, its developers saw an enormous boost in sales. In the two weeks after the bundle was rolled out, the game sold 138,725 copies and pulled in USD 178,235.

Promotions often provide a spur to plateauing sales. Alexis Santos, editor at Binpress*, said that Pocketwatch Games’ Monaco* made USD 215,000 by participating in Humble Indie Bundle 11. Monaco was included in 370,000 of the 493,000 bundles sold; buyers had to beat the average price of USD 4.71 per bundle to receive Monaco. That meant Pocketwatch didn’t receive a big payment per copy, but it distributed hundreds of thousands of copies of the game. What it did receive was a healthy income boost for a game that had been on the market for 10 months, with no major impact on Steam sales of the full-priced title outside the bundle.

Aki Järvinen, founder of GameFutures.com*, recently wrote on 10 trends shaping the gaming industry and pointed to the evolution of business models that benefit from new distribution schemes. “Companies like Playfield* and Itch.io are building services that try to tackle the indie discoverability issue,” he said, “both for the player community and the developers.” His guess is that with distribution platforms providing more support for marketing, public relations, and data analytics, in the future we may be seeing more of what Morgan Jaffit calls “triple-I” titles and studios.

Rather than the indiepocalypse that pundits worried about in 2015—a super-saturated indie market leading to smaller slices of a slow-growing pie—there will always be room for creative, unique games. The trick will be in making them easy to find and buy. With multiple evolving distribution channels, you’ll have to work hard to distribute your intellectual property through appropriate channels, in order to maximize reach, audience, and revenue. Don’t be afraid to ask for help, either. Intel and Green Man Gaming just teamed up to form a new digital content distribution site for publishers, retailers, and channel partners. To learn more about getting involved with the Intel® Software Distribution Hub, visit https://isdh.greenmangaming.com.

Resources

Intel® Developer Zone

Intel® Level Up Game Developer Contest

Indie Games on Steam

Get Big: Approach Industry Influencers to Build Awareness for Your Indie Game



Getting noticed in the vast digital world, with its myriad social networks and other channels of influence, might appear to require mountains of money and resources. This could be a problem for indie game developers with limited budgets. Expensive PR agencies might have once been the only option, but today's internet-based marketing channels are free for the asking. The networks and people who can provide the exposure you need often have as much to gain from your success as you do—it's your content that keeps them in business. More than they create, influencers endorse and attract. They need a constant flow of new and visionary material to keep viewers interested. Indie game developers can feed that appetite for content as well as any major game studio, but how do you make that connection?

Getting the word out today means more than sprinkling seeds to the four winds of the web and nurturing the ones that take root. Today's influencers—the streamers, YouTube* gamers and bloggers—can multiply your exposure many times over, and it's important to identify and target the ones that play your type of game. You also can gain exposure from your indie peers, traditional media outlets, gaming conferences, and from the consumers themselves.

The trick to approaching these disparate groups is in knowing how to identify the influencers of highest value to you, designing a plan of attack for each, and implementing and tracking the results of that plan.

This article covers strategies for approaching:

  • Social network communities
  • Streamers on Twitch*, YouTube*, and others
  • Game retailers
  • Gaming event attendees

It leans heavily on the know-how of Patrick DeFreitas and Dan Fineberg, marketing experts at Intel, who share time-tested techniques for publicizing and distributing titles on a budget that indies can afford.

Social Networks: Start with What You Know

Social media channels play heavily into an overall brand-building strategy. Identify the social networks you're already familiar with, and start promoting your game there.

During the early stages of development, indies can already identify the aspects of their game that make it unique and fun. Before a game is even playable, screenshots or renderings of game scenery can be used to promote the game on sites such as Facebook*, where it's easy for others to help spread the word, generate interest, and perhaps even spawn a community. "Even if you don't have anything to show but a single screenshot, if you have a good story, and something to share with the gaming community that they feel would be a value to their own work, then that's another way of bringing visibility to your game in the very, very early stages," says Patrick DeFreitas, Intel marketing manager for software, user experience, and media.

What is different about your game—its narrative, characters, or flow? Identify the characteristics of your game that will capture people's interest and post about them on social sites. One post could cover how the game riffs on a popular storyline, another its original setting, and the next how it augments reality in a way that's never been done before. It could be anything, but it should be something that's yours and yours alone. DeFreitas says that outreach should begin at an early stage. Dan Fineberg—a software marketing and planning consultant at Intel—points out that new social channels, such as Medium*, are being launched frequently and are getting a lot of attention. "It's a relatively new medium when you think about it. There's a lot of change, and you just have to stay abreast of it."

Dan also said that for generalized social media channels such as Facebook, Twitter*, Instagram*, and YouTube, your strategy must be carefully tailored. "There is a lot of nuance in terms of what each platform is best at doing." Different social media channels have differing value to gamers and the game developer. "Not just in gaming, but in general software-related areas, you might find that you can get a lot more engagement on one medium such as Facebook, but that for creating more awareness of your game another channel such as Instagram or Twitter might be better—but your results are unique to you."

The social aspect for some specialized sites might be relatively small, but having a presence can pay off later. "Places like Green Man Gaming and others like it have obvious relevance," says Fineberg. They are good places to get early visibility and make inroads to the opportunities the sites offer to increase the distribution of your title. He added that Twitch also has become a powerful social platform, and can lead to engagements with influencers and others that might be interested in promoting your title.

Figure 1. Identify and spread the word about the unique aspects of your process and game.

"You've also got Reddit*," notes DeFreitas. "There are so many different groups within those channels that really cater to developers and individual developer programs, and I think they'll continue to use those social channels." The trick is to balance your game development time with the time you spend on social networks. DeFreitas added: "You only have so much time in the day to dedicate to exchanging content and information with your digital community online, versus focusing on creating your game." He advises that selecting communities and channels that yield the best return-per-engagement might involve some trial and error. Make sure you're providing value and tapping into a channel that sees what you're doing is innovative, progressive or unique, and parallel to what that channel or community is all about. The community itself will let you know, either by silence or by storm.

Figure 2. As the pieces of your game come together, put them in the public eye to generate interest and build a community.

An Agenda for Events

A great place to make direct contact with influencers, industry figures, and game enthusiasts is at developer events, such as the Intel® Buzz Workshop series, where the focus isn't necessarily on showing people a game that's ready for market. "You could be talking about your development techniques, the challenges, and things that you've overcome," says DeFreitas. Also, on the agenda could be some of the different solutions you've implemented that other developers may find interesting or valuable.

The most important element of the indie's marketing campaign is to approach influencers who can spread the word about your game, and get people interested in buying and playing it. Schedule appointments ahead of time, and take advantage of events that draw together personalities that you otherwise would have to spend a vast amount of time and money tracking down individually. Think ahead and plan your campaign.

Once your game has reached playability, offering a closed beta is a great way to give gamers a sneak preview. DeFreitas described how Polish game developers Destructive Creations are using this strategy for the upcoming release of Ancestors Legacy. "They created videos on YouTube. They created a product page on Steam* and, to get the word out for the game, they created all of this content—and obviously the game's not even ready for market."

Figure 3. To create buzz, Destructive Creations created prerelease content for Ancestors Legacy.

"The closed beta means getting people to hammer on the game itself, and you're still capturing all the feedback," says DeFreitas. "That feedback won't necessarily affect your ratings on Steam, because everyone understands that this is a closed environment." Testers are made to feel like insiders, which likely means they'll talk about the game more, and it gives them a stake in your game's success—and a say in what goes into it. "It's a clever way of doing that, when you think about it. Besides them playing your game and giving feedback, if they love it they'll get a free copy, or a couple of copies to give to friends and family when everything is ready." You offer something interesting, exclusive, and unique to your audience, as well as a reward for participating.

Strategies for Streamers

Is your game a first-person shooter, side-scroller, or immersive adventure? Does it take place in space or a fantasy land? One-on-one or online multiplayer? Identify streamers who play your type of game. Be brave, aim high, and list them all, regardless of their status. "Don't be afraid to start with the mid-tier or upper-tier influencers and see if they'll be willing to stream," says DeFreitas. After all, streamers need content to fill their pipelines, gain new viewers, and increase their influence. Their inclination will be to listen, but your time with them will be limited. So, develop and practice your pitch—you might only get one shot.

"If you can engage with people like that, and implement some of their ideas in your game, they will likely feel really positive toward what you're doing and help promote it," Fineberg added. This can be a critical strategy in any market—luminaries and influencers get on your side to champion your cause because you've realized their vision. "That's important, because they're opinion leaders, and they have ideas that lots of people care about. You can help them build equity in their value to their audience, and they'll be inclined to help you, in return."

Of course, there could be roadblocks. "Once you start reaching the celebrity influencers and streamers that are out there, often they're committed to a specific title or genre, or under commitments made to a sponsor," says DeFreitas. This makes it a greater challenge to pull someone in to stream your game, especially if it doesn't have the level of success of other games they're currently streaming. There may be, however, opportunities to partner with the sponsors themselves. "Channel folks can help," Fineberg says, "because as you develop relationships in distribution, that can become an entrée into their joint go-to-market activities, including engaging with influencers."

DeFreitas agrees. He says that developers should also look to the hardware companies producing the kit used by gamers. "Some of the independent software vendor developers we're working with today wouldn't have reached out to some of the streamers that we work with, if we didn't insert ourselves as part of the equation."

Tap Existing Contacts

Identify the sponsors of the streamers you plan to approach and exploit any existing relationship or connection you might have with them. "Those influencers are already getting paid through sponsorship, so if they have a channel and they need to fill that channel with content they may be open to opportunities to insert your game into that channel, which is already being paid for and covered by the bigger partner or brand sponsoring it," says DeFreitas.

Intel, your game engine maker, and others, also might maintain influencer networks as part of their developer programs. Some might even have their own streaming channels. "It takes considerable energy, time, and resources to keep one of those up and running and filled with content," says DeFreitas. "So, they are probably always looking for opportunities to pull in new content, especially if it's a title that's related to their technology."

Talk to Game Retailers

"Companies like Green Man Gaming and Humble Bundle want to increase their revenues, so they engage the developers of games they distribute in go-to-market promotional activities to build interest and demand for the titles," says Fineberg. Retailers have affiliates, influencer channels, and networks, too, and all are aimed at generating revenue. Green Man Gaming maintains a network of about 3,000 influencers, explains DeFreitas, but access doesn't come for free. Retailers usually expect you to contribute time, effort, and possibly money, to the go-to-market program.

Some of that time should include putting together an influencer kit that describes the game in positive terms. Include artwork and other relevant game assets in your kit, and make it easy for retailers and influencers to understand, help promote, and sell your product. After signing a retail contract, you'll be working with either an account manager or a marketing team. According to DeFreitas, your proposal might be to set aside 50 influencers and give each of them three keys to give out to their audience. "You're most likely going to get some visibility on their channels."

Repeat this process for other distribution channels. "Now you're taking their networks, and leveraging their audiences on your behalf, without really doing a lot of work," says DeFreitas. "Essentially, you're giving them the keys, you're giving them the artwork, you're giving them some interesting facts about the game itself, you're packaging it up, and you're pushing it out. Ultimately what they are trying to do is bring visibility to your game, drive audiences back to their respective retail channels, and convert those sales."

 Map of possible options, represents strategic planning and organization
Figure 4. A successful publicity strategy will include many interrelated components, working together.

Take a Holistic Approach

As an independent game developer your business strategy needs to begin early in your design process, evolve as the game does, and continue through release and distribution. You must find the right balance between coding and marketing, learn from mistakes, focus on the strategies that succeed, identify the most efficient influencers, and prioritize your contact and engagement with them.

photorealistic penguin about to wake polar bear with cymbals
Figure 5. Be brave. No matter how big the influencer, it's a mutually beneficial partnership.

Awareness marketing has historically been thought of as separate from lead generation. "That's all changed. Social media really combines both awareness and lead generation in one fell swoop," says DeFreitas. The reason is simply the sheer reach of sites such as Facebook, YouTube, and Twitter. "When viral content exists on one, or there's a controversy or what have you, it creates a ripple effect throughout the entire mass communication media spectrum," he explained.

Influencers need you as much as you need them. So, remember to be brave, and try to avoid being eaten.

Artificial Intelligence and Healthcare Data

Introduction

Health professionals and researchers have access to plenty of healthcare data. However, the implementation of artificial intelligence (AI) technology in healthcare is still very limited, primarily due to a lack of awareness about AI; the technology remains unfamiliar territory for most healthcare professionals. The purpose of this article is to introduce AI to healthcare professionals and to describe how it applies to different types of healthcare data.

IT (information technology) professionals such as data scientists, AI developers, and data engineers also face challenges in the healthcare domain; for example, finding the right problem,1 the lack of data available for training AI models, and various issues with validating AI models. This article highlights potential areas of healthcare where IT professionals can collaborate with healthcare experts to build teams of doctors, scientists, and developers, and to translate ideas into healthcare products and services.

Intel provides educational software and hardware support to health professionals, data scientists, and AI developers. Based on dataset type, we highlight below a few use cases in the healthcare domain where AI has been applied to various medical datasets.

Artificial Intelligence

AI is a set of techniques that enables computers to mimic human behavior. AI in healthcare uses algorithms and software to analyze complex medical data and find relationships between patient outcomes and prevention or treatment techniques.2 Machine learning (ML) is a subset of AI; it uses various statistical methods and algorithms and enables a machine to improve with experience. Deep learning (DL) is a subset of ML.3 It takes machine learning to the next level with multilayer neural network architectures, identifying patterns and performing other complex tasks much as the human brain does. DL has been applied in many fields such as computer vision, speech recognition, natural language processing (NLP), object detection, and audio recognition.4 Deep neural networks (DNNs) and recurrent neural networks (RNNs), examples of deep learning architectures, are used to improve drug discovery and disease diagnosis.5

Relationship of AI, machine learning, and deep learning.

Figure 1. Relationship of artificial intelligence, machine learning, and deep learning.

AI Health Market

According to Frost & Sullivan (a growth partnership company), the AI market in healthcare may reach USD 6.6 billion by 2021, a 40 percent growth rate. AI has the potential to reduce the cost of treatment by up to 50 percent.6 AI applications in healthcare may generate USD 150 billion in annual savings by 2026, according to Accenture analysis. AI-based smart workforces, cultures, and solutions are consistently evolving to support the healthcare industry in multiple ways,7 such as:

  • Alleviating the burden on clinicians and giving medical professionals the tools to do their jobs more effectively.
  • Filling in gaps during the rising labor shortage in healthcare.
  • Enhancing efficiency, quality, and outcomes for patients.
  • Magnifying the reach of care by integrating health data across platforms.
  • Delivering benefits of greater efficiency, transparency, and interoperability.
  • Maintaining information security.

Healthcare Data

Hospitals, clinics, and medical and research institutes generate a large volume of data daily, including lab reports, imaging data, pathology reports, diagnostic reports, and drug information. Such data is expected to increase greatly in the next few years as people expand their use of smartphones, tablets, the IoT (Internet of Things), and fitness gadgets to generate information.8 Digital data is expected to reach 44 zettabytes by 2020, doubling every year.9 This rapid expansion of healthcare data is one of the greatest challenges for clinicians and physicians. Current literature suggests that the big data ecosystem and AI are solutions for processing this massive data explosion while meeting the social, financial, and technological demands of healthcare. Analyzing such big and complicated data is often difficult and requires a high level of data-analysis skill. Moreover, the most challenging part is interpreting the results and making recommendations based on the outcome and on medical experience, which requires many years of medical involvement, knowledge, and specialized skill sets.

In healthcare, data are generated, collected, and stored in multiple formats, including numerical, text, image, scan, audio, and video. If we want to apply AI to our dataset, we first need to understand the nature of the data and the questions we want the target dataset to answer. The data type helps us choose the neural network, algorithm, and architecture for AI modeling. Here, we introduce a few AI-based cases as examples to demonstrate the application of AI in healthcare in general. Typically, the approach can be customized based on the project and area of interest (for example, oncology, cardiology, pharmacology, internal medicine, primary care, urgent care, emergency, and radiology). Below is a list of AI applications, organized by the format of the dataset, that are gaining momentum in the real world.

Healthcare Dataset: Pictures, Scans, Drawings

One of the most popular ways to generate data in healthcare is with images, such as scans (PET scan image credit: Susan Landau and William Jagust, UC Berkeley)10, tissue sections11, drawings12, and organ images13 (Figure 2A). In this scenario, specialists look for particular features in an image. A pathologist collects such images under the microscope from tissue sections (fat, muscle, bone, brain, liver biopsy, and so on). Recently, Kaggle organized the Intel and MobileODT Cervical Cancer Screening competition to improve the precision and accuracy of cervical cancer screening using a large image dataset (training, testing, and additional datasets).14 The participants used different deep learning models such as the faster region-based convolutional neural network (R-CNN) detection framework with VGG16,15 supervised semantics-preserving deep hashing (SSDH) (Figure 2B), and U-Net for convolutional networks.16 Dr. Silva achieved 81 percent accuracy on the validation test using the Intel® Deep Learning SDK and GoogLeNet* on Caffe*.16

Similarly, Xu et al. investigated a dataset of over 7,000 images of single red blood cells (RBCs) from eight patients with sickle cell disease, and selected a DNN classifier to classify the different RBC types.17 Gulshan et al. applied a deep convolutional neural network (DCNN) to more than 10,000 retinal images collected from 874 patients to detect referable diabetic retinopathy (moderate or worse) with about 90 percent sensitivity and specificity.18
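
The competition and research models above are far more elaborate, but the basic shape of a convolutional image classifier is easy to sketch. The snippet below is a minimal, hypothetical example using the Keras API in Python, not the code from any of the cited studies; the 128 x 128 input size and the three output classes are assumptions made purely for illustration.

    # Minimal sketch of a convolutional image classifier (hypothetical; not the
    # competition or study models cited above). Assumes images resized to
    # 128x128 RGB and three target classes.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(3, activation="softmax"),   # one output per class
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # train_images: float32 array of shape (N, 128, 128, 3), scaled to [0, 1]
    # train_labels: int array of shape (N,) with values 0, 1, or 2
    # model.fit(train_images, train_labels, epochs=10, validation_split=0.2)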

Various types of healthcare image data

Figure 2. A) Various types of healthcare image data. B) Supervised semantics-preserving deep hashing (SSDH), a deep learning model, used in the Intel and MobileODT Cervical Cancer Screening Competition for image classification. Source: 10-13,16

Positron emission tomography (PET), computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound images (Figure 2A) are another source of healthcare data, in which images of tissue are collected noninvasively from internal organs (such as the brain or tumors). Deep learning models can be used to measure tumor growth over time in cancer patients on medication. Jaeger et al. applied a convolutional neural network (CNN) architecture to diffusion-weighted MRI. Based on an estimation of the properties of the tumor tissue, this architecture reduced false-positive findings and thereby decreased the number of unnecessary invasive biopsies. The researchers noted that deep learning reduced motion and vision error, and thus provided more stable results than manual segmentation.19 A study conducted in China showed that deep learning helped achieve 93 percent accuracy in distinguishing malignant from benign cancer on elastograms of ultrasound shear-wave elastography of 200 patients.20,21

Healthcare Dataset: Numerical

Example of numerical data

Figure 3. Example of numerical data.

Healthcare industries collect a lot of patient- and research-related information such as age, height, weight, blood profile, lipid profile, sugar, blood pressure, and heart rate. Similarly, gene expression data (for example, fold change) and metabolic information (for example, levels of metabolites) are also expressed as numbers.

The literature shows several cases where neural networks were successfully applied in healthcare. For instance, Danaee and Ghaeini from Oregon State University (2017) used a deep architecture, a stacked denoising autoencoder (SDAE) model, to extract meaningful features from the gene expression data of 1,097 breast cancer samples and 113 healthy samples. This model enables the classification of breast cancer cells and the identification of genes useful for cancer prediction (as biomarkers) or as potential therapeutic targets.22 Kaggle shared the breast cancer dataset from the University of Wisconsin, containing measurements of the cancer cell nucleus: radius, texture, perimeter, area, smoothness, compactness, concavity, symmetry, and fractal dimension. In the Kaggle competition, participants successfully built DNN classifiers to predict breast cancer type (malignant or benign).23
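
The Kaggle entries themselves are more elaborate, but a minimal sketch of the same idea, a small feed-forward network classifying the Wisconsin measurements as malignant or benign, can be written in a few lines with scikit-learn (which bundles a copy of this dataset); the layer sizes below are illustrative assumptions, not a tuned solution.

    # Minimal sketch: classify the Wisconsin breast cancer measurements
    # (malignant vs. benign) with a small feed-forward network. Layer sizes
    # and hyperparameters are illustrative, not the Kaggle solutions.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)        # 30 numeric features per sample
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)

    scaler = StandardScaler().fit(X_train)             # scale features before training
    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=42)
    clf.fit(scaler.transform(X_train), y_train)

    print("test accuracy:", clf.score(scaler.transform(X_test), y_test))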

Healthcare Dataset: Textual

Example of textual data

Figure 4. Example of textual data.

Plenty of medical information is recorded as text; for instance, clinical data (cough, vomiting, drowsiness, and diagnoses), social, economic, and behavioral data (such as poor, rich, depressed, happy), social media reviews (Twitter, Facebook, Telegram*, and so on), and drug history. NLP, often powered by neural networks, translates free text into standardized data. It enhances the completeness and accuracy of electronic health records (EHRs), and NLP algorithms can extract risk factors from notes available in the EHR.
For example, NLP was applied to 21 million medical records and identified 8,500 patients who were at risk of developing congestive heart failure, with 85 percent accuracy.24 The Department of Veterans Affairs used NLP techniques to review more than two billion EHR documents for indications of post-traumatic stress disorder (PTSD), depression, and potential self-harm in veteran patients.25 Similarly, NLP was used to identify psychosis with 100 percent accuracy in schizophrenic patients based on speech patterns.26 IBM Watson* analyzed 140,000 academic articles, more than any human could read, understand, or remember, and suggested recommendations about a course of therapy for cancer patients.24
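
Production clinical NLP systems are far more sophisticated than this, but a toy sketch shows the general shape of turning free-text notes into a risk flag. The notes, labels, and the heart-failure example below are invented for illustration and are not from any real EHR.

    # Toy sketch: flag free-text notes for a risk factor with a bag-of-words
    # model. The notes and labels are invented examples, not real EHR data.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    notes = [
        "patient reports shortness of breath and ankle swelling",
        "routine follow-up, no complaints, vitals stable",
        "progressive dyspnea on exertion, orthopnea noted",
        "annual physical, patient feels well",
    ]
    labels = [1, 0, 1, 0]   # 1 = possible heart-failure risk, 0 = no flag

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(notes, labels)

    print(model.predict(["patient describes swelling in both ankles"]))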

Figure 5. Examples of electrogram data. Source:27,31

Healthcare Dataset: Electrogram

Architecture of deep learning with convolutional neural network model

Figure 6. Architecture of deep learning with convolutional neural network model useful in classification of EEG data. (Source: 28-29)

Electrocardiograms (ECG)27, electroencephalograms (EEG), electrooculograms (EOG), electromyograms (EMG), and sleep tests are some examples of graphical healthcare data. An electrogram is a recording of the electrical activity of a target organ (such as the heart, brain, or muscle) over a period of time, made using electrodes placed on the skin.

Schirrmeister et al. from the University of Freiburg designed and trained deep ConvNets (deep learning with convolutional networks) to decode raw EEG data, which is useful for EEG-based brain mapping.28,29 Pourbabaee et al. from Concordia University, Canada, used a large volume of raw ECG time-series data to build a DCNN model. Interestingly, this model learned key features of paroxysmal atrial fibrillation (PAF), a life-threatening heart disease, and is thereby useful in screening PAF patients. This method can be a good alternative to traditional, ad hoc, and time-consuming handcrafted features.30 Sleep stage classification is an important preliminary exam for sleep disorders. Using 61 polysomnography (PSG) time series, Chambon et al. built a deep learning model for sleep stage classification. The model performed better than the traditional method, with little run time and computational cost.31
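
The published architectures differ in their details, but the common idea is a one-dimensional convolutional network over a fixed-length signal window. The sketch below is a hypothetical illustration in Keras; the 3,000-sample window, single channel, and two output classes are assumptions, not the configurations used in the cited papers.

    # Sketch of a 1D convolutional classifier for a fixed-length biosignal
    # window (e.g., a single-channel ECG segment). Window length and class
    # count are illustrative assumptions, not the published configurations.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Conv1D(32, kernel_size=16, activation="relu", input_shape=(3000, 1)),
        layers.MaxPooling1D(4),
        layers.Conv1D(64, kernel_size=16, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(2, activation="softmax"),   # e.g., PAF vs. non-PAF
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()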

Healthcare Dataset: Audio and Video

Example of audio data

Figure 7. Example of audio data.

Sound event detection (SED) deals with detecting the onset and offset times of each sound event in an audio recording and associating a textual descriptor with each event. SED has recently drawn great interest in the healthcare domain for health monitoring. Cakir et al. combined CNNs and RNNs in a convolutional recurrent neural network (CRNN) and applied it to a polyphonic sound event detection task, observing a considerable improvement with the CRNN model.32
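
The general CRNN pattern, convolutional layers over spectrogram frames followed by a recurrent layer and per-frame sigmoid outputs for each event class, can be sketched as follows. The frame count, mel-band count, and number of event classes are assumptions for illustration; this is not the configuration published by Cakir et al.

    # Sketch of a convolutional recurrent network (CRNN) for polyphonic sound
    # event detection: 2D convolutions over mel-spectrogram frames, a GRU over
    # time, and one sigmoid per event class per frame. Shapes are illustrative.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    n_frames, n_mels, n_events = 500, 40, 6
    inputs = layers.Input(shape=(n_frames, n_mels, 1))
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((1, 2))(x)              # pool frequency only, keep time
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((1, 2))(x)
    x = layers.Reshape((n_frames, -1))(x)           # (time, features) for the GRU
    x = layers.GRU(64, return_sequences=True)(x)
    outputs = layers.TimeDistributed(layers.Dense(n_events, activation="sigmoid"))(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.summary()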

Videos are sequences of images; in some cases they can be considered time series, and in very particular cases dynamical systems. Deep learning techniques help researchers in both the computer vision and multimedia communities boost the performance of video analysis significantly and open new research directions for analyzing video content. Microsoft started a research project called InnerEye* that uses machine learning to build innovative tools for the automatic, quantitative analysis of three-dimensional radiological images. Project InnerEye employs algorithms such as deep decision forests as well as CNNs for the automatic, voxel-wise analysis of medical images.33 Khorrami et al. built a model on videos from the Audio/Visual Emotion Challenge (AVEC 2015) using both RNNs and CNNs, and performed emotion recognition on the video data.34

Healthcare Dataset: Molecular Structure

Molecular structure of 4CDG

Figure 8. Molecular structure of 4CDG (Source: rcbs.org)

Figure 8 shows a typical example of the molecular structure of a drug molecule. Generally, the design of a new molecule draws on historical datasets of old molecules. In quantitative structure-activity relationship (QSAR) analysis, scientists try to find known and novel patterns between structures and activity. At the Merck Research Laboratory, Ma et al. used a dataset of thousands of compounds (about 5,000) and built a model based on a deep neural network (DNN) architecture.35 In another QSAR study, Dahl et al. built neural network models on 19 datasets of 2,000 to 14,000 compounds to predict the activity of new compounds.36 Aliper and colleagues built a deep neural network–support vector machine (DNN–SVM) model that was trained on a large transcriptional response dataset and classified various drugs into therapeutic categories.37 Tavanaei developed a convolutional neural network model that classifies tumor suppressor genes and proto-oncogenes with 82.57 percent accuracy; this model was trained on tertiary protein structures obtained from the Protein Data Bank.38 AtomNet* is the first structure-based DCNN; it incorporates structural target information and predicts the bioactivity of small molecules. This application worked successfully to predict new active molecules for targets with no previously known modulators.39
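
Most of these QSAR models start from a fixed-length descriptor or fingerprint vector computed for each compound. The sketch below assumes such a feature matrix already exists (the random arrays merely stand in for real descriptors, and the layer sizes are illustrative assumptions); computing the descriptors themselves, for example with a cheminformatics toolkit, is outside the scope of the snippet.

    # Sketch of a QSAR-style regressor: predict compound activity from a
    # precomputed descriptor/fingerprint matrix. The random arrays are
    # placeholders for real descriptors; sizes are illustrative assumptions.
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers, models

    n_compounds, n_descriptors = 5000, 1024
    X = np.random.rand(n_compounds, n_descriptors).astype("float32")  # placeholder
    y = np.random.rand(n_compounds).astype("float32")                 # placeholder

    model = models.Sequential([
        layers.Dense(512, activation="relu", input_shape=(n_descriptors,)),
        layers.Dropout(0.25),
        layers.Dense(128, activation="relu"),
        layers.Dense(1),                      # predicted activity (regression)
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=2, batch_size=128, verbose=0)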

AI: Solving Healthcare Problems

Here are a few practical examples where AI developers, startups, and institutes are building and testing AI models:

  • Emotional intelligence indicators that detect subtle cues in speech, inflection, or gesture to assess a person's mood and feelings
  • Tuberculosis detection
  • Treatment of PTSD
  • AI chatbots (Florence*, SafedrugBot*, Babylon Health*, SimSensei*)
  • Virtual assistants that help patients and clinicians
  • Insurance verification
  • Smart robots that explain lab reports
  • Aging-based AI centers
  • Improved clinical documentation
  • Personalized medicine

Data Science and Health Professionals: A Combined Approach

Deep learning has great potential to help medical and paramedical practitioners by:

  • Reducing the human error rate40 and workload
  • Helping in diagnosis and the prognosis of disease
  • Analyzing complex data and building a report

The examination of thousands of images is complex, time consuming, and labor intensive. How can AI help?

A team from Harvard Medical School's Beth Israel Deaconess Medical Center observed a 2.9 percent error rate with an AI model and a 3.5 percent error rate with pathologists for breast cancer diagnosis. Interestingly, pairing deep learning with a pathologist showed a 0.5 percent error rate, an 85 percent drop.40 Litjens et al. suggest that deep learning holds great promise in improving the efficacy of prostate cancer diagnosis and breast cancer staging.41,42

Intel® AI Academy

Intel provides educational software and hardware support to health professionals, data scientists, and AI developers, and makes free AI training and tools available through the Intel® AI Academy.

Intel recently published a series of hands-on AI tutorials walking through the process of AI project development step by step. In them you will learn about:

  • Ideation and planning
  • Technology and infrastructure
  • How to build an AI model (data and modeling)
  • How to build and deploy an app (app development and deployment)

Intel is committed to providing solutions for your healthcare project and to helping you achieve your project goals. Please read the article on the Intel AI Academy to learn more about solutions using Intel® architecture (Intel® Processors for Deep Learning Training). In the next article, we explore examples of healthcare datasets and show how to apply deep learning to them.

References

  1. Faggella, D. Machine Learning Healthcare Applications – 2018 and Beyond. Techemergence.
  2. Artificial intelligence in healthcare - Wikipedia. (Accessed: 12th February 2018)
  3. Intel® Math Kernel Library for Deep Learning Networks: Part 1–Overview and Installation | Intel® Software. (Accessed: 14th February 2018)
  4. Lecun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
  5. Mamoshina, P., Vieira, A., Putin, E. & Zhavoronkov, A. Applications of Deep Learning in Biomedicine. Molecular Pharmaceutics 13, 1445–1454 (2016).
  6. From $600 M to $6 Billion, Artificial Intelligence Systems Poised for Dramatic Market Expansion in Healthcare. (Accessed: 12th February 2018)
  7. Accenture. Artificial Intelligence in Healthcare | Accenture.
  8. Marr, B. How AI And Deep Learning Are Now Used To Diagnose Cancer. Forbes.
  9. Executive Summary: Data Growth, Business Opportunities, and the IT Imperatives | The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things. (Accessed: 12th February 2018)
  10. Lifelong brain-stimulating habits linked to lower Alzheimer’s protein levels | Berkeley News. (Accessed: 21st February 2018)
  11. Emphysema H and E.jpg - Wikimedia Commons (Accessed : 23rd February 2018). https://commons.wikimedia.org/wiki/File:Emphysema_H_and_E.jpg
  12. Superficie_ustioni.jpg (696×780). (Accessed: 23rd February 2018). https://upload.wikimedia.org/wikipedia/commons/1/1b/Superficie_ustioni.jpg
  13. Heart_frontally_PDA.jpg (1351×1593). (Accessed: 27th February 2018).  https://upload.wikimedia.org/wikipedia/commons/5/57/Heart_frontally_PDA.jpg
  14. Kaggle competition-Intel & MobileODT Cervical Cancer Screening. Intel & MobileODT Cervical Cancer Screening. Which cancer treatment will be most effective? (2017).
  15. Intel and MobileODT* Competition on Kaggle*. Faster Convolutional Neural Network Models Improve the Screening of Cervical Cancer. December 22 (2017).
  16. Intel and MobileODT* Competition on Kaggle*. Deep Learning Improves Cervical Cancer Accuracy by 81%, using Intel Technology. December 22 (2017).
  17. Xu, M. et al. A deep convolutional neural network for classification of red blood cells in sickle cell anemia. PLoS Comput. Biol. 13, 1–27 (2017).
  18. Gulshan, V. et al. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA 316, 2402 (2016).
  19. Jaeger, P. F. et al. Revealing hidden potentials of the q-space signal in breast cancer. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 10433 LNCS, 664–671 (2017).
  20. Ali, A.-R. Deep Learning in Oncology – Applications in Fighting Cancer. September 14 (2017).
  21. Zhang, Q. et al. Sonoelastomics for Breast Tumor Classification: A Radiomics Approach with Clustering-Based Feature Selection on Sonoelastography. Ultrasound Med. Biol. 43, 1058–1069 (2017).
  22. Danaee, P., Ghaeini, R. & Hendrix, D. A. A deep learning approach for cancer detection and relevant gene identification. Pac. Symp. Biocomput. 22, 219–229 (2017).
  23. Kaggle: Breast Cancer Diagnosis Wisconsin. Breast Cancer Wisconsin (Diagnostic) Data Set: Predict whether the cancer is benign or malignant.
  24. What is the Role of Natural Language Processing in Healthcare? (Accessed: 1st February 2018)
  25. VA uses EHRs, natural language processing to spot suicide risks. (Accessed: 1st February 2018)
  26. Predictive Analytics, NLP Flag Psychosis with 100% Accuracy. (Accessed: 1st February 2018)
  27. Heart_block.png (450×651). (Accessed: 23rd February 2018)
  28. Schirrmeister, R. T. et al. Deep learning with convolutional neural networks for brain mapping and decoding of movement-related information from the human EEG Short title: Convolutional neural networks in EEG analysis. (2017).
  29. Schirrmeister, R. T. et al. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 38, 5391–5420 (2017).
  30. Pourbabaee, B., Roshtkhari, M. J. & Khorasani, K. Deep Convolutional Neural Networks and Learning ECG Features for Screening Paroxysmal Atrial Fibrillation Patients. IEEE Trans. Syst. Man, Cybern. Syst. 1–10 (2017). doi:10.1109/TSMC.2017.2705582
  31. Chambon, S., Galtier, M. N., Arnal, P. J., Wainrib, G. & Gramfort, A. A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series. arXiv:1707.0332v2 (2017).
  32. Cakir, E., Parascandolo, G., Heittola, T., Huttunen, H. & Virtanen, T. Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection. IEEE/ACM Trans. Audio, Speech, Lang. Process. 25, 1291–1303 (2017).
  33. Project InnerEye – Medical Imaging AI to Empower Clinicians. Microsoft
  34. Khorrami, P., Le Paine, T., Brady, K., Dagli, C. & Huang, T. S. How Deep Neural Networks Can Improve Emotion Recognition on Video Data.
  35. Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model. 55, 263–274 (2015).
  36. Dahl, G. E., Jaitly, N. & Salakhutdinov, R. Multi-task Neural Networks for QSAR Predictions. (University of Toronto, Canada. Retrieved from http://arxiv.org/abs/1406.1231, 2014).
  37. Aliper, A. et al. Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data. Mol. Pharm. 13, 2524–2530 (2016).
  38. Tavanaei, A., Anandanadarajah, N., Maida, A. & Loganantharaj, R. A Deep Learning Model for Predicting Tumor Suppressor Genes and Oncogenes from PDB Structure. bioRxiv  October 22, 1–10 (2017).
  39. Wallach, I., Dzamba, M. & Heifets, A. AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery. 1–11 (2015). doi:10.1007/s10618-010-0175-9
  40. Kontzer, T. Deep Learning Drops Error Rate for Breast Cancer Diagnoses by 85%. September 19 (2016).
  41. Litjens, G. et al. Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci. Rep. 6, (2016).
  42. Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017).

Unreal Engine*: Blueprint CPU Optimizations for Cloth Simulations

Cloth Simulations

Realistic cloth movement can bring a great amount of visual immersion to a game. Using PhysX Clothing is one way to achieve this without the need for hand animation. Incorporating these simulations into Unreal Engine* 4 is easy, but because they are taxing on the CPU, it's good to understand their performance characteristics and how to optimize them.

Disabling cloth simulation

Cloth simulations placed in an Unreal* level are simulated whether or not they can be seen, which wastes CPU time; optimization can prevent this. Do not rely on the Disable Cloth setting for optimizing simulated cloth, as this setting only works during construction and has no effect while the game is in play.

Unreal* Physics Stats

To get a better understanding of cloth simulation and its effect on a game and system, we can use a console command, Stat PHYSICS, in Unreal.

After entering Stat PHYSICS at the command line, the physics table overlay appears (Figure 1). To remove it, just enter the same command into the console again.

Physics overlay table
Figure 1. Physics overlay table.

While there is a lot of information available, we need only worry about the first two (Cloth Total and Cloth Sim) for the purposes of this paper.

Cloth Total represents the total number of cloth draws within the scene, and Cloth Sim (simulation) represents the number of active cloth meshes currently being simulated. Keeping these two numbers at a reasonable level for your target platform helps prevent a loss of frame rate due to the CPU being loaded down with cloth processing. By adding an increasing number of cloth meshes to the level, the number of simulations the CPU can handle at once becomes apparent.

Suspending Cloth Simulation

The Unreal Engine Blueprint system now includes the ability to suspend and resume cloth simulations on a skeletal mesh. These nodes solve the earlier issue of the cloth simulation being reset every time it is toggled with the level of detail method of cloth optimization.

Rendered function switch
Figure 2. Resume and Suspend nodes on a Was Recently Rendered function switch.

For the purposes of this document, all of the methods discussed below in the Level of Detail section still apply, but you can now exchange the Set Min LOD nodes for the Resume and Suspend Clothing Simulation nodes.

Time delay switch

With cloth simulation suspension, we can be more dynamic with cloth while still optimizing performance. However, using only an occlusion switch can lead to a "dropping banner" problem: a cloth simulation is in the middle of dynamic movement, the player turns away (which pauses the cloth simulation), and then after some time turns back to see the cloth hovering in mid-air before continuing its movement.

To solve this issue, we can use an occlusion switch and add a Boolean check to our suspension; in this way a delay before suspending the simulation can be used, giving the cloth enough time to finish its movement before coming to a final rest and remaining suspended.

Time delay switch
Figure 3. Time delay switch.
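
The Blueprint in Figure 3 expresses this with nodes; as a language-agnostic illustration of the same control flow, the sketch below is written in Python purely for readability. Names such as was_recently_rendered and cloth_suspended stand in for the corresponding Blueprint nodes and variables; they are not real Unreal Engine API calls.

    # Illustrative sketch of the time-delay occlusion switch (Python for
    # readability only; the names stand in for Blueprint nodes/variables and
    # are not real Unreal Engine API calls).
    SUSPEND_DELAY = 2.0   # seconds the cloth keeps simulating after leaving view

    class ClothActor:
        def __init__(self):
            self.hidden_time = 0.0
            self.cloth_suspended = False

        def was_recently_rendered(self):       # stands in for the Blueprint node
            return False                       # pretend the player has looked away

        def tick(self, delta_seconds):
            if self.was_recently_rendered():
                self.hidden_time = 0.0
                self.cloth_suspended = False   # visible: resume/keep simulating
            else:
                self.hidden_time += delta_seconds
                if self.hidden_time >= SUSPEND_DELAY:
                    self.cloth_suspended = True    # off-screen long enough: suspend

    actor = ClothActor()
    for _ in range(180):                       # roughly 3 seconds of 60 fps ticks
        actor.tick(1.0 / 60.0)
    print(actor.cloth_suspended)               # True: the simulation would be suspended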

Level of Detail

When you create a skeletal mesh and attach an apex cloth file to it, the cloth simulation is always tied to level of detail (LOD) 0 of that mesh. If the mesh is ever switched off of LOD 0, the cloth simulation no longer takes place. Using this to our advantage, we can create an LOD 1 that is the same in every way as our LOD 0 (minus the cloth apex file) and use it as a switch whenever we want to toggle the cloth simulation (Figure 4).

Level of detail information
Figure 4. Level of detail information.

Boolean switch

Now that we have a switch, we can set up a simple blueprint to control it. By creating an event (or function), we can branch using a Boolean switch between simulating the cloth (LOD 0) and not simulating the cloth (LOD 1). This event could be called from a trigger when the player enters it, to begin simulating the cloth meshes in the next area, and again when the player leaves that area to stop those simulations, or in any number of other ways, depending on the game level.

Occlusion Culling Switch
Figure 5. Switch blueprint.

Occlusion culling switch

If a more automated approach is desired, occlusion culling can be used as the switching variable. To do this, call the Was Recently Rendered function, and attach its return to the switch branch (Figure 6). This will stop the cloth simulation when the actor is no longer rendered.

Recently Rendered function in the switch blueprint.
Figure 6. The was recently rendered function in the switch blueprint.

The problem with this method comes from the simulation reset that occurs when the simulation is switched back on. If the cloth mesh is drastically different when it is simulated, the player will always see this transition. To mitigate the chance of this happening, the bounds of the mesh can be increased with import settings. However, this also means intentionally rendering objects that cannot be seen by the player, so make sure it is worthwhile in terms of the game’s rendering demands.

A level design approach to solving this issue would include making sure all dynamically capable cloth meshes (such as flags) are placed in the same direction as the wind.

It may be possible to program a method in C++ that saves the position data of every vertex of the cloth simulation and translates the mesh back into that position when the simulation is turned back on. That could be a very taxing method, depending on the data structure used and the number of cloth simulations in the level.

Cloth simulations without occlusion culling switch
Figure 7. Cloth simulations without occlusion culling switch.

Cloth simulations with occlusion culling switch.
Figure 8. Cloth simulations with occlusion culling switch.

Combination/Set Piece Switch

If the level happens to have a very dynamic set piece that is important enough to always look its best, an additional branch that uses a Boolean switch can be attached to the actor; in figure 9 we call it Optimize Cloth?

Set piece switch.
Figure 9. Set piece switch.

With this new switch, importance can be given to certain cloth meshes that should always be simulated by switching their Optimize Cloth? value to false.

Using a set piece switch

In figure 10 below, three cloth meshes are flags that turn away and point backwards, relative to their starting position. It takes a few seconds for this to look natural, but because they really sell the fact that they are not hand animated, I set them to be set pieces (Optimize Cloth? false), so they are always being simulated.

Complex flags used with set piece switches.
Figure 10. Complex flags used with set piece switches.

Step Up Your 3D Asset Pipeline

Image of 3D Asset Pipeline

As computer hardware continues to advance, with software to match, we have entered an age where creating amazing-looking digital media is easier than ever before, even at an entry level. However, good design goes beyond just looks. An asset must also function well for the user, as well as for those down the pipeline who create and set up the functionality. Having a solid creative pipeline for creating assets is important for saving you time and frustration, as well as for developing a more interactive and playable asset.

When creating 3D assets, I break the process into four main parts: preplanning and research, the first pass, implementing and testing, and the final pass. This upfront work might seem like a lot of extra effort, but it saves time in the long run by identifying and resolving problems earlier in the pipeline.

Let’s dive in and see how these four steps can enhance your creative pipeline.

Understand What You Need to Make Before You Start to Make It

Novice digital artists often make the mistake of not taking the time to fully understand what they are going to make before they hop into their program of choice and start creating. The problem is that if you’re creating an asset for a game or commercial purpose, most likely some level of functionality and interactivity will need to be accounted for. By taking the time at the start to understand how the asset at hand is expected to work, you can break it down into components and piece it together as a whole. Otherwise, you run the risk of having to break apart an already completed asset and patch up the seams or, worse, start over.

Here are some tips to help you better understand what you need to make before jumping in to make it:

  • Find plenty of references and study them. As obvious as it sounds, many digital artists don’t spend enough time finding quality references to influence their design. I always ask the client to provide references, if possible, to get a better idea of what they want. I also allow myself a reasonable amount of time to find and study references to help me better understand the subject matter. Believe it or not, Pinterest* is a great tool for finding references and creating reference boards on a per-project basis.
  • Concept: quantity over quality. Normally you would think the opposite — quality over quantity — is true, but during the concept phase, having numerous ideas gives you more options for selecting the best one than settling on a single idea that is just okay. Make sure you have your references handy. A great practice is to take timed sprints of 25 minutes to an hour, spending at least 5 minutes per concept, but no more than 10. During this process, make sure to zoom out and focus on the overall form and not get caught up in details, which you'll do after you review the first few rounds of concepts.

Different versions of the main 3D figure

Watch concept video

  • A good model sheet removes a lot of the guesswork. A standard model sheet will have a front, side, and top view of the asset you plan to make, acting as a great guide for the modeling process. To get the most from your model sheets, preplan any functioning parts to see if they mechanically and aesthetically work in all views. This also allows you to see the various functioning parts, so you have a better idea of how many pieces your asset may need. If something seems off, you can address any issues before the modeling process, saving you time and frustration.

Graphics of the main 3D figure

Taking adequate time to preplan results in a more well-thought-out product, ultimately saving time for you and your teammates.

Now that we have a good understanding of what we want to make and a fancy model sheet as guide, let’s model it.

First Passes for First Impressions

The first pass is similar to the concept phase, in that we want to focus on the main forms and functionality. As mentioned before, the easy part of creating a 3D asset is making it look good, so at this stage we want to ensure our asset’s interactivity is center stage and developed properly. Once again, keeping it simple at this stage allows for more wiggle room if we need to address any issues after testing. Details can be added easily later, but having to work around them can become problematic and frustrating.

Here are some tips to speed up the modeling process as well as optimization:

  • Set the scene scale before modeling. It's easy to want to start creating as soon as we open our program of choice, but we need to ensure that what we're making matches the specifications given. Although scaling up or down after the fact isn't that hard to do, not having to do it at all is much easier.
  • Not every asset needs to start as a single mesh. It's much easier to combine meshes together and clean up loose ends rather than rack your brain on how a single mesh can be built outward. This is especially important for functioning parts, because having a separate mesh that can be altered on its own without affecting other meshes is easier to deal with.
  • Mind your edge flow and density. Having a smooth mesh is appealing to the eye, but at this phase having a higher poly density increases the amount of detail we have to maintain, sometimes for even the smallest changes. Keep it simple for now, because we can always add extra subdivisions once we're happy with the form as a whole.

3D figure set to Merge Center

  • For symmetrical pieces, focus on one side and mirror the appropriate axis. This approach guarantees that the mesh will be symmetrical, saving you a lot of time reviewing and cleaning up. Expect to do this multiple times as you develop the mesh to get a better sense of the object as a whole. If you end up with any gaps down the seams when mirroring, you can either sew the vertices together with Merge To set to Center, or select all the vertices on the seam, move the pivot point to the central origin, and then scale them toward each other.
  • If duplicating a mesh, UV it first. UV unwrapping is already time intensive, so why spend extra time when you can do it once and have it carry over to the duplicates? Although you can copy and paste UVs for duplicated meshes, sometimes you may end up with undesirable results, which require extra time to fix.

After the mesh resembles a rough form of what we're aiming for, I recommend setting up some base materials with simple colors to mock up the details. If we've managed our edge flow well enough, we should have approximate areas for a base level of textures that we'll apply per face. Doing this saves a lot of time because we're not committing to the arduous task of perfecting our UV unwrapping, which will come during the polish phase. We'll also have more control updating colors on a per-material basis when we test in-scene.

Colored base materials to mock up the details

Now all we need to do is export our asset so we can test it in the engine to ensure it looks and functions as intended. However, first we need to do some important prep work to avoid having to re-export and import multiple times.

Make sure to address the following details before exporting:

  • Is your hierarchy concise and are your meshes properly named? With any asset that's going to have various functioning parts, you will most likely be dealing with multiple meshes. Taking the time to have a concise hierarchy will make sense of what meshes interact together or independently, and properly naming them will avoid confusion as to what each part in the hierarchy is.

Example of properly named meshes

  • Are the origins of your points of interaction set up properly and locked in? Not every mesh you create will need its origin point at 0,0,0. This is especially true when you're working with multiple meshes and moving them about based on the hierarchy. So we want to be sure to set the pivot points to where they make sense and freeze the transformations when we have them where we want them. This will make it easier to manipulate any aspect of the asset in a scene.

Steps to set Freeze Transformations

  • Are your materials set up and managed well? Try to avoid using default materials and shaders, because it will overload the project with duplicate materials in various places, causing confusion when you need to assign them in the editor. When dealing with any faces that may have been missed and left as default material, I recommend going to the Hypershade menu, and then holding down right-click to select all the faces with that material. If there aren't any, we're good to go. If there are, they are now selected and we can assign them to what we want them to be.

Watch exporting setup video

With our asset set up and prepped, we can export it without having to make major changes and re-export later. When using Maya*, I recommend the Game Exporter because its tool settings for exporting are easy to understand and adjust. It's also good practice to set the exporter to Export Selection for more control over what you're exporting, so you don't end up with stray meshes, materials, cameras, and so on. Once we export our asset, we can test it in scene and see how it works and feels.

Steps to export selection

Time to Test Your Mettle and See How It Came Out

With all the preparations we took to ensure a solid design before jumping into modeling and making a rough version of the model focusing on functionality instead of minor details, it's time to see where we stand. We'll start by adding our exported model to the project in an appropriate folder. Metadata files are automatically created, as well as a folder with any materials we assigned. Because we chose to make multiple materials as opposed to a single UV layout/material to save time by not overcommitting to details, we will have to clean up this folder once we have the final version of our asset.

Now, let's drag our asset into the scene to review and test it to ensure it works and feels as intended.

Here's what we'll want to watch for:

  • Is it in scale with the rest of the assets in the scene? This is the most obvious thing to check, and it will stand out the most if it's drastically off. If the scaling is off, it is important NOT to scale the in-scene asset. Doing this has no effect on the scale of the base asset, and it causes frustration down the road because we'll have to manually scale the asset each time it is brought into the scene. To adjust the scale in-engine, we want to do so via the asset's Import Settings under Scale Factor. However, I recommend noting the scaling difference and adjusting it when we make a final pass on our asset and then re-exporting, because any update to the scale factor is stored in the metadata file, which may change when reimporting and not retain the scaling changes we made.
  • Do any areas of the mesh not read as intended? Even though we made a stellar model sheet to work out any odd details before we began modeling, sometimes once the mesh is in-scene to scale among other assets, some areas and features may not read as well when in context. Rather than reviewing the asset in Scene View, we want to view it from Game View to get a better idea of how it will look to the user and avoid being overly critical of areas that may not even be noticeable when in use. Our goal is to achieve overall readability. If the asset doesn't quite seem to be what we intended, we need to note why and figure out ways to make corrections for our final pass.
  • Are the pivot points where they need to be and zeroed out? We took the time before exporting to ensure our pivot points were where they needed to be and locked in, so now we want to ensure this carried over properly in-engine. Before pondering what might have gone wrong, check to make sure Tool Handle mode is set to Pivot and not Center. As we double-check each mesh within the asset's hierarchy, we also want to verify that the transforms are zeroed out as well. This way if any part accidentally gets moved out of place, anyone can zero it out and it'll be right where it needs to be.

Checking minor details like this will ensure things are done right the first time and let us progress more quickly to other tasks. That said, we can't always expect perfection on the first try, which is why it's important to keep things rough at first, and then refine them as we go along.

Ideally, with all the precautions we took to ensure high quality, few if any alterations will be necessary after we've reviewed our work, and we can get cracking on a final pass to make it nice and shiny.

Polish It up and Put a Bow on It

This is the moment we've been working toward. First, we need to address any shortcomings we noted when reviewing in-scene, keeping in mind the techniques we used during our first pass. After we have made any alterations to the base form, we can start cleaning up the mesh and adding those sweet deets we've wanted since the beginning.

Here are some pointers to keep in mind as we’re cleaning up and polishing:

  • Be selective when adding subdivisions. When smoothing out a mesh, it's best to avoid just clicking the Smooth Tool button on the entire mesh, because doing so doubles the number of polygons. Yes, it will look nice and smooth, but at the cost of performance in the long run. Instead, I like to be more selective with the areas I add subdivisions to. Start with larger areas and smooth the edges to see if that does the trick. If not, select just the faces of the area that needs smoothing and use the Smooth Tool on them, which allows better control of the poly count.
  • Bevel hard edges to make them seem less sharp. This is similar to adding subdivisions to faces, but instead you’re adding to edges. This makes them appear softer where softening the edge doesn't quite do the trick (usually edges with faces at 135 degrees or less). It also has a nice effect when baking ambient occlusion. As with smoothing faces, we want to be selective when choosing which edges to bevel so as not to drastically increase poly count.

Example of beveled hard edges

  • Be mindful of edge flow and density. There's no need to have multiple edge loops on large flat surfaces. Minimize poly density when you’re able to as long as it does not disrupt edge flow. This will make UV unwrapping easier as well.

Example of minimized poly density

  • Don’t be afraid of triangles. There’s a common misconception with 3D modelers of various skill levels that a mesh needs to consist primarily of quads. Although we don’t want anything higher than a quad, there’s nothing wrong with tris. In fact, the majority of game engines, such as Unity3D* and Unreal*, will convert quads into tris for optimization.

Example of mesh made with triangles

  • Delete deformer history from time to time. The more we clean up our mesh, the more deformer history is saved, crowding our attribute editor and possibly affecting the use of other deformer tools. When we're happy with the results after using deformers such as bevel and smooth, we can delete deformer history by selecting the mesh and then going to Edit > Delete All by Type > History, or by pressing Alt+Shift+D (a maya.cmds version of this cleanup appears after the figure below). This keeps the attribute editor clean and prevents other deformer tools from misbehaving down the line.

Steps to delete deformer history
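
If you prefer to script these cleanup steps instead of clicking through the menus, Maya's Python command module can do the same thing. The sketch below is a minimal example that assumes it is run from Maya's Script Editor with the target meshes selected; it combines the history deletion described here with the freeze-transformations step from the export prep earlier.

    # Minimal maya.cmds sketch: delete construction/deformer history and freeze
    # transformations on the currently selected meshes. Assumes it runs inside
    # Maya's Script Editor with the target meshes selected.
    import maya.cmds as cmds

    for node in cmds.ls(selection=True, long=True):
        cmds.delete(node, constructionHistory=True)      # Edit > Delete All by Type > History
        cmds.makeIdentity(node, apply=True, translate=True,
                          rotate=True, scale=True)       # Freeze Transformations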

Although we normally want to aim for a low poly count for our assets, when creating assets for virtual reality (VR) we don't have that luxury. Keep in mind that because the user can get up close and personal with many of the assets in a VR environment, hard edges can look jarring, thus requiring slightly higher poly counts.

Now that our mesh is cleaned up and polished, it's time to move on to one of the more tedious parts of 3D modeling: UV unwrapping. Here are a few pointers to make the most of your UVs:

  • Start by planar mapping. Automatic UV unwrapping may seem like a good idea, and for simple meshes it can be, but for more complex meshes it ends up slicing UVs into smaller components than you want, and then you have to spend time stitching them together. On the other hand, planar mapping projects the UVs on a plane silhouetting the mesh similar to a top, side, or front view, and makes a single shell that you can break into components of your choosing. I find it best to choose the plane with the largest surface area when planar mapping.

Steps to set planar mapping

  • Cut the shell into manageable sections. After planar mapping, you can create seams that will break the shell into smaller pieces by selecting edges and using the Cut UV tool. This makes it easier to manage sections as opposed to trying to unfold a larger shell and having to spend time making minor adjustments. You can always sew shells together for fewer seams after the fact, saving time and frustration.
  • Utilize UV image and texture tools for less confusion. UVs can be confusing at times, because a shell may look the way you want but will be flipped, giving you undesired results. To ensure you know which way your UVs are facing, enable the Shade UVs tool (blue=normal, red=reversed). Another tool worth enabling is the Texture Borders toggle. This clearly defines the edges of your UV shells in the UV editor, as well as on the mesh in the main scene, making it easier to see where your UV seams are.

Example of Shade UVs tool usage

Example of Texture Borders tool enabled

  • Areas that will have more details should have a larger UV space. Although it's nice to have all the UVs at the same scale, often there will be areas in which we want more detail than in others. By giving the areas that require more detail more UV space, we can ensure those sections stay clear.
  • Thinking of UV unwrapping as putting together a puzzle can make the process seem more like a challenge and less like a dreaded chore.

Once we have our UVs laid out and reduced to as few materials as possible, we can export our mesh (using the same prep guidelines from the rough phase) and bring it into Substance Painter*.

Substance Painter is a great tool, because it gives you 3D navigation tools similar to that of many 3D modeling programs, layer and brush customizations of digital art programs, and the ability to paint on either the UV layout or mesh itself. I recommend starting with a few fill layers of the material of your choice to recreate the base materials from the rough-out phase. By using layer masks, we can quickly add or remove our selected materials per UV or UV shells. Custom brushes with height settings can add details such as mud, scratches, fabrics, and so on that can be baked into normal maps, adding a lot of life with a few simple strokes.

Before exporting our textures and materials, we need to do some prep work in order to get the most out of what Substance Painter has to offer:

  • Increase or decrease resolution. One of the advantages of Substance Painter is the quality with which it can increase or decrease resolution and go back again without forfeiting details. Not all assets need to be at a high resolution. If your asset reads well with little to no noise, pixelation, or distortion at a lower resolution, change it. With Substance Painter you can always go back up in resolution without losing the original amount of detail. If your asset is going to be used in VR, it's best to increase resolution to ensure all details are as crisp as possible.

Steps to increase resolution

  • Bake maps. This will make the most of any height details you create by baking them to a normal map, and ambient occlusion maps add subtle shadows that give assets that little extra pop to boost their readability. When baking maps, I usually set their resolution down a step from my material and texture maps as they tend to be subtler.

Steps to bake maps

  • Choose an export setting based on the designated engine to be used. Another great feature of Substance Painter is the export presets based on the various engines that can be used. This helps ensure you don’t get any strange effects when adding your maps to the asset in the engine.

Options to choose export settings

Watch exporting in painter video

We did it! We took the time to plan out our asset to its fullest, roughed out the major forms with functionality in mind, tested our asset in-engine, and detailed and cleaned up our asset in a final pass. Now we can hand off our hard work with the confidence that not only does our asset look great, but it’s also set up in a way that works efficiently and is easy to understand, so that someone down the pipeline can add interactivity and playability. And with all the time we saved and frustration we avoided, our level of creativity remains high for the next project.

Final image of the 3D figure

Intel® Parallel Computing Center at Carnegie Mellon University, Silicon Valley


Carnegie Mellon University

Principal Investigator

Dr. Ole J. Mengshoel is a Principal Systems Scientist in the Department of Electrical and Computer Engineering at CMU Silicon Valley. His current research focuses on: scalable computing in artificial intelligence and machine learning; machine learning and inference in Bayesian networks; stochastic optimization; and applications of artificial intelligence and machine learning. Dr. Mengshoel holds a Ph.D. in Computer Science from the University of Illinois, Urbana-Champaign. His undergraduate degree is in Computer Science from the Norwegian Institute of Technology, Norway. Prior to joining CMU, he held research and leadership positions at SINTEF, Rockwell, and USRA/RIACS at the NASA Ames Research Center.

Description

Scalability of artificial intelligence (AI) and machine learning (ML) algorithms, methods, and software has been an important research topic for some time. In ongoing and future work at CMU Silicon Valley, we take advantage of opportunities that have emerged from recent dramatic improvements in parallel and distributed hardware and software. With the availability of Big Data, powerful computing platforms ranging from small (smartphones, wearable computers, IoT devices) to large (elastic clouds, data centers, supercomputers), and a large and growing business on the Web, the importance and impact of scalability in AI and ML is only increasing. We will now discuss a few specific results and projects.

In the area of parallel and distributed algorithms, we have developed parallel algorithms and software for junction tree propagation, an algorithm that is a workhorse in commercial and open-source software for probabilistic graphical models. On the distributed front, we have developed, and are continuing to develop, MapReduce-based algorithms for speeding up learning of Bayesian networks from complete and incomplete data, and have experimentally demonstrated their benefits using Apache Hadoop* and Apache Spark*. Finally, we have an interest in matrix factorization (MF) for recommender systems on the Web, and have developed an incremental MF algorithm that can take advantage of Spark. Large-scale recommender systems, which are currently essential components of many Web sites, can benefit from this incremental method since it adapts more quickly to customer choices than traditional batch methods, while retaining high accuracy.

Caffe* is a deep learning framework originally developed at the Berkeley Vision and Learning Center. Recently, Caffe2*, a successor to Caffe, was officially released. Facebook has been the driving force in developing the open source Caffe2 framework. Caffe2 is a lightweight, modular, and scalable deep learning framework supported by several companies, including Intel. In our hands-on machine learning experience with Caffe2, we have found it to support rapid prototyping and experimentation, simple compilation, and better portability than earlier versions of Caffe.

We are experimenting with Intel’s PyLatte machine learning library, which is written in Python and is optimized for Intel CPUs. The goals of PyLatte include ease of programming, high productivity, high performance, and leveraging the power of CPUs. A CMU SV project has focused on implementing speech recognition and image classification models using PyLatte, using deep learning with neural networks. In speech recognition experiments, we have found PyLatte easy to use, with a flexible training step and short training time.

We look forward to continuing to develop parallel, distributed, and incremental algorithms for scalable intelligent models and systems as an Intel® Parallel Computing Center at CMU Silicon Valley. We create novel algorithms, models, and applications that utilize novel hardware and software computing platforms including multi- and many-core computers, cloud computing, MapReduce, Hadoop, and Spark.

Related websites:

http://sv.cmu.edu/directory/faculty-and-researchers-directory/faculty-and-researchers/mengshoel.html
https://users.ece.cmu.edu/~olem/omengshoel/Home.html
https://works.bepress.com/ole_mengshoel/


Raw Compute Power of New Intel® Core™ i9 Processor-based Systems Enables Extreme Megatasking


The earliest computers were often pushed to the limit performing even a single task, between hammering the hard drive, swapping memory frantically, and crunching through computations. With Microsoft Windows* 3.1 and then Windows 95, multi-tasking began to take form, as systems were finally able to handle more than one program at a time. Now, with the advent of double-digit cores in a single CPU, the concept of “megatasking” is gaining traction. The latest entries for the enthusiast are in the Intel® Core™ X-Series processor family, ranging from 4 to 18 cores. These Intel® Core™ i9 processors can simultaneously handle tasks that previously required multiple complete systems—enter extreme megatasking.

Consider the challenge of simultaneously playing, recording, and streaming a Virtual Reality (VR) game. Game studios rely on video trailers to spark interest in new VR titles, but showing off the experience of a 3D game in a 2D video has always been a challenge, as a simple recording of what the player sees offers only part of the story. One way to solve this – mixed reality – captures the player against a green screen, and then blends the perspectives into a third-person view of the player immersed in that world. (For more information about this technique, refer to this article.) This often requires one PC to play and capture the game, and another PC to acquire the camera feed with the gamer. Add the idea of streaming that complete session live to a global audience of expectant fans, and you could be looking at a third system for encoding the output into a high-quality uploadable format. But an Intel team recently demonstrated that production crews can now complete all of these CPU-intensive tasks on a single Intel® Core™ i9 processor-based system, with each engaged core chugging merrily along.

Moore’s Law and System Specs

When originally expressed by Intel co-founder Gordon Moore in 1965, “Moore’s Law” predicted that the number of transistors packed into an integrated circuit would repeatedly double approximately every two years (Figure 1). While transistor counts and frequencies have increased, raw compute power is now often measured in the number of cores available. Each core acts as a CPU and can be put to work on a different task, enabling better multi-tasking. But simple multi-tasking becomes extreme megatasking with simultaneous, compute-intensive, multi-threaded workloads aligned in purpose.

Figure 1. Moore's Law expresses the accelerating rate of change for technology (source: time.com)

The calculation originally used to measure supercomputer performance now applies to desktop gaming PCs: FLOPS, or FLoating point Operations Per Second. These measure arithmetic calculations on numbers with decimal points, which are harder to perform than operations on integers. The equation is:

FLOPS = (sockets) x (cores per socket) x (cycles per second) x (FLOPS per cycle)

Picture a single-socket CPU with six cores, running at 3.46 GHz, performing either 8 (single-precision) or 4 (double-precision) FLOPS per cycle. The result would be 166 gigaflops (single precision) and 83 gigaflops (double precision). By comparison, in 1976, the Cray-1 supercomputer performed just 160 megaflops. The new Intel® Core™ i9-7980XE Extreme Edition Processor runs at about 4.3 GHz (faster if overclocked) and thus should calculate to roughly 1.3 teraflops. For perspective, the world’s fastest supercomputer runs 10.65 million cores, performing at 124.5 petaflops. In 1961, a single gigaflop cost approximately USD 19 billion in hardware (around USD 145 billion today). By 2017, that cost had fallen to USD 30 million.
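
The arithmetic is easy to reproduce. The short Python sketch below is our own illustration (not from the article's sources); it simply evaluates the FLOPS equation for the six-core example above, and the helper function name is ours.

def peak_flops(sockets, cores_per_socket, cycles_per_second, flops_per_cycle):
    # FLOPS = (sockets) x (cores per socket) x (cycles per second) x (FLOPS per cycle)
    return sockets * cores_per_socket * cycles_per_second * flops_per_cycle

# Single-socket, six-core CPU at 3.46 GHz from the example above
print(peak_flops(1, 6, 3.46e9, 8) / 1e9)  # ~166 GFLOPS, single precision
print(peak_flops(1, 6, 3.46e9, 4) / 1e9)  # ~83 GFLOPS, double precision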

To achieve that raw compute power, the Intel® Core™ i9-7980XE Extreme Edition Processor uses several technology upgrades. With up to 68 PCIe* 3.0 lanes on the platform, gamers have the ability to expand their systems with fast Intel® Solid State Drives (Intel® SSDs), up to four discrete GFX cards, and ultrafast Thunderbolt™ 3 technology solutions. Updated Intel® Turbo Boost Max Technology 3.0 improves core performance. Intel® Smart Cache has a new power-saving feature that dynamically flushes memory based on demand. The Intel Core X-series processor family is also unlocked to provide additional headroom for overclockers. New features include the ability to overclock each core individually, Intel® Advanced Vector Extensions 512 (Intel® AVX-512) ratio controls for more stability, and VccU voltage control for extreme scenarios. Combined with tools like Intel® Extreme Tuning Utility (Intel® XTU) and Intel® Extreme Memory Profile (Intel® XMP), you have a powerful kit for maximizing performance.

Intel reports that content creators can expect up to 20 percent better performance for VR content creation, and up to 30 percent faster 4K video editing, over the previous generation of Intel® processors (see Figure 2). This means less time waiting, and more time designing new worlds and experiences. Gamers and enthusiasts will experience up to 30 percent faster extreme megatasking for gaming, over the previous generation.

Gregory Bryant, senior vice president and general manager of the Client Computing Group at Intel Corporation, told the 2017 Computex Taipei crowd that the new line of processors will unleash creative possibilities throughout the ecosystem. “Content creators can have fast image-rendering, video encoding, audio production, and real-time preview—all running in parallel seamlessly, so they spend less time waiting, and more time creating. Gamers can play their favorite game while they also stream, record and encode their gameplay, and share on social media—all while surrounded by multiple screens for a 12K experience with up to four discrete graphics cards.”

Figure 2. Intel® Core™ X-series processor family partial specifications.

Another way to measure system performance is through CPU utilization, which you can find in your own Microsoft Windows PC through Task Manager > Resource Monitor. Josh Bancroft, Intel Developer Relations Content Specialist working with the gaming and VR communities, was part of the Intel® Core™ Extreme Processors rollout at Computex Taipei in early 2017, and helped coin the term “extreme megatasking” in showing off CPU utilization. Bancroft used one of the new Core i9 X-Series processor-based PCs to show a green-screen VR mixed-reality demo, simultaneously playing a VR title at 90 fps, recording the game-play, compositing the player into the scene from a separate camera, and then combining and syncing the images precisely, and streaming the result live to Twitch*.

Later, Bancroft was part of the first Intel® Core™ i9 Extreme Processor rollout at E3 in Los Angeles, where he showed the same demo on a system with 18 cores. He still recalls that event fondly: “It was really exciting to do the world’s first public demo on an 18-core i9-based system. The case was gigantic, with two water loops with this blue, opaque fluid, and really cool-looking.”

The demo, hosted by Gregory Bryant, went off smoothly, but wasn’t without tension. “When you stack those 4 or 5 extreme tasks together, you can overload a system and bring it to its knees,” Bancroft explained. But the 18 cores performed flawlessly, with the CPU utilization graphs showing what was going on under the hood. “When we turned on the recording, when we turned on the streaming, when we did everything that cranked it up, you saw those 36 graphs jump up to 90-plus percent utilization. You could see all of those threads were working really hard.”

The demo illustrated Intel’s commitment to VR, PC gaming, and multi-core processing power in one neat package. Since VR requires enormous resources to run smoothly, it’s a perfect arena in which to demo new systems. Using Bancroft’s mixed-reality technique allows developers, streamers, and content creators to make trailers and show people a VR experience without actually having to put them in a headset. Best of all, one new system can replace the multiple devices previously required to pull it off.

Trailers are one of the most important tools in an indie developer’s marketing toolkit. Creating a compelling, enticing game trailer for VR is of vital importance to indies getting started on their own titles. However, the 3D experience of VR doesn’t translate well to a 2D trailer, which is where the mixed-reality technique comes in. Mixed-reality VR was pioneered by Vancouver, BC-based Northway Games*, run by husband-and-wife team Sarah and Colin Northway, who added enabling code in their Unity-based game Fantastic Contraption* (Figure 3). The ability to record what the gamer is seeing as they play, as well as how they would look in a third-person view, greatly helps market VR titles by communicating the experience. In addition, the Northways showed how entertaining their game was, by including shots of onlookers watching and laughing from a sofa.

Figure 3. Creating and streaming a mixed-reality trailer—like this one for Fantastic Contraption*—is now possible on a single PC.

Not Invented Here, Just Enhanced

Bancroft is quick to share the credit for his mixed-reality, single-machine demos, which he learned in a cramped studio, complete with scaffolding, lighting, a green screen, and multiple cameras. The Northways wrote a blog post that offered a step-by-step walkthrough of the tasks involved, and Bancroft relied on it heavily to get started. From there, he and his team came up with some additional tweaks, all developed and shared openly.

Many of the software programs require immense power; just playing a VR title for Oculus Rift* or HTC VIVE* at 90 fps is quite a task. At a lower frame-rate, players can experience dizziness, vomiting, and other physical reactions, so a machine has to start with the power to play a game properly, before engaging any more of a load.

For mixing and compositing, Bancroft is fond of MixCast*, a growing VR broadcast and presentation tool that simplifies the process of creating mixed-reality videos. Created by Blueprint Studios*—a Vancouver, BC-based leader in the interactive technology space—the tool enables dragging and dropping the MixCast VR SDK into Unity projects, so end-users can showcase their experience in real time.

In addition, Bancroft uses Open Broadcaster Software (OBS), a free and open source software program known to most streamers for compositing, recording, and live streaming. It offers high-performance, real-time audio- and video-capturing and mixing; video filters for image masking, color correction, and chroma keying; and supports streaming platforms such as Twitch*, Facebook*, and YouTube*.

Of course, there are multiple tools to create the same end result, but that’s the current software stack. A full description of Bancroft’s efforts can be found at <link to Mega-tasking step-by-step article>.

Jerry Makare is the Intel® Software TV video producer, and works closely with Josh Bancroft to create videos that test the raw-compute boundaries of extreme megatasking. He sees important benefits to using a single, powerful system for VR. “Being able to split our tasks into multiple places, especially rendering, is a big deal,” he said. “Once you start rendering, generally you end up killing your machine. There’s almost nothing else you can do. The ability for us to split these large, compute-intensive tasks like rendering and compositing into multiple buckets is a major time-saver.”

Makare is particularly eager to task an Intel® Core™ i9 processor-based system with building out a very large-scale room, using a 3-D modeling program to get a baseline for how much time it saves. He also looks forward to putting the new system to work on some real-world applications that his team can learn from.

Eye to the Future

With so much raw computing power now available, it’s exciting to think of the different ways in which these new systems could be used. Gamers can anticipate more vivid, immersive, and realistic experiences. Creating and editing video from raw, 4K footage was a complex, processing-intensive chore, but now professionals and novices alike can edit in native 4K, creating stunning visual effects, and compose music with more depth and nuance. The reach of VR extends beyond gaming into virtual walkthroughs, construction planning, city modeling, and countless simulation scenarios. Scientists in fields such as biology, geology, chemistry, medicine, and astronomy may unlock even more secrets, thanks to the raw computing power behind extreme megatasking.

Additional Resources

Getting Started with Intel® Context Sensing SDK for Linux* and Go*


Before you Begin

The Intel® Context Sensing SDK for Linux* is a Node.js*, Go*, and Python*-based framework supporting the collection, storage, sharing, analysis, and use of sensor information. 

This getting started guide contains steps to set up the broker and Go framework supported by the SDK, then run a sample provided in the SDK. 

Additionally, this document contains tutorials to create a simple provider, sample application using the provider, a microservice to run the application, and steps to run the microservice to publish events to the broker.

Every command or chunk of code can be copy-pasted directly from the document without any required modifications unless explicitly stated.

Requirements

Software

  • OS: Ubuntu* 14.04 or 16.04
  • Go: 1.8.3
  • Docker*: 17.0.3

Network

The document assumes Intel proxies are configured on the host machine. 
Verify that you have access to the URLs below:
  • hub.docker.intel.com
  • hub.docker.com

Getting Started

Setting up the Broker

There are two options to set up the broker:

  • Dockerized: Using the context repo from hub.docker.intel.com (preferred)
  • Non-dockerized: Using the context-broker-VERSION.tgz file 

This document only covers the preferred Dockerized method. 

The section assumes you have Docker already set up with Intel credentials. (Refer: Setting up Docker)

The broker requires a running instance of MongoDB*.

  • Use Docker to pull the mongo image onto your machine:
    docker pull mongo
  • Create a container named mymongodb and run it  for the very first time:
    docker run --name=mymongodb -d mongo

Note: For subsequent runs, use: docker start mymongodb

  • Pull the broker image:
    docker pull hub.docker.intel.com/context/context-broker:v0.10.5
  • Create a container named contextbroker and run it for the very first time:
    docker run --name contextbroker -it -p 8888:8888 --link mymongodb -e MONGODB_HOST=mymongodb hub.docker.intel.com/context/context-broker:v0.10.5

Note: For subsequent runs, use: docker start -i contextbroker
-i or -it is used to run in the foreground to see the output in the current terminal.

To stop the context broker instance, use CTRL+C to interrupt when running foreground or docker stop contextbroker when running in background.

In order to remove the container if it's preventing the use of Docker, use: docker rm -f contextbroker

Setting up the SDK for Go

If you haven’t set up the required Go environment on your machine, refer to Setting up the Go Environment.

Use the command go env to ensure both $GOPATH and $GOROOT are populated with paths for Go projects and Go distribution, respectively.

  • Download the Go X Net Package:
    go get golang.org/x/net
  • Download the Logrus* package:
    go get github.com/sirupsen/logrus

Note: In some cases you may encounter the error 'can't load package: package golang.org/x/net: no buildable Go source files in $GOPATH/src/golang.org/x/net'. Verify your setup by checking if $GOPATH/src/golang.org/x/net actually contains items from https://github.com/golang/net repo.

  • Copy the context_linux_go directory from the extracted release package to the $GOPATH/src directory.

Running an SDK Sample

Make sure a broker instance is running.

  • To run the local_ticktock sample, navigate to the $GOPATH/context_linux_go/samples/local_ticktock directory and enter: go run main.go

Note: All the providers and samples provided in the SDK can be found in the $GOPATH/context_linux_go/providers and $GOPATH/context_linux_go/samples directories respectively.

Tutorials

The tutorials showcase how to use the SDK to create a provider, a sample application that utilizes the provider, and a microservice that can run the sample application.
 

Creating a Simple Provider

Next, we'll create a provider that takes a time period in milliseconds as options and publishes the string “Hello World” to the broker at the supplied interval. 
  • Create a directory named simpleprovider in the $GOPATH/context_linux_go/providers directory. 
  • Create a file named simpleprovider.go inside the directory.
A provider must implement the functions and structs required by the context core. The steps below show the minimum required to create a basic provider; only the createItem function is specific to this tutorial.
In the following steps, you'll be adding lines of code to the simpleprovider.go file:
  1. Encapsulate all the contents of the provider in a package:
    package simpleprovider
  2. Import the required packages, time and core:
     import (
          "context_linux_go/core"
          "time"
     )
  3. Declare a constant identifier that other providers can use to identify the data coming from our simpleprovider:
    const (
         // SimpleProviderType is the URN for a data from this provider
         SimpleProviderType string = "urn:x-intel:context:thing:simpleprovider"
    )
  4. Define a schema to register with the broker. 
    This will enable the broker to identify the unique identifier and perform necessary schema validation:
     // SimpleProviderSchema is the schema satisfied by this provider; the value is placed in the "data" field
    var SimpleProviderSchema = core.JSONSchema{
         "type": SimpleProviderType,
         "schema": core.JSONSchema{
              "type": "object",
              "properties": core.JSONSchema{
                   "data ": core.JSONSchema{
                        "type": "string",
                   },
              },
         },
         "descriptions": core.JSONSchema{
              "en": core.JSONSchema{
                   "documentation": "Simple string producer",
                   "short_name":    "SimpleString",
              },
         },
    }
  5. Define a struct that holds an instance of the provider. 
    We will use the stopChan variable to start/stop the provider and also provide a reference to the additional options that the provider can accept: 
    // Provider holds an instance of the simple provider. 
    // Methods defined for this type must implement core.ProviderInterface
    type Provider struct {
         ticker   *time.Ticker
         stopChan chan bool
         options  *Options
    }
  6. Define the options for this provider. 
    We will supply the time interval after which the string should be published:
    // Options that are provider specific
    type Options struct {
         core.ProviderOptions
         Period int // Period of ticking in milliseconds
    }
  7. We can supply multiple URN identifiers in a single provider. Define a static function to return all the types supported in this provider:
    // Types is a static function that returns the types this Provider supports (URN and schema)
    func Types() []core.ProviderType {
         return []core.ProviderType{
              core.ProviderType{URN: SimpleProviderType, Schema: SimpleProviderSchema}}
    }
  8. Define a function that can return the Types supported:
    // Types is a provider specific function that queries the type of an ProviderInterface instance
    func (p *Provider) Types() []core.ProviderType {
         return Types()
    }
  9. Define the New function, which can set options called from our sample:
    // New creates a new simpleprovider.Provider with the specified options
    func New(options *Options) *Provider {
         var dp Provider
         dp.options = options
         dp.stopChan = make(chan bool)
         return &dp
    }
  10. Implement the Start function. In this step, we'll supply the ItemData to publish and also decide when to publish with the help of the ticker:
    // Start begins producing events on the item and error channels
    func (p *Provider) Start(onItem core.ProviderItemChannel, onErr core.ErrorChannel) {
         p.ticker = time.NewTicker(time.Millisecond * time.Duration(p.options.Period))
         go func() {
              for {
                   select {
                   case <-p.ticker.C:
                        onItem <- p.createItem()
                   case <-p.stopChan:
                        close(onItem)
                        close(onErr)
                        return
                   }
              }
         }()
    }
  11. Implement the createItem function. This function populates the ItemData with our string:
    // Generates a new simple provider item
    func (p *Provider) createItem() *core.ItemData {
         var item = core.ItemData{
              Type: SimpleProviderType,
              // Value map must match the schema
              Value: map[string]interface{}{"data": "Hello World"},
         }
         return &item
    }
  12. Implement the GetItem function:
    // GetItem returns a new simple provider item. Returns nil if itemType is not recognized
    func (p *Provider) GetItem(itemType string) *core.ItemData {
         if itemType != SimpleProviderType {
              return nil
         }
         return p.createItem()
    }
  13. Implement the Stop function to stop producing items:
    func (p *Provider) Stop() {
         p.stopChan <- true
         if p.ticker != nil {
              p.ticker.Stop()
          }
    }
  14. We must implement the GetOptions function to return a pointer to ProviderOptions in the Sensing core:
    // GetOptions returns a pointer to the core options for use within the Sensing core
    func (p *Provider) GetOptions() *core.ProviderOptions {
         return &p.options.ProviderOptions
    }

Creating a Sample Application Utilizing a Provider

A basic sample application utilizing a provider requires creating channels for onStart, onError and onItem to interface with the provider. Additionally, the Sensing API takes options, onStart, and onError as input. We can also supply options required as input for the provider itself.
  • Create a directory named simpleProviderSample in the $GOPATH/context_linux_go/samples directory
  • Create a file named main.go inside the directory. 

In the following steps, you'll be adding code to the main.go file:

  1. Encapsulate all the contents of our sample in a package:
    package main
  2. Import the required package: core, sensing, our simpleprovider, and fmt (to print to the terminal):
    import (
    	"context_linux_go/core""context_linux_go/core/sensing""context_linux_go/providers/simpleprovider""fmt"
    )
  3. Implement the main function. We will supply the channels for onStart, onError, and onItem from the context core:
    func main() {
    	onStart := make(core.SensingStartedChannel, 5)
    	onError := make(core.ErrorChannel, 5)
    	onItem := make(core.ProviderItemChannel, 5)
  4. Supply the options for the sensing core in the main function, such as the broker IP address and port, an indicator to publish to the broker, the name of our sample application, onStart, and onError:
    	options := core.SensingOptions{
    		Server:      "localhost:8888",
    		Publish:     true,
    		Application: "go_simpleprovider_application",
    		OnStarted:   onStart,
    		OnError:     onError,
    	}
  5. Create a new instance of Sensing and provide the sensing options in the main function:
    	sensing := sensing.NewSensing()
    	sensing.Start(options)
  6. Create an instance of the simpleprovider and supply the time period in the provider options in the main function:
    	spProvider := simpleprovider.New(&simpleprovider.Options{Period: 1000, ProviderOptions: core.ProviderOptions{Publish: true}})

    Note: The above line is a single line of code.

  7. Enable sensing and provide a reference to our provider instance. In this example, we'll print the URN type and actual data every time our provider generates ItemData. We'll stop our provider if any error is detected.
    	for {
    		select {
    		case <-onStart:
    			fmt.Println("Started sensing")
    			sensing.EnableSensing(spProvider, onItem, onError)
    		case item := <-onItem:
    			fmt.Println(item.Type, item.Value)
    		case err := <-onError:
    			fmt.Println("Error", err)
    			sensing.Stop()
    			return
    		}
    	}
    } //end of main function
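
Note: As with the local_ticktock sample, you can also run this sample directly (with a broker instance running) by navigating to the $GOPATH/context_linux_go/samples/simpleProviderSample directory and entering: go run main.go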

Creating a Microservice

We can encapsulate an application and other dependencies inside a single service using Docker.

Dockerizing our application helps to secure the implementation (source code) and dynamically configure connections to other services, such as the broker, without modifying the source code on host machines.

  1. Create a file named SimpleProviderDockerfile in the $GOPATH/context_linux_go directory. You'll be editing this file in the steps below.

    Note: There is no extension in the name of the file.

  2. Provide the dependencies required by the SDK, as well as the Intel proxy information:
    FROM golang:1.8.3-alpine3.5
    
    RUN mkdir /app
    ADD ./samples /app/
    ADD . /go/src/context_linux_go/
    
    ENV http_proxy=http://proxy-chain.intel.com:911
    ENV https_proxy=http://proxy-chain.intel.com:912
    
    RUN apk add --no-cache git \
        && go get golang.org/x/net/websocket \
        && go get github.com/sirupsen/logrus \
        && apk del git
    
    WORKDIR /app/.
    
  3. Provide a name (simple_provider_client) and a path to our sample application (simpleProviderSample/main.go), then run the sample application:
    RUN go build -o simple_provider_client simpleProviderSample/main.go
    
    CMD ["./simple_provider_client"]
    

Running Your Microservice

Ensure the broker is running on your machine (Refer: Setting up the Broker).

  1. Build the image locally with a tag using the Dockerfile.
    docker build --tag smp:latest -f SimpleProviderDockerfile . 

    Note: The DOT at the end is required in the above command.

  2. Create a container named smp, tagged as latest. Run the container for the very first time:
    docker run --name=smp --network host -e http_proxy="" -e https_proxy="" smp:latest

    Note: For subsequent runs use: docker start -i smp
    -i or -it is used to run in the foreground to see the output in the current terminal.

To stop the smp instance, use CTRL+C to interrupt when running in the foreground, or docker stop smp when running in the background. In order to remove the container if it's preventing the use of Docker, use: docker rm -f smp

Miscellaneous

This section contains topics that are out of the scope of this document but may be listed in the requirements.

Setting up Docker

If an install script was provided with this document, simply run it in the terminal: ./install_docker.sh. If not, complete the steps below to successfully install Docker:

  1. Follow the Docker manual installation instructions: https://docs.docker.com/engine/installation/linux/ubuntu/#install-using-the-repository
  2. If you are behind a corporate proxy, you may need to set Docker's proxy and DNS settings: Proxy Instructions
  3. Determine your host machine's DNS servers:
    nmcli dev show | grep 'IP4.DNS'
  4. Set up daemon.json with the 'dns' key and your DNS addresses:
    Example: { "dns" : [ "10.0.0.2" , "8.8.8.8" ] }
  5. Add your user to the docker group:
    sudo groupadd docker
    sudo gpasswd -a ${USER} docker
    sudo service docker restart
    newgrp docker
  6. Make sure you have access to hub.docker.intel.com by trying to log in to the web portal: https://hub.docker.intel.com
  7. Associate Docker on your machine with your user account:
    docker login hub.docker.intel.com

Setting up the Go Environment

  1. Fetch the Golang distribution package:
    wget -c https://storage.googleapis.com/golang/go1.8.3.linux-amd64.tar.gz
  2. Extract the contents:
    sudo tar -C /usr/local -xvzf go1.8.3.linux-amd64.tar.gz
  3. Append the line below to your .bashrc file, usually located at $HOME/.bashrc:
    export PATH=$PATH:/usr/local/go/bin
  4. Apply the changes to the current session:
    source ~/.bashrc

Accessing Go Documentation in your Browser

Access the Go documentation for the SDK from your browser to view additional API information and samples that are not demonstrated in this document.

  1. Navigate to $GOPATH/context_linux_go and enter:
    godoc -http=:6060
  2. In a web browser, enter the URL:
    http://localhost:6060/
  3. Click on Packages from the menu on top of the webpage.

You should now be able to view the documentation contents of context_linux_go under the standard packages section of the webpage.

The Intel® Context Sensing SDK REST API


This document explains the different REST API commands supported by the Intel® Context Sensing SDK.

Prerequisites

Software

  1. OS: Ubuntu* 14.04 or 16.04 or Windows® 10
  2. Linux* terminal / Windows Command line
  3. Curl*: Refer to Install Curl
  4. Postman*: (optional third-party tool): https://www.getpostman.com/

Install Curl

We use curl to run the REST API commands, both on Linux and Windows.

You can download curl from: https://curl.haxx.se/dlwiz/. Alternatively, you can install Postman as a Chrome* browser extension and use that instead.

Generic Curl Options

Generic curl options are explained below for reference.

Curl Command Option                     Action
-v --noproxy '*'                        Ignore all proxy settings.
-X GET                                  Use the GET type of REST call. You can replace GET with POST, PUT, or other REST calls.
-H "Authorization: Bearer none"         Authorization parameters that the broker expects.
-H "Content-Type: application/json"     The content type.
-d '{ --some JSON object-- }'           The JSON body you send along with a POST/PUT type of message.
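
If you prefer to script these calls rather than type them into a terminal, the same requests can be issued from Python. The sketch below is our own addition (it is not part of the SDK or of this document's samples); it mirrors the GetStates command shown later, and assumes the requests package is installed and the broker from the Start the Broker section is listening on localhost:8888.

import requests

# Minimal sketch: issue the GetStates call from this document with Python's
# requests library instead of curl.
session = requests.Session()
session.trust_env = False  # ignore proxy environment variables, like curl's --noproxy '*'

headers = {
    "Authorization": "Bearer none",      # authorization header the broker expects
    "Content-Type": "application/json",
}

response = session.get("http://localhost:8888/context/v1/states", headers=headers)
print(response.status_code)  # 200 when the broker is reachable
print(response.json())       # the "states" JSON document shown in the GetStates section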

Start the Broker

  1. Make sure you have the broker set up. For steps, see Setting up the Broker.
  2. This section assumes you have Docker* already set up with Intel credentials. (Refer: Setting Up Docker)
  3. Make sure you have a running instance of MongoDB*, which is required by the broker.

Start the Terminal

  1. Start another terminal (or command line in Windows), separate from the terminal that the broker is running on.

Rest APIs

The code sections below illustrate terminal commands or JSON input/output.

Note: Commands are shown like this: command. Enter commands in a terminal window, and press Enter to run them.

  1. Get States
  2. Get Item
  3. Push States
  4. Send Command

1. GetStates

GetStates returns the current state of the Bus, which is all the last known data for all endpoints and all types within that endpoint seen by the broker.

Method    Resource    Filter    Description
GET       /states     none      Returns all the states without any filter applied.

Actual Command Line

curl -v --noproxy '*' -X GET -H "Authorization: Bearer none" http://localhost:8888/context/v1/states

Result with Response Code: 200 OK

If everything goes well, you will get the following result on the terminal that runs the broker:

no authorization

If the broker database is empty, you will receive a JSON response on your REST API terminal, containing an empty list (example below). If you want to push some states and populate this data, see: Push States.

{
    "data": {
        "kind": "states",
        "items": []
    }
}

On the other hand, if you have any data populated, you will get a JSON response that has the last received/forwarded data of all the types of every endpoint. It will look something like the following:

{
  "data": {
    "kind": "states",
    "items": [
      {
        "value": {
          "datetime": "2017-09-21T23:56:40.948Z"
        },
        "dateTime": "2017-09-21T23:56:40.948Z",
        "type": "urn:x-intel:context:thing:ticktock",
        "owner": {
          "device": {
            "id": "08:00:27:a2:a9:32:sample_ticktock",
            "runtime": null,
            "name": null
          },
          "user": {
            "id": "5514838787b1784b6b6f9e9a",
            "name": null
          },
          "application": {
            "id": "2dcdg777z7uan4tbmbch3rvd",
            "name": null
          }
        }
      },      
      {
        "value": {
          "sent_on": 1509737046088,
          "motion_detected": false,
          "device_id": "RSPSensor8"
        },
        "type": "urn:x-intel:context:retailsensingplatform:motion",
        "dateTime": "2017-11-03T19:24:06.098Z",
        "owner": {
          "device": {
            "runtime": null,
            "* Connection #0 to host localhost left intact id": "0a:00:27:00:00:08:serviceMotionSensor",
            "name": null
          },
          "user": {
            "id": "5514838787b1784b6b6f9e9a",
            "name": null
          },
          "application": {
            "id": "2dcdg777z7uan4tbmbch3rvd",
            "name": null
          }
        }
      }
    ]
  }
}

2. GetItem

GetItem returns data from a specific Item over a period of time.

Method    Resource    Filter    Description
GET       /items      none      Returns all the items.

Actual Command Line

curl -v --noproxy '*' -X GET -H "Authorization: Bearer none" http://localhost:8888/context/v1/items

Result with Response Code: 200 OK

If everything goes well, you will get a result on the terminal that runs the broker:

no authorization

On your Rest API Terminal, you will get a JSON response that contains the items:

{
    "data": {
        "kind": "items",
        "items": []
    }
}

3. PushStates

Allows you to push state/data to the broker. Make sure the state pushed complies with the registered JSON Schema.

Method    Resource    Filter    Description
PUT       /states     none      Pushes the current state/data to the broker.

In this example, we want to push the following:

{
  "states": [
    {
      "type": "urn:x-intel:context:type:media:audio",
      "activity": "urn:activity:listening",
      "value": {
        "type": "song",
        "title": "Very interesting Song",
        "description": "Song by Metallica on Garage Inc.",
        "genre": [ "metal" ],
        "language": "eng",
        "author": "Metallica"
      },
      "dateTime": "2013-04-29T16:01:00+00:00"
    }
  ],
  "owner": {
    "device": {
      "id": "c2f6a5c0-b0f0-11e2-9e96-0800200c9a66"
    }
  }
}

Actual Command Line

curl -v --noproxy '*' -X PUT -H "Authorization: Bearer none" -H "Content-Type: application/json" http://localhost:8888/context/v1/states -d '{ "states":[{"type":"urn:x-intel:context:type:media:audio","activity":"urn:activity:listening","value":{"type":"song","title":"Very interesting Song","description":"Song by Metallica on Garage Inc.","genre":["metal"],"language":"eng","author":"Metallica"},"dateTime":"2013-04-29T16:01:00+00:00"}],"owner":{"device":{"id":"c2f6a5c0-b0f0-11e2-9e96-0800200c9a66"}} }'

Result with Response Code: 204 No Content

If everything goes well, you will get result 204 with no content. Neither the terminal running the broker nor your REST API terminal will display any output.

4. SendCommand

SendCommand will send a command to be executed by the broker or pass it along to be executed by an endpoint and pass the endpoint's result back to the calling service.

An example transaction: Endpoint1 <--------> Broker <---------> Endpoint2 with function retrieveitem() called by URN, such as urn:x-intel:context:command:getitem

Method    Resource    Filter    Description
POST      /command    none      Returns the result of the command executed.

Actual Command Line

curl -v --noproxy '*' -X POST -H "Authorization: Bearer none" -H "Content-Type: application/json" http://localhost:8888/context/v1/command -d '{ "method": "urn:x-intel:context:command:getitem", "endpoint": { "macaddress": "0:0:0:0:0:0", "application": "sensing"}, "params": ["0:0:0:0:0:0:sensing", "urn:x-intel:context:type:devicediscovery"]}'

The -d option is used to send the JSON body. Refer to Generic Curl Options. The current body represents the following:

  • "method": "urn:x-intel:context:command:getitem": is the URN exposing the function retrieveitem() from endpoint2.
  • "endpoint": is the MAC address and application name of the service that needs to execute the method mentioned above (such as endpoint2).
    Note: A broker can also serve as an endpoint, with MAC address 0:0:0:0:0:0 and application name sensing. One may also specify a different MAC address and application name for an endpoint2 in the system mentioned above.
  • params: The arguments that are expected by the function in endpoint2, such as retrieveitem(). In the following example, we expect two arguments: 0:0:0:0:0:0:sensing and urn:x-intel:context:type:devicediscovery. This is function-specific.

Note: You can send variations of this command by changing the endpoint, the URN for the function (method) that the endpoint supports, and the parameters that the function expects.

{
	"method": "urn:x-intel:context:command:getitem",
	"endpoint": {
		"macaddress": "0:0:0:0:0:0",
		"application": "sensing"
	},
	"params": [
		"0:0:0:0:0:0:sensing",
		"urn:x-intel:context:type:devicediscovery"
	]
}

Result with Response Code: 200 OK

If everything goes well, you will get a result on the terminal that runs the broker:

no authorization

On your Rest API terminal, you will get a JSON response that has a list of all devices registered and active with the broker.

{
  "result": {
    "body": {
      "type": "urn:x-intel:context:type:devicediscovery",
      "value": {
        "devices": [          
        ]
      }
    },
    "response_code": 200
  }
}

Setting up Docker*

If an install script was provided with this document, simply run it in the terminal: ./install_docker.sh

If not, below are steps that need to be completed to successfully install Docker:

  1. Docker manual installation instructions: https://docs.docker.com/engine/installation/linux/ubuntu/#install-using-the-repository
  2. Determine your host machine's DNS servers: nmcli dev show | grep 'IP4.DNS'
  3. Set up daemon.json with the 'dns' key and your DNS addresses. Example: { "dns" : [ "10.0.0.2" , "8.8.8.8" ] }
  4. Add your user to the docker group:
    sudo groupadd docker
    sudo gpasswd -a ${USER} docker
    sudo service docker restart
    newgrp docker
  5. Make sure you have access to hub.docker.intel.com by trying to log in to the web portal: https://hub.docker.intel.com
  6. Associate Docker on your machine with your user account: docker login hub.docker.intel.com

Setting up the Broker

There are two options to set up the broker:

  • Dockerized: Context repo from hub.docker.intel.com(preferred)
  • Non-dockerized: context-broker-VERSION.tgz file

Notes

  • This document only covers the preferred Dockerized method.
  • The section assumes you have Docker already set up with Intel credentials. (Refer: Setting Up Docker)
  • The broker requires a running instance of MongoDB.

1. Use Docker to pull the Mongo* image onto your machine.

docker pull mongo

2. Create a container named mymongodb and run it for the very first time.

docker run --name=mymongodb -d mongo

Note: For subsequent runs use: docker start mymongodb

3. Pull the Context Linux Broker image.

docker pull hub.docker.intel.com/context/context-broker:v0.10.5

4. Create a container named contextbroker and run it for the very first time.

docker run --name contextbroker -it -p 8888:8888 --link mymongodb -e MONGODB_HOST=mymongodb hub.docker.intel.com/context/context-broker:v0.10.5

Notes

  1. For subsequent runs, use: docker start -i contextbroker
    -i
    or -it is used to run in the foreground and to see the output in the current terminal.
  2. To stop the context broker instance, use CTRL+C to interrupt when running in the foreground or docker stop contextbroker when running in the background.
  3. In order to remove the container if it's preventing the use of Docker, use: docker rm -f contextbroker

If you have access to our GitHub* repository (non-Dockerized)

1. Go into /broker

python3 runserver.py

If everything goes well, you will get a result on the terminal that runs the broker:

Listening of Http(s)
8888

Levaux’s SenseAgent* Delivers End-to-End IoT Insight for Smart Building Management


A flexible sensor-based solution simplifies modernization for commercial real estate

"Our mission is to get these dense mesh networks into commercial buildings and high-rise residential towers—to provision the lighting and emergency management and support tracking."

—Dr. Simon Benson, CEO and founder, Levaux

 

Executive Summary

Commercial buildings face complex infrastructure and operational challenges to deploying IoT technology. With the comprehensive visibility provided by the Levaux SenseAgent* solution powered by Intel® architecture, buildings can be optimised on a per room basis, creating operational and cost efficiencies. High-resolution data can translate into capital savings, extensive capabilities, and improved occupant experiences—all for the price of lighting.

Challenges

The benefits for modernized building management systems (BMS) based on connected infrastructure are many, but attaining these is challenging for both brownfield and greenfield buildings in the IoT era.

Technology in older buildings tends to be more difficult to work with and prohibitively expensive to change. Most BMS were installed at the time of building construction and are rarely, if ever, upgraded. These BMS rely on cables, rather than the more flexible Wi-Fi or mesh networks, and wireless infrastructure is limited. Systems lack common protocols and interoperability, preventing holistic visibility into building operations. Hardwired equipment requires maintenance and programmatic upgrades to be handled manually—a labor- and cost-intensive process. Compliance management is also manual, with time-consuming testing of all emergency equipment conducted every three to twelve months.

Sensor density tends to be light, resulting in optimization based on data from limited areas of the building. Overall, older buildings are simply not getting enough data—or timely access to relevant reportage—to maximize efficiency and deliver good occupant experiences. In sum, brownfield buildings do not have the capabilities or flexibility to adjust systems based on changing occupancy and conditions, or to meet modern standards for wellness and productivity in commercial buildings.

Greenfield buildings may circumvent many of these obstacles, but they also face issues. These include the challenge of integration across often incompatible systems from multiple vendors and protocols, securing connected equipment from cyberthreats, and gathering and analyzing relevant data quickly and economically. Building management is often decentralized, prohibiting the savings of centralized, remote management.

How can all types of buildings get the advantages of IoT and establish a scalable foundation for the future? A streamlined smart sensor solution from Levaux running on an Intel® architecture-based IoT gateway is delivering the value of edge-to-cloud insight for smart building management.

Solution

Drawing on deep expertise in technology and management of complex systems from its work in the military and communications industries, Levaux has created a smart building solution designed to support both brownfield and greenfield venues. Levaux’s SenseAgent is an innovative solution that manages remote sensors monitoring a wide spectrum of key building functionality. This purpose-built solution for the building industry combines hardware, middleware, and cloud software and creates a fast, reliable connection from physical environment to sensor. Paired with cloud, the solution requires no training and simplifies BMS.

A wireless sensor performs five smart core capabilities— lighting, safety, climate, security, and utilization. Each sensor gathers ten different metrics, tracking building variables such as occupancy, movement, ambient light, humidity, and temperature. The sensor is an integrated, architecturally designed ceiling fitting and eliminates the need to use numerous vendors to achieve the same breadth of functionality.

Figure 1. Levaux’s SenseAgent* covers five core capabilities for smart buildings in a single sensor

The high-density sensors can replace existing lighting controls at a cost equal to or less than most common lighting controllers and at a fraction of the capital cost of a traditional cabled lighting control solution.

Sensor data is aggregated, filtered, and processed by an Intel®-based IoT gateway, allowing for edge analytics, alerts, and notifications. Data needed for more in-depth or historical analysis is automatically sent to the cloud. The gateway’s powerful Intel® processor and storage capacity also support data backup, so building managers can safeguard and quickly access their IP. Machine learning provides recommendations based on usage patterns, automating key optimization functions. And, because Intel architecture enables parsing data needed for immediate action or longer-term analysis, the cost of transmitting all data to the cloud is reduced. The IoT gateway connects to the building’s backend and seamlessly interfaces with traditional BMS.

Figure 2. SenseAgent* combined with Intel® architecture provides effective, robust building management

The intuitive SenseAgent interface simplifies data analysis and changes. Levaux’s sensors can be programmed remotely, allowing the solution to evolve to meet changing building management requirements and opportunities.

SenseAgent’s five capabilities automate service delivery and increase visibility and control:

  • Lighting / Sustainability: Efficiently manage energy consumption
  • Safety / Compliance: Proactive procedures to ensure safety of people and assets
  • Climate / Control: Predictive analytics for smart climate control
  • Security / Tracking: Integrated security tracking of people and assets
  • Utilisation / Productivity: Efficient utilisation of property and increased productivity of people

SenseAgent Benefits for Commercial Real Estate

  • Plug and play: Lighting control sensors are installed directly inline between the ceiling and luminaires. They provide power to the luminaire and lighting control via common lighting control protocols (DALI, PWM, 0–10 volt). Sensors communicate using a wireless mesh—so there is no requirement for expensive communication cabling and network infrastructure.
  • Support multiple applications and metrics: Replace existing lighting control solutions, while offering additional functionality to cover a range of applications and metrics across building operations.
  • Ease of use: The SenseAgent cloud application is a web-based application that can be accessed from any device anywhere in the world. It has been designed and built using modern user-interface guidelines and technologies to be easy to use with minimal training.
  • Wireless infrastructure: SenseAgent deployments provide buildings with dense wireless mesh networks. A routed Bluetooth* mesh network with large bandwidth capability enables building assets to exploit future applications.
  • Improve operations: Creates intelligent, more sustainable environments, minimising energy consumption and supporting increased productivity

How It Works in Brief

The end-to-end engineered SenseAgent IoT solution couples electronic hardware with purpose-designed middleware and a cloud interface for a fully vertically integrated out-of-the-box system. The wireless building sensor system is designed to perform immediately upon deployment in brownfield or greenfield sites.

The solution was created with robust middleware using an object-oriented remote procedure call (RPC) architecture implemented in pure C++ from the ground up, in conjunction with embedded hardware and an optimised mesh network stack. This supports smart connected systems with very low latency.

Designed from the start to be an enterprise-class, secure, scalable, and reliable IoT communication system that does not sacrifice speed, the solution avoids lightweight communication protocols and messaging queues.

Sensors perform autonomously without the need for constant cloud connectivity, using weekly operational schedules and programs. The sensors sample data every second, sharing it directly peer-to-peer over the mesh network for local processing to control electrical equipment using logic-based decision-making.

Embedded hardware running unique dynamic firmware loads operational schedule profiles from the cloud. Dynamic firmware accepts schedule updates over the air, allowing the sensor system to adapt to changes in functional purpose and making it more suitable for edge computing and process optimisation.

Machine learning is made possible by a mesh network of sensors managed by multipoint Intel-based IoT gateways. The gateways supervise the commands on the network sensors and the flow of data to the cloud. Data that is stored and processed at the edge applies machine learning to generate knowledge to optimise sensor behaviour.

The lighting-based sensors create a wireless mesh network on the building ceiling and use existing power supplies. New applications can be added or built on top of the mesh network over time. The solution is designed to scale and evolve with building management needs and to help future-proof investments.

Figure 3. SenseAgent* innovative architecture simplifies smart building modernization

The Foundation for IoT

The Levaux solution is just one example of how Intel works closely with the IoT ecosystem to help enable smart IoT solutions based on standardized, scalable, reliable Intel® architecture and software. These solutions range from sensors and gateways to server and cloud technologies to data analytics algorithms and applications. Intel provides essential end-to-end capabilities—performance, manageability, connectivity, analytics, and advanced security—to help accelerate innovation and increase revenue for enterprises, service providers, and the building industry.

Conclusion

With the Levaux SenseAgent and Intel-based IoT gateway, building managers have the insight to maximize efficiency, proactively address maintenance, optimize environments, and improve occupant wellness and productivity. By simplifying IoT integration with an affordable, connected, end-to-end solution, Levaux and Intel are enabling the considerable advantages of making buildings smarter.

Learn More

For more information about Levaux, please visit senseagent.com or contact us at support@senseagent.com.

For more information about Intel® IoT Technology and the Intel IoT Solutions Alliance, please visit intel.com/iot.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer, or learn more at intel.com/iot. Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others

Performance Optimization of Intel® Xeon® Processor Using Intel® Data Analytics Acceleration Library


Abstract

This article provides a comparative study of Naive Bayes performance on the Intel® Xeon® Gold processor using three implementations: the Naive Bayes algorithm from the textbook Artificial Intelligence: A Modern Approach (AIMA) by Stuart Russell and Peter Norvig, scikit-learn* (SkLearn), and the PyDAAL programming interface, in order to show the advantage of the Intel® Data Analytics Acceleration Library (Intel® DAAL). The accuracy of these Naive Bayes classifiers on the Intel® Xeon® processor was calculated and compared. We observed that the performance of Naive Bayes is considerably better with PyDAAL (multinomial) than with SkLearn and AIMA, and that SkLearn performs better than AIMA.

Test and System Configuration 

Environment setup

We used the following environment setup to run the code and determine the test processor performance.

Processor: Intel® Xeon® Gold 6128 processor 3.40 GHz
System: CentOS* (7.4.1708)
Cores: 24
Storage (RAM): 92 GB
Python* Version: 3.6.2
PyDAAL Version: 2018.0.0.20170814

Test setup

We used the following conventions and methods to perform the test and compare the values:

  • To run the Naive Bayes classifier from PyDAAL, we used the Conda* virtual environment.
  • The Naive Bayes classifier described in AIMA is available in the learning_apps.ipynb file from the GitHub* code.
  • Calculated average execution time and accuracy of learning_apps.ipynb (converted to .py) with Naive Bayes learner from AIMA.
  • Calculated average execution time and accuracy of learning_apps.ipynb (converted to .py) with Naive Bayes classifier from SkLearn and PyDAAL.
  • To calculate the average execution time, Linux* time command is used:
    • Example: time (cmd="python learning_apps.py"; for i in $(seq 10); do $cmd; done)
    • Average execution time = time/10.
  • To calculate accuracy, the accuracy_score method in SkLearn is used in all cases (see the sketch after this list).
  • Performance gain percentage = ((AIMA - PyDAAL)/AIMA) × 100 or ((SkLearn - PyDAAL)/SkLearn) × 100.
  • Performance Improvement (x) = AIMA(s)/PyDAAL(s) or Sklearn (s)/PyDAAL(s).
  • The higher the value of the performance gain percentage, the better the performance of PyDAAL.
  • Performance improvement (x) value greater than 1 indicates better performance for PyDAAL.
  • Only the Naive Bayes part of the learning_apps.ipynb file is compared.
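
The sketch below shows how those metrics were derived in principle. It is a minimal illustration, assuming the predicted and true test labels are already available as NumPy arrays, and it uses made-up timing values rather than the measured ones:

import numpy as np
from sklearn.metrics import accuracy_score

# Placeholders for the true test labels and a classifier's predictions.
y_true = np.array([3, 8, 6, 9, 6])
y_pred = np.array([3, 8, 6, 9, 5])

# Accuracy is computed the same way for the AIMA, SkLearn, and PyDAAL outputs.
print("accuracy:", accuracy_score(y_true, y_pred))

# Performance gain and improvement from wall-clock times in seconds (example values only).
t_aima, t_pydaal = 1000.0, 100.0
gain_pct = (t_aima - t_pydaal) / t_aima * 100
improvement_x = t_aima / t_pydaal
print("gain: %.1f%%, improvement: %.1fx" % (gain_pct, improvement_x))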

Code and conditional probability

The Naive Bayes learner part of the code given in AIMA was compared to the corresponding implementation from SkLearn (Gaussian and multinomial) and PyDAAL (multinomial). The following are the relevant code samples:

AIMA
from learning import *    # AIMA code: DataSet, NaiveBayesLearner, manhattan_distance
import numpy as np

# Append the labels as the last column of the training examples.
temp_train_lbl = train_lbl.reshape((60000,1))
training_examples = np.hstack((train_img, temp_train_lbl))

MNIST_DataSet = DataSet(examples=training_examples, distance=manhattan_distance)
nBD = NaiveBayesLearner(MNIST_DataSet, continuous=False)

# Classify the test images one at a time.
y_pred = np.empty(len(test_img), dtype=np.int)
for i in range(0, len(test_img) - 1):
    y_pred[i] = nBD(test_img[i])

temp_test_lbl = test_lbl.reshape((10000,1))
temp_y_pred_np = y_pred.reshape((10000,1))
SkLearn (Gaussian)
from sklearn.naive_bayes import GaussianNB

classifier=GaussianNB()
classifier = classifier.fit(train_img, train_lbl)

churn_predicted_target=classifier.predict(test_img)
SkLearn (multinomial)
from sklearn.naive_bayes import MultinomialNB

classifier=MultinomialNB()
classifier = classifier.fit(train_img, train_lbl)

churn_predicted_target=classifier.predict(test_img)
PyDAAL (multinomial)
import numpy as np
from daal.data_management import HomogenNumericTable, BlockDescriptor_Float64, readOnly
from daal.algorithms import classifier
from daal.algorithms.multinomial_naive_bayes import training as nb_training
from daal.algorithms.multinomial_naive_bayes import prediction as nb_prediction

# Copy the contents of a DAAL numeric table into a NumPy array.
def getArrayFromNT(table, nrows=0):
    bd = BlockDescriptor_Float64()
    if nrows == 0:
        nrows = table.getNumberOfRows()
    table.getBlockOfRows(0, nrows, readOnly, bd)
    npa = np.copy(bd.getArray())
    table.releaseBlockOfRows(bd)
    return npa

temp_train_lbl = train_lbl.reshape((60000,1))
train_img_nt = HomogenNumericTable(train_img)
train_lbl_nt = HomogenNumericTable(temp_train_lbl)
temp_test_lbl = test_lbl.reshape((10000,1))
test_img_nt = HomogenNumericTable(test_img)
nClasses=10
nb_train = nb_training.Online(nClasses)

# Pass new block of data from the training data set and dependent values to the algorithm
nb_train.input.set(classifier.training.data, train_img_nt)
nb_train.input.set(classifier.training.labels, train_lbl_nt)
# Update the Naive Bayes model with this block of data
nb_train.compute()
model = nb_train.finalizeCompute().get(classifier.training.model)

nb_Test = nb_prediction.Batch(nClasses)
nb_Test.input.setTable(classifier.prediction.data,  test_img_nt)
nb_Test.input.setModel(classifier.prediction.model, model)
predictions = nb_Test.compute().get(classifier.prediction.prediction)

predictions_np = getArrayFromNT(predictions)

The ‘learning_apps.ipynb’ notebook from ‘aima-python-master’ is used as the reference code for the experiment. This file implements classification of the MNIST dataset using the Naive Bayes classifier in a conventional way, but this approach consumes a lot of time to classify the data.

To check for better performance, the same experiment was implemented using PyDAAL, a high-performance data analytics library for Python*. PyDAAL mainly uses ‘NumericTables’, a generic data type for representing data in memory.

In the code, the data is loaded as train_img, train_lbl, test_img, and test_lbl using the function ‘load_MNIST()’. The ‘train_img’ and ‘test_img’ arrays hold the training and test data, while train_lbl and test_lbl hold the corresponding labels. These input arrays are converted into ‘HomogenNumericTable’ objects after checking that they are ‘C-contiguous’, because the conversion is only valid for C-contiguous input data.
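
For illustration, here is a minimal sketch of that conversion step. It assumes the NumPy arrays returned by load_MNIST() are in scope, and the helper name ensure_contiguous is hypothetical (not part of the original notebook):

import numpy as np
from daal.data_management import HomogenNumericTable

def ensure_contiguous(arr):
    # HomogenNumericTable expects C-contiguous data; copy only if needed.
    return arr if arr.flags['C_CONTIGUOUS'] else np.ascontiguousarray(arr)

train_img_nt = HomogenNumericTable(ensure_contiguous(train_img))
train_lbl_nt = HomogenNumericTable(ensure_contiguous(train_lbl.reshape((60000, 1))))
test_img_nt  = HomogenNumericTable(ensure_contiguous(test_img))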

An algorithm object (nb_train) is created to train the multinomial Naive Bayes model in online processing mode. The two pieces of input, that is, the data and the labels, are set using the 'input.set' member methods of the ‘nb_train’ algorithm object. The 'compute()' method is then used to update the partial model. After the model is created, a test object (nb_Test) is defined. The testing data set and the trained model are passed to the algorithm using the methods input.setTable() and nb_Test.input.setModel(), respectively. After finding the predictions using the ‘compute()’ method, the accuracy and time taken for the experiment are calculated, using the ‘SkLearn’ library and the ‘time’ command in Linux, respectively.

Another implementation of the same code was done using the ‘Multinomial Naive Bayes’ classifier in SkLearn for comparison with the conventional method and PyDAAL.

On analyzing the time taken for the experiments, it is clear that PyDAAL has better time performance compared to the other methods.

  • The conditional probability distribution assumption made in AIMA is
    • A probability distribution formed by observing and counting examples.
    • If p is an instance of this class and o is an observed value, there are three main operations:
      • p.add(o) increments the count for observation o by 1.
      • p.sample() returns a random element from the distribution.
      • p[o] returns the probability for o (as in a regular ProbDist).
  • The conditional probability distribution assumption made in Gaussian Naive Bayes is Gaussian/normal distribution.
  • The conditional probability distribution assumption made in multinomial Naive Bayes is multinomial distribution.

Introduction

During the test, the Intel® Xeon® Gold processor was used to run Naive Bayes from AIMA, SkLearn (Gaussian and multinomial), and PyDAAL (multinomial). To determine the performance improvement, we compared the accuracy percentage for all relevant scenarios. We also calculated the performance improvement (x) for PyDAAL when compared to the others. Naive Bayes (Gaussian) was not included in this calculation, since it was more appropriate to compare the multinomial versions of SkLearn and PyDAAL.

Observations

Intel® DAAL helps to speed up big data analysis by providing highly optimized algorithmic building blocks for all stages of data analytics (preprocessing, transformation, analysis, modeling, validation, and decision making) in batch, online, and distributed processing modes of computation.

  • Helps applications deliver better predictions faster
  • Analyzes larger data sets with the same compute resources
  • Optimizes data ingestion and algorithmic compute together for the highest performance
  • Supports offline, streaming, and distributed usage models to meet a range of application needs
  • Provides priority support―connect privately with Intel engineers for technical questions

Accuracy

We ran the Naive Bayes learner from AIMA, SkLearn, and PyDAAL and observed that both PyDAAL and SkLearn (multinomial) had the same percentage of accuracy (refer to Test and System Configuration).

Figure 1 provides a graph of the accuracy values of Naive Bayes.

Figure 1. Intel® Xeon® Gold 6128 processor—graph of accuracy values.

Benchmark results were obtained prior to the implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information, see Performance Benchmark Test Disclosure.

Configuration: Intel® Xeon® Gold 6128 processor 3.40 GHz; System CentOS* (7.4.1708); Cores 24; Storage (RAM) 92 GB; Python* Version 3.6.2; PyDAAL Version 2018.0.0.20170814.
Benchmark Source: Intel Corporation. See below for further notes and disclaimers.1

Performance improvement

The performance improvement (x) with respect to time was calculated for Naive Bayes (AIMA versus PyDAAL, and SkLearn versus PyDAAL), and we observed that performance (refer to Test and System Configuration) was better with PyDAAL.

Figures 2 and 3 provide graphs of the performance improvement speedup values.

Figure 2. Intel® Xeon® Gold 6128 processor—graph of AIMA versus PyDAAL performance improvement.

Benchmark results were obtained prior to the implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information, see Performance Benchmark Test Disclosure.

Configuration: Intel® Xeon® Gold 6128 processor 3.40 GHz; System CentOS* (7.4.1708); Cores 24; Storage (RAM) 92 GB; Python* Version 3.6.2; PyDAAL Version 2018.0.0.20170814.
Benchmark Source: Intel Corporation. See below for further notes and disclaimers.1

Figure 3. Intel® Xeon® Gold 6128 processor—graph of SkLearn versus PyDAAL performance improvement.

Benchmark results were obtained prior to the implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information, see Performance Benchmark Test Disclosure.

Configuration: Intel® Xeon® Gold 6128 processor 3.40 GHz; System CentOS* (7.4.1708); Cores 24; Storage (RAM) 92 GB; Python* Version 3.6.2; PyDAAL Version 2018.0.0.20170814.
Benchmark Source: Intel Corporation. See below for further notes and disclaimers.1

Summary

The optimization test on the Intel Xeon Gold processor illustrates that PyDAAL takes less time (see Figure 4) and hence provides better performance (refer to Test and System Configuration) when compared to AIMA and SkLearn. In this scenario, both SkLearn (multinomial) and PyDAAL had the same accuracy. The conditional probability distribution assumed in AIMA is a simple counting-based distribution, whereas SkLearn and PyDAAL assume a Gaussian or multinomial distribution, which explains the difference in accuracy observed.

Figure 4. Intel® Xeon® Gold 6128 processor—graph of performance time.

Benchmark results were obtained prior to the implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information, see Performance Benchmark Test Disclosure.

Configuration: Intel® Xeon® Gold 6128 processor 3.40 GHz; System CentOS* (7.4.1708); Cores 24; Storage (RAM) 92 GB; Python* Version 3.6.2; PyDAAL Version 2018.0.0.20170814.
Benchmark Source: Intel Corporation. See below for further notes and disclaimers.1

References

  1. AIMA code:
    https://github.com/aimacode/aima-python
  2. The AIMA data folder:
    https://github.com/aimacode/aima-python (download separately)
  3. Book:
    Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig

1Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information, visit www.intel.com/benchmarks.

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804

Use Transfer Learning For Efficient Deep Learning Training On Intel® Xeon® Processors


Introduction

This is an educational white paper on transfer learning, showcasing how existing deep learning models can be easily and flexibly customized to solve new problems. One of the biggest challenges with deep learning is the large number of labeled data points required to train models to sufficient accuracy. For example, the ImageNet*2 database for image recognition consists of over 14 million hand-labeled images. While the number of possible applications of deep learning systems in vision tasks, text processing, speech-to-text translation, and many other domains is enormous, very few potential users of deep learning systems have sufficient training data to create models from scratch. A common concern among teams considering the use of deep learning to solve business problems is the need for training data: “Doesn’t deep learning need millions of samples and months of training to get good results?” One powerful solution is transfer learning, in which part of an existing deep learning model is re-optimized on a small data set to solve a related, but new, problem. In fact, one of the great attractions of transfer learning is that, unlike most traditional approaches to machine learning, we can take models trained on one (perhaps very large) dataset and modify them quickly and easily to work well on a new problem (where perhaps we have only a very small dataset). Transfer learning methods are not only parsimonious in their training data requirements, but they also run efficiently on the same Intel® Xeon® processor (CPU)-based systems that are widely used for other analytics workloads, including machine learning and deep learning inference. The abundance of readily available CPU capacity in current datacenters, in conjunction with transfer learning, makes CPU-based systems a preferred choice for deep learning training and inference.

Today, transfer learning appears most notably in data mining, machine learning, and applications of machine learning and data mining1. Traditional machine learning techniques attempt to learn each task from scratch, while transfer learning transfers knowledge from a previous task to a target task when the latter has less high-quality training data.
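
To make the idea concrete, here is a minimal sketch of transfer learning that freezes a pre-trained image model and re-trains only its final layer on a small dataset. It uses PyTorch* and torchvision* purely as an illustrative framework (they are not part of the study described here), and the dataset path and class count are placeholders:

import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Load a network pre-trained on ImageNet and freeze its weights.
model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer for the new, smaller problem.
num_classes = 5                      # placeholder: classes in the new dataset
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new layer's parameters are optimized.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Typical training loop over the small target dataset (the path is hypothetical).
data = datasets.ImageFolder("small_dataset/train",
                            transforms.Compose([transforms.Resize((224, 224)),
                                                transforms.ToTensor()]))
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

Because only the small final layer is optimized, the number of trainable parameters, and hence the amount of labeled data and compute required, drops dramatically compared to training the full network from scratch.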

References

  1. A Survey of Transfer Learning
  2. ImageNet*

 

Installing a New License File Manually


Use this option if you need to replace an expired license file with a new one for an existing installation.

Place the new license file "*.lic" in the following directory, making sure not to change the license file name:

  • On Windows*:
    <installation drive>\Program Files\Common Files\Intel\Licenses
    For example: "c:\Program Files\Common Files\Intel\Licenses"
    Note: If the INTEL_LICENSE_FILE environment variable is defined, copy the file to the directory specified by the environment variable instead.
  • On Linux*: /opt/intel/licenses
  • On OS X*: /Users/Shared/Library/Application Support/Intel/Licenses

Note: You will likely need administrative/root privileges to copy the license to the named directory.

Make sure to remove expired license files from the directory to ensure the correct file is being used.


Artificial Intelligence and Healthcare Data


Introduction

Health professionals and researchers have access to plenty of healthcare data. However, the implementation of artificial intelligence (AI) technology in healthcare is very limited, primarily due to lack of awareness; AI remains unfamiliar territory for most healthcare professionals. The purpose of this article is to introduce AI to the healthcare professional, along with its application to different types of healthcare data.

IT (information technology) professionals such as data scientists, AI developers, and data engineers are also facing challenges in the healthcare domain; for example, finding the right problem,1 lack of data availability for training of AI models, and various issues with the validation of AI models. This article highlights the various potential areas of healthcare where IT professionals can collaborate with healthcare experts to build teams of doctors, scientists, and developers, and translate ideas into healthcare products and services.

Intel provides educational software and hardware support to health professionals, data scientists, and AI developers. Based on the dataset type, we highlight a few use cases in the healthcare domain where AI was applied using various medical datasets.

Artificial Intelligence


AI is an intelligent technique that enables computers to mimic human behavior. AI in healthcare uses algorithms and software to analyze complex medical data and find relationships between patient outcomes and prevention/treatment techniques.2 Machine learning (ML) is a subset of AI. It uses various statistical methods and algorithms, and enables a machine to improve with experience. Deep learning (DL) is a subset of ML.3 It takes machine learning to the next level with multilayer neural network architectures, identifying patterns and performing other complex tasks much as the human brain does. DL has been applied in many fields such as computer vision, speech recognition, natural language processing (NLP), object detection, and audio recognition.4 Deep neural networks (DNNs) and recurrent neural networks (RNNs), examples of deep learning architectures, are used to improve drug discovery and disease diagnosis.5


Figure 1. Relationship of artificial intelligence, machine learning, and deep learning.

AI Health Market

According to Frost & Sullivan (a growth partnership company), the AI market in healthcare may reach USD 6.6 billion by 2021, a 40 percent growth rate. AI has the potential to reduce the cost of treatment by up to 50 percent.6 AI applications in healthcare may generate USD 150 billion in annual savings by 2026, according to Accenture analysis. AI-based smart workforces, cultures, and solutions are consistently evolving to support the healthcare industry in multiple ways, such as:7

  • Alleviating the burden on clinicians and giving medical professionals the tools to do their jobs more effectively.
  • Filling in gaps during the rising labor shortage in healthcare.
  • Enhancing efficiency, quality, and outcomes for patients.
  • Magnifying the reach of care by integrating health data across platforms.
  • Delivering benefits of greater efficiency, transparency, and interoperability.
  • Maintaining information security.

Healthcare Data

Hospitals, clinics, and medical and research institutes generate a large volume of data on a daily basis, including lab reports, imaging data, pathology reports, diagnostic reports, and drug information. Such data is expected to increase greatly in the next few years as people expand their use of smartphones, tablets, the IoT (Internet of Things), and fitness gadgets to generate information.8 Digital data is expected to reach 44 zettabytes by 2020, doubling every year.9 The rapid expansion of healthcare data is one of the greatest challenges for clinicians and physicians. Current literature suggests that the big data ecosystem and AI are solutions for processing this massive data explosion while meeting the social, financial, and technological demands of healthcare. Analysis of such big and complicated data is often difficult and requires a high level of skill in data analysis. Moreover, the most challenging part, interpreting the results and making recommendations based on the outcome, requires many years of medical involvement, knowledge, and specialized skill sets.

In healthcare, data is generated, collected, and stored in multiple formats, including numerical, text, images, scans, and audio or video. If we want to apply AI to a dataset, we first need to understand the nature of the data and the questions we want to answer from the target dataset. The data type helps us formulate the neural network, algorithm, and architecture for AI modeling. Here, we introduce a few AI-based cases as examples to demonstrate the application of AI in healthcare in general. Typically, the approach can be customized based on the project and area of interest (that is, oncology, cardiology, pharmacology, internet medicine, primary care, urgent care, emergency, and radiology). Below is a list of AI applications, organized by the format of the dataset, that are gaining momentum in the real world.

Healthcare Dataset: Pictures, Scans, Drawings

One of the most popular ways to generate data in healthcare is with images, such as scans (PET scan image credit: Susan Landau and William Jagust at UC Berkeley)10, tissue sections11, drawings12, and organ images13 (Figure 2A). In this scenario, specialists look for particular features in an image. A pathologist collects such images under the microscope from tissue sections (fat, muscle, bone, brain, liver biopsy, and so on). Recently, Kaggle organized the Intel and MobileODT Cervical Cancer Screening Competition to improve the precision and accuracy of cervical cancer screening using a big image data set (training, testing, and additional data sets).14 The participants used different deep learning models such as the faster region-based convolutional neural network (R-CNN) detection framework with VGG16,15 supervised semantics-preserving deep hashing (SSDH) (Figure 2B), and U-Net for convolutional networks.16 Dr. Silva achieved 81 percent accuracy on the validation test using the Intel® Deep Learning SDK and GoogLeNet* with Caffe*.16

Similarly, Xu et al. investigated datasets of over 7,000 images of single red blood cells (RBCs) from eight patients with sickle cell disease. They selected a DNN classifier to classify the different RBC types.17 Gulshan et al. applied a deep convolutional neural network (DCNN) to more than 10,000 retinal images collected from 874 patients to detect referable diabetic retinopathy (moderate or worse) with about 90 percent sensitivity and specificity.18


Figure 2. A) Various types of healthcare image data. B) Supervised semantics-preserving deep hashing (SSDH), a deep learning model, used in the Intel and MobileODT Cervical Cancer Screening Competition for image classification. Source: 10-13,16

Positron emission tomography (PET), computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound images (Figure 2A) are another source of healthcare data, where images of internal organs and tissues (such as the brain or tumors) are collected non-invasively. Deep learning models can be used to measure tumor growth over time in cancer patients on medication. Jaeger et al. applied a convolutional neural network (CNN) architecture to diffusion-weighted MRI. Based on an estimation of the properties of the tumor tissue, this architecture reduced false-positive findings and thereby decreased the number of unnecessary invasive biopsies. The researchers noticed that deep learning reduced motion and vision errors and thus provided more stable results in comparison to manual segmentation.19 A study conducted in China showed that deep learning helped to achieve 93 percent accuracy in distinguishing malignant from benign cancer on elastograms from ultrasound shear-wave elastography of 200 patients.20,21

Healthcare Dataset: Numerical


Figure 3. Example of numerical data.

Healthcare industries collect a lot of patient/research-related information such as age, height, weight, blood profile, lipid profile, sugar, blood pressure, and heart rate. Similarly, gene expression data (for example, fold change) and metabolic information (for example, level of metabolites) are also expressed by the numbers.

The literature shows several cases where neural networks were successfully applied in healthcare. For instance, Danaee and Ghaeini from Oregon State University (2017) used a deep architecture, a stacked denoising autoencoder (SDAE) model, to extract meaningful features from gene expression data of 1,097 breast cancer and 113 healthy samples. This model enables the classification of breast cancer cells and the identification of genes useful for cancer prediction (as biomarkers) or as potential therapeutic targets.22 Kaggle shared the breast cancer dataset from the University of Wisconsin containing features of the cancer cell nucleus such as radius, texture, perimeter, area, smoothness, compactness, concavity, symmetry, and fractal dimension. In the Kaggle competition, participants successfully built DNN classifiers to predict breast cancer type (malignant or benign).23
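
For readers who want to see what working with numerical healthcare data looks like in practice, the sketch below trains a small neural network on the publicly available Wisconsin breast cancer features using scikit-learn*. It is a minimal illustration, not the model built in the Kaggle competition:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Numerical features (radius, texture, perimeter, ...) and benign/malignant labels.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scale the features, then train a small fully connected network.
scaler = StandardScaler().fit(X_train)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(scaler.transform(X_train), y_train)

print("test accuracy:", accuracy_score(y_test, clf.predict(scaler.transform(X_test))))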

Healthcare Dataset: Textual


Figure 4. Example of textual data.

Plenty of medical information is recorded as text; for instance, clinical data (cough, vomiting, drowsiness, and diagnosis), social, economic, and behavioral data (such as poor, rich, depressed, happy), social media reviews (Twitter, Facebook, Telegram*, and so on), and drug history. NLP techniques translate free text into standardized data. They enhance the completeness and accuracy of electronic health records (EHRs), and NLP algorithms extract risk factors from notes available in the EHR.
For example, NLP was applied to 21 million medical records and identified 8,500 patients who were at risk of developing congestive heart failure with 85 percent accuracy.24 The Department of Veterans Affairs used NLP techniques to review more than two billion EHR documents for indications of post-traumatic stress disorder (PTSD), depression, and potential self-harm in veteran patients.25 Similarly, NLP was used to identify psychosis with 100 percent accuracy in schizophrenic patients based on speech patterns.26 IBM Watson* analyzed 140,000 academic articles, far more than any human could read, understand, or remember, and suggested recommendations about a course of therapy for cancer patients.24
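
As a minimal illustration of turning free text into structured features (and not a reproduction of the systems cited above), the sketch below classifies short clinical-style notes with a bag-of-words model in scikit-learn*; the example notes and labels are invented:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical, made-up clinical notes and labels, for illustration only.
notes = ["patient reports persistent cough and fever",
         "no complaints, routine follow-up visit",
         "shortness of breath and swelling in the legs",
         "mild headache, otherwise healthy"]
labels = [1, 0, 1, 0]   # 1 = flag for clinician review, 0 = no flag

# Bag-of-words features feeding a simple linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(notes, labels)

print(model.predict(["new onset cough with fever"]))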


Figure 5. Examples of electrogram data. Source:27,31

Healthcare Dataset: Electrogram


Figure 6. Architecture of deep learning with convolutional neural network model useful in classification of EEG data. (Source: 28-29)

Electrocardiogram (ECG)27, electroencephalogram (EEG), electrooculogram (EOG), electromyogram (EMG), and sleep tests are some examples of graphical healthcare data. An electrogram records the electrical activity of a target organ (such as the heart, brain, or muscle) over a period of time using electrodes placed on the skin.

Schirrmeister et al. from the University of Freiburg designed and trained deep ConvNets (deep learning with convolutional networks) to decode raw EEG data, which is useful for EEG-based brain mapping.28,29 Pourbabaee et al. from Concordia University, Canada used a large volume of raw ECG time-series data to build a DCNN model. Interestingly, this model learned key features of paroxysmal atrial fibrillation (PAF), a life-threatening heart disease, and was thereby useful in PAF patient screening. This method can be a good alternative to traditional, ad hoc, time-consuming handcrafted features.30 Sleep stage classification is an important preliminary exam for sleep disorders. Using 61 polysomnography (PSG) time series, Chambon et al. built a deep learning model for sleep stage classification. The model showed better performance relative to traditional methods, with low run time and computational cost.31

Healthcare Dataset: Audio and Video


Figure 7. Example of audio data.

Sound event detection (SED) deals with detecting the onset and offset times of each sound event in an audio recording and associating a textual descriptor with it. SED has recently drawn great interest in the healthcare domain for health monitoring. Cakir et al. combined CNNs and RNNs in a convolutional recurrent neural network (CRNN) and applied it to a polyphonic sound event detection task. They observed a considerable improvement with the CRNN model.32

Videos are a sequence of images; in some cases they can be considered a time series, and in very particular cases, dynamical systems. Deep learning techniques help researchers in both the computer vision and multimedia communities boost the performance of video analysis significantly and open new research directions for analyzing video content. Microsoft started a research project called InnerEye* that uses machine learning technology to build innovative tools for the automatic, quantitative analysis of three-dimensional radiological images. Project InnerEye employs algorithms such as deep decision forests as well as CNNs for the automatic, voxel-wise analysis of medical images.33 Khorrami et al. built a model on videos from the Audio/Visual Emotion Challenge (AVEC 2015) using both RNNs and CNNs, and performed emotion recognition on video data.34

Healthcare Dataset: Molecular Structure


Figure 8. Molecular structure of 4CDG (Source: rcsb.org)

Figure 8 shows a typical example of the molecular structure of a drug molecule. Generally, the design of a new molecule is guided by historical datasets of older molecules. In quantitative structure-activity relationship (QSAR) analysis, scientists try to find known and novel patterns between structures and activity. At the Merck Research Laboratory, Ma et al. used a dataset of thousands of compounds (about 5,000) and built a model based on a deep neural network (DNN) architecture.35 In another QSAR study, Dahl et al. built neural network models on 19 datasets of 2,000‒14,000 compounds to predict the activity of new compounds.36 Aliper and colleagues built a deep neural network–support vector machine (DNN–SVM) model that was trained on a large transcriptional response dataset and classified various drugs into therapeutic categories.37 Tavanaei developed a convolutional neural network model to classify tumor suppressor genes and proto-oncogenes with 82.57 percent accuracy. This model was trained on tertiary structures of proteins obtained from the Protein Data Bank.38 AtomNet* is the first structure-based DCNN. It incorporates structural target information and consequently predicts the bioactivity of small molecules. This application worked successfully to predict new, active molecules for targets with no previously known modulators.39

AI: Solving Healthcare Problems

Here are a few practical examples where AI developers, startups, and institutes are building and testing AI models:

  • As emotional intelligence indicators that detect subtle cues in speech, inflection, or gesture to assess a person’s mood and feelings
  • Help in tuberculosis detection
  • Help in the treatment of PTSD
  • AI chatbots (Florence*, SafedrugBot*, Babylon Health*, SimSensei*)
  • Virtual assistants in helping patients and clinicians
  • Verifying insurance
  • Smart robots that explain lab reports
  • Aging-based AI centers
  • Improving clinical documentation
  • Personalized medicine

Data Science and Health Professionals: A Combined Approach

Deep learning has great potential to help medical and paramedical practitioners by:

  • Reducing the human error rate40 and workload
  • Helping in diagnosis and the prognosis of disease
  • Analyzing complex data and building a report

The examination of thousands of images is complex, time consuming, and labor intensive. How can AI help?

A team from Harvard Medical School’s Beth Israel Deaconess Medical Center noticed a 2.9 percent error rate with the AI model and a 3.5 percent error rate with pathologists for breast cancer diagnosis. Interestingly, the pairing of “deep learning with pathologist” showed a 0.5 percent error rate, which is an 85 percent drop.40 Litjens et al. suggest that deep learning holds great promise in improving the efficacy of prostate cancer diagnosis and breast cancer staging. 41,42

Intel® AI Academy

Intel provides educational software and hardware support to health professionals, data scientists, and AI developers, and makes free AI training and tools available through the Intel® AI Academy.

Intel recently published a series of AI hands-on tutorials, walking through the process of AI project development, step-by-step. Here you will learn:

  • Ideation and planning
  • Technology and infrastructure
  • How to build an AI model (data and modeling)
  • How to build and deploy an app (app development and deployment)

Intel is committed to providing a solution for your healthcare project. Please read the article on the Intel AI Academy to learn more about solutions using Intel® architecture (Intel® Processors for Deep Learning Training). In the next article, we explore examples of healthcare datasets where you will learn how to apply deep learning. Intel is committed to helping you achieve your project goals.

References

  1. Faggella, D. Machine Learning Healthcare Applications – 2018 and Beyond. Techemergence.
  2. Artificial intelligence in healthcare - Wikipedia. (Accessed: 12th February 2018)
  3. Intel® Math Kernel Library for Deep Learning Networks: Part 1–Overview and Installation | Intel® Software. (Accessed: 14th February 2018)
  4. Lecun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature521, 436–444 (2015).
  5. Mamoshina, P., Vieira, A., Putin, E. & Zhavoronkov, A. Applications of Deep Learning in Biomedicine. Molecular Pharmaceutics13, 1445–1454 (2016).
  6. From $600 M to $6 Billion, Artificial Intelligence Systems Poised for Dramatic Market Expansion in Healthcare. (Accessed: 12th February 2018)
  7. Accenture. Artificial Intelligence in Healthcare | Accenture.
  8. Marr, B. How AI And Deep Learning Are Now Used To Diagnose Cancer. Forbes
  9. Executive Summary: Data Growth, Business Opportunities, and the IT Imperatives | The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things. Available at: . (Accessed: 12th February 2018)
  10. Lifelong brain-stimulating habits linked to lower Alzheimer’s protein levels | Berkeley News. (Accessed: 21st February 2018)
  11. Emphysema H and E.jpg - Wikimedia Commons (Accessed : 23rd February 2018). https://commons.wikimedia.org/wiki/File:Emphysema_H_and_E.jpg
  12. Superficie_ustioni.jpg (696×780). (Accessed: 23rd February 2018). https://upload.wikimedia.org/wikipedia/commons/1/1b/Superficie_ustioni.jpg
  13. Heart_frontally_PDA.jpg (1351×1593). (Accessed: 27th February 2018).  https://upload.wikimedia.org/wikipedia/commons/5/57/Heart_frontally_PDA.jpg
  14. Kaggle competition-Intel & MobileODT Cervical Cancer Screening. Intel & MobileODT Cervical Cancer Screening. Which cancer treatment will be most effective? (2017).
  15. Intel and MobileODT* Competition on Kaggle*. Faster Convolutional Neural Network Models Improve the Screening of Cervical Cancer. December 22 (2017).
  16. Kaggle*, I. and M. C. on. Deep Learning Improves Cervical Cancer Accuracy by 81%, using Intel Technology. December 22 (2017).
  17. Xu, M. et al. A deep convolutional neural network for classification of red blood cells in sickle cell anemia. PLoS Comput. Biol.13, 1–27 (2017).
  18. Gulshan, V. et al. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA316, 2402 (2016).
  19. Jäge, P. F. et al. Revealing hidden potentials of the q-space signal in breast cancer. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics)10433 LNCS, 664–671 (2017).
  20. Ali, A.-R. Deep Learning in Oncology – Applications in Fighting Cancer. September 14 (2017).
  21. Zhang, Q. et al. Sonoelastomics for Breast Tumor Classification: A Radiomics Approach with Clustering-Based Feature Selection on Sonoelastography. Ultrasound Med. Biol.43, 1058–1069 (2017).
  22. Danaee, P., Ghaeini, R. & Hendrix, D. A. A deep learning approach for cancer detection and relevant gene identification. Pac. Symp. Biocomput.22, 219–229 (2017).
  23. Kaggle: Breast Cancer Diagnosis Wisconsin. Breast Cancer Wisconsin (Diagnostic) Data Set: Predict whether the cancer is benign or malignant.
  24. What is the Role of Natural Language Processing in Healthcare? (Accessed: 1st February 2018)
  25. VA uses EHRs, natural language processing to spot suicide risks. (Accessed: 1st February 2018)
  26. Predictive Analytics, NLP Flag Psychosis with 100% Accuracy. (Accessed: 1st February 2018)
  27. Heart_block.png (450×651). (Accessed: 23rd February 2018)
  28. Schirrmeister, R. T. et al. Deep learning with convolutional neural networks for brain mapping and decoding of movement-related information from the human EEG Short title: Convolutional neural networks in EEG analysis. (2017).
  29. Schirrmeister, R. T. et al. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp.38, 5391–5420 (2017).
  30. Pourbabaee, B., Roshtkhari, M. J. & Khorasani, K. Deep Convolutional Neural Networks and Learning ECG Features for Screening Paroxysmal Atrial Fibrillation Patients. IEEE Trans. Syst. Man, Cybern. Syst. 1–10 (2017). doi:10.1109/TSMC.2017.2705582
  31. Chambon, S., Galtier, M. N., Arnal, P. J., Wainrib, G. & Gramfort, A. A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series. arXiv:1707.0332v2 (2017).
  32. Cakir, E., Parascandolo, G., Heittola, T., Huttunen, H. & Virtanen, T. Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection. IEEE/ACM Trans. Audio, Speech, Lang. Process.25, 1291–1303 (2017).
  33. Project InnerEye – Medical Imaging AI to Empower Clinicians. Microsoft
  34. Khorrami, P., Le Paine, T., Brady, K., Dagli, C. & Huang, T. S. HOW DEEP NEURAL NETWORKS CAN IMPROVE EMOTION RECOGNITION ON VIDEO DATA.
  35. Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model.55, 263–274 (2015).
  36. Dahl, G. E., Jaitly, N. & Salakhutdinov, R. Multi-task Neural Networks for QSAR Predictions. (University of Toronto, Canada. Retrieved from http://arxiv.org/abs/1406.1231, 2014).
  37. Aliper, A. et al. Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data. Mol. Pharm.13, 2524–2530 (2016).
  38. Tavanaei, A., Anandanadarajah, N., Maida, A. & Loganantharaj, R. A Deep Learning Model for Predicting Tumor Suppressor Genes and Oncogenes from PDB Structure. bioRxiv  October 22, 1–10 (2017).
  39. Wallach, I., Dzamba, M. & Heifets, A. AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery. 1–11 (2015). doi:10.1007/s10618-010-0175-9
  40. Kontzer, T. Deep Learning Drops Error Rate for Breast Cancer Diagnoses by 85%. September 19 (2016).
  41. Litjens, G. et al. Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci. Rep.6, (2016).
  42. Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Anal.42, 60–88 (2017).

Intel® Parallel Computing Center at Carnegie Mellon University, Silicon Valley


Carnegie Mellon University

Principal Investigator

Dr. Ole J. Mengshoel is a Principal Systems Scientist in the Department of Electrical and Computer Engineering at CMU Silicon Valley. His current research focuses on: scalable computing in artificial intelligence and machine learning; machine learning and inference in Bayesian networks; stochastic optimization; and applications of artificial intelligence and machine learning. Dr. Mengshoel holds a Ph.D. in Computer Science from the University of Illinois, Urbana-Champaign. His undergraduate degree is in Computer Science from the Norwegian Institute of Technology, Norway. Prior to joining CMU, he held research and leadership positions at SINTEF, Rockwell, and USRA/RIACS at the NASA Ames Research Center.

Description

Scalability of artificial intelligence (AI) and machine learning (ML) algorithms, methods, and software has been an important research topic for a while. In ongoing and future work at CMU Silicon Valley, we take advantage of opportunities that have emerged due to recent dramatic improvements in parallel and distributed hardware and software. With the availability of Big Data, powerful computing platforms ranging from small (smart phones, wearable computers, IoT devices) to large (elastic clouds, data centers, supercomputers), as well as large and growing business on the Web, the importance and impact of scalability in AI and ML is only increasing. We will now discuss a few specific results and projects.

In the area of parallel and distributed algorithms, we have developed parallel algorithms and software for junction tree propagation, an algorithm that is a workhorse in commercial and open-source software for probabilistic graphical models. On the distributed front, we have developed and are developing MapReduce-based algorithms for speeding up learning of Bayesian networks from complete and incomplete data, and have experimentally demonstrated their benefits using Apache Hadoop* and Apache Spark*. Finally, we have an interest in matrix factorization (MF) for recommender systems on the Web, and have developed an incremental MF algorithm that can take advantage of Spark. Large-scale recommender systems, which are currently essential components of many Web sites, can benefit from this incremental method since it adapts more quickly to customer choices compared to traditional batch methods, while retaining high accuracy.

Caffe* is a deep learning framework originally developed at the Berkeley Vision and Learning Center. Recently, Caffe2*, a successor to Caffe, was officially released. Facebook has been the driving force in developing the open source Caffe2 framework. Caffe2 is a lightweight, modular, and scalable deep learning framework supported by several companies, including Intel. In our hands-on machine learning experience with Caffe2, we have found it to support rapid prototyping and experimentation, simple compilation, and better portability than earlier versions of Caffe.

We are experimenting with Intel’s PyLatte machine learning library, which is written in Python and is optimized for Intel CPUs. The goals of PyLatte include ease of programming, high productivity, high performance, and leveraging the power of CPUs. A CMU SV project has focused on implementing speech recognition and image classification models using PyLatte, using deep learning with neural networks. In speech recognition experiments, we have found PyLatte easy to use, with a flexible training step and short training time.

We look forward to continuing to develop parallel, distributed, and incremental algorithms for scalable intelligent models and systems as an Intel® Parallel Computing Center at CMU Silicon Valley. We create novel algorithms, models, and applications that utilize novel hardware and software computing platforms including multi- and many-core computers, cloud computing, MapReduce, Hadoop, and Spark.

Related websites:

http://sv.cmu.edu/directory/faculty-and-researchers-directory/faculty-and-researchers/mengshoel.html
https://users.ece.cmu.edu/~olem/omengshoel/Home.html
https://works.bepress.com/ole_mengshoel/

Create a Persistent Memory-Aware Queue Using the Persistent Memory Development Kit (PMDK)


Introduction

This article shows how to implement a persistent memory (PMEM)-aware queue using a linked list and the C++ bindings of the Persistent Memory Development Kit (PMDK) library libpmemobj.

A queue is a first in first out (FIFO) data structure that supports push and pop operations. In a push operation, a new element is added to the tail of the queue. In a pop operation, the element at the head of the queue gets removed.

A PMEM-aware queue differs from a normal queue in that its data structures reside permanently in persistent memory, and a program or machine crash could result in an incomplete queue entry and a corrupted queue. To avoid this, queue operations must be made transactional. This is not simple to do, but PMDK provides support for this and other operations specific to persistent memory programming.

We'll walk through a code sample that describes the core concepts and design considerations for creating a PMEM-aware queue using libpmemobj. You can build and run the code sample by following the instructions provided later in the article.

For background on persistent memory and the PMDK, read the article Introduction to Programming with Persistent Memory from Intel and watch the Persistent Memory Programming Video Series.

C++ Support in libpmemobj

The main features of the C++ bindings for libpmemobj include:

  • Transactions
  • Wrappers for basic types: automatically snapshots the data during a transaction
  • Persistent pointers

Transactions

Transactions are at the core of libpmemobj operations. This is because, in terms of persistence, the current x86-64 CPUs guarantee atomicity only for 8-byte stores. Real-world apps update in larger chunks. Take, for example, strings; it rarely makes sense to change only eight adjacent bytes from one consistent string state to another. To enable atomic updates to persistent memory in larger chunks, libpmemobj implements transactions.

Libpmemobj uses undo log-based transactions instead of redo log-based transactions for visibility reasons: changes made by the user are immediately visible. This allows for a more natural code structure and execution flow, which in turn improves code maintainability. It also means that if a transaction is interrupted partway through, all of the changes made to the persistent state are rolled back.

Transactions have ACID (atomicity, consistency, isolation, and durability)-like properties. Here's how these properties relate to programming with the PMDK:

Atomicity: Transactions are atomic with respect to persistency; All the changes made within a transaction are committed when the transaction is completed successfully or none of them are.

Consistency: The PMDK provides functionality to enable the user to maintain data consistency.

Isolation: The PMDK library provides persistent memory-resident synchronization mechanisms to enable the developer to maintain isolation.

Durability: All of a transaction's locks are held until the transaction completes to ensure durability.

Transactions are done on a per thread basis, so the call returns the status of the last transaction performed by the calling thread. Transactions are power-safe but not thread-safe.

The <p> property

In a transaction, undo logs are used to snapshot user data. The <p> template wrapper class is the basic building block for automating snapshotting of the user data so app developers don't need to do this step manually (as is the case with the C implementation of libpmemobj). This wrapper class supports only basic types. Its implementation is based on the assignment operator and each time the variable of this wrapper class is assigned a new value, the old value of the variable is snapshotted. Use of the <p> property for stack variables is discouraged because snapshotting is a computationally intensive operation.

Persistent pointers

Libraries in PMDK are built on the concept of memory mapped files. Since files can be mapped at different addresses of the process virtual address space, traditional pointers that store absolute addresses cannot be used. Instead, PMDK introduces a new pointer type that has two fields: an ID to the pool (used to access current pool virtual address from a translation table), and an offset from the beginning of the pool. Persistent pointers are a C++ wrapper around this basic C type. Its philosophy is similar to that of std::shared_ptr.

libpmemobj Core Concepts

Root object

Making any code PMEM-aware using libpmemobj always involves, as a first step, designing the types of data objects that will be persisted. The first type that needs to be defined is that of the root object. This object is mandatory and used to anchor all the other objects created in the persistent memory pool (think of a pool as a file inside a PMEM device).

Pool

A pool is a contiguous region of PMEM identified by a user-supplied identifier called layout. Multiple pools can be created with different layout strings.

Queue Implementation using C++ Bindings

The queue in this example is implemented as a singly linked list, with a head and a tail, demonstrating how to use the C++ bindings of libpmemobj.

Design Decisions

Data structures

The first thing we need is a data structure that describes a node in the queue. Each entry has a value and a link to the next node. As per the figure below, both variables are persistent memory-aware.

Figure 1. Data structure describing the queue implementation.

Code walkthrough

Now, let's go a little deeper into the main function of the program. When running the code you need to provide at least two arguments: the absolute path of the pool file and the queue operation to perform (push additionally takes a value). The supported queue operations are push (insert an element), pop (return and remove an element), and show (return an element).

if (argc < 3) {
	std::cerr << "usage: "<< argv[0]
	<< " file-name [push [value]|pop|show]"<< std::endl;
	return 1;
}

In the snippet below, we check to see if the pool file exists. If it does, the pool is opened. If it doesn't exist, the pool is created. The layout string identifies the pool that we requested to open. Here we are opening the pool with layout name Queue as defined by the macro LAYOUT in the program.

const char *path = argv[1];
queue_op op = parse_queue_op(argv[2]);
pool<examples::pmem_queue> pop;

if (file_exists(path) != 0) {
	pop = pool<examples::pmem_queue>::create(
		path, LAYOUT, PMEMOBJ_MIN_POOL, CREATE_MODE_RW);
} else {
	pop = pool<examples::pmem_queue>::open(path, LAYOUT);
}

pop is the handle to the pool, from which we can access a pointer to the root object, an instance of examples::pmem_queue; the create function creates a new pmemobj pool of type examples::pmem_queue. The root object is like the root of a file system, since it can be used to reach all of the other objects in the pool (as long as these objects are linked properly and no pointers are lost due to coding errors).

auto q = pop.get_root();

Once you get the pointer to the queue object, the program checks the second argument in order to identify what type of action the queue should perform; that is, push, pop, or show.

switch (op) {
	case QUEUE_PUSH:
		q->push(pop, atoll(argv[3]));
		break;
	case QUEUE_POP:
		std::cout << q->pop(pop) << std::endl;
		break;
	case QUEUE_SHOW:
		q->show();
		break;
	default:
		throw std::invalid_argument("invalid queue operation");
}

Queue operations

Push

Let's look at how the push function is implemented to make it persistence-aware. As shown in the code below, the transactional code is implemented as a lambda function wrapped in a C++ closure (this makes the code easy to read and follow). If a power failure happens, the data structure does not get corrupted because all changes are rolled back. For more information on how transactions are implemented in C++, read C++ bindings for libpmemobj (part 6) - transactions on pmem.io.

Allocation functions are transactional as well, and they use transaction logic to enable rollback of allocations and deletions in the persistent state; make_persistent() allocates and constructs an object, while delete_persistent() destroys and frees one.

Calling make_persistent() inside a transaction allocates an object and returns a persistent object pointer. As the allocation is now part of the transaction, if it aborts, the allocation is rolled back, reverting the memory allocation back to its original state.

After the allocation, the value of n is initialized to the new value in the queue, and the next pointer is set to null.

void push(pool_base &pop, uint64_t value) {
	transaction::exec_tx(pop, [&] {
		auto n = make_persistent<pmem_entry>();

		n->value = value;
		n->next = nullptr;

		if (head == nullptr && tail == nullptr) {
			head = tail = n;
		} else {
			tail->next = n;
			tail = n;
		}
	});
}

Figure 2. Data structure for push functionality.

Pop

Similar to push, the pop function is shown below. Here we need a temporary variable to store a pointer to the next pmem_entry in the queue. This is needed in order to set the head of the queue to the next pmem_entry after deleting the current head with delete_persistent(). Because the whole operation runs inside a transaction, it is safe against power failures.

uint64_t pop(pool_base &pop){
	uint64_t ret = 0;
	transaction::exec_tx(pop, [&] {
		if (head == nullptr)
			transaction::abort(EINVAL);

		ret = head->value;
		auto n = head->next;

		delete_persistent<pmem_entry>(head);
		head = n;

		if (head == nullptr)
			tail = nullptr;
	});

	return ret;
}

Figure 3. Data structure for pop functionality.
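
Show

The show operation simply walks the list from head to tail and prints each value. It only reads persistent memory, so no transaction is needed. A minimal sketch, based on the sample and shown here for completeness:

void show(void) const {
	for (auto n = head; n != nullptr; n = n->next)
		std::cout << n->value << std::endl;
}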

Build Instructions

Instructions to run the code sample

Download the source code from the PMDK GitHub* repository:

  1. git clone https://github.com/pmem/pmdk.git

    Figure 4. Download source code from the GitHub* repository.

  2. cd pmdk and run make on the command line as shown below. This builds the complete source code tree.

    Figure 5. Building the source code.

  3. cd pmdk/src/examples/libpmemobj++/queue
  4. View command line options for the queue program:
    ./queue
  5. Push command:
    ./queue TESTFILE push 8

    Figure 6. PUSH command using command line.

  6. Pop command:
    ./queue TESTFILE pop
  7. Show command:
    ./queue TESTFILE show

    Figure 7. POP command using command line.

Summary

In this article, we showed a simple implementation of a PMEM-aware queue using the C++ bindings of the PMDK library libpmemobj. To learn more about persistent memory programming with PMDK, visit the Intel® Developer Zone (Intel® DZ) Persistent Memory Programming site. There you will find articles, videos, and links to other important resources for PMEM developers.

About the Author

Praveen Kundurthy is a Developer Evangelist with over 14 years of experience in application development, optimization, and porting to Intel platforms. Over the past few years at Intel, he has worked on topics spanning storage technologies, gaming, virtual reality, and Android on Intel platforms.

Intel® Computer Vision SDK: Getting Started with the Intel® Computer Vision SDK (Intel® CV SDK)


Using Caffe* with the Intel® Deep Learning Model Optimizer

The Deep Learning Model Optimizer for Caffe* requires the Caffe* framework to be installed on the client machine with all relevant dependencies. Caffe* should be compiled and linked dynamically; a shared library named libcaffe.so should be available in the <CAFFE_HOME>/build/lib directory.

For ease of reference, the Caffe* installation folder is referred to as <CAFFE_HOME> and the Model Optimizer installation folder is referred to as <MO_DIR>.

The installation path to the Model Optimizer depends on whether you use the Intel® CV SDK or Deep Learning Deployment Toolkit. For example, if you are installing with sudo, the default <MO_DIR> directory is:

  • /opt/intel/deeplearning_deploymenttoolkit_<version>/deployment_tools/model_optimizer - if you installed the Deep Learning Deployment Toolkit
  • /opt/intel/computer_vision_sdk_<version>/mo - if you installed the Intel® CV SDK

Installing Caffe

To install Caffe, complete the following steps:

  1. For convenience, set the following environment variables:
      export MO_DIR=<PATH_TO_MO_INSTALL_DIR>
      export CAFFE_HOME=<PATH_TO_YOUR_CAFFE_DIR>
  2. Go to the Model Optimizer folder:
    cd $MO_DIR/model_optimizer_caffe/
  3. To simplify the installation procedure, two helper scripts are provided in the $MO_DIR/model_optimizer_caffe/install_prerequisites folder:
    • install_Caffe_dependencies.sh - Installs the required dependencies such as Git*, CMake*, and GCC*.
    • clone_patch_build_Caffe.sh - Installs the Caffe* distribution on your machine and patches it with the required adapters from the Model Optimizer.
  4. Go to the helper scripts folder and install all the required dependencies:
        cd install_prerequisites/
        ./install_Caffe_dependencies.sh 
  5. Install the Caffe* distribution. By default, the script installs BVLC Caffe* from the master branch of the official repository. If you want to install a different version of Caffe*, edit the clone_patch_build_Caffe.sh script, in particular the following lines:
        CAFFE_REPO=https://github.com/BVLC/caffe.git # link to the repository with Caffe* distribution
        CAFFE_BRANCH=master # branch to be taken from the repository
        CAFFE_FOLDER=`pwd`/caffe # where to clone the repository on your local machine
        CAFFE_BUILD_SUBFOLDER=build # name of the folder required for building Caffe* 
    To launch installation, just run the following command:
        ./clone_patch_build_Caffe.sh 

NOTE: If you encounter a problem with the hdf5 library while building Caffe* on Ubuntu* 16.04, see the following fix.

Once you have configured the Caffe* framework on your machine, you need to configure the Model Optimizer for Caffe* to work with it. For details, refer to the Configure Model Optimizer for Caffe* page.

See Also

Boost Visuals with Particle Parameters in Unreal Engine* 4


Particle parameters are a powerful system built into Unreal Engine* 4 that allows particle systems to be customized outside of the Cascade particle editor. This tutorial builds such a system and demonstrates how you can use it to boost visual fidelity.

Why use particle parameters?

Particle parameters are essential to any game that seeks to leverage particle systems to their maximum potential. The goal is to make particle systems respond dynamically to the world around them.

Overview

In this tutorial, we use particle parameters in conjunction with CPU particles to change the lighting of a scene based on a gameplay element, in this case the fuel left on a fire (see Figure 1). As the amount of fuel on the fire decreases, so does the visual effect created by the particle system and the lighting created by the fire particles in that system. Once the fuel is completely gone, we start to fill the fuel back up again until the fuel is back to where it started. This creates a nice loop that demonstrates the entire range of the particle effect.

Figure 1. Campfire with particle parameters.

Adding Parameters to P_Fire

Figure 2. Particle parameters.

To make this particle effect, we modify the P_Fire particle system included in the Unreal Engine starter content. In Figure 2, modules that we modify are highlighted in purple, and modules we add are highlighted in orange.

Modifying light

Lighting is one of the major benefits of using CPU particles and will form the core of this effect.

Figure 3. First flame emitter.

Select parameter distribution on the distribution drop-down menu

In the details panel of the first flame emitter in the P_Fire particle system, select Distribution Float Particle Parameter from the Brightness Over Life Distribution drop-down menu as shown at the top of Figure 3. This allows us to tie the amount of light emitted to a variable, in this case, the amount of fuel left in the fire.

Set name

The next step is to specify which particle parameter this distribution will be tied to. We'll use the name "FuelLeft". Enter this in the Parameter Name field, as shown in Figure 3.

Set mapping mode

A powerful feature of particle parameters is input mapping. This feature allows us to specify the max and min input that we will accept and to scale those values to a given range in order to make a single input parameter function seamlessly for many different modules. This capability allows us to make different parts of the particle effects scale down at different points. Effects like the sparks and embers will only start to change once the fire starts burning low, and we will set their input range to reflect that. We'll use DPM Normal for all the distributions in this tutorial as we want to both clamp the input and scale it to a particular range. This is selected under the Param Mode drop-down menu shown in Figure 3.
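
Conceptually, a distribution in this mode clamps the incoming parameter value to the input range and then remaps it linearly to the output range. The snippet below is only an illustrative sketch of that mapping, not Unreal Engine source code:

#include <algorithm>

// Illustrative only: clamp In to [InMin, InMax], then remap it linearly to
// [OutMin, OutMax]. Assumes InMax is greater than InMin.
float MapParticleParam(float In, float InMin, float InMax,
                       float OutMin, float OutMax)
{
	const float Clamped = std::clamp(In, InMin, InMax);
	const float Alpha = (Clamped - InMin) / (InMax - InMin);
	return OutMin + Alpha * (OutMax - OutMin);
}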

Set input range

Next we specify the min and max output. For this effect, we'll use 0.0 for the min and 1.0 for the max, as shown in Figure 4. This means the light from this part of the fire will scale from 0 percent fuel (fully dark) to 100 percent fuel (a nice campfire glow).

Figure 4. Setting input range.

Set output range

The output range lets us specify the minimum and maximum brightness for this part of the fire. Set these to the values shown in Figure 5.

Figure 5. Setting output range.

Set default input value

Now we need to set a default input value in case the effect is not given a value. This is done with Constant (see Figure 6). For this particle system, we'll set the default at 1.0, or a full flame.

Figure 6. Setting the default value.

Setting up the rest of the modules

Second emitter light

To ensure the light emitted by the fire is consistent with the particles in the particle system, we modify the light module on the second emitter as well. Change the Brightness Over Life section on the light module on the second emitter to match the values shown in Figure 7. If we didn't scale this light source as well, the fire would still emit a full glow when it is just embers.

Figure 7. Second emitter light.

First and second emitter scale

Presently, the amount of light that our fire produces changes with fuel, but the size of the flames does not. To change this, we add a Size Scale module to both the first and second emitters, as shown in Figure 2. This distribution will be a Vector Particle Parameter instead of a Float Particle Parameter. Since we give it the same parameter name as the Float Particle Parameter, Cascade copies the float value across all three fields of our vector. For both modules, we want the graphics to scale in size from 0 percent to 100 percent fuel, so the only fields we need to change are Parameter Name and Constant. Set both modules to match the values shown in Figure 8.

Figure 8. Emitter scale.

Smoke spawn rate

Smaller fires produce less smoke, and we can modify our particle system to reflect that. To do this, we set up a particle parameter on the rate section of the spawn module on the smoke emitter. However, unlike the previous particle parameters we set up, we only want to start scaling down the smoke spawned when we reach 40 percent fuel and below. To do this, set the Max Input to 0.4 instead of 1. Set Distribution to match the values shown in Figure 9.

Figure 9. Smoke spawn rate.

Embers spawn rate

Embers also scale with the size of the fire, but don't start scaling down until our fire gets really small. We'll start scaling down embers at 50 percent (0.5) for this effect. Set the Spawn Rate Distribution on the Embers emitter to match the values shown in Figure 10.

Figure 10. Embers spawn rate.

Distortion spawn rate

The distortion caused by the flames needs to be scaled in the same way that the flames are scaled. Since we scaled the flames from 0 percent to 100 percent fuel, we need to do the same with the distortion. Set the Spawn Rate Distribution on the Distortion emitter to match the values shown in Figure 11.

Figure 11. Distortion spawn rate.

Set up a blueprint

Now that our fire effect can be scaled with the amount of fuel, we need to set up a blueprint to set the amount of fuel. In this tutorial, the amount of fuel slowly depletes, and then fills back up again to demonstrate the effect. To create a blueprint for this effect, drag the particle system into the scene, and then click Blueprint/Add Script in the details panel.

Setting up the variables

For this effect we will need just two variables, as shown in Figure 12 below:

FuelLeft: A float that keeps track of how much fuel is in our fire, ranging from 1 for 100 percent fuel to 0 for 0 percent fuel. The default is set to 1, so the fire starts at full flame.

FuelingRate: A float that dictates how quickly we deplete or refill fuel. For this tutorial, we'll set the default value to -0.1 (-10 percent per second).

When both variables have been created, the variable section of the blueprint should match that of Figure 12.

Figure 12. Fire variables.

Changing fuel left

For this effect, we need to update the amount of fuel left every tick and apply it to the particle system. To do this, we multiply the Fueling Rate by Delta Seconds, add the result to Fuel Left, and store it back in Fuel Left.

To apply Fuel Left to the particle system, we use the Set Float Parameter node. For the target, we use our modified P_Fire particle system component, and for Param we use Fuel Left. The parameter name needs to be the name we used in our particle system, which in this tutorial is FuelLeft.

Figure 13. Modifying fuel left.

Bounding fuel left

Eventually our fire will run out of fuel. In this tutorial, we want to switch to fueling the fire instead of depleting it at that point. To do this, we continue to work on the tick and check whether our new fuel value is too low (less than or equal to -0.1) or too high (greater than or equal to 1.0). The reason we set the low bounds to -0.1 is so that the fire will stay depleted for a bit before refueling. This doesn't cause any problems because any values passed to our particle system below 0 are treated as 0 due to the min input we set up.

If we find that Fuel Left is out of bounds, we multiply the Fueling Rate variable by -1. This flips the sign, so a fire that was depleting starts refueling on subsequent ticks, and vice versa. The same logic appears in the C++ sketch below.

Figure 14. Bounding fuel left.
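
For readers who prefer code to Blueprint graphs, the following is a C++ sketch of the same per-tick logic (the tutorial itself uses Blueprint). AMyCampfire, FireParticles, FuelLeft, and FuelingRate are hypothetical names: FireParticles is assumed to be a UParticleSystemComponent* pointing at the modified P_Fire system, and FuelLeft and FuelingRate are float members defaulting to 1.0 and -0.1.

void AMyCampfire::Tick(float DeltaSeconds)
{
	Super::Tick(DeltaSeconds);

	// Deplete (or refill) the fuel: -0.1 means -10 percent per second.
	FuelLeft += FuelingRate * DeltaSeconds;

	// Feed the fuel level to the particle system. The name must match the
	// parameter name used in the Cascade distributions ("FuelLeft").
	FireParticles->SetFloatParameter(TEXT("FuelLeft"), FuelLeft);

	// Once the fuel has fully burned down (or filled back up), flip the
	// fueling rate so the fire alternates between depleting and refueling.
	if (FuelLeft <= -0.1f || FuelLeft >= 1.0f)
	{
		FuelingRate *= -1.0f;
	}
}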
