Knights Landing Edition

We have finished our latest book project: "Intel® Xeon Phi™ Processor High Performance Programming, Knights Landing Edition," by Jim Jeffers, James Reinders, and Avinash Sodani.

Our book has three sections: I. Knights Landing, II. Parallel Programming, and III. Pearls. It also has an extensive glossary and index to make it easy to jump around the book.

Section I: Knights Landing. Focuses on Knights Landing itself, diving into the architecture, the high-bandwidth memory, the cluster modes, and the integrated fabric.

Chapter 1: Introduction. Introduces many-core programming. Explains why many-core is important, how to measure readiness for many-core, and the importance of performance tuning for both multi-core and many-core. Parallel programming models play a key role. The chapter introduces the dual-tuning advantage of many-core (tuning work that also benefits multi-core), which is validated in Section III of the book.

Chapter 2: Knights Landing Overview. Introduces Knights Landing, a many-core processor that delivers massive thread and data parallelism with high memory bandwidth. Knights Landing is the second generation of Intel® Xeon Phi™ products. It uses a many-core architecture that both benefits from, and relies on, parallel programming. Key innovations such as MCDRAM, cluster modes, and memory modes are explained at a high level.

Chapter 3: Programming MCDRAM and Cluster Modes. Covers the essentials of programming to use the high-bandwidth memory known as MCDRAM and to take advantage of the cluster modes. The memkind library and the numactl utility are discussed.

Chapter 4: Knights Landing Architecture. Dives deeply into the Knights Landing architecture. Describes the tile and core architecture, as well as the cluster modes and memory modes supported by Knights Landing.

Chapter 5: Intel Omni-Path Fabric. Details the next-generation fabric, whose heritage includes the Intel® TrueScale product line and the Cray Aries interconnect. Some versions of Knights Landing integrate this fabric on-package.

Chapter 6: μarch Optimization Advice. Covers tuning advice specific to the Knights Landing design, known as its microarchitecture and abbreviated as μarch. Focuses on advice that arises from how the Knights Landing μarch differs from the Knights Corner μarch (found in the first-generation Intel Xeon Phi products) and from the μarch of recent Intel® Xeon® processors.


Section II: Parallel Programming. Focuses on application programming with consideration for the scale of many-core.

Chapter 7: Programming Overview for Knights Landing. Discusses the keys to effective parallel programming. Getting maximal performance from Knights Landing is largely the same challenge as with any other processor, and at its heart it remains the challenge of parallel programming. The basics of managing parallelism at the domain, thread, data, and locality levels are discussed. The provocative "To Refactor, or Not to Refactor" question is examined.

Chapter 8: Tasks and Threads. Discusses the techniques expected to be most popular for Knights Landing: OpenMP, Fortran 2008, TBB, and MKL. Emerging trends and options are discussed briefly. Because Knights Landing is compatible with Intel Xeon processors, far more is possible than a short chapter can cover.

Chapter 9: Vectorization. Discusses the AVX-512 vector parallel capabilities of Knights Landing and introduces how to use them. This chapter gives the fundamentals, which are the same techniques found in almost any tutorial or reference on vectorization for processors.

Chapter 10: Vectorization Advisor. Introduces the Intel Vectorization Advisor, which provides AVX-512 analysis capabilities to help reach the vectorization potential of Knights Landing. For scalar loops, it helps discover what prevents code from being vectorized. For vectorized loops, it provides detailed AVX-512 performance characterization. Its recommendations are supplemented by the AVX-512 Traits and FLOPs, Masks, Roofline, and Gather/Scatter reports.

Chapter 11: Vectorization with SDLT. Introduces Intel® SIMD Data Layout Templates (SDLT) containers, which can be used in place of std::vector. For C++ code, SDLT can be an effective way to achieve superior performance by increasing vectorization through "AOS to SOA or AOSOA" conversions. This can improve performance on Knights Landing or any other processor. The chapter includes sample code and a discussion of how to use SDLT to transition from Array of Structures (AOS) to Structure of Arrays (SOA) or Array of Structures of Arrays (AOSOA) while maintaining a high-level object-oriented structure.
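To make the AOS-to-SOA idea concrete, here is a minimal plain C++ sketch of the layout change that SDLT is designed to automate. It does not use SDLT itself. The `ParticleAOS` and `ParticlesSOA` types and the `to_soa`, `shift_x`, and `total_mass` functions are hypothetical names chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of Structures (AOS): each particle's fields sit next to each other,
// so a loop that reads only x walks memory with a stride of sizeof(ParticleAOS).
struct ParticleAOS {
    float x, y, z;
    float mass;
};

// Structure of Arrays (SOA): each field is stored contiguously, so a loop over
// one field reads consecutive floats (unit stride), the access pattern that
// SIMD units such as AVX-512 handle best.
struct ParticlesSOA {
    std::vector<float> x, y, z, mass;

    std::size_t size() const { return x.size(); }
};

// Hypothetical helper: copy an AOS container into SOA layout.
ParticlesSOA to_soa(const std::vector<ParticleAOS>& aos) {
    ParticlesSOA soa;
    soa.x.reserve(aos.size());
    soa.y.reserve(aos.size());
    soa.z.reserve(aos.size());
    soa.mass.reserve(aos.size());
    for (const ParticleAOS& p : aos) {
        soa.x.push_back(p.x);
        soa.y.push_back(p.y);
        soa.z.push_back(p.z);
        soa.mass.push_back(p.mass);
    }
    return soa;
}

// Shift every particle along x. Each iteration touches consecutive elements
// of a single array, which makes the loop straightforward to vectorize.
void shift_x(ParticlesSOA& soa, float dx) {
    float* x = soa.x.data();
    const std::size_t n = soa.size();
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += dx;
    }
}

// Sum of all masses; also a unit-stride loop over a single array.
float total_mass(const ParticlesSOA& soa) {
    assert(soa.mass.size() == soa.x.size());
    float sum = 0.0f;
    for (float m : soa.mass) {
        sum += m;
    }
    return sum;
}
```

A hand-written conversion like this gives up the object-oriented, per-particle interface of the AOS version. Keeping that interface while storing data as SOA or AOSOA underneath is what the chapter describes SDLT as providing.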

Chapter 12: Vectorization with AVX-512 Intrinsics. Introduces programming with intrinsics for Intel® Advanced Vector Extensions 512 (AVX-512). Intrinsics let programmers harness the richness of the AVX-512 instructions directly, bypassing limitations of languages and compilers.
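For a feel of what intrinsics programming looks like, here is a minimal sketch that adds two float arrays using AVX-512F intrinsics from the standard `<immintrin.h>` header. It assumes a compiler targeting AVX-512 (for example GCC or Clang with `-mavx512f` or `-march=knl`). `add_arrays` is a hypothetical example function. The `#else` branch is a plain scalar fallback, so the code still builds when AVX-512 is not enabled.

```cpp
#include <cassert>
#include <cstddef>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

// Hypothetical example: c[i] = a[i] + b[i] for i in [0, n).
void add_arrays(const float* a, const float* b, float* c, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && c != nullptr));
#ifdef __AVX512F__
    std::size_t i = 0;
    // Main loop: one 512-bit register holds 16 single-precision floats.
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);  // unaligned load of 16 floats
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(c + i, _mm512_add_ps(va, vb));
    }
    // Remainder (fewer than 16 elements): an AVX-512 mask enables only the
    // lanes that are still in range, so no scalar cleanup loop is needed.
    const std::size_t rem = n - i;
    if (rem > 0) {
        const __mmask16 m = static_cast<__mmask16>((1u << rem) - 1u);
        __m512 va = _mm512_maskz_loadu_ps(m, a + i);  // masked-off lanes read as 0
        __m512 vb = _mm512_maskz_loadu_ps(m, b + i);
        _mm512_mask_storeu_ps(c + i, m, _mm512_add_ps(va, vb));  // writes only enabled lanes
    }
#else
    // Portable scalar fallback when the compiler is not targeting AVX-512.
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
#endif
}
```

The masked remainder is one of the AVX-512 features that intrinsics expose directly. Earlier vector instruction sets typically needed a separate scalar loop for the leftover elements.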

Chapter 13: Performance Libraries. Discusses three libraries from Intel: Intel® Math Kernel Library (MKL), Intel® Data Analytics Acceleration Library (DAAL), and Intel® Integrated Performance Primitives (IPP), collectively referred to as the Intel® Performance Libraries. These libraries provide high-performance versions of important, computationally complex algorithms. Knights Landing can use each of them, and Intel has given these libraries Knights Landing optimizations, including support for AVX-512.

Chapter 14: Profiling and Timing. Discusses the insight available from the event counters built into Knights Landing and how to use those counters with Intel® VTune Amplifier. Also discusses timing, a critical element in evaluating performance.

Chapter 15: MPI. Discusses MPI on Knights Landing, which has the same interfaces as on Intel Xeon processor-based systems. Also discusses why hybrid MPI/OpenMP applications may need tuning, because the optimal balance of MPI ranks and OpenMP threads can vary.

Chapter 16: PGAS Programming Models. Takes a look at Partitioned Global Address Space (PGAS) programming models, which scale across cores and nodes while preserving a shared-memory-like programming model. While Knights Landing will be programmed mostly with MPI, OpenMP, and TBB, PGAS models will become increasingly important in the future. Examples illustrate that PGAS can be an effective programming model for the large number of cores on Knights Landing.

Chapter 17: Software Defined Visualization. Visualizations of large data sets are best done on the processors themselves, in software. This chapter explains why and how, highlighting three open source libraries that are fundamental to SDVis work: OpenSWR, Embree, and OSPRay. These libraries benefit from the SDVis capabilities of Knights Landing.

Chapter 18: Offload to Knights Landing. Covers two topics: the offload programming model, and considerations specific to the Knights Landing coprocessor. The topics are separate but related, so the chapter addresses them both together and individually.

Chapter 19: Power Analysis. Explores the fundamentals of power and performance analysis on Knights Landing using both open source and Intel tools. Because Knights Landing is compatible with Intel Xeon processors, the power measurement techniques covered also apply to server systems based on other Intel processors.


Section III: Pearls. Focuses on parallel programming in full applications, with examples and notes on Knights Landing-specific results and optimizations.

Chapters 20-26: Discuss results for LAMMPS, SeisSol, WRF, N-Body simulations, machine learning, the Trinity mini-applications, and QCD.

The book is available from Elsevier (our publisher) and from Amazon.