<p>Foreword xv</p> <p>Preface xxi</p> <p>Acknowledgments xxiii</p> <p>About the Contributors xxv</p> <p><strong>Chapter 1: OpenACC in a Nutshell 1</strong></p> <p>1.1 OpenACC Syntax 3</p> <p>1.2 Compute Constructs 6</p> <p>1.3 The Data Environment 11</p> <p>1.4 Summary 15</p> <p>1.5 Exercises 15</p> <p><strong>Chapter 2: Loop-Level Parallelism 17</strong></p> <p>2.1 Kernels Versus Parallel Loops 18</p> <p>2.2 Three Levels of Parallelism 21</p> <p>2.3 Other Loop Constructs 24</p> <p>2.4 Summary 30</p> <p>2.5 Exercises 31</p> <p><strong>Chapter 3: Programming Tools for OpenACC 33</strong></p> <p>3.1 Common Characteristics of Architectures 34</p> <p>3.2 Compiling OpenACC Code 35</p> <p>3.3 Performance Analysis of OpenACC Applications 36</p> <p>3.4 Identifying Bugs in OpenACC Programs 51</p> <p>3.5 Summary 53</p> <p>3.6 Exercises 54</p> <p><strong>Chapter 4: Using OpenACC for Your First Program 59</strong></p> <p>4.1 Case Study 59</p> <p>4.2 Creating a Naive Parallel Version 68</p> <p>4.3 Performance of OpenACC Programs 71</p> <p>4.4 An Optimized Parallel Version 73</p> <p>4.5 Summary 78</p> <p>4.6 Exercises 79</p> <p><strong>Chapter 5: Compiling OpenACC 81</strong></p> <p>5.1 The Challenges of Parallelism 82</p> <p>5.2 Restructuring Compilers 88</p> <p>5.3 Compiling OpenACC 92</p> <p>5.4 Summary 97</p> <p>5.5 Exercises 97</p> <p><strong>Chapter 6: Best Programming Practices 101</strong></p> <p>6.1 General Guidelines 102</p> <p>6.2 Maximize On-Device Compute 105</p> <p>6.3 Optimize Data Locality 108</p> <p>6.4 A Representative Example 112</p> <p>6.5 Summary 118</p> <p>6.6 Exercises 119</p> <p><strong>Chapter 7: OpenACC and Performance Portability 121</strong></p> <p>7.1 Challenges 121</p> <p>7.2 Target Architectures 123</p> <p>7.3 OpenACC for Performance Portability 124</p> <p>7.4 Code Refactoring for Performance Portability 126</p> <p>7.5 Summary 132</p> <p>7.6 Exercises 133</p> <p><strong>Chapter 8: Additional Approaches to Parallel Programming 135</strong></p> <p>8.1 Programming Models 135</p>
<p>8.2 Programming Model Components 142</p> <p>8.3 A Case Study 155</p> <p>8.4 Summary 170</p> <p>8.5 Exercises 170</p> <p><strong>Chapter 9: OpenACC and Interoperability 173</strong></p> <p>9.1 Calling Native Device Code from OpenACC 174</p> <p>9.2 Calling OpenACC from Native Device Code 181</p> <p>9.3 Advanced Interoperability Topics 182</p> <p>9.4 Summary 185</p> <p>9.5 Exercises 185</p> <p><strong>Chapter 10: Advanced OpenACC 187</strong></p> <p>10.1 Asynchronous Operations 187</p> <p>10.2 Multidevice Programming 204</p> <p>10.3 Summary 213</p> <p>10.4 Exercises 213</p> <p><strong>Chapter 11: Innovative Research Ideas Using OpenACC, Part I 215</strong></p> <p>11.1 Sunway OpenACC 215</p> <p>11.2 Compiler Transformation of Nested Loops for Accelerators 224</p> <p><strong>Chapter 12: Innovative Research Ideas Using OpenACC, Part II 237</strong></p> <p>12.1 A Framework for Directive-Based High-Performance Reconfigurable Computing 237</p> <p>12.2 Programming Accelerated Clusters Using XcalableACC 253</p> <p>Index 269</p>