

ATAL_FDP_HPC_S6_OpenMP
Presentation
Pushpendra Pateriya
49 Slides • 5 Questions
1
Parallel Programming with OpenMP
Pushpendra Kumar Pateriya
Assistant Professor, Head of System Programming
School of Computer Science and Technology,
Lovely Professional University
2
Multiple Choice
Why is parallel programming needed in modern computing?
To replace all sequential programs
3
What's the need of parallel architectures and parallel programming
4
Moore's Law.
The chart tracks the progress of transistor integration for Intel's devices from 1960 to 2010.
5
Computer Architecture and the Power Wall
6
Partial solution: simple low power cores
7
Power calculation for analysis
8
Power consumption Analysis
9
Microprocessor trends
10
Concurrency vs. Parallelism
Concurrency: A condition of a system in which multiple tasks are logically active at one time.
Parallelism: A condition of a system in which multiple tasks are actually active at one time.
11
Concurrency vs. Parallelism
12
Multiple Choice
What is the key difference between parallel and concurrent execution?
Parallel execution happens simultaneously, while concurrent execution involves task switching.
Concurrent execution requires multiple processors.
13
The Parallel programming process:
14
OpenMP Overview
Definition: OpenMP (Open Multi-Processing) is an API for parallel programming in C, C++, and Fortran.
Purpose: Enables shared-memory multiprocessing on multi-core and multi-threaded processors.
Components:
Compiler directives (#pragma omp ...)
Runtime library routines
Environment variables
Advantages:
Simplicity: Uses directives instead of rewriting code
Scalability: Works on multi-core architectures
Portability: Supported by most modern compilers
15
OpenMP Limitations
Works only for shared-memory architectures
Manual tuning required for performance optimization
Not suitable for all types of parallelism (e.g., distributed memory systems)
16
OpenMP Solution Stack
17
Single and Multithreaded Process
18
Programming shared memory computers
19
Programming shared memory computers
20
Multiple Choice
Which of the following statements is true regarding the stack, heap, text, and data segments in the context of threads?
21
A shared memory program
An instance of a program:
One process, multiple threads sharing a common address space.
Threads interact via shared memory using reads/writes.
OS scheduler interleaves thread execution for fairness.
Synchronization ensures correctness across all execution orders.
22
Exercise 1, Part A: Hello world
What will be the output of the following code?
#include <stdio.h>

int main()
{
    int ID = 0;
    printf(" hello(%d) ", ID);
    printf(" world(%d) \n", ID);
    return 0;
}
23
Open Ended
What will be the output of the following C program?
#include <stdio.h>

int main()
{
    int ID = 0;
    printf(" hello(%d) ", ID);
    printf(" world(%d) \n", ID);
    return 0;
}
24
Modified Code
25
A multi-threaded “Hello world” program
26
Shared memory Computers
Symmetric Multiprocessor (SMP)
All processors have equal access time to the shared memory.
The operating system treats all processors identically.
Example: Intel Xeon-based servers with multiple CPUs sharing the same memory.
Non-Uniform Memory Access (NUMA)
Memory access time varies depending on the location of the data.
Memory is divided into "Near" and "Far" regions, where access to local (near) memory is faster than remote (far) memory.
Example: AMD EPYC processors with multiple interconnected memory regions.
27
OpenMP Programming Model
Fork-Join Parallelism:
Master thread spawns a team of threads as needed.
Parallelism added incrementally until performance goals are met: i.e. the sequential program evolves into a parallel program.
28
Thread Creation: Parallel Regions
For example, to create a 4-thread parallel region:
29
Thread Creation: Parallel Regions
30
Exercise
31
Serial pi program
32
Write a parallel program
Create a parallel version of the pi program using a parallel construct.
Pay close attention to shared versus private variables.
In addition to a parallel construct, you will need the runtime library routines
int omp_get_num_threads(); // Number of threads in the team
int omp_get_thread_num();  // Thread ID or rank
double omp_get_wtime();    // Time in seconds since a fixed point in the past
33
A simple Parallel pi program
34
Results
35
False sharing
36
Eliminate False sharing by padding the sum array
37
Results: pi program padded accumulator
38
Do we really need to pad our arrays?
Padding arrays requires a good understanding of how the cache works. If you move to a machine with a different cache-line size, your program's performance may drop significantly.
39
How do threads interact?
OpenMP is a multi-threading, shared address model.
Threads communicate by sharing variables.
Unintended sharing of data causes race conditions:
race condition: when the program’s outcome changes as the threads are scheduled differently.
To control race conditions:
Use synchronization to protect data conflicts.
Synchronization is expensive so:
Change how data is accessed to minimize the need for synchronization.
40
Synchronization
Synchronization: bringing one or more threads to a well-defined and known point in their execution.
The two most common forms of synchronization are barrier and mutual exclusion.
41
Synchronization
High level synchronization:
critical
atomic
barrier
ordered
Low level synchronization:
flush
locks (both simple and nested)
42
Synchronization: Barrier
Barrier: Each thread waits until all threads arrive.
43
Synchronization: critical
Mutual exclusion: Only one thread at a time can enter a critical region.
44
Synchronization: Atomic
Atomic provides mutual exclusion, but only applies to the update of a memory location (the update of X in the following example).
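A sketch of the idiom: the bulk of each iteration's work stays private, and only the update of the shared variable x is protected:

```c
#include <omp.h>

/* Sum i = 0..n-1 into a shared accumulator. Only the update of x
   is atomic, so threads serialize for just that one operation. */
double atomic_sum(int n)
{
    double x = 0.0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double tmp = (double)i;   /* stand-in for expensive private work */
        #pragma omp atomic
        x += tmp;                 /* mutual exclusion on this update only */
    }
    return x;
}
```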
45
Pi program with false sharing
The original serial pi program with 100,000,000 steps ran in 1.83 seconds.
46
Using a critical section to remove impact of false sharing
47
Using a critical section to remove impact of false sharing
48
Open Ended
Any Question till now?
49
The loop worksharing Constructs
The loop worksharing construct splits up loop iterations among the threads in a team
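A minimal sketch of the construct; the vector addition is an illustrative workload:

```c
#include <omp.h>

#define N 1000

/* The omp for construct splits the N iterations of the loop
   among the threads of the enclosing parallel team. */
void vector_add(double *a, const double *b)
{
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < N; i++)
            a[i] = a[i] + b[i];
    }
}
```

The combined directive #pragma omp parallel for expresses the same thing on one line.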
50
Loop worksharing Constructs
51
Both are equivalent
Put the “parallel” and the worksharing directive on the same line
52
Nested loops
collapse clause
Will form a single loop of length NxM and then parallelize that.
Useful if N is O(no. of threads) so parallelizing the outer loop makes balancing the load difficult.
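A sketch with assumed trip counts, where the outer loop alone is too short to balance across a team:

```c
#include <omp.h>

#define ROWS 4        /* outer trip count ~ O(no. of threads) */
#define COLS 256

static double c[ROWS][COLS];

/* collapse(2) fuses both loops into a single iteration space of
   ROWS*COLS iterations, which is then divided among the threads --
   far easier to load-balance than the 4-iteration outer loop alone. */
void fill_matrix(void)
{
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            c[i][j] = i * COLS + j;
}
```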
53
Reduction
OpenMP reduction clause: reduction (op : list)
Inside a parallel or a work-sharing construct:
A local copy of each list variable is made and initialized depending on the “op” (e.g. 0 for “+”).
Updates occur on the local copy.
Local copies are reduced into a single value and combined with the original global value.
The variables in “list” must be shared in the enclosing parallel region.
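The pi loop can be written with a reduction, matching the rules above: each thread gets a private copy of sum initialized to 0 (the identity for "+"), updates it locally, and the partials are combined into the shared sum when the construct ends. A sketch:

```c
#include <omp.h>

double pi_reduction(long num_steps)
{
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;                 /* shared in the enclosing region */
    /* reduction(+:sum): per-thread private copies start at 0 and are
       combined into the shared sum at the end of the construct. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```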
54
-- Pushpendra Kumar Pateriya
"That brings us to the end of today's session. We explored OpenMP"
"Thank you everyone"