January 29, Tuesday
12:00 – 13:00

Revisiting Data Dependencies and their Applicability to Parallelization of Modern Software
Computer Science seminar
Lecturer: Omer Tripp
Affiliation: School of Computer Science, Tel Aviv University
Location: 202/37
Host: Dr. Danny Hendler
Parallelizing compilers are known to work well for scientific computations, such as matrix and vector manipulations. Modern software, however, has long moved on. The current landscape of applications written in high-level languages, like Java and C#, introduces new and important challenges: the compiler needs to handle multiple levels of abstraction, extensive pointer-based computations over dynamic memory, unbounded data structures, and deployment-specific program behaviors.

Data dependence analysis, which tracks whether statements in the sequential code depend on each other due to interfering accesses to a shared memory location, has been the foundation of parallelizing compilers. The fundamental observation, dating back at least 30 years, is that two code blocks with no (transitive) data dependencies between them can be executed in parallel, resulting in the same final state as executing them sequentially. Because of the challenges listed above, parallelizing compilers can prove this precondition only in rare cases: the candidate code blocks are often dependent, and even when they are not, the compiler's (static) dependence analysis is typically too conservative to prove independence, failing due to spurious dependencies.

I will propose a new view of data dependencies, showing that they remain a useful tool for reasoning about parallelism, albeit in different ways and in a different form. I will show how data dependencies can be lifted to higher levels of abstraction, per the abstractions governing the structure of the application, as well as how effective use can be made of partial yet precise dependencies to guide the behavior of a parallelization system while preserving its correctness (i.e., its serializability guarantees). This can be done in more than one way, including (i) building specialized, client-specific conflict-detection oracles, (ii) synthesizing concurrency monitors that predict the available parallelism per input data and/or computation phase, and (iii) finding true, semantic dependencies that limit parallelism.
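
The fundamental observation can be made concrete with a minimal, illustrative Java sketch (not taken from the talk itself): two blocks that write disjoint memory locations can run in parallel and still reach the same final state as sequential execution, while a block that reads one of those locations is flow-dependent and cannot be reordered.

    import java.util.concurrent.CompletableFuture;

    public class IndependentBlocks {
        public static void main(String[] args) {
            int[] a = new int[1000];
            int[] b = new int[1000];

            // Block 1 writes only a[]; Block 2 writes only b[].
            // With no (transitive) data dependence between them, parallel
            // execution yields the same final state as running them one
            // after the other.
            CompletableFuture<Void> f1 = CompletableFuture.runAsync(() -> {
                for (int i = 0; i < a.length; i++) a[i] = i * i;
            });
            CompletableFuture<Void> f2 = CompletableFuture.runAsync(() -> {
                for (int i = 0; i < b.length; i++) b[i] = 2 * i;
            });
            CompletableFuture.allOf(f1, f2).join();

            // This block reads a[], so it is flow-dependent on Block 1
            // and must execute after it.
            long sum = 0;
            for (int x : a) sum += x;
            System.out.println(sum);
        }
    }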
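
The idea behind item (i), lifting dependencies to the application's level of abstraction, can be sketched as follows (the MapPut record and conflict method are hypothetical names used purely for illustration). At the memory level, two put() calls on the same hash map may both touch its internal arrays and thus appear dependent; at the level of the Map abstraction, they commute whenever their keys differ.

    import java.util.Objects;

    public class SemanticOracle {
        // Hypothetical record of a logged map operation: which map
        // instance it targets and which key it writes.
        record MapPut(Object map, Object key) {}

        // Client-specific conflict oracle: two puts conflict only if they
        // target the same map AND the same key. A memory-level analysis,
        // by contrast, would flag any two puts on the same map as
        // conflicting, since both may rehash the internal table.
        static boolean conflict(MapPut p1, MapPut p2) {
            return p1.map() == p2.map() && Objects.equals(p1.key(), p2.key());
        }

        public static void main(String[] args) {
            Object sharedMap = new Object(); // stands in for a shared map
            MapPut a = new MapPut(sharedMap, "x");
            MapPut b = new MapPut(sharedMap, "y");
            System.out.println(conflict(a, b)); // false: the puts commute
        }
    }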