Shared Memory


 For programmers familiar with a distributed shared-memory paradigm, the CDS primitives can be used to mimic a similar style, specifically one which exhibits release consistency and where each shared memory region has a home process(or) to which it returns when not being accessed.  This very portable approach is detailed further here, primarily as a stepping stone for translating existing shared-memory programs to CDS.  Once converted, these programs will often benefit by broadening their use of other available CDS primitives.  In terms of shared-memory terminology, such broadening can be considered as using multiple locks (and thereby home locations) for a single shared region and/or a single lock to cover multiple "versions" of a shared region, thereby obviating some amount of superfluous communication and/or providing greater potential concurrency.

Logical Model

A cell in CDS can serve the same role as a (write) lock in shared-memory programming.  A region can be considered accessible to any process while it resides in a cell (just as it would be in a shared-memory paradigm if the lock on the region were in a released, or "unlocked", state), and protected from access (by processes other than its current accessor) while it is not in a cell (just as though the lock had been acquired, or "locked", in a shared-memory paradigm).  In this correspondence, the write and deq primitives play the roles of the primitives that release and acquire more traditional locks.  (The fact that write and deq differ from those traditional primitives in taking and delivering, respectively, a pointer to the locked region actually makes very little difference in the way these primitives are used.)

This exclusive-access paradigm can easily be extended into one allowing multiple readers but only a single writer through judicious use of the read, deq, rgmod, and rgfree primitives.  Specifically, a reader performs a read on the lock (i.e. cell) to access the region, thereby leaving the region available for other readers and writers, and performs an rgfree when finished, while a writer performs a deq on the lock (cell) and an rgmod of the region, and when finished, performs a write back to the lock (cell).  Since each process logically gets its own copy of the region, this protocol ensures that a reader does not logically interfere with any other process, but once a writer has taken the lock, no other readers or writers will successfully obtain the lock until the writer is done.  (See the Physical Model section, below, for a discussion on how to optimize this logical model in various ways based on architectural and algorithmic properties.)

To facilitate the use of cells in this way, CDS provides mnemonic locking routines that are semantically equivalent to pre-existing CDS primitives:  acqwl corresponds to deq, rlswl to write, acqrl to read, and rlsrl to rgfree.

acqrl and acqwl are also available in non-blocking versions, iacqrl and iacqwl.

The use of these routines is illustrated below.

To allocate a region (from the local comm heap) and assign it to a lock (i.e. a comm cell) on process proc in context cntxt:

cds_rgalloc(&rgid,regionsize);
(Initialize the region, if desired)
cds_rlswl(rgid,proc,cntxt,lock);

To block until a write lock to the region is acquired:

cds_acqwl(&rgid,proc,cntxt,lock,CDS_BLOCK,0);

which translates into

cds_deq(&rgid,proc,cntxt,lock,CDS_PWRIT,CDS_BLOCK);

Note that acqwl takes an extra last argument.  If it is non-zero, it is used as the waitflg argument on an explicit rgmod.  That is, if the last argument above had been 1 rather than 0, the call would have translated into:

cds_deq(&rgid,proc,cntxt,lock,CDS_PREAD,CDS_BLOCK);
cds_rgmod(rgid,1);

(The value of this extra argument has no logical effect on the behavior of the program; the potential efficiency differences are discussed under the Physical Model of the basic primitives.)

To release a write lock to the region:

cds_rlswl(rgid,proc,cntxt,lock);

To block until a read lock to the region is acquired:

cds_acqrl(&rgid,proc,cntxt,lock,CDS_BLOCK);

To release a read lock to the region:

cds_rlsrl(rgid,proc,cntxt,lock);

(Note that all arguments to rlsrl except the first are ignored, since rgfree takes only the region id; the other arguments are present only to provide symmetry with rlswl.)

To convert a write lock to a read lock:

cds_wl2rl(rgid,proc,cntxt,lock);

Physical Model

Using these operations on a true shared-memory architecture, acquiring a read lock should never require any copying (since writers, by construction, cannot exist), but acquiring a write lock may require a copy if there are readers present.  The use of the last argument on acqwl may diminish the occurrence of such copying slightly in certain circumstances, but the true solution is probably a change in perception rather than in implementation:  A copy is not a bad thing if it improves the performance of the program by decreasing idle time, and that is exactly what the copy performed by acqwl will accomplish.

Certainly, the "lock" (i.e. cell) used for a logical shared memory region should be in or near the process or processes where that region will be accessed most frequently.  In fact, by using multiple "locks" (i.e. cells), located in the different processes where the region will be used, and always "unlocking" the region in the process where it will be used next, the region will effectively be predictively forwarded to its next destination, helping to hide latency (if any).  (This approach is not always possible, depending upon the nature of the algorithm being programmed.)

The more general form of cells, which may contain multiple regions, can (in some cases) be regarded as simply queuing multiple versions of a shared region (and therefore queuing the lock).  Again, the utility of such an interpretation will depend upon the algorithm being programmed.  In any case, the basic primitives (rather than these shared-memory mnemonics) should probably be used whenever dealing with cells that might contain multiple regions.


Copyright 2000 © elepar   All rights reserved