Multi-Threading Questions


What is the difference between a thread and a process?

heavyweight vs lightweight

Processes are typically used to execute "heavyweight" jobs such as running different applications, while threads are typically used to execute smaller "lightweight" jobs such as auto-saving within an application.

Creation

New threads are easily created, whereas new processes require duplication of the parent process.

Creating a new process can be expensive. It takes time (a call into the operating system is needed, and if process creation triggers rescheduling activity, the operating system's context-switching mechanism becomes involved). It takes memory (the entire process image must be replicated).

Threads can be created without replicating an entire process.

Sharing Resources

Independent processes share nothing. Each process runs in a separate address space. Changes made in the parent process do not affect its child processes.

Threads share process resources such as global variables and file descriptors. If one thread changes the value of any such resource, the change is visible to every other thread in the process, if anyone cares to look. The sharing of process resources among threads is one of the multithreaded programming model's major performance advantages, as well as one of its most difficult programming aspects. Having all of this context available to all threads in the same memory facilitates communication between threads. At the same time, it makes it easy to introduce errors in which one thread changes a variable used by another thread in ways the other thread did not expect.

Thus switching between threads is much simpler and faster than switching between processes.

Communication

Threads can directly communicate with the other threads of their process; processes must use inter-process communication mechanisms to communicate with sibling processes.

Multiple processes can use any of the many UNIX Interprocess Communication (IPC) mechanisms: sockets, shared memory, and messages, to name a few. A multiprocess program might use shared memory, but the other methods are equally valid. Even a waitpid call can be used to exchange information, if the program checks its return value. However, in the multiprocess world, all types of IPC involve a call into the operating system—to initialize shared memory or a message structure, for instance. This makes communication between processes more expensive than communication between threads. Add to this the cost of synchronizing shared data, which may also involve calls into the operating system kernel.

When processes synchronize, they usually have to issue system calls, a relatively expensive operation that involves trapping into the kernel. But threads can synchronize by simply monitoring a variable—in other words, staying within the user address space of the program.

Similarities

A thread can do anything a process can do. Both can reproduce (processes create child processes; threads create more threads), and both have an ID, a priority, state, and so on.

primary benefits to multithreading

http://www.quora.com/What-is-the-difference-between-a-process-and-a-thread

There are four primary benefits to multithreading:

  • Programming abstraction. Dividing up work and assigning each division to a unit of execution (a thread) is a natural approach to many problems. Programming patterns that utilize this approach include the reactor, thread-per-connection, and thread pool patterns. Some, however, view threads as an anti-pattern. The inimitable Alan Cox summed this up well with the quote, "threads are for people who can't program state machines."
  • Parallelism. In machines with multiple processors, threads provide an efficient way to achieve true parallelism. As each thread receives its own virtualized processor and is an independently-schedulable entity, multiple threads may run on multiple processors at the same time, improving a system's throughput. To the extent that threads are used to achieve parallelism—that is, there are no more threads than processors—the "threads are for people who can't program state machines" quote does not apply.
  • Blocking I/O. Without threads, blocking I/O halts the whole process. This can be detrimental to both throughput and latency. In a multithreaded process, individual threads may block, waiting on I/O, while other threads make forward progress. Asynchronous & non-blocking I/O are alternative solutions to threads for this issue.
  • Memory savings. Threads provide an efficient way to share memory yet utilize multiple units of execution. In this manner they are an alternative to multiple processes.

The cost of these benefits is increased complexity: concurrency must be managed through mechanisms such as mutexes and condition variables. Given the growing trend toward processors sporting multiple cores and systems sporting multiple processors, threading is only going to become a more important tool in system programming.

Advantages and Disadvantages

Thread Advantages

  • Threads are memory efficient. Many threads can be efficiently contained within a single EXE, while each process incurs the overhead of an entire EXE.
  • Threads share a common program space, which among other things, means that messages can be passed by queuing only a pointer to the message. Since processes do not share a common program space, the kernel must either copy the entire message from process A's program space to process B's program space - a tremendous disadvantage for large messages, or provide some mechanism by which process B can access the message.
  • Thread task switching time is faster, since a thread has less context to save than a process.
  • With threads the kernel is linked in with the user code to create a single EXE. This means that all the kernel data structures like the ready queue are available for viewing with a debugger. This is not the case with a process, since the process is an autonomous application and the kernel is separate, which makes for a less flexible environment.

Thread Disadvantages

  • Threads are typically not loadable. That is, to add a new thread, you must add the new thread to the source code, then compile and link to create the new executable. Processes are loadable, thus allowing a multi-tasking system to be characterized dynamically. For example, depending upon system conditions, certain processes can be loaded and run to characterize the system. However, the same can be accomplished with threads by linking in all the possible threads required by the system, but only activating those that are needed, given the conditions. The really big advantage of loadability is that the process concept allows processes (applications) to be developed by different companies and offered as tools to be loaded and used by others in their multi-tasking applications.
  • Threads can walk over the data space of other threads. This cannot happen with processes: if a process attempts to walk over another process's memory, an exception error occurs.

What synchronization primitives do you know? Tell the difference between them

  1. Mutex
  2. Join
  3. Condition variable
  4. Barriers
  5. Spinlock
  6. Semaphore

What is the best way to terminate a thread?

The best way is to ask the thread to terminate itself: set a flag (or enqueue a message) that the thread checks at safe points, let it release its resources, and then join it. You shouldn't use TerminateThread-esque functions (TerminateThread on Windows, or asynchronous pthread_cancel) because they can kill a thread at an arbitrary instruction: any locks it holds stay locked forever, and any data it was part-way through updating is left in an inconsistent state.
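
A minimal sketch of cooperative termination with pthreads and a C11 atomic flag (the work loop and sleep interval are placeholders):

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

static atomic_bool stop_requested;

static void *worker(void *arg) {
    (void)arg;
    while (!atomic_load(&stop_requested)) {
        /* do one bounded unit of work, then re-check the flag */
        usleep(1000);
    }
    /* release resources here, then exit normally */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    sleep(1);
    atomic_store(&stop_requested, true);  /* ask the thread to stop */
    pthread_join(t, NULL);                /* wait for a clean exit */
    return 0;
}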

thread-safe producer/consumer

Write a thread-safe producer/consumer buffer that can be accessed by one or more producer/consumers
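
One standard answer is a bounded ring buffer guarded by one mutex and two condition variables (not_full for producers, not_empty for consumers). A sketch in C with pthreads; the buffer size and int payload are arbitrary choices:

#include <pthread.h>

#define BUF_SIZE 8

typedef struct {
    int items[BUF_SIZE];
    int count, head, tail;
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
} bounded_buffer;

void buffer_init(bounded_buffer *b) {
    b->count = b->head = b->tail = 0;
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->not_full, NULL);
    pthread_cond_init(&b->not_empty, NULL);
}

void buffer_put(bounded_buffer *b, int item) {
    pthread_mutex_lock(&b->lock);
    while (b->count == BUF_SIZE)              /* wait while full */
        pthread_cond_wait(&b->not_full, &b->lock);
    b->items[b->tail] = item;
    b->tail = (b->tail + 1) % BUF_SIZE;
    b->count++;
    pthread_cond_signal(&b->not_empty);       /* wake one consumer */
    pthread_mutex_unlock(&b->lock);
}

int buffer_get(bounded_buffer *b) {
    pthread_mutex_lock(&b->lock);
    while (b->count == 0)                     /* wait while empty */
        pthread_cond_wait(&b->not_empty, &b->lock);
    int item = b->items[b->head];
    b->head = (b->head + 1) % BUF_SIZE;
    b->count--;
    pthread_cond_signal(&b->not_full);        /* wake one producer */
    pthread_mutex_unlock(&b->lock);
    return item;
}

Any number of producers and consumers can share this buffer safely: every field access happens under the lock, and the while loops re-check the predicate after each wakeup.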

When might you choose to use threads on a single CPU system?

For I/O-bound work: reads/writes, buffering, and similarly time-consuming operations can block in one thread while other threads keep the CPU busy.

How would you measure the context switch overhead between threads?
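
One common technique: force two threads to alternate strictly (here by ping-ponging a byte over two pipes), time a large number of round trips, and divide. Each round trip costs two thread switches plus pipe read/write overhead, so the figure is an upper bound; pinning both threads to one core (e.g. with taskset) makes the switches unavoidable. A sketch:

#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ROUNDS 100000
static int ab[2], ba[2];   /* pipe A->B and pipe B->A */

static void *echo(void *arg) {
    char c;
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        read(ab[0], &c, 1);    /* block until the other thread writes */
        write(ba[1], &c, 1);   /* hand the byte back */
    }
    return NULL;
}

int main(void) {
    pipe(ab); pipe(ba);
    pthread_t t;
    pthread_create(&t, NULL, echo, NULL);

    struct timespec t0, t1;
    char c = 'x';
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ROUNDS; i++) {
        write(ab[1], &c, 1);
        read(ba[0], &c, 1);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_join(t, NULL);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("~%.0f ns per switch (upper bound)\n", ns / (2.0 * ROUNDS));
    return 0;
}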

MT-safe hash table

How would you make a MT-safe hash table, while allowing for maximal concurrency?
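
One common approach is lock striping: give each bucket its own mutex so operations on different buckets never contend. A sketch in C; the bucket count, hash function, and separate-chaining layout are arbitrary choices. For still more concurrency, each bucket lock could be a read/write lock, or the table could use lock-free techniques such as RCU:

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 256   /* one mutex per bucket */

typedef struct node { char *key; int value; struct node *next; } node;

typedef struct {
    node *buckets[NBUCKETS];
    pthread_mutex_t locks[NBUCKETS];
} ht;

static unsigned hash(const char *k) {
    unsigned h = 5381;
    while (*k) h = h * 33 + (unsigned char)*k++;
    return h % NBUCKETS;
}

void ht_init(ht *t) {
    for (int i = 0; i < NBUCKETS; i++) {
        t->buckets[i] = NULL;
        pthread_mutex_init(&t->locks[i], NULL);
    }
}

void ht_put(ht *t, const char *key, int value) {
    unsigned b = hash(key);
    pthread_mutex_lock(&t->locks[b]);           /* lock only this bucket */
    for (node *n = t->buckets[b]; n; n = n->next) {
        if (strcmp(n->key, key) == 0) {
            n->value = value;                   /* update in place */
            pthread_mutex_unlock(&t->locks[b]);
            return;
        }
    }
    node *n = malloc(sizeof *n);                /* insert at bucket head */
    n->key = strdup(key);
    n->value = value;
    n->next = t->buckets[b];
    t->buckets[b] = n;
    pthread_mutex_unlock(&t->locks[b]);
}

int ht_get(ht *t, const char *key, int *value) {
    unsigned b = hash(key);
    int found = 0;
    pthread_mutex_lock(&t->locks[b]);
    for (node *n = t->buckets[b]; n; n = n->next)
        if (strcmp(n->key, key) == 0) { *value = n->value; found = 1; break; }
    pthread_mutex_unlock(&t->locks[b]);
    return found;
}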

write thread code

What can go wrong when you write thread code and how can you guard against them?

Basically, the big three of threading problems are deadlock, races and starvation.

The simplest deadlock condition is when there are two threads and thread A can't progress until thread B finishes, while thread B can't progress until thread A finishes. This is usually because both need the same two resources to progress, A has one and B has the other. Various symmetry breaking algorithms can prevent this in the two thread or larger circle cases.

Races happen when one thread changes the state of some resource when another thread is not expecting it (such as changing the contents of a memory location when another thread is part way through reading, or writing to that memory). Locking methods are the key here. (Some lock free methods and containers are also good choices for this. As are atomic operations, or transaction based operations.)

Starvation happens when a thread needs a resource to proceed, but can't get it. The resource is constantly tied up by other threads and the one that needs it can't get in. The scheduling algorithm is the problem when this happens. Look at scheduling and lock-acquisition algorithms that guarantee fairness, such as FIFO queuing of waiters.

What are atomic operations and why do they matter?

Atomic operations are operations that execute indivisibly: no other thread can observe them half-done or interleave with them. Functionally, they are equivalent to operations that happen in a single clock tick on a simple non-pipelined processor. In reality, most take more than a tick but are protected while they execute. They are immune to race conditions, and that makes them useful.
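
For example, with C11 atomics two threads can increment a shared counter without a mutex and without losing updates; a plain int here could lose updates:

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

atomic_int counter;   /* a plain `int counter` here would race */

void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++)
        atomic_fetch_add(&counter, 1);   /* indivisible read-modify-write */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, bump, NULL);
    pthread_create(&b, NULL, bump, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("%d\n", counter);   /* always 200000; a plain int could print less */
    return 0;
}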

Name three thread design patterns

  1. Thread pool
  2. Peer
  3. Pipeline

Explain how a thread pool works

One thread (the boss, or master) dispatches work to other threads, which are usually part of a pre-allocated worker thread pool. Although threads are lightweight, they still incur overhead when they are created, which pre-allocating the pool avoids.

If the boss thread becomes a worker thread once it's done starting other worker threads then this is a Peer Thread Design Pattern.
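
A minimal sketch of such a boss/worker pool in C with pthreads; the queue is a fixed-size ring, and full-queue handling and shutdown are omitted for brevity:

#include <pthread.h>

#define NWORKERS 4
#define QSIZE    64

typedef void (*task_fn)(void *);
typedef struct { task_fn fn; void *arg; } task;

static task queue[QSIZE];
static int qhead, qtail, qcount;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t qcond = PTHREAD_COND_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (qcount == 0)
            pthread_cond_wait(&qcond, &qlock);  /* sleep until work arrives */
        task t = queue[qhead];
        qhead = (qhead + 1) % QSIZE;
        qcount--;
        pthread_mutex_unlock(&qlock);
        t.fn(t.arg);                            /* run the task unlocked */
    }
    return NULL;
}

void pool_start(void) {
    for (int i = 0; i < NWORKERS; i++) {
        pthread_t tid;
        pthread_create(&tid, NULL, worker, NULL);
    }
}

void pool_submit(task_fn fn, void *arg) {       /* called by the boss thread */
    pthread_mutex_lock(&qlock);
    queue[qtail] = (task){ fn, arg };           /* assumes the queue isn't full */
    qtail = (qtail + 1) % QSIZE;
    qcount++;
    pthread_cond_signal(&qcond);
    pthread_mutex_unlock(&qlock);
}

The workers sleep on the condition variable until work arrives, so an idle pool costs almost nothing.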

Define: critical section

A critical section is a piece of code that accesses a shared resource (data structure or device) that must not be concurrently accessed by more than one thread of execution. A critical section should terminate in bounded time, and a thread, task or process should have to wait only a bounded time to enter it (aka bounded waiting). Some synchronization mechanism is required at the entry and exit of the critical section to ensure exclusive use, for example a semaphore.

What are four mutex types?

  • Recursive: allows a thread holding the lock to acquire the same lock again which may be necessary for recursive algorithms.
  • Queuing: allows for fairness in lock acquisition by providing FIFO ordering to the arrival of lock requests. Such mutexes may be slower due to increased overhead and the possibility of having to wake threads next in line that may be sleeping.
  • Reader/Writer: allows for multiple readers to acquire the lock simultaneously. If existing readers have the lock, a writer request on the lock will block until all readers have given up the lock. This can lead to writer starvation.
  • Scoped: RAII-style semantics regarding lock acquisition and unlocking.

Define: deadlock and Banker's Algorithm

Two (or more) threads have stopped execution or are spinning permanently. For example, a simple deadlock situation: thread 1 locks lock A, thread 2 locks lock B, thread 1 wants lock B and thread 2 wants lock A.

Banker's Algorithm

  • Resources

    For the Banker's algorithm to work, it needs to know three things:

    • How much of each resource each process could possibly request [CLAIMS]
    • How much of each resource each process is currently holding [ALLOCATED]
    • How much of each resource the system currently has available [AVAILABLE]

    Resources may be allocated to a process only if it satisfies the following conditions:

    • request ≤ max, else set error condition as process has crossed maximum claim made by it.
    • request ≤ available, else process waits until resources are available.

    Let n be the number of processes in the system and m be the number of resource types. Then we need the following data structures:

    • Available: A vector of length m indicates the number of available resources of each type. If Available[j] = k, there are k instances of resource type Rj available.
    • Max: An n×m matrix defines the maximum demand of each process. If Max[i,j] = k, then Pi may request at most k instances of resource type Rj.
    • Allocation: An n×m matrix defines the number of resources of each type currently allocated to each process. If Allocation[i,j] = k, then process Pi is currently allocated k instances of resource type Rj.
    • Need: An n×m matrix indicates the remaining resource need of each process. If Need[i,j] = k, then Pi may need k more instances of resource type Rj to complete the task.

    Note: Need[i,j] = Max[i,j] - Allocation[i,j].

  • Safe and Unsafe States

    Assuming each process eventually returns all the resources it holds, the algorithm determines whether a state is safe by trying to find a hypothetical sequence of requests by the processes that would allow each to acquire its maximum resources and then terminate (returning its resources to the system). Any state where no such sequence exists is an unsafe state. (A sketch of this safety check appears after the limitations below.)

  • Limitations
    1. It needs to know in advance how much of each resource a process could possibly request. In most systems, this information is unavailable, making it impossible to implement the Banker's algorithm.
    2. Also, it is unrealistic to assume that the number of processes is static since in most systems the number of processes varies dynamically.
    3. Moreover, the requirement that a process will eventually release all its resources (when the process terminates) is sufficient for the correctness of the algorithm, however it is not sufficient for a practical system. Waiting for hours (or even days) for resources to be released is usually not acceptable.
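
A sketch of the safety check in C (N and M are example sizes; Need is computed inline as Max - Allocation):

#define N 5  /* processes */
#define M 3  /* resource types */

/* Returns 1 if the state is safe: repeatedly look for a process whose
 * remaining need can be met from work, pretend it runs to completion and
 * releases its allocation, and repeat until all finish (safe) or no
 * candidate exists (unsafe). */
int is_safe(int available[M], int max[N][M], int alloc[N][M]) {
    int work[M], finished[N] = {0};
    for (int j = 0; j < M; j++) work[j] = available[j];

    for (int done = 0; done < N; ) {
        int found = 0;
        for (int i = 0; i < N; i++) {
            if (finished[i]) continue;
            int ok = 1;
            for (int j = 0; j < M; j++)
                if (max[i][j] - alloc[i][j] > work[j]) { ok = 0; break; }
            if (ok) {                 /* process i can finish with current work */
                for (int j = 0; j < M; j++) work[j] += alloc[i][j];
                finished[i] = 1;
                found = 1;
                done++;
            }
        }
        if (!found) return 0;         /* no process can finish: unsafe */
    }
    return 1;
}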

How can you prevent deadlock from occurring?

You can prevent deadlocks by making sure threads acquire locks in an agreed order (i.e. preservation of lock ordering), as sketched below. Deadlock can also happen if threads fail to unlock mutexes properly.
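
A minimal sketch of lock ordering with pthreads: both threads agree to take lock A before lock B, so a cycle of waiters can never form:

#include <pthread.h>

pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

void *thread1(void *arg) {
    pthread_mutex_lock(&lock_a);   /* always A first... */
    pthread_mutex_lock(&lock_b);   /* ...then B */
    /* use both resources */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return arg;
}

void *thread2(void *arg) {
    pthread_mutex_lock(&lock_a);   /* same order, even though this thread */
    pthread_mutex_lock(&lock_b);   /* mainly wants B */
    /* use both resources */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return arg;
}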

Define: race condition

A race condition is when non-deterministic behavior results from threads accessing shared data or resources without following a defined synchronization protocol for serializing such access.

Put another way: when a piece of code is shared between two threads and its output differs depending on the order in which the two threads execute it, that is a race condition.

How can you prevent race conditions from occurring?

Take steps within your program to enforce serial access to shared data, file descriptors, and other external resources.

Define: priority inversion

A higher priority thread can wait behind a lower priority thread if the lower priority thread holds a lock for which the higher priority thread is waiting.

Define: Condition Variable

Condition variables allow threads to synchronize on the value of shared data. Typically, condition variables are used as a notification system between threads: one thread waits on the condition, another signals it after changing the data.

Define: (thread) barriers

Barriers are a method to synchronize a set of threads at some point in time by having all participating threads wait until every thread has called the barrier function. This, in essence, blocks all participating threads until the slowest one reaches the barrier call.
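
A sketch using POSIX barriers (pthread_barrier_* is an optional POSIX feature, but available on Linux):

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
pthread_barrier_t barrier;

void *phase_worker(void *arg) {
    long id = (long)arg;
    printf("thread %ld: phase 1 done\n", id);
    pthread_barrier_wait(&barrier);   /* block until all 4 threads arrive */
    printf("thread %ld: phase 2 starts\n", id);
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, phase_worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}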

Semaphores

Semaphores are another type of synchronization primitive that come in two flavors: binary and counting. Binary semaphores act much like simple mutexes, although unlike a mutex, a semaphore has no owner, so any thread may post it. Counting semaphores can be initialized to any arbitrary value, which should depend on how many instances of the particular shared resource are available. Many threads can hold the semaphore simultaneously until that limit is reached.
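
A sketch using POSIX counting semaphores on Linux; the initial count of 3 models three identical resource slots, so a fourth thread blocks in sem_wait until one of the first three posts:

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

sem_t slots;

void *use_resource(void *arg) {
    sem_wait(&slots);       /* blocks once 3 threads already hold a slot */
    printf("thread %ld using a resource\n", (long)arg);
    usleep(100 * 1000);
    sem_post(&slots);       /* release the slot */
    return NULL;
}

int main(void) {
    sem_init(&slots, 0, 3); /* 0 = shared between threads, initial count 3 */
    pthread_t t[5];
    for (long i = 0; i < 5; i++)
        pthread_create(&t[i], NULL, use_resource, (void *)i);
    for (int i = 0; i < 5; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&slots);
    return 0;
}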

Spinlocks

Spinlocks are locks that busy-wait instead of sleeping. Spinning refers to continuously polling until a condition has been met: if a thread cannot obtain the lock, it keeps polling until the lock is free. The advantage of a spinlock is that the thread stays active and does not enter a sleep-wait, so it can perform better in certain cases than typical blocking-sleep-wait style mutexes. Locks that are heavily contended or held for long stretches are poor candidates for spinlocks.


livelock

A livelock is similar to a deadlock, except that the states of the processes involved constantly change with regard to one another, with none progressing.

What does the term 'lock-free' mean?

Multithreaded code written such that the system as a whole always makes progress: no thread can block others by holding a lock, and at least one thread is guaranteed to complete its operation in a finite number of steps, typically by using atomic operations such as compare-and-swap instead of mutexes.

"Busy waiting" and how it can be avoided

When one thread waits for another using an active looping structure that does no useful work. This can be avoided using mutexes and condition variables, which put the waiting thread to sleep until it is notified.

IPC

  • File
  • Signal
  • Socket
  • Message queue
  • Pipe
  • Shared memory

How would you maintain concurrency on a shared page

How would you maintain concurrency on a shared page being edited by multiple users simultaneously? What if the page is being shared using a client-server mechanism? Represent the classes and explain the thread-safety mechanism to avoid editing conflicts.

Use Optimistic concurrency control(OCC). OCC assumes that multiple transactions can frequently complete without interfering with each other. While running, transactions use data resources without acquiring locks on those resources. Before committing, each transaction verifies that no other transaction has modified the data it has read. If the check reveals conflicting modifications, the committing transaction rolls back and can be restarted.

  • Divide the page into multiple segments. Assign each segment a version number which increments on every update.
  • The list of version numbers is communicated to the clients as well.
  • Before updating a segment, the client gets the latest page and its version numbers from the server, makes its edit, and sends the result back to the server along with those versions.
  • If any part of the page was updated in the meantime, the version on the server will have changed. If so, reject the client's request and return the latest page and the changed versions so the client can retry (a sketch of this check follows).
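
A minimal sketch of the server-side check; the segment struct and try_update function are hypothetical, and on a real server this would run under a per-segment lock:

#include <stdio.h>

typedef struct {
    char text[1024];
    int version;
} segment;

/* returns 1 on success, 0 if the client must refetch and retry */
int try_update(segment *s, int client_version, const char *new_text) {
    if (client_version != s->version)
        return 0;                 /* someone else changed it: reject */
    snprintf(s->text, sizeof s->text, "%s", new_text);
    s->version++;                 /* bump so other stale clients get rejected */
    return 1;
}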

Write a multi-threaded C program with one thread printing all even numbers

Write a multi-threaded C program with one thread printing all even numbers and the other all odd numbers. The output should always be in sequence, i.e. 0, 1, 2, 3, 4, ... etc.

#include <stdio.h>
#include <pthread.h>

pthread_t odd_thread, even_thread;
pthread_mutex_t lock;
pthread_cond_t cond;

int is_even;  /* whose turn it is: 1 = even thread, 0 = odd thread */

void* even_print(void *data) {
  int *max = data;
  int i;
  for (i = 0; i < *max; i += 2) {
    pthread_mutex_lock(&lock);
    while (is_even != 1) {
      pthread_cond_wait(&cond, &lock);
    }
    is_even = 0;
    printf("%d, ", i);
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
  }
  pthread_exit(0);
}

void* odd_print(void *data) {
  int *max = data;
  int i;
  for (i = 1; i < *max; i += 2) {
    pthread_mutex_lock(&lock);
    while (is_even != 0) {
      pthread_cond_wait(&cond, &lock);
    }
    is_even = 1;
    printf("%d, ", i);
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
  }
  pthread_exit(0);
}

int main(int argc, char *argv[]) {
  int max = 100;
  is_even = 1;
  pthread_cond_init(&cond, NULL);
  pthread_mutex_init(&lock, NULL);
  pthread_create(&even_thread, NULL, &even_print, &max);
  pthread_create(&odd_thread, NULL, &odd_print, &max);
  pthread_join(even_thread, NULL);
  pthread_join(odd_thread, NULL);
  pthread_mutex_destroy(&lock);
  pthread_cond_destroy(&cond);
  return 0;
}

Difference between concurrency and parallelism

  • Concurrency is the programming concept where multiple threads are spawned by the same process, in the same process space. The threads may or may not contend for the same resources; if they do, synchronization may be needed to avoid situations like deadlock.
  • Parallelism is the concept of dividing a computation into independent parallel processes which do not share the same process space. Parallelism requires multiple CPUs or a processor with multiple cores.

If two threads are incrementing a variable 100 times each

If two threads increment a variable 100 times each without synchronization, what are the possible minimum and maximum values?

Maximum value: var + 200 (the simple case: every increment lands). Minimum value: var + 2, via the following interleaving (each unsynchronized increment is a non-atomic read-modify-write):

  1. P1 and P2 both read var (call the value v).
  2. P1 increments 99 times, so var becomes v + 99.
  3. P2 writes back its stale copy incremented once, so var becomes v + 1 (wiping out P1's 99 increments).
  4. P1 reads var for its 100th increment (value v + 1).
  5. P2 increments 99 more times, so var becomes v + 100.
  6. P1 writes back v + 2, so the final value is the original var + 2.

How would a mutex lock be implemented by the system?

It is both complicated and differs from one Unix variant to another.

In Linux, for example, a system called Futex (Short for Fast Userspace Mutex) is used.

In this system an atomic increment and test operation is performed on the mutex variable in user space.

If the result of the operation indicates that there was no contention on the lock, the call to pthread_mutex_lock returns without ever context switching into the kernel, so taking a mutex can be very fast.

Only if contention was detected does a system call (called futex) occur, with a context switch into the kernel that puts the calling thread to sleep until the mutex is released.

There are many more details, especially for reliable and/or priority-inheritance mutexes, but this is the essence of it.
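
The user-space fast path can be illustrated with C11 atomics. This is not the real futex API, just the idea: an atomic compare-and-swap that stays in user space when there is no contention, with sched_yield standing in for the kernel sleep a real implementation would do:

#include <stdatomic.h>
#include <sched.h>

typedef struct { atomic_int state; } fast_lock;   /* 0 = free, 1 = held */

void fast_lock_acquire(fast_lock *l) {
    int expected = 0;
    /* uncontended case: one atomic instruction, no kernel involvement */
    while (!atomic_compare_exchange_weak(&l->state, &expected, 1)) {
        expected = 0;
        /* contended: a real futex would sleep in the kernel here (FUTEX_WAIT);
           we just yield as a placeholder */
        sched_yield();
    }
}

void fast_lock_release(fast_lock *l) {
    atomic_store(&l->state, 0);
    /* a real futex would wake one sleeping waiter here (FUTEX_WAKE) */
}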

What is a Executer in threads?

An Executor (for example, Java's ExecutorService) is a kind of thread pool that keeps the number of threads in a process finite. It is very efficient because it stops a process from creating too many threads.

What do threads share amongst themselves and what don't they share?

Shared:

  • code
  • data
  • heap memory and memory address
  • memory management information (base/relocation & limit registers, page tables)
  • process state (new, ready, running, waiting, halted, etc)
  • I/O status information (opened file descriptors, etc)
  • signals
  • environment variables

Not shared:

  • registers
  • stack
  • thread specific data

Drawbacks of using mutexes in threads.

A mutex has the property that only its owner can release it. So in a threaded environment, if a thread locks a mutex on some shared data and then goes into a long I/O sleep or is not scheduled for a long time (perhaps because of lower priority), other threads that need the locked data must wait (possibly indefinitely, effectively deadlocking).

A program has two functions 'reader_func' and 'writer_func'.

A program has two functions 'reader_func' and 'writer_func'. The reader_func reads shared data and contains a critical section. The writer_func writes to shared data and contains a critical section.

Reader threads call reader_func. Writer threads call writer_func.

The condition is multiple reader threads can access the critical section at the same time as long as they don't access the critical section along with a writer. Only a single writer thread can access the critical section, i.e. no reader or other writer threads are allowed.

Given the code segment, add code that uses mutexes to control access to the critical sections so that the shared data is not corrupted and the given conditions are satisfied. You can create as many mutexes and global variables as you want. Don't worry too much about the syntax for acquiring and releasing mutexes; just use mutex.acquire() and mutex.release().

Code segment:

void reader_func()
{
   //critical section
}

void writer_func()
{
   //critical section
}
int reader_count = 0;  //Keeps track of reader count
Mutex csMutex;         //Mutex for the critical section.
Mutex readerMutex;     //Mutex for reader_count access. Without it, reader_count would get corrupted.

void reader_func()
{
    readerMutex.acquire();
    if (reader_count == 0)
        csMutex.acquire();   //first reader locks writers out
    reader_count++;
    readerMutex.release();

    //critical section

    readerMutex.acquire();
    reader_count--;
    if (reader_count == 0)
        csMutex.release();   //last reader lets writers back in
    readerMutex.release();
}

void writer_func()
{
   csMutex.acquire();
   //critical section
   csMutex.release();
}
  • Given a scenario where there will be lots of readers and few writers, the writers might never acquire a lock on csMutex at all, given the way the code is written. There will always be a bunch of readers accessing the critical section, so all writers will most likely have to wait forever. The above code therefore favors the readers.
  • How would you change the above code to favor the writers? Whenever a writer is waiting, come up with a mechanism that won't allow any more readers into the critical section or acquire the lock. The existing readers will be allowed to finish and the last reader will release the lock. Once this is done the waiting writer will acquire the lock and do its task.
  • You can also put a limit on the number of readers that can access the critical section. Once in a while or when a writer is waiting, change the limit to 0 so that no new reader can acquire the lock thereby allowing a writer to acquire the lock on the mutex.
int MAX_READERS = 10;             //max readers
int cSem = 0;                     //counting semaphore
volatile bool requestCS = false;  //this flag is used by writer to request CS.
WaitCondition waitForReadersExit; //condition variable, signaled when readers reach 0.
Mutex semMtx;                     //mutex for atomically operating on cSem

void reader_func()
{
    if (!requestCS) {  //if no writer thread is waiting then proceed
        if (testAndIncrement(cSem, MAX_READERS)) {  //if cSem < MAX_READERS, atomically increment and proceed
            //Critical Section processing
            semMtx.acquire();
            --cSem;
            if (requestCS && cSem == 0)             //if writer requested CS and no readers remain
                waitForReadersExit.notify(semMtx);  //unlock semMtx and notify writer
            else
                semMtx.release();
        } else {
            //max readers in CS. Try later
        }
    } else {
        //writer has placed a request; wait till writer is done.
    }
}
void writer_func()
{
    semMtx.acquire();
    if (cSem != 0) {
        requestCS = true;
        waitForReadersExit.wait(semMtx);  //unlocks semMtx and waits on the condition variable
    }
    //critical section
    requestCS = false;
    semMtx.release();
}
//atomically test and increment the semaphore variable (passed by reference so cSem is updated)
bool testAndIncrement(int &semVar, int maxReaders) {
    semMtx.acquire();
    if (semVar < maxReaders) {
        ++semVar;
        semMtx.release();
        return true;
    }
    semMtx.release();
    return false;
}
/* pthread implementation of the 1st readers/writers problem,
 * based on a mutex and a condition variable
 */
typedef struct read_write_lock_s {
   pthread_mutex_t mutex;
   pthread_cond_t cv;
   int read_ctr;
   int write_ctr;
} read_write_lock;

void rw_lock_init(read_write_lock *rw_lock)
{
  rw_lock->read_ctr = 0;
  rw_lock->write_ctr = 0;
  pthread_cond_init(&rw_lock->cv, NULL);
  pthread_mutex_init(&rw_lock->mutex, NULL);
}

void rw_lock_acquire_read(read_write_lock *rw_lock)
{
  pthread_mutex_lock(&rw_lock->mutex);
  while(rw_lock->write_ctr != 0) {  /* wait for writer */
    pthread_cond_wait(&rw_lock->cv, &rw_lock->mutex);
  }
  rw_lock->read_ctr++;
  pthread_mutex_unlock(&rw_lock->mutex);
}

void rw_lock_acquire_write(read_write_lock *rw_lock)
{
  pthread_mutex_lock(&rw_lock->mutex);
  while (rw_lock->write_ctr != 0 || rw_lock->read_ctr > 0) {
    pthread_cond_wait(&rw_lock->cv, &rw_lock->mutex);
  }
  rw_lock->write_ctr = 1;  /* set after the wait loop, so the writer doesn't block on itself */
  pthread_mutex_unlock(&rw_lock->mutex);
}

void rw_lock_release_read(read_write_lock *rw_lock)
{
  pthread_mutex_lock(&rw_lock->mutex);
  if(--rw_lock->read_ctr == 0) { /* no one is reading now; writer please hurry! */
    pthread_cond_broadcast(&rw_lock->cv);
  }
  pthread_mutex_unlock(&rw_lock->mutex);
}

void rw_lock_release_write(read_write_lock *rw_lock)
{
  pthread_mutex_lock(&rw_lock->mutex);
  rw_lock->write_ctr = 0;
  pthread_cond_broadcast(&rw_lock->cv); /* writer is done; readers in... */
  pthread_mutex_unlock(&rw_lock->mutex);
}

What is deadlock and what are the 4 conditions that create a deadlock?

  1. Mutual exclusion condition: a resource that cannot be used by more than one process at a time
  2. Hold and wait condition: processes already holding resources may request new resources
  3. No preemption condition: No resource can be forcibly removed from a process holding it, resources can be released only by the explicit action of the process
  4. Circular wait condition: two or more processes form a circular chain where each process waits for a resource that the next process in the chain holds

Memory is an array R[1..n]. And a Block is essentially

Memory is an array R[1..n], and a Block is essentially all memory between two indexes i and j. Each application uses some blocks. Blocks can be nested within one another or disjoint, but they cannot partially overlap. In this scenario, write an algorithm to lock or unlock a block: if a block is locked, none of its descendant blocks may be locked, and none of its ancestor blocks may be locked.

Store the blocks in the form of an n-ary tree (i.e. every sub-block of a block is stored as a child of that block).

  1. Suppose a block (say Block1) is already locked.
  2. When another block (say Block2) needs to be locked, find the lowest common ancestor (say Block3) of Block1 and Block2.
  3. If (Block3 == Block1) or (Block3 == Block2), one block contains the other, so don't grant the lock to Block2; otherwise grant it (see the sketch below).
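
A sketch of that idea in C, using a hypothetical block node with a parent pointer. Keeping a locked-descendants counter in every ancestor makes both the descendant check and the update O(tree height); concurrent callers would additionally need to serialize on a mutex:

#include <stdbool.h>

typedef struct block {
    struct block *parent;
    bool locked;
    int locked_descendants;
} block;

bool block_lock(block *b) {
    if (b->locked || b->locked_descendants > 0)
        return false;                 /* self or a descendant already locked */
    for (block *p = b->parent; p; p = p->parent)
        if (p->locked)
            return false;             /* an enclosing block is locked */
    b->locked = true;
    for (block *p = b->parent; p; p = p->parent)
        p->locked_descendants++;      /* ancestors can now refuse locks fast */
    return true;
}

void block_unlock(block *b) {
    if (!b->locked) return;
    b->locked = false;
    for (block *p = b->parent; p; p = p->parent)
        p->locked_descendants--;
}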

What are the implications of doing a fork()

What are the implications of doing a fork() from one of the threads of a multi-threaded process? Does child process get all threads of parent process?

On Linux, no: the child process contains only a copy of the thread that called fork(); the parent's other threads do not exist in the child. (This is one reason fork() in a multi-threaded program is dangerous: a mutex held by another thread at fork time stays locked forever in the child.)

Implement a semaphore using a mutex

class Semaphore
{
private:
  int innerValue;
  Mutex mutex;

public:
  Semaphore(int _v)
  {
    innerValue = _v;
  }

  void post()
  {
    mutex.lock();
    ++innerValue;
    mutex.unlock();
  }

  void wait()
  {
    mutex.lock();
    while (innerValue <= 0)
    {
      mutex.unlock();   // release the lock so post() can run
      sleep();          // placeholder back-off; a condition variable would avoid this polling
      mutex.lock();
    }
    --innerValue;       // take one unit of the semaphore
    mutex.unlock();
  }
};

Difference between exec and fork

Fork : The fork call basically makes a duplicate of the current process, identical in almost every way (not everything is copied over, for example, resource limits in some implementations but the idea is to create as close a copy as possible).

The new process (child) gets a different process ID (PID) and has the PID of the old process (parent) as its parent PID (PPID). Because the two processes are now running exactly the same code, they can tell which is which by the return value of fork: the child gets 0, the parent gets the PID of the child. This all assumes, of course, that the fork call works; if not, no child is created and the parent gets an error code.

Exec : The exec call is a way to replace the entire current process with a new program. It loads the program into the current process space and runs it from the entry point; exec() replaces the current process image with the executable passed to it. Control never returns to the original program unless there is an exec() error.
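
The classic combination is fork-then-exec: the parent forks, the child replaces itself with a new program, and the parent waits for it. A sketch:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();               /* duplicate the current process */
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {                   /* child: fork returned 0 */
        execlp("ls", "ls", "-l", (char *)NULL);  /* replace image with ls */
        perror("execlp");             /* reached only if exec failed */
        _exit(127);
    }
    int status;                       /* parent: fork returned child's PID */
    waitpid(pid, &status, 0);
    printf("child %d exited with status %d\n", pid, WEXITSTATUS(status));
    return 0;
}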

context switching

A common interview question: explain context switching and the advantages/disadvantages of processes vs. threads.

A process contains two kinds of information: resources that are available to the entire process such as program instructions, global data and working directory, and schedulable entities, which include program counters and stacks. A thread is an entity within a process that consists of the schedulable part of the process.

A thread (an example of a lightweight process) which has the following properties:

  1. Resources and data can be shared between threads
  2. Each thread has its own stack
  3. Context switching is fast

spinlock vs mutex

The Theory

In theory, when a thread tries to lock a mutex and it does not succeed, because the mutex is already locked, it will go to sleep, immediately allowing another thread to run. It will continue to sleep until being woken up, which will be the case once the mutex is being unlocked by whatever thread was holding the lock before. When a thread tries to lock a spinlock and it does not succeed, it will continuously re-try locking it, until it finally succeeds; thus it will not allow another thread to take its place (however, the operating system will forcefully switch to another thread, once the CPU runtime quantum of the current thread has been exceeded, of course).

The Problem

The problem with mutexes is that putting threads to sleep and waking them up again are both rather expensive operations, they'll need quite a lot of CPU instructions and thus also take some time. If now the mutex was only locked for a very short amount of time, the time spent in putting a thread to sleep and waking it up again might exceed the time the thread has actually slept by far and it might even exceed the time the thread would have wasted by constantly polling on a spinlock. On the other hand, polling on a spinlock will constantly waste CPU time and if the lock is held for a longer amount of time, this will waste a lot more CPU time and it would have been much better if the thread was sleeping instead.

The Solution

Using spinlocks on a single-core/single-CPU system usually makes no sense: as long as the spinlock polling blocks the only available CPU core, no other thread can run, and since no other thread can run, the lock won't be unlocked either. In other words, a spinlock wastes CPU time on those systems for no real benefit. If the thread were put to sleep instead, another thread could have run at once, possibly unlocking the lock and then allowing the first thread to continue processing once it woke up again.

On multi-core/multi-CPU systems, with plenty of locks that are each held for only a very short time, the time wasted constantly putting threads to sleep and waking them up again might decrease runtime performance noticeably. When using spinlocks instead, threads get the chance to take advantage of their full runtime quantum (always blocking only for a very short time period, then immediately continuing their work), leading to much higher processing throughput.

The Practice

Since very often programmers cannot know in advance if mutexes or spinlocks will be better (e.g. because the number of CPU cores of the target architecture is unknown), nor can operating systems know if a certain piece of code has been optimized for single-core or multi-core environments, most systems don't strictly distinguish between mutexes and spinlocks. In fact, most modern operating systems have hybrid mutexes and hybrid spinlocks. What does that actually mean?

A hybrid mutex behaves like a spinlock at first on a multi-core system. If a thread cannot lock the mutex, it won't be put to sleep immediately, since the mutex might get unlocked pretty soon; instead the mutex first behaves exactly like a spinlock. Only if the lock has still not been obtained after a certain amount of time (or number of retries, or some other measuring factor) is the thread really put to sleep. If the same code runs on a system with only a single core, the mutex will not spin, though, as, see above, that would not be beneficial.

A hybrid spinlock behaves like a normal spinlock at first, but to avoid wasting too much CPU time, it may have a back-off strategy. It will usually not put the thread to sleep (since you don't want that to happen when using a spinlock), but it may decide to stop the thread (either immediately or after a certain amount of time) and allow another thread to run, thus increasing chances that the spinlock is unlocked (a pure thread switch is usually less expensive than one that involves putting a thread to sleep and waking it up again later on, though not by far).

Summary

If in doubt, use mutexes; they are usually the better choice, and most modern systems will allow them to spin for a very short amount of time if that seems beneficial. Using spinlocks can sometimes improve performance, but only under certain conditions. You might consider using your own "lock object" that can use either a spinlock or a mutex internally (e.g. configurable when creating such an object): initially use mutexes everywhere, and if you think a spinlock somewhere might really help, give it a try and compare the results (e.g. using a profiler). But be sure to test both a single-core and a multi-core system before you jump to conclusions (and possibly different operating systems, if your code will be cross-platform).

Spinlocks are best used when a piece of code cannot go to sleep. The best example of such code is an interrupt request handler.

Mutexes are best used in user space program where sleeping of process does not affect the system in catastrophic way.



Author: Shi Shougang

Created: 2015-03-05 Thu 23:20

