Thursday 31 May 2012

Linux Kernel Process


What is a process?
A process is an instance of a computer program that is being executed. A process is more than just the executing program code (often called the text section in Unix; a program also has other sections, such as the data section and the BSS section). It also includes a set of resources such as open files, pending signals, internal kernel data, and one or more threads of execution.

You can see the processes running on your system:

On Linux, run the ps command:
ps aux

USER             PID  %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
reg              637   5.6 14.9  3266892 623324   ??  S     9:55AM   1:40.51 /Applications/VirtualBox.app/Contents/MacOS/../Resources/VirtualBoxVM.app/Contents




In the output above, each process has a unique process ID, shown in the PID column.

Example: creating a process

In Linux, the fork() system call creates a new process by duplicating an existing one.
The process that calls fork() is the parent, whereas the new process is the child. The parent resumes execution and the child starts execution at the same place: where the call to fork() returns. The fork() system call returns from the kernel twice: once in the parent process and again in the newborn child process.

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    pid_t pid;

    printf("The parent calls fork!\n");
    pid = fork();   /* returns twice: 0 in the child, the child's PID in the parent */

    if (pid < 0)
        printf("error in fork!\n");
    else if (pid == 0)
        printf("The child process, process id is %d\n", (int)getpid());
    else
        printf("The parent process, process id is %d\n", (int)getpid());
    return 0;
}



And the result is:
The parent calls fork!
The parent process, process id is 2377
The child process, process id is 2378
 
It may seem surprising that a single program runs both the if branch and the else branch. Why does that happen?
Because fork() duplicates the parent: the child receives a copy of the parent's resources.
From the pid = fork() line onward, the child runs the same code as the parent, but fork() returned 0 in the child, so it prints "The child process, process id is 2378". The child does not print "The parent calls fork!" because it starts executing at the point where fork() returns, not at the beginning of main().
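
The two return values of fork() can also be checked without printing anything. The following is a minimal sketch, not part of the original example: the helper name run_child_and_wait is made up for illustration. The child exits with a chosen code and the parent collects it with waitpid().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`.
 * The parent waits for it and returns the child's exit status,
 * or -1 if anything fails. */
int run_child_and_wait(int code)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;          /* fork failed, no child was created */
    if (pid == 0)
        _exit(code);        /* child: fork() returned 0 */

    /* parent: fork() returned the child's PID */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_child_and_wait(7) from main() should return 7 in the parent, since only the parent gets past the pid == 0 branch.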



What's in a process?



The kernel stores each process in a process descriptor of type struct task_struct, which is defined in <linux/sched.h>.

Now that the process descriptor is created dynamically by the slab allocator, a new structure, struct thread_info, lives at the bottom of the stack (for stacks that grow down).
The thread_info structure is defined on x86 in <asm/thread_info.h>.

thread_info->task points to the process descriptor (the main task structure).

Because thread_info always sits at the bottom of the stack, Linux can find the current process from the stack pointer alone. This matters on x86, which has too few registers to dedicate one to holding the address of the current process.

static inline struct thread_info *current_thread_info(void)
{
        return (struct thread_info *)
                 (current_stack_pointer & ~(THREAD_SIZE - 1));
}

#ifdef CONFIG_4KSTACKS
#define THREAD_ORDER    0
#else
#define THREAD_ORDER    1
#endif
#define THREAD_SIZE     (PAGE_SIZE << THREAD_ORDER)
 
If CONFIG_4KSTACKS is defined, THREAD_SIZE is 4 KB; otherwise THREAD_SIZE is 8 KB (assuming 4 KB pages).

current_stack_pointer is declared as
register unsigned long current_stack_pointer asm("esp") __used;
so it holds the value of the esp register, the current stack pointer.
The expression
(current_stack_pointer & ~(THREAD_SIZE - 1))
therefore clears the low bits of the stack pointer. For example, take a stack pointer of 0x0151ffff and assume THREAD_SIZE is 8 KB: ~(THREAD_SIZE - 1) = 0xffffe000, so the bottom of the stack is 0x0151e000. However the stack pointer moves, as long as it stays within this stack, masking it with ~(THREAD_SIZE - 1) always yields the same bottom address, 0x0151e000.
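
The masking arithmetic can be tried in user space. This is only a sketch of the bit trick: stack_base is a hypothetical name, and the addresses used with it (for example 0x0151ffff with an 8 KB stack) are illustrative values, not real kernel addresses.

```c
#include <stdint.h>

/* Round a stack pointer down to the lowest address of its stack region.
 * thread_size must be a power of two (for example 4096 or 8192). */
uintptr_t stack_base(uintptr_t sp, uintptr_t thread_size)
{
    return sp & ~(thread_size - 1);
}
```

With an 8 KB stack, both stack_base(0x0151ffff, 8192) and stack_base(0x0151e010, 8192) give 0x0151e000, because the two pointers lie in the same 8 KB-aligned region.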
 
  
The Process State:

The state field of the process descriptor describes the current condition of the process. 
Each process on the system is in exactly one of five different states. This
value is represented by one of five flags:

 1. TASK_RUNNING (The process is runnable: it is either running or on a run queue waiting to run)
 2. TASK_INTERRUPTIBLE (The process is sleeping, waiting for some condition; a signal can also wake it up)
 3. TASK_UNINTERRUPTIBLE (The process is sleeping and waits only for the condition, so a signal cannot wake it up)
 4. __TASK_STOPPED (The process is not running and is not eligible to run)
 5. __TASK_TRACED (The process is being traced by another process, such as a debugger)

Kernel code often needs to change a process's state. The preferred mechanism is using
set_task_state(task, state);

The process creation:
On a Unix system, you create a process with fork(). The new process is a child of the calling process; it has its own PID and PPID, and inherits most other attributes from the parent.

Traditionally, all resources owned by the parent are duplicated and the copy is given to the child.
This approach is naive and inefficient in that it copies much data that might otherwise be shared. Worse still, if the new process were to immediately
execute a new image, all that copying would go to waste.
So Linux uses copy-on-write: when the child is created, it only receives references to the parent's pages rather than copies of the data. A page is copied only when one of the processes writes to it; reads never trigger a copy. Delaying the copy until the first write makes process creation much faster.
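
The page copying itself is invisible from user space, but its effect can be observed: a write made by the child after fork() never shows up in the parent. The sketch below only demonstrates that visible behaviour, and the function name child_write_is_private is made up for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a write made by the child after fork() stays invisible
 * to the parent, 0 if it leaks, and -1 on error. */
int child_write_is_private(void)
{
    int value = 1;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        value = 2;          /* the child writes to its own copy of the data */
        _exit(value);       /* report what the child saw */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    /* The parent still sees 1, while the child exited with 2. */
    return value == 1 && WIFEXITED(status) && WEXITSTATUS(status) == 2;
}
```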

 
 
Thread:
 
Why were threads introduced?
  1. Over time, people wanted a single process to do more things in parallel, with all of those
     activities sharing the same address space and the same resources.

  2. A thread is lighter than a process: it is quicker and easier to create and destroy. On many systems,
     creating a thread is 10 to 100 times faster than creating a process.

  3. A thread can communicate with other threads more easily than a process can with other processes. A process
     usually cannot access another process's address space.

Threads are a popular modern programming abstraction. They provide multiple threads of
execution within the same program in a shared memory address space. They can also
share open files and other resources. Threads enable concurrent programming and, on multiple
processor systems, true parallelism.

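To Linux, a thread is simply a process that shares its address space (and other resources) with other processes, so a thread is described by the same struct task_struct as a process.
A thread is created with clone() like this:

clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0);

A normal process, as with fork(), is created like this:

clone(SIGCHLD, 0);

The thread creation passes more flags than the process creation. Each flag names a resource that the new task shares with its parent:
  • FILES: open file descriptors
  • FS: filesystem information
  • VM: address space
  • SIGHAND: signal handlers
A process created without these flags gets its own copies of FS, FILES and VM instead of sharing the parent's.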
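
The effect of CLONE_VM can be seen from user space with the glibc clone() wrapper. Note that this wrapper takes a function and a stack for the child, so its signature differs from the simplified kernel-style calls shown above. This is a rough, Linux-only sketch: value_after_clone, write_42 and CHILD_STACK_SIZE are hypothetical names, and the 64 KB stack size is an arbitrary choice.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)   /* arbitrary size for this sketch */

/* Runs in the new task: store 42 through the pointer it was given. */
static int write_42(void *arg)
{
    *(volatile int *)arg = 42;
    return 0;
}

/* Create a task with clone(), let it write 42 to a local variable,
 * and return the value the parent sees afterwards (-1 on error). */
int value_after_clone(int extra_flags)
{
    volatile int value = 0;
    char *stack = malloc(CHILD_STACK_SIZE);

    if (stack == NULL)
        return -1;

    /* The stack grows down on x86, so pass the top of the buffer. */
    pid_t pid = clone(write_42, stack + CHILD_STACK_SIZE,
                      extra_flags | SIGCHLD, (void *)&value);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    waitpid(pid, NULL, 0);   /* wait until the child has finished */
    free(stack);
    return value;
}
```

With CLONE_VM the child shares the parent's address space, so value_after_clone(CLONE_VM) should return 42. Without it the child writes to its own copy and value_after_clone(0) should return 0.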



 
 
 
 

Monday 21 May 2012

Thread-safe singleton pattern in Java

What is the singleton pattern?

The singleton pattern is a design pattern that restricts the instantiation of a class to one object. This is useful when exactly one object is needed to coordinate actions across the system. (Singleton_pattern)

So how can we implement it?

A simple implementation:


public class Singleton {
        private static Singleton _instance = null;
 
        private Singleton() {   }
 
        public static Singleton getInstance() {
                if (_instance == null) {
                        _instance = new Singleton();
                }
                return _instance;
        }
}


Every call to Singleton.getInstance() is supposed to return the same _instance throughout the system. But this implementation has a problem: it is not thread safe.
Why?

Here is what can happen:
Thread A (TA) enters getInstance() and finds _instance == null, so it goes on to execute _instance = new Singleton(). If Thread B (TB) enters getInstance() at the same time, it also sees _instance == null (TA has not assigned it yet), so a second instance is created.
The singleton pattern must be carefully constructed in multi-threaded applications.

Here are several thread-safe solutions in Java:

1. Eager initialization

public class Singleton {
    private static Singleton _instance = new Singleton();
 
    private Singleton() {}
 
    public static Singleton getInstance() {
        return _instance;
    }
}
 
 

Why is this thread safe?
Because _instance is a static variable, and Java guarantees that static initializers run during class initialization, before any thread can call a method of the class.
Put simply, _instance is constructed when the class is initialized, so by the time a caller invokes getInstance(), the instance already exists and is simply returned. There is no multi-threading problem.
But it has a cost: the instance is created even if it is never used. If creating the instance is cheap in terms of time and resources, this is fine. If not, you may want to switch to lazy initialization.


2. Lazy initialization

Why is it called lazy? Because the instance is created only when it is first needed, unlike eager initialization, where the instance is created when the class is initialized even if you never call getInstance().
Lazy initialization avoids the situation above, where an instance that is expensive to create in terms of time or resources is built before anyone needs it.

public class Singleton {
        private static Singleton _instance = null;
 
        private Singleton() {   }
 
        public static synchronized Singleton getInstance() {
                if (_instance == null) {
                        _instance = new Singleton();
                }
                return _instance;
        }
}

Because getInstance() is synchronized, only one thread can be inside it at any time. But this has a problem too: the race only exists during the first creation.
Once _instance != null, no locking is needed, yet a synchronized method acquires the lock on every call.

This is where double-checked locking comes in. It is a software design pattern that reduces the overhead of acquiring a lock by first testing the locking criterion without actually acquiring the lock.

So you might redesign the method like this:


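public class Singleton {
        private static Singleton _instance = null;
 
        private Singleton() {   }
 
        public static Singleton getInstance() {
                if (_instance == null) {                    // first check, without the lock
                        // a static method has no "this", so lock on the class object
                        synchronized (Singleton.class) {
                                if (_instance == null) {    // second check, with the lock held
                                        _instance = new Singleton();
                                }
                        }
                }
                return _instance;
        }
}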




Why is the check done twice? Because by the time a thread acquires the lock, another thread may already have initialized the instance, so the thread must check again while holding the lock.

But a subtle problem remains.

1. Thread A enters the method and notices that _instance is not initialized, so it obtains the lock and begins to initialize _instance.

2. Due to the semantics of some programming languages, the code generated by the compiler is allowed to update the shared variable to point to a partially constructed object.
Put simply, before A has finished the initialization, thread B enters the method and sees _instance != null (the reference can be assigned as soon as the memory is allocated, before the constructor has finished running). If thread B then uses the instance, the program may misbehave or crash.

This problem is fixed by declaring the field volatile (under the Java 5 and later memory model), which ensures that multiple threads handle the singleton instance correctly.

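public class Singleton {
        // volatile prevents other threads from seeing a partially constructed object
        private static volatile Singleton _instance = null;
 
        private Singleton() {   }
 
        public static Singleton getInstance() {
                if (_instance == null) {                    // first check, without the lock
                        synchronized (Singleton.class) {
                                if (_instance == null) {    // second check, with the lock held
                                        _instance = new Singleton();
                                }
                        }
                }
                return _instance;
        }
}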


3. Initialization-on-demand holder idiom
First, let us look at the implementation:

public class Singleton {
        // Private constructor prevents instantiation from other classes
        private Singleton() { }
 
        /**
        * SingletonHolder is loaded on the first execution of Singleton.getInstance() 
        * or the first access to SingletonHolder.instance, not before.
        */
        private static class SingletonHolder { 
                public static final Singleton instance = new Singleton();
        }
 
        public static Singleton getInstance() {
                return SingletonHolder.instance;
        }
}

This may look familiar: it resembles eager initialization. So how does it avoid constructing a large object up front?
It takes advantage of language guarantees about class initialization: Java does not initialize the static nested class SingletonHolder until it is first used, so the construction happens on the first call to getInstance().
The class initialization phase is guaranteed by the JLS to be serial, and the write to the static variable instance happens during that phase, so the JVM ensures that SingletonHolder.instance is initialized exactly once and is thread safe.


4. The Enum way

public enum Singleton {
        INSTANCE;
        public void execute (String arg) {
                //... perform operation here ...
        }
}

In the second edition of his book "Effective Java", Joshua Bloch claims that "a single-element enum type is the best way to implement a singleton" for any Java version that supports enums. An enum is very easy to implement and has no drawbacks regarding serializable objects, which have to be worked around in the other approaches.
This approach implements the singleton by taking advantage of Java's guarantee that any enum value is instantiated only once in a Java program. Since Java enum values are globally accessible, so is the singleton. The drawback is that the enum type is somewhat inflexible. For example, it does not allow lazy initialization.
