CS111 Spring 2009

Modularity and virtualization

April 06, 2009

Julio Ceballos, Lorena Topete, Cody Prestwood

MODULARITY - Split the program into pieces

ABSTRACTION - use "nice" boundaries between pieces
L lines of code and M modules

Problems with out paranoid word-counter program

A. Performance issues

1. cache answer

2. so IO in parallel with computation

* will help a bit for out OS but can be better

B. Flexibility

-Most Important ( common needs in Software Engineering)

1. to much of a pain to change // ex: recompile and install

2. to much of a pain to reuse parts

-Least Important

3. to much of a pain to run several programs at the same time

4. to much of a pain to recover from crash

C. Complexity

1. Coping techniques

* Modularity - break programs into pieces

* Abstraction - use "nice" boundaries between pieces

2. Modularity-----> Find bugs Faster

-assume L lines of codes and B bugs

- assume B is proportional to L

- assume time to find bugs is proportional to L

-debug time: is proportional to B*L which is proportional to L

*Now assume M modules

-now Debug time is proportional to

B/M * L/M * M => (L*L)/M

* this is assuming bugs don't cross module boundaries, and modules are the same size,

and bugs are evenly distributed

* Obvious which Module owns the bug

* Modularity can greatly help you find the bugs in your code

* API = application programing interface

Ex: Modularity Example: Rewriting read_ide_sector

void read_ide_sector (int s, int a);

^ ide is only good fro reading ide Disks

void read_sector (int s, int a); *higher level of abstraction

void read_sector (int s, off_t s, int a);

^ ^ ^ address

^ ^ Sector number

^ Disk number

ssize_t read_bytes(int s, off_t o, char* a, size_t n);

^ ^ ^ ^ address

^ ^ ^ offset from Disk start, in bytes

^ ^ Disk number

^ signed, returns # of bytes read or - number for error

OFFICIAL UNIX

ssize_t read(int d, char* a, size_t n);

off_t lseek( int d, off_t o, int whence);

^ ^ ^

^ ^ ^ 0 from beginning; 1 from current; 2 from end

^ ^ signed

^ returns new seek location

in Unix:

ssize_t read(int d, char *a, size_t n)

-always read from last seek position

off_t lseek(int d, off_t o, int whence)

o - is positive or negative

off_t is signed int

whence: 0 from beginning

1 current position

2 from end

Unix design:

1. Advantage - can share lseek with read/write functions

2. Disadvantage - Takes more time if application does

random access

Abstraction of function

read_ide_sector(int s, int a)

s-sector

a-returned result

change to: read_sector(int d, int s, int a)

d-disk ID to support multiple disks

generalize interface: read_sector(int d, off_t s, char *a)

-provides 64-bit support for disks larger that 4Gig

s-offset into disk

a-pointer to receiving buffer

1. OS needs to maintain mapping between device and

sectors. So device number and secotr number are

separate arguments

2. Assume secotr is 512 bytes and only one sector per

read

Positives and Negatives

+ many apps don't care about current seek location

+ share lseek with read and write

+ encourages sequential reads

- take more time if apps generally do Random Access

**How To enforce Modularity**

1. Do Nothing – use a single program, global variables, no functions

2. Function call modularity where functions are a black box

a. Put in BIOS and the program is called from the BIOS

b. Main is in RAM instead of BIOS

c. Stack has call – this approach has a problem

Partition of memory

-------------------------------------------------------------

-------------------------------------------------------------

Caller and Callee Contract

** Because the callee can potentially overwrite the caller's memory,

a contract exists between the caller and callee preventing this.

Example: recursive factorial function

int fact (int n)

{

If(n==0)

return 1;

else

return n* fact (n-1);

}

In x86 code:

fact:

pushl %ebp # push caller's frame Pointer

movl $1, %eax # move constant 1 into ax reg. ( this is preparing the register to return 1)

movl %esp, %ebf # move sp into fp

subl $8, %esp # allocate our frame

movl %ebx, -4 (%ebp) # save caller's bx

movl 8(%ebp), %ebx # bx: =n

tstl %ebx, %ebx # bx=0

jne L5 # Jump if not zero

.L1:

movl -4(%ebp), %ebx # restore bx

movl %ebp, %esp # restore sp

popl %ebp #restore bp

ret #restore ip

Now let's look at the caller

.L5:

leal -1(%ebx), %eax # ax =bx-1

movl %eax, %esp # store n-1 as argument

call fact # push "*+1" into Stack, go to fact

imull %ebx, %eax # ax * =n

jmp .L1

Let's practice:

what happens if we type: fact (6);

pushl $6 # push arg into stack

call fact

addl $4, %esp # discard the 6 we pushed in stack

# 4 because a word is 4 bytes long

We can san that we have a contract between the caller & callee.

Caller must:

top of stack =ra

2nd of stack =n

(ip) p.counter = fact

Callee must:

put ans into ax

POP stack

Put result in ip

What can go wrong with function call modularity?

callee: step into caller local variables to mess the program up

i.e movl $0, 12(%esp) # set caller's r. a. to 0

i.e movl $0, 0(ebp) # set caller's f p to 0

i.e callee can loop

i.e subl $ 1000000000, %esp # stack overflow

i.e callee can halt

Here, we have SOFT MODULARITY

Upside :

It is fast

Downside:

Does not prevent bugs

We often want HARD MODULARITY because with this, failure in one module doesn't leak into the other.

i.e write an x86 simulator and run callee inside it.

HARD Modularity

Upside:

This is robust and safe

Downside:

It is very slow

What we want is a virtualizable processor that can do the following:

-full speed on normal instructions

-slow down only when it checks anything weird

-gets control on any "bad" instruction

-executes some software to execute this bad instruction