CS111 Spring 2009
Modularity and virtualization
April 06, 2009
Julio Ceballos, Lorena Topete, Cody Prestwood
MODULARITY - Split the program into pieces
ABSTRACTION - use "nice" boundaries between pieces
L lines of code and M modules
Problems with out paranoid word-counter program
A. Performance issues
1. cache answer
2. so IO in parallel with computation
* will help a bit for out OS but can be better
B. Flexibility
-Most Important ( common needs in Software Engineering)
1. to much of a pain to change // ex: recompile and install
2. to much of a pain to reuse parts
-Least Important
3. to much of a pain to run several programs at the same time
4. to much of a pain to recover from crash
C. Complexity
1. Coping techniques
* Modularity - break programs into pieces
* Abstraction - use "nice" boundaries between pieces
2. Modularity-----> Find bugs Faster
-assume L lines of codes and B bugs
- assume B is proportional to L
- assume time to find bugs is proportional to L
-debug time: is proportional to B*L which is proportional to L
*Now assume M modules
-now Debug time is proportional to
B/M * L/M * M => (L*L)/M
* this is assuming bugs don't cross module boundaries, and modules are the same size,
and bugs are evenly distributed
* Obvious which Module owns the bug
* Modularity can greatly help you find the bugs in your code
* API = application programing interface
Ex: Modularity Example: Rewriting read_ide_sector
void read_ide_sector (int s, int a);
^ ide is only good fro reading ide Disks
void read_sector (int s, int a); *higher level of abstraction
void read_sector (int s, off_t s, int a);
^ ^ ^ address
^ ^ Sector number
^ Disk number
ssize_t read_bytes(int s, off_t o, char* a, size_t n);
^ ^ ^ ^ address
^ ^ ^ offset from Disk start, in bytes
^ ^ Disk number
^ signed, returns # of bytes read or - number for error
OFFICIAL UNIX
ssize_t read(int d, char* a, size_t n);
off_t lseek( int d, off_t o, int whence);
^ ^ ^
^ ^ ^ 0 from beginning; 1 from current; 2 from end
^ ^ signed
^ returns new seek location
in Unix:
ssize_t read(int d, char *a, size_t n)
-always read from last seek position
off_t lseek(int d, off_t o, int whence)
o - is positive or negative
off_t is signed int
whence: 0 from beginning
1 current position
2 from end
Unix design:
1. Advantage - can share lseek with read/write functions
2. Disadvantage - Takes more time if application does
random access
Abstraction of function
read_ide_sector(int s, int a)
s-sector
a-returned result
change to: read_sector(int d, int s, int a)
d-disk ID to support multiple disks
generalize interface: read_sector(int d, off_t s, char *a)
-provides 64-bit support for disks larger that 4Gig
s-offset into disk
a-pointer to receiving buffer
1. OS needs to maintain mapping between device and
sectors. So device number and secotr number are
separate arguments
2. Assume secotr is 512 bytes and only one sector per
read
Positives and Negatives
+ many apps don't care about current seek location
+ share lseek with read and write
+ encourages sequential reads
- take more time if apps generally do Random Access
**How To enforce Modularity**
1. Do Nothing – use a single program, global variables, no functions
2. Function call modularity where functions are a black box
a. Put in BIOS and the program is called from the BIOS
b. Main is in RAM instead of BIOS
c. Stack has call – this approach has a problem
Partition of memory
-------------------------------------------------------------
| BootRecord | Main | |Stack| | BIOS |
-------------------------------------------------------------
Caller and Callee Contract
** Because the callee can potentially overwrite the caller's memory,
a contract exists between the caller and callee preventing this.
Example: recursive factorial function
int fact (int n)
{
If(n==0)
return 1;
else
return n* fact (n-1);
}
In x86 code:
fact:
pushl %ebp # push caller's frame Pointer
movl $1, %eax # move constant 1 into ax reg. ( this is preparing the register to return 1)
movl %esp, %ebf # move sp into fp
subl $8, %esp # allocate our frame
movl %ebx, -4 (%ebp) # save caller's bx
movl 8(%ebp), %ebx # bx: =n
tstl %ebx, %ebx # bx=0
jne L5 # Jump if not zero
.L1:
movl -4(%ebp), %ebx # restore bx
movl %ebp, %esp # restore sp
popl %ebp #restore bp
ret #restore ip
Now let's look at the caller
.L5:
leal -1(%ebx), %eax # ax =bx-1
movl %eax, %esp # store n-1 as argument
call fact # push "*+1" into Stack, go to fact
imull %ebx, %eax # ax * =n
jmp .L1
Let's practice:
what happens if we type: fact (6);
pushl $6 # push arg into stack
call fact
addl $4, %esp # discard the 6 we pushed in stack
# 4 because a word is 4 bytes long
We can san that we have a contract between the caller & callee.
Caller must:
top of stack =ra
2nd of stack =n
(ip) p.counter = fact
Callee must:
put ans into ax
POP stack
Put result in ip
What can go wrong with function call modularity?
callee: step into caller local variables to mess the program up
i.e movl $0, 12(%esp) # set caller's r. a. to 0
i.e movl $0, 0(ebp) # set caller's f p to 0
i.e callee can loop
i.e subl $ 1000000000, %esp # stack overflow
i.e callee can halt
Here, we have SOFT MODULARITY
Upside :
It is fast
Downside:
Does not prevent bugs
We often want HARD MODULARITY because with this, failure in one module doesn't leak into the other.
i.e write an x86 simulator and run callee inside it.
HARD Modularity
Upside:
This is robust and safe
Downside:
It is very slow
What we want is a virtualizable processor that can do the following:
-full speed on normal instructions
-slow down only when it checks anything weird
-gets control on any "bad" instruction
-executes some software to execute this bad instruction