The design and implementation of the 4 4 BSD operating system pdf pdf
ContentsContents Part 1 Overview 1 Chapter 1 History and Goals 3 1.1 History of the UNIX System 3 Origins 3Research UNIX 4 AT&T UNIX System III and System VOther Organizations 8 Berkeley Software Distributions 8UNIX in the World 10 1.2 BSD and Other Systems 10 The Influence of the User Community 11 1.3 Design Goals of 4BSD 12 4.2BSD Design Goals 13 4.3BSD Design Goals 14 4.4BSD Design Goals 15 1.4 Release Engineering 16 References 17 212.1 4.4BSD Facilities and the Kernel 21 The Kernel 22 2.2 Kernel Organization 23 2.3 Kernel Services 25 2.4 Process Management 26 Signals 27Process Groups and Sessions 28 2.5 Memory Management 29 BSD Memory-Management Design Decisions 29xvii Resource Limits 70Memory Management Inside the Kernel 31 Filesystem Quotas 70 2.6 I/O System 31 71 3.9 System-Operation Services Descriptors and I/O 32Accounting 71 Descriptor Management 33 Exercises 72 Devices 34 References 73 Socket IPC 35 Scatter/Gather I/O 35Multiple Filesystem Support 36 75 2.7 Filesystems 36 Part 2 Processes 2.8 Filestores 40 77 2.9 Network Filesystem 41 2.10 Terminals 42 77 4. 1 Introduction to Process Management 2.11 Interprocess Communication 43 Multiprogramming 78 2.12 Network Communication 44 Scheduling 79 2.13 Network Implementation 44 4.2 Process State 80 2.14 System Operation 45 The Process Structure 8 1 Exercises 45 The User Structure 85 References 46Context Switching 87 4.3 Process State 87
Chapter 2 Design Overview of 4.4BSD212.1 4.4BSD Facilities and the Kernel 21 The Kernel 22 2.2 Kernel Organization 23 2.3 Kernel Services 25 2.4 Process Management 26 Signals 27Process Groups and Sessions 28 2.5 Memory Management 29 BSD Memory-Management Design Decisions 29xvii Resource Limits 70Memory Management Inside the Kernel 31 Filesystem Quotas 70 2.6 I/O System 31 71 3.9 System-Operation Services Descriptors and I/O 32Accounting 71 Descriptor Management 33 Exercises 72 Devices 34 References 73 Socket IPC 35 Scatter/Gather I/O 35Multiple Filesystem Support 36 75 2.7 Filesystems 36 Part 2 Processes 2.8 Filestores 40 77 2.9 Network Filesystem 41 2.10 Terminals 42 77 4. 1 Introduction to Process Management 2.11 Interprocess Communication 43 Multiprogramming 78 2.12 Network Communication 44 Scheduling 79 2.13 Network Implementation 44 4.2 Process State 80 2.14 System Operation 45 The Process Structure 8 1 Exercises 45 The User Structure 85 References 46Context Switching 87 4.3 Process State 87 88 Chapter 3 Kernel Services
49 Low-Level Context Switching88 Voluntary Context Switching 3.1 Kernel Organization 49 Synchronization 91 System Processes 49 Process Scheduling 92 4.4 System Entry 50 93 Calculations of Process Priority Run-Time Organization 50Process-Priority Routines 95 Entry to the Kernel 52 96 Process Run Queues and Context Switching Return from the Kernel 53 Process Creation 98 4.5 3.2 System Calls 53Process Termination 99 Result Handling 54 4.6 Returning from a System Call 54Signals 100 4.7103 Comparison with POSIX Signals 3.3 Traps and Interrupts 55 Posting of a Signal 104 Traps 55Delivering a Signal 106 I/O Device Interrupts 55 107Process Groups and Sessions Software Interrupts 56 4.8 Sessions 109 3.4 Clock Interrupts 57 Job Control 110 Statistics and Process Scheduling 58 112 Timeouts 58 Process Debugging 4.9 3.5 Memory-Management Services 60Exercises 114 3.6 Timing Services 63References 116 Real Time 63117 Adjustment of the Time 63
Chapter 5 Memory Management External Representation 645.1 Terminology 117 Interval Time 64Processes and Memory 118 3.7 User, Group, and Other Identifiers 65 Paging 119 Host Identifiers 67Replacement Algorithms 120 Process Groups and Sessions 68Working-Set Model 121 3.8 Resource Services 68 Swapping 121 Process Priorities 69122 Advantages of Virtual Memory Resource Utilization 69 Hardware Requirements for Virtual Memory 122 5.2Overview of the 4.4BSD Virtual-Memory System 123 5.3 191 Kernel Memory Management 126 Part 3 I/O System Kernel Maps and Submaps 127193 Kernel Address-Space Allocation 128
Chapter 6 I/O System Overview Kernel Malloc 129193 6.1 I/O Mapping from User to Device 5.4 Per-Process Resources 132 Device Drivers 195 4.4BSD Process Virtual-Address Space 132I/O Queueing 195 Page-Fault Dispatch 134Interrupt Handling 196 Mapping to Objects 134 Block Devices 196 Objects 136 6.2 197 Entry Points for Block-Device DriversObjects to Pages 137 Sorting of Disk I/O Requests 198 5.5 Shared Memory 137 Disk Labels 199 Mmap Model 139 Character Devices 200 Shared Mapping 141 6.3 Raw Devices and Physical I/O 201 Private Mapping 142Character-Oriented Devices 202 Collapsing of Shadow Chains 144203 Entry Points for Character-Device Drivers Private Snapshots 145205 Descriptor Management and Services 5.6 Creation of a New Process 6.4146 Open File Entries 205 Reserving Kernel Resources 147Management of Descriptors 207 Duplication of the User Address Space 148File-Descriptor Locking 209 Creation of a New Process Without Copying 149Multiplexing I/O on Descriptors 211 5.7 Execution of a File 150 Implementation of Select 213 5.8 Process Manipulation of Its Address Space 151216 Movement of Data Inside the Kernel Change of Process Size 151 218The Virtual-Filesystem Interface File Mapping 152 6.5 Contents of a Vnode 219 Change of Protection 154Vnode Operations 220 5.9 Termination of a Process 154Pathname Translation 222 5.10 The Pager Interface 156Exported Filesystem Services 222 Vnode Pager 157 223Filesystem-Independent Services 6.6 Device Pager 159 The Name Cache 225 Swap Pager 160Buffer Management 226 5.11 Paging 162 229 Implementation of Buffer Management 5.12 Page Replacement 166 Stackable Filesystems 231 6.7 Paging Parameters 168Simple Filesystem Layers 234 The Pageout Daemon 169The Union Mount Filesystem 235 Swapping 171Other Filesystems 237 The Swap-In Process 172 Exercises 238 5.13 Portability 173References 240 The Role of the pmap Module 176 Initialization and Startup 179241
Chapter 7 Local Filesystems Mapping Allocation and Deallocation 181241 Change of Access and Wiring Attributes for Mappings 7.1 Hierarchical Filesystem Management 184 Management of Page-Usage Information 185 7.2 Structure of an Inode 243 Initialization of Physical Pages 186Inode Management 245 Management of Internal Data Structures 186 7.3 Naming 247Exercises 187 Directories 247 249 References 188 Finding of Names in Directories Pathname Translation 249RPC Transport Issues 322 Links 251Security Issues 324 9.3 Techniques for Improving Performance 325 7.4 Quotas 253 7.5 File Locking 257 Leases 328 Crash Recovery 332 7.6 Other Filesystem Semantics 262 Large File Sizes 262 Exercises 333 File Flags 263 References 334Exercises 264 337 10.1 Terminal-Processing Modes 338 10.2 User Interface 340 10.3 8.1 Overview of the Filestore 265The tty Structure 342 10.4 8.2 The Berkeley Fast Filesystem 269 343 Process Groups, Sessions, and Terminal Control 10.5 Organization of the Berkeley Fast Filesystem 269 C-lists 344 Optimization of Storage Utilization 271 10.6 RS-232 and Modem Control 346 Reading and Writing to a File 273 10.7 Filesystem Parameterization 275Terminal Operations 347 10.8 Layout Policies 276 Open 347 Allocation Mechanisms 277Output Line Discipline 347 Block Clustering 281Output Top Half 349 Synchronous Operations 284 Output Bottom Half 350 Input Bottom Half 351 8.3 The Log-Structured Filesystem 285 Input Top Half 352 Organization of the Log-Structured Filesystem286 The stop Routine 353 Index File 288The ioctl Routine 353 Reading of the Log 290 Modem Transitions 354 Writing to the Log 291Closing of Terminal Devices 355 Block Accounting 292 The Buffer Cache 294 Other Line Disciplines 355 10.9 Directory Operations 295 Serial Line IP Discipline 356 Creation of a File 296Graphics Tablet Discipline 356 Reading and Writing to a File 297 Exercises 357 Filesystem Cleaning 297 References 357 Filesystem Parameterization 300 Filesystem-Crash Recovery 300359 8.4 The Memory-Based Filesystem 302 Part 4 Interprocess Communication Organization of the Memory-Based Filesystem303 361Filesystem Performance 305 11.1 Interprocess-Communication ModelExercises 306 Use of Sockets 364 References 307 368 11.2 Implementation Structure and Overview 11.3 Memory Management 369 372 Storage-Management Algorithms 9.1 History and Overview 311 Mbuf Utility Routines 373 9.2 NFS Structure and Operation 314 The NFS Protocol 316 11.4 Data Structures 374 The 4.4BSD NFS Implementation 318Communication Domains 375 Client-Server Interactions 321 Sockets 376 Socket Addresses 378435
Chapter 13 Network Protocols11.5 Connection Setup 380 11.6 Data Transfer 382 13.1 Internet Network Protocols 436 Transmitting Data 383Internet Addresses 437 Receiving Data 385Subnets 438 Passing Access Rights 388Broadcast Addresses 441 Passing Access Rights in the Local Domain 389Internet Multicast 441 Internet Ports and Associations 442 11.7 Socket Shutdown 390 Protocol Control Blocks 442 Exercises 391 13.2 User Datagram Protocol (UDP) 443References 393 Initialization 443 Output 444
Chapter 12 Network Communication 395 Input 44512.1 Internal Structure 396 Control Operations 446 Data Flow 397 13.3 Internet Protocol (IP) 446 Communication Protocols 398Output 447 Network Interfaces 400Input 448 12.2 Socket-to-Protocol Interface 405 Forwarding 449 Protocol User-Request Routine 405 13.4 Transmission Control Protocol (TCP) 451 Internal Requests 409TCP Connection States 453 Protocol Control-Output Routine 409Sequence Variables 456 12.3 Protocol-Protocol Interface 410 13.5 TCP Algorithms 457 pr_output411 Timers 459pr_input 411Estimation of Round-Trip Time 460pr_ctlinput 411Connection Establishment 461 12.4 Interface between Protocol and Network Interface412 Connection Shutdown 463 Packet Transmission 412 13.6 TCP Input Processing 464 Packet Reception 413 13.7 TCP Output Processing 468 12.5 Routing 416 Sending of Data 468 Kernel Routing Tables 417Avoidance of the Silly-Window Syndrome 469 Routing Lookup 420Avoidance of Small Packets 470 Routing Redirects 423Delayed Acknowledgments and Window Updates 471 Routing-Table Interface 424Retransmit State 472 User-Level Routing Policies 425 Slow Start 472 User-Level Routing Interface: Routing Socket 425Source-Quench Processing 474 12.6 Buffering and Congestion Control 426 Buffer and Window Sizing 474 Protocol Buffering Policies 427Avoidance of Congestion with Slow Start 475 Queue Limiting 427Fast Retransmission 476 12.7 Raw Sockets 428 13.8 Internet Control Message Protocol (ICMP) 477 Control Blocks 428 13.9 OSI Implementation Issues 478 Input Processing 429 480 13.10 Summary of Networking and Interprocess Communication Output Processing 429Creation of a Communication Channel 481 12.8 Additional Network-Subsystem Topics 429 Sending and Receiving of Data 482 Out-of-Band Data 430Termination of Data Transmission or Reception 483 Address Resolution Protocol 430 Exercises 484Exercises 432 References 486References 433 Part 5 System Operation 489 Chapter 14 System Startup 491 14.1 Overview 491 14.2 Bootstrapping 492 The boot Program 492Overview 14.3 Kernel Initialization 493 Assembly-Language Startup 494 Machine-Dependent Initialization 495Message Buffer 495 System Data Structures 496 14.4 Autoconfiguration 496 Device Probing 498 Device Attachment 499New Autoconfiguration Data Structures 499 New Autoconfiguration Functions 501Device Naming 501 14.5 Machine-Independent Initialization 502 14.6 User-Level Initialization 505 505/sbin/init/etc/re 505 506/usr/libexec/getty 506/usr/bin/login 14.7 System-Startup Topics 507 Kernel Configuration 507 System Shutdown and Autoreboot 507System Debugging 508 Passage of Information To and From the Kernel 509 Exercises 511References 511 Glossary513 Index551 C H A P T E R History and Goals
1.1 History of the UNIX System
Although numerous organizations have contributed(and still contribute) to the development of the UNIX system, this book will pri- marily concentrate on the BSD thread of development: Origins The first version of the UNIX system was developed at Bell Laboratories in 1969 by Ken Thompson as a private research project to use an otherwise idle PDP-7. The original elegant design of the system [Ritchie, 1978] and developments of the past 15 years [Ritchie, 1984a; Compton, 1985] have made the UNIX system an important and powerful operating system [Ritchie,1987].
The basic organization of the UNIX filesystem, the1969 First Edition idea of using a user process for the command interpreter, the general organization of the filesystem interface, and many other system characteristics, come directlyfrom Multics.1973 Fifth Edition Ideas from various other operating systems, such as the Massachusetts Insti- tute of Technology's (MIT's) CTSS, also have been incorporated. Some of the systems named in the figure are not mentioned in the text, but are included to show more clearly the relations among the ones that we shall examine.
4.4 BSD °
The networking implementa-tion was general enough to communicate among diverse network facilities, rang- ing from local networks, such as Ethernets and token rings, to long-haul networks, The reason for the large virtual-memory space of 3BSD was the development ofwhat at the time were large programs, such as Berkeley's Franz LISP. The ease with which the UNIX system can be modified has led to development work at numerous organizations, including the Rand Corporation, which isresponsible for the Rand ports mentioned in Chapter 11; Bolt Beranek and New- man (BBN), who produced the direct ancestor of the 4.2BSD networking imple-mentation discussed in Chapter 13; the University of Illinois, which did earlier networking work; Harvard; Purdue; and Digital Equipment Corporation (DEC).
1.2 BSD and Other Systems
In addition to contributions that were included in the distribu- tions proper, the CSRG also distributed a set of user-contributed software.applications were developed that used these libraries, including the screen-based An example of a community-developed facility is the public-domain time- editor vizone-handling package that was adopted with the 4.3BSD Tahoe release. Many new facilities were designed for inclusion in 4.2BSD.zone package into BSD brought the technology to the attention of commercial ven- These facilities included a completely revised virtual-memory system to support dors, such as Sun Microsystems, causing them to incorporate it into their systems.processes with large sparse address space, a much higher-speed filesystem, inter- Berkeley solicited electronic mail about bugs and the proposed fixes.
1.3 Design Goals of 4BSD tive authority
This new technologyseparate identities that were not hidden from the users.included Because of time constraints, the system that was released as 4.2BSD did not include all the facilities that were originally intended to be included. The new processors and to larger machines.virtual-memory system provides algorithms that are better suited to the large memories currently available, and is much less dependent on the VAX architecture.
4.3 BSD Design Goals
A flag was added in 4.3BSD to the system interface has been generalized to make the operation set dynamically extensiblecall that establishes a signal handler to allow a process to request the 4.1 BSD and to allow filesystems to be stacked. Protocol modules may deal with This book was reviewed for technical accuracy by a similar process.multiplexing of data from different connections onto a single transport medium, as Many of the primary beta-test personnel not only had copies of the release well as with demultiplexing of data for different protocols and connectionsrunning on their own machines, but also had login accounts on the development received from each network device.
1.4 Release Engineering
Thesuch that the output of the first process is the input of the second, the output of the demand paging and swapping done by the system are effectively transparent to second is the input of the third, and so forth. If multiple processes mapped the same file into their address spaces,job-control shell may create a number of process groups associated with the same changes to the file's portion of an address space by one process would be reflectedterminal; the terminal is the controlling terminal for each process in these groups.in the area mapped by the other processes, as well as in the file itself.
2.9 Define scatter-gather I/O. Why is it useful?
2.12 What is the difference between a pipe and a socket? 2.10 What is the difference between a block and a character device?
CHAPTER 3 Kernel Services
Traps and Interrupts Traps If a process has requested that the system do profiling, the handler also calcu- lates the amount of time that has been spent in the system call, i.e., the systemtime accounted to the process between the latter's entry into and exit from the handler. As an optimization, if the state of the processor is suchfunction and that the softclock( ) execution will occur as soon as the hardclock interrupt returns, /(x) argumenthardclock( ) simply lowers the processor priority and calls softclock( ) directly, 40ms 850ms when 10ms 40ms OxFFFOOOOO place an event when that event is added to the queue than to scan the entire queue per-process to determine which events should occur at any single time.
3.5 Memory-Management Services
If the UIDs do not match, but the GID of the file matches one of the GIDs of the maintained by the softclock() routine (see Section 3.4).process, only the group permissions apply; the owner and other permissions The profiling timer decrements both in process virtual time (when running in are not checked.user mode) and when the system is running on behalf of the process. Many factors other than nice affect scheduling, including the could modify the state of terminals that were trusted by another user's processes.amount of CPU time that the process has used recently, the amount of memory that A session is a collection of process groups, and all members of a process group the process has used recently, and the current load on the system.
Section 4. 1 Introdaction to Process management
The localization of the context of a process in the latter's user structure permits the kernel to do context switching simply by changing the notion of the current userstructure and process structure, and restoring the context described by the PCB within the user structure (including the mapping of the virtual address space). When a process no longer needs the resource, it clears the that a dirty buffer be written to disk and then waiting for the I/O to complete.locked flag and, if the wanted flag is set, invokes wakeupO to awaken all the pro- When the I/O finishes, the process will awaken and will check for an empty buffer.cesses that called sleep 0 to await access to the resource.
For example, in addition to the natural desire of users to give files long descriptive names, a common way of forming filenamesis as basename.extension, where the extension (indicating the kind of file, such as.c for C source or .o for intermediate binary object) is one to three characters, leaving 10 to 12 characters for the basename. Domains currently supported include the local domain, for communication between processes executing on the same machine; the internet domain, for com-munication between processes using the TCP/IP protocol suite (perhaps within theInternet); the ISO/OSI protocol family for communication between sites required to run them; and the XNS domain, for communication between processes using the Interprocess Communication Interprocess communication in 4.4BSD is organized in communication domains.
The top half of the kernel executes in a privileged execu- same priority, or the ELXSI, where there are no interrupts in the traditional sense.tion mode, in which it has access both to kernel data structures and to the context Activities in the bottom half of the kernel are asynchronous, with respect to of user-level processes. As an optimization, if the state of the processor is suchfunction and that the softclock( ) execution will occur as soon as the hardclock interrupt returns, /(x) argumenthardclock( )simply lowers the processor priority and calls softclock( ) directly, 40ms 850ms when 10ms 40ms OxFFFOOOOO place an event when that event is added to the queue than to scan the entire queue per-process to determine which events should occur at any single time.
3.5 Memory-Management Ser vices
As a guard against a filesystem running out of space because of unchecked growth of the accounting file, the system suspends account-In addition to limits on the size of individual files, the kernel optionally enforces ing when the filesystem is reduced to only 2 percent remaining free space.limits on the total amount of space that a user or group can use on a filesystem. An exception to this rule is real-time scheduling, which must ensure that processes finish by a specified deadline or in a particular order; the Section 4.1 nel state includes parameters to the current system call, the current process's user Context switching is done frequently, so increasing the speed of a context switch noticeably decreases time spent in the kernel and provides more time forexecution of user applications.
Section 4. 3 Context Switching
Thus, the parent doing the wait will awaken independent of which child processThe localization of the context of a process in the latter's user structure permits the is the first to exit.kernel to do context switching simply by changing the notion of the current user that will never be awakened. the run queue if it is not swapped out of main memory; if the process has been swapped out, the swapin process will be awakened to load it back into memory(see Section 5.12); if the process is in a SSTOP state, it is left on the queue until it is explicitly restarted by a user-level process, either by a ptrace systemcall or by a continue signal (see Section 4.7) 2.
1. Removes the process from the sleep queue
3. Makes the process runnable if it is in a SSLEEP state, and places the process on
If a process uses up the time quantum (or time slice) allowedit, it is placed at the end of the queue from which it came, and the process at the front of the queue is selected to run. When a process other than the currently running process attains a higher priority (by having that priority either assigned orgiven when it is awakened), the system switches to that process immediately if the current process is in user mode.
4.4 Process Scheduling BSD uses a process-scheduling algorithm based on multilevel feedback queues
The decay causes about 90 percent ofthe influence of p_nice; also, the load used is the current load average, rather than the CPU usage accumulated in a 1-second interval to be forgotten over a period ofthe load average at the time that the process blocked.time that is dependent on the system load average. The roundrobin() routine runs 10 times per second and causes the systemTo understand the effect of the decay filter, we can consider the case where a to reschedule the processes in the highest-priority (nonempty) queue in a round-single compute-bound process monopolizes the CPU.
4.5 Process Creation
There is also a vfork system call that differs from fork in how the virtual-memory resources are The process created by a fork is termed a child process of the original parent From a user's point of view, the child process is an exact duplicate of the fork returns the child PID to the parent and zero to the child process. The vm_fork() routine is passed a pointer to the initialized process The second step is intimately related to the operation of the memory-management structure for the child process and is expected to allocate all the resources that thefacilities described in Chapter 5.
But that cause a process to stop, the process is placed in an SSTOP state,when a signal forces a process to be stopped, the action can be carried out when and the parent process is notified of the state change by a SIGCHLD sig-the signal is posted.nal being posted to it. In theory, posting a sig- call to setrunnable( ) nal to a process simply causes the appropriate signal to be added to the set of pending signals for the process, and the process is then set to run (or is awakened SSTOP The process is stopped by a signal or because it is being debugged.
1. Producing a core dump
Outside the kernel, a shell manipulates a job by sending signals to the job's process group with the killpg system call, which deliv-terminal, and, if the TOSTOP option is set for the terminal, if they attempt to write ers a signal to all the processes in a process group. The SIGTTOU signal is sent to back- process such as a job-control shell exits, each process group that it created will ground jobs that attempt an ioctl system call that would alter the state of the become an orphaned process group: a process group in which no member has a Management parent that is a member of the same session but of a different process group.
4.9 Process Debugging
Finally, since each request by a par-ent process must be done in the context of the child process, two context switches need to be done for each request—one from the parent to the child to send therequest, and one from the child to the parent to return the result of the operation. For compatibility, 4.4BSD still supports the sbrk call that malloc() uses to provide —^vnode / object vnode / objectvnode / object — ^ vm_pagevm_page vm_pagevm_page vm_page Kernel Maps and Submaps senting the physical-memory cache of the object, as well as a pointer to theWhen the system boots, the first task that the kernel must do is to set up data struc- pager_struct structure that contains information on how to page in or page out tures to describe and manage its address space.
5.3 Kernel Memory Management
Here, requested is the sum of the memory that has been requested and not yet freed; required is the amount of memory that has been allocated for the pool fromwhich the requests are filled. The allocator can be called both from the top half of the kernel that is willing to wait for memory to become available, and from the interrupt routines in the bot-tom half of the kernel that cannot wait for memory to become available.
5.4 Per-Process Resources As we have already seen, a process requires a process entry and a kernel stack
These requirements include the space needed for the program text, the initialized data, the uninitialized data, and the run-time stack. During the initialstartup of the program, the kernel will build the data structures necessary to describe these four areas.
4.4 BSD Process Vir tual-Addr ess Space
The executable consists of two parts: the text of the pro- more detail later in this section.gram that resides at the beginning of the file and the initialized data area that page may be mapped into multiple address spaces, but it is always claimed byWhen a process attempts to access a piece of its address space that is not currentlyexactly one object resident, a page fault occurs. In cases where it is notthe same page of a file in more than one memory page, so that all mappings will appropriate to modify the file directly, such as an executable that does not want toget a consistent view of the file.modify its initialized data pages, the kernel must interpose an anonymous shadow An object stores the following information: object between the vm_map_entry and the object representing the file.
5.6 Creation of a New Process
Fork is the only way that new processes are created Excluded from this limit are those parts of the address space that are mapped in 4.4BSD (except for its variant, vfork, which is described in the last subsection ofread-only, such as the program text. Any pages that are being used for a read-only this section). Process Manipulation of Its Address Space No further operations are needed to create a new address space during an exec system call; the remainder of the work comprises copying the arguments and envi-ronment out to the top of the new stack.
5.7 Execution of a FileThe first step of enlarging a process's size is to check whether the new sizewould violate the size limit for the process segment involved. If the new size is inrange, the following steps are taken to enlarge the data area: case:existing:new: becomes:
1. Verify that the virtual-memory resources are available
If These two operations are complicated because the kernel stack in the user area this request is the first mapping of an object, then the kernel checks the objectmust be used until the process relinquishes the processor for the final time.cache to see whether a previous instance of the object still exists. However, if the object maps a file (suchread permission.as an executable) that might be used again soon, the object is saved in the object The kernel implements the mprotect system call by finding the existing cache, where it can be found by newly executing processes or by processes map- vm_map_entry structure or structures that cover the region specified by the call.
5.9 Termination of a Process
The final change in process state that relates to the operation of the virtual-mem- With all its resources free, the exiting process finishes detaching itself from its ory system is exit', this system call terminates a process, as described in Chapter 4.process group, and notifies its parent that it is done. On wait,the system just returns the process status to the caller, and deallocates the process-table entry and the small amount of space in which the resource-usage information was kept.
5.10 The Pager Interface
Since the fault code has no special knowledge of the device pager, it has preallocated a physical-memory page to fill and has associated that vm_page structure is added to the list of all such allocated pages in the device- The first access to a device page will cause a page fault and will invoke the device-pager page-get routine. One approach is to have the filesystem use objects in the virtual-memory system as its cache; a second approach is to havethe virtual-memory objects that map files use the existing filesystem cache; the third approach is to create a new cache that is a merger of the two existing caches,and to convert both the virtual memory and the filesystems to use this new cache.
When a program in a process to awaken, retry the fault, and find the page created by the first pro-demand-paged format is first run, the kernel marks as invalid the pages for the text cess.and initialized data regions of the executing process. If the search has reached the end of the object list and has not found the page, vm_map_entry list of the faulting process to find the entry associated with the then the fault is on an anonymous object chain, and the first object in the list fault.
5.12 Page Replacement
As it is with the for-through cache writes the data back to main memory at the same time as it is writ- ward-mapped page table, a hardware TLB is typically used to speed the translationing to the cache, forcing the CPU to wait for the memory access to conclude. The primary benefit is that thisdescribe the translation and access rights of a single page is often referred to as a approach isolates the pmap module's memory needs from those of the rest of the mapping or mapping structure.system and limits the pmap module's dependencies on other parts of the system.
CHAPTER 6 I/O System Overview I/O Mapping from User to Device Computers store and retrieve data through supporting peripheral I/O devices. These devices typically include mass-storage devices, such as moving-head disk
On receiving a device interrupt, the systemwhen the first write to a disk block is made, and later writes to part of the same invokes the appropriate device-driver interrupt service routine with one or moreblock are then likely to require only copying into the kernel buffer, and no disk I/O.parameters that identify uniquely the device that requires service. Thus, the kernel does not need to read them from the disk every time that The system arranges for the unit-number parameter to be passed to the inter- they are requested.rupt service routine for each device by installing the address of an auxiliary glue If the system crashes while data for a particular block are in the cache but routine in the interrupt-vector table.
6.2 Block Devices
The kernel provides a generic disksort() routine that can be used by all the disk device drivers to sort I/O requests into a drive's request queue using an elevator Sorting of Disk I/O Requests the bootstrap procedure to calculate the location at which a crash dump should be placed and to determine the sizes of the swap devices. Here, the vendor label must be placed where the hardware bootstrap ROM code expects it; the 4.4BSD label must be placed outof the way of the vendor label but within the area that is read in by the hardware bootstrap code, so that it will be available to the second-level bootstrap.
6.3 Character Devices
The file descriptor is used by the kernel to index into the descrip- oriented devices, a read request is passed immediately to the terminal tor table for the current process (kept in the filedesc structure, a substructure of the driver. For other devices, a read request requires that the specified data beprocess structure for the process) to locate a file entry, or file structure. The kernel Although it is possible to work around all these problems, the solutions are not implementation is the same as for dup, except that the scan to find a free entry isstraightforward, so a mechanism for locking files was added in 4.2BSD.changed to close the requested entry if that entry is open, and then to allocate the The most general locking schemes allow multiple processes to update a file entry as before.
4.4 BSD system, programs with superuser privilege are allowed to override any
Conversely, if the process reads from the remote benefit of this approach is that the process does a single system call to specifyend when there are no data for the screen, it will block and will be unable to read the set of descriptors, then loops doing only reads [Accetta et al, 1986].from the terminal. If the top-level select i f ( nr ead > 0 | | ( t p- >t _s t a t e & TS_CARR_ON) == 0) code finds that the selecting flag for the process has been cleared while it has r e t ur n ( 1) ; been doing the polling, and it has not found any descriptors that are ready to s e l r e c or d( c ur pr oc , &t p- >t _r s e l ) ; do an operation, then the top level knows that the polling results are incom- r e t ur n ( 0) ; plete and must be repeated starting at step 2.
1. The socket or device select entry is called with flag of FREAD, FWRITE, or 0
2. The poll returns success if the requested operation is possible. In Fig. 6.5, it is select by a process that is no longer selecting because one of its other descrip-
Eventually, the uio structure reaches the part Movement of Data Inside the Kernel of the kernel responsible for moving the data to or from the process address space:Within the kernel, I/O data are described by an array of vectors. Finally, the bottom half of the kernel cannot do I/O, because it does reading and/or writing that reference the vnode, the number of file entries thatThe one part of the 4.4BSD kernel that does not use uio structures is the are open for writing that reference the vnode, and the number of pages andblock-device drivers.
6.5 The Vir tual-Filesystem Interface operations are described in the next subsection
About 60 percent of the traditional Filesystems may opt to have unsupported operations filled in with either a default filesystem code became the name-space management, and the remaining 40 per-routine (typically a routine to bypass the operation to the next lower layer; see cent became the code implementing the on-disk file storage. The naming schemeSection 6.7), or a routine that returns the characteristic error "operation not sup- and its vnode operations are described in Chapter 7. There are a few instances where a When the final file-entry reference to a file is closed, the usage count on the filesystem will know that there are no further mount points in the remaining path,vnode drops to zero and the vnode interface calls the inactive() vnode operation.and will traverse the rest of the pathname.