| draft-ietf-nfsv4-minorversion1-22.txt | draft-ietf-nfsv4-minorversion1-23.txt | |||
|---|---|---|---|---|
| NFSv4 S. Shepler | NFSv4 S. Shepler | |||
| Internet-Draft M. Eisler | Internet-Draft M. Eisler | |||
| Intended status: Standards Track D. Noveck | Intended status: Standards Track D. Noveck | |||
| Expires: November 2, 2008 Editors | Expires: November 10, 2008 Editors | |||
| May 1, 2008 | May 9, 2008 | |||
| NFS Version 4 Minor Version 1 | NFS Version 4 Minor Version 1 | |||
| draft-ietf-nfsv4-minorversion1-22.txt | draft-ietf-nfsv4-minorversion1-23.txt | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
| applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
| have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 1, line 35 | skipping to change at page 1, line 35 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on November 2, 2008. | This Internet-Draft will expire on November 10, 2008. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The IETF Trust (2008). | Copyright (C) The IETF Trust (2008). | |||
| Abstract | Abstract | |||
| This Internet-Draft describes NFS version 4 minor version one, | This Internet-Draft describes NFS version 4 minor version one, | |||
| including features retained from the base protocol and protocol | including features retained from the base protocol and protocol | |||
| extensions made subsequently. Major extensions introduced in NFS | extensions made subsequently. Major extensions introduced in NFS | |||
| skipping to change at page 4, line 26 | skipping to change at page 4, line 26 | |||
| 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 147 | 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 147 | |||
| 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 147 | 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 147 | |||
| 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 147 | 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 147 | |||
| 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 148 | 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 148 | |||
| 7.8. Security Policy and Namespace Presentation . . . . . . . 148 | 7.8. Security Policy and Namespace Presentation . . . . . . . 148 | |||
| 8. State Management . . . . . . . . . . . . . . . . . . . . . . 149 | 8. State Management . . . . . . . . . . . . . . . . . . . . . . 149 | |||
| 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 150 | 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 150 | |||
| 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 150 | 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 150 | |||
| 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 151 | 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 151 | |||
| 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 152 | 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 152 | |||
| 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 153 | 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 154 | |||
| 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 155 | 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 155 | |||
| 8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 158 | 8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 158 | |||
| 8.2.6. Stateid Use for SETATTR Operations . . . . . . . . . 159 | 8.2.6. Stateid Use for SETATTR Operations . . . . . . . . . 159 | |||
| 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 159 | 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 159 | |||
| 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 161 | 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 161 | |||
| 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 162 | 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 162 | |||
| 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 162 | 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 163 | |||
| 8.4.3. Network Partitions and Recovery . . . . . . . . . . 166 | 8.4.3. Network Partitions and Recovery . . . . . . . . . . 166 | |||
| 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 171 | 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 171 | |||
| 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 172 | 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 172 | |||
| 8.7. Clocks, Propagation Delay, and Calculating Lease | 8.7. Clocks, Propagation Delay, and Calculating Lease | |||
| Expiration . . . . . . . . . . . . . . . . . . . . . . . 172 | Expiration . . . . . . . . . . . . . . . . . . . . . . . 172 | |||
| 8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 173 | 8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 173 | |||
| 9. File Locking and Share Reservations . . . . . . . . . . . . . 174 | 9. File Locking and Share Reservations . . . . . . . . . . . . . 174 | |||
| 9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 174 | 9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 174 | |||
| 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 174 | 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 174 | |||
| 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 175 | 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 175 | |||
| skipping to change at page 6, line 30 | skipping to change at page 6, line 30 | |||
| 11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 259 | 11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 259 | |||
| 11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 261 | 11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 261 | |||
| 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 265 | 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 265 | |||
| 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 265 | 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 265 | |||
| 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 266 | 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 266 | |||
| 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 267 | 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 267 | |||
| 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 267 | 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 267 | |||
| 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 267 | 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 267 | |||
| 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 267 | 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 267 | |||
| 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 267 | 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 267 | |||
| 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 268 | 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 267 | |||
| 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 268 | 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 268 | |||
| 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 269 | 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 268 | |||
| 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 269 | 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 269 | |||
| 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 270 | 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 270 | |||
| 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 271 | 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 271 | |||
| 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 272 | 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 272 | |||
| 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 272 | 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 272 | |||
| 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 272 | 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 272 | |||
| 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 273 | 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 273 | |||
| 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 274 | 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 274 | |||
| 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 275 | 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 275 | |||
| 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 279 | 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 278 | |||
| 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 287 | 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 287 | |||
| 12.5.7. Metadata Server Write Propagation . . . . . . . . . 287 | 12.5.7. Metadata Server Write Propagation . . . . . . . . . 287 | |||
| 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 287 | 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 287 | |||
| 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 289 | 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 289 | |||
| 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 289 | 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 289 | |||
| 12.7.2. Dealing with Lease Expiration on the Client . . . . 290 | 12.7.2. Dealing with Lease Expiration on the Client . . . . 289 | |||
| 12.7.3. Dealing with Loss of Layout State on the Metadata | 12.7.3. Dealing with Loss of Layout State on the Metadata | |||
| Server . . . . . . . . . . . . . . . . . . . . . . . 291 | Server . . . . . . . . . . . . . . . . . . . . . . . 290 | |||
| 12.7.4. Recovery from Metadata Server Restart . . . . . . . 291 | 12.7.4. Recovery from Metadata Server Restart . . . . . . . 291 | |||
| 12.7.5. Operations During Metadata Server Grace Period . . . 293 | 12.7.5. Operations During Metadata Server Grace Period . . . 293 | |||
| 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 294 | 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 293 | |||
| 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 294 | 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 294 | |||
| 12.9. Security Considerations for pNFS . . . . . . . . . . . . 294 | 12.9. Security Considerations for pNFS . . . . . . . . . . . . 294 | |||
| 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 295 | 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 295 | |||
| 13.1. Client ID and Session Considerations . . . . . . . . . . 296 | 13.1. Client ID and Session Considerations . . . . . . . . . . 295 | |||
| 13.1.1. Sessions Considerations for Data Servers . . . . . . 298 | 13.1.1. Sessions Considerations for Data Servers . . . . . . 297 | |||
| 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 298 | 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 298 | |||
| 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 299 | 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 299 | |||
| 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 303 | 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 303 | |||
| 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 303 | 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 303 | |||
| 13.4.2. Interpreting the File Layout Using Sparse Packing . 303 | 13.4.2. Interpreting the File Layout Using Sparse Packing . 303 | |||
| 13.4.3. Interpreting the File Layout Using Dense Packing . . 306 | 13.4.3. Interpreting the File Layout Using Dense Packing . . 305 | |||
| 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 308 | 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 308 | |||
| 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 310 | 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 309 | |||
| 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 311 | 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 310 | |||
| 13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 313 | 13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 313 | |||
| 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 315 | 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 314 | |||
| 13.9. Metadata and Data Server State Coordination . . . . . . 315 | 13.9. Metadata and Data Server State Coordination . . . . . . 314 | |||
| 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 315 | 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 314 | |||
| 13.9.2. Data Server State Propagation . . . . . . . . . . . 316 | 13.9.2. Data Server State Propagation . . . . . . . . . . . 315 | |||
| 13.10. Data Server Component File Size . . . . . . . . . . . . 318 | 13.10. Data Server Component File Size . . . . . . . . . . . . 317 | |||
| 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 319 | 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 318 | |||
| 13.12. Security Considerations for the File Layout Type . . . . 319 | 13.12. Security Considerations for the File Layout Type . . . . 319 | |||
| 14. Internationalization . . . . . . . . . . . . . . . . . . . . 320 | 14. Internationalization . . . . . . . . . . . . . . . . . . . . 320 | |||
| 14.1. Stringprep profile for the utf8str_cs type . . . . . . . 321 | 14.1. Stringprep profile for the utf8str_cs type . . . . . . . 321 | |||
| 14.2. Stringprep profile for the utf8str_cis type . . . . . . 323 | 14.2. Stringprep profile for the utf8str_cis type . . . . . . 322 | |||
| 14.3. Stringprep profile for the utf8str_mixed type . . . . . 324 | 14.3. Stringprep profile for the utf8str_mixed type . . . . . 324 | |||
| 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 326 | 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 325 | |||
| 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 326 | 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 325 | |||
| 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 327 | 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 326 | |||
| 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 327 | 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 326 | |||
| 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 329 | 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 328 | |||
| 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 331 | 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 330 | |||
| 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 332 | 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 332 | |||
| 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 334 | 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 333 | |||
| 15.1.5. State Management Errors . . . . . . . . . . . . . . 336 | 15.1.5. State Management Errors . . . . . . . . . . . . . . 335 | |||
| 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 337 | 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 336 | |||
| 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 337 | 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 337 | |||
| 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 338 | 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 337 | |||
| 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 339 | 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 339 | |||
| 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 340 | 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 339 | |||
| 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 341 | 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 341 | |||
| 15.1.12. Session Management Errors . . . . . . . . . . . . . 343 | 15.1.12. Session Management Errors . . . . . . . . . . . . . 342 | |||
| 15.1.13. Client Management Errors . . . . . . . . . . . . . . 343 | 15.1.13. Client Management Errors . . . . . . . . . . . . . . 342 | |||
| 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 344 | 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 343 | |||
| 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 344 | 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 344 | |||
| 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 345 | 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 344 | |||
| 15.2. Operations and their valid errors . . . . . . . . . . . 346 | 15.2. Operations and their valid errors . . . . . . . . . . . 345 | |||
| 15.3. Callback operations and their valid errors . . . . . . . 362 | 15.3. Callback operations and their valid errors . . . . . . . 361 | |||
| 15.4. Errors and the operations that use them . . . . . . . . 364 | 15.4. Errors and the operations that use them . . . . . . . . 363 | |||
| 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 378 | 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 377 | |||
| 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 378 | 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 377 | |||
| 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 379 | 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 378 | |||
| 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 390 | 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 389 | |||
| 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 393 | 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 392 | |||
| 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 393 | 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 392 | |||
| 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 399 | 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 398 | |||
| 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 400 | 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 399 | |||
| 18.4. Operation 6: CREATE - Create a Non-Regular File Object . 403 | 18.4. Operation 6: CREATE - Create a Non-Regular File Object . 402 | |||
| 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting | 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting | |||
| Recovery . . . . . . . . . . . . . . . . . . . . . . . . 406 | Recovery . . . . . . . . . . . . . . . . . . . . . . . . 405 | |||
| 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 407 | 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 406 | |||
| 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 407 | 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 406 | |||
| 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 409 | 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 408 | |||
| 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 410 | 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 409 | |||
| 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 413 | 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 412 | |||
| 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 417 | 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 416 | |||
| 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 418 | 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 417 | |||
| 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 420 | 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 419 | |||
| 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 421 | 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 420 | |||
| 18.15. Operation 17: NVERIFY - Verify Difference in | 18.15. Operation 17: NVERIFY - Verify Difference in | |||
| Attributes . . . . . . . . . . . . . . . . . . . . . . . 423 | Attributes . . . . . . . . . . . . . . . . . . . . . . . 422 | |||
| 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 424 | 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 423 | |||
| 18.17. Operation 19: OPENATTR - Open Named Attribute | 18.17. Operation 19: OPENATTR - Open Named Attribute | |||
| Directory . . . . . . . . . . . . . . . . . . . . . . . 443 | Directory . . . . . . . . . . . . . . . . . . . . . . . 442 | |||
| 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 444 | 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 443 | |||
| 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 446 | 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 445 | |||
| 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 446 | 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 445 | |||
| 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 448 | 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 447 | |||
| 18.22. Operation 25: READ - Read from File . . . . . . . . . . 449 | 18.22. Operation 25: READ - Read from File . . . . . . . . . . 448 | |||
| 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 451 | 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 450 | |||
| 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 455 | 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 454 | |||
| 18.25. Operation 28: REMOVE - Remove File System Object . . . . 456 | 18.25. Operation 28: REMOVE - Remove File System Object . . . . 455 | |||
| 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 458 | 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 457 | |||
| 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 462 | 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 461 | |||
| 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 463 | 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 462 | |||
| 18.29. Operation 33: SECINFO - Obtain Available Security . . . 464 | 18.29. Operation 33: SECINFO - Obtain Available Security . . . 463 | |||
| 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 468 | 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 467 | |||
| 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 471 | 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 470 | |||
| 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 472 | 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 471 | |||
| 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 476 | 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 475 | |||
| 18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 478 | 18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 477 | |||
| 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 481 | 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 480 | |||
| 18.36. Operation 43: CREATE_SESSION - Create New Session and | 18.36. Operation 43: CREATE_SESSION - Create New Session and | |||
| Confirm Client ID . . . . . . . . . . . . . . . . . . . 498 | Confirm Client ID . . . . . . . . . . . . . . . . . . . 497 | |||
| 18.37. Operation 44: DESTROY_SESSION - Destroy existing | 18.37. Operation 44: DESTROY_SESSION - Destroy existing | |||
| session . . . . . . . . . . . . . . . . . . . . . . . . 508 | session . . . . . . . . . . . . . . . . . . . . . . . . 507 | |||
| 18.38. Operation 45: FREE_STATEID - Free stateid with no | 18.38. Operation 45: FREE_STATEID - Free stateid with no | |||
| locks . . . . . . . . . . . . . . . . . . . . . . . . . 509 | locks . . . . . . . . . . . . . . . . . . . . . . . . . 508 | |||
| 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory | 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory | |||
| delegation . . . . . . . . . . . . . . . . . . . . . . . 510 | delegation . . . . . . . . . . . . . . . . . . . . . . . 509 | |||
| 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 514 | 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 513 | |||
| 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings | 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings | |||
| for a File System . . . . . . . . . . . . . . . . . . . 516 | for a File System . . . . . . . . . . . . . . . . . . . 515 | |||
| 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using | 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using | |||
| a layout . . . . . . . . . . . . . . . . . . . . . . . . 518 | a layout . . . . . . . . . . . . . . . . . . . . . . . . 517 | |||
| 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 521 | 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 520 | |||
| 18.44. Operation 51: LAYOUTRETURN - Release Layout | 18.44. Operation 51: LAYOUTRETURN - Release Layout | |||
| Information . . . . . . . . . . . . . . . . . . . . . . 526 | Information . . . . . . . . . . . . . . . . . . . . . . 530 | |||
| 18.45. Operation 52: SECINFO_NO_NAME - Get Security on | 18.45. Operation 52: SECINFO_NO_NAME - Get Security on | |||
| Unnamed Object . . . . . . . . . . . . . . . . . . . . . 530 | Unnamed Object . . . . . . . . . . . . . . . . . . . . . 534 | |||
| 18.46. Operation 53: SEQUENCE - Supply per-procedure | 18.46. Operation 53: SEQUENCE - Supply per-procedure | |||
| sequencing and control . . . . . . . . . . . . . . . . . 531 | sequencing and control . . . . . . . . . . . . . . . . . 536 | |||
| 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 537 | 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 541 | |||
| 18.48. Operation 55: TEST_STATEID - Test stateids for | 18.48. Operation 55: TEST_STATEID - Test stateids for | |||
| validity . . . . . . . . . . . . . . . . . . . . . . . . 539 | validity . . . . . . . . . . . . . . . . . . . . . . . . 543 | |||
| 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 541 | 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 545 | |||
| 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing | 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing | |||
| client ID . . . . . . . . . . . . . . . . . . . . . . . 545 | client ID . . . . . . . . . . . . . . . . . . . . . . . 549 | |||
| 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims | 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims | |||
| Finished . . . . . . . . . . . . . . . . . . . . . . . . 545 | Finished . . . . . . . . . . . . . . . . . . . . . . . . 549 | |||
| 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 548 | 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 552 | |||
| 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 548 | 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 552 | |||
| 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 549 | 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 553 | |||
| 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 549 | 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 553 | |||
| 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 553 | 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 557 | |||
| 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 553 | 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 557 | |||
| 20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 554 | 20.2. Operation 4: CB_RECALL - Recall a Delegation . . . . . . 558 | |||
| 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from | 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from | |||
| Client . . . . . . . . . . . . . . . . . . . . . . . . . 555 | Client . . . . . . . . . . . . . . . . . . . . . . . . . 559 | |||
| 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 559 | 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 563 | |||
| 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to | 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to | |||
| Client . . . . . . . . . . . . . . . . . . . . . . . . . 563 | Client . . . . . . . . . . . . . . . . . . . . . . . . . 567 | |||
| 20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 564 | 20.6. Operation 8: CB_RECALL_ANY - Keep any N recallable | |||
| | objects . . . . . . . . . . . . . . . . . . . . . . . . 568 | |||
| 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal | 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal | |||
| Resources for Recallable Objects . . . . . . . . . . . . 566 | Resources for Recallable Objects . . . . . . . . . . . . 571 | |||
| 20.8. Operation 10: CB_RECALL_SLOT - change flow control | 20.8. Operation 10: CB_RECALL_SLOT - change flow control | |||
| limits . . . . . . . . . . . . . . . . . . . . . . . . . 567 | limits . . . . . . . . . . . . . . . . . . . . . . . . . 572 | |||
| 20.9. Operation 11: CB_SEQUENCE - Supply backchannel | 20.9. Operation 11: CB_SEQUENCE - Supply backchannel | |||
| sequencing and control . . . . . . . . . . . . . . . . . 568 | sequencing and control . . . . . . . . . . . . . . . . . 573 | |||
| 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending | 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending | |||
| Delegation Wants . . . . . . . . . . . . . . . . . . . . 570 | Delegation Wants . . . . . . . . . . . . . . . . . . . . 575 | |||
| 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible | 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible | |||
| lock availability . . . . . . . . . . . . . . . . . . . 571 | lock availability . . . . . . . . . . . . . . . . . . . 576 | |||
| 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID | 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID | |||
| changes . . . . . . . . . . . . . . . . . . . . . . . . 573 | changes . . . . . . . . . . . . . . . . . . . . . . . . 578 | |||
| 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback | 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback | |||
| Operation . . . . . . . . . . . . . . . . . . . . . . . 575 | Operation . . . . . . . . . . . . . . . . . . . . . . . 580 | |||
| 21. Security Considerations . . . . . . . . . . . . . . . . . . . 575 | 21. Security Considerations . . . . . . . . . . . . . . . . . . . 580 | |||
| 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 577 | 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 582 | |||
| 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 577 | 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 582 | |||
| 22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 577 | 22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 582 | |||
| 22.3. Defining New Notifications . . . . . . . . . . . . . . . 578 | 22.3. Defining New Notifications . . . . . . . . . . . . . . . 583 | |||
| 22.4. Defining New Layout Types . . . . . . . . . . . . . . . 578 | 22.4. Defining New Layout Types . . . . . . . . . . . . . . . 583 | |||
| 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 580 | 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 585 | |||
| 22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 580 | 22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 585 | |||
| 22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 580 | 22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 585 | |||
| 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 580 | 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 585 | |||
| 23.1. Normative References . . . . . . . . . . . . . . . . . . 580 | 23.1. Normative References . . . . . . . . . . . . . . . . . . 585 | |||
| 23.2. Informative References . . . . . . . . . . . . . . . . . 582 | 23.2. Informative References . . . . . . . . . . . . . . . . . 587 | |||
| Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 584 | Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 589 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 586 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 591 | |||
| Intellectual Property and Copyright Statements . . . . . . . . . 587 | Intellectual Property and Copyright Statements . . . . . . . . . 592 | |||
| 1. Introduction | 1. Introduction | |||
| 1.1. The NFS Version 4 Minor Version 1 Protocol | 1.1. The NFS Version 4 Minor Version 1 Protocol | |||
| The NFS version 4 minor version 1 (NFSv4.1) protocol is the second | The NFS version 4 minor version 1 (NFSv4.1) protocol is the second | |||
| minor version of the NFS version 4 (NFSv4) protocol. The first minor | minor version of the NFS version 4 (NFSv4) protocol. The first minor | |||
| version, NFSv4.0, is described in [21]. It generally follows the | version, NFSv4.0, is described in [21]. It generally follows the | |||
| guidelines for the minor versioning model listed in Section 10 of RFC | guidelines for the minor versioning model listed in Section 10 of RFC | |||
| 3530. However, it diverges from guidelines 11 ("a client and server | 3530. However, it diverges from guidelines 11 ("a client and server | |||
| skipping to change at page 26, line 12 | skipping to change at page 26, line 12 | |||
| information to distinguish the client from other user level | information to distinguish the client from other user level | |||
| clients running on the same host, such as a process identifier or | clients running on the same host, such as a process identifier or | |||
| other unique sequence. | other unique sequence. | |||
| The client ID is assigned by the server (the eir_clientid result from | The client ID is assigned by the server (the eir_clientid result from | |||
| EXCHANGE_ID) and should be chosen so that it will not conflict with a | EXCHANGE_ID) and should be chosen so that it will not conflict with a | |||
| client ID previously assigned by the server. This applies across | client ID previously assigned by the server. This applies across | |||
| server restarts. | server restarts. | |||
| In the event of a server restart, a client may find out that its | In the event of a server restart, a client may find out that its | |||
| current client ID is no longer valid when it receives a | current client ID is no longer valid when it receives an | |||
| NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on | NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on | |||
| the characteristics of the sessions involved, specifically whether | the characteristics of the sessions involved, specifically whether | |||
| the session is persistent (see Section 2.10.5.5), but in each case | the session is persistent (see Section 2.10.5.5), but in each case | |||
| the client will receive this error when it attempts to establish a | the client will receive this error when it attempts to establish a | |||
| new session with the existing client ID and receives the error | new session with the existing client ID and receives the error | |||
| NFS4ERR_STALE_CLIENTID, indicating that a new client ID must be | NFS4ERR_STALE_CLIENTID, indicating that a new client ID must be | |||
| obtained via EXCHANGE_ID and the new session established with that | obtained via EXCHANGE_ID and the new session established with that | |||
| client ID. | client ID. | |||
| When a session is not persistent, the client will find out that it | When a session is not persistent, the client will find out that it | |||
| skipping to change at page 46, line 7 | skipping to change at page 46, line 7 | |||
| two different EXCHANGE_ID requests, and the eir_clientid, | two different EXCHANGE_ID requests, and the eir_clientid, | |||
| eir_server_owner.so_major_id, and eir_server_scope results match | eir_server_owner.so_major_id, and eir_server_scope results match | |||
| in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id | in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id | |||
| results do not match then the client is permitted to perform | results do not match then the client is permitted to perform | |||
| client ID trunking. The client can associate each connection with | client ID trunking. The client can associate each connection with | |||
| different sessions, where each session is associated with the same | different sessions, where each session is associated with the same | |||
| server. | server. | |||
| Of course, even if the eir_server_owner.so_minor_id fields do | Of course, even if the eir_server_owner.so_minor_id fields do | |||
| match, the client is free to employ client ID trunking instead of | match, the client is free to employ client ID trunking instead of | |||
| sessiond trunking. | session trunking. | |||
| The client completes the act of client ID trunking by invoking | The client completes the act of client ID trunking by invoking | |||
| CREATE_SESSION on each connection, using the same client ID that | CREATE_SESSION on each connection, using the same client ID that | |||
| was returned in eir_clientid. These invocations create two | was returned in eir_clientid. These invocations create two | |||
| sessions and also associate each connection with each session. | sessions and also associate each connection with each session. | |||
| When doing client ID trunking, locking state is shared across | When doing client ID trunking, locking state is shared across | |||
| sessions associated with the same client ID. This requires the | sessions associated with the same client ID. This requires the | |||
| server to coordinate state across sessions. | server to coordinate state across sessions. | |||
| skipping to change at page 51, line 37 | skipping to change at page 51, line 37 | |||
| CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is | CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is | |||
| needed for correct operation to match the reply to the request. | needed for correct operation to match the reply to the request. | |||
| o The SEQUENCE or CB_SEQUENCE operation may generate an error. If | o The SEQUENCE or CB_SEQUENCE operation may generate an error. If | |||
| so, the embedded slot id, sequence id, and sessionid (if present) | so, the embedded slot id, sequence id, and sessionid (if present) | |||
| in the request will not be in the reply, and the requester has | in the request will not be in the reply, and the requester has | |||
| only the XID to match the reply to the request. | only the XID to match the reply to the request. | |||
| Given that well formulated XIDs continue to be required, this begs | Given that well formulated XIDs continue to be required, this begs | |||
| the question why SEQUENCE and CB_SEQUENCE replies have a sessionid, | the question why SEQUENCE and CB_SEQUENCE replies have a sessionid, | |||
| slot id and sequence id? Having the sessionid in the reply means the | slot id and sequence id? Having the session id in the reply means | |||
| requester does not have to use the XID to lookup the sessionid, which | the requester does not have to use the XID to lookup the session id, | |||
| would be necessary if the connection were associated with multiple | which would be necessary if the connection were associated with | |||
| sessions. Having the slot id and sequence id in the reply means | multiple sessions. Having the slot id and sequence id in the reply | |||
| the requester does not have to use the XID to lookup the slot id and | means the requester does not have to use the XID to lookup the slot id | |||
| sequence id. Furthermore, since the XID is only 32 bits, it is too | and sequence id. Furthermore, since the XID is only 32 bits, it is | |||
| small to guarantee the re-association of a reply with its request | too small to guarantee the re-association of a reply with its request | |||
| ([27]); having sessionid, slot id, and sequence id in the reply | ([27]); having sessionid, slot id, and sequence id in the reply | |||
| allows the client to validate that the reply in fact belongs to the | allows the client to validate that the reply in fact belongs to the | |||
| matched request. | matched request. | |||
| The SEQUENCE (and CB_SEQUENCE) operation also carries a | The SEQUENCE (and CB_SEQUENCE) operation also carries a | |||
| "highest_slotid" value which carries additional requester slot usage | "highest_slotid" value which carries additional requester slot usage | |||
| information. The requester must always indicate the slot id | information. The requester must always indicate the slot id | |||
| representing the outstanding request with the highest-numbered slot | representing the outstanding request with the highest-numbered slot | |||
| value. The requester should in all cases provide the most | value. The requester should in all cases provide the most | |||
| conservative value possible, although it can be increased somewhat | conservative value possible, although it can be increased somewhat | |||
| skipping to change at page 53, line 30 | skipping to change at page 53, line 30 | |||
| entries at least as large as the old value of maximum requests | entries at least as large as the old value of maximum requests | |||
| outstanding, until it can infer that the requester has seen a | outstanding, until it can infer that the requester has seen a | |||
| reply containing the new granted highest_slotid. The replier can | reply containing the new granted highest_slotid. The replier can | |||
| infer that the requester has seen such a reply when it receives a new | infer that the requester has seen such a reply when it receives a new | |||
| request with the same slotid as the request replied to and the | request with the same slotid as the request replied to and the | |||
| next higher sequenceid. | next higher sequenceid. | |||
| 2.10.5.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies | 2.10.5.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies | |||
| When a SEQUENCE or CB_SEQUENCE operation is successfully executed, | When a SEQUENCE or CB_SEQUENCE operation is successfully executed, | |||
| its reply MUST always be cached. Specifically, sessionid, | its reply MUST always be cached. Specifically, session id, sequence | |||
| sequenceid, and slotid MUST be cached in the reply cache. The reply | id, and slot id MUST be cached in the reply cache. The reply from | |||
| from SEQUENCE also includes the highest slotid, target highest | SEQUENCE also includes the highest slot id, target highest slot id, | |||
| slotid, and status flags. Instead of caching these values, the | and status flags. Instead of caching these values, the server MAY | |||
| server MAY re-compute the values from the current state of the fore | re-compute the values from the current state of the fore channel, | |||
| channel, session and/or client ID as appropriate. Similarly, the | session and/or client ID as appropriate. Similarly, the reply from | |||
| reply from CB_SEQUENCE includes a highest slotid and target highest | CB_SEQUENCE includes a highest slot id and target highest slot id. | |||
| slotid. The client MAY re-compute the values from the current state | The client MAY re-compute the values from the current state of the | |||
| of the session as appropriate. | session as appropriate. | |||
| Regardless of whether a replier is re-computing highest slotid, | Regardless of whether a replier is re-computing highest slotid, | |||
| target slotid, and status on replies to retries or not, the requester | target slot id, and status on replies to retries or not, the | |||
| MUST NOT assume the values are being re-computed whenever it receives | requester MUST NOT assume the values are being re-computed whenever | |||
| a reply after a retry is sent, since it has no way of knowing whether | it receives a reply after a retry is sent, since it has no way of | |||
| the reply it has received was sent by the server in response to the | knowing whether the reply it has received was sent by the server in | |||
| retry, or is a delayed response to the original request. Therefore, | response to the retry, or is a delayed response to the original | |||
| it may be the case that highest slotid, target slotid, or status bits | request. Therefore, it may be the case that highest slot id, target | |||
| may reflect the state of affairs when the request was first executed. | slot id, or status bits may reflect the state of affairs when the | |||
| Although acting based on such delayed information is valid, it may | request was first executed. Although acting based on such delayed | |||
| cause the receiver to do unneeded work. Requesters MAY choose to | information is valid, it may cause the receiver to do unneeded work. | |||
| send additional requests to get the current state of affairs or use | Requesters MAY choose to send additional requests to get the current | |||
| the state of affairs reported by subsequent requests, in preference | state of affairs or use the state of affairs reported by subsequent | |||
| to acting immediately on data which may be out of date. | requests, in preference to acting immediately on data which may be | |||
| out of date. | ||||
| 2.10.5.1.2. Errors from SEQUENCE and CB_SEQUENCE | 2.10.5.1.2. Errors from SEQUENCE and CB_SEQUENCE | |||
| Any time SEQUENCE or CB_SEQUENCE return an error, the sequence id of | Any time SEQUENCE or CB_SEQUENCE return an error, the sequence id of | |||
| the slot MUST NOT change. The replier MUST NOT modify the reply | the slot MUST NOT change. The replier MUST NOT modify the reply | |||
| cache entry for the slot whenever an error is returned from SEQUENCE | cache entry for the slot whenever an error is returned from SEQUENCE | |||
| or CB_SEQUENCE. | or CB_SEQUENCE. | |||
| 2.10.5.1.3. Optional Reply Caching | 2.10.5.1.3. Optional Reply Caching | |||
| skipping to change at page 56, line 19 | skipping to change at page 56, line 19 | |||
| client may have been granted a delegation to a file it has opened, | client may have been granted a delegation to a file it has opened, | |||
| but the reply to the OPEN (informing the client of the granting of | but the reply to the OPEN (informing the client of the granting of | |||
| the delegation) may be delayed in the network. If a conflicting | the delegation) may be delayed in the network. If a conflicting | |||
| operation arrives at the server, it will recall the delegation using | operation arrives at the server, it will recall the delegation using | |||
| the backchannel, which may be on a different transport connection, | the backchannel, which may be on a different transport connection, | |||
| perhaps even a different network, or even a different session | perhaps even a different network, or even a different session | |||
| associated with the same client ID. | associated with the same client ID. | |||
| The presence of a session between client and server alleviates this | The presence of a session between client and server alleviates this | |||
| issue. When a session is in place, each client request is uniquely | issue. When a session is in place, each client request is uniquely | |||
| identified by its { sessionid, slot id, sequence id } triple. By the | identified by its { session id, slot id, sequence id } triple. By | |||
| rules under which slot entries (reply cache entries) are retired, the | the rules under which slot entries (reply cache entries) are retired, | |||
| server has knowledge whether the client has "seen" each of the | the server has knowledge whether the client has "seen" each of the | |||
| server's replies. The server can therefore provide sufficient | server's replies. The server can therefore provide sufficient | |||
| information to the client to allow it to disambiguate between an | information to the client to allow it to disambiguate between an | |||
| erroneous or conflicting callback race condition. | erroneous or conflicting callback race condition. | |||
| For each client operation which might result in some sort of server | For each client operation which might result in some sort of server | |||
| callback, the server SHOULD "remember" the { sessionid, slot id, | callback, the server SHOULD "remember" the { sessionid, slot id, | |||
| sequence id } triple of the client request until the slot id | sequence id } triple of the client request until the slot id | |||
| retirement rules allow the server to determine that the client has, | retirement rules allow the server to determine that the client has, | |||
| in fact, seen the server's reply. Until the time the { sessionid, | in fact, seen the server's reply. Until the time the { sessionid, | |||
| slot id, sequence id } request triple can be retired, any recalls of | slot id, sequence id } request triple can be retired, any recalls of | |||
| the associated object MUST carry an array of these referring | the associated object MUST carry an array of these referring | |||
| identifiers (in the CB_SEQUENCE operation's arguments), for the | identifiers (in the CB_SEQUENCE operation's arguments), for the | |||
| benefit of the client. After this time, it is not necessary for the | benefit of the client. After this time, it is not necessary for the | |||
| server to provide this information in related callbacks, since it is | server to provide this information in related callbacks, since it is | |||
| certain that a race condition can no longer occur. | certain that a race condition can no longer occur. | |||
| The CB_SEQUENCE operation which begins each server callback carries a | The CB_SEQUENCE operation which begins each server callback carries a | |||
| list of "referring" { sessionid, slot id, sequence id } triples. If | list of "referring" { sessionid, slot id, sequence id } triples. If | |||
| the client finds the request corresponding to the referring | the client finds the request corresponding to the referring session | |||
| sessionid, slot id and sequence id to be currently outstanding (i.e. | id, slot id and sequence id to be currently outstanding (i.e. the | |||
| the server's reply has not been seen by the client), it can determine | server's reply has not been seen by the client), it can determine | |||
| that the callback has raced the reply, and act accordingly. If the | that the callback has raced the reply, and act accordingly. If the | |||
| client does not find the request corresponding to the referring triple | client does not find the request corresponding to the referring triple | |||
| to be outstanding (including the case of a sessionid referring to a | to be outstanding (including the case of a sessionid referring to a | |||
| destroyed session), then there is no race with respect to this | destroyed session), then there is no race with respect to this | |||
| triple. The server SHOULD limit the referring triples to requests | triple. The server SHOULD limit the referring triples to requests | |||
| that refer to just those that apply to the objects referred to in the | that refer to just those that apply to the objects referred to in the | |||
| CB_COMPOUND procedure. | CB_COMPOUND procedure. | |||
| The client must not simply wait forever for the expected server reply | The client must not simply wait forever for the expected server reply | |||
| to arrive before responding to the CB_COMPOUND that won the race, | to arrive before responding to the CB_COMPOUND that won the race, | |||
| skipping to change at page 57, line 28 | skipping to change at page 57, line 28 | |||
| back), the client and server negotiate the maximum sized request they | back), the client and server negotiate the maximum sized request they | |||
| will send or process (ca_maxrequestsize), the maximum sized reply | will send or process (ca_maxrequestsize), the maximum sized reply | |||
| they will return or process (ca_maxresponsesize), and the maximum | they will return or process (ca_maxresponsesize), and the maximum | |||
| sized reply they will store in the reply cache | sized reply they will store in the reply cache | |||
| (ca_maxresponsesize_cached). | (ca_maxresponsesize_cached). | |||
| If a request exceeds ca_maxrequestsize, the reply will have the | If a request exceeds ca_maxrequestsize, the reply will have the | |||
| status NFS4ERR_REQ_TOO_BIG. A replier MAY return NFS4ERR_REQ_TOO_BIG | status NFS4ERR_REQ_TOO_BIG. A replier MAY return NFS4ERR_REQ_TOO_BIG | |||
| as the status for the first operation (SEQUENCE or CB_SEQUENCE) in the | as the status for the first operation (SEQUENCE or CB_SEQUENCE) in the | |||
| request (which means no operations in the request executed, and the | request (which means no operations in the request executed, and the | |||
| state of the slot in the reply cache is unchanged), or it MAY chose | state of the slot in the reply cache is unchanged), or it MAY opt to | |||
| to return it on a subsequent operation in the same COMPOUND or | return it on a subsequent operation in the same COMPOUND or | |||
| CB_COMPOUND request (which means at least one operation did execute | CB_COMPOUND request (which means at least one operation did execute | |||
| and the state of the slot in reply cache does change). The replier | and the state of the slot in reply cache does change). The replier | |||
| SHOULD set NFS4ERR_REQ_TOO_BIG on the operation that exceeds | SHOULD set NFS4ERR_REQ_TOO_BIG on the operation that exceeds | |||
| ca_maxrequestsize. | ca_maxrequestsize. | |||
| If a reply exceeds ca_maxresponsesize, the reply will have the status | If a reply exceeds ca_maxresponsesize, the reply will have the status | |||
| NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the | NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the | |||
| status for the first operation (SEQUENCE or CB_SEQUENCE) in the request, | status for the first operation (SEQUENCE or CB_SEQUENCE) in the request, | |||
| or it MAY chose to return it on a subsequent operation (in the same | or it MAY opt to return it on a subsequent operation (in the same | |||
| COMPOUND or CB_COMPOUND reply). A replier MAY return | COMPOUND or CB_COMPOUND reply). A replier MAY return | |||
| NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if | NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if | |||
| the response would still exceed ca_maxresponsesize. | the response would still exceed ca_maxresponsesize. | |||
| If sa_cachethis or csa_cachethis are TRUE, then the replier MUST | If sa_cachethis or csa_cachethis are TRUE, then the replier MUST | |||
| cache a reply except if an error is returned by the SEQUENCE or | cache a reply except if an error is returned by the SEQUENCE or | |||
| CB_SEQUENCE operation (see Section 2.10.5.1.2). If the reply exceeds | CB_SEQUENCE operation (see Section 2.10.5.1.2). If the reply exceeds | |||
| ca_maxresponsesize_cached, (and sa_cachethis or csa_cachethis are | ca_maxresponsesize_cached, (and sa_cachethis or csa_cachethis are | |||
| TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even | TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even | |||
| if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter) | if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter) | |||
| skipping to change at page 59, line 37 | skipping to change at page 59, line 37 | |||
| sequence id) MUST be rejected with NFS4ERR_DEADSESSION (returned by | sequence id) MUST be rejected with NFS4ERR_DEADSESSION (returned by | |||
| SEQUENCE). Such a session is considered dead. A server MAY re- | SEQUENCE). Such a session is considered dead. A server MAY re- | |||
| animate a session after a server restart so that the session will | animate a session after a server restart so that the session will | |||
| accept new requests as well as retries. To re-animate a session the | accept new requests as well as retries. To re-animate a session the | |||
| server needs to persist additional information through server | server needs to persist additional information through server | |||
| restart: | restart: | |||
| o The client ID. This is a prerequisite to let the client create | o The client ID. This is a prerequisite to let the client create | |||
| more sessions associated with the same client ID as the | more sessions associated with the same client ID as the | |||
| o The client ID's sequenceid that is used for creating sessions (see | o The client ID's sequence id that is used for creating sessions | |||
| Section 18.35 and Section 18.36. This is a prerequisite to let | (see Section 18.35 and Section 18.36). This is a prerequisite to | |||
| the client create more sessions. | let the client create more sessions. | |||
| o The principal that created the client ID. This allows the server | o The principal that created the client ID. This allows the server | |||
| to authenticate the client when it sends EXCHANGE_ID. | to authenticate the client when it sends EXCHANGE_ID. | |||
| o The SSV, if SP4_SSV state protection was specified when the client | o The SSV, if SP4_SSV state protection was specified when the client | |||
| ID was created (see Section 18.35). This lets the client create | ID was created (see Section 18.35). This lets the client create | |||
| new sessions, and associate connections with the new and existing | new sessions, and associate connections with the new and existing | |||
| sessions. | sessions. | |||
| o The properties of the client ID as defined in Section 18.35. | o The properties of the client ID as defined in Section 18.35. | |||
| skipping to change at page 76, line 21 | skipping to change at page 76, line 21 | |||
| o A catastrophe that causes the reply cache to be corrupted or lost | o A catastrophe that causes the reply cache to be corrupted or lost | |||
| on the media it was stored on. This applies even if the replier | on the media it was stored on. This applies even if the replier | |||
| indicated in the CREATE_SESSION results that it would persist the | indicated in the CREATE_SESSION results that it would persist the | |||
| cache. | cache. | |||
| o The server purges the session of a client that has been inactive | o The server purges the session of a client that has been inactive | |||
| for a very extended period of time. | for a very extended period of time. | |||
| Loss of reply cache is equivalent to loss of session. The replier | Loss of reply cache is equivalent to loss of session. The replier | |||
| indicates loss of session to the requester by returning | indicates loss of session to the requester by returning | |||
| NFS4ERR_BADSESSION on the next operation that uses the sessionid that | NFS4ERR_BADSESSION on the next operation that uses the session id | |||
| refers to the lost session. | that refers to the lost session. | |||
| After an event like a server restart, the client may have lost its | After an event like a server restart, the client may have lost its | |||
| connections. The client assumes for the moment that the session has | connections. The client assumes for the moment that the session has | |||
| not been lost. It reconnects, and if it specified connection | not been lost. It reconnects, and if it specified connection | |||
| association enforcement when the session was created, it invokes | association enforcement when the session was created, it invokes | |||
| BIND_CONN_TO_SESSION using the sessionid. Otherwise, it invokes | BIND_CONN_TO_SESSION using the sessionid. Otherwise, it invokes | |||
| SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns | SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns | |||
| NFS4ERR_BADSESSION, the client knows the session was lost. If the | NFS4ERR_BADSESSION, the client knows the session was lost. If the | |||
| connection survives session loss, then the next SEQUENCE operation | connection survives session loss, then the next SEQUENCE operation | |||
| the client sends over the connection will get back | the client sends over the connection will get back | |||
| skipping to change at page 80, line 19 | skipping to change at page 80, line 19 | |||
| | | Various defined file types. | | | | Various defined file types. | | |||
| | nfsstat4 | enum nfsstat4; | | | nfsstat4 | enum nfsstat4; | | |||
| | | Return value for operations. | | | | Return value for operations. | | |||
| | offset4 | typedef uint64_t offset4; | | | offset4 | typedef uint64_t offset4; | | |||
| | | Various offset designations (READ, WRITE, LOCK, | | | | Various offset designations (READ, WRITE, LOCK, | | |||
| | | COMMIT). | | | | COMMIT). | | |||
| | qop4 | typedef uint32_t qop4; | | | qop4 | typedef uint32_t qop4; | | |||
| | | Quality of protection designation in SECINFO. | | | | Quality of protection designation in SECINFO. | | |||
| | sec_oid4 | typedef opaque sec_oid4<>; | | | sec_oid4 | typedef opaque sec_oid4<>; | | |||
| | | Security Object Identifier. The sec_oid4 data | | | | Security Object Identifier. The sec_oid4 data | | |||
| | | type is not really opaque. Instead it contains | | | | type is not really opaque. Instead it contains an | | |||
| | | an ASN.1 OBJECT IDENTIFIER as used by GSS-API in | | | | ASN.1 OBJECT IDENTIFIER as used by GSS-API in the | | |||
| | | the mech_type argument to GSS_Init_sec_context. | | | | mech_type argument to GSS_Init_sec_context. See | | |||
| | | See [7] for details. | | | | [7] for details. | | |||
| | sequenceid4 | typedef uint32_t sequenceid4; | | | sequenceid4 | typedef uint32_t sequenceid4; | | |||
| | | Sequence number used for various session | | | | Sequence number used for various session | | |||
| | | operations (EXCHANGE_ID, CREATE_SESSION, | | | | operations (EXCHANGE_ID, CREATE_SESSION, | | |||
| | | SEQUENCE, CB_SEQUENCE). | | | | SEQUENCE, CB_SEQUENCE). | | |||
| | seqid4 | typedef uint32_t seqid4; | | | seqid4 | typedef uint32_t seqid4; | | |||
| | | Sequence identifier used for file locking. | | | | Sequence identifier used for file locking. | | |||
| | sessionid4 | typedef opaque sessionid4[NFS4_SESSIONID_SIZE]; | | | sessionid4 | typedef opaque sessionid4[NFS4_SESSIONID_SIZE]; | | |||
| | | Session identifier. | | | | Session identifier. | | |||
| | slotid4 | typedef uint32_t slotid4; | | | slotid4 | typedef uint32_t slotid4; | | |||
| | | Sequencing artifact for various session | | | | Sequencing artifact for various session | | |||
| skipping to change at page 100, line 47 | skipping to change at page 100, line 47 | |||
| Some REQUIRED and RECOMMENDED attributes are set-only, i.e. they can | Some REQUIRED and RECOMMENDED attributes are set-only, i.e. they can | |||
| be set via SETATTR but not retrieved via GETATTR. Similarly, some | be set via SETATTR but not retrieved via GETATTR. Similarly, some | |||
| REQUIRED and RECOMMENDED attributes are get-only, i.e. they can be | REQUIRED and RECOMMENDED attributes are get-only, i.e. they can be | |||
| retrieved via GETATTR but not set via SETATTR. If a client attempts to | retrieved via GETATTR but not set via SETATTR. If a client attempts to | |||
| set a get-only attribute or get a set-only attribute, the server | set a get-only attribute or get a set-only attribute, the server | |||
| MUST return NFS4ERR_INVAL. | MUST return NFS4ERR_INVAL. | |||
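The get-only/set-only rule above can be illustrated with a minimal sketch. It is not text from either draft. The attribute table is a small hypothetical subset, the helper name check_attr_access is invented for illustration, and the numeric status values are assumed to match the nfsstat4 enum.

```python
# Status codes; the numeric values are assumed to match the nfsstat4 enum.
NFS4_OK = 0
NFS4ERR_INVAL = 22

# Hypothetical subset of an attribute table: name -> access
# ("R" = get-only, "W" = set-only, "RW" = both).
ATTR_ACCESS = {
    "supported_attrs": "R",
    "size": "RW",
    "time_modify_set": "W",
}


def check_attr_access(name, op):
    """Return the status a server would give for GETATTR or SETATTR of `name`."""
    if op not in ("GETATTR", "SETATTR"):
        raise ValueError("op must be GETATTR or SETATTR")
    needed = "R" if op == "GETATTR" else "W"
    # Setting a get-only attribute or getting a set-only one is rejected.
    return NFS4_OK if needed in ATTR_ACCESS[name] else NFS4ERR_INVAL
```

For instance, check_attr_access("time_modify_set", "GETATTR") yields NFS4ERR_INVAL under these assumptions, while both operations succeed on "size".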
| 5.6. REQUIRED Attributes - List and Definition References | 5.6. REQUIRED Attributes - List and Definition References | |||
| The list of REQUIRED attributes appears in Table 4. The meaning of | The list of REQUIRED attributes appears in Table 4. The meaning of | |||
| hte columns of the table are: | the columns of the table are: | |||
| o Name: the name of attribute | o Name: the name of attribute | |||
| o Id: the number assigned to the attribute. In the event of | o Id: the number assigned to the attribute. In the event of | |||
| conflicts between the assigned number and [12], the latter is | conflicts between the assigned number and [12], the latter is | |||
| authoritative. | authoritative. | |||
| o Data Type: The XDR data type of the attribute. | o Data Type: The XDR data type of the attribute. | |||
| o Acc: Access allowed to the attribute. R means read-only (GETATTR | o Acc: Access allowed to the attribute. R means read-only (GETATTR | |||
| skipping to change at page 143, line 25 | skipping to change at page 143, line 25 | |||
| ACE4_INHERIT_ONLY_ACE set. (In the case of a dacl or sacl attribute, | ACE4_INHERIT_ONLY_ACE set. (In the case of a dacl or sacl attribute, | |||
| both of those ACEs SHOULD also have the ACE4_INHERITED_ACE flag set.) | both of those ACEs SHOULD also have the ACE4_INHERITED_ACE flag set.) | |||
| This makes it simpler to modify the effective permissions on the | This makes it simpler to modify the effective permissions on the | |||
| directory without modifying the ACE which is to be inherited to the | directory without modifying the ACE which is to be inherited to the | |||
| new directory's children. | new directory's children. | |||
| 6.4.3.2. Automatic Inheritance | 6.4.3.2. Automatic Inheritance | |||
| The acl attribute consists only of an array of ACEs, but the sacl | The acl attribute consists only of an array of ACEs, but the sacl | |||
| (Section 6.2.3) and dacl (Section 6.2.2) attributes also include an | (Section 6.2.3) and dacl (Section 6.2.2) attributes also include an | |||
| additional flag field. The flag field applies to the entire sacl or | additional flag field. | |||
| dacl; three flag values are defined: | ||||
| struct nfsacl41 { | ||||
| aclflag4 na41_flag; | ||||
| nfsace4 na41_aces<>; | ||||
| }; | ||||
| The flag field applies to the entire sacl or dacl; three flag values | ||||
| are defined: | ||||
| const ACL4_AUTO_INHERIT = 0x00000001; | const ACL4_AUTO_INHERIT = 0x00000001; | |||
| const ACL4_PROTECTED = 0x00000002; | const ACL4_PROTECTED = 0x00000002; | |||
| const ACL4_DEFAULTED = 0x00000004; | const ACL4_DEFAULTED = 0x00000004; | |||
| and all other bits must be cleared. The ACE4_INHERITED_ACE flag may | and all other bits must be cleared. The ACE4_INHERITED_ACE flag may | |||
| be set in the ACEs of the sacl or dacl (whereas it must always be | be set in the ACEs of the sacl or dacl (whereas it must always be | |||
| cleared in the acl). | cleared in the acl). | |||
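As an illustration of the flag rules above (not text from either draft), the following sketch checks an aclflag4 value and the ACE flags of a plain acl attribute. The three ACL4_* values are the ones listed above. The value of ACE4_INHERITED_ACE and both helper names are assumptions made for this example.

```python
# Flag values defined for the sacl/dacl flag field (from the text above).
ACL4_AUTO_INHERIT = 0x00000001
ACL4_PROTECTED = 0x00000002
ACL4_DEFAULTED = 0x00000004
ACL4_DEFINED_FLAGS = ACL4_AUTO_INHERIT | ACL4_PROTECTED | ACL4_DEFAULTED

# Assumed value of the per-ACE inherited flag; not shown in this excerpt.
ACE4_INHERITED_ACE = 0x00000080


def aclflag_is_valid(flag):
    """True if `flag` fits in 32 bits and sets no undefined bit."""
    return 0 <= flag <= 0xFFFFFFFF and (flag & ~ACL4_DEFINED_FLAGS) == 0


def plain_acl_ace_flags_valid(ace_flags):
    """True if no ACE of a plain acl attribute has ACE4_INHERITED_ACE set.

    `ace_flags` is a list of per-ACE flag words. In a sacl or dacl the
    flag may be set, so this check applies to the acl attribute only.
    """
    return all((f & ACE4_INHERITED_ACE) == 0 for f in ace_flags)
```

A receiver could use aclflag_is_valid(na41_flag) to reject a sacl or dacl whose flag word carries bits outside the three defined values.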
| Together these features allow a server to support automatic | Together these features allow a server to support automatic | |||
| skipping to change at page 146, line 27 | skipping to change at page 146, line 32 | |||
| In NFSv3, the client expects all LOOKUP operations to remain within a | In NFSv3, the client expects all LOOKUP operations to remain within a | |||
| single server file system. For example, the device attribute will | single server file system. For example, the device attribute will | |||
| not change. This prevents a client from taking namespace paths that | not change. This prevents a client from taking namespace paths that | |||
| span exports. | span exports. | |||
| In the case of NFSv3, an automounter on the client can obtain a | In the case of NFSv3, an automounter on the client can obtain a | |||
| snapshot of the server's namespace using the EXPORTS procedure of the | snapshot of the server's namespace using the EXPORTS procedure of the | |||
| MOUNT protocol. If it understands the server's pathname syntax, it | MOUNT protocol. If it understands the server's pathname syntax, it | |||
| can create an image of the server's namespace on the client. The | can create an image of the server's namespace on the client. The | |||
| parts of the namespace that are not exported by the server are filled | parts of the namespace that are not exported by the server are filled | |||
| in with directories that might be constructed similarly to a NFSv4.1 | in with directories that might be constructed similarly to an NFSv4.1 | |||
| "pseudo file system" (see Section 7.3) that allows the user to browse | "pseudo file system" (see Section 7.3) that allows the user to browse | |||
| from one mounted file system to another. There is a drawback to this | from one mounted file system to another. There is a drawback to this | |||
| representation of the server's namespace on the client: it is static. | representation of the server's namespace on the client: it is static. | |||
| If the server administrator adds a new export the client will be | If the server administrator adds a new export the client will be | |||
| unaware of it. | unaware of it. | |||
| 7.3. Server Pseudo File System | 7.3. Server Pseudo File System | |||
| NFSv4.1 servers avoid this namespace inconsistency by presenting all | NFSv4.1 servers avoid this namespace inconsistency by presenting all | |||
| the exports for a given server within the framework of a single | the exports for a given server within the framework of a single | |||
| skipping to change at page 150, line 28 | skipping to change at page 150, line 33 | |||
| which represents a client as a whole to the eventual lightweight | which represents a client as a whole to the eventual lightweight | |||
| stateid used for most client and server locking interactions. The | stateid used for most client and server locking interactions. The | |||
| details of this transition will vary with the type of object but it | details of this transition will vary with the type of object but it | |||
| always starts with a client ID. | always starts with a client ID. | |||
| 8.1. Client and Session ID | 8.1. Client and Session ID | |||
| A client must establish a client ID (see Section 2.4) and then one or | A client must establish a client ID (see Section 2.4) and then one or | |||
| more sessionids (see Section 2.10) before performing any operations | more sessionids (see Section 2.10) before performing any operations | |||
| to open, lock, delegate, or obtain a layout for a file object. Each | to open, lock, delegate, or obtain a layout for a file object. Each | |||
| sessionid is associated with a specific client ID, and thus serves as | session id is associated with a specific client ID, and thus serves | |||
| a shorthand reference to an NFSv4.1 client. | as a shorthand reference to an NFSv4.1 client. | |||
| For some types of locking interactions, the client will represent | For some types of locking interactions, the client will represent | |||
| some number of internal locking entities called "owners", which | some number of internal locking entities called "owners", which | |||
| normally correspond to processes internal to the client. For other | normally correspond to processes internal to the client. For other | |||
| types of locking-related objects, such as delegations and layouts, no | types of locking-related objects, such as delegations and layouts, no | |||
| such intermediate entities are provided for, and the locking-related | such intermediate entities are provided for, and the locking-related | |||
| objects are considered to be transferred directly between the server | objects are considered to be transferred directly between the server | |||
| and a unitary client. | and a unitary client. | |||
| 8.2. Stateid Definition | 8.2. Stateid Definition | |||
| skipping to change at page 156, line 26 | skipping to change at page 156, line 31 | |||
| appropriate error returned when necessary. Special and non-special | appropriate error returned when necessary. Special and non-special | |||
| stateids are handled separately. (See Section 8.2.3 for a discussion | stateids are handled separately. (See Section 8.2.3 for a discussion | |||
| of special stateids.) | of special stateids.) | |||
| Note that stateids are implicitly qualified by the current client ID, | Note that stateids are implicitly qualified by the current client ID, | |||
| as derived from the client ID associated with the current session. | as derived from the client ID associated with the current session. | |||
| Note however, that the semantics of the session will prevent stateids | Note however, that the semantics of the session will prevent stateids | |||
| associated with a previous client or server instance from being | associated with a previous client or server instance from being | |||
| analyzed by this procedure. | analyzed by this procedure. | |||
| If server restart has resulted in an invalid client ID or a sessionid | If server restart has resulted in an invalid client ID or a session | |||
| which is invalid, SEQUENCE will return an error and the operation | id which is invalid, SEQUENCE will return an error and the operation | |||
| that takes a stateid as an argument will never be processed. | that takes a stateid as an argument will never be processed. | |||
| If there has been a server restart where there is a persistent | If there has been a server restart where there is a persistent | |||
| session, and all leased state has been lost, then the session in | session, and all leased state has been lost, then the session in | |||
| question will, although valid, be marked as dead, and any operation | question will, although valid, be marked as dead, and any operation | |||
| not satisfied by means of the reply cache will receive the error | not satisfied by means of the reply cache will receive the error | |||
| NFS4ERR_DEADSESSION, and thus not be processed as indicated below. | NFS4ERR_DEADSESSION, and thus not be processed as indicated below. | |||
| When a stateid is being tested, and the "other" field is all zeros or | When a stateid is being tested, and the "other" field is all zeros or | |||
| all ones, a check that the "other" and "seqid" fields match a defined | all ones, a check that the "other" and "seqid" fields match a defined | |||
| skipping to change at page 249, line 20 | skipping to change at page 249, line 20 | |||
| referring (absent) file system nor is there any access to the | referring (absent) file system nor is there any access to the | |||
| fh_expire_type attribute. | fh_expire_type attribute. | |||
| o All file system instances servers should be considered as of | o All file system instances servers should be considered as of | |||
| different _change_ classes. | different _change_ classes. | |||
| For other class assignments, handling of file system transitions | For other class assignments, handling of file system transitions | |||
| depends on the reasons for the transition: | depends on the reasons for the transition: | |||
| o When the transition is due to migration, that is the client was | o When the transition is due to migration, that is the client was | |||
| directed to new file system after receiving a NFS4ERR_MOVED error, | directed to new file system after receiving an NFS4ERR_MOVED | |||
| the target should be treated as being of the same _write-verifier_ | error, the target should be treated as being of the same _write- | |||
| class as the source. | verifier_ class as the source. | |||
| o When the transition is due to failover to another replica, that | o When the transition is due to failover to another replica, that | |||
| is, the client selected another replica without receiving an | is, the client selected another replica without receiving an | |||
| NFS4ERR_MOVED error, the target should be treated as being of a | NFS4ERR_MOVED error, the target should be treated as being of a | |||
| different _write-verifier_ class from the source. | different _write-verifier_ class from the source. | |||
| The specific choices reflect typical implementation patterns for | The specific choices reflect typical implementation patterns for | |||
| failover and controlled migration respectively. Since other choices | failover and controlled migration respectively. Since other choices | |||
| are possible and useful, this information is better obtained by using | are possible and useful, this information is better obtained by using | |||
| fs_locations_info. When a server implementation needs to communicate | fs_locations_info. When a server implementation needs to communicate | |||
| skipping to change at page 263, line 24 | skipping to change at page 263, line 24 | |||
| open denies WRITE and the data is changed), that lock SHOULD be | open denies WRITE and the data is changed), that lock SHOULD be | |||
| considered administratively revoked. | considered administratively revoked. | |||
| The opaque strings fss_source and fss_current provide a way of | The opaque strings fss_source and fss_current provide a way of | |||
| presenting information about the source of the file system image | presenting information about the source of the file system image | |||
| being presented. It is not intended that the client do anything with this | being presented. It is not intended that the client do anything with this | |||
| information other than make it available to administrative tools. It | information other than make it available to administrative tools. It | |||
| is intended that this information be helpful when researching | is intended that this information be helpful when researching | |||
| possible problems with a file system image that might arise when it | possible problems with a file system image that might arise when it | |||
| is unclear if the correct image is being accessed and if not, how | is unclear if the correct image is being accessed and if not, how | |||
| that image came to be made. This kind of dianostic information will | that image came to be made. This kind of diagnostic information will | |||
| be helpful, if, as seems likely, copies of file systems are made in | be helpful, if, as seems likely, copies of file systems are made in | |||
| many different ways (e.g. simple user-level copies, file system-level | many different ways (e.g. simple user-level copies, file system-level | |||
| point-in-time copies, clones of the underlying storage), under a | point-in-time copies, clones of the underlying storage), under a | |||
| variety of administrative arrangements. In such environments, | variety of administrative arrangements. In such environments, | |||
| determining how a given set of data was constructed can be very | determining how a given set of data was constructed can be very | |||
| helpful in resolving problems. | helpful in resolving problems. | |||
| The opaque string fss_source is used to indicate the source of a | The opaque string fss_source is used to indicate the source of a | |||
| given file system with the expectation that tools capable of creating | given file system with the expectation that tools capable of creating | |||
| a file system image propagate this information, when that is | a file system image propagate this information, when that is | |||
| skipping to change at page 265, line 45 | skipping to change at page 265, line 45 | |||
| ||| | | ||| | | |||
| ||| | | ||| | | |||
| ||| Storage +-----------+ | | ||| Storage +-----------+ | | |||
| ||| Protocol |+-----------+ | | ||| Protocol |+-----------+ | | |||
| ||+----------------||+-----------+ Control | | ||+----------------||+-----------+ Control | | |||
| |+-----------------||| | Protocol| | |+-----------------||| | Protocol| | |||
| +------------------+|| Storage |------------+ | +------------------+|| Storage |------------+ | |||
| +| Devices | | +| Devices | | |||
| +-----------+ | +-----------+ | |||
| Figure 67 | Figure 68 | |||
| In this model, the clients, server, and storage devices are | In this model, the clients, server, and storage devices are | |||
| responsible for managing file access. This is in contrast to NFSv4 | responsible for managing file access. This is in contrast to NFSv4 | |||
| without pNFS where it is primarily the server's responsibility; some | without pNFS where it is primarily the server's responsibility; some | |||
| of this responsibility may be delegated to the client under strictly | of this responsibility may be delegated to the client under strictly | |||
| specified conditions. | specified conditions. | |||
| pNFS takes the form of OPTIONAL operations that manage protocol | pNFS takes the form of OPTIONAL operations that manage protocol | |||
| objects called 'layouts' which contain data location information. | objects called 'layouts' which contain a byte-range and storage | |||
| The layout is managed in a similar fashion as NFSv4.1 data | location information. The layout is managed in a similar fashion as | |||
| delegations are managed. For example, the layout is leased, | NFSv4.1 data delegations. For example, the layout is leased, | |||
| recallable and revocable. However, layouts are distinct abstractions | recallable and revocable. However, layouts are distinct abstractions | |||
| and are manipulated with new operations. When a client holds a | and are manipulated with new operations. When a client holds a | |||
| layout, it is granted the ability to access the data location | layout, it is granted the ability to directly access the byte-range | |||
| directly using the location information specified in the layout. | at the storage location specified in the layout. | |||
| There are interactions between layouts and other NFSv4.1 abstractions | There are interactions between layouts and other NFSv4.1 abstractions | |||
| such as data delegations and byte-range locking. Delegation issues | such as data delegations and byte-range locking. Delegation issues | |||
| are discussed in Section 12.5.5. Byte range locking issues are | are discussed in Section 12.5.5. Byte range locking issues are | |||
| discussed in Section 12.2.9 and Section 12.5.1. | discussed in Section 12.2.9 and Section 12.5.1. | |||
| The NFSv4.1 pNFS feature has been structured to allow for a variety | The NFSv4.1 pNFS feature has been structured to allow for a variety | |||
| of storage protocols to be defined and used. As noted in the diagram | of storage protocols to be defined and used. As noted in the diagram | |||
| above, the storage protocol is the method used by the client to store | above, the storage protocol is the method used by the client to store | |||
| and retrieve data directly from the storage devices. The NFSv4.1 | and retrieve data directly from the storage devices. The NFSv4.1 | |||
| skipping to change at page 266, line 46 | skipping to change at page 266, line 46 | |||
| o Object protocols such as OSD over iSCSI or Fibre Channel [40]. | o Object protocols such as OSD over iSCSI or Fibre Channel [40]. | |||
| o Other storage protocols, including PVFS and other file systems | o Other storage protocols, including PVFS and other file systems | |||
| that are in use in HPC environments. | that are in use in HPC environments. | |||
| It is possible that various storage protocols are available to both | It is possible that various storage protocols are available to both | |||
| client and server and it may be possible that a client and server do | client and server and it may be possible that a client and server do | |||
| not have a matching storage protocol available to them. Because of | not have a matching storage protocol available to them. Because of | |||
| this, the pNFS server MUST support normal NFSv4.1 access to any file | this, the pNFS server MUST support normal NFSv4.1 access to any file | |||
| accessible by the pNFS feature; this will allow for continued | accessible by the pNFS feature; this will allow for continued | |||
| interoperability between a NFSv4.1 client and server. | interoperability between an NFSv4.1 client and server. | |||
| 12.2. pNFS Definitions | 12.2. pNFS Definitions | |||
| NFSv4.1's pNFS feature partitions the file system protocol into two | NFSv4.1's pNFS feature partitions the file system protocol into two | |||
| parts: metadata and data. Where data is the contents of a file and | parts: metadata and data. Where data being the contents of a file | |||
| metadata is "everything else". The metadata functionality is | and the metadata is "everything else". The metadata functionality is | |||
| implemented by a metadata server that supports pNFS and the | implemented by a NFSv4.1 server that supports pNFS and the operations | |||
| operations described in (Section 18). The data functionality is | described in (Section 18) (a metadata server). The data | |||
| implemented by a storage device that supports the storage protocol. | functionality is implemented by one or more storage devices, each of | |||
| A subset (defined in Section 13.6) of NFSv4.1 itself is one such | which are accessed by the client via a storage protocol. A subset | |||
| storage protocol. New terms are introduced to the NFSv4.1 | (defined in Section 13.6) of NFSv4.1 is one such storage protocol. | |||
| nomenclature and existing terms are clarified to allow for the | New terms are introduced to the NFSv4.1 nomenclature and existing | |||
| description of the pNFS feature. | terms are clarified to allow for the description of the pNFS feature. | |||
| 12.2.1. Metadata | 12.2.1. Metadata | |||
| Information about a file system object, such as its name, location | Information about a file system object, such as its name, location | |||
| within the namespace, owner, ACL and other attributes. Metadata may | within the namespace, owner, ACL and other attributes. Metadata may | |||
| also include storage location information and this will vary based on | also include storage location information and this will vary based on | |||
| the underlying storage mechanism that is used. | the underlying storage mechanism that is used. | |||
| 12.2.2. Metadata Server | 12.2.2. Metadata Server | |||
| An NFSv4.1 server which supports the pNFS feature. A variety of | An NFSv4.1 server which supports the pNFS feature. A variety of | |||
| architectural choices exists for the metadata server and its use of | architectural choices exists for the metadata server and its use of | |||
| what file system information is held at the server. Some servers may | file system information held at the server. Some servers may contain | |||
| contain metadata only for the file objects that reside at the | metadata only for file objects residing at the metadata server while | |||
| metadata server while file data resides on the associated storage | the file data resides on associated storage devices. Other metadata | |||
| devices. Other metadata servers may hold both metadata and a varying | servers may hold both metadata and a varying degree of file data. | |||
| degree of file data. | ||||
| 12.2.3. pNFS Client | 12.2.3. pNFS Client | |||
| An NFSv4.1 client that supports pNFS operations and supports at least | An NFSv4.1 client that supports pNFS operations and supports at least | |||
| one storage protocol or layout type for performing I/O to storage | one storage protocol for performing I/O to storage devices. | |||
| devices. | ||||
| 12.2.4. Storage Device | 12.2.4. Storage Device | |||
| A storage device stores a regular file's data, but leaves metadata | A storage device stores a regular file's data, but leaves metadata | |||
| management to the metadata server. A storage device could be another | management to the metadata server. A storage device could be another | |||
| NFSv4.1 server, an object storage device (OSD), a block device | NFSv4.1 server, an object storage device (OSD), a block device | |||
| accessed over a SAN (e.g., either Fibre Channel or iSCSI SAN), or some | accessed over a SAN (e.g., either Fibre Channel or iSCSI SAN), or some | |||
| other entity. | other entity. | |||
| 12.2.5. Storage Protocol | 12.2.5. Storage Protocol | |||
| skipping to change at page 268, line 32 | skipping to change at page 268, line 26 | |||
| devices that hold the data. A layout is said to belong to a specific | devices that hold the data. A layout is said to belong to a specific | |||
| layout type (data type layouttype4, see Section 3.3.13). The layout | layout type (data type layouttype4, see Section 3.3.13). The layout | |||
| type allows for variants to handle different storage protocols, such | type allows for variants to handle different storage protocols, such | |||
| as those associated with block/volume [31], object [30], and file | as those associated with block/volume [31], object [30], and file | |||
| (Section 13) layout types. A metadata server, along with its control | (Section 13) layout types. A metadata server, along with its control | |||
| protocol, MUST support at least one layout type. A private sub-range | protocol, MUST support at least one layout type. A private sub-range | |||
| of the layout type name space is also defined. Values from the | of the layout type name space is also defined. Values from the | |||
| private layout type range MAY be used for internal testing or | private layout type range MAY be used for internal testing or | |||
| experimentation. | experimentation. | |||
| As an example, layout of the file layout type could be an array of | As an example, the organization of the file layout type could be an | |||
| tuples (e.g., deviceID, file_handle), along with a definition of how | array of tuples (e.g., deviceID, file_handle), along with a | |||
| the data is stored across the devices (e.g., striping). A block/ | definition of how the data is stored across the devices (e.g., | |||
| volume layout might be an array of tuples that store <deviceID, | striping). A block/volume layout might be an array of tuples that | |||
| block_number, block count> along with information about block size | store <deviceID, block_number, block count> along with information | |||
| and the associated file offset of the block number. An object layout | about block size and the associated file offset of the block number. | |||
| might be an array of tuples <deviceID, objectID> and an additional | An object layout might be an array of tuples <deviceID, objectID> and | |||
| structure (i.e., the aggregation map) that defines how the logical | an additional structure (i.e., the aggregation map) that defines how | |||
| byte sequence of the file data is serialized into the different | the logical byte sequence of the file data is serialized into the | |||
| objects. Note that the actual layouts are typically more complex | different objects. Note that the actual layouts are typically more | |||
| than these simple expository examples. | complex than these simple expository examples. | |||
| Requests for pNFS-related operations will often specify a layout | Requests for pNFS-related operations will often specify a layout | |||
| type. Examples of such operations are GETDEVICEINFO and LAYOUTGET. | type. Examples of such operations are GETDEVICEINFO and LAYOUTGET. | |||
| The response for these operations will include structures such as a | The response for these operations will include structures such as a | |||
| device_addr4 or a layout4, each of which includes a layout type | device_addr4 or a layout4, each of which includes a layout type | |||
| within it. The layout type sent by the server MUST always be the | within it. The layout type sent by the server MUST always be the | |||
| same one requested by the client. When a client sends a response | same one requested by the client. When a server sends a response | |||
| that includes a different layout type, the client SHOULD ignore the | that includes a different layout type, the client SHOULD ignore the | |||
| response and behave as if the server had returned an error response. | response and behave as if the server had returned an error response. | |||
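The client-side handling of a mismatched layout type described above might be sketched as follows. This is not text from either draft. The numeric layout type values are assumed to match the layouttype4 enum, and the exception and function names are hypothetical.

```python
# Layout type values; assumed to match the layouttype4 enum.
LAYOUT4_NFSV4_1_FILES = 1
LAYOUT4_BLOCK_VOLUME = 3


class LayoutTypeMismatch(Exception):
    """Raised so the caller handles the reply like an error response."""


def accept_layout_response(requested_type, response_type, body):
    """Return `body` only if the server echoed the requested layout type.

    A reply carrying a different layout type is ignored: the body is
    discarded and the caller sees an error instead.
    """
    if response_type != requested_type:
        raise LayoutTypeMismatch(
            "requested layout type %d, got %d" % (requested_type, response_type)
        )
    return body
```

Raising an exception is one way to model "behave as if the server had returned an error response"; a real client would map this onto its own error path for GETDEVICEINFO or LAYOUTGET.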
| 12.2.8. Layout | 12.2.8. Layout | |||
| A layout defines how a file's data is organized on one or more | A layout defines how a file's data is organized on one or more | |||
| storage devices. There are many potential layout types; each of the | storage devices. There are many potential layout types; each of the | |||
| layout types are differentiated by the storage protocol used to | layout types are differentiated by the storage protocol used to | |||
| access data and in the aggregation scheme that lays out the file data | access data and in the aggregation scheme that lays out the file data | |||
| on the underlying storage devices. A layout is precisely identified | on the underlying storage devices. A layout is precisely identified | |||
| skipping to change at page 269, line 33 | skipping to change at page 269, line 27 | |||
| permissible for layouts with different iomodes, pertaining to the | permissible for layouts with different iomodes, pertaining to the | |||
| same byte range, to be held by the same client. An example of this | same byte range, to be held by the same client. An example of this | |||
| would be copy-on-write functionality for a block/volume layout type. | would be copy-on-write functionality for a block/volume layout type. | |||
| 12.2.9. Layout Iomode | 12.2.9. Layout Iomode | |||
| The layout iomode (data type layoutiomode4, see Section 3.3.20) | The layout iomode (data type layoutiomode4, see Section 3.3.20) | |||
| indicates to the metadata server the client's intent to perform | indicates to the metadata server the client's intent to perform | |||
| either just read operations or a mixture of I/O possibly containing | either just read operations or a mixture of I/O possibly containing | |||
| read and write operations. For certain layout types, it is useful | read and write operations. For certain layout types, it is useful | |||
| for a client to specify this intent at LAYOUTGET (Section 18.43) | for a client to specify this intent at the time it sends LAYOUTGET | |||
| time. For example, block/volume based protocols, block allocation | (Section 18.43). For example, block/volume based protocols, block | |||
| could occur when a READ/WRITE iomode is specified. A special | allocation could occur when a READ/WRITE iomode is specified. A | |||
| LAYOUTIOMODE4_ANY iomode is defined and can only be used for | special LAYOUTIOMODE4_ANY iomode is defined and can only be used for | |||
| LAYOUTRETURN and CB_LAYOUTRECALL, not for LAYOUTGET. It specifies | LAYOUTRETURN and CB_LAYOUTRECALL, not for LAYOUTGET. It specifies | |||
| that layouts pertaining to both READ and READ/WRITE iomodes are being | that layouts pertaining to both READ and READ/WRITE iomodes are being | |||
| returned or recalled, respectively. | returned or recalled, respectively. | |||
| A storage device may validate I/O with regards to the iomode; this is | A storage device may validate I/O with regard to the iomode; this is | |||
| dependent upon storage device implementation and layout type. Thus, | dependent upon storage device implementation and layout type. Thus, | |||
| if the client's layout iomode is inconsistent with the I/O being | if the client's layout iomode is inconsistent with the I/O being | |||
| performed, the storage device may reject the client's I/O with an | performed, the storage device may reject the client's I/O with an | |||
| error indicating a new layout with the correct I/O mode should be | error indicating a new layout with the correct iomode should be | |||
| fetched. For example, if a client gets a layout with a READ iomode | obtained via LAYOUTGET. For example, if a client gets a layout with | |||
| and performs a WRITE to a storage device, the storage device is | a READ iomode and performs a WRITE to a storage device, the storage | |||
| allowed to reject that WRITE. | device is allowed to reject that WRITE. | |||
| The iomode does not conflict with OPEN share modes or lock requests; | The use of the layout iomode does not conflict with OPEN share modes | |||
| open mode and lock conflicts are enforced as they are without the use | or byte-range lock requests; open mode and lock conflicts are | |||
| of pNFS, and are logically separate from the pNFS layout level. As | enforced as they are without the use of pNFS, and are logically | |||
| well, open modes and locks are the preferred method for restricting | separate from the pNFS layout level. Open modes and locks are the | |||
| user access to data files. For example, an OPEN of read, deny-write | preferred method for restricting user access to data files. For | |||
| does not conflict with a LAYOUTGET containing an iomode of READ/WRITE | example, an OPEN of read, deny-write does not conflict with a | |||
| performed by another client. Applications that depend on writing | LAYOUTGET containing an iomode of READ/WRITE performed by another | |||
| into the same file concurrently may use byte-range locking to | client. Applications that depend on writing into the same file | |||
| serialize their accesses. | concurrently may use byte-range locking to serialize their accesses. | |||
| 12.2.10. Device IDs | 12.2.10. Device IDs | |||
| The device ID (data type deviceid4, see Section 3.3.14) names a group | The device ID (data type deviceid4, see Section 3.3.14) identifies a | |||
| of storage devices. The scope of a device ID is per pair of client | group of storage devices. The scope of a device ID is the pair | |||
| ID and layout type. In practice, a significant amount of information | <client ID, layout type>. In practice, a significant amount of | |||
| may be required to fully address a storage device. Rather than | information may be required to fully address a storage device. | |||
| embedding all such information in a layout, layouts embed device IDs. | Rather than embedding all such information in a layout, layouts embed | |||
| The NFSv4.1 operation GETDEVICEINFO (Section 18.40) is used to | device IDs. The NFSv4.1 operation GETDEVICEINFO (Section 18.40) is | |||
| retrieve the complete address information (including all device | used to retrieve the complete address information (including all | |||
| addresses for the device ID) regarding the storage device according | device addresses for the device ID) regarding the storage device | |||
| to its layout type and device ID. For example, the address of an | according to its layout type and device ID. For example, the address | |||
| NFSv4.1 data server or of an object storage device could be an IP | of an NFSv4.1 data server or of an object storage device could be an | |||
| address and port. The address of a block storage device could be a | IP address and port. The address of a block storage device could be | |||
| volume label. | a volume label. | |||
| Clients cannot expect the mapping between a device ID and its storage | Clients cannot expect the mapping between a device ID and its storage | |||
| device address(es) to persist across metadata server restart. See | device address(es) to persist across metadata server restart. See | |||
| Section 12.7.4 for a description of how recovery works in that | Section 12.7.4 for a description of how recovery works in that | |||
| situation. | situation. | |||
| A device ID lives as long as there is a layout referring to the | A device ID lives as long as there is a layout referring to the | |||
| device ID. If there are no layouts referring to the device ID, the | device ID. If there are no layouts referring to the device ID, the | |||
| server is free to delete the device ID any time. Once a device ID is | server is free to delete the device ID any time. Once a device ID is | |||
| deleted by the server, the server MUST NOT reuse the device ID for | deleted by the server, the server MUST NOT reuse the device ID for | |||
| skipping to change at page 273, line 39 | skipping to change at page 273, line 31 | |||
| is incapable of providing this check in the presence of mandatory | is incapable of providing this check in the presence of mandatory | |||
| file locks, the metadata server then MUST NOT grant layouts and | file locks, the metadata server then MUST NOT grant layouts and | |||
| mandatory file locks simultaneously. | mandatory file locks simultaneously. | |||
| 12.5.2. Getting a Layout | 12.5.2. Getting a Layout | |||
| A client obtains a layout with the LAYOUTGET operation. The metadata | A client obtains a layout with the LAYOUTGET operation. The metadata | |||
| server will grant layouts of a particular type (e.g., block/volume, | server will grant layouts of a particular type (e.g., block/volume, | |||
| object, or file). The client selects an appropriate layout type that | object, or file). The client selects an appropriate layout type that | |||
| the server supports and the client is prepared to use. The layout | the server supports and the client is prepared to use. The layout | |||
| returned to the client may not exactly align with the requested byte | returned to the client might not exactly match the requested byte | |||
| range. A field within the LAYOUTGET request, loga_minlength, | range as described in Section 18.43.3. As needed a client may make | |||
| specifies the minimum length of the layout. The loga_minlength field | multiple LAYOUTGET requests; these might result in multiple | |||
| should be at least one. As needed a client may make multiple | overlapping, non-conflicting layouts (see Section 12.2.8). | |||
| LAYOUTGET requests; these will result in multiple overlapping, non- | ||||
| conflicting layouts. | ||||
| In order to get a layout, the client must first have opened the file | In order to get a layout, the client must first have opened the file | |||
| via the OPEN operation. When a client has no layout on a file, it | via the OPEN operation. When a client has no layout on a file, it | |||
| MUST present a stateid as returned by OPEN, a delegation stateid, or | MUST present a stateid as returned by OPEN, a delegation stateid, or | |||
| a byte-range lock stateid in the loga_stateid argument. A successful | a byte-range lock stateid in the loga_stateid argument. A successful | |||
| LAYOUTGET result includes a layout stateid. The first successful | LAYOUTGET result includes a layout stateid. The first successful | |||
| LAYOUTGET processed by the server using a non-layout stateid as an | LAYOUTGET processed by the server using a non-layout stateid as an | |||
| argument MUST have the "seqid" field of the layout stateid in the | argument MUST have the "seqid" field of the layout stateid in the | |||
| response set to one. Thereafter, the client uses a layout stateid | response set to one. Thereafter, the client uses a layout stateid | |||
| (see Section 12.5.3) on future invocations of LAYOUTGET on the file, | (see Section 12.5.3) on future invocations of LAYOUTGET on the file, | |||
| skipping to change at page 275, line 24 | skipping to change at page 275, line 14 | |||
| correct "seqid" is defined as the highest "seqid" value from | correct "seqid" is defined as the highest "seqid" value from | |||
| responses of fully processed LAYOUTGET or LAYOUTRETURN operations or | responses of fully processed LAYOUTGET or LAYOUTRETURN operations or | |||
| arguments of a fully processed CB_LAYOUTRECALL operation. Since the | arguments of a fully processed CB_LAYOUTRECALL operation. Since the | |||
| server is incrementing the "seqid" value on each layout operation, | server is incrementing the "seqid" value on each layout operation, | |||
| the client may determine the order of operation processing by | the client may determine the order of operation processing by | |||
| inspecting the "seqid" value. In the case of overlapping layout | inspecting the "seqid" value. In the case of overlapping layout | |||
| ranges, the ordering information will provide the client the | ranges, the ordering information will provide the client the | |||
| knowledge of which layout ranges are held. Note that overlapping | knowledge of which layout ranges are held. Note that overlapping | |||
| layout ranges may occur because of the client's specific requests or | layout ranges may occur because of the client's specific requests or | |||
| because the server is allowed to expand the range of a requested | because the server is allowed to expand the range of a requested | |||
| layout and notify the client in the LAYOUTRETURN results Additional | layout and notify the client in the LAYOUTRETURN results. Additional | |||
| layout stateid sequencing requirements are provided in | layout stateid sequencing requirements are provided in | |||
| Section 12.5.5.2. | Section 12.5.5.2. | |||
| The client's receipt of a "seqid" is not sufficient for subsequent | The client's receipt of a "seqid" is not sufficient for subsequent | |||
| use. The client must fully process the operations before the "seqid" | use. The client must fully process the operations before the "seqid" | |||
| can be used. For LAYOUTGET results, if the client is not using the | can be used. For LAYOUTGET results, if the client is not using the | |||
| forgetful model (Section 12.5.5.1), it MUST first update its record | forgetful model (Section 12.5.5.1), it MUST first update its record | |||
| of what ranges of the file's layout it has before using the seqid. | of what ranges of the file's layout it has before using the seqid. | |||
| For LAYOUTRETURN results, the client MUST delete the range from its | For LAYOUTRETURN results, the client MUST delete the range from its | |||
| record of what ranges of the file's layout it had before using the | record of what ranges of the file's layout it had before using the | |||
| skipping to change at page 295, line 4 | skipping to change at page 294, line 36 | |||
| NFSv4.1) what role the request to the common server network | NFSv4.1) what role the request to the common server network | |||
| address is directed to. | address is directed to. | |||
| 12.9. Security Considerations for pNFS | 12.9. Security Considerations for pNFS | |||
| pNFS separates file system metadata and data and provides access to | pNFS separates file system metadata and data and provides access to | |||
| both. There are pNFS-specific operations (listed in Section 12.3) | both. There are pNFS-specific operations (listed in Section 12.3) | |||
| that provide access to the metadata; all existing NFSv4.1 | that provide access to the metadata; all existing NFSv4.1 | |||
| conventional (non-pNFS) security mechanisms and features apply to | conventional (non-pNFS) security mechanisms and features apply to | |||
| accessing the metadata. The combination of components in a pNFS | accessing the metadata. The combination of components in a pNFS | |||
| system (see Figure 67) is required to preserve the security | system (see Figure 68) is required to preserve the security | |||
| properties of NFSv4.1 with respect to an entity accessing a storage | properties of NFSv4.1 with respect to an entity accessing a storage | |||
| device from a client, including security countermeasures to defend | device from a client, including security countermeasures to defend | |||
| against threats that NFSv4.1 provides defenses for in environments | against threats that NFSv4.1 provides defenses for in environments | |||
| where these threats are considered significant. | where these threats are considered significant. | |||
| In some cases, the security countermeasures for connections to | In some cases, the security countermeasures for connections to | |||
| storage devices may take the form of physical isolation or a | storage devices may take the form of physical isolation or a | |||
| recommendation not to use pNFS in an environment. For example, it | recommendation not to use pNFS in an environment. For example, it | |||
| may be impractical to provide confidentiality protection for some | may be impractical to provide confidentiality protection for some | |||
| storage protocols to protect against eavesdropping; in environments | storage protocols to protect against eavesdropping; in environments | |||
| skipping to change at page 316, line 21 | skipping to change at page 315, line 41 | |||
| o Otherwise, there must be an open stateid for the current open- | o Otherwise, there must be an open stateid for the current open- | |||
| owner, and that open stateid for the open file in question is | owner, and that open stateid for the open file in question is | |||
| used, unless mandatory locking prevents that. See below. | used, unless mandatory locking prevents that. See below. | |||
| o If the data server had previously responded with NFS4ERR_LOCKED to | o If the data server had previously responded with NFS4ERR_LOCKED to | |||
| use of the open stateid, then the client should use the lock | use of the open stateid, then the client should use the lock | |||
| stateid whenever one exists for that open file with the current | stateid whenever one exists for that open file with the current | |||
| lock-owner. | lock-owner. | |||
| o Special stateids should never be used and if used the data server | o Special stateids should never be used and if used the data server | |||
| MUST reject the I/O with a NFS4ERR_BAD_STATEID error. | MUST reject the I/O with an NFS4ERR_BAD_STATEID error. | |||
| 13.9.2. Data Server State Propagation | 13.9.2. Data Server State Propagation | |||
| Since the metadata server, which handles lock and open-mode state | Since the metadata server, which handles lock and open-mode state | |||
| changes, as well as ACLs, may not be co-located with the data servers | changes, as well as ACLs, may not be co-located with the data servers | |||
| where I/O accesses are validated, the server implementation MUST take | where I/O accesses are validated, the server implementation MUST take | |||
| care of propagating changes of this state to the data servers. Once | care of propagating changes of this state to the data servers. Once | |||
| the propagation to the data servers is complete, the full effect of | the propagation to the data servers is complete, the full effect of | |||
| those changes MUST be in effect at the data servers. However, some | those changes MUST be in effect at the data servers. However, some | |||
| state changes need not be propagated immediately, although all | state changes need not be propagated immediately, although all | |||
| skipping to change at page 378, line 42 | skipping to change at page 377, line 42 | |||
| 16.1.1. ARGUMENTS | 16.1.1. ARGUMENTS | |||
| void; | void; | |||
| 16.1.2. RESULTS | 16.1.2. RESULTS | |||
| void; | void; | |||
| 16.1.3. DESCRIPTION | 16.1.3. DESCRIPTION | |||
| Standard NULL procedure. Void argument, void response. This | This is the standard NULL procedure with the standard void argument | |||
| procedure has no functionality associated with it. Because of this | and void response. This procedure has no functionality associated | |||
| it is sometimes used to measure the overhead of processing a service | with it. Because of this it is sometimes used to measure the | |||
| request. Therefore, the server should ensure that no unnecessary | overhead of processing a service request. Therefore, the server | |||
| work is done in servicing this procedure. | SHOULD ensure that no unnecessary work is done in servicing this | |||
| procedure. | ||||
| 16.1.4. ERRORS | 16.1.4. ERRORS | |||
| None. | None. | |||
| 16.2. Procedure 1: COMPOUND - Compound Operations | 16.2. Procedure 1: COMPOUND - Compound Operations | |||
| 16.2.1. ARGUMENTS | 16.2.1. ARGUMENTS | |||
| enum nfs_opnum4 { | enum nfs_opnum4 { | |||
| skipping to change at page 387, line 24 | skipping to change at page 386, line 24 | |||
| PUTFH fh1 {fh1} | PUTFH fh1 {fh1} | |||
| LOOKUP "compA" {fh2} | LOOKUP "compA" {fh2} | |||
| GETATTR {fh2} | GETATTR {fh2} | |||
| LOOKUP "compB" {fh3} | LOOKUP "compB" {fh3} | |||
| GETATTR {fh3} | GETATTR {fh3} | |||
| LOOKUP "compC" {fh4} | LOOKUP "compC" {fh4} | |||
| GETATTR {fh4} | GETATTR {fh4} | |||
| GETFH | GETFH | |||
| Figure 84 | Figure 85 | |||
| In this example, the PUTFH (Section 18.19) operation explicitly sets | In this example, the PUTFH (Section 18.19) operation explicitly sets | |||
| the current filehandle value while the result of each LOOKUP | the current filehandle value while the result of each LOOKUP | |||
| operation sets the current filehandle value to the resultant file | operation sets the current filehandle value to the resultant file | |||
| system object. Also, the client is able to insert GETATTR operations | system object. Also, the client is able to insert GETATTR operations | |||
| using the current filehandle as an argument. | using the current filehandle as an argument. | |||
| The PUTROOTFH (Section 18.21) and PUTPUBFH (Section 18.20) operations | The PUTROOTFH (Section 18.21) and PUTPUBFH (Section 18.20) operations | |||
| also set the current filehandle. The above example would replace | also set the current filehandle. The above example would replace | |||
| "PUTFH fh1" with PUTROOTFH or PUTPUBFH with no filehandle argument in | "PUTFH fh1" with PUTROOTFH or PUTPUBFH with no filehandle argument in | |||
| skipping to change at page 388, line 22 | skipping to change at page 387, line 22 | |||
| A "current stateid" is the stateid that is associated with the | A "current stateid" is the stateid that is associated with the | |||
| current filehandle. The current stateid may only be changed by an | current filehandle. The current stateid may only be changed by an | |||
| operation that modifies the current filehandle or returns a stateid. | operation that modifies the current filehandle or returns a stateid. | |||
| If an operation returns a stateid it MUST set the current stateid to | If an operation returns a stateid it MUST set the current stateid to | |||
| the returned value. If an operation sets the current filehandle but | the returned value. If an operation sets the current filehandle but | |||
| does not return a stateid, the current stateid MUST be set to the | does not return a stateid, the current stateid MUST be set to the | |||
| all-zeros special stateid, i.e. (seqid, other) = (0, 0). If an | all-zeros special stateid, i.e. (seqid, other) = (0, 0). If an | |||
| operation uses a stateid as an argument but does not return a | operation uses a stateid as an argument but does not return a | |||
| stateid, the current stateid MUST NOT be changed. E.g., PUTFH, | stateid, the current stateid MUST NOT be changed. E.g., PUTFH, | |||
| PUTROOFH, and PUTPUBFH will change the current server state from | PUTROOTFH, and PUTPUBFH will change the current server state from | |||
| {ocfh, (osid)} to {cfh, (0, 0)} while LOCK will change the current | {ocfh, (osid)} to {cfh, (0, 0)} while LOCK will change the current | |||
| state from {cfh, (osid)} to {cfh, (nsid)}. Operations like LOOKUP | state from {cfh, (osid)} to {cfh, (nsid)}. Operations like LOOKUP | |||
| that transform a current filehandle and component name into a new | that transform a current filehandle and component name into a new | |||
| current filehandle will also change the current stateid to (0, 0). | current filehandle will also change the current stateid to (0, 0). | |||
| The SAVEFH and RESTOREFH operations will save and restore both the | The SAVEFH and RESTOREFH operations will save and restore both the | |||
| current filehandle and the current stateid as a set. | current filehandle and the current stateid as a set. | |||
| The following example is the common case of a simple READ operation | The following example is the common case of a simple READ operation | |||
| with a supplied stateid showing that the PUTFH initializes the | with a supplied stateid showing that the PUTFH initializes the | |||
| current stateid to (0, 0). The subsequent READ with stateid (sid1) | current stateid to (0, 0). The subsequent READ with stateid (sid1) | |||
| leaves the current stateid unchanged, but does evaluate the | leaves the current stateid unchanged, but does evaluate the | |||
| operation. | operation. | |||
| PUTFH fh1 - -> {fh1, (0, 0)} | PUTFH fh1 - -> {fh1, (0, 0)} | |||
| READ (sid1), 0, 1024 {fh1, (0, 0)} -> {fh1, (0, 0)} | READ (sid1), 0, 1024 {fh1, (0, 0)} -> {fh1, (0, 0)} | |||
| Figure 85 | Figure 86 | |||
| This next example performs an OPEN with the root filehandle and as a | This next example performs an OPEN with the root filehandle and as a | |||
| result generates stateid (sid1). The next operation specifies the | result generates stateid (sid1). The next operation specifies the | |||
| READ with the argument stateid set such that (seqid, other) are equal | READ with the argument stateid set such that (seqid, other) are equal | |||
| to (1, 0), but the current stateid set by the previous operation is | to (1, 0), but the current stateid set by the previous operation is | |||
| actually used when the operation is evaluated. This allows correct | actually used when the operation is evaluated. This allows correct | |||
| interaction with any existing, potentially conflicting, locks. | interaction with any existing, potentially conflicting, locks. | |||
| PUTROOTFH - -> {fh1, (0, 0)} | PUTROOTFH - -> {fh1, (0, 0)} | |||
| OPEN "compA" {fh1, (0, 0)} -> {fh2, (sid1)} | OPEN "compA" {fh1, (0, 0)} -> {fh2, (sid1)} | |||
| READ (1, 0), 0, 1024 {fh2, (sid1)} -> {fh2, (sid1)} | READ (1, 0), 0, 1024 {fh2, (sid1)} -> {fh2, (sid1)} | |||
| CLOSE (1, 0) {fh2, (sid1)} -> {fh2, (sid2)} | CLOSE (1, 0) {fh2, (sid1)} -> {fh2, (sid2)} | |||
| Figure 86 | Figure 87 | |||
| The final example is similar to the second in how it passes the | The final example is similar to the second in how it passes the | |||
| stateid sid2 generated by the LOCK operation to the next READ | stateid sid2 generated by the LOCK operation to the next READ | |||
| operation. This allows the client to explicitly surround a single | operation. This allows the client to explicitly surround a single | |||
| I/O operation with a lock and its appropriate stateid to guarantee | I/O operation with a lock and its appropriate stateid to guarantee | |||
| correctness with other client locks. The example also shows how | correctness with other client locks. The example also shows how | |||
| SAVEFH and RESTOREFH can save and later re-use a filehandle and | SAVEFH and RESTOREFH can save and later re-use a filehandle and | |||
| stateid, passing them as the current filehandle and stateid to a READ | stateid, passing them as the current filehandle and stateid to a READ | |||
| operation. | operation. | |||
| skipping to change at page 389, line 27 | skipping to change at page 388, line 27 | |||
| READ (1, 0), 0, 1024 {fh1, (sid2)} -> {fh1, (sid2)} | READ (1, 0), 0, 1024 {fh1, (sid2)} -> {fh1, (sid2)} | |||
| LOCKU 0, 1024, (1, 0) {fh1, (sid2)} -> {fh1, (sid3)} | LOCKU 0, 1024, (1, 0) {fh1, (sid2)} -> {fh1, (sid3)} | |||
| SAVEFH {fh1, (sid3)} -> {fh1, (sid3)} | SAVEFH {fh1, (sid3)} -> {fh1, (sid3)} | |||
| PUTFH fh2 {fh1, (sid3)} -> {fh2, (0, 0)} | PUTFH fh2 {fh1, (sid3)} -> {fh2, (0, 0)} | |||
| WRITE (1, 0), 0, 1024 {fh2, (0, 0)} -> {fh2, (0, 0)} | WRITE (1, 0), 0, 1024 {fh2, (0, 0)} -> {fh2, (0, 0)} | |||
| RESTOREFH {fh2, (0, 0)} -> {fh1, (sid3)} | RESTOREFH {fh2, (0, 0)} -> {fh1, (sid3)} | |||
| READ (1, 0), 1024, 1024 {fh1, (sid3)} -> {fh1, (sid3)} | READ (1, 0), 1024, 1024 {fh1, (sid3)} -> {fh1, (sid3)} | |||
| Figure 87 | Figure 88 | |||
| 16.2.4. ERRORS | 16.2.4. ERRORS | |||
| COMPOUND will of course return every error that each operation on the | COMPOUND will of course return every error that each operation on the | |||
| fore channel can return (see Table 12). However, if COMPOUND returns | fore channel can return (see Table 12). However, if COMPOUND returns | |||
| zero operations, obviously the error returned by COMPOUND has nothing | zero operations, obviously the error returned by COMPOUND has nothing | |||
| to do with an error returned by an operation. The list of errors | to do with an error returned by an operation. The list of errors | |||
| COMPOUND will return if it processes zero operations includes: | COMPOUND will return if it processes zero operations includes: | |||
| COMPOUND error returns | COMPOUND error returns | |||
| skipping to change at page 396, line 11 | skipping to change at page 395, line 11 | |||
| NFS is not going to be acceptable to some people. Historically, | NFS is not going to be acceptable to some people. Historically, | |||
| NFS servers have allowed a user to READ a file if the user has | NFS servers have allowed a user to READ a file if the user has | |||
| execute access to the file. | execute access to the file. | |||
| As a practical example, the UNIX specification [41] states that an | As a practical example, the UNIX specification [41] states that an | |||
| implementation claiming conformance to UNIX may indicate in the | implementation claiming conformance to UNIX may indicate in the | |||
| access() programming interface's result that a privileged user has | access() programming interface's result that a privileged user has | |||
| execute rights, even if no execute permission bits are set on the | execute rights, even if no execute permission bits are set on the | |||
| regular file's attributes. It is possible to claim conformance to | regular file's attributes. It is possible to claim conformance to | |||
| the UNIX specification and instead not indicate execute rights in | the UNIX specification and instead not indicate execute rights in | |||
| that situation, which is true for some operating enviroments. | that situation, which is true for some operating environments. | |||
| Suppose the operating environments of the client and server are | Suppose the operating environments of the client and server are | |||
| implementing the access() semantics for privileged users differently, | implementing the access() semantics for privileged users differently, | |||
| and the ACCESS operation implementations of the client and server | and the ACCESS operation implementations of the client and server | |||
| follow their respective access() semantics. This can cause undesired | follow their respective access() semantics. This can cause undesired | |||
| behavior: | behavior: | |||
| o Suppose the client's access() interface returns X_OK if the user | o Suppose the client's access() interface returns X_OK if the user | |||
| is privileged and no execute permission bits are set on the | is privileged and no execute permission bits are set on the | |||
| regular file's attribute, and the server's access() interface does | regular file's attribute, and the server's access() interface does | |||
| not return X_OK in that situation. Then the client will be unable | not return X_OK in that situation. Then the client will be unable | |||
| skipping to change at page 406, line 32 | skipping to change at page 405, line 32 | |||
| nfsstat4 status; | nfsstat4 status; | |||
| }; | }; | |||
| 18.5.3. DESCRIPTION | 18.5.3. DESCRIPTION | |||
| Purges all of the delegations awaiting recovery for a given client. | Purges all of the delegations awaiting recovery for a given client. | |||
| This is useful for clients which do not commit delegation information | This is useful for clients which do not commit delegation information | |||
| to stable storage to indicate that conflicting requests need not be | to stable storage to indicate that conflicting requests need not be | |||
| delayed by the server awaiting recovery of delegation information. | delayed by the server awaiting recovery of delegation information. | |||
| The client is NOT specified by the clientid field of the request. | ||||
| The client SHOULD set the client field to zero and the server MUST | ||||
| ignore the clientid field. Instead the server MUST derive the client | ||||
| ID from the value of the session id in the arguments of the SEQUENCE | ||||
| operation that precedes DELEGPURGE in the COMPOUND request. | ||||
| This operation should be used by clients that record delegation | This operation should be used by clients that record delegation | |||
| information on stable storage on the client. In this case, | information on stable storage on the client. In this case, | |||
| DELEGPURGE should be sent immediately after doing delegation recovery | DELEGPURGE should be sent immediately after doing delegation recovery | |||
| on all delegations known to the client. Doing so will notify the | on all delegations known to the client. Doing so will notify the | |||
| server that no additional delegations for the client will be | server that no additional delegations for the client will be | |||
| recovered allowing it to free resources, and avoid delaying other | recovered allowing it to free resources, and avoid delaying other | |||
| clients which make requests that conflict with the unrecovered | clients which make requests that conflict with the unrecovered | |||
| delegations. The set of delegations known to the server and the | delegations. The set of delegations known to the server and the | |||
| client may be different. The reason for this is that a client may | client may be different. The reason for this is that a client may | |||
| fail after making a request which resulted in delegation but before | fail after making a request which resulted in delegation but before | |||
| skipping to change at page 434, line 33 | skipping to change at page 433, line 33 | |||
| | CLAIM_DELEG_CUR_FH | OPEN as granted by the server. Generally | | | CLAIM_DELEG_CUR_FH | OPEN as granted by the server. Generally | | |||
| | | this is done as part of recalling a | | | | this is done as part of recalling a | | |||
| | | delegation. With CLAIM_DELEGATE_CUR, the | | | | delegation. With CLAIM_DELEGATE_CUR, the | | |||
| | | file is identified by the current | | | | file is identified by the current | | |||
| | | filehandle and the specified component | | | | filehandle and the specified component | | |||
| | | name. With CLAIM_DELEG_CUR_FH (new to | | | | name. With CLAIM_DELEG_CUR_FH (new to | | |||
| | | NFSv4.1), the file is identified by just | | | | NFSv4.1), the file is identified by just | | |||
| | | the current filehandle. | | | | the current filehandle. | | |||
| | CLAIM_DELEGATE_PREV, | The client is claiming a delegation | | | CLAIM_DELEGATE_PREV, | The client is claiming a delegation | | |||
| | CLAIM_DELEG_PREV_FH | granted to a previous client instance; | | | CLAIM_DELEG_PREV_FH | granted to a previous client instance; | | |||
| | | used after the client restarts. The | | | | used after the client restarts. The server | | |||
| | | server MAY support CLAIM_DELEGATE_PREV or | | | | MAY support CLAIM_DELEGATE_PREV or | | |||
| | | CLAIM_DELEG_PREV_FH (new to NFSv4.1). If | | | | CLAIM_DELEG_PREV_FH (new to NFSv4.1). If | | |||
| | | it does support either open type, | | | | it does support either open type, | | |||
| | | CREATE_SESSION MUST NOT remove the | | | | CREATE_SESSION MUST NOT remove the | | |||
| | | client's delegation state, and the server | | | | client's delegation state, and the server | | |||
| | | MUST support the DELEGPURGE operation. | | | | MUST support the DELEGPURGE operation. | | |||
| +----------------------+--------------------------------------------+ | +----------------------+--------------------------------------------+ | |||
| For OPEN requests that reach the server during the grace period, the | For OPEN requests that reach the server during the grace period, the | |||
| server returns an error of NFS4ERR_GRACE. The following claim types | server returns an error of NFS4ERR_GRACE. The following claim types | |||
| are exceptions: | are exceptions: | |||
| skipping to change at page 466, line 44 | skipping to change at page 465, line 44 | |||
| The SECINFO operation is expected to be used by the NFS client when | The SECINFO operation is expected to be used by the NFS client when | |||
| the error value of NFS4ERR_WRONGSEC is returned from another NFS | the error value of NFS4ERR_WRONGSEC is returned from another NFS | |||
| operation. This signifies to the client that the server's security | operation. This signifies to the client that the server's security | |||
| policy is different from what the client is currently using. At this | policy is different from what the client is currently using. At this | |||
| point, the client is expected to obtain a list of possible security | point, the client is expected to obtain a list of possible security | |||
| flavors and choose what best suits its policies. | flavors and choose what best suits its policies. | |||
| As mentioned, the server's security policies will determine when a | As mentioned, the server's security policies will determine when a | |||
| client request receives NFS4ERR_WRONGSEC. See Table 14 for a list | client request receives NFS4ERR_WRONGSEC. See Table 14 for a list | |||
| of operations which can return NFS4ERR_WRONGSEC. In addition, when | of operations which can return NFS4ERR_WRONGSEC. In addition, when | |||
| READDIR returns attributes, the rdaddr_error (Section 5.8.1.12) can | READDIR returns attributes, the rdattr_error (Section 5.8.1.12) can | |||
| contain NFS4ERR_WRONGSEC. Note that CREATE and REMOVE MUST NOT | contain NFS4ERR_WRONGSEC. Note that CREATE and REMOVE MUST NOT | |||
| return NFS4ERR_WRONGSEC. The rationale for CREATE is that unless the | return NFS4ERR_WRONGSEC. The rationale for CREATE is that unless the | |||
| target name exists it cannot have a separate security policy from the | target name exists it cannot have a separate security policy from the | |||
| parent directory, and the security policy of the parent was checked | parent directory, and the security policy of the parent was checked | |||
| when its filehandle was injected into the COMPOUND request's | when its filehandle was injected into the COMPOUND request's | |||
| operations stream (for similar reasons, an OPEN operation that | operations stream (for similar reasons, an OPEN operation that | |||
| creates the target MUST NOT return NFS4ERR_WRONGSEC). If the target | creates the target MUST NOT return NFS4ERR_WRONGSEC). If the target | |||
| name exists, while it might have a separate security policy, that is | name exists, while it might have a separate security policy, that is | |||
| irrelevant because CREATE MUST return NFS4ERR_EXIST. The rationale | irrelevant because CREATE MUST return NFS4ERR_EXIST. The rationale | |||
| for REMOVE is that while that target might have separate security | for REMOVE is that while that target might have separate security | |||
| skipping to change at page 504, line 45 | skipping to change at page 503, line 45 | |||
| records introduced in the description of EXCHANGE_ID is used with the | records introduced in the description of EXCHANGE_ID is used with the | |||
| following addition: | following addition: | |||
| clientid_arg: The value of the csa_clientid field of the | clientid_arg: The value of the csa_clientid field of the | |||
| CREATE_SESSION4args structure of the current request. | CREATE_SESSION4args structure of the current request. | |||
| Since CREATE_SESSION is a non-idempotent operation, we must consider | Since CREATE_SESSION is a non-idempotent operation, we must consider | |||
| the possibility that retries may occur as a result of a client | the possibility that retries may occur as a result of a client | |||
| restart, network partition, malfunctioning router, etc. For each | restart, network partition, malfunctioning router, etc. For each | |||
| client ID created by EXCHANGE_ID, the server maintains a separate | client ID created by EXCHANGE_ID, the server maintains a separate | |||
| reply cache similar to the session reply cache used for SEQUENCE | reply cache (called the CREATE_SESSION reply cache) similar to the | |||
| operations, with two distinctions. | session reply cache used for SEQUENCE operations, with two | |||
| distinctions. | ||||
| o First, this is a reply cache just for detecting and processing | o First, this is a reply cache just for detecting and processing | |||
| CREATE_SESSION requests for a given client ID. | CREATE_SESSION requests for a given client ID. | |||
| o Second, the size of the client ID reply cache is of one slot (and | o Second, the size of the client ID reply cache is of one slot (and | |||
| as a result, the CREATE_SESSION request does not carry a slot | as a result, the CREATE_SESSION request does not carry a slot | |||
| number). This means that at most one CREATE_SESSION request for a | number). This means that at most one CREATE_SESSION request for a | |||
| given client ID can be outstanding. | given client ID can be outstanding. | |||
| As previously stated, CREATE_SESSION can be sent with or without a | ||||
| preceding SEQUENCE operation. Even if SEQUENCE precedes | ||||
| CREATE_SESSION, the server MUST maintain the CREATE_SESSION reply | ||||
| cache, which is separate from the reply cache for the session | ||||
| associated with SEQUENCE. If CREATE_SESSION was originally sent by | ||||
| itself, the client MAY send a retry of the CREATE_SESSION operation | ||||
| within a COMPOUND preceded by SEQUENCE. If CREATE_SESSION was | ||||
| originally sent in a COMPOUND that started with SEQUENCE, then the | ||||
| client SHOULD send a retry in a COMPOUND that starts with SEQUENCE | ||||
| that has the same session id as the SEQUENCE of the original request. | ||||
| However, the client MAY send a retry in a COMPOUND that either has no | ||||
| preceding SEQUENCE, or has a preceding SEQUENCE that refers to a | ||||
| different session than the original CREATE_SESSION. This might be | ||||
| necessary if the client sends a CREATE_SESSION in a COMPOUND preceded | ||||
| by a SEQUENCE with session id X, and session X no longer exists. | ||||
| Regardless, any retry of CREATE_SESSION, with or without a preceding | ||||
| SEQUENCE, MUST use the same value of csa_sequence as the original. | ||||
| When a client sends a successful EXCHANGE_ID and it is returned an | When a client sends a successful EXCHANGE_ID and it is returned an | |||
| unconfirmed client ID, the client is also returned eir_sequenceid, | unconfirmed client ID, the client is also returned eir_sequenceid, | |||
| and the client is expected to set the value of csa_sequenceid in the | and the client is expected to set the value of csa_sequenceid in the | |||
| client ID-confirming-CREATE_SESSION it sends with that client ID to | client ID-confirming-CREATE_SESSION it sends with that client ID to | |||
| the value of eir_sequenceid. When EXCHANGE_ID returns a new, | the value of eir_sequenceid. When EXCHANGE_ID returns a new, | |||
| unconfirmed client ID, the server initializes the client ID slot to | unconfirmed client ID, the server initializes the client ID slot to | |||
| be equal to eir_sequenceid - 1 (accounting for underflow), and | be equal to eir_sequenceid - 1 (accounting for underflow), and | |||
| records a contrived CREATE_SESSION result with a "cached" result of | records a contrived CREATE_SESSION result with a "cached" result of | |||
| NFS4ERR_SEQ_MISORDERED. With the slot thus initialized, the | NFS4ERR_SEQ_MISORDERED. With the slot thus initialized, the | |||
| processing of the CREATE_SESSION operation is divided into four | processing of the CREATE_SESSION operation is divided into four | |||
| skipping to change at page 522, line 51 | skipping to change at page 521, line 51 | |||
| the sessionid in the preceding SEQUENCE operation), current | the sessionid in the preceding SEQUENCE operation), current | |||
| filehandle, layout type (loga_layout_type), and the layout stateid | filehandle, layout type (loga_layout_type), and the layout stateid | |||
| (loga_stateid). The use of the loga_iomode field depends upon the | (loga_stateid). The use of the loga_iomode field depends upon the | |||
| layout type, but should reflect the client's data access intent. | layout type, but should reflect the client's data access intent. | |||
| If the metadata server is in a grace period, and does not persist | If the metadata server is in a grace period, and does not persist | |||
| layouts and device ID to device address mappings, then it MUST return | layouts and device ID to device address mappings, then it MUST return | |||
| NFS4ERR_GRACE (see Section 8.4.2.1). | NFS4ERR_GRACE (see Section 8.4.2.1). | |||
| The LAYOUTGET operation returns layout information for the specified | The LAYOUTGET operation returns layout information for the specified | |||
| byte range: a layout. To get a layout from a specific offset through | byte range: a layout. The client actually specifies two ranges, both | |||
| the end-of-file, regardless of the file's length, a loga_length field | starting at the offset in the loga_offset field. The first range is | |||
| set to NFS4_UINT64_MAX is used. If loga_length is zero, or if a | between loga_offset and loga_offset + loga_length - 1 inclusive. | |||
| loga_length which is not NFS4_UINT64_MAX is specified, and the sum of | This range indicates the desired range the client wants the layout to | |||
| loga_length and loga_offset exceeds NFS4_UINT64_MAX, the error | cover. The second range is between loga_offset and loga_offset + | |||
| NFS4ERR_INVAL will result. | loga_minlength - 1 inclusive. This range indicates the required | |||
| range the client needs the layout to cover. Thus, loga_minlength | ||||
| MUST be less than or equal to loga_length. | ||||
| The loga_minlength field specifies the minimum length of layout the | When a length field is set to NFS4_UINT64_MAX, this indicates a | |||
| server MUST return with two exceptions: | desire (when loga_length is NFS4_UINT64_MAX) or requirement (when | |||
| loga_minlength is NFS4_UINT64_MAX) to get a layout from loga_offset | ||||
| through the end-of-file, regardless of the file's length. | ||||
| 1. The argument loga_iomode was set to LAYOUTIOMODE_READ, and | The following rules govern the relationships among, and the minima of | |||
| loga_offset plus loga_minlength goes past the end of the file. | loga_length, loga_minlength, and loga_offset. | |||
| 2. The range from loga_offset through loga_offset + loga_minlength - | o If loga_length is less than loga_minlength, the metadata server | |||
| 1 overlaps two or more striping patterns. In which case, | MUST return NFS4ERR_INVAL. | |||
| logr_layout will contain two or more elements, and the sum of the | ||||
| lo_length fields of each element MUST be at least loga_minlength | ||||
| unless the first exception also applies. | ||||
| If this requirement cannot be met, the server MUST NOT return a | o If loga_minlength is zero, this is an indication to the metadata | |||
| layout and the error NFS4ERR_BADLAYOUT MUST be returned. | server that the client desires any layout at offset loga_offset or | |||
| less that the metadata server has "readily available". Readily is | ||||
| subjective, and depends on the layout type and the pNFS server | ||||
| implementation. For example, some metadata servers might have to | ||||
| pre-allocate stable storage when they receive a request for a | ||||
| range of a file that goes beyond the file's current length. If | ||||
| loga_minlength is zero and loga_length is greater than zero, this | ||||
| tells the metadata server what range of the layout the client | ||||
| would prefer to have. If loga_length and loga_minlength are both | ||||
| zero, then the client is indicating it desires a layout of any | ||||
| length with the ending offset of the range no less than specified | ||||
| loga_offset, and the starting offset at or below loga_offset. If | ||||
| the metadata server does not have a layout that is readily | ||||
| available, then it MUST return NFS4ERR_LAYOUTTRYLATER. | ||||
| o If the sum of loga_offset and loga_minlength exceeds | ||||
| NFS4_UINT64_MAX, and loga_minlength is not NFS4_UINT64_MAX, the | ||||
| error NFS4ERR_INVAL MUST result. | ||||
| o If the sum of loga_offset and loga_length exceeds NFS4_UINT64_MAX, | ||||
| and loga_length is not NFS4_UINT64_MAX, the error NFS4ERR_INVAL | ||||
| MUST result. | ||||
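The argument checks listed above can be sketched as follows. This is a minimal illustration, not part of the draft: the function name `check_layoutget_range` and the use of strings for status codes are assumptions made for the example, and the "readily available" rule for a zero loga_minlength is not modeled because it depends on the server implementation.

```python
NFS4_UINT64_MAX = 2**64 - 1


def check_layoutget_range(loga_offset, loga_length, loga_minlength):
    """Apply the range checks on the LAYOUTGET arguments (hypothetical helper).

    Returns "NFS4ERR_INVAL" when one of the rules is violated,
    otherwise "NFS4_OK".
    """
    # The desired length must not be smaller than the required length.
    if loga_length < loga_minlength:
        return "NFS4ERR_INVAL"
    # offset + minlength must not overflow unless minlength means "to EOF".
    if (loga_minlength != NFS4_UINT64_MAX
            and loga_offset + loga_minlength > NFS4_UINT64_MAX):
        return "NFS4ERR_INVAL"
    # offset + length must not overflow unless length means "to EOF".
    if (loga_length != NFS4_UINT64_MAX
            and loga_offset + loga_length > NFS4_UINT64_MAX):
        return "NFS4ERR_INVAL"
    return "NFS4_OK"
```

Python integers do not wrap, so the overflow tests can be written as direct comparisons; an implementation using fixed-width 64-bit arithmetic would need to compare against `NFS4_UINT64_MAX - loga_offset` instead.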
| After the metadata server has performed the above checks on | ||||
| loga_offset, loga_minlength, and loga_length, the metadata server | ||||
| MUST return a layout according to the rules in Table 21. | ||||
| Acceptable layouts based on loga_minlength. Note: u64m = | ||||
| NFS4_UINT64_MAX; a_off = loga_offset; a_minlen = loga_minlength. | ||||
| +-----------+-----------+----------+----------+---------------------+ | ||||
| | Layout | Layout | Layout | Layout | Layout length of | | ||||
| | iomode of | a_minlen | iomode | offset | reply | | ||||
| | request | of | of reply | of reply | | | ||||
| | | request | | | | | ||||
| +-----------+-----------+----------+----------+---------------------+ | ||||
| | _READ | u64m | MAY be | MUST be | MUST be >= file | | ||||
| | | | _READ | <= a_off | length - layout | | ||||
| | | | | | offset | | ||||
| | _READ | u64m | MAY be | MUST be | MUST be u64m | | ||||
| | | | _RW | <= a_off | | | ||||
| | _READ | > 0 and < | MAY be | MUST be | MUST be >= MIN(file | | ||||
| | | u64m | _READ | <= a_off | length, a_minlen + | | ||||
| | | | | | a_off) - layout | | ||||
| | | | | | offset | | ||||
| | _READ | > 0 and < | MAY be | MUST be | MUST be >= a_off - | | ||||
| | | u64m | _RW | <= a_off | layout offset + | | ||||
| | | | | | a_minlen | | ||||
| | _READ | 0 | MAY be | MUST be | MUST be > 0 | | ||||
| | | | _READ | <= a_off | | | ||||
| | _READ | 0 | MAY be | MUST be | MUST be > 0 | | ||||
| | | | _RW | <= a_off | | | ||||
| | _RW | u64m | MUST be | MUST be | MUST be u64m | | ||||
| | | | _RW | <= a_off | | | ||||
| | _RW | > 0 and < | MUST be | MUST be | MUST be >= a_off - | | ||||
| | | u64m | _RW | <= a_off | layout offset + | | ||||
| | | | | | a_minlen | | ||||
| | _RW | 0 | MUST be | MUST be | MUST be > 0 | | ||||
| | | | _RW | <= a_off | | | ||||
| +-----------+-----------+----------+----------+---------------------+ | ||||
| Table 21 | ||||
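One way to read Table 21 is as a predicate over a candidate reply. The sketch below is illustrative only: the function name, the argument names, and the "READ"/"RW" strings are assumptions for the example, and NFS4_UINT64_MAX is treated numerically as the largest possible length when it appears in a ">=" comparison.

```python
NFS4_UINT64_MAX = 2**64 - 1


def reply_satisfies_table21(req_iomode, a_off, a_minlen,
                            reply_iomode, reply_off, reply_len, file_len):
    """Return True if a candidate reply meets the MUST rules of Table 21."""
    # A _RW request MUST be answered with a _RW layout.
    if req_iomode == "RW" and reply_iomode != "RW":
        return False
    if reply_iomode not in ("READ", "RW"):
        return False
    # The layout offset MUST be <= the requested offset in every row.
    if reply_off > a_off:
        return False
    # a_minlen == 0: any non-empty layout is acceptable.
    if a_minlen == 0:
        return reply_len > 0
    if reply_iomode == "RW":
        if a_minlen == NFS4_UINT64_MAX:
            return reply_len == NFS4_UINT64_MAX
        return reply_len >= a_off - reply_off + a_minlen
    # _READ reply to a _READ request: coverage is capped at end-of-file.
    if a_minlen == NFS4_UINT64_MAX:
        return reply_len >= file_len - reply_off
    return reply_len >= min(file_len, a_minlen + a_off) - reply_off
```

For instance, a _READ request at offset 8192 with a_minlen 4096 on a 10000-byte file is satisfied by a _READ layout covering bytes 0 through 9999, while a _RW layout of the same extent is not, because the _RW rows do not cap the required range at the file length.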
| If loga_minlength is not zero and the metadata server cannot return a | ||||
| layout according to the rules in Table 21, then the metadata server | ||||
| MUST return the error NFS4ERR_BADLAYOUT. If loga_minlength is zero | ||||
| and the metadata server cannot or will not return a layout according | ||||
| to the rules in Table 21, then the metadata server MUST return the | ||||
| error NFS4ERR_LAYOUTTRYLATER. Assuming loga_length is greater than | ||||
| loga_minlength or equal to zero, the metadata server SHOULD return a | ||||
| layout according to the rules in Table 22. | ||||
| Desired layouts based on loga_length. The rules of Table 21 MUST be | ||||
| applied first. Note: u64m = NFS4_UINT64_MAX; a_off = loga_offset; | ||||
| a_len = loga_length. | ||||
| +------------+------------+-----------+-----------+-----------------+ | ||||
| | Layout | Layout | Layout | Layout | Layout length | | ||||
| | iomode of | a_len of | iomode of | offset of | of reply | | ||||
| | request | request | reply | reply | | | ||||
| +------------+------------+-----------+-----------+-----------------+ | ||||
| | _READ | u64m | MAY be | MUST be | SHOULD be u64m | | ||||
| | | | _READ | <= a_off | | | ||||
| | _READ | u64m | MAY be | MUST be | SHOULD be u64m | | ||||
| | | | _RW | <= a_off | | | ||||
| | _READ | > 0 and < | MAY be | MUST be | SHOULD be >= | | ||||
| | | u64m | _READ | <= a_off | a_off - layout | | ||||
| | | | | | offset + a_len | | ||||
| | _READ | > 0 and < | MAY be | MUST be | SHOULD be >= | | ||||
| | | u64m | _RW | <= a_off | a_off - layout | | ||||
| | | | | | offset + a_len | | ||||
| | _READ | 0 | MAY be | MUST be | SHOULD be > | | ||||
| | | | _READ | <= a_off | a_off - layout | | ||||
| | | | | | offset | | ||||
| | _READ | 0 | MAY be | MUST be | SHOULD be > | | ||||
| | | | _RW | <= a_off | a_off - layout | | ||||
| | | | | | offset | | ||||
| | _RW | u64m | MUST be | MUST be | SHOULD be u64m | | ||||
| | | | _RW | <= a_off | | | ||||
| | _RW | > 0 and < | MUST be | MUST be | SHOULD be >= | | ||||
| | | u64m | _RW | <= a_off | a_off - layout | | ||||
| | | | | | offset + a_len | | ||||
| | _RW | 0 | MUST be | MUST be | SHOULD be > | | ||||
| | | | _RW | <= a_off | a_off - layout | | ||||
| | | | | | offset | | ||||
| +------------+------------+-----------+-----------+-----------------+ | ||||
| Table 22 | ||||
| The loga_stateid field specifies a valid stateid. If a layout is not | The loga_stateid field specifies a valid stateid. If a layout is not | |||
| currently held by the client, the loga_stateid field represents a | currently held by the client, the loga_stateid field represents a | |||
| stateid reflecting the correspondingly valid open, byte-range lock, | stateid reflecting the correspondingly valid open, byte-range lock, | |||
| or delegation stateid. Once a layout is held by the client for the | or delegation stateid. Once a layout is held on the file by the | |||
| file, the loga_stateid field is a stateid as returned from a previous | client, the loga_stateid field MUST be a stateid as returned from a | |||
| LAYOUTGET or LAYOUTRETURN operation or provided by a CB_LAYOUTRECALL | previous LAYOUTGET or LAYOUTRETURN operation or provided by a | |||
| operation (see Section 12.5.3). | CB_LAYOUTRECALL operation (see Section 12.5.3). | |||
| The loga_maxcount field specifies the maximum layout size (in bytes) | The loga_maxcount field specifies the maximum layout size (in bytes) | |||
| that the client can handle. If the size of the layout structure | that the client can handle. If the size of the layout structure | |||
| exceeds the size specified by maxcount, the metadata server will | exceeds the size specified by maxcount, the metadata server will | |||
| return the NFS4ERR_TOOSMALL error. | return the NFS4ERR_TOOSMALL error. | |||
| The returned layout is expressed as an array, logr_layout, with each | The returned layout is expressed as an array, logr_layout, with each | |||
| element of type layout4. If a file has a single striping pattern, | element of type layout4. If a file has a single striping pattern, | |||
| then logr_layout will contain just one entry. Otherwise, if the | then logr_layout SHOULD contain just one entry. Otherwise, if the | |||
| requested range overlaps more than one striping pattern, logr_layout | requested range overlaps more than one striping pattern, logr_layout | |||
| will contain the required number of entries. The elements of | will contain the required number of entries. The elements of | |||
| logr_layout MUST be sorted in ascending order of the value of the | logr_layout MUST be sorted in ascending order of the value of the | |||
| lo_offset field of each element. There MUST be no gaps or overlaps | lo_offset field of each element. There MUST be no gaps or overlaps | |||
| in the range between two successive elements of logr_layout. The | in the range between two successive elements of logr_layout. The | |||
| lo_iomode field in each element of logr_layout MUST be the same. | lo_iomode field in each element of logr_layout MUST be the same. | |||
| The metadata server may adjust the range of the returned layout based | Table 21 and Table 22 both refer to a returned layout iomode, offset, | |||
| on the usage implied by the loga_iomode. The client MUST be prepared | and length. Because the returned layout is encoded in the | |||
| to get a layout that does not align exactly with its request. See | logr_layout array, more description is required. | |||
| Section 12.5.2 for more details. | ||||
| The metadata server may also return a layout with an lo_iomode other | iomode | |||
| than that requested by the client. If it does so, it MUST ensure | ||||
| that the lo_iomode is more permissive than the loga_iomode requested. | The value of the returned layout iomode listed in Table 21 and | |||
| For example, this behavior allows an implementation to upgrade read- | Table 22 is equal to the value of the lo_iomode field in each | |||
| only requests to read/write requests at its discretion, within the | element of logr_layout. As shown in Table 21 and Table 22, the | |||
| limits of the layout type specific protocol. A lo_iomode of either | metadata server MAY return a layout with an lo_iomode different | |||
| LAYOUTIOMODE4_READ or LAYOUTIOMODE4_RW MUST be returned. | from the requested iomode (field loga_iomode of the request). If | |||
| it does so, it MUST ensure that the lo_iomode is more permissive | ||||
| than the loga_iomode requested. For example, this behavior allows | ||||
| an implementation to upgrade read-only requests to read/write | ||||
| requests at its discretion, within the limits of the layout type | ||||
| specific protocol. A lo_iomode of either LAYOUTIOMODE4_READ or | ||||
| LAYOUTIOMODE4_RW MUST be returned. | ||||
| offset | ||||
| The value of the returned layout offset listed in Table 21 and | ||||
| Table 22 is always equal to the lo_offset field of the first | ||||
| element logr_layout. | ||||
| length | ||||
| When setting the value of the returned layout length, the | ||||
| situation is complicated by the possibility that the special | ||||
| layout length value NFS4_UINT64_MAX is involved. For a | ||||
| logr_layout array of N elements, the lo_length field in the first | ||||
| N-1 elements MUST NOT be NFS4_UINT64_MAX. The lo_length field of | ||||
| the last element of logr_layout can be NFS4_UINT64_MAX under some | ||||
| conditions as described in the following list. | ||||
| * If an applicable rule of Table 21 states the metadata server | ||||
| MUST return a layout of length NFS4_UINT64_MAX, then lo_length | ||||
| field of the last element of logr_layout MUST be | ||||
| NFS4_UINT64_MAX. | ||||
| * If an applicable rule of Table 21 states the metadata server | ||||
| MUST NOT return a layout of length NFS4_UINT64_MAX, then | ||||
| lo_length field of the last element of logr_layout MUST NOT be | ||||
| NFS4_UINT64_MAX. | ||||
| * If an applicable rule of Table 22 states the metadata server | ||||
| SHOULD return a layout of length NFS4_UINT64_MAX, then | ||||
| lo_length field of the last element of logr_layout SHOULD be | ||||
| NFS4_UINT64_MAX. | ||||
| * When the value of the returned layout length of Table 21 and | ||||
| Table 22 is not NFS4_UINT64_MAX, then the returned layout | ||||
| length is equal to the sum of the lo_length fields of each | ||||
| element of logr_layout. | ||||
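The mapping from the logr_layout array to the single "returned layout" iomode, offset, and length used by the tables can be sketched as below. This is a hedged illustration: the function name and the representation of each element as a `(lo_offset, lo_length, lo_iomode)` tuple are assumptions for the example, not structures defined by the draft.

```python
NFS4_UINT64_MAX = 2**64 - 1


def summarize_logr_layout(segments):
    """Collapse logr_layout into (iomode, offset, length).

    segments: non-empty list of (lo_offset, lo_length, lo_iomode) tuples.
    Raises ValueError if the array breaks a MUST rule on ordering,
    contiguity, iomode, or use of NFS4_UINT64_MAX.
    """
    if not segments:
        raise ValueError("logr_layout must contain at least one element")
    first_off, _, iomode = segments[0]
    total = 0
    for i, (off, length, mode) in enumerate(segments):
        is_last = i == len(segments) - 1
        # lo_iomode MUST be the same in every element.
        if mode != iomode:
            raise ValueError("lo_iomode differs between elements")
        # Only the last element may carry NFS4_UINT64_MAX.
        if length == NFS4_UINT64_MAX and not is_last:
            raise ValueError("NFS4_UINT64_MAX before the last element")
        # Ascending order with no gaps or overlaps between elements.
        if off != first_off + total:
            raise ValueError("gap or overlap between elements")
        total = NFS4_UINT64_MAX if length == NFS4_UINT64_MAX else total + length
    return iomode, first_off, total
```

The returned offset is the lo_offset of the first element, and the returned length is either NFS4_UINT64_MAX (when the last element extends to end-of-file) or the sum of the lo_length fields.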
| The logr_return_on_close result field is a directive to return the | The logr_return_on_close result field is a directive to return the | |||
| layout before closing the file. When the server sets this return | layout before closing the file. When the metadata server sets this | |||
| value to TRUE, it MUST be prepared to recall the layout in the case | return value to TRUE, it MUST be prepared to recall the layout in the | |||
| the client fails to return the layout before close. For the server | case the client fails to return the layout before close. For the | |||
| that knows a layout must be returned before a close of the file, this | metadata server that knows a layout must be returned before a close | |||
| return value can be used to communicate the desired behavior to the | of the file, this return value can be used to communicate the desired | |||
| client and thus remove one extra step from the client's and server's | behavior to the client and thus remove one extra step from the | |||
| interaction. | client's and metadata server's interaction. | |||
| The logr_stateid stateid is returned to the client for use in | The logr_stateid stateid is returned to the client for use in | |||
| subsequent layout related operations. See Section 8.2, | subsequent layout related operations. See Section 8.2, | |||
| Section 12.5.3, and Section 12.5.5.2 for a further discussion and | Section 12.5.3, and Section 12.5.5.2 for a further discussion and | |||
| requirements. | requirements. | |||
| The format of the returned layout (lo_content) is specific to the | The format of the returned layout (lo_content) is specific to the | |||
| layout type. The value of the layout type (lo_content.loc_type) for | layout type. The value of the layout type (lo_content.loc_type) for | |||
| each of the elements of the array of layouts returned by the server | each of the elements of the array of layouts returned by the metadata | |||
| (logr_layout) MUST be equal to the loga_layout_type specified by the | server (logr_layout) MUST be equal to the loga_layout_type specified | |||
| client. If it is not equal, the client SHOULD ignore the response as | by the client. If it is not equal, the client SHOULD ignore the | |||
| invalid and behave as if the server returned an error, even if the | response as invalid and behave as if the metadata server returned an | |||
| client does have support for the layout type returned. | error, even if the client does have support for the layout type | |||
| returned. | ||||
| If layouts are not supported for the requested file or its containing | If layouts are not supported for the requested file or its containing | |||
| file system the server SHOULD return NFS4ERR_LAYOUTUNAVAILABLE. If | file system the metadata server MUST return | |||
| the layout type is not supported, the metadata server should return | NFS4ERR_LAYOUTUNAVAILABLE. If the layout type is not supported, the | |||
| NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts are supported but no layout | metadata server MUST return NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts | |||
| matches the client provided layout identification, the server should | are supported but no layout matches the client provided layout | |||
| return NFS4ERR_BADLAYOUT. If an invalid loga_iomode is specified, or | identification, the metadata server MUST return NFS4ERR_BADLAYOUT. | |||
| a loga_iomode of LAYOUTIOMODE4_ANY is specified, the server should | If an invalid loga_iomode is specified, or a loga_iomode of | |||
| return NFS4ERR_BADIOMODE. | LAYOUTIOMODE4_ANY is specified, the metadata server MUST return | |||
| NFS4ERR_BADIOMODE. | ||||
| If the layout for the file is unavailable due to transient | If the layout for the file is unavailable due to transient | |||
| conditions, e.g. file sharing prohibits layouts, the server MUST | conditions, e.g. file sharing prohibits layouts, the metadata server | |||
| return NFS4ERR_LAYOUTTRYLATER. | MUST return NFS4ERR_LAYOUTTRYLATER. | |||
| If the layout request is rejected due to an overlapping layout | If the layout request is rejected due to an overlapping layout | |||
| recall, the server MUST return NFS4ERR_RECALLCONFLICT. See | recall, the metadata server MUST return NFS4ERR_RECALLCONFLICT. See | |||
| Section 12.5.5.2 for details. | Section 12.5.5.2 for details. | |||
| If the layout conflicts with a mandatory byte range lock held on the | If the layout conflicts with a mandatory byte range lock held on the | |||
| file, and if the storage devices have no method of enforcing | file, and if the storage devices have no method of enforcing | |||
| mandatory locks, other than through the restriction of layouts, the | mandatory locks, other than through the restriction of layouts, the | |||
| metadata server should return NFS4ERR_LOCKED. | metadata server SHOULD return NFS4ERR_LOCKED. | |||
| If the client sets loga_signal_layout_avail to TRUE, then it is | If the client sets loga_signal_layout_avail to TRUE, then it is | |||
| registering with the server a "want" for a layout in the event the | registering with the metadata server a "want" for a layout in the event the | |||
| layout cannot be obtained due to resource exhaustion. If the server | layout cannot be obtained due to resource exhaustion. If the | |||
| supports and will honor the "want", the results will have | metadata server supports and will honor the "want", the results will | |||
| logr_will_signal_layout_avail set to TRUE. If so the client should | have logr_will_signal_layout_avail set to TRUE. If so the client | |||
| expect a CB_RECALLABLE_OBJ_AVAIL operation to indicate that a layout | should expect a CB_RECALLABLE_OBJ_AVAIL operation to indicate that a | |||
| is available. | layout is available. | |||
| On success, the current filehandle retains its value and the current | On success, the current filehandle retains its value and the current | |||
| stateid is updated to match the value as returned in the results. | stateid is updated to match the value as returned in the results. | |||
| 18.43.4. IMPLEMENTATION | 18.43.4. IMPLEMENTATION | |||
| Typically, LAYOUTGET will be called as part of a COMPOUND request | Typically, LAYOUTGET will be called as part of a COMPOUND request | |||
| after an OPEN operation and results in the client having location | after an OPEN operation and results in the client having location | |||
| information for the file; this requires that loga_stateid be set to | information for the file; this requires that loga_stateid be set to | |||
| the special stateid that tells the server to use the current stateid, | the special stateid that tells the metadata server to use the current | |||
| which is set by OPEN (see Section 16.2.3.1.2). A client may also | stateid, which is set by OPEN (see Section 16.2.3.1.2). A client | |||
| hold a layout across multiple OPENs. The client specifies a layout | may also hold a layout across multiple OPENs. The client specifies a | |||
| type that limits what kind of layout the server will return. This | layout type that limits what kind of layout the metadata server will | |||
| prevents servers from issuing layouts that are unusable by the | return. This prevents metadata servers from granting layouts that | |||
| client. | are unusable by the client. | |||
| As indicated by Table 21 and Table 22 the specification of LAYOUTGET | ||||
| allows a pNFS client and server considerable flexibility. A pNFS | ||||
| client can take several strategies for sending LAYOUTGET. Some | ||||
| examples are as follows. | ||||
| o If LAYOUTGET is preceded by OPEN in the same COMPOUND request, and | ||||
| the OPEN requests read access, the client might opt to request a | ||||
| _READ layout with loga_offset set to zero, loga_minlength set to | ||||
| zero, and loga_length set to NFS4_UINT64_MAX. If the file has | ||||
| space allocated to it, that space is striped over one or more | ||||
| storage devices, and there is either no conflicting layout, or the | ||||
| concept of a conflicting layout does not apply to the pNFS | ||||
| server's layout type or implementation, then the metadata server | ||||
| might return a layout with a starting offset of zero, and a length | ||||
| equal to the length of the file, if not NFS4_UINT64_MAX. If the | ||||
| length of the file is not a multiple of the pNFS server's stripe | ||||
| width (see Section 13.2 for a formal definition), the metadata | ||||
| server might round the returned layout's length up. | ||||
| o If LAYOUTGET is preceded by OPEN in the same COMPOUND request, and | ||||
| the OPEN does not truncate the file, and requests write access, | ||||
| the client might opt to request a _RW layout with loga_offset set | ||||
| to zero, loga_minlength set to zero, and loga_length set to the | ||||
| file's current length (if known), or NFS4_UINT64_MAX. As with the | ||||
| previous case, under some conditions the metadata server might | ||||
| return a layout that covers the entire length of the file or | ||||
| beyond. | ||||
| o As above, but the OPEN truncates the file. In this case, the client | ||||
| might anticipate it will be writing to the file from offset zero, | ||||
| and so loga_offset and loga_minlength are set to zero, and | ||||
| loga_length is set to the value of threshold4_write_iosize. The | ||||
| metadata server might return a layout from offset zero with a | ||||
| length at least as long as threshold4_write_iosize. | ||||
| o A process on the client invokes a request to read from offset | ||||
| 10000 for length 50000. The client is using buffered I/O, and has | ||||
| buffer sizes of 4096 bytes. The client intends to map the request | ||||
| of the process into a series of READ requests starting at offset | ||||
| 8192. The end offset needs to be higher than 10000 + 50000 = | ||||
| 60000, and the next offset that is a multiple of 4096 is 61440. | ||||
| The difference between 61440 and the starting offset of the | ||||
| layout is 53248 (which is the product of 4096 and 13). The value | ||||
| of threshold4_read_iosize is less than 53248, so the client sends | ||||
| a LAYOUTGET request with loga_offset set to 8192, loga_minlength | ||||
| set to 53248, and loga_length set to the file's length (if known) | ||||
| minus 8192 or NFS4_UINT64_MAX (if the file's length is not known). | ||||
| Since this LAYOUTGET request exceeds the metadata server's | ||||
| threshold, it grants the layout, possibly with an initial offset | ||||
| of 0, with an end offset of at least 8192 + 53248 - 1 = 61439, but | ||||
| preferably a layout with an offset aligned on the stripe width and | ||||
| a length that is a multiple of the stripe width. | ||||
| o As above, but the client is not using buffered I/O, and instead | ||||
| all internal I/O requests are sent directly to the server. The | ||||
| LAYOUTGET request has loga_offset equal to 10000, and | ||||
| loga_minlength set to 50000. The value of loga_length is set to | ||||
| the length of the file. The metadata server is free to return a | ||||
| layout that fully overlaps the requested range, with a starting | ||||
| offset and length aligned on the stripe width. | ||||
| o Again a process on the client invokes a request to read from | ||||
| offset 10000 for length 50000, and buffered I/O is in use. The | ||||
| client is expecting that the server might not be able to return | ||||
| the layout for the full I/O range, with loga_offset set to 8192 | ||||
| and loga_minlength set to 53248. The client intends to map the | ||||
| request of the process into a series of READ requests starting at | ||||
| offset 8192, each with length 4096, with a total length of 53248 | ||||
| (which equals 13 * 4096). Because the value of | ||||
| threshold4_read_iosize is equal to 4096, it is practical and | ||||
| reasonable for the client to use several LAYOUTGETs to complete | ||||
| the series of READs. The client sends a LAYOUTGET request with | ||||
| loga_offset set to 8192, loga_minlength set to 4096, and | ||||
| loga_length set to 53248 or higher. The server will grant a | ||||
| layout possibly with an initial offset of 0, with an end offset of | ||||
| at least 8192 + 4096 - 1 = 12287, but preferably a layout with an | ||||
| offset aligned on the stripe width and a length that is a multiple | ||||
| of the stripe width. This will allow the client to make forward | ||||
| progress, possibly having to issue more LAYOUTGET requests for the | ||||
| remainder of the range. | ||||
| o An NFS client detects a sequential read pattern, and so issues a | ||||
| LAYOUTGET that goes well beyond any current or pending read | ||||
| requests to the server. The server might likewise detect this | ||||
| pattern, and grant the LAYOUTGET request. The client continues to | ||||
| send LAYOUTGET requests once it has read from an offset of the | ||||
| file that represents 50% of the way through the last layout it | ||||
| received. | ||||
| o As above but the client fails to detect the pattern, but the | ||||
| server does. The next time the metadata server gets a LAYOUTGET, | ||||
| it returns a layout with a length that is well beyond | ||||
| loga_minlength. | ||||
| o A client is using buffered I/O, and has a long queue of write | ||||
| behinds to process and also detects a sequential write pattern. | ||||
| It issues a LAYOUTGET for a layout that spans the range of the | ||||
| queued write behinds and well beyond, including ranges beyond the | ||||
| file's current length. The client continues to issue LAYOUTGETs | ||||
| once the write behind queue reaches 50% of the maximum queue | ||||
| length. | ||||
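The buffered-read arithmetic in the examples above can be sketched as follows. This is only an illustration under stated assumptions: the helper name buffered_layoutget_range and its parameters are hypothetical and do not appear in the draft, and the sketch covers only the alignment step, not the comparison against threshold4_read_iosize.

```python
NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF  # all-ones 64-bit value used as "no upper bound"


def buffered_layoutget_range(req_offset, req_length, buf_size, file_length=None):
    """Hypothetical helper: derive (loga_offset, loga_minlength, loga_length)
    for a read request that the client services through fixed-size buffers."""
    # Round the start of the request down to a buffer boundary.
    loga_offset = (req_offset // buf_size) * buf_size
    # Round the end of the request up to the next buffer boundary.
    req_end = req_offset + req_length
    aligned_end = -(-req_end // buf_size) * buf_size
    # The minimum useful layout covers every buffer the request touches.
    loga_minlength = aligned_end - loga_offset
    # Ask for the rest of the file if its length is known, else "everything".
    if file_length is None:
        loga_length = NFS4_UINT64_MAX
    else:
        loga_length = file_length - loga_offset
    return loga_offset, loga_minlength, loga_length


# The example from the text: read 50000 bytes at offset 10000 with 4096-byte buffers.
offset, minlength, length = buffered_layoutget_range(10000, 50000, 4096)
print(offset, minlength, minlength // 4096)
```

For the request in the text this gives loga_offset 8192 and loga_minlength 53248, which is 13 buffers of 4096 bytes.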
| Once the client has obtained a layout referring to a particular | Once the client has obtained a layout referring to a particular | |||
| device ID, the server MUST NOT delete the device ID until the layout | device ID, the metadata server MUST NOT delete the device ID until | |||
| is returned or revoked. | the layout is returned or revoked. | |||
| CB_NOTIFY_DEVICEID can race with LAYOUTGET. One race scenario is | CB_NOTIFY_DEVICEID can race with LAYOUTGET. One race scenario is | |||
| that LAYOUTGET returns a device ID the client does not have device | that LAYOUTGET returns a device ID the client does not have device | |||
| address mappings for, and the server sends a CB_NOTIFY_DEVICEID to | address mappings for, and the metadata server sends a | |||
| add the device ID to the client's awareness and meanwhile the client | CB_NOTIFY_DEVICEID to add the device ID to the client's awareness and | |||
| sends GETDEVICEINFO on the device ID. This scenario is discussed in | meanwhile the client sends GETDEVICEINFO on the device ID. This | |||
| Section 18.40.4. Another scenario is that the CB_NOTIFY_DEVICEID is | scenario is discussed in Section 18.40.4. Another scenario is that | |||
| processed by the client before it processes the results from | the CB_NOTIFY_DEVICEID is processed by the client before it processes | |||
| LAYOUTGET. The client will send a GETDEVICEINFO on the device ID. | the results from LAYOUTGET. The client will send a GETDEVICEINFO on | |||
| If the results from GETDEVICEINFO are received before the client gets | the device ID. If the results from GETDEVICEINFO are received before | |||
| results from LAYOUTGET, then there is no longer a race. If the | the client gets results from LAYOUTGET, then there is no longer a | |||
| results from LAYOUTGET are received before the results from | race. If the results from LAYOUTGET are received before the results | |||
| GETDEVICEINFO, the client can either wait for results of | from GETDEVICEINFO, the client can either wait for results of | |||
| GETDEVICEINFO, or send another one to get possibly more up to date | GETDEVICEINFO, or send another one to get possibly more up to date | |||
| device address mappings for the device ID. | device address mappings for the device ID. | |||
| 18.44. Operation 51: LAYOUTRETURN - Release Layout Information | 18.44. Operation 51: LAYOUTRETURN - Release Layout Information | |||
| 18.44.1. ARGUMENT | 18.44.1. ARGUMENT | |||
| /* Constants used for LAYOUTRETURN and CB_LAYOUTRECALL */ | /* Constants used for LAYOUTRETURN and CB_LAYOUTRECALL */ | |||
| const LAYOUT4_RET_REC_FILE = 1; | const LAYOUT4_RET_REC_FILE = 1; | |||
| const LAYOUT4_RET_REC_FSID = 2; | const LAYOUT4_RET_REC_FSID = 2; | |||
| skipping to change at page 537, line 15 | skipping to change at page 541, line 15 | |||
| If SEQUENCE returns an error, then the state of the slot (sequence | If SEQUENCE returns an error, then the state of the slot (sequence | |||
| id, cached reply) MUST NOT change, and the associated lease MUST NOT | id, cached reply) MUST NOT change, and the associated lease MUST NOT | |||
| be renewed. | be renewed. | |||
| If SEQUENCE returns NFS4_OK, then the associated lease MUST be | If SEQUENCE returns NFS4_OK, then the associated lease MUST be | |||
| renewed (see Section 8.3), except if | renewed (see Section 8.3), except if | |||
| SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED is returned in sr_status_flags. | SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED is returned in sr_status_flags. | |||
| 18.46.4. IMPLEMENTATION | 18.46.4. IMPLEMENTATION | |||
| The server MUST maintain a mapping of sessionid to client ID in order | The server MUST maintain a mapping of session id to client ID in | |||
| to validate any operations that follow SEQUENCE that take a stateid | order to validate any operations that follow SEQUENCE that take a | |||
| as an argument and/or result. | stateid as an argument and/or result. | |||
| If the client establishes a persistent session, then a SEQUENCE done | If the client establishes a persistent session, then a SEQUENCE done | |||
| after a server restart may encounter requests performed and recorded | after a server restart may encounter requests performed and recorded | |||
| in a persistent reply cache before the server restart. In this case, | in a persistent reply cache before the server restart. In this case, | |||
| SEQUENCE will be processed successfully, while requests which were | SEQUENCE will be processed successfully, while requests which were | |||
| not processed previously are rejected with NFS4ERR_DEADSESSION. | not processed previously are rejected with NFS4ERR_DEADSESSION. | |||
| Depending on which of the operations within the COMPOUND were | Depending on which of the operations within the COMPOUND were | |||
| successfully performed before the server restart, these operations | successfully performed before the server restart, these operations | |||
| will also have replies sent from the server reply cache. Note that | will also have replies sent from the server reply cache. Note that | |||
| skipping to change at page 547, line 4 | skipping to change at page 551, line 4 | |||
| Once a RECLAIM_COMPLETE is done, there can be no further reclaim | Once a RECLAIM_COMPLETE is done, there can be no further reclaim | |||
| operations for locks whose scope is defined as having completed | operations for locks whose scope is defined as having completed | |||
| recovery. Once the client sends RECLAIM_COMPLETE, the server will | recovery. Once the client sends RECLAIM_COMPLETE, the server will | |||
| not allow the client to do subsequent reclaims of locking state for | not allow the client to do subsequent reclaims of locking state for | |||
| that scope and if these are attempted, will return NFS4ERR_NO_GRACE. | that scope and if these are attempted, will return NFS4ERR_NO_GRACE. | |||
| Whenever a client establishes a new client ID and before it does the | Whenever a client establishes a new client ID and before it does the | |||
| first non-reclaim operation that obtains a lock, it MUST do a global | first non-reclaim operation that obtains a lock, it MUST do a global | |||
| RECLAIM_COMPLETE, even if there are no locks to reclaim. If non- | RECLAIM_COMPLETE, even if there are no locks to reclaim. If non- | |||
| reclaim locking operations are done before the RECLAIM_COMPLETE, a | reclaim locking operations are done before the RECLAIM_COMPLETE, an | |||
| NFS4ERR_GRACE error will be returned. | NFS4ERR_GRACE error will be returned. | |||
| Similarly, when the client accesses a file system on a new server, | Similarly, when the client accesses a file system on a new server, | |||
| before it sends the first non-reclaim operation that obtains a lock | before it sends the first non-reclaim operation that obtains a lock | |||
| on this new server, it must do a RECLAIM_COMPLETE with rca_one_fs set | on this new server, it must do a RECLAIM_COMPLETE with rca_one_fs set | |||
| to TRUE and current filehandle within that file system, even if there | to TRUE and current filehandle within that file system, even if there | |||
| are no locks to reclaim. If non-reclaim locking operations are done | are no locks to reclaim. If non-reclaim locking operations are done | |||
| on that file system before the RECLAIM_COMPLETE, a NFS4ERR_GRACE will | on that file system before the RECLAIM_COMPLETE, an NFS4ERR_GRACE | |||
| be returned. | will be returned. | |||
| Any locks not reclaimed at the point at which RECLAIM_COMPLETE is | Any locks not reclaimed at the point at which RECLAIM_COMPLETE is | |||
| done become non-reclaimable. The client MUST NOT attempt to reclaim | done become non-reclaimable. The client MUST NOT attempt to reclaim | |||
| them, either during the current server instance or in any subsequent | them, either during the current server instance or in any subsequent | |||
| server instance, or on another server to which responsibility for | server instance, or on another server to which responsibility for | |||
| that file system is transferred. If the client were to do so, it | that file system is transferred. If the client were to do so, it | |||
| would be violating the protocol by representing itself as owning | would be violating the protocol by representing itself as owning | |||
| locks that it does not own, and so has no right to reclaim. See | locks that it does not own, and so has no right to reclaim. See | |||
| Section 8.4.3 for a discussion of edge conditions related to lock | Section 8.4.3 for a discussion of edge conditions related to lock | |||
| reclaim. | reclaim. | |||
| skipping to change at page 549, line 19 | skipping to change at page 553, line 19 | |||
| 19.1.1. ARGUMENTS | 19.1.1. ARGUMENTS | |||
| void; | void; | |||
| 19.1.2. RESULTS | 19.1.2. RESULTS | |||
| void; | void; | |||
| 19.1.3. DESCRIPTION | 19.1.3. DESCRIPTION | |||
| Standard NULL procedure. Void argument, void response. Even though | CB_NULL is the standard ONC RPC NULL procedure, with the standard | |||
| there is no direct functionality associated with this procedure, the | void argument and void response. Even though there is no direct | |||
| server will use CB_NULL to confirm the existence of a path for RPCs | functionality associated with this procedure, the server will use | |||
| from server to client. | CB_NULL to confirm the existence of a path for RPCs from the server | |||
| to client. | ||||
| 19.1.4. ERRORS | 19.1.4. ERRORS | |||
| None. | None. | |||
| 19.2. Procedure 1: CB_COMPOUND - Compound Operations | 19.2. Procedure 1: CB_COMPOUND - Compound Operations | |||
| 19.2.1. ARGUMENTS | 19.2.1. ARGUMENTS | |||
| enum nfs_cb_opnum4 { | enum nfs_cb_opnum4 { | |||
| skipping to change at page 552, line 17 | skipping to change at page 556, line 17 | |||
| nfs_cb_resop4 resarray<>; | nfs_cb_resop4 resarray<>; | |||
| }; | }; | |||
| 19.2.3. DESCRIPTION | 19.2.3. DESCRIPTION | |||
| The CB_COMPOUND procedure is used to combine one or more of the | The CB_COMPOUND procedure is used to combine one or more of the | |||
| callback procedures into a single RPC request. The main callback RPC | callback procedures into a single RPC request. The main callback RPC | |||
| program has two main procedures: CB_NULL and CB_COMPOUND. All other | program has two main procedures: CB_NULL and CB_COMPOUND. All other | |||
| operations use the CB_COMPOUND procedure as a wrapper. | operations use the CB_COMPOUND procedure as a wrapper. | |||
| In the processing of the CB_COMPOUND procedure, the client may find | During the processing of the CB_COMPOUND procedure, the client may | |||
| that it does not have the available resources to execute any or all | find that it does not have the available resources to execute any or | |||
| of the operations within the CB_COMPOUND sequence. This is discussed | all of the operations within the CB_COMPOUND sequence. Refer to | |||
| in Section 2.10.5.4. | Section 2.10.5.4 for details. | |||
| The minorversion field of the arguments MUST be the same as the | The minorversion field of the arguments MUST be the same as the | |||
| minorversion of the COMPOUND procedure used to create the client ID | minorversion of the COMPOUND procedure used to create the client ID | |||
| and session. For NFSv4.1, minorversion MUST be set to 1. | and session. For NFSv4.1, minorversion MUST be set to 1. | |||
| Contained within the CB_COMPOUND results is a 'status' field. This | Contained within the CB_COMPOUND results is a 'status' field. This | |||
| status must be equivalent to the status of the last operation that | status must be equivalent to the status of the last operation that | |||
| was executed within the CB_COMPOUND procedure. Therefore, if an | was executed within the CB_COMPOUND procedure. Therefore, if an | |||
| operation incurred an error then the 'status' value will be the same | operation incurred an error then the 'status' value will be the same | |||
| error value as is being returned for the operation that failed. | error value as is being returned for the operation that failed. | |||
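The status rule above can be sketched as follows. This is a minimal illustration, not the draft's definition: the function run_cb_compound and the representation of operations as callables are hypothetical, and stopping at the first non-NFS4_OK status is assumed here to mirror the forward-channel COMPOUND procedure.

```python
NFS4_OK = 0
NFS4ERR_BADHANDLE = 10001


def run_cb_compound(operations):
    """Hypothetical client-side loop: each operation is a zero-argument
    callable returning an nfsstat4 value."""
    resarray = []
    status = NFS4_OK
    for op in operations:
        status = op()            # execute the next callback operation
        resarray.append(status)  # record its individual result
        if status != NFS4_OK:
            break                # assumed: later operations are not executed
    # The overall status is the status of the last operation executed.
    return status, resarray


ops = [lambda: NFS4_OK, lambda: NFS4ERR_BADHANDLE, lambda: NFS4_OK]
print(run_cb_compound(ops))
```

In this sketch the third operation never runs, and the overall status equals the error returned by the second.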
| For a description of the "tag" field, see Section 16.2.3 where the | The "tag" field is handled the same way as that of the COMPOUND procedure | |||
| corresponding forward channel procedure is described. | (see Section 16.2.3). | |||
| Illegal operation codes are handled in the same way as they are | Illegal operation codes are handled in the same way as they are | |||
| handled for the COMPOUND procedure. | handled for the COMPOUND procedure. | |||
| 19.2.4. IMPLEMENTATION | 19.2.4. IMPLEMENTATION | |||
| The CB_COMPOUND procedure is used to combine individual operations | The CB_COMPOUND procedure is used to combine individual operations | |||
| into a single RPC request. The client interprets each of the | into a single RPC request. The client interprets each of the | |||
| operations in turn. If an operation is executed by the client and | operations in turn. If an operation is executed by the client and | |||
| the status of that operation is NFS4_OK, then the next operation in | the status of that operation is NFS4_OK, then the next operation in | |||
| skipping to change at page 553, line 28 | skipping to change at page 557, line 28 | |||
| | NFS4ERR_INVAL | The tag argument is not in UTF-8 | | | NFS4ERR_INVAL | The tag argument is not in UTF-8 | | |||
| | | encoding. | | | | encoding. | | |||
| | NFS4ERR_MINOR_VERS_MISMATCH | | | | NFS4ERR_MINOR_VERS_MISMATCH | | | |||
| | NFS4ERR_SERVERFAULT | | | | NFS4ERR_SERVERFAULT | | | |||
| | NFS4ERR_TOO_MANY_OPS | | | | NFS4ERR_TOO_MANY_OPS | | | |||
| | NFS4ERR_REP_TOO_BIG | | | | NFS4ERR_REP_TOO_BIG | | | |||
| | NFS4ERR_REP_TOO_BIG_TO_CACHE | | | | NFS4ERR_REP_TOO_BIG_TO_CACHE | | | |||
| | NFS4ERR_REQ_TOO_BIG | | | | NFS4ERR_REQ_TOO_BIG | | | |||
| +------------------------------+------------------------------------+ | +------------------------------+------------------------------------+ | |||
| Table 21 | Table 23 | |||
| 20. NFSv4.1 Callback Operations | 20. NFSv4.1 Callback Operations | |||
| 20.1. Operation 3: CB_GETATTR - Get Attributes | 20.1. Operation 3: CB_GETATTR - Get Attributes | |||
| 20.1.1. ARGUMENT | 20.1.1. ARGUMENT | |||
| struct CB_GETATTR4args { | struct CB_GETATTR4args { | |||
| nfs_fh4 fh; | nfs_fh4 fh; | |||
| bitmap4 attr_request; | bitmap4 attr_request; | |||
| skipping to change at page 554, line 27 | skipping to change at page 558, line 27 | |||
| 20.1.3. DESCRIPTION | 20.1.3. DESCRIPTION | |||
| The CB_GETATTR operation is used by the server to obtain the current | The CB_GETATTR operation is used by the server to obtain the current | |||
| modified state of a file that has been write delegated. The | modified state of a file that has been write delegated. The | |||
| attributes size and change are the only ones guaranteed to be | attributes size and change are the only ones guaranteed to be | |||
| serviced by the client. See Section 10.4.3 for a full description of | serviced by the client. See Section 10.4.3 for a full description of | |||
| how the client and server are to interact with the use of CB_GETATTR. | how the client and server are to interact with the use of CB_GETATTR. | |||
| If the filehandle specified is not one for which the client holds a | If the filehandle specified is not one for which the client holds a | |||
| write open delegation, an NFS4ERR_BADHANDLE error is returned. | write delegation, an NFS4ERR_BADHANDLE error is returned. | |||
| 20.1.4. IMPLEMENTATION | 20.1.4. IMPLEMENTATION | |||
| The client returns attrmask bits and the associated attribute values | The client returns attrmask bits and the associated attribute values | |||
| only for the change attribute, and attributes that it may change | only for the change attribute, and attributes that it may change | |||
| (time_modify, and size). | (time_modify, and size). | |||
| 20.2. Operation 4: CB_RECALL - Recall an Open Delegation | 20.2. Operation 4: CB_RECALL - Recall a Delegation | |||
| 20.2.1. ARGUMENT | 20.2.1. ARGUMENT | |||
| struct CB_RECALL4args { | struct CB_RECALL4args { | |||
| stateid4 stateid; | stateid4 stateid; | |||
| bool truncate; | bool truncate; | |||
| nfs_fh4 fh; | nfs_fh4 fh; | |||
| }; | }; | |||
| 20.2.2. RESULT | 20.2.2. RESULT | |||
| struct CB_RECALL4res { | struct CB_RECALL4res { | |||
| nfsstat4 status; | nfsstat4 status; | |||
| }; | }; | |||
| 20.2.3. DESCRIPTION | 20.2.3. DESCRIPTION | |||
| The CB_RECALL operation is used to begin the process of recalling an | The CB_RECALL operation is used to begin the process of recalling a | |||
| open delegation and returning it to the server. | delegation and returning it to the server. | |||
| The truncate flag is used to optimize recall for a file which is | The truncate flag is used to optimize recall for a file object which | |||
| about to be truncated to zero. When it is set, the client is freed | is a regular file and is about to be truncated to zero. When it is | |||
| of obligation to propagate modified data for the file to the server, | TRUE, the client is freed of the obligation to propagate modified | |||
| since this data is irrelevant. | data for the file to the server, since this data is irrelevant. | |||
| If the handle specified is not one for which the client holds an open | If the handle specified is not one for which the client holds a | |||
| delegation, an NFS4ERR_BADHANDLE error is returned. | delegation, an NFS4ERR_BADHANDLE error is returned. | |||
| If the stateid specified is not one corresponding to an open | If the stateid specified is not one corresponding to an open | |||
| delegation for the file specified by the filehandle, an | delegation for the file specified by the filehandle, an | |||
| NFS4ERR_BAD_STATEID is returned. | NFS4ERR_BAD_STATEID is returned. | |||
| 20.2.4. IMPLEMENTATION | 20.2.4. IMPLEMENTATION | |||
| The client should reply to the callback immediately. Replying does | The client SHOULD reply to the callback immediately. Replying does | |||
| not complete the recall except when an error was returned. The | not complete the recall except when the value of the reply's status | |||
| recall is not complete until the delegation is returned using a | field is neither NFS4ERR_DELAY nor NFS4_OK. The recall is not | |||
| DELEGRETURN. | complete until the delegation is returned using a DELEGRETURN | |||
| operation. | ||||
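The completion rule in the revised (-23) wording of 20.2.4 can be sketched as follows. This is an illustration only: the function recall_is_complete and the boolean delegreturn_received are hypothetical names, and the sketch tracks a single recalled delegation.

```python
NFS4_OK = 0
NFS4ERR_BADHANDLE = 10001
NFS4ERR_DELAY = 10008


def recall_is_complete(reply_status, delegreturn_received):
    """Hypothetical server-side check for one recalled delegation."""
    # A reply that is neither NFS4_OK nor NFS4ERR_DELAY ends the recall at once.
    if reply_status not in (NFS4_OK, NFS4ERR_DELAY):
        return True
    # Otherwise the recall stays pending until DELEGRETURN arrives.
    return delegreturn_received


print(recall_is_complete(NFS4_OK, False))
print(recall_is_complete(NFS4ERR_BADHANDLE, False))
```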
| 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from Client | 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from Client | |||
| 20.3.1. ARGUMENT | 20.3.1. ARGUMENT | |||
| /* | /* | |||
| * NFSv4.1 callback arguments and results | * NFSv4.1 callback arguments and results | |||
| */ | */ | |||
| enum layoutrecall_type4 { | enum layoutrecall_type4 { | |||
| skipping to change at page 556, line 50 | skipping to change at page 560, line 50 | |||
| 20.3.2. RESULT | 20.3.2. RESULT | |||
| struct CB_LAYOUTRECALL4res { | struct CB_LAYOUTRECALL4res { | |||
| nfsstat4 clorr_status; | nfsstat4 clorr_status; | |||
| }; | }; | |||
| 20.3.3. DESCRIPTION | 20.3.3. DESCRIPTION | |||
| The CB_LAYOUTRECALL operation is used by the server to recall layouts | The CB_LAYOUTRECALL operation is used by the server to recall layouts | |||
| from the client; as a result, the client will begin the process of | from the client; as a result, the client will begin the process of | |||
| returning layouts with LAYOUTRETURN. The CB_LAYOUTRECALL operation | returning layouts via LAYOUTRETURN. The CB_LAYOUTRECALL operation | |||
| specifies one of three forms of recall processing with the value of | specifies one of three forms of recall processing with the value of | |||
| layoutrecall_type4. The recall is either for a specific layout (by | layoutrecall_type4. The recall is either for a specific layout (by | |||
| file), for an entire file system (FSID), or for all file systems | file), for an entire file system (FSID), or for all file systems | |||
| (ALL). | (ALL). | |||
| The behavior of the operation varies based on the value of the | The behavior of the operation varies based on the value of the | |||
| layoutrecall_type4. The values and behaviors are: | layoutrecall_type4. The values and behaviors are: | |||
| LAYOUTRECALL4_FILE | LAYOUTRECALL4_FILE | |||
| For a layout to match the recall request, the following fields | For a layout to match the recall request, the values of the | |||
| must match in value with the layout: clora_type, clora_iomode, | following fields must match those of the layout: clora_type, | |||
| lor_fh, and the byte range specified by lor_offset, and | clora_iomode, lor_fh, and the byte range specified by lor_offset | |||
| lor_length. The clora_iomode field may have a special value of | and lor_length. The clora_iomode field may have a special value | |||
| LAYOUTIOMODE4_ANY. The LAYOUTIOMODE4_ANY will match any value | of LAYOUTIOMODE4_ANY. The special value LAYOUTIOMODE4_ANY will | |||
| originally returned in a layout; therefore it acts as a wild card | match any iomode originally returned in a layout; therefore it | |||
| for iomode. The other special value used is for lor_length. If | acts as a wild card. The other special value used is for | |||
| lor_length has a value of NFS4_MAXFILELEN, the lor_length field | lor_length. If lor_length has a value of NFS4_UINT64_MAX, the | |||
| means the maximum possible file size. If a matching layout is | lor_length field means the maximum possible file size. If a | |||
| found, it MUST be returned using the LAYOUTRETURN operation, see | matching layout is found, it MUST be returned using the | |||
| Section 18.44. An example of the field's special value use is if | LAYOUTRETURN operation (see Section 18.44). An example of the | |||
| clora_iomode is LAYOUTIOMODE4_ANY, lor_offset is zero, and | field's special value use is if clora_iomode is LAYOUTIOMODE4_ANY, | |||
| lor_length is NFS4_MAXFILELEN, then the entire layout is to be | lor_offset is zero, and lor_length is NFS4_UINT64_MAX, then the | |||
| returned. | entire layout is to be returned. | |||
| The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the | The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the | |||
| client does not hold layouts for the file or if the client does | client does not hold layouts for the file or if the client does | |||
| not have any overlapping layouts for the specification in the | not have any overlapping layouts for the specification in the | |||
| layout recall. | layout recall. | |||
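The LAYOUTRECALL4_FILE matching rules above can be sketched as follows. Several points are assumptions made for illustration: the HeldLayout record and the matches_file_recall function are hypothetical, "matching" the byte range is treated as overlapping it (consistent with the NFS4ERR_NOMATCHING_LAYOUT text), and the iomode constants are given illustrative numeric values.

```python
from dataclasses import dataclass

NFS4_UINT64_MAX = 0xFFFFFFFFFFFFFFFF
# Illustrative values for the iomode constants.
LAYOUTIOMODE4_READ, LAYOUTIOMODE4_RW, LAYOUTIOMODE4_ANY = 1, 2, 3


@dataclass
class HeldLayout:
    """Hypothetical client-side record of one layout segment."""
    layout_type: int
    iomode: int
    fh: bytes
    offset: int
    length: int


def _range_end(offset, length):
    # A length of NFS4_UINT64_MAX means "up to the maximum possible file size".
    if length == NFS4_UINT64_MAX:
        return NFS4_UINT64_MAX
    return offset + length


def matches_file_recall(layout, clora_type, clora_iomode, lor_fh,
                        lor_offset, lor_length):
    """Return True if the held layout is covered by the recall request."""
    if layout.layout_type != clora_type or layout.fh != lor_fh:
        return False
    # LAYOUTIOMODE4_ANY acts as a wild card for the iomode.
    if clora_iomode != LAYOUTIOMODE4_ANY and layout.iomode != clora_iomode:
        return False
    # Assumed: the byte ranges match when they overlap.
    return (layout.offset < _range_end(lor_offset, lor_length)
            and lor_offset < _range_end(layout.offset, layout.length))


held = HeldLayout(1, LAYOUTIOMODE4_RW, b"fh1", 0, 65536)
print(matches_file_recall(held, 1, LAYOUTIOMODE4_ANY, b"fh1", 0, NFS4_UINT64_MAX))
```

With clora_iomode set to LAYOUTIOMODE4_ANY, lor_offset zero, and lor_length NFS4_UINT64_MAX, every layout held for the file and layout type matches, which corresponds to the "entire layout is to be returned" case in the text.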
| LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL | LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL | |||
| If LAYOUTRECALL4_FSID is specified, the fsid specifies the file | If LAYOUTRECALL4_FSID is specified, the fsid specifies the file | |||
| system for which any outstanding layouts MUST be returned. If | system for which any outstanding layouts MUST be returned. If | |||
| skipping to change at page 557, line 51 | skipping to change at page 561, line 51 | |||
| respective LAYOUTRETURN with either LAYOUTRETURN4_FSID or | respective LAYOUTRETURN with either LAYOUTRETURN4_FSID or | |||
| LAYOUTRETURN4_ALL acknowledges to the server that the client | LAYOUTRETURN4_ALL acknowledges to the server that the client | |||
| invalidated the said device mappings. See Section 12.5.5.2.1.5 | invalidated the said device mappings. See Section 12.5.5.2.1.5 | |||
| for considerations with "bulk" recall of layouts. | for considerations with "bulk" recall of layouts. | |||
| The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the | The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the | |||
| client does not hold layouts and does not have valid deviceid | client does not hold layouts and does not have valid deviceid | |||
| mappings. | mappings. | |||
| In processing the layout recall request, the client also varies its | In processing the layout recall request, the client also varies its | |||
| behavior on the value of the clora_changed field. This field is used | behavior based on the value of the clora_changed field. This field | |||
| by the server to provide additional context for the reason why the | is used by the server to provide additional context for the reason | |||
| layout is being recalled. A FALSE value for clora_changed indicates | why the layout is being recalled. A FALSE value for clora_changed | |||
| that no change in the layout is expected and the client may write | indicates that no change in the layout is expected and the client may | |||
| modified data to the storage devices involved; this must be done | write modified data to the storage devices involved; this must be | |||
| prior to returning the layout via LAYOUTRETURN. A TRUE value for | done prior to returning the layout via LAYOUTRETURN. A TRUE value | |||
| clora_changed indicates that the server is changing the layout. | for clora_changed indicates that the server is changing the layout. | |||
| Examples of layout changes and reasons for a TRUE indication are: | Examples of layout changes and reasons for a TRUE indication are: the | |||
| metadata server is restriping the file or a permanent error has | metadata server is restriping the file or a permanent error has | |||
| occurred on a storage device and the metadata server would like to | occurred on a storage device and the metadata server would like to | |||
| provide a new layout for the file. Therefore, a clora_changed value | provide a new layout for the file. Therefore, a clora_changed value | |||
| of TRUE indicates some level of change for the layout and the client | of TRUE indicates some level of change for the layout and the client | |||
| SHOULD NOT write and commit modified data to the storage devices. In | SHOULD NOT write and commit modified data to the storage devices. In | |||
| this case, the client writes and commits data through the metadata | this case, the client writes and commits data through the metadata | |||
| server. | server. | |||
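The effect of clora_changed on where the client sends modified data can be sketched as follows. This is a deliberately small illustration: the WritePath enum and the function name are hypothetical, and because the text says "may" and "SHOULD NOT", the result is the preferred path rather than a mandatory one.

```python
from enum import Enum


class WritePath(Enum):
    """Hypothetical labels for the two ways a client can flush modified data."""
    STORAGE_DEVICES = "storage devices"
    METADATA_SERVER = "metadata server"


def write_path_for_recall(clora_changed: bool) -> WritePath:
    # TRUE: the layout is changing, so data goes through the metadata server.
    if clora_changed:
        return WritePath.METADATA_SERVER
    # FALSE: no layout change is expected; the client may still use the
    # storage devices, provided it finishes before sending LAYOUTRETURN.
    return WritePath.STORAGE_DEVICES


print(write_path_for_recall(True).value)
print(write_path_for_recall(False).value)
```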
| See Section 12.5.3 for a description of how the lor_stateid field in | See Section 12.5.3 for a description of how the lor_stateid field in | |||
| the arguments is to be constructed. Note that the "seqid" field of | the arguments is to be constructed. Note that the "seqid" field of | |||
| lor_stateid MUST NOT be zero. See Section 8.2, Section 12.5.3, and | lor_stateid MUST NOT be zero. See Section 8.2, Section 12.5.3, and | |||
| Section 12.5.5.2 for a further discussion and requirements. | Section 12.5.5.2 for a further discussion and requirements. | |||
| 20.3.4. IMPLEMENTATION | 20.3.4. IMPLEMENTATION | |||
| The client's processing for CB_LAYOUTRECALL is similar to CB_RECALL | The client's processing for CB_LAYOUTRECALL is similar to CB_RECALL | |||
| (recall of file delegations) in that straightforward processing of | (recall of file delegations) in that the client responds to the | |||
| the layout recall done and the client responds to the request before | request before actually returning layouts via the LAYOUTRETURN | |||
| actually returning layouts with the LAYOUTRETURN operation. While | operation. While the client responds to the CB_LAYOUTRECALL | |||
| the client responds to the CB_LAYOUTRECALL immediately, the operation | immediately, the operation is not considered complete (i.e. | |||
| is not considered complete (i.e. considered pending) until all | considered pending) until all affected layouts are returned to the | |||
| affected layouts are returned to the server with the LAYOUTRETURN | server via the LAYOUTRETURN operation. | |||
| operation. | ||||
| Before returning the layout to the server with LAYOUTRETURN, the | Before returning the layout to the server via LAYOUTRETURN, the | |||
| client should wait for the response from in-process or in-flight | client should wait for the response from in-process or in-flight | |||
| READ, WRITE, or COMMIT operations that use the recalled layout. | READ, WRITE, or COMMIT operations that use the recalled layout. | |||
| If the client is holding modified data which is effected by a | If the client is holding modified data which is affected by a | |||
| recalled layout, the client has various options for writing the data | recalled layout, the client has various options for writing the data | |||
| to the server. As always, the client may write the data through the | to the server. As always, the client may write the data through the | |||
| metadata server. In fact, the client may not have a choice other | metadata server. In fact, the client may not have a choice other | |||
| than writing to the metadata server when the clora_changed argument | than writing to the metadata server when the clora_changed argument | |||
| is TRUE and a new layout is unavailable from the server. However, | is TRUE and a new layout is unavailable from the server. However, | |||
| the client may be able to write the modified data to the storage | the client may be able to write the modified data to the storage | |||
| device if the clora_changed argument is FALSE; this needs to be done | device if the clora_changed argument is FALSE; this needs to be done | |||
| before returning the layout with LAYOUTRETURN. If the client were to | before returning the layout via LAYOUTRETURN. If the client were to | |||
| obtain a new layout covering the modified data's range, then writing | obtain a new layout covering the modified data's range, then writing | |||
| to the storage devices is an available alternative. Note that before | to the storage devices is an available alternative. Note that before | |||
| obtaining a new layout, the client must first return the original | obtaining a new layout, the client must first return the original | |||
| layout. | layout. | |||
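
The choice between the two write paths described above can be sketched as a small decision function. This is a minimal illustration, not part of either draft: the names `WritePath`, `choose_write_path`, and `prefer_storage_devices` are hypothetical, and the sketch ignores the further option of returning the layout and obtaining a new one first.

```python
from enum import Enum


class WritePath(Enum):
    # Hypothetical labels for the two options the text describes.
    METADATA_SERVER = "write and commit through the metadata server"
    STORAGE_DEVICES = "write to the storage devices before LAYOUTRETURN"


def choose_write_path(clora_changed: bool,
                      prefer_storage_devices: bool = True) -> WritePath:
    """Pick where to flush modified data covered by a recalled layout."""
    if clora_changed:
        # The layout is changing: the client SHOULD NOT write and commit
        # modified data to the storage devices.
        return WritePath.METADATA_SERVER
    # The layout is unchanged: writing through the metadata server is always
    # allowed; writing to the storage devices is allowed as long as it
    # finishes before the layout is returned.
    if prefer_storage_devices:
        return WritePath.STORAGE_DEVICES
    return WritePath.METADATA_SERVER
```

For example, `choose_write_path(True)` yields `WritePath.METADATA_SERVER` regardless of the client's preference.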
| In the case of modified data being written while the layout is held, | In the case of modified data being written while the layout is held, | |||
| the client must use LAYOUTCOMMIT operations at the appropriate time; | the client must use LAYOUTCOMMIT operations at the appropriate time; | |||
| as required LAYOUTCOMMIT must be done before the LAYOUTRETURN. If a | as required LAYOUTCOMMIT must be done before the LAYOUTRETURN. If a | |||
| large amount of modified data is outstanding, the client may send | large amount of modified data is outstanding, the client may send | |||
| LAYOUTRETURNs for portions of the recalled layout; this allows the | LAYOUTRETURNs for portions of the recalled layout; this allows the | |||
| skipping to change at page 561, line 24 | skipping to change at page 565, line 24 | |||
| to clients about changes to delegated directories. The registration of | to clients about changes to delegated directories. The registration of | |||
| notifications for the directories occurs when the delegation is | notifications for the directories occurs when the delegation is | |||
| established using GET_DIR_DELEGATION. These notifications are sent | established using GET_DIR_DELEGATION. These notifications are sent | |||
| over the backchannel. The notification is sent once the original | over the backchannel. The notification is sent once the original | |||
| request has been processed on the server. The server will send an | request has been processed on the server. The server will send an | |||
| array of notifications for changes that might have occurred in the | array of notifications for changes that might have occurred in the | |||
| directory. The notifications are sent as a list of pairs of bitmaps | directory. The notifications are sent as a list of pairs of bitmaps | |||
| and values. See Section 3.3.7 for a description of how NFSv4.1 | and values. See Section 3.3.7 for a description of how NFSv4.1 | |||
| bitmaps work. | bitmaps work. | |||
| If the server has more notifications then can fit in the CB_COMPOUND | If the server has more notifications than can fit in the CB_COMPOUND | |||
| request, it SHOULD send a sequence of serial CB_COMPOUND requests so | request, it SHOULD send a sequence of serial CB_COMPOUND requests so | |||
| that the client's view of the directory does not become confused. | that the client's view of the directory does not become confused. | |||
| E.g. If the server indicates a file named "foo" is added, and that | E.g. If the server indicates a file named "foo" is added, and that | |||
| the file "foo" is removed, the order it which the client receives | the file "foo" is removed, the order in which the client receives | |||
| these notifications are processed needs to be the same as the order | these notifications needs to be the same as the order in which | |||
| in which corresponding operations occurred on the server. | corresponding operations occurred on the server. | |||
| If the client holding the delegation makes any changes in the | If the client holding the delegation makes any changes in the | |||
| directory that cause files or sub directories to be added or removed, | directory that cause files or sub directories to be added or removed, | |||
| the server will notify that client of the resulting change(s). If | the server will notify that client of the resulting change(s). If | |||
| the client holding the delegation is making attribute or cookie | the client holding the delegation is making attribute or cookie | |||
| verifier changes only, the server does not need to send notifications | verifier changes only, the server does not need to send notifications | |||
| to that client. The server will send the following information for | to that client. The server will send the following information for | |||
| each operation: | each operation: | |||
| NOTIFY4_ADD_ENTRY | NOTIFY4_ADD_ENTRY | |||
| The server will send information about the new directory entry | The server will send information about the new directory entry | |||
| being created along with the cookie for that entry. The entry | being created along with the cookie for that entry. The entry | |||
| information (data type notify_add4) includes the component name of | information (data type notify_add4) includes the component name of | |||
| the entry and attributes. The server will send this type of entry | the entry and attributes. The server will send this type of entry | |||
| when a file is actually being created, when an entry is being | when a file is actually being created, when an entry is being | |||
| added to a directory as a result of a rename across directories | added to a directory as a result of a rename across directories | |||
| (see below), and when a hard link is being created to an existing | (see below), and when a hard link is being created to an existing | |||
| file. If this entry is added to the end of the directory, the | file. If this entry is added to the end of the directory, the | |||
| server will set the nad_last_entry flag to true. If the file is | server will set the nad_last_entry flag to TRUE. If the file is | |||
| added such that there is at least one entry before it, the server | added such that there is at least one entry before it, the server | |||
| will also return the previous entry information (nad_prev_entry, a | will also return the previous entry information (nad_prev_entry, a | |||
| variable length array of up to one element. If the array is of | variable length array of up to one element. If the array is of | |||
| zero length, there is no previous entry), along with its cookie. | zero length, there is no previous entry), along with its cookie. | |||
| This is to help clients find the right location in their DNLC or | This is to help clients find the right location in their file name | |||
| directory caches where this entry should be cached. If the new | caches and directory caches where this entry should be cached. If | |||
| entry's cookie is available, it will be in nad_new_entry_cookie | the new entry's cookie is available, it will be in the | |||
| (another variable length array of up to one element). If the | nad_new_entry_cookie (another variable length array of up to one | |||
| addition of the entry causes another entry to be deleted (which | element) field. If the addition of the entry causes another entry | |||
| can only happen in the rename case) atomically with the addition, | to be deleted (which can only happen in the rename case) | |||
| then information on this entry is reported in nad_old_entry. | atomically with the addition, then information on this entry is | |||
| reported in nad_old_entry. | ||||
| NOTIFY4_REMOVE_ENTRY | NOTIFY4_REMOVE_ENTRY | |||
| The server will send information about the directory entry being | The server will send information about the directory entry being | |||
| deleted. The server will also send the cookie value for the | deleted. The server will also send the cookie value for the | |||
| deleted entry so that clients can get to the cached information | deleted entry so that clients can get to the cached information | |||
| for this entry. | for this entry. | |||
| NOTIFY4_RENAME_ENTRY | NOTIFY4_RENAME_ENTRY | |||
| The server will send information about both the old entry and the | The server will send information about both the old entry and the | |||
| new entry. This includes name and attributes for each entry. In | new entry. This includes name and attributes for each entry. In | |||
| skipping to change at page 563, line 32 | skipping to change at page 567, line 32 | |||
| 20.5.2. RESULT | 20.5.2. RESULT | |||
| struct CB_PUSH_DELEG4res { | struct CB_PUSH_DELEG4res { | |||
| nfsstat4 cpdr_status; | nfsstat4 cpdr_status; | |||
| }; | }; | |||
| 20.5.3. DESCRIPTION | 20.5.3. DESCRIPTION | |||
| CB_PUSH_DELEG is used by the server to both signal to the client that | CB_PUSH_DELEG is used by the server to both signal to the client that | |||
| the delegation it wants is available and to simultaneously offer the | the delegation it wants (previously indicated via a want established | |||
| delegation to the client. The client has the choice of accepting the | from an OPEN or WANT_DELEGATION operation) is available and to | |||
| delegation by returning NFS4_OK to the server, delaying the decision | simultaneously offer the delegation to the client. The client has | |||
| to accept the offered delegation by returning NFS4ERR_DELAY or | the choice of accepting the delegation by returning NFS4_OK to the | |||
| permanently rejecting the offer of the delegation by returning | server, delaying the decision to accept the offered delegation by | |||
| NFS4ERR_REJECT_DELEG. When a delegation is rejected in this fashion, | returning NFS4ERR_DELAY or permanently rejecting the offer of the | |||
| the want previously established is permanently deleted. | delegation by returning NFS4ERR_REJECT_DELEG. When a delegation is | |||
| rejected in this fashion, the want previously established is | ||||
| The server MUST send in cpda_delegation a delegation which satisfies | permanently deleted and the delegation is subject to acquisition by | |||
| a request made in an OPEN or WANT_DELEGATION operation. | another client. | |||
| 20.5.4. IMPLEMENTATION | 20.5.4. IMPLEMENTATION | |||
| If the client does return NFS4ERR_DELAY and there is a conflicting | If the client does return NFS4ERR_DELAY and there is a conflicting | |||
| delegation request, the server MAY process it at the expense of the | delegation request, the server MAY process it at the expense of the | |||
| client that returned NFS4ERR_DELAY. The client's want will typically | client that returned NFS4ERR_DELAY. The client's want will typically | |||
| not be cancelled, but MAY be processed behind other delegation requests | not be cancelled, but MAY be processed behind other delegation requests | |||
| or registered wants. | or registered wants. | |||
| When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, or | When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, or | |||
| NFS4ERR_REJECT_DELEG, the want remains pending, although servers may | NFS4ERR_REJECT_DELEG, the want remains pending, although servers may | |||
| decide to cancel the want by sending a CB_WANTS_CANCELLED. | decide to cancel the want by sending a CB_WANTS_CANCELLED. | |||
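
As an illustration of the reply handling described in 20.5.3 and 20.5.4, the following sketch maps a CB_PUSH_DELEG reply status to what happens to the client's want on the server. The enum `WantState` and the function `want_after_push_deleg` are hypothetical names introduced here; status codes are represented by their symbolic names rather than numeric values.

```python
from enum import Enum


class WantState(Enum):
    # Hypothetical server-side outcomes for a want after CB_PUSH_DELEG.
    GRANTED = "delegation accepted by the client"
    DELETED = "want permanently deleted"
    PENDING_MAY_BE_REQUEUED = "want kept, may be served after other requests"
    PENDING = "want kept, server may cancel via CB_WANTS_CANCELLED"


def want_after_push_deleg(status: str) -> WantState:
    """Map the client's CB_PUSH_DELEG reply status to the fate of its want."""
    if status == "NFS4_OK":
        return WantState.GRANTED
    if status == "NFS4ERR_REJECT_DELEG":
        # Permanent rejection: the want is deleted and the delegation
        # becomes available to other clients.
        return WantState.DELETED
    if status == "NFS4ERR_DELAY":
        # The want is typically not cancelled, but a conflicting request
        # MAY be processed ahead of it.
        return WantState.PENDING_MAY_BE_REQUEUED
    # Any other status: the want remains pending.
    return WantState.PENDING
```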
| 20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations | 20.6. Operation 8: CB_RECALL_ANY - Keep any N recallable objects | |||
| Notify client to return delegation and keep N of them. | Notify client to return all but N recallable objects. | |||
| 20.6.1. ARGUMENT | 20.6.1. ARGUMENT | |||
| const RCA4_TYPE_MASK_RDATA_DLG = 0; | const RCA4_TYPE_MASK_RDATA_DLG = 0; | |||
| const RCA4_TYPE_MASK_WDATA_DLG = 1; | const RCA4_TYPE_MASK_WDATA_DLG = 1; | |||
| const RCA4_TYPE_MASK_DIR_DLG = 2; | const RCA4_TYPE_MASK_DIR_DLG = 2; | |||
| const RCA4_TYPE_MASK_FILE_LAYOUT = 3; | const RCA4_TYPE_MASK_FILE_LAYOUT = 3; | |||
| const RCA4_TYPE_MASK_BLK_LAYOUT_MIN = 4; | const RCA4_TYPE_MASK_BLK_LAYOUT = 4; | |||
| const RCA4_TYPE_MASK_BLK_LAYOUT_MAX = 7; | ||||
| const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8; | const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8; | |||
| const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 11; | const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9; | |||
| const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12; | const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12; | |||
| const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15; | const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15; | |||
| struct CB_RECALL_ANY4args { | struct CB_RECALL_ANY4args { | |||
| uint32_t craa_objects_to_keep; | uint32_t craa_objects_to_keep; | |||
| bitmap4 craa_type_mask; | bitmap4 craa_type_mask; | |||
| }; | }; | |||
| 20.6.2. RESULT | 20.6.2. RESULT | |||
| skipping to change at page 565, line 23 | skipping to change at page 569, line 23 | |||
| resource pools for layouts and for delegations, or further separate | resource pools for layouts and for delegations, or further separate | |||
| resources by types of delegations. | resources by types of delegations. | |||
| When a given resource pool is over-utilized, the server can send a | When a given resource pool is over-utilized, the server can send a | |||
| CB_RECALL_ANY to clients holding recallable objects of the types | CB_RECALL_ANY to clients holding recallable objects of the types | |||
| involved, allowing it to keep a certain number of such objects and | involved, allowing it to keep a certain number of such objects and | |||
| return any excess. A mask specifies which types of objects are to be | return any excess. A mask specifies which types of objects are to be | |||
| limited. The client chooses, based on its own knowledge of current | limited. The client chooses, based on its own knowledge of current | |||
| usefulness, which of the objects in that class should be returned. | usefulness, which of the objects in that class should be returned. | |||
| For NFSv4.1, a number of bits are defined. For some of these, ranges | A number of bits are defined. For some of these, ranges are defined | |||
| are defined and it is up to the definition of the storage protocol to | and it is up to the definition of the storage protocol to specify how | |||
| specify how these are to be used. There are ranges for blocks-based | these are to be used. There are ranges reserved for object-based | |||
| storage protocols, for object-based storage protocols and a reserved | storage protocols and for other experimental storage protocols. An | |||
| range for other experimental storage protocols. The RFC defining | RFC defining such a storage protocol needs to specify how particular | |||
| such a storage protocol needs to specify how particular bits within | bits within its range are to be used. For example, it may specify a | |||
| its range are to be used. For example, it may specify a mapping | mapping between attributes of the layout (read vs. write, size of | |||
| between attributes of the layout (read vs. write, size of area) and | area) and the bit to be used or it may define a field in the layout | |||
| the bit to be used or it may define a field in the layout where the | where the associated bit position is made available by the server to | |||
| associated bit position is made available by the server to the | the client. | |||
| client. | ||||
| When an undefined bit is set in the type mask, NFS4ERR_INVAL should | RCA4_TYPE_MASK_RDATA_DLG | |||
| be returned. If a client does not support an object of the specified | ||||
| type, if the bit is defined, NFS4ERR_INVAL should not be returned. | The client is to return read delegations on non-directory file | |||
| Future minor versions of NFSv4 may expand the set of valid type mask | objects. | |||
| bits. | ||||
| RCA4_TYPE_MASK_WDATA_DLG | ||||
| The client is to return write delegations on regular file objects. | ||||
| RCA4_TYPE_MASK_DIR_DLG | ||||
| The client is to return directory delegations. | ||||
| RCA4_TYPE_MASK_FILE_LAYOUT | ||||
| The client is to return layouts of type LAYOUT4_NFSV4_1_FILES. | ||||
| RCA4_TYPE_MASK_BLK_LAYOUT | ||||
| See [31] for a description. | ||||
| RCA4_TYPE_MASK_OBJ_LAYOUT_MIN to RCA4_TYPE_MASK_OBJ_LAYOUT_MAX | ||||
| See [30] for a description. | ||||
| RCA4_TYPE_MASK_OTHER_LAYOUT_MIN to RCA4_TYPE_MASK_OTHER_LAYOUT_MAX | ||||
| This range is reserved for telling the client to recall layouts of | ||||
| experimental or site specific layout types (see Section 3.3.13). | ||||
| When a bit is set in the type mask that corresponds to an undefined | ||||
| type of recallable object, NFS4ERR_INVAL MUST be returned. When a | ||||
| bit is set that corresponds to a defined type of object, but the | ||||
| client does not support an object of the type, NFS4ERR_INVAL MUST NOT | ||||
| be returned. Future minor versions of NFSv4 may expand the set of | ||||
| valid type mask bits. | ||||
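
The rule above can be sketched as a client-side check of craa_type_mask. This is a rough illustration under stated assumptions: the bit positions follow the -23 column of the ARGUMENT definition, the mask is modelled as a list of 32-bit words with bit n stored in word n / 32, and the helper names `bits_set` and `check_type_mask` are hypothetical.

```python
NFS4_OK = 0
NFS4ERR_INVAL = 22

# Bit positions defined in the -23 text; every other bit is undefined.
DEFINED_BITS = (
    {0, 1, 2, 3, 4}        # RDATA_DLG, WDATA_DLG, DIR_DLG, FILE_LAYOUT, BLK_LAYOUT
    | set(range(8, 10))    # OBJ_LAYOUT_MIN .. OBJ_LAYOUT_MAX
    | set(range(12, 16))   # OTHER_LAYOUT_MIN .. OTHER_LAYOUT_MAX
)


def bits_set(bitmap_words):
    """Yield the bit numbers set in a bitmap4-style list of 32-bit words."""
    for word_index, word in enumerate(bitmap_words):
        for bit in range(32):
            if word & (1 << bit):
                yield word_index * 32 + bit


def check_type_mask(bitmap_words, supported_bits):
    """Return (status, bits the client will act on) for a type mask."""
    requested = set(bits_set(bitmap_words))
    if requested - DEFINED_BITS:
        # A bit for an undefined type of recallable object is set.
        return NFS4ERR_INVAL, set()
    # Defined but unsupported types are simply ignored, never an error.
    return NFS4_OK, requested & set(supported_bits)
```

A client that supports only read delegations would accept a mask with bit 3 set and act on nothing, while a mask with bit 5 set would be rejected.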
| CB_RECALL_ANY specifies a count of objects that the client may keep | CB_RECALL_ANY specifies a count of objects that the client may keep | |||
| as opposed to a count that the client must return. This is to avoid | as opposed to a count that the client must return. This is to avoid | |||
| potential race between a CB_RECALL_ANY that had a count of objects to | potential race between a CB_RECALL_ANY that had a count of objects to | |||
| free with a set of client-originated operations to return layouts or | free with a set of client-originated operations to return layouts or | |||
| delegations. As a result of the race, the client and server would | delegations. As a result of the race, the client and server would | |||
| have differing ideas as to how many objects to return. Hence the | have differing ideas as to how many objects to return. Hence the | |||
| client could mistakenly free too many. | client could mistakenly free too many. | |||
| If resource demands prompt it, the server may send another | If resource demands prompt it, the server may send another | |||
| skipping to change at page 567, line 18 | skipping to change at page 571, line 46 | |||
| nfsstat4 croa_status; | nfsstat4 croa_status; | |||
| }; | }; | |||
| 20.7.3. DESCRIPTION | 20.7.3. DESCRIPTION | |||
| CB_RECALLABLE_OBJ_AVAIL is used by the server to signal the client | CB_RECALLABLE_OBJ_AVAIL is used by the server to signal the client | |||
| that the server has resources to grant recallable objects that might | that the server has resources to grant recallable objects that might | |||
| previously have been denied by OPEN, WANT_DELEGATION, GET_DIR_DELEG, | previously have been denied by OPEN, WANT_DELEGATION, GET_DIR_DELEG, | |||
| or LAYOUTGET. | or LAYOUTGET. | |||
| The argument, objects_to_keep means the total number of recallable | The argument craa_objects_to_keep means the total number of | |||
| objects of the types indicated in the argument type_mask that the | recallable objects of the types indicated in the argument type_mask | |||
| server believes it can allow the client to have, including the number | that the server believes it can allow the client to have, including | |||
| of such objects the client already has. A client that tries to | the number of such objects the client already has. A client that | |||
| acquire more recallable objects than the server informs it can have | tries to acquire more recallable objects than the server informs it | |||
| runs the risk of having objects recalled. | can have runs the risk of having objects recalled. | |||
| The server is not obligated to reserve the difference between the | ||||
| number of the objects the client currently has and the value of | ||||
| craa_objects_to_keep, nor does delaying the reply to | ||||
| CB_RECALLABLE_OBJ_AVAIL prevent the server from using the resources | ||||
| of the recallable objects for another purpose. Indeed, if a client | ||||
| responds slowly to CB_RECALLABLE_OBJ_AVAIL, the server might | ||||
| interpret the client as having reduced capability to manage | ||||
| recallable objects, and so cancel or reduce any reservation it is | ||||
| maintaining on behalf of the client. Thus if the client desires to | ||||
| acquire more recallable objects, it needs to reply quickly to | ||||
| CB_RECALLABLE_OBJ_AVAIL, and then send the appropriate operations to | ||||
| acquire recallable objects. | ||||
| 20.8. Operation 10: CB_RECALL_SLOT - change flow control limits | 20.8. Operation 10: CB_RECALL_SLOT - change flow control limits | |||
| Change flow control limits | Change flow control limits | |||
| 20.8.1. ARGUMENT | 20.8.1. ARGUMENT | |||
| struct CB_RECALL_SLOT4args { | struct CB_RECALL_SLOT4args { | |||
| slotid4 rsa_target_highest_slotid; | slotid4 rsa_target_highest_slotid; | |||
| }; | }; | |||
| skipping to change at page 567, line 45 | skipping to change at page 572, line 40 | |||
| 20.8.2. RESULT | 20.8.2. RESULT | |||
| struct CB_RECALL_SLOT4res { | struct CB_RECALL_SLOT4res { | |||
| nfsstat4 rsr_status; | nfsstat4 rsr_status; | |||
| }; | }; | |||
| 20.8.3. DESCRIPTION | 20.8.3. DESCRIPTION | |||
| The CB_RECALL_SLOT operation requests the client to return session | The CB_RECALL_SLOT operation requests the client to return session | |||
| slots, and if applicable, transport credits (e.g. RDMA credits for | slots, and if applicable, transport credits (e.g. RDMA credits for | |||
| connections associated with the operations channel) to the server. | connections associated with the operations channel) of the session's | |||
| CB_RECALL_SLOT specifies rsa_target_highest_slotid, the target | fore channel. CB_RECALL_SLOT specifies rsa_target_highest_slotid, | |||
| highest_slot the server wants for the session. The client, should | the value of the target highest slot id the server wants for the | |||
| then work toward reducing the highest_slot to the target. | session. The client MUST then progress toward reducing the session's | |||
| highest slot id to the target value. | ||||
| If the session has only non-RDMA connections associated with its | If the session has only non-RDMA connections associated with its | |||
| operations channel, then the client need only wait for all | operations channel, then the client need only wait for all | |||
| outstanding requests with a slotid > rsa_target_highest_slotid to | outstanding requests with a slotid > rsa_target_highest_slotid to | |||
| complete, then send a single COMPOUND consisting of a single SEQUENCE | complete, then send a single COMPOUND consisting of a single SEQUENCE | |||
| operation, with the sa_highestslot field set to | operation, with the sa_highestslot field set to | |||
| rsa_target_highest_slotid. If there are RDMA-based connections | rsa_target_highest_slotid. If there are RDMA-based connections | |||
| associated with the operations channel, then the client needs to also send | associated with the operations channel, then the client needs to also send | |||
| enough zero-length RDMA Sends to take the total RDMA credit count to | enough zero-length RDMA Sends to take the total RDMA credit count to | |||
| rsa_target_highest_slotid + 1 or below. | rsa_target_highest_slotid + 1 or below. | |||
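
The client-side steps above can be sketched as a small planning function. This is only an illustration: `plan_slot_recall` and the keys of the returned dictionary are hypothetical, and the sketch reports how many RDMA credits exceed the ceiling rather than modelling how the zero-length Sends are issued.

```python
def plan_slot_recall(outstanding_slot_ids, rsa_target_highest_slotid,
                     rdma_credits=None):
    """Summarize what a client must do to honour a CB_RECALL_SLOT."""
    target = rsa_target_highest_slotid
    plan = {
        # Requests on these slots must complete before the client replies
        # with the reduced highest slot.
        "wait_for_slots": sorted(s for s in outstanding_slot_ids if s > target),
        # Value for the SEQUENCE operation sent once those requests finish.
        "sa_highestslot": target,
    }
    if rdma_credits is not None:
        # With RDMA, the total credit count must come down to target + 1
        # or below.
        plan["credits_to_shed"] = max(0, rdma_credits - (target + 1))
    return plan
```

For instance, with requests outstanding on slots 0, 3, 7 and 9 and a target of 4, the client waits for slots 7 and 9 and then sends SEQUENCE with sa_highestslot set to 4.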
| skipping to change at page 569, line 26 | skipping to change at page 574, line 26 | |||
| case NFS4_OK: | case NFS4_OK: | |||
| CB_SEQUENCE4resok csr_resok4; | CB_SEQUENCE4resok csr_resok4; | |||
| default: | default: | |||
| void; | void; | |||
| }; | }; | |||
| 20.9.3. DESCRIPTION | 20.9.3. DESCRIPTION | |||
| The CB_SEQUENCE operation is used to manage operational accounting | The CB_SEQUENCE operation is used to manage operational accounting | |||
| for the backchannel of the session on which a request is sent. The | for the backchannel of the session on which a request is sent. The | |||
| contents include the session to which this request belongs, slot id | contents include the session id to which this request belongs, the | |||
| and sequence id used by the server to implement session request | slot id and sequence id used by the server to implement session | |||
| control and exactly once semantics, and exchanged slot maximums which | request control and exactly once semantics, and exchanged slot id | |||
| are used to adjust the size of the reply cache. This operation MUST | maxima which are used to adjust the size of the reply cache. This | |||
| appear once as the first operation in each CB_COMPOUND request or a | operation will appear once as the first operation in each CB_COMPOUND | |||
| protocol error must result. See Section 18.46.3 for a description of | request or a protocol error MUST result. See Section 18.46.3 for a | |||
| how slots are processed. | description of how slots are processed. | |||
| If csa_cachethis is TRUE, then the server is requesting that the | If csa_cachethis is TRUE, then the server is requesting that the | |||
| client cache the reply in the callback reply cache. The client MUST | client cache the reply in the callback reply cache. The client MUST | |||
| cache the reply (see Section 2.10.5.1.3). | cache the reply (see Section 2.10.5.1.3). | |||
| The csa_referring_call_lists array is the list of COMPOUND requests, | The csa_referring_call_lists array is the list of COMPOUND requests, | |||
| identified by sessionid, slot id and sequence id. These are requests | identified by sessionid, slot id and sequence id. These are requests | |||
| that the client previously sent to the server. These previous | that the client previously sent to the server. These previous | |||
| requests created state that some operation(s) in the same CB_COMPOUND | requests created state that some operation(s) in the same CB_COMPOUND | |||
| as the csa_referring_call_lists is identifying. A sessionid is | as the csa_referring_call_lists are identifying. A session id is | |||
| included because leased state is tied to a client ID, and a client ID | included because leased state is tied to a client ID, and a client ID | |||
| can have multiple sessions. See Section 2.10.5.3. | can have multiple sessions. See Section 2.10.5.3. | |||
| The value of csa_sequenceid argument relative to the cached sequence | The value of the csa_sequenceid argument relative to the cached | |||
| id on the slot falls into one of three cases. | sequence id on the slot falls into one of three cases. | |||
| o If the difference between csa_sequenceid and the client's cached | o If the difference between csa_sequenceid and the client's cached | |||
| sequence id at the slot id is two (2) or more, or if | sequence id at the slot id is two (2) or more, or if | |||
| csa_sequenceid is less than the cached sequence id (accounting for | csa_sequenceid is less than the cached sequence id (accounting for | |||
| wraparound of the unsigned sequence id value), then the client | wraparound of the unsigned sequence id value), then the client | |||
| MUST return NFS4ERR_SEQ_MISORDERED. | MUST return NFS4ERR_SEQ_MISORDERED. | |||
| o If csa_sequenceid and the cached sequence id are the same, this is | o If csa_sequenceid and the cached sequence id are the same, this is | |||
| a retry, and the client returns the CB_COMPOUND request's cached | a retry, and the client returns the CB_COMPOUND request's cached | |||
| reply. | reply. | |||
| skipping to change at page 570, line 36 | skipping to change at page 575, line 36 | |||
| id, cached reply) MUST NOT change. | id, cached reply) MUST NOT change. | |||
| The client returns two "highest_slotid" values: csr_highest_slotid, | The client returns two "highest_slotid" values: csr_highest_slotid, | |||
| and csr_target_highest_slotid. The former is the highest slot id the | and csr_target_highest_slotid. The former is the highest slot id the | |||
| client will accept in a future CB_SEQUENCE operation, and SHOULD NOT | client will accept in a future CB_SEQUENCE operation, and SHOULD NOT | |||
| be less than the value of csa_highest_slotid (but see | be less than the value of csa_highest_slotid (but see | |||
| Section 2.10.5.1 for an exception). The latter is the highest slot | Section 2.10.5.1 for an exception). The latter is the highest slot | |||
| id the client would prefer the server use on a future CB_SEQUENCE | id the client would prefer the server use on a future CB_SEQUENCE | |||
| operation. | operation. | |||
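
The comparison of csa_sequenceid against the cached sequence id, described earlier in this section, can be sketched with 32-bit modular arithmetic. The function name `classify_cb_sequence` and the returned labels are hypothetical. The excerpt shows only the misordered and retry cases; the third case (a value exactly one greater than the cached id is a new request) is assumed here.

```python
def classify_cb_sequence(csa_sequenceid: int, cached_sequenceid: int) -> str:
    """Classify a CB_SEQUENCE sequence id against the slot's cached value."""
    # Unsigned 32-bit difference, so wraparound from 0xFFFFFFFF to 0 counts
    # as a step of one.
    delta = (csa_sequenceid - cached_sequenceid) % 2**32
    if delta == 0:
        # Same sequence id: a retry, answered from the cached reply.
        return "RETRY"
    if delta == 1:
        # Assumed third case: the next request on this slot.
        return "NEW_REQUEST"
    # Two or more ahead, or behind the cached value.
    return "NFS4ERR_SEQ_MISORDERED"
```

Treating the difference as unsigned means a sequence id that is lower than the cached one produces a very large delta, so it lands in the misordered branch without a separate comparison.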
| 20.9.4. IMPLEMENTATION | ||||
| 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation | 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation | |||
| Wants | Wants | |||
| Retracts promise to signal delegation availability. | Retracts promise to signal delegation availability. | |||
| 20.10.1. ARGUMENT | 20.10.1. ARGUMENT | |||
| struct CB_WANTS_CANCELLED4args { | struct CB_WANTS_CANCELLED4args { | |||
| bool cwca_contended_wants_cancelled; | bool cwca_contended_wants_cancelled; | |||
| bool cwca_resourced_wants_cancelled; | bool cwca_resourced_wants_cancelled; | |||
| skipping to change at page 572, line 13 | skipping to change at page 577, line 13 | |||
| }; | }; | |||
| 20.11.2. RESULT | 20.11.2. RESULT | |||
| struct CB_NOTIFY_LOCK4res { | struct CB_NOTIFY_LOCK4res { | |||
| nfsstat4 cnlr_status; | nfsstat4 cnlr_status; | |||
| }; | }; | |||
| 20.11.3. DESCRIPTION | 20.11.3. DESCRIPTION | |||
| The server can use this operation to indicate that a lock for the | The server can use this operation to indicate that a byte-range lock | |||
| given file and lock-owner, previously requested by the client via an | for the given file and lock-owner, previously requested by the client | |||
| unsuccessful LOCK request, might be available. | via an unsuccessful LOCK request, might be available. | |||
| This callback is meant to be used by servers to help reduce the | This callback is meant to be used by servers to help reduce the | |||
| latency of blocking locks in the case where they recognize that a | latency of blocking locks in the case where they recognize that a | |||
| client which has been polling for a blocking lock may now be able to | client which has been polling for a blocking lock may now be able to | |||
| acquire the lock. If the server supports this callback for a given | acquire the lock. If the server supports this callback for a given | |||
| file, it MUST set the OPEN4_RESULT_MAY_NOTIFY_LOCK flag when | file, it MUST set the OPEN4_RESULT_MAY_NOTIFY_LOCK flag when | |||
| responding to successful opens for that file. This does not commit | responding to successful opens for that file. This does not commit | |||
| the server to use of CB_NOTIFY_LOCK, but the client may use this as a | the server to the use of CB_NOTIFY_LOCK, but the client may use this | |||
| hint to decide how frequently to poll for locks derived from that | as a hint to decide how frequently to poll for locks derived from | |||
| open. | that open. | |||
| If an OPEN operation results in an upgrade, in which the stateid | If an OPEN operation results in an upgrade, in which the stateid | |||
| returned has an "other" value matching that of a stateid already | returned has an "other" value matching that of a stateid already | |||
| allocated, with a new "seqid" indicating a change in the lock being | allocated, with a new "seqid" indicating a change in the lock being | |||
| represented, then the value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag | represented, then the value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag | |||
| when responding to that new OPEN controls handling from that point | when responding to that new OPEN controls handling from that point | |||
| going forward. When parallel OPENs are done on the same file and | going forward. When parallel OPENs are done on the same file and | |||
| open-owner, the ordering of the "seqid" field of the returned stateid | open-owner, the ordering of the "seqid" field of the returned stateid | |||
| (subject to wraparound) is to be used to select the controlling | (subject to wraparound) is to be used to select the controlling | |||
| value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag. | value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag. | |||
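The selection rule for parallel OPENs described above can be sketched in C. This is a minimal illustration rather than anything taken from the drafts: struct open_reply and both function names are hypothetical, and the comparison assumes the seqids being compared are less than 2^31 apart, so that wraparound can be handled with modular subtraction.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side record of one OPEN reply
 * (same file, same open-owner, same stateid "other" value). */
struct open_reply {
    uint32_t seqid;            /* "seqid" of the returned stateid */
    bool     may_notify_lock;  /* OPEN4_RESULT_MAY_NOTIFY_LOCK was set */
};

/* True if a is later than b when seqids are treated as a circular
 * 32-bit sequence. Assumes the values are less than 2^31 apart. */
static bool seqid_is_later(uint32_t a, uint32_t b)
{
    return a != b && (uint32_t)(a - b) < 0x80000000u;
}

/* Returns the flag value carried by the reply with the latest seqid.
 * The caller must pass n >= 1. */
static bool controlling_may_notify_lock(const struct open_reply *replies,
                                        size_t n)
{
    size_t latest = 0;
    for (size_t i = 1; i < n; i++) {
        if (seqid_is_later(replies[i].seqid, replies[latest].seqid))
            latest = i;
    }
    return replies[latest].may_notify_lock;
}
```

A client that receives replies out of order can store each one as it arrives and call controlling_may_notify_lock() over the stored set; arrival order does not affect the result.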
| 20.11.4. IMPLEMENTATION | 20.11.4. IMPLEMENTATION | |||
| The server must not grant the lock to the client unless and until it | The server MUST NOT grant the lock to the client unless and until it | |||
| receives an actual lock request from the client. Similarly, the | receives an actual LOCK request from the client. Similarly, the | |||
| client receiving this callback cannot assume that it now has the | client receiving this callback cannot assume that it now has the | |||
| lock, or that a subsequent request for the lock will be successful. | lock, or that a subsequent LOCK request for the lock will be | |||
| successful. | ||||
| The server is not required to implement this callback, and even if it | The server is not required to implement this callback, and even if it | |||
| does, it is not required to use it in any particular case. Therefore | does, it is not required to use it in any particular case. Therefore | |||
| the client must still rely on polling for blocking locks, as | the client must still rely on polling for blocking locks, as | |||
| described in Section 9.6. | described in Section 9.6. | |||
| Similarly, the client is not required to implement this callback, and | Similarly, the client is not required to implement this callback, and | |||
| even if it does, is still free to ignore it. Therefore the server MUST | even if it does, is still free to ignore it. Therefore the server MUST | |||
| NOT assume that the client will act based on the callback. | NOT assume that the client will act based on the callback. | |||
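One way a client might combine the polling requirement with the CB_NOTIFY_LOCK hint is sketched below. All names and the two polling intervals are hypothetical; the drafts do not prescribe any values. The point of the sketch is that the callback only triggers an early LOCK retry and never moves the waiter into the "held" state by itself.

```c
#include <stdbool.h>

/* Hypothetical states of a client waiting for a blocking lock. */
enum lock_wait_state { LW_POLLING, LW_RETRY_NOW, LW_HELD };

struct lock_waiter {
    enum lock_wait_state state;
    bool may_notify_lock;   /* OPEN4_RESULT_MAY_NOTIFY_LOCK seen on OPEN */
};

/* Illustrative intervals only: poll less often when the server has
 * hinted that it may send CB_NOTIFY_LOCK. */
#define POLL_MS_NO_HINT   1000u
#define POLL_MS_WITH_HINT 10000u

/* CB_NOTIFY_LOCK arrived: the lock might be available, so retry soon.
 * The lock is NOT considered held until a LOCK request succeeds. */
static void on_cb_notify_lock(struct lock_waiter *w)
{
    if (w->state == LW_POLLING)
        w->state = LW_RETRY_NOW;
}

/* Result of an actual LOCK request sent to the server. */
static void on_lock_reply(struct lock_waiter *w, bool granted)
{
    w->state = granted ? LW_HELD : LW_POLLING;
}

/* Delay before the next LOCK attempt; only meaningful while waiting. */
static unsigned next_lock_attempt_delay_ms(const struct lock_waiter *w)
{
    if (w->state == LW_RETRY_NOW)
        return 0;
    return w->may_notify_lock ? POLL_MS_WITH_HINT : POLL_MS_NO_HINT;
}
```

Because the waiter falls back to LW_POLLING whenever a LOCK request fails, the client keeps polling even if the server never sends the callback.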
| skipping to change at page 573, line 46 | skipping to change at page 578, line 47 | |||
| 20.12.2. RESULT | 20.12.2. RESULT | |||
| struct CB_NOTIFY_DEVICEID4res { | struct CB_NOTIFY_DEVICEID4res { | |||
| nfsstat4 cndr_status; | nfsstat4 cndr_status; | |||
| }; | }; | |||
| 20.12.3. DESCRIPTION | 20.12.3. DESCRIPTION | |||
| The CB_NOTIFY_DEVICEID operation is used by the server to send | The CB_NOTIFY_DEVICEID operation is used by the server to send | |||
| notifications to clients about changes to pNFS device IDs. The | notifications to clients about changes to pNFS device IDs. The | |||
| registration of device ID notifications occurs when the device | registration of device ID notifications is optional and is done via | |||
| mapping stateid is established using GETDEVICEINFO or GETDEVICELIST. | GETDEVICEINFO. These notifications are sent over the backchannel | |||
| These notifications are sent over the backchannel. The notification | once the original request has been processed on the server. The | |||
| is sent once the original request has been processed on the server. | server will send an array of notifications, cnda_changes, as a list | |||
| The server will send an array of notifications, cnda_changes, as a | of pairs of bitmaps and values. See Section 3.3.7 for a description | |||
| list of pairs of bitmaps and values. See Section 3.3.7 for a | of how NFSv4.1 bitmaps work. | |||
| description of how NFSv4.1 bitmaps work. | ||||
| As with CB_NOTIFY (Section 20.4.3), it is possible the server has | As with CB_NOTIFY (Section 20.4.3), it is possible the server has | |||
| more notifications than can fit in a CB_COMPOUND, thus requiring | more notifications than can fit in a CB_COMPOUND, thus requiring | |||
| multiple CB_COMPOUNDs. Unlike CB_NOTIFY, serialization is not an | multiple CB_COMPOUNDs. Unlike CB_NOTIFY, serialization is not an | |||
| issue because unlike directory entries, device IDs cannot be re-used | issue because unlike directory entries, device IDs cannot be re-used | |||
| after being deleted (Section 12.2.10). | after being deleted (Section 12.2.10). | |||
| All device ID notifications contain a device ID and a layout type. | All device ID notifications contain a device ID and a layout type. | |||
| The layout type is necessary because two different layout types can | The layout type is necessary because two different layout types can | |||
| share the same device ID, and the common device ID can have | share the same device ID, and the common device ID can have | |||
| completely different mappings for each layout type. | completely different mappings for each layout type. | |||
| The server will send the following notifications: | The server will send the following notifications: | |||
| NOTIFY_DEVICEID4_CHANGE | NOTIFY_DEVICEID4_CHANGE | |||
| A previously provided device ID to device address mapping has | A previously provided device ID to device address mapping has | |||
| changed and the client uses GETDEVICEINFO or GETDEVICELIST to | changed and the client uses GETDEVICEINFO to obtain the updated | |||
| obtain the updated mapping. The notification is encoded in a | mapping. The notification is encoded in a value of data type | |||
| value of data type notify_deviceid_change4. This data type also | notify_deviceid_change4. This data type also contains a boolean | |||
| contains a boolean field, ndc_immediate, which if TRUE indicates | field, ndc_immediate, which if TRUE indicates that the change will | |||
| that the change will be enforced immediately, and so the client | be enforced immediately, and so the client might not be able to | |||
| might not be able to complete any pending I/O to the device ID. | complete any pending I/O to the device ID. If ndc_immediate is | |||
| If ndc_immediate is FALSE, then for an indefinite time, the client | FALSE, then for an indefinite time, the client can complete | |||
| can complete pending I/O. After pending I/O is complete, the | pending I/O. After pending I/O is complete, the client SHOULD get | |||
| client SHOULD get the new device ID to device address mappings | the new device ID to device address mappings before issuing new | |||
| before issuing new I/O to the device ID. | I/O to the device ID. | |||
| NOTIFY4_DEVICEID_DELETE | NOTIFY4_DEVICEID_DELETE | |||
| Deletes a device ID from the mappings. This notification MUST NOT | Deletes a device ID from the mappings. This notification MUST NOT | |||
| be sent if the client has a layout that refers to the device ID. | be sent if the client has a layout that refers to the device ID. | |||
| In other words if the server is sending a delete device ID | In other words if the server is sending a delete device ID | |||
| notification, one of the following is true for layouts associated | notification, one of the following is true for layouts associated | |||
| with the layout type: | with the layout type: | |||
| * The client never had a layout referring to that device ID. | * The client never had a layout referring to that device ID. | |||
| skipping to change at page 575, line 23 | skipping to change at page 580, line 23 | |||
| /* | /* | |||
| * CB_ILLEGAL: Response for illegal operation numbers | * CB_ILLEGAL: Response for illegal operation numbers | |||
| */ | */ | |||
| struct CB_ILLEGAL4res { | struct CB_ILLEGAL4res { | |||
| nfsstat4 status; | nfsstat4 status; | |||
| }; | }; | |||
| 20.13.3. DESCRIPTION | 20.13.3. DESCRIPTION | |||
| This operation is a placeholder for encoding a result to handle the | This operation is a placeholder for encoding a result to handle the | |||
| case of the client sending an operation code within COMPOUND that is | case of the server sending an operation code within CB_COMPOUND that | |||
| not defined in the NFSv4.1 specification. See Section 16.2.3 for | is not defined in the NFSv4.1 specification. See Section 19.2.3 for | |||
| more details. | more details. | |||
| The status field of CB_ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL. | The status field of CB_ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL. | |||
| 20.13.4. IMPLEMENTATION | 20.13.4. IMPLEMENTATION | |||
| A server will probably not send an operation with code OP_CB_ILLEGAL | A server will probably not send an operation with code OP_CB_ILLEGAL | |||
| but if it does, the response will be CB_ILLEGAL4res just as it would | but if it does, the response will be CB_ILLEGAL4res just as it would | |||
| be with any other invalid operation code. Note that if the client | be with any other invalid operation code. Note that if the client | |||
| gets an illegal operation code that is not OP_ILLEGAL, and if the | gets an illegal operation code that is not OP_ILLEGAL, and if the | |||
| client checks for legal operation codes during the XDR decode phase, | client checks for legal operation codes during the XDR decode phase, | |||
| then the CB_ILLEGAL4res would not be returned. | then an instance of data type CB_ILLEGAL4res will not be returned. | |||
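The behaviour described for undefined callback operation codes can be sketched as follows. The struct and function names are hypothetical, and the numeric constants reflect my reading of the NFSv4.1 XDR (callback operations numbered 3 through 14, with OP_CB_ILLEGAL and NFS4ERR_OP_ILLEGAL both 10044); they should be checked against the XDR description before use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; verify against the NFSv4.1 XDR description. */
#define OP_CB_GETATTR           3u   /* lowest defined callback op */
#define OP_CB_NOTIFY_DEVICEID  14u   /* highest defined callback op */
#define OP_CB_ILLEGAL       10044u
#define NFS4ERR_OP_ILLEGAL  10044u

/* Hypothetical, simplified view of one result in a CB_COMPOUND reply. */
struct cb_result_header {
    uint32_t resop;    /* operation code placed in the reply */
    uint32_t status;   /* nfsstat4 value */
};

static bool cb_op_is_defined(uint32_t opcode)
{
    return opcode >= OP_CB_GETATTR && opcode <= OP_CB_NOTIFY_DEVICEID;
}

/* If opcode is not a defined callback operation (OP_CB_ILLEGAL itself
 * included), fill in a CB_ILLEGAL4res-style result and return true.
 * Otherwise leave *out untouched and return false. */
static bool encode_if_illegal(uint32_t opcode, struct cb_result_header *out)
{
    if (cb_op_is_defined(opcode))
        return false;
    out->resop = OP_CB_ILLEGAL;
    out->status = NFS4ERR_OP_ILLEGAL;
    return true;
}
```

A client that rejects unknown operation codes during XDR decoding would never reach this function, which corresponds to the case noted above in which no CB_ILLEGAL4res is returned.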
| 21. Security Considerations | 21. Security Considerations | |||
| NFS has historically used a model where, from an authentication | Historically the authentication model of NFS had the entire | |||
| perspective, the client was the entire machine, or at least the | machine being the NFS client, and the NFS server trusting the NFS | |||
| source network address of the machine. The NFS server relied on the | client to authenticate the end-user. The NFS server in turn shared | |||
| NFS client to make the proper authentication of the end-user. The | its files only to specific clients, as identified by the client's | |||
| NFS server in turn shared its files only to specific clients, as | source network address. Given this model, the AUTH_SYS RPC security | |||
| identified by the client's source network address. Given this model, | flavor simply identified the end-user using the client to the NFS | |||
| the AUTH_SYS RPC security flavor simply identified the end-user using | server. When processing NFS responses, the client ensured that the | |||
| the client to the NFS server. When processing NFS responses, the | responses came from the same network address and port number that the | |||
| client ensured that the responses came from the same network address | request was sent to. While such a model is easy to implement and | |||
| and port number that the request was sent to. While such a model is | simple to deploy and use, it is unsafe. Thus, NFSv4.1 | |||
| easy to implement and simple to deploy and use, it is certainly not a | implementations are REQUIRED to support a security model that uses | |||
| safe model. Thus, NFSv4.1 implementations are REQUIRED to support a | end to end authentication, where an end-user on a client mutually | |||
| security model that uses end to end authentication, where an end-user | authenticates (via cryptographic schemes that do not expose passwords | |||
| on a client mutually authenticates (via cryptographic schemes that do | or keys in the clear on the network) to a principal on an NFS server. | |||
| not expose passwords or keys in the clear on the network) to a | Consideration is also given to the integrity and privacy of NFS | |||
| principal on an NFS server. Consideration should also be given to | requests and responses. The issues of end to end mutual | |||
| the integrity and privacy of NFS requests and responses. The issues | authentication, integrity, and privacy are discussed | |||
| of end to end mutual authentication, integrity, and privacy are | in Section 2.2.1.1.1. | |||
| discussed in Section 2.2.1.1.1. | ||||
| Note that while NFSv4.1 mandates an end to end mutual authentication | Note that being REQUIRED to implement does not mean REQUIRED to use; | |||
| model, the "classic" model of machine authentication via network | AUTH_SYS can be used by NFSv4.1 clients and servers. However, | |||
| address checking and AUTH_SYS identification can still be supported | AUTH_SYS is merely an OPTIONAL security flavor in NFSv4.1, and so | |||
| with the caveat that the AUTH_SYS flavor is neither REQUIRED nor | interoperability via AUTH_SYS is not assured. | |||
| RECOMMENDED by this specification, and so interoperability via | ||||
| AUTH_SYS is not assured. | ||||
| For reasons of reduced administration overhead, better performance | For reasons of reduced administration overhead, better performance | |||
| and/or reduction of CPU utilization, users of NFSv4.1 implementations | and/or reduction of CPU utilization, users of NFSv4.1 implementations | |||
| may opt to not use security mechanisms that enable integrity | may opt to not use security mechanisms that enable integrity | |||
| protection on each remote procedure call and response. The use of | protection on each remote procedure call and response. The use of | |||
| mechanisms without integrity leaves the user vulnerable to an | mechanisms without integrity leaves the user vulnerable to an | |||
| attacker in the middle of the NFS client and server that modifies the | attacker in the middle of the NFS client and server that modifies the | |||
| RPC request and/or the response. While implementations are free to | RPC request and/or the response. While implementations are free to | |||
| provide the option to use weaker security mechanisms, there are three | provide the option to use weaker security mechanisms, there are three | |||
| operations in particular that warrant the implementation overriding | operations in particular that warrant the implementation overriding | |||
| user choices. | user choices. | |||
| The first two such operations are SECINFO SECINFO_NO_NAME. It is | o The first two such operations are SECINFO and SECINFO_NO_NAME. It | |||
| RECOMMENDED that the client send the either operation such that it is | is RECOMMENDED that the client send both operations such that they | |||
| protected with a security flavor that has integrity protection, such | are protected with a security flavor that has integrity protection, | |||
| as RPCSEC_GSS with either the rpc_gss_svc_integrity or | such as RPCSEC_GSS with either the rpc_gss_svc_integrity or | |||
| rpc_gss_svc_privacy service. Without integrity protection | rpc_gss_svc_privacy service. Without integrity protection | |||
| encapsulating SECINFO and SECINFO_NO_NAME and their results, an | encapsulating SECINFO and SECINFO_NO_NAME and their results, an | |||
| attacker in the middle could modify results such that the client | attacker in the middle could modify results such that the client | |||
| might select a weaker algorithm in the set allowed by the server, making | might select a weaker algorithm in the set allowed by the server, | |||
| the client and/or server vulnerable to further attacks. | making the client and/or server vulnerable to further attacks. | |||
| The second operation that should definitely use integrity protection | o The third operation that should definitely use integrity | |||
| is any GETATTR for the fs_locations attribute. The attack has two | protection is any GETATTR for the fs_locations and | |||
| steps. First the attacker modifies the unprotected results of some | fs_locations_info attributes. The attack has two steps. First | |||
| operation to return NFS4ERR_MOVED. Second, when the client follows | the attacker modifies the unprotected results of some operation to | |||
| up with a GETATTR for the fs_locations attribute, the attacker | return NFS4ERR_MOVED. Second, when the client follows up with a | |||
| modifies the results to cause the client to migrate its traffic to a | GETATTR for the fs_locations or fs_locations_info attributes, the | |||
| server controlled by the attacker. | attacker modifies the results to cause the client to migrate its | |||
| traffic to a server controlled by the attacker. | ||||
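The override of user choices for these three operations might look like the sketch below. It assumes RPCSEC_GSS is already the flavor in use, so only the service level needs raising; a client configured for AUTH_SYS would need to switch flavors, which is not shown. The rpc_gss_service values are those of RFC 2203, while enum sketch_op and the function names are hypothetical.

```c
#include <stdbool.h>

/* RPCSEC_GSS service values as defined in RFC 2203. */
enum rpc_gss_service {
    rpc_gss_svc_none      = 1,
    rpc_gss_svc_integrity = 2,
    rpc_gss_svc_privacy   = 3
};

/* Hypothetical classification of the operation being sent. */
enum sketch_op { SK_SECINFO, SK_SECINFO_NO_NAME, SK_GETATTR, SK_OTHER };

/* True for the operations singled out above. For GETATTR this applies
 * only when fs_locations or fs_locations_info is requested. */
static bool needs_integrity(enum sketch_op op, bool asks_fs_locations)
{
    switch (op) {
    case SK_SECINFO:
    case SK_SECINFO_NO_NAME:
        return true;
    case SK_GETATTR:
        return asks_fs_locations;
    default:
        return false;
    }
}

/* Raise the user's chosen service to at least integrity when needed;
 * a choice of privacy is left as is. */
static enum rpc_gss_service effective_service(enum rpc_gss_service user_choice,
                                              enum sketch_op op,
                                              bool asks_fs_locations)
{
    if (needs_integrity(op, asks_fs_locations) &&
        user_choice < rpc_gss_svc_integrity)
        return rpc_gss_svc_integrity;
    return user_choice;
}
```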
| Relative to previous NFS versions, NFSv4.1 has additional security | Relative to previous NFS versions, NFSv4.1 has additional security | |||
| considerations for pNFS (see Section 12.9 and Section 13.12), locking | considerations for pNFS (see Section 12.9 and Section 13.12), locking | |||
| and session state (see Section 2.10.7.3). | and session state (see Section 2.10.7.3). | |||
| 22. IANA Considerations | 22. IANA Considerations | |||
| 22.1. Named Attribute Definitions | 22.1. Named Attribute Definitions | |||
| The NFSv4.1 protocol provides for the association of named attributes | The NFSv4.1 protocol supports the association of a file with zero or | |||
| to files. The name space identifiers for these attributes are | more named attributes. The name space identifiers for these | |||
| defined as string names. The protocol does not define the specific | attributes are defined as string names. The protocol does not define | |||
| assignment of the name space for these file attributes. Even though | the specific assignment of the name space for these file attributes. | |||
| the name space is not specifically controlled to prevent collisions, | Even though the name space is not specifically controlled to prevent | |||
| an IANA registry has been created for the registration of NFSv4.1 | collisions, an IANA registry has been created for the registration of | |||
| named attributes. Registration will be achieved through the | NFSv4.1 named attributes. Registration will be achieved through the | |||
| publication of an Informational RFC and will require not only the | publication of an Informational RFC and will require not only the | |||
| name of the attribute but the syntax and semantics of the named | name of the attribute but the syntax and semantics of the named | |||
| attribute contents; the intent is to promote interoperability where | attribute contents; the intent is to promote interoperability where | |||
| common interests exist. While application developers are allowed to | common interests exist. While application developers are allowed to | |||
| define and use attributes as needed, they are encouraged to register | define and use attributes as needed, they are encouraged to register | |||
| the attributes with IANA. | the attributes with IANA. | |||
| Such registered named attributes are presumed to apply to all minor | Such registered named attributes are presumed to apply to all minor | |||
| versions of NFSv4, including those defined subsequently to the | versions of NFSv4, including those defined subsequently to the | |||
| registration. Where the named attribute is intended to be limited | registration. Where the named attribute is intended to be limited | |||
| with regard to the minor versions for which they are not to be used, the | with regard to the minor versions for which they are not to be used, the | |||
| Informational RFC must clearly state the applicable limits. | Informational RFC must clearly state the applicable limits. | |||
| 22.2. ONC RPC Network Identifiers (netids) | 22.2. ONC RPC Network Identifiers (netids) | |||
| Section 3.3.9 discussed the r_netid field and the corresponding | Section 3.3.9 discussed the r_netid field and the corresponding | |||
| r_addr field within a netaddr4 structure. The NFSv4 protocol depends | r_addr field within a netaddr4 structure. The NFSv4 protocol depends | |||
| on the syntax and semantics of these fields to effectively | on the syntax and semantics of these fields to effectively | |||
| communicate callback information between client and server. | communicate callback and other information between client and server. | |||
| Therefore, an IANA registry has been created to include the values | Therefore, an IANA registry has been created to include the values | |||
| defined in this document and to allow for future expansion based on | defined in this document and to allow for future expansion based on | |||
| transport usage/availability. Additions to this ONC RPC Network | transport usage/availability. Additions to this ONC RPC Network | |||
| Identifier registry must be done with the publication of an RFC. | Identifier registry must be done with the publication of an RFC. | |||
| The initial values for this registry are as follows (some of this | The initial values for this registry are as follows (some of this | |||
| text is replicated from Section 3.3.9 for clarity): | text is replicated from Section 3.3.9 for clarity): | |||
| The Network Identifier (or r_netid for short) is used to specify a | The Network Identifier (or r_netid for short) is used to specify a | |||
| transport protocol and associated universal address (or r_addr for | transport protocol and associated universal address (or r_addr for | |||
| skipping to change at page 578, line 44 | skipping to change at page 583, line 44 | |||
| to NFSv4. This requires a new minor version of NFSv4, and requires a | to NFSv4. This requires a new minor version of NFSv4, and requires a | |||
| standards track document from IETF. Another way to add a | standards track document from IETF. Another way to add a | |||
| notification is to specify a new layout type. Notifications for new | notification is to specify a new layout type. Notifications for new | |||
| layout types would be requested via GETDEVICELIST (Section 18.41) and | layout types would be requested via GETDEVICELIST (Section 18.41) and | |||
| GETDEVICEINFO (Section 18.40). See Section 22.4. | GETDEVICEINFO (Section 18.40). See Section 22.4. | |||
| 22.4. Defining New Layout Types | 22.4. Defining New Layout Types | |||
| New layout type numbers will be requested from IANA. IANA will only | New layout type numbers will be requested from IANA. IANA will only | |||
| provide layout type numbers for Standards Track RFCs approved by the | provide layout type numbers for Standards Track RFCs approved by the | |||
| IESG, in accordance with Standards Action policy defined in RFC2434 | IESG, in accordance with Standards Action policy defined in [20]. | |||
| [20]. | All layout types assigned by IANA MUST be in the range 0x00000001 to | |||
| 0x7FFFFFFF. | ||||
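The range stated in draft 23 for IANA-assigned layout types can be checked with a one-line predicate. The function name is hypothetical, and the sketch says nothing about how values outside this range are used.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if layout_type lies in the range that draft 23 gives for layout
 * types assigned by IANA: 0x00000001 through 0x7FFFFFFF inclusive. */
static bool layout_type_is_iana_assignable(uint32_t layout_type)
{
    return layout_type >= 0x00000001u && layout_type <= 0x7FFFFFFFu;
}
```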
| The author of a new pNFS layout specification must follow these steps | The author of a new pNFS layout specification must follow these steps | |||
| to obtain acceptance of the layout type as a standard: | to obtain acceptance of the layout type as a standard: | |||
| 1. The author devises the new layout specification. | 1. The author devises the new layout specification. | |||
| 2. The new layout type specification MUST, at a minimum: | 2. The new layout type specification MUST, at a minimum: | |||
| * Define the contents of the layout-type-specific fields of the | * Define the contents of the layout-type-specific fields of the | |||
| following data types: | following data types: | |||
| skipping to change at page 579, line 36 | skipping to change at page 584, line 36 | |||
| 1. Failure and restart for client, server, storage device. | 1. Failure and restart for client, server, storage device. | |||
| 2. Lease expiration from perspective of the active client, | 2. Lease expiration from perspective of the active client, | |||
| server, storage device. | server, storage device. | |||
| 3. Loss of layout state resulting in fencing of client access | 3. Loss of layout state resulting in fencing of client access | |||
| to storage devices (for an example, see Section 12.7.3). | to storage devices (for an example, see Section 12.7.3). | |||
| * A list of any new notification values for CB_NOTIFY_DEVICEID. | * A list of any new notification values for CB_NOTIFY_DEVICEID. | |||
| * A list of any new recallable object types for CB_RECALL_ANY. | ||||
| * Include an IANA considerations section. | * Include an IANA considerations section. | |||
| * Include a security considerations section. | * Include a security considerations section. | |||
| 3. The author documents the new layout specification as an Internet | 3. The author documents the new layout specification as an Internet | |||
| Draft. | Draft. | |||
| 4. The author submits the Internet Draft for review through the IETF | 4. The author submits the Internet Draft for review through the IETF | |||
| standards process as defined in "Internet Official Protocol | standards process as defined in "Internet Official Protocol | |||
| Standards" (STD 1). The new layout specification will be | Standards" (STD 1). The new layout specification will be | |||
| skipping to change at page 583, line 6 | skipping to change at page 588, line 7 | |||
| [27] Werme, R., "RPC XID Issues", USENIX Conference Proceedings, | [27] Werme, R., "RPC XID Issues", USENIX Conference Proceedings, | |||
| February 1996. | February 1996. | |||
| [28] Nowicki, B., "NFS: Network File System Protocol specification", | [28] Nowicki, B., "NFS: Network File System Protocol specification", | |||
| RFC 1094, March 1989. | RFC 1094, March 1989. | |||
| [29] Bhide, A., Elnozahy, E., and S. Morgan, "A Highly Available | [29] Bhide, A., Elnozahy, E., and S. Morgan, "A Highly Available | |||
| Network Server", USENIX Conference Proceedings, January 1991. | Network Server", USENIX Conference Proceedings, January 1991. | |||
| [30] Halevy, B., Welch, B., and J. Zelenka, "Object-based pNFS | [30] Halevy, B., Welch, B., and J. Zelenka, "Object-based pNFS | |||
| Operations", September 2007, <ftp://www.ietf.org/ | Operations", April 2008, <ftp://www.ietf.org/internet-drafts/ | |||
| internet-drafts/draft-nfsv4-pnfs-obj-04.txt>. | draft-nfsv4-pnfs-obj-07.txt>. | |||
| [31] Black, D., Fridella, S., and J. Glasgow, "pNFS Block/Volume | [31] Black, D., Fridella, S., and J. Glasgow, "pNFS Block/Volume | |||
| Layout", November 2007, <ftp://www.ietf.org/internet-drafts/ | Layout", April 2008, <ftp://www.ietf.org/internet-drafts/ | |||
| draft-ietf-nfsv4-pnfs-block-05.txt>. | draft-ietf-nfsv4-pnfs-block-08.txt>. | |||
| [32] Callaghan, B., "WebNFS Client Specification", RFC 2054, | [32] Callaghan, B., "WebNFS Client Specification", RFC 2054, | |||
| October 1996. | October 1996. | |||
| [33] Callaghan, B., "WebNFS Server Specification", RFC 2055, | [33] Callaghan, B., "WebNFS Server Specification", RFC 2055, | |||
| October 1996. | October 1996. | |||
| [34] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624, | [34] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624, | |||
| June 1999. | June 1999. | |||
| skipping to change at page 584, line 29 | skipping to change at page 589, line 29 | |||
| Burnett, and Charles Fan with contributions from Ted Anderson, Neil | Burnett, and Charles Fan with contributions from Ted Anderson, Neil | |||
| Brown, and Jon Haswell. | Brown, and Jon Haswell. | |||
| The initial drafts for the Directory Delegations support were | The initial drafts for the Directory Delegations support were | |||
| contributed by Saadia Khan with input from Dave Noveck, Mike Eisler, | contributed by Saadia Khan with input from Dave Noveck, Mike Eisler, | |||
| Carl Burnett, Ted Anderson and Tom Talpey. | Carl Burnett, Ted Anderson and Tom Talpey. | |||
| The initial drafts for the ACL explanations were contributed by Sam | The initial drafts for the ACL explanations were contributed by Sam | |||
| Falkner and Lisa Week. | Falkner and Lisa Week. | |||
| The pNFS work was inspired by the NASD and OSD work done by Garth | ||||
| Gibson. Gary Grider has also been a champion of high-performance | ||||
| parallel I/O. Garth Gibson and Peter Corbett started the pNFS effort | ||||
| with a problem statement document for IETF that formed the basis for | ||||
| the pNFS work in NFSv4.1. | ||||
| The initial drafts for the parallel NFS support were edited by Brent | The initial drafts for the parallel NFS support were edited by Brent | |||
| Welch and Garth Goodson. Additional authors for those documents were | Welch and Garth Goodson. Additional authors for those documents were | |||
| Benny Halevy, David Black, and Andy Adamson. Additional input came | Benny Halevy, David Black, and Andy Adamson. Additional input came | |||
| from the informal group which contributed to the construction of the | from the informal group which contributed to the construction of the | |||
| initial pNFS drafts; specific acknowledgement goes to Gary Grider, | initial pNFS drafts; specific acknowledgement goes to Gary Grider, | |||
| Peter Corbett, Dave Noveck, Peter Honeyman, and Stephen Fridella. | Peter Corbett, Dave Noveck, Peter Honeyman, and Stephen Fridella. | |||
| The pNFS work was inspired by the NASD and OSD work done by Garth | ||||
| Gibson. Gary Grider of the national labs (LANL) has also been a | ||||
| champion of high-performance parallel I/O. | ||||
| Fredric Isaman found several errors in draft versions of the ONC RPC | Fredric Isaman found several errors in draft versions of the ONC RPC | |||
| XDR description of the NFSv4.1 protocol. | XDR description of the NFSv4.1 protocol. | |||
| Audrey Van Bellingham provided, in numerous ways, essential co- | Audrey Van Bellingham provided, in numerous ways, essential co- | |||
| ordination and management of the process of editing the specification | ordination and management of the process of editing the specification | |||
| drafts. | drafts. | |||
| Richard Jernigan gave feedback on the file layout's striping pattern | Richard Jernigan gave feedback on the file layout's striping pattern | |||
| design. | design. | |||
| skipping to change at page 585, line 49 | skipping to change at page 590, line 51 | |||
| Iyer, Suchit Kaura, Trond Myklebust, Anatoly Pinchuk, Spencer | Iyer, Suchit Kaura, Trond Myklebust, Anatoly Pinchuk, Spencer | |||
| Shepler, Renu Tewari, Lisa Week, and Brent Welch. | Shepler, Renu Tewari, Lisa Week, and Brent Welch. | |||
| A review team worked together to generate the tables of assignments | A review team worked together to generate the tables of assignments | |||
| of error sets to operations and make sure that each such assignment | of error sets to operations and make sure that each such assignment | |||
| had two or more people validating it. Participating in the process | had two or more people validating it. Participating in the process | |||
| were: Andy Adamson, Mike Eisler, Sam Falkner, Garth Goodson, Robert | were: Andy Adamson, Mike Eisler, Sam Falkner, Garth Goodson, Robert | |||
| Gordon, Trond Myklebust, Dave Noveck, Spencer Shepler, Tom Talpey, Amy | Gordon, Trond Myklebust, Dave Noveck, Spencer Shepler, Tom Talpey, Amy | |||
| Weaver, and Lisa Week. | Weaver, and Lisa Week. | |||
| Others who provided comments include: Mahesh Siddheshwar. | Others who provided comments include: Jason Goldschmidt and Mahesh | |||
| Siddheshwar. | ||||
| Authors' Addresses | Authors' Addresses | |||
| Spencer Shepler | Spencer Shepler | |||
| Sun Microsystems, Inc. | Sun Microsystems, Inc. | |||
| 7808 Moonflower Drive | 7808 Moonflower Drive | |||
| Austin, TX 78750 | Austin, TX 78750 | |||
| USA | USA | |||
| Phone: +1-512-401-1080 | Phone: +1-512-401-1080 | |||
| End of changes. 173 change blocks. | ||||
| 573 lines changed or deleted | 905 lines changed or added | |||
This html diff was produced by rfcdiff 1.33. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||