Diff: draft-ietf-nfsv4-minorversion1-22.txt - draft-ietf-nfsv4-minorversion1-23.txt
 draft-ietf-nfsv4-minorversion1-22.txt   draft-ietf-nfsv4-minorversion1-23.txt 
NFSv4 S. Shepler NFSv4 S. Shepler
Internet-Draft M. Eisler Internet-Draft M. Eisler
Intended status: Standards Track D. Noveck Intended status: Standards Track D. Noveck
Expires: November 2, 2008 Editors Expires: November 10, 2008 Editors
May 1, 2008 May 9, 2008
NFS Version 4 Minor Version 1 NFS Version 4 Minor Version 1
draft-ietf-nfsv4-minorversion1-22.txt draft-ietf-nfsv4-minorversion1-23.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 35 skipping to change at page 1, line 35
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on November 2, 2008. This Internet-Draft will expire on November 10, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
Abstract Abstract
This Internet-Draft describes NFS version 4 minor version one, This Internet-Draft describes NFS version 4 minor version one,
including features retained from the base protocol and protocol including features retained from the base protocol and protocol
extensions made subsequently. Major extensions introduced in NFS extensions made subsequently. Major extensions introduced in NFS
skipping to change at page 4, line 26 skipping to change at page 4, line 26
7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 147 7.4. Multiple Roots . . . . . . . . . . . . . . . . . . . . . 147
7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 147 7.5. Filehandle Volatility . . . . . . . . . . . . . . . . . 147
7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 147 7.6. Exported Root . . . . . . . . . . . . . . . . . . . . . 147
7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 148 7.7. Mount Point Crossing . . . . . . . . . . . . . . . . . . 148
7.8. Security Policy and Namespace Presentation . . . . . . . 148 7.8. Security Policy and Namespace Presentation . . . . . . . 148
8. State Management . . . . . . . . . . . . . . . . . . . . . . 149 8. State Management . . . . . . . . . . . . . . . . . . . . . . 149
8.1. Client and Session ID . . . . . . . . . . . . . . . . . 150 8.1. Client and Session ID . . . . . . . . . . . . . . . . . 150
8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 150 8.2. Stateid Definition . . . . . . . . . . . . . . . . . . . 150
8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 151 8.2.1. Stateid Types . . . . . . . . . . . . . . . . . . . 151
8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 152 8.2.2. Stateid Structure . . . . . . . . . . . . . . . . . 152
8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 153 8.2.3. Special Stateids . . . . . . . . . . . . . . . . . . 154
8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 155 8.2.4. Stateid Lifetime and Validation . . . . . . . . . . 155
8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 158 8.2.5. Stateid Use for I/O Operations . . . . . . . . . . . 158
8.2.6. Stateid Use for SETATTR Operations . . . . . . . . . 159 8.2.6. Stateid Use for SETATTR Operations . . . . . . . . . 159
8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 159 8.3. Lease Renewal . . . . . . . . . . . . . . . . . . . . . 159
8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 161 8.4. Crash Recovery . . . . . . . . . . . . . . . . . . . . . 161
8.4.1. Client Failure and Recovery . . . . . . . . . . . . 162 8.4.1. Client Failure and Recovery . . . . . . . . . . . . 162
8.4.2. Server Failure and Recovery . . . . . . . . . . . . 162 8.4.2. Server Failure and Recovery . . . . . . . . . . . . 163
8.4.3. Network Partitions and Recovery . . . . . . . . . . 166 8.4.3. Network Partitions and Recovery . . . . . . . . . . 166
8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 171 8.5. Server Revocation of Locks . . . . . . . . . . . . . . . 171
8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 172 8.6. Short and Long Leases . . . . . . . . . . . . . . . . . 172
8.7. Clocks, Propagation Delay, and Calculating Lease 8.7. Clocks, Propagation Delay, and Calculating Lease
Expiration . . . . . . . . . . . . . . . . . . . . . . . 172 Expiration . . . . . . . . . . . . . . . . . . . . . . . 172
8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 173 8.8. Obsolete Locking Infrastructure From NFSv4.0 . . . . . . 173
9. File Locking and Share Reservations . . . . . . . . . . . . . 174 9. File Locking and Share Reservations . . . . . . . . . . . . . 174
9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 174 9.1. Opens and Byte-Range Locks . . . . . . . . . . . . . . . 174
9.1.1. State-owner Definition . . . . . . . . . . . . . . . 174 9.1.1. State-owner Definition . . . . . . . . . . . . . . . 174
9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 175 9.1.2. Use of the Stateid and Locking . . . . . . . . . . . 175
skipping to change at page 6, line 30 skipping to change at page 6, line 30
11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 259 11.10.3. The fs_locations_item4 Structure . . . . . . . . . . 259
11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 261 11.11. The Attribute fs_status . . . . . . . . . . . . . . . . 261
12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 265 12. Parallel NFS (pNFS) . . . . . . . . . . . . . . . . . . . . . 265
12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 265 12.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 265
12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 266 12.2. pNFS Definitions . . . . . . . . . . . . . . . . . . . . 266
12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 267 12.2.1. Metadata . . . . . . . . . . . . . . . . . . . . . . 267
12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 267 12.2.2. Metadata Server . . . . . . . . . . . . . . . . . . 267
12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 267 12.2.3. pNFS Client . . . . . . . . . . . . . . . . . . . . 267
12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 267 12.2.4. Storage Device . . . . . . . . . . . . . . . . . . . 267
12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 267 12.2.5. Storage Protocol . . . . . . . . . . . . . . . . . . 267
12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 268 12.2.6. Control Protocol . . . . . . . . . . . . . . . . . . 267
12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 268 12.2.7. Layout Types . . . . . . . . . . . . . . . . . . . . 268
12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 269 12.2.8. Layout . . . . . . . . . . . . . . . . . . . . . . . 268
12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 269 12.2.9. Layout Iomode . . . . . . . . . . . . . . . . . . . 269
12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 270 12.2.10. Device IDs . . . . . . . . . . . . . . . . . . . . . 270
12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 271 12.3. pNFS Operations . . . . . . . . . . . . . . . . . . . . 271
12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 272 12.4. pNFS Attributes . . . . . . . . . . . . . . . . . . . . 272
12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 272 12.5. Layout Semantics . . . . . . . . . . . . . . . . . . . . 272
12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 272 12.5.1. Guarantees Provided by Layouts . . . . . . . . . . . 272
12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 273 12.5.2. Getting a Layout . . . . . . . . . . . . . . . . . . 273
12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 274 12.5.3. Layout Stateid . . . . . . . . . . . . . . . . . . . 274
12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 275 12.5.4. Committing a Layout . . . . . . . . . . . . . . . . 275
12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 279 12.5.5. Recalling a Layout . . . . . . . . . . . . . . . . . 278
12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 287 12.5.6. Revoking Layouts . . . . . . . . . . . . . . . . . . 287
12.5.7. Metadata Server Write Propagation . . . . . . . . . 287 12.5.7. Metadata Server Write Propagation . . . . . . . . . 287
12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 287 12.6. pNFS Mechanics . . . . . . . . . . . . . . . . . . . . . 287
12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 289 12.7. Recovery . . . . . . . . . . . . . . . . . . . . . . . . 289
12.7.1. Recovery from Client Restart . . . . . . . . . . . . 289 12.7.1. Recovery from Client Restart . . . . . . . . . . . . 289
12.7.2. Dealing with Lease Expiration on the Client . . . . 290 12.7.2. Dealing with Lease Expiration on the Client . . . . 289
12.7.3. Dealing with Loss of Layout State on the Metadata 12.7.3. Dealing with Loss of Layout State on the Metadata
Server . . . . . . . . . . . . . . . . . . . . . . . 291 Server . . . . . . . . . . . . . . . . . . . . . . . 290
12.7.4. Recovery from Metadata Server Restart . . . . . . . 291 12.7.4. Recovery from Metadata Server Restart . . . . . . . 291
12.7.5. Operations During Metadata Server Grace Period . . . 293 12.7.5. Operations During Metadata Server Grace Period . . . 293
12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 294 12.7.6. Storage Device Recovery . . . . . . . . . . . . . . 293
12.8. Metadata and Storage Device Roles . . . . . . . . . . . 294 12.8. Metadata and Storage Device Roles . . . . . . . . . . . 294
12.9. Security Considerations for pNFS . . . . . . . . . . . . 294 12.9. Security Considerations for pNFS . . . . . . . . . . . . 294
13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 295 13. PNFS: NFSv4.1 File Layout Type . . . . . . . . . . . . . . . 295
13.1. Client ID and Session Considerations . . . . . . . . . . 296 13.1. Client ID and Session Considerations . . . . . . . . . . 295
13.1.1. Sessions Considerations for Data Servers . . . . . . 298 13.1.1. Sessions Considerations for Data Servers . . . . . . 297
13.2. File Layout Definitions . . . . . . . . . . . . . . . . 298 13.2. File Layout Definitions . . . . . . . . . . . . . . . . 298
13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 299 13.3. File Layout Data Types . . . . . . . . . . . . . . . . . 299
13.4. Interpreting the File Layout . . . . . . . . . . . . . . 303 13.4. Interpreting the File Layout . . . . . . . . . . . . . . 303
13.4.1. Determining the Stripe Unit Number . . . . . . . . . 303 13.4.1. Determining the Stripe Unit Number . . . . . . . . . 303
13.4.2. Interpreting the File Layout Using Sparse Packing . 303 13.4.2. Interpreting the File Layout Using Sparse Packing . 303
13.4.3. Interpreting the File Layout Using Dense Packing . . 306 13.4.3. Interpreting the File Layout Using Dense Packing . . 305
13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 308 13.4.4. Sparse and Dense Stripe Unit Packing . . . . . . . . 308
13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 310 13.5. Data Server Multipathing . . . . . . . . . . . . . . . . 309
13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 311 13.6. Operations Sent to NFSv4.1 Data Servers . . . . . . . . 310
13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 313 13.7. COMMIT Through Metadata Server . . . . . . . . . . . . . 313
13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 315 13.8. The Layout Iomode . . . . . . . . . . . . . . . . . . . 314
13.9. Metadata and Data Server State Coordination . . . . . . 315 13.9. Metadata and Data Server State Coordination . . . . . . 314
13.9.1. Global Stateid Requirements . . . . . . . . . . . . 315 13.9.1. Global Stateid Requirements . . . . . . . . . . . . 314
13.9.2. Data Server State Propagation . . . . . . . . . . . 316 13.9.2. Data Server State Propagation . . . . . . . . . . . 315
13.10. Data Server Component File Size . . . . . . . . . . . . 318 13.10. Data Server Component File Size . . . . . . . . . . . . 317
13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 319 13.11. Layout Revocation and Fencing . . . . . . . . . . . . . 318
13.12. Security Considerations for the File Layout Type . . . . 319 13.12. Security Considerations for the File Layout Type . . . . 319
14. Internationalization . . . . . . . . . . . . . . . . . . . . 320 14. Internationalization . . . . . . . . . . . . . . . . . . . . 320
14.1. Stringprep profile for the utf8str_cs type . . . . . . . 321 14.1. Stringprep profile for the utf8str_cs type . . . . . . . 321
14.2. Stringprep profile for the utf8str_cis type . . . . . . 323 14.2. Stringprep profile for the utf8str_cis type . . . . . . 322
14.3. Stringprep profile for the utf8str_mixed type . . . . . 324 14.3. Stringprep profile for the utf8str_mixed type . . . . . 324
14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 326 14.4. UTF-8 Capabilities . . . . . . . . . . . . . . . . . . . 325
14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 326 14.5. UTF-8 Related Errors . . . . . . . . . . . . . . . . . . 325
15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 327 15. Error Values . . . . . . . . . . . . . . . . . . . . . . . . 326
15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 327 15.1. Error Definitions . . . . . . . . . . . . . . . . . . . 326
15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 329 15.1.1. General Errors . . . . . . . . . . . . . . . . . . . 328
15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 331 15.1.2. Filehandle Errors . . . . . . . . . . . . . . . . . 330
15.1.3. Compound Structure Errors . . . . . . . . . . . . . 332 15.1.3. Compound Structure Errors . . . . . . . . . . . . . 332
15.1.4. File System Errors . . . . . . . . . . . . . . . . . 334 15.1.4. File System Errors . . . . . . . . . . . . . . . . . 333
15.1.5. State Management Errors . . . . . . . . . . . . . . 336 15.1.5. State Management Errors . . . . . . . . . . . . . . 335
15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 337 15.1.6. Security Errors . . . . . . . . . . . . . . . . . . 336
15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 337 15.1.7. Name Errors . . . . . . . . . . . . . . . . . . . . 337
15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 338 15.1.8. Locking Errors . . . . . . . . . . . . . . . . . . . 337
15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 339 15.1.9. Reclaim Errors . . . . . . . . . . . . . . . . . . . 339
15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 340 15.1.10. pNFS Errors . . . . . . . . . . . . . . . . . . . . 339
15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 341 15.1.11. Session Use Errors . . . . . . . . . . . . . . . . . 341
15.1.12. Session Management Errors . . . . . . . . . . . . . 343 15.1.12. Session Management Errors . . . . . . . . . . . . . 342
15.1.13. Client Management Errors . . . . . . . . . . . . . . 343 15.1.13. Client Management Errors . . . . . . . . . . . . . . 342
15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 344 15.1.14. Delegation Errors . . . . . . . . . . . . . . . . . 343
15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 344 15.1.15. Attribute Handling Errors . . . . . . . . . . . . . 344
15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 345 15.1.16. Obsoleted Errors . . . . . . . . . . . . . . . . . . 344
15.2. Operations and their valid errors . . . . . . . . . . . 346 15.2. Operations and their valid errors . . . . . . . . . . . 345
15.3. Callback operations and their valid errors . . . . . . . 362 15.3. Callback operations and their valid errors . . . . . . . 361
15.4. Errors and the operations that use them . . . . . . . . 364 15.4. Errors and the operations that use them . . . . . . . . 363
16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 378 16. NFSv4.1 Procedures . . . . . . . . . . . . . . . . . . . . . 377
16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 378 16.1. Procedure 0: NULL - No Operation . . . . . . . . . . . . 377
16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 379 16.2. Procedure 1: COMPOUND - Compound Operations . . . . . . 378
17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 390 17. Operations: REQUIRED, RECOMMENDED, or OPTIONAL . . . . . . . 389
18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 393 18. NFSv4.1 Operations . . . . . . . . . . . . . . . . . . . . . 392
18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 393 18.1. Operation 3: ACCESS - Check Access Rights . . . . . . . 392
18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 399 18.2. Operation 4: CLOSE - Close File . . . . . . . . . . . . 398
18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 400 18.3. Operation 5: COMMIT - Commit Cached Data . . . . . . . . 399
18.4. Operation 6: CREATE - Create a Non-Regular File Object . 403 18.4. Operation 6: CREATE - Create a Non-Regular File Object . 402
18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting 18.5. Operation 7: DELEGPURGE - Purge Delegations Awaiting
Recovery . . . . . . . . . . . . . . . . . . . . . . . . 406 Recovery . . . . . . . . . . . . . . . . . . . . . . . . 405
18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 407 18.6. Operation 8: DELEGRETURN - Return Delegation . . . . . . 406
18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 407 18.7. Operation 9: GETATTR - Get Attributes . . . . . . . . . 406
18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 409 18.8. Operation 10: GETFH - Get Current Filehandle . . . . . . 408
18.9. Operation 11: LINK - Create Link to a File . . . . . . . 410 18.9. Operation 11: LINK - Create Link to a File . . . . . . . 409
18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 413 18.10. Operation 12: LOCK - Create Lock . . . . . . . . . . . . 412
18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 417 18.11. Operation 13: LOCKT - Test For Lock . . . . . . . . . . 416
18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 418 18.12. Operation 14: LOCKU - Unlock File . . . . . . . . . . . 417
18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 420 18.13. Operation 15: LOOKUP - Lookup Filename . . . . . . . . . 419
18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 421 18.14. Operation 16: LOOKUPP - Lookup Parent Directory . . . . 420
18.15. Operation 17: NVERIFY - Verify Difference in 18.15. Operation 17: NVERIFY - Verify Difference in
Attributes . . . . . . . . . . . . . . . . . . . . . . . 423 Attributes . . . . . . . . . . . . . . . . . . . . . . . 422
18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 424 18.16. Operation 18: OPEN - Open a Regular File . . . . . . . . 423
18.17. Operation 19: OPENATTR - Open Named Attribute 18.17. Operation 19: OPENATTR - Open Named Attribute
Directory . . . . . . . . . . . . . . . . . . . . . . . 443 Directory . . . . . . . . . . . . . . . . . . . . . . . 442
18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 444 18.18. Operation 21: OPEN_DOWNGRADE - Reduce Open File Access . 443
18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 446 18.19. Operation 22: PUTFH - Set Current Filehandle . . . . . . 445
18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 446 18.20. Operation 23: PUTPUBFH - Set Public Filehandle . . . . . 445
18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 448 18.21. Operation 24: PUTROOTFH - Set Root Filehandle . . . . . 447
18.22. Operation 25: READ - Read from File . . . . . . . . . . 449 18.22. Operation 25: READ - Read from File . . . . . . . . . . 448
18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 451 18.23. Operation 26: READDIR - Read Directory . . . . . . . . . 450
18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 455 18.24. Operation 27: READLINK - Read Symbolic Link . . . . . . 454
18.25. Operation 28: REMOVE - Remove File System Object . . . . 456 18.25. Operation 28: REMOVE - Remove File System Object . . . . 455
18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 458 18.26. Operation 29: RENAME - Rename Directory Entry . . . . . 457
18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 462 18.27. Operation 31: RESTOREFH - Restore Saved Filehandle . . . 461
18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 463 18.28. Operation 32: SAVEFH - Save Current Filehandle . . . . . 462
18.29. Operation 33: SECINFO - Obtain Available Security . . . 464 18.29. Operation 33: SECINFO - Obtain Available Security . . . 463
18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 468 18.30. Operation 34: SETATTR - Set Attributes . . . . . . . . . 467
18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 471 18.31. Operation 37: VERIFY - Verify Same Attributes . . . . . 470
18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 472 18.32. Operation 38: WRITE - Write to File . . . . . . . . . . 471
18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 476 18.33. Operation 40: BACKCHANNEL_CTL - Backchannel control . . 475
18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 478 18.34. Operation 41: BIND_CONN_TO_SESSION . . . . . . . . . . . 477
18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 481 18.35. Operation 42: EXCHANGE_ID - Instantiate Client ID . . . 480
18.36. Operation 43: CREATE_SESSION - Create New Session and 18.36. Operation 43: CREATE_SESSION - Create New Session and
Confirm Client ID . . . . . . . . . . . . . . . . . . . 498 Confirm Client ID . . . . . . . . . . . . . . . . . . . 497
18.37. Operation 44: DESTROY_SESSION - Destroy existing 18.37. Operation 44: DESTROY_SESSION - Destroy existing
session . . . . . . . . . . . . . . . . . . . . . . . . 508 session . . . . . . . . . . . . . . . . . . . . . . . . 507
18.38. Operation 45: FREE_STATEID - Free stateid with no 18.38. Operation 45: FREE_STATEID - Free stateid with no
locks . . . . . . . . . . . . . . . . . . . . . . . . . 509 locks . . . . . . . . . . . . . . . . . . . . . . . . . 508
18.39. Operation 46: GET_DIR_DELEGATION - Get a directory 18.39. Operation 46: GET_DIR_DELEGATION - Get a directory
delegation . . . . . . . . . . . . . . . . . . . . . . . 510 delegation . . . . . . . . . . . . . . . . . . . . . . . 509
18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 514 18.40. Operation 47: GETDEVICEINFO - Get Device Information . . 513
18.41. Operation 48: GETDEVICELIST - Get All Device Mappings 18.41. Operation 48: GETDEVICELIST - Get All Device Mappings
for a File System . . . . . . . . . . . . . . . . . . . 516 for a File System . . . . . . . . . . . . . . . . . . . 515
18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using 18.42. Operation 49: LAYOUTCOMMIT - Commit writes made using
a layout . . . . . . . . . . . . . . . . . . . . . . . . 518 a layout . . . . . . . . . . . . . . . . . . . . . . . . 517
18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 521 18.43. Operation 50: LAYOUTGET - Get Layout Information . . . . 520
18.44. Operation 51: LAYOUTRETURN - Release Layout 18.44. Operation 51: LAYOUTRETURN - Release Layout
Information . . . . . . . . . . . . . . . . . . . . . . 526 Information . . . . . . . . . . . . . . . . . . . . . . 530
18.45. Operation 52: SECINFO_NO_NAME - Get Security on 18.45. Operation 52: SECINFO_NO_NAME - Get Security on
Unnamed Object . . . . . . . . . . . . . . . . . . . . . 530 Unnamed Object . . . . . . . . . . . . . . . . . . . . . 534
18.46. Operation 53: SEQUENCE - Supply per-procedure 18.46. Operation 53: SEQUENCE - Supply per-procedure
sequencing and control . . . . . . . . . . . . . . . . . 531 sequencing and control . . . . . . . . . . . . . . . . . 536
18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 537 18.47. Operation 54: SET_SSV - Update SSV for a Client ID . . . 541
18.48. Operation 55: TEST_STATEID - Test stateids for 18.48. Operation 55: TEST_STATEID - Test stateids for
validity . . . . . . . . . . . . . . . . . . . . . . . . 539 validity . . . . . . . . . . . . . . . . . . . . . . . . 543
18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 541 18.49. Operation 56: WANT_DELEGATION - Request Delegation . . . 545
18.50. Operation 57: DESTROY_CLIENTID - Destroy existing 18.50. Operation 57: DESTROY_CLIENTID - Destroy existing
client ID . . . . . . . . . . . . . . . . . . . . . . . 545 client ID . . . . . . . . . . . . . . . . . . . . . . . 549
18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims
Finished . . . . . . . . . . . . . . . . . . . . . . . . 545 Finished . . . . . . . . . . . . . . . . . . . . . . . . 549
18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 548 18.52. Operation 10044: ILLEGAL - Illegal operation . . . . . . 552
19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 548 19. NFSv4.1 Callback Procedures . . . . . . . . . . . . . . . . . 552
19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 549 19.1. Procedure 0: CB_NULL - No Operation . . . . . . . . . . 553
19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 549 19.2. Procedure 1: CB_COMPOUND - Compound Operations . . . . . 553
20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 553 20. NFSv4.1 Callback Operations . . . . . . . . . . . . . . . . . 557
20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 553 20.1. Operation 3: CB_GETATTR - Get Attributes . . . . . . . . 557
20.2. Operation 4: CB_RECALL - Recall an Open Delegation . . . 554 20.2. Operation 4: CB_RECALL - Recall a Delegation . . . . . . 558
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from
Client . . . . . . . . . . . . . . . . . . . . . . . . . 555 Client . . . . . . . . . . . . . . . . . . . . . . . . . 559
20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 559 20.4. Operation 6: CB_NOTIFY - Notify directory changes . . . 563
20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to 20.5. Operation 7: CB_PUSH_DELEG - Offer Delegation to
Client . . . . . . . . . . . . . . . . . . . . . . . . . 563 Client . . . . . . . . . . . . . . . . . . . . . . . . . 567
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations . . 564 20.6. Operation 8: CB_RECALL_ANY - Keep any N recallable
objects . . . . . . . . . . . . . . . . . . . . . . . . 568
20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal 20.7. Operation 9: CB_RECALLABLE_OBJ_AVAIL - Signal
Resources for Recallable Objects . . . . . . . . . . . . 566 Resources for Recallable Objects . . . . . . . . . . . . 571
20.8. Operation 10: CB_RECALL_SLOT - change flow control 20.8. Operation 10: CB_RECALL_SLOT - change flow control
limits . . . . . . . . . . . . . . . . . . . . . . . . . 567 limits . . . . . . . . . . . . . . . . . . . . . . . . . 572
20.9. Operation 11: CB_SEQUENCE - Supply backchannel 20.9. Operation 11: CB_SEQUENCE - Supply backchannel
sequencing and control . . . . . . . . . . . . . . . . . 568 sequencing and control . . . . . . . . . . . . . . . . . 573
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending
Delegation Wants . . . . . . . . . . . . . . . . . . . . 570 Delegation Wants . . . . . . . . . . . . . . . . . . . . 575
20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible 20.11. Operation 13: CB_NOTIFY_LOCK - Notify of possible
lock availability . . . . . . . . . . . . . . . . . . . 571 lock availability . . . . . . . . . . . . . . . . . . . 576
20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID 20.12. Operation 14: CB_NOTIFY_DEVICEID - Notify device ID
changes . . . . . . . . . . . . . . . . . . . . . . . . 573 changes . . . . . . . . . . . . . . . . . . . . . . . . 578
20.13. Operation 10044: CB_ILLEGAL - Illegal Callback 20.13. Operation 10044: CB_ILLEGAL - Illegal Callback
Operation . . . . . . . . . . . . . . . . . . . . . . . 575 Operation . . . . . . . . . . . . . . . . . . . . . . . 580
21. Security Considerations . . . . . . . . . . . . . . . . . . . 575 21. Security Considerations . . . . . . . . . . . . . . . . . . . 580
22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 577 22. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 582
22.1. Named Attribute Definitions . . . . . . . . . . . . . . 577 22.1. Named Attribute Definitions . . . . . . . . . . . . . . 582
22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 577 22.2. ONC RPC Network Identifiers (netids) . . . . . . . . . . 582
22.3. Defining New Notifications . . . . . . . . . . . . . . . 578 22.3. Defining New Notifications . . . . . . . . . . . . . . . 583
22.4. Defining New Layout Types . . . . . . . . . . . . . . . 578 22.4. Defining New Layout Types . . . . . . . . . . . . . . . 583
22.5. Path Variable Definitions . . . . . . . . . . . . . . . 580 22.5. Path Variable Definitions . . . . . . . . . . . . . . . 585
22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 580 22.5.1. Path Variable Values . . . . . . . . . . . . . . . . 585
22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 580 22.5.2. Path Variable Names . . . . . . . . . . . . . . . . 585
23. References . . . . . . . . . . . . . . . . . . . . . . . . . 580 23. References . . . . . . . . . . . . . . . . . . . . . . . . . 585
23.1. Normative References . . . . . . . . . . . . . . . . . . 580 23.1. Normative References . . . . . . . . . . . . . . . . . . 585
23.2. Informative References . . . . . . . . . . . . . . . . . 582 23.2. Informative References . . . . . . . . . . . . . . . . . 587
Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 584 Appendix A. Acknowledgments . . . . . . . . . . . . . . . . . . 589
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 586 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 591
Intellectual Property and Copyright Statements . . . . . . . . . 587 Intellectual Property and Copyright Statements . . . . . . . . . 592
1. Introduction 1. Introduction
1.1. The NFS Version 4 Minor Version 1 Protocol 1.1. The NFS Version 4 Minor Version 1 Protocol
The NFS version 4 minor version 1 (NFSv4.1) protocol is the second The NFS version 4 minor version 1 (NFSv4.1) protocol is the second
minor version of the NFS version 4 (NFSv4) protocol. The first minor minor version of the NFS version 4 (NFSv4) protocol. The first minor
version, NFSv4.0 is described in [21]. It generally follows the version, NFSv4.0 is described in [21]. It generally follows the
guidelines for minor versioning model listed in Section 10 of RFC guidelines for minor versioning model listed in Section 10 of RFC
3530. However, it diverges from guideline 11 ("a client and server 3530. However, it diverges from guideline 11 ("a client and server
skipping to change at page 26, line 12 skipping to change at page 26, line 12
information to distinguish the client from other user level information to distinguish the client from other user level
clients running on the same host, such as a process identifier or clients running on the same host, such as a process identifier or
other unique sequence. other unique sequence.
The client ID is assigned by the server (the eir_clientid result from The client ID is assigned by the server (the eir_clientid result from
EXCHANGE_ID) and should be chosen so that it will not conflict with a EXCHANGE_ID) and should be chosen so that it will not conflict with a
client ID previously assigned by the server. This applies across client ID previously assigned by the server. This applies across
server restarts. server restarts.
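The draft leaves the choice of client ID values to the server. One possible approach, shown here as a hypothetical C sketch and not as anything the protocol requires, puts a boot epoch in the high 32 bits of the 64-bit client ID and a per-boot counter in the low 32 bits. The epoch is assumed to be persisted and incremented on every restart; reading and writing it to stable storage is not shown, and the struct and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical server-side generator state. boot_epoch is assumed to be
 * loaded from stable storage at startup and bumped once per restart. */
struct clientid_gen {
    uint32_t boot_epoch;   /* incremented once per server restart */
    uint32_t next_counter; /* reset to 0 at each restart */
};

/* Build a 64-bit client ID: boot epoch in the high half, per-boot counter
 * in the low half. IDs from different boots differ in the high half, so
 * they cannot collide across restarts as long as the epoch never repeats. */
static uint64_t assign_clientid(struct clientid_gen *g)
{
    /* This sketch does not handle exhausting the counter within one boot. */
    assert(g->next_counter != UINT32_MAX);
    uint64_t id = ((uint64_t)g->boot_epoch << 32) | g->next_counter;
    g->next_counter++;
    return id;
}

/* Called at startup with the epoch previously saved to stable storage. */
static void clientid_gen_restart(struct clientid_gen *g, uint32_t stored_epoch)
{
    g->boot_epoch = stored_epoch + 1;
    g->next_counter = 0;
}
```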
In the event of a server restart, a client may find out that its In the event of a server restart, a client may find out that its
current client ID is no longer valid when it receives a current client ID is no longer valid when it receives an
NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on NFS4ERR_STALE_CLIENTID error. The precise circumstances depend on
the characteristics of the sessions involved, specifically whether the characteristics of the sessions involved, specifically whether
the session is persistent (see Section 2.10.5.5), but in each case the session is persistent (see Section 2.10.5.5), but in each case
the client will receive this error when it attempts to establish a the client will receive this error when it attempts to establish a
new session with the existing client ID and receives the error new session with the existing client ID and receives the error
NFS4ERR_STALE_CLIENTID, indicating that a new client ID must be NFS4ERR_STALE_CLIENTID, indicating that a new client ID must be
obtained via EXCHANGE_ID and the new session established with that obtained via EXCHANGE_ID and the new session established with that
client ID. client ID.
When a session is not persistent, the client will find out that it When a session is not persistent, the client will find out that it
skipping to change at page 46, line 7 skipping to change at page 46, line 7
two different EXCHANGE_ID requests, and the eir_clientid, two different EXCHANGE_ID requests, and the eir_clientid,
eir_server_owner.so_major_id, and eir_server_scope results match eir_server_owner.so_major_id, and eir_server_scope results match
in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id in both EXCHANGE_ID results, but the eir_server_owner.so_minor_id
results do not match, then the client is permitted to perform results do not match, then the client is permitted to perform
client ID trunking. The client can associate each connection with client ID trunking. The client can associate each connection with
different sessions, where each session is associated with the same different sessions, where each session is associated with the same
server. server.
Of course, even if the eir_server_owner.so_minor_id fields do Of course, even if the eir_server_owner.so_minor_id fields do
match, the client is free to employ client ID trunking instead of match, the client is free to employ client ID trunking instead of
sessiond trunking. session trunking.
The client completes the act of client ID trunking by invoking The client completes the act of client ID trunking by invoking
CREATE_SESSION on each connection, using the same client ID that CREATE_SESSION on each connection, using the same client ID that
was returned in eir_clientid. These invocations create two was returned in eir_clientid. These invocations create two
sessions and also associate each connection with each session. sessions and also associate each connection with each session.
When doing client ID trunking, locking state is shared across When doing client ID trunking, locking state is shared across
sessions associated with the same client ID. This requires the sessions associated with the same client ID. This requires the
server to coordinate state across sessions. server to coordinate state across sessions.
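The following C sketch restates the comparison above. It assumes the two results came from EXCHANGE_ID requests sent over different connections, simplifies the variable-length so_major_id and eir_server_scope opaques to fixed buffers with explicit lengths, and treats a full match (including so_minor_id) as permitting session trunking, which is what the remark about using client ID trunking "instead of session trunking" implies. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified view of the EXCHANGE_ID results compared in the text. */
struct exid_result {
    uint64_t clientid;          /* eir_clientid */
    uint64_t so_minor_id;       /* eir_server_owner.so_minor_id */
    unsigned char major_id[64]; /* eir_server_owner.so_major_id (assumed max) */
    size_t major_len;
    unsigned char scope[64];    /* eir_server_scope (assumed max) */
    size_t scope_len;
};

enum trunking { TRUNK_NONE, TRUNK_CLIENTID_ONLY, TRUNK_SESSION_OR_CLIENTID };

static bool same_opaque(const unsigned char *a, size_t alen,
                        const unsigned char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}

static enum trunking trunking_allowed(const struct exid_result *r1,
                                      const struct exid_result *r2)
{
    /* Client ID, major ID and server scope must all match for any trunking. */
    if (r1->clientid != r2->clientid ||
        !same_opaque(r1->major_id, r1->major_len, r2->major_id, r2->major_len) ||
        !same_opaque(r1->scope, r1->scope_len, r2->scope, r2->scope_len))
        return TRUNK_NONE;
    /* Different minor IDs: separate sessions, same client ID. */
    if (r1->so_minor_id != r2->so_minor_id)
        return TRUNK_CLIENTID_ONLY;
    /* Everything matches: the client may choose either form. */
    return TRUNK_SESSION_OR_CLIENTID;
}
```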
skipping to change at page 51, line 37 skipping to change at page 51, line 37
CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is CB_SEQUENCE (e.g. BIND_CONN_TO_SESSION), then the RPC XID is
needed for correct operation to match the reply to the request. needed for correct operation to match the reply to the request.
o The SEQUENCE or CB_SEQUENCE operation may generate an error. If o The SEQUENCE or CB_SEQUENCE operation may generate an error. If
so, the embedded slot id, sequence id, and sessionid (if present) so, the embedded slot id, sequence id, and sessionid (if present)
in the request will not be in the reply, and the requester has in the request will not be in the reply, and the requester has
only the XID to match the reply to the request. only the XID to match the reply to the request.
Given that well formulated XIDs continue to be required, this begs Given that well formulated XIDs continue to be required, this begs
the question why SEQUENCE and CB_SEQUENCE replies have a sessionid, the question why SEQUENCE and CB_SEQUENCE replies have a sessionid,
slot id and sequence id? Having the sessionid in the reply means the slot id and sequence id? Having the session id in the reply means
requester does not have to use the XID to lookup the sessionid, which the requester does not have to use the XID to lookup the session id,
would be necessary if the connection were associated with multiple which would be necessary if the connection were associated with
sessions. Having the slot id and sequence id in the reply means multiple sessions. Having the slot id and sequence id in the reply
requester does not have to use the XID to lookup the slot id and means requester does not have to use the XID to lookup the slot id
sequence id. Furthermore, since the XID is only 32 bits, it is too and sequence id. Furthermore, since the XID is only 32 bits, it is
small to guarantee the re-association of a reply with its request too small to guarantee the re-association of a reply with its request
([27]); having sessionid, slot id, and sequence id in the reply ([27]); having sessionid, slot id, and sequence id in the reply
allows the client to validate that the reply in fact belongs to the allows the client to validate that the reply in fact belongs to the
matched request. matched request.
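As a minimal illustration of that validation step, the C sketch below checks a SEQUENCE reply against the request it was matched to by XID. It assumes NFS4_SESSIONID_SIZE is 16 and uses uint32_t for sequenceid4 and slotid4, as in the protocol's data type definitions; struct seq_ids and reply_matches_request() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NFS4_SESSIONID_SIZE 16 /* assumed value */

typedef uint32_t sequenceid4;
typedef uint32_t slotid4;

/* Identifying fields carried by both a SEQUENCE request and its reply. */
struct seq_ids {
    unsigned char sessionid[NFS4_SESSIONID_SIZE];
    slotid4 slotid;
    sequenceid4 sequenceid;
};

/* After matching a reply to a request by XID, confirm that the reply really
 * belongs to that request. A 32-bit XID can be reused, so a delayed reply to
 * an older request could otherwise be accepted. */
static bool reply_matches_request(const struct seq_ids *req,
                                  const struct seq_ids *rep)
{
    return memcmp(req->sessionid, rep->sessionid, NFS4_SESSIONID_SIZE) == 0 &&
           req->slotid == rep->slotid &&
           req->sequenceid == rep->sequenceid;
}
```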
The SEQUENCE (and CB_SEQUENCE) operation also carries a The SEQUENCE (and CB_SEQUENCE) operation also carries a
"highest_slotid" value which carries additional requester slot usage "highest_slotid" value which carries additional requester slot usage
information. The requester must always indicate the slot id information. The requester must always indicate the slot id
representing the outstanding request with the highest-numbered slot representing the outstanding request with the highest-numbered slot
value. The requester should in all cases provide the most value. The requester should in all cases provide the most
conservative value possible, although it can be increased somewhat conservative value possible, although it can be increased somewhat
skipping to change at page 53, line 30 skipping to change at page 53, line 30
entries at least as large as the old value of maximum requests entries at least as large as the old value of maximum requests
outstanding, until it can infer that the requester has seen a outstanding, until it can infer that the requester has seen a
reply containing the new granted highest_slotid. The replier can reply containing the new granted highest_slotid. The replier can
infer that the requester has seen such a reply when it receives a new infer that the requester has seen such a reply when it receives a new
request with the same slotid as the request replied to and the request with the same slotid as the request replied to and the
next higher sequenceid. next higher sequenceid.
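The replier-side inference in the preceding paragraph can be sketched in C as follows. The pending_shrink record and slots_to_retain() are hypothetical; the sketch also assumes that sequence ids wrap modulo 2^32, so the "next higher" value after 0xFFFFFFFF is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t sequenceid4;
typedef uint32_t slotid4;

/* Hypothetical record kept by the replier after it sends a reply that
 * lowers the granted highest_slotid. */
struct pending_shrink {
    bool active;
    slotid4 slotid;         /* slot of the reply carrying the new value */
    sequenceid4 sequenceid; /* sequence id of that reply */
    slotid4 old_max_slots;  /* entries that must be kept until confirmed */
    slotid4 new_max_slots;
};

/* Returns how many slot entries must still be retained after a new request
 * on (slotid, sequenceid) arrives. */
static slotid4 slots_to_retain(struct pending_shrink *p, slotid4 slotid,
                               sequenceid4 sequenceid)
{
    if (!p->active)
        return p->new_max_slots;
    /* Same slot, next higher sequence id: the requester must have seen the
     * reply. The cast keeps the increment modulo 2^32. */
    if (slotid == p->slotid && sequenceid == (sequenceid4)(p->sequenceid + 1)) {
        p->active = false;
        return p->new_max_slots;
    }
    return p->old_max_slots;
}
```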
2.10.5.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies 2.10.5.1.1. Caching of SEQUENCE and CB_SEQUENCE Replies
When a SEQUENCE or CB_SEQUENCE operation is successfully executed, When a SEQUENCE or CB_SEQUENCE operation is successfully executed,
its reply MUST always be cached. Specifically, sessionid, its reply MUST always be cached. Specifically, session id, sequence
sequenceid, and slotid MUST be cached in the reply cache. The reply id, and slot id MUST be cached in the reply cache. The reply from
from SEQUENCE also includes the highest slotid, target highest SEQUENCE also includes the highest slot id, target highest slot id,
slotid, and status flags. Instead of caching these values, the and status flags. Instead of caching these values, the server MAY
server MAY re-compute the values from the current state of the fore re-compute the values from the current state of the fore channel,
channel, session and/or client ID as appropriate. Similarly, the session and/or client ID as appropriate. Similarly, the reply from
reply from CB_SEQUENCE includes a highest slotid and target highest CB_SEQUENCE includes a highest slot id and target highest slot id.
slotid. The client MAY re-compute the values from the current state The client MAY re-compute the values from the current state of the
of the session as appropriate. session as appropriate.
Regardless of whether a replier is re-computing highest slotid, Regardless of whether a replier is re-computing highest slotid,
target slotid, and status on replies to retries or not, the requester target slot id, and status on replies to retries or not, the
MUST NOT assume the values are being re-computed whenever it receives requester MUST NOT assume the values are being re-computed whenever
a reply after a retry is sent, since it has no way of knowing whether it receives a reply after a retry is sent, since it has no way of
the reply it has received was sent by the server in response to the knowing whether the reply it has received was sent by the server in
retry, or is a delayed response to the original request. Therefore, response to the retry, or is a delayed response to the original
it may be the case that highest slotid, target slotid, or status bits request. Therefore, it may be the case that highest slot id, target
may reflect the state of affairs when the request was first executed. slot id, or status bits may reflect the state of affairs when the
Although acting based on such delayed information is valid, it may request was first executed. Although acting based on such delayed
cause the receiver to do unneeded work. Requesters MAY choose to information is valid, it may cause the receiver to do unneeded work.
send additional requests to get the current state of affairs or use Requesters MAY choose to send additional requests to get the current
the state of affairs reported by subsequent requests, in preference state of affairs or use the state of affairs reported by subsequent
to acting immediately on data which may be out of date. requests, in preference to acting immediately on data which may be
out of date.
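A minimal C sketch of a replier that caches only the SEQUENCE fields that MUST be cached and re-computes the rest from current state when it answers a retry (the MAY option above). The struct layouts, the current_state snapshot and replay_sequence() are illustrative assumptions, not the XDR definitions; NFS4_SESSIONID_SIZE is assumed to be 16.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NFS4_SESSIONID_SIZE 16 /* assumed value */
typedef uint32_t sequenceid4;
typedef uint32_t slotid4;

/* Fields of a SEQUENCE reply (simplified). */
struct sequence_res {
    unsigned char sessionid[NFS4_SESSIONID_SIZE];
    sequenceid4 sequenceid;
    slotid4 slotid;
    slotid4 highest_slotid;
    slotid4 target_highest_slotid;
    uint32_t status_flags;
};

/* Only the fields that MUST be cached. */
struct seq_cache_entry {
    unsigned char sessionid[NFS4_SESSIONID_SIZE];
    sequenceid4 sequenceid;
    slotid4 slotid;
};

/* Hypothetical snapshot of the current fore channel / client ID state. */
struct current_state {
    slotid4 highest_slotid;
    slotid4 target_highest_slotid;
    uint32_t status_flags;
};

/* Rebuild the reply to a retried SEQUENCE: identifiers come from the cache,
 * the remaining fields are re-computed from current state. */
static struct sequence_res replay_sequence(const struct seq_cache_entry *c,
                                           const struct current_state *now)
{
    struct sequence_res r;
    memcpy(r.sessionid, c->sessionid, NFS4_SESSIONID_SIZE);
    r.sequenceid = c->sequenceid;
    r.slotid = c->slotid;
    r.highest_slotid = now->highest_slotid;
    r.target_highest_slotid = now->target_highest_slotid;
    r.status_flags = now->status_flags;
    return r;
}
```

A requester receiving such a reply still cannot tell whether it was built from current state or is a delayed reply to the original request, so it treats highest_slotid, target_highest_slotid, and status_flags as possibly stale.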
2.10.5.1.2. Errors from SEQUENCE and CB_SEQUENCE 2.10.5.1.2. Errors from SEQUENCE and CB_SEQUENCE
Any time SEQUENCE or CB_SEQUENCE return an error, the sequence id of Any time SEQUENCE or CB_SEQUENCE return an error, the sequence id of
the slot MUST NOT change. The replier MUST NOT modify the reply the slot MUST NOT change. The replier MUST NOT modify the reply
cache entry for the slot whenever an error is returned from SEQUENCE cache entry for the slot whenever an error is returned from SEQUENCE
or CB_SEQUENCE. or CB_SEQUENCE.
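The rule that an error leaves the slot untouched can be expressed as a small C sketch. It assumes the usual slot sequencing rule (a repeated sequence id is a retry, the next sequence id is a new request, anything else is an error); the result names are illustrative and are not nfsstat4 values, and cached_reply stands in for a real reply buffer.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t sequenceid4;

/* Illustrative results for this sketch only. */
enum seq_result { SEQ_OK, SEQ_REPLAY, SEQ_MISORDERED };

struct slot {
    sequenceid4 seqid; /* sequence id of the last request executed */
    int cached_reply;  /* stand-in for the cached reply */
};

/* The slot is updated only when the request is accepted as new. On an
 * error, neither the sequence id nor the cached reply is modified. */
static enum seq_result sequence_check(struct slot *s, sequenceid4 seqid,
                                      int new_reply)
{
    if (seqid == s->seqid)
        return SEQ_REPLAY;     /* retry: answer from the cache */
    if (seqid != (sequenceid4)(s->seqid + 1))
        return SEQ_MISORDERED; /* error: slot left unchanged */
    s->seqid = seqid;
    s->cached_reply = new_reply;
    return SEQ_OK;
}
```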
2.10.5.1.3. Optional Reply Caching 2.10.5.1.3. Optional Reply Caching
skipping to change at page 56, line 19 skipping to change at page 56, line 19
client may have been granted a delegation to a file it has opened, client may have been granted a delegation to a file it has opened,
but the reply to the OPEN (informing the client of the granting of but the reply to the OPEN (informing the client of the granting of
the delegation) may be delayed in the network. If a conflicting the delegation) may be delayed in the network. If a conflicting
operation arrives at the server, it will recall the delegation using operation arrives at the server, it will recall the delegation using
the backchannel, which may be on a different transport connection, the backchannel, which may be on a different transport connection,
perhaps even a different network, or even a different session perhaps even a different network, or even a different session
associated with the same client ID. associated with the same client ID.
The presence of a session between client and server alleviates this The presence of a session between client and server alleviates this
issue. When a session is in place, each client request is uniquely issue. When a session is in place, each client request is uniquely
identified by its { sessionid, slot id, sequence id } triple. By the identified by its { session id, slot id, sequence id } triple. By
rules under which slot entries (reply cache entries) are retired, the the rules under which slot entries (reply cache entries) are retired,
server has knowledge whether the client has "seen" each of the the server has knowledge whether the client has "seen" each of the
server's replies. The server can therefore provide sufficient server's replies. The server can therefore provide sufficient
information to the client to allow it to disambiguate between an information to the client to allow it to disambiguate between an
erroneous or conflicting callback race condition. erroneous or conflicting callback race condition.
For each client operation which might result in some sort of server For each client operation which might result in some sort of server
callback, the server SHOULD "remember" the { sessionid, slot id, callback, the server SHOULD "remember" the { sessionid, slot id,
sequence id } triple of the client request until the slot id sequence id } triple of the client request until the slot id
retirement rules allow the server to determine that the client has, retirement rules allow the server to determine that the client has,
in fact, seen the server's reply. Until the time the { sessionid, in fact, seen the server's reply. Until the time the { sessionid,
slot id, sequence id } request triple can be retired, any recalls of slot id, sequence id } request triple can be retired, any recalls of
the associated object MUST carry an array of these referring the associated object MUST carry an array of these referring
identifiers (in the CB_SEQUENCE operation's arguments), for the identifiers (in the CB_SEQUENCE operation's arguments), for the
benefit of the client. After this time, it is not necessary for the benefit of the client. After this time, it is not necessary for the
server to provide this information in related callbacks, since it is server to provide this information in related callbacks, since it is
certain that a race condition can no longer occur. certain that a race condition can no longer occur.
The CB_SEQUENCE operation which begins each server callback carries a The CB_SEQUENCE operation which begins each server callback carries a
list of "referring" { sessionid, slot id, sequence id } triples. If list of "referring" { sessionid, slot id, sequence id } triples. If
the client finds the request corresponding to the referring the client finds the request corresponding to the referring session
sessionid, slot id and sequence id to be currently outstanding (i.e. id, slot id and sequence id to be currently outstanding (i.e. the
the server's reply has not been seen by the client), it can determine server's reply has not been seen by the client), it can determine
that the callback has raced the reply, and act accordingly. If the that the callback has raced the reply, and act accordingly. If the
client does not find the request corresponding to the referring triple client does not find the request corresponding to the referring triple
to be outstanding (including the case of a sessionid referring to a to be outstanding (including the case of a sessionid referring to a
destroyed session), then there is no race with respect to this destroyed session), then there is no race with respect to this
triple. The server SHOULD limit the referring triples to requests triple. The server SHOULD limit the referring triples to requests
that refer to just those that apply to the objects referred to in the that refer to just those that apply to the objects referred to in the
CB_COMPOUND procedure. CB_COMPOUND procedure.
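On the client side, the race check amounts to comparing each referring triple against the requests whose replies have not yet been seen. The C sketch below assumes the client keeps those requests in a simple array and that NFS4_SESSIONID_SIZE is 16; struct triple and callback_raced() are hypothetical names. Detecting a race does not entitle the client to wait indefinitely for the outstanding reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NFS4_SESSIONID_SIZE 16 /* assumed value */

/* A { session id, slot id, sequence id } triple. */
struct triple {
    unsigned char sessionid[NFS4_SESSIONID_SIZE];
    uint32_t slotid;
    uint32_t sequenceid;
};

static bool same_triple(const struct triple *a, const struct triple *b)
{
    return memcmp(a->sessionid, b->sessionid, NFS4_SESSIONID_SIZE) == 0 &&
           a->slotid == b->slotid && a->sequenceid == b->sequenceid;
}

/* outstanding[] holds requests whose replies the client has not yet seen.
 * The callback raced a reply if any referring triple names one of them.
 * Triples naming completed requests or destroyed sessions do not match. */
static bool callback_raced(const struct triple *referring, size_t nref,
                           const struct triple *outstanding, size_t nout)
{
    for (size_t i = 0; i < nref; i++)
        for (size_t j = 0; j < nout; j++)
            if (same_triple(&referring[i], &outstanding[j]))
                return true;
    return false;
}
```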
The client must not simply wait forever for the expected server reply The client must not simply wait forever for the expected server reply
to arrive before responding to the CB_COMPOUND that won the race, to arrive before responding to the CB_COMPOUND that won the race,
skipping to change at page 57, line 28 skipping to change at page 57, line 28
back), the client and server negotiate the maximum sized request they back), the client and server negotiate the maximum sized request they
will send or process (ca_maxrequestsize), the maximum sized reply will send or process (ca_maxrequestsize), the maximum sized reply
they will return or process (ca_maxresponsesize), and the maximum they will return or process (ca_maxresponsesize), and the maximum
sized reply they will store in the reply cache sized reply they will store in the reply cache
(ca_maxresponsesize_cached). (ca_maxresponsesize_cached).
If a request exceeds ca_maxrequestsize, the reply will have the If a request exceeds ca_maxrequestsize, the reply will have the
status NFS4ERR_REQ_TOO_BIG. A replier MAY return NFS4ERR_REQ_TOO_BIG status NFS4ERR_REQ_TOO_BIG. A replier MAY return NFS4ERR_REQ_TOO_BIG
as the status for the first operation (SEQUENCE or CB_SEQUENCE) in the as the status for the first operation (SEQUENCE or CB_SEQUENCE) in the
request (which means no operations in the request executed, and the request (which means no operations in the request executed, and the
state of the slot in the reply cache is unchanged), or it MAY chose state of the slot in the reply cache is unchanged), or it MAY opt to
to return it on a subsequent operation in the same COMPOUND or return it on a subsequent operation in the same COMPOUND or
CB_COMPOUND request (which means at least one operation did execute CB_COMPOUND request (which means at least one operation did execute
and the state of the slot in reply cache does change). The replier and the state of the slot in reply cache does change). The replier
SHOULD set NFS4ERR_REQ_TOO_BIG on the operation that exceeds SHOULD set NFS4ERR_REQ_TOO_BIG on the operation that exceeds
ca_maxrequestsize. ca_maxrequestsize.
If a reply exceeds ca_maxresponsesize, the reply will have the status If a reply exceeds ca_maxresponsesize, the reply will have the status
NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the NFS4ERR_REP_TOO_BIG. A replier MAY return NFS4ERR_REP_TOO_BIG as the
status for the first operation (SEQUENCE or CB_SEQUENCE) in the request, status for the first operation (SEQUENCE or CB_SEQUENCE) in the request,
or it MAY chose to return it on a subsequent operation (in the same or it MAY opt to return it on a subsequent operation (in the same
COMPOUND or CB_COMPOUND reply). A replier MAY return COMPOUND or CB_COMPOUND reply). A replier MAY return
NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if NFS4ERR_REP_TOO_BIG in the reply to SEQUENCE or CB_SEQUENCE, even if
the response would still exceed ca_maxresponsesize. the response would still exceed ca_maxresponsesize.
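A minimal replier-side sketch in C of the two size checks above. It takes the option of reporting either error on the first operation, so that nothing executes and the slot is unchanged. The estimated_reply_len parameter is an assumption: it stands for an upper bound the replier can compute before executing (for example, from a READ count), which a real server may or may not be able to derive. Status names are illustrative, not nfsstat4 values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative status names for this sketch only. */
enum size_status { SIZE_OK, REQ_TOO_BIG, REP_TOO_BIG };

/* Limits negotiated in CREATE_SESSION for one channel. */
struct channel_limits {
    uint32_t ca_maxrequestsize;
    uint32_t ca_maxresponsesize;
};

/* Checked before the first operation executes; "exceeds" means strictly
 * greater than the negotiated limit. */
static enum size_status check_sizes(const struct channel_limits *lim,
                                    size_t request_len,
                                    size_t estimated_reply_len)
{
    if (request_len > lim->ca_maxrequestsize)
        return REQ_TOO_BIG;
    if (estimated_reply_len > lim->ca_maxresponsesize)
        return REP_TOO_BIG;
    return SIZE_OK;
}
```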
If sa_cachethis or csa_cachethis are TRUE, then the replier MUST If sa_cachethis or csa_cachethis are TRUE, then the replier MUST
cache a reply except if an error is returned by the SEQUENCE or cache a reply except if an error is returned by the SEQUENCE or
CB_SEQUENCE operation (see Section 2.10.5.1.2). If the reply exceeds CB_SEQUENCE operation (see Section 2.10.5.1.2). If the reply exceeds
ca_maxresponsesize_cached, (and sa_cachethis or csa_cachethis are ca_maxresponsesize_cached, (and sa_cachethis or csa_cachethis are
TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even TRUE) then the server MUST return NFS4ERR_REP_TOO_BIG_TO_CACHE. Even
if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter) if NFS4ERR_REP_TOO_BIG_TO_CACHE (or any other error for that matter)
skipping to change at page 59, line 37 skipping to change at page 59, line 37
sequence id) MUST be rejected with NFS4ERR_DEADSESSION (returned by sequence id) MUST be rejected with NFS4ERR_DEADSESSION (returned by
SEQUENCE). Such a session is considered dead. A server MAY re- SEQUENCE). Such a session is considered dead. A server MAY re-
animate a session after a server restart so that the session will animate a session after a server restart so that the session will
accept new requests as well as retries. To re-animate a session the accept new requests as well as retries. To re-animate a session the
server needs to persist additional information through server server needs to persist additional information through server
restart: restart:
o The client ID. This is a prerequisite to let the client create o The client ID. This is a prerequisite to let the client create
more sessions associated with the same client ID as the more sessions associated with the same client ID as the
o The client ID's sequenceid that is used for creating sessions (see o The client ID's sequence id that is used for creating sessions
Section 18.35 and Section 18.36. This is a prerequisite to let (see Section 18.35 and Section 18.36). This is a prerequisite to
the client create more sessions. let the client create more sessions.
o The principal that created the client ID. This allows the server o The principal that created the client ID. This allows the server
to authenticate the client when it sends EXCHANGE_ID. to authenticate the client when it sends EXCHANGE_ID.
o The SSV, if SP4_SSV state protection was specified when the client o The SSV, if SP4_SSV state protection was specified when the client
ID was created (see Section 18.35). This lets the client create ID was created (see Section 18.35). This lets the client create
new sessions, and associate connections with the new and existing new sessions, and associate connections with the new and existing
sessions. sessions.
o The properties of the client ID as defined in Section 18.35. o The properties of the client ID as defined in Section 18.35.
skipping to change at page 76, line 21 skipping to change at page 76, line 21
o A catastrophe that causes the reply cache to be corrupted or lost o A catastrophe that causes the reply cache to be corrupted or lost
on the media it was stored on. This applies even if the replier on the media it was stored on. This applies even if the replier
indicated in the CREATE_SESSION results that it would persist the indicated in the CREATE_SESSION results that it would persist the
cache. cache.
o The server purges the session of a client that has been inactive o The server purges the session of a client that has been inactive
for a very extended period of time. for a very extended period of time.
Loss of reply cache is equivalent to loss of session. The replier Loss of reply cache is equivalent to loss of session. The replier
indicates loss of session to the requester by returning indicates loss of session to the requester by returning
NFS4ERR_BADSESSION on the next operation that uses the sessionid that NFS4ERR_BADSESSION on the next operation that uses the session id
refers to the lost session. that refers to the lost session.
After an event like a server restart, the client may have lost its After an event like a server restart, the client may have lost its
connections. The client assumes for the moment that the session has connections. The client assumes for the moment that the session has
not been lost. It reconnects, and if it specified connection not been lost. It reconnects, and if it specified connection
association enforcement when the session was created, it invokes association enforcement when the session was created, it invokes
BIND_CONN_TO_SESSION using the sessionid. Otherwise, it invokes BIND_CONN_TO_SESSION using the sessionid. Otherwise, it invokes
SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns SEQUENCE. If BIND_CONN_TO_SESSION or SEQUENCE returns
NFS4ERR_BADSESSION, the client knows the session was lost. If the NFS4ERR_BADSESSION, the client knows the session was lost. If the
connection survives session loss, then the next SEQUENCE operation connection survives session loss, then the next SEQUENCE operation
the client sends over the connection will get back the client sends over the connection will get back
skipping to change at page 80, line 19 skipping to change at page 80, line 19
| | Various defined file types. | | | Various defined file types. |
| nfsstat4 | enum nfsstat4; | | nfsstat4 | enum nfsstat4; |
| | Return value for operations. | | | Return value for operations. |
| offset4 | typedef uint64_t offset4; | | offset4 | typedef uint64_t offset4; |
| | Various offset designations (READ, WRITE, LOCK, | | | Various offset designations (READ, WRITE, LOCK, |
| | COMMIT). | | | COMMIT). |
| qop4 | typedef uint32_t qop4; | | qop4 | typedef uint32_t qop4; |
| | Quality of protection designation in SECINFO. | | | Quality of protection designation in SECINFO. |
| sec_oid4 | typedef opaque sec_oid4<>; | | sec_oid4 | typedef opaque sec_oid4<>; |
| | Security Object Identifier. The sec_oid4 data | | | Security Object Identifier. The sec_oid4 data |
| | type is not really opaque. Instead it contains | | | type is not really opaque. Instead it contains an |
| | an ASN.1 OBJECT IDENTIFIER as used by GSS-API in | | | ASN.1 OBJECT IDENTIFIER as used by GSS-API in the |
| | the mech_type argument to GSS_Init_sec_context. | | | mech_type argument to GSS_Init_sec_context. See |
| | See [7] for details. | | | [7] for details. |
| sequenceid4 | typedef uint32_t sequenceid4; | | sequenceid4 | typedef uint32_t sequenceid4; |
| | Sequence number used for various session | | | Sequence number used for various session |
| | operations (EXCHANGE_ID, CREATE_SESSION, | | | operations (EXCHANGE_ID, CREATE_SESSION, |
| | SEQUENCE, CB_SEQUENCE). | | | SEQUENCE, CB_SEQUENCE). |
| seqid4 | typedef uint32_t seqid4; | | seqid4 | typedef uint32_t seqid4; |
| | Sequence identifier used for file locking. | | | Sequence identifier used for file locking. |
| sessionid4 | typedef opaque sessionid4[NFS4_SESSIONID_SIZE]; | | sessionid4 | typedef opaque sessionid4[NFS4_SESSIONID_SIZE]; |
| | Session identifier. | | | Session identifier. |
| slotid4 | typedef uint32_t slotid4; | | slotid4 | typedef uint32_t slotid4; |
| | Sequencing artifact for various session | | | Sequencing artifact for various session |
skipping to change at page 100, line 47 skipping to change at page 100, line 47
Some REQUIRED and RECOMMENDED attributes are set-only, i.e. they can Some REQUIRED and RECOMMENDED attributes are set-only, i.e. they can
be set via SETATTR but not retrieved via GETATTR. Similarly, some be set via SETATTR but not retrieved via GETATTR. Similarly, some
REQUIRED and RECOMMENDED attributes are get-only, i.e. they can be REQUIRED and RECOMMENDED attributes are get-only, i.e. they can be
retrieved via GETATTR but not set via SETATTR. If a client attempts to retrieved via GETATTR but not set via SETATTR. If a client attempts to
set a get-only attribute or get a set-only attribute, the server set a get-only attribute or get a set-only attribute, the server
MUST return NFS4ERR_INVAL. MUST return NFS4ERR_INVAL.
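The get-only/set-only rule can be sketched in C as a check against the R/W access designations used in the Acc column described in Section 5.6. The enum names and check_attr_access() are hypothetical, and ATTR_INVAL stands in for NFS4ERR_INVAL.

```c
#include <assert.h>

/* Illustrative status names for this sketch only. */
enum attr_status { ATTR_OK, ATTR_INVAL };

/* Access designation of an attribute, as a bit mask. */
enum attr_access {
    ACC_R = 1, /* get-only: GETATTR allowed */
    ACC_W = 2, /* set-only: SETATTR allowed */
    ACC_RW = 3 /* both */
};

enum attr_op { OP_GETATTR, OP_SETATTR };

/* Setting a get-only attribute or getting a set-only one is rejected. */
static enum attr_status check_attr_access(enum attr_access acc, enum attr_op op)
{
    int needed = (op == OP_GETATTR) ? ACC_R : ACC_W;
    return (acc & needed) ? ATTR_OK : ATTR_INVAL;
}
```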
5.6. REQUIRED Attributes - List and Definition References 5.6. REQUIRED Attributes - List and Definition References
The list of REQUIRED attributes appears in Table 4. The meaning of The list of REQUIRED attributes appears in Table 4. The meaning of
hte columns of the table are: the columns of the table are:
o Name: the name of the attribute o Name: the name of the attribute
o Id: the number assigned to the attribute. In the event of o Id: the number assigned to the attribute. In the event of
conflicts between the assigned number and [12], the latter is conflicts between the assigned number and [12], the latter is
authoritative. authoritative.
o Data Type: The XDR data type of the attribute. o Data Type: The XDR data type of the attribute.
o Acc: Access allowed to the attribute. R means read-only (GETATTR o Acc: Access allowed to the attribute. R means read-only (GETATTR
skipping to change at page 143, line 25 skipping to change at page 143, line 25
ACE4_INHERIT_ONLY_ACE set. (In the case of a dacl or sacl attribute, ACE4_INHERIT_ONLY_ACE set. (In the case of a dacl or sacl attribute,
both of those ACEs SHOULD also have the ACE4_INHERITED_ACE flag set.) both of those ACEs SHOULD also have the ACE4_INHERITED_ACE flag set.)
This makes it simpler to modify the effective permissions on the This makes it simpler to modify the effective permissions on the
directory without modifying the ACE which is to be inherited to the directory without modifying the ACE which is to be inherited to the
new directory's children. new directory's children.
6.4.3.2. Automatic Inheritance 6.4.3.2. Automatic Inheritance
The acl attribute consists only of an array of ACEs, but the sacl The acl attribute consists only of an array of ACEs, but the sacl
(Section 6.2.3) and dacl (Section 6.2.2) attributes also include an (Section 6.2.3) and dacl (Section 6.2.2) attributes also include an
additional flag field. The flag field applies to the entire sacl or additional flag field.
dacl; three flag values are defined:
struct nfsacl41 {
aclflag4 na41_flag;
nfsace4 na41_aces<>;
};
The flag field applies to the entire sacl or dacl; three flag values
are defined:
const ACL4_AUTO_INHERIT = 0x00000001; const ACL4_AUTO_INHERIT = 0x00000001;
const ACL4_PROTECTED = 0x00000002; const ACL4_PROTECTED = 0x00000002;
const ACL4_DEFAULTED = 0x00000004; const ACL4_DEFAULTED = 0x00000004;
and all other bits must be cleared. The ACE4_INHERITED_ACE flag may and all other bits must be cleared. The ACE4_INHERITED_ACE flag may
be set in the ACEs of the sacl or dacl (whereas it must always be be set in the ACEs of the sacl or dacl (whereas it must always be
cleared in the acl). cleared in the acl).
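A short C sketch of validating the sacl or dacl flag word against the three defined values. It assumes aclflag4 is a 32-bit unsigned integer and uses the constant values listed above; aclflag_valid() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t aclflag4; /* assumed width */

#define ACL4_AUTO_INHERIT 0x00000001u
#define ACL4_PROTECTED    0x00000002u
#define ACL4_DEFAULTED    0x00000004u

/* The only defined bits; all other bits must be cleared. */
#define ACL4_VALID_FLAGS (ACL4_AUTO_INHERIT | ACL4_PROTECTED | ACL4_DEFAULTED)

static bool aclflag_valid(aclflag4 flag)
{
    return (flag & ~ACL4_VALID_FLAGS) == 0;
}
```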
Together these features allow a server to support automatic Together these features allow a server to support automatic
skipping to change at page 146, line 27 skipping to change at page 146, line 32
In NFSv3, the client expects all LOOKUP operations to remain within a In NFSv3, the client expects all LOOKUP operations to remain within a
single server file system. For example, the device attribute will single server file system. For example, the device attribute will
not change. This prevents a client from taking namespace paths that not change. This prevents a client from taking namespace paths that
span exports. span exports.
In the case of NFSv3, an automounter on the client can obtain a In the case of NFSv3, an automounter on the client can obtain a
snapshot of the server's namespace using the EXPORTS procedure of the snapshot of the server's namespace using the EXPORTS procedure of the
MOUNT protocol. If it understands the server's pathname syntax, it MOUNT protocol. If it understands the server's pathname syntax, it
can create an image of the server's namespace on the client. The can create an image of the server's namespace on the client. The
parts of the namespace that are not exported by the server are filled parts of the namespace that are not exported by the server are filled
in with directories that might be constructed similarly to a NFSv4.1 in with directories that might be constructed similarly to an NFSv4.1
"pseudo file system" (see Section 7.3) that allows the user to browse "pseudo file system" (see Section 7.3) that allows the user to browse
from one mounted file system to another. There is a drawback to this from one mounted file system to another. There is a drawback to this
representation of the server's namespace on the client: it is static. representation of the server's namespace on the client: it is static.
If the server administrator adds a new export, the client will be If the server administrator adds a new export, the client will be
unaware of it. unaware of it.
7.3. Server Pseudo File System 7.3. Server Pseudo File System
NFSv4.1 servers avoid this namespace inconsistency by presenting all NFSv4.1 servers avoid this namespace inconsistency by presenting all
the exports for a given server within the framework of a single the exports for a given server within the framework of a single
skipping to change at page 150, line 28 skipping to change at page 150, line 33
which represents a client as a whole to the eventual lightweight which represents a client as a whole to the eventual lightweight
stateid used for most client and server locking interactions. The stateid used for most client and server locking interactions. The
details of this transition will vary with the type of object but it details of this transition will vary with the type of object but it
always starts with a client ID. always starts with a client ID.
8.1. Client and Session ID 8.1. Client and Session ID
A client must establish a client ID (see Section 2.4) and then one or A client must establish a client ID (see Section 2.4) and then one or
more sessionids (see Section 2.10) before performing any operations more sessionids (see Section 2.10) before performing any operations
to open, lock, delegate, or obtain a layout for a file object. Each to open, lock, delegate, or obtain a layout for a file object. Each
sessionid is associated with a specific client ID, and thus serves as session id is associated with a specific client ID, and thus serves
a shorthand reference to an NFSv4.1 client. as a shorthand reference to an NFSv4.1 client.
For some types of locking interactions, the client will represent For some types of locking interactions, the client will represent
some number of internal locking entities called "owners", which some number of internal locking entities called "owners", which
normally correspond to processes internal to the client. For other normally correspond to processes internal to the client. For other
types of locking-related objects, such as delegations and layouts, no types of locking-related objects, such as delegations and layouts, no
such intermediate entities are provided for, and the locking-related such intermediate entities are provided for, and the locking-related
objects are considered to be transferred directly between the server objects are considered to be transferred directly between the server
and a unitary client. and a unitary client.
8.2. Stateid Definition 8.2. Stateid Definition
skipping to change at page 156, line 26 skipping to change at page 156, line 31
appropriate error returned when necessary. Special and non-special appropriate error returned when necessary. Special and non-special
stateids are handled separately. (See Section 8.2.3 for a discussion stateids are handled separately. (See Section 8.2.3 for a discussion
of special stateids.) of special stateids.)
Note that stateids are implicitly qualified by the current client ID, Note that stateids are implicitly qualified by the current client ID,
as derived from the client ID associated with the current session. as derived from the client ID associated with the current session.
Note, however, that the semantics of the session will prevent stateids Note, however, that the semantics of the session will prevent stateids
associated with a previous client or server instance from being associated with a previous client or server instance from being
analyzed by this procedure. analyzed by this procedure.
If server restart has resulted in an invalid client ID or a sessionid If server restart has resulted in an invalid client ID or a session
which is invalid, SEQUENCE will return an error and the operation id which is invalid, SEQUENCE will return an error and the operation
that takes a stateid as an argument will never be processed. that takes a stateid as an argument will never be processed.
If there has been a server restart where there is a persistent If there has been a server restart where there is a persistent
session, and all leased state has been lost, then the session in session, and all leased state has been lost, then the session in
question will, although valid, be marked as dead, and any operation question will, although valid, be marked as dead, and any operation
not satisfied by means of the reply cache will receive the error not satisfied by means of the reply cache will receive the error
NFS4ERR_DEADSESSION, and thus not be processed as indicated below. NFS4ERR_DEADSESSION, and thus not be processed as indicated below.
When a stateid is being tested, and the "other" field is all zeros or When a stateid is being tested, and the "other" field is all zeros or
all ones, a check that the "other" and "seqid" fields match a defined all ones, a check that the "other" and "seqid" fields match a defined
skipping to change at page 249, line 20 skipping to change at page 249, line 20
referring (absent) file system nor is there any access to the referring (absent) file system nor is there any access to the
fh_expire_type attribute. fh_expire_type attribute.
o All file system instances should be considered as being of o All file system instances should be considered as being of
different _change_ classes. different _change_ classes.
For other class assignments, handling of file system transitions For other class assignments, handling of file system transitions
depends on the reasons for the transition: depends on the reasons for the transition:
o When the transition is due to migration, that is, the client was o When the transition is due to migration, that is, the client was
directed to a new file system after receiving a NFS4ERR_MOVED error, directed to a new file system after receiving an NFS4ERR_MOVED
the target should be treated as being of the same _write-verifier_ error, the target should be treated as being of the same _write-
class as the source. verifier_ class as the source.
o When the transition is due to failover to another replica, that o When the transition is due to failover to another replica, that
is, the client selected another replica without receiving an is, the client selected another replica without receiving an
NFS4ERR_MOVED error, the target should be treated as being of a NFS4ERR_MOVED error, the target should be treated as being of a
different _write-verifier_ class from the source. different _write-verifier_ class from the source.
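The default classification rule above can be sketched in C as follows. The enum and function names are hypothetical. The hint parameter stands in for whatever class information a client might extract from fs_locations_info. As the text notes, such explicit information is preferable when it is available.

```c
#include <stdbool.h>

/* Hypothetical: why the client moved to another file system instance. */
enum fs_transition_reason {
    TRANSITION_MIGRATION, /* directed there after receiving NFS4ERR_MOVED */
    TRANSITION_FAILOVER   /* client picked another replica without NFS4ERR_MOVED */
};

/* Hypothetical: class information the client may have from fs_locations_info. */
enum class_hint { HINT_UNKNOWN, HINT_SAME_CLASS, HINT_DIFFERENT_CLASS };

/* True if source and target are treated as one write-verifier class. */
static bool same_write_verifier_class(enum fs_transition_reason reason,
                                      enum class_hint hint)
{
    if (hint != HINT_UNKNOWN)              /* explicit information wins */
        return hint == HINT_SAME_CLASS;
    return reason == TRANSITION_MIGRATION; /* default rule from the list above */
}
```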
The specific choices reflect typical implementation patterns for The specific choices reflect typical implementation patterns for
failover and controlled migration respectively. Since other choices failover and controlled migration respectively. Since other choices
are possible and useful, this information is better obtained by using are possible and useful, this information is better obtained by using
fs_locations_info. When a server implementation needs to communicate fs_locations_info. When a server implementation needs to communicate
skipping to change at page 263, line 24 skipping to change at page 263, line 24
open denies WRITE and the data is changed), that lock SHOULD be open denies WRITE and the data is changed), that lock SHOULD be
considered administratively revoked. considered administratively revoked.
The opaque strings fss_source and fss_current provide a way of The opaque strings fss_source and fss_current provide a way of
presenting information about the source of the file system image presenting information about the source of the file system image
being present. It is not intended that the client do anything with this being present. It is not intended that the client do anything with this
information other than make it available to administrative tools. It information other than make it available to administrative tools. It
is intended that this information be helpful when researching is intended that this information be helpful when researching
possible problems with a file system image that might arise when it possible problems with a file system image that might arise when it
is unclear if the correct image is being accessed and if not, how is unclear if the correct image is being accessed and if not, how
that image came to be made. This kind of dianostic information will that image came to be made. This kind of diagnostic information will
be helpful, if, as seems likely, copies of file systems are made in be helpful, if, as seems likely, copies of file systems are made in
many different ways (e.g., simple user-level copies, file system-level many different ways (e.g., simple user-level copies, file system-level
point-in-time copies, clones of the underlying storage), under a point-in-time copies, clones of the underlying storage), under a
variety of administrative arrangements. In such environments, variety of administrative arrangements. In such environments,
determining how a given set of data was constructed can be very determining how a given set of data was constructed can be very
helpful in resolving problems. helpful in resolving problems.
The opaque string fss_source is used to indicate the source of a The opaque string fss_source is used to indicate the source of a
given file system with the expectation that tools capable of creating given file system with the expectation that tools capable of creating
a file system image propagate this information, when that is a file system image propagate this information, when that is
skipping to change at page 265, line 45 skipping to change at page 265, line 45
||| | ||| |
||| | ||| |
||| Storage +-----------+ | ||| Storage +-----------+ |
||| Protocol |+-----------+ | ||| Protocol |+-----------+ |
||+----------------||+-----------+ Control | ||+----------------||+-----------+ Control |
|+-----------------||| | Protocol| |+-----------------||| | Protocol|
+------------------+|| Storage |------------+ +------------------+|| Storage |------------+
+| Devices | +| Devices |
+-----------+ +-----------+
Figure 67 Figure 68
In this model, the clients, server, and storage devices are In this model, the clients, server, and storage devices are
responsible for managing file access. This is in contrast to NFSv4 responsible for managing file access. This is in contrast to NFSv4
without pNFS where it is primarily the server's responsibility; some without pNFS where it is primarily the server's responsibility; some
of this responsibility may be delegated to the client under strictly of this responsibility may be delegated to the client under strictly
specified conditions. specified conditions.
pNFS takes the form of OPTIONAL operations that manage protocol pNFS takes the form of OPTIONAL operations that manage protocol
objects called 'layouts' which contain data location information. objects called 'layouts' which contain a byte-range and storage
The layout is managed in a similar fashion as NFSv4.1 data location information. The layout is managed in a similar fashion as
delegations are managed. For example, the layout is leased, NFSv4.1 data delegations. For example, the layout is leased,
recallable and revocable. However, layouts are distinct abstractions recallable and revocable. However, layouts are distinct abstractions
and are manipulated with new operations. When a client holds a and are manipulated with new operations. When a client holds a
layout, it is granted the ability to access the data location layout, it is granted the ability to directly access the byte-range
directly using the location information specified in the layout. at the storage location specified in the layout.
There are interactions between layouts and other NFSv4.1 abstractions There are interactions between layouts and other NFSv4.1 abstractions
such as data delegations and byte-range locking. Delegation issues such as data delegations and byte-range locking. Delegation issues
are discussed in Section 12.5.5. Byte range locking issues are are discussed in Section 12.5.5. Byte range locking issues are
discussed in Section 12.2.9 and Section 12.5.1. discussed in Section 12.2.9 and Section 12.5.1.
The NFSv4.1 pNFS feature has been structured to allow for a variety The NFSv4.1 pNFS feature has been structured to allow for a variety
of storage protocols to be defined and used. As noted in the diagram of storage protocols to be defined and used. As noted in the diagram
above, the storage protocol is the method used by the client to store above, the storage protocol is the method used by the client to store
and retrieve data directly from the storage devices. The NFSv4.1 and retrieve data directly from the storage devices. The NFSv4.1
skipping to change at page 266, line 46 skipping to change at page 266, line 46
o Object protocols such as OSD over iSCSI or Fibre Channel [40]. o Object protocols such as OSD over iSCSI or Fibre Channel [40].
o Other storage protocols, including PVFS and other file systems o Other storage protocols, including PVFS and other file systems
that are in use in HPC environments. that are in use in HPC environments.
It is possible that various storage protocols are available to both It is possible that various storage protocols are available to both
client and server and it may be possible that a client and server do client and server and it may be possible that a client and server do
not have a matching storage protocol available to them. Because of not have a matching storage protocol available to them. Because of
this, the pNFS server MUST support normal NFSv4.1 access to any file this, the pNFS server MUST support normal NFSv4.1 access to any file
accessible by the pNFS feature; this will allow for continued accessible by the pNFS feature; this will allow for continued
interoperability between a NFSv4.1 client and server. interoperability between an NFSv4.1 client and server.
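The following client-side sketch shows one consequence of this requirement. It assumes the client already knows which layout types the server supports, for example from the fs_layout_type attribute. The numeric values are assumed from the final NFSv4.1 XDR. NO_LAYOUT_TYPE is a hypothetical sentinel meaning "use plain NFSv4.1 I/O through the server".

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t layouttype4;

/* ASSUMPTION: values from the final NFSv4.1 XDR; 0 is a hypothetical sentinel. */
#define LAYOUT4_NFSV4_1_FILES 0x1u
#define LAYOUT4_OSD2_OBJECTS  0x2u
#define LAYOUT4_BLOCK_VOLUME  0x3u
#define NO_LAYOUT_TYPE        0x0u

/* Return the first client-preferred layout type that the server also
 * supports. NO_LAYOUT_TYPE means no storage protocol matches, so the
 * client falls back to ordinary NFSv4.1 READ/WRITE, which the pNFS
 * server MUST still support. */
static layouttype4 choose_layout_type(const layouttype4 *client_pref, size_t npref,
                                      const layouttype4 *server_types, size_t nserver)
{
    for (size_t i = 0; i < npref; i++)
        for (size_t j = 0; j < nserver; j++)
            if (client_pref[i] == server_types[j])
                return client_pref[i];
    return NO_LAYOUT_TYPE;
}
```

Iterating over the client's preference list first lets the client, rather than the server, decide which of several mutually supported storage protocols to use.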
12.2. pNFS Definitions 12.2. pNFS Definitions
NFSv4.1's pNFS feature partitions the file system protocol into two NFSv4.1's pNFS feature partitions the file system protocol into two
parts: metadata and data. Where data is the contents of a file and parts: metadata and data. Where data being the contents of a file
metadata is "everything else". The metadata functionality is and the metadata is "everything else". The metadata functionality is
implemented by a metadata server that supports pNFS and the implemented by an NFSv4.1 server that supports pNFS and the operations
operations described in (Section 18). The data functionality is described in (Section 18) (a metadata server). The data
implemented by a storage device that supports the storage protocol. functionality is implemented by one or more storage devices, each of
A subset (defined in Section 13.6) of NFSv4.1 itself is one such which is accessed by the client via a storage protocol. A subset
storage protocol. New terms are introduced to the NFSv4.1 (defined in Section 13.6) of NFSv4.1 is one such storage protocol.
nomenclature and existing terms are clarified to allow for the New terms are introduced to the NFSv4.1 nomenclature and existing
description of the pNFS feature. terms are clarified to allow for the description of the pNFS feature.
12.2.1. Metadata 12.2.1. Metadata
Information about a file system object, such as its name, location Information about a file system object, such as its name, location
within the namespace, owner, ACL and other attributes. Metadata may within the namespace, owner, ACL and other attributes. Metadata may
also include storage location information and this will vary based on also include storage location information and this will vary based on
the underlying storage mechanism that is used. the underlying storage mechanism that is used.
12.2.2. Metadata Server 12.2.2. Metadata Server
An NFSv4.1 server which supports the pNFS feature. A variety of An NFSv4.1 server which supports the pNFS feature. A variety of
architectural choices exists for the metadata server and its use of architectural choices exists for the metadata server and its use of
what file system information is held at the server. Some servers may file system information held at the server. Some servers may contain
contain metadata only for the file objects that reside at the metadata only for file objects residing at the metadata server while
metadata server while file data resides on the associated storage the file data resides on associated storage devices. Other metadata
devices. Other metadata servers may hold both metadata and a varying servers may hold both metadata and a varying degree of file data.
degree of file data.
12.2.3. pNFS Client 12.2.3. pNFS Client
An NFSv4.1 client that supports pNFS operations and supports at least An NFSv4.1 client that supports pNFS operations and supports at least
one storage protocol or layout type for performing I/O to storage one storage protocol for performing I/O to storage devices.
devices.
12.2.4. Storage Device 12.2.4. Storage Device
A storage device stores a regular file's data, but leaves metadata A storage device stores a regular file's data, but leaves metadata
management to the metadata server. A storage device could be another management to the metadata server. A storage device could be another
NFSv4.1 server, an object storage device (OSD), a block device NFSv4.1 server, an object storage device (OSD), a block device
accessed over a SAN (e.g., either Fibre Channel or iSCSI SAN), or some accessed over a SAN (e.g., either Fibre Channel or iSCSI SAN), or some
other entity. other entity.
12.2.5. Storage Protocol 12.2.5. Storage Protocol
skipping to change at page 268, line 32 skipping to change at page 268, line 26
devices that hold the data. A layout is said to belong to a specific devices that hold the data. A layout is said to belong to a specific
layout type (data type layouttype4, see Section 3.3.13). The layout layout type (data type layouttype4, see Section 3.3.13). The layout
type allows for variants to handle different storage protocols, such type allows for variants to handle different storage protocols, such
as those associated with block/volume [31], object [30], and file as those associated with block/volume [31], object [30], and file
(Section 13) layout types. A metadata server, along with its control (Section 13) layout types. A metadata server, along with its control
protocol, MUST support at least one layout type. A private sub-range protocol, MUST support at least one layout type. A private sub-range
of the layout type name space is also defined. Values from the of the layout type name space is also defined. Values from the
private layout type range MAY be used for internal testing or private layout type range MAY be used for internal testing or
experimentation. experimentation.
As an example, layout of the file layout type could be an array of As an example, the organization of the file layout type could be an
tuples (e.g., deviceID, file_handle), along with a definition of how array of tuples (e.g., deviceID, file_handle), along with a
the data is stored across the devices (e.g., striping). A block/ definition of how the data is stored across the devices (e.g.,
volume layout might be an array of tuples that store <deviceID, striping). A block/volume layout might be an array of tuples that
block_number, block count> along with information about block size store <deviceID, block_number, block count> along with information
and the associated file offset of the block number. An object layout about block size and the associated file offset of the block number.
might be an array of tuples <deviceID, objectID> and an additional An object layout might be an array of tuples <deviceID, objectID> and
structure (i.e., the aggregation map) that defines how the logical an additional structure (i.e., the aggregation map) that defines how
byte sequence of the file data is serialized into the different the logical byte sequence of the file data is serialized into the
objects. Note that the actual layouts are typically more complex different objects. Note that the actual layouts are typically more
than these simple expository examples. complex than these simple expository examples.
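To make the file layout example concrete, the following C sketch maps a file offset onto an array of <deviceID, file_handle> tuples. It uses simple round-robin striping with a fixed stripe unit and dense packing on each device. This is an assumed, simplified scheme for illustration only. The actual file layout rules are those of the file layout type (Section 13), and the names below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical result: which tuple in the layout array holds the byte,
 * and where that byte sits within that device's copy of the data. */
struct stripe_target {
    uint32_t device_index;   /* index into the <deviceID, file_handle> array */
    uint64_t device_offset;  /* offset on that device, assuming dense packing */
};

/* Round-robin striping: stripe unit k goes to device k % ndevices. */
static struct stripe_target map_offset(uint64_t file_offset,
                                       uint64_t stripe_unit, uint32_t ndevices)
{
    uint64_t su_index = file_offset / stripe_unit;  /* stripe unit holding the byte */
    struct stripe_target t;
    t.device_index  = (uint32_t)(su_index % ndevices);
    t.device_offset = (su_index / ndevices) * stripe_unit + file_offset % stripe_unit;
    return t;
}
```

For example, with a 4096-byte stripe unit over three devices, file offset 12293 lands in stripe unit 3. That unit maps back to device 0 at offset 4101.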
Requests for pNFS-related operations will often specify a layout Requests for pNFS-related operations will often specify a layout
type. Examples of such operations are GETDEVICEINFO and LAYOUTGET. type. Examples of such operations are GETDEVICEINFO and LAYOUTGET.
The response for these operations will include structures such as a The response for these operations will include structures such as a
device_addr4 or a layout4, each of which includes a layout type device_addr4 or a layout4, each of which includes a layout type
within it. The layout type sent by the server MUST always be the within it. The layout type sent by the server MUST always be the
same one requested by the client. When a client sends a response same one requested by the client. When a server sends a response
that includes a different layout type, the client SHOULD ignore the that includes a different layout type, the client SHOULD ignore the
response and behave as if the server had returned an error response. response and behave as if the server had returned an error response.
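A minimal C sketch of the client check described above. The reply structure is a hypothetical, trimmed-down view of a decoded LAYOUTGET or GETDEVICEINFO reply. A status of 0 stands for NFS4_OK.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t layouttype4;

/* Hypothetical decoded-reply view used only for this sketch. */
struct reply_view {
    int         status;       /* 0 stands for NFS4_OK */
    layouttype4 layout_type;  /* type carried inside layout4 or device_addr4 */
};

/* True if the reply may be used. A successful reply whose layout type
 * differs from the one requested is treated like an error reply. */
static bool accept_reply(const struct reply_view *r, layouttype4 requested)
{
    if (r->status != 0)
        return false;
    return r->layout_type == requested;
}
```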
12.2.8. Layout 12.2.8. Layout
A layout defines how a file's data is organized on one or more A layout defines how a file's data is organized on one or more
storage devices. There are many potential layout types; each of the storage devices. There are many potential layout types; each of the
layout types is differentiated by the storage protocol used to layout types is differentiated by the storage protocol used to
access data and by the aggregation scheme that lays out the file data access data and by the aggregation scheme that lays out the file data
on the underlying storage devices. A layout is precisely identified on the underlying storage devices. A layout is precisely identified
skipping to change at page 269, line 33 skipping to change at page 269, line 27
permissible for layouts with different iomodes, pertaining to the permissible for layouts with different iomodes, pertaining to the
same byte range, to be held by the same client. An example of this same byte range, to be held by the same client. An example of this
would be copy-on-write functionality for a block/volume layout type. would be copy-on-write functionality for a block/volume layout type.
12.2.9. Layout Iomode 12.2.9. Layout Iomode
The layout iomode (data type layoutiomode4, see Section 3.3.20) The layout iomode (data type layoutiomode4, see Section 3.3.20)
indicates to the metadata server the client's intent to perform indicates to the metadata server the client's intent to perform
either just read operations or a mixture of I/O possibly containing either just read operations or a mixture of I/O possibly containing
read and write operations. For certain layout types, it is useful read and write operations. For certain layout types, it is useful
for a client to specify this intent at LAYOUTGET (Section 18.43) for a client to specify this intent at the time it sends LAYOUTGET
time. For example, with block/volume based protocols, block allocation (Section 18.43). For example, with block/volume based protocols, block
could occur when a READ/WRITE iomode is specified. A special allocation could occur when a READ/WRITE iomode is specified. A
LAYOUTIOMODE4_ANY iomode is defined and can only be used for special LAYOUTIOMODE4_ANY iomode is defined and can only be used for
LAYOUTRETURN and CB_LAYOUTRECALL, not for LAYOUTGET. It specifies LAYOUTRETURN and CB_LAYOUTRECALL, not for LAYOUTGET. It specifies
that layouts pertaining to both READ and READ/WRITE iomodes are being that layouts pertaining to both READ and READ/WRITE iomodes are being
returned or recalled, respectively. returned or recalled, respectively.
A storage device may validate I/O with regards to the iomode; this is A storage device may validate I/O with regard to the iomode; this is
dependent upon storage device implementation and layout type. Thus, dependent upon storage device implementation and layout type. Thus,
if the client's layout iomode is inconsistent with the I/O being if the client's layout iomode is inconsistent with the I/O being
performed, the storage device may reject the client's I/O with an performed, the storage device may reject the client's I/O with an
error indicating a new layout with the correct I/O mode should be error indicating a new layout with the correct iomode should be
fetched. For example, if a client gets a layout with a READ iomode obtained via LAYOUTGET. For example, if a client gets a layout with
and performs a WRITE to a storage device, the storage device is a READ iomode and performs a WRITE to a storage device, the storage
allowed to reject that WRITE. device is allowed to reject that WRITE.
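The iomode rules above can be sketched as two C predicates. The enum values are assumed from the final NFSv4.1 XDR, and the operation tags are hypothetical. The second predicate models only a storage device that chooses to validate iomode, which the text leaves to the storage device implementation and layout type.

```c
#include <stdbool.h>

/* ASSUMPTION: values from the final NFSv4.1 XDR definition of layoutiomode4. */
typedef enum {
    LAYOUTIOMODE4_READ = 1,
    LAYOUTIOMODE4_RW   = 2,
    LAYOUTIOMODE4_ANY  = 3
} layoutiomode4;

/* Hypothetical operation tags used only for this sketch. */
typedef enum { OP_LAYOUTGET, OP_LAYOUTRETURN, OP_CB_LAYOUTRECALL } layout_op;

/* LAYOUTIOMODE4_ANY is legal only in LAYOUTRETURN and CB_LAYOUTRECALL. */
static bool iomode_allowed(layout_op op, layoutiomode4 mode)
{
    if (mode == LAYOUTIOMODE4_ANY)
        return op != OP_LAYOUTGET;
    return mode == LAYOUTIOMODE4_READ || mode == LAYOUTIOMODE4_RW;
}

/* A validating storage device may reject a WRITE made under a READ
 * layout; the client would then obtain a READ/WRITE layout via LAYOUTGET. */
static bool device_permits_io(layoutiomode4 held, bool is_write)
{
    return !is_write || held == LAYOUTIOMODE4_RW;
}
```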
The iomode does not conflict with OPEN share modes or lock requests; The use of the layout iomode does not conflict with OPEN share modes
open mode and lock conflicts are enforced as they are without the use or byte-range lock requests; open mode and lock conflicts are
of pNFS, and are logically separate from the pNFS layout level. As enforced as they are without the use of pNFS, and are logically
well, open modes and locks are the preferred method for restricting separate from the pNFS layout level. Open modes and locks are the
user access to data files. For example, an OPEN of read, deny-write preferred method for restricting user access to data files. For
does not conflict with a LAYOUTGET containing an iomode of READ/WRITE example, an OPEN of read, deny-write does not conflict with a
performed by another client. Applications that depend on writing LAYOUTGET containing an iomode of READ/WRITE performed by another
into the same file concurrently may use byte-range locking to client. Applications that depend on writing into the same file
serialize their accesses. concurrently may use byte-range locking to serialize their accesses.
12.2.10. Device IDs 12.2.10. Device IDs
The device ID (data type deviceid4, see Section 3.3.14) names a group The device ID (data type deviceid4, see Section 3.3.14) identifies a
of storage devices. The scope of a device ID is per pair of client group of storage devices. The scope of a device ID is the pair
ID and layout type. In practice, a significant amount of information <client ID, layout type>. In practice, a significant amount of
may be required to fully address a storage device. Rather than information may be required to fully address a storage device.
embedding all such information in a layout, layouts embed device IDs. Rather than embedding all such information in a layout, layouts embed
The NFSv4.1 operation GETDEVICEINFO (Section 18.40) is used to device IDs. The NFSv4.1 operation GETDEVICEINFO (Section 18.40) is
retrieve the complete address information (including all device used to retrieve the complete address information (including all
addresses for the device ID) regarding the storage device according device addresses for the device ID) regarding the storage device
to its layout type and device ID. For example, the address of an according to its layout type and device ID. For example, the address
NFSv4.1 data server or of an object storage device could be an IP of an NFSv4.1 data server or of an object storage device could be an
address and port. The address of a block storage device could be a IP address and port. The address of a block storage device could be
volume label. a volume label.
Clients cannot expect the mapping between a device ID and its storage Clients cannot expect the mapping between a device ID and its storage
device address(es) to persist across metadata server restart. See device address(es) to persist across metadata server restart. See
Section 12.7.4 for a description of how recovery works in that Section 12.7.4 for a description of how recovery works in that
situation. situation.
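A device ID is meaningful only within a <client ID, layout type> pair, so a client cache of GETDEVICEINFO results would key on all three values. This C sketch assumes NFS4_DEVICEID4_SIZE is 16, as in the final NFSv4.1 XDR. The key structure and comparison function are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NFS4_DEVICEID4_SIZE 16   /* ASSUMPTION: from the final NFSv4.1 XDR */

typedef uint64_t clientid4;
typedef uint32_t layouttype4;
typedef unsigned char deviceid4[NFS4_DEVICEID4_SIZE];

/* Hypothetical cache key: the same device ID bytes under a different
 * client ID or layout type name a different group of storage devices. */
struct devid_key {
    clientid4   clientid;
    layouttype4 layout_type;
    deviceid4   devid;
};

static bool devid_key_equal(const struct devid_key *a, const struct devid_key *b)
{
    return a->clientid == b->clientid &&
           a->layout_type == b->layout_type &&
           memcmp(a->devid, b->devid, NFS4_DEVICEID4_SIZE) == 0;
}
```

The mapping need not survive a metadata server restart. A cache keyed this way would therefore be discarded or revalidated as part of the recovery described in Section 12.7.4.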
A device ID lives as long as there is a layout referring to the A device ID lives as long as there is a layout referring to the
device ID. If there are no layouts referring to the device ID, the device ID. If there are no layouts referring to the device ID, the
server is free to delete the device ID at any time. Once a device ID is server is free to delete the device ID at any time. Once a device ID is
deleted by the server, the server MUST NOT reuse the device ID for deleted by the server, the server MUST NOT reuse the device ID for
skipping to change at page 273, line 39 skipping to change at page 273, line 31
is incapable of providing this check in the presence of mandatory is incapable of providing this check in the presence of mandatory
file locks, the metadata server then MUST NOT grant layouts and file locks, the metadata server then MUST NOT grant layouts and
mandatory file locks simultaneously. mandatory file locks simultaneously.
12.5.2. Getting a Layout 12.5.2. Getting a Layout
A client obtains a layout with the LAYOUTGET operation. The metadata A client obtains a layout with the LAYOUTGET operation. The metadata
server will grant layouts of a particular type (e.g., block/volume, server will grant layouts of a particular type (e.g., block/volume,
object, or file). The client selects an appropriate layout type that object, or file). The client selects an appropriate layout type that
the server supports and the client is prepared to use. The layout the server supports and the client is prepared to use. The layout
returned to the client may not exactly align with the requested byte returned to the client might not exactly match the requested byte
range. A field within the LAYOUTGET request, loga_minlength, range as described in Section 18.43.3. As needed a client may make
specifies the minimum length of the layout. The loga_minlength field multiple LAYOUTGET requests; these might result in multiple
should be at least one. As needed a client may make multiple overlapping, non-conflicting layouts (see Section 12.2.8).
LAYOUTGET requests; these will result in multiple overlapping, non-
conflicting layouts.
In order to get a layout, the client must first have opened the file In order to get a layout, the client must first have opened the file
via the OPEN operation. When a client has no layout on a file, it via the OPEN operation. When a client has no layout on a file, it
MUST present a stateid as returned by OPEN, a delegation stateid, or MUST present a stateid as returned by OPEN, a delegation stateid, or
a byte-range lock stateid in the loga_stateid argument. A successful a byte-range lock stateid in the loga_stateid argument. A successful
LAYOUTGET result includes a layout stateid. The first successful LAYOUTGET result includes a layout stateid. The first successful
LAYOUTGET processed by the server using a non-layout stateid as an LAYOUTGET processed by the server using a non-layout stateid as an
argument MUST have the "seqid" field of the layout stateid in the argument MUST have the "seqid" field of the layout stateid in the
response set to one. Thereafter, the client uses a layout stateid response set to one. Thereafter, the client uses a layout stateid
(see Section 12.5.3) on future invocations of LAYOUTGET on the file, (see Section 12.5.3) on future invocations of LAYOUTGET on the file,
skipping to change at page 275, line 24 skipping to change at page 275, line 14
correct "seqid" is defined as the highest "seqid" value from correct "seqid" is defined as the highest "seqid" value from
responses of fully processed LAYOUTGET or LAYOUTRETURN operations or responses of fully processed LAYOUTGET or LAYOUTRETURN operations or
arguments of a fully processed CB_LAYOUTRECALL operation. Since the arguments of a fully processed CB_LAYOUTRECALL operation. Since the
server is incrementing the "seqid" value on each layout operation, server is incrementing the "seqid" value on each layout operation,
the client may determine the order of operation processing by the client may determine the order of operation processing by
inspecting the "seqid" value. In the case of overlapping layout inspecting the "seqid" value. In the case of overlapping layout
ranges, the ordering information will provide the client the ranges, the ordering information will provide the client the
knowledge of which layout ranges are held. Note that overlapping knowledge of which layout ranges are held. Note that overlapping
layout ranges may occur because of the client's specific requests or layout ranges may occur because of the client's specific requests or
because the server is allowed to expand the range of a requested because the server is allowed to expand the range of a requested
layout and notify the client in the LAYOUTRETURN results Additional layout and notify the client in the LAYOUTRETURN results. Additional
layout stateid sequencing requirements are provided in layout stateid sequencing requirements are provided in
Section 12.5.5.2. Section 12.5.5.2.
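The "seqid" rule above can be illustrated with a small client-side sketch in C. It assumes a hypothetical per-file tracker (layout_seq_tracker and its helpers are illustrative names, not protocol elements). The tracker records a seqid only after the corresponding LAYOUTGET, LAYOUTRETURN, or CB_LAYOUTRECALL has been fully processed, and it keeps the highest value seen. Wraparound of the 32-bit seqid is deliberately ignored, so this is not a complete treatment of the sequencing requirements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical client-side record of the layout stateid "seqid" for one file. */
struct layout_seq_tracker {
    uint32_t seqid; /* highest seqid from a fully processed layout operation */
    bool valid;     /* false until the first LAYOUTGET has been processed */
};

/* Record the seqid of a LAYOUTGET or LAYOUTRETURN result, or of a
 * CB_LAYOUTRECALL argument. Call this only after the operation has been
 * fully processed (e.g. after the client's record of held layout ranges
 * has been updated). Wraparound of the 32-bit seqid is ignored here. */
void layout_seq_note_processed(struct layout_seq_tracker *t, uint32_t seqid)
{
    if (!t->valid || seqid > t->seqid) {
        t->seqid = seqid;
        t->valid = true;
    }
}

/* For two overlapping ranges, the one returned with the higher seqid
 * reflects the server's later processing. */
bool layout_seq_is_newer(uint32_t a, uint32_t b)
{
    return a > b;
}
```

A reply that is processed late (for example, seqid 2 arriving after seqid 3) does not lower the recorded value, which is what lets the client order overlapping layout ranges.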
The client's receipt of a "seqid" is not sufficient for subsequent The client's receipt of a "seqid" is not sufficient for subsequent
use. The client must fully process the operations before the "seqid" use. The client must fully process the operations before the "seqid"
can be used. For LAYOUTGET results, if the client is not using the can be used. For LAYOUTGET results, if the client is not using the
forgetful model (Section 12.5.5.1), it MUST first update its record forgetful model (Section 12.5.5.1), it MUST first update its record
of what ranges of the file's layout it has before using the seqid. of what ranges of the file's layout it has before using the seqid.
For LAYOUTRETURN results, the client MUST delete the range from its For LAYOUTRETURN results, the client MUST delete the range from its
record of what ranges of the file's layout it had before using the record of what ranges of the file's layout it had before using the
skipping to change at page 295, line 4 skipping to change at page 294, line 36
NFSv4.1) what role the request to the common server network NFSv4.1) what role the request to the common server network
address is directed to. address is directed to.
12.9. Security Considerations for pNFS 12.9. Security Considerations for pNFS
pNFS separates file system metadata and data and provides access to pNFS separates file system metadata and data and provides access to
both. There are pNFS-specific operations (listed in Section 12.3) both. There are pNFS-specific operations (listed in Section 12.3)
that provide access to the metadata; all existing NFSv4.1 that provide access to the metadata; all existing NFSv4.1
conventional (non-pNFS) security mechanisms and features apply to conventional (non-pNFS) security mechanisms and features apply to
accessing the metadata. The combination of components in a pNFS accessing the metadata. The combination of components in a pNFS
system (see Figure 67) is required to preserve the security system (see Figure 68) is required to preserve the security
properties of NFSv4.1 with respect to an entity accessing a storage properties of NFSv4.1 with respect to an entity accessing a storage
device from a client, including security countermeasures to defend device from a client, including security countermeasures to defend
against threats that NFSv4.1 provides defenses for in environments against threats that NFSv4.1 provides defenses for in environments
where these threats are considered significant. where these threats are considered significant.
In some cases, the security countermeasures for connections to In some cases, the security countermeasures for connections to
storage devices may take the form of physical isolation or a storage devices may take the form of physical isolation or a
recommendation not to use pNFS in an environment. For example, it recommendation not to use pNFS in an environment. For example, it
may be impractical to provide confidentiality protection for some may be impractical to provide confidentiality protection for some
storage protocols to protect against eavesdropping; in environments storage protocols to protect against eavesdropping; in environments
skipping to change at page 316, line 21 skipping to change at page 315, line 41
o Otherwise, there must be an open stateid for the current open- o Otherwise, there must be an open stateid for the current open-
owner, and that open stateid for the open file in question is owner, and that open stateid for the open file in question is
used, unless mandatory locking prevents that. See below. used, unless mandatory locking prevents that. See below.
o If the data server had previously responded with NFS4ERR_LOCKED to o If the data server had previously responded with NFS4ERR_LOCKED to
use of the open stateid, then the client should use the lock use of the open stateid, then the client should use the lock
stateid whenever one exists for that open file with the current stateid whenever one exists for that open file with the current
lock-owner. lock-owner.
o Special stateids should never be used and if used the data server o Special stateids should never be used and if used the data server
MUST reject the I/O with a NFS4ERR_BAD_STATEID error. MUST reject the I/O with an NFS4ERR_BAD_STATEID error.
13.9.2. Data Server State Propagation 13.9.2. Data Server State Propagation
Since the metadata server, which handles lock and open-mode state Since the metadata server, which handles lock and open-mode state
changes, as well as ACLs, may not be co-located with the data servers changes, as well as ACLs, may not be co-located with the data servers
where I/O accesses are validated, the server implementation MUST take where I/O accesses are validated, the server implementation MUST take
care of propagating changes of this state to the data servers. Once care of propagating changes of this state to the data servers. Once
the propagation to the data servers is complete, the full effect of the propagation to the data servers is complete, the full effect of
those changes MUST be in effect at the data servers. However, some those changes MUST be in effect at the data servers. However, some
state changes need not be propagated immediately, although all state changes need not be propagated immediately, although all
skipping to change at page 378, line 42 skipping to change at page 377, line 42
16.1.1. ARGUMENTS 16.1.1. ARGUMENTS
void; void;
16.1.2. RESULTS 16.1.2. RESULTS
void; void;
16.1.3. DESCRIPTION 16.1.3. DESCRIPTION
Standard NULL procedure. Void argument, void response. This This is the standard NULL procedure with the standard void argument
procedure has no functionality associated with it. Because of this and void response. This procedure has no functionality associated
it is sometimes used to measure the overhead of processing a service with it. Because of this it is sometimes used to measure the
request. Therefore, the server should ensure that no unnecessary overhead of processing a service request. Therefore, the server
work is done in servicing this procedure. SHOULD ensure that no unnecessary work is done in servicing this
procedure.
16.1.4. ERRORS 16.1.4. ERRORS
None. None.
16.2. Procedure 1: COMPOUND - Compound Operations 16.2. Procedure 1: COMPOUND - Compound Operations
16.2.1. ARGUMENTS 16.2.1. ARGUMENTS
enum nfs_opnum4 { enum nfs_opnum4 {
skipping to change at page 387, line 24 skipping to change at page 386, line 24
PUTFH fh1 {fh1} PUTFH fh1 {fh1}
LOOKUP "compA" {fh2} LOOKUP "compA" {fh2}
GETATTR {fh2} GETATTR {fh2}
LOOKUP "compB" {fh3} LOOKUP "compB" {fh3}
GETATTR {fh3} GETATTR {fh3}
LOOKUP "compC" {fh4} LOOKUP "compC" {fh4}
GETATTR {fh4} GETATTR {fh4}
GETFH GETFH
Figure 84 Figure 85
In this example, the PUTFH (Section 18.19) operation explicitly sets In this example, the PUTFH (Section 18.19) operation explicitly sets
the current filehandle value while the result of each LOOKUP the current filehandle value while the result of each LOOKUP
operation sets the current filehandle value to the resultant file operation sets the current filehandle value to the resultant file
system object. Also, the client is able to insert GETATTR operations system object. Also, the client is able to insert GETATTR operations
using the current filehandle as an argument. using the current filehandle as an argument.
The PUTROOTFH (Section 18.21) and PUTPUBFH (Section 18.20) operations The PUTROOTFH (Section 18.21) and PUTPUBFH (Section 18.20) operations
also set the current filehandle. The above example would replace also set the current filehandle. The above example would replace
"PUTFH fh1" with PUTROOTFH or PUTPUBFH with no filehandle argument in "PUTFH fh1" with PUTROOTFH or PUTPUBFH with no filehandle argument in
skipping to change at page 388, line 22 skipping to change at page 387, line 22
A "current stateid" is the stateid that is associated with the A "current stateid" is the stateid that is associated with the
current filehandle. The current stateid may only be changed by an current filehandle. The current stateid may only be changed by an
operation that modifies the current filehandle or returns a stateid. operation that modifies the current filehandle or returns a stateid.
If an operation returns a stateid it MUST set the current stateid to If an operation returns a stateid it MUST set the current stateid to
the returned value. If an operation sets the current filehandle but the returned value. If an operation sets the current filehandle but
does not return a stateid, the current stateid MUST be set to the does not return a stateid, the current stateid MUST be set to the
all-zeros special stateid, i.e. (seqid, other) = (0, 0). If an all-zeros special stateid, i.e. (seqid, other) = (0, 0). If an
operation uses a stateid as an argument but does not return a operation uses a stateid as an argument but does not return a
stateid, the current stateid MUST NOT be changed. E.g., PUTFH, stateid, the current stateid MUST NOT be changed. E.g., PUTFH,
PUTROOFH, and PUTPUBFH will change the current server state from PUTROOTFH, and PUTPUBFH will change the current server state from
{ocfh, (osid)} to {cfh, (0, 0)} while LOCK will change the current {ocfh, (osid)} to {cfh, (0, 0)} while LOCK will change the current
state from {cfh, (osid)} to {cfh, (nsid)}. Operations like LOOKUP state from {cfh, (osid)} to {cfh, (nsid)}. Operations like LOOKUP
that transform a current filehandle and component name into a new that transform a current filehandle and component name into a new
current filehandle will also change the current stateid to (0, 0). current filehandle will also change the current stateid to (0, 0).
The SAVEFH and RESTOREFH operations will save and restore both the The SAVEFH and RESTOREFH operations will save and restore both the
current filehandle and the current stateid as a set. current filehandle and the current stateid as a set.
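The current-stateid rules above amount to a small state machine carried through a COMPOUND. The following C sketch models it under simplifying assumptions: filehandles are plain integers, stateid_t holds a 32-bit "other" instead of the protocol's 12-byte field, and op_effect_t, apply_op, op_savefh, and op_restorefh are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for stateid4 (the real "other" field is 12 bytes). */
typedef struct { uint32_t seqid; uint32_t other; } stateid_t;

/* A filehandle/stateid pair; fh is a hypothetical integer handle. */
typedef struct { int fh; stateid_t sid; } fh_state_t;

/* Current and saved state for one COMPOUND. */
typedef struct { fh_state_t current, saved; } compound_ctx_t;

/* What one operation does to the current state (hypothetical encoding). */
typedef struct {
    bool sets_fh;         /* e.g. PUTFH, PUTROOTFH, PUTPUBFH, LOOKUP, OPEN */
    int new_fh;
    bool returns_stateid; /* e.g. OPEN, LOCK, LOCKU, CLOSE */
    stateid_t returned;
} op_effect_t;

void apply_op(compound_ctx_t *c, const op_effect_t *e)
{
    if (e->sets_fh)
        c->current.fh = e->new_fh;
    if (e->returns_stateid)
        c->current.sid = e->returned;       /* MUST become the returned value */
    else if (e->sets_fh)
        c->current.sid = (stateid_t){0, 0}; /* all-zeros special stateid */
    /* Otherwise (READ, WRITE, ...) the current stateid is unchanged. */
}

/* SAVEFH and RESTOREFH move the filehandle and stateid as a set. */
void op_savefh(compound_ctx_t *c)    { c->saved = c->current; }
void op_restorefh(compound_ctx_t *c) { c->current = c->saved; }

bool sid_eq(stateid_t a, stateid_t b)
{
    return a.seqid == b.seqid && a.other == b.other;
}
```

Replaying the READ, LOCKU, SAVEFH, PUTFH, WRITE, RESTOREFH sequence from the last example of this section through these functions ends with the saved pair {fh1, (sid3)} restored as the current state.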
The following example is the common case of a simple READ operation The following example is the common case of a simple READ operation
with a supplied stateid showing that the PUTFH initializes the with a supplied stateid showing that the PUTFH initializes the
current stateid to (0, 0). The subsequent READ with stateid (sid1) current stateid to (0, 0). The subsequent READ with stateid (sid1)
leaves the current stateid unchanged, but does evaluate the leaves the current stateid unchanged, but does evaluate the
operation. operation.
PUTFH fh1 - -> {fh1, (0, 0)} PUTFH fh1 - -> {fh1, (0, 0)}
READ (sid1), 0, 1024 {fh1, (0, 0)} -> {fh1, (0, 0)} READ (sid1), 0, 1024 {fh1, (0, 0)} -> {fh1, (0, 0)}
Figure 85 Figure 86
This next example performs an OPEN with the root filehandle and as a This next example performs an OPEN with the root filehandle and as a
result generates stateid (sid1). The next operation specifies the result generates stateid (sid1). The next operation specifies the
READ with the argument stateid set such that (seqid, other) are equal READ with the argument stateid set such that (seqid, other) are equal
to (1, 0), but the current stateid set by the previous operation is to (1, 0), but the current stateid set by the previous operation is
actually used when the operation is evaluated. This allows correct actually used when the operation is evaluated. This allows correct
interaction with any existing, potentially conflicting, locks. interaction with any existing, potentially conflicting, locks.
PUTROOTFH - -> {fh1, (0, 0)} PUTROOTFH - -> {fh1, (0, 0)}
OPEN "compA" {fh1, (0, 0)} -> {fh2, (sid1)} OPEN "compA" {fh1, (0, 0)} -> {fh2, (sid1)}
READ (1, 0), 0, 1024 {fh2, (sid1)} -> {fh2, (sid1)} READ (1, 0), 0, 1024 {fh2, (sid1)} -> {fh2, (sid1)}
CLOSE (1, 0) {fh2, (sid1)} -> {fh2, (sid2)} CLOSE (1, 0) {fh2, (sid1)} -> {fh2, (sid2)}
Figure 86 Figure 87
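The substitution described above can be sketched in C as follows. The stateid_t type with a 32-bit "other" field is a simplification (the protocol's field is 12 bytes), and effective_stateid is a hypothetical helper. The point is only that an argument of (1, 0) is replaced by the current stateid, while any other argument is used as given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stateid: the real "other" field is 12 opaque bytes. */
typedef struct { uint32_t seqid; uint32_t other; } stateid_t;

/* (seqid, other) == (1, 0) asks the server to use the current stateid. */
bool is_current_stateid(stateid_t s)
{
    return s.seqid == 1 && s.other == 0;
}

/* The stateid an operation such as READ or CLOSE actually evaluates. */
stateid_t effective_stateid(stateid_t arg, stateid_t current)
{
    return is_current_stateid(arg) ? current : arg;
}
```

In the example above, READ (1, 0) is evaluated with sid1, and CLOSE (1, 0) is likewise evaluated with sid1 before it returns sid2 as the new current stateid.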
The final example is similar to the second in how it passes the The final example is similar to the second in how it passes the
stateid sid2 generated by the LOCK operation to the next READ stateid sid2 generated by the LOCK operation to the next READ
operation. This allows the client to explicitly surround a single operation. This allows the client to explicitly surround a single
I/O operation with a lock and its appropriate stateid to guarantee I/O operation with a lock and its appropriate stateid to guarantee
correctness with other client locks. The example also shows how correctness with other client locks. The example also shows how
SAVEFH and RESTOREFH can save and later re-use a filehandle and SAVEFH and RESTOREFH can save and later re-use a filehandle and
stateid, passing them as the current filehandle and stateid to a READ stateid, passing them as the current filehandle and stateid to a READ
operation. operation.
skipping to change at page 389, line 27 skipping to change at page 388, line 27
READ (1, 0), 0, 1024 {fh1, (sid2)} -> {fh1, (sid2)} READ (1, 0), 0, 1024 {fh1, (sid2)} -> {fh1, (sid2)}
LOCKU 0, 1024, (1, 0) {fh1, (sid2)} -> {fh1, (sid3)} LOCKU 0, 1024, (1, 0) {fh1, (sid2)} -> {fh1, (sid3)}
SAVEFH {fh1, (sid3)} -> {fh1, (sid3)} SAVEFH {fh1, (sid3)} -> {fh1, (sid3)}
PUTFH fh2 {fh1, (sid3)} -> {fh2, (0, 0)} PUTFH fh2 {fh1, (sid3)} -> {fh2, (0, 0)}
WRITE (1, 0), 0, 1024 {fh2, (0, 0)} -> {fh2, (0, 0)} WRITE (1, 0), 0, 1024 {fh2, (0, 0)} -> {fh2, (0, 0)}
RESTOREFH {fh2, (0, 0)} -> {fh1, (sid3)} RESTOREFH {fh2, (0, 0)} -> {fh1, (sid3)}
READ (1, 0), 1024, 1024 {fh1, (sid3)} -> {fh1, (sid3)} READ (1, 0), 1024, 1024 {fh1, (sid3)} -> {fh1, (sid3)}
Figure 87 Figure 88
16.2.4. ERRORS 16.2.4. ERRORS
COMPOUND will of course return every error that each operation on the COMPOUND will of course return every error that each operation on the
fore channel can return (see Table 12). However, if COMPOUND returns fore channel can return (see Table 12). However, if COMPOUND returns
zero operations, obviously the error returned by COMPOUND has nothing zero operations, obviously the error returned by COMPOUND has nothing
to do with an error returned by an operation. The list of errors to do with an error returned by an operation. The list of errors
COMPOUND will return if it processes zero operations includes: COMPOUND will return if it processes zero operations includes:
COMPOUND error returns COMPOUND error returns
skipping to change at page 396, line 11 skipping to change at page 395, line 11
NFS is not going to be acceptable to some people. Historically, NFS is not going to be acceptable to some people. Historically,
NFS servers have allowed a user to READ a file if the user has NFS servers have allowed a user to READ a file if the user has
execute access to the file. execute access to the file.
As a practical example, the UNIX specification [41] states that an As a practical example, the UNIX specification [41] states that an
implementation claiming conformance to UNIX may indicate in the implementation claiming conformance to UNIX may indicate in the
access() programming interface's result that a privileged user has access() programming interface's result that a privileged user has
execute rights, even if no execute permission bits are set on the execute rights, even if no execute permission bits are set on the
regular file's attributes. It is possible to claim conformance to regular file's attributes. It is possible to claim conformance to
the UNIX specification and instead not indicate execute rights in the UNIX specification and instead not indicate execute rights in
that situation, which is true for some operating enviroments. that situation, which is true for some operating environments.
Suppose the operating environments of the client and server are Suppose the operating environments of the client and server are
implementing the access() semantics for privileged users differently, implementing the access() semantics for privileged users differently,
and the ACCESS operation implementations of the client and server and the ACCESS operation implementations of the client and server
follow their respective access() semantics. This can cause undesired follow their respective access() semantics. This can cause undesired
behavior: behavior:
o Suppose the client's access() interface returns X_OK if the user o Suppose the client's access() interface returns X_OK if the user
is privileged and no execute permission bits are set on the is privileged and no execute permission bits are set on the
regular file's attribute, and the server's access() interface does regular file's attribute, and the server's access() interface does
not return X_OK in that situation. Then the client will be unable not return X_OK in that situation. Then the client will be unable
skipping to change at page 406, line 32 skipping to change at page 405, line 32
nfsstat4 status; nfsstat4 status;
}; };
18.5.3. DESCRIPTION 18.5.3. DESCRIPTION
Purges all of the delegations awaiting recovery for a given client. Purges all of the delegations awaiting recovery for a given client.
This is useful for clients which do not commit delegation information This is useful for clients which do not commit delegation information
to stable storage to indicate that conflicting requests need not be to stable storage to indicate that conflicting requests need not be
delayed by the server awaiting recovery of delegation information. delayed by the server awaiting recovery of delegation information.
The client is NOT specified by the clientid field of the request.
The client SHOULD set the clientid field to zero and the server MUST
ignore the clientid field. Instead the server MUST derive the client
ID from the value of the session id in the arguments of the SEQUENCE
operation that precedes DELEGPURGE in the COMPOUND request.
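A minimal server-side sketch of how DELEGPURGE might identify the client, assuming a hypothetical in-memory table that maps session IDs to client IDs. The clientid argument is intentionally not a parameter, since the server MUST ignore it. The 16-byte session ID matches the size of NFSv4.1's sessionid4; session_entry_t and delegpurge_client are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t clientid4;
typedef struct { unsigned char id[16]; } sessionid4;

/* Hypothetical server-side table entry: which client owns a session. */
typedef struct { sessionid4 sid; clientid4 owner; } session_entry_t;

/* Derive the client ID from the session ID of the preceding SEQUENCE.
 * Returns 0 and sets *out on success, or -1 if the session is unknown.
 * The DELEGPURGE clientid argument is never consulted. */
int delegpurge_client(const session_entry_t *tbl, size_t n,
                      const sessionid4 *seq_sessionid, clientid4 *out)
{
    for (size_t i = 0; i < n; i++) {
        if (memcmp(tbl[i].sid.id, seq_sessionid->id,
                   sizeof seq_sessionid->id) == 0) {
            *out = tbl[i].owner;
            return 0;
        }
    }
    return -1;
}
```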
This operation should be used by clients that record delegation This operation should be used by clients that record delegation
information on stable storage on the client. In this case, information on stable storage on the client. In this case,
DELEGPURGE should be sent immediately after doing delegation recovery DELEGPURGE should be sent immediately after doing delegation recovery
on all delegations known to the client. Doing so will notify the on all delegations known to the client. Doing so will notify the
server that no additional delegations for the client will be server that no additional delegations for the client will be
recovered, allowing it to free resources and avoid delaying other recovered, allowing it to free resources and avoid delaying other
clients which make requests that conflict with the unrecovered clients which make requests that conflict with the unrecovered
delegations. The set of delegations known to the server and the delegations. The set of delegations known to the server and the
client may be different. The reason for this is that a client may client may be different. The reason for this is that a client may
fail after making a request which resulted in delegation but before fail after making a request which resulted in delegation but before
skipping to change at page 434, line 33 skipping to change at page 433, line 33
| CLAIM_DELEG_CUR_FH | OPEN as granted by the server. Generally | | CLAIM_DELEG_CUR_FH | OPEN as granted by the server. Generally |
| | this is done as part of recalling a | | | this is done as part of recalling a |
| | delegation. With CLAIM_DELEGATE_CUR, the | | | delegation. With CLAIM_DELEGATE_CUR, the |
| | file is identified by the current | | | file is identified by the current |
| | filehandle and the specified component | | | filehandle and the specified component |
| | name. With CLAIM_DELEG_CUR_FH (new to | | | name. With CLAIM_DELEG_CUR_FH (new to |
| | NFSv4.1), the file is identified by just | | | NFSv4.1), the file is identified by just |
| | the current filehandle. | | | the current filehandle. |
| CLAIM_DELEGATE_PREV, | The client is claiming a delegation | | CLAIM_DELEGATE_PREV, | The client is claiming a delegation |
| CLAIM_DELEG_PREV_FH | granted to a previous client instance; | | CLAIM_DELEG_PREV_FH | granted to a previous client instance; |
| | used after the client restarts. The | | | used after the client restarts. The server |
| | server MAY support CLAIM_DELEGATE_PREV or | | | MAY support CLAIM_DELEGATE_PREV or |
| | CLAIM_DELEG_PREV_FH (new to NFSv4.1). If | | | CLAIM_DELEG_PREV_FH (new to NFSv4.1). If |
| | it does support either open type, | | | it does support either open type, |
| | CREATE_SESSION MUST NOT remove the | | | CREATE_SESSION MUST NOT remove the |
| | client's delegation state, and the server | | | client's delegation state, and the server |
| | MUST support the DELEGPURGE operation. | | | MUST support the DELEGPURGE operation. |
+----------------------+--------------------------------------------+ +----------------------+--------------------------------------------+
For OPEN requests that reach the server during the grace period, the For OPEN requests that reach the server during the grace period, the
server returns an error of NFS4ERR_GRACE. The following claim types server returns an error of NFS4ERR_GRACE. The following claim types
are exceptions: are exceptions:
skipping to change at page 466, line 44 skipping to change at page 465, line 44
The SECINFO operation is expected to be used by the NFS client when The SECINFO operation is expected to be used by the NFS client when
the error value of NFS4ERR_WRONGSEC is returned from another NFS the error value of NFS4ERR_WRONGSEC is returned from another NFS
operation. This signifies to the client that the server's security operation. This signifies to the client that the server's security
policy is different from what the client is currently using. At this policy is different from what the client is currently using. At this
point, the client is expected to obtain a list of possible security point, the client is expected to obtain a list of possible security
flavors and choose what best suits its policies. flavors and choose what best suits its policies.
As mentioned, the server's security policies will determine when a As mentioned, the server's security policies will determine when a
client request receives NFS4ERR_WRONGSEC. See Table 14 for a list client request receives NFS4ERR_WRONGSEC. See Table 14 for a list
of operations which can return NFS4ERR_WRONGSEC. In addition, when of operations which can return NFS4ERR_WRONGSEC. In addition, when
READDIR returns attributes, the rdaddr_error (Section 5.8.1.12) can READDIR returns attributes, the rdattr_error (Section 5.8.1.12) can
contain NFS4ERR_WRONGSEC. Note that CREATE and REMOVE MUST NOT contain NFS4ERR_WRONGSEC. Note that CREATE and REMOVE MUST NOT
return NFS4ERR_WRONGSEC. The rationale for CREATE is that unless the return NFS4ERR_WRONGSEC. The rationale for CREATE is that unless the
target name exists it cannot have a separate security policy from the target name exists it cannot have a separate security policy from the
parent directory, and the security policy of the parent was checked parent directory, and the security policy of the parent was checked
when its filehandle was injected into the COMPOUND request's when its filehandle was injected into the COMPOUND request's
operations stream (for similar reasons, an OPEN operation that operations stream (for similar reasons, an OPEN operation that
creates the target MUST NOT return NFS4ERR_WRONGSEC). If the target creates the target MUST NOT return NFS4ERR_WRONGSEC). If the target
name exists, while it might have a separate security policy, that is name exists, while it might have a separate security policy, that is
irrelevant because CREATE MUST return NFS4ERR_EXIST. The rationale irrelevant because CREATE MUST return NFS4ERR_EXIST. The rationale
for REMOVE is that while that target might have separate security for REMOVE is that while that target might have separate security
skipping to change at page 504, line 45 skipping to change at page 503, line 45
records introduced in the description of EXCHANGE_ID is used with the records introduced in the description of EXCHANGE_ID is used with the
following addition: following addition:
clientid_arg: The value of the csa_clientid field of the clientid_arg: The value of the csa_clientid field of the
CREATE_SESSION4args structure of the current request. CREATE_SESSION4args structure of the current request.
Since CREATE_SESSION is a non-idempotent operation, we must consider Since CREATE_SESSION is a non-idempotent operation, we must consider
the possibility that retries may occur as a result of a client the possibility that retries may occur as a result of a client
restart, network partition, malfunctioning router, etc. For each restart, network partition, malfunctioning router, etc. For each
client ID created by EXCHANGE_ID, the server maintains a separate client ID created by EXCHANGE_ID, the server maintains a separate
reply cache similar to the session reply cache used for SEQUENCE reply cache (called the CREATE_SESSION reply cache) similar to the
operations, with two distinctions. session reply cache used for SEQUENCE operations, with two
distinctions.
o First, this is a reply cache just for detecting and processing o First, this is a reply cache just for detecting and processing
CREATE_SESSION requests for a given client ID. CREATE_SESSION requests for a given client ID.
o Second, the size of the client ID reply cache is one slot (and o Second, the size of the client ID reply cache is one slot (and
as a result, the CREATE_SESSION request does not carry a slot as a result, the CREATE_SESSION request does not carry a slot
number). This means that at most one CREATE_SESSION request for a number). This means that at most one CREATE_SESSION request for a
given client ID can be outstanding. given client ID can be outstanding.
As previously stated, CREATE_SESSION can be sent with or without a
preceding SEQUENCE operation. Even if SEQUENCE precedes
CREATE_SESSION, the server MUST maintain the CREATE_SESSION reply
cache, which is separate from the reply cache for the session
associated with SEQUENCE. If CREATE_SESSION was originally sent by
itself, the client MAY send a retry of the CREATE_SESSION operation
within a COMPOUND preceded by SEQUENCE. If CREATE_SESSION was
originally sent in a COMPOUND that started with SEQUENCE, then the
client SHOULD send a retry in a COMPOUND that starts with SEQUENCE
that has the same session id as the SEQUENCE of the original request.
However, the client MAY send a retry in a COMPOUND that either has no
preceding SEQUENCE, or has a preceding SEQUENCE that refers to a
different session than the original CREATE_SESSION. This might be
necessary if the client sends a CREATE_SESSION in a COMPOUND preceded
by a SEQUENCE with session id X, and session X no longer exists.
Regardless, any retry of CREATE_SESSION, with or without a preceding
SEQUENCE, MUST use the same value of csa_sequence as the original.
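The one-slot CREATE_SESSION reply cache can be sketched in C as below. The slot initialization follows the text: eir_sequenceid - 1 with unsigned wraparound, and a contrived cached NFS4ERR_SEQ_MISORDERED. The classification in cs_slot_check is an assumption modeled on ordinary slot handling, not a restatement of the full four-phase procedure: an equal sequence id is treated as a retry answered from the cache, one greater as a new request, and anything else as misordered. Type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NFS4ERR_SEQ_MISORDERED 10063

/* Hypothetical per-client-ID CREATE_SESSION reply cache: exactly one slot. */
typedef struct {
    uint32_t seqid;         /* sequence id of the request cached in the slot */
    int      cached_status; /* cached reply (only the status, in this sketch) */
} cs_slot_t;

typedef enum { CS_REPLAY, CS_NEW_REQUEST, CS_MISORDERED } cs_verdict_t;

/* Set up the slot when EXCHANGE_ID returns a new, unconfirmed client ID.
 * Unsigned arithmetic wraps 0 - 1 to UINT32_MAX, accounting for underflow. */
void cs_slot_init(cs_slot_t *slot, uint32_t eir_sequenceid)
{
    slot->seqid = eir_sequenceid - 1u;
    slot->cached_status = NFS4ERR_SEQ_MISORDERED; /* contrived "cached" result */
}

/* Assumed classification of an incoming csa_sequence against the slot. */
cs_verdict_t cs_slot_check(const cs_slot_t *slot, uint32_t csa_sequence)
{
    if (csa_sequence == slot->seqid)
        return CS_REPLAY;      /* answer from slot->cached_status */
    if (csa_sequence == (uint32_t)(slot->seqid + 1u))
        return CS_NEW_REQUEST; /* execute, then call cs_slot_store() */
    return CS_MISORDERED;      /* neither a retry nor the next request */
}

/* Cache the reply of a newly executed CREATE_SESSION. */
void cs_slot_store(cs_slot_t *slot, uint32_t csa_sequence, int status)
{
    slot->seqid = csa_sequence;
    slot->cached_status = status;
}
```

Because a retry MUST carry the same csa_sequence as the original, in this model it lands in the CS_REPLAY case whether or not it is sent with a preceding SEQUENCE.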
When a client sends a successful EXCHANGE_ID and it is returned an When a client sends a successful EXCHANGE_ID and it is returned an
unconfirmed client ID, the client is also returned eir_sequenceid, unconfirmed client ID, the client is also returned eir_sequenceid,
and the client is expected to set the value of csa_sequenceid in the and the client is expected to set the value of csa_sequenceid in the
client ID-confirming-CREATE_SESSION it sends with that client ID to client ID-confirming-CREATE_SESSION it sends with that client ID to
the value of eir_sequenceid. When EXCHANGE_ID returns a new, the value of eir_sequenceid. When EXCHANGE_ID returns a new,
unconfirmed client ID, the server initializes the client ID slot to unconfirmed client ID, the server initializes the client ID slot to
be equal to eir_sequenceid - 1 (accounting for underflow), and be equal to eir_sequenceid - 1 (accounting for underflow), and
records a contrived CREATE_SESSION result with a "cached" result of records a contrived CREATE_SESSION result with a "cached" result of
NFS4ERR_SEQ_MISORDERED. With the slot thus initialized, the NFS4ERR_SEQ_MISORDERED. With the slot thus initialized, the
processing of the CREATE_SESSION operation is divided into four processing of the CREATE_SESSION operation is divided into four
skipping to change at page 522, line 51 skipping to change at page 521, line 51
the sessionid in the preceding SEQUENCE operation), current the sessionid in the preceding SEQUENCE operation), current
filehandle, layout type (loga_layout_type), and the layout stateid filehandle, layout type (loga_layout_type), and the layout stateid
(loga_stateid). The use of the loga_iomode field depends upon the (loga_stateid). The use of the loga_iomode field depends upon the
layout type, but should reflect the client's data access intent. layout type, but should reflect the client's data access intent.
If the metadata server is in a grace period, and does not persist If the metadata server is in a grace period, and does not persist
layouts and device ID to device address mappings, then it MUST return layouts and device ID to device address mappings, then it MUST return
NFS4ERR_GRACE (see Section 8.4.2.1). NFS4ERR_GRACE (see Section 8.4.2.1).
The LAYOUTGET operation returns layout information for the specified The LAYOUTGET operation returns layout information for the specified
byte range: a layout. To get a layout from a specific offset through byte range: a layout. The client actually specifies two ranges, both
the end-of-file, regardless of the file's length, a loga_length field starting at the offset in the loga_offset field. The first range is
set to NFS4_UINT64_MAX is used. If loga_length is zero, or if a between loga_offset and loga_offset + loga_length - 1 inclusive.
loga_length which is not NFS4_UINT64_MAX is specified, and the sum of This range indicates the desired range the client wants the layout to
loga_length and loga_offset exceeds NFS4_UINT64_MAX, the error cover. The second range is between loga_offset and loga_offset +
NFS4ERR_INVAL will result. loga_minlength - 1 inclusive. This range indicates the required
range the client needs the layout to cover. Thus, loga_minlength
MUST be less than or equal to loga_length.
The loga_minlength field specifies the minimum length of layout the When a length field is set to NFS4_UINT64_MAX, this indicates a
server MUST return with two exceptions: desire (when loga_length is NFS4_UINT64_MAX) or requirement (when
loga_minlength is NFS4_UINT64_MAX) to get a layout from loga_offset
through the end-of-file, regardless of the file's length.
1. The argument loga_iomode was set to LAYOUTIOMODE_READ, and The following rules govern the relationships among, and the minima of
loga_offset plus loga_minlength goes past the end of the file. loga_length, loga_minlength, and loga_offset.
2. The range from loga_offset through loga_offset + loga_minlength - o If loga_length is less than loga_minlength, the metadata server
1 overlaps two or more striping patterns. In which case, MUST return NFS4ERR_INVAL.
logr_layout will contain two or more elements, and the sum of the
lo_length fields of each element MUST be at least loga_minlength
unless the first exception also applies.
If this requirement cannot be met, the server MUST NOT return a o If loga_minlength is zero, this is an indication to the metadata
layout and the error NFS4ERR_BADLAYOUT MUST be returned. server that the client desires any layout at offset loga_offset or
less that the metadata server has "readily available". Readily is
subjective, and depends on the layout type and the pNFS server
implementation. For example, some metadata servers might have to
pre-allocate stable storage when they receive a request for a
range of a file that goes beyond the file's current length. If
loga_minlength is zero and loga_length is greater than zero, this
tells the metadata server what range of the layout the client
would prefer to have. If loga_length and loga_minlength are both
zero, then the client is indicating it desires a layout of any
length with the ending offset of the range no less than the specified
loga_offset, and the starting offset at or below loga_offset. If
the metadata server does not have a layout that is readily
available, then it MUST return NFS4ERR_LAYOUTTRYLATER.
o If the sum of loga_offset and loga_minlength exceeds
NFS4_UINT64_MAX, and loga_minlength is not NFS4_UINT64_MAX, the
error NFS4ERR_INVAL MUST result.
o If the sum of loga_offset and loga_length exceeds NFS4_UINT64_MAX,
and loga_length is not NFS4_UINT64_MAX, the error NFS4ERR_INVAL
MUST result.
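The argument rules above can be sketched in C as follows. This is a non-normative illustration only: the names check_layoutget_range, sum_exceeds_max, and sk_stat are hypothetical, and the SK_* status values are placeholders, not the protocol's nfsstat4 codes. The overflow test is written as a subtraction so that the check itself cannot wrap.

```c
#include <stdint.h>

/* NFS4_UINT64_MAX is the all-ones 64-bit value. */
#define NFS4_UINT64_MAX UINT64_MAX

/* Hypothetical local status type; values are placeholders,
 * not on-the-wire nfsstat4 codes. */
typedef enum { SK_OK = 0, SK_ERR_INVAL = 1 } sk_stat;

/* True if off + len would exceed NFS4_UINT64_MAX, tested without
 * performing the (possibly wrapping) addition. */
static int sum_exceeds_max(uint64_t off, uint64_t len)
{
    return len > NFS4_UINT64_MAX - off;
}

/* Applies the three argument rules for loga_offset, loga_length,
 * and loga_minlength; returns SK_ERR_INVAL where NFS4ERR_INVAL is due. */
sk_stat check_layoutget_range(uint64_t loga_offset, uint64_t loga_length,
                              uint64_t loga_minlength)
{
    /* loga_minlength MUST be less than or equal to loga_length. */
    if (loga_length < loga_minlength)
        return SK_ERR_INVAL;
    /* NFS4_UINT64_MAX means "through end-of-file" and is exempt. */
    if (loga_minlength != NFS4_UINT64_MAX &&
        sum_exceeds_max(loga_offset, loga_minlength))
        return SK_ERR_INVAL;
    if (loga_length != NFS4_UINT64_MAX &&
        sum_exceeds_max(loga_offset, loga_length))
        return SK_ERR_INVAL;
    return SK_OK;
}
```

For example, a request with loga_offset 8192, loga_minlength 53248, and loga_length NFS4_UINT64_MAX passes these checks, while loga_length 10 with loga_minlength 20 does not.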
After the metadata server has performed the above checks on
loga_offset, loga_minlength, and loga_length, the metadata server
MUST return a layout according to the rules in Table 21.
Acceptable layouts based on loga_minlength. Note: u64m =
NFS4_UINT64_MAX; a_off = loga_offset; a_minlen = loga_minlength.
+-----------+-----------+----------+----------+---------------------+
| Layout | Layout | Layout | Layout | Layout length of |
| iomode of | a_minlen | iomode | offset | reply |
| request | of | of reply | of reply | |
| | request | | | |
+-----------+-----------+----------+----------+---------------------+
| _READ | u64m | MAY be | MUST be | MUST be >= file |
| | | _READ | <= a_off | length - layout |
| | | | | offset |
| _READ | u64m | MAY be | MUST be | MUST be u64m |
| | | _RW | <= a_off | |
| _READ | > 0 and < | MAY be | MUST be | MUST be >= MIN(file |
| | u64m | _READ | <= a_off | length, a_minlen + |
| | | | | a_off) - layout |
| | | | | offset |
| _READ | > 0 and < | MAY be | MUST be | MUST be >= a_off - |
| | u64m | _RW | <= a_off | layout offset + |
| | | | | a_minlen |
| _READ | 0 | MAY be | MUST be | MUST be > 0 |
| | | _READ | <= a_off | |
| _READ | 0 | MAY be | MUST be | MUST be > 0 |
| | | _RW | <= a_off | |
| _RW | u64m | MUST be | MUST be | MUST be u64m |
| | | _RW | <= a_off | |
| _RW | > 0 and < | MUST be | MUST be | MUST be >= a_off - |
| | u64m | _RW | <= a_off | layout offset + |
| | | | | a_minlen |
| _RW | 0 | MUST be | MUST be | MUST be > 0 |
| | | _RW | <= a_off | |
+-----------+-----------+----------+----------+---------------------+
Table 21
If loga_minlength is not zero and the metadata server cannot return a
layout according to the rules in Table 21, then the metadata server
MUST return the error NFS4ERR_BADLAYOUT. If loga_minlength is zero
and the metadata server cannot or will not return a layout according
to the rules in Table 21, then the metadata server MUST return the
error NFS4ERR_LAYOUTTRYLATER. Assuming loga_length is greater than
loga_minlength or equal to zero, the metadata server SHOULD return a
layout according to the rules in Table 22.
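One way to read Table 21 and the error rule that follows it is as a predicate over the request and a candidate reply. The C sketch below is illustrative only: the names table21_acceptable, table21_status, sk_iomode, and sk_stat are hypothetical, the status values are placeholders, and the argument checks on loga_offset and loga_minlength are assumed to have already passed. The reply length r_len is the "returned layout length" of the table, which may be NFS4_UINT64_MAX.

```c
#include <stdbool.h>
#include <stdint.h>

#define NFS4_UINT64_MAX UINT64_MAX

/* Hypothetical local types; only _READ and _RW can appear here. */
typedef enum { SK_IOMODE_READ, SK_IOMODE_RW } sk_iomode;
typedef enum { SK_OK, SK_ERR_BADLAYOUT, SK_ERR_LAYOUTTRYLATER } sk_stat;

/* a - b, clamped at zero instead of wrapping. */
static uint64_t sub_clamp(uint64_t a, uint64_t b)
{
    return a > b ? a - b : 0;
}

/* True if a reply (r_mode, r_off, r_len) satisfies Table 21 for a
 * request (q_mode, a_off, a_minlen) on a file of length file_len. */
bool table21_acceptable(sk_iomode q_mode, uint64_t a_off, uint64_t a_minlen,
                        uint64_t file_len, sk_iomode r_mode,
                        uint64_t r_off, uint64_t r_len)
{
    if (r_off > a_off)
        return false;                 /* reply offset MUST be <= a_off */
    if (q_mode == SK_IOMODE_RW && r_mode != SK_IOMODE_RW)
        return false;                 /* _RW request MUST get _RW reply */

    if (a_minlen == 0)
        return r_len > 0;

    if (a_minlen == NFS4_UINT64_MAX) {
        if (r_mode == SK_IOMODE_RW)
            return r_len == NFS4_UINT64_MAX;
        return r_len >= sub_clamp(file_len, r_off);   /* _READ -> _READ */
    }

    /* 0 < a_minlen < u64m: a_off + a_minlen cannot overflow here. */
    uint64_t end = a_off + a_minlen;
    if (q_mode == SK_IOMODE_READ && r_mode == SK_IOMODE_READ) {
        uint64_t limit = file_len < end ? file_len : end;  /* MIN(...) */
        return r_len >= sub_clamp(limit, r_off);
    }
    return r_len >= end - r_off;      /* a_off - layout offset + a_minlen */
}

/* Error choice when no layout satisfying Table 21 can be returned. */
sk_stat table21_status(bool acceptable, uint64_t a_minlen)
{
    if (acceptable)
        return SK_OK;
    return a_minlen != 0 ? SK_ERR_BADLAYOUT : SK_ERR_LAYOUTTRYLATER;
}
```

In this sketch a _READ request with a_minlen 53248 at a_off 8192 on a 20000-byte file is satisfied by a _READ reply of offset 0 and length 20000, because the MIN(file length, a_minlen + a_off) term caps the requirement at the file's end.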
Desired layouts based on loga_length. The rules of Table 21 MUST be
applied first. Note: u64m = NFS4_UINT64_MAX; a_off = loga_offset;
a_len = loga_length.
+------------+------------+-----------+-----------+-----------------+
| Layout | Layout | Layout | Layout | Layout length |
| iomode of | a_len of | iomode of | offset of | of reply |
| request | request | reply | reply | |
+------------+------------+-----------+-----------+-----------------+
| _READ | u64m | MAY be | MUST be | SHOULD be u64m |
| | | _READ | <= a_off | |
| _READ | u64m | MAY be | MUST be | SHOULD be u64m |
| | | _RW | <= a_off | |
| _READ | > 0 and < | MAY be | MUST be | SHOULD be >= |
| | u64m | _READ | <= a_off | a_off - layout |
| | | | | offset + a_len |
| _READ | > 0 and < | MAY be | MUST be | SHOULD be >= |
| | u64m | _RW | <= a_off | a_off - layout |
| | | | | offset + a_len |
| _READ | 0 | MAY be | MUST be | SHOULD be > |
| | | _READ | <= a_off | a_off - layout |
| | | | | offset |
| _READ | 0 | MAY be | MUST be | SHOULD be > |
| | | _RW | <= a_off | a_off - layout |
| | | | | offset |
| _RW | u64m | MUST be | MUST be | SHOULD be u64m |
| | | _RW | <= a_off | |
| _RW | > 0 and < | MUST be | MUST be | SHOULD be >= |
| | u64m | _RW | <= a_off | a_off - layout |
| | | | | offset + a_len |
| _RW | 0 | MUST be | MUST be | SHOULD be > |
| | | _RW | <= a_off | a_off - layout |
| | | | | offset |
+------------+------------+-----------+-----------+-----------------+
Table 22
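The length column of Table 22 reduces to three cases on a_len. The C sketch below checks only those SHOULD-level length preferences; iomode constraints are assumed to have been enforced already by the Table 21 rules, which apply first. The function name table22_preferred is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NFS4_UINT64_MAX UINT64_MAX

/* True if a reply (r_off, r_len) meets the preferred length of
 * Table 22 for a request (a_off, a_len). Assumes Table 21 has
 * already accepted the reply and the argument checks have passed. */
bool table22_preferred(uint64_t a_off, uint64_t a_len,
                       uint64_t r_off, uint64_t r_len)
{
    if (r_off > a_off)
        return false;                    /* offset MUST be <= a_off */
    if (a_len == NFS4_UINT64_MAX)
        return r_len == NFS4_UINT64_MAX; /* SHOULD be u64m */
    uint64_t skip = a_off - r_off;       /* a_off - layout offset */
    if (a_len == 0)
        return r_len > skip;             /* SHOULD reach past a_off */
    return r_len >= skip + a_len;        /* a_off + a_len <= u64m here */
}
```

For a request at a_off 10000 with a_len 50000, a reply starting at 8192 meets the preference only if its length is at least 1808 + 50000 = 51808.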
The loga_stateid field specifies a valid stateid. If a layout is not The loga_stateid field specifies a valid stateid. If a layout is not
currently held by the client, the loga_stateid field represents a currently held by the client, the loga_stateid field represents a
stateid reflecting the correspondingly valid open, byte-range lock, stateid reflecting the correspondingly valid open, byte-range lock,
or delegation stateid. Once a layout is held by the client for the or delegation stateid. Once a layout is held on the file by the
file, the loga_stateid field is a stateid as returned from a previous client, the loga_stateid field MUST be a stateid as returned from a
LAYOUTGET or LAYOUTRETURN operation or provided by a CB_LAYOUTRECALL previous LAYOUTGET or LAYOUTRETURN operation or provided by a
operation (see Section 12.5.3). CB_LAYOUTRECALL operation (see Section 12.5.3).
The loga_maxcount field specifies the maximum layout size (in bytes) The loga_maxcount field specifies the maximum layout size (in bytes)
that the client can handle. If the size of the layout structure that the client can handle. If the size of the layout structure
exceeds the size specified by maxcount, the metadata server will exceeds the size specified by maxcount, the metadata server will
return the NFS4ERR_TOOSMALL error. return the NFS4ERR_TOOSMALL error.
The returned layout is expressed as an array, logr_layout, with each The returned layout is expressed as an array, logr_layout, with each
element of type layout4. If a file has a single striping pattern, element of type layout4. If a file has a single striping pattern,
then logr_layout will contain just one entry. Otherwise, if the then logr_layout SHOULD contain just one entry. Otherwise, if the
requested range overlaps more than one striping pattern, logr_layout requested range overlaps more than one striping pattern, logr_layout
will contain the required number of entries. The elements of will contain the required number of entries. The elements of
logr_layout MUST be sorted in ascending order of the value of the logr_layout MUST be sorted in ascending order of the value of the
lo_offset field of each element. There MUST be no gaps or overlaps lo_offset field of each element. There MUST be no gaps or overlaps
in the range between two successive elements of logr_layout. The in the range between two successive elements of logr_layout. The
lo_iomode field in each element of logr_layout MUST be the same. lo_iomode field in each element of logr_layout MUST be the same.
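A client (or a server self-check) could verify the ordering, contiguity, and iomode rules for logr_layout along the lines of the C sketch below. The struct sketch_layout is a hypothetical, simplified stand-in for layout4 that keeps only the fields used here, and treating an empty array as malformed is an assumption of the sketch. An element of length NFS4_UINT64_MAX runs to end-of-file, so any element after it would overlap it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NFS4_UINT64_MAX UINT64_MAX

/* Hypothetical, simplified view of one layout4 element. */
struct sketch_layout {
    uint64_t lo_offset;
    uint64_t lo_length;
    int      lo_iomode;
};

/* True if the elements are sorted by lo_offset, have no gaps or
 * overlaps between neighbours, and all share one lo_iomode. */
bool logr_layout_well_formed(const struct sketch_layout *el, size_t n)
{
    if (n == 0)
        return false;          /* assumption: success carries >= 1 element */
    for (size_t i = 1; i < n; i++) {
        const struct sketch_layout *prev = &el[i - 1];
        if (el[i].lo_iomode != prev->lo_iomode)
            return false;
        /* A predecessor reaching end-of-file would overlap any successor. */
        if (prev->lo_length == NFS4_UINT64_MAX)
            return false;
        if (prev->lo_length > NFS4_UINT64_MAX - prev->lo_offset)
            return false;
        /* No gap, no overlap: next element starts where prev ends. */
        if (el[i].lo_offset != prev->lo_offset + prev->lo_length)
            return false;
    }
    return true;
}
```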
The metadata server may adjust the range of the returned layout based Table 21 and Table 22 both refer to a returned layout iomode, offset,
on the usage implied by the loga_iomode. The client MUST be prepared and length. Because the returned layout is encoded in the
to get a layout that does not align exactly with its request. See logr_layout array, more description is required.
Section 12.5.2 for more details.
The metadata server may also return a layout with an lo_iomode other iomode
than that requested by the client. If it does so, it MUST ensure
that the lo_iomode is more permissive than the loga_iomode requested. The value of the returned layout iomode listed in Table 21 and
For example, this behavior allows an implementation to upgrade read- Table 22 is equal to the value of the lo_iomode field in each
only requests to read/write requests at its discretion, within the element of logr_layout. As shown in Table 21 and Table 22, the
limits of the layout type specific protocol. A lo_iomode of either metadata server MAY return a layout with an lo_iomode different
LAYOUTIOMODE4_READ or LAYOUTIOMODE4_RW MUST be returned. from the requested iomode (field loga_iomode of the request). If
it does so, it MUST ensure that the lo_iomode is more permissive
than the loga_iomode requested. For example, this behavior allows
an implementation to upgrade read-only requests to read/write
requests at its discretion, within the limits of the layout type
specific protocol. A lo_iomode of either LAYOUTIOMODE4_READ or
LAYOUTIOMODE4_RW MUST be returned.
offset
The value of the returned layout offset listed in Table 21 and
Table 22 is always equal to the lo_offset field of the first
element of logr_layout.
length
When setting the value of the returned layout length, the
situation is complicated by the possibility that the special
layout length value NFS4_UINT64_MAX is involved. For a
logr_layout array of N elements, the lo_length field in the first
N-1 elements MUST NOT be NFS4_UINT64_MAX. The lo_length field of
the last element of logr_layout can be NFS4_UINT64_MAX under some
conditions as described in the following list.
* If an applicable rule of Table 21 states the metadata server
MUST return a layout of length NFS4_UINT64_MAX, then the lo_length
field of the last element of logr_layout MUST be
NFS4_UINT64_MAX.
* If an applicable rule of Table 21 states the metadata server
MUST NOT return a layout of length NFS4_UINT64_MAX, then the
lo_length field of the last element of logr_layout MUST NOT be
NFS4_UINT64_MAX.
* If an applicable rule of Table 22 states the metadata server
SHOULD return a layout of length NFS4_UINT64_MAX, then the
lo_length field of the last element of logr_layout SHOULD be
NFS4_UINT64_MAX.
* When the value of the returned layout length of Table 21 and
Table 22 is not NFS4_UINT64_MAX, then the returned layout
length is equal to the sum of the lo_length fields of each
element of logr_layout.
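The mapping from the logr_layout array to the "returned layout" iomode, offset, and length used by Table 21 and Table 22 can be sketched in C as below, together with the iomode permissiveness check. All names and the iomode values are hypothetical local stand-ins. Rejecting a finite sum that would equal NFS4_UINT64_MAX is a choice of this sketch, since such a total would be indistinguishable from the special value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NFS4_UINT64_MAX UINT64_MAX

/* Hypothetical local iomode values: _RW is more permissive than _READ. */
enum { SK_IOMODE_READ = 1, SK_IOMODE_RW = 2 };

struct sketch_layout { uint64_t lo_offset; uint64_t lo_length; int lo_iomode; };

/* The "returned layout" iomode, offset, and length of Tables 21/22. */
struct sketch_summary { int iomode; uint64_t offset; uint64_t length; };

bool summarize_logr_layout(const struct sketch_layout *el, size_t n,
                           struct sketch_summary *out)
{
    if (n == 0)
        return false;
    out->iomode = el[0].lo_iomode;   /* same in every element */
    out->offset = el[0].lo_offset;   /* offset of the first element */
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t len = el[i].lo_length;
        if (len == NFS4_UINT64_MAX) {
            if (i != n - 1)
                return false;        /* only the last element may use it */
            out->length = NFS4_UINT64_MAX;
            return true;
        }
        /* Sketch choice: refuse a finite total reaching the special value. */
        if (len >= NFS4_UINT64_MAX - sum)
            return false;
        sum += len;
    }
    out->length = sum;               /* sum of all lo_length fields */
    return true;
}

/* True if the reply iomode equals the request or is more permissive. */
bool reply_iomode_ok(int requested, int returned)
{
    if (returned != SK_IOMODE_READ && returned != SK_IOMODE_RW)
        return false;
    return returned == requested || returned == SK_IOMODE_RW;
}
```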
The logr_return_on_close result field is a directive to return the The logr_return_on_close result field is a directive to return the
layout before closing the file. When the server sets this return layout before closing the file. When the metadata server sets this
value to TRUE, it MUST be prepared to recall the layout in the case return value to TRUE, it MUST be prepared to recall the layout in the
the client fails to return the layout before close. For the server case the client fails to return the layout before close. For the
that knows a layout must be returned before a close of the file, this metadata server that knows a layout must be returned before a close
return value can be used to communicate the desired behavior to the of the file, this return value can be used to communicate the desired
client and thus remove one extra step from the client's and server's behavior to the client and thus remove one extra step from the
interaction. client's and metadata server's interaction.
The logr_stateid stateid is returned to the client for use in The logr_stateid stateid is returned to the client for use in
subsequent layout related operations. See Section 8.2, subsequent layout related operations. See Section 8.2,
Section 12.5.3, and Section 12.5.5.2 for a further discussion and Section 12.5.3, and Section 12.5.5.2 for a further discussion and
requirements. requirements.
The format of the returned layout (lo_content) is specific to the The format of the returned layout (lo_content) is specific to the
layout type. The value of the layout type (lo_content.loc_type) for layout type. The value of the layout type (lo_content.loc_type) for
each of the elements of the array of layouts returned by the server each of the elements of the array of layouts returned by the metadata
(logr_layout) MUST be equal to the loga_layout_type specified by the server (logr_layout) MUST be equal to the loga_layout_type specified
client. If it is not equal, the client SHOULD ignore the response as by the client. If it is not equal, the client SHOULD ignore the
invalid and behave as if the server returned an error, even if the response as invalid and behave as if the metadata server returned an
client does have support for the layout type returned. error, even if the client does have support for the layout type
returned.
If layouts are not supported for the requested file or its containing If layouts are not supported for the requested file or its containing
file system the server SHOULD return NFS4ERR_LAYOUTUNAVAILABLE. If file system the metadata server MUST return
the layout type is not supported, the metadata server should return NFS4ERR_LAYOUTUNAVAILABLE. If the layout type is not supported, the
NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts are supported but no layout metadata server MUST return NFS4ERR_UNKNOWN_LAYOUTTYPE. If layouts
matches the client provided layout identification, the server should are supported but no layout matches the client provided layout
return NFS4ERR_BADLAYOUT. If an invalid loga_iomode is specified, or identification, the metadata server MUST return NFS4ERR_BADLAYOUT.
a loga_iomode of LAYOUTIOMODE4_ANY is specified, the server should If an invalid loga_iomode is specified, or a loga_iomode of
return NFS4ERR_BADIOMODE. LAYOUTIOMODE4_ANY is specified, the metadata server MUST return
NFS4ERR_BADIOMODE.
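The error rules in the preceding paragraph can be gathered into a single pre-check, sketched in C below. The text does not say in which order a server evaluates these conditions, so the order used here is an assumption, as are the boolean fields of sk_server_caps, which stand in for whatever server-internal state a real implementation consults. Status and iomode values are placeholders.

```c
#include <stdbool.h>

/* Hypothetical local values; not on-the-wire codes. */
enum { SK_IOMODE_READ = 1, SK_IOMODE_RW = 2, SK_IOMODE_ANY = 3 };
typedef enum {
    SK_OK, SK_LAYOUTUNAVAILABLE, SK_UNKNOWN_LAYOUTTYPE,
    SK_BADIOMODE, SK_BADLAYOUT
} sk_stat;

/* Hypothetical summary of what the server knows about the request. */
struct sk_server_caps {
    bool fs_supports_layouts;  /* layouts offered for this file/file system */
    bool type_supported;       /* loga_layout_type is supported */
    bool layout_matches;       /* some layout matches the client's request */
};

/* Check order below is an assumption of this sketch. */
sk_stat layoutget_precheck(const struct sk_server_caps *c, int loga_iomode)
{
    if (!c->fs_supports_layouts)
        return SK_LAYOUTUNAVAILABLE;
    if (!c->type_supported)
        return SK_UNKNOWN_LAYOUTTYPE;
    /* Invalid values and LAYOUTIOMODE4_ANY are both rejected. */
    if (loga_iomode != SK_IOMODE_READ && loga_iomode != SK_IOMODE_RW)
        return SK_BADIOMODE;
    if (!c->layout_matches)
        return SK_BADLAYOUT;
    return SK_OK;
}
```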
If the layout for the file is unavailable due to transient If the layout for the file is unavailable due to transient
conditions, e.g. file sharing prohibits layouts, the server MUST conditions, e.g. file sharing prohibits layouts, the metadata server
return NFS4ERR_LAYOUTTRYLATER. MUST return NFS4ERR_LAYOUTTRYLATER.
If the layout request is rejected due to an overlapping layout If the layout request is rejected due to an overlapping layout
recall, the server MUST return NFS4ERR_RECALLCONFLICT. See recall, the metadata server MUST return NFS4ERR_RECALLCONFLICT. See
Section 12.5.5.2 for details. Section 12.5.5.2 for details.
If the layout conflicts with a mandatory byte range lock held on the If the layout conflicts with a mandatory byte range lock held on the
file, and if the storage devices have no method of enforcing file, and if the storage devices have no method of enforcing
mandatory locks, other than through the restriction of layouts, the mandatory locks, other than through the restriction of layouts, the
metadata server should return NFS4ERR_LOCKED. metadata server SHOULD return NFS4ERR_LOCKED.
If the client sets loga_signal_layout_avail to TRUE, then it is If the client sets loga_signal_layout_avail to TRUE, then it is
registering with the server a "want" for a layout in the event the registering with the metadata server a "want" for a layout in the event the
layout cannot be obtained due to resource exhaustion. If the server layout cannot be obtained due to resource exhaustion. If the
supports and will honor the "want", the results will have metadata server supports and will honor the "want", the results will
logr_will_signal_layout_avail set to TRUE. If so, the client should have logr_will_signal_layout_avail set to TRUE. If so, the client
expect a CB_RECALLABLE_OBJ_AVAIL operation to indicate that a layout should expect a CB_RECALLABLE_OBJ_AVAIL operation to indicate that a
is available. layout is available.
On success, the current filehandle retains its value and the current On success, the current filehandle retains its value and the current
stateid is updated to match the value as returned in the results. stateid is updated to match the value as returned in the results.
18.43.4. IMPLEMENTATION 18.43.4. IMPLEMENTATION
Typically, LAYOUTGET will be called as part of a COMPOUND request Typically, LAYOUTGET will be called as part of a COMPOUND request
after an OPEN operation and results in the client having location after an OPEN operation and results in the client having location
information for the file; this requires that loga_stateid be set to information for the file; this requires that loga_stateid be set to
the special stateid that tells the server to use the current stateid, the special stateid that tells the metadata server to use the current
which is set by OPEN (see Section 16.2.3.1.2). A client may also stateid, which is set by OPEN (see Section 16.2.3.1.2). A client
hold a layout across multiple OPENs. The client specifies a layout may also hold a layout across multiple OPENs. The client specifies a
type that limits what kind of layout the server will return. This layout type that limits what kind of layout the metadata server will
prevents servers from issuing layouts that are unusable by the return. This prevents metadata servers from granting layouts that
client. are unusable by the client.
As indicated by Table 21 and Table 22, the specification of LAYOUTGET
allows a pNFS client and server considerable flexibility. A pNFS
client can follow several strategies when sending LAYOUTGET. Some
examples are as follows.
o If LAYOUTGET is preceded by OPEN in the same COMPOUND request, and
the OPEN requests read access, the client might opt to request a
_READ layout with loga_offset set to zero, loga_minlength set to
zero, and loga_length set to NFS4_UINT64_MAX. If the file has
space allocated to it, that space is striped over one or more
storage devices, and there is either no conflicting layout, or the
concept of a conflicting layout does not apply to the pNFS
server's layout type or implementation, then the metadata server
might return a layout with a starting offset of zero, and a length
equal to the length of the file, if not NFS4_UINT64_MAX. If the
length of the file is not a multiple of the pNFS server's stripe
width (see Section 13.2 for a formal definition), the metadata
server might round the returned layout's length up.
o If LAYOUTGET is preceded by OPEN in the same COMPOUND request, and
the OPEN does not truncate the file, and requests write access,
the client might opt to request a _RW layout with loga_offset set
to zero, loga_minlength set to zero, and loga_length set to the
file's current length (if known), or NFS4_UINT64_MAX. As with the
previous case, under some conditions the metadata server might
return a layout that covers the entire length of the file or
beyond.
o As above, but the OPEN truncates the file. In this case, the client
might anticipate it will be writing to the file from offset zero,
and so loga_offset and loga_minlength are set to zero, and
loga_length is set to the value of threshold4_write_iosize. The
metadata server might return a layout from offset zero with a
length at least as long as threshold4_write_iosize.
o A process on the client invokes a request to read from offset
10000 for length 50000. The client is using buffered I/O, and has
buffer sizes of 4096 bytes. The client intends to map the request
of the process into a series of READ requests starting at offset
8192. The end offset needs to be higher than 10000 + 50000 =
60000, and the next offset that is a multiple of 4096 is 61440.
The difference between 61440 and the starting offset of the
layout is 53248 (which is the product of 4096 and 13). The value
of threshold4_read_iosize is less than 53248, so the client sends
a LAYOUTGET request with loga_offset set to 8192, loga_minlength
set to 53248, and loga_length set to the file's length (if known)
minus 8192 or NFS4_UINT64_MAX (if the file's length is not known).
Since this LAYOUTGET request exceeds the metadata server's
threshold, it grants the layout, possibly with an initial offset
of 0, with an end offset of at least 8192 + 53248 - 1 = 61439, but
preferably a layout with an offset aligned on the stripe width and
a length that is a multiple of the stripe width.
o As above, but the client is not using buffered I/O, and instead
all internal I/O requests are sent directly to the server. The
LAYOUTGET request has loga_offset equal to 10000, and
loga_minlength set to 50000. The value of loga_length is set to
the length of the file. The metadata server is free to return a
layout that fully overlaps the requested range, with a starting
offset and length aligned on the stripe width.
o Again, a process on the client invokes a request to read from
offset 10000 for length 50000, and buffered I/O is in use. The
client expects that the server might not be able to return a
layout covering the full I/O range (loga_offset of 8192 and
loga_minlength of 53248). The client intends to map the
request of the process into a series of READ requests starting at
offset 8192, each with length 4096, with a total length of 53248
(which equals 13 * 4096). Because the value of
threshold4_read_iosize is equal to 4096, it is practical and
reasonable for the client to use several LAYOUTGETs to complete
the series of READs. The client sends a LAYOUTGET request with
loga_offset set to 8192, loga_minlength set to 4096, and
loga_length set to 53248 or higher. The server will grant a
layout possibly with an initial offset of 0, with an end offset of
at least 8192 + 4096 - 1 = 12287, but preferably a layout with an
offset aligned on the stripe width and a length that is a multiple
of the stripe width. This will allow the client to make forward
progress, possibly having to issue more LAYOUTGET requests for the
remainder of the range.
o An NFS client detects a sequential read pattern, and so issues a
LAYOUTGET that goes well beyond any current or pending read
requests to the server. The server might likewise detect this
pattern, and grant the LAYOUTGET request. The client continues to
send LAYOUTGET requests once it has read from an offset of the
file that represents 50% of the way through the last layout it
received.
o As above, except that the client fails to detect the pattern and the
server detects it. The next time the metadata server gets a LAYOUTGET,
it returns a layout with a length that is well beyond
loga_minlength.
o A client is using buffered I/O, has a long queue of
write-behinds to process, and also detects a sequential write pattern.
It issues a LAYOUTGET for a layout that spans the range of the
queued write-behinds and well beyond, including ranges beyond the
file's current length. The client continues to issue LAYOUTGETs
once the write behind queue reaches 50% of the maximum queue
length.
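The arithmetic in the buffered-read examples above (rounding 10000 down to 8192, rounding 60000 up to 61440, and arriving at a loga_minlength of 53248 = 13 * 4096) and the "50% of the way through the last layout" trigger can be sketched in C as follows. The names buffered_layout_range, past_halfway, and sk_range are hypothetical, and the sketch assumes a nonzero buffer size, a finite layout length, and no 64-bit overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result: the range a buffered client asks LAYOUTGET for. */
struct sk_range { uint64_t offset; uint64_t minlength; };

/* Widens a read of len bytes at off to whole buffers of bufsz bytes.
 * Assumes bufsz > 0 and that off + len + bufsz does not overflow. */
struct sk_range buffered_layout_range(uint64_t off, uint64_t len,
                                      uint64_t bufsz)
{
    uint64_t start = off - off % bufsz;                       /* round down */
    uint64_t end = off + len;                                 /* exclusive */
    uint64_t aligned_end = (end + bufsz - 1) / bufsz * bufsz; /* round up */
    struct sk_range r = { start, aligned_end - start };
    return r;
}

/* Sequential-read trigger: true once pos has passed the midpoint of
 * the last layout received (layout_len assumed finite). */
bool past_halfway(uint64_t pos, uint64_t layout_off, uint64_t layout_len)
{
    return pos >= layout_off + layout_len / 2;
}
```

With off 10000, len 50000, and bufsz 4096, this yields offset 8192 and minlength 53248, matching the worked example.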
Once the client has obtained a layout referring to a particular Once the client has obtained a layout referring to a particular
device ID, the server MUST NOT delete the device ID until the layout device ID, the metadata server MUST NOT delete the device ID until
is returned or revoked. the layout is returned or revoked.
CB_NOTIFY_DEVICEID can race with LAYOUTGET. One race scenario is CB_NOTIFY_DEVICEID can race with LAYOUTGET. One race scenario is
that LAYOUTGET returns a device ID the client does not have device that LAYOUTGET returns a device ID the client does not have device
address mappings for, and the server sends a CB_NOTIFY_DEVICEID to address mappings for, and the metadata server sends a
add the device ID to the client's awareness and meanwhile the client CB_NOTIFY_DEVICEID to add the device ID to the client's awareness and
sends GETDEVICEINFO on the device ID. This scenario is discussed in meanwhile the client sends GETDEVICEINFO on the device ID. This
Section 18.40.4. Another scenario is that the CB_NOTIFY_DEVICEID is scenario is discussed in Section 18.40.4. Another scenario is that
processed by the client before it processes the results from the CB_NOTIFY_DEVICEID is processed by the client before it processes
LAYOUTGET. The client will send a GETDEVICEINFO on the device ID. the results from LAYOUTGET. The client will send a GETDEVICEINFO on
If the results from GETDEVICEINFO are received before the client gets the device ID. If the results from GETDEVICEINFO are received before
results from LAYOUTGET, then there is no longer a race. If the the client gets results from LAYOUTGET, then there is no longer a
results from LAYOUTGET are received before the results from race. If the results from LAYOUTGET are received before the results
GETDEVICEINFO, the client can either wait for results of from GETDEVICEINFO, the client can either wait for results of
GETDEVICEINFO, or send another one to get possibly more up to date GETDEVICEINFO, or send another one to get possibly more up to date
device address mappings for the device ID. device address mappings for the device ID.
18.44. Operation 51: LAYOUTRETURN - Release Layout Information 18.44. Operation 51: LAYOUTRETURN - Release Layout Information
18.44.1. ARGUMENT 18.44.1. ARGUMENT
/* Constants used for LAYOUTRETURN and CB_LAYOUTRECALL */ /* Constants used for LAYOUTRETURN and CB_LAYOUTRECALL */
const LAYOUT4_RET_REC_FILE = 1; const LAYOUT4_RET_REC_FILE = 1;
const LAYOUT4_RET_REC_FSID = 2; const LAYOUT4_RET_REC_FSID = 2;
skipping to change at page 537, line 15 skipping to change at page 541, line 15
If SEQUENCE returns an error, then the state of the slot (sequence If SEQUENCE returns an error, then the state of the slot (sequence
id, cached reply) MUST NOT change, and the associated lease MUST NOT id, cached reply) MUST NOT change, and the associated lease MUST NOT
be renewed. be renewed.
If SEQUENCE returns NFS4_OK, then the associated lease MUST be If SEQUENCE returns NFS4_OK, then the associated lease MUST be
renewed (see Section 8.3), except if renewed (see Section 8.3), except if
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED is returned in sr_status_flags. SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED is returned in sr_status_flags.
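The slot and lease rules for SEQUENCE can be modeled as a commit step that runs only on success, as in the simplified C sketch below. The slot and lease structures are hypothetical and reduce the cached reply to an identifier; SK_EXPIRED_ALL_STATE_REVOKED is a placeholder bit, not the protocol's actual SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED value. Advancing the slot on success reflects ordinary SEQUENCE processing and is included here only to contrast with the error case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag bit; the real SEQ4_STATUS_* values come from the XDR. */
#define SK_EXPIRED_ALL_STATE_REVOKED 0x1u

/* Hypothetical, simplified slot and lease records. */
struct sk_slot  { uint32_t seqid; int cached_reply_id; };
struct sk_lease { uint64_t renewed_at; };

void sequence_finish(struct sk_slot *slot, struct sk_lease *lease,
                     bool ok, uint32_t sr_status_flags,
                     uint32_t new_seqid, int new_reply_id, uint64_t now)
{
    if (!ok)
        return;              /* error: slot and lease MUST NOT change */
    slot->seqid = new_seqid;
    slot->cached_reply_id = new_reply_id;
    /* NFS4_OK renews the lease unless all state was revoked. */
    if (!(sr_status_flags & SK_EXPIRED_ALL_STATE_REVOKED))
        lease->renewed_at = now;
}
```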
18.46.4. IMPLEMENTATION 18.46.4. IMPLEMENTATION
The server MUST maintain a mapping of sessionid to client ID in order The server MUST maintain a mapping of session id to client ID in
to validate any operations that follow SEQUENCE that take a stateid order to validate any operations that follow SEQUENCE that take a
as an argument and/or result. stateid as an argument and/or result.
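The session-to-client mapping can be as simple as a table keyed by the 16-byte session ID, as in the C sketch below. The fixed-size linear table, its capacity, and all function names are hypothetical simplifications; stateid_clientid stands for the server's own record of which client owns a given stateid.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SK_SESSIONID_SIZE 16   /* a sessionid4 is 16 opaque bytes */
#define SK_MAX_SESSIONS   64   /* hypothetical fixed capacity */

struct sk_session_entry {
    unsigned char sessionid[SK_SESSIONID_SIZE];
    uint64_t clientid;         /* clientid4 is a 64-bit value */
    bool in_use;
};

static struct sk_session_entry sk_table[SK_MAX_SESSIONS];

/* Records the session-to-client binding when a session is created. */
bool sk_session_add(const unsigned char *sid, uint64_t clientid)
{
    for (int i = 0; i < SK_MAX_SESSIONS; i++) {
        if (!sk_table[i].in_use) {
            memcpy(sk_table[i].sessionid, sid, SK_SESSIONID_SIZE);
            sk_table[i].clientid = clientid;
            sk_table[i].in_use = true;
            return true;
        }
    }
    return false;              /* table full */
}

/* Finds the client ID that owns the session named in SEQUENCE. */
bool sk_session_lookup(const unsigned char *sid, uint64_t *clientid)
{
    for (int i = 0; i < SK_MAX_SESSIONS; i++) {
        if (sk_table[i].in_use &&
            memcmp(sk_table[i].sessionid, sid, SK_SESSIONID_SIZE) == 0) {
            *clientid = sk_table[i].clientid;
            return true;
        }
    }
    return false;
}

/* A stateid used after SEQUENCE must belong to the session's client. */
bool sk_stateid_owner_ok(const unsigned char *sid, uint64_t stateid_clientid)
{
    uint64_t owner;
    return sk_session_lookup(sid, &owner) && owner == stateid_clientid;
}
```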
If the client establishes a persistent session, then a SEQUENCE done If the client establishes a persistent session, then a SEQUENCE done
after a server restart may encounter requests performed and recorded after a server restart may encounter requests performed and recorded
in a persistent reply cache before the server restart. In this case, in a persistent reply cache before the server restart. In this case,
SEQUENCE will be processed successfully, while requests which were SEQUENCE will be processed successfully, while requests which were
not processed previously are rejected with NFS4ERR_DEADSESSION. not processed previously are rejected with NFS4ERR_DEADSESSION.
Depending on which of the operations within the COMPOUND were Depending on which of the operations within the COMPOUND were
successfully performed before the server restart, these operations successfully performed before the server restart, these operations
will also have replies sent from the server reply cache. Note that will also have replies sent from the server reply cache. Note that
skipping to change at page 547, line 4 skipping to change at page 551, line 4
Once a RECLAIM_COMPLETE is done, there can be no further reclaim Once a RECLAIM_COMPLETE is done, there can be no further reclaim
operations for locks whose scope is defined as having completed operations for locks whose scope is defined as having completed
recovery. Once the client sends RECLAIM_COMPLETE, the server will recovery. Once the client sends RECLAIM_COMPLETE, the server will
not allow the client to do subsequent reclaims of locking state for not allow the client to do subsequent reclaims of locking state for
that scope and if these are attempted, will return NFS4ERR_NO_GRACE. that scope and if these are attempted, will return NFS4ERR_NO_GRACE.
Whenever a client establishes a new client ID and before it does the Whenever a client establishes a new client ID and before it does the
first non-reclaim operation that obtains a lock, it MUST do a global first non-reclaim operation that obtains a lock, it MUST do a global
RECLAIM_COMPLETE, even if there are no locks to reclaim. If non- RECLAIM_COMPLETE, even if there are no locks to reclaim. If non-
reclaim locking operations are done before the RECLAIM_COMPLETE, a reclaim locking operations are done before the RECLAIM_COMPLETE, an
NFS4ERR_GRACE error will be returned. NFS4ERR_GRACE error will be returned.
Similarly, when the client accesses a file system on a new server, Similarly, when the client accesses a file system on a new server,
before it sends the first non-reclaim operation that obtains a lock before it sends the first non-reclaim operation that obtains a lock
on this new server, it must do a RECLAIM_COMPLETE with rca_one_fs set on this new server, it must do a RECLAIM_COMPLETE with rca_one_fs set
to TRUE and current filehandle within that file system, even if there to TRUE and current filehandle within that file system, even if there
are no locks to reclaim. If non-reclaim locking operations are done are no locks to reclaim. If non-reclaim locking operations are done
on that file system before the RECLAIM_COMPLETE, a NFS4ERR_GRACE will on that file system before the RECLAIM_COMPLETE, an NFS4ERR_GRACE
be returned. will be returned.
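The ordering rules around RECLAIM_COMPLETE can be sketched as a per-scope flag, where a scope is either the whole client ID (global RECLAIM_COMPLETE) or one file system (rca_one_fs set to TRUE). The C sketch below models only the ordering described here; a real server might still return NFS4ERR_GRACE to non-reclaim requests for other reasons while its grace period is in effect. All names and status values are hypothetical placeholders.

```c
#include <stdbool.h>

/* Hypothetical local status values; not on-the-wire codes. */
typedef enum { SK_OK, SK_ERR_GRACE, SK_ERR_NO_GRACE } sk_stat;

/* One instance per reclaim scope (client-wide or one file system). */
struct sk_reclaim_state { bool complete; };

sk_stat sk_reclaim_complete(struct sk_reclaim_state *st)
{
    st->complete = true;
    return SK_OK;
}

/* Outcome of a lock-obtaining request under the ordering rule only. */
sk_stat sk_lock_request(const struct sk_reclaim_state *st, bool is_reclaim)
{
    if (is_reclaim)            /* no reclaims after RECLAIM_COMPLETE */
        return st->complete ? SK_ERR_NO_GRACE : SK_OK;
    /* non-reclaim locks must wait for RECLAIM_COMPLETE */
    return st->complete ? SK_OK : SK_ERR_GRACE;
}
```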
Any locks not reclaimed at the point at which RECLAIM_COMPLETE is Any locks not reclaimed at the point at which RECLAIM_COMPLETE is
done become non-reclaimable. The client MUST NOT attempt to reclaim done become non-reclaimable. The client MUST NOT attempt to reclaim
them, either during the current server instance or in any subsequent them, either during the current server instance or in any subsequent
server instance, or on another server to which responsibility for server instance, or on another server to which responsibility for
that file system is transferred. If the client were to do so, it that file system is transferred. If the client were to do so, it
would be violating the protocol by representing itself as owning would be violating the protocol by representing itself as owning
locks that it does not own, and so has no right to reclaim. See locks that it does not own, and so has no right to reclaim. See
Section 8.4.3 for a discussion of edge conditions related to lock Section 8.4.3 for a discussion of edge conditions related to lock
reclaim. reclaim.
skipping to change at page 549, line 19 skipping to change at page 553, line 19
19.1.1. ARGUMENTS 19.1.1. ARGUMENTS
void; void;
19.1.2. RESULTS 19.1.2. RESULTS
void; void;
19.1.3. DESCRIPTION 19.1.3. DESCRIPTION
Standard NULL procedure. Void argument, void response. Even though CB_NULL is the standard ONC RPC NULL procedure, with the standard
there is no direct functionality associated with this procedure, the void argument and void response. Even though there is no direct
server will use CB_NULL to confirm the existence of a path for RPCs functionality associated with this procedure, the server will use
from server to client. CB_NULL to confirm the existence of a path for RPCs from the server
to the client.
19.1.4. ERRORS 19.1.4. ERRORS
None. None.
19.2. Procedure 1: CB_COMPOUND - Compound Operations 19.2. Procedure 1: CB_COMPOUND - Compound Operations
19.2.1. ARGUMENTS 19.2.1. ARGUMENTS
enum nfs_cb_opnum4 { enum nfs_cb_opnum4 {
skipping to change at page 552, line 17 skipping to change at page 556, line 17
nfs_cb_resop4 resarray<>; nfs_cb_resop4 resarray<>;
}; };
19.2.3. DESCRIPTION 19.2.3. DESCRIPTION
The CB_COMPOUND procedure is used to combine one or more of the The CB_COMPOUND procedure is used to combine one or more of the
callback procedures into a single RPC request. The main callback RPC callback procedures into a single RPC request. The main callback RPC
program has two main procedures: CB_NULL and CB_COMPOUND. All other program has two main procedures: CB_NULL and CB_COMPOUND. All other
operations use the CB_COMPOUND procedure as a wrapper. operations use the CB_COMPOUND procedure as a wrapper.
In the processing of the CB_COMPOUND procedure, the client may find During the processing of the CB_COMPOUND procedure, the client may
that it does not have the available resources to execute any or all find that it does not have the available resources to execute any or
of the operations within the CB_COMPOUND sequence. This is discussed all of the operations within the CB_COMPOUND sequence. Refer to
in Section 2.10.5.4. Section 2.10.5.4 for details.
The minorversion field of the arguments MUST be the same as the The minorversion field of the arguments MUST be the same as the
minorversion of the COMPOUND procedure used to create the client ID minorversion of the COMPOUND procedure used to create the client ID
and session. For NFSv4.1, minorversion MUST be set to 1. and session. For NFSv4.1, minorversion MUST be set to 1.
Contained within the CB_COMPOUND results is a 'status' field. This Contained within the CB_COMPOUND results is a 'status' field. This
status must be equivalent to the status of the last operation that status must be equivalent to the status of the last operation that
was executed within the CB_COMPOUND procedure. Therefore, if an was executed within the CB_COMPOUND procedure. Therefore, if an
operation incurred an error then the 'status' value will be the same operation incurred an error then the 'status' value will be the same
error value as is being returned for the operation that failed. error value as is being returned for the operation that failed.
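The status rule can be illustrated with a client-side loop that executes operations in order and stops at the first failure. Because of this, the overall status always equals the status of the last operation that was executed. This is a minimal sketch. The handler table, the (name, args) operation shape, and recording an unknown opcode as CB_ILLEGAL with NFS4ERR_OP_ILLEGAL (by analogy with COMPOUND) are assumptions made for illustration.

```python
def process_cb_compound(ops, handlers):
    """Run callback operations in order and stop at the first non-NFS4_OK.

    ops: list of (opname, args); handlers: dict opname -> callable that
    returns a status string (hypothetical shapes).
    Returns (overall status, list of (opname, status) results).
    """
    resarray = []
    status = "NFS4_OK"
    for opname, args in ops:
        handler = handlers.get(opname)
        if handler is None:
            # Illegal opcode: handled as for COMPOUND (assumed encoding).
            opname, status = "CB_ILLEGAL", "NFS4ERR_OP_ILLEGAL"
        else:
            status = handler(args)
        resarray.append((opname, status))
        if status != "NFS4_OK":
            break  # remaining operations are not executed
    # The overall status is that of the last operation executed.
    return status, resarray
```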
For a description of the "tag" field, see Section 16.2.3 where the The "tag" field is handled the same way as that of the COMPOUND
corresponding forward channel procedure is described. procedure (see Section 16.2.3).
Illegal operation codes are handled in the same way as they are Illegal operation codes are handled in the same way as they are
handled for the COMPOUND procedure. handled for the COMPOUND procedure.
19.2.4. IMPLEMENTATION 19.2.4. IMPLEMENTATION
The CB_COMPOUND procedure is used to combine individual operations The CB_COMPOUND procedure is used to combine individual operations
into a single RPC request. The client interprets each of the into a single RPC request. The client interprets each of the
operations in turn. If an operation is executed by the client and operations in turn. If an operation is executed by the client and
the status of that operation is NFS4_OK, then the next operation in the status of that operation is NFS4_OK, then the next operation in
skipping to change at page 553, line 28 skipping to change at page 557, line 28
| NFS4ERR_INVAL | The tag argument is not in UTF-8 | | NFS4ERR_INVAL | The tag argument is not in UTF-8 |
| | encoding. | | | encoding. |
| NFS4ERR_MINOR_VERS_MISMATCH | | | NFS4ERR_MINOR_VERS_MISMATCH | |
| NFS4ERR_SERVERFAULT | | | NFS4ERR_SERVERFAULT | |
| NFS4ERR_TOO_MANY_OPS | | | NFS4ERR_TOO_MANY_OPS | |
| NFS4ERR_REP_TOO_BIG | | | NFS4ERR_REP_TOO_BIG | |
| NFS4ERR_REP_TOO_BIG_TO_CACHE | | | NFS4ERR_REP_TOO_BIG_TO_CACHE | |
| NFS4ERR_REQ_TOO_BIG | | | NFS4ERR_REQ_TOO_BIG | |
+------------------------------+------------------------------------+ +------------------------------+------------------------------------+
Table 21 Table 23
20. NFSv4.1 Callback Operations 20. NFSv4.1 Callback Operations
20.1. Operation 3: CB_GETATTR - Get Attributes 20.1. Operation 3: CB_GETATTR - Get Attributes
20.1.1. ARGUMENT 20.1.1. ARGUMENT
struct CB_GETATTR4args { struct CB_GETATTR4args {
nfs_fh4 fh; nfs_fh4 fh;
bitmap4 attr_request; bitmap4 attr_request;
skipping to change at page 554, line 27 skipping to change at page 558, line 27
20.1.3. DESCRIPTION 20.1.3. DESCRIPTION
The CB_GETATTR operation is used by the server to obtain the current The CB_GETATTR operation is used by the server to obtain the current
modified state of a file that has been write delegated. The modified state of a file that has been write delegated. The
attributes size and change are the only ones guaranteed to be attributes size and change are the only ones guaranteed to be
serviced by the client. See Section 10.4.3 for a full description of serviced by the client. See Section 10.4.3 for a full description of
how the client and server are to interact with the use of CB_GETATTR. how the client and server are to interact with the use of CB_GETATTR.
If the filehandle specified is not one for which the client holds a If the filehandle specified is not one for which the client holds a
write open delegation, an NFS4ERR_BADHANDLE error is returned. write delegation, an NFS4ERR_BADHANDLE error is returned.
20.1.4. IMPLEMENTATION 20.1.4. IMPLEMENTATION
The client returns attrmask bits and the associated attribute values The client returns attrmask bits and the associated attribute values
only for the change attribute, and attributes that it may change only for the change attribute, and attributes that it may change
(time_modify, and size). (time_modify, and size).
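The reply filtering described above can be sketched as follows. The client decodes the server's bitmap4, keeps only change, size, and time_modify, and returns the matching values in attribute-number order. This is a minimal sketch. The attribute numbers 3 (change), 4 (size), and 53 (time_modify) are taken from the NFSv4 attribute numbering and should be checked against the protocol XDR. The local_values dictionary is hypothetical.

```python
# NFSv4 attribute numbers (verify against the protocol XDR).
FATTR4_CHANGE = 3
FATTR4_SIZE = 4
FATTR4_TIME_MODIFY = 53

# Attributes the client answers for in CB_GETATTR, per the text above.
CB_GETATTR_ATTRS = {FATTR4_CHANGE, FATTR4_SIZE, FATTR4_TIME_MODIFY}


def bitmap_to_set(words):
    """Decode a bitmap4: attribute n is bit (n % 32) of word (n // 32)."""
    return {w * 32 + b for w, word in enumerate(words)
            for b in range(32) if (word >> b) & 1}


def set_to_bitmap(attrs):
    """Encode a set of attribute numbers as a list of bitmap4 words."""
    if not attrs:
        return []
    words = [0] * (max(attrs) // 32 + 1)
    for a in attrs:
        words[a // 32] |= 1 << (a % 32)
    return words


def cb_getattr_reply(attr_request, local_values):
    """Return (attrmask, values) for the requested attributes the client
    services. local_values maps attribute number to the client's current,
    possibly modified, value (hypothetical)."""
    granted = sorted(bitmap_to_set(attr_request) & CB_GETATTR_ATTRS)
    return set_to_bitmap(set(granted)), [local_values[a] for a in granted]
```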
20.2. Operation 4: CB_RECALL - Recall an Open Delegation 20.2. Operation 4: CB_RECALL - Recall a Delegation
20.2.1. ARGUMENT 20.2.1. ARGUMENT
struct CB_RECALL4args { struct CB_RECALL4args {
stateid4 stateid; stateid4 stateid;
bool truncate; bool truncate;
nfs_fh4 fh; nfs_fh4 fh;
}; };
20.2.2. RESULT 20.2.2. RESULT
struct CB_RECALL4res { struct CB_RECALL4res {
nfsstat4 status; nfsstat4 status;
}; };
20.2.3. DESCRIPTION 20.2.3. DESCRIPTION
The CB_RECALL operation is used to begin the process of recalling an The CB_RECALL operation is used to begin the process of recalling a
open delegation and returning it to the server. delegation and returning it to the server.
The truncate flag is used to optimize recall for a file which is The truncate flag is used to optimize recall for a file object which
about to be truncated to zero. When it is set, the client is freed is a regular file and is about to be truncated to zero. When it is
of obligation to propagate modified data for the file to the server, TRUE, the client is freed of the obligation to propagate modified
since this data is irrelevant. data for the file to the server, since this data is irrelevant.
If the handle specified is not one for which the client holds an open If the handle specified is not one for which the client holds a
delegation, an NFS4ERR_BADHANDLE error is returned. delegation, an NFS4ERR_BADHANDLE error is returned.
If the stateid specified is not one corresponding to an open If the stateid specified is not one corresponding to an open
delegation for the file specified by the filehandle, an delegation for the file specified by the filehandle, an
NFS4ERR_BAD_STATEID is returned. NFS4ERR_BAD_STATEID is returned.
20.2.4. IMPLEMENTATION 20.2.4. IMPLEMENTATION
The client should reply to the callback immediately. Replying does The client SHOULD reply to the callback immediately. Replying does
not complete the recall except when an error was returned. The not complete the recall except when the value of the reply's status
recall is not complete until the delegation is returned using a field is neither NFS4ERR_DELAY nor NFS4_OK. The recall is not
DELEGRETURN. complete until the delegation is returned using a DELEGRETURN
operation.
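The completion rule in the draft-23 text can be expressed as a small piece of server-side state. A reply to CB_RECALL ends the recall only when its status is neither NFS4_OK nor NFS4ERR_DELAY. Otherwise, the recall stays pending until DELEGRETURN. This is a minimal sketch, and the class and method names are hypothetical.

```python
class PendingRecall:
    """Hypothetical server-side record of one outstanding CB_RECALL."""

    def __init__(self):
        self.done = False

    def on_cb_recall_reply(self, status):
        # NFS4_OK and NFS4ERR_DELAY leave the recall pending;
        # any other status completes it.
        if status not in ("NFS4_OK", "NFS4ERR_DELAY"):
            self.done = True

    def on_delegreturn(self):
        # DELEGRETURN always completes the recall.
        self.done = True
```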
20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from Client 20.3. Operation 5: CB_LAYOUTRECALL - Recall Layout from Client
20.3.1. ARGUMENT 20.3.1. ARGUMENT
/* /*
* NFSv4.1 callback arguments and results * NFSv4.1 callback arguments and results
*/ */
enum layoutrecall_type4 { enum layoutrecall_type4 {
skipping to change at page 556, line 50 skipping to change at page 560, line 50
20.3.2. RESULT 20.3.2. RESULT
struct CB_LAYOUTRECALL4res { struct CB_LAYOUTRECALL4res {
nfsstat4 clorr_status; nfsstat4 clorr_status;
}; };
20.3.3. DESCRIPTION 20.3.3. DESCRIPTION
The CB_LAYOUTRECALL operation is used by the server to recall layouts The CB_LAYOUTRECALL operation is used by the server to recall layouts
from the client; as a result, the client will begin the process of from the client; as a result, the client will begin the process of
returning layouts with LAYOUTRETURN. The CB_LAYOUTRECALL operation returning layouts via LAYOUTRETURN. The CB_LAYOUTRECALL operation
specifies one of three forms of recall processing with the value of specifies one of three forms of recall processing with the value of
layoutrecall_type4. The recall is either for a specific layout (by layoutrecall_type4. The recall is either for a specific layout (by
file), for an entire file system (FSID), or for all file systems file), for an entire file system (FSID), or for all file systems
(ALL). (ALL).
The behavior of the operation varies based on the value of the The behavior of the operation varies based on the value of the
layoutrecall_type4. The value and behaviors are: layoutrecall_type4. The value and behaviors are:
LAYOUTRECALL4_FILE LAYOUTRECALL4_FILE
For a layout to match the recall request, the following fields For a layout to match the recall request, the values of the
must match in value with the layout: clora_type, clora_iomode, following fields must match those of the layout: clora_type,
lor_fh, and the byte range specified by lor_offset, and clora_iomode, lor_fh, and the byte range specified by lor_offset
lor_length. The clora_iomode field may have a special value of and lor_length. The clora_iomode field may have a special value
LAYOUTIOMODE4_ANY. The LAYOUTIOMODE4_ANY will match any value of LAYOUTIOMODE4_ANY. The special value LAYOUTIOMODE4_ANY will
originally returned in a layout; therefore it acts as a wild card match any iomode originally returned in a layout; therefore it
for iomode. The other special value used is for lor_length. If acts as a wild card. The other special value used is for
lor_length has a value of NFS4_MAXFILELEN, the lor_length field lor_length. If lor_length has a value of NFS4_UINT64_MAX, the
means the maximum possible file size. If a matching layout is lor_length field means the maximum possible file size. If a
found, it MUST be returned using the LAYOUTRETURN operation, see matching layout is found, it MUST be returned using the
Section 18.44. An example of the field's special value use is if LAYOUTRETURN operation (see Section 18.44). An example of the
clora_iomode is LAYOUTIOMODE4_ANY, lor_offset is zero, and field's special value use is if clora_iomode is LAYOUTIOMODE4_ANY,
lor_length is NFS4_MAXFILELEN, then the entire layout is to be lor_offset is zero, and lor_length is NFS4_UINT64_MAX, then the
returned. entire layout is to be returned.
The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the
client does not hold layouts for the file or if the client does client does not hold layouts for the file or if the client does
not have any overlapping layouts for the specification in the not have any overlapping layouts for the specification in the
layout recall. layout recall.
LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL LAYOUTRECALL4_FSID and LAYOUTRECALL4_ALL
If LAYOUTRECALL4_FSID is specified, the fsid specifies the file If LAYOUTRECALL4_FSID is specified, the fsid specifies the file
system for which any outstanding layouts MUST be returned. If system for which any outstanding layouts MUST be returned. If
skipping to change at page 557, line 51 skipping to change at page 561, line 51
respective LAYOUTRETURN with either LAYOUTRETURN4_FSID or respective LAYOUTRETURN with either LAYOUTRETURN4_FSID or
LAYOUTRETURN4_ALL acknowledges to the server that the client LAYOUTRETURN4_ALL acknowledges to the server that the client
invalidated the said device mappings. See Section 12.5.5.2.1.5 invalidated the said device mappings. See Section 12.5.5.2.1.5
for considerations with "bulk" recall of layouts. for considerations with "bulk" recall of layouts.
The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the The NFS4ERR_NOMATCHING_LAYOUT error is only returned when the
client does not hold layouts and does not have valid deviceid client does not hold layouts and does not have valid deviceid
mappings. mappings.
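The LAYOUTRECALL4_FILE matching rule can be sketched as a predicate. The type and filehandle must be equal, LAYOUTIOMODE4_ANY acts as a wild card for iomode, and an lor_length of NFS4_UINT64_MAX extends the range to the maximum file size. This is a minimal sketch. The Layout record, the string encodings of iomode ("READ", "RW", "ANY"), and treating the byte-range test as an overlap check are assumptions; the overlap test follows the wording of the NFS4ERR_NOMATCHING_LAYOUT rule.

```python
from dataclasses import dataclass

NFS4_UINT64_MAX = 2**64 - 1


@dataclass
class Layout:
    layout_type: str
    iomode: str      # "READ" or "RW" (hypothetical encoding)
    fh: bytes
    offset: int
    length: int


def range_end(offset, length):
    """Exclusive end of a byte range; NFS4_UINT64_MAX means 'to the
    maximum possible file size'."""
    if length == NFS4_UINT64_MAX:
        return NFS4_UINT64_MAX
    return offset + length


def recall_matches(layout, clora_type, clora_iomode, lor_fh,
                   lor_offset, lor_length):
    if layout.layout_type != clora_type or layout.fh != lor_fh:
        return False
    # LAYOUTIOMODE4_ANY acts as a wild card for iomode.
    if clora_iomode != "ANY" and clora_iomode != layout.iomode:
        return False
    # Byte ranges match if they overlap.
    return (layout.offset < range_end(lor_offset, lor_length)
            and lor_offset < range_end(layout.offset, layout.length))


def cb_layoutrecall_file(layouts, **request):
    """Return (status, layouts to send back with LAYOUTRETURN)."""
    hits = [lo for lo in layouts if recall_matches(lo, **request)]
    return ("NFS4_OK" if hits else "NFS4ERR_NOMATCHING_LAYOUT"), hits
```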
In processing the layout recall request, the client also varies its In processing the layout recall request, the client also varies its
behavior on the value of the clora_changed field. This field is used behavior based on the value of the clora_changed field. This field
by the server to provide additional context for the reason why the is used by the server to provide additional context for the reason
layout is being recalled. A FALSE value for clora_changed indicates why the layout is being recalled. A FALSE value for clora_changed
that no change in the layout is expected and the client may write indicates that no change in the layout is expected and the client may
modified data to the storage devices involved; this must be done write modified data to the storage devices involved; this must be
prior to returning the layout via LAYOUTRETURN. A TRUE value for done prior to returning the layout via LAYOUTRETURN. A TRUE value
clora_changed indicates that the server is changing the layout. for clora_changed indicates that the server is changing the layout.
Examples of layout changes and reasons for a TRUE indication are: Examples of layout changes and reasons for a TRUE indication are: the
metadata server is restriping the file or a permanent error has metadata server is restriping the file or a permanent error has
occurred on a storage device and the metadata server would like to occurred on a storage device and the metadata server would like to
provide a new layout for the file. Therefore, a clora_changed value provide a new layout for the file. Therefore, a clora_changed value
of TRUE indicates some level of change for the layout and the client of TRUE indicates some level of change for the layout and the client
SHOULD NOT write and commit modified data to the storage devices. In SHOULD NOT write and commit modified data to the storage devices. In
this case, the client writes and commits data through the metadata this case, the client writes and commits data through the metadata
server. server.
See Section 12.5.3 for a description of how the lor_stateid field in See Section 12.5.3 for a description of how the lor_stateid field in
the arguments is to be constructed. Note that the "seqid" field of the arguments is to be constructed. Note that the "seqid" field of
lor_stateid MUST NOT be zero. See Section 8.2, Section 12.5.3, and lor_stateid MUST NOT be zero. See Section 8.2, Section 12.5.3, and
Section 12.5.5.2 for a further discussion and requirements. Section 12.5.5.2 for a further discussion and requirements.
20.3.4. IMPLEMENTATION 20.3.4. IMPLEMENTATION
The client's processing for CB_LAYOUTRECALL is similar to CB_RECALL The client's processing for CB_LAYOUTRECALL is similar to CB_RECALL
(recall of file delegations) in that straightforward processing of (recall of file delegations) in that the client responds to the
the layout recall done and the client responds to the request before request before actually returning layouts via the LAYOUTRETURN
actually returning layouts with the LAYOUTRETURN operation. While operation. While the client responds to the CB_LAYOUTRECALL
the client responds to the CB_LAYOUTRECALL immediately, the operation immediately, the operation is not considered complete (i.e.
is not considered complete (i.e. considered pending) until all considered pending) until all affected layouts are returned to the
affected layouts are returned to the server with the LAYOUTRETURN server via the LAYOUTRETURN operation.
operation.
Before returning the layout to the server with LAYOUTRETURN, the Before returning the layout to the server via LAYOUTRETURN, the
client should wait for the response from in-process or in-flight client should wait for the response from in-process or in-flight
READ, WRITE, or COMMIT operations that use the recalled layout. READ, WRITE, or COMMIT operations that use the recalled layout.
If the client is holding modified data which is effected by a If the client is holding modified data which is affected by a
recalled layout, the client has various options for writing the data recalled layout, the client has various options for writing the data
to the server. As always, the client may write the data through the to the server. As always, the client may write the data through the
metadata server. In fact, the client may not have a choice other metadata server. In fact, the client may not have a choice other
than writing to the metadata server when the clora_changed argument than writing to the metadata server when the clora_changed argument
is TRUE and a new layout is unavailable from the server. However, is TRUE and a new layout is unavailable from the server. However,
the client may be able to write the modified data to the storage the client may be able to write the modified data to the storage
device if the clora_changed argument is FALSE; this needs to be done device if the clora_changed argument is FALSE; this needs to be done
before returning the layout with LAYOUTRETURN. If the client were to before returning the layout via LAYOUTRETURN. If the client were to
obtain a new layout covering the modified data's range, then writing obtain a new layout covering the modified data's range, then writing
to the storage devices is an available alternative. Note that before to the storage devices is an available alternative. Note that before
obtaining a new layout, the client must first return the original obtaining a new layout, the client must first return the original
layout. layout.
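The ordering constraints in the preceding paragraphs can be collected into one client policy. The client replies immediately and drains in-flight I/O. It flushes modified data to the storage devices and issues LAYOUTCOMMIT before LAYOUTRETURN only when clora_changed is FALSE. When clora_changed is TRUE, it either obtains a new layout after the return or writes through the metadata server. This is a minimal sketch of one permissible policy, not the only one, and the step names are hypothetical labels.

```python
def layout_recall_plan(inflight_io, dirty, clora_changed, new_layout_available):
    """Return an ordered list of hypothetical step names for handling one
    CB_LAYOUTRECALL on the client. The policy choices are assumptions."""
    steps = ["reply_cb_layoutrecall"]            # respond immediately
    if inflight_io:
        steps.append("wait_inflight_io")         # READ/WRITE/COMMIT using the layout
    if dirty and not clora_changed:
        # Layout unchanged: flush to the storage devices while still holding
        # the layout, then LAYOUTCOMMIT, all before LAYOUTRETURN.
        steps += ["write_commit_storage_devices", "layoutcommit"]
    steps.append("layoutreturn")                 # the recall completes here
    if dirty and clora_changed:
        if new_layout_available:
            # A new layout may be obtained only after the old one is returned.
            steps += ["layoutget", "write_commit_storage_devices", "layoutcommit"]
        else:
            steps.append("write_commit_metadata_server")
    return steps
```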
In the case of modified data being written while the layout is held, In the case of modified data being written while the layout is held,
the client must use LAYOUTCOMMIT operations at the appropriate time; the client must use LAYOUTCOMMIT operations at the appropriate time;
as required LAYOUTCOMMIT must be done before the LAYOUTRETURN. If a as required LAYOUTCOMMIT must be done before the LAYOUTRETURN. If a
large amount of modified data is outstanding, the client may send large amount of modified data is outstanding, the client may send
LAYOUTRETURNs for portions of the recalled layout; this allows the LAYOUTRETURNs for portions of the recalled layout; this allows the
skipping to change at page 561, line 24 skipping to change at page 565, line 24
to clients about changes to delegated directories. The registration of to clients about changes to delegated directories. The registration of
notifications for the directories occurs when the delegation is notifications for the directories occurs when the delegation is
established using GET_DIR_DELEGATION. These notifications are sent established using GET_DIR_DELEGATION. These notifications are sent
over the backchannel. The notification is sent once the original over the backchannel. The notification is sent once the original
request has been processed on the server. The server will send an request has been processed on the server. The server will send an
array of notifications for changes that might have occurred in the array of notifications for changes that might have occurred in the
directory. The notifications are sent as a list of pairs of bitmaps directory. The notifications are sent as a list of pairs of bitmaps
and values. See Section 3.3.7 for a description of how NFSv4.1 and values. See Section 3.3.7 for a description of how NFSv4.1
bitmaps work. bitmaps work.
If the server has more notifications then can fit in the CB_COMPOUND If the server has more notifications than can fit in the CB_COMPOUND
request, it SHOULD send a sequence of serial CB_COMPOUND requests so request, it SHOULD send a sequence of serial CB_COMPOUND requests so
that the client's view of the directory does not become confused. that the client's view of the directory does not become confused.
E.g., if the server indicates a file named "foo" is added, and that E.g., if the server indicates a file named "foo" is added, and that
the file "foo" is removed, the order it which the client receives the file "foo" is removed, the order in which the client receives
these notifications are processed needs to be the same as the order these notifications needs to be the same as the order in which
in which corresponding operations occurred on the server. corresponding operations occurred on the server.
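The "foo" example shows why order matters. Applying an add and then a remove leaves no entry, but applying the same two notifications in the reverse order leaves a stale entry. This is a minimal sketch that models the client's directory cache as a dictionary from name to cookie. That representation and the (kind, name, cookie) tuples are hypothetical simplifications of the notify4 encoding.

```python
def apply_notifications(cache, notifications):
    """Apply directory notifications strictly in the order received.

    cache: dict name -> cookie (hypothetical client directory cache).
    notifications: ordered list of (kind, name, cookie).
    """
    for kind, name, cookie in notifications:
        if kind == "NOTIFY4_ADD_ENTRY":
            cache[name] = cookie
        elif kind == "NOTIFY4_REMOVE_ENTRY":
            cache.pop(name, None)
    return cache
```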
If the client holding the delegation makes any changes in the If the client holding the delegation makes any changes in the
directory that cause files or subdirectories to be added or removed, directory that cause files or subdirectories to be added or removed,
the server will notify that client of the resulting change(s). If the server will notify that client of the resulting change(s). If
the client holding the delegation is making attribute or cookie the client holding the delegation is making attribute or cookie
verifier changes only, the server does not need to send notifications verifier changes only, the server does not need to send notifications
to that client. The server will send the following information for to that client. The server will send the following information for
each operation: each operation:
NOTIFY4_ADD_ENTRY NOTIFY4_ADD_ENTRY
The server will send information about the new directory entry The server will send information about the new directory entry
being created along with the cookie for that entry. The entry being created along with the cookie for that entry. The entry
information (data type notify_add4) includes the component name of information (data type notify_add4) includes the component name of
the entry and attributes. The server will send this type of entry the entry and attributes. The server will send this type of entry
when a file is actually being created, when an entry is being when a file is actually being created, when an entry is being
added to a directory as a result of a rename across directories added to a directory as a result of a rename across directories
(see below), and when a hard link is being created to an existing (see below), and when a hard link is being created to an existing
file. If this entry is added to the end of the directory, the file. If this entry is added to the end of the directory, the
server will set the nad_last_entry flag to true. If the file is server will set the nad_last_entry flag to TRUE. If the file is
added such that there is at least one entry before it, the server added such that there is at least one entry before it, the server
will also return the previous entry information (nad_prev_entry, a will also return the previous entry information (nad_prev_entry, a
variable length array of up to one element. If the array is of variable length array of up to one element. If the array is of
zero length, there is no previous entry), along with its cookie. zero length, there is no previous entry), along with its cookie.
This is to help clients find the right location in their DNLC or This is to help clients find the right location in their file name
directory caches where this entry should be cached. If the new caches and directory caches where this entry should be cached. If
entry's cookie is available, it will be in nad_new_entry_cookie the new entry's cookie is available, it will be in the
(another variable length array of up to one element). If the nad_new_entry_cookie (another variable length array of up to one
addition of the entry causes another entry to be deleted (which element) field. If the addition of the entry causes another entry
can only happen in the rename case) atomically with the addition, to be deleted (which can only happen in the rename case)
then information on this entry is reported in nad_old_entry. atomically with the addition, then information on this entry is
reported in nad_old_entry.
NOTIFY4_REMOVE_ENTRY NOTIFY4_REMOVE_ENTRY
The server will send information about the directory entry being The server will send information about the directory entry being
deleted. The server will also send the cookie value for the deleted. The server will also send the cookie value for the
deleted entry so that clients can get to the cached information deleted entry so that clients can get to the cached information
for this entry. for this entry.
NOTIFY4_RENAME_ENTRY NOTIFY4_RENAME_ENTRY
The server will send information about both the old entry and the The server will send information about both the old entry and the
new entry. This includes name and attributes for each entry. In new entry. This includes name and attributes for each entry. In
skipping to change at page 563, line 32 skipping to change at page 567, line 32
20.5.2. RESULT 20.5.2. RESULT
struct CB_PUSH_DELEG4res { struct CB_PUSH_DELEG4res {
nfsstat4 cpdr_status; nfsstat4 cpdr_status;
}; };
20.5.3. DESCRIPTION 20.5.3. DESCRIPTION
CB_PUSH_DELEG is used by the server to both signal to the client that CB_PUSH_DELEG is used by the server to both signal to the client that
the delegation it wants is available and to simultaneously offer the the delegation it wants (previously indicated via a want established
delegation to the client. The client has the choice of accepting the from an OPEN or WANT_DELEGATION operation) is available and to
delegation by returning NFS4_OK to the server, delaying the decision simultaneously offer the delegation to the client. The client has
to accept the offered delegation by returning NFS4ERR_DELAY or the choice of accepting the delegation by returning NFS4_OK to the
permanently rejecting the offer of the delegation by returning server, delaying the decision to accept the offered delegation by
NFS4ERR_REJECT_DELEG. When a delegation is rejected in this fashion, returning NFS4ERR_DELAY or permanently rejecting the offer of the
the want previously established is permanently deleted. delegation by returning NFS4ERR_REJECT_DELEG. When a delegation is
rejected in this fashion, the want previously established is
The server MUST send in cpda_delegation a delegation which satisfies permanently deleted and the delegation is subject to acquisition by
a request made in an OPEN or WANT_DELEGATION operation. another client.
20.5.4. IMPLEMENTATION 20.5.4. IMPLEMENTATION
If the client does return NFS4ERR_DELAY and there is a conflicting If the client does return NFS4ERR_DELAY and there is a conflicting
delegation request, the server MAY process it at the expense of the delegation request, the server MAY process it at the expense of the
client that returned NFS4ERR_DELAY. The client's want will typically client that returned NFS4ERR_DELAY. The client's want will typically
not be cancelled, but MAY be processed behind other delegation requests not be cancelled, but MAY be processed behind other delegation requests
or registered wants. or registered wants.
When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, or When a client returns a status other than NFS4_OK, NFS4ERR_DELAY, or
NFS4ERR_REJECT_DELEG, the want remains pending, although servers may NFS4ERR_REJECT_DELEG, the want remains pending, although servers may
decide to cancel the want by sending a CB_WANTS_CANCELLED. decide to cancel the want by sending a CB_WANTS_CANCELLED.
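The server-side effect of each CB_PUSH_DELEG reply on a queue of registered wants can be sketched as follows. NFS4_OK grants the delegation, and NFS4ERR_REJECT_DELEG deletes the want permanently. Any other status leaves the want pending. This is a minimal sketch. The WantQueue class is hypothetical, and moving a delaying client to the back of the queue is only one choice allowed by the "MAY be processed behind" wording.

```python
from collections import deque


class WantQueue:
    """Hypothetical server queue of clients wanting one delegation."""

    def __init__(self):
        self.q = deque()
        self.holder = None

    def push_deleg_reply(self, client, status):
        if status == "NFS4_OK":
            # Delegation accepted; the want is satisfied.
            self.q.remove(client)
            self.holder = client
        elif status == "NFS4ERR_REJECT_DELEG":
            # Want permanently deleted; delegation is free for others.
            self.q.remove(client)
        elif status == "NFS4ERR_DELAY":
            # Want kept but (by this policy) processed behind other wants.
            self.q.remove(client)
            self.q.append(client)
        # Any other status: the want stays pending where it is.
```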
20.6. Operation 8: CB_RECALL_ANY - Keep any N delegations 20.6. Operation 8: CB_RECALL_ANY - Keep any N recallable objects
Notify client to return delegation and keep N of them. Notify client to return all but N recallable objects.
20.6.1. ARGUMENT 20.6.1. ARGUMENT
const RCA4_TYPE_MASK_RDATA_DLG = 0; const RCA4_TYPE_MASK_RDATA_DLG = 0;
const RCA4_TYPE_MASK_WDATA_DLG = 1; const RCA4_TYPE_MASK_WDATA_DLG = 1;
const RCA4_TYPE_MASK_DIR_DLG = 2; const RCA4_TYPE_MASK_DIR_DLG = 2;
const RCA4_TYPE_MASK_FILE_LAYOUT = 3; const RCA4_TYPE_MASK_FILE_LAYOUT = 3;
const RCA4_TYPE_MASK_BLK_LAYOUT_MIN = 4; const RCA4_TYPE_MASK_BLK_LAYOUT = 4;
const RCA4_TYPE_MASK_BLK_LAYOUT_MAX = 7;
const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8; const RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8;
const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 11; const RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9;
const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12; const RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12;
const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15; const RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15;
struct CB_RECALL_ANY4args { struct CB_RECALL_ANY4args {
uint32_t craa_objects_to_keep; uint32_t craa_objects_to_keep;
bitmap4 craa_type_mask; bitmap4 craa_type_mask;
}; };
20.6.2. RESULT 20.6.2. RESULT
skipping to change at page 565, line 23 skipping to change at page 569, line 23
resource pools for layouts and for delegations, or further separate resource pools for layouts and for delegations, or further separate
resources by types of delegations. resources by types of delegations.
When a given resource pool is over-utilized, the server can send a When a given resource pool is over-utilized, the server can send a
CB_RECALL_ANY to clients holding recallable objects of the types CB_RECALL_ANY to clients holding recallable objects of the types
involved, allowing it to keep a certain number of such objects and involved, allowing it to keep a certain number of such objects and
return any excess. A mask specifies which types of objects are to be return any excess. A mask specifies which types of objects are to be
limited. The client chooses, based on its own knowledge of current limited. The client chooses, based on its own knowledge of current
usefulness, which of the objects in that class should be returned. usefulness, which of the objects in that class should be returned.
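Because CB_RECALL_ANY carries a count to keep rather than a count to return, the client's selection reduces to ranking its objects of the masked types and returning everything beyond the first craa_objects_to_keep. This is a minimal sketch. The usefulness scoring function is a hypothetical stand-in for whatever knowledge the client uses.

```python
def select_for_return(objects, keep, usefulness):
    """Return the objects to give back, keeping the `keep` most useful.

    objects: recallable objects of the types selected by craa_type_mask.
    keep: craa_objects_to_keep.
    usefulness: hypothetical scoring function (higher means more useful).
    """
    ranked = sorted(objects, key=usefulness, reverse=True)
    return ranked[keep:]
```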
For NFSv4.1, a number of bits are defined. For some of these, ranges A number of bits are defined. For some of these, ranges are defined
are defined and it is up to the definition of the storage protocol to and it is up to the definition of the storage protocol to specify how
specify how these are to be used. There are ranges for blocks-based these are to be used. There are ranges reserved for object-based
storage protocols, for object-based storage protocols and a reserved storage protocols and for other experimental storage protocols. An
range for other experimental storage protocols. The RFC defining RFC defining such a storage protocol needs to specify how particular
such a storage protocol needs to specify how particular bits within bits within its range are to be used. For example, it may specify a
its range are to be used. For example, it may specify a mapping mapping between attributes of the layout (read vs. write, size of
between attributes of the layout (read vs. write, size of area) and area) and the bit to be used or it may define a field in the layout
the bit to be used or it may define a field in the layout where the where the associated bit position is made available by the server to
associated bit position is made available by the server to the the client.
client.
When an undefined bit is set in the type mask, NFS4ERR_INVAL should RCA4_TYPE_MASK_RDATA_DLG
be returned. If a client does not support an object of the specified
type, if the bit is defined, NFS4ERR_INVAL should not be returned. The client is to return read delegations on non-directory file
Future minor versions of NFSv4 may expand the set of valid type mask objects.
bits.
RCA4_TYPE_MASK_WDATA_DLG
The client is to return write delegations on regular file objects.
RCA4_TYPE_MASK_DIR_DLG
The client is to return directory delegations.
RCA4_TYPE_MASK_FILE_LAYOUT
The client is to return layouts of type LAYOUT4_NFSV4_1_FILES.
RCA4_TYPE_MASK_BLK_LAYOUT
See [31] for a description.
RCA4_TYPE_MASK_OBJ_LAYOUT_MIN to RCA4_TYPE_MASK_OBJ_LAYOUT_MAX
See [30] for a description.
RCA4_TYPE_MASK_OTHER_LAYOUT_MIN to RCA4_TYPE_MASK_OTHER_LAYOUT_MAX
This range is reserved for telling the client to recall layouts of
experimental or site specific layout types (see Section 3.3.13).
When a bit is set in the type mask that corresponds to an undefined
type of recallable object, NFS4ERR_INVAL MUST be returned. When a
bit is set that corresponds to a defined type of object, but the
client does not support an object of the type, NFS4ERR_INVAL MUST NOT
be returned. Future minor versions of NFSv4 may expand the set of
valid type mask bits.
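The mask-validation rule above can be illustrated with a short Python sketch. This is not part of the protocol definition. The bit positions are assumed to match the NFSv4.1 XDR constants and should be checked against it. The helper name check_recall_any_mask is hypothetical, and the bitmap4 type mask is simplified to a single integer.

```python
# Assumed bit positions for CB_RECALL_ANY type-mask bits; verify against the XDR.
RCA4_TYPE_MASK_RDATA_DLG = 0
RCA4_TYPE_MASK_WDATA_DLG = 1
RCA4_TYPE_MASK_DIR_DLG = 2
RCA4_TYPE_MASK_FILE_LAYOUT = 3
RCA4_TYPE_MASK_BLK_LAYOUT = 4
RCA4_TYPE_MASK_OBJ_LAYOUT_MIN = 8
RCA4_TYPE_MASK_OBJ_LAYOUT_MAX = 9
RCA4_TYPE_MASK_OTHER_LAYOUT_MIN = 12
RCA4_TYPE_MASK_OTHER_LAYOUT_MAX = 15

NFS4_OK = 0
NFS4ERR_INVAL = 22

# Every bit position that names a defined type of recallable object.
DEFINED_BITS = (
    {RCA4_TYPE_MASK_RDATA_DLG, RCA4_TYPE_MASK_WDATA_DLG, RCA4_TYPE_MASK_DIR_DLG,
     RCA4_TYPE_MASK_FILE_LAYOUT, RCA4_TYPE_MASK_BLK_LAYOUT}
    | set(range(RCA4_TYPE_MASK_OBJ_LAYOUT_MIN, RCA4_TYPE_MASK_OBJ_LAYOUT_MAX + 1))
    | set(range(RCA4_TYPE_MASK_OTHER_LAYOUT_MIN, RCA4_TYPE_MASK_OTHER_LAYOUT_MAX + 1))
)
DEFINED_MASK = sum(1 << bit for bit in DEFINED_BITS)


def check_recall_any_mask(type_mask: int) -> int:
    """Return the status a client would send for a CB_RECALL_ANY type mask."""
    if type_mask & ~DEFINED_MASK:
        # At least one bit names an undefined type of recallable object.
        return NFS4ERR_INVAL
    # All bits are defined. A defined type the client does not support is
    # not an error; the client simply holds no such objects to return.
    return NFS4_OK
```

The key design point is the distinction the text draws. An undefined bit is an error. A defined bit for an unsupported object type is not.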
CB_RECALL_ANY specifies a count of objects that the client may keep CB_RECALL_ANY specifies a count of objects that the client may keep
as opposed to a count that the client must return. This is to avoid as opposed to a count that the client must return. This is to avoid
a potential race between a CB_RECALL_ANY that had a count of objects to a potential race between a CB_RECALL_ANY that had a count of objects to
free with a set of client-originated operations to return layouts or free with a set of client-originated operations to return layouts or
delegations. As a result of the race, the client and server would delegations. As a result of the race, the client and server would
have differing ideas as to how many objects to return. Hence the have differing ideas as to how many objects to return. Hence the
client could mistakenly free too many. client could mistakenly free too many.
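The race described above can be made concrete with a small arithmetic sketch. It compares "keep N" semantics with a hypothetical "return N" semantics. The function name objects_to_return is illustrative only.

```python
def objects_to_return(currently_held: int, objects_to_keep: int) -> int:
    """Number of objects a client returns to honour a 'keep N' request."""
    return max(0, currently_held - objects_to_keep)


# The server sees 10 objects and asks the client to keep 6.
# Concurrently the client has already returned 2 on its own, so it holds 8.
print(objects_to_return(8, 6))  # 2: the client ends with 6, as the server intended
# Under "return 4" semantics the client would have dropped to 4 instead.
```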
If resource demands prompt it, the server may send another If resource demands prompt it, the server may send another
skipping to change at page 567, line 18 skipping to change at page 571, line 46
nfsstat4 croa_status; nfsstat4 croa_status;
}; };
20.7.3. DESCRIPTION 20.7.3. DESCRIPTION
CB_RECALLABLE_OBJ_AVAIL is used by the server to signal the client CB_RECALLABLE_OBJ_AVAIL is used by the server to signal the client
that the server has resources to grant recallable objects that might that the server has resources to grant recallable objects that might
previously have been denied by OPEN, WANT_DELEGATION, GET_DIR_DELEG, previously have been denied by OPEN, WANT_DELEGATION, GET_DIR_DELEG,
or LAYOUTGET. or LAYOUTGET.
The argument, objects_to_keep means the total number of recallable The argument craa_objects_to_keep means the total number of
objects of the types indicated in the argument type_mask that the recallable objects of the types indicated in the argument type_mask
server believes it can allow the client to have, including the number that the server believes it can allow the client to have, including
of such objects the client already has. A client that tries to the number of such objects the client already has. A client that
acquire more recallable objects than the server informs it can have tries to acquire more recallable objects than the server informs it
runs the risk of having objects recalled. can have runs the risk of having objects recalled.
The server is not obligated to reserve the difference between the
number of the objects the client currently has and the value of
craa_objects_to_keep, nor does delaying the reply to
CB_RECALLABLE_OBJ_AVAIL prevent the server from using the resources
of the recallable objects for another purpose. Indeed, if a client
responds slowly to CB_RECALLABLE_OBJ_AVAIL, the server might
interpret the client as having reduced capability to manage
recallable objects, and so cancel or reduce any reservation it is
maintaining on behalf of the client. Thus if the client desires to
acquire more recallable objects, it needs to reply quickly to
CB_RECALLABLE_OBJ_AVAIL, and then send the appropriate operations to
acquire recallable objects.
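The ordering advice above (reply quickly, then acquire) might be sketched as follows. The helper plan_after_obj_avail and its wanted parameter are hypothetical. The sketch only computes how many objects the client can request without exceeding craa_objects_to_keep. It does not model the OPEN, WANT_DELEGATION, GET_DIR_DELEG, or LAYOUTGET operations themselves.

```python
from typing import List


def plan_after_obj_avail(currently_held: int, craa_objects_to_keep: int,
                         wanted: int) -> List[str]:
    """Order of client actions after CB_RECALLABLE_OBJ_AVAIL (hypothetical helper)."""
    # Objects the server said it could allow beyond what the client already has.
    headroom = max(0, craa_objects_to_keep - currently_held)
    # Reply first: a slow reply may lead the server to cancel or reduce
    # any reservation it is maintaining for this client.
    actions = ["reply to CB_RECALLABLE_OBJ_AVAIL"]
    # Then acquire only what is needed, staying within the server's figure
    # to reduce the risk of having objects recalled.
    actions += ["acquire"] * min(wanted, headroom)
    return actions
```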
20.8. Operation 10: CB_RECALL_SLOT - change flow control limits 20.8. Operation 10: CB_RECALL_SLOT - change flow control limits
Change flow control limits Change flow control limits
20.8.1. ARGUMENT 20.8.1. ARGUMENT
struct CB_RECALL_SLOT4args { struct CB_RECALL_SLOT4args {
slotid4 rsa_target_highest_slotid; slotid4 rsa_target_highest_slotid;
}; };
skipping to change at page 567, line 45 skipping to change at page 572, line 40
20.8.2. RESULT 20.8.2. RESULT
struct CB_RECALL_SLOT4res { struct CB_RECALL_SLOT4res {
nfsstat4 rsr_status; nfsstat4 rsr_status;
}; };
20.8.3. DESCRIPTION 20.8.3. DESCRIPTION
The CB_RECALL_SLOT operation requests the client to return session The CB_RECALL_SLOT operation requests the client to return session
slots, and if applicable, transport credits (e.g. RDMA credits for slots, and if applicable, transport credits (e.g. RDMA credits for
connections associated with the operations channel) to the server. connections associated with the operations channel) of the session's
CB_RECALL_SLOT specifies rsa_target_highest_slotid, the target fore channel. CB_RECALL_SLOT specifies rsa_target_highest_slotid,
highest_slot the server wants for the session. The client, should the value of the target highest slot id the server wants for the
then work toward reducing the highest_slot to the target. session. The client MUST then progress toward reducing the session's
highest slot id to the target value.
If the session has only non-RDMA connections associated with its If the session has only non-RDMA connections associated with its
operations channel, then the client need only wait for all operations channel, then the client need only wait for all
outstanding requests with a slotid > rsa_target_highest_slotid to outstanding requests with a slotid > rsa_target_highest_slotid to
complete, then send a single COMPOUND consisting of a single SEQUENCE complete, then send a single COMPOUND consisting of a single SEQUENCE
operation, with the sa_highestslot field set to operation, with the sa_highestslot field set to
rsa_target_highest_slotid. If there are RDMA-based connections rsa_target_highest_slotid. If there are RDMA-based connections
associated with the operations channel, then the client also needs to send associated with the operations channel, then the client also needs to send
enough zero-length RDMA Sends to take the total RDMA credit count to enough zero-length RDMA Sends to take the total RDMA credit count to
rsa_target_highest_slotid + 1 or below. rsa_target_highest_slotid + 1 or below.
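The client-side steps for CB_RECALL_SLOT can be sketched as a small planning function. It is a hypothetical helper, not a defined interface. It returns three things:

- the outstanding slot ids the client has to wait for;
- the sa_highestslot value for the single-SEQUENCE COMPOUND;
- for RDMA, how far the total credit count has to drop.

How that reduction maps onto zero-length RDMA Sends is left to the RPC-over-RDMA layer and is not modelled.

```python
from typing import Iterable, Set, Tuple


def recall_slot_plan(outstanding_slotids: Iterable[int],
                     rsa_target_highest_slotid: int,
                     rdma_credits: int = 0) -> Tuple[Set[int], int, int]:
    """Return (slot ids to wait for, sa_highestslot, RDMA credit reduction)."""
    # Requests on slots above the target must complete before shrinking.
    wait_for = {s for s in outstanding_slotids if s > rsa_target_highest_slotid}
    # With RDMA connections, the total credit count must end at target + 1 or below.
    credit_reduction = max(0, rdma_credits - (rsa_target_highest_slotid + 1))
    return wait_for, rsa_target_highest_slotid, credit_reduction
```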
skipping to change at page 569, line 26 skipping to change at page 574, line 26
case NFS4_OK: case NFS4_OK:
CB_SEQUENCE4resok csr_resok4; CB_SEQUENCE4resok csr_resok4;
default: default:
void; void;
}; };
20.9.3. DESCRIPTION 20.9.3. DESCRIPTION
The CB_SEQUENCE operation is used to manage operational accounting The CB_SEQUENCE operation is used to manage operational accounting
for the backchannel of the session on which a request is sent. The for the backchannel of the session on which a request is sent. The
contents include the session to which this request belongs, slot id contents include the session id to which this request belongs, the
and sequence id used by the server to implement session request slot id and sequence id used by the server to implement session
control and exactly once semantics, and exchanged slot maximums which request control and exactly once semantics, and exchanged slot id
are used to adjust the size of the reply cache. This operation MUST maxima which are used to adjust the size of the reply cache. This
appear once as the first operation in each CB_COMPOUND request or a operation will appear once as the first operation in each CB_COMPOUND
protocol error must result. See Section 18.46.3 for a description of request or a protocol error MUST result. See Section 18.46.3 for a
how slots are processed. description of how slots are processed.
If csa_cachethis is TRUE, then the server is requesting that the If csa_cachethis is TRUE, then the server is requesting that the
client cache the reply in the callback reply cache. The client MUST client cache the reply in the callback reply cache. The client MUST
cache the reply (see Section 2.10.5.1.3). cache the reply (see Section 2.10.5.1.3).
The csa_referring_call_lists array is the list of COMPOUND requests, The csa_referring_call_lists array is the list of COMPOUND requests,
identified by sessionid, slot id and sequence id. These are requests identified by sessionid, slot id and sequence id. These are requests
that the client previously sent to the server. These previous that the client previously sent to the server. These previous
requests created state that some operation(s) in the same CB_COMPOUND requests created state that some operation(s) in the same CB_COMPOUND
as the csa_referring_call_lists is identifying. A sessionid is as the csa_referring_call_lists are identifying. A session id is
included because leased state is tied to a client ID, and a client ID included because leased state is tied to a client ID, and a client ID
can have multiple sessions. See Section 2.10.5.3. can have multiple sessions. See Section 2.10.5.3.
The value of csa_sequenceid argument relative to the cached sequence The value of the csa_sequenceid argument relative to the cached
id on the slot falls into one of three cases. sequence id on the slot falls into one of three cases.
o If the difference between csa_sequenceid and the client's cached o If the difference between csa_sequenceid and the client's cached
sequence id at the slot id is two (2) or more, or if sequence id at the slot id is two (2) or more, or if
csa_sequenceid is less than the cached sequence id (accounting for csa_sequenceid is less than the cached sequence id (accounting for
wraparound of the unsigned sequence id value), then the client wraparound of the unsigned sequence id value), then the client
MUST return NFS4ERR_SEQ_MISORDERED. MUST return NFS4ERR_SEQ_MISORDERED.
o If csa_sequenceid and the cached sequence id are the same, this is o If csa_sequenceid and the cached sequence id are the same, this is
a retry, and the client returns the CB_COMPOUND request's cached a retry, and the client returns the CB_COMPOUND request's cached
reply. reply.
skipping to change at page 570, line 36 skipping to change at page 575, line 36
id, cached reply) MUST NOT change. id, cached reply) MUST NOT change.
The client returns two "highest_slotid" values: csr_highest_slotid, The client returns two "highest_slotid" values: csr_highest_slotid,
and csr_target_highest_slotid. The former is the highest slot id the and csr_target_highest_slotid. The former is the highest slot id the
client will accept in a future CB_SEQUENCE operation, and SHOULD NOT client will accept in a future CB_SEQUENCE operation, and SHOULD NOT
be less than the value of csa_highest_slotid (but see be less than the value of csa_highest_slotid (but see
Section 2.10.5.1 for an exception). The latter is the highest slot Section 2.10.5.1 for an exception). The latter is the highest slot
id the client would prefer the server use on a future CB_SEQUENCE id the client would prefer the server use on a future CB_SEQUENCE
operation. operation.
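The three-way comparison of csa_sequenceid against the slot's cached sequence id can be sketched with modulo-2^32 arithmetic. Two points are assumptions. The third case, elided from this excerpt, is taken to be "csa_sequenceid is exactly one more than the cached id". Sequence ids are assumed to wrap from 0xFFFFFFFF to 0.

```python
SEQID_MODULUS = 1 << 32  # sequence ids are unsigned 32-bit values


def classify_cb_sequence(csa_sequenceid: int, cached_sequenceid: int) -> str:
    """Classify a CB_SEQUENCE request against the slot's cached sequence id."""
    # Forward distance from the cached id, accounting for wraparound.
    diff = (csa_sequenceid - cached_sequenceid) % SEQID_MODULUS
    if diff == 0:
        return "retry"                    # return the cached CB_COMPOUND reply
    if diff == 1:
        return "new"                      # next expected request (assumed third case)
    return "NFS4ERR_SEQ_MISORDERED"       # two or more ahead, or behind
```

A single modular difference covers both misordered conditions. "Less than the cached id" wraps around to a large forward distance, so it falls into the same branch as "two or more".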
20.9.4. IMPLEMENTATION
20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation 20.10. Operation 12: CB_WANTS_CANCELLED - Cancel Pending Delegation
Wants Wants
Retracts promise to signal delegation availability. Retracts promise to signal delegation availability.
20.10.1. ARGUMENT 20.10.1. ARGUMENT
struct CB_WANTS_CANCELLED4args { struct CB_WANTS_CANCELLED4args {
bool cwca_contended_wants_cancelled; bool cwca_contended_wants_cancelled;
bool cwca_resourced_wants_cancelled; bool cwca_resourced_wants_cancelled;
skipping to change at page 572, line 13 skipping to change at page 577, line 13
}; };
20.11.2. RESULT 20.11.2. RESULT
struct CB_NOTIFY_LOCK4res { struct CB_NOTIFY_LOCK4res {
nfsstat4 cnlr_status; nfsstat4 cnlr_status;
}; };
20.11.3. DESCRIPTION 20.11.3. DESCRIPTION
The server can use this operation to indicate that a lock for the The server can use this operation to indicate that a byte-range lock
given file and lock-owner, previously requested by the client via an for the given file and lock-owner, previously requested by the client
unsuccessful LOCK request, might be available. via an unsuccessful LOCK request, might be available.
This callback is meant to be used by servers to help reduce the This callback is meant to be used by servers to help reduce the
latency of blocking locks in the case where they recognize that a latency of blocking locks in the case where they recognize that a
client which has been polling for a blocking lock may now be able to client which has been polling for a blocking lock may now be able to
acquire the lock. If the server supports this callback for a given acquire the lock. If the server supports this callback for a given
file, it MUST set the OPEN4_RESULT_MAY_NOTIFY_LOCK flag when file, it MUST set the OPEN4_RESULT_MAY_NOTIFY_LOCK flag when
responding to successful opens for that file. This does not commit responding to successful opens for that file. This does not commit
the server to use of CB_NOTIFY_LOCK, but the client may use this as a the server to the use of CB_NOTIFY_LOCK, but the client may use this
hint to decide how frequently to poll for locks derived from that as a hint to decide how frequently to poll for locks derived from
open. that open.
If an OPEN operation results in an upgrade, in which the stateid If an OPEN operation results in an upgrade, in which the stateid
returned has an "other" value matching that of a stateid already returned has an "other" value matching that of a stateid already
allocated, with a new "seqid" indicating a change in the lock being allocated, with a new "seqid" indicating a change in the lock being
represented, then the value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag represented, then the value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag
when responding to that new OPEN controls handling from that point when responding to that new OPEN controls handling from that point
going forward. When parallel OPENs are done on the same file and going forward. When parallel OPENs are done on the same file and
open-owner, the ordering of the "seqid" field of the returned stateid open-owner, the ordering of the "seqid" field of the returned stateid
(subject to wraparound) is to be used to select the controlling (subject to wraparound) is to be used to select the controlling
value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag. value of the OPEN4_RESULT_MAY_NOTIFY_LOCK flag.
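Selecting the controlling OPEN4_RESULT_MAY_NOTIFY_LOCK value among parallel OPEN replies amounts to finding the newest stateid "seqid". A minimal sketch follows. It assumes plain serial-number comparison modulo 2^32 and ignores any special handling of a zero seqid. The helper names are hypothetical.

```python
from typing import List, Tuple

MOD = 1 << 32
HALF = 1 << 31


def seqid_newer(a: int, b: int) -> bool:
    """True if seqid a is newer than b under modulo-2**32 serial arithmetic."""
    return a != b and (a - b) % MOD < HALF


def controlling_may_notify(replies: List[Tuple[int, bool]]) -> bool:
    """replies: (stateid seqid, MAY_NOTIFY_LOCK flag) from parallel OPENs."""
    newest_seqid, flag = replies[0]
    for seqid, may_notify in replies[1:]:
        if seqid_newer(seqid, newest_seqid):
            newest_seqid, flag = seqid, may_notify
    return flag
```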
20.11.4. IMPLEMENTATION 20.11.4. IMPLEMENTATION
The server must not grant the lock to the client unless and until it The server MUST NOT grant the lock to the client unless and until it
receives an actual lock request from the client. Similarly, the receives an actual LOCK request from the client. Similarly, the
client receiving this callback cannot assume that it now has the client receiving this callback cannot assume that it now has the
lock, or that a subsequent request for the lock will be successful. lock, or that a subsequent LOCK request for the lock will be
successful.
The server is not required to implement this callback, and even if it The server is not required to implement this callback, and even if it
does, it is not required to use it in any particular case. Therefore does, it is not required to use it in any particular case. Therefore
the client must still rely on polling for blocking locks, as the client must still rely on polling for blocking locks, as
described in Section 9.6. described in Section 9.6.
Similarly, the client is not required to implement this callback, and Similarly, the client is not required to implement this callback, and
even if it does, it is still free to ignore it. Therefore the server MUST even if it does, it is still free to ignore it. Therefore the server MUST
NOT assume that the client will act based on the callback. NOT assume that the client will act based on the callback.
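Because CB_NOTIFY_LOCK is only a hint, a client can treat it as an early wake-up for its normal polling loop. In the sketch below, send_lock stands in for sending an actual LOCK request. That parameter, the event object, and the poll interval are hypothetical.

```python
import threading
from typing import Callable


def wait_for_blocking_lock(send_lock: Callable[[], bool],
                           notified: threading.Event,
                           poll_interval: float,
                           max_attempts: int) -> bool:
    """Retry LOCK until it succeeds, waking early if CB_NOTIFY_LOCK arrives."""
    for _ in range(max_attempts):
        # Clear before sending, so a notification that arrives while the
        # LOCK is in flight still cuts the next wait short.
        notified.clear()
        # Only a successful LOCK grants the lock; the callback never does.
        if send_lock():
            return True
        # Wake on the callback or the poll timer, whichever comes first,
        # so a client that never receives CB_NOTIFY_LOCK still polls.
        notified.wait(timeout=poll_interval)
    return False
```

A client that saw OPEN4_RESULT_MAY_NOTIFY_LOCK might pass a longer poll_interval. It should still not rely on the callback arriving.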
skipping to change at page 573, line 46 skipping to change at page 578, line 47
20.12.2. RESULT 20.12.2. RESULT
struct CB_NOTIFY_DEVICEID4res { struct CB_NOTIFY_DEVICEID4res {
nfsstat4 cndr_status; nfsstat4 cndr_status;
}; };
20.12.3. DESCRIPTION 20.12.3. DESCRIPTION
The CB_NOTIFY_DEVICEID operation is used by the server to send The CB_NOTIFY_DEVICEID operation is used by the server to send
notifications to clients about changes to pNFS device IDs. The notifications to clients about changes to pNFS device IDs. The
registration of device ID notifications occurs when the device registration of device ID notifications is optional and is done via
mapping stateid is established using GETDEVICEINFO or GETDEVICELIST. GETDEVICEINFO. These notifications are sent over the backchannel
These notifications are sent over the backchannel. The notification once the original request has been processed on the server. The
is sent once the original request has been processed on the server. server will send an array of notifications, cnda_changes, as a list
The server will send an array of notifications, cnda_changes, as a of pairs of bitmaps and values. See Section 3.3.7 for a description
list of pairs of bitmaps and values. See Section 3.3.7 for a of how NFSv4.1 bitmaps work.
description of how NFSv4.1 bitmaps work.
As with CB_NOTIFY (Section 20.4.3), it is possible the server has As with CB_NOTIFY (Section 20.4.3), it is possible the server has
more notifications than can fit in a CB_COMPOUND, thus requiring more notifications than can fit in a CB_COMPOUND, thus requiring
multiple CB_COMPOUNDs. Unlike CB_NOTIFY, serialization is not an multiple CB_COMPOUNDs. Unlike CB_NOTIFY, serialization is not an
issue because unlike directory entries, device IDs cannot be re-used issue because unlike directory entries, device IDs cannot be re-used
after being deleted (Section 12.2.10). after being deleted (Section 12.2.10).
All device ID notifications contain a device ID and a layout type. All device ID notifications contain a device ID and a layout type.
The layout type is necessary because two different layout types can The layout type is necessary because two different layout types can
share the same device ID, and the common device ID can have share the same device ID, and the common device ID can have
completely different mappings for each layout type. completely different mappings for each layout type.
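Because the same device ID can map differently under different layout types, a client-side cache would key entries on the pair (layout type, device ID). The class below is a hypothetical sketch. Its invalidate method simplifies NOTIFY_DEVICEID4_CHANGE handling: a real client with ndc_immediate FALSE may finish pending I/O before refetching the mapping with GETDEVICEINFO.

```python
from typing import Dict, Optional, Tuple

DeviceKey = Tuple[int, bytes]  # (layout type, device ID)


class DeviceMappingCache:
    """Client-side cache of device ID to device address mappings (hypothetical)."""

    def __init__(self) -> None:
        self._mappings: Dict[DeviceKey, str] = {}

    def store(self, layout_type: int, device_id: bytes, address: str) -> None:
        self._mappings[(layout_type, device_id)] = address

    def lookup(self, layout_type: int, device_id: bytes) -> Optional[str]:
        return self._mappings.get((layout_type, device_id))

    def invalidate(self, layout_type: int, device_id: bytes) -> None:
        # Drop only the mapping for this layout type; the client refetches
        # it later, e.g. with GETDEVICEINFO.
        self._mappings.pop((layout_type, device_id), None)
```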
The server will send the following notifications: The server will send the following notifications:
NOTIFY_DEVICEID4_CHANGE NOTIFY_DEVICEID4_CHANGE
A previously provided device ID to device address mapping has A previously provided device ID to device address mapping has
changed and the client uses GETDEVICEINFO or GETDEVICELIST to changed and the client uses GETDEVICEINFO to obtain the updated
obtain the updated mapping. The notification is encoded in a mapping. The notification is encoded in a value of data type
value of data type notify_deviceid_change4. This data type also notify_deviceid_change4. This data type also contains a boolean
contains a boolean field, ndc_immediate, which if TRUE indicates field, ndc_immediate, which if TRUE indicates that the change will
that the change will be enforced immediately, and so the client be enforced immediately, and so the client might not be able to
might not be able to complete any pending I/O to the device ID. complete any pending I/O to the device ID. If ndc_immediate is
If ndc_immediate is FALSE, then for an indefinite time, the client FALSE, then for an indefinite time, the client can complete
can complete pending I/O. After pending I/O is complete, the pending I/O. After pending I/O is complete, the client SHOULD get
client SHOULD get the new device ID to device address mappings the new device ID to device address mappings before issuing new
before issuing new I/O to the device ID. I/O to the device ID.
NOTIFY4_DEVICEID_DELETE NOTIFY4_DEVICEID_DELETE
Deletes a device ID from the mappings. This notification MUST NOT Deletes a device ID from the mappings. This notification MUST NOT
be sent if the client has a layout that refers to the device ID. be sent if the client has a layout that refers to the device ID.
In other words if the server is sending a delete device ID In other words if the server is sending a delete device ID
notification, one of the following is true for layouts associated notification, one of the following is true for layouts associated
with the layout type: with the layout type:
* The client never had a layout referring to that device ID. * The client never had a layout referring to that device ID.
skipping to change at page 575, line 23 skipping to change at page 580, line 23
/* /*
* CB_ILLEGAL: Response for illegal operation numbers * CB_ILLEGAL: Response for illegal operation numbers
*/ */
struct CB_ILLEGAL4res { struct CB_ILLEGAL4res {
nfsstat4 status; nfsstat4 status;
}; };
20.13.3. DESCRIPTION 20.13.3. DESCRIPTION
This operation is a placeholder for encoding a result to handle the This operation is a placeholder for encoding a result to handle the
case of the client sending an operation code within COMPOUND that is case of the server sending an operation code within CB_COMPOUND that
not defined in the NFSv4.1 specification. See Section 16.2.3 for is not defined in the NFSv4.1 specification. See Section 19.2.3 for
more details. more details.
The status field of CB_ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL. The status field of CB_ILLEGAL4res MUST be set to NFS4ERR_OP_ILLEGAL.
20.13.4. IMPLEMENTATION 20.13.4. IMPLEMENTATION
A server will probably not send an operation with code OP_CB_ILLEGAL A server will probably not send an operation with code OP_CB_ILLEGAL
but if it does, the response will be CB_ILLEGAL4res just as it would but if it does, the response will be CB_ILLEGAL4res just as it would
be with any other invalid operation code. Note that if the client be with any other invalid operation code. Note that if the client
gets an illegal operation code that is not OP_ILLEGAL, and if the gets an illegal operation code that is not OP_ILLEGAL, and if the
client checks for legal operation codes during the XDR decode phase, client checks for legal operation codes during the XDR decode phase,
then the CB_ILLEGAL4res would not be returned. then an instance of data type CB_ILLEGAL4res will not be returned.
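A client that validates operation codes at dispatch time, rather than during XDR decode, might map unknown codes to CB_ILLEGAL4res as sketched below. The numeric values of NFS4ERR_OP_ILLEGAL and OP_CB_ILLEGAL are assumptions to be checked against the XDR. The dispatch_cb_op helper and its handler table are hypothetical.

```python
from typing import Callable, Dict, Tuple

NFS4_OK = 0
NFS4ERR_OP_ILLEGAL = 10044  # assumed value; check the XDR
OP_CB_ILLEGAL = 10044       # assumed value; check the XDR


def dispatch_cb_op(opcode: int,
                   handlers: Dict[int, Callable[[], int]]) -> Tuple[str, int]:
    """Run the handler for opcode, or produce CB_ILLEGAL4res if none exists."""
    handler = handlers.get(opcode)
    if handler is None:
        # Undefined codes, including OP_CB_ILLEGAL itself, yield CB_ILLEGAL4res.
        return ("CB_ILLEGAL4res", NFS4ERR_OP_ILLEGAL)
    return ("result", handler())
```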
21. Security Considerations 21. Security Considerations
NFS has historically used a model where, from an authentication Historically the authentication model of NFS had the entire
perspective, the client was the entire machine, or at least the machine being the NFS client, and the NFS server trusting the NFS
source network address of the machine. The NFS server relied on the client to authenticate the end-user. The NFS server in turn shared
NFS client to make the proper authentication of the end-user. The its files only to specific clients, as identified by the client's
NFS server in turn shared its files only to specific clients, as source network address. Given this model, the AUTH_SYS RPC security
identified by the client's source network address. Given this model, flavor simply identified the end-user using the client to the NFS
the AUTH_SYS RPC security flavor simply identified the end-user using server. When processing NFS responses, the client ensured that the
the client to the NFS server. When processing NFS responses, the responses came from the same network address and port number that the
client ensured that the responses came from the same network address request was sent to. While such a model is easy to implement and
and port number that the request was sent to. While such a model is simple to deploy and use, it is unsafe. Thus, NFSv4.1
easy to implement and simple to deploy and use, it is certainly not a implementations are REQUIRED to support a security model that uses
safe model. Thus, NFSv4.1 implementations are REQUIRED to support a end to end authentication, where an end-user on a client mutually
security model that uses end to end authentication, where an end-user authenticates (via cryptographic schemes that do not expose passwords
on a client mutually authenticates (via cryptographic schemes that do or keys in the clear on the network) to a principal on an NFS server.
not expose passwords or keys in the clear on the network) to a Consideration is also given to the integrity and privacy of NFS
principal on an NFS server. Consideration should also be given to requests and responses. The issues of end to end mutual
the integrity and privacy of NFS requests and responses. The issues authentication, integrity, and privacy are discussed in
of end to end mutual authentication, integrity, and privacy are Section 2.2.1.1.1.
discussed in Section 2.2.1.1.1.
Note that while NFSv4.1 mandates an end to end mutual authentication Note that being REQUIRED to implement does not mean REQUIRED to use;
model, the "classic" model of machine authentication via network AUTH_SYS can be used by NFSv4.1 clients and servers. However,
address checking and AUTH_SYS identification can still be supported AUTH_SYS is merely an OPTIONAL security flavor in NFSv4.1, and so
with the caveat that the AUTH_SYS flavor is neither REQUIRED nor interoperability via AUTH_SYS is not assured.
RECOMMENDED by this specification, and so interoperability via
AUTH_SYS is not assured.
For reasons of reduced administration overhead, better performance For reasons of reduced administration overhead, better performance
and/or reduction of CPU utilization, users of NFSv4.1 implementations and/or reduction of CPU utilization, users of NFSv4.1 implementations
may opt to not use security mechanisms that enable integrity may opt to not use security mechanisms that enable integrity
protection on each remote procedure call and response. The use of protection on each remote procedure call and response. The use of
mechanisms without integrity leaves the user vulnerable to an mechanisms without integrity leaves the user vulnerable to an
attacker in the middle of the NFS client and server that modifies the attacker in the middle of the NFS client and server that modifies the
RPC request and/or the response. While implementations are free to RPC request and/or the response. While implementations are free to
provide the option to use weaker security mechanisms, there are three provide the option to use weaker security mechanisms, there are three
operations in particular that warrant the implementation overriding operations in particular that warrant the implementation overriding
user choices. user choices.
The first two such operations are SECINFO SECINFO_NO_NAME. It is o The first two such operations are SECINFO and SECINFO_NO_NAME. It
RECOMMENDED that the client send the either operation such that it is is RECOMMENDED that the client send both operations such that they
protected with a security flavor that has integrity protection, such are protected with a security flavor that has integrity protection,
as RPCSEC_GSS with either the rpc_gss_svc_integrity or such as RPCSEC_GSS with either the rpc_gss_svc_integrity or
rpc_gss_svc_privacy service. Without integrity protection rpc_gss_svc_privacy service. Without integrity protection
encapsulating SECINFO and SECINFO_NO_NAME and their results, an encapsulating SECINFO and SECINFO_NO_NAME and their results, an
attacker in the middle could modify results such that the client attacker in the middle could modify results such that the client
might select a weaker algorithm in the set allowed by the server, making might select a weaker algorithm in the set allowed by the server,
the client and/or server vulnerable to further attacks. making the client and/or server vulnerable to further attacks.
The second operation that should definitely use integrity protection o The third operation that should definitely use integrity
is any GETATTR for the fs_locations attribute. The attack has two protection is any GETATTR for the fs_locations and
steps. First the attacker modifies the unprotected results of some fs_locations_info attributes. The attack has two steps. First
operation to return NFS4ERR_MOVED. Second, when the client follows the attacker modifies the unprotected results of some operation to
up with a GETATTR for the fs_locations attribute, the attacker return NFS4ERR_MOVED. Second, when the client follows up with a
modifies the results to cause the client to migrate its traffic to a GETATTR for the fs_locations or fs_locations_info attributes, the
server controlled by the attacker. attacker modifies the results to cause the client to migrate its
traffic to a server controlled by the attacker.
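The override described above could be expressed as a small client policy function. The sketch below is hypothetical: operation and attribute names are passed as strings, and only the RPCSEC_GSS service level is modelled, not the choice of security flavor. The service names rpc_gss_svc_none, rpc_gss_svc_integrity and rpc_gss_svc_privacy are those defined for RPCSEC_GSS.

```python
from typing import FrozenSet

INTEGRITY_ATTRS = frozenset({"fs_locations", "fs_locations_info"})


def needs_integrity(op: str, requested_attrs: FrozenSet[str]) -> bool:
    """True for the operations the text singles out for integrity protection."""
    if op in ("SECINFO", "SECINFO_NO_NAME"):
        return True
    return op == "GETATTR" and bool(requested_attrs & INTEGRITY_ATTRS)


def choose_gss_service(op: str, requested_attrs: FrozenSet[str],
                       user_choice: str) -> str:
    """Override a user's choice of no integrity for those operations."""
    if needs_integrity(op, requested_attrs) and user_choice == "rpc_gss_svc_none":
        return "rpc_gss_svc_integrity"
    return user_choice
```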
Relative to previous NFS versions, NFSv4.1 has additional security Relative to previous NFS versions, NFSv4.1 has additional security
considerations for pNFS (see Section 12.9 and Section 13.12), locking considerations for pNFS (see Section 12.9 and Section 13.12), locking
and session state (see Section 2.10.7.3). and session state (see Section 2.10.7.3).
22. IANA Considerations 22. IANA Considerations
22.1. Named Attribute Definitions 22.1. Named Attribute Definitions
The NFSv4.1 protocol provides for the association of named attributes The NFSv4.1 protocol supports the association of a file with zero or
to files. The name space identifiers for these attributes are more named attributes. The name space identifiers for these
defined as string names. The protocol does not define the specific attributes are defined as string names. The protocol does not define
assignment of the name space for these file attributes. Even though the specific assignment of the name space for these file attributes.
the name space is not specifically controlled to prevent collisions, Even though the name space is not specifically controlled to prevent
an IANA registry has been created for the registration of NFSv4.1 collisions, an IANA registry has been created for the registration of
named attributes. Registration will be achieved through the NFSv4.1 named attributes. Registration will be achieved through the
publication of an Informational RFC and will require not only the publication of an Informational RFC and will require not only the
name of the attribute but the syntax and semantics of the named name of the attribute but the syntax and semantics of the named
attribute contents; the intent is to promote interoperability where attribute contents; the intent is to promote interoperability where
common interests exist. While application developers are allowed to common interests exist. While application developers are allowed to
define and use attributes as needed, they are encouraged to register define and use attributes as needed, they are encouraged to register
the attributes with IANA. the attributes with IANA.
Such registered named attributes are presumed to apply to all minor Such registered named attributes are presumed to apply to all minor
versions of NFSv4, including those defined subsequently to the versions of NFSv4, including those defined subsequently to the
registration. Where the named attribute is intended to be limited registration. Where the named attribute is intended to be limited
with regard to the minor versions for which they are not to be used, the with regard to the minor versions for which they are not to be used, the
Informational RFC must clearly state the applicable limits. Informational RFC must clearly state the applicable limits.
22.2. ONC RPC Network Identifiers (netids) 22.2. ONC RPC Network Identifiers (netids)
Section 3.3.9 discussed the r_netid field and the corresponding Section 3.3.9 discussed the r_netid field and the corresponding
r_addr field within a netaddr4 structure. The NFSv4 protocol depends r_addr field within a netaddr4 structure. The NFSv4 protocol depends
on the syntax and semantics of these fields to effectively on the syntax and semantics of these fields to effectively
communicate callback information between client and server. communicate callback and other information between client and server.
Therefore, an IANA registry has been created to include the values Therefore, an IANA registry has been created to include the values
defined in this document and to allow for future expansion based on defined in this document and to allow for future expansion based on
transport usage/availability. Additions to this ONC RPC Network transport usage/availability. Additions to this ONC RPC Network
Identifier registry must be done with the publication of an RFC. Identifier registry must be done with the publication of an RFC.
The initial values for this registry are as follows (some of this The initial values for this registry are as follows (some of this
text is replicated from Section 3.3.9 for clarity): text is replicated from Section 3.3.9 for clarity):
The Network Identifier (or r_netid for short) is used to specify a The Network Identifier (or r_netid for short) is used to specify a
transport protocol and associated universal address (or r_addr for transport protocol and associated universal address (or r_addr for
skipping to change at page 578, line 44 skipping to change at page 583, line 44
to NFSv4. This requires a new minor version of NFSv4, and requires a to NFSv4. This requires a new minor version of NFSv4, and requires a
standards track document from IETF. Another way to add a standards track document from IETF. Another way to add a
notification is to specify a new layout type. Notifications for new notification is to specify a new layout type. Notifications for new
layout types would be requested via GETDEVICELIST (Section 18.41) and layout types would be requested via GETDEVICELIST (Section 18.41) and
GETDEVICEINFO (Section 18.40). See Section 22.4. GETDEVICEINFO (Section 18.40). See Section 22.4.
22.4. Defining New Layout Types 22.4. Defining New Layout Types
New layout type numbers will be requested from IANA. IANA will only New layout type numbers will be requested from IANA. IANA will only
provide layout type numbers for Standards Track RFCs approved by the provide layout type numbers for Standards Track RFCs approved by the
IESG, in accordance with Standards Action policy defined in RFC2434 IESG, in accordance with Standards Action policy defined in [20].
[20]. All layout types assigned by IANA MUST be in the range 0x00000001 to
0x7FFFFFFF.
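As a rough illustration of the range rule above, the following Python sketch checks whether a 32-bit layout type value lies within 0x00000001 to 0x7FFFFFFF. The function and constant names are hypothetical and are not defined by NFSv4.1. The sketch only tests membership in the IANA-assigned range; it makes no claim about how values outside that range are used.

```python
# Hypothetical helper (not part of the NFSv4.1 protocol): tests whether a
# layout type number falls in the range the text says IANA assigns from.

IANA_LAYOUT_MIN = 0x00000001  # lowest IANA-assignable layout type
IANA_LAYOUT_MAX = 0x7FFFFFFF  # highest IANA-assignable layout type
UINT32_MAX = 0xFFFFFFFF       # layout types are carried as 32-bit values


def is_iana_layout_type(value: int) -> bool:
    """Return True if value is in 0x00000001..0x7FFFFFFF inclusive.

    Raises ValueError if value does not fit in an unsigned 32-bit integer.
    """
    if not 0 <= value <= UINT32_MAX:
        raise ValueError("layout type must be an unsigned 32-bit value")
    return IANA_LAYOUT_MIN <= value <= IANA_LAYOUT_MAX
```

For example, `is_iana_layout_type(0x1)` returns True, while `is_iana_layout_type(0)` and `is_iana_layout_type(0x80000000)` return False.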
The author of a new pNFS layout specification must follow these steps The author of a new pNFS layout specification must follow these steps
to obtain acceptance of the layout type as a standard: to obtain acceptance of the layout type as a standard:
1. The author devises the new layout specification. 1. The author devises the new layout specification.
2. The new layout type specification MUST, at a minimum: 2. The new layout type specification MUST, at a minimum:
* Define the contents of the layout-type-specific fields of the * Define the contents of the layout-type-specific fields of the
following data types: following data types:
skipping to change at page 579, line 36 skipping to change at page 584, line 36
1. Failure and restart for client, server, storage device. 1. Failure and restart for client, server, storage device.
2. Lease expiration from perspective of the active client, 2. Lease expiration from perspective of the active client,
server, storage device. server, storage device.
3. Loss of layout state resulting in fencing of client access 3. Loss of layout state resulting in fencing of client access
to storage devices (for an example, see Section 12.7.3). to storage devices (for an example, see Section 12.7.3).
* A list of any new notification values for CB_NOTIFY_DEVICEID. * A list of any new notification values for CB_NOTIFY_DEVICEID.
* A list of any new recallable object types for CB_RECALL_ANY.
* Include an IANA considerations section. * Include an IANA considerations section.
* Include a security considerations section. * Include a security considerations section.
3. The author documents the new layout specification as an Internet 3. The author documents the new layout specification as an Internet
Draft. Draft.
4. The author submits the Internet Draft for review through the IETF 4. The author submits the Internet Draft for review through the IETF
standards process as defined in "Internet Official Protocol standards process as defined in "Internet Official Protocol
Standards" (STD 1). The new layout specification will be Standards" (STD 1). The new layout specification will be
skipping to change at page 583, line 6 skipping to change at page 588, line 7
[27] Werme, R., "RPC XID Issues", USENIX Conference Proceedings, [27] Werme, R., "RPC XID Issues", USENIX Conference Proceedings,
February 1996. February 1996.
[28] Nowicki, B., "NFS: Network File System Protocol specification", [28] Nowicki, B., "NFS: Network File System Protocol specification",
RFC 1094, March 1989. RFC 1094, March 1989.
[29] Bhide, A., Elnozahy, E., and S. Morgan, "A Highly Available [29] Bhide, A., Elnozahy, E., and S. Morgan, "A Highly Available
Network Server", USENIX Conference Proceedings, January 1991. Network Server", USENIX Conference Proceedings, January 1991.
[30] Halevy, B., Welch, B., and J. Zelenka, "Object-based pNFS [30] Halevy, B., Welch, B., and J. Zelenka, "Object-based pNFS
Operations", September 2007, <ftp://www.ietf.org/ Operations", April 2008, <ftp://www.ietf.org/internet-drafts/
internet-drafts/draft-nfsv4-pnfs-obj-04.txt>. draft-nfsv4-pnfs-obj-07.txt>.
[31] Black, D., Fridella, S., and J. Glasgow, "pNFS Block/Volume [31] Black, D., Fridella, S., and J. Glasgow, "pNFS Block/Volume
Layout", November 2007, <ftp://www.ietf.org/internet-drafts/ Layout", April 2008, <ftp://www.ietf.org/internet-drafts/
draft-ietf-nfsv4-pnfs-block-05.txt>. draft-ietf-nfsv4-pnfs-block-08.txt>.
[32] Callaghan, B., "WebNFS Client Specification", RFC 2054, [32] Callaghan, B., "WebNFS Client Specification", RFC 2054,
October 1996. October 1996.
[33] Callaghan, B., "WebNFS Server Specification", RFC 2055, [33] Callaghan, B., "WebNFS Server Specification", RFC 2055,
October 1996. October 1996.
[34] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624, [34] Shepler, S., "NFS Version 4 Design Considerations", RFC 2624,
June 1999. June 1999.
skipping to change at page 584, line 29 skipping to change at page 589, line 29
Burnett, and Charles Fan with contributions from Ted Anderson, Neil Burnett, and Charles Fan with contributions from Ted Anderson, Neil
Brown, and Jon Haswell. Brown, and Jon Haswell.
The initial drafts for the Directory Delegations support were The initial drafts for the Directory Delegations support were
contributed by Saadia Khan with input from Dave Noveck, Mike Eisler, contributed by Saadia Khan with input from Dave Noveck, Mike Eisler,
Carl Burnett, Ted Anderson and Tom Talpey. Carl Burnett, Ted Anderson and Tom Talpey.
The initial drafts for the ACL explanations were contributed by Sam The initial drafts for the ACL explanations were contributed by Sam
Falkner and Lisa Week. Falkner and Lisa Week.
The pNFS work was inspired by the NASD and OSD work done by Garth
Gibson. Gary Grider has also been a champion of high-performance
parallel I/O. Garth Gibson and Peter Corbett started the pNFS effort
with a problem statement document for IETF that formed the basis for
the pNFS work in NFSv4.1.
The initial drafts for the parallel NFS support were edited by Brent The initial drafts for the parallel NFS support were edited by Brent
Welch and Garth Goodson. Additional authors for those documents were Welch and Garth Goodson. Additional authors for those documents were
Benny Halevy, David Black, and Andy Adamson. Additional input came Benny Halevy, David Black, and Andy Adamson. Additional input came
from the informal group which contributed to the construction of the from the informal group which contributed to the construction of the
initial pNFS drafts; specific acknowledgement goes to Gary Grider, initial pNFS drafts; specific acknowledgement goes to Gary Grider,
Peter Corbett, Dave Noveck, Peter Honeyman, and Stephen Fridella. Peter Corbett, Dave Noveck, Peter Honeyman, and Stephen Fridella.
The pNFS work was inspired by the NASD and OSD work done by Garth
Gibson. Gary Grider of the national labs (LANL) has also been a
champion of high-performance parallel I/O.
Fredric Isaman found several errors in draft versions of the ONC RPC Fredric Isaman found several errors in draft versions of the ONC RPC
XDR description of the NFSv4.1 protocol. XDR description of the NFSv4.1 protocol.
Audrey Van Bellingham provided, in numerous ways, essential co- Audrey Van Bellingham provided, in numerous ways, essential co-
ordination and management of the process of editing the specification ordination and management of the process of editing the specification
drafts. drafts.
Richard Jernigan gave feedback on the file layout's striping pattern Richard Jernigan gave feedback on the file layout's striping pattern
design. design.
skipping to change at page 585, line 49 skipping to change at page 590, line 51
Iyer, Suchit Kaura, Trond Myklebust, Anatoly Pinchuk, Spencer Iyer, Suchit Kaura, Trond Myklebust, Anatoly Pinchuk, Spencer
Shepler, Renu Tewari, Lisa Week, and Brent Welch. Shepler, Renu Tewari, Lisa Week, and Brent Welch.
A review team worked together to generate the tables of assignments A review team worked together to generate the tables of assignments
of error sets to operations and make sure that each such assignment of error sets to operations and make sure that each such assignment
had two or more people validating it. Participating in the process had two or more people validating it. Participating in the process
were: Andy Adamson, Mike Eisler, Sam Falkner, Garth Goodson, Robert were: Andy Adamson, Mike Eisler, Sam Falkner, Garth Goodson, Robert
Gordon, Trond Myklebust, Dave Noveck, Spencer Shepler, Tom Talpey, Amy Gordon, Trond Myklebust, Dave Noveck, Spencer Shepler, Tom Talpey, Amy
Weaver, and Lisa Week. Weaver, and Lisa Week.
Others who provided comments include: Mahesh Siddheshwar. Others who provided comments include: Jason Goldschmidt and Mahesh
Siddheshwar.
Authors' Addresses Authors' Addresses
Spencer Shepler Spencer Shepler
Sun Microsystems, Inc. Sun Microsystems, Inc.
7808 Moonflower Drive 7808 Moonflower Drive
Austin, TX 78750 Austin, TX 78750
USA USA
Phone: +1-512-401-1080 Phone: +1-512-401-1080
End of changes. 173 change blocks.
573 lines changed or deleted 905 lines changed or added

This html diff was produced by rfcdiff 1.33. The latest version is available from http://tools.ietf.org/tools/rfcdiff/