With 28 new chapters, the third edition of The Practice of System and Network Administration innovates yet again! Revised with thousands of updates and clarifications based on reader feedback, this new edition also incorporates DevOps strategies even for non-DevOps environments.
Whether you use Linux, Unix, or Windows, this new edition describes the essential practices previously handed down only from mentor to protégé. This wonderfully lucid, often funny cornucopia of information introduces beginners to advanced frameworks valuable for their entire career, yet is structured to help even experts through difficult projects.
Other books tell you what commands to type. This book teaches you timeless, cross-platform strategies!
DevOps techniques: Apply DevOps principles to enterprise IT infrastructure, even in environments without developers
Game-changing strategies: New ways to deliver results faster with less stress
Fleet management: A comprehensive guide to managing your fleet of desktops, laptops, servers and mobile devices
Service management: How to design, launch, upgrade and migrate services
Measurable improvement: Assess your operational effectiveness with a forty-page, pain-free assessment system you can start using today to raise the quality of all services
Design guides: Best practices for networks, data centers, email, storage, monitoring, backups and more
Management skills: Organization design, communication, negotiation, ethics, hiring and firing, and more
Have you ever had any of these problems?
Have you been surprised to discover your backup tapes are blank?
Ever spent a year launching a new service only to be told the users hate it?
Do you have more incoming support requests than you can handle?
Do you spend more time fixing problems than building the next awesome thing?
Have you suffered from a botched migration of thousands of users to a new service?
Does your company rely on a computer that, if it died, couldn't be rebuilt?
Is your network a fragile mess that breaks any time you try to improve it?
Is there a periodic "hell month" that happens twice a year? Twelve times a year?
Do you find out about problems when your users call you to complain?
Does your corporate "Change Review Board" terrify you?
Does each division of your company have its own broken way of doing things?
Do you fear that automation will replace you, or break more than it fixes?
Are you underpaid and overworked?
No vague "management speak" or empty platitudes. This comprehensive guide provides real solutions that prevent these problems and more!
Preface xxxix
Acknowledgments xlvii
About the Authors li
Part I: Game-Changing Strategies 1
Chapter 1: Climbing Out of the Hole 3
1.1 Organizing WIP 5
1.2 Eliminating Time Sinkholes 12
1.3 DevOps 16
1.4 DevOps Without Devs 16
1.5 Bottlenecks 18
1.6 Getting Started 20
1.7 Summary 21
Exercises 22
Chapter 2: The Small Batches Principle 23
2.1 The Carpenter Analogy 23
2.2 Fixing Hell Month 24
2.3 Improving Emergency Failovers 26
2.4 Launching Early and Often 29
2.5 Summary 34
Exercises 34
Chapter 3: Pets and Cattle 37
3.1 The Pets and Cattle Analogy 37
3.2 Scaling 39
3.3 Desktops as Cattle 40
3.4 Server Hardware as Cattle 41
3.5 Pets Store State 43
3.6 Isolating State 44
3.7 Generic Processes 47
3.8 Moving Variations to the End 51
3.9 Automation 53
3.10 Summary 53
Exercises 54
Chapter 4: Infrastructure as Code 55
4.1 Programmable Infrastructure 56
4.2 Tracking Changes 57
4.3 Benefits of Infrastructure as Code 59
4.4 Principles of Infrastructure as Code 62
4.5 Configuration Management Tools 63
4.6 Example Infrastructure as Code Systems 67
4.7 Bringing Infrastructure as Code to Your Organization 71
4.8 Infrastructure as Code for Enhanced Collaboration 72
4.9 Downsides to Infrastructure as Code 73
4.10 Automation Myths 74
4.11 Summary 75
Exercises 76
Part II: Workstation Fleet Management 77
Chapter 5: Workstation Architecture 79
5.1 Fungibility 80
5.2 Hardware 82
5.3 Operating System 82
5.4 Network Configuration 84
5.5 Accounts and Authorization 86
5.6 Data Storage 89
5.7 OS Updates 93
5.8 Security 94
5.9 Logging 97
5.10 Summary 98
Exercises 99
Chapter 6: Workstation Hardware Strategies 101
6.1 Physical Workstations 101
6.2 Virtual Desktop Infrastructure 105
6.3 Bring Your Own Device 110
6.4 Summary 113
Exercises 114
Chapter 7: Workstation Software Life Cycle 117
7.1 Life of a Machine 117
7.2 OS Installation 120
7.3 OS Configuration 120
7.4 Updating the System Software and Applications 123
7.5 Rolling Out Changes . . . Carefully 128
7.6 Disposal 130
7.7 Summary 134
Exercises 135
Chapter 8: OS Installation Strategies 137
8.1 Consistency Is More Important Than Perfection 138
8.2 Installation Strategies 142
8.3 Test-Driven Configuration Development 147
8.4 Automating in Steps 148
8.5 When Not to Automate 152
8.6 Vendor Support of OS Installation 152
8.7 Should You Trust the Vendor's Installation? 154
8.8 Summary 154
Exercises 155
Chapter 9: Workstation Service Definition 157
9.1 Basic Service Definition 157
9.2 Refresh Cycles 161
9.3 Tiered Support Levels 165
9.4 Workstations as a Managed Service 168
9.5 Summary 170
Exercises 171
Chapter 10: Workstation Fleet Logistics 173
10.1 What Employees See 173
10.2 What Employees Don't See 174
10.3 Configuration Management Database 183
10.4 Small-Scale Fleet Logistics 186
10.5 Summary 188
Exercises 188
Chapter 11: Workstation Standardization 191
11.1 Involving Customers Early 192
11.2 Releasing Early and Iterating 193
11.3 Having a Transition Interval (Overlap) 193
11.4 Ratcheting 194
11.5 Setting a Cut-Off Date 195
11.6 Adapting for Your Corporate Culture 195
11.7 Leveraging the Path of Least Resistance 196
11.8 Summary 198
Exercises 199
Chapter 12: Onboarding 201
12.1 Making a Good First Impression 201
12.2 IT Responsibilities 203
12.3 Five Keys to Successful Onboarding 203
12.4 Cadence Changes 212
12.5 Case Studies 212
12.6 Summary 216
Exercises 217
Part III: Servers 219
Chapter 13: Server Hardware Strategies 221
13.1 All Eggs in One Basket 222
13.2 Beautiful Snowflakes 224
13.3 Buy in Bulk, Allocate Fractions 228
13.4 Grid Computing 235
13.5 Blade Servers 237
13.6 Cloud-Based Compute Services 238
13.7 Server Appliances 241
13.8 Hybrid Strategies 242
13.9 Summary 243
Exercises 244
Chapter 14: Server Hardware Features 245
14.1 Workstations Versus Servers 246
14.2 Server Reliability 249
14.3 Remotely Managing Servers 254
14.4 Separate Administrative Networks 257
14.5 Maintenance Contracts and Spare Parts 258
14.6 Selecting Vendors with Server Experience 261
14.7 Summary 263
Exercises 263
Chapter 15: Server Hardware Specifications 265
15.1 Models and Product Lines 266
15.2 Server Hardware Details 266
15.3 Things to Leave Out 278
15.4 Summary 278
Exercises 279
Part IV: Services 281
Chapter 16: Service Requirements 283
16.1 Services Make the Environment 284
16.2 Starting with a Kick-Off Meeting 285
16.3 Gathering Written Requirements 286
16.4 Customer Requirements 288
16.5 Scope, Schedule, and Resources 291
16.6 Operational Requirements 292
16.7 Open Architecture 298
16.8 Summary 302
Exercises 303
Chapter 17: Service Planning and Engineering 305
17.1 General Engineering Basics 306
17.2 Simplicity 307
17.3 Vendor-Certified Designs 308
17.4 Dependency Engineering 309
17.5 Decoupling Hostname from Service Name 313
17.6 Support 315
17.7 Summary 319
Exercises 319
Chapter 18: Service Resiliency and Performance Patterns 321
18.1 Redundancy Design Patterns 322
18.2 Performance and Scaling 326
18.3 Summary 333
Exercises 334
Chapter 19: Service Launch: Fundamentals 335
19.1 Planning for Problems 335
19.2 The Six-Step Launch Process 336
19.3 Launch Readiness Review 345
19.4 Launch Calendar 348
19.5 Common Launch Problems 349
19.6 Summary 351
Exercises 351
Chapter 20: Service Launch: DevOps 353
20.1 Continuous Integration and Deployment 354
20.2 Minimum Viable Product 357
20.3 Rapid Release with Packaged Software 359
20.4 Cloning the Production Environment 362
20.5 Example: DNS/DHCP Infrastructure Software 363
20.6 Launch with Data Migration 366
20.7 Controlling Self-Updating Software 369
20.8 Summary 370
Exercises 371
Chapter 21: Service Conversions 373
21.1 Minimizing Intrusiveness 374
21.2 Layers Versus Pillars 376
21.3 Vendor Support 377
21.4 Communication 378
21.5 Training 379
21.6 Gradual Roll-Outs 379
21.7 Flash-Cuts: Doing It All at Once 380
21.8 Backout Plan 383
21.9 Summary 385
Exercises 385
Chapter 22: Disaster Recovery and Data Integrity 387
22.1 Risk Analysis 388
22.2 Legal Obligations 389
22.3 Damage Limitation 390
22.4 Preparation 391
22.5 Data Integrity 392
22.6 Redundant Sites 393
22.7 Security Disasters 394
22.8 Media Relations 394
22.9 Summary 395
Exercises 395
Part V: Infrastructure 397
Chapter 23: Network Architecture 399
23.1 Physical Versus Logical 399
23.2 The OSI Model 400
23.3 Wired Office Networks 402
23.4 Wireless Office Networks 406
23.5 Datacenter Networks 408
23.6 WAN Strategies 413
23.7 Routing 419
23.8 Internet Access 420
23.9 Corporate Standards 422
23.10 Software-Defined Networks 425
23.11 IPv6 426
23.12 Summary 428
Exercises 429
Chapter 24: Network Operations 431
24.1 Monitoring 431
24.2 Management 432
24.3 Documentation 437
24.4 Support 440
24.5 Summary 446
Exercises 447
Chapter 25: Datacenters Overview 449
25.1 Build, Rent, or Outsource 450
25.2 Requirements 452
25.3 Summary 456
Exercises 457
Chapter 26: Running a Datacenter 459
26.1 Capacity Management 459
26.2 Life-Cycle Management 465
26.3 Patch Cables 468
26.4 Labeling 471
26.5 Console Access 475
26.6 Workbench 476
26.7 Tools and Supplies 477
26.8 Summary 480
Exercises 481
Part VI: Helpdesks and Support 483
Chapter 27: Customer Support 485
27.1 Having a Helpdesk 485
27.2 Offering a Friendly Face 488
27.3 Reflecting Corporate Culture 488
27.4 Having Enough Staff 488
27.5 Defining Scope of Support 490
27.6 Specifying How to Get Help 493
27.7 Defining Processes for Staff 493
27.8 Establishing an Escalation Process 494
27.9 Defining "Emergency" in Writing 495
27.10 Supplying Request-Tracking Software 496
27.11 Statistical Improvements 498
27.12 After-Hours and 24/7 Coverage 499
27.13 Better Advertising for the Helpdesk 500
27.14 Different Helpdesks for Different Needs 501
27.15 Summary 502
Exercises 503
Chapter 28: Handling an Incident Report 505
28.1 Process Overview 506
28.2 Phase A-Step 1: The Greeting 508
28.3 Phase B: Problem Identification 509
28.4 Phase C: Planning and Execution 515
28.5 Phase D: Verification 518
28.6 Perils of Skipping a Step 519
28.7 Optimizing Customer Care 521
28.8 Summary 525
Exercises 527
Chapter 29: Debugging 529
29.1 Understanding the Customer's Problem 529
29.2 Fixing the Cause, Not the Symptom 531
29.3 Being Systematic 532
29.4 Having the Right Tools 533
29.5 End-to-End Understanding of the System 538
29.6 Summary 540
Exercises 540
Chapter 30: Fixing Things Once 541
30.1 Story: The Misconfigured Servers 541
30.2 Avoiding Temporary Fixes 543
30.3 Learn from Carpenters 545
30.4 Automation 547
30.5 Summary 549
Exercises 550
Chapter 31: Documentation 551
31.1 What to Document 552
31.2 A Simple Template for Getting Started 553
31.3 Easy Sources for Documentation 554
31.4 The Power of Checklists 556
31.5 Wiki Systems 557
31.6 Findability 559
31.7 Roll-Out Issues 559
31.8 A Content-Management System 560
31.9 A Culture of Respect 561
31.10 Taxonomy and Structure 561
31.11 Additional Documentation Uses 562
31.12 Off-Site Links 562
31.13 Summary 563
Exercises 564
Part VII: Change Processes 565
Chapter 32: Change Management 567
32.1 Change Review Boards 568
32.2 Process Overview 570
32.3 Change Proposals 570
32.4 Change Classifications 571
32.5 Risk Discovery and Quantification 572
32.6 Technical Planning 573
32.7 Scheduling 574
32.8 Communication 576
32.9 Tiered Change Review Boards 578
32.10 Change Freezes 579
32.11 Team Change Management 581
32.12 Starting with Git 583
32.13 Summary 585
Exercises 585
Chapter 33: Server Upgrades 587
33.1 The Upgrade Process 587
33.2 Step 1: Develop a Service Checklist 588
33.3 Step 2: Verify Software Compatibility 591
33.4 Step 3: Develop Verification Tests 592
33.5 Step 4: Choose an Upgrade Strategy 595
33.6 Step 5: Write a Detailed Implementation Plan 598
33.7 Step 6: Write a Backout Plan 600
33.8 Step 7: Select a Maintenance Window 600
33.9 Step 8: Announce the Upgrade 602
33.10 Step 9: Execute the Tests 603
33.11 Step 10: Lock Out Customers 604
33.12 Step 11: Do the Upgrade with Someone 605
33.13 Step 12: Test Your Work 605
33.14 Step 13: If All Else Fails, Back Out 605
33.15 Step 14: Restore Access to Customers 606
33.16 Step 15: Communicate Completion/Backout 606
33.17 Summary 608
Exercises 610
Chapter 34: Maintenance Windows 611
34.1 Process Overview 612
34.2 Getting Management Buy-In 613
34.3 Scheduling Maintenance Windows 614
34.4 Planning Maintenance Tasks 615
34.5 Selecting a Flight Director 616
34.6 Managing Change Proposals 617
34.7 Developing the Master Plan 620
34.8 Disabling Access 621
34.9 Ensuring Mechanics and Coordination 622
34.10 Change Completion Deadlines 628
34.11 Comprehensive System Testing 628
34.12 Post-maintenance Communication 630
34.13 Reenabling Remote Access 631
34.14 Be Visible the Next Morning 631
34.15 Postmortem 631
34.16 Mentoring a New Flight Director 632
34.17 Trending of Historical Data 632
34.18 Providing Limited Availability 633
34.19 High-Availability Sites 634
34.20 Summary 636
Exercises 637
Chapter 35: Centralization Overview 639
35.1 Rationale for Reorganizing 640
35.2 Approaches and Hybrids 642
35.3 Summary 643
Exercises 644
Chapter 36: Centralization Recommendations 645
36.1 Architecture 645
36.2 Security 645
36.3 Infrastructure 648
36.4 Support 654
36.5 Purchasing 655
36.6 Lab Environments 656
36.7 Summary 656
Exercises 657
Chapter 37: Centralizing a Service 659
37.1 Understand the Current Solution 660
37.2 Make a Detailed Plan 661
37.3 Get Management Support 662
37.4 Fix the Problems 662
37.5 Provide an Excellent Service 663
37.6 Start Slowly 663
37.7 Look for Low-Hanging Fruit 664
37.8 When to Decentralize 665
37.9 Managing Decentralized Services 666
37.10 Summary 667
Exercises 668
Part VIII: Service Recommendations 669
Chapter 38: Service Monitoring 671
38.1 Types of Monitoring 672
38.2 Building a Monitoring System 673
38.3 Historical Monitoring 674
38.4 Real-Time Monitoring 676
38.5 Scaling 684
38.6 Centralization and Accessibility 685
38.7 Pervasive Monitoring 686
38.8 End-to-End Tests 687
38.9 Application Response Time Monitoring 688
38.10 Compliance Monitoring 689
38.11 Meta-monitoring 690
38.12 Summary 690
Exercises 691
Chapter 39: Namespaces 693
39.1 What Is a Namespace? 693
39.2 Basic Rules of Namespaces 694
39.3 Defining Names 694
39.4 Merging Namespaces 698
39.5 Life-Cycle Management 699
39.6 Reuse 700
39.7 Usage 701
39.8 Federated Identity 708
39.9 Summary 709
Exercises 710
Chapter 40: Nameservices 711
40.1 Nameservice Data 711
40.2 Reliability 714
40.3 Access Policy 721
40.4 Change Policies 723
40.5 Change Procedures 724
40.6 Centralized Management 726
40.7 Summary 728
Exercises 728
Chapter 41: Email Service 729
41.1 Privacy Policy 730
41.2 Namespaces 730
41.3 Reliability 731
41.4 Simplicity 733
41.5 Spam and Virus Blocking 735
41.6 Generality 736
41.7 Automation 737
41.8 Monitoring 738
41.9 Redundancy 738
41.10 Scaling 739
41.11 Security Issues 742
41.12 Encryption 743
41.13 Email Retention Policy 743
41.14 Communication 744
41.15 High-Volume List Processing 745
41.16 Summary 746
Exercises 747
Chapter 42: Print Service 749
42.1 Level of Centralization 750
42.2 Print Architecture Policy 751
42.3 Documentation 754
42.4 Monitoring 755
42.5 Environmental Issues 756
42.6 Shredding 757
42.7 Summary 758
Exercises 758
Chapter 43: Data Storage 759
43.1 Terminology 760
43.2 Managing Storage 765
43.3 Storage as a Service 772
43.4 Performance 780
43.5 Evaluating New Storage Solutions 784
43.6 Common Data Storage Problems 787
43.7 Summary 789
Exercises 790
Chapter 44: Backup and Restore 793
44.1 Getting Started 794
44.2 Reasons for Restores 795
44.3 Corporate Guidelines 799
44.4 A Data-Recovery SLA and Policy 800
44.5 The Backup Schedule 801
44.6 Time and Capacity Planning 807
44.7 Consumables Planning 809
44.8 Restore-Process Issues 815
44.9 Backup Automation 816
44.10 Centralization 819
44.11 Technology Changes 820
44.12 Summary 821
Exercises 822
Chapter 45: Software Repositories 825
45.1 Types of Repositories 826
45.2 Benefits of Repositories 827
45.3 Package Management Systems 829
45.4 Anatomy of a Package 829
45.5 Anatomy of a Repository 833
45.6 Managing a Repository 837
45.7 Repository Client 841
45.8 Build Environment 843
45.9 Repository Examples 845
45.10 Summary 848
Exercises 849
Chapter 46: Web Services 851
46.1 Simple Web Servers 852
46.2 Multiple Web Servers on One Host 853
46.3 Service Level Agreements 854
46.4 Monitoring 855
46.5 Scaling for Web Services 855
46.6 Web Service Security 859
46.7 Content Management 866
46.8 Summary 868
Exercises 869
Part IX: Management Practices 871
Chapter 47: Ethics 873
47.1 Informed Consent 873
47.2 Code of Ethics 875
47.3 Customer Usage Guidelines 875
47.4 Privileged-Access Code of Conduct 877
47.5 Copyright Adherence 878
47.6 Working with Law Enforcement 881
47.7 Setting Expectations on Privacy and Monitoring 885
47.8 Being Told to Do Something Illegal/Unethical 887
47.9 Observing Illegal Activity 888
47.10 Summary 889
Exercises 889
Chapter 48: Organizational Structures 891
48.1 Sizing 892
48.2 Funding Models 894
48.3 Management Chain's Influence 897
48.4 Skill Selection 898
48.5 Infrastructure Teams 900
48.6 Customer Support 902
48.7 Helpdesk 904
48.8 Outsourcing 904
48.9 Consultants and Contractors 906
48.10 Sample Organizational Structures 907
48.11 Summary 911
Exercises 911
Chapter 49: Perception and Visibility 913
49.1 Perception 913
49.2 Visibility 925
49.3 Summary 933
Exercises 934
Chapter 50: Time Management 935
50.1 Interruptions 935
50.2 Follow-Through 937
50.3 Basic To-Do List Management 938
50.4 Setting Goals 939
50.5 Handling Email Once 940
50.6 Precompiling Decisions 942