Implementation Plan
Key Enabler: Zero-Risk Deployment
GlobalProtect stays installed and running on all endpoints throughout the entire migration. NetBird is purely additive. If NetBird fails for any user, they continue using GlobalProtect exactly as they do today. There is no scenario where a NetBird deployment causes a user outage.
Critical Flag: --network-monitor=false
All NetBird clients must be deployed with the --network-monitor=false flag during parallel operation with GlobalProtect. This prevents a known coexistence conflict (GitHub issue #5077) where GlobalProtect’s network changes trigger WireGuard tunnel restarts, dropping TCP sessions. The flag is already included in the TRMM deployment script. Remove it only as part of the GlobalProtect decommission (Phase 6).
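
A missing flag would only show up later as dropped TCP sessions, so it may be worth checking the script before each push. The following is a minimal sketch, assuming the deployment script is available as a local UTF-8 or ASCII file (a UTF-16 encoded .ps1 would not match). `has_network_monitor_off` is an illustrative helper name, not part of NetBird or TRMM.

```shell
# Returns 0 if the given script contains the coexistence flag, 1 otherwise.
# has_network_monitor_off is a hypothetical helper name.
has_network_monitor_off() {
  # -F: fixed string, -q: quiet, --: end of options so the pattern
  # is not mistaken for a grep option.
  grep -Fq -- '--network-monitor=false' "$1"
}

# Example: guard a push of the deployment script named in this plan.
script="trmm-deploy-netbird.ps1"
if [ -f "$script" ]; then
  if has_network_monitor_off "$script"; then
    echo "OK: flag present in $script"
  else
    echo "STOP: --network-monitor=false missing from $script"
  fi
fi
```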
Phase 1: Pre-Work (Monday — Thursday)
All prerequisites are staged before the migration weekend. Nothing user-facing changes during this phase.
Monday
| # | Task | Details |
|---|---|---|
| 1 | Verify Entra ID license tier | P1 minimum required for SSPR with password writeback (included in M365 Business Premium or E3) |
| 2 | Enable SSPR + password writeback | Entra admin center > Protection > Password reset. Enable for All or targeted group. Open Entra Connect wizard > Optional features > check “Password writeback.” |
| 3 | Test SSPR with one IT account | Reset password at aka.ms/sspr, verify writeback to on-prem AD |
| 4 | Provision Azure B2s VM | Ubuntu 24.04 LTS, West US 3 (Phoenix), 2 vCPU, 4 GB RAM |
| 5 | Configure NSG rules | Inbound: TCP 80, 443 + UDP 3478 |
| 6 | Assign static public IP | Required for DNS stability |
| 7 | Install Docker + docker-compose | On the Azure VM |
| 8 | Create DNS A record | netbird.gsisg.com pointing to VM public IP |
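
The three inbound rules from task 5 could be scripted with the Azure CLI. The sketch below only prints the commands so they can be reviewed before anything is applied. The resource group and NSG names, the rule names, and the priorities are placeholders chosen for illustration, not values from this plan.

```shell
# Hypothetical names; replace with the real resource group and NSG.
RG="rg-netbird"
NSG="nsg-netbird"

# Print (do not run) one "az network nsg rule create" command per inbound
# rule listed in the plan: TCP 80, TCP 443, UDP 3478.
print_nsg_rules() {
  priority=100
  for rule in Tcp:80 Tcp:443 Udp:3478; do
    proto=${rule%%:*}   # text before the colon
    port=${rule##*:}    # text after the colon
    echo "az network nsg rule create" \
      "--resource-group $RG --nsg-name $NSG" \
      "--name allow-$proto-$port --priority $priority" \
      "--direction Inbound --access Allow" \
      "--protocol $proto --destination-port-ranges $port"
    priority=$((priority + 10))
  done
}

print_nsg_rules
```

Printing rather than executing keeps the sketch safe to run anywhere. Once the names are correct, each printed line can be pasted into a shell that is logged in to the Azure subscription.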
Tuesday
| # | Task | Details |
|---|---|---|
| 1 | Create Entra ID App Registration | Name: “NetBird”, Single tenant, Redirect URIs (SPA): https://netbird.gsisg.com/auth and https://netbird.gsisg.com/silent-auth, Mobile/desktop redirect: http://localhost:53000 |
| 2 | Configure App Registration details | Create application scope “api”, grant User.Read.All permission (admin consent), set accessTokenAcceptedVersion = 2, generate client secret |
| 3 | Record credentials | Application (client) ID, Directory (tenant) ID, Object ID, Client Secret |
| 4 | Pre-build Honolulu routing peer VM | Ubuntu 24.04, 1 vCPU, 1 GB RAM on Hyper-V (DATA003 or DATA004). Do NOT connect to NetBird yet. |
| 5 | Pre-build Boulder routing peer VM | Ubuntu 24.04, 1 vCPU, 1 GB RAM on Hyper-V (DATA001 or DATA007). Do NOT connect to NetBird yet. |
Wednesday
| # | Task | Details |
|---|---|---|
| 1 | Finalize TRMM deployment script | Existing: trmm-deploy-netbird.ps1. Verify --network-monitor=false flag is present. |
| 2 | Test MSI install on 1 IT machine | Via TRMM. Will fail to connect (management server not yet up) — verify MSI install + service creation only. |
| 3 | Push AV/EDR exclusions | C:\Program Files\NetBird\ excluded on ALL endpoints via TRMM |
| 4 | Push GPO firewall rules | Allow NetBird wt0 interface in Windows Firewall (prevents GPO from overriding NetBird’s auto-created rules) |
| 5 | Prepare communications | Pilot user briefing (email/Slack). All-staff Monday email with SSPR instructions (aka.ms/sspr). |
| 6 | Confirm DNS propagation | Verify netbird.gsisg.com resolves to the Azure VM public IP |
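
The DNS propagation check in task 6 comes down to comparing the resolved address with the VM's static IP. This is a minimal sketch. `ip_in_list` is a hypothetical helper, and 203.0.113.10 is a documentation-range placeholder, not the real VM address.

```shell
# Succeeds if the expected IP appears in a newline-separated list of
# resolved addresses. ip_in_list is a hypothetical helper name.
ip_in_list() {
  expected=$1
  resolved=$2
  # -F: literal match (dots are not wildcards), -x: whole line only.
  printf '%s\n' "$resolved" | grep -Fxq -- "$expected"
}

# Example wiring, assuming dig is installed:
#   resolved=$(dig +short netbird.gsisg.com A)
#   ip_in_list "203.0.113.10" "$resolved" && echo "DNS OK" || echo "DNS not ready"
```

Matching whole lines avoids a false positive where the expected address is only a prefix of the resolved one (for example 203.0.113.1 against 203.0.113.10).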
Thursday
| # | Task | Details |
|---|---|---|
| 1 | Final pre-flight check | VM accessible via SSH, Docker running, DNS resolved, Entra app registration complete |
| 2 | Brief helpdesk team | What NetBird is, what to tell users, escalation path |
| 3 | Confirm pilot users | Available Saturday for testing (8-10 users across 5 scenarios) |
| 4 | Prepare network route configs | Print/document subnets (10.15.0.0/24, 10.100.7.0/24), groups, ACL policies |
Phase 2: Friday Evening (4 hours: 6 PM — 10 PM)
Infrastructure deployment. No users are affected.
| Time | Task | Duration |
|---|---|---|
| 6:00 PM | Deploy NetBird management server — docker-compose up -d on Azure VM. Verify dashboard loads at https://netbird.gsisg.com. | 15 min |
| 6:15 PM | Configure Entra OIDC integration — Populate setup.env with Entra variables (client ID, tenant ID, secret, OIDC endpoint). Run configure script, restart containers. Test SSO login. | 30 min |
| 6:45 PM | Create break-glass local admin account | 5 min |
| 6:50 PM | Create setup keys + groups — “Routing-Peers” (no expiration), “Company-Laptops” (with expiration), “IT-Admins”, “Hawaii-Engineers”, “Boulder-Engineers” | 15 min |
| 7:05 PM | Deploy routing peers (in parallel). Honolulu: SSH to pre-built VM, install NetBird, connect with routing peer setup key, enable IP forwarding (sysctl -w net.ipv4.ip_forward=1), enable systemd service. Boulder: install NetBird on Hyper-V VM (gsi-nb-bld-01), connect with routing peer setup key, verify peer in dashboard. | 45 min |
| 7:50 PM | Configure network routes — Honolulu: 10.100.7.0/24 via Honolulu peer (masquerade ON). Boulder: 10.15.0.0/24 via Boulder peer (masquerade ON). Distribution group: “Company-Laptops”. | 20 min |
| 8:10 PM | Configure access control policies (see ACL table below) | 15 min |
| 8:25 PM | IT team self-test — Install NetBird on 2-3 IT laptops. Test: ping DCs at both sites, SMB share access, RDP to a test VM, netbird status, SSPR password reset + cached credential update. | 90 min |
| 9:55 PM | Go/No-Go decision for Saturday pilot — All routes working? OIDC login working? SMB/RDP verified? If No: debug or postpone to next weekend. Zero user impact. | 5 min |
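
The `sysctl -w net.ipv4.ip_forward=1` step at 7:05 PM changes only the running kernel value, so the setting is lost on reboot unless it is also written to a configuration file. The following is a minimal sketch for an Ubuntu routing peer, assuming the standard /etc/sysctl.d drop-in directory. The file name is an arbitrary choice, and whether the NetBird client already manages forwarding itself is not covered here.

```shell
# Write a sysctl drop-in so IPv4 forwarding is re-enabled at boot.
# The file name 99-netbird-forward.conf is an arbitrary choice.
persist_ip_forward() {
  conf_dir=${1:-/etc/sysctl.d}
  printf 'net.ipv4.ip_forward = 1\n' > "$conf_dir/99-netbird-forward.conf"
}

# On the routing peer (as root):
#   persist_ip_forward          # writes /etc/sysctl.d/99-netbird-forward.conf
#   sysctl --system             # reloads all sysctl configuration files
#   sysctl net.ipv4.ip_forward  # shows the current value
```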
Access Control Policies
| Policy | Source Group | Destination | Protocols |
|---|---|---|---|
| All Staff — DC Access | All Users | DCs at both sites | TCP/UDP 53, 88, 123, 135, 389, 445, 464, 636, 3268, 3269 |
| Hawaii Engineers | Hawaii-Engineers | Honolulu network | All |
| Boulder Engineers | Boulder-Engineers | Boulder network | All |
| IT Full Access | IT-Admins | All networks | All |
Phase 3: Saturday Pilot (8 hours: 9 AM — 5 PM)
Deploy to the pilot group (8-10 users) via TRMM. Validate all 5 scenarios.
| Time | Task | Duration |
|---|---|---|
| 9:00 AM | Deploy to pilot group via TRMM — push script to pilot users’ machines, monitor for successful installs | 30 min |
| 9:30 AM | Contact pilot users — brief each on what to test, provide direct Slack/Teams/phone support | 30 min |
| 10:00 AM | Scenario 1: Hawaii remote to Boulder SMB — map drive to \\10.15.0.x\share, copy 100 MB file, compare performance to GlobalProtect | 60 min |
| 11:00 AM | Scenario 2: Maryland to Boulder RDP — RDP session, assess input lag, run CAD/GIS/SAGE applications | 60 min |
| 12:00 PM | Lunch break | 60 min |
| 1:00 PM | Scenario 3: Honolulu field to local Sage on cellular — access Sage at 10.100.7.40 from cellular, walk between locations to test handoff | 60 min |
| 2:00 PM | Scenario 4: Boulder office to Honolulu files — access \\10.100.7.15\share via routing peers | 60 min |
| 3:00 PM | Scenario 5: Password reset (SSPR) — navigate to aka.ms/sspr, reset password, Win+L, unlock with new password, verify cached credentials updated through NetBird tunnel | 60 min |
| 4:00 PM | Collect pilot feedback — connection failures? DNS issues? Performance problems? | 30 min |
| 4:30 PM | Go/No-Go decision for Sunday full deployment — if all 5 scenarios pass: proceed. If issues found: fix and re-test, or postpone. Postponing has ZERO user impact (GP still works). | 30 min |
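
The 4:30 PM decision rule (proceed only if all 5 scenarios pass) can be written down so the outcome is recorded the same way each time. This is an illustrative sketch. `go_no_go` and the pass/fail vocabulary are assumptions, not part of any tool used in this plan.

```shell
# Print GO only if exactly five results are given and all are "pass".
# Otherwise print NO-GO followed by the failing scenario numbers.
# Arguments are in scenario order 1..5.
go_no_go() {
  failed=""
  n=0
  for result in "$@"; do
    n=$((n + 1))
    [ "$result" = "pass" ] || failed="$failed $n"
  done
  if [ -z "$failed" ] && [ "$n" -eq 5 ]; then
    echo "GO"
  else
    echo "NO-GO:${failed:- incomplete results}"
  fi
}

go_no_go pass pass pass pass pass   # prints: GO
go_no_go pass pass fail pass pass   # prints: NO-GO: 3
```

Treating a missing result as NO-GO keeps an untested scenario from being counted as a pass.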
Pilot Group Design
Recommended size: 8-10 users. This is large enough to cover all 5 scenarios with redundancy and small enough to allow hands-on support.
| Scenario | User Profile | Selection Criteria | Count |
|---|---|---|---|
| 1. Hawaii remote to Boulder SMB | Hawaii-based worker who regularly accesses Boulder file shares | Tech-comfortable, good at reporting issues, tests hairpin elimination | 1-2 |
| 2. Maryland to Boulder RDP | East Coast worker who RDPs to Boulder VMs for CAD/GIS/SAGE | Tests maximum latency improvement, likely to notice and report performance difference | 1-2 |
| 3. Honolulu field to local Sage | Field worker on cellular accessing Sage (10.100.7.40) | Tests cellular handoff, WireGuard roaming, relay performance | 1-2 |
| 4. Boulder office to Honolulu files | Boulder office worker accessing FILES server (10.100.7.15) or GIS data | Tests site-to-site via routing peers | 1-2 |
| 5. Password reset | Any general staff user (the 90% group) | Tests the primary use case for the majority of users; needs only SSPR + tunnel to DC | 2-3 |
Selection criteria for all pilot users:
- Tech-savvy enough to report issues clearly (able to describe what happened vs. what they expected)
- Good communicators — willing to provide feedback, respond to Slack/Teams messages
- Variety of OS versions — include at least one Windows 10 and one Windows 11 machine
- Variety of hardware ages — include at least one older machine to catch edge cases
- At least 1-2 IT staff members for deep troubleshooting
- Voluntary participation — interested and engaged, not coerced
Phase 4: Sunday Full Deployment (4 hours: 10 AM — 2 PM)
| Time | Task | Duration |
|---|---|---|
| 10:00 AM | Update TRMM script with any changes from pilot feedback — adjust setup key, management URL, flags if needed | 15 min |
| 10:15 AM | Full deployment via TRMM — bulk execution to all remaining ~90 agents. Monitor for install success/failure. Expect ~90% success on first push. | 30-60 min |
| 11:15 AM | Troubleshoot failures — re-run script on failed endpoints. Check for AV blocking, network issues, disk space. | 60 min |
| 12:15 PM | Verify dashboard — peer count matches expected, routing peers healthy, spot-check netbird status on endpoints via TRMM | 30 min |
| 12:45 PM | Send Monday communication — “A new always-on network service (NetBird) has been deployed to all company laptops. No action needed. For password resets, use aka.ms/sspr. Contact helpdesk if you experience any connectivity issues.” | 15 min |
| 1:00 PM | Configure monitoring — Zabbix alerts for Azure VM, Docker restart policies (unless-stopped), login expiration policy (24h recommended) | 30 min |
| 1:30 PM | Final status check — document issues encountered and resolutions, update helpdesk troubleshooting guide | 30 min |
Phase 5: Monday — Monitor and Support
- Monitor NetBird dashboard throughout the day and watch for disconnected peers
- Check TRMM for any endpoints that failed to install
- Be available on Slack/Teams for user questions
- Track helpdesk tickets related to NetBird (expect very few since it is a silent install)
- Verify field workers on cellular have connectivity
- Run `netbird status` spot checks via TRMM on random endpoints
- Address stragglers (machines that were off during Sunday deployment)
- Schedule follow-up TRMM task to catch machines that were offline
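
Stragglers can be found by comparing the list of TRMM agents with the list of peers enrolled in NetBird. The following is a minimal sketch, assuming both lists have been exported to text files with one hostname per line in the same hostname form. The file names and `find_stragglers` are hypothetical.

```shell
# Print hostnames present in the expected list but absent from the
# enrolled list. Both files hold one hostname per line.
find_stragglers() {
  expected_sorted=$(mktemp)
  enrolled_sorted=$(mktemp)
  # Lowercase first so PC-01 and pc-01 count as the same machine.
  tr '[:upper:]' '[:lower:]' < "$1" | sort -u > "$expected_sorted"
  tr '[:upper:]' '[:lower:]' < "$2" | sort -u > "$enrolled_sorted"
  # comm -23 keeps lines that appear only in the first file.
  comm -23 "$expected_sorted" "$enrolled_sorted"
  rm -f "$expected_sorted" "$enrolled_sorted"
}

# Usage, with hypothetical export file names:
#   find_stragglers trmm-agents.txt netbird-peers.txt
```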
Phase 6: Decommission GlobalProtect
Begin after 2+ weeks of stable operation with all users on NetBird.
| Step | Task | Notes |
|---|---|---|
| 1 | Disable GP auto-connect on endpoints | Do NOT uninstall yet |
| 2 | Remove --network-monitor=false flag | GP removal eliminates the coexistence trigger |
| 3 | Monitor for 30 days of sole NetBird operation | Confirm stability without GP fallback |
| 4 | Uninstall GlobalProtect client from all endpoints via TRMM | Bulk execution |
| 5 | Back up PA-2020 configuration | Power down, retain 90 days before disposal |
| 6 | Remove vpn.gsisg.com DNS record | |
| 7 | Notify cyber insurance broker | Migration to ZTNA architecture |
| 8 | Document final architecture | |
Rollback Plan
The fundamental safety net: NetBird is additive. GlobalProtect stays installed, configured, and running on all endpoints throughout the migration and beyond. There is no scenario where a NetBird failure causes a user outage.
Level 1: User Self-Remediation (0 minutes)
The user simply keeps using GlobalProtect as before. No action needed: GP is still installed and running.
Level 2: Individual Endpoint Removal via TRMM (5 minutes)
Run as Administrator via TRMM on a single machine:

    # Stop the NetBird service; ignore the error if it is already stopped or absent
    Stop-Service "NetBird" -Force -ErrorAction SilentlyContinue
    Start-Sleep -Seconds 2
    # Run the bundled uninstaller silently
    & "C:\Program Files\NetBird\netbird_uninstall.exe" /S
    Start-Sleep -Seconds 5
    # Remove leftover configuration and state
    Remove-Item -Path "C:\ProgramData\Netbird" -Recurse -Force -ErrorAction SilentlyContinue
    Write-Host "NetBird removed. GlobalProtect remains active."

Level 3: Mass Uninstall via TRMM (15-30 minutes)
- Create the uninstall script above in TRMM
- Create an Automation Policy targeting all clients/sites
- Run as “Fire and Forget” on all agents
- Verify removal via TRMM script that checks for the NetBird service
- GlobalProtect continues to function — users experience zero disruption
Level 4: Infrastructure Teardown
- Stop NetBird containers on Azure VM: `docker compose down`
- Remove Honolulu routing peer: `sudo netbird down && sudo apt remove netbird`
- Remove Boulder routing peer: `sudo apt remove netbird` on the Hyper-V VM
- Delete Azure VM to stop billing (optional)
- Remove `netbird.gsisg.com` DNS record
Timeline Summary
| Day | Phase | Hours | Milestone |
|---|---|---|---|
| Mon-Thu | Phase 1: Pre-Work | ~8 hrs total | Azure VM, DNS, Docker, Entra App Registration, SSPR, TRMM script, AV exclusions, GPO rules, communications |
| Friday | Phase 2: Infrastructure | 4 hrs (6-10 PM) | Management server, OIDC, routing peers, routes, ACLs, IT self-test, Go/No-Go |
| Saturday | Phase 3: Pilot | 8 hrs (9 AM-5 PM) | 8-10 pilot users, 5 scenarios validated, Go/No-Go |
| Sunday | Phase 4: Full Deploy | 4 hrs (10 AM-2 PM) | TRMM bulk deployment to ~90 remaining endpoints, monitoring configured |
| Monday | Phase 5: Support | Full day | Monitor dashboard, catch stragglers, support users |
| Week 3+ | Phase 6: Decommission | 30+ days | Disable GP, remove --network-monitor=false, uninstall GP, decommission PA-2020 |
Total active deployment time: ~24 hours across one week (pre-work + weekend). The compressed timeline is possible because NetBird deployment is silent, additive, and GlobalProtect provides a complete fallback for every failure scenario.
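
The ~24 hour figure can be cross-checked against the tables above. The sketch below sums the hours from the timeline table and the per-task durations from the Phase 2 and Phase 3 schedules. `sum_minutes` is an illustrative helper name.

```shell
# Add up any number of minute values.
sum_minutes() {
  t=0
  for x in "$@"; do t=$((t + x)); done
  echo "$t"
}

# Hours per phase from the timeline table (pre-work is approximate).
pre_work=8; friday=4; saturday=8; sunday=4
total=$((pre_work + friday + saturday + sunday))

# Per-task durations from the Phase 2 and Phase 3 schedules.
friday_min=$(sum_minutes 15 30 5 15 45 20 15 90 5)
saturday_min=$(sum_minutes 30 30 60 60 60 60 60 60 30 30)  # lunch included

echo "Total active hours: $total"
echo "Friday: $((friday_min / 60)) h, Saturday: $((saturday_min / 60)) h"
```

The Friday and Saturday task durations add up to exactly 4 and 8 hours, so those schedules leave no slack if a task overruns.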