Azure VM Sizing and GlobalProtect Coexistence
Executive Summary
B2s (2 vCPU / 4 GB RAM) is the correct VM size for the NetBird management server. B1ms (1 vCPU / 2 GB) meets the official minimum for the management plane but is risky under relay load or mass reconnection events. The extra $15/month eliminates scaling risk.
SQLite is adequate for 100-150 peers with modest policy sets. PostgreSQL becomes necessary at 300+ peers. A separate relay server is unnecessary — the embedded relay handles the ~5-15% of connections that require relaying. Total Azure cost: ~$39/month pay-as-you-go or ~$28/month with 1-year reserved instance.
Regarding GlobalProtect coexistence during migration: the issue is real but well-understood. GP triggers NetBird’s network monitor to restart the WireGuard interface, killing active sessions. The workaround (--network-monitor=false) is safe with manageable side effects. Both VPNs CAN run simultaneously because their routes do not overlap (100.64.0.0/10 overlay vs. corporate subnets).
Azure VM Sizing for 100-150 Peers
Why B2s Wins
| Factor | B1ms (1 vCPU / 2 GB) | B2s (2 vCPU / 4 GB) |
|---|---|---|
| Normal operation | Sufficient | Comfortable |
| Mass reconnection event | CPU-constrained; peers queue | Handles gracefully |
| Relay under load | Risk of CPU starvation | Adequate headroom |
| Future growth (150-250 peers) | Requires migration | Handles without changes |
| Docker overhead (4 containers) | Tight on 2 GB RAM | Comfortable on 4 GB RAM |
| Monthly cost (pay-as-you-go) | ~$15.11 | ~$30.37 |
| Cost difference | Baseline | +$15.26/mo ($183/yr) |
Management plane only: B1ms is sufficient. The management server is a lightweight Go binary. Official docs confirm “1 CPU and 2 GB of memory” as the minimum. One user reports running 1,000 active users successfully.
With relay traffic: B1ms becomes risky. The v0.62+ QUIC relay is more CPU-efficient than the old Coturn relay, but relay spikes during network events can saturate a single vCPU.
Policy complexity matters more than peer count. With simple policies (5-15 for GSISG), compute overhead is trivial. The pathological case (70 peers, 480 policies requiring 8 vCPU) does not apply here.
Resource Consumption Estimates
| Component | CPU | RAM |
|---|---|---|
| Management server (peer sync) | ~5-15% of 1 vCPU | ~200-400 MB |
| Signal server | <5% of 1 vCPU | ~50-100 MB |
| Embedded relay | 0-30% of 1 vCPU | ~50-150 MB |
| Dashboard + Traefik | <5% of 1 vCPU | ~100-200 MB |
| Total (normal) | ~15-50% of 1 vCPU | ~400-850 MB |
| Total (peak, all reconnecting) | ~80-100% of 1 vCPU | ~1-1.5 GB |
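The totals can be cross-checked with a short sketch. It sums the per-component ranges from the table, treating "<5%" as 0-5 (an assumption), and compares the table's peak estimate against each VM size. The function names are illustrative, and host OS and Docker daemon overhead are not modelled.

```python
# Per-component estimates from the table above:
# (cpu_low, cpu_high) as % of one vCPU, (ram_low, ram_high) in MB.
# "<5%" is modelled as the range 0-5, which is an assumption.
COMPONENTS = {
    "management": ((5, 15), (200, 400)),
    "signal": ((0, 5), (50, 100)),
    "relay": ((0, 30), (50, 150)),
    "dashboard_traefik": ((0, 5), (100, 200)),
}


def total_range(index):
    """Sum the low and high ends of one metric (0 = CPU, 1 = RAM)."""
    low = sum(c[index][0] for c in COMPONENTS.values())
    high = sum(c[index][1] for c in COMPONENTS.values())
    return low, high


def peak_headroom(vcpus, ram_gb, peak_cpu_pct=100, peak_ram_mb=1500):
    """Spare CPU (% of one vCPU) and spare RAM (MB) at the table's peak estimate."""
    return vcpus * 100 - peak_cpu_pct, ram_gb * 1024 - peak_ram_mb


cpu_normal = total_range(0)   # (5, 55)
ram_normal = total_range(1)   # (400, 850)
b1ms = peak_headroom(1, 2)    # (0, 548): no CPU left at peak
b2s = peak_headroom(2, 4)     # (100, 2596): one full vCPU spare at peak
```

The straight CPU sum (5-55%) is slightly wider than the table's rounded ~15-50%, since the components are unlikely to sit at their extremes at the same moment. The RAM sum matches the table exactly.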
Embedded Relay Sufficiency
A separate relay server is not needed for GSISG’s deployment.
GSISG’s network profile:
| User Category | Count | Expected Connection Type |
|---|---|---|
| Office workers (Honolulu) | ~60-70 | P2P in most cases |
| Office workers (Boulder) | ~20-30 | P2P in most cases |
| Remote workers (home) | ~20-30 | P2P (home router = easy NAT) |
| Field workers (cellular) | ~5-10 | Likely relayed (CGNAT) |
Estimated relay percentage: 5-15% of active connections. At peak: 1-3 simultaneous relayed connections at 2-5 Mbps each = 2-15 Mbps total relay bandwidth. Well within B2s capacity.
A separate relay makes sense only at >50 peers simultaneously relaying, geographic distribution across Azure regions, or >100 Mbps sustained relay bandwidth. None of these apply to GSISG.
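As a rough sketch, the relay estimate and the separate-relay criteria can be written as two small functions. The function names are illustrative; the thresholds are the ones stated in the text.

```python
def relay_bandwidth_mbps(connections, mbps_each):
    """Multiply (low, high) connection counts by (low, high) per-connection rates."""
    return connections[0] * mbps_each[0], connections[1] * mbps_each[1]


def needs_separate_relay(relaying_peers, sustained_mbps, multi_region):
    """Apply the three criteria from the text; any one of them is enough."""
    return relaying_peers > 50 or sustained_mbps > 100 or multi_region


# 1-3 simultaneous relayed connections at 2-5 Mbps each.
low, high = relay_bandwidth_mbps((1, 3), (2, 5))   # (2, 15)

# GSISG at its estimated peak: 3 relayed peers, 15 Mbps, single region.
gsisg_needs_relay = needs_separate_relay(3, high, multi_region=False)   # False
```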
SQLite vs. PostgreSQL
SQLite is fine for 100-150 peers. PostgreSQL is overkill.
The primary SQLite concern is events.db growth — event logging can grow to multiple GB within months with 100+ peers connecting/disconnecting daily. Mitigation: periodic archive/truncate via cron job.
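A minimal sketch of such a cleanup job follows. The table name `events`, the column name `timestamp`, and the ISO-8601 UTC string format are assumptions rather than confirmed details of the NetBird schema; inspect the real events.db before using anything like this. Running it against a backup copy, or with the management service stopped, is the cautious choice.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def prune_events(db_path, keep_days=90, table="events", ts_column="timestamp"):
    """Delete rows older than keep_days, then reclaim disk space.

    `table` and `ts_column` are assumed names; verify them against the
    real schema. Timestamps are assumed to be ISO-8601 UTC strings, so
    plain string comparison orders them chronologically.
    """
    cutoff = (datetime.now(timezone.utc) - timedelta(days=keep_days)).isoformat()
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                f'DELETE FROM "{table}" WHERE "{ts_column}" < ?', (cutoff,)
            )
            deleted = cur.rowcount
        conn.execute("VACUUM")  # shrink the file; must run outside a transaction
        return deleted
    finally:
        conn.close()
```

A cron entry could call `prune_events` with the path to the events database once a week. The retention period of 90 days is a placeholder, not a recommendation from the source.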
| Threshold | Action |
|---|---|
| <150 peers, <20 policies | Stay on SQLite |
| 150-300 peers | Consider PostgreSQL or events cleanup |
| 300+ peers OR 100+ policies | Migrate to PostgreSQL |
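The thresholds can be expressed as a small decision function. The table does not say what to do with fewer than 150 peers but 20-99 policies; the sketch treats that case as "consider", which is an assumption. The function name is illustrative.

```python
def recommend_store(peers, policies):
    """Map peer and policy counts to the action from the threshold table."""
    if peers >= 300 or policies >= 100:
        return "Migrate to PostgreSQL"
    # Assumption: 20-99 policies with <150 peers is not covered by the
    # table; it is treated conservatively as the middle tier.
    if peers >= 150 or policies >= 20:
        return "Consider PostgreSQL or events cleanup"
    return "Stay on SQLite"


gsisg = recommend_store(peers=150 - 1, policies=15)   # "Stay on SQLite"
```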
Azure Cost Breakdown
| Component | Specification | Monthly | Annual |
|---|---|---|---|
| VM: Standard_B2s | 2 vCPU, 4 GB RAM, Linux | $30.37 | $364.44 |
| OS Disk: P4 Premium SSD | 32 GB | $5.28 | $63.36 |
| Public IP: Standard Static | Required for peer connectivity | $3.65 | $43.80 |
| Bandwidth (egress) | 2-6 GB/mo (within 100 GB free tier) | $0.00 | $0.00 |
| Total (pay-as-you-go) | | ~$39.30 | ~$471.60 |
| Total (1-yr reserved) | | ~$28.06 | ~$336.72 |
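The totals can be reproduced from the line items. The sketch below uses `Decimal` to avoid floating-point rounding; the dictionary keys are illustrative labels, and the prices are the ones in the table.

```python
from decimal import Decimal

# Monthly pay-as-you-go prices from the table above (USD).
MONTHLY = {
    "vm_b2s": Decimal("30.37"),
    "os_disk_p4": Decimal("5.28"),
    "public_ip": Decimal("3.65"),
    "egress": Decimal("0.00"),
}

monthly_total = sum(MONTHLY.values(), Decimal("0"))   # 39.30
annual_total = monthly_total * 12                     # 471.60

# Reserved-instance monthly total as quoted in the table.
reserved_monthly = Decimal("28.06")
reserved_annual = reserved_monthly * 12                       # 336.72
reserved_saving_annual = annual_total - reserved_annual       # 134.88
```

On these figures, a 1-year reservation saves about $135 per year over pay-as-you-go.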
GlobalProtect + NetBird Coexistence
The Problem (GitHub #5077)
When a user activates GlobalProtect while NetBird is running:
- GP creates “PANGP Virtual Ethernet Adapter Secure” and adds routes
- NetBird’s network monitor detects the route change as a “significant network change”
- NetBird restarts the WireGuard interface, destroying all active TCP sessions
- NetBird reconnects, but established SSH/RDP sessions are lost
Root cause: NetBird’s network monitor watches for default route changes and does not distinguish between physical network changes and VPN virtual adapter route additions.
The --network-monitor=false Workaround
| Scenario | With network-monitor (default) | With --network-monitor=false |
|---|---|---|
| Switch Wi-Fi to Ethernet | Auto-reconnects in seconds | May take 25+ seconds |
| Switch between Wi-Fi networks | Auto-reconnects in seconds | May lose connection; manual toggle needed |
| VPN connects/disconnects | Interface restart (the bug) | No disruption (desired behavior) |
| Resume from sleep/hibernate | Quick reconnection | May require manual reconnection |
Assessment for GSISG migration: Side effects are manageable. Most users are on stable office Ethernet or home Wi-Fi. The 25-second WireGuard keepalive timeout provides passive reconnection.
PR Status (March 2026)
| PR | Status | Notes |
|---|---|---|
| #5155 | CLOSED (not merged) | Test failures and regression risk |
| #5156 | OPEN (not merged) | Passed quality gate; awaiting maintainer approval |
No version includes the fix yet. --network-monitor=false remains the only workaround.
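For scripted rollout, for example through an RMM tool, the workaround could be applied with a small wrapper like the sketch below. It assumes the `netbird` CLI is on the PATH and that the client is reconnected with `netbird down` followed by `netbird up`; the function names are illustrative and the reconnect sequence is an assumption, not a documented procedure.

```python
import subprocess


def netbird_up_command(disable_network_monitor=True):
    """Build the `netbird up` argument list, optionally with the workaround flag."""
    cmd = ["netbird", "up"]
    if disable_network_monitor:
        cmd.append("--network-monitor=false")
    return cmd


def apply_workaround(run=subprocess.run):
    """Reconnect with the network monitor disabled.

    `run` is injectable so the sequence can be tested without a
    NetBird client installed.
    """
    run(["netbird", "down"], check=True)
    run(netbird_up_command(True), check=True)
```

Calling `netbird_up_command(False)` yields the plain `netbird up` command, which is what the later phases would switch back to once a fixed release is available.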
Routing and DNS Coexistence
The routes do NOT conflict:
- GlobalProtect: Corporate subnets (e.g., 10.x.x.x)
- NetBird: Overlay network (100.64.0.0/10) + configured routes
NetBird does NOT add a default route (unless exit node is configured). If GP is in full-tunnel mode (0.0.0.0/0), add NetBird management server IP to GP’s split-tunnel exclusion list.
For DNS: configure NetBird with match-domain DNS for the AD domain; leave primary DNS to GP or the system default.
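The non-overlap claim can be checked with Python's standard `ipaddress` module. In this sketch, `10.0.0.0/8` stands in for the corporate subnets and is an assumption; substitute the routes GP actually pushes.

```python
import ipaddress


def overlapping(routes_a, routes_b):
    """Return every pair of CIDR routes, one from each list, that overlap."""
    nets_a = [ipaddress.ip_network(r) for r in routes_a]
    nets_b = [ipaddress.ip_network(r) for r in routes_b]
    return [(str(a), str(b)) for a in nets_a for b in nets_b if a.overlaps(b)]


netbird_routes = ["100.64.0.0/10"]   # NetBird overlay range
gp_split_tunnel = ["10.0.0.0/8"]     # example corporate range (assumption)
gp_full_tunnel = ["0.0.0.0/0"]       # GP in full-tunnel mode

split_conflicts = overlapping(gp_split_tunnel, netbird_routes)   # []
full_conflicts = overlapping(gp_full_tunnel, netbird_routes)
# [("0.0.0.0/0", "100.64.0.0/10")]
```

A split-tunnel GP configuration produces no overlap. A full-tunnel configuration covers every address, which is why the management server IP needs an entry in GP's exclusion list in that mode.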
Recommended Migration Sequence
Phase 1: Install NetBird (GP Remains Primary), Weeks 1-2
- Deploy management server, configure Entra ID
- Install NetBird on pilot machines with --network-monitor=false
- Verify overlay connectivity; GP handles all production traffic
Phase 2: Configure Routing (GP Still Active), Weeks 2-3
- Deploy routing peers at both offices
- Pilot users have TWO paths to resources
- Test disconnecting GP on pilot machines
Phase 3: Expand to All Users (GP as Fallback), Weeks 3-5
- Deploy via TacticalRMM to all endpoints
- GP remains installed and functional
Phase 4: Disable GP (NetBird Primary), Weeks 5-8
- Disable GP auto-connect (do NOT uninstall)
- Remove --network-monitor=false if PR #5156 has merged
- Monitor for 2 weeks
Phase 5: Decommission GP, Week 8+
- Uninstall GP from all endpoints
- Remove --network-monitor=false flag
- Power down PA-2020 (retain 30 days before decommission)
Critical Rules:
- NEVER remove GP before NetBird is verified for all users
- Always use --network-monitor=false while both VPNs are installed
- Test rollback before expanding beyond pilot
- Field workers on cellular should be in a later wave
- Keep PA-2020 powered on for 30 days after full migration
Gaps & Uncertainties
| Gap | Impact | Mitigation |
|---|---|---|
| PR #5156 merge timeline | MEDIUM | Flag is safe to run indefinitely |
| SQLite events.db growth rate | LOW | Implement cleanup cron job |
| GP full-tunnel vs split-tunnel mode | MEDIUM | Ask GSISG IT admin |
| Exact Azure reserved pricing | LOW | Use Azure Pricing Calculator |
Sources
GitHub: #5077, #5155, #5156, #4488, #1473, #1824
Azure Pricing: cloudprice.net, Azure bandwidth pricing, Azure managed disks pricing
Community: HN NetBird discussion, carlpearson.net self-hosting guide, Cloudron forum
Official: docs.netbird.io (scaling, NAT, how-netbird-works, CLI reference)