Azure VM Sizing and GlobalProtect Coexistence

B2s (2 vCPU / 4 GB RAM) is the recommended VM size for the NetBird management server. B1ms (1 vCPU / 2 GB) meets the official minimum for the management plane but is risky under relay load or mass reconnection events. The extra ~$15/month buys enough headroom to absorb both.

SQLite is adequate for 100-150 peers with modest policy sets. PostgreSQL becomes necessary at 300+ peers. A separate relay server is unnecessary — the embedded relay handles the ~5-15% of connections that require relaying. Total Azure cost: ~$39/month pay-as-you-go or ~$28/month with 1-year reserved instance.

Regarding GlobalProtect coexistence during migration: the issue is real but well-understood. GP triggers NetBird’s network monitor to restart the WireGuard interface, killing active sessions. The workaround (--network-monitor=false) is safe with manageable side effects. Both VPNs CAN run simultaneously because their routes do not overlap (100.64.0.0/10 overlay vs. corporate subnets).


| Factor | B1ms (1 vCPU / 2 GB) | B2s (2 vCPU / 4 GB) |
| --- | --- | --- |
| Normal operation | Sufficient | Comfortable |
| Mass reconnection event | CPU-constrained; peers queue | Handles gracefully |
| Relay under load | Risk of CPU starvation | Adequate headroom |
| Future growth (150-250 peers) | Requires migration | Handles without changes |
| Docker overhead (4 containers) | Tight on 2 GB RAM | Comfortable on 4 GB RAM |
| Monthly cost (pay-as-you-go) | ~$15.11 | ~$30.37 |
| Cost difference | Baseline | +$15.26/mo ($183/yr) |

Management plane only: B1ms is sufficient. The management server is a lightweight Go binary. Official docs confirm “1 CPU and 2 GB of memory” as the minimum. One user reports running 1,000 active users successfully.

With relay traffic: B1ms becomes risky. The v0.62+ QUIC relay is more CPU-efficient than the old Coturn relay, but relay spikes during network events can saturate a single vCPU.

Policy complexity matters more than peer count. With simple policies (5-15 for GSISG), compute overhead is trivial. The pathological case (70 peers, 480 policies requiring 8 vCPU) does not apply here.

| Component | CPU | RAM |
| --- | --- | --- |
| Management server (peer sync) | ~5-15% of 1 vCPU | ~200-400 MB |
| Signal server | <5% of 1 vCPU | ~50-100 MB |
| Embedded relay | 0-30% of 1 vCPU | ~50-150 MB |
| Dashboard + Traefik | <5% of 1 vCPU | ~100-200 MB |
| Total (normal) | ~15-50% of 1 vCPU | ~400-850 MB |
| Total (peak, all reconnecting) | ~80-100% of 1 vCPU | ~1-1.5 GB |
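As a rough cross-check of the RAM column, the sketch below sums the per-component ranges and compares the peak estimate with each VM size. The `reserved_os_mb` allowance for the OS and Docker daemon is an assumption for illustration, not a measured figure.

```python
# Per-component RAM estimates in MB (low, high), copied from the table above.
RAM_MB = {
    "management": (200, 400),
    "signal": (50, 100),
    "relay": (50, 150),
    "dashboard_traefik": (100, 200),
}


def ram_total_mb(estimates):
    """Sum the low and high ends of each component's RAM range."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return low, high


def headroom_mb(vm_ram_gb, peak_mb, reserved_os_mb=512):
    """RAM left at peak. reserved_os_mb is an ASSUMED OS/Docker allowance."""
    return vm_ram_gb * 1024 - reserved_os_mb - peak_mb


normal = ram_total_mb(RAM_MB)
print(normal)                  # (400, 850)
print(headroom_mb(2, 1536))    # B1ms at the 1.5 GB peak: 0
print(headroom_mb(4, 1536))    # B2s at the 1.5 GB peak: 2048
```

Under that assumed allowance, B1ms has no spare memory at the upper peak estimate, while B2s keeps about 2 GB free.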

A separate relay server is not needed for GSISG’s deployment.

GSISG’s network profile:

| User Category | Count | Expected Connection Type |
| --- | --- | --- |
| Office workers (Honolulu) | ~60-70 | P2P in most cases |
| Office workers (Boulder) | ~20-30 | P2P in most cases |
| Remote workers (home) | ~20-30 | P2P (home router = easy NAT) |
| Field workers (cellular) | ~5-10 | Likely relayed (CGNAT) |

Estimated relay percentage: 5-15% of active connections. At peak: 1-3 simultaneous relayed connections at 2-5 Mbps each = 2-15 Mbps total relay bandwidth. Well within B2s capacity.
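The peak relay bandwidth follows directly from the connection count and per-connection rate. A minimal sketch of that arithmetic, using only the ranges quoted above (the function name is illustrative):

```python
def relay_bandwidth_mbps(connections, per_connection_mbps):
    """Return (low, high) total relay bandwidth for the given ranges."""
    low = connections[0] * per_connection_mbps[0]
    high = connections[1] * per_connection_mbps[1]
    return low, high


# 1-3 simultaneous relayed connections at 2-5 Mbps each.
print(relay_bandwidth_mbps((1, 3), (2, 5)))  # (2, 15)
```

Even the upper bound of 15 Mbps is far below the >100 Mbps sustained level at which a separate relay is worth considering.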

A separate relay makes sense only at >50 peers simultaneously relaying, geographic distribution across Azure regions, or >100 Mbps sustained relay bandwidth. None of these apply to GSISG.


SQLite is fine for 100-150 peers. PostgreSQL is overkill at this scale.

The primary SQLite concern is events.db growth — event logging can grow to multiple GB within months with 100+ peers connecting/disconnecting daily. Mitigation: periodic archive/truncate via cron job.
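The cron-driven cleanup could look roughly like the sketch below. The table name `events` and column name `timestamp` are assumptions, as is the ISO-8601 text format for timestamps; check the real events.db schema, stop the management service, and take a backup before running anything like this against a live file. The demo runs against an in-memory database only.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def prune_events(conn, cutoff_iso, table="events", ts_column="timestamp"):
    """Delete rows older than cutoff_iso and return how many were removed.

    table and ts_column are ASSUMED names, not confirmed NetBird schema.
    """
    # Identifiers cannot be bound as SQL parameters, so validate them first.
    for name in (table, ts_column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    cur = conn.execute(
        f"DELETE FROM {table} WHERE {ts_column} < ?", (cutoff_iso,)
    )
    conn.commit()
    return cur.rowcount


# Demo against an in-memory database that uses the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, timestamp TEXT)")
now = datetime(2025, 1, 31, tzinfo=timezone.utc)
rows = [((now - timedelta(days=d)).isoformat(),) for d in (1, 10, 100, 200)]
conn.executemany("INSERT INTO events (timestamp) VALUES (?)", rows)

cutoff = (now - timedelta(days=90)).isoformat()
removed = prune_events(conn, cutoff)
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(removed, remaining)  # 2 2
```

Deleting rows does not shrink a SQLite file by itself; running `VACUUM` afterwards is what returns the freed pages to the filesystem.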

| Threshold | Action |
| --- | --- |
| <150 peers, <20 policies | Stay on SQLite |
| 150-300 peers | Consider PostgreSQL or events cleanup |
| 300+ peers OR 100+ policies | Migrate to PostgreSQL |
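The thresholds can be written as a small decision function. This is a sketch: the table leaves two cases open (exactly 300 peers, and fewer than 150 peers with 20-99 policies), so treating the first as "migrate" and the second as "consider" is an assumption.

```python
def recommend_store(peers, policies):
    """Map peer and policy counts to the action in the threshold table."""
    if peers >= 300 or policies >= 100:
        return "migrate to PostgreSQL"
    if peers < 150 and policies < 20:
        return "stay on SQLite"
    # Everything in between, including the cases the table leaves open.
    return "consider PostgreSQL or events cleanup"


print(recommend_store(120, 10))   # stay on SQLite
print(recommend_store(200, 10))   # consider PostgreSQL or events cleanup
print(recommend_store(120, 120))  # migrate to PostgreSQL
```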

| Component | Specification | Monthly | Annual |
| --- | --- | --- | --- |
| VM: Standard_B2s | 2 vCPU, 4 GB RAM, Linux | $30.37 | $364.44 |
| OS Disk: P4 Premium SSD | 32 GB | $5.28 | $63.36 |
| Public IP: Standard Static | Required for peer connectivity | $3.65 | $43.80 |
| Bandwidth (egress) | 2-6 GB/mo (within 100 GB free tier) | $0.00 | $0.00 |
| Total (pay-as-you-go) | | ~$39.30 | ~$471.60 |
| Total (1-yr reserved) | | ~$28.06 | ~$336.72 |
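The totals can be reproduced from the line items. The sketch below works in integer cents to avoid floating-point rounding; all prices are the ones listed in the table, and the dictionary keys are illustrative labels.

```python
# Monthly prices in cents, copied from the cost table.
MONTHLY_CENTS = {
    "vm_b2s": 3037,
    "os_disk_p4": 528,
    "public_ip": 365,
    "egress": 0,
}
RESERVED_MONTHLY_CENTS = 2806  # 1-year reserved total from the table


def dollars(cents):
    """Format integer cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


monthly = sum(MONTHLY_CENTS.values())
annual = monthly * 12
reserved_savings = (monthly - RESERVED_MONTHLY_CENTS) * 12

print(dollars(monthly))           # $39.30
print(dollars(annual))            # $471.60
print(dollars(reserved_savings))  # $134.88
```

On these figures, the 1-year reservation saves about $135 per year over pay-as-you-go.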

When a user activates GlobalProtect while NetBird is running:

  1. GP creates “PANGP Virtual Ethernet Adapter Secure” and adds routes
  2. NetBird’s network monitor detects the route change as a “significant network change”
  3. NetBird restarts the WireGuard interface, destroying all active TCP sessions
  4. NetBird reconnects, but established SSH/RDP sessions are lost

Root cause: NetBird’s network monitor watches for default route changes and does not distinguish between physical network changes and VPN virtual adapter route additions.

| Scenario | With network monitor (default) | With --network-monitor=false |
| --- | --- | --- |
| Switch Wi-Fi to Ethernet | Auto-reconnects in seconds | May take 25+ seconds |
| Switch between Wi-Fi networks | Auto-reconnects in seconds | May lose connection; manual toggle needed |
| VPN connects/disconnects | Interface restart (the bug) | No disruption (desired behavior) |
| Resume from sleep/hibernate | Quick reconnection | May require manual reconnection |

Assessment for GSISG migration: Side effects are manageable. Most users are on stable office Ethernet or home Wi-Fi. The 25-second WireGuard keepalive timeout provides passive reconnection.

| PR | Status | Notes |
| --- | --- | --- |
| #5155 | CLOSED (not merged) | Test failures and regression risk |
| #5156 | OPEN (not merged) | Passed quality gate; awaiting maintainer approval |

No version includes the fix yet. --network-monitor=false remains the only workaround.

The routes do NOT conflict:

  • GlobalProtect: Corporate subnets (e.g., 10.x.x.x)
  • NetBird: Overlay network (100.64.0.0/10) + configured routes

NetBird does NOT add a default route (unless an exit node is configured). If GP is in full-tunnel mode (0.0.0.0/0), add the NetBird management server IP to GP’s split-tunnel exclusion list.
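Route overlap can be checked before rollout with Python's standard `ipaddress` module. In this sketch, `10.0.0.0/8` stands in for the corporate subnets (the source only says "10.x.x.x") and `10.20.0.0/16` is a hypothetical NetBird-configured office route; substitute the real prefixes.

```python
import ipaddress


def overlapping_routes(gp_routes, netbird_routes):
    """Return (gp, netbird) prefix pairs whose address ranges overlap."""
    gp = [ipaddress.ip_network(r) for r in gp_routes]
    nb = [ipaddress.ip_network(r) for r in netbird_routes]
    return [(str(a), str(b)) for a in gp for b in nb if a.overlaps(b)]


# Split-tunnel GP (assumed corporate prefix) vs. the NetBird overlay.
print(overlapping_routes(["10.0.0.0/8"], ["100.64.0.0/10"]))
# []

# Full-tunnel GP covers every prefix, including the overlay.
print(overlapping_routes(["0.0.0.0/0"], ["100.64.0.0/10"]))
# [('0.0.0.0/0', '100.64.0.0/10')]

# A NetBird-configured route to a hypothetical office subnet does overlap.
print(overlapping_routes(["10.0.0.0/8"], ["100.64.0.0/10", "10.20.0.0/16"]))
# [('10.0.0.0/8', '10.20.0.0/16')]
```

The last case is worth checking once routing peers advertise office subnets, because both VPNs then offer a path to the same destinations.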

For DNS: configure NetBird with match-domain DNS for the AD domain; leave primary DNS to GP or the system default.


Phase 1: Install NetBird (GP Remains Primary) — Week 1-2

  • Deploy management server, configure Entra ID
  • Install NetBird on pilot machines with --network-monitor=false
  • Verify overlay connectivity; GP handles all production traffic

Phase 2: Configure Routing (GP Still Active) — Week 2-3

  • Deploy routing peers at both offices
  • Pilot users have TWO paths to resources
  • Test disconnecting GP on pilot machines

Phase 3: Expand to All Users (GP as Fallback) — Week 3-5

  • Deploy via TacticalRMM to all endpoints
  • GP remains installed and functional

Phase 4: Disable GP (NetBird Primary) — Week 5-8

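  • Disable GP auto-connect (do NOT uninstall)
  • Keep --network-monitor=false while GP is still installed, unless PR #5156 has merged and the deployed client includes the fix
  • Monitor for 2 weeks
  • Uninstall GP from all endpoints
  • Remove --network-monitor=false from any endpoints that still have it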
  • Power down PA-2020 (retain 30 days before decommission)

Critical Rules:

  1. NEVER remove GP before NetBird is verified for all users
  2. Always use --network-monitor=false while both VPNs installed
  3. Test rollback before expanding beyond pilot
  4. Field workers on cellular should be in a later wave
  5. Keep PA-2020 powered on for 30 days after full migration

| Gap | Impact | Mitigation |
| --- | --- | --- |
| PR #5156 merge timeline | MEDIUM | Flag is safe to run indefinitely |
| SQLite events.db growth rate | LOW | Implement cleanup cron job |
| GP full-tunnel vs split-tunnel mode | MEDIUM | Ask GSISG IT admin |
| Exact Azure reserved pricing | LOW | Use Azure Pricing Calculator |

GitHub: #5077, #5155, #5156, #4488, #1473, #1824

Azure Pricing: cloudprice.net, Azure bandwidth pricing, Azure managed disks pricing

Community: HN NetBird discussion, carlpearson.net self-hosting guide, Cloudron forum

Official: docs.netbird.io (scaling, NAT, how-netbird-works, CLI reference)