Blog
Enterprise
Oct 3, 2025
·
20
minutes read

Setting Up Production-Ready Windows VMs for RPA: The Complete Guide

You've built your RPA script. It works beautifully on your local machine. Now you need to deploy it to a VM that runs 24/7, handles failures gracefully, and doesn't require you to RDP in every morning to check if it's still alive.

Here's how to set up Windows VMs for RPA the right way with proper networking, monitoring, health checks, and bulletproof process management.

Part 1: Provisioning Your Windows VM

Choosing Your Cloud Provider and Instance Type

For RPA workloads, you want:

  • Windows Server 2019 or 2022 (not Windows 10/11 licensing gets expensive)
  • At least 2 vCPUs and 4GB RAM (more if running heavy applications)
  • 50GB+ storage (applications, logs, recordings accumulate fast)
  • A static public IP or stable hostname

Provider recommendations:

AWS EC2: t3.medium or t3.large Windows instances. Easy spot instance support for cost savings.

Azure: B2s or B2ms VMs. Native Windows environment, good RDP experience.

Google Cloud: e2-medium with Windows Server. Solid performance, competitive pricing.

DigitalOcean: Simple droplets starting at $24/month for Windows. Great for smaller workloads.

Initial VM Configuration

Once your VM is running, here's the setup checklist:

1. Set a strong password and enable RDP access (temporarily)

# From your local machine, RDP in
mstsc /v:your-vm-ip

2. Windows Updates (get this pain over with early)

# Check for updates
Start-Process ms-settings:windowsupdate

# Or via PowerShell
Install-Module PSWindowsUpdate
Get-WindowsUpdate
Install-WindowsUpdate -AcceptAll -AutoReboot

3. Install essential software

# Install Chocolatey (Windows package manager)
Set-ExecutionPolicy Bypass -Scope Process -Force
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072
iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))

# Install Python
choco install python -y

# Install Git
choco install git -y

# Refresh environment variables
refreshenv

# Verify installations
python --version
git --version

4. Disable unnecessary services to save resources

# Disable Windows Search (RPA VMs don't need it)
Stop-Service "WSearch" -Force
Set-Service "WSearch" -StartupType Disabled

# Disable Windows Update during business hours (configure maintenance windows instead)
# We'll handle updates manually during off-hours

5. Configure Windows Firewall

# We'll open port 5000 for our Flask API (change as needed)
New-NetFirewallRule -DisplayName "RPA API" -Direction Inbound -Protocol TCP -LocalPort 5000 -Action Allow

# Verify the rule
Get-NetFirewallRule -DisplayName "RPA API"

6. Set up a dedicated service account

# Create a service account for running RPA scripts
$Password = ConvertTo-SecureString "YourStrongPassword123!" -AsPlainText -Force
New-LocalUser "RPAService" -Password $Password -FullName "RPA Service Account" -Description "Account for running RPA automations"

# Add to appropriate groups
Add-LocalGroupMember -Group "Users" -Member "RPAService"

# Grant logon as service right (needed for NSSM later)
# This requires secpol.msc or a script - we'll handle it when setting up NSSM

Part 2: Building the Control API with Flask

Your RPA scripts shouldn't just run blindly. You need a way to:

  • Start/stop scripts remotely
  • Check if the script is running
  • View recent logs
  • Get health status

Create a Flask API to expose these capabilities securely.

Setting Up the Flask Application

Create a project structure:

C:\RPA\
├── api\
│   ├── app.py
│   ├── config.py
│   ├── auth.py
│   └── requirements.txt
├── scripts\
│   ├── your_rpa_script.py
│   └── process_invoices.py
├── logs\
└── data\

requirements.txt:

flask==3.0.0
flask-cors==4.0.0
python-dotenv==1.0.0
opentelemetry-api==1.21.0
opentelemetry-sdk==1.21.0
opentelemetry-exporter-otlp==1.21.0
opentelemetry-instrumentation-flask==0.42b0
psutil==5.9.6

config.py:

import os
from dotenv import load_dotenv

load_dotenv()

class Config:
   # Security
   API_KEY = os.getenv('API_KEY', 'change-this-in-production')
   SECRET_KEY = os.getenv('SECRET_KEY', 'another-secret-key')
   
   # Paths
   SCRIPTS_DIR = r'C:\RPA\scripts'
   LOGS_DIR = r'C:\RPA\logs'
   
   # OpenTelemetry
   OTEL_ENDPOINT = os.getenv('OTEL_ENDPOINT', 'http://localhost:4318')
   SERVICE_NAME = os.getenv('SERVICE_NAME', 'rpa-vm-api')

auth.py:

from functools import wraps
from flask import request, jsonify
from config import Config

def require_api_key(f):
   """Decorator to require API key authentication"""
   @wraps(f)
   def decorated_function(*args, **kwargs):
       api_key = request.headers.get('X-API-Key')
       
       if not api_key:
           return jsonify({'error': 'API key required'}), 401
       
       if api_key != Config.API_KEY:
           return jsonify({'error': 'Invalid API key'}), 403
       
       return f(*args, **kwargs)
   
   return decorated_function

app.py:

from flask import Flask, jsonify, request
from flask_cors import CORS
import subprocess
import psutil
import os
import json
from datetime import datetime
from pathlib import Path
import logging

from config import Config
from auth import require_api_key

# OpenTelemetry setup
from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.sdk.resources import Resource

# Initialize OpenTelemetry
resource = Resource.create({"service.name": Config.SERVICE_NAME})

# Tracing
trace_provider = TracerProvider(resource=resource)
otlp_trace_exporter = OTLPSpanExporter(endpoint=Config.OTEL_ENDPOINT)
trace_provider.add_span_processor(BatchSpanProcessor(otlp_trace_exporter))
trace.set_tracer_provider(trace_provider)

# Metrics
metric_reader = PeriodicExportingMetricReader(
   OTLPMetricExporter(endpoint=Config.OTEL_ENDPOINT)
)
metrics.set_meter_provider(MeterProvider(resource=resource, metric_readers=[metric_reader]))

# Flask app
app = Flask(__name__)
app.config.from_object(Config)
CORS(app)

# Instrument Flask with OpenTelemetry
FlaskInstrumentor().instrument_app(app)

# Set up logging
logging.basicConfig(
   level=logging.INFO,
   format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
   handlers=[
       logging.FileHandler(os.path.join(Config.LOGS_DIR, 'api.log')),
       logging.StreamHandler()
   ]
)
logger = logging.getLogger(__name__)

# Track running processes
running_processes = {}

@app.route('/health', methods=['GET'])
def health_check():
   """
   Public health check endpoint
   Returns system status and basic metrics
   """
   try:
       # Get system metrics
       cpu_percent = psutil.cpu_percent(interval=1)
       memory = psutil.virtual_memory()
       disk = psutil.disk_usage('C:\\')
       
       # Check if any scripts are running
       scripts_running = len(running_processes)
       
       health_status = {
           'status': 'healthy',
           'timestamp': datetime.now().isoformat(),
           'system': {
               'cpu_percent': cpu_percent,
               'memory_percent': memory.percent,
               'memory_available_gb': round(memory.available / (1024**3), 2),
               'disk_percent': disk.percent,
               'disk_free_gb': round(disk.free / (1024**3), 2)
           },
           'rpa': {
               'scripts_running': scripts_running,
               'scripts': list(running_processes.keys())
           }
       }
       
       # Set unhealthy if resources are critically low
       if cpu_percent > 95 or memory.percent > 95 or disk.percent > 90:
           health_status['status'] = 'unhealthy'
           health_status['warnings'] = []
           
           if cpu_percent > 95:
               health_status['warnings'].append('CPU usage critical')
           if memory.percent > 95:
               health_status['warnings'].append('Memory usage critical')
           if disk.percent > 90:
               health_status['warnings'].append('Disk space low')
       
       status_code = 200 if health_status['status'] == 'healthy' else 503
       return jsonify(health_status), status_code
       
   except Exception as e:
       logger.error(f"Health check failed: {e}")
       return jsonify({
           'status': 'unhealthy',
           'error': str(e),
           'timestamp': datetime.now().isoformat()
       }), 503

@app.route('/api/scripts', methods=['GET'])
@require_api_key
def list_scripts():
   """List all available RPA scripts"""
   try:
       scripts_dir = Path(Config.SCRIPTS_DIR)
       scripts = []
       
       for script_file in scripts_dir.glob('*.py'):
           # Get script metadata if available
           script_info = {
               'name': script_file.stem,
               'filename': script_file.name,
               'path': str(script_file),
               'size_kb': round(script_file.stat().st_size / 1024, 2),
               'modified': datetime.fromtimestamp(script_file.stat().st_mtime).isoformat(),
               'is_running': script_file.stem in running_processes
           }
           scripts.append(script_info)
       
       return jsonify({
           'scripts': scripts,
           'count': len(scripts)
       })
       
   except Exception as e:
       logger.error(f"Error listing scripts: {e}")
       return jsonify({'error': str(e)}), 500

@app.route('/api/scripts/<script_name>/start', methods=['POST'])
@require_api_key
def start_script(script_name):
   """Start an RPA script"""
   try:
       if script_name in running_processes:
           return jsonify({
               'error': f'Script {script_name} is already running',
               'pid': running_processes[script_name]['pid']
           }), 400
       
       script_path = Path(Config.SCRIPTS_DIR) / f'{script_name}.py'
       
       if not script_path.exists():
           return jsonify({'error': f'Script {script_name} not found'}), 404
       
       # Get parameters from request
       params = request.json.get('parameters', {}) if request.json else {}
       
       # Start the script as a subprocess
       cmd = ['python', str(script_path)]
       
       # Add parameters as command line args
       for key, value in params.items():
           cmd.extend([f'--{key}', str(value)])
       
       logger.info(f"Starting script: {script_name} with command: {' '.join(cmd)}")
       
       process = subprocess.Popen(
           cmd,
           stdout=subprocess.PIPE,
           stderr=subprocess.PIPE,
           cwd=Config.SCRIPTS_DIR
       )
       
       # Track the process
       running_processes[script_name] = {
           'pid': process.pid,
           'started_at': datetime.now().isoformat(),
           'process': process,
           'parameters': params
       }
       
       logger.info(f"Script {script_name} started with PID {process.pid}")
       
       return jsonify({
           'message': f'Script {script_name} started successfully',
           'pid': process.pid,
           'started_at': running_processes[script_name]['started_at']
       }), 200
       
   except Exception as e:
       logger.error(f"Error starting script {script_name}: {e}")
       return jsonify({'error': str(e)}), 500

@app.route('/api/scripts/<script_name>/stop', methods=['POST'])
@require_api_key
def stop_script(script_name):
   """Stop a running RPA script"""
   try:
       if script_name not in running_processes:
           return jsonify({'error': f'Script {script_name} is not running'}), 400
       
       process_info = running_processes[script_name]
       process = process_info['process']
       
       # Try graceful termination first
       process.terminate()
       
       try:
           process.wait(timeout=10)
           logger.info(f"Script {script_name} terminated gracefully")
       except subprocess.TimeoutExpired:
           # Force kill if it doesn't terminate
           process.kill()
           logger.warning(f"Script {script_name} force killed")
       
       # Remove from tracking
       del running_processes[script_name]
       
       return jsonify({
           'message': f'Script {script_name} stopped successfully',
           'pid': process_info['pid']
       }), 200
       
   except Exception as e:
       logger.error(f"Error stopping script {script_name}: {e}")
       return jsonify({'error': str(e)}), 500

@app.route('/api/scripts/<script_name>/status', methods=['GET'])
@require_api_key
def script_status(script_name):
   """Get the status of a specific script"""
   try:
       if script_name not in running_processes:
           return jsonify({
               'script': script_name,
               'is_running': False
           })
       
       process_info = running_processes[script_name]
       process = process_info['process']
       
       # Check if process is still alive
       poll = process.poll()
       
       if poll is not None:
           # Process has finished
           del running_processes[script_name]
           return jsonify({
               'script': script_name,
               'is_running': False,
               'exit_code': poll
           })
       
       # Process is still running
       try:
           proc = psutil.Process(process_info['pid'])
           cpu_percent = proc.cpu_percent(interval=0.1)
           memory_mb = proc.memory_info().rss / (1024 * 1024)
           
           return jsonify({
               'script': script_name,
               'is_running': True,
               'pid': process_info['pid'],
               'started_at': process_info['started_at'],
               'parameters': process_info.get('parameters', {}),
               'resources': {
                   'cpu_percent': round(cpu_percent, 2),
                   'memory_mb': round(memory_mb, 2)
               }
           })
       except psutil.NoSuchProcess:
           del running_processes[script_name]
           return jsonify({
               'script': script_name,
               'is_running': False
           })
       
   except Exception as e:
       logger.error(f"Error getting script status: {e}")
       return jsonify({'error': str(e)}), 500

@app.route('/api/logs/<script_name>', methods=['GET'])
@require_api_key
def get_logs(script_name):
   """Get recent logs for a script"""
   try:
       log_file = Path(Config.LOGS_DIR) / f'{script_name}.log'
       
       if not log_file.exists():
           return jsonify({'error': f'Log file for {script_name} not found'}), 404
       
       # Get number of lines to return (default 100)
       lines = request.args.get('lines', 100, type=int)
       
       # Read last N lines
       with open(log_file, 'r') as f:
           all_lines = f.readlines()
           recent_lines = all_lines[-lines:] if len(all_lines) > lines else all_lines
       
       return jsonify({
           'script': script_name,
           'lines': recent_lines,
           'total_lines': len(all_lines)
       })
       
   except Exception as e:
       logger.error(f"Error reading logs: {e}")
       return jsonify({'error': str(e)}), 500

@app.route('/api/metrics', methods=['GET'])
@require_api_key
def get_metrics():
   """Get detailed system and RPA metrics"""
   try:
       # System metrics
       cpu_percent = psutil.cpu_percent(interval=1, percpu=True)
       memory = psutil.virtual_memory()
       disk = psutil.disk_usage('C:\\')
       
       # Network I/O
       net_io = psutil.net_io_counters()
       
       # Running processes details
       scripts_detail = []
       for script_name, info in running_processes.items():
           try:
               proc = psutil.Process(info['pid'])
               scripts_detail.append({
                   'name': script_name,
                   'pid': info['pid'],
                   'cpu_percent': proc.cpu_percent(interval=0.1),
                   'memory_mb': round(proc.memory_info().rss / (1024 * 1024), 2),
                   'started_at': info['started_at']
               })
           except psutil.NoSuchProcess:
               pass
       
       return jsonify({
           'timestamp': datetime.now().isoformat(),
           'system': {
               'cpu_percent_per_core': cpu_percent,
               'cpu_percent_avg': round(sum(cpu_percent) / len(cpu_percent), 2),
               'memory': {
                   'total_gb': round(memory.total / (1024**3), 2),
                   'available_gb': round(memory.available / (1024**3), 2),
                   'used_gb': round(memory.used / (1024**3), 2),
                   'percent': memory.percent
               },
               'disk': {
                   'total_gb': round(disk.total / (1024**3), 2),
                   'used_gb': round(disk.used / (1024**3), 2),
                   'free_gb': round(disk.free / (1024**3), 2),
                   'percent': disk.percent
               },
               'network': {
                   'bytes_sent_mb': round(net_io.bytes_sent / (1024**2), 2),
                   'bytes_recv_mb': round(net_io.bytes_recv / (1024**2), 2)
               }
           },
           'rpa': {
               'scripts_running': len(running_processes),
               'scripts': scripts_detail
           }
       })
       
   except Exception as e:
       logger.error(f"Error getting metrics: {e}")
       return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
   # Ensure log directory exists
   os.makedirs(Config.LOGS_DIR, exist_ok=True)
   
   logger.info("Starting RPA Control API")
   
   # Run Flask app
   # In production, use a proper WSGI server like waitress
   app.run(host='0.0.0.0', port=5000, debug=False)

Create a .env file:

API_KEY=your-super-secret-api-key-change-this
SECRET_KEY=another-secret-key-for-flask
OTEL_ENDPOINT=http://localhost:4318
SERVICE_NAME=rpa-vm-api

Install dependencies:

cd C:\RPA\api
pip install -r requirements.txt

Test the API locally:

python app.py

Try the health check:

curl http://localhost:5000/health

Part 3: Making It Production-Ready with NSSM

NSSM (Non-Sucking Service Manager) turns your Python script into a proper Windows service that:

  • Starts automatically on boot
  • Restarts on failure
  • Runs in the background
  • Logs output properly

Installing and Configuring NSSM

1. Install NSSM:

choco install nssm -y

2. Create the service:

# Navigate to your API directory
cd C:\RPA\api

# Install as Windows service
nssm install RPAControlAPI "C:\Python311\python.exe" "C:\RPA\api\app.py"

# Configure service parameters
nssm set RPAControlAPI AppDirectory "C:\RPA\api"
nssm set RPAControlAPI DisplayName "RPA Control API"
nssm set RPAControlAPI Description "Flask API for controlling RPA scripts"

# Set startup type to automatic
nssm set RPAControlAPI Start SERVICE_AUTO_START

# Configure logging
nssm set RPAControlAPI AppStdout "C:\RPA\logs\api_stdout.log"
nssm set RPAControlAPI AppStderr "C:\RPA\logs\api_stderr.log"

# Rotate logs when they exceed 10MB
nssm set RPAControlAPI AppRotateFiles 1
nssm set RPAControlAPI AppRotateBytes 10485760

# Set environment variables
nssm set RPAControlAPI AppEnvironmentExtra "PYTHONUNBUFFERED=1"

# Configure restart on failure
nssm set RPAControlAPI AppExit Default Restart
nssm set RPAControlAPI AppRestartDelay 5000  # 5 second delay before restart

# Set throttle to prevent restart loops (max 3 restarts in 60 seconds)
nssm set RPAControlAPI AppThrottle 60000

# Start the service
nssm start RPAControlAPI

3. Verify the service is running:

# Check service status
nssm status RPAControlAPI

# Or use Windows services
Get-Service RPAControlAPI

# Check the logs
Get-Content C:\RPA\logs\api_stdout.log -Tail 20

4. Useful NSSM commands for management:

# Stop the service
nssm stop RPAControlAPI

# Restart the service
nssm restart RPAControlAPI

# Edit service configuration
nssm edit RPAControlAPI

# Remove service (if needed)
nssm remove RPAControlAPI confirm

Part 4: Securing HTTPS Access

Running on HTTP is fine for testing, but production needs HTTPS.

Option 1: Use a Reverse Proxy (Recommended)

Install nginx as a reverse proxy with SSL:

choco install nginx -y

Configure nginx (C:\tools\nginx\conf\nginx.conf):

worker_processes 1;

events {
   worker_connections 1024;
}

http {
   # Redirect HTTP to HTTPS
   server {
       listen 80;
       server_name your-vm-domain.com;
       return 301 https://$server_name$request_uri;
   }

   # HTTPS server
   server {
       listen 443 ssl;
       server_name your-vm-domain.com;

       ssl_certificate C:/RPA/ssl/cert.pem;
       ssl_certificate_key C:/RPA/ssl/key.pem;
       
       ssl_protocols TLSv1.2 TLSv1.3;
       ssl_ciphers HIGH:!aNULL:!MD5;

       location / {
           proxy_pass http://127.0.0.1:5000;
           proxy_set_header Host $host;
           proxy_set_header X-Real-IP $remote_addr;
           proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
           proxy_set_header X-Forwarded-Proto $scheme;
       }

       # Health check should be fast
       location /health {
           proxy_pass http://127.0.0.1:5000/health;
           access_log off;
       }
   }
}

Generate self-signed certificate (for testing):

# Create SSL directory
New-Item -ItemType Directory -Path C:\RPA\ssl -Force

# Generate certificate (requires OpenSSL - install via choco install openssl)
cd C:\RPA\ssl
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes

For production, use Let's Encrypt (free SSL certificates):

choco install win-acme -y
# Follow the prompts to get a real SSL certificate

Install nginx as a service:

nssm install NginxProxy "C:\tools\nginx\nginx.exe"
nssm set NginxProxy AppDirectory "C:\tools\nginx"
nssm start NginxProxy

Update firewall rules:

# Open HTTPS port
New-NetFirewallRule -DisplayName "HTTPS" -Direction Inbound -Protocol TCP -LocalPort 443 -Action Allow

# Close direct access to Flask (only nginx should access it)
Remove-NetFirewallRule -DisplayName "RPA API"

Option 2: Use Waitress with SSL (Python-only solution)

Modify your Flask app to use Waitress WSGI server with SSL:

# app.py - add at the bottom
if __name__ == '__main__':
   from waitress import serve
   
   logger.info("Starting RPA Control API with Waitress")
   
   # For HTTPS, you need cert and key files
   serve(
       app,
       host='0.0.0.0',
       port=443,
       url_scheme='https',
       # These paths should point to your SSL certificate files
       # cert_file='C:/RPA/ssl/cert.pem',
       # key_file='C:/RPA/ssl/key.pem',
   )

Part 5: Setting Up Monitoring and Logging

Local Logging with OpenTelemetry Collector

Install the OpenTelemetry Collector to aggregate logs and metrics:

1. Download OpenTelemetry Collector:

# Create otel directory
New-Item -ItemType Directory -Path C:\RPA\otel -Force
cd C:\RPA\otel

# Download collector (check for latest version)
Invoke-WebRequest -Uri "https://github.com/open-telemetry/opentelemetry-collector-releases/releases/download/v0.91.0/otelcol_0.91.0_windows_amd64.tar.gz" -OutFile "otelcol.tar.gz"

# Extract (requires 7zip)
choco install 7zip -y
& "C:\Program Files\7-Zip\7z.exe" x otelcol.tar.gz
& "C:\Program Files\7-Zip\7z.exe" x otelcol.tar

2. Create OpenTelemetry Collector config (C:\RPA\otel\config.yaml):

receivers:
 otlp:
   protocols:
     grpc:
       endpoint: 0.0.0.0:4317
     http:
       endpoint: 0.0.0.0:4318

 # Host metrics
 hostmetrics:
   collection_interval: 30s
   scrapers:
     cpu:
     memory:
     disk:
     network:

processors:
 batch:
   timeout: 10s
   send_batch_size: 1024

 # Add resource attributes
 resource:
   attributes:
     - key: service.instance.id
       value: rpa-vm-01
       action: insert

exporters:
 # Log to file
 file:
   path: C:\RPA\logs\otel_traces.json

 # Send to external service (optional - configure your backend)
 # otlp:
 #   endpoint: your-observability-backend:4317

 # Prometheus metrics (optional)
 prometheus:
   endpoint: "0.0.0.0:8889"

 # Debug logging
 logging:
   loglevel: info

service:
 pipelines:
   traces:
     receivers: [otlp]
     processors: [batch, resource]
     exporters: [file, logging]

   metrics:
     receivers: [otlp, hostmetrics]
     processors: [batch, resource]
     exporters: [file, prometheus, logging]

 telemetry:
   logs:
     level: info

3. Install OpenTelemetry Collector as a service:

nssm install OTelCollector "C:\RPA\otel\otelcol.exe" "--config=C:\RPA\otel\config.yaml"
nssm set OTelCollector AppDirectory "C:\RPA\otel"
nssm set OTelCollector DisplayName "OpenTelemetry Collector"
nssm set OTelCollector Description "Collects telemetry data from RPA services"
nssm set OTelCollector Start SERVICE_AUTO_START

# Start the collector
nssm start OTelCollector

4. Verify OpenTelemetry is receiving data:

# Check the collector logs
Get-Content C:\RPA\logs\otel_traces.json -Tail 20

# Check Prometheus metrics endpoint
curl http://localhost:8889/metrics

External Monitoring Setup

For production, send your telemetry to a proper observability platform:

Option A: Self-hosted (Free)

  • Grafana + Prometheus + Loki: Full observability stack
  • Jaeger: Distributed tracing

Option B: Cloud Services

  • Datadog: Comprehensive but expensive
  • New Relic: Good free tier for small deployments
  • Grafana Cloud: Free tier includes metrics, logs, traces
  • Honeycomb: Excellent for debugging, generous free tier

Example: Configure for Grafana Cloud:

Update your config.yaml:

exporters:
 otlp:
   endpoint: otlp-gateway-prod-us-central-0.grafana.net:443
   headers:
     authorization: "Basic your-base64-encoded-credentials"

Setting Up Health Check Monitoring

Create a simple monitoring script that checks your API health:

C:\RPA\monitor\health_monitor.py:

import requests
import time
import logging
from datetime import datetime

logging.basicConfig(
   level=logging.INFO,
   format='%(asctime)s - %(levelname)s - %(message)s',
   handlers=[
       logging.FileHandler('C:/RPA/logs/health_monitor.log'),
       logging.StreamHandler()
   ]
)

API_URL = "https://localhost/health"
CHECK_INTERVAL = 60  # seconds
ALERT_THRESHOLD = 3  # consecutive failures before alert

consecutive_failures = 0

def check_health():
   global consecutive_failures
   
   try:
       response = requests.get(API_URL, verify=False, timeout=10)
       
       if response.status_code == 200:
           data = response.json()
           status = data.get('status')
           
           if status == 'healthy':
               logging.info(f"✓ Health check passed - CPU: {data['system']['cpu_percent']}%, Memory: {data['system']['memory_percent']}%")
               consecutive_failures = 0
           else:
               logging.warning(f"⚠ Health check returned unhealthy status: {data}")
               consecutive_failures += 1
       else:
           logging.error(f"✗ Health check failed with status code: {response.status_code}")
           consecutive_failures += 1
           
   except requests.exceptions.RequestException as e:
       logging.error(f"✗ Health check failed with exception: {e}")
       consecutive_failures += 1
   
   # Alert if threshold exceeded
   if consecutive_failures >= ALERT_THRESHOLD:
       send_alert(f"RPA API health check failed {consecutive_failures} times consecutively")

def send_alert(message):
   """Send alert via email, Slack, SMS, etc."""
   logging.critical(f"ALERT: {message}")
   
   # Example: Send to Slack webhook
   # slack_webhook = "your-slack-webhook-url"
   # requests.post(slack_webhook, json={'text': message})
   
   # Example: Send email
   # send_email(to="admin@company.com", subject="RPA VM Alert", body=message)

if __name__ == '__main__':
   logging.info("Starting health monitor")
   
   while True:
       check_health()
       time.sleep(CHECK_INTERVAL)

Install as a service:

nssm install RPAHealthMonitor "C:\Python311\python.exe" "C:\RPA\monitor\health_monitor.py"
nssm set RPAHealthMonitor AppDirectory "C:\RPA\monitor"
nssm set RPAHealthMonitor Start SERVICE_AUTO_START
nssm start RPAHealthMonitor

Part 6: Client Usage Examples

Now that your VM is set up, here's how to interact with it:

Python client:

import requests

class RPAClient:
   def __init__(self, base_url, api_key):
       self.base_url = base_url
       self.headers = {'X-API-Key': api_key}
   
   def health_check(self):
       response = requests.get(f"{self.base_url}/health")
       return response.json()
   
   def list_scripts(self):
       response = requests.get(
           f"{self.base_url}/api/scripts",
           headers=self.headers
       )
       return response.json()
   
   def start_script(self, script_name, parameters=None):
       response = requests.post(
           f"{self.base_url}/api/scripts/{script_name}/start",
           headers=self.headers,
           json={'parameters': parameters or {}}
       )
       return response.json()
   
   def stop_script(self, script_name):
       response = requests.post(
           f"{self.base_url}/api/scripts/{script_name}/stop",
           headers=self.headers
       )
       return response.json()
   
   def get_script_status(self, script_name):
       response = requests.get(
           f"{self.base_url}/api/scripts/{script_name}/status",
           headers=self.headers
       )
       return response.json()
   
   def get_logs(self, script_name, lines=100):
       response = requests.get(
           f"{self.base_url}/api/logs/{script_name}",
           headers=self.headers,
           params={'lines': lines}
       )
       return response.json()

# Usage
client = RPAClient(
   base_url="https://your-vm.example.com",
   api_key="your-super-secret-api-key"
)

# Check health
health = client.health_check()
print(f"VM Status: {health['status']}")

# Start a script
result = client.start_script('process_invoices', {'batch_size': 50})
print(f"Script started: {result}")

# Check status
status = client.get_script_status('process_invoices')
print(f"Is running: {status['is_running']}")

# Get logs
logs = client.get_logs('process_invoices', lines=50)
print('\n'.join(logs['lines'][-10:]))  # Last 10 lines

cURL examples:

# Health check (no auth required)
curl https://your-vm.example.com/health

# List scripts
curl -H "X-API-Key: your-api-key" https://your-vm.example.com/api/scripts

# Start script
curl -X POST \
 -H "X-API-Key: your-api-key" \
 -H "Content-Type: application/json" \
 -d '{"parameters": {"customer_id": "12345"}}' \
 https://your-vm.example.com/api/scripts/process_invoices/start

# Check status
curl -H "X-API-Key: your-api-key" \
 https://your-vm.example.com/api/scripts/process_invoices/status

# Stop script
curl -X POST \
 -H "X-API-Key: your-api-key" \
 https://your-vm.example.com/api/scripts/process_invoices/stop

Part 7: Maintenance and Best Practices

Daily checks:

  • Monitor health check endpoint
  • Review error logs
  • Check disk space usage
  • Verify scripts completed successfully

Weekly tasks:

  • Review system metrics for trends
  • Update RPA scripts if needed
  • Check for Windows updates (schedule during off-hours)
  • Verify backups are working

Monthly tasks:

  • Rotate API keys
  • Review and archive old logs
  • Update Python packages: pip list --outdated
  • Review and optimize resource usage

Backup strategy:

# Create a backup script
# C:\RPA\backup\backup.ps1

$timestamp = Get-Date -Format "yyyyMMdd_HHmmss"
$backupPath = "C:\RPA\backups\backup_$timestamp"

# Create backup directory
New-Item -ItemType Directory -Path $backupPath -Force

# Backup scripts
Copy-Item -Path "C:\RPA\scripts\*" -Destination "$backupPath\scripts\" -Recurse

# Backup API
Copy-Item -Path "C:\RPA\api\*" -Destination "$backupPath\api\" -Recurse

# Backup configs
Copy-Item -Path "C:\RPA\otel\config.yaml" -Destination "$backupPath\"
Copy-Item -Path "C:\RPA\api\.env" -Destination "$backupPath\"

# Compress backup
Compress-Archive -Path $backupPath -DestinationPath "$backupPath.zip"
Remove-Item -Path $backupPath -Recurse

Write-Host "Backup completed: $backupPath.zip"

# Optional: Upload to S3, Azure Blob, etc.

Schedule the backup:

# Run daily at 2 AM
$action = New-ScheduledTaskAction -Execute "PowerShell.exe" -Argument "-File C:\RPA\backup\backup.ps1"
$trigger = New-ScheduledTaskTrigger -Daily -At 2am
Register-ScheduledTask -Action $action -Trigger $trigger -TaskName "RPA Backup" -Description "Daily backup of RPA configuration"

Troubleshooting Common Issues

Service won't start:

# Check service status
nssm status RPAControlAPI

# View service logs
Get-Content C:\RPA\logs\api_stderr.log

# Test the script manually
cd C:\RPA\api
python app.py

Can't access API from external machine:

# Verify service is listening
netstat -ano | findstr "5000"

# Check firewall rules
Get-NetFirewallRule | Where-Object {$_.DisplayName -like "*RPA*"}

# Test locally first
curl http://localhost:5000/health

# Check cloud provider security groups (AWS, Azure, GCP)

High resource usage:

# Check what's using resources
Get-Process | Sort-Object CPU -Descending | Select-Object -First 10

# Restart services if needed
nssm restart RPAControlAPI

# Check for runaway RPA processes
Get-Process python

Final Thoughts

Setting up a production-ready RPA VM is more than just installing Python and running scripts. You need:

  • Proper networking: Secure HTTPS access with authentication
  • Service management: NSSM to ensure everything runs reliably
  • Health checks: Know when things break before they become critical
  • Monitoring: OpenTelemetry for observability and debugging
  • Maintenance: Regular backups, updates, and log rotation

This setup might seem like overkill when you're just getting started, but it pays dividends when you're running dozens of automations across multiple VMs. The API gives you programmatic control, the monitoring tells you when things break, and the service management keeps everything running even when Windows decides to update at 3 AM.

Build it right once, and you won't be the person RDPing into VMs at midnight to restart stuck scripts.