Blog
Enterprise
Sep 29, 2025
·
10
minutes read

The Real Guide to Building RPA Scripts That Actually Work

Robotic Process Automation (RPA) sounds simple in theory: automate repetitive tasks by mimicking human interactions with software. In practice? It's one of the most frustrating development challenges you'll face.

The hardest part isn't the coding it's figuring out the path. That reliable, repeatable sequence of steps that works every single time, regardless of window positions, system states, or cosmic alignment. I've spent countless hours debugging RPA scripts that worked perfectly yesterday but mysteriously fail today, and I've learned some hard-won lessons along the way.

Here's what actually works when building RPA automation.

1. Always Start and End in the Same State

This is the golden rule of RPA, and it will save you more debugging time than anything else.

The principle: Your script should begin with a clean slate and leave behind a clean slate. Close all windows, log out of applications, reset to a known starting position.

Why it matters: When your script fails at 3 AM (and it will), you need to know exactly where things went wrong. If Window A is still open from the previous run, and Window B opened in a different position, suddenly your entire automation is clicking the wrong buttons and chaos ensues.

In practice:

def run_automation():
   # Start: Close everything
   close_all_target_windows()
   ensure_desktop_is_clear()
   
   try:
       # Your actual automation steps
       open_application()
       navigate_to_form()
       enter_data()
       submit_form()
       export_results()
       
   finally:
       # End: Close everything (even if script failed)
       close_all_target_windows()
       return_to_home_position()

This pattern makes it immediately obvious when something breaks. If you see three windows open when there should be zero, you know which step failed to clean up. It also means you can re-run failed scripts without manual intervention to reset the environment.

2. Keyboard Shortcuts Over Mouse Coordinates (Always)

Here's a secret that will change your RPA game: legacy software has amazing keyboard support.

Developers in the '90s and early 2000s built applications for power users who lived on the keyboard. Every button, menu, and field usually has a keyboard shortcut or tab order. Modern developers forgot this, but your automation scripts shouldn't.

Why keyboard beats mouse:

  • Screen resolution changes? Keyboard still works.
  • Window moved slightly? Keyboard still works.
  • Someone RDPs in with different DPI settings? Keyboard still works.
  • Different monitor setup? You guessed it keyboard still works.

Techniques to discover keyboard shortcuts:

Press Alt in most Windows applications and watch the menu letters underline. Alt + F opens File, Alt + F + O might open a file. Alt + F4 closes windows (you probably knew that one).

Tab and Shift+Tab navigate between fields in a predictable order. Enter submits forms. Escape cancels dialogs.

Check the application's Help menu for keyboard shortcuts (yes, people still document these).

Actually read the menu text: "Save Copy" usually means Ctrl+C or Alt+C works.

Example transformation:

# Fragile (coordinate-based clicking)
pyautogui.click(x=450, y=320)  # Hope the button is still there!
time.sleep(0.5)
pyautogui.click(x=680, y=520)  # And this one too...

# Robust (keyboard-based navigation)
pyautogui.press('alt')
pyautogui.press('f')  # File menu
pyautogui.press('e')  # Export option
time.sleep(0.5)
pyautogui.press('enter')  # Confirm

The second version works regardless of window position, screen size, or DPI scaling. It's also more readable you can actually tell what it's doing.

3. The Clipboard Is Your Best Friend

Extracting data from legacy applications is painful. They don't have APIs, you can't access the database directly, and screen scraping is unreliable. Enter the clipboard.

The technique: Use Ctrl+C to copy data from the application, then read it programmatically from the clipboard.

import pyperclip

# Select the data field (using Tab navigation)
pyautogui.press('tab')
pyautogui.press('tab')
pyautogui.press('tab')  # Now on the invoice number field

# Copy to clipboard
pyautogui.hotkey('ctrl', 'a')  # Select all in field
pyautogui.hotkey('ctrl', 'c')  # Copy
time.sleep(0.1)  # Brief pause for clipboard

# Extract from clipboard
invoice_number = pyperclip.paste()
print(f"Extracted invoice: {invoice_number}")

Why this works:

  • Most applications support standard clipboard operations
  • You're getting the actual text value, not trying to OCR pixels
  • It's fast and reliable
  • You can paste data back in too (Ctrl+V)

Pro tip: Clear the clipboard before copying to ensure you're getting fresh data:

pyperclip.copy('')  # Clear clipboard
pyautogui.hotkey('ctrl', 'c')
time.sleep(0.1)
data = pyperclip.paste()

4. Queue Your Scripts One at a Time Only

Never, ever run multiple RPA scripts simultaneously on the same machine. Just don't.

Why: RPA scripts control the mouse and keyboard. Running two at once means they're fighting for control, clicking each other's windows, typing into the wrong fields, and generally creating an unholy mess.

The solution: Implement a simple queue system.

import queue
import threading
from pathlib import Path

# Simple file-based locking
LOCK_FILE = Path("rpa_running.lock")

def is_script_running():
   return LOCK_FILE.exists()

def acquire_lock():
   if is_script_running():
       raise Exception("Another RPA script is already running")
   LOCK_FILE.touch()

def release_lock():
   LOCK_FILE.unlink(missing_ok=True)

def run_with_lock(script_function):
   try:
       acquire_lock()
       script_function()
   finally:
       release_lock()

For more sophisticated setups, use a proper job queue (Celery, RQ, or even a simple database table with a "processing" flag). The key is ensuring only one automation runs at a time.

5. If You Must Use Coordinates, Maximize the Window First

Sometimes you can't avoid coordinate-based clicking. Maybe the application has no keyboard shortcuts, or the UI is too complex to navigate reliably with Tab.

The trick: Always maximize the window before clicking coordinates.

import pygetwindow as gw

# Find and maximize the target window
windows = gw.getWindowsWithTitle('Legacy ERP System')
if windows:
   window = windows[0]
   window.maximize()  # Now coordinates are consistent
   time.sleep(0.5)  # Wait for maximize animation
   
   # Now your coordinates are reliable
   pyautogui.click(x=450, y=320)

Maximizing ensures the window is always the same size, so your coordinates hit the same buttons. It's not perfect (different screen resolutions still cause issues), but it's far better than hoping the user left the window in the right position.

Better yet: Use relative coordinates from the window's position:

window = gw.getWindowsWithTitle('Legacy ERP System')[0]
window_x, window_y = window.left, window.top

# Click relative to window position
pyautogui.click(
   x=window_x + 200,  # 200 pixels from left edge of window
   y=window_y + 150   # 150 pixels from top edge of window
)

6. Jupyter Notebooks Are Your RPA Development Environment

This one surprised me, but Jupyter notebooks are fantastic for RPA development.

Why Jupyter works for RPA:

Step-by-step execution: Run each part of your automation one cell at a time. Did the login work? Great, move to the next step.

Easy replays: Messed up data entry? Just re-run that cell without starting over from scratch.

Interactive debugging: Check variables, test clipboard contents, inspect window states between steps.

Visual feedback: See screenshots and outputs inline as you develop.

Example workflow:

# Cell 1: Setup
import pyautogui
import pyperclip
import time

# Cell 2: Open application
def open_application():
   pyautogui.press('win')
   time.sleep(0.3)
   pyautogui.write('legacy_app.exe')
   pyautogui.press('enter')
   time.sleep(2)

open_application()

# Cell 3: Login (test this separately)
def login(username, password):
   pyautogui.write(username)
   pyautogui.press('tab')
   pyautogui.write(password)
   pyautogui.press('enter')
   time.sleep(1)

login('testuser', 'testpass')

# Cell 4: Navigate to form
# ... and so on

Run each cell, verify it worked, then move forward. When you're done, combine everything into a proper script with error handling.

Pro tip: Add screenshot cells to capture the state after important steps:

# Take a screenshot to verify we're in the right place
screenshot = pyautogui.screenshot()
display(screenshot)  # Shows inline in Jupyter

7. Break Your Script Into Granular, Verbose Steps

When you're developing RPA, resist the urge to write monolithic functions. Instead, break everything into small, descriptive steps.

Bad approach:

def process_invoice():
   # 50 lines of mixed actions
   # Good luck figuring out where it failed

Good approach:

def open_invoice_screen():
   """Navigate to the invoice entry screen from main menu"""
   logging.info("Opening invoice screen")
   press_alt_key_menu()
   select_accounts_receivable()
   select_invoice_entry()
   wait_for_invoice_screen_to_load()

def enter_customer_information(customer_id, customer_name):
   """Fill in customer details in the invoice form"""
   logging.info(f"Entering customer info for {customer_id}")
   focus_customer_id_field()
   type_customer_id(customer_id)
   tab_to_customer_name_field()
   verify_customer_name_matches(customer_name)

def add_invoice_line_item(item_code, quantity, price):
   """Add a single line item to the invoice"""
   logging.info(f"Adding line item: {item_code}")
   navigate_to_new_line_item_row()
   enter_item_code(item_code)
   enter_quantity(quantity)
   verify_price_populated(price)

Why this matters:

When the script fails, your logs tell you exactly which micro-step broke. "Failed at: verify_price_populated" is infinitely more useful than "Failed at: process_invoice".

You can test and replay individual steps in your Jupyter notebook without running the entire workflow.

Your code becomes self-documenting. Reading open_invoice_screen() followed by enter_customer_information() tells the story of what's happening.

When you need to modify the workflow (and you will), you can swap out individual steps without risking the entire script.

Pro tip: Your function names should read like instructions you'd give to a human. If you can't explain what a function does in its name, it's probably doing too much.

8. Label Everything When Using LLMs

If you're using AI tools to help generate or improve your RPA scripts (and you should be LLMs are surprisingly good at this), the quality of your output depends entirely on the quality of your input labels.

Don't do this:

# Sending raw coordinates to an LLM
events = [
   ("click", 450, 320),
   ("click", 680, 520),
   ("type", "12345"),
]

Do this instead:

# Labeled events that LLMs can understand
events = [
   ("click_button", x=450, y=320, label="Open File Menu"),
   ("click_button", x=680, y=520, label="Select Export Option"),
   ("type_in_field", text="12345", field="Invoice Number Field"),
]

When you feed labeled data to an LLM and ask it to generate functions, you get much better results:

Prompt: "Generate Python functions for these labeled RPA steps"

LLM Output:

def open_file_menu():
   """Opens the File menu in the application"""
   pyautogui.click(x=450, y=320)
   time.sleep(0.3)

def select_export_option():
   """Selects the Export option from the File menu"""
   pyautogui.click(x=680, y=520)
   time.sleep(0.5)

def enter_invoice_number(invoice_id):
   """Types the invoice number into the Invoice Number field"""
   pyautogui.write(str(invoice_id))

The LLM creates meaningful function names, adds docstrings, and structures the code logically all because you labeled your inputs clearly.

This also works in reverse: Record your manual actions with labels, then ask an LLM to generate the automation script. The labels provide the semantic meaning that transforms raw coordinates into understandable code.

9. OCR Is Unreliable Use Print-to-PDF Instead

OCR (Optical Character Recognition) seems like the obvious solution for extracting data from applications. Point it at the screen, read the text, done. But OCR in RPA contexts is frustratingly inconsistent.

The OCR problem:

  • Sometimes it's pixel-perfect
  • Sometimes it reads "Invoice" as "lnv0ice"
  • Font rendering, anti-aliasing, and DPI settings all affect accuracy
  • Small UI fonts are particularly problematic
  • It's slow compared to other extraction methods

The better solution: Most legacy applications have a "Print" function, even if they don't have proper export capabilities. And here's the key they almost always support printing to PDF.

Why Print-to-PDF works:

def extract_report_data_via_pdf():
   """Extract report by printing to PDF, then parsing the PDF"""
   
   # Navigate to the report screen
   open_monthly_report()
   
   # Trigger print dialog
   pyautogui.hotkey('ctrl', 'p')
   time.sleep(1)
   
   # Select PDF printer (Microsoft Print to PDF is installed by default on Windows)
   pyautogui.write('Microsoft Print to PDF')
   pyautogui.press('enter')
   time.sleep(0.5)
   
   # Set file location
   pdf_path = f"C:/temp/report_{timestamp}.pdf"
   pyautogui.write(pdf_path)
   pyautogui.press('enter')
   time.sleep(2)
   
   # Now parse the PDF with a proper library
   import PyPDF2
   with open(pdf_path, 'rb') as file:
       pdf_reader = PyPDF2.PdfReader(file)
       text = ""
       for page in pdf_reader.pages:
           text += page.extract_text()
   
   # Parse structured text (much more reliable than OCR)
   invoice_number = extract_invoice_from_text(text)
   return invoice_number

Why this is better than OCR:

  • PDF text extraction is deterministic and accurate
  • It's faster than OCR processing
  • The application handles the formatting you just parse structured text
  • Works regardless of screen resolution or font rendering

Bonus tip: Some applications let you "print" to text files or CSV formats. Explore the printer options you might find export capabilities hiding in plain sight.

When you must use OCR: If there's truly no way to print or export, use OCR but validate extensively. Compare OCR output against known values, implement confidence thresholds, and always have a human review workflow for low-confidence reads.

10. Stress Test Everything Weird Dialogs Always Appear

Your RPA script works perfectly 100 times in a row. You deploy it to production. On run 101, a dialog box appears that you've never seen before, and your script clicks right past it, wreaking havoc.

The reality of legacy software: These applications have accumulated decades of edge cases, error messages, confirmation dialogs, and warning popups. They appear based on data conditions, system states, or seemingly random cosmic events.

How to stress test:

Run with bad data: Invalid customer IDs, negative quantities, past dates, future dates, special characters, extremely long text strings.

Run with edge case data: Zero values, maximum values, null values, duplicate entries.

Run repeatedly: Run the same script 50, 100, 500 times consecutively. Those random dialogs will eventually appear.

Run during system load: Other users logged in, other processes running, low memory conditions.

Run across different system states: Right after reboot, after hours of uptime, during backups, during maintenance windows.

Example defensive coding:

def click_save_button_with_dialog_handling():
   """Click save and handle any possible confirmation dialogs"""
   logging.info("Clicking save button")
   pyautogui.hotkey('ctrl', 's')
   time.sleep(0.5)
   
   # Check for common dialogs that might appear
   dialogs_to_check = [
       ("Confirm Save", handle_confirm_save_dialog),
       ("Warning", handle_warning_dialog),
       ("Duplicate Entry", handle_duplicate_dialog),
       ("Connection Lost", handle_connection_lost_dialog),
   ]
   
   for dialog_title, handler in dialogs_to_check:
       if check_if_window_exists(dialog_title):
           logging.warning(f"Unexpected dialog appeared: {dialog_title}")
           handler()
           return
   
   # If we get here, save completed without dialogs
   logging.info("Save completed successfully")

Create a "dialog library": As you discover new popups during testing, add them to a shared handler module. Over time, you'll build a comprehensive collection of edge case handlers.

# dialog_handlers.py
def handle_network_timeout_dialog():
   """Handles 'Network Timeout' dialog by retrying"""
   pyautogui.press('enter')  # Click OK
   time.sleep(2)
   return "retry"

def handle_data_validation_error_dialog():
   """Handles validation errors by logging and skipping"""
   pyautogui.press('escape')  # Close dialog
   return "skip"

# Add more as you discover them...

Screenshot everything during stress testing: When weird dialogs appear, you need to see them. Automatically capture screenshots during test runs:

def stress_test_with_monitoring():
   for i in range(500):
       try:
           run_automation()
       except Exception as e:
           screenshot_path = f"stress_test_error_{i}_{timestamp}.png"
           pyautogui.screenshot(screenshot_path)
           logging.error(f"Run {i} failed: {e}")
           logging.error(f"Screenshot saved to {screenshot_path}")

Review these screenshots after your stress test. You'll discover dialogs you never knew existed.

Bonus Tips from the Trenches

Add generous sleep delays: That time.sleep(0.5) after every action? It's not optional. Applications need time to respond, especially legacy ones. Better to have a slow, reliable script than a fast one that randomly fails.

Log everything: Wrap every action in logging. When it breaks at 3 AM, you'll want to know exactly which step failed.

import logging

logging.info("Opening application...")
open_application()
logging.info("Application opened successfully")

logging.info("Logging in...")
login(user, pass)
logging.info("Login successful")

Take screenshots on failure: When something goes wrong, automatically capture what's on screen.

try:
   run_automation()
except Exception as e:
   timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
   pyautogui.screenshot(f'error_{timestamp}.png')
   logging.error(f"Automation failed: {e}")
   raise

Test on a fresh VM regularly: Your development machine has the application pre-configured, windows in perfect positions, and muscle memory for what "should" happen. Test on a clean machine to catch assumption errors.

Final Thoughts

Building RPA is hard because you're bridging the gap between deterministic code and the chaotic world of GUI applications. But by following these guidelines especially starting with a clean state, breaking scripts into verbose steps, and stress-testing relentlessly you can build automations that actually work reliably in production.

The path will still be hard to figure out. You'll still encounter bizarre dialogs and inexplicable failures. But with these techniques, you'll spend less time debugging and more time actually automating.

Remember: RPA is not elegant. It's duct tape and determination. Embrace the chaos, write defensive code, and keep your Jupyter notebook handy.