Robotic Process Automation (RPA) sounds simple in theory: automate repetitive tasks by mimicking human interactions with software. In practice? It's one of the most frustrating development challenges you'll face.
The hardest part isn't the coding it's figuring out the path. That reliable, repeatable sequence of steps that works every single time, regardless of window positions, system states, or cosmic alignment. I've spent countless hours debugging RPA scripts that worked perfectly yesterday but mysteriously fail today, and I've learned some hard-won lessons along the way.
Here's what actually works when building RPA automation.
This is the golden rule of RPA, and it will save you more debugging time than anything else.
The principle: Your script should begin with a clean slate and leave behind a clean slate. Close all windows, log out of applications, reset to a known starting position.
Why it matters: When your script fails at 3 AM (and it will), you need to know exactly where things went wrong. If Window A is still open from the previous run, and Window B opened in a different position, suddenly your entire automation is clicking the wrong buttons and chaos ensues.
In practice:
def run_automation():
# Start: Close everything
close_all_target_windows()
ensure_desktop_is_clear()
try:
# Your actual automation steps
open_application()
navigate_to_form()
enter_data()
submit_form()
export_results()
finally:
# End: Close everything (even if script failed)
close_all_target_windows()
return_to_home_position()
This pattern makes it immediately obvious when something breaks. If you see three windows open when there should be zero, you know which step failed to clean up. It also means you can re-run failed scripts without manual intervention to reset the environment.
Here's a secret that will change your RPA game: legacy software has amazing keyboard support.
Developers in the '90s and early 2000s built applications for power users who lived on the keyboard. Every button, menu, and field usually has a keyboard shortcut or tab order. Modern developers forgot this, but your automation scripts shouldn't.
Why keyboard beats mouse:
Techniques to discover keyboard shortcuts:
Press Alt
in most Windows applications and watch the menu letters underline. Alt + F
opens File, Alt + F + O
might open a file. Alt + F4
closes windows (you probably knew that one).
Tab
and Shift+Tab
navigate between fields in a predictable order. Enter
submits forms. Escape
cancels dialogs.
Check the application's Help menu for keyboard shortcuts (yes, people still document these).
Actually read the menu text: "Save Copy" usually means Ctrl+C
or Alt+C
works.
Example transformation:
# Fragile (coordinate-based clicking)
pyautogui.click(x=450, y=320) # Hope the button is still there!
time.sleep(0.5)
pyautogui.click(x=680, y=520) # And this one too...
# Robust (keyboard-based navigation)
pyautogui.press('alt')
pyautogui.press('f') # File menu
pyautogui.press('e') # Export option
time.sleep(0.5)
pyautogui.press('enter') # Confirm
The second version works regardless of window position, screen size, or DPI scaling. It's also more readable you can actually tell what it's doing.
Extracting data from legacy applications is painful. They don't have APIs, you can't access the database directly, and screen scraping is unreliable. Enter the clipboard.
The technique: Use Ctrl+C
to copy data from the application, then read it programmatically from the clipboard.
import pyperclip
# Select the data field (using Tab navigation)
pyautogui.press('tab')
pyautogui.press('tab')
pyautogui.press('tab') # Now on the invoice number field
# Copy to clipboard
pyautogui.hotkey('ctrl', 'a') # Select all in field
pyautogui.hotkey('ctrl', 'c') # Copy
time.sleep(0.1) # Brief pause for clipboard
# Extract from clipboard
invoice_number = pyperclip.paste()
print(f"Extracted invoice: {invoice_number}")
Why this works:
Ctrl+V
)Pro tip: Clear the clipboard before copying to ensure you're getting fresh data:
pyperclip.copy('') # Clear clipboard
pyautogui.hotkey('ctrl', 'c')
time.sleep(0.1)
data = pyperclip.paste()
Never, ever run multiple RPA scripts simultaneously on the same machine. Just don't.
Why: RPA scripts control the mouse and keyboard. Running two at once means they're fighting for control, clicking each other's windows, typing into the wrong fields, and generally creating an unholy mess.
The solution: Implement a simple queue system.
import queue
import threading
from pathlib import Path
# Simple file-based locking
LOCK_FILE = Path("rpa_running.lock")
def is_script_running():
return LOCK_FILE.exists()
def acquire_lock():
if is_script_running():
raise Exception("Another RPA script is already running")
LOCK_FILE.touch()
def release_lock():
LOCK_FILE.unlink(missing_ok=True)
def run_with_lock(script_function):
try:
acquire_lock()
script_function()
finally:
release_lock()
For more sophisticated setups, use a proper job queue (Celery, RQ, or even a simple database table with a "processing" flag). The key is ensuring only one automation runs at a time.
Sometimes you can't avoid coordinate-based clicking. Maybe the application has no keyboard shortcuts, or the UI is too complex to navigate reliably with Tab.
The trick: Always maximize the window before clicking coordinates.
import pygetwindow as gw
# Find and maximize the target window
windows = gw.getWindowsWithTitle('Legacy ERP System')
if windows:
window = windows[0]
window.maximize() # Now coordinates are consistent
time.sleep(0.5) # Wait for maximize animation
# Now your coordinates are reliable
pyautogui.click(x=450, y=320)
Maximizing ensures the window is always the same size, so your coordinates hit the same buttons. It's not perfect (different screen resolutions still cause issues), but it's far better than hoping the user left the window in the right position.
Better yet: Use relative coordinates from the window's position:
window = gw.getWindowsWithTitle('Legacy ERP System')[0]
window_x, window_y = window.left, window.top
# Click relative to window position
pyautogui.click(
x=window_x + 200, # 200 pixels from left edge of window
y=window_y + 150 # 150 pixels from top edge of window
)
This one surprised me, but Jupyter notebooks are fantastic for RPA development.
Why Jupyter works for RPA:
Step-by-step execution: Run each part of your automation one cell at a time. Did the login work? Great, move to the next step.
Easy replays: Messed up data entry? Just re-run that cell without starting over from scratch.
Interactive debugging: Check variables, test clipboard contents, inspect window states between steps.
Visual feedback: See screenshots and outputs inline as you develop.
Example workflow:
# Cell 1: Setup
import pyautogui
import pyperclip
import time
# Cell 2: Open application
def open_application():
pyautogui.press('win')
time.sleep(0.3)
pyautogui.write('legacy_app.exe')
pyautogui.press('enter')
time.sleep(2)
open_application()
# Cell 3: Login (test this separately)
def login(username, password):
pyautogui.write(username)
pyautogui.press('tab')
pyautogui.write(password)
pyautogui.press('enter')
time.sleep(1)
login('testuser', 'testpass')
# Cell 4: Navigate to form
# ... and so on
Run each cell, verify it worked, then move forward. When you're done, combine everything into a proper script with error handling.
Pro tip: Add screenshot cells to capture the state after important steps:
# Take a screenshot to verify we're in the right place
screenshot = pyautogui.screenshot()
display(screenshot) # Shows inline in Jupyter
When you're developing RPA, resist the urge to write monolithic functions. Instead, break everything into small, descriptive steps.
Bad approach:
def process_invoice():
# 50 lines of mixed actions
# Good luck figuring out where it failed
Good approach:
def open_invoice_screen():
"""Navigate to the invoice entry screen from main menu"""
logging.info("Opening invoice screen")
press_alt_key_menu()
select_accounts_receivable()
select_invoice_entry()
wait_for_invoice_screen_to_load()
def enter_customer_information(customer_id, customer_name):
"""Fill in customer details in the invoice form"""
logging.info(f"Entering customer info for {customer_id}")
focus_customer_id_field()
type_customer_id(customer_id)
tab_to_customer_name_field()
verify_customer_name_matches(customer_name)
def add_invoice_line_item(item_code, quantity, price):
"""Add a single line item to the invoice"""
logging.info(f"Adding line item: {item_code}")
navigate_to_new_line_item_row()
enter_item_code(item_code)
enter_quantity(quantity)
verify_price_populated(price)
Why this matters:
When the script fails, your logs tell you exactly which micro-step broke. "Failed at: verify_price_populated" is infinitely more useful than "Failed at: process_invoice".
You can test and replay individual steps in your Jupyter notebook without running the entire workflow.
Your code becomes self-documenting. Reading open_invoice_screen()
followed by enter_customer_information()
tells the story of what's happening.
When you need to modify the workflow (and you will), you can swap out individual steps without risking the entire script.
Pro tip: Your function names should read like instructions you'd give to a human. If you can't explain what a function does in its name, it's probably doing too much.
If you're using AI tools to help generate or improve your RPA scripts (and you should be LLMs are surprisingly good at this), the quality of your output depends entirely on the quality of your input labels.
Don't do this:
# Sending raw coordinates to an LLM
events = [
("click", 450, 320),
("click", 680, 520),
("type", "12345"),
]
Do this instead:
# Labeled events that LLMs can understand
events = [
("click_button", x=450, y=320, label="Open File Menu"),
("click_button", x=680, y=520, label="Select Export Option"),
("type_in_field", text="12345", field="Invoice Number Field"),
]
When you feed labeled data to an LLM and ask it to generate functions, you get much better results:
Prompt: "Generate Python functions for these labeled RPA steps"
LLM Output:
def open_file_menu():
"""Opens the File menu in the application"""
pyautogui.click(x=450, y=320)
time.sleep(0.3)
def select_export_option():
"""Selects the Export option from the File menu"""
pyautogui.click(x=680, y=520)
time.sleep(0.5)
def enter_invoice_number(invoice_id):
"""Types the invoice number into the Invoice Number field"""
pyautogui.write(str(invoice_id))
The LLM creates meaningful function names, adds docstrings, and structures the code logically all because you labeled your inputs clearly.
This also works in reverse: Record your manual actions with labels, then ask an LLM to generate the automation script. The labels provide the semantic meaning that transforms raw coordinates into understandable code.
OCR (Optical Character Recognition) seems like the obvious solution for extracting data from applications. Point it at the screen, read the text, done. But OCR in RPA contexts is frustratingly inconsistent.
The OCR problem:
The better solution: Most legacy applications have a "Print" function, even if they don't have proper export capabilities. And here's the key they almost always support printing to PDF.
Why Print-to-PDF works:
def extract_report_data_via_pdf():
"""Extract report by printing to PDF, then parsing the PDF"""
# Navigate to the report screen
open_monthly_report()
# Trigger print dialog
pyautogui.hotkey('ctrl', 'p')
time.sleep(1)
# Select PDF printer (Microsoft Print to PDF is installed by default on Windows)
pyautogui.write('Microsoft Print to PDF')
pyautogui.press('enter')
time.sleep(0.5)
# Set file location
pdf_path = f"C:/temp/report_{timestamp}.pdf"
pyautogui.write(pdf_path)
pyautogui.press('enter')
time.sleep(2)
# Now parse the PDF with a proper library
import PyPDF2
with open(pdf_path, 'rb') as file:
pdf_reader = PyPDF2.PdfReader(file)
text = ""
for page in pdf_reader.pages:
text += page.extract_text()
# Parse structured text (much more reliable than OCR)
invoice_number = extract_invoice_from_text(text)
return invoice_number
Why this is better than OCR:
Bonus tip: Some applications let you "print" to text files or CSV formats. Explore the printer options you might find export capabilities hiding in plain sight.
When you must use OCR: If there's truly no way to print or export, use OCR but validate extensively. Compare OCR output against known values, implement confidence thresholds, and always have a human review workflow for low-confidence reads.
Your RPA script works perfectly 100 times in a row. You deploy it to production. On run 101, a dialog box appears that you've never seen before, and your script clicks right past it, wreaking havoc.
The reality of legacy software: These applications have accumulated decades of edge cases, error messages, confirmation dialogs, and warning popups. They appear based on data conditions, system states, or seemingly random cosmic events.
How to stress test:
Run with bad data: Invalid customer IDs, negative quantities, past dates, future dates, special characters, extremely long text strings.
Run with edge case data: Zero values, maximum values, null values, duplicate entries.
Run repeatedly: Run the same script 50, 100, 500 times consecutively. Those random dialogs will eventually appear.
Run during system load: Other users logged in, other processes running, low memory conditions.
Run across different system states: Right after reboot, after hours of uptime, during backups, during maintenance windows.
Example defensive coding:
def click_save_button_with_dialog_handling():
"""Click save and handle any possible confirmation dialogs"""
logging.info("Clicking save button")
pyautogui.hotkey('ctrl', 's')
time.sleep(0.5)
# Check for common dialogs that might appear
dialogs_to_check = [
("Confirm Save", handle_confirm_save_dialog),
("Warning", handle_warning_dialog),
("Duplicate Entry", handle_duplicate_dialog),
("Connection Lost", handle_connection_lost_dialog),
]
for dialog_title, handler in dialogs_to_check:
if check_if_window_exists(dialog_title):
logging.warning(f"Unexpected dialog appeared: {dialog_title}")
handler()
return
# If we get here, save completed without dialogs
logging.info("Save completed successfully")
Create a "dialog library": As you discover new popups during testing, add them to a shared handler module. Over time, you'll build a comprehensive collection of edge case handlers.
# dialog_handlers.py
def handle_network_timeout_dialog():
"""Handles 'Network Timeout' dialog by retrying"""
pyautogui.press('enter') # Click OK
time.sleep(2)
return "retry"
def handle_data_validation_error_dialog():
"""Handles validation errors by logging and skipping"""
pyautogui.press('escape') # Close dialog
return "skip"
# Add more as you discover them...
Screenshot everything during stress testing: When weird dialogs appear, you need to see them. Automatically capture screenshots during test runs:
def stress_test_with_monitoring():
for i in range(500):
try:
run_automation()
except Exception as e:
screenshot_path = f"stress_test_error_{i}_{timestamp}.png"
pyautogui.screenshot(screenshot_path)
logging.error(f"Run {i} failed: {e}")
logging.error(f"Screenshot saved to {screenshot_path}")
Review these screenshots after your stress test. You'll discover dialogs you never knew existed.
Add generous sleep delays: That time.sleep(0.5)
after every action? It's not optional. Applications need time to respond, especially legacy ones. Better to have a slow, reliable script than a fast one that randomly fails.
Log everything: Wrap every action in logging. When it breaks at 3 AM, you'll want to know exactly which step failed.
import logging
logging.info("Opening application...")
open_application()
logging.info("Application opened successfully")
logging.info("Logging in...")
login(user, pass)
logging.info("Login successful")
Take screenshots on failure: When something goes wrong, automatically capture what's on screen.
try:
run_automation()
except Exception as e:
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
pyautogui.screenshot(f'error_{timestamp}.png')
logging.error(f"Automation failed: {e}")
raise
Test on a fresh VM regularly: Your development machine has the application pre-configured, windows in perfect positions, and muscle memory for what "should" happen. Test on a clean machine to catch assumption errors.
Building RPA is hard because you're bridging the gap between deterministic code and the chaotic world of GUI applications. But by following these guidelines especially starting with a clean state, breaking scripts into verbose steps, and stress-testing relentlessly you can build automations that actually work reliably in production.
The path will still be hard to figure out. You'll still encounter bizarre dialogs and inexplicable failures. But with these techniques, you'll spend less time debugging and more time actually automating.
Remember: RPA is not elegant. It's duct tape and determination. Embrace the chaos, write defensive code, and keep your Jupyter notebook handy.