If you've built RPA scripts for Windows applications, you've encountered this infuriating scenario: you open up your UI automation library (UIAutomation, pywinauto, or similar), inspect the window structure, and... half the elements are missing. Or they're there, but not accessible. Or they're accessible, but not where you think they are.
Welcome to the wonderful world of the Windows "DOM" tree, where not everything that's visible on screen is actually visible to your automation tools.
Modern web developers have it easy. The browser DOM is standardized, inspectable, and predictable. Almost every element you see on screen exists in the DOM tree (canvas-drawn content is the main exception), and you can access it programmatically.
Windows applications? Not so much.
The Windows accessibility tree (what UI automation libraries actually see) is a messy approximation of what's rendered on screen. Applications built with different frameworks expose their UI elements differently or sometimes not at all. A button you can clearly see and click might be completely invisible to your automation library because the developer didn't implement proper accessibility interfaces.
Common scenarios where UI elements are invisible:
You'll open UIAutomationSpy or Inspect.exe, hover over a perfectly visible button, and get... nothing. No element. No properties. Just a blank space where your automation dreams go to die.
When you can't see the element in the automation tree, remember that Windows still knows about it. The operating system manages focus and tab order for accessibility, even when automation libraries can't see the controls.
The Tab key is your friend:
Tab and Shift+Tab move focus between interactive elements in a predictable order. Even if you can't programmatically identify or click a button, you can usually Tab to it and press Enter.
import pyautogui
import time

def navigate_to_save_button_via_tabs():
    """Navigate to the Save button by tabbing, even though we can't see it in the UI tree"""
    # Start from a known position (e.g., first field in the form)
    pyautogui.click(100, 100)  # Click somewhere safe to reset focus
    time.sleep(0.3)

    # Tab to the Save button (discovered through manual testing)
    # In this application, Save is 7 tabs from the start
    for _ in range(7):
        pyautogui.press('tab')
        time.sleep(0.1)

    # Now we're focused on Save
    pyautogui.press('enter')
    time.sleep(0.5)
Discovery process: Manually tab through the application while counting. Document the tab order:
Tab 0: Customer ID field
Tab 1: Customer Name field
Tab 2: Address field
Tab 3: City field
Tab 4: State dropdown
Tab 5: Zip field
Tab 6: Cancel button
Tab 7: Save button ← Our target
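The tab-order table above can also live in code, so the script computes how many Tab or Shift+Tab presses it needs instead of hard-coding a count. This is a minimal sketch, assuming the order documented above and that focus does not wrap around; the field names and the `focus_steps` helper are illustrative, not part of any library.

```python
# Tab order discovered by manual testing
# (index = number of Tab presses from the first field).
TAB_ORDER = [
    "customer_id", "customer_name", "address", "city",
    "state", "zip", "cancel", "save",
]

def focus_steps(current, target, order=TAB_ORDER):
    """Return the key chords needed to move focus from `current` to `target`.

    Each chord is a tuple of key names: ("tab",) moves forward and
    ("shift", "tab") moves backward. Raises ValueError for unknown names.
    """
    offset = order.index(target) - order.index(current)
    chord = ("tab",) if offset >= 0 else ("shift", "tab")
    return [chord] * abs(offset)
```

Each chord can then be sent with `pyautogui.hotkey(*chord)`, with a short pause between presses. If a new release changes the tab order, only `TAB_ORDER` needs updating.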
Pro tips for tabbing:
Use Shift+Tab to go backwards if you overshoot your target.
Press Ctrl+Home (or, in tabbed dialogs, Ctrl+Tab) to try to return to a known starting point in a form or tab set. Not every application honors these, so verify before relying on them.
Some applications let you press Alt to highlight the currently focused element, helping you confirm you're in the right place.
Tab order usually follows visual top-to-bottom, left-to-right layout, but not always. Test it yourself.
Advantages:
Disadvantages:
Legacy Windows applications are keyboard shortcut goldmines. Many of them date from an era when power users lived on the keyboard, and developers built comprehensive shortcut systems to accommodate them.
These shortcuts often work even when UI elements are completely invisible to automation tools, because they're handled at a different layer of the application.
Where to find keyboard shortcuts:
The Help menu: Look for "Keyboard Shortcuts" or "Hotkeys" documentation. Many applications have comprehensive lists.
Menu text: Watch for underlined letters. In "File → Save", the S is underlined, meaning Alt+F, then S triggers it.
Tooltips: Hover over buttons and menu items. Tooltips often show keyboard shortcuts like Ctrl+S or F5.
User manuals: Old-school applications came with PDF manuals. Download them and search for "keyboard" or "shortcut".
Trial and error: Try common patterns. Ctrl+S for Save, Ctrl+N for New, Ctrl+P for Print, F1 for Help, F5 for Refresh.
Online communities: Search "[Application Name] keyboard shortcuts" or check forums where long-time users share their workflows.
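Help files and tooltips usually write shortcuts as text such as "Ctrl+Shift+S" or "Alt+F, E". A small parser can turn those strings into key steps for a script. This is a sketch under assumptions: the `+` and `,` notation is a common convention rather than a standard, and `parse_shortcut` and its alias table are hypothetical helpers you would extend for your application's documentation.

```python
# Map key names as they appear in documentation to lowercase key names.
# The alias table is illustrative; extend it as you meet new spellings.
KEY_ALIASES = {"control": "ctrl", "del": "delete", "escape": "esc", "return": "enter"}

def parse_shortcut(text):
    """Parse 'Alt+F, E' into [('alt', 'f'), ('e',)].

    Commas separate steps pressed one after another; '+' joins keys
    that are held together within a step.
    """
    steps = []
    for step in text.split(","):
        keys = [k.strip().lower() for k in step.split("+") if k.strip()]
        if keys:
            steps.append(tuple(KEY_ALIASES.get(k, k) for k in keys))
    return steps
```

Each step can be sent in order with `pyautogui.hotkey(*step)`, pausing briefly between steps so menus have time to open. Shortcuts that contain a literal plus or comma key would need special handling that this sketch leaves out.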
Example shortcut discovery:
from datetime import datetime
import time

import pyautogui

# Instead of trying to find and click invisible elements
# (the three helpers below are placeholders for UI-tree lookups that fail):
def export_report_the_hard_way():
    find_file_menu()       # Might not be visible to automation
    find_export_option()   # Definitely not visible
    click_export_button()  # Good luck with that

# Use keyboard shortcuts instead:
def export_report_via_keyboard():
    """Export report using discovered keyboard shortcuts"""
    # Alt+F opens File menu (even if menu is custom-drawn)
    pyautogui.hotkey('alt', 'f')
    time.sleep(0.2)

    # E triggers Export option (discovered from menu letter)
    pyautogui.press('e')
    time.sleep(0.3)

    # Tab to filename field and enter name
    pyautogui.press('tab')
    pyautogui.write(f'report_{datetime.now().strftime("%Y%m%d")}.csv')

    # Enter to confirm
    pyautogui.press('enter')
    time.sleep(1)
Advanced shortcut techniques:
Function keys: Many applications map F1-F12 to common actions. F2 often means "edit", F5 "refresh", F12 "save as".
Ctrl+Tab: Switches between tabs or panes in multi-panel applications.
Alt+Number: Some applications let you jump to the Nth tab with Alt+1, Alt+2, etc.
Ctrl+End / Ctrl+Home: Jump to the end or beginning of lists, documents, or forms.
Spacebar: Toggles checkboxes, activates focused buttons, or opens dropdowns.
Creating your own shortcut reference:
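# shortcuts.py - Document everything you discover
import time

import pyautogui

# Each shortcut is a list of steps. A tuple is a chord (keys held together);
# a plain string is a single key press. A menu path such as Alt+C, then O is
# two steps, because the letter is pressed after Alt+C has opened the menu.
LEGACY_ERP_SHORTCUTS = {
    'open_customer': [('alt', 'c'), 'o'],         # Alt+C (Customer menu), O (Open)
    'save': [('ctrl', 's')],
    'save_and_close': [('ctrl', 'shift', 's')],
    'export_to_excel': [('alt', 'f'), 'e', 'x'],  # File → Export → Excel
    'print': [('ctrl', 'p')],
    'find': [('ctrl', 'f')],
    'next_record': ['f8'],
    'previous_record': ['f7'],
    'delete_current': ['delete'],
    'refresh_list': ['f5'],
}

def execute_shortcut(shortcut_name):
    """Execute a documented keyboard shortcut"""
    steps = LEGACY_ERP_SHORTCUTS[shortcut_name]  # KeyError on an undocumented name
    for step in steps:
        if isinstance(step, tuple):
            pyautogui.hotkey(*step)
        else:
            pyautogui.press(step)
        time.sleep(0.3)  # Wait for the menu or action to respond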
This creates a maintainable library of shortcuts that work regardless of what your UI automation tools can see.
When UI elements are invisible, you can't inspect their values programmatically. But you can still copy them.
The clipboard technique becomes even more critical when dealing with inaccessible UI elements, because it's one of the few reliable ways to extract data.
Extracting data from invisible lists:
Imagine you need to extract 100 customer records from a list view, but the list control isn't accessible to your automation library. You can see it, but your code can't read it.
The clipboard solution:
import time

import pyautogui
import pyperclip

def extract_customer_list_via_clipboard():
    """Extract all customers from an invisible list control"""
    customers = []

    # Click into the list area (coordinate-based, but just to focus)
    pyautogui.click(400, 300)
    time.sleep(0.3)

    # Go to the top of the list
    pyautogui.hotkey('ctrl', 'home')
    time.sleep(0.2)

    for i in range(100):  # Assuming max 100 customers
        # Select the current row (Shift+End works in some grids;
        # confirm the selection key for your control)
        pyautogui.hotkey('shift', 'end')
        time.sleep(0.1)

        # Copy to clipboard
        pyautogui.hotkey('ctrl', 'c')
        time.sleep(0.1)

        # Extract from clipboard
        row_data = pyperclip.paste()

        # Check if we've hit the end (empty or duplicate data)
        if not row_data or row_data in customers:
            break

        customers.append(row_data)
        print(f"Extracted row {i+1}: {row_data}")

        # Move to next row
        pyautogui.press('down')
        time.sleep(0.1)

    return customers
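One weakness of stopping on any previously seen row is that two customers with identical data end the loop early. A hedged alternative is to stop only when the same text comes back several times in a row, which suggests the cursor has stopped moving. The sketch below assumes you wrap the key presses in two callables; `copy_current_row` and `move_down` are hypothetical names, injected so the loop can be exercised without a live application.

```python
def collect_rows(copy_current_row, move_down, max_rows=100, stall_limit=2):
    """Collect rows until the list stops advancing.

    copy_current_row: callable returning the current row's text ('' if nothing copied).
    move_down: callable that moves the selection to the next row.
    stall_limit: consecutive repeats of the last row that count as end of list.
    """
    rows = []
    pending = 0  # repeats of rows[-1] not yet confirmed as genuine rows
    for _ in range(max_rows):
        text = copy_current_row()
        if not text:
            break
        if rows and text == rows[-1]:
            pending += 1
            if pending >= stall_limit:
                break  # the cursor appears stuck on the last row
        else:
            # The list kept moving, so earlier repeats were real duplicate rows.
            if pending:
                rows.extend([rows[-1]] * pending)
                pending = 0
            rows.append(text)
        move_down()
    return rows
```

In real use the two callables would wrap the select, copy, and Down-arrow key presses shown above. Identical rows at the very end of the list remain indistinguishable from a stuck cursor, so a unique ID column is still the safest thing to compare when the application copies one.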
Parsing clipboard data:
Often, copied data comes with tab separators or specific formatting:
def parse_customer_row(clipboard_text):
    """Parse a customer row copied from the invisible list"""
    # Example format: "12345\tJohn Smith\tjohn@example.com\t555-1234"
    parts = clipboard_text.split('\t')
    if len(parts) >= 4:
        return {
            'customer_id': parts[0],
            'name': parts[1],
            'email': parts[2],
            'phone': parts[3]
        }
    return None
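Some grids copy several selected rows at once, one line per row with tab-separated cells. Whether a given application does this is worth confirming by pasting into a text editor first. Assuming that format, the standard `csv` module can do the splitting; the column names below are the same illustrative ones used in the example above, and `parse_customer_rows` is a hypothetical helper.

```python
import csv
import io

COLUMNS = ["customer_id", "name", "email", "phone"]

def parse_customer_rows(clipboard_text, columns=COLUMNS):
    """Parse tab-separated clipboard text into a list of dicts, one per row.

    Rows with fewer cells than `columns` are skipped rather than guessed at.
    """
    reader = csv.reader(io.StringIO(clipboard_text, newline=""), delimiter="\t")
    records = []
    for cells in reader:
        if len(cells) < len(columns):
            continue  # blank or truncated line
        records.append({name: cell.strip() for name, cell in zip(columns, cells)})
    return records
```

Skipped rows are dropped silently here; in a production script you would probably log them so that truncated copies are noticed.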
Clipboard data entry:
The clipboard works both ways. If you can't access input fields directly, you can often paste data into them:
def enter_customer_data_via_clipboard(customer_data):
    """Enter customer information when fields aren't accessible"""
    # Navigate to customer name field (via Tab)
    for _ in range(3):  # Name field is 3rd in tab order
        pyautogui.press('tab')
        time.sleep(0.1)

    # Copy data to clipboard and paste
    pyperclip.copy(customer_data['name'])
    pyautogui.hotkey('ctrl', 'v')
    time.sleep(0.2)

    # Move to next field
    pyautogui.press('tab')
    time.sleep(0.1)

    # Enter email via clipboard
    pyperclip.copy(customer_data['email'])
    pyautogui.hotkey('ctrl', 'v')
    time.sleep(0.2)
Why clipboard works when automation fails:
The clipboard operates at the OS level, completely independent of the application's UI framework. Even if the application uses custom controls that hide from UI automation, it almost always respects standard clipboard operations like Ctrl+C and Ctrl+V.
Clipboard safety tips:
Always clear the clipboard before copying to ensure you get fresh data:
pyperclip.copy('')
pyautogui.hotkey('ctrl', 'c')
time.sleep(0.1)
data = pyperclip.paste()
Wait briefly after copying before reading the clipboard. Some applications take a moment to populate it.
Store the original clipboard contents and restore them after your script:
original_clipboard = pyperclip.paste()
# ... do your clipboard operations ...
pyperclip.copy(original_clipboard) # Restore
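The tips above can be folded into two small helpers: one that polls the clipboard until it changes instead of sleeping for a fixed time, and one that restores the previous contents even if the script fails partway. This is a sketch; `wait_for_clipboard` and `preserved_clipboard` are hypothetical helpers, and the `read` and `write` callables are assumed to be `pyperclip.paste` and `pyperclip.copy` in real use.

```python
import time
from contextlib import contextmanager

def wait_for_clipboard(read, previous="", timeout=2.0, interval=0.05):
    """Poll `read()` until it returns text different from `previous`.

    Returns the new text, or None if nothing changed within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        text = read()
        if text != previous:
            return text
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)

@contextmanager
def preserved_clipboard(read, write):
    """Save the clipboard text on entry and write it back on exit."""
    original = read()
    try:
        yield
    finally:
        write(original)
```

A typical use would be a `with preserved_clipboard(pyperclip.paste, pyperclip.copy):` block that clears the clipboard, sends Ctrl+C, and then calls `wait_for_clipboard(pyperclip.paste)`. Only text is preserved this way; images or files on the clipboard are not restored.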
When tabbing, keyboard shortcuts, and clipboard tricks all fail, you're left with OCR (Optical Character Recognition). It's slow, it's unreliable, but sometimes it's the only option.
If you must use OCR, at least maximize your chances of success.
Zoom in before OCR:
Text size is the #1 factor in OCR accuracy. Small fonts with anti-aliasing are OCR nightmares. Large, crisp text is much more reliable.
import time

import pyautogui
import pytesseract
from PIL import ImageEnhance

def extract_invoice_number_via_ocr():
    """Extract invoice number when no other method works"""
    # First, zoom in on the application if possible
    # Many applications support Ctrl+Plus to zoom
    # (pyautogui names the key '+'; 'plus' is not a valid key name)
    for _ in range(3):  # Zoom in 3 levels
        pyautogui.hotkey('ctrl', '+')
        time.sleep(0.3)

    # Wait for re-render
    time.sleep(1)

    # Take screenshot of the specific region where invoice number appears
    # Smaller region = faster processing and better accuracy
    screenshot = pyautogui.screenshot(region=(800, 200, 300, 50))

    # Convert to grayscale for better OCR
    screenshot = screenshot.convert('L')

    # Increase contrast
    enhancer = ImageEnhance.Contrast(screenshot)
    screenshot = enhancer.enhance(2.0)

    # Run OCR
    invoice_number = pytesseract.image_to_string(screenshot, config='--psm 7')

    # Clean up the result
    invoice_number = invoice_number.strip()

    # Zoom back out
    for _ in range(3):
        pyautogui.hotkey('ctrl', '-')
        time.sleep(0.3)

    return invoice_number
OCR optimization techniques:
Isolate the target: Screenshot only the specific area containing the text you need, not the entire screen.
Grayscale conversion: Color adds noise. Convert to grayscale before OCR.
Contrast enhancement: Make dark text darker and light backgrounds lighter.
Page segmentation modes: Tesseract's --psm parameter controls how it interprets the image:
--psm 6: Assume a single uniform block of text (the default is --psm 3, fully automatic page segmentation)
--psm 7: Treat the image as a single text line
--psm 8: Treat the image as a single word
--psm 10: Treat the image as a single character
Whitelist characters: If you know the text only contains certain characters, restrict OCR:
# Only digits for invoice numbers
result = pytesseract.image_to_string(img, config='--psm 7 -c tessedit_char_whitelist=0123456789')
Multiple OCR passes: Take several screenshots with slight delays, run OCR on each, and use voting or confidence scores to pick the best result.
Validate results: OCR makes mistakes. Implement validation:
import logging
import re

def extract_and_validate_invoice_number():
    """Extract invoice number with validation"""
    raw_ocr = extract_invoice_number_via_ocr()

    # Validate format (example: invoice numbers are always 6 digits)
    match = re.search(r'\b\d{6}\b', raw_ocr)
    if match:
        return match.group(0)

    # OCR failed, try alternative method or flag for human review
    logging.warning(f"OCR returned invalid invoice number: {raw_ocr!r}")
    return None
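The multiple-pass idea from the list above can be sketched as a majority vote over several readings, counting only readings that pass format validation. The six-digit pattern matches the validation example above; `vote_on_readings` and its `min_votes` threshold are illustrative assumptions, not part of pytesseract.

```python
import re
from collections import Counter

def vote_on_readings(readings, pattern=r"\b\d{6}\b", min_votes=2):
    """Return the most common valid value among OCR readings, or None.

    readings: raw OCR strings from repeated screenshots of the same region.
    A value must match `pattern` and appear at least `min_votes` times.
    """
    candidates = []
    for raw in readings:
        match = re.search(pattern, raw)
        if match:
            candidates.append(match.group(0))
    if not candidates:
        return None
    value, votes = Counter(candidates).most_common(1)[0]
    return value if votes >= min_votes else None
```

Returning None when no value reaches the threshold keeps a lone, possibly misread result from slipping through; those cases can go to an alternative method or to human review.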
When to use AI-powered OCR:
Modern AI OCR services (Google Cloud Vision, AWS Textract, Azure Computer Vision) are significantly more accurate than Tesseract for complex layouts, but they add latency and cost.
Consider AI OCR when:
import pyautogui
from google.cloud import vision

def extract_text_with_google_vision():
    """Use Google Cloud Vision for better OCR accuracy"""
    client = vision.ImageAnnotatorClient()

    # Take screenshot
    screenshot = pyautogui.screenshot(region=(800, 200, 300, 50))
    screenshot.save('temp_ocr.png')

    # Read image
    with open('temp_ocr.png', 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)
    response = client.text_detection(image=image)
    texts = response.text_annotations

    if texts:
        return texts[0].description
    return None
OCR is still a last resort:
Even with all these optimizations, OCR is:
Exhaust all other options (Tab navigation, keyboard shortcuts, clipboard) before resorting to OCR.
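That ordering can be made explicit as a fallback chain: try each strategy in turn and stop at the first one that returns a usable value. This is a minimal sketch, assuming each strategy is a zero-argument callable that returns a value or None; the function name and the strategy labels are hypothetical.

```python
import logging

def first_successful(strategies, is_valid=bool):
    """Try (name, callable) pairs in order; return (name, value) of the first success.

    A strategy fails if it raises or if `is_valid(value)` is false.
    Returns (None, None) when every strategy fails.
    """
    for name, strategy in strategies:
        try:
            value = strategy()
        except Exception as exc:  # keep going; the next strategy may still work
            logging.warning("Strategy %s raised %s", name, exc)
            continue
        if is_valid(value):
            return name, value
        logging.info("Strategy %s returned no usable value", name)
    return None, None
```

Ordering the list from cheapest and most reliable (clipboard) to slowest (OCR) keeps OCR as the last resort, and the returned name shows in your logs how often each fallback was actually needed.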
Let's see how these strategies work together in practice.
Scenario: Extract customer data from a legacy CRM where the customer list control is completely invisible to UI automation libraries.
import logging
import time

import pyautogui
import pyperclip
import pytesseract

def extract_all_customers_from_legacy_crm():
    """
    Combined approach to extract customer data when UI automation fails
    """
    # Step 1: Navigate to customer list using keyboard shortcuts
    logging.info("Opening customer list")
    pyautogui.hotkey('alt', 'c')  # Customer menu
    time.sleep(0.2)
    pyautogui.press('l')  # List option
    time.sleep(1)

    # Step 2: Ensure we're at the top of the list
    pyautogui.hotkey('ctrl', 'home')
    time.sleep(0.3)

    customers = []
    max_customers = 500

    for i in range(max_customers):
        # Step 3: Try clipboard extraction first (fastest, most reliable)
        try:
            # Select current row
            pyautogui.hotkey('shift', 'end')
            time.sleep(0.1)

            # Copy to clipboard
            pyperclip.copy('')  # Clear first
            pyautogui.hotkey('ctrl', 'c')
            time.sleep(0.1)
            customer_data = pyperclip.paste()

            # If clipboard worked, parse and store
            if customer_data and customer_data not in [c['raw'] for c in customers]:
                parsed = parse_customer_row(customer_data)
                if parsed:
                    customers.append(parsed)
                    logging.info(f"Extracted customer {i+1} via clipboard: {parsed['name']}")
                else:
                    # Clipboard gave us data but it's unparseable
                    # Fall back to OCR for this specific row
                    logging.warning(f"Clipboard data unparseable, trying OCR for row {i+1}")
                    customer_data = extract_customer_via_ocr()
                    if customer_data:
                        customers.append(customer_data)
            else:
                # Reached end of list or duplicate
                break
        except Exception as e:
            logging.error(f"Error on row {i+1}: {e}")
            # If clipboard totally fails, try OCR
            try:
                customer_data = extract_customer_via_ocr()
                if customer_data:
                    customers.append(customer_data)
            except Exception:
                logging.error(f"OCR also failed on row {i+1}, skipping")

        # Move to next row using keyboard
        pyautogui.press('down')
        time.sleep(0.1)

    logging.info(f"Extracted {len(customers)} customers total")
    return customers

def parse_customer_row(clipboard_text):
    """Parse customer data from clipboard (Tab-separated)"""
    parts = clipboard_text.split('\t')
    if len(parts) >= 3:
        return {
            'raw': clipboard_text,
            'id': parts[0].strip(),
            'name': parts[1].strip(),
            'email': parts[2].strip(),
        }
    return None

def extract_customer_via_ocr():
    """Fallback OCR extraction for when clipboard fails"""
    # Zoom in for better OCR (pyautogui names the keys '+' and '-')
    pyautogui.hotkey('ctrl', '+')
    time.sleep(0.3)

    # Screenshot the current row area
    screenshot = pyautogui.screenshot(region=(100, 300, 800, 30))

    # Optimize for OCR
    screenshot = screenshot.convert('L')

    # Run OCR
    text = pytesseract.image_to_string(screenshot, config='--psm 7')

    # Zoom back out
    pyautogui.hotkey('ctrl', '-')
    time.sleep(0.3)

    # Parse and return
    if text.strip():
        return {'raw': text.strip(), 'name': text.strip()}
    return None
This approach:
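The Windows UI automation tree is a mess, and you'll frequently encounter elements that are visible on screen but invisible to your automation tools. When this happens, work through the fallbacks in order: Tab navigation, keyboard shortcuts, the clipboard, and OCR only as a last resort.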
The frustrating reality is that Windows RPA requires you to work around limitations constantly. But with these techniques, you can automate even the most stubbornly inaccessible applications.
Document every shortcut you discover. Build a library of clipboard extraction patterns. Create reusable functions for OCR with proper pre-processing. Over time, you'll build a toolkit that handles whatever invisible UI horrors legacy applications throw at you.
And remember: if you can see it and click it manually, there's a way to automate it. It might not be elegant, but it's possible.