Simulating mouse and keyboard input in third-party applications

Simulating a key press, a key combination, or even a mouse click in your own application is relatively easy: the SendKeys class is enough for the application to receive keyboard input, and explicitly invoking the handler of the corresponding mouse event simulates a click.

However, if we want to build an application whose job is to perform these operations on a third-party application, we must resort to unmanaged code.

Managed and unmanaged code

Managed code is code that runs under the control of the CLR (Common Language Runtime): code written in a .NET language and compiled into intermediate language (called MSIL or CIL; both names refer to the same thing), which is then compiled into native code at run time. This is the usual .NET workflow, which I assume most readers are familiar with.

Unmanaged code, by contrast, is code that lives outside this life cycle, such as COM/COM+ components, ActiveX controls or the Windows API (which is the focus of this article).

Windows exposes its API through a set of dynamic-link libraries (DLLs), such as user32.dll (windows and their messages), shell32.dll (Windows Shell functions), winspool.drv (printing)… To drive the keyboard and mouse in third-party applications we’ll focus on user32.dll, which is, of course, unmanaged code.

Using unmanaged code

Calling a function from an unmanaged library requires at least two steps:

Declare the method with the same signature as the original function (that is, as it is exported by the DLL), adding the extern modifier. This modifier tells the compiler that the method is implemented outside our code; the next step says where that implementation lives.
Decorate the declaration with the DllImport attribute, passing the name of the file where the function is implemented.

Thus, if we want to use the FindWindow function, which returns a handle to a specific window and is exported by user32.dll, we would write the following code:

[DllImport("user32.dll")]
public static extern IntPtr FindWindow(string lpClassName, string lpWindowName);

As explained before, user32.dll contains the implementation of the Windows API functions related to system windows and their events. You can find a detailed list of the functions exposed by this library in this link.

This way, if we want to get a handle to a window whose class or name is known, it is enough to invoke the method as follows:


IntPtr handle;
string className =
    System.Configuration.ConfigurationManager.AppSettings["class"];
string windowName =
    System.Configuration.ConfigurationManager.AppSettings["window"];

handle = FindWindow(className, windowName);

if (handle == IntPtr.Zero) {
  MessageBox.Show("Window not found.");
  return;
}

In the code above, we assume that the class name and the window name (both strings) are stored as settings in the app.config file. However, we’re missing a small detail: where do we get these (class and window) names? Even though the API has functions to perform this exploration, we’ll use one of the Visual Studio tools: Spy++.
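For completeness, the API-based exploration mentioned above could be sketched as follows. This is only an illustration, not part of the original walkthrough: the WindowLister class and its ListVisibleWindows method are hypothetical names, while EnumWindows, IsWindowVisible, GetClassName and GetWindowText are real user32.dll functions.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper: lists class name and caption of visible top-level windows.
static class WindowLister
{
    private delegate bool EnumWindowsProc(IntPtr hWnd, IntPtr lParam);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool EnumWindows(EnumWindowsProc lpEnumFunc, IntPtr lParam);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool IsWindowVisible(IntPtr hWnd);

    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    private static extern int GetClassName(IntPtr hWnd, StringBuilder lpClassName, int nMaxCount);

    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    private static extern int GetWindowText(IntPtr hWnd, StringBuilder lpString, int nMaxCount);

    // Returns one "class | caption" entry per visible top-level window.
    public static List<string> ListVisibleWindows()
    {
        var result = new List<string>();
        EnumWindowsProc callback = (hWnd, lParam) =>
        {
            if (IsWindowVisible(hWnd))
            {
                var className = new StringBuilder(256);
                var caption = new StringBuilder(512);
                GetClassName(hWnd, className, className.Capacity);
                GetWindowText(hWnd, caption, caption.Capacity);
                result.Add(className + " | " + caption);
            }
            return true; // returning true keeps the enumeration going
        };
        EnumWindows(callback, IntPtr.Zero);
        GC.KeepAlive(callback); // keep the delegate alive during the native call
        return result;
    }
}
```

Printing each entry of WindowLister.ListVisibleWindows() gives a quick text listing of class names and captions, although Spy++ remains more convenient for inspecting a single window.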

Spy++

This tool is usually located in the Visual Studio Tools folder. It may need to run with administrator privileges, since it accesses the metadata of every open window in our session, including the content of the messages they send to and receive from each other.

As soon as we launch the application, we’ll see something like the following:

Let’s imagine that we want to obtain the class and/or name of a particular window. To do so, we select the Spy > Find Window option, which opens a dialog similar to the following one:

If we press and hold the left mouse button on the finder icon and drag it, the dialog shows information about whichever window the pointer is over. For example:

As we can see, we have a window whose class is Notepad++ and whose name (caption) is “C:\Users\Dani\Desktop\config.ini – Notepad++”.

Given the class name or the window name, FindWindow lets us get a handle to the window and thereby perform operations on it.

Importing API methods

Our intention is to use the keyboard and mouse on a particular window. We’ve already seen how to get a window handle; now we need to find out what to do with it. We want to make the selected window active, bringing it to the foreground in order to interact with it. We’ll use the SetForegroundWindow function to do so. Let’s import it as follows:

[DllImport("user32.dll", CharSet = CharSet.Unicode)]
public static extern bool SetForegroundWindow(IntPtr hWnd);
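SetForegroundWindow returns false when Windows refuses the request, and a minimized window may need to be restored before it can be interacted with. A minimal sketch of a defensive wrapper follows; the WindowActivator class and its Activate method are hypothetical names introduced for illustration, while IsIconic and ShowWindow are real user32.dll functions.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: restores a minimized window, then asks for the foreground.
static class WindowActivator
{
    private const int SW_RESTORE = 9; // ShowWindow command: restore a minimized window

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool IsIconic(IntPtr hWnd);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool ShowWindow(IntPtr hWnd, int nCmdShow);

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool SetForegroundWindow(IntPtr hWnd);

    // Returns true only if Windows accepted the foreground request.
    public static bool Activate(IntPtr hWnd)
    {
        if (hWnd == IntPtr.Zero)
        {
            return false; // nothing to activate
        }
        if (IsIconic(hWnd))
        {
            ShowWindow(hWnd, SW_RESTORE);
        }
        return SetForegroundWindow(hWnd);
    }
}
```

Checking the returned value before sending any input avoids clicking or typing into whichever window happens to be active instead.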

Next, we’ll need a set of functions that let us interact with the mouse buttons. First of all, though, we need to define some constants that encode the flags for the different mouse events. These flags are the following:

Mouse moving.
Left button press.
Left button release.
Right button press.
Right button release.
Treat the given coordinates as absolute. If this flag is not included, the X and Y values are interpreted as an offset relative to the current position of the mouse pointer.

These constants are predefined, and their values are as follows:

private const uint MOUSEEVENTF_MOVE = 0x0001;
private const uint MOUSEEVENTF_LEFTDOWN = 0x0002;
private const uint MOUSEEVENTF_LEFTUP = 0x0004;
private const uint MOUSEEVENTF_RIGHTDOWN = 0x0008;
private const uint MOUSEEVENTF_RIGHTUP = 0x0010;
private const uint MOUSEEVENTF_ABSOLUTE = 0x8000;

The functions we’ll use to handle the mouse are the following: one to place the pointer at a given position (SetCursorPos) and another one to raise a mouse event (mouse_event).

[DllImport("user32.dll", CharSet = CharSet.Unicode)]
static extern bool SetCursorPos(uint x, uint y);

[DllImport("user32.dll", CharSet = CharSet.Unicode,
           CallingConvention = CallingConvention.StdCall)]
public static extern void mouse_event(uint dwFlags, uint dx, uint dy,
                                      uint cButtons, UIntPtr dwExtraInfo);

Using the mouse

To perform a mouse action over a particular window, the following method should be enough:

private void performClick(uint x, uint y)
{
    SetCursorPos(x, y);
    mouse_event(MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_LEFTDOWN, x, y, 0, UIntPtr.Zero);
    Thread.Sleep(200);
    mouse_event(MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_LEFTUP, x, y, 0, UIntPtr.Zero);
}

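Calling SetCursorPos(x, y) is roughly equivalent to sending an absolute move event. Note that with MOUSEEVENTF_ABSOLUTE, mouse_event expects coordinates normalized to the range 0 to 65535 rather than pixels, so they must be scaled first:

private void moveToPos(uint x, uint y)
{
    // Scale pixel coordinates on the primary screen to the 0..65535 range.
    int width = Screen.PrimaryScreen.Bounds.Width;
    int height = Screen.PrimaryScreen.Bounds.Height;
    mouse_event(MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_MOVE, (uint)(x * 65535 / (width - 1)),
                (uint)(y * 65535 / (height - 1)), 0, UIntPtr.Zero);
}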

This way, we place the pointer at the (x, y) position, send the MOUSEEVENTF_LEFTDOWN event (left button press), wait 200 milliseconds and then send the MOUSEEVENTF_LEFTUP event (left button release). This sequence is equivalent to a regular left-button click. A double click is similar: we call the method twice, keeping the pause short enough for the second press to arrive within the system double-click time (500 milliseconds by default):

private void performDoubleClick(uint x, uint y)
{
    performClick(x, y);
    // 200 ms hold + 100 ms pause stays under the default 500 ms double-click time.
    Thread.Sleep(100);
    performClick(x, y);
}
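Since users can change the double-click time, a hard-coded pause is only a guess. The sketch below derives the pause from the value reported by the real GetDoubleClickTime function in user32.dll; the DoubleClickTiming class, its method names and the "half of the interval" budget are assumptions made for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: picks a pause between the two clicks of a simulated double click.
static class DoubleClickTiming
{
    [DllImport("user32.dll")]
    private static extern uint GetDoubleClickTime();

    // Pure calculation: aim for the second press at half the double-click time,
    // after subtracting the time the button was held during the first click.
    public static int GapBetweenClicks(uint doubleClickTimeMs, int holdMs)
    {
        long budget = (long)doubleClickTimeMs / 2 - holdMs;
        return (int)Math.Max(0L, budget);
    }

    // Same calculation using the current system setting (Windows only).
    public static int SystemGap(int holdMs)
    {
        return GapBetweenClicks(GetDoubleClickTime(), holdMs);
    }
}
```

With the 200 ms hold used by performClick, the pause in performDoubleClick could then be written as Thread.Sleep(DoubleClickTiming.SystemGap(200)).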

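Hence, the code required to double-click the screen point (200, 400) once the window whose class is “Notepad++” has been brought to the foreground would be as follows (note that SetCursorPos works with screen coordinates, not with coordinates relative to the window):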

handle = FindWindow("Notepad++", null);
 
if (handle == IntPtr.Zero)
{
    MessageBox.Show("Window not found.");
    return;
}
 
SetForegroundWindow(handle);
performDoubleClick(200, 400);
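Because SetCursorPos works with screen coordinates, clicking a point measured from the window’s own client area requires a conversion first. A minimal sketch using the real ClientToScreen function from user32.dll follows; the ClientCoordinates class, the POINT struct declaration and the TryToScreen method are hypothetical names introduced for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: converts client-area coordinates of a window to screen coordinates.
static class ClientCoordinates
{
    [StructLayout(LayoutKind.Sequential)]
    public struct POINT
    {
        public int X;
        public int Y;
    }

    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool ClientToScreen(IntPtr hWnd, ref POINT lpPoint);

    // On success, 'screen' holds the screen position of (clientX, clientY).
    // On failure, 'screen' keeps the client coordinates unchanged.
    public static bool TryToScreen(IntPtr hWnd, int clientX, int clientY, out POINT screen)
    {
        screen = new POINT { X = clientX, Y = clientY };
        if (hWnd == IntPtr.Zero)
        {
            return false; // no window to convert against
        }
        return ClientToScreen(hWnd, ref screen);
    }
}
```

If the call succeeds, the resulting point can be passed on as performDoubleClick((uint)p.X, (uint)p.Y), assuming the window lies on a monitor with non-negative coordinates.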

Typing

Simulating typing is much easier. It’s enough to use the SendKeys class to send text to the active window. So, if we want to type “Hello, world”, the following code is enough:

handle = FindWindow("Notepad++", null);

if (handle == IntPtr.Zero) {
  MessageBox.Show("Window not found.");
  return;
}

SetForegroundWindow(handle);
SendKeys.SendWait("Hello, world");

SendKeys also lets us send special keys (tabs, returns, etc.) and key combinations (like CTRL+S). CTRL+ALT+DEL is an exception: Windows reserves it, so it cannot be simulated this way. You can take an in-depth look at the possibilities in this link.

Finally, we can take a quick look at the common tasks that can be performed through user32.dll functions in this link. The examples there are written in Visual Basic 6, but they’re easily adaptable to any .NET language.
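As a brief illustration of the SendKeys syntax: special keys are written in braces ("{TAB}", "{ENTER}"), and the prefixes ^, % and + stand for CTRL, ALT and SHIFT, so "^s" sends CTRL+S and "%{F4}" sends ALT+F4. Because those characters are reserved, literal text that contains them must be escaped before it is sent. The helper below is a minimal sketch of that escaping; the SendKeysText class and its Escape method are hypothetical names, not part of the SendKeys API.

```csharp
using System;
using System.Text;

// Hypothetical helper: escapes literal text so SendKeys types it verbatim.
static class SendKeysText
{
    // Characters that SendKeys treats specially and that must be wrapped in braces.
    private const string Special = "+^%~(){}[]";

    public static string Escape(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (Special.IndexOf(c) >= 0)
            {
                sb.Append('{').Append(c).Append('}'); // e.g. "+" becomes "{+}"
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For example, SendKeys.SendWait(SendKeysText.Escape("50% (a+b)")) would type the text literally instead of interpreting % and + as ALT and SHIFT.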

You can find the Spanish version of this article here.

Posted on Monday, January 13, 2014 5:10 PM

This article is part of the GWB Archives. Original Author: Daniel Garcia
