Scripting with Ansible and Python
How to Find Rogue, Unmanaged Devices
Client Issue
A client came to us with a complicated global network environment of engineers, developers, and executives. They were facing the same issues that many larger companies with multiple IT departments and variable infrastructure face – non-uniformity, rogue devices, poor historical reporting, security flaws, etc.
Problem: Shadow IT issues caused by developers creating their own virtual machines without going through the appropriate channels, leaving hosts without the appropriate security tools correctly installed.
Our DevOps department has created a library of customized scripts to tackle this device management issue (and all the problems that come with it) both moving forward and retrospectively.
Scripts and Constraints
The networking team was hard at work locking down access lists while we were looking for a way to query the devices already on the client's network. Our solution came down to three major steps:
- Create a device in their network
- Locate devices that are currently out there
- Create a set of processes and tools to keep newly created devices managed
Create the device
We decided to go with an Ubuntu host running Ansible and Python, scheduled via cron. For brevity's sake, we will focus on the VMware portion of this task.
The vSphere version we are using is 7.0.U3: https://developer.vmware.com/docs/14558/vsphere-web-services-sdk-programming-guide–7-0-update-3-
Locate the unmanaged devices
The first script we wrote uses Python and the vSphere API to query the VMs that are currently powered on. VMware's API uses session-based authentication, as seen in the Python functions below. The createAuthSession() function returns a session ID (the api_key), which is then sent in the vmware-api-session-id header to authenticate subsequent requests.
Please note that because creating a vSphere session requires basic authentication with a username and password, it is best to store these credentials securely, separate from the raw code. The api_key is then used to call the /api/vcenter/vm?power_states=POWERED_ON endpoint, which returns a list of powered-on VMs programmatically. We parsed this list into a .csv file and used it to track down the unmanaged devices and either remove them or install the tools the client uses to manage devices.
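As a rough illustration of this step, the sketch below calls the endpoint with the session ID and renders the result as CSV text. It is not the script used for the client: the function names, the `session` parameter (assumed to be any object with a requests-style `get` method, such as `requests.Session()`), and the choice of columns are assumptions made for the example. The columns follow the summary fields the vSphere list endpoint returns for each VM.

```python
import csv
import io

# Summary fields returned per VM by GET /api/vcenter/vm; any other keys
# in the response are ignored when writing the CSV.
FIELDS = ["vm", "name", "power_state", "cpu_count", "memory_size_MiB"]


def list_powered_on_vms(session, base_url, api_key):
    """Fetch summaries of all powered-on VMs (hypothetical helper).

    `session` is assumed to behave like requests.Session.
    """
    response = session.get(
        f"{base_url}/api/vcenter/vm",
        params={"power_states": "POWERED_ON"},
        headers={"vmware-api-session-id": api_key},
        verify=False,  # mirrors the article's code; prefer a trusted CA bundle
    )
    response.raise_for_status()
    return response.json()


def vms_to_csv(vms):
    """Render VM summaries as CSV text, sorted by name for stable diffs."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for vm in sorted(vms, key=lambda item: item.get("name", "")):
        writer.writerow(vm)  # missing fields are written as empty cells
    return buffer.getvalue()
```

Sorting by name keeps consecutive daily files easy to compare by eye, and returning text rather than writing a file directly leaves the storage location up to the caller.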
vSphere Authentication Sessions
import sys

import requests
import urllib3
from urllib3.exceptions import InsecureRequestWarning

# base_url, username, and password are expected to be defined elsewhere,
# loaded from a secure store rather than hardcoded in this file.


def createAuthSession():
    # Certificate verification is disabled below, so silence the warning.
    urllib3.disable_warnings(InsecureRequestWarning)
    headers = {
        "Content-Type": "application/json"
    }
    url = f"{base_url}/api/session"
    print(f" # Attempting to start a rest session to {url}.")
    try:
        response = requests.post(url=url, headers=headers, auth=(username, password), verify=False)
        if response.status_code == 201:
            print(f" # Authentication to {url} succeeded! Retrieved session ID.\n\n")
            # The response body is the session ID as a JSON string.
            return response.json()
        print(f" # createAuthSession failed - {response.status_code}")
    except Exception as EV:
        print(f" # createAuthSession failed - {EV}")
    print(f" # Exiting the program. Please ensure you are on the VPN and can connect to {url}.")
    sys.exit(1)


def deleteAuthSession(api_key):
    headers = {
        "Content-Type": "application/json",
        "vmware-api-session-id": api_key
    }
    url = f"{base_url}/api/session"
    # The session ID is a credential, so it is not echoed to the log.
    print(f" # Attempting to end the rest session to {url}.")
    response = requests.delete(url=url, headers=headers, verify=False)
    if response.status_code == 204:
        print(f" # The rest session to {url} has been deleted!")
    else:
        print(f" # deleteAuthSession failed - {response.status_code}")
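One way to make sure the session is always closed, even when a later step raises, is to wrap the two functions in a context manager. The sketch below is an assumption about how that could look, not part of the original script. The name `vsphere_session` is hypothetical, and the `create` and `delete` arguments are assumed to be callables shaped like createAuthSession() and deleteAuthSession(api_key).

```python
from contextlib import contextmanager


@contextmanager
def vsphere_session(create, delete):
    """Yield a session ID and always delete the session afterwards.

    `create` returns the session ID; `delete` takes that ID and ends the
    session. Both are supplied by the caller (hypothetical wiring).
    """
    api_key = create()
    try:
        yield api_key
    finally:
        # Runs on normal exit and when the body raises.
        delete(api_key)


# Example wiring with the functions from this article:
#
#     with vsphere_session(createAuthSession, deleteAuthSession) as api_key:
#         ...  # query vCenter using api_key
```

Passing the two functions in as arguments keeps the wrapper independent of how credentials and the base URL are loaded.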
Create a set of processes and tools
After the initial remediation of servers, the next major step was to come up with a method to keep that list up to date. The question became: “How do we catch and onboard all newly created VMs quickly and efficiently?” Ansible Tower (AWX) was the solution to this issue. Using customized Ansible roles specific to the client's management and access tools, we created a playbook that onboards any new hosts added to the MANAGED AWX inventory. This playbook uses roles that handle installing Datadog, CrowdStrike, Sumo Logic, and Duo, as well as pushing the settings for Windows governance, NTP, and IIS.
After creating this playbook to onboard a device, the next step was to attach it to the VM tracking script from the prior section. Each day, the script produces an updated .csv and compares the current list of VMs against the previous day's. This comparison is emailed to the DevOps support team daily, and a ticket is opened when a new host is found. A PowerShell script is then run on the new host to open the WinRM port, which Ansible needs in order to manage Windows hosts, and the host is added to the MANAGED inventory on AWX. The playbook is run against that inventory every 12 hours to ensure that all new hosts are onboarded, that no programs have been removed or tampered with, and that anything requiring a restart or install gets one.
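The step of adding a host to the MANAGED inventory can also be scripted against the AWX REST API. The following is a minimal sketch under stated assumptions rather than the integration built for the client: `add_host_to_inventory`, `awx_url`, `inventory_id`, and `token` are hypothetical names, authentication is assumed to use an AWX OAuth2 token, and `session` is assumed to be a requests-style object such as `requests.Session()`.

```python
def add_host_to_inventory(session, awx_url, inventory_id, hostname, token):
    """Create a host in an AWX inventory and return its ID (hypothetical helper).

    Posts to the inventory's hosts endpoint; AWX rejects the request if a
    host with the same name already exists in that inventory, which
    surfaces here as an HTTP error from raise_for_status().
    """
    response = session.post(
        f"{awx_url}/api/v2/inventories/{inventory_id}/hosts/",
        json={"name": hostname, "enabled": True},
        headers={"Authorization": f"Bearer {token}"},
    )
    response.raise_for_status()
    return response.json()["id"]
```

Because the playbook already runs against the inventory on a schedule, the sketch only registers the host and does not launch a job itself.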
Ansible Onboarding Playbook/Roles
Daily VM Counter
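The daily comparison described above can be sketched as a set difference between two .csv snapshots. This is an illustration, not the client's script: the function names are hypothetical, and the snapshots are assumed to have a header row with `vm` (the vCenter identifier) and `name` columns.

```python
import csv
import io


def load_inventory(csv_text, key="vm"):
    """Map each VM identifier to its name from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row.get("name", "") for row in reader}


def diff_inventories(previous_csv, current_csv):
    """Return (added, removed) dicts of VM id -> name between two snapshots."""
    previous = load_inventory(previous_csv)
    current = load_inventory(current_csv)
    # Keys present today but not yesterday are newly created VMs.
    added = {vm: current[vm] for vm in sorted(current.keys() - previous.keys())}
    # Keys present yesterday but not today were removed or powered off.
    removed = {vm: previous[vm] for vm in sorted(previous.keys() - current.keys())}
    return added, removed
```

Comparing on the identifier rather than the display name means a renamed VM is not reported as a new host. The `added` result is what would feed the daily email and ticket.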
Conclusion
Problem: Shadow IT issues caused by developers creating their own virtual machines without going through the appropriate channels, leaving hosts without the appropriate security tools correctly installed.
Solution: An Ansible Tower (AWX) deployment with customized Python scripts that integrate with existing tools in both the client's and our environments to offer seamless onboarding and management of newly added hosts.
This solution allowed for flexibility to communicate with both Azure and VMware resources by using Ansible Galaxy collections to create Ansible playbooks alongside customized Python functions. A few more customized integrations are not covered in this article, such as opening tickets when new hosts are added, AWX host onboarding, and interacting with Azure nodes. Together they ensure that, whatever situation arises, the appropriate engineer is tapped quickly to offer their expertise in finding the right solution.