Using Ansible to Tackle Shadow IT Programmatically

Scripting with Ansible and Python

How to Find Rogue, Unmanaged Devices

Client Issue 

A client came to us with a complicated global network environment of engineers, developers, and executives. They were facing the same issues that many larger companies with multiple IT departments and variable infrastructure face: non-uniformity, rogue devices, poor historical reporting, and security flaws.

Problem: Shadow IT issues caused by developers creating their own virtual machines without going through the appropriate channels or having the required security tools correctly installed on the hosts.

Our DevOps department has created a library of customized scripts to tackle this device management issue (and all the problems that come with it) both moving forward and retrospectively. 

Scripts and Constraints 

The networking team was hard at work locking down access lists at the same time we were looking for a way to query the devices that were out on their network. Our solution came down to 3 major steps: 

  1. Create a device in their network
  2. Locate devices that are currently out there
  3. Create a set of processes and tools to keep newly created devices managed

Create the device 

We decided to go with an Ubuntu host running Ansible and Python via cron. We will focus on the VMware portion of this task for brevity’s sake.
The vSphere version we are using is 7.0.U3: https://developer.vmware.com/docs/14558/vsphere-web-services-sdk-programming-guide–7-0-update-3-

Locate the unmanaged devices 

The first script that we wrote uses Python and the vSphere API to query the VMs that are currently powered on. VMware’s API uses session-based authentication, as seen in the Python functions below. After the createAuthSession() function generates the session ID (stored as api_key), that ID can be used to authenticate subsequent requests.

Please note that because creating a vSphere session requires basic authentication (a username and password), it is best to store these credentials securely, separate from the raw code. The api_key is then used to call the /api/vcenter/vm?power_states=POWERED_ON endpoint to generate a list of powered-on VMs programmatically. This list was then parsed into a .csv file, from which we could start to track down the unmanaged devices and either remove them or install the tools the client uses to manage devices.
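
The listing and .csv step described above could look roughly like the sketch below. It is an illustration rather than the script we ran: the function names, the http parameter (any object with a requests-style get() method, such as a requests.Session), and the column names vm, name, and power_state are assumptions.

```python
import csv


def list_powered_on_vms(http, base_url, session_id):
    """Return the powered-on VM summaries reported by vCenter.

    `http` is assumed to be a requests.Session-like object; passing it in
    keeps TLS settings and retries under the caller's control.
    """
    response = http.get(
        f"{base_url}/api/vcenter/vm",
        headers={"vmware-api-session-id": session_id},
        params={"power_states": "POWERED_ON"},
    )
    response.raise_for_status()
    return response.json()


def write_vm_csv(vms, path, fields=("vm", "name", "power_state")):
    """Write one row per VM, keeping only the columns named in `fields`."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" drops any keys that are not in `fields`.
        writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(vms)
    return len(vms)
```

Keeping the fetch and the file writing in separate functions means the .csv logic can be exercised with sample data before it is pointed at a live vCenter.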

vSphere Authentication Sessions 

import sys

import requests
import urllib3
from urllib3.exceptions import InsecureRequestWarning

# base_url, username, and password are expected to be defined elsewhere,
# ideally loaded from a secure store rather than hard-coded.


def createAuthSession():
    # Certificate verification is disabled below (verify=False), so silence the warning.
    urllib3.disable_warnings(InsecureRequestWarning)
    headers = {
        "Content-Type": "application/json"
    }
    url = f"{base_url}/api/session"
    print(f" # Attempting to start a rest session to {url}.")
    try:
        response = requests.post(url=url, headers=headers, auth=(username, password), verify=False)
    except Exception as EV:
        print(f" # createAuthSession failed - {EV}")
        print(f" # Exiting the program. Please ensure you are on the VPN and can connect to {url}.")
        sys.exit(1)
    if response.status_code != 201:
        # Do not hand an error body back to the caller as if it were a session ID.
        print(f" # createAuthSession failed - {response.status_code}")
        sys.exit(1)
    print(f" # Authentication to {url} succeeded! Retrieved session ID.\n\n")
    return response.json()


def deleteAuthSession(api_key):
    headers = {
        "Content-Type": "application/json",
        "vmware-api-session-id": api_key
    }
    url = f"{base_url}/api/session"
    # The session ID is a credential, so it is left out of the log output.
    print(f" # Attempting to end the rest session to {url}.")
    response = requests.delete(url=url, headers=headers, verify=False)
    if response.status_code == 204:
        print(f" # The rest session to {url} has been deleted!")
    else:
        print(f" # deleteAuthSession failed - {response.status_code}")
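
One way to make sure every session that gets opened is also closed, even when a later API call fails, is to wrap the two functions above in a context manager. The sketch below is a suggestion, not part of the original scripts: the name vsphere_session is hypothetical, and it assumes the create function returns the session ID that the delete function expects.

```python
from contextlib import contextmanager


@contextmanager
def vsphere_session(create, delete):
    """Yield a session ID from create() and always pass it to delete()."""
    session_id = create()
    try:
        yield session_id
    finally:
        # Runs on normal exit and when the with-block raises.
        delete(session_id)


# Intended usage with the functions from this article:
#
#     with vsphere_session(createAuthSession, deleteAuthSession) as api_key:
#         ...  # call the vSphere API with api_key
```

Passing the two functions in as arguments keeps the helper independent of how authentication is implemented.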

Create a set of processes and tools 

After the initial remediation of servers, the next major step was to come up with a method to keep that list up to date. The question became: “How do we keep all the newly created VMs caught and onboarded in a quick and efficient manner?” Ansible Tower (AWX) was the solution to this issue. Using customized Ansible roles specific to their management and access tools, we created a playbook that onboards any new hosts added to the MANAGED AWX inventory. This playbook used roles that handled installing Datadog, CrowdStrike, Sumo Logic, and Duo, as well as pushing the settings for Windows governance, NTP, and IIS.

After creating this playbook to onboard a device, the next step was to attach it to the VM tracking script from the prior section. We kept a daily-updated .csv comparing the current day’s VMs with the previous day’s. This comparison was emailed to the DevOps support team daily, and a ticket was opened whenever a new host was found. Because Ansible manages Windows hosts over WinRM, each new Windows host needs its WinRM port opened first: a PowerShell script is run on the host to open the port, and then the host is added to the MANAGED inventory on AWX. The playbook is run against that inventory every 12 hours to ensure all new hosts are onboarded, no programs have been removed or tampered with, and anything that needs it is restarted or installed.
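
The day-over-day comparison can be reduced to a set difference on host names. The sketch below assumes each daily .csv has a name column; the function names and that column name are illustrative rather than taken from our scripts.

```python
import csv


def read_host_names(path, column="name"):
    """Load the set of host names from one day's VM export."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {row[column] for row in csv.DictReader(handle)}


def diff_hosts(previous, current):
    """Return (added, removed) host names, each as a sorted list."""
    added = sorted(current - previous)
    removed = sorted(previous - current)
    return added, removed
```

Hosts in the added list are the candidates for a new ticket and for onboarding into the MANAGED inventory.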

Ansible Onboarding Playbook/Roles

- hosts: all
  gather_facts: yes
  roles:
    - datadog-agent-install
    - win_governance
    - NTP
    - crowdstrike
    - sumo
    - duo_install
  vars:
    datadog_agent_version: "7.32.4"
    datadog_api_key: "{{ hostvars['localhost']['datadogapikey']['stdout'] }}"
    ansible_connection: winrm
    ansible_winrm_server_cert_validation: ignore

Daily VM Counter

Image of email from daily VM counter report

There are two attached spreadsheets. One is the off list, the other is the powered on.

vCenter VMs:
 >Powered on: 543
 >Powered off: 52
 >Total: 595

"The total powered on hosts has changed: Yesterday=544 --> Today=543.

The total powered off hosts has changed: Yesterday=51 --> Today=52

-Host: HOSTA Change: removed from powered on

 -Host: HOSTA Change: added to powered off"
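
A report body like the one above can be assembled from the previous and current sets of host names. This is a rough sketch under assumptions: build_report is a hypothetical name, the wording simply mirrors the sample email, and sending the message and attaching the spreadsheets are left out.

```python
def build_report(prev_on, curr_on, prev_off, curr_off):
    """Build the text of the daily VM counter email from four sets of host names."""
    lines = [
        "vCenter VMs:",
        f" >Powered on: {len(curr_on)}",
        f" >Powered off: {len(curr_off)}",
        f" >Total: {len(curr_on) + len(curr_off)}",
        "",
    ]
    groups = (
        ("powered on", prev_on, curr_on),
        ("powered off", prev_off, curr_off),
    )
    # Summary lines appear only when a count changed since yesterday.
    for label, prev, curr in groups:
        if len(prev) != len(curr):
            lines.append(
                f"The total {label} hosts has changed: "
                f"Yesterday={len(prev)} --> Today={len(curr)}"
            )
    # Per-host lines list every host that left or joined each group.
    for label, prev, curr in groups:
        for host in sorted(prev - curr):
            lines.append(f"-Host: {host} Change: removed from {label}")
        for host in sorted(curr - prev):
            lines.append(f"-Host: {host} Change: added to {label}")
    return "\n".join(lines)
```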

Conclusion

Problem: Shadow IT issues caused by developers creating their own virtual machines without going through the appropriate channels or having the required security tools correctly installed on the hosts.

Solution: An Ansible Tower (AWX) deployment with customized Python scripts that integrate with existing tools in both the client’s and our environments to offer seamless onboarding and management of newly added hosts.

This solution allowed the flexibility to communicate with both Azure and VMware resources by using Ansible Galaxy Collections to build Ansible playbooks alongside customized Python functions. A few more customized integrations are not covered in this article, such as opening tickets when new hosts are added, AWX host onboarding, and interacting with Azure nodes. They all communicate in such a way that, whatever situation arises, the appropriate engineer is tapped quickly to offer their expertise in finding the right solution.
