Using Ansible to Tackle Shadow IT Programmatically

Scripting with Ansible and Python

How to Find Rogue, Unmanaged Devices

Client Issue 

A client came to us with a complicated global network environment of engineers, developers, and executives. They were facing the same issues that many larger companies with multiple IT departments and variable infrastructure face – non-uniformity, rogue devices, poor historical reporting, security flaws, etc. 

Problem: Shadow IT issues caused by developers creating their own virtual machines without going through the appropriate channels and having the appropriate security tools correctly installed on the hosts.

Our DevOps department has created a library of customized scripts to tackle this device management issue (and all the problems that come with it) both moving forward and retrospectively. 

Scripts and Constraints 

The networking team was hard at work locking down access lists at the same time we were looking for a way to query the devices that were out on their network. Our solution came down to 3 major steps: 

  1. Create a device in their network 
  2. Locate devices that are currently out there 
  3. Create a set of processes and tools to keep newly created devices managed 

Create the device 

We decided to go with an Ubuntu host running Ansible and Python via cron. For brevity, we will focus on the VMware portion of this task.
The vSphere version we are using is 7.0.U3: https://developer.vmware.com/docs/14558/vsphere-web-services-sdk-programming-guide–7-0-update-3-

Locate the unmanaged devices 

The first script that we wrote uses Python and the vSphere API to query the VMs that are currently powered on. VMware's API uses session-based authentication, as seen in the Python functions below. After the createAuthSession() function generates the session ID (stored as api_key), that value is sent in the vmware-api-session-id header to authenticate later requests.

Please note that because creating a vSphere session requires basic authentication (a username and password), it is best to store these credentials securely and separately from the raw code. The api_key is then used to call the /api/vcenter/vm?power_states=POWERED_ON endpoint to generate a list of powered-on VMs programmatically. This list was then parsed into a .csv file, which we used to track down the unmanaged devices and either remove them or install the tools the client uses to manage devices.

vSphere Authentication Sessions 

import sys

import requests
import urllib3
from urllib3.exceptions import InsecureRequestWarning

# base_url, username, and password are expected to be defined elsewhere,
# ideally loaded from a secrets store rather than hard-coded.


def createAuthSession():
    # Certificate verification is disabled below, so silence the warning.
    urllib3.disable_warnings(InsecureRequestWarning)
    headers = {
        "Content-Type": "application/json"
    }
    url = f"{base_url}/api/session"
    print(f" # Attempting to start a rest session to {url}.")
    try:
        response = requests.post(url=url, headers=headers, auth=(username, password), verify=False)
        if response.status_code == 201:
            print(f" # Authentication to {url} succeeded! Retrieved session ID.\n\n")
        else:
            print(f" # createAuthSession failed - {response.status_code}")
        # On success, the response body is the session ID as a JSON string.
        return response.json()
    except Exception as EV:
        print(f" # createAuthSession failed - {EV}")
        print(f" # Exiting the program. Please ensure you are on the VPN and can connect to {url}.")
        sys.exit(1)


def deleteAuthSession(api_key):
    # The session ID is passed in the vmware-api-session-id header.
    headers = {
        "Content-Type": "application/json",
        "vmware-api-session-id": api_key
    }
    url = f"{base_url}/api/session"
    print(f" # Attempting to end the rest session to {url} via {api_key}.")
    response = requests.delete(url=url, headers=headers, verify=False)
    if response.status_code == 204:
        print(f" # The rest session to {url} via {api_key} has been deleted!")
    else:
        print(f" # deleteAuthSession failed - {response.status_code}")
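
As a rough illustration of the listing step described earlier, the sketch below calls the /api/vcenter/vm?power_states=POWERED_ON endpoint with the session ID and renders the result as .csv text. The function names, the chosen columns, and the output file name are assumptions made for this example, not the script we ran for the client. The vm, name, and power_state fields are the ones we expect in each VM summary returned by that endpoint.

```python
import csv
import io


def fetchPoweredOnVMs(base_url, api_key):
    """Return the list of VM summaries for powered-on VMs (hypothetical helper)."""
    # Imported here so the CSV helper below works without requests installed.
    import requests

    headers = {"vmware-api-session-id": api_key}
    url = f"{base_url}/api/vcenter/vm"
    # verify=False mirrors the session functions; prefer a trusted CA bundle where possible.
    response = requests.get(url, headers=headers, params={"power_states": "POWERED_ON"}, verify=False)
    response.raise_for_status()
    return response.json()


def vmsToCsv(vms, columns=("vm", "name", "power_state")):
    """Render a list of VM summary dicts as CSV text, sorted by VM name."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for vm in sorted(vms, key=lambda item: item.get("name", "")):
        # Keep only the chosen columns; missing fields become empty cells.
        writer.writerow({column: vm.get(column, "") for column in columns})
    return buffer.getvalue()


# Example usage (requires network access to vCenter):
#   api_key = createAuthSession()
#   with open("powered_on_vms.csv", "w", newline="") as handle:
#       handle.write(vmsToCsv(fetchPoweredOnVMs(base_url, api_key)))
#   deleteAuthSession(api_key)
```

Keeping the CSV rendering separate from the HTTP call makes it easy to check the output format without a live vCenter.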

Create a set of processes and tools 

After the initial remediation of servers, the next major step was to come up with a method to keep that list up to date. The question became: “How do we catch and onboard all newly created VMs quickly and efficiently?” Ansible Tower (AWX) was the solution to this issue. Using customized Ansible roles specific to the client's management and access tools, we created a playbook that onboards any new hosts added to the MANAGED AWX inventory. This playbook uses roles that handle installing Datadog, CrowdStrike, Sumo Logic, and Duo, as well as pushing the settings for Windows governance, NTP, and IIS.

After creating this playbook to onboard a device, the next step was to attach it to the VM tracking script from the prior section. Ansible manages Windows hosts over WinRM, so each Windows host needs specific onboarding to open the WinRM port before Ansible can reach it. We kept a daily-updated .csv comparing the current day's VMs against the previous day's. This comparison is emailed to the DevOps support team daily, and a ticket is opened when a new host is found. A PowerShell script is then run on the host to open the WinRM port, and the host is added to the MANAGED inventory on AWX. The playbook runs against that inventory every 12 hours to ensure that all new hosts are onboarded, that no programs have been removed or tampered with, and that anything requiring it is restarted or installed.
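
The day-over-day comparison can be sketched with plain set arithmetic. The example below is a minimal illustration, not our production script: the function names are hypothetical, the inputs are assumed to be lists of host names read from the two daily .csv files, and the report wording is modeled on the sample email shown later in this article.

```python
def diffHostLists(previous, current, label):
    """Return report lines for hosts added to or removed from one list."""
    previous, current = set(previous), set(current)
    lines = []
    for host in sorted(current - previous):
        lines.append(f"-Host: {host} Change: added to {label}")
    for host in sorted(previous - current):
        lines.append(f"-Host: {host} Change: removed from {label}")
    return lines


def newHosts(yesterday_on, yesterday_off, today_on, today_off):
    """Hosts present today that were in neither list yesterday (ticket candidates)."""
    seen_yesterday = set(yesterday_on) | set(yesterday_off)
    seen_today = set(today_on) | set(today_off)
    return sorted(seen_today - seen_yesterday)


def buildDailyReport(yesterday_on, yesterday_off, today_on, today_off):
    """Build the body of the daily VM counter report as a list of lines."""
    report = [
        "vCenter VMs:",
        f" >Powered on: {len(today_on)}",
        f" >Powered off: {len(today_off)}",
        f" >Total: {len(today_on) + len(today_off)}",
    ]
    if len(yesterday_on) != len(today_on):
        report.append(
            f"The total powered on hosts has changed: "
            f"Yesterday={len(yesterday_on)} --> Today={len(today_on)}"
        )
    if len(yesterday_off) != len(today_off):
        report.append(
            f"The total powered off hosts has changed: "
            f"Yesterday={len(yesterday_off)} --> Today={len(today_off)}"
        )
    report += diffHostLists(yesterday_on, today_on, "powered on")
    report += diffHostLists(yesterday_off, today_off, "powered off")
    return report
```

A host that only moves between the powered-on and powered-off lists shows up in the report but not in newHosts(), so only genuinely new machines would trigger a ticket.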

Ansible Onboarding Playbook/Roles

- hosts: all
  gather_facts: yes
  roles:
    - datadog-agent-install
    - win_governance
    - NTP
    - crowdstrike
    - sumo
    - duo_install
  vars:
    datadog_agent_version: "7.32.4"
    datadog_api_key: "{{ hostvars['localhost']['datadogapikey']['stdout'] }}"
    ansible_connection: winrm
    ansible_winrm_server_cert_validation: ignore
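
Adding a newly found host to the MANAGED inventory is what makes the playbook above apply to it. One way this could be scripted is against the AWX REST API, which accepts a POST to /api/v2/inventories/<id>/hosts/ to create a host. The sketch below is an assumption about how that step might look, not the integration we built: the function names, the inventory ID, and the use of an OAuth2 bearer token are all illustrative.

```python
import json


def buildAddHostRequest(awx_url, inventory_id, hostname, token):
    """Return (url, headers, body) for creating a host in an AWX inventory."""
    url = f"{awx_url.rstrip('/')}/api/v2/inventories/{inventory_id}/hosts/"
    headers = {
        "Authorization": f"Bearer {token}",  # assumes an AWX OAuth2 token
        "Content-Type": "application/json",
    }
    body = json.dumps({"name": hostname, "enabled": True})
    return url, headers, body


def addHostToInventory(awx_url, inventory_id, hostname, token):
    """POST the new host to AWX and return the created host record."""
    # Imported lazily so the request builder has no third-party dependency.
    import requests

    url, headers, body = buildAddHostRequest(awx_url, inventory_id, hostname, token)
    response = requests.post(url, headers=headers, data=body)
    response.raise_for_status()
    return response.json()


# Example usage (hypothetical values):
#   addHostToInventory("https://awx.example.com", 5, "HOSTA", token)
```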

Daily VM Counter

Image of email from daily VM counter report

There are two attached spreadsheets. One is the off list, the other is the powered on.

vCenter VMs:
 >Powered on: 543
 >Powered off: 52
 >Total: 595

"The total powered on hosts has changed: Yesterday=544 --> Today=543.

The total powered off hosts has changed: Yesterday=51 --> Today=52

-Host: HOSTA Change: removed from powered on

 -Host: HOSTA Change: added to powered off"

Conclusion

Problem: Shadow IT issues caused by developers creating their own virtual machines without going through the appropriate channels and having the appropriate security tools correctly installed on the hosts.

Solution: Ansible Tower (AWX) deployment with customized Python scripts to integrate with existing tools in both the client's and our environments to offer seamless onboarding and management of newly added hosts.

This solution gave us the flexibility to communicate with both Azure and VMware resources by using Ansible Galaxy collections to build Ansible playbooks alongside customized Python functions. There are a few more customized integrations not covered in this article, such as opening tickets when new hosts are added, AWX host onboarding, and interacting with Azure nodes. All of them communicate in such a way that, whatever situation arises, the appropriate engineer is tapped quickly to offer their expertise in finding the right solution.

Interested in more of our DevOps capabilities? Check out our post on Zoom and Cisco Room Kit OBTP.

Zoom and Cisco Room Kit OBTP

Solving a persistent problem between two meeting clients

The Problem


A client recently came to us with an issue affecting their suite of cloud-registered Cisco Room Kit systems of various models. They use both Zoom and WebEx as meeting clients. They were able to use the one-button join feature to join WebEx meetings, but Zoom meetings were prompting for a meeting ID and a passcode. The issue was relayed to our team, who began to investigate.

What is One Button to Push? (OBTP)

The client has been using a feature in the Room Kit software called One Button to Push (OBTP). It allows users to integrate their calendars with the Room systems and join SIP-enabled meetings with a single button. This feature parses the calendar entries looking for meeting information and populates the system with as much of the provided information as possible, limited by the data format and the codec’s interpretation. The full features of OBTP come out when the data contains a properly formatted SIP address. The system does a few things behind the scenes, then sends a reply stating that it accepted or declined the invite. If the invite was accepted, the Touch10 displays the OBTP Join Meeting overlay 5 minutes before the start of the meeting. The system can also be set to join the meeting automatically 1 minute before it starts, which can be very useful for unmanaged meetings or for users who need a meeting started while they are away from their desk.

Note: There is a lot of backend setup involved in getting this working. Because this post is not about how to set up OBTP, I will keep the explanation short and limited to the moving parts related to the issue.

OBTP requires a new mail and service account to be created (O365, Exchange, and Google). This is the email address that is “owned” by the Room Kit, and the calendar of this account is how the system handles reservation management and responds to invites. The Zoom integration requires two further steps: creating a Cisco Room in the Zoom admin portal, and installing an .msi on a server in the local network. This allows the Zoom invite to be sent directly to that email address and interpreted by the system.

Resources: The full guide to installation can be found here: https://support.zoom.us/hc/en-us/articles/115003126346-Using-the-Legacy-Zoom-Connector-for-Cisco


Researching the issue between Zoom and Room Kit OBTP

The first thing our DevOps engineers did was set up a lab in our environment to mimic the issue that our client was having, which meant setting up a Zoom account and modeling our settings on those our client uses. After replicating the issue and scraping the logs, we noticed an SSL error that seemed to be the root cause.

Call disconnected, error shown to be SSL rejection

After chasing that false lead for a while, we determined that there were no issues with the SSL CA being used for zoomcrc.com. During that testing, we found that the voice join function works for Zoom meetings; only the button is broken. When the button is pressed, the failed call error shown above appears in the logs. The voice join, by contrast, joined the meeting successfully, and its logs showed some obvious differences. Most noticeable is that the protocol being used is not SIP, but Spark.

Yellow Highlight: Where we identified the Spark Protocol over SIP
Red Arrow: Call 23 showing connecting


Resources: More information on this protocol can be found on page 56 of this PDF of protocols.

This showed that the two call methods use different modules, and that one of them is unable to format the SIP call correctly. It also allowed us to write a macro that successfully dials this number. The next thing we tried was to modify the OBTP overlay so that it would format the call correctly. Unfortunately, the overlay is immutable on the user end, so that idea was scrapped. We briefly toyed with the idea of removing the overlay completely for Zoom meetings and having a custom overlay appear in its stead, but the margin for error was relatively high, the issue could compound during the next software update, and it was a clunky solution. It felt like using a shotgun on a bothersome fly.

Getting in the fix for Zoom and Cisco Room Kit OBTP

We ended up writing a simple listener as the solution to this problem, one that elegantly stops being triggered once either Zoom or Cisco fixes the issue. This listener looks specifically for calls to a zoomcrc.com URI that fail with an SSL error, reconstructs the call data correctly, then places the call again. There is a brief ‘Call Failed’ pop-up, then the call connects and does not require a meeting ID or a passcode. While this solution has the small aesthetic negative of the pop-up, it was determined to be the least invasive and most effective mitigation of the issue until these two tech giants decide to play together a bit more nicely.

Have you run into any similar issues? We’d love to hear about them, so feel free to comment below and check out our Zero Trust Philosophy by our own Chris Crotteau.

By: Ben Barnard