Addressing AWS Managed Policy Rot in CI/CD

Posted by Michael Bailey on April 28, 2021 · 5 mins read

AWS likes to rename, replace, etc their managed policies a fair bit. Generally, before (for instance) outright deleting one, they'll send a deprecation notice. But even with all of that, it's complicated as some people don't read their AWS emails frequently. Sometimes I and a few others have to send tailored temporary CloudFormation stacks to clients, so if it's referencing non-existent policies that's a bit embarrassing on our part. Manual testing is obviously an answer, but in 2021 what we do manually we should try to automate at this point.
Before we get into this "meh practice" policy linting, let's talk about best practices:

  • Reference policies in the context of version IDs. AWS sets the default policy version of managed policies at their discretion. Ideally they aren't making any breaking changes between their versions without notice, but it's more predictable to know a policy you rely on hasn't changed.
  • If you're not going to version policies, keep an eye on the latest versions, even if it's a Twitter account.

This may not work 100%. It may require modification. It just works for our niche project in question. It's a bit grimey, but Reddit asked and I'll be damned if I'm not an OP who delivers.

Like many others, I don't typically develop my code, policies, etc in AWS. Sorry, CodeCommit. What we do have is a CI/CD pipeline that runs against our CodeFormation. AWS is nice enough to have a cfn-lint tool to check for common mistakes, and other third party tools exist for other things, but what about referncing policies that exist? The Cfn may be well formed that references something that doesn't exist at all.

In my case, to avoid rotting managed policies, in our CI (without disclosing individual internal stacks) I'll regex for arn:aws:iam::aws:policy\/([A-Za-z0-9_\-]{1,150}) in any yaml and pass the captured policy (not the ARN stub) to the below script.

import requests
import sys
import os
# Pull down the API gateway, explained later
response = requests.get("https://apigatewaystubhere.execute-api.us-east-1.amazonaws.com/v1").json()
# Get the policy as an argument 
for policy in sys.argv[1:]:
    print("Checking {0}".format(policy))
    if policy in response:
        print("Passed")
    else:
        print("Non-existent policy: {0}".format(policy))
        sys.exit(1)
sys.exit(0)

As for the API Gateway, it passes to a lambda function that runs this:

import json
import boto3 

def lambda_handler(event, context):
    client = boto3.client('iam')
    policy_names = []
    isTruncated = True
    marker = False
    # There may be a ton of policies so we need to paginate
    while isTruncated:
        if marker:
            response = client.list_policies(
                Scope='AWS',
                Marker=marker
            )
        else:
            response = client.list_policies(
                Scope='AWS'
            )
        for policy in response['Policies']:
            policy_names += [policy['PolicyName']]
        isTruncated = response['IsTruncated']
        if 'Marker' in response:
            marker = response['Marker']
    return {
        'statusCode': 200,
        'body': json.dumps(policy_names)
    }

All this will do is return a list of policy names latest to AWS when calling the API Gateway. Feel free to put the API Gateway behind any necessary security, WAF, etc.

So what we have is a script pulling each policy name, comparing them against AWS, and throwing an error if one doesn't exist in AWS. Running this on push works, like most CI, but running it on a schedule is probably better.