Ansible is a very rich framework with many flavors and usage patterns. While it can be applied in different ways, the technology is built on several pillars that should guide its use. Among these, one is particularly important for complex organizations: the Ansible Collection.

Collections provide a foundational element for Ansible code management, enabling separation of concerns, controlled distribution, and versioning. This makes them essential for scaling practices, maintaining consistency, and ensuring traceability across teams and environments.

Ansible Collections are a distribution format for Ansible content, including roles, modules, plugins, test code, and documentation. They make it easy to package, share, and reuse automation across projects and teams.

This introduction shows how to create collections, maintain them in local directories or private GitHub repositories, and use them in playbooks.

As most major Ansible Collections are public and distributed via Ansible Galaxy servers (often a problem in enterprise environments), this guide demonstrates how to install them from local sources, e.g. local directories or Git repositories. This approach avoids reliance on public sources and ensures full control over external dependencies.

The material takes you from a classic playbook, through an Ansible Role, and finally to an Ansible Collection, concluding with the interface specification for Ansible Roles. The next step in the learning path will cover testing and Ansible fact persistence.

This document is supported by code available at the https://github.com/rstyczynski/ansible-collection-howto repository. The apache* playbooks take you through the steps towards building an Ansible Collection and using Molecule for testing. The duck* set is a recap that includes argument specification, and the state* set of plays introduces fact persistence.

Contents

1. The Classic Playbook

The following HTTP server examples illustrate how an Ansible playbook evolves as infrastructure requirements grow. We begin with a playbook designed specifically for Red Hat systems, then extend it to support Debian. This progression demonstrates the Ansible logic lifecycle and highlights the maintenance consequences in enterprise environments.

1.1 RedHat-only Playbook

Initially, the administrator needs to configure Apache on Linux systems running Red Hat. The playbook uses a basic Ansible module and runs with root privileges.

---
- name: Install and configure Apache on RedHat systems
  hosts: webservers
  become: true
  tasks:
    - name: Install Apache
      ansible.builtin.yum:
        name: httpd
        state: present

The playbook looks straightforward. It’s easy to read and use. Full control over configuration steps is exposed to the user, who can define target servers, root operations, and other low-level technical details.

1.2 Debian-only Playbook

Another system required handling Debian packages, so this new playbook was created.

---
- name: Install and configure Apache on Debian systems
  hosts: webservers
  become: true
  tasks:
    - name: Install Apache
      ansible.builtin.apt:
        name: apache2
        state: present

Again, it’s a trivial playbook with just a change to the package manager and the Apache package name. Let’s combine both to maintain only one playbook with universal Debian/RedHat logic.

---
- name: Install Apache on RedHat and Debian systems
  hosts: webservers
  become: true
  tasks:
    - name: Install Apache on RedHat
      ansible.builtin.yum:
        name: httpd
        state: present
      when: ansible_os_family == "RedHat"

    - name: Install Apache on Debian
      ansible.builtin.apt:
        name: apache2
        state: present
      when: ansible_os_family == "Debian"

The combined version is a little more complex, harder to read, and more fragile. However, it is still a very simple example. In real life, the code will be much more complex, and exposing such code to others becomes risky.
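To make the growth in complexity tangible, here is a hedged sketch of the same play extended with service management, a natural next requirement. The conditional service name is illustrative, not taken from the repository:

```yaml
---
- name: Install and start Apache on RedHat and Debian systems
  hosts: webservers
  become: true
  tasks:
    - name: Install Apache on RedHat
      ansible.builtin.yum:
        name: httpd
        state: present
      when: ansible_os_family == "RedHat"

    - name: Install Apache on Debian
      ansible.builtin.apt:
        name: apache2
        state: present
      when: ansible_os_family == "Debian"

    # The package name differs per OS family, so even a simple
    # service task now needs conditional logic.
    - name: Ensure Apache service is running
      ansible.builtin.service:
        name: "{{ 'httpd' if ansible_os_family == 'RedHat' else 'apache2' }}"
        state: started
        enabled: true
```

Every new requirement multiplies the per-OS branching, which is exactly the complexity a Role is meant to hide.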

In the next step, I’ll present how to encapsulate the code in a fundamental Ansible element — the Role. For the price of a few additional development steps, an Ansible Role provides more benefits in enterprise environments.

2. Ansible Role

2.1 Ansible Role vs. Module

The initial code directly used Ansible Modules, the atomic steps from which playbooks are built. Now the code will be refactored to use Ansible Roles.

Ansible Modules and Roles serve different purposes in the automation ecosystem. A module is a single, self-contained unit of work that performs a specific task, such as installing a package (yum, apt), or copying files (copy). Modules are the fundamental building blocks in Ansible; each task in a playbook typically invokes a module to carry out a particular action on the managed hosts.

In contrast, a role is a higher-level organizational structure that groups together multiple related tasks, along with their defaults, handlers, templates, files, and variables. Roles provide a standardized way to package and reuse automation logic, making it easy to share and apply complex configurations across different projects or environments. By organizing content into roles, you can separate concerns, promote consistency, and reduce duplication in your automation code.

2.2 Scaffold Ansible Role directory structure

As an Ansible Role requires a specific directory structure, it is handy to use the ansible-galaxy tool to initialize the directory.

ansible-galaxy role init apache

This creates a full role skeleton in roles/apache/ with the standard Ansible structure for the role:

roles/
  apache/
    defaults/
      main.yml
    files/
    handlers/
      main.yml
    meta/
      main.yml
    tasks/
      main.yml
    templates/
    tests/
      inventory
      test.yml
    vars/
      main.yml

It’s important to understand each place in the role hierarchy; however, not all of them are critical for regular use. Here is a list of the critical directories:

  • tasks: the role’s executable logic. Split into additional task files and import/include as needed.

  • defaults: lowest-precedence vars for the role. Use for safe, overridable settings users might tweak.

  • vars: higher-precedence vars than defaults (vars/main.yml). Use for internal/platform-specific values rarely overridden.

  • meta: role metadata and dependencies: supported platforms, required roles/collections, Galaxy info. Recent Ansible versions also keep the role’s argument specification here (meta/argument_specs.yml).
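The split between defaults and vars can be sketched for the apache role as follows; the variable names and values are illustrative, not part of the repository:

```yaml
# roles/apache/defaults/main.yml
# Lowest precedence: safe settings the user may override.
apache_listen_port: 80

# roles/apache/vars/main.yml
# Higher precedence: internal, platform-specific values
# that are rarely overridden.
apache_package_redhat: httpd
apache_package_debian: apache2
```

A user can override apache_listen_port from the playbook or inventory, while the package names stay under the role author's control.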

2.3 Move existing tasks into the Role

A Role delivers multiple features; however, at this stage we are interested in the roles/apache/tasks/main.yml file, where the playbook’s core logic is moved.

---
- name: Install Apache on RedHat
  ansible.builtin.yum:
    name: httpd
    state: present
  when: ansible_os_family == "RedHat"

- name: Install Apache on Debian
  ansible.builtin.apt:
    name: apache2
    state: present
  when: ansible_os_family == "Debian"

Now the complexity is encapsulated in the Ansible Role, and the user sees only the top-level technical function: making Apache up and running.
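As the per-OS logic grows, it can be split into dedicated task files inside the role. The file layout below is a sketch; the file names are an assumption, not taken from the repository:

```yaml
# roles/apache/tasks/main.yml
# Dispatch to an OS-specific task file, e.g. redhat.yml or debian.yml.
- name: Include OS-specific installation tasks
  ansible.builtin.include_tasks: "{{ ansible_os_family | lower }}.yml"

# roles/apache/tasks/redhat.yml
- name: Install Apache on RedHat
  ansible.builtin.yum:
    name: httpd
    state: present

# roles/apache/tasks/debian.yml
- name: Install Apache on Debian
  ansible.builtin.apt:
    name: apache2
    state: present
```

This keeps main.yml a thin dispatcher and removes the repeated when: conditions.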

2.4 Revert the Playbook to the simplest format

Having the above Role ready, the playbook may be super simple. It’s even simpler than the initial one. All the complexity is now hidden in the Role, and the administrator expresses the pure business need: activate Apache.

- hosts: webservers
  become: yes

  roles:
    - apache

Note that one more element should be simplified: the root control, by moving Ansible’s "become" to a lower level. It will be done later during refactoring supported by Molecule testing.
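One way to push privilege escalation down, shown here as a sketch of the later refactoring, is to set become on the role's tasks instead of the whole play:

```yaml
# roles/apache/tasks/main.yml — become moved from the play to the tasks
- name: Install Apache on RedHat
  ansible.builtin.yum:
    name: httpd
    state: present
  become: true
  when: ansible_os_family == "RedHat"

# The playbook then no longer needs play-wide root:
# - hosts: webservers
#   roles:
#     - apache
```

Only the tasks that truly need root escalate, which narrows the privileged surface of the play.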

2.5 Limitations of the module and role based approaches

The examples above show a natural evolution: starting from a simple RedHat-only playbook, extending it to support Debian systems, and finally moving the complexity into an Ansible Role. While using roles helps organize and encapsulate the automation logic, the traditional approach of copying playbooks and roles between multiple projects remains problematic.

Duplicating these playbooks and roles across projects leads to multiple copies that can diverge over time, causing version drift and inconsistencies. This fragmentation makes it difficult to maintain and update automation content effectively, as changes applied in one place are not automatically reflected elsewhere. Let’s imagine that the playbook was used by multiple users, who copied it to their environments. That does not smell good.

As a result, version control becomes scattered, and managing updates requires significant manual effort and coordination. Although this approach may work for small environments, enterprise-scale automation demands better separation of concerns, strict versioning, and mechanisms that prevent code duplication to ensure maintainability and consistency across teams.

One may argue that a role can be stored in its own Git repository and then included in a project. This approach indeed solves some challenges, such as version control and reuse across multiple playbooks. However, it still leaves other problems unresolved — for example, potential naming clashes, the lack of a consistent packaging format, and difficulties in managing dependencies.

Using just Ansible Roles is a partial solution. As roles were introduced for a narrower purpose, the Ansible community recognized the need for a next step. Today, all aspects of domain-specific logic are packaged in an Ansible Collection.

Collections are the top-level distribution component, used worldwide by providers of every size, from individuals to corporations. At the same time, collections are easy to use and maintain, giving enterprise-level capabilities to Ansible adopters.

3. Introduction to Ansible Collections

Ansible Collections are a standardized packaging format that bundle together multiple types of Ansible content—such as roles, modules, plugins, and documentation—into a single, organized unit. This approach streamlines the distribution and management of automation resources, allowing you to work with related content as a whole rather than handling individual roles or modules separately.

Collections greatly improve reusability and versioning. By packaging content into collections, you can easily share your work within your team or with the wider Ansible community. Collections also support structured version control, enabling you to track changes, update content safely, and ensure compatibility across projects. This makes maintaining and evolving automation simpler and more reliable.

Collections can be stored locally, published to public repositories like Ansible Galaxy, or hosted in private repositories (e.g., GitHub). This flexibility makes them suitable for both community-driven projects and enterprise environments where control and security are required. Overall, Ansible Collections provide a powerful way to organize, share, and manage automation content efficiently.

A key feature introduced by Ansible Collections is the namespace — the top-level identifier that groups collections, prevents naming conflicts, and indicates ownership. Examples include community.general or myorg.apache. Namespaces are particularly important in large organizations and when sharing collections publicly, as they help maintain clear boundaries and avoid collisions.

3.1 Collections vs Roles vs Modules

As discussed earlier, modules are the smallest building blocks in Ansible, performing atomic actions within tasks. Roles group tasks and related content into reusable units, sitting one level above modules. Collections extend this concept further by packaging roles, modules, plugins, and documentation together into a single, distributable format.

Collections sit at the top of the hierarchy as the primary packaging layer. They address challenges around sharing, versioning, and dependency management across projects—problems that roles alone cannot fully solve. Collections are therefore essential for maintaining consistency and scalability in larger automation environments.

3.2 Collection Structure

Like roles, collections are based on a strict directory structure. Ansible provides tooling to scaffold the initial directory layout.

The ansible-galaxy utility creates the directory structure for a collection. Unlike role creation, you must provide both a namespace and a collection name. For example, using myorg.unix (myorg as the namespace, unix as the collection name):

ansible-galaxy collection init myorg.unix

This command creates a full collection skeleton in the myorg/unix/ directory with the standard structure. Note the roles directory, which will contain all roles belonging to the collection.

myorg/
  unix/
    docs/
    plugins/
    roles/
    galaxy.yml
    README.md
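The generated galaxy.yml carries the collection's identity and version. A minimal hedged example follows; all values except the namespace and name are illustrative:

```yaml
# myorg/unix/galaxy.yml
namespace: myorg
name: unix
version: 1.0.0
readme: README.md
authors:
  - Your Name <you@example.com>
description: Unix-related automation roles
license:
  - Apache-2.0
# Dependencies on other collections, with version constraints:
dependencies: {}
```

The version field is what makes collection releases traceable, and the dependencies map is where structured dependency management (discussed later) lives.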

3.3 Move Existing Role into the Collection

The previously created apache role can be moved into the collection under myorg/unix/roles/apache/. The role structure remains the same, and the tasks will continue to function without modification.

myorg/
  unix/
    roles/
      apache/
        tasks/
          main.yml
        defaults/
        handlers/
        meta/
        templates/
        vars/
        files/

3.4 Installing the Collection

In this tutorial, the collection is kept in the Ansible-aware collections/ansible_collections directory to make it directly available for playbooks. This works for special cases but is not suitable for regular enterprise usage. Before use, the collection should be installed to the proper location.

The installation location is configurable, but for now, we will use the default (~/.ansible). Ansible defines a standard way to bring a collection from any location into the local execution environment, supporting sources such as Galaxy, Git, URL, file directory, or subdirectories.
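For reference, the installation location can be redirected with the ANSIBLE_COLLECTIONS_PATH environment variable or the -p flag of ansible-galaxy. The project-local path below is an assumption for illustration:

```shell
# Point Ansible at a project-local collections directory
# instead of the default ~/.ansible location.
export ANSIBLE_COLLECTIONS_PATH="$PWD/collections"
mkdir -p "$ANSIBLE_COLLECTIONS_PATH"

# The install would then target that directory (requires ansible-galaxy):
# ansible-galaxy collection install -r requirements.yml -p "$ANSIBLE_COLLECTIONS_PATH"
```

Keeping collections inside the project directory makes builds reproducible and avoids polluting the user's home directory.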

Typically, collection installation is managed via a requirements.yml file that specifies dependencies:

---
collections:
  - name: collections/ansible_collections/myorg/unix/
    type: dir
  - name: collections/ansible_collections/myorg/toolchain/
    type: dir
  - name: collections/ansible_collections/myorg/publicapi/
    type: git
    source: https://github.com/rstyczynski/ansible-collection-howto.git#/collections/ansible_collections/myorg/publicapi
    version: main
Note

Ansible supports a wide range of sources for collections, including Git, URL, file directory, or subdirectories. A collection stored in Git may be placed in a subdirectory of the repository, which may be beneficial in some cases; however, for production-grade collections, always use a dedicated repository, which gives the owner full control over the collection.

With requirements.yml ready, install the dependencies using the ansible-galaxy tool:

ansible-galaxy collection install -r requirements.yml

You can verify that the collection is available:

ansible-galaxy collection list | grep myorg

3.5 Simplify the Playbook Using the Collection

With the role now inside the collection and the collection installed, you can reference it in your playbook:

- hosts: webservers
  become: yes

  roles:
    - myorg.unix.apache

Note the namespace prefix (myorg.unix). This allows you to use an apache role supplied by different authors, as collections use namespaces to avoid naming conflicts.
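Alternatively, the play-level collections keyword defines a search path, so role names can stay short. A sketch:

```yaml
- hosts: webservers
  become: yes

  # Roles and modules are resolved against the listed collections.
  collections:
    - myorg.unix

  roles:
    - apache
```

Fully qualified names remain the more explicit and recommended form, but the keyword is convenient when a play uses many roles from one collection.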

3.6 Benefits of Using Collections

Roles already provide organization and reusability, but collections extend these advantages significantly. A collection can bundle roles together with modules, plugins, and documentation in one package. You no longer need to manage these elements separately across projects.

While roles can be versioned (e.g., via Git tags or Galaxy releases), the mechanism is mostly ad hoc and external. Collections, by contrast, make versioning a first-class feature: every collection carries a version in its galaxy.yml, and dependencies on other collections can be declared in a structured way. This makes it easier to control upgrades, avoid incompatibilities, and ensure consistency across environments.

Another key benefit is unified distribution. While roles can be shared via Galaxy, GitHub, or private repositories, collections package multiple content types (roles, modules, plugins, documentation) together. This makes installation, versioning, and sharing more consistent and predictable, especially in larger environments.

In summary, collections are the natural next step after roles: they enhance reusability, standardize version control, and provide the dependency management needed for automation at scale.

3.7 Advanced Topics

3.7.1 Custom Collection Installation

Oracle distributes its OCI Collection through regular Ansible Galaxy servers, but this document focuses on local sources. The following example shows how to install the Oracle OCI Collection from a tar source, downloading and building it first:

curl -L -o /tmp/oci-ansible-collection-5.5.0.tar.gz \
  https://github.com/oracle/oci-ansible-collection/archive/refs/tags/v5.5.0.tar.gz
mkdir -p /tmp/oci-ansible-collection-src
tar -xzf /tmp/oci-ansible-collection-5.5.0.tar.gz -C /tmp/oci-ansible-collection-src --strip-components=1
cd /tmp/oci-ansible-collection-src
ansible-galaxy collection build
ansible-galaxy collection install oracle-oci-5.5.0.tar.gz

ansible-galaxy collection list | grep oracle.oci

3.7.2 Blocking Public Galaxy Servers

Blocking public sources may not be straightforward without additional firewall measures. However, a simple technique disables public Galaxy servers at the Ansible level, which can be applied as a first protection layer in pipelines.

export ANSIBLE_GALAXY_SERVER_LIST=blocked
export ANSIBLE_GALAXY_SERVER_BLOCKED_TOKEN='blocked'
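The same blocking can be made persistent in ansible.cfg instead of environment variables; a sketch of the equivalent configuration:

```ini
# ansible.cfg
[galaxy]
server_list = blocked

[galaxy_server.blocked]
token = blocked
```

Because the "blocked" server entry defines no url, any attempt to resolve a collection from Galaxy fails, which is the intended effect.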

Now, if you try to install Oracle OCI:

ansible-galaxy collection install oracle.oci --force

Instead of installation progress, you will see an error:

[ERROR]: Required config 'url' for 'blocked' galaxy_server plugin not provided.

4. Ansible Role argument specification

Ansible provides the capability to specify a Role’s argument definitions. This feature is limited to inputs, but it can serve as a starting point for defining output properties as well. To demonstrate this capability, we use a simple DuckDuckGo API integration that returns a description of a given person’s name. This use case runs on the controller, so it does not require any managed hosts. It also serves as a recap of roles and collections, with the focus on argument definitions.

The playbook is presented in three versions:

  1. duck1.yml - regular playbook interacting with DuckDuckGo API

  2. duck2_with_role.yml - playbook with role hiding DuckDuckGo API complexity

  3. duck3_with_collection.yml - playbook with collection
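A minimal sketch of the duck1.yml idea, assuming the public DuckDuckGo Instant Answer endpoint; the query and field handling are illustrative, not copied from the repository:

```yaml
---
- name: Query DuckDuckGo Instant Answer API
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Ask DuckDuckGo about a person
      ansible.builtin.uri:
        url: "https://api.duckduckgo.com/?q=Alan+Turing&format=json"
        return_content: true
      register: duck_response

    - name: Show the abstract
      ansible.builtin.debug:
        msg: "{{ duck_response.json.AbstractText | default('no answer') }}"
```

Because the play targets localhost and gathers no facts, it runs entirely on the controller, as the use case requires.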

Argument validation is defined in the argument_specs.yml file stored in the meta/ directory.

argument_specs:
  main:
    short_description: "Query DuckDuckGo"
    options:
      duckduckgo_query:
        type: str
        description: The search query to send to DuckDuckGo Instant Answer API
        required: true

It is verified at runtime by a task at the start of the Role’s logic, using the validate_argument_spec module.

- name: Validate inputs (explicit)
  ansible.builtin.validate_argument_spec:
    argument_spec: "{{ (lookup('file', role_path ~ '/meta/argument_specs.yml') | from_yaml).argument_specs.main.options }}"

By applying these two simple elements, you ensure that your role receives all required arguments in the expected format.
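Calling such a role then looks like the following sketch; the role's fully qualified name is an assumption for illustration:

```yaml
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Query via the role with a validated argument
      ansible.builtin.include_role:
        name: myorg.toolchain.duckduckgo   # hypothetical role name
      vars:
        duckduckgo_query: "Grace Hopper"
```

If duckduckgo_query were omitted or had the wrong type, the argument specification would fail the play before any API call is made.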

Appendix. References