What You Need to Know about YAML: A Complete Introduction for Beginners
Introduction:
What is YAML?
Let's jump into the world of YAML, a data serialization language known for being easy to read. People mostly use it to write configuration files, but YAML isn't only for configuration: it is also used to store data, send messages over the network, and share data between programs written in different languages.
YAML stands for "YAML Ain't Markup Language." It is designed to be flexible and to work with all kinds of programming languages, so it acts like a friendly bridge that lets different systems exchange data. YAML files usually end with .yaml or .yml.
In the DevOps world, YAML is a big deal. Tools like Kubernetes and Ansible rely on it for their configuration and automation files.
Now, let's dive deeper into YAML. We'll start with the general format of a YAML file and then explore its syntax: indentation, key-value pairs, lists, strings, and numbers. After that, we'll see how an appspec.yaml configuration file is put together.
So, let's begin with our first topic: understanding the general format of a typical YAML file.
What Does a Regular YAML File Look Like?
A typical YAML file looks similar to the following. The example is deliberately generic so that we can see how the various parts of a YAML file are structured, one by one.
# A sample YAML file: Represents company information.
company: MyCompany
domain:
  - technology
  - software
tutorial:
  - yaml:
      name: "YAML Explained"
      type: informative
      published: 2023-08-21
  - json:
      name: "JSON Introduction"
      type: useful
      published: 2023-08-20
  - xml:
      name: "Understanding XML"
      type: foundational
      published: 2023-08-19
author: John Doe
published: true
In this sample, the YAML structure represents company information. The ‘company’ field is set to "MyCompany", and the ‘domain’ field lists two domains, ‘technology’ and ‘software’. The ‘tutorial’ field contains a list of tutorials, and each tutorial holds further details as ‘key: value’ pairs. The ‘author’ field is set to "John Doe", and the ‘published’ field has the value ‘true’, which implies that the information is published.
Here the data simply follows a certain layout. Now let’s extend this understanding and learn the actual syntax of a YAML file.
Basic YAML Syntax
A YAML file organizes data using three primary structures: maps/dictionaries (also called mappings), arrays/lists (also called sequences), and literals (also called scalars).
Maps/Dictionaries (Mapping):
Maps represent structured data as key-value pairs. The keys are unique identifiers, and each key is associated with a value. In YAML, a mapping entry is written as key: value.
For instance:
person:
  name: John Doe
  age: 30
  occupation: Software Engineer
In this example, "person" is the key, and its corresponding value is another set of key-value pairs representing details about a person.
‘person’ - This is the key. It's the identifier for the map and is followed by a colon.
‘name: John Doe’ - This is the first key-value pair within the map. ‘name’ is the key, and ‘John Doe’ is the value associated with that key.
‘age: 30’ - This is the second key-value pair. ‘age’ is the key, and ‘30’ is the associated value.
‘occupation: Software Engineer’ - This is the third key-value pair. ‘occupation’ is the key, and ‘Software Engineer’ is the associated value.
Mappings are like dictionaries that connect keys to values. They're not arranged in any particular order. You can put one map inside another by indenting it more, or you can make a new map on the same level as the previous one.
For example:
student:
  name: "John Doe"
  age: 18
  contact:
    phone: "123-456-7890"
    email: "john@example.com"
In this case, "student" is the main key, and under it are keys like "name," "age," and "contact." The "contact" key itself has sub-keys "phone" and "email," creating a structure of related information.
In a nutshell, I can say that YAML maps allow you to group related data together using keys, making it easy to access specific pieces of information.
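To illustrate the other case mentioned earlier, a new map on the same level rather than a nested one, here is a minimal sketch. The "teacher" map and its keys are made up for this illustration and are not part of the earlier example.

```yaml
# Two maps on the same level: neither is nested inside the other.
# "teacher" and its keys are hypothetical, added only for illustration.
student:
  name: "John Doe"
  age: 18
teacher:
  name: "Jane Roe"
  subject: Mathematics
```

Because "student" and "teacher" start in the same column, a parser treats them as sibling keys of one top-level map.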
Arrays/Lists (Sequence):
In YAML, arrays, also known as lists or sequences, are used to represent collections of items in a specific order. A sequence is an ordered list of items, allowing you to group multiple values together under a single key. In YAML, sequences are denoted by a hyphen followed by a space before each item. Let’s understand this with the following example.
fruits:
  - apple
  - banana
  - orange
In this example, "fruits" is the key, and the associated value is a list of fruit names. Each fruit name is preceded by a hyphen (-) and a space.
- ‘fruits:’ - This is the key. It identifies the array and is followed by a colon.
- ‘- apple’ - This is the first item in the array. The hyphen indicates the start of an item, and "apple" is the value of the first item.
- ‘- banana’ - This is the second item. The hyphen indicates another item, and "banana" is the value of the second item.
- ‘- orange’ - This is the third item. Similarly, the hyphen indicates an item, and "orange" is the value of the third item.
So in a nutshell, I can say YAML arrays represent a sequence of values under a single key. The order of items in the array is maintained, and each item is preceded by a hyphen. This is a way to group related data together while preserving the order in which they appear.
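A sequence item does not have to be a plain value; it can also be a map, which is how lists of records are usually written. The sketch below assumes a hypothetical "price" key for each fruit.

```yaml
# Each "-" starts one item; here every item is a small map.
# The "price" values are invented for illustration.
fruits:
  - name: apple
    price: 1.2
  - name: banana
    price: 0.5
```

The keys that belong to one item line up with the first key after the hyphen.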
Literals (Scalars):
In YAML, literals, also known as scalars, refer to individual atomic values that are not structured or grouped with other values. They are simple, indivisible values that can be strings, numbers, booleans, or null.
product:
  name: Laptop
  price: 1000
  in_stock: true
In this example, "product" is the key, and the values nested under it are scalars: a string (Laptop), a number (1000), and a boolean (true).
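The example above covers a string, a number, and a boolean. As a small sketch of the remaining scalar kinds mentioned here, the fragment below adds a float and a null; the extra keys are hypothetical.

```yaml
product:
  name: Laptop        # string
  price: 1000         # integer
  weight_kg: 1.35     # float (hypothetical key)
  in_stock: true      # boolean
  discount: null      # null; "~" or an empty value mean the same
```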
The sample YAML file showcases these concepts in action. The structure and indentation help differentiate between mappings, sequences, and scalar values. This hierarchical arrangement ensures data readability and clarity when using YAML.
Indentation:
In YAML files, the way information is arranged visually matters a lot, because indentation defines the structure. It's like creating an outline with different levels of indentation, similar to how you might organize a list or a document. When you're indenting, use spaces, not tabs. YAML doesn't prescribe a number of spaces per level (two is common); just make sure that items on the same level line up.
school_subjects: # This is like the main category
  - math: # This is a sub-category, indented by 2 spaces
      name: "Mathematics" # This is a specific detail, indented by 6 spaces
      level: high school # Another detail
      topics: algebra, geometry # More details
  - science:
      name: "Science"
      level: middle school
      topics: biology, chemistry
In this example, the indentation levels help us see that "school_subjects" is the main section, "math" and "science" are subsections, and the specific details are indented further. This way, the structure becomes clear and organized, making it easier to grasp.
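As a rough sketch of what "keep it the same for each level" means in practice, the fragment below shows a consistent layout and, in comments, a misaligned one. The "server" keys are hypothetical, and the exact error message depends on the parser.

```yaml
# Consistent: both entries under "server" start in the same column.
server:
  host: localhost
  port: 8080

# Inconsistent (kept as comments because parsers typically reject it):
# server:
#   host: localhost
#    port: 8080   # one extra space, so it no longer lines up with "host"
```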
Literals – Strings:
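In YAML, strings are used to represent text. Usually, you don't need to put quotes around strings unless they contain characters that YAML would read as syntax. For instance, "&" marks an anchor when it appears at the start of a value, so a string that begins with "&" must be quoted. In the middle of a string it is harmless, but quoting never hurts. For example:
message1: YAML & JSON # Works without quotes, since & is not at the start
message2: "YAML & JSON" # Same string, quoted to be safe
message3: "&YAML" # Needs quotes because it starts with &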
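Quotes are also useful when a value merely looks like something other than text. The sketch below uses hypothetical keys to show a few common cases; the behaviour described in the comments is what mainstream parsers do, so treat it as a guide rather than a guarantee for every tool.

```yaml
# Quoted so the parser keeps them as strings.
version: "1.10"      # unquoted 1.10 would be read as the float 1.1
enabled: "true"      # unquoted true would be read as a boolean
note: "status: ok"   # an unquoted ": " would look like mapping syntax
hashtag: "#yaml"     # unquoted, everything from # on would be a comment
```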
Folding Strings:
YAML lets you write a long string across several lines using the ">" symbol, which folds the line breaks into spaces. For example:
message: >
  This is a long message
  that spans multiple lines
  but it's still a single line
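For comparison, this is roughly the value a parser produces for the folded example, written out as a one-line double-quoted string.

```yaml
# Each line break inside the folded block became a space;
# a single newline is kept at the very end.
message: "This is a long message that spans multiple lines but it's still a single line\n"
```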
Block Strings:
You can also use the "|" symbol to write strings in multiple lines, where line breaks are preserved. For example:
message: |
  This message
  has line breaks
  exactly as written
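Written as a one-line double-quoted string, the same value would look roughly like this, with every line break kept as "\n".

```yaml
# Equivalent of the block string above: line breaks are preserved.
message: "This message\nhas line breaks\nexactly as written\n"
```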
Chomp Characters:
When you have multiline strings, you might want to control the newline characters at the end. You can add "+" to keep all trailing newlines or "-" to strip them; with neither, a single trailing newline is kept. For example:
message1: >+
  This message has a newline at the end
message2: >-
  This message doesn't have a newline at the end
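To see the three behaviours side by side, here is a small sketch using the "|" style, which accepts the same indicators as ">". The key names are only for illustration, and the results in the comments assume a spec-compliant parser.

```yaml
clip: |      # no indicator: exactly one trailing newline is kept -> "text\n"
  text

strip: |-    # "-": all trailing newlines are removed -> "text"
  text

keep: |+     # "+": all trailing newlines are kept -> "text\n\n"
  text

end: marker  # a following key, so the blank line above belongs to "keep"
```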
In a nutshell, we can include special characters in strings by quoting them, fold long text with ">", preserve line breaks with "|", and control trailing newlines with "+" and "-". Together these tools help organize and present text within YAML documents.
Comments:
YAML supports comments that start with "#". Comments are helpful for adding explanations or notes to your configuration.
# This is a comment explaining the purpose of the following line
name: John Doe # This is a comment about the name field
age: 30 # Another comment, this time about the age field
In this example, everything from "#" to the end of the line is a comment, whether it sits on its own line or after a value. Comments are not treated as part of the configuration but serve as explanations or notes for anyone reading the YAML file.
So comments let us put extra explanation right inside the YAML document using “#”.
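One detail worth a quick sketch: "#" only starts a comment at the beginning of a line or after whitespace, and never inside quotes. The keys and the example.com URL below are placeholders chosen for illustration.

```yaml
color: "#ff0000"                 # quoted, so the value is the string #ff0000
url: http://example.com/#intro   # no space before "#", so it stays in the value
empty: # everything after the "#" is a comment, so this value is null
```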
Multiple Documents:
In YAML, a file can have many separate documents. Each document can be seen as an individual YAML file within the same file. They're separated by three hyphens "---". Imagine each document as its own story within a book.
---
# Document 1
name: YAML
release: 2001
---
# Document 2
uses:
  - configuration language
  - data persistence
  - internet messaging
  - cross-language data sharing
---
# Document 3
company: spacelift
...
So a single YAML file can hold several documents, each one starting with ‘---’. The optional ‘...’ line marks the end of a document.
Schemas and Tags:
When a YAML parser reads data, it needs to work out what type each value has. Schemas define how that is done. The YAML 1.2 specification describes three schemas:
- Failsafe Schema: Understands only maps, sequences, and strings. Works for any YAML file.
- JSON Schema: Adds JSON's scalar types: boolean, null, int, and float.
- Core Schema: Extends the JSON schema to be more human-readable by accepting several spellings of the same value (for example, null can also be written as ~).
In YAML, the way data is interpreted depends on the schema used. Tags come in here. They're like labels indicating the data type, though often inferred automatically. For example, maps have the tag "tag:yaml.org,2002:map," sequences have "tag:yaml.org,2002:seq," and strings have "tag:yaml.org,2002:str."
Using Tags:
You can explicitly tell YAML how to interpret data using tags. For instance, to make sure a value is read as a string, you can tag it:
company: !!str spacelift
This tells the YAML parser to read "spacelift" as a string. Here the tag is redundant, because the value would be a string anyway, but the same tag matters for values that look like another type, such as true or 2001.
So schemas in YAML define how values are resolved to data types, and tags state a type explicitly when the automatic choice is not what you want.
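Here is a short sketch of explicit tags changing how a value is read. The keys are hypothetical, and the results in the comments assume a parser that supports the standard "!!" tags.

```yaml
# Explicit tags override the type a parser would pick on its own.
flag: !!str true     # the string "true" instead of a boolean
year: !!str 2001     # the string "2001" instead of an integer
count: !!int "42"    # the integer 42, even though the text is quoted
ratio: !!float 1     # the float 1.0 instead of the integer 1
```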
Anchors and Aliases:
In complex YAML files, repeating configurations can become tedious. Anchors (&) and aliases (*) help here. You can anchor a chunk of configuration and refer to it using an alias later. This avoids duplication and keeps your code clean.
service1:
  config: &service_config
    env: prod
service2:
  config: *service_config
So I can say, anchors and aliases in YAML let you use the same data in different spots without copying it over and over. They help to make YAML files shorter, tidier, and less repetitive.
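It can help to picture what the parser ends up with once the alias is resolved. The fragment below is the expanded equivalent of the anchor example, written out by hand.

```yaml
# After resolving *service_config, both services carry the same config.
service1:
  config:
    env: prod
service2:
  config:
    env: prod
```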
Overrides:
When configurations vary slightly, overriding helps. You can pull in an anchored mapping with the merge key (<<:) and then add or change specific keys. The merge key comes from YAML 1.1 and is supported by most, but not all, parsers.
service1:
  config: &service_config
    env: prod
service2:
  config:
    <<: *service_config
    version: 5
In this way, you can reuse a configuration and adjust only what differs.
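The example above adds a key to the merged mapping. As a sketch of an actual override, the fragment below replaces a merged value; the names "base", "defaults", "staging", and "replicas" are hypothetical, and it assumes a parser that supports the merge key.

```yaml
base: &defaults
  env: prod
  replicas: 2

staging:
  <<: *defaults
  env: staging   # replaces the merged value "prod"
  # "replicas" is inherited from the anchor and stays 2
```

Keys written directly in the mapping take precedence over keys brought in through the merge.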
Escaping Special Characters:
Certain characters like :, &, |, and others have special meanings in YAML. But what if you want these characters as part of your data? You can escape them using various methods.
For instance:
Backslash Escapes (double-quoted strings only): "\n", "\t", "\\", "\""
Unicode Escapes (double-quoted strings only): "\u0020", "\u0027", "\u0022"
Quoted Escapes: 'YAML is the "best" configuration language'
So I can say that these methods let you include special characters in your data without confusing the YAML parser.
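The difference between the two quoting styles is easiest to see in context. The keys below are made up for illustration; the behaviour shown is what the YAML specification describes for double-quoted and single-quoted strings.

```yaml
# Double quotes: backslash escapes are processed.
tabbed: "col1\tcol2"
quote_in_double: "She said \"hi\""
unicode_space: "a\u0020b"        # same as "a b"

# Single quotes: no backslash escapes; a literal ' is written as ''.
quote_in_single: 'It''s a "quoted" word'
backslash_kept: 'C:\temp\new'    # backslashes stay exactly as typed
```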
By understanding these advanced YAML concepts, you can create more complex and organized configurations. It's like having advanced tools to shape your data the way you want.
We have now covered the fundamentals of YAML as well as some of its advanced features. Using these concepts, we can write an appspec file in YAML.
So let's do it.
Appspec file in YAML
version: 1
resources:
  - FrontendService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: !Ref FrontendTaskDefinition
        LoadBalancerInfo:
          ContainerName: FrontendContainer
          ContainerPort: 80
  - BackendService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: !Ref BackendTaskDefinition
        LoadBalancerInfo:
          ContainerName: BackendContainer
          ContainerPort: 8080
  - BackgroundProcessingFunction:
      Type: AWS::Lambda::Function
      Properties:
        FunctionName: BackgroundProcessingFunction
        Handler: index.handler
This is the appspec file, written using the concepts from this post. Let's break it down to understand each part.
Mapping (Mapping as Key-Value Pair):
version: 1
Here, "version" is a key that is mapped to the value 1. This is a basic example of a mapping.
Array (Sequence):
resources:
The "resources" key maps to a sequence (array) of resource definitions. Each resource is introduced by a hyphen (-) followed by a space, and its details are indented beneath it.
Mapping (Mapping for Each Resource):
- FrontendService:
This is a mapping where "FrontendService" is a key mapped to the corresponding value. The value is itself a mapping containing properties for the frontend service resource.
Mapping (Properties Mapping):
Properties:
Inside each resource mapping, the "Properties" key maps to a mapping that defines specific properties for that resource.
Mapping (Key-Value Pairs Within Properties):
TaskDefinition: !Ref FrontendTaskDefinition
Within the "Properties" mapping, various key-value pairs define properties for the resource. For example, "TaskDefinition" is a key whose value is the scalar "FrontendTaskDefinition" carrying the custom tag "!Ref".
Mapping (Nested Mapping):
LoadBalancerInfo:
Inside the "Properties" mapping, "LoadBalancerInfo" is a key mapped to a nested mapping that defines load balancer configuration for the resource.
Mapping (Properties for BackendService and BackgroundProcessingFunction):
- Similar to the frontend service resource, the "BackendService" and "BackgroundProcessingFunction" resources are also defined with their properties using mappings.
Indentation (Hierarchy):
- The indentation levels indicate the hierarchy and nesting of mappings within mappings. For example, the indented line "  - FrontendService:" shows that "FrontendService" is an element of the "resources" sequence, and the subsequent mappings are indented further to show their properties.
Tags (!Ref Tag for References):
TaskDefinition: !Ref FrontendTaskDefinition
!Ref is not a built-in YAML tag. It is a custom (application-specific) tag, best known from AWS CloudFormation, and it is up to the tool reading the file to resolve it into a reference to FrontendTaskDefinition.
Mappings and Nested Mappings:
LoadBalancerInfo:
In this case, LoadBalancerInfo is a key whose value is a nested mapping. No explicit tag is used here; the parser infers the mapping type from the structure.
Anchors and Aliases (& for Anchors and * for Aliases):
- Anchors (&): In this example, you don't see explicit anchors, but they can be used to define reusable structures. For instance, an anchor could be used for common configurations shared between services.
- Aliases (*): Similarly, you don't see explicit aliases, but they could be used to reference and reuse configurations within the same YAML file.
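As a purely hypothetical variation on the file above, the sketch below anchors the shared resource type once and reuses it with an alias. The anchor name "ecs_service" is invented here, and whether the tool consuming the file accepts anchors is an assumption you would need to check.

```yaml
# Hypothetical variation: the repeated Type value is anchored once and reused.
resources:
  - FrontendService:
      Type: &ecs_service AWS::ECS::Service   # define the anchor
      Properties:
        TaskDefinition: !Ref FrontendTaskDefinition
  - BackendService:
      Type: *ecs_service                     # reuse it through the alias
      Properties:
        TaskDefinition: !Ref BackendTaskDefinition
```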
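Overall, this appspec file is structured using mappings, sequences (arrays), indentation, and tags, and anchors and aliases could be added to share common settings. A real AWS CodeDeploy AppSpec file must additionally use the keys and values that CodeDeploy expects, so check the AWS documentation before deploying.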
Conclusion:
Overall, YAML is a data serialization language that is easy to read and widely used for configuration files. It acts like a friendly bridge between different programs and programming languages, helping them share data. In DevOps, YAML is essential for tools like Kubernetes and Ansible. We've covered the basics as well as advanced concepts like schemas, tags, anchors, and aliases. With this knowledge, we can create organized configurations and even write deployment files like the appspec.yaml we explored.