Chapter 8. Storing your data on hard drives: EBS and instance store
This chapter covers
- Attaching network storage to your EC2 instance
- Using the instance store of your EC2 instance
- Backing up your block-level storage
- Testing and tweaking the performance of your block-level storage
- Instance storage versus network-attached storage
Block-level storage with a disk file system (FAT32, NTFS, ext3, ext4, XFS, and so on) can be used to store files as you do on a personal computer. A block is a sequence of bytes and the smallest addressable unit. The OS is the intermediary between the application that wants to access files and the underlying file system and block-level storage. The disk file system manages where (at what block address) your files are persisted on the underlying block-level storage. You can use block-level storage only in combination with an EC2 instance where the OS runs.
The OS provides access to block-level storage via open, write, and read system calls. The simplified flow of a read request goes like this:
1. An application wants to read the file /path/to/file.txt and makes a read system call.
2. The OS forwards the read request to the file system.
3. The file system translates /path/to/file.txt to the block addresses on the disk where the data is stored.
4. The file system reads the blocks from the disk and returns the data to the application.
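If you want to watch these system calls happen, you can trace a simple read on a Linux machine. This is only an illustration; it assumes strace is installed, and on current kernels the open call shows up as openat:

$ strace -e trace=openat,read cat /path/to/file.txt   # substitute a real file path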
Applications like databases that read or write files by using system calls must have access to block-level storage for persistence. You can’t tell a MySQL database to store its files in an object store because MySQL uses system calls to access files.
Not all examples are covered by the Free Tier
The examples in this chapter are not all covered by the Free Tier. A special warning message appears whenever an example incurs costs. As long as you don’t run the other examples for longer than a few days, you won’t pay anything for them. Keep in mind that this applies only if you created a fresh AWS account for this book and nothing else is going on in that account. Try to complete the examples of this chapter within a few days; you’ll clean up your account at the end of each example.
AWS provides two kinds of block-level storage: network-attached storage (NAS) and instance storage. NAS is attached to your EC2 instance via a network connection (comparable to protocols like iSCSI), whereas instance storage is a normal hard disk that the host system provides to your EC2 instance. NAS is the best choice for most problems because it provides 99.999% availability of your data. Instance storage is interesting if you’re optimizing for performance. The next three sections introduce and compare the two block-level storage solutions: connecting block-level storage to an EC2 instance, running performance tests, and backing up the data. After that, you’ll set up a shared file system using instance storage and NAS.
Elastic Block Store (EBS) provides network-attached, block-level storage with 99.999% availability. Figure 8.1 shows how you can use EBS volumes with EC2 instances.
Figure 8.1. EBS volumes are independent resources but can only be used when attached to an EC2 instance.

EBS volumes
- Aren’t part of your EC2 instances; they’re attached to your EC2 instance via a network connection. If you terminate your EC2 instance, the EBS volumes remain.
- Can be attached to at most one EC2 instance at a time, or left unattached.
- Can be used like normal hard disks.
- Are comparable to RAID1: your data is saved to multiple disks in the background.
Warning
You can’t attach the same EBS volume to multiple servers!
The following example demonstrates how to create an EBS volume and attach it to an EC2 instance with the help of CloudFormation:
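(A minimal sketch of the two resources involved; the full template linked below also contains the EC2 instance, the AttachVolume parameter, and the outputs.)

"Volume": {
  "Type": "AWS::EC2::Volume",
  "Properties": {
    "AvailabilityZone": {"Fn::GetAtt": ["Server", "AvailabilityZone"]},
    "Size": "5",
    "VolumeType": "gp2"
  }
},
"VolumeAttachment": {
  "Type": "AWS::EC2::VolumeAttachment",
  "Properties": {
    "Device": "/dev/xvdf",
    "InstanceId": {"Ref": "Server"},
    "VolumeId": {"Ref": "Volume"}
  }
}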

An EBS volume is a standalone resource. This means your EBS volume can exist without an EC2 server, but you need an EC2 server to use the EBS volume.
To help you explore EBS, we’ve prepared a CloudFormation template located at https://s3.amazonaws.com/awsinaction/chapter8/ebs.json. Create a stack based on that template, and set the AttachVolume parameter to yes. Then, copy the PublicName output and connect via SSH.
You can see the attached EBS volumes with the help of fdisk. Usually, EBS volumes can be found at /dev/xvdf to /dev/xvdp. The root volume (/dev/xvda) is an exception—it’s based on the AMI you choose when you launch the EC2 instance and contains everything needed to boot the instance (your OS files):
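$ sudo fdisk -l   # lists /dev/xvda (root volume) and /dev/xvdf (the EBS volume)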

The first time you use a newly created EBS volume, you must create a file system. You could also create partitions, but in this case the volume size is only 5 GB, so you probably don’t want to split it up further. It’s also best practice to not use partitions with EBS volumes. Create volumes with the size you need; if you need two separate “partitions,” create two volumes. In Linux, you can create a file system with the help of mkfs. The following example creates an ext4 file system:
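$ sudo mkfs -t ext4 /dev/xvdf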
After the file system has been created, you can mount the device:
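$ sudo mkdir /mnt/volume/
$ sudo mount /dev/xvdf /mnt/volume/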
To see mounted volumes, use df -h:
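$ df -h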

EBS volumes have one big advantage: they aren’t part of the EC2 instance; they’re independent resources. To see how an EBS volume is independent of the server, you’ll now save a file to the volume and then unmount and detach the volume:
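$ sudo touch /mnt/volume/testfile   # create an empty test file
$ sudo umount /mnt/volume/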

Update the CloudFormation stack, and change the AttachVolume parameter to no. This will detach the EBS volume from the EC2 instance. After the update is completed, only your root device is left:
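$ sudo fdisk -l   # only /dev/xvda is listed now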
The testfile in /mnt/volume/ is also gone:
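$ ls /mnt/volume/   # empty: the testfile lives on the detached volume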
Now you’ll attach the EBS volume again. Update the CloudFormation stack, and change the AttachVolume parameter to yes. After the update is completed, /dev/xvdf is again available:
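$ sudo fdisk -l
$ sudo mount /dev/xvdf /mnt/volume/
$ ls /mnt/volume/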

Voilà: the file testfile that you created in /mnt/volume/ is still there.
Performance testing of hard disks is divided between read and write tests. Many tools are available. One of the simpler tools is dd, which can perform block-level reads and writes between a source if=/path/to/source and a destination of=/path/to/destination:
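# write test: 1 GB in 1 MB blocks (tempfile is an arbitrary name;
# conv=fdatasync forces a flush to disk so the numbers are honest)
$ sudo dd if=/dev/zero of=/mnt/volume/tempfile bs=1M count=1024 conv=fdatasync
# drop the page cache so the read test doesn't just hit memory
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
# read test: read the file back in 1 MB blocks
$ sudo dd if=/mnt/volume/tempfile of=/dev/null bs=1M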

Keep in mind that depending on your actual workload, performance can vary. The example assumes an I/O block size of 1 MB. If you’re hosting websites, you’ll most likely deal with lots of small files instead.
But EBS performance is a bit more complicated. Performance depends on the EC2 instance type and the EBS volume type. Table 8.1 gives an overview of EC2 instance types that are EBS-optimized by default or can be optimized for an additional hourly charge. Input/output operations per second (IOPS) are measured using 16 KB I/O size. Performance depends heavily on your workload: read versus write, and the size of your I/O operations. These numbers are illustrations, and your mileage may vary.
Table 8.1. What performance can be expected from EBS-optimized instance types?

| Use case | Instance type | Max bandwidth (MiB/s) | Max IOPS | EBS optimized by default? |
| --- | --- | --- | --- | --- |
| General purpose | m3.xlarge–m3.2xlarge | 60–120 | 4,000–8,000 | No |
| Compute optimized | c3.xlarge–c3.4xlarge | 60–240 | 4,000–16,000 | No |
| Compute optimized | c4.large–c4.8xlarge | 60–480 | 4,000–32,000 | Yes |
| Memory optimized | r3.xlarge–r3.4xlarge | 60–240 | 4,000–16,000 | No |
| Storage optimized | i2.xlarge–i2.4xlarge | 60–240 | 4,000–16,000 | No |
| Storage optimized | d2.xlarge–d2.8xlarge | 90–480 | 6,000–32,000 | Yes |
Depending on your storage workload, you must choose an EC2 instance that can deliver the bandwidth you require. Additionally, your EBS volume must be able to saturate the bandwidth. Table 8.2 shows the different EBS volume types available and how they perform.
Table 8.2. How EBS volume types differ

| EBS volume type | Size | Maximum throughput (MiB/s) | IOPS | IOPS burst | Price |
| --- | --- | --- | --- | --- | --- |
| Magnetic | 1 GiB–1 TiB | 40–90 | 100 | Hundreds | $ |
| General purpose (SSD) | 1 GiB–16 TiB | 160 | 3 per GiB (up to 10,000) | 3,000 | $$ |
| Provisioned IOPS (SSD) | 4 GiB–16 TiB | 320 | As much as you provision (up to 30 per GiB or 20,000) | - | $$$ |
You pay for EBS volumes based on their size, no matter how much of that size you actually use. If you provision a 100 GiB volume, you pay for 100 GiB even if the volume is empty. With magnetic volumes, you also pay for every I/O operation you perform. A provisioned IOPS (SSD) volume is additionally billed based on the provisioned IOPS. Use the AWS Simple Monthly Calculator at http://aws.amazon.com/calculator to determine how much your storage setup will cost.
GiB and TiB
The terms gibibyte (GiB) and tebibyte (TiB) aren’t used often; you’re probably more familiar with gigabyte and terabyte. But AWS uses them in some places. Here’s what they mean:
- 1 GiB = 2^30 bytes = 1,073,741,824 bytes
- 1 GiB is ~ 1.074 GB
- 1 GB = 10^9 bytes = 1,000,000,000 bytes
We advise you to use general-purpose (SSD) volumes as the default. If your workload requires more IOPS, then go with provisioned IOPS (SSD). You can attach multiple EBS volumes to a single instance to increase overall capacity or for additional performance.
You can increase performance by combining two (or more) volumes into a software RAID0, also called striping. RAID0 means that if you have two disks, your data is distributed across both of them, but each piece of data resides on only one disk, so there is no redundancy. A software RAID can be created with mdadm in Linux, as shown below.
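A sketch with mdadm, assuming two volumes are attached as /dev/xvdf and /dev/xvdg (the device names and the mount point are placeholders):

$ sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdf /dev/xvdg
$ sudo mkfs -t ext4 /dev/md0
$ sudo mkdir /mnt/raid0
$ sudo mount /dev/md0 /mnt/raid0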
EBS volumes offer 99.999% availability, but you should still create backups from time to time. Fortunately, EBS offers an optimized, easy-to-use way of backing up EBS volumes with EBS snapshots. A snapshot is a block-level incremental backup that is saved to S3. If your volume is 5 GB in size and you use 1 GB of data, your first snapshot will be around 1 GB in size. After the first snapshot is created, only changed blocks are saved to S3 to reduce the size of the backup. EBS snapshots are billed based on the number of gigabytes they consume.
You’ll now create a snapshot with the help of the CLI. Before you can do so, you need to know the EBS volume ID. You can find it as the VolumeId output of the CloudFormation stack or by running the following:
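# filter on the 5 GiB size used by the stack (assumes no other 5 GiB volumes exist)
$ aws ec2 describe-volumes --filters "Name=size,Values=5" \
  --query "Volumes[].VolumeId" --output text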

With the volume ID, you can go on to create a snapshot:
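$ aws ec2 create-snapshot --volume-id vol-xxxxxxxx   # replace vol-xxxxxxxx with your volume ID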

Creating a snapshot can take some time, depending on how big your volume is and how many blocks have changed since the last backup. You can see the status of the snapshot by running the following:
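$ aws ec2 describe-snapshots --snapshot-ids snap-xxxxxxxx   # ID returned by create-snapshot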

Creating a snapshot of an attached, mounted volume is possible but can cause problems with writes that aren’t flushed to disk. If you must create a snapshot while the volume is in use, you can do so safely as follows:
1. Freeze all writes by running fsfreeze -f /mnt/volume/ on the server.
2. Create a snapshot.
3. Resume writes by running fsfreeze -u /mnt/volume/ on the server.
4. Wait until the snapshot is completed.
You need to keep the file system frozen only until the snapshot creation has been requested, that is, until the create-snapshot call returns. You don’t have to keep it frozen until the snapshot reaches the completed state.
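Put together, a consistent snapshot of a mounted volume might look like this (the volume ID is a placeholder):

$ sudo fsfreeze -f /mnt/volume/
$ aws ec2 create-snapshot --volume-id vol-xxxxxxxx
$ sudo fsfreeze -u /mnt/volume/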
To restore a snapshot, you must create a new EBS volume based on that snapshot. When you launch an EC2 instance from an AMI, AWS creates the new EBS root volume based on a snapshot in the same way (an EBS-backed AMI is essentially a snapshot plus metadata).
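For example (the snapshot ID and Availability Zone are placeholders; create the volume in the zone where you want to attach it):

$ aws ec2 create-volume --snapshot-id snap-xxxxxxxx --availability-zone us-east-1a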
Cleaning up
Don’t forget to delete the snapshot:
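$ aws ec2 delete-snapshot --snapshot-id snap-xxxxxxxx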
Also delete your stack after you finish this section to clean up all used resources. Otherwise, you’ll likely be charged for the resources you use.
An instance store provides block-level storage like a normal hard disk. Figure 8.2 shows that the instance store is part of an EC2 instance and available only while your instance is running; it won’t persist your data if you stop or terminate the instance. You don’t pay separately for an instance store; its cost is included in the EC2 instance price.
In comparison to an EBS volume, which is attached via network to your virtual server, an instance store is included in the virtual server and can’t exist without the virtual server.
Don’t use an instance store for data that must not be lost; use it for caching, temporary processing, or applications that replicate data to several servers as some databases do. If you want to set up your favorite NoSQL database, chances are high that data replication is handled by the application and you can use an instance store to get the highest available I/O performance.
Warning
If you stop or terminate your EC2 instance, the instance store is lost. Lost means all data is destroyed and can’t be restored!
AWS offers SSD and HDD instance stores from 4 GB up to 48 TB. Table 8.3 shows all EC2 instance families with instance stores.
Table 8.3. Instance families with instance stores

| Use case | Instance type | Instance store type | Instance store size in GB |
| --- | --- | --- | --- |
| General purpose | m3.medium–m3.2xlarge | SSD | 1 × 4–2 × 80 |
| Compute optimized | c3.large–c3.8xlarge | SSD | 2 × 16–2 × 320 |
| Memory optimized | r3.large–r3.8xlarge | SSD | 1 × 32–2 × 320 |
| Storage optimized | i2.xlarge–i2.8xlarge | SSD | 1 × 800–8 × 800 |
| Storage optimized | d2.xlarge–d2.8xlarge | HDD | 3 × 2,000–24 × 2,000 |
If you want to launch an EC2 instance with an instance store manually, open the Management Console and start the Launch Instance wizard as you did in section 3.1.1:
Warning
Starting a virtual server with instance type m3.medium will incur charges. See http://aws.amazon.com/ec2/pricing if you want to find out the current hourly price.
- Go through steps 1 to 3: choose an AMI, choose the m3.medium instance type, and configure the instance details.
- In step 4, configure an instance store as shown in figure 8.3:
1. Click the Add New Volume button.
2. Select Instance Store 0.
3. Set the device name to /dev/sdb.
- Complete steps 5 to 7: tag the instance, configure a security group, and review the instance launch.
The instance store can now be used by your EC2 instance.
Listing 8.1 demonstrates how to use an instance store with the help of CloudFormation. If you launch an EC2 instance from an EBS-backed root volume (which is the default), you must define BlockDeviceMappings to map EBS and instance store volumes to device names. In contrast to the EBS template snippet, an instance store isn’t a standalone resource like an EBS volume; the instance store is part of your EC2 instance. Depending on the instance type, you’ll have zero, one, or multiple instance store volumes available for mapping.
Listing 8.1. Connecting an instance store with an EC2 instance with CloudFormation
"Server": {
"Type": "AWS::EC2::Instance",
"Properties": {
"InstanceType": "m3.medium", #1
[...]
"BlockDeviceMappings": [{
"DeviceName": "/dev/xvda", #2
"Ebs": {
"VolumeSize": "8",
"VolumeType": "gp2"
}
}, {
"DeviceName": "/dev/xvdb", #3
"VirtualName": "ephemeral0" #4
}]
}
}
#1 - Choose an InstanceType with an instance store.
#2 - EBS root volume (your OS lives here)
#3 - The instance store will appear as /dev/xvdb.
#4 - The instance store has a virtual name like ephemeral0 or ephemeral1.
Windows-based EC2 instances
The same BlockDeviceMappings configuration applies to Windows-based EC2 instances. DeviceName isn’t the same as the drive letter (C:/, D:/, and so on). To go from DeviceName to a drive letter, the volume must be mounted. The instance store volume from listing 8.1 will be mounted to Z:/. Read on to see how mounting works on Linux.
Cleaning up
Delete your manually started EC2 instance after you finish this section to clean up all used resources. Otherwise you’ll likely be charged for the resources you use.
To help you explore instance stores, we created the CloudFormation template located at https://s3.amazonaws.com/awsinaction/chapter8/instance_store.json.
Warning
Starting a virtual server with instance type m3.medium will incur charges. See http://aws.amazon.com/ec2/pricing to find out the current hourly price.
Create a stack based on that template, copy the PublicName output, and connect via SSH. You can see the attached instance store volumes with the help of fdisk. Usually, instance stores are found at /dev/xvdb to /dev/xvde:
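$ sudo fdisk -l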

To see the mounted volumes, use this command:
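$ df -h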

Your instance store is mounted automatically to /media/ephemeral0. If your EC2 instance has more than one instance store, ephemeral1, ephemeral2, and so on will be used. Now it’s time to run some performance tests.
Let’s take the same performance measurements to see the difference between the instance store and EBS volumes:
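# same write and read test as before, now against the instance store
$ sudo dd if=/dev/zero of=/media/ephemeral0/tempfile bs=1M count=1024 conv=fdatasync
$ echo 3 | sudo tee /proc/sys/vm/drop_caches
$ sudo dd if=/media/ephemeral0/tempfile of=/dev/null bs=1M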

Keep in mind that performance can vary, depending on your actual workload. This example assumes an I/O block size of 1 MB. If you’re hosting websites, you’ll most likely deal with lots of small files instead. Still, the measurement shows that the instance store behaves like a locally attached hard disk, with the performance characteristics to match.
Cleaning up
Delete your stack after you finish this section, to clean up all used resources. Otherwise you’ll likely be charged for the resources you use.
There is no built-in backup mechanism for instance store volumes. Based on what you learned in section 7.2, you can use a combination of cron and S3 to back up your data periodically:
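A minimal sketch of such a cron job (the bucket name is a placeholder, and the instance needs permission to write to it):

# /etc/cron.d/backup: sync the instance store to S3 every 15 minutes
*/15 * * * * root aws s3 sync /media/ephemeral0 s3://YOUR-BUCKET/backup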
But if you need to back up data from an instance store, you should probably use more durable, block-level storage like EBS in the first place. An instance store is better suited to data you can afford to lose.
Table 8.4 shows how S3, EBS, and instance stores differ. Use this table to decide what option is best for your use case. A rule of thumb: if your application supports S3, use S3; otherwise, choose EBS.
Table 8.4. S3 vs. block-level storage solutions in AWS
Next, you’ll look at a real-world example using instance store and EBS volumes.
There is an important problem that you can’t solve with AWS block-level storage solutions: sharing block-level storage between multiple EC2 instances at the same time. You can solve this problem with the help of the Network File System (NFS) protocol.
Amazon Elastic File System is coming
AWS is working on a service called Amazon Elastic File System (EFS). EFS is a distributed file system service based on the Network File System version 4 (NFSv4) protocol. As soon as EFS is available, you should choose it if you need to share block-level storage between multiple servers. Check whether EFS has been released by visiting http://aws.amazon.com/efs.
Figure 8.4 shows how one EC2 instance acts as an NFS server and exports a share via NFS. Other EC2 instances (NFS clients) then mount the NFS share from the NFS server via a network connection. To reduce latency, the NFS server stores the shared files on an instance store. But as you already learned, an instance store isn’t very durable, so you must compensate for that: an EBS volume is attached to the NFS server, and the data is synchronized to it at a regular interval. In the worst case, you lose all data modified since the last sync. In some scenarios (such as sharing PHP files between web servers), this data loss is acceptable because the files can be uploaded again.
NFS setup is a single point of failure
The NFS setup is most likely not what you want to run in mission-critical production environments. The NFS server is a single point of failure: if the EC2 instance fails, no NFS clients can access the shared files. Think twice about whether you want a shared file system. In most cases, S3 is a good alternative that can be used with a few changes to the application. If you really need a shared file system, consider Amazon EFS (when it’s released) or set up GlusterFS.
You’ll now create a CloudFormation template and Bash scripts to turn this system diagram into reality. Step by step you’ll do the following:
1. Add security groups to create a secure NFS setup.
2. Add the NFS server EC2 instance and the EBS volume.
3. Create the installation and configuration script for the NFS server.
4. Add the NFS client EC2 instances.
Let’s get started.
Who talks to whom? That’s the question determining how security groups must be designed. To make things easier (you won’t use a bastion host here), SSH access should be allowed from the public internet (0.0.0.0/0) on all EC2 instances. The NFS server also must be reachable on the needed ports for NFS (TCP and UDP: 111, 2049), but only clients should have access to the NFS ports.
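The two groups could be sketched like this (the VpcId reference is an assumption about the surrounding template; the SSH group, SecurityGroupCommon, is omitted):

"SecurityGroupClient": {
  "Type": "AWS::EC2::SecurityGroup",
  "Properties": {
    "GroupDescription": "NFS client (no rules, only marks traffic)",
    "VpcId": {"Ref": "VPC"}
  }
},
"SecurityGroupServer": {
  "Type": "AWS::EC2::SecurityGroup",
  "Properties": {
    "GroupDescription": "NFS server",
    "VpcId": {"Ref": "VPC"},
    "SecurityGroupIngress": [
      {"IpProtocol": "tcp", "FromPort": 111, "ToPort": 111,
       "SourceSecurityGroupId": {"Ref": "SecurityGroupClient"}},
      {"IpProtocol": "udp", "FromPort": 111, "ToPort": 111,
       "SourceSecurityGroupId": {"Ref": "SecurityGroupClient"}},
      {"IpProtocol": "tcp", "FromPort": 2049, "ToPort": 2049,
       "SourceSecurityGroupId": {"Ref": "SecurityGroupClient"}},
      {"IpProtocol": "udp", "FromPort": 2049, "ToPort": 2049,
       "SourceSecurityGroupId": {"Ref": "SecurityGroupClient"}}
    ]
  }
}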
The interesting part is that SecurityGroupClient contains no rules. It’s only needed to mark traffic from NFS clients. SecurityGroupServer uses SecurityGroupClient as a source to allow traffic only from NFS clients.
The instance type of the NFS server must provide an instance store. You’ll use m3.medium in this example because it’s the cheapest instance type with an instance store, but it offers only 4 GB. If you need a larger size, you must choose another instance type. The server has two security groups attached: SecurityGroupCommon to allow SSH and SecurityGroupServer to allow the NFS-related ports. The server must also install and configure NFS on startup, so you’ll use a Bash script; you’ll create this script in the next section. Using a Bash script keeps things readable; inlining everything in UserData becomes unwieldy over time. To prevent data loss, you’ll create an EBS volume as a backup for the instance store.
Listing 8.3. NFS server and volume
"Server": {
"Type": "AWS::EC2::Instance",
"Properties": {
"IamInstanceProfile": {"Ref": "InstanceProfile"},
"ImageId": "ami-1ecae776",
"InstanceType": "m3.medium", #1
"KeyName": {"Ref": "KeyName"},
"SecurityGroupIds": [{"Ref": "SecurityGroupCommon"},
{"Ref": "SecurityGroupServer"}], #2
"SubnetId": {"Ref": "Subnet"},
"BlockDeviceMappings": [{
"Ebs": { #3
"VolumeSize": "8",
"VolumeType": "gp2"
},
"DeviceName": "/dev/xvda"
}, {
"VirtualName": "ephemeral0", #4
"DeviceName": "/dev/xvdb"
}],
"UserData": {"Fn::Base64": {"Fn::Join": ["", [
"#!/bin/bash -ex\n",
"curl -s https://[...]/nfs-server-install.sh | bash -ex\n" #5
]]}}
}
},
"Volume": {
"Type": "AWS::EC2::Volume", #6
"Properties": {
"AvailabilityZone": {"Fn::GetAtt": ["Server", "AvailabilityZone"]},
"Size": "5",
"VolumeType": "gp2"
}
},
"VolumeAttachment": {
"Type": "AWS::EC2::VolumeAttachment", #7
"Properties": {
"Device": "/dev/xvdf",
"InstanceId": {"Ref": "Server"},
"VolumeId": {"Ref": "Volume"}
}
}
#1 - m3.medium provides a 4 GB SSD instance store.
#2 - Uses the server security group to filter traffic
#3 - Maps the root EBS volume to /dev/xvda
#4 - Maps the instance store to /dev/xvdb
#5 - Downloads the install script and executes it (only from trusted sources!)
#6 - Creates the 5 GB backup volume (enough space to back up the 4 GB instance store)
#7 - Attaches the volume to the server (to /dev/xvdf)
To get NFS running, you need to install the relevant software packages with yum and then configure and start them. To back up the instance store volumes at a regular interval, you also need to mount the EBS volume and run a cron job from time to time that copies the data to the EBS volume. Finally, you’ll create an EBS snapshot of the EBS volume.
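A hedged sketch of what nfs-server-install.sh (referenced in listing 8.3) could look like; the package names, export options, and backup paths are assumptions, and the volume ID for the snapshot call is a placeholder:

#!/bin/bash -ex
# install and start the NFS server (Amazon Linux package names assumed)
yum -y install nfs-utils nfs-utils-lib
service rpcbind start
service nfs start
# export the instance store mount point to the NFS clients
echo "/media/ephemeral0 *(rw,async)" >> /etc/exports
exportfs -a
# prepare and mount the EBS backup volume (mkfs on first boot only)
mkfs -t ext4 /dev/xvdf
mkdir -p /mnt/backup
mount /dev/xvdf /mnt/backup
# sync the instance store to the EBS volume every 15 minutes,
# then snapshot the backup volume
cat > /etc/cron.d/nfs-backup <<'EOF'
*/15 * * * * root rsync -a --delete /media/ephemeral0/ /mnt/backup/ && aws ec2 create-snapshot --volume-id vol-xxxxxxxx
EOF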
Because the script makes calls to the AWS API via the CLI, the EC2 instance needs permission to make those calls. This is done with an IAM role attached via an instance profile.
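A sketch of the role and the instance profile referenced by the server in listing 8.3 (the exact set of allowed actions is an assumption; ec2:CreateSnapshot is what the backup job needs):

"Role": {
  "Type": "AWS::IAM::Role",
  "Properties": {
    "AssumeRolePolicyDocument": {
      "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": ["ec2.amazonaws.com"]},
        "Action": ["sts:AssumeRole"]
      }]
    },
    "Policies": [{
      "PolicyName": "ec2snapshot",
      "PolicyDocument": {
        "Statement": [{
          "Effect": "Allow",
          "Action": ["ec2:CreateSnapshot"],
          "Resource": "*"
        }]
      }
    }]
  }
},
"InstanceProfile": {
  "Type": "AWS::IAM::InstanceProfile",
  "Properties": {
    "Roles": [{"Ref": "Role"}]
  }
}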
rsync with lots of small files
If your use case requires many small files (more than 1 million), rsync will take a long time and consume many CPU cycles. You may want to consider DRBD to asynchronously sync the instance store to the EBS volume. The setup is slightly more complicated (at least, if you use Amazon Linux), but you get much better performance.
Only one thing is missing: clients. You’ll add them next.
An NFS share can be mounted by multiple clients. For demonstration purposes, two clients will be enough: Client1 and Client2. Client2 is a copy of Client1.
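On each client, the setup boils down to installing the NFS utilities and mounting the share. A sketch (the server’s DNS name is a placeholder, and the export path follows the server setup above):

$ sudo yum -y install nfs-utils
$ sudo mkdir /mnt/nfs
$ sudo mount -t nfs $NFS_SERVER:/media/ephemeral0 /mnt/nfs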
It’s time to try sharing files via NFS.
To help you explore NFS, we’ve prepared a CloudFormation template located at https://s3.amazonaws.com/awsinaction/chapter8/nfs.json.
Warning
Starting a virtual server with instance type m3.medium will incur charges. See http://aws.amazon.com/ec2/pricing/ if you want to find out the current hourly price.
Create a stack based on that template, copy the Client1PublicName output, and connect via SSH.
Place a file in /mnt/nfs/:
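$ sudo touch /mnt/nfs/testfile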
Now, connect to the second client via SSH by copying the Client2PublicName output from the stack. List all files in /mnt/nfs/:
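$ ls /mnt/nfs/   # testfile appears on the second client, too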
Voilà! You can share files between multiple EC2 instances.
Cleaning up
Delete your stack after you finish this section to clean up all used resources. Otherwise you’ll be charged for the resources you use.
- Block-level storage can only be used in combination with an EC2 instance because the OS is needed to provide access to the block-level storage (including partitions, file systems, and read/write system calls).
- EBS volumes are connected to your EC2 instance via network. Depending on your instance type, this network connection offers more or less bandwidth.
- EBS snapshots are a powerful way to back up your EBS volumes to S3 because they use a block-level, incremental approach.
- An instance store is part of an EC2 instance, and it’s fast and cheap. But all your data will be lost if the EC2 instance is stopped or terminated.
- You can use NFS to share files between EC2 instances.