Persistent storage is critical when running applications across containers on AWS. In this article, we cover how to build persistent storage for Docker containers on AWS. Learn best practices to spin-up, spin-down and move containerized applications across AWS environments, whether running Docker or Amazon EC2 Container Services (ECS).
You can jump to different sections of the article by clicking the hyperlinks below:
- Resources (Recording video, slides, whitepapers)
- What is Docker?
- Virtual Machines vs. Containers
- Why Does Docker Persistent Storage Matter?
- Application Delivery with Persistent Storage
- Amazon EC2 Container Service
- Docker and SoftNAS Cloud NAS
- SoftNAS Cloud NAS Overview
- Docker Persistent Storage Q&A
SlideShare: How to Build Docker Persistent Storage on AWS
SoftNAS Cloud NAS on the AWS Marketplace: Visit SoftNAS on the AWS Marketplace
What is Docker?
What is Docker and what are containers?
Containers running on a single machine share the same operating system kernel. They start instantly and they make more efficient use of RAM. Images are constructed from layered file systems so they share common files. This makes disk usage and image downloads much more efficient. Docker containers are based on open standards. Which allows containers to run on major Linux distributions and Microsoft operating systems. Containers isolate applications from each other and the underlying infrastructure. This provides an added layer of protection for the application.
Virtual Machines vs. Containers
People often ask, “How are virtual machines and containers different?” Containers have similar resource isolation and allocation benefits as virtual machines. But have a different architectural approach that allows them to be much more portable and efficient. Virtual machines may include the application, necessary binaries and libraries. But also have the overhead of an entire guest operating system, and that can take tens of gigabytes. It’s a challenge that virtual desktop people have taken on.
Containers have a very different approach where there’s more containers running into a single instance or virtual machine. They’re better for isolating processes and user space not tied into any specific infrastructure. They’re much more portable and can run virtually everywhere, but it has a Docker infrastructure. The benefits of Docker containers over VMs is less overhead, faster instantiation, better isolation, and easier scalability. Another benefit of containers over virtualization is containers are a great fit for automation. Containers are the best for automation.
So why does DevOps care? Again, it’s all about automation, setup, launch and run. Don’t worry about what hardware you’re on. Don’t worry about finding the drivers for your servers. Now you can focus on your life cycle repeatability and not worry about keeping your infrastructure going.
Why Does Docker Persistent Storage Matter?
Why does persistent storage matter for Docker? We need to think about what our storage options are. The Docker containers themselves have checked storage. You can use the storage that’s in that container if you want. The huge problem with that is simple: it goes away when the container is gone. The container is useful as a scratch pad, but not great if you have data you want to keep. So that’s the storage problem.
Docker containers can mount directories from the host instance on AWS. It would be in this instance that storage can be shared by all containers that run within that host. So what are the issues? You can deploy a cluster of instances to house containers. Your containers move around those different instances, hosts, and your storage is persistent. Also, there’s no guarantee of how you can share that storage.
Network storage is a much better option because now you can share storage like you used to. You can access it from anywhere, and then there’s cloud storage such as EBS and S3. If you wanted the block, block doesn’t share very well. If you want S3, you have to code your containers to work directly with object. SoftNAS Cloud NAS does give you that middle ground option with network storage and native cloud storage. Let’s put CIFS shares, NFS shares, etc., onto your cloud storage and have that complete solution.
Application Delivery with Persistent Storage
Let’s talk about application delivery with persistent storage. If you look at container services, there’s really three components in a container service. There’s your front-end service, and think of that as what you see. It’s the stuff that provides information often like on a webpage. Your back-end service provides APIs in front of the service and the execution part of the application within Docker. Then there’s data storage services. If you use SoftNAS Cloud NAS as your data storage service, you now get high availability and persistence.
So to really use EC2 and SoftNAS together, what does that mean? You’re going to use Amazon’s clustering to kick off a cluster of containers and instances, and auto-scaling.
By doing this, we can have a SoftNAS Cloud NAS instance in one availability zone using our virtual IP address and mount that storage a couple of ways. You can mount those directly to the containers so that they’re using NFS directly, or you could even mount SoftNAS Cloud NAS into the container instance. This lets each of the containers use this as local storage and that lessens the amount of capacity you need on your container instances and also still provides that level of sharability that you model on.
The other thing we stress with ECS and Docker containers is you really want to stretch those across a couple of availability zones. That way if an AZ completely goes out, your auto-scaling can help by bringing up new container instances. This distributes the load on new containers, and by continuing access to SoftNAS virtual IP, you’ll then be able to keep up with your storage and stay online.
Amazon EC2 Container Service
Now let’s go into Amazon EC2 Container Services. It’s a highly scalable, fast, container management service that makes it easy to run, stop, and manage Docker containers on a cluster of Amazon EC2 instances. Amazon EC2 lets you launch and stop container-enabled applications with simple API calls and allows you to get the state of your cluster from a centralized service, gives you access to many familiar Amazon features.
When you use Amazon ECS’ schedule, placing the containers across your cluster based on resource needs, isolation policies and availability requirements, having the ability to schedule, that’s important. ECS also eliminates the need for you to operate your own cluster management and configuration management system and not have to worry about scaling your management infrastructure.
One of the benefits of ECS is being able to easily to manage clusters for any scale. There is flexible container placement. You want them to flow across availability zones for most of that time.
Docker and SoftNAS Cloud NAS
Let’s talk about why SoftNAS matters to Docker. Again, the storage for Docker and container services, they’re in multiple places. There’s temporary storage in the containers. There’s global storage on the server, the instance that the containers are running on, but if you really want to have your storage around portable, accessible and highly available, then you need the capabilities that SoftNAS Cloud NAS provides. We try to make it simple.
Part of that’s making sure we have APIs that plug into your automation system that goes along with ECS. The setup we showed you had high availability setup as part of that CFT deployment. So to have that kind of ability within quick deployment is amazing. Now try to be agile, and really when you think about DevOps, they want to be fast and they want to be quick.
We have a feature called SnapClone that lets you take snapshots or remember ahead of schedule for hourly snapshots during business hours turned on as a result of that container cluster deployment. Each of those snapshots or a snapshot on-demand could be mounted as what we call a SnapClone so it’s a space efficient writeable snapshot and a 99% rate, and then use that for like a DevOps test case where you want to test against real production data without, heaven forbid, damaging your real production data. The SnapClones are very useful that way. Continuous deployment, and that’s again making it easy, using the SnapClones.
What are the read/write latency, max throughputs, IO costs for SoftNAS? We don’t use the ephemeral storage except for read cache as part of an EC2 instance, so the storage is the EBS general purpose, EBS provision IOPS. What percentage of that is due to SoftNAS overhead? It’s very light as overall IO.
The reason you might want to do that is because of our fault system. We have background scanning, so if you want an additional layer of protection, allow us to recognize that some bit-rot occurred or something went wrong in the EBS and then with our fault system underlying we can fix that. The reason I bring that into what’s our overhead, so, you know, in this case I configured it for marriage so that means every write is two writes, every read still maps to one read and we go to, you know, the most available storage. If we’re doing like a RAID 5 of course, then it’s a, we do a parity writing but there’s no read-back.
One additional thought on that is in some scenarios, SoftNAS will actually improve the IO profile simply because of the way that we use the FS in the backend with read caching. In the event that you’re re-reading something that’s been pulled into cache, you’ll typically see an even better IO profile than what the storage typically provides. There’s definitely some considerations associated with performance and the way that we’re structured and the way the product is designed.
SoftNAS Cloud NAS Overview
Let’s talk about SoftNAS Cloud NAS and what it is. SoftNAS is a software-based Cloud NAS filer. We deliver on Amazon through the AWS Marketplace as an EC2 instance. One of the huge benefits you get from SoftNAS is being able to use cloud native storage, to deliver files such as NFS for Linux and CIFS for Microsoft, and blocks through an iSCSI interface.
Being able to layer that on the different types of EBS volumes, whether it’s provisioned IOPS or general purpose or any of the other flavors or on object storage such as Amazon S3, SoftNAS can take S3 and EBS, and we look at those as this device is aggregated into storage pools and then carve those storage pools into volumes and export those interfaces that I mentioned as on with AFP. So being able to take that cloud made of storage and provide it to software that used to work indirectly with shared files is a huge benefit.
Another huge benefit is our ability to replicate data, and through data replication also provide high availability where we won’t have parity instances. and where with the data replicated mutually more often in separate availability zones, and sort of the secondary monitor, the primary’s health through network and storage heartbeats and then do a takeover and continue to provide uninterrupted service to the servers, the users and the files. That’s just as important in Docker you’re going to spend that two, four, five hundred of containers that all want to have shared storage, you want to make sure that your infrastructure stays up during that entire time so that you don’t have any outages. Since we go across availability zones, it usually leaves a whole Amazon availability zone within a region and then through auto provisioning with your containers and SoftNAS turnover have completely uninterrupted service.
Docker Persistent Storage Q&A
- SnapReplicate and public elastic IPs. If it’s private, how do you have the service storage using private IPs which is specific to a subnet AZ or availability zone for AWS?
- We have two modes of HA and with a virtual IP. For the longest time we’ve been supporting the HA through Amazon’s elastic IP and that in fact does use the public IP, of course. The other mode is our private virtual IP and in that case everything would be completely private. We’re managing the route tables between availability zones and moving that virtual IP from primary and secondary instance that way so that’s how we deliver that.
- Can EBS volumes be encrypted with AWS key management service?
- We have encryption built into our product for data storage and we’re using a common third-party encryption software called LUKS so we can encrypt the data on the disk and we also have a pretty nice application guide on how to do data flight encryption for both NFS and CIFS.
- Is there a way to backup the SoftNAS managed storage? What types of recovery can we leverage?
- So we’ve built into our product the ability to back up our storage tools through EBS snapshots. If you’re familiar with EBS snapshots, it’ll take a full copy of your volume and copy that into EBS and manage the changes through there, so we had that built into our storage pool panel in the UI so you can take full backups that way and then be able to restore it to the full storage tool as well. But, you know what, that’s just one avenue for backup. Obviously what we’re doing with storage in the public cloud and Amazon in this case is mirror any other enterprise-class storage product, enterprise-class NAS, and we highly recommend that you have a complete backup and recovery strategy. There’s a lot of really good products on the market today, you know, that we’ve integrated and tested within our lab and a lot of others that I’m sure work just fine because we’re completely about open standards. It’s very important that with our storage or anybody’s storage that, you know, want to have a very comprehensive backup plan and use of those third party products.
- What RAID types are being used under SoftNAS?
- So in the safety built for containers, that’s RAID 1, but that was just a choice. Often when you look at Amazon and the short answer is we support RAID 0, RAID 1, RAID 5, and RAID 6. I probably want to use RAID 1 at really durable storage, and then if you were to deploy us in a data center where we’re on raw drives, that’s where you want to really look hard at the RAID 5, RAID 6 and use them there because as the drives got bigger and bigger the rebuilds take a while and you kind of want to get to the point where if I have a failed drive and replace it, well, we’ll say, “Oh, it’s in another one,” we’ll develop some type of bit error, we’ll undo it, you know, recovering from the other, so those kind of factors go in. We support the whole gambit of RAID levels.
- What version of NFS do we support?
- We support version four of NFS.
- What is the underlying file system used by SoftNAS?
- At SoftNAS we are very much an open standards, open source company. We’ve built the Z file system commonly referred to as ZFS into our product.
- What is the maximum storage capacity of the SoftNAS instance?
- We don’t really have a limit that we enforce ourselves. Amazon has certain rules for the amount of drive mounts that they provide. But if you’re using S3, our capacity range is virtually limitless. We’ll quote up to 6PB. On the AWS marketplace, we do have editions based on the capacity that they’ll manage. Our express edition manages up to 1TB, our standard up to 20, and then we have a BYOL edition.
- Is it possible to get a trial version of SoftNAS?
- Yes it is. Through the AWS marketplace, we have a 30 day trial as long as you’ve never tried our product before. It just works out of the box there, just off the console. If you would like to try it through BYOL, then contact our sales team at sales@softnas.com.
- Is it a good idea to use SoftNAS as a backup target?
- Yes. It’s a common use case for us since we enable native cloud storage. Even on premise storage you could have a good backup plan of maybe keeping your nightly backups on the very fast storage such as EBS or, you know, SSD or spending the rest of this in your data center, but then also have a storage pool made out of object storage and S3 up in the cloud and use that for your weekly archival. Very common for people that are using product like VMware to use SoftNAS for backup, as a backup target.
- Is it a good idea to replace a physical NAS device with SoftNAS?
- The considerations would be with replacing that type of a solution, just adding where you want to store your backups. If you want to leverage the cloud or if you want to retain those locally. If you want to retain those locally, we have an offering that allows you to connect to local disks and you have a lot of flexibility on the types of disks that you can attach to, local attach disks, iSCSI targets, as well as tying into S3. You can have a local instance that’s tied into S3 from all object storage.
- Additionally a secondary option would be to have a SoftNAS node that’s deployed within the cloud and use that as a backup target, and essentially you’re getting a 2-for-1 with that type of a strategy. By backing up to the target, you get a backup storage resource that you don’t have to store on premise but secondarily it offers a disaster recovery strategy by being able to take your backups and store those offsite. So those are two approaches that might make sense for that particular scenario.
- Is it advisable to utilize an AWS-based SoftNAS instance for on premise apps?
- I advise not to deploy SoftNAS into an Amazon VPC and access it remotely through NFS or CIFS. And those protocols are very tatty and degrade over long distances. What is common is to deploy SoftNAS into your VMware cluster and your virtual loader data center. Then mount Amazon S3 into a storage pool. Backup your applications and storage pools use them in your data center. It’s great for backups.
- You’ll have to be somewhat sensitive to latency. There’s applications that would not be great for, because there’s IO latency. It’s going to happen from your data center to the Amazon region where the S3 is. For example, it wouldn’t be a good idea to use like a database with transactional IO. Using a backup with S3, you put SoftNAS onto your data center and back up to the storage pool.
- A typical customer use case is when they segment out their hot data which is highly active. It’s typically a smaller subset of their overall data set. One use case is to tie into object storage that you host in the cloud for cool data. Use on premise storage that is not affected by latency to service hot data requirements. Then leverage S3 object storage as the backend larger repository for your cool data.
- If we use S3 as a storage pool for on premise, does it provide write back caching?
- With on premises, we’re able to leverage high performance local disk as a block cache file to front-end S3 storage. It functions like a page file for read and write operations. It also allows higher performance, essentially caching for S3 access, and enhances overall performance.
- Using the log cache for both reads and writes will do read-aheads and make it easier to handle read/writes.
We hope that you found the content useful and that you gained something out of it. Hopefully, you don’t feel we marketed SoftNAS Cloud NAS too much. Our goal here was just to pass on some information about how to build docker persistent storage on AWS. As you’re making the journey to the cloud, hopefully this saved you time from tripping over some common issues.
We’d like to invite you to try SoftNAS Cloud NAS on AWS. We do have a SofNAS 30 day trial.