Scaling to New Heights: Enterprise Ignition with Ease

42-minute video  /  38-minute read
 

Speakers

James Burnand

CEO

4IR Solutions

Randy Rausch

VP Engineering

4IR Solutions

In this session, 4IR Solutions will showcase best practices and technologies to rapidly deploy and remotely manage large-scale Ignition systems in the cloud and on-prem across hundreds of sites. We'll demonstrate zero-touch provisioning and real-time updates to a fleet of Ignition installations.

Transcript:

00:00
Kyle Van Eenennaam: He also has been a user, developer, and integrator of advanced manufacturing software. Randy brings with him a deep understanding of intelligent edge solutions and a proven track record of global on-premise and hybrid deployments. Over on my left, I'm fortunate to introduce James Burnand, the CEO of 4IR Solutions, who's a seasoned veteran of the industrial automation industry, boasting more than two decades of experience as well. Leveraging his experience now to empower manufacturers by providing the infrastructure for their plant applications to thrive in the cloud, and with a keen understanding of the intersection of cybersecurity, operational requirements, and management, which we might boil down to Industry 4.0, James crafts tailored solutions for companies embarking on their cloud-enabled and highly automated OT journeys. So without further ado, please welcome both of our presenters to the stage.


00:52
James Burnand: I think I'm mic'd up, right? So thank you for that introduction. I'm not sure who wrote those profiles, but we need to send them a nice gift basket. So the agenda for today is we're gonna talk a little bit about enterprise architecture. We've got some pretty exciting demonstrations near the end of the presentation. But in the beginning, we wanna talk and describe a little bit about what we mean by enterprise and what distinguishes that from a non-enterprise, or more of a standard sort of a solution. We're gonna go through a bit of the design principles and the considerations that you need to make when deploying in an enterprise-level environment, and really talk through how that can stretch and be extended down to the edge. So have you ever needed to deliver across multiple sites, or started with a small application that all of a sudden became a big application, and once people saw what it could do, there was this huge demand to deliver it consistently across a bunch of different locations or a bunch of different users? Or needed to take something from Ignition and tie it in with your enterprise systems?

01:56
James Burnand: So most commonly it's ERP or other enterprise-level systems, and all of a sudden it becomes more complicated, because now we have to meet not just the regulatory and operational requirements of operating inside of a facility, but also of operating inside the IT environment, and potentially within the cloud, depending on what the solution is. If so, we think you might need an enterprise architecture. And yes, I did that animation myself. So what is an enterprise architecture? It really has a handful of characteristics, but it's not something you can just define as specifically this or specifically that. You really have to look at it as a set of characteristics that make something enterprise. So usually it's got a greater need for uptime and security, and it's a more complicated sort of a deployment that often will span a wider range of people, locations, or application integrations.

02:51
James Burnand: Usually these systems involve a variety of different departments and a variety of different individuals, and auditability requirements in order to be deployed. So they tend to take a lot longer to identify and a lot longer to deploy, but they tend to have a useful life that can deliver tremendous value for an end organization. So what do we see as some pretty common enterprise applications today? Unified namespace has become a huge endeavor for a lot of organizations, really around digital transformation and being able to apply these new digital technologies to the way that they operate inside of their facilities. And oftentimes that will represent itself as big data systems, AI systems, or systems that are deployed to be consistent across a variety of different geographies, locations, facilities, lines, whatever the characteristic of "lots" you're looking at. The enterprise applications are often delivered in that way.

03:51
James Burnand: So it's important when we talk about enterprise systems to distinguish what we think are some of the best practices for deploying them. We've categorized those into five different groups: security, reliability, scalability, operational excellence, and cost optimization. And I'm gonna describe, really from a cloud-centric point of view, what those mean. And then Randy, when he takes over in a couple of minutes here, is gonna talk about how that extends down towards the edge. So for security, some of the key design principles are things like identity, so authentication and authorization: the use of identity providers and individual logins for your users, using things like multi-factor authentication, and role-based access control for easier management of lots and lots of different users that need to have specific capabilities. So different than setting up the admin user for all the different gateways is setting up admin groups, and then being able to grant and revoke the access as necessary for those users to perform the tasks that they need to on those different deployed assets.
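The role-based model described above can be sketched in a few lines: roles map to permission sets, and access is granted and revoked by editing group membership rather than individual gateway accounts. This is a generic illustration; the role and permission names are hypothetical, not an actual Ignition security configuration.

```python
# Sketch of role-based access control (RBAC) for gateway administration.
# Role and permission names are hypothetical illustrations.

ROLE_PERMISSIONS = {
    "gateway-admin": {"view", "configure", "restart"},
    "operator": {"view"},
}

def user_permissions(roles):
    """Union of the permissions granted by all of a user's roles."""
    perms = set()
    for role in roles:
        perms |= ROLE_PERMISSIONS.get(role, set())
    return perms

def can(roles, action):
    """True if any of the user's roles grants the requested action."""
    return action in user_permissions(roles)
```

Revoking someone's `gateway-admin` role in one place removes their `configure` and `restart` rights everywhere, which is the manageability win over per-gateway admin users.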

04:57
James Burnand: Certificates. So specifically when you're talking about the cloud, certificates become a must. In the way we do our deployments, we actually use public certificates, which implies that those certificates are known and trusted by all computers. That requires a little bit of work and a little bit of automation to make it work well. But that also means that we have the ability to rotate those certificates on a regular basis. So what we see in a lot of end users is they'll set a certificate and however long they're allowed to set it for is how long it gets set for. What we do is we actually rotate them every 30 days, and that's a security best practice. From a security principle perspective, that's important when you're exposing something to the public internet, which oftentimes cloud applications are. It's important to encrypt it in transit and at rest, which obviously SSL is a big piece of that.
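The 30-day rotation policy mentioned above boils down to a simple age check that automation can run on a schedule. A minimal sketch, assuming the only input is the certificate's issue date; a real pipeline would also reissue and redeploy the certificate once this check fires.

```python
from datetime import datetime, timedelta

ROTATION_PERIOD = timedelta(days=30)  # rotate every 30 days, per the talk

def needs_rotation(issued_at, now=None):
    """True once a certificate has been in service for the full period.

    This only captures the age check that triggers rotation; issuing
    and deploying the replacement is the job of the surrounding tooling.
    """
    now = now or datetime.utcnow()
    return now - issued_at >= ROTATION_PERIOD
```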

05:45
James Burnand: But also when you're doing things like setting up your MQTT network, or setting up your communications with other systems: everywhere you're able to encrypt, you need to encrypt as part of these applications, to minimize the surface area for potential attack. And then finally, audits. Everyone loves that word, but it becomes absolutely critical in these enterprise systems, especially when they span a wide group of users as well as administration folks, that there's some level of auditing going on on those systems, to ensure that the security posture that you created when you first deployed is maintained and improved over time. And that "Hey, I just went in and added this admin user to get me access for this piece"? Those are the kinds of things that will get caught in a periodic audit and removed, should they somehow escape the task list of the person that added them.

06:45
James Burnand: So next we wanna talk a little bit about reliability. Reliability and availability are somewhat similar. Availability really comes down to the choice that you make for the application and the user experience that you're trying to create. So for reliability and availability, you really need to think about: what is this application, and how much downtime can it tolerate? The nice part about the cloud is that you can make it extremely available. You can also empty your pocketbooks by making things extremely available. So you have to figure out what the right balance and tolerance is for that specific application, and then ensure that the way you've configured the assets and the services that you use is appropriate for that use. So for everyone's benefit, one of those pieces is if you're deploying in a data center with multiple availability zones. What an availability zone is, is a building: it's got its own power supply, its own internet connection,

07:46
James Burnand: And its own set of services is a part of it. So it's essentially as good as a single data center, in most of the regions or most of the bigger regions for the main cloud providers, their regions actually have three availability zones in them. So when you deploy certain applications at certain availability, they're not just mirrored on multiple hard drive arrays and multiple hosts like you would get in a VMware system, they're actually mirrored across multiple buildings. You would literally need to take an entire building out to see a minute of downtime. So understanding what that means and how that affects the pricing, but also what that availability target is, is an incredibly important exercise as a part of architecting and building out an enterprise architecture. And then finally is network. Network is a little more complex than just your internet connection, but understanding what level of reliability you have in communication from a site into a cloud-based environment so that you can make intelligent choices about what it is you need from an availability perspective.

08:48
James Burnand: So one of the great technologies that Inductive has as a part of their platform is Store-and-Forward. Store-and-Forward is a key component to handling poor internet connectivity. Some applications can't tolerate that. Some applications need more persistent connectivity, and that's where you get into more complicated SD-WAN and redundant connections to the cloud. And certain organizations at an enterprise level have already taken that journey, and others haven't. Really understanding what that is, as a function of what the needs of the end users are gonna be, is an incredibly important part of the architecting exercise for an enterprise application. And then the final thing is actually something we have learned, I would say, over the last couple of years: test environments for cloud-based resources.
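The store-and-forward pattern mentioned above can be illustrated with a small buffer: records accumulate locally while the link is down and are flushed in order once it recovers. This is a conceptual sketch of the idea only, not Ignition's implementation, which persists data durably rather than in memory.

```python
from collections import deque

class StoreAndForward:
    """Conceptual sketch of store-and-forward: buffer records while the
    link is down, then flush them in order once it recovers."""

    def __init__(self, send):
        self.send = send       # callable that delivers one record upstream
        self.buffer = deque()  # in-memory stand-in for the local store

    def record(self, value, link_up):
        self.buffer.append(value)
        if link_up:
            self.flush()

    def flush(self):
        # Drain oldest-first so upstream sees records in original order.
        while self.buffer:
            self.send(self.buffer.popleft())
```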

09:41
James Burnand: So this is different than DevOps around Ignition, where you have a development environment and a test environment and a production environment. One of the characteristics of a lot of enterprises is that they have a lot of security policies and changes that they make at a tenant or a subscription level inside of their Azure and AWS environments. Those changes can have very major impacts on the workloads that are running inside of those spaces. So it's a really smart idea, when you're building out an enterprise application, to also consider building a duplicate of it, so that those policy changes and those upgrades to the services and things that are happening at a cloud-provider level can be managed and tested before they're applied to a live working environment.

10:27
James Burnand: All right, scalability. This is my favorite one, because scalability is a little less, oh, sorry, I didn't actually advance the slide. Lemme turn that thing back on. So scalability is really more about making, or avoiding, decisions that are gonna limit you. What we find is that scalability actually starts with the application. So oftentimes when you start building an application in Ignition, you get a gateway and you start connecting tags and you start building Perspective displays and you start connecting to databases. Then all of a sudden you've built something and it's great, and then all of a sudden you need to add 100 more users and five more sites and all of these pieces to it. And all of a sudden it's like, "Well, that's not gonna fit on a single gateway. I really need to think about now, how am I gonna do this? Do I need to have dedicated gateways for each part of this application, and how should I do this?"

11:18
James Burnand: So we actually encourage people when they're starting off, especially building small applications, to think about things like the scale out architecture as a potential eventuality depending on what the use case for this application could be. So that's things like separating your front end and backend into separate projects in the same gateway so that if you ever do decide to pull them apart, it becomes a much easier exercise in the future. We also recommend the use of EAM and inheritable objects and components. So being able to take things like your navigation bar and certain project components and tags and be able to have those as common pieces that you can then pull down into your environments allows for you to manage those pieces centrally, but deploy them wide across all the different use points for them.

12:05
James Burnand: And finally, around scalability, we want to talk about the cloud. So there are services in the cloud that make scalability very easy: things like databases, load balancers, storage, compute, and orchestration. These things allow for you to start small and add capacity as you need it. Now, being able to do that obviously requires that your application architecture makes sense, but also that you make intelligent choices about what your potential future state could be. One of the key technologies that we at 4IR have adopted, and we think is crucial to enterprise architectures in general, is containerization. So for those of you that haven't done Docker containers and played with the stuff that Kevin Collins has published, please do. It makes your life so much easier when you're testing and building, and if you have, this conversation will hopefully make a lot of sense to you.

13:00
James Burnand: But being able to create a very easy, unitized way of deploying Ignition and the different components that are required, and configuring it, is a really scalable way of doing the operations. And what you'll see on the screen there is I have a cow and a cat, and this is intended to represent what they call cattle versus pets. So if you think about a virtual machine that's running Windows, as an example, we consider that to be a pet. That means that you have to manage it with a level of attention and care, because it requires it: you have to perform individual updates and you have to manage it more directly. Whereas containers are really more like cattle, and we treat them like cattle: if something goes wrong with one, we're not taking it to the vet and paying a huge bill to get its hip replaced. We're getting another cow.

13:56
James Burnand: So there's much better analogies online that you can read about that. But that is, for all intents and purposes, how we treat the containers that run Ignition: we treat them as cattle, as a herd of capabilities. And I think when you see the demo that Randy's gonna be running today, hopefully you'll see what that looks like in action. All right, operational excellence. So really this comes down to the fact that you can deploy all this stuff, but if no one's watching it and updating it and looking at it from a compliance perspective, something will go wrong. So as part of your planning for when you're going to roll this out, you can't forget that continuous improvement and updates and patching and monitoring and alerting, and having the right support mechanisms in place, are absolutely crucial to the experience of the end users. And finally, everyone's favorite: cost. So cost optimization is obviously related to things like availability, but it's about using technologies that make sense for you, to be able to manage your cost in the best possible way. As an example, when we do backups, we do backups in the cloud, because it's really cheap to store stuff for a long time in cheap cloud storage.

15:09
James Burnand: But we also do things like we consolidate services. So I don't need an active directory system in every location if I can have a central system. And then basically the ability to operate and use that from a bunch of locations. Same thing with databases, same thing with a bunch of different technology pieces that allow for you to make it easier to provide these services that you need as a part of the application deployment at the best possible cost ratio. And finally discounts. So understanding how discounts work in the cloud providers is tremendously important. So it's a lot cheaper to buy a virtual machine or a cluster if you say, "I'm gonna use this for the next year." You get a pretty big discount for doing that. There's also negotiation, there's spot instances and there's pay as you go for certain other services that allows for you to really optimize how much your cost basis is for the different services that you need.

16:07
James Burnand: So to talk about edges and clouds, our friend William Shakespeare: to cloud or not to cloud, that is the question. It may seem like we would say the cloud is the answer for everything, 'cause it's literally in our logo and our website name. But we recognize that it isn't, and we don't encourage every single application to be deployed in the cloud. There are certain considerations that you need to have, around things like the user experience's tolerance to internet connectivity and latency, and whether or not the staff skillset is able to handle some of the different pieces and parts. So we recognize that there's a very strong use case for having things that operate on-prem. We also happen to have an idea and a concept that makes that a little easier to do at scale.

16:56
James Burnand: It's called hybrid cloud. So hybrid cloud is like taking a bit of cloud, paving it off and putting it on a server in your building. There are a variety of different providers out there that do this. I would say this is, from my perspective, I think this is gonna be a huge growth area inside of the manufacturing space in the next several years. Not a whole lot of people are doing it yet. But what it gives you the benefit of is that you can now operate and manage your on-prem resources using the cloud portal and the automation and the tools that are available to you as a part of your deployments in Azure or AWS or Google Cloud. So it kind of gives you the best of both worlds. But like I said, it's still a relatively new technology, but allows for you to have that capability of central management and local operation. All right, with that, I'm gonna hand it over to my partner Randy.

17:54
Randy Rausch: Thank you. So James just took us through the design principles for the cloud, and now we're gonna talk about what they are for on-prem. Surprisingly, they're the same things. But we're gonna go through them again, because they're a little different when you get to the edge. What do I mean by edge? If you're a person from the cloud, the edge is where the data originates. It's your facility, it's your site. So there are a few things different when you start working there. In particular, security. So what's different? The physical security and your network posture are different. There are some data centers that will have armed guards standing outside, making sure physical security is pretty good. That's not always the case for maybe a server or an IPC that you have on site. So you need to deal with things like that. What can you do?

18:36
Randy Rausch: If you are remotely managing something and you don't have physical control over it and other people do, well, you wanna make sure someone can't come in and plug in a USB stick or start logging into your terminal. So go ahead and disable interactive login, because you don't need it and it's just yet another threat vector. Another thing you can do is utilize the security of the built-in hardware security modules, or TPM (Trusted Platform Module) chips. On top of that, you can do secure boot, and you can use that to encrypt your hard drives and make sure things are encrypted at rest, and in transit of course.

19:10
Randy Rausch: And you wanna minimize your attack surface. So in this case, for your operating system, you wanna rip out everything you don't need, because that's an attack vector that, again, you don't need. In our case, we use networking, 'cause it has to talk on the network, and we run containers. So we have networking and containers in the operating system, and that's it. That significantly reduces the attack surface. From a network perspective, you've got a different security posture, in that the worst thing you can do is be a vector for some bad actor to come into your customer's network. So some things you wanna do from a best-practice perspective: you should never have any open firewall ports. All communication should be outbound from your on-premise device, typically over port 443, outbound only. You wanna make sure you're using your chain of trust, so again, your TPM chip on the hardware, all the way up through your certificate authority in the cloud. And you wanna make sure that you're whitelisting, or allowlisting, the destinations that your computer is allowed to go to.
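The outbound-only, allowlisted posture described above can be sketched as a simple policy check: outbound connections succeed only for known host/port pairs, and inbound connections are never accepted at all. Hostnames and ports here are hypothetical placeholders.

```python
# Sketch of an outbound-only, allowlisted network policy.
# Hostnames and ports are hypothetical placeholders.

ALLOWED_DESTINATIONS = {
    "broker.example.com": 443,   # MQTT over TLS
    "backups.example.com": 443,  # offsite backup endpoint
}

def outbound_allowed(host, port):
    """Permit only outbound connections to allowlisted host:port pairs.

    Inbound connections are simply never accepted, which is how the
    no-open-firewall-ports posture is modeled in this sketch.
    """
    return ALLOWED_DESTINATIONS.get(host) == port
```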

20:12
Randy Rausch: From a scalability and reliability perspective on-prem, there's not as many computers on-premises as there are in the cloud. And so from a scalability perspective, that can limit you. But on the plus side, workloads in your factory are pretty predictable. It's not as if you're gonna have 100 new manufacturing lines on Black Friday just for that day. So you don't need that elasticity, but you still wanna architect for scalability, as you will grow over time, or at least give yourself the ability to. The fun part is when you start talking about reliability. So reliability and cost have an interesting relationship. Things fail, right? It happens. The question is: how long can you tolerate a failure before you have to restore service? The answer to that is usually between 10 milliseconds and 10 weeks, right? So if you're using a server or industrial PC that has a 10-week lead time and you don't have a spare on-site, you are gonna wait 10 weeks if something goes wrong. If you do have a spare on-site and you cannot tolerate how long it takes to swap over from the broken machine to the spare, then you wanna start to look at a high-availability architecture.

21:30
Randy Rausch: In this case, it may be a hot spare running live. Maybe you need even more, so maybe you have a multi-node Kubernetes cluster to give you some of that availability. But it's an important thing to consider. And it's not just the compute; you wanna look at it holistically. If you only have one WAN connection, that's a single point of failure. If you only have one power source, that's a single point of failure. You do wanna consider that entire design, right? So when you're designing, the primary input is your availability target, and that's, again, how much downtime you can tolerate. A 99% availability translates to about seven hours a month of downtime. 99.9%, it's about 43 minutes a month of downtime. Four nines, 99.99% availability, it's about four minutes of downtime. And if your customer says, "I need 100% availability," well, if you just follow the math on that, you can deliver it for infinite cost. All right, in terms of operational excellence, some things you wanna make sure you consider: updating regularly is so critical.
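The downtime figures quoted above follow directly from the availability percentage. A quick sketch of the arithmetic, assuming a 30-day month:

```python
HOURS_PER_MONTH = 30 * 24  # 720, assuming a 30-day month

def downtime_minutes_per_month(availability_pct):
    """Monthly downtime budget implied by an availability target."""
    return HOURS_PER_MONTH * 60 * (1 - availability_pct / 100)

# 99%    -> 432 minutes, about seven hours a month
# 99.9%  -> about 43 minutes a month
# 99.99% -> about 4 minutes a month
```

Each added nine cuts the downtime budget by a factor of ten, which is why cost grows so quickly as the target approaches 100%.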

22:39
Randy Rausch: Easy to do in the cloud. You also have to do it on-prem, or on the edge. This is for your security updates, but you should also be doing new functionality updates, making sure you're regularly keeping things patched and up-to-date, adding new features, and building that muscle to regularly update. It's important to have full bare-metal update capability. If you're managing this device remotely, the operating system needs to be fully under your control. It should be a single-artifact operating system, without the package management that can cause drift. And you should have A/B updates, so you can entirely update the operating system and fail over to the other partition. If something doesn't work, you can go back to your first partition and restore service. You don't wanna brick your device, especially if you're not there to go fix it. Certainly you wanna have backups.
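The A/B update scheme described above can be sketched as a tiny state machine: flash the inactive slot, boot into it, and fall back to the known-good slot if the health check fails. This is a conceptual illustration of the pattern, not any particular updater; slot and image names are made up.

```python
class ABDevice:
    """Sketch of A/B partition updates: flash the inactive slot, boot
    into it, and fall back to the known-good slot on failure."""

    def __init__(self):
        self.slots = {"A": "v1", "B": None}  # installed image per slot
        self.active = "A"

    def inactive(self):
        return "B" if self.active == "A" else "A"

    def update(self, image, healthy):
        previous = self.active
        spare = self.inactive()
        self.slots[spare] = image   # write the new image to the spare slot
        self.active = spare         # reboot into the updated slot
        if not healthy(image):      # post-boot health check
            self.active = previous  # roll back; the device is never bricked
        return self.active
```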

23:27
Randy Rausch: When you're on-prem, you wanna back that up to the cloud or some other offsite location. And you wanna have 24/7 monitoring, as well as the ability to do self-healing and spares management and other types of things. So, enough talking. Let's take a look at some of this working. We're gonna do a demonstration, and here's what we're gonna see. We are going to forcibly kill an Ignition gateway and watch the self-healing capability bring it back to life and restore service. We are going to deploy an additional 23 gateways and commission them live. So if it takes you half a day to do that for each one, that would be about 10 days until this talk is over. We are going to upgrade a set of our Ignition gateways, and we are going to then downgrade, to simulate as if we needed to roll back.

24:19
James Burnand: And just for everyone's clarity, what we're doing here is this is actually our platform that we operate called FactoryStack. It's using a technology called Kubernetes. So Kubernetes is an orchestration engine that allows for us to operate and manage at scale a variety of different containers. And then what Randy will describe is a little more detail behind some of the tools we use to automate those deployments and management.

24:44
Randy Rausch: Okay, a little bit about the architecture. In this case, we have an enterprise architecture deployed. We have several front-end nodes; we're starting with three, and that'll scale to five. We're starting with several back-end nodes, I think two, and we're gonna scale that to three. And we're starting with several on-site, in-factory nodes, starting with 30, and we're gonna scale that to 50. All of these will be communicating through an enterprise broker, in this case, HiveMQ. And yeah, it's a live demo, what could go wrong, right? All right, let's see. A little bit of orientation here. This is a live Perspective application. Each one of these boxes represents a gateway. So if there is a gateway, it is publishing to a broker. If this application sees that topic in the broker, it will make an image for that gateway. If that gateway goes offline, or if it stops communicating with the broker, it will turn red. We're looking at the broker and what's happening there. So if new things come online, they'll pop up. If they go offline, they'll turn red. Live application, out in the real world, with certificates and things like that. So yeah, let's do this live. Okay, first things first, let's demonstrate self-healing. So, audience participation.
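The status board described above can be sketched as a last-seen tracker keyed by MQTT topic: a gateway that has published recently is drawn green, and one that has gone quiet past a timeout is drawn red. The topic layout (`fleet/<gateway>/status`) and the timeout are assumptions for illustration, not the demo's actual schema.

```python
class FleetStatus:
    """Sketch of the gateway status board: each gateway publishes to an
    MQTT topic, and any gateway not heard from within the timeout is
    drawn red. Topic layout and timeout are illustrative assumptions."""

    def __init__(self, timeout=30.0):
        self.timeout = timeout   # seconds of silence before "offline"
        self.last_seen = {}      # gateway name -> last message time

    def on_message(self, topic, now):
        gateway = topic.split("/")[1]  # "fleet/paris/status" -> "paris"
        self.last_seen[gateway] = now

    def color(self, gateway, now):
        seen = self.last_seen.get(gateway)
        if seen is None or now - seen > self.timeout:
            return "red"    # never seen, or gone quiet
        return "green"
```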

25:58
James Burnand: Which city do you hate the most?

26:00
Randy Rausch: Which city would you like to take offline? We can do more, we can do more. I heard Paris, is that? All right, all right. Houston. All right, where's Paris? Okay, so if I come here to Paris, this is our gateway. You can see, right here we've got our live certificates. This is a public-facing endpoint. It's live. We're up here. What I have on the right-hand side is a system administration tool. It's not meant to be pretty. It's called K9s. It allows us to administer Kubernetes clusters. In this case, we're looking at the pods in our cluster. I'm gonna use this tool to forcibly kill the Ignition gateway that is running here, called Paris. So here we go. Delete, are we ready? All right, so I have just completely removed the resources and the Ignition gateway. If I come back here and I try to reload, the gateway's just gone, right?

27:02
Randy Rausch: All I've done is kill it. The infrastructure that we have in place, with its self-healing capability, has recognized that something that is supposed to be running is not running, and it's auto-restarting and restoring the backups in the background without any action on our behalf. So if we refresh this again, we should see the gateway is starting up. And if we come back here, oh, I was too fast to see it red, but you see Paris is now back online. So a little bit of self-healing goes a long way. If something does go wrong, you wanna recover quickly, and that's a pretty quick way to do it. All right, let's scale out. I am gonna start this, 'cause it takes a little bit of time. All right, and then I'm gonna explain what's going on. In this case, we're using a tool called Pulumi. This is like Ansible or Terraform; it's an infrastructure-as-code automation tool. We have given it an input of what we want the state of our infrastructure to be.

28:04
Randy Rausch: It then goes out and looks at what is actually happening in reality, what the real state in the real world is, and compares the difference between what you want and what actually exists. Based on that difference, it will execute the code to make the real world look like what you want it to look like. So, command-line tools may not be the most interesting things to look at, but let's see. In this case, we are creating new things. On the right-hand side, again, this is the set of pods that were there. As the infrastructure tooling determines what needs to be there and we're refreshing what's out there, you're gonna start to see new pods come online.
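The desired-state workflow described here, comparing what you declare with what exists and then acting on the difference, can be sketched as a pure diff. This is a toy model of what tools like Pulumi or Terraform do, not their actual engines; resource names and versions are illustrative.

```python
def plan(desired, actual):
    """Toy model of the desired-state diff behind infrastructure-as-code
    tools: compare the declared state against reality and emit the
    operations that would reconcile them."""
    to_create = sorted(set(desired) - set(actual))
    to_delete = sorted(set(actual) - set(desired))
    to_update = sorted(name for name in set(desired) & set(actual)
                       if desired[name] != actual[name])
    return {"create": to_create, "update": to_update, "delete": to_delete}
```

Running the resulting plan, rather than hand-editing each gateway, is what lets 23 gateways come online from a single command.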

28:45
James Burnand: So this isn't just like turn on Ignition, right? This is going out and enabling and deploying the cloud services and setting up the drive volumes. It's running the pre-configuration and restoring an Ignition gateway backup into each one of these systems. So as Randy said, 10 more days and this presentation will be done, but this is all happening live, and you guys are actually seeing all of the different commands that are a part of the automation running in real time in the cloud.

29:17
Randy Rausch: Sure. So yeah, creating the secrets for the gateway network, going out and requesting certificates from a certificate authority, creating the cloud file shares, setting up automated backups, building...

29:34
James Burnand: Load balancer.

29:35
Randy Rausch: What's that?

29:37
James Burnand: Load balancer, the firewall rules.

29:37
Randy Rausch: Yeah.

29:37
James Burnand: There's a lot involved with having this be set up in a complete way. And really, the ability to do that with no more than one command is pretty neat. And I think what you can see on the right there is they're starting to show up now in the cluster.

29:52
Randy Rausch: Yep. So this is a live view of the infrastructure. The resources are being spun up and created. When they turn blue, that pod is now running, in which case Ignition can start to boot. It takes a little bit of time for Ignition to boot. Once it boots, again, we've already configured the databases, we've configured the certificates, we've configured our MQTT broker. So when it comes online and fully boots up, it should start broadcasting to our broker. Once the broker sees it, we should see a couple more gateways start to come online in our application. So, what am I forgetting in the stuff that we've automated?

30:30
James Burnand: Well, while we're waiting, does anyone have any questions about what they're seeing so far?

30:37
Randy Rausch: Yeah, that's the gist of it. And depending on how you want to architect it, you might have one main enterprise broker for all of them, or tiered brokers or staging, depending on what you want to deliver and where they are. You could have load balancers in front of sets. In this case, we're having a load balancer in front of all of them, but depending on the design of the application that you need, you would spec it out that way.

31:00
Audience Member 1: So are all these pods being deployed in different availability zones, AZs, right now, or?

31:07
Randy Rausch: Yeah, so as part of the automation, you can specify where you want them to be. For simplicity of demo, we've deployed, in this case, they happen to be all in the same availability zone, but for many of our enterprise customers, not only different availability zones, but we'll deploy in different regions to do the same sort of thing.

31:23
James Burnand: Yeah, and so for these applications, the compute is in a single availability zone, but we use zone-redundant storage. So should there be an AZ failure, the healing process you saw would actually move it to the next availability zone. So even with an entire building gone, you would see less downtime than what it would take to reboot a Windows machine.

31:46
Audience Member 2: Quick question about using HiveMQ here, or using an MQTT broker. Is that because, I mean, would the Ignition gateway network be feasible in an architecture like this, or is it not possible to shuttle tags that way?

32:03
Randy Rausch: Yeah, we're actually deploying the gateway network and creating the gateway network certificates as part of spinning up every one of these.

32:11
James Burnand: Okay. Yeah, we used MQTT for this primarily because of its common use inside of an enterprise application, but also it was a nice way for us to capture a handful of statistics about each one of these gateways, to display them dynamically and to see when one goes offline. So that was...

32:28
Randy Rausch: I'm gonna jump in right here. I ran an update script in the background, and what we said is, for the first 10 gateways, I wanna upgrade them from version 8.1.42 to version 8.1.43 of Ignition. So we kicked that off in the background. It realized, "Hey, these are different," so it has to spin down those particular containers. It then spins up a new container with version 8.1.43 and restores the backups, configuration, and connections. So they should start to come online, and if we did it right, you'll see the version numbers change on that top row, representing upgrading our fleet of Ignition gateways.
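The upgrade flow Randy describes, spin the old container down, spin the new version up, restore backups and connections, can be sketched as a planning function. Everything here (gateway names, version strings, step labels) is illustrative, not the actual update script:

```python
# Hypothetical sketch of a rolling fleet upgrade: for the first `count`
# gateways not already on the target version, plan a spin-down/spin-up/restore.
def upgrade_plan(gateways: list[dict], target: str, count: int) -> list[str]:
    """Plan actions for the first `count` gateways and mark them upgraded."""
    steps = []
    for gw in gateways[:count]:
        if gw["version"] == target:
            continue  # already on the target version, nothing to do
        steps += [
            f"spin down container {gw['name']} ({gw['version']})",
            f"spin up container {gw['name']} ({target})",
            f"restore backup and connections for {gw['name']}",
        ]
        gw["version"] = target
    return steps

# Illustrative fleet: 20 gateways, upgrade only the first 10.
fleet = [{"name": f"gw-{i:02d}", "version": "8.1.42"} for i in range(20)]
actions = upgrade_plan(fleet, "8.1.43", count=10)
```

The point of planning first is visible in the demo: the dashboard can show each gateway's version flipping as its three planned steps complete, while the untouched half of the fleet keeps serving.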

33:01
James Burnand: Yeah, you can see a couple of them have already started to show up, and as they boot up and reconnect to MQTT, the rest will show up as well. So from an upgrade perspective, if you're thinking about it from an application perspective, you can upgrade the version of Ignition, but that doesn't necessarily mean that your application is perfect. So what you wanna do is perform that upgrade and then do your unit testing on the application function. And should something go wrong, what would we do next, Randy?

33:28
Randy Rausch: Well, if something were to go wrong, we would use our rollback and downgrade those particular Ignition gateways.

33:36
James Burnand: So we're gonna wait until they're all up before we go ahead and downgrade them, but that conceptually gives you the ability to perform those tests in a way that reduces the risk of what would happen if there was an application problem.

33:53
Audience Member 1: So some clients that we may deal with are gonna have limited IT staff or ability to understand things, but they may have an application space where they actually want it managed as infrastructure. So when we're talking about connecting the edge layer to the cloud layer, does that mean opening a VPN tunnel if we're not trying to open ports? And if we are doing something like that, do we need to worry about attack vectors like DDoS, and how to respond to that?

34:15
Randy Rausch: Yeah, really good question. Let me make sure I understood it right: if you're dealing with something on-site as well as in the cloud, do you need a VPN between them? Is that what you're asking?

34:24
Audience Member 1: Yeah, essentially for clients that have data that's existing at the edge layer, right? They wanna use a cloud-based service. So they need some way to get it up into a cloud endpoint that's publicly accessible. How do we handle that from a security perspective, from the architecture?

34:40
Randy Rausch: Great question. I'm assuming you are not coming from a place that has express routes and private links between your factories and your cloud. If you don't have that, there are a couple of ways to do it. VPN is one way: you can have your edge gateways VPN into your cloud and keep that private. In some cases, though, if you're using encrypted MQTT and that's all you have to send up to a well-secured enterprise broker in the cloud, that may be sufficient for securely getting your information, depending on the...

35:10
James Burnand: We actually have had some enterprise clients that prefer publicly routed, SSL-encrypted data over a VPN tunnel, 'cause a VPN tunnel allows for potential traffic to come back in. So there's obviously a variety of different ways to handle that. A lot of the smaller companies will end up taking on SD-WAN projects and then terminating at a VPN endpoint in the cloud as well. So there's many different ways to solve for that. But to Randy's point, a public-facing, SSL-encrypted, authenticated and secured MQTT broker is a posture we've found to be very well accepted by a lot of companies.

35:50
Randy Rausch: Took us a little over two minutes, but we did get to version 8.1.43, and we discovered, oops, we forgot to test. Let's roll back to where we were. So in this case, we're now rolling back to the previous version, which with Ignition actually requires deleting the StatefulSet and bringing it back up to downgrade, because the upgrade modifies some of your project files. You'll see that happen in the background. Is there another question?

36:16
Audience Member 3: I know we've talked about this a lot, but what kind of feedback are you getting from customers that, let's say, are fairly progressive in their security posture and might have a full-blown Purdue model, in terms of DMZs and proxies all over the place and whatnot? In terms of being able to eliminate that, and ultimately eliminate those operating costs from operations?

36:46
James Burnand: I would say, so you're talking about, I'll say, the Rockwell-Cisco complex, yeah, I forget what the name of it is. So there's two schools of thought here. There's, we'll call them, the network-layer fundamentalists, who never want to flatten any of that out. But to your point, there's a lot of cost involved in management and effort and complexity in those networks, specifically if you look at how complex firewall rules get when you have three or four layers of network between one of your control assets and your demilitarized zone or your transfer network.

37:26
James Burnand: So we are seeing, especially as PLCs get more secure and able to exist on more IT-centric and friendly networks, where you're seeing OPC UA servers on board the controllers, we are seeing a flattening of those networks. I wouldn't say it's moving terribly fast, though, so it does take a little while. As Randy mentioned, one of the things that's happening here is you can't actually downgrade from 8.1.43 to 8.1.42 without deleting the hard drive, in this case the persistent volume. So it is taking a couple of extra steps as a part of the downgrading process, which again is fully automated, but it does take a little bit of time for those resources to reprovision.
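Why the downgrade needs those extra steps can be sketched as an ordered rollback plan: since the upgrade rewrites project files on the volume, rolling back means recreating the StatefulSet and its storage and restoring a pre-upgrade backup. The step labels and the naive version check are illustrative only:

```python
# Hypothetical rollback plan for one gateway. Because the upgrade modified
# project files in place, the volume must be recreated, not just the pod.
def rollback_plan(name: str, from_v: str, to_v: str) -> list[str]:
    # Naive string comparison; fine for "8.1.42" vs "8.1.43", but a real
    # implementation would parse and compare version components.
    if from_v <= to_v:
        raise ValueError("rollback target must be an older version")
    return [
        f"delete StatefulSet {name} ({from_v})",
        f"delete persistent volume for {name}",  # upgraded project files live here
        f"recreate StatefulSet {name} ({to_v})",
        f"restore pre-upgrade gateway backup into {name}",
        f"reconnect {name} to broker and gateway network",
    ]
```

This is why the downgrade in the demo takes longer than the upgrade did: two of the five steps involve destroying and reprovisioning storage rather than just swapping a container image.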

38:08
Audience Member 3: For the deployments, can the way something deploys be changed at an individual level? Say a plant wants to change how their deployment works, or does that have to go through the centralized system that manages the cluster's deployments?

38:27
Randy Rausch: Yeah, good question. So if you're trying to centrally manage a lot of things at scale, central management is good, but you can coordinate with each individual site on when their update happens; you would just trigger it centrally. You don't have to do everything all at once, but you can.

38:44
Audience Member 3: Is there a way to have it so a local administrator can manage a deployment without having to contact the main controller that manages everyone else's deployments?

38:57
Randy Rausch: There's always a way. Is that how you really wanna do it? If someone's managing it centrally, why do you need another manager locally? But certainly you can set it up to work on a pull basis if you prefer.

39:12
James Burnand: Yeah, typically adding or changing services is a very fast process. So really what you saw executed here was: we made changes to what Pulumi was expecting to see, and Pulumi took care of all the mechanics of actually doing that. So you can imagine, if I needed an additional gateway, or another broker, or a different database, whatever the case may be, you simply add that into the Pulumi configuration in the deployment pipeline, and it takes care of doing that for you. That's typically handled by whoever owns that infrastructure, but the process is fast. I know the process for some companies to deploy a new virtual machine is months of time. This is not that.
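James's point, that you change the desired state and the tooling reconciles the rest, can be illustrated with a toy stand-in for Pulumi's plan/apply cycle (this is not Pulumi's actual API; resource names and specs are made up):

```python
# Toy reconciliation loop: compare desired state against actual state and
# emit the create/update/delete actions needed to converge them.
def reconcile(desired: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Return the actions needed to make `actual` match `desired`.

    Keys are resource names (gateways, brokers, databases); values are
    versions or specs.
    """
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(f"create {name} ({spec})")
        elif actual[name] != spec:
            actions.append(f"update {name} -> {spec}")
    for name in actual:
        if name not in desired:
            actions.append(f"delete {name}")
    return actions
```

Adding a gateway or a broker is then just one more entry in the desired-state dictionary; the reconciler, not the operator, works out the mechanics, which is why the turnaround is minutes rather than the months a manual VM request can take.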

40:02
Audience Member 4: Great presentation. So you mentioned at the beginning that there were potential applications you wouldn't recommend be hosted in the cloud. Could you maybe elaborate on what those might be at this point, and would they be something you could instead use hybrid cloud for?

40:16
James Burnand: Yeah, so for anything you would want to do on-prem, hybrid cloud is a viable option, because it literally runs on-prem the same as if you were running it on a bare metal server. So that's kind of interchangeable in terms of those applications. Really, it comes back to latency and user experience. So if you have a ton of transactional data going back and forth, or if this is the only way you can open and close a valve and there's no other backup way to do that, you probably don't want that in the cloud; you don't wanna put that up there. That's just a bad idea. What I do think, and this is me and what I see happening in the future on the enterprise application side of things, is that the reliability of the connectivity between enterprises' factories and their public cloud instances has improved drastically over the last several years. So you're finding things like warehouse management systems and ERP systems, a lot of time-critical things, that are not living in the four walls of a building anymore. I think there are some applications that we're not comfortable with today that, maybe 10 years from now, hopefully in my career, will make sense to be deployed that way. And I think hybrid cloud is maybe a step in the middle.

41:32
Randy Rausch: Yeah, and you wanna take that holistic view, right? What is it you are trying to accomplish? What are the risks? Where are your points of failure? What does your overall design consider? WANs are getting very reliable, and you can have redundant, triple, quadruple, whatever you need to do if that's an issue, if that's important to what you're doing. If you can tolerate the latency, you can design for that. Lots of different ways to do it, but take that holistic approach.

41:57
Kyle Van Eenennaam: Thanks so much, guys. That is gonna wrap up this presentation. Everyone, we are about out of time. Let's get a big round of applause.

Posted on November 27, 2024