DevOps in a Cloud Native & Serverless World – The Goat Farm – S2E1

The Goat Farm is back with what we’re dubbing “Season 2”. In Season 2 we are going to focus on the challenges of inertia in organizations, how organizations are adopting the practices of Cloud Native and Serverless, and the intersection of DevOps and Cloud Native.

We start this episode off with a conversation with Joe Beda of Heptio, originally recorded in December 2017 at The Lodge Sports Grille in downtown Seattle. Joe talks about the concept of organizational fit, and how the patterns and practices of DevOps have historically had a hard time mapping onto an organization. He also talks about the concept of API-first infrastructure, and how it changes the way we think about operations.

We also catch up with Rob Cummings of Slalom and Tom McLaughlin of ServerlessOps at DevOpsDays Seattle 2018. Both guests tell us about the impact of Serverless on organizational transformation, and the current trends they're seeing with the adoption of Cloud Native and Serverless.

Download MP3 – iTunes | Stitcher | RSS

Music: C’est sur toi que ça va le mieux by Monplaisir

Show Notes:


Joe Beda – Twitter | LinkedIn

Joe Beda is CTO of Heptio, a startup focused on unleashing the technology-driven enterprise by realizing the full potential of Kubernetes and transforming IT into a business accelerator. Prior to Heptio, Joe was at Google for over 10 years. While there, Joe started Google Compute Engine and co-founded the Kubernetes project. In a previous life, Joe started his career at Microsoft working on Internet Explorer. Joe is slowly becoming a Seattle native along with his wife, a physician, and their two kids.


Rob Cummings – Twitter | LinkedIn

Rob has been involved in IT operations for the past 20 years, including systems engineering work at Bose, EMC Corporation, and Nordstrom. Today, Rob is a Solution Principal at Slalom Consulting, where he helps companies solve problems and build for the future. His passion is pushing the boundaries of both how we implement technology and how we set up organizational structures for success. Rob is also a co-organizer of the DevOpsDays Seattle conference.

Tom McLaughlin – Twitter | LinkedIn

Tom is the founder of ServerlessOps and an experienced operations engineer. He actively promotes serverless infrastructure and engages with the community to make operations people successful through what he sees as a disruption to his profession. When not at work he is a proud cat dad to two calicoes and enjoys spending his time drag racing and sailing. He is also an amateur thinkfluencer on Twitter.

Measuring Success at Capital One – The Goat Farm – S1E13

We all think DevOps is a better way to work, but how can you begin to measure aspects of your DevOps transformation? In this episode we talk to Adam Auerbach and Topo Pal of Capital One, and learn more about the work they are doing. We discuss how their DevOps journey started, how it's now a CIO-mandated journey, and how they built some open source tooling to help them measure the speed at which they are moving.

Download MP3 – iTunes | Stitcher | RSS

Show Notes:

Scaled Agile Framework (SAFe)

Capital One DevOps Dashboard – Hygieia

16 principles of CD (mostly binary, used to create a heatmap of maturity for Capital One platforms; a scoring sketch follows the list)

  • GitHub (or similar) with a branching strategy
  • Code Coverage (90% preferred, at business discretion)
  • Static Analysis (e.g. PMD, CPD, FindBugs)
  • Static Security
  • Open Source/third party vulnerability scan/support
  • Automated instance provisioning in each region
  • Immutable servers
  • Artifact management
  • Automated build, deploy, and testing on commit (can be by feature)
  • Automated integration testing on successful test
  • Automated performance testing
  • Automated/repeatable rollbacks (including data migrations)
  • Push button/automated deployments to production
  • Automated generation of COs (change orders)
  • Blue/green (zero downtime/canary) releases
  • Feature activation (wire on/wire off)
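
Since the principles are mostly binary, the maturity heatmap boils down to a pass/fail scorecard per platform. Here is a minimal sketch of that idea; the principle subset, platform names, and assessment data are illustrative assumptions, not how Capital One's Hygieia dashboard actually computes it:

```python
# Hypothetical sketch: score platforms against a mostly-binary CD checklist.
# The principle subset and platform data below are made up for illustration.

PRINCIPLES = [
    "branching strategy",
    "code coverage",
    "static analysis",
    "immutable servers",
    "automated rollbacks",
]

# One pass/fail assessment per principle for each platform.
platforms = {
    "payments-api": {"branching strategy": True, "code coverage": True,
                     "static analysis": True, "immutable servers": False,
                     "automated rollbacks": False},
    "web-frontend": {"branching strategy": True, "code coverage": False,
                     "static analysis": True, "immutable servers": True,
                     "automated rollbacks": True},
}

def heatmap_row(checks: dict) -> str:
    """Render one platform as a heatmap row plus the fraction of principles met."""
    cells = "".join("#" if checks[p] else "." for p in PRINCIPLES)
    score = sum(checks[p] for p in PRINCIPLES) / len(PRINCIPLES)
    return f"{cells}  {score:.0%}"

for name, checks in platforms.items():
    print(f"{name:15s} {heatmap_row(checks)}")
```

Rolling those per-platform scores up across the enterprise is what turns a simple checklist into a heatmap of CD maturity.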

Adam Auerbach – Twitter | LinkedIn

Adam Auerbach is the Sr. Technology Director for Advanced Testing and Release Services at Capital One Financial Corporation. Adam is responsible for Capital One's enterprise performance and automated testing departments, as well as enterprise release management. Since joining Capital One, he has provided leadership for the agile transformation of their quality assurance group and led the enterprise adoption of DevOps and ATDD. Before joining Capital One, Adam was with Chase and other financial and insurance companies in various leadership positions focusing on quality and agile practices.

Tapabrata Pal (Topo) – Twitter | LinkedIn

Tapabrata Pal has 20 years of IT experience in various technology roles (developer, operations engineer, and architect) in the retail, healthcare, and finance industries. Over the last five years, Tapabrata has served as director of Capital One's Enterprise Architecture group and led the company's DevOpsSec initiatives. He is currently a director and individual contributor focusing on next-generation infrastructure. Tapabrata is also the community manager and a core committer of Hygieia, an open source project that won "Open Source Rookie of the Year" for 2015.
Previously, Tapabrata spent time in academia doing doctoral and post-doctoral research in the field of solid state physics.

DevOps When Startups Become Enterprises – The Goat Farm – S1E12

In this episode we talk to Andy Domeier of SPS Commerce. As startups grow into larger companies, they face the same scaling challenges that larger enterprises tend to encounter. Andy shares his 11 years of experience watching SPS Commerce grow from a startup into an enterprise, and how they've handled these challenges. We also look at some of the technology SPS is using to help scale their people and their technology capabilities.

Download MP3 – iTunes | Stitcher | RSS

Show notes:


Andy Domeier – LinkedIn | Twitter

Andy has been in technology operations leadership at SPS Commerce for the past 11 years. SPS grows very aggressively, creating an environment of persistent growth challenges. Andy's focus areas within the organization include monitoring and operating complex, changing systems; priority organization and alignment; and the organization of knowledge.

DevOps at IBM – The Goat Farm – S1E9

How does IBM manage to run web sites for some of the world's largest sporting and television events? With the practices of DevOps, of course! In this episode, Ross and Michael talk to Brian O'Connell of IBM.

Brian tells us about his journey to DevOps practices after stumbling onto Chef and the ideas of Infrastructure as Code. We talk about the cultural shift required around who delivers changes and who owns them. Brian also tells us how they leverage the "build, measure, learn" product development loop.

The sites Brian and his team help run are some of the most high-profile and highly visited sites in the world. Brian talks about the challenges of trying to introduce DevOps to such high-profile sites, and the mistakes that were made along the way. We also talk about some of the tooling Brian and his team use, and how they effectively deploy enterprise software packages.

Download MP3 – iTunes | Stitcher | RSS

Show notes:


Brian O'Connell – Twitter | LinkedIn

Brian O'Connell is a Senior Technical Staff Member at IBM who leads a team focused on DevOps, predictive analytics, big data, and cloud technologies.

Brian joined IBM in 2001, starting as a software engineer. He built many software systems to support the continuous availability and events infrastructure.  His expertise includes architecting and developing scalable server applications, concurrency, advanced visualizations, and big data.

From 2007 until 2011, Brian was the lead infrastructure technology advocate and designer for the World Wide Sponsorship Marketing (WWSM) client. His role included strategic technical direction, evaluating technology pilots, and the end-to-end delivery of highly visible web events. In that role, he successfully delivered all IBM sponsorship web sites, including The Masters, Wimbledon, Roland Garros (French Open), US Open Tennis, US Open Golf, Australian Open, and The Tony Awards. Brian designed systems to manage the infrastructure and applications used by the client, with a focus on defining plans, strategies, and architectures for the installation, operation, migration, and management of complex information systems.
Brian has had more than 250 patents issued, is an IBM designated Master Inventor and a Franz Edelman laureate.

Adrian Cockcroft of Battery Ventures – The Goat Farm – S1E8

In this episode we talk to the famous (or infamous) Adrian Cockcroft of Battery Ventures. Adrian is known for his work at Netflix migrating them to a cloud-first strategy, and, before that, for his book on Sun performance tuning.

Adrian has been doing a lot of work talking to CIOs of large enterprises and helping them understand where ideas such as DevOps, microservices, and Cloud are taking the industry. He also tells us how he is helping CIOs realize how their IT organizations must transform to adopt these new ideas. This episode is all about how the horses are growing horns to become the unicorns.

(Editor’s note: We are really sorry about the audio on this episode. Adrian was in Portland, Michael was in Amsterdam, and Ross was in Minneapolis. While we could have cut a bunch of the bad audio, the content was so good we didn’t want to drop anything. Apologies.)

Download MP3 – iTunes | Stitcher | RSS


Show Notes:


Adrian Cockcroft – LinkedInTwitter

Adrian Cockcroft has had a long career working at the leading edge of technology. He’s always been fascinated by what comes next, and he writes and speaks extensively on a range of subjects. At Battery, he advises the firm and its portfolio companies about technology issues and also assists with deal sourcing and due diligence.

Before joining Battery, Adrian helped lead Netflix’s migration to a large scale, highly available public-cloud architecture and the open sourcing of the cloud-native NetflixOSS platform. Prior to that at Netflix he managed a team working on personalization algorithms and service-oriented refactoring.

Adrian was a founding member of eBay Research Labs, developing advanced mobile applications and even building his own homebrew phone, years before iPhone and Android launched. As a distinguished engineer at Sun Microsystems he wrote the best-selling “Sun Performance and Tuning” book and was chief architect for High Performance Technical Computing.

Jonny Wooldridge on Enterprises vs Startups – The Goat Farm – S1E3

In this episode Ross and I talk to Jonny Wooldridge, formerly of Marks & Spencer and currently at The Cambridge Satchel Company. We ask Jonny his thoughts on what DevOps is like in an Enterprise vs a Startup, how to jumpstart adoption, how to handle “legacy systems”, and get his thoughts on concepts such as “Pace Layering” and “Bimodal IT”.

Ross and I also talk about why the language we use is important when talking about DevOps and DevOps concepts.

Download MP3 – iTunes | Stitcher | RSS

Guest Info:

Jonny Wooldridge – LinkedInTwitter

Jonny Wooldridge is CTO of The Cambridge Satchel Company and has a history of leading agile cross-functional teams in dynamic and fast-paced start-ups in London, including lastminute.com, Opodo.com, and Photobox.com. Prior to joining The Cambridge Satchel Company he was Head of Web Engineering at the British multinational retailer M&S. He was instrumental in introducing DevOps to the enterprise whilst working on a three-year, £150 million project to re-platform the website, order management systems, and customer service tools.

He is passionate about Lean and DevOps topics, particularly in challenging environments (like the average enterprise!) and earlier this year started a blog at enterprisedevops.com.

Veteran of the Process Wars

Tokyo. I’m still in Tokyo. I wake up, rub the sleep from my eyes, and roll out of bed. As I rise I take a quick look at my watch for any new emails. Nothing. There’s not much email anymore. Not since the process wars of 2016.

Many people won’t talk about the process wars. They changed the way most of the InfoTech industry works and how we are allowed to think of our jobs. In the past, I might have been responsible for running hundreds of servers. Now, the machines are in charge. I’m only allowed to feed the machines code.

We are grouped together in small teams that write code as a unit. We are “kept in line” by dogs that are trained to attack us. They say it’s for our own good. That it’s for the betterment of industry. If we try to touch the servers, if we try to do anything other than write code and feed it to the machines, the dogs will bite us.

We are kept in line by process and controls, but not like we used to be kept in check. Before, we used to have weekly meetings reviewing the work we wanted to perform. The meetings were always a joke amongst my peers. Typically they were run by a VP that had no clue what we wanted to change on the machines. If we wanted something bad enough, we could social engineer our way to what we wanted.

No more today. When we write the code to feed the machines, we have to write tests. They tell us these tests are for our own good. The tests confirm that the machines are set up exactly how they want them to be. We actually write the tests first, and they always fail the first time through. Then we write code to bring the tests into compliance.

If the tests weren’t bad enough, we also have automated tests to make sure our code conforms to “Style Guidelines”. They tell us these guidelines are to ensure conformity and consistency. I say they are there to hold us down and control us. They also require us to “lint” our code before we feed it to the machines. This once again is to ensure “conformity”.

Even with all of this verifying for conformity, we still aren't allowed to feed the code directly to the machines. We must check the code into a repository where more machines take over to verify that our tests pass and that we meet the conformity guidelines. The machines also verify that our code works with code written by others, in my group and other groups.

Then, a Conformity Checker reviews my changes to make sure they are compliant with the policy. We are no longer independent; we are no longer allowed to game the review board. We feed the machines compliant code or else we end up on the streets. Three strikes and we’re out.

Which is why I'm in Tokyo. I've had two strikes at my company in the United States. They shipped me over to the Tokyo division for reeducation. Japanese culture is heavily based on order and process. During the process wars they helped lead the revolution. Many of the compliance tools were written by or enabled by Japanese technology leaders. Yukihiro Matsumoto (codenamed Matz) was responsible for designing the programming language I now feed to the machines. Gosuke Miyashita wrote the tools that I am required to use to test my code before feeding it to the machines. And then there was Kohsuke Kawaguchi, the creator of the master machine that ensures all the code is compliant, automatically with no humans to game the process.

It’s all very neat and orderly now. I take requests for code from the owners of the machines, I write compliant code, and the machines automatically verify my work. Eventually the machines apply my code, and the owners get exactly what they want. I’ll wake up the next however many hundreds of mornings and do just this. No more, no less. It’s all very neat and orderly now.

Get Your Head Out of Your aaS

I've been floating between the worlds of Cloud and DevOps for a while now and it is interesting to see the Cloud world finally start to realize the real value is in DevOps. It's great that more people are starting to pay attention to this cultural and professional movement. What is not great is how the Cloud experts tend to get wrapped up in debates that are trivial and meaningless in the larger scheme of things. Take, for instance, two persistent debates I keep seeing: IaaS vs. PaaS, and then which PaaS is better. I hate to be the one to break it to these camps, but it doesn't matter; at the end of the day you are selling plumbing fixtures that crap flows through.

To understand what I mean, let's take a step back. In 2008, I started pursuing my MBA at The Ohio State University. One of the core requirements of the degree was Operations Management, where you learn manufacturing optimization through ideas such as Lean and Six Sigma. The book "Learning to See" was part of the course material; it focused on optimizing manufacturing processes through visualization, also known as Value Stream Mapping. As the course progressed, I had a personal epiphany: as we kept walking through manufacturing processes and Value Streams, I quickly realized that the work we did in IT was all about manufacturing a good or service that someone would consume. Automation in the IT world is about (or should be about) optimizing these Value Streams and (hopefully) eliminating waste from the system. My Operations Management course really taught me to see (pun intended) and to think differently about how we worked in IT.

I took this newfound knowledge back to work, where it was summarily ignored by my boss and coworkers; lacking support, I shelved my ideas. Little did I know that many of the Lean principles I had learned would be at the forefront of how IT is changing today, and were already changing it back in 2008. I just didn't know it.

When somebody asks me what DevOps is, I often respond with the simple idea that "DevOps is about increasing the flow of work through IT." I borrow this idea heavily from "The Phoenix Project", but I find it is the simplest way to capture the essence of this cultural and professional movement. And that is where Value Stream Mapping and the ideas of Lean come into the conversation. Books like "The Phoenix Project", and notable DevOps contributors such as John Willis, extol the value of these techniques to optimize the IT manufacturing chain, be it Development work or Operations work.

Value Stream Maps are relatively simple. They identify the flow of a raw material through various processes that add value. They also identify areas of waste in the system, and they help in building the Future State Map, or the Value Stream that you want to achieve in the future after optimizing the system. The most basic and valuable thing about Value Stream Maps is how they allow you to easily visualize your work, and once it is visualized it is easy to understand and optimize.

If you look at the first current state map, you can easily see how relabeling the boxes to reflect common IT tasks, say in a server build process, makes this a powerful tool for IT. Replace the box names with another process – maybe code build, testing, and release – and you see once again how Value Stream Mapping is a key tool in fixing our broken IT.

Now that we've established a method for optimizing our IT processes, let's go back to thinking about Cloud and the debates around IaaS, PaaS, and the PaaS vendors. Take the second Value Stream Map, and say it reflected a server build process in which installing an OS took one hour. We optimize this process through our IaaS-based Cloud, public or private, and get the time down to 5 minutes. That is awesome; we've saved 55 minutes and really optimized that process. Go team!

If "premature optimization is the root of all evil", then local optimization is the Devil's half brother. In the above example we saved 55 minutes, but the total time of work flowing through the system is still 67 days, 23 hours. And that is where we come back to Cloud. IaaS is a local optimization. It is great, it is awesome, but it is a very small piece of the puzzle. PaaS is another local optimization, but instead of optimizing one process it optimizes three or four. Which is great, but many IT organizations are going to adopt Cloud for business agility and speed, then be sadly surprised when their local optimization does little to fix their screwed-up system. Cloud can be a great enabler, but it is only a small piece of the larger system. It is high time more of us focused on the larger system.
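
To see why the savings wash out, here is a minimal sketch of the arithmetic, using made-up step durations chosen to echo the numbers above rather than any real value stream:

```python
# Minimal sketch of the local-optimization trap: shaving 55 minutes off one
# step barely moves total lead time when queue time dominates the stream.
# All durations below are made up for illustration.

steps_minutes = {
    "request & approval (wait)": 30 * 24 * 60,             # 30 days in queue
    "server procurement (wait)": 25 * 24 * 60,             # 25 days in queue
    "OS install":                60,                       # the step IaaS optimizes
    "app configuration":         8 * 60,
    "QA & handoff (wait)":       12 * 24 * 60 + 14 * 60 + 55,
}

def fmt(minutes: int) -> str:
    """Format a duration in minutes as days/hours/minutes."""
    days, rest = divmod(minutes, 24 * 60)
    hours, mins = divmod(rest, 60)
    return f"{days}d {hours}h {mins}m"

before = sum(steps_minutes.values())
steps_minutes["OS install"] = 5    # IaaS takes the install from an hour to 5 minutes
after = sum(steps_minutes.values())

print(f"before: {fmt(before)}")    # 67d 23h 55m
print(f"after:  {fmt(after)}")     # 67d 23h 0m -- the queues still dominate
```

The 55-minute win is real, but until the multi-week queues around it shrink, the system as a whole barely notices.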

What if Everything We’ve Been Doing is Wrong?


After I wrote my last post, I was talking with Donnie Berkholz as we traveled to FOSDEM. Donnie commented on how powerful a post it was, yet it left the reader hanging. He, and other readers, wanted more. So I've taken the liberty of breaking down more of the reasons Enterprise IT needs a "special kind of DevOps", as posted by Andi Mann. I don't want anyone to think I am picking on Andi personally. Rather, his post reminds me of all the excuses Enterprises give as to why "we can't change". As Mick told Rocky, "There ain't no can'ts!"

  • They cannot achieve the same levels of agility and personal responsibility as a smaller or less complex organization.

Why not? Principles that teach agility and speed have long been used at large companies such as Microsoft. (Yes, feel free to say Microsoft is a bad example; they are still one of the world's largest software companies.) Additionally, if one doesn't want to take personal responsibility for what they produce for a company, maybe they are in the wrong job at the wrong company.

  • They cannot stream new code into production and just shut down for a couple of hours to fallback if it fails.

This is foolhardy to begin with. The goal of methods such as Continuous Integration is to constantly build releases and test them, catching problems before they are released to production. The idea is also to test small changes, so you know exactly what breaks, rather than large chunks of code. Large enterprises "cannot stream new code" because they haven't built the necessary flows in front of production releases to effectively and efficiently test and verify code changes. This requires IT organizations to fully automate their processes all the way down to server builds, something they are often incapable of doing because of an attachment to the "old way of doing things".
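
For illustration, here is a minimal sketch of the kind of flow CI puts in front of production: every commit runs an ordered series of small, fast gates, and the pipeline stops at the first failure so you know exactly which change broke what. The stage names and commands are assumptions for the example, not a real pipeline definition:

```python
# Hypothetical sketch of a commit-gated CI pipeline. Each stage runs in order
# and a failure halts promotion, so problems are caught before production.
import subprocess

STAGES = [
    ("lint",        ["flake8", "src/"]),
    ("unit tests",  ["pytest", "tests/unit", "-q"]),
    ("integration", ["pytest", "tests/integration", "-q"]),
]

def run_pipeline(commit_sha: str) -> bool:
    """Run each stage in order; the first failure stops the pipeline."""
    print(f"pipeline for commit {commit_sha}")
    for name, cmd in STAGES:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"stage '{name}' failed; stopping before production")
            return False
        print(f"stage '{name}' passed")
    print("all gates passed; safe to promote this small change")
    return True
```

Because each commit is small and the gates are automated, a red pipeline points at one change instead of a quarter's worth of code.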

  • They rarely ever have ‘two pizza teams’ for development or operations (indeed, they are lucky if they have ‘two Pizza Hut teams’).

The size of the team is nearly always irrelevant. Within each Pizza Hut there are tables, and each table consumes the pizza buffet. The goal of DevOps is to increase the flow of work through those tables so the teams can eat their pizza and leave quicker. As I've said before, focusing on the silos is the wrong way to solve the problem. Rather, focus on the grain elevators that move the grain to produce something meaningful.

  • They cannot sign up for cloud services with a credit card without exceeding their monthly limit and/or being fired.

Get an MSA/PO with the cloud vendor or build a Private Cloud. Cloud or no cloud, building strong automation on top of existing VM or server infrastructure can help alleviate many problems in service delivery.

  • They cannot allow developers to access raw production data, let alone copy it to their laptop for development or testing.

Scrub the data. DevOps or not, this is a problem we solved years ago. When I worked at a major e-commerce site, real data was often required for testing, but that data was always cleaned of any sensitive PII. This is not an issue unique to DevOps.
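
For illustration, here is a minimal sketch of one scrubbing approach: replace PII fields with stable pseudonyms before the data ever leaves production. The field names and hashing rule are assumptions for the example, not a policy recommendation:

```python
# Illustrative sketch: scrub PII from production records before handing them
# to development or testing. Field names and rules are made up; real scrubbing
# should follow your organization's data-classification policy.
import hashlib

PII_FIELDS = {"name", "email", "ssn", "phone"}

def scrub(record: dict, salt: str = "test-env-salt") -> dict:
    """Replace PII values with deterministic pseudonyms so joins still line up."""
    clean = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()[:12]
            clean[key] = f"{key}-{digest}"   # non-reversible stand-in value
        else:
            clean[key] = value               # non-sensitive fields pass through
    return clean

order = {"order_id": 1234, "email": "jane@example.com", "total": 59.99}
print(scrub(order))   # order_id and total survive; email becomes a pseudonym
```

Hashing with a fixed salt keeps the pseudonyms consistent across tables, so scrubbed test data still joins correctly without exposing the underlying values.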

  • They cannot choose to stream new code into production in violation of a change freeze, or even without the prior approval of a CAB.

Once again, this assumes that DevOps is all about willy-nilly pushing of code to production. One aspect of DevOps is increasing the flow of work through the system by optimizing the centers where value is added. As I've discussed before, the principles and practices of DevOps actually help things like Change Control.

  • They cannot just tell developers to carry pagers ‘until their software is bedded in’ (not least because their developers have always carried pagers, and on a full-time basis).

If Devs already carry pagers, then they've already been told to carry pagers; hence, "they" can indeed tell their Devs to carry pagers. Additionally, bedding in of the software should happen in the lower environments, as discussed previously. If you've done things right before production, pagers become a tool that is used when things go really badly. It's a form of monitoring and incident response that becomes meaningful again because you aren't being paged for endless break-fix work.

  • They cannot put developers and operators together because one team works 24×7 shifts in 7 data centers while the other works 16-hour days in 12 different locations.

Well, good, they at least have 16 hours a day together. Highly distributed remote teams are becoming more and more common; technology is evolving to support remote work, and people are finding creative ways to make it work. I'm also against the idea that DevOps is all about merging Dev and Ops onto one team, because that is not the point. The idea, as already stated, is to increase the flow of work between Dev and Ops and build a culture of continuous improvement between the two groups (three groups if you include the business). Dev, Ops, Business, who gives a shit. The point is working towards a common goal, no matter where you sit.

What large IT shops cannot do is remain satisfied with the status quo. They cannot accept the ways of the past any longer, and they have to start thinking about blowing up their way of doing things. They cannot let the castles and fiefdoms of the past get in the way any longer.

I think the single most powerful question any IT shop can ask themselves is, “What if everything we’ve been doing over the last X years is completely wrong?” Start there, and reevaluate everything you’ve been doing to achieve (or not achieve) the results your customers require.