David Bowen BSc Hons

Experienced multi-cloud DevSecOps Lead and Mentor

Contact

david@myforest.com

+44 77 34 90 16 61

myforest.com, GitHub, LinkedIn

Key Skills

Able to learn quickly by thinking laterally and reading extensively

Looks to the future, builds on the past

Expertise and energy that enable creative solutions to complex problems

Able to communicate effectively at all levels of an organisation

Capable of experimentation but retains sight of the objective

Experience

Lead Service Reliability Engineer (SRE): Inrupt (2021 to present)

January 2021 - present

I lead the implementation of Inrupt's customer, SaaS and development offerings, all delivered as fully cloud-native services. I ensure the systems are available, robust, secure, ISO 27001-compliant and performant.

Kubernetes does the heavy lifting, of course, but creating useful things like effective health checks takes a combination of development and operational skill that is only won through hands-on experience.

I drive and deliver automation initiatives using Python, Terraform, and GitHub Actions to streamline infrastructure deployment and management on tier 1 and tier 2 clouds, including arm's-length operations.

Releasing product into production environments needs to be safe. I've built and extended testing along a number of dimensions but my proudest achievement is having the development team see a system integration test failure as signal rather than noise.

We're using a mix of technologies and I work with the teams to select, validate and deploy them so they operate well at scale. I shape our technical direction and platform strategy, ensuring our tools and processes enhance developer experience and support rapid, reliable delivery. We have global operations and I work across time zones and data centres.

A critical part of my leadership continues to be authenticity. I develop our backend applications, so I have my hands on Linux, Kubernetes, Docker, Java, Python, Terraform and GitHub all the time.

I am called on to tackle issues with Inrupt and customer systems, from adapting our use of a service that got a new cost model to fixing packet-dropping issues affecting a specific kernel on worker nodes of a specific vintage. The buck stops with me.

At Inrupt we have a close-knit remote team, and it has been a joy to help the team grow as they learn about operational concerns, whether by pitching in alongside me on a customer issue, taking time to work through a design together, or walking through some observability and seeing ways the system could run better.

Lead Service Reliability Engineer (SRE), Cloud and Cognitive Solutions: IBM (2019 to 2020)

January 2019 - December 2020

I drove technical solutions in a diverse and disparate team of 20 people.

We ran ten internal and external business applications for IBM and so had demanding requirements for availability, accuracy, functionality and integrity.

I helped the team grow from a patchwork of platforms to a robust multi-zone Kubernetes cluster. I encouraged diversity in our solutions, but where we had a strong platform we came together to use it and that allowed us to focus on things that made a real difference.

I instigated and enhanced our logging and monitoring to give us a comprehensive view of our systems' behaviour and sensible alerting on the things that mattered. We were very conscious of alert fatigue, so we had a number of robustness solutions in place both within the design of our systems and in the handling of interruptions. As Lead SRE it was very rewarding to see the team take these on board and extend them in new ways to help us do even better.

A critical part of my leadership was authenticity. I developed three of the applications so had my hands on Linux, Kubernetes, Docker, TypeScript, Python and GitHub all the time. This gave me concrete experience to determine if SonarQube, Travis, LogDNA, Sysdig, OpenAPI, npm, VS Code and a raft of other tools were actually useful. I was not an arm-waving architect basing ideas on a conference I just attended. I did incorporate new ideas and encouraged the team to experiment. We normally did this in production which showed the new shiny thing was robust and could be observed sensibly. It also demonstrated that our production systems could handle trouble on a regular basis which helped us succeed when unusual trouble arrived.

Mentoring the team was very rewarding and it was great to see their increased courage as the problems they tackled got harder. The ethos in the team allowed us to fail and we supported this with things like "Learning reviews" (a.k.a. blameless RCAs).

Technical Lead, Customer Health Insights: IBM (2019 to 2020)

January 2019 - December 2020

Our work resulted in the retention of millions of dollars of customer contracts. We also won the "Innovation in XaaS Product Management" TSIA award and I won an "Outstanding Technical Achievement Award".

We created machine learning to generate actionable insights into IBM's customers. We carefully gathered data from a number of sources and brought it together to create models specific to different areas of our business. We then delivered these as calls-to-action for our customer success team or as email nurture campaigns.

After founding this project in early 2019, I guided team members as they contributed according to their strengths. I helped them learn new skills to join their work up with others for relevant results.

I built a Kubernetes cluster, monitoring and logging, as well as extra orchestration in Airflow, notably to give us SLAs for jobs (which increased the signal-to-noise ratio). These approaches gave us considerable robustness and allowed us to deliver on our SLAs despite a swathe of exciting issues. This tooling proved to be so effective that the other part of the business I worked with adopted these tools for ten other applications.

To increase our productivity I also developed an API where internal applications could access our business data in a secure and resilient way without needing to hook into the complexities of our underlying data store. This API had 99.95% uptime by using a multi-zone cluster, rolling deploys and robust handling of pod and cluster interruptions. As this was a best-effort API I couldn't justify spending more to improve on that, which is something you have to accept as an SRE.

I enjoyed helping people come together and being able to guide them both from a technical and business perspective. My comfort with the technical landscape allowed me to embolden our team to try experiments and approaches that were unusual or unexpected. They didn't always work out, but when they did it helped us all provide better insights into our customers.

Technical Lead, Third Party Marketplace Syndication: IBM (2017 - 2018)

Summer 2017 - December 2018

Handled orders from third-party marketplaces for IBM products, ensuring the product was billed and deployed.

We ran our system in Dallas, London and Sydney for robustness and created a global rolling deploy. All systems were as-a-service so we had no servers to care for and feed.

Developed fast, effective solutions using TypeScript and Mongo.

Infrastructure and Deployment Lead: IBM (2015 - 2017)

Autumn 2015 - Spring 2017

Took a team on their Docker-in-production journey.

We implemented infrastructure-as-code and GitOps to provide a hands-free rolling deploy with gating tests at a time when this wasn't done in IBM.

Business Analytics Cloud Architect: IBM (2014 - 2015)

Autumn 2014 - Autumn 2015

Helped other teams bring their cloud offerings to market using my experience with TM1 on Cloud.

This included moving to automated deployments as well as helping with audits and compliance.

TM1 on Cloud Team Lead: IBM (2013 - 2014)

Autumn 2013 - Autumn 2014

Operated the TM1 on Cloud offering which of course included monitoring, troubleshooting and adapting to customer requirements.

Interviewed and onboarded new DevOps engineers.

Directed product development for future TM1 releases to improve the cloud offering's maintainability and margin.

Enhanced the offering, for example investigating more sophisticated monitoring tooling.

Summer 2013

Implemented TM1 on Cloud in two months.

Used my experience with TM1 and system administration to architect the solution and adapt it as we rapidly prototyped and learned about the behaviours of the system.

Released weekly product updates to demonstrate the changing offering. Worked with the six product development teams to describe our offering and garner feedback on improvements, which I then went on to implement. Carried out similar work with experts who had implemented TM1 on customer sites.

Rapidly got up to speed on the SoftLayer API and PowerShell to automate the creation of the cluster and its customer-specific configuration. At times administered over a hundred servers of various vintages.

Learned how to negotiate the technical and legal clearance processes in IBM to release software to the novel cloud environment.

Worked extensively with our QC department to verify the offering worked. I was on the hook to find the cause of problems and determine whether they were cloud-related (so I could fix them) or product-related.

Senior Software Engineer: IBM (2008 - 2013)

2010 - 2013

Used deep TM1 knowledge to understand how to extract content from one running database and move it to another running database.

Thrived with the tools we all loved such as Jenkins, JUnit, Java, Eclipse and of course, agile teams.

Delivered world-class, enterprise-level solutions with automation, accessibility, globalisation, time zones, Unicode, multi-threading and graceful error handling.

Worked with the TM1 team to develop a REST API to describe TM1 objects (and extended the JSON locally where the API fell short). This needed wide-ranging knowledge of TM1 and the range of objects it works with to ensure we moved as much of the content as we could.

The technical difficulties of carefully integrating changes into the target system meant that my approach of using test-driven development really paid off. Developed considerable skill in JUnit, for example using parameterised tests generated from permutations, combinations, test files or test case lists.

2008 - 2010

Optimisation project for IBM Cognos Planning 10.1.0. Implemented complex database and runtime changes to massively reduce the resources required to apply access control to cell data (patent). Used my expertise in understanding clusters to improve the robustness and efficiency of work item processing.

These tasks required me to blend my SQL, Visual Basic, system administration and business knowledge to understand what would be worth changing and then to dive into a ten-year-old code base and make significant changes. This effort was backed up by extensive testing and data gathering, automating the processing of a vast array of test models to understand how much improvement we were making.

Senior Software Engineer: Cognos (2003 - 2007)

2005 - 2007

Worked on an Eclipse-based UI with advanced features such as IntelliSense and quick fixes.

Added the ability to transfer systems between environments. Exported data from DB2, Oracle or SQL Server to disk. This could then be re-imported into any of those platforms. Developed faster by making extensive re-use of existing COM components from Java via Jawin.

2003 - 2004

Co-developed an engine to transfer data between systems and optimised this to be 60 times quicker than before. (patent)

Created a system to export consistent sets of data from Contributor to multiple external systems in parallel using copy-on-write. (patent)

Senior Software Engineer: Adaytum (2000 - 2003)

Worked in a small team defining a new enterprise-friendly architecture for an n-tier web-based system. Major enhancements included:

  • Discrete change management - system stayed online
  • Platform neutral data specification and data storage structure - flattened learning curve
  • Component re-use to improve consistency of system functions and exception handling - users and developers became more efficient

Enhanced the data store architecture to allow cross-instance storage. Migrated to a data-store-agnostic approach that massively sped up the ports to new data stores. Defined, coded and tested a scalable engine for managing large data volumes in big enterprises. Took responsibility for key stages in the data transformation and maintenance in critical sections of the system. (patent)

Trained new developers to create higher-quality, more reliable code. Provided direction and components that harmonised exception handling, resource management and debug information.

Developed a system to allow system activities to be scaled out (and scaled up as a side effect). The execution environment picked up atoms of work and executed them whilst providing debug and administration information about the tasks processed. Worked in a database-independent way using a central list of jobs and handling the multi-machine access to this list using a platform-neutral locking architecture. (patent)

Liaised with remote development teams to port data storage to Oracle 8/9i and IBM DB2. Provided a system architecture, initial examples, documentation and practical advice to assist in the port. Worked directly with a client (a global bank) to create and demonstrate enterprise features such as failover.

Software Engineer: Raft International (1997 - 2000)

Component Based Rapid Application Development and Training: I was selected to provide training and technical expertise to our Danish office and so moved to Denmark for nine months.

Developed Company Intranet 1.0 and provided sales support, including representing the company as a vendor on two stands simultaneously at Microsoft TechEd.

Worked on a live client-server foreign exchange system in a major bank in the City of London.

Temp Jobs & Early Career Roles (1994 - 1997)

During this period I undertook a variety of temporary jobs and short-term roles while exploring career options and building practical experience. These included:

  • Office administration and data entry for local businesses
  • IT support and troubleshooting for small companies
  • Programming and system setup in the insurance industry
  • Administrative role at a housing charity

This period helped me develop adaptability, communication skills, and a broad understanding of workplace environments, laying the foundation for my later technical career.

Portfolio

These open projects are on hold as I've been dealing with a personal issue for the last year, although that is practically concluded now.

SolidFS

SolidFS is a FUSE driver for Solid.

I developed this as I wanted to be able to use regular Linux tools, such as rsync, to do things like backing up a Solid Pod.

I used Python which I know well and integrated it into FUSE on one side and Solid on the other. Python is ideal for glue code like this.

Having built this I was able to test it and found that I could work with multiple Solid servers, which helps prove the vision of Solid as a platform with multiple implementations. Of course I needed to make adaptations to the different server flavours, and that was the eye-opener I was looking for when developing this tool.

Issue Access Grant

This tool lets a user issue an access grant directly, without first needing to respond to a request. Notably, this means you can pre-emptively give access.

As is clear when you look at the source, I was using this as an opportunity to experiment with building a web application without using a JavaScript framework.

MMSP Heat Pump Experiment Review

An application for Emoncms that helps review experiments with a heat pump.

It was built to allow me and others in the Open Energy Monitor community to review alternative ways to run heat pumps. It led to a number of interesting and technically detailed conversations.

I have also contributed to Emoncms, which is the platform used by members of the community to gather and review data on a number of sustainability initiatives.

Technology Skills

Cloud

Managing the company's estate on AWS using IAM, EKS, S3, RDS, MSK and so on.

Experienced in managing the limitations of tier 1 and tier 2 clouds to still provide the required business outcomes.

Comfortable working multi-cloud, for example by setting up observability tooling so we have a view across all the clouds. Have run workloads on AWS, GCP, Azure, IBM and Orange (Huawei).

Managing budgets and adapting workloads to changing business requirements, as well as the fun stuff like using KEDA to have the systems do the work for us.

Software

I've developed enterprise-scale systems that have been deployed on thousands of customer sites.

I run disparate workloads, from machine learning to business apps to experimental new APIs, using containers on Kubernetes. I'm experienced enough to understand the critical role of image hashes and Kubernetes probes in providing a reliable service.

I built and run a multi-dimensional testing rig that spins up clusters to check different software configurations and even version upgrades to make software releases safer.

Very comfortable in Python and familiar with Java. Happy working with TypeScript, JavaScript, HTML and CSS.

Comfortable on Linux, happiest on the web.

Experienced in database design and implementation having used SQL Server, Oracle, DB2, MySQL, MongoDB and PostgreSQL. I understand how to tweak the operational systems, but also how to alter the applications to improve interactions.

Good understanding of Kafka having built systems for audit and for managing marketplace transactions handling tens of millions of messages a day.

Hardware

All the normal stuff: building machines from scratch, setting up networks and interfacing with devices such as CurrentCost.

Qualifications

Over a dozen training badges earned at IBM.

Mathematics BSc Hons - York University - 1991 to 1994: 2:2

5 A-Levels - Kidderminster College - 1989 to 1991: Pure Maths - A, Applied Maths - A, Physics - A, Chemistry - B, General Studies - C

9 GCSEs - Lacon Childe School - 1984 to 1989: Including Maths and English.

Patents

US20120226656: Scalable mechanism for resolving cell-level access from sets of dimensional access rules

PCT/US2006/011993: Automatically moving multidimensional data between live datacubes of enterprise software systems

PCT/IB2005/004113: Export queue for an enterprise software system

PCT/US2003/030983: Node-level modification during execution of an enterprise planning model

PCT/US2003/029024: Deploying multiple enterprise planning models across clusters of applications servers

Personal Details

Date of Birth: 1973-02-01

Driving Licence: Clean & Full

Nationality: British

Current Location: York, England