I am an experienced SRE with 20 years' experience designing the architecture for, building and running cloud-based, global-scale infrastructures. I am a polyglot developer of infrastructure-centric tooling. I am happy and able to lead technical teams or work as an individual contributor. I believe technology should be a consequence of the application of good fundamental principles. I have been working with AWS since 2011 (pre-ELB) and have worked at traffic scales where turning off whole countries is the appropriate response to operational duress.
Full-time employee at Apple as an SRE.
Contractor at Apple as an SRE.
Centrica is a multi-national energy supplier.
I am technical lead of an SRE function charged with the implementation of a large (tens of AWS payers, hundreds of AWS accounts, tens of Azure tenants) multi-cloud management platform.
We practice test-driven infrastructure development with Go and Terraform to continuously deploy guard rails and tooling across the cloud estate. A guiding principle is the emission and collection of telemetry to enable alerting, escalation, cost and performance improvements, and post-mortem analysis.
I am a hands-on developer who also sets technical direction and plans and runs sprints for a team of 11. I also consult internally with other teams on SRE principles.
Hive is a domestic IoT platform prominent in the UK.
As Technical lead for internal tooling I had two primary responsibilities: the creation and maintenance of an internal infrastructure management suite and the building and ongoing operation of an infrastructure-wide log aggregation facility. I championed the adoption of telemetry for alerting, escalation, cost and performance improvements and post-mortem analysis.
The primary technologies involved were AWS, Puppet, ELK and a great deal of bespoke Ruby code.
I spoke publicly on these topics in 2017.
I also designed a novel EC2 to container migration strategy that enabled a staggered migration of workloads towards orchestrated containers without requiring an all-at-once rebuilding of the infrastructure.
Space Ape is a mobile game development company recently acquired by Supercell.
I was brought into Space Ape shortly after the launch of their first game to replace the single infrastructure engineer who had helped them launch. I was charged with two tasks: operate and improve all aspects of the infrastructure underneath a very rapidly growing hit game with a global player base and expand the infrastructure team as the company grew.
The primary technologies involved were AWS, Chef and Ruby.
Upon my departure I was leading a team of 5 running and operating the game platform I had designed and we had built, serving two live titles played by around 0.5 million users daily, with one title in development. This platform included an API against which game teams could integrate to build and manage the infrastructure serving their game globally.
I spoke publicly about this in 2014.
Playfish was a large Facebook game developer with 26 titles, of which the best known was The Sims Social. The latter alone had over 1.5 million daily active users.
Playfish was a very early adopter of the then-new AWS. I was brought in to help deal with the massive scale they were experiencing as they grew to tens of millions of players. Since AWS consisted of only S3, EC2 and a few other peripheral services (no ELB!), a great deal of this work was low-level Linux kernel and TCP performance tuning. I also had significant dealings with the large and heavily sharded MySQL infrastructure.
During my tenure I was also one of two primary architects and developers of a Chef-based infrastructure management platform. This was described by Chef as the most sophisticated in Europe and second only to that of Facebook in terms of size. In 2012 we managed an estate of approximately 3,000 EC2 instances.
I spoke publicly on this topic in 2012.
Kurtosys is a SaaS provider serving the hedge fund industry.
This was a UNIX administration role typical of the era, encompassing Linux and Solaris administration, colocation activities, developer support and application support within the Java application server ecosystem.
HPD was a software development house and very early SaaS provider to the Factoring industry.
This role was incredibly varied and included support for UNIX systems (AIX, FreeBSD, HP-UX, Linux, Solaris) and IBM AS/400 (iSeries) systems. In modern terms the work would be described as "full stack" in the sense that, along with administering modem farms, X.25 ISDN infrastructure, backups, developer workstations, and in-house mail and web servers, I was also heavily involved in building out the hosting facilities for the SaaS business, from routers to partition layouts. It was old-school sysadmin work where one was as likely to be updating PCMCIA drivers as authoring shell scripts. Towards the latter end of my tenure the role moved more towards application and customer support, including on-site installations at various multi-national customers.
NatWest is a high-street bank.
My first job. I wrote Assembler and PL/1 for IBM S/390 mainframes that constituted the core accounting systems of a national bank. Much of this software was written in support of what was then the largest data migration undertaken in the UK.
- My side project runson.cloud was featured on the front page of the lobste.rs technology news aggregator in January of 2021
- My side project digaws was featured in the Last Week In AWS newsletter
- I developed the first public SDK for the wavefront.com telemetry platform during their pre-launch beta.
- My consultancy was nominated by Chef as their preferred EMEA partner
- GitHub
- [email protected]
- Telephone upon request