Ryan Ruby — Production Services Engineering

I’m an IT Professional with 25 years of experience in the Production Operations, Systems Engineering, and Service Engineering disciplines.

My experience includes:

Architecture and design of an Active/Active CDN solution for critical-path content delivery, ensuring 100% user availability and zero-touch Business Continuity / Disaster Recovery
Design and implementation of an org-wide standard for the secure use of Hybrid Cloud systems
Driving frugality through the deprecation of orphaned and unused cloud resources
Primary responsibility for the availability, administration, and zero-downtime deployment of a proprietary orchestration platform enabling Live Site Operations for a global customer base of millions
Team and Engineering Lead for the Continuous Integration / Continuous Delivery pipeline, automated builds, and administration of development / testing infrastructure
Ownership of infrastructure and delivery of code / content for a multi-million-dollar consumer research and editorial product team
Enterprise Data Center Site Services: building and maintaining high-business-impact systems
Integration Engineering, Consulting, and Training of customer IT staff to fulfill the promises of Sales and Marketing and ensure client satisfaction
Directing the build-out of technical training classrooms supporting the delivery of millions of dollars in sales annually

Areas of Expertise

Production Service Engineering / Operations

Alerting / Monitoring / Telemetry

Incident Response / Resolution / Post-Mortem Reviews

Enterprise Cloud Services Architecture / Design / Deployment

Microsoft Azure Platform Offerings

Specialization in:

Azure Logic Apps

Azure Data Explorer (Kusto)

Hybrid Cloud

Subscription / Capacity Management

Role Based Access Control

High Availability / Disaster Recovery

Business Continuity Planning / Execution

Content Delivery Systems

Continuous Integration / Continuous Delivery Principles

Service Engineering Philosophy

I have a passion for the deep involvement of Live Site Operations professionals throughout the software development lifecycle.

In my experience, the best way to ensure that highly available, fault-tolerant services are shipped to Production is to involve experienced Production Operations professionals in all service architecture and design.

Maintaining this symbiotic relationship creates an environment where High Availability, Disaster Recovery, Monitoring / Analytics, and Business Continuity Planning are not bolted on after the code ships. It also encourages simpler administration and a fuller understanding of how the parts of any system, even a modestly complex one, are interconnected.

Forcibly separating the responsibilities of Live Site Operations and product development inevitably breeds ambiguity and misunderstanding between disciplines. This only furthers resentment between teams and builds a culture of shame and blame around emergent issues.

Fix the problem, not the blame.

Likewise, merging discrete specialties into a single shared function creates an unproductive mélange of engineers working outside their personal passions.

Developers write code because they are passionate about that form of creation and are good at it.

Live Site Engineers follow their passion for solving problems that impact real users, finding root causes, and driving fixes into the next iteration.

The best product is delivered by people who are passionate about what they do.

Authentic goals of great user experiences, grounded in customer obsession, cannot grow out of a relationship in which disciplines feed off one another rather than work together.

To truly reach a place where configuration and infrastructure are treated the same as code, a virtuous cycle must be maintained: Live Site Operations concerns must be addressed in each new iteration of code shipped to Production, and Development teams must have visibility into Production operations to verify that new features work as intended.

Don’t fear transparency.

When incidents or outages occur, the onus is on the entire team to work toward a solution, understand the underlying cause, and modify process / telemetry / documentation / code to advance Production reliability. Accountability for continuous improvement of availability and user experience must be shared across all disciplines.

The result can be a culture with a relentless drive toward continuous improvement in customer experience, simpler service architecture, relevant alerting, and a fuller appreciation for the perspectives and talents of the entire team.