Velocity 2011 - Part 3: Wednesday (2nd day)
My notes on the second conference day at the Velocity Conference.
The keynotes were again a highlight, topped only by the talk Automating for Success: Production Begins in Development, which happened to confirm all my theories about web operations and package-based deployment :-)
Videos are available on the Velocity 2011 Videos page, slides can be found on the Velocity 2011 Speakers Slides and Video page.
Read also about the Workshops and the first day.
Keynotes Thursday
World IPv6 Day: Lessons Learned
Ian Flint, Yahoo
- http://www.youtube.com/watch?v=T04o6bQN8Ls
- Last /8 net assigned in 2011
- NAT is bad for geolocating clients
- bad for business
- bad for targeting
- What is the catch of using IPv6?
- 0.2% of users have IPv6 so far
- Dual-stack setups often have a broken IPv6 configuration, and browsers prefer IPv6
- OS timeout for switching from IPv6 to IPv4 is long (Linux/Windows 21sec, OS X 75sec, phones no fallback)
- Chicken/egg problem: Which website will go dual stack first?
- All of them: 434 participants signed up for World IPv6 Day
- June 8, 2011
- Yahoo implementation details for yahoo.com
- 37 markets
- served from 10 datacenters
- Set up IPv6 proxy servers in 7 locations to reduce the risk of turning on IPv6
- Install 6to4 Relays in all peering points
- Certify all network gear at scale
- Retrofit custom global DNS
- Retrofit DOS protection layer
- Retrofit Audience Data Pipeline
- IPv6 Test in 38 languages, user help pages
- Two 15-minute tests before World IPv6 Day
- The first test showed that problematic health checks in the DNS infrastructure routed all India traffic to Santa Clara
- Planning for Decision Points
- When would things be bad enough to force Yahoo to roll back?
- Never do big changes at times of traffic changes
- Always make sure you can look at things from more than one point of view
- Practice makes perfect. For a major change, always run some tests beforehand
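The long OS-level fallback timeouts noted above are what make a broken IPv6 path so painful for dual-stack users. As a rough illustration only (not something shown in the talk), here is a minimal Python sketch of an application-level workaround: try the resolved addresses IPv6 first, but with a short per-attempt timeout, and fall back to the next address on failure. The function names, the 0.3 second timeout and the injectable `connect` parameter are my own assumptions.

```python
import socket


def resolve(host, port):
    """Resolve host to (family, sockaddr) pairs, IPv6 entries first."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    pairs = [(family, sockaddr) for family, _, _, _, sockaddr in infos]
    # Stable sort: AF_INET6 candidates move to the front, order otherwise kept.
    return sorted(pairs, key=lambda pair: pair[0] != socket.AF_INET6)


def tcp_connect(family, sockaddr, timeout):
    """Open a TCP connection, giving up after `timeout` seconds."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def connect_with_fallback(candidates, timeout=0.3, connect=tcp_connect):
    """Try each candidate in order and return the first successful connection.

    The short timeout (an assumed value) replaces the 21 to 75 second OS
    defaults, so a dead IPv6 route costs a fraction of a second.
    """
    last_error = None
    for family, sockaddr in candidates:
        try:
            return connect(family, sockaddr, timeout)
        except OSError as exc:  # includes socket.timeout
            last_error = exc
    raise last_error or OSError("no candidate addresses")
```

A caller would use it as `connect_with_fallback(resolve("example.com", 80))`. The `connect` parameter exists only so the fallback logic can be exercised without a network.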
Facebook Open Compute & Other Infrastructure
Jonathan Heiliger, Facebook
- http://www.youtube.com/watch?v=urG0dQ7kc3w
- Very good !!!
- Growth of users was also matched by growth of innovation and speed of change
- This is very unusual; innovation usually slows down as companies grow
- Rundown of Facebook's growth story
- HPHP brought a great improvement for site performance
- power consumption became a big issue, decision to look at all parts involved
- facebook started building their own datacenter and servers
- Conclusions:
- Make audacious bets and iterate quickly
- Smart and hungry beats large and capable every time
- Make it work
- Manage risk with hedges
Velocity Culture
Jon Jenkins, Amazon
- http://www.youtube.com/watch?v=dxk8b9rSKOo
- http://assets.en.oreilly.com/1/event/60/Velocity%20Culture%20Presentation.pdf
- Web performance drives real value for the business
- Case studies from bing, google, shopzilla, msn show this
- Steve Souders did a lot for that
- What about operations? How does ops provide value for business
- What if the size of your server fleet could be totally flexible?
- Case study 1: Downscaling
- weekly traffic patterns high and low
- at amazon up to 39% server capacity goes to waste
- for high traffic months this can be even up to 75%
- Since November 10, 2010 all amazon.com traffic is served by EC2
- Reduced spending on server capacity
- Fleet scales dynamically in increments as small as a single host
- Case study 2: Continuous Deployment
- Mean time between deployments: 11.2sec
- 1079 deployments per hour maximum rate in May 2011
- Deployments roll through server groups
- Problems: Complex workflow, slow, error scenarios very complex to handle
- Solution: If capacity is unlimited then one could simply spawn a new set of server groups
- More and more deployments use this method
- 75% reduction in outages triggered by software deployments since 2006
- 90% reduction in outage minutes triggered by software deployments
- instantaneous automated rollback (switch LB back to old server group)
- Reduction in complexity, no upgrades on server, just make new servers
- The Challenge for Velocity 2012
- save millions of dollars by optimizing server utilization
- become faster and more available by using flexible server capacity
- Please come back in 2012 and tell your story of how ops managed to contribute business value
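The deployment approach from case study 2 (spawn a new server group, switch the load balancer over, switch back to roll back) can be sketched in a few lines of Python. This is a toy model under my own assumptions, not Amazon's tooling: `ServerGroup`, `LoadBalancer`, `deploy` and the `healthy` callback are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServerGroup:
    version: str
    hosts: List[str] = field(default_factory=list)


class LoadBalancer:
    """Toy load balancer that sends all traffic to one server group."""

    def __init__(self, active: ServerGroup) -> None:
        self.active = active
        self.previous: Optional[ServerGroup] = None

    def switch_to(self, group: ServerGroup) -> None:
        # The old group is kept around untouched so rollback stays trivial.
        self.previous, self.active = self.active, group

    def rollback(self) -> None:
        if self.previous is None:
            raise RuntimeError("no previous server group to roll back to")
        self.active, self.previous = self.previous, self.active


def deploy(lb, version, host_count, healthy):
    """Bring up a fresh server group for `version` and route traffic to it.

    No server is upgraded in place. If the (hypothetical) health check
    fails, traffic is switched straight back to the old group.
    """
    hosts = [f"{version}-host-{i}" for i in range(host_count)]
    new_group = ServerGroup(version, hosts)
    lb.switch_to(new_group)
    if not healthy(new_group):
        lb.rollback()
        return False
    return True
```

The point of the design is that a rollback is a single pointer swap on the load balancer, which only works when capacity is flexible enough to keep both server groups alive at the same time.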
Artur on SSD
Artur Bergman, fastly.com
- http://www.youtube.com/watch?v=H7PJ1oeEyGg
- Mac Laptop boot time: 13 seconds
- If you don’t use SSDs, you waste your life
- fastly uses only (or mostly) SSDs in their data center
- Show this to the boss to get an SSD :-)
Cisco and OpenStack
Lew Tucker, Cisco
- http://www.youtube.com/watch?v=kWP9VE4K8cU
- http://assets.en.oreilly.com/1/event/60/Cisco%20and%20Open%20Stack%20Presentation.ppt
- Modern web pioneers are DIY "builders", you need to build your own because you can’t buy it
- Enterprise is scale up architecture, HA failover model. Commercial
- Web is scale out architecture, designed for failure. Open Source
- OpenStack
State of the Infrastructure
Rachel Chalmers, The 451 Group
- http://www.youtube.com/watch?v=EbmuxSeVnpY
- http://assets.en.oreilly.com/1/event/60/State%20of%20the%20Infrastructure%20%20Presentation.zip
- Tells many anecdotes and details about IT innovators and their learning
- VMware is probably the last company to go from 0 to $1 billion through a purely proprietary licensing model
- Modern infrastructure is open source
- We are already at the brink of a post-Windows world
- !!! nice
Holistic Performance
John Resig
- http://www.youtube.com/watch?v=WuMEQN7aph0
- About jQuery
- Client-side JavaScript performance issues
- Analyzing performance not trivial, e.g. is wall-clock time relevant? Or CPU consumption?
- Memory consumption, what about memory leaks?
- Parse time, the more you download the more to parse
- Battery consumption (Mobile!)
- Example: Dictionary Lookups in JavaScript
- Most solutions optimize for file (download) size
- Bad parse time
- Succinct Trie is the best in file size, memory consumption and lookup performance
- dynaTrace - useful tool to dig into the details
- jQuery project
- Bug reports need a reproducible test case
- Performance enhancements need to be proven through http://jsperf.com
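To make the dictionary lookup example a bit more concrete, here is a minimal Python sketch of a plain trie. Note that this is not the succinct (bit-packed) trie from the talk, only the underlying idea of sharing common prefixes between words; the class and its method names are my own.

```python
class Trie:
    """Plain dictionary-of-dictionaries trie for word membership tests."""

    # The empty string can never collide with a one-character edge label.
    _END = ""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        for ch in word:
            # Words with a common prefix reuse the same chain of nodes.
            node = node.setdefault(ch, {})
        node[self._END] = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return False
        # A prefix of a stored word only counts if it was added itself.
        return self._END in node
```

Usage is just `"cart" in Trie(["car", "cart", "cat"])`. A succinct trie encodes the same tree structure as a compact bit string, which is where the file size and parse time advantages mentioned above come from.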
Lightning Demos
Page Speed
Michael Schneider, Google
- New work on page speed
- page speed firefox addon
- Now also for chrome
- page speed is a tab in the web inspector
- page speed is a tool to analyze page load times and suggest optimizations
- http://pagespeed.googlelabs.com online version of page speed
- get mobile report to analyze page load timings for mobile devices
- Experimental hints about avoiding unnecessary reflows
dynaTrace
Andreas Grabner, Dynatrace
- http://ajax.dynatrace.com
- What is new in dynaTrace AJAX Edition 3
- Compare IE and FF performance side-by-side
- speed of the Web: new service, compare your own website against Alexa 1000
- slides and other useful info at http://blog.dynatrace.com
Chrome Developer Tools
Paul Irish, Google Chrome relations team
- New things
- Task manager: Right click on a task gives many internals and details, e.g. Number of Goats Teleported
- JavaScript Performance APIs:
- performance.timing
- performance.memory (need --enable-memory-info command line option)
- window.onerror
- console.profile() and console.profiles[] - CPU profiling data is also available as a JS object and can be sent back to the web server for analysis
- console.markTimeline() - set markers that show up in the Timeline to help group JS actions
- Heap Profiler
- dig into memory consumptions
- snapshot diffs between different states
- find memory leaks
- Remote Debugging
- --remote-debugging-port # command line option
- Developer Tools run a little web server
- allows remote analysis
- This is part of WebKit and should soon be available for all WebKit browsers
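Since the Developer Tools run a little web server in remote debugging mode, the open pages can be listed with a plain HTTP request. The Python sketch below assumes Chrome was started with `--remote-debugging-port=9222` (the port number is an arbitrary choice of mine) and that the server answers on the `/json` path with a JSON array of page descriptions, which is how Chrome builds I have seen behave. Treat the field names as assumptions as well.

```python
import json
from urllib.request import urlopen


def parse_targets(payload):
    """Extract (title, url) pairs from the JSON list of debuggable pages."""
    return [(t.get("title", ""), t.get("url", "")) for t in json.loads(payload)]


def list_targets(port=9222, host="localhost"):
    """Ask a locally running browser which pages can be inspected.

    Assumes the browser was started with --remote-debugging-port=<port>.
    """
    with urlopen(f"http://{host}:{port}/json", timeout=2) as response:
        return parse_targets(response.read().decode("utf-8"))
```

Calling `list_targets()` while such a browser is running returns one entry per open tab. Without a running browser the call fails with a connection error.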
showslow.com
Sergey Chernyshev, showslow
- collects performance data from various services and displays it
- dashboard-like overview and drill down into detail
- help create a business case for performance optimizations
Cast - The Open Deployment Platform
Paul Querna, Rackspace
- Deployment as a RESTful API
- Service Management
- Start, Stop, Restart
- Version Management
- Distribution of release
- Upgrade
- Rollback
- Service Monitoring
- Logfiles
- Network Ports
- Processes
- Service Coordination
- ?
- Open Source
- http://cast-project.org
Making the Web instant
Arvind Jain & Sreeram Ramachandran, Google
- Still, most pages take 5 seconds to load
- How to make it instant?
- We humans are not as fast as computers
- It takes about 300ms between onMouseOver and onClick
- This time can be used to optimize loading by prefetching the content
- Google search with Google Instant Pages
- Predict & preload
- Guess what the user will click and load it while the user is still thinking about what to click next
- Works only on Chrome so far
- Chrome loads target in hidden frame and replaces frame
- Instant everywhere
- Chrome supports preloading pages when typing into the address bar
- Everybody can use it, web page authors usually know more about the next likely page
- Instruct the browser that this is the likely next page
- Beware:
- This creates more load on the client and on the server!
- Accounting (ads, analytics) gets more difficult
- don’t want to count hidden pages that the user never saw
- Google submitted an RFC to the W3C for a page visibility API to determine if a page is actually visible to the user or still in
- Benefit: Better and faster internet browsing experience
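The "web page authors usually know more about the next likely page" point can be sketched as follows. This Python example assumes the `<link rel="prerender">` hint that Chrome recognised around that time; the navigation data, the function names and the 50% threshold are made up for illustration. The threshold is there because of the extra client and server load mentioned above: only hint when one next page clearly dominates.

```python
from collections import Counter
from html import escape


def likely_next_page(transitions, current, min_share=0.5):
    """Return the most common page visited after `current`, or None.

    `transitions` is an iterable of (from_page, to_page) pairs, e.g. taken
    from access logs. A page is only suggested if it accounts for at least
    `min_share` of the observed navigations (an assumed threshold).
    """
    counts = Counter(nxt for cur, nxt in transitions if cur == current)
    total = sum(counts.values())
    if total == 0:
        return None
    page, hits = counts.most_common(1)[0]
    return page if hits / total >= min_share else None


def prerender_hint(url):
    """Render the hint tag that tells the browser what to load ahead of time."""
    return f'<link rel="prerender" href="{escape(url, quote=True)}">'
```

A page template would call `likely_next_page(...)` for the current URL and emit `prerender_hint(...)` in the document head when a candidate comes back.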
Wikia: Going Active/Active
Jason Cook, Wikia
- Active/Active means read everywhere, write in master data center
- Wikia built on top of MediaWiki
- Story of Wikia with typical startup problems
- What about earthquakes? Time To Recover?
- FULL DR Site
- In a nuclear bunker
- In the middle of nowhere in Iowa
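The "read everywhere, write in master data center" rule boils down to a small routing decision. Here is a minimal Python sketch of it, not Wikia's actual setup: the data center names are invented, and treating GET/HEAD as reads and everything else as writes is my own simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datacenter:
    name: str
    is_master: bool = False


class ActiveActiveRouter:
    """Serve reads from the local data center, send writes to the master."""

    READ_METHODS = ("GET", "HEAD")

    def __init__(self, datacenters):
        masters = [dc for dc in datacenters if dc.is_master]
        if len(masters) != 1:
            raise ValueError("exactly one master data center is required")
        self.master = masters[0]
        self.by_name = {dc.name: dc for dc in datacenters}

    def route(self, method, local_dc):
        """Pick the data center that should handle this request."""
        if method.upper() in self.READ_METHODS:
            return self.by_name[local_dc]
        return self.master
```

With data centers `Datacenter("west", is_master=True)` and `Datacenter("iowa")`, a GET arriving in `iowa` stays in `iowa`, while a POST arriving there is sent to `west`.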
Automation for Success: Production begins in Development
Lee Thompson, CTO Travel/Transportation, HP
Damon Edwards, Co-founder DTO Solutions, DevOps Days organizer, DevOps Cafe
- http://www.slideshare.net/dev2ops/velocity-2011-production-begins-in-development
- Very good, especially if you believe that Chef and Puppet are not the end of innovation !!!
- Webtone
- Clouds
- DevOps
- Continuous Deployment/Delivery
- Lean Startup
- How to measure DevOps success
- Alignment - how well do different parts of the organization work together
- Quality - of processes and deliveries
- Cycle Time
- Risk tolerance
- How much change do you want, how much risk can you tolerate?
- "Move fast and break things. Unless you are breaking stuff, you are not moving fast enough." - Mark Zuckerberg
- Webtone utilities
- Reliable
- Repeatable
- Scalable
- It all starts in Development!
- But what do we tell them to do?
- and how do we get them to do it?
- Share ownership of availability
- Developers must wear pagers (on-call)
- Incident command training so everyone knows their roles
- Notification mechanism?
- Access provisioning (emergency access for people who usually don’t have it)?
- Non-functional requirements are first class citizens
- Strive for parity between dev & prod
- should be really the same
- test data fixtures for all environments
- implement mock services for major infrastructure pieces for Developer users (usually Ops needs to help with this), typically authentication systems.
- Continuous integration means integrate early
- Use all the deployment, config and packaging tools in dev
- Push config management discipline back to Dev
- Dev is about creating variation, Ops is about eliminating variation
- Augment deployment toolchain to support the variation
- Do developers use the tools?
- Accept config contributions and patches from dev
- Packaging … it’s not just for the OS
- high-performing web operations organizations need to take change management seriously
- Strict versioning
- It’s about being idempotent
- Transfer packaging responsibility to dev
- Define the packaging constructs you will support
- Config is code
- if it’s code it needs to be managed like code
- Should be transparent and identical SDLC in both dev and ops
- Avoid or eliminate asymmetric release processes (config = software)
- Tailor release artifacts to roles
- "Small teams make better software"
- One team stuck should not prevent other teams from releasing (org coupling)
- Large codebases suffer software entropy effects
- Build an infrastructure that can reliably manage lots of smaller artifacts
- Org conflict is a good time to suggest breaking up a codebase into separate concerns
- Standard management vocabulary
- Consistent and expected management behaviour
- Across components and releases
- "start, stop, status, update, install …"
- Rollback
- Rollback that works
- Tested and proven
- Test rollback for each release
- Standard metrics abstractions
- Dev surface metrics to Ops
- Use a standard framework
- https://github.com/codehale/metrics
- Use standard types (gauge, counter, timer …)
- Ops knows what to expect and how to visualize
- Push test ownership to the edges
- QA = Quality Assurance
- QA writing tests = bottleneck and avoiding responsibility
- Test Driven Development
- Test Driven Operations (yes, you too!)
- Bottom line: Everyone owns quality
- Test outside of the box
- Crowd test, A/B test
- Simulation
- Continuous Delivery
- Delivery Pipelines
- Continuous Deployment
- Don’t be too dogmatic, a hybrid model is also good
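To illustrate the "standard metrics abstractions" point from this talk (standard types such as gauge, counter and timer, so that Ops knows what to expect), here is a minimal Python sketch. It is not the API of the metrics library linked above, just my own toy version of the three types plus a registry that Ops could poll; all names are hypothetical.

```python
from contextlib import contextmanager
from time import perf_counter


class Counter:
    """Monotonically increasing count, e.g. requests served."""

    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n


class Gauge:
    """Instantaneous value read on demand, e.g. current queue length."""

    def __init__(self, read):
        self._read = read

    @property
    def value(self):
        return self._read()


class Timer:
    """Collects durations of a repeated operation."""

    def __init__(self):
        self.durations = []

    @contextmanager
    def measure(self):
        start = perf_counter()
        try:
            yield
        finally:
            self.durations.append(perf_counter() - start)

    @property
    def mean(self):
        return sum(self.durations) / len(self.durations) if self.durations else 0.0


class MetricsRegistry:
    """Single place where Dev registers metrics and Ops reads them."""

    def __init__(self):
        self._metrics = {}

    def register(self, name, metric):
        self._metrics[name] = metric
        return metric

    def snapshot(self):
        values = {}
        for name, metric in self._metrics.items():
            if isinstance(metric, Counter):
                values[name] = metric.count
            elif isinstance(metric, Gauge):
                values[name] = metric.value
            else:
                values[name] = metric.mean
        return values
```

Because every component exposes the same three types through `snapshot()`, dashboards and alerts can be built once and reused across components and releases.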
DevOps Metrics: Measuring the devops gap
Patrick Debois, Andrew Shafer
- http://assets.en.oreilly.com/1/event/60/Measuring%20the%20devops%20gap%20Presentation.pdf
- Very good presentation on how to get going with DevOps