Update 6: Some interesting changes from Twitter’s Evan Weaver: everything in RAM now, database is a backup; peaks at 300 tweets/second; every tweet followed by average 126 people; vector cache of tweet IDs; row cache; fragment cache; page cache; keep separate caches; GC makes Ruby optimization resistant so went with Scala; Thrift and HTTP are used internally; 100s internal requests for every external request; rewrote MQ but kept interface the same; 3 queues are used to load balance requests; extensive A/B testing for backwards capability; switched to C memcached client for speed; optimize critical path; faster to get the cached results from the network memory than recompute them locally.
更新6：一些感兴趣的变化，来自twitter的 Evan Weaver：所有的东西都在内存，数据是备份，每秒峰值达到300条推特，每个推评价被126个follow，tweet ids放到向量缓存；列缓存，片段缓存，页面缓存，保持缓存独立。ruby GC导致在消息上面迁移到scala；本质上使用Thrift 和 HTTP ;对于每个外部请求有100内部请求；重写MQ但是接口保持一致；使用3个队列来负载均衡；切换 c memcached 客户端提速；优化重要路径，cache提速
Update 5: Twitter on Scala. A Conversation with Steve Jenson, Alex Payne, and Robey Pointer by Bill Venners. A fascinating discussion of why Twitter moved to the Java JVM for their server infrastructure (long lived processes) and why they moved to Scala to program against it (high level language, static typing, functional). Ruby is used on the front-end but wasn’t performant or reliable enough for the back-end.
更新5：Twitter on Scala 这个topic讨论为什么twitter移到java虚拟机平台，ruby在前端比较好用，但是性能和可用性在后端不够。
Update 4: Improving Running Components at Twitter by Evan Weaver. Tells how Twitter changed their infrastructure to go from handling 3 requests to 139 requests a second. They moved to a messaging model, asynchronous process, 3 levels of cache, and moved their middleware to a mixture C and Scala/JVM.
Improving Running Components at Twitter 告诉了我们twitter如何改善基础设施，把每秒的处理能力从3个提高到139个。他们移植了消息模式，异步过程，三级缓存，中间件是c和Scala/JVM平台混合。
Update 3: Upgrading Twitter without service disruptions by Gojko Adzic. Lots of good updates on the new Twitter architecture.
Update 2: a commenter in Twitter Fails Macworld Keynote Test said this entry needs to be updated. LOL. My uneducated guess is it’s not a language or architecture problem, but more a problem of not being able to add hardware fast enough into their data center. The predictability of this problem is debatable, but once you have it, it’s hard to fix.
Update: Twitter releases Starling – light-weight persistent queue server that speaks the MemCache protocol. It was built to drive Twitter’s backend, and is in production across Twitter’s cluster.
Twitter started as a side project and blew up fast, going from 0 to millions of page views within a few terrifying months. Early design decisions that worked well in the small melted under the crush of new users chirping tweets to all their friends. Web darling Ruby on Rails was fingered early for the scaling problems, but Blaine Cook, Twitter’s lead architect, held Ruby blameless:
For us, it’s really about scaling horizontally – to that end, Rails and Ruby haven’t been stumbling blocks, compared to any other language or framework. The performance boosts associated with a “faster” language would give us a 10-20% improvement, but thanks to architectural changes that Ruby and Rails happily accommodated, Twitter is 10000% faster than it was in January.
If Ruby on Rails wasn’t to blame, how did Twitter learn to scale ever higher and higher?
Update: added slides Small Talk on Getting Big. Scaling a Rails App & all that Jazz
- Scaling Twitter Video by Blaine Cook.
- Scaling Twitter Slides
- Good News blog post by Rick Denatale
- Scaling Twitter blog post Patrick Joyce.
- Twitter API Traffic is 10x Twitter’s Site.
- A Small Talk on Getting Big. Scaling a Rails App & all that Jazz – really cute dog picks
- Ruby on Rails
- Mongrel – hybrid Ruby/C HTTP server designed to be small, fast, and secure
- Google Analytics
- AWStats – real-time logfile analyzer to get advanced statistics
- Over 350,000 users. The actual numbers are as always, very super super top secret.
- 600 requests per second.
- Average 200-300 connections per second. Spiking to 800 connections per second.
- MySQL handled 2,400 requests per second.
- 180 Rails instances. Uses Mongrel as the “web” server.
- 1 MySQL Server (one big 8 core box) and 1 slave. Slave is read only for statistics and reporting.
- 30+ processes for handling odd jobs.
- 8 Sun X4100s.
- Process a request in 200 milliseconds in Rails.
- Average time spent in the database is 50-100 milliseconds.
- Over 16 GB of memcached.
- Ran into very public scaling problems. The little bird of failure popped up a lot for a while.
- Originally they had no monitoring, no graphs, no statistics, which makes it hard to pinpoint and solve problems. Added Munin and Nagios. There were difficulties using tools on Solaris. Had Google analytics but the pages weren’t loading so it wasn’t that helpful
- Use caching with memcached a lot.
- For example, if getting a count is slow, you can memoize the count into memcache in a millisecond.
- Getting your friends status is complicated. There are security and other issues. So rather than doing a query, a friend’s status is updated in cache instead. It never touches the database. This gives a predictable response time frame (upper bound 20 msecs).
- ActiveRecord objects are huge so that’s why they aren’t cached. So they want to store critical attributes in a hash and lazy load the other attributes on access.
- 90% of requests are API requests. So don’t do any page/fragment caching on the front-end. The pages are so time sensitive it doesn’t do any good. But they cache API requests.
- Use message a lot. Producers produce messages, which are queued, and then are distributed to consumers. Twitter’s main functionality is to act as a messaging bridge between different formats (SMS, web, IM, etc).
- Send message to invalidate friend’s cache in the background instead of doing all individually, synchronously.
- Started with DRb, which stands for distributed Ruby. A library that allows you to send and receive messages from remote Ruby objects via TCP/IP. But it was a little flaky and single point of failure.
- Moved to Rinda, which a shared queue that uses a tuplespace model, along the lines of Linda. But the queues are persistent and the messages are lost on failure.
- Tried Erlang. Problem: How do you get a broken server running at Sunday Monday with 20,000 users waiting? The developer didn’t know. Not a lot of documentation. So it violates the use what you know rule.
- Moved to Starling, a distributed queue written in Ruby.
- Distributed queues were made to survive system crashes by writing them to disk. Other big websites take this simple approach as well.
- SMS is handled using an API supplied by third party gateway’s. It’s very expensive.
- They do a review and push out new mongrel servers. No graceful way yet.
- An internal server error is given to the user if their mongrel server is replaced.
- All servers are killed at once. A rolling blackout isn’t used because the message queue state is in the mongrels and a rolling approach would cause all the queues in the remaining mongrels to fill up.
- A lot of down time because people crawl the site and add everyone as friends. 9000 friends in 24 hours. It would take down the site.
- Build tools to detect these problems so you can pinpoint when and where they are happening.
- Be ruthless. Delete them as users.
- Plan to partition in the future. Currently they don’t. These changes have been enough so far.
- The partition scheme will be based on time, not users, because most requests are very temporally local.
- Partitioning will be difficult because of automatic memoization. They can’t guarantee read-only operations will really be read-only. May write to a read-only slave, which is really bad.
- Twitter’s API Traffic is 10x Twitter’s Site
- Their API is the most important thing Twitter has done.
- Keeping the service simple allowed developers to build on top of their infrastructure and come up with ideas that are way better than Twitter could come up with. For example, Twitterrific, which is a beautiful way to use Twitter that a small team with different priorities could create.
- Monit is used to kill process if they get too big.
- Talk to the community. Don’t hide and try to solve all problems yourself. Many brilliant people are willing to help if you ask.
- Treat your scaling plan like a business plan. Assemble a board of advisers to help you.
- Build it yourself. Twitter spent a lot of time trying other people’s solutions that just almost seemed to work, but not quite. It’s better to build some things yourself so you at least have some control and you can build in the features you need.
- Build in user limits. People will try to bust your system. Put in reasonable limits and detection mechanisms to protect your system from being killed.
- Don’t make the database the central bottleneck of doom. Not everything needs to require a gigantic join. Cache data. Think of other creative ways to get the same result. A good example is talked about in Twitter, Rails, Hammers, and 11,000 Nails per Second.
- Make your application easily partitionable from the start. Then you always have a way to scale your system.
- Realize your site is slow. Immediately add reporting to track problems.
- Optimize the database.
- Index everything. Rails won’t do this for you.
- Use explain to how your queries are running. Indexes may not be being as you expect.
- Denormalize a lot. Single handedly saved them. For example, they store all a user IDs friend IDs together, which prevented a lot of costly joins.
- Avoid complex joins.
- Avoid scanning large sets of data.
- Cache the hell out of everything. Individual active records are not cached, yet. The queries are fast enough for now.
- Test everything.
- You want to know when you deploy an application that it will render correctly.
- They have a full test suite now. So when the caching broke they were able to find the problem before going live.
- Long running processes should be abstracted to daemons.
- Use exception notifier and exception logger to get immediate notification of problems so you can address the right away.
- Don’t do stupid things.
- Scale changes what can be stupid.
- Trying to load 3000 friends at once into memory can bring a server down, but when there were only 4 friends it works great.
- Most performance comes not from the language, but from application design.
- Turn your website into an open service by creating an API. Their API is a huge reason for Twitter’s success. It allows user’s to create an ever expanding and ecosystem around Twitter that is difficult to compete with. You can never do all the work your user’s can do and you probably won’t be as creative. So open you application up and make it easy for others to integrate your application with theirs.
Posted by Abel Avram on Jun 26, 2009
- Architecture & Design
- Architecture ,
- Performance & Scalability
- C ,
- JVM ,
- Ruby on Rails ,
- Scala ,
Evan Weaver, Lead Engineer in the Services Team at Twitter, who’s primarily job is optimization and scalability, talked about Twitter’s architecture and especially the optimizations performed over the last year to improve the web site during QCon London 2009.
Most of the tools used by Twitter are open source. The stack is made up of Rails for the front side, C, Scala and Java for the middle business layer, and MySQL for storing data. Everything is kept in RAM and the database is just a backup. The Rails front end handles rendering, cache composition, DB querying and synchronous inserts. This front end mostly glues together several client services, many written in C: MySQL client, Memcached client, a JSON one, and others.
The middleware uses Memcached, Varnish for page caching, Kestrel, a MQ written in Scala, and a Comet server is in the works, also written in Scala and used for clients that want to track a large number of tweets.
Twitter started as a “content management platform not a messaging platform” so many optimizations were needed to change the initial model based on aggregated reads to the current messaging model where all users need to be updated with the latest tweets. The changes were done in three areas: cache, MQ and Memcached client.
Each tweet is tracked in average by 126 users, so there is clearly a need for caching. In the original configuration, only the API had a page cache that was invalidated each time a tweet was coming from an user, the rest of the application being cacheless:
The first architectural change was to create a write-through Vector Cache containing an array of tweet IDs which are serialized 64 bit integers. This cache has a 99% hit rate.
The second change was adding another write-through Row Cache containing database records: users and tweets. This one has a 95% hit rate and it is using Nick Kallen’s Rails plug-in called Cache Money. Nick is a Systems Architect at Twitter.
The third change was introducing a read-through Fragment Cache containing serialized versions of the tweets accessed through API clients which could be packaged in JSON, XML or Atom, with the same 95% hit rate. The fragment cache “consumes the vectors directly, and if a serialized fragment is currently cached it doesn’t load the actual row for the tweet you are trying to see so it short-circuits the database the vast majority of times”, said Evan.
Yet another change was creating a separate cache pool for the page cache. According to Evan, the page cache pool uses a generational key scheme rather than direct invalidation because clients can
send HTTPs if-modified-since and put any time stamp they want in the request path and we need to slice the array and present them with only the tweets they want to see but we don’t want to track all the possible keys that the clients have used. There was a big problem with this generational scheme because it didn’t delete all the invalid keys. Each page that was added which corresponding to the number of tweets people were receiving would push out valid data in the cache and it turned out that our cache only had a 5 hour effective life time because of all these page caches flowing through.
When the page cache was moved into its own pool, the cache misses dropped about 50%.
This is the current cache scheme employed by Twitter:
Since 80% of the Twitter traffic comes through the API, there are 2 additional levels of cache, each servicing up to 95% of the requests coming from the preceding layer. The overall cache changes, in total between 20 and 30 optimizations, brought a
10x capacity improvement, and it would have been more but we hit another bottleneck at that point … Our strategy was to add the read-through cache first, make sure it invalidates OK, and then move to a write-through cache and repair it online rather than destroying it every time a new tweet ID comes in.
Since, on average, each user has 126 followers, it means there are 126 messages placed in the queue for each tweet. Beside that, there are times when the traffic peaks, as it was during Obama’s inauguration when it reached several hundreds of tweets/second or tens of thousands messages into the queue, 3 times the normal traffic at that time. The MQ is meant to take the peak and disperse it over time so they would not have to add lots of extra hardware. Twitter’s MQ is simple: based on Memcached protocol, no ordering of jobs, no shared state between servers, all is kept in RAM and it is transactional.
The first implementation of the MQ was using Starling, written in Ruby, and did not scale well especially because Ruby’s GC which is not generational. That lead to MQ crashes because at some point the entire queue processing stopped for the GC to finish its job. A decision was made to port the MQ to Scala which is using the more mature JVM GC. The current MQ is only 1,200 lines and it runs on 3 servers.
The Memcached client optimization was intended to optimize cluster load. The current client used is libmemcached, Twitter being its most important user and contributor to the code base. Based on it, the Fragment Cache optimization over one year led to a 50x increase in page requests served per second.
Because of poor request locality, the fastest way to deal with requests is to precompute data and store it on network RAM, rather than recompute it on each server when necessary. This approach is used by the majority of Web 2.0 sites running almost completely directly from memory. The next step is “scaling writes, after scaling reads for one year. Then comes the multi co-location issue” according to Evan.