ShareCompany Internship

From Master Projects


Within the next two years we will have to process, and store indefinitely, one million messages per second from more than 100 stock exchanges. We are currently implementing a distributed, linearly scalable, fault-tolerant architecture for processing these messages. How to store and query the information remains an open question.

Processing 1 million messages per second

Ten years ago we had to deal with 8 stock trades per second, all conducted on the Amsterdam Stock Exchange. Today we have to process 200 thousand messages per second. We expect that within the next two years we will have to cope with over a million messages per second. The reasons for this are:

1. Our customers no longer trade exclusively on the Amsterdam Stock Exchange. They trade worldwide. Currently we are processing data from about 100 exchanges.

2. On each exchange, the transaction volume is increasing dramatically.

3. Our customers want more detailed information, which the exchanges now make available. For example, we receive not only the last price but also the best 10 bids and offers.

Distributed, linearly scalable, fault-tolerant architecture

The new distributed architecture we are deploying combines a publish/subscribe system with a key-value store. Publishers push key/value pairs on instruments (topics) into a cluster of brokers, and these brokers maintain the latest values in memory. Clients can request the latest values on an instrument or subscribe to changes on instruments. The architecture scales nearly linearly and allows for an arbitrary level of redundancy.
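To make the combination of publish/subscribe and key-value store concrete, here is a minimal single-process sketch in Python. It is not the actual ShareCompany implementation: the names `Broker`, `publish`, `latest`, `subscribe` and `broker_for` are hypothetical, and partitioning instruments over brokers with a CRC32 hash is only an assumption about how such a cluster could spread its load. Redundancy and networking are left out.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical callback signature: (instrument, key, value).
Callback = Callable[[str, str, str], None]


class Broker:
    """Illustrative in-memory broker: latest value per key, plus subscribers."""

    def __init__(self) -> None:
        # instrument -> {key -> latest value}
        self._latest: Dict[str, Dict[str, str]] = defaultdict(dict)
        # instrument -> callbacks to notify on every change
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def publish(self, instrument: str, key: str, value: str) -> None:
        # Overwrite the previous value: only the latest one is kept in memory.
        self._latest[instrument][key] = value
        for callback in self._subscribers[instrument]:
            callback(instrument, key, value)

    def latest(self, instrument: str) -> Dict[str, str]:
        # Return a copy so callers cannot modify the broker's state.
        return dict(self._latest.get(instrument, {}))

    def subscribe(self, instrument: str, callback: Callback) -> None:
        self._subscribers[instrument].append(callback)


def broker_for(instrument: str, brokers: List[Broker]) -> Broker:
    # Assumed partitioning: a stable hash of the instrument name selects a broker,
    # so the same instrument always lands on the same broker.
    index = zlib.crc32(instrument.encode("utf-8")) % len(brokers)
    return brokers[index]
```

A publisher would call `broker_for("ACME", brokers).publish("ACME", "last", "101.5")` (instrument and price are made-up example data), while a client either calls `latest("ACME")` once or registers a callback with `subscribe`. Because each instrument maps to exactly one broker in this sketch, adding brokers spreads the instruments over more machines, which is one way the near-linear scalability described above could be obtained.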

Storing the history of 1 million messages per second indefinitely

Once 1 million messages per second can be processed, the question arises of how to store the stock price information. The stock price history is used in charting and data analysis applications. Until now we only had to store daily summaries. Now customers expect to retrieve the original, detailed historical data (that is, all the original messages). For efficiently storing 1 million messages per second, the traditional relational database seems to reach its limits. Several NoSQL databases are currently gaining popularity; although they look promising, it is unclear how to apply them to querying time series data. We are searching for an affordable, linearly scalable, fault-tolerant storage solution.
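As a starting point for the time series question, the sketch below shows one commonly used layout for key-value stores: messages are grouped under a key made of the instrument and a time bucket, so a range query only touches the buckets that overlap the requested interval. This is an illustration under stated assumptions, not a proposed solution: the class `TimeSeriesStore`, the one-hour bucket width and the in-memory dictionary standing in for a real distributed store are all hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

BUCKET_SECONDS = 3600  # assumed bucket width of one hour

Message = Tuple[float, str]  # (timestamp in seconds, raw message payload)


class TimeSeriesStore:
    """Illustrative store keyed by (instrument, time bucket)."""

    def __init__(self) -> None:
        # (instrument, bucket index) -> messages kept sorted by timestamp
        self._buckets: Dict[Tuple[str, int], List[Message]] = defaultdict(list)

    def append(self, instrument: str, timestamp: float, payload: str) -> None:
        key = (instrument, int(timestamp // BUCKET_SECONDS))
        # insort keeps the bucket sorted even if messages arrive out of order;
        # messages with equal timestamps end up ordered by payload.
        bisect.insort(self._buckets[key], (timestamp, payload))

    def query(self, instrument: str, start: float, end: float) -> List[Message]:
        """Return messages with start <= timestamp < end, in time order."""
        result: List[Message] = []
        first = int(start // BUCKET_SECONDS)
        last = int(end // BUCKET_SECONDS)
        # Only the buckets overlapping [start, end) are read.
        for bucket in range(first, last + 1):
            for ts, payload in self._buckets.get((instrument, bucket), []):
                if start <= ts < end:
                    result.append((ts, payload))
        return result
```

In a real NoSQL store the `(instrument, bucket)` pair would typically become the row or partition key, so that the messages of one instrument and one hour sit together on one node. Whether such a layout holds up at 1 million messages per second is exactly the kind of question a prototype would have to answer.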


Project proposal: research storage solutions for storing 1 million messages per second indefinitely. Compare existing solutions, frameworks and programming languages, and build a prototype. We actively encourage the use of different solutions, frameworks and programming languages: one of ShareCompany's goals is to learn what each of them makes possible.