35 Restful & Socket for Market Data Interface and Capture

35 RESTful & Socket for Market Data Interface and Capture #

Hello, I’m Jingxiao.

In the previous lesson, we introduced the trading mode of exchanges, common concepts of RESTful interfaces for cryptocurrency exchanges, and how to call RESTful interfaces for order operations. As we all know, the premise of buying and selling operations is that you need to know the latest market conditions. In this lesson, I will introduce another important part of the underlying trading system, the integration and fetching of market data.

The most important aspects of market data are real-time and validity. The market conditions change in an instant, and the suitable time window for buying and selling may only last for a few seconds. In high-frequency trading, suitable buying and selling opportunities may even be in milliseconds. It is worth noting that even with the speed of light, a network request from Beijing to the United States takes several hundred milliseconds to propagate. Not to mention the time consumed by establishing an HTTP connection using an interpreted language like Python.

After learning in the previous lesson, you should have a basic understanding of trading, which is also the foundation of what we will learn today. Next, we will start with the matching mode of exchanges, and then introduce the types of market data. After that, I will guide you through the module for fetching market data based on Websockets.

Market Data #

In the previous section, we mentioned that an exchange is a public matching platform between buyers and sellers. Buyers and sellers submit the quantity of goods they need/can provide and the price they are willing to offer/accept to the exchange, which then matches the trades based on fairness principles.

So how does the matching process work? Let’s imagine you are a human Bitcoin exchange. A large number of trade orders are coming to you, and you need to decide how to make the trades fair.

The most intuitive approach is to separate the buy and sell orders into two tables and arrange them in descending order by price. The following graphs are the order books for the buy and sell orders.

Buy Orders

Sell Orders

If the highest buy price is lower than the lowest sell price, then no trade will occur. This is usually the normal state of the order book that you see.

If the highest buy price is the same as the lowest sell price, then a match will be attempted. For example, if BTC is trading at 9002.01, a match will occur, and 0.0330 BTC will be traded at the price of 9002.01. After the trade is completed, the remaining portion of the order from Xiaolin (0.1126 - 0.0330 = 0.0796 BTC) will still be listed in the order book.

But you may wonder, what happens if the buy and sell prices intersect? In fact, this situation does not occur. Let’s consider the following scenario:

If you try to add a new buy order to the order book with a price higher than the highest existing buy price and the highest sell price, it will match directly with the lowest sell price. Once the lowest sell order is fulfilled, it will continue to fill the second lowest sell order until the buy order is completely fulfilled. The same applies the other way around. Therefore, prices in the order book do not intersect.

Of course, please note that I am only discussing the trading mechanism for limit orders here. There are slight differences in the trading rules for market orders, which I won’t go into detail about for now, but it’s important to have a concept of it.

In fact, at this point, the concept of “market data” for exchanges becomes apparent. There are mainly two types of market data: the order book and tick data.

By hiding the specific users in the order book and merging orders with the same price, we can obtain a consolidated order book like the one shown below. We mainly focus on the numeric part on the right, where:

  • In the upper part, the red numbers in the first column represent the sell prices for BTC, the middle column shows the total quantity of BTC in each price range, and the rightmost column shows the accumulated order quantity from the lowest sell price up to the current price range.
  • In the large font section in the middle, 9994.10 USD is the current market price, which is the price of the last trade.
  • The meaning of the green section at the bottom is similar to the upper section, but it represents the buy orders and their corresponding quantities.

Gemini Order Book

Order book from Gemini, sourced from https://cryptowat.ch

In this graph, the lowest sell price is $6.51 higher than the highest buy price, which is commonly referred to as the “spread”. This confirms what we mentioned earlier, that the prices in the order book never intersect. Additionally, a small spread indicates a highly active exchange.

Every matching process results in a trade. Both the sellers and buyers are happy, and the exchange happily notifies the subscribers of market data: a trade just occurred, the price of the trade, and the quantity exchanged. This data is called tick data.

With this data, we have a grasp of the current state of the exchange and can start making moves.

Introduction to Websocket #

As mentioned at the beginning of this article: market data values timeliness. Therefore, the delay between the generation of market data by the exchange and its propagation to our program should be as low as possible. Generally, exchanges also provide REST interfaces for fetching market data. For example, the following code:

import requests
import timeit

def get_orderbook():
  orderbook = requests.get("https://api.gemini.com/v1/book/btcusd").json()

n = 10
latency = timeit.timeit('get_orderbook()', setup='from __main__ import get_orderbook', number=n) * 1.0 / n
print('Latency is {} ms'.format(latency * 1000))

I tested this code on a server in a city near New York in the United States. As you can see, the average latency for accessing the orderbook is about 0.25 seconds. Obviously, this latency would be even greater if the test was done in China. The two cities in the United States are relatively close, so why is the latency so large?

This is because the REST interface is essentially an HTTP interface, with a TCP/TLS socket connection underneath. Each REST request typically requires a new TCP/TLS handshake. After the request is completed, the connection is disconnected. This process is much slower than we imagined.

To illustrate this point, let’s conduct an experiment in the same city. I connected to Gemini’s server in New York from a server near New York using TCP/SSL handshake. How long did it take?

curl -w "TCP handshake: %{time_connect}s, SSL handshake: %{time_appconnect}s\n" -so /dev/null https://www.gemini.com

TCP handshake: 0.072758s, SSL handshake: 0.119409s

The result shows that the process of establishing an HTTP connection takes up more than half of the time! In other words, every time we use a REST request, we waste more than half of the time on establishing a connection with the server. This is obviously inefficient. Naturally, you may wonder if we can establish one connection for multiple communications?

In fact, some Python HTTP request libraries also support reusing underlying TCP/SSL connections. However, that method is more complex and requires server support as well. What should we do then? In the presence of WebSocket, we don’t have to go the extra mile.

Let me introduce WebSocket. WebSocket is a protocol for full-duplex, two-way communication over a single TCP/TLS connection. WebSocket allows for easier and more efficient data exchange between the client and the server, and the server can also actively push data to the client. In the WebSocket API, the browser and server only need to complete a handshake once, and they can directly establish a persistent connection and engage in bi-directional data transmission.

The concept may sound delightful, but it’s still somewhat abstract. To help you quickly understand the previous statement, let’s take a look at two simple examples. Without further ado, let’s start with some code:

import websocket
import thread

# Called when a message is received from the server
def on_message(ws, message):
    print('Received: ' + message)

# Called when the connection with the server is established
def on_open(ws):
    # Thread running function
    def gao():
        # Send numbers 0-4 to the server, with a 0.01 second delay between each send
        for i in range(5):
            time.sleep(0.01)
            msg="{0}".format(i)
            ws.send(msg)
            print('Sent: ' + msg)
        # Wait for 1 second to receive a reply from the server
        time.sleep(1)
        
        # Close the WebSocket connection
        ws.close()
        print("Websocket closed")
    
    # Run the gao() function in another thread
    thread.start_new_thread(gao, ())
if __name__ == "__main__":
    ws = websocket.WebSocketApp("ws://echo.websocket.org/",
                              on_message = on_message,
                              on_open = on_open)
    
    ws.run_forever()

This code tries to establish a connection with wss://echo.websocket.org. When the connection is established, it starts a thread that continuously sends 5 messages to the server.

From the output, we can see that while we are sending messages continuously, we are also constantly receiving messages. Unlike with REST, where we have to wait for the server to complete the request and fully respond before making the next request, we are both requesting and receiving messages at the same time, which is what was mentioned earlier as “full duplex”.

REST (HTTP) unidirectional request-response diagram

REST (HTTP) unidirectional request-response diagram

Websocket bidirectional request-response diagram

Websocket bidirectional request-response diagram

Now let’s look at the second piece of code. To explain “bidirectional”, we’ll look at an example of getting Gemini’s order book.

import ssl
import websocket
import json

# Global counter
count = 5

def on_message(ws, message):
    global count
    print(message)
    count -= 1
    # Close websocket connection after receiving 5 messages
    if count == 0:
        ws.close()

if __name__ == "__main__":
    ws = websocket.WebSocketApp(
        "wss://api.gemini.com/v1/marketdata/btcusd?top_of_book=true&offers=true",
        on_message=on_message)
    ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})

We can see that after establishing a connection with Gemini, we do not send any messages to the server, there are no requests, yet the server continuously pushes data to us. This is much more efficient compared to the communication style of REST APIs where we get a reply for each request we send!

Therefore, compared to REST, Websocket is a more real-time and efficient way of exchanging data. However, the downside is obvious: because the requests and responses are asynchronous, it makes the logic for controlling the program’s state more complex. We will have a deeper understanding of this in the following content.

Price Capture Module #

With the basic concept of Websockets, we have mastered the second way to connect to the exchange.

In fact, Gemini provides two types of Websocket interfaces, one is the Public interface, and the other is the Private interface.

The Public interface provides orderbook service, which means everyone can see the current bid and depth, which is what we just discussed in detail in the previous lesson.

The Private interface is related to order placement operations that we discussed in the previous lesson. You will receive notifications for orders that have been fully or partially executed, and other changes.

Let’s take an orderbook web scraper as an example to see how to capture orderbook information. The code below provides a detailed implementation of a typical web scraper, which utilizes class encapsulation. I hope you don’t forget the purpose of this course, which is to understand how Python is applied in practical engineering:

import copy
import json
import ssl
import time
import websocket


class OrderBook(object):

    BIDS = 'bid'
    ASKS = 'ask'

    def __init__(self, limit=20):

        self.limit = limit

        # (price, amount)
        self.bids = {}
        self.asks = {}

        self.bids_sorted = []
        self.asks_sorted = []

    def insert(self, price, amount, direction):
        if direction == self.BIDS:
            if amount == 0:
                if price in self.bids:
                    del self.bids[price]
            else:
                self.bids[price] = amount
        elif direction == self.ASKS:
            if amount == 0:
                if price in self.asks:
                    del self.asks[price]
            else:
                self.asks[price] = amount
        else:
            print('WARNING: unknown direction {}'.format(direction))

    def sort_and_truncate(self):
        # sort
        self.bids_sorted = sorted([(price, amount) for price, amount in self.bids.items()], reverse=True)
        self.asks_sorted = sorted([(price, amount) for price, amount in self.asks.items()])

        # truncate
        self.bids_sorted = self.bids_sorted[:self.limit]
        self.asks_sorted = self.asks_sorted[:self.limit]

        # copy back to bids and asks
        self.bids = dict(self.bids_sorted)
        self.asks = dict(self.asks_sorted)

    def get_copy_of_bids_and_asks(self):
return copy.deepcopy(self.bids_sorted), copy.deepcopy(self.asks_sorted)


class Crawler:
    def __init__(self, symbol, output_file):
        self.orderbook = OrderBook(limit=10)
        self.output_file = output_file

        self.ws = websocket.WebSocketApp('wss://api.gemini.com/v1/marketdata/{}'.format(symbol),
                                         on_message = lambda ws, message: self.on_message(message))
        self.ws.run_forever(sslopt={'cert_reqs': ssl.CERT_NONE})

    def on_message(self, message):
        # Process the received message and pass it to the orderbook
        data = json.loads(message)
        for event in data['events']:
            price, amount, direction = float(event['price']), float(event['remaining']), event['side']
            self.orderbook.insert(price, amount, direction)

        # Organize the orderbook, sort, and select only the ones we need
        self.orderbook.sort_and_truncate()

        # Output to file
        with open(self.output_file, 'a+') as f:
            bids, asks = self.orderbook.get_copy_of_bids_and_asks()
            output = {
                'bids': bids,
                'asks': asks,
                'ts': int(time.time() * 1000)
            }
            f.write(json.dumps(output) + '\n')
if __name__ == '__main__':
    # Create a Crawler object with symbol 'BTCUSD' and output file 'BTCUSD.txt'
    crawler = Crawler(symbol='BTCUSD', output_file='BTCUSD.txt')

###### Output #######

{"bids": [[11398.73, 0.96304843], [11398.72, 0.98914437], [11397.32, 1.0], [11396.13, 2.0], [11395.95, 2.0], [11395.87, 1.0], [11394.09, 0.11803397], [11394.08, 1.0], [11393.59, 0.1612581], [11392.96, 1.0]], "asks": [[11407.42, 1.30814001], [11407.92, 1.0], [11409.48, 2.0], [11409.66, 2.0], [11412.15, 0.525], [11412.42, 1.0], [11413.77, 0.11803397], [11413.99, 0.5], [11414.28, 1.0], [11414.72, 1.0]], "ts": 1562558996535}
{"bids": [[11398.73, 0.96304843], [11398.72, 0.98914437], [11397.32, 1.0], [11396.13, 2.0], [11395.95, 2.0], [11395.87, 1.0], [11394.09, 0.11803397], [11394.08, 1.0], [11393.59, 0.1612581], [11392.96, 1.0]], "asks": [[11407.42, 1.30814001], [11407.92, 1.0], [11409.48, 2.0], [11409.66, 2.0], [11412.15, 0.525], [11412.42, 1.0], [11413.77, 0.11803397], [11413.99, 0.5], [11414.28, 1.0], [11414.72, 1.0]], "ts": 1562558997377}
{"bids": [[11398.73, 0.96304843], [11398.72, 0.98914437], [11397.32, 1.0], [11396.13, 2.0], [11395.95, 2.0], [11395.87, 1.0], [11394.09, 0.11803397], [11394.08, 1.0], [11393.59, 0.1612581], [11392.96, 1.0]], "asks": [[11407.42, 1.30814001], [11409.48, 2.0], [11409.66, 2.0], [11412.15, 0.525], [11412.42, 1.0], [11413.77, 0.11803397], [11413.99, 0.5], [11414.28, 1.0], [11414.72, 1.0]], "ts": 1562558997765}
{"bids": [[11398.73, 0.96304843], [11398.72, 0.98914437], [11397.32, 1.0], [11396.13, 2.0], [11395.95, 2.0], [11395.87, 1.0], [11394.09, 0.11803397], [11394.08, 1.0], [11393.59, 0.1612581], [11392.96, 1.0]], "asks": [[11407.42, 1.30814001], [11409.48, 2.0], [11409.66, 2.0], [11412.15, 0.525], [11413.77, 0.11803397], [11413.99, 0.5], [11414.28, 1.0], [11414.72, 1.0]], "ts": 1562558998638}
{"bids": [[11398.73, 0.97131753], [11398.72, 0.98914437], [11397.32, 1.0], [11396.13, 2.0], [11395.95, 2.0], [11395.87, 1.0], [11394.09, 0.11803397], [11394.08, 1.0], [11393.59, 0.1612581], [11392.96, 1.0]], "asks": [[11407.42, 1.30814001], [11409.48, 2.0], [11409.66, 2.0], [11412.15, 0.525], [11413.77, 0.11803397], [11413.99, 0.5], [11414.28, 1.0], [11414.72, 1.0]], "ts": 1562558998645}
{"bids": [[11398.73, 0.97131753], [11398.72, 0.98914437], [11397.32, 1.0], [11396.13, 2.0], [11395.87, 1.0], [11394.09, 0.11803397], [11394.08, 1.0], [11393.59, 0.1612581], [11392.96, 1.0]], "asks": [[11407.42, 1.30814001], [11409.48, 2.0], [11409.66, 2.0], [11412.15, 0.525], [11413.77, 0.11803397], [11413.99, 0.5], [11414.28, 1.0], [11414.72, 1.0]], "ts": 1562558998748}

The code is quite long, let’s explain it in detail.

At the beginning of this code, there is a class called Crawler which is used to store related data structures. The bids and asks dictionaries are used to store the buy and sell orders at the current time. In addition, we also maintain two sorted arrays: bids_sorted and asks_sorted. The constructor takes a parameter limit that specifies how many data points to keep for the bids and asks in the order book. For many strategies, the top 5 data points are often sufficient, but here we choose the top 10.

Moving on, the insert() function is used to insert a data point into the order book. It’s important to note that if the amount corresponding to a certain price is 0, it means that the data point no longer exists and can be deleted. The data points inserted may be in random order, so when needed, we sort the bids and asks arrays and select the specified number of data points. This is exactly what the sort_and_truncate() function does. We call it to sort and truncate the bids and asks arrays, and then save the results back to bids and asks.

Next is the get_copy_of_bids_and_asks() function, which returns the sorted bids and asks arrays. A deep copy is used here because if we return the arrays directly, we would be returning pointers to bids_sorted and asks_sorted. Consequently, when the sort_and_truncate() function is called again, the content of both arrays would be altered, potentially causing a bug.

Finally, let’s take a look at the Crawler class. The constructor declares an orderbook and defines a WebSocket to receive data from the exchange. One thing to note here is that the callback function on_message() is a class member function. Therefore, as you may have noticed, its first parameter is self. If we wrote it as on_message = self.on_message, it would result in an error.

To avoid this issue, we need to wrap the function again. Here, I use an anonymous function we learned earlier to pass the intermediate state. Note that we only need the message, so we pass that in.

The remaining part should be clear. The on_message callback function, upon receiving a new tick, decodes the information, enumerates all the changes received, inserts them into the order book, sorts the order book, and finally outputs it together with the timestamp.

Although this piece of code may seem long, after breaking it down, you may notice that it covers topics we have already learned. This is why I repeatedly emphasize the importance of building a solid foundation. If any part of the content becomes unfamiliar to you (such as object-oriented programming concepts), make sure to review it promptly. That way, learning more complex concepts will become much easier.

So, going back to the main topic. The previous code is mainly for capturing information about the order book. In fact, when establishing the WebSocket data stream for the Gemini exchange, the initial message is often very large because it contains all the order book information at that moment. This is called the initial data. Subsequent messages are modifications based on the initial data and can be handled directly.

Summary #

In this lesson, we built upon the previous lesson and started by discussing delegate accounts. Then, we explained the definition, working mechanism, and usage of WebSocket. Lastly, we provided an example to teach you how to retrieve information from the Orderbook. While studying the content of this lesson, I hope you can relate it to the previous lesson, carefully think about the differences between WebSocket and RESTful, and try to summarize the applicable scope of different models in network programming.

Thought Question #

Finally, I will leave you with a thought question. Do WebSockets drop packets? If they do, what will happen to the Orderbook crawler? How can we avoid this situation? Feel free to leave a comment and discuss it with me. You are also welcome to share this article.