Leveraging Cryptocurrency Market Data to Backtest Trading Strategies

A case study with Shrimpy

01/14/2021: Kaiko's webinar with Shrimpy covers the content in this case study.


This case study explores how Shrimpy, a cryptocurrency trading platform, uses Kaiko's historical trade and order book data to provide a suite of backtesting tools that traders can use to develop strategies for top cryptocurrency exchanges.


Accessing clean, granular, and high-quality data is the most important first step when beginning to develop a trading strategy. If you can’t trust the data you test your strategy on, you risk losing in the markets.

Shrimpy is one of the premier trading and backtesting platforms in the cryptocurrency industry, allowing traders of all types to train and test their strategies and execute them on 15+ exchanges. When developing their backtesting feature, they eventually had to decide whether to continue building the data management component, or whether to outsource it to a data provider. 

This is the question that traders and trading platforms alike grapple with: build or outsource? This case study shows how Shrimpy's decision to outsource data management to Kaiko saved them valuable developer resources while guaranteeing high-quality data for their clients' backtests.

The Problem: Build or Outsource Data Management?

Your job as a trader is to develop a profitable strategy, not to spend time sourcing, cleaning, and normalizing data for backtesting.

Shrimpy CEO Michael McCarty said, “Building a quality trading platform already required our full attention, so introducing extra work for data management was impractical. Having a fail-proof system is nearly impossible for a small team, especially since it’s not our core business.”

Data collection is a complex undertaking, starting with the difficulty of connecting to dozens of exchanges. No two cryptocurrency exchanges are alike, so every exchange requires a custom integration. The work doesn’t stop once an integration is complete: developers must maintain each connection 24/7 and set up sophisticated monitoring infrastructure to minimize downtime.

Even once a connection is stable and the data is collected, the data still has to be organized and stored. Cleaning it, identifying gaps, resolving inaccuracies, and much more all require huge amounts of time.
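Identifying gaps is one of the simpler cleaning steps to illustrate. The sketch below is a minimal example, assuming trade timestamps are epoch milliseconds for a single exchange and trading pair; `find_gaps` and the one-minute threshold are hypothetical choices for illustration, not part of Kaiko's or Shrimpy's tooling.

```python
def find_gaps(timestamps_ms: list[int], max_gap_ms: int) -> list[tuple[int, int]]:
    """Return (start, end) pairs where consecutive records are more than max_gap_ms apart.

    Hypothetical helper: timestamps are assumed to be epoch milliseconds for one
    exchange and trading pair. They are sorted first so that out-of-order
    records neither hide real gaps nor create false ones.
    """
    ordered = sorted(timestamps_ms)
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > max_gap_ms:
            gaps.append((prev, curr))
    return gaps


# The silence between 2,000 ms and 65,000 ms exceeds one minute and is flagged.
print(find_gaps([0, 1_000, 2_000, 65_000, 66_000], max_gap_ms=60_000))  # [(2000, 65000)]
```

A flagged gap is only a candidate: on an illiquid pair, a long silence can be a genuinely quiet market rather than missing data, so each one still needs to be investigated.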

Ultimately, Shrimpy decided to outsource data management to Kaiko and build their developer APIs on top of Kaiko’s database. “Leveraging Kaiko’s data management meant we could spend more time on our core business while still being sure that our historical data was always meeting the highest standards possible.”

The Product: Highly Granular Market Data

Backtesting almost always requires tick-level data sourced directly from exchanges. You never want to test a strategy on aggregated data, because it does not reflect real market conditions. Kaiko specializes in tick-level exchange-pair data, which is exactly what Shrimpy needed to develop their backtesting feature.


Shrimpy's platform leverages the two most powerful types of data for backtesting:   

Trade Data: Tick-by-tick trade data includes every executed transaction on an exchange, including the price, volume, timestamp and trade direction. Trade data is useful for backtesting because it provides the most granular look at real orders that were filled on an exchange.

Order Book Snapshots: An order book is a list of all buy (bid) and sell (ask) orders for an asset, organized by price level. Order books show orders before they are executed as trades. An order book snapshot includes all bids and asks live on an order book at a given moment in time.

For backtesting, order books are the most powerful type of data because they enable traders to simulate how an order would be filled. Order book data provides insights into price slippage, the spread, and market depth for a trading pair. 
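To make spread and depth concrete, the sketch below computes both from a single order book snapshot. It is a minimal illustration, assuming each side is stored as (price, quantity) pairs sorted best price first; `OrderBookSnapshot` and its methods are hypothetical names, not Kaiko's data format, and the prices are made up.

```python
from dataclasses import dataclass


@dataclass
class OrderBookSnapshot:
    """Hypothetical container for one snapshot; levels are (price, quantity) pairs."""
    bids: list[tuple[float, float]]  # best (highest) bid first
    asks: list[tuple[float, float]]  # best (lowest) ask first

    def mid_price(self) -> float:
        return (self.bids[0][0] + self.asks[0][0]) / 2

    def spread(self) -> float:
        """Absolute bid-ask spread, in quote currency."""
        return self.asks[0][0] - self.bids[0][0]

    def depth(self, side: str, pct: float) -> float:
        """Base quantity resting within pct (0.02 = 2%) of the mid price on one side."""
        mid = self.mid_price()
        if side == "ask":
            return sum(q for p, q in self.asks if p <= mid * (1 + pct))
        if side == "bid":
            return sum(q for p, q in self.bids if p >= mid * (1 - pct))
        raise ValueError("side must be 'bid' or 'ask'")


book = OrderBookSnapshot(
    bids=[(99.0, 2.0), (98.5, 3.0), (97.0, 5.0)],
    asks=[(101.0, 1.5), (101.5, 4.0), (104.0, 2.0)],
)
print(book.mid_price(), book.spread())                   # 100.0 2.0
print(book.depth("ask", 0.02), book.depth("bid", 0.02))  # 5.5 5.0
```

Depth near the mid price is a quick proxy for how large an order the book can absorb before slippage becomes significant.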

Other Data Types: Other data types that traders use frequently, but that are not recommended for backtesting, include OHLCV and VWAP data. OHLCV, otherwise known as candlestick data, comprises the open, high, low, close, and volume over an interval of time ranging from 1 second to 1 day. VWAP, or volume-weighted average price, is also taken over an interval of time. Both OHLCV and VWAP data can be useful in some strategies, but ultimately lack the granularity that trades and order books provide when simulating real market conditions.
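The loss of granularity is easy to see by building candles from trades. This sketch assumes a simple, hypothetical trade record (timestamp, price, amount, direction) and fixed-length intervals; it is an illustration of OHLCV and VWAP aggregation, not Kaiko's aggregation method.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    timestamp_ms: int  # epoch milliseconds
    price: float       # quote currency per unit of base currency
    amount: float      # base currency quantity
    is_buy: bool       # trade direction (taker side); discarded by the aggregation


def to_ohlcv_and_vwap(trades: list[Trade], interval_ms: int) -> dict[int, dict[str, float]]:
    """Bucket tick trades into fixed intervals; return OHLCV plus VWAP keyed by bucket start."""
    buckets: dict[int, list[Trade]] = {}
    for t in sorted(trades, key=lambda tr: tr.timestamp_ms):
        start = t.timestamp_ms - t.timestamp_ms % interval_ms
        buckets.setdefault(start, []).append(t)

    candles = {}
    for start, ts in buckets.items():
        volume = sum(t.amount for t in ts)
        candles[start] = {
            "open": ts[0].price,
            "high": max(t.price for t in ts),
            "low": min(t.price for t in ts),
            "close": ts[-1].price,
            "volume": volume,
            # VWAP: sum of price * amount divided by total amount in the interval
            "vwap": sum(t.price * t.amount for t in ts) / volume,
        }
    return candles


trades = [
    Trade(0, 10.0, 1.0, True),
    Trade(500, 12.0, 3.0, False),
    Trade(900, 11.0, 1.0, True),
    Trade(1_200, 9.0, 2.0, False),
]
candles = to_ohlcv_and_vwap(trades, interval_ms=1_000)
print(candles[0])
# {'open': 10.0, 'high': 12.0, 'low': 10.0, 'close': 11.0, 'volume': 5.0, 'vwap': 11.4}
```

Three trades with different sizes and directions collapse into one row; the candle keeps neither the individual trade sizes nor who initiated each trade, and it says nothing about the resting orders a simulated fill would hit.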

The Results: High Precision Backtesting Engine  

Shrimpy operates their backtesting engine on top of Kaiko’s historical database. They receive daily data dumps through AWS, process the data so that it fits their existing structure, and then securely store it for future use. Kaiko’s data is then offered through permissioned APIs. Through the APIs, Shrimpy customers can access order book snapshots to calculate exact trades when simulating backtests. When a customer requests a specific time period, exchange, or trading pair, Shrimpy sources the data from the database and serves it up so customers can easily run a backtest. 
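As a rough illustration of the lookup step, the sketch below indexes snapshots by exchange and trading pair and returns the most recent snapshot at or before a requested time, so a simulated trade never sees data from its own future. `SnapshotStore` and its methods are hypothetical; Kaiko's daily dumps and Shrimpy's APIs are not modeled here, and a production system would query a database rather than in-memory lists.

```python
import bisect
from collections import defaultdict


class SnapshotStore:
    """Hypothetical in-memory index of order book snapshots keyed by (exchange, pair)."""

    def __init__(self) -> None:
        self._times = defaultdict(list)  # (exchange, pair) -> sorted timestamps (ms)
        self._books = defaultdict(list)  # (exchange, pair) -> snapshots in the same order

    def add(self, exchange: str, pair: str, t_ms: int, book) -> None:
        key = (exchange, pair)
        i = bisect.bisect_right(self._times[key], t_ms)  # keep both lists sorted by time
        self._times[key].insert(i, t_ms)
        self._books[key].insert(i, book)

    def at(self, exchange: str, pair: str, t_ms: int):
        """Latest snapshot taken at or before t_ms (no look-ahead)."""
        key = (exchange, pair)
        i = bisect.bisect_right(self._times.get(key, []), t_ms)
        if i == 0:
            raise LookupError(f"no snapshot for {key} at or before {t_ms}")
        return self._books[key][i - 1]

    def between(self, exchange: str, pair: str, start_ms: int, end_ms: int) -> list:
        """All snapshots with start_ms <= timestamp <= end_ms, in time order."""
        key = (exchange, pair)
        times = self._times.get(key, [])
        lo = bisect.bisect_left(times, start_ms)
        hi = bisect.bisect_right(times, end_ms)
        return self._books[key][lo:hi]


store = SnapshotStore()
store.add("binance", "ENJ-USDT", 2_000, "snapshot B")  # placeholders for real snapshots
store.add("binance", "ENJ-USDT", 1_000, "snapshot A")
print(store.at("binance", "ENJ-USDT", 1_500))  # snapshot A
```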

One feature of Shrimpy’s backtesting engine leverages bid and ask orders available on each individual crypto exchange at the time of a simulated trade. Using these order book snapshots, Shrimpy's clients are able to calculate the exact trades that could have been placed when a portfolio rebalance occurs. High precision is necessary when simulating complex rebalancing strategies, which is why access to the most granular data available is so important.
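A rebalance can be thought of in two steps: work out how much of each asset to buy or sell to reach the target weights, then simulate each resulting order against the order book snapshot. The sketch covers only the first step, assuming a single valuation price per asset (for example, a snapshot's mid price) and target weights that sum to 1; `rebalance_orders` is a hypothetical helper, not Shrimpy's algorithm.

```python
def rebalance_orders(
    holdings: dict[str, float], prices: dict[str, float], targets: dict[str, float]
) -> dict[str, float]:
    """Quote-currency value to buy (+) or sell (-) per asset to reach target weights.

    Assumes `prices` covers every asset in `holdings` and `targets`.
    """
    if abs(sum(targets.values()) - 1.0) > 1e-9:
        raise ValueError("target weights must sum to 1")
    assets = set(holdings) | set(targets)
    values = {a: holdings.get(a, 0.0) * prices[a] for a in assets}
    total = sum(values.values())
    return {a: targets.get(a, 0.0) * total - values[a] for a in assets}


# Hypothetical portfolio: 1 BTC and no ENJ, rebalanced to a 50/50 split.
print(rebalance_orders({"BTC": 1.0}, {"BTC": 100.0, "ENJ": 0.5}, {"BTC": 0.5, "ENJ": 0.5}))
```

In a backtest, the sell leg would then be run against the bids and the buy leg against the asks of the snapshot in effect at that moment, which is where fees, spread, and slippage enter.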

Shrimpy's user interface for backtesting trading strategies. 

Impact: Identifying Profitable Strategies

So how would a backtest work using order book data on Shrimpy's platform? Most backtests require the following precise data:

- The trading fee on the exchange
- The bid-ask spread for the currency pair you are trading
- The price slippage for market orders on the order book
- The timing for each trade

When simulating a buy market order, you would run the order on the ask side of the order book (buy market orders are matched with ask limit orders), and then factor in the trading fee and price slippage. Price slippage occurs if the best ask does not have a large enough quantity to absorb your market order. The exchange’s matching engine would then fill your order with the next best ask, at a slightly worse price than expected. Slippage would thus be the difference between the expected price of a trade and the actual price.
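The fill logic described above can be sketched in a few lines. This is a minimal illustration, assuming ask levels are (price, quantity) pairs sorted from best to worst and that the fee is charged in the quote currency on the filled notional; `simulate_market_buy` is a hypothetical name, and real exchanges may differ in fee currency and rounding rules. A market sell would mirror it on the bid side.

```python
def simulate_market_buy(
    asks: list[tuple[float, float]], quantity: float, fee_rate: float
) -> dict[str, float]:
    """Fill a market buy of `quantity` base units against ask levels, best price first."""
    remaining = quantity
    cost = 0.0
    for price, size in asks:
        take = min(remaining, size)  # consume this level, or only what is still needed
        cost += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("order book too thin to fill the order")
    avg_price = cost / quantity
    best_ask = asks[0][0]
    return {
        "avg_price": avg_price,
        "cost": cost,
        "fee": cost * fee_rate,
        "slippage": avg_price - best_ask,  # per unit, versus the best ask
        "slippage_pct": (avg_price - best_ask) / best_ask,
    }


# Hypothetical book: buying 2.5 units empties the 100.0 level and spills into 101.0.
result = simulate_market_buy([(100.0, 1.0), (101.0, 2.0), (103.0, 5.0)], quantity=2.5, fee_rate=0.001)
print(result)
```

Here the average fill is about 100.6 rather than the best ask of 100.0; that 0.6 per unit is the slippage a candle-based model would miss.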

Order book data lets you simulate price slippage so that you can factor it into your model. If you rely only on OHLCV, VWAP, or even trade data, you cannot simulate price slippage, and that unmodeled cost could turn a strategy that looks profitable in a backtest into a losing one.

The importance of slippage is best depicted using a real example. Let’s say we want to buy 1,500 USDT worth of ENJ on Binance, which has a base trading fee of 0.1%.

We can simulate this purchase by walking up the ask side of the order book, filling each price level in turn until the full 1,500 USDT (including fees) has been spent. The consecutive trades would be:

1. Buy 1151.74904126 ENJ at 0.20559424 USDT each = 236.97296881 USDT + 0.2369729 USDT in fees (1262.79005829 USDT left)
2. Buy 2559.954 ENJ at 0.20640294 USDT each = 528.38203186 USDT + 0.52838203 USDT in fees (733.8796444 USDT left)
3. Buy 1992.51418976 ENJ at 0.20659518 USDT each = 411.64382769 USDT + 0.41164382 USDT in fees (321.82417288 USDT left)
4. Buy 1555.85587451 ENJ at 0.20663894 USDT each = 321.50267164 USDT + 0.32150267 USDT in fees (0 USDT left)

An order book for the ENJ-USDT pair.
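The walk in this example is a budget-constrained version of a market buy: at each ask level, buy as much as the remaining budget allows once the fee is included, and stop when the budget is used up. This is a minimal sketch, assuming fees are charged in the quote currency on top of each fill (as in the ENJ example) and ask levels are (price, quantity) pairs sorted best first; `simulate_budget_buy` is a hypothetical helper and the order book below is made up, not the ENJ-USDT book.

```python
def simulate_budget_buy(
    asks: list[tuple[float, float]], budget: float, fee_rate: float
) -> tuple[list[tuple[float, float, float, float]], float, float]:
    """Spend `budget` quote units, fees included, walking up the ask side.

    Returns (fills, total base bought, unspent budget); each fill is
    (quantity, price, quote cost, fee).
    """
    remaining = budget
    fills = []
    for price, size in asks:
        if remaining <= 0:
            break
        # Largest quantity whose cost plus fee still fits in the remaining budget.
        max_qty = remaining / (price * (1 + fee_rate))
        qty = min(size, max_qty)
        quote_cost = qty * price
        fee = quote_cost * fee_rate
        fills.append((qty, price, quote_cost, fee))
        if qty == max_qty:
            remaining = 0.0  # budget exhausted part-way through this level
            break
        remaining -= quote_cost + fee  # level fully consumed; move to the next one
    total_bought = sum(qty for qty, _, _, _ in fills)
    return fills, total_bought, remaining


# Hypothetical ask levels: (price in USDT, quantity in ENJ), best price first.
asks = [(0.20, 1_000.0), (0.21, 2_000.0), (0.22, 5_000.0)]
fills, bought, left = simulate_budget_buy(asks, budget=1_000.0, fee_rate=0.001)
for qty, price, cost, fee in fills:
    print(f"Buy {qty:.8f} ENJ at {price} USDT = {cost:.8f} USDT + {fee:.8f} USDT in fees")
print(f"Total bought: {bought:.8f} ENJ")
```

If the book runs out before the budget does, the unspent amount is returned instead of being silently ignored.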

In total, we bought 7260.08410553 ENJ after all trades were completed. Had we used only OHLCV candlestick data, our estimate could have been as high as 7319.76112984 ENJ, a difference of almost 60 ENJ, or nearly 1%. That may not seem like much, but the error compounds quickly when simulating hundreds or thousands of trades.
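The comparison is simple arithmetic, reproduced below with the two totals reported in this example. The compounding part is an illustration only, assuming every simulated trade overstates the amount received by the same fraction and the full balance is reinvested each time; real strategies usually trade only part of a portfolio, so actual drift would differ. `compounded_overstatement` is a hypothetical helper.

```python
book_estimate = 7260.08410553    # ENJ received by walking the order book (from the example)
candle_estimate = 7319.76112984  # ENJ implied by OHLCV candlestick data (from the example)

overstatement = candle_estimate - book_estimate
overstatement_pct = overstatement / book_estimate
print(f"{overstatement:.2f} ENJ ({overstatement_pct:.2%})")  # 59.68 ENJ (0.82%)


def compounded_overstatement(per_trade_pct: float, n_trades: int) -> float:
    """Relative overstatement after n trades if each trade is overstated by per_trade_pct.

    Illustrative assumption: the full balance is reinvested on every trade.
    """
    return (1 + per_trade_pct) ** n_trades - 1


for n in (10, 100, 1_000):
    print(n, f"{compounded_overstatement(overstatement_pct, n):.1%}")
```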

A backtest that overstates fills this way can make an unprofitable strategy look profitable, so without access to order book data for backtesting, this strategy could have resulted in a loss.

Kaiko for Backtesting: How We Can Help 

Kaiko's granular cryptocurrency data is optimized for strategy backtesting. We have the most extensive order book data in the industry, which is required for accurately simulating a trade. Our data can be easily ingested into any backtesting system because it is normalized and asset taxonomies are standardized across exchanges.

In addition, we provide unrivaled support with on-call Slack channels giving instant access to developers, support, and the business team. We create custom data plans for all of our enterprise clients and are willing to work closely with you to determine your exact data needs.

Conclusion: Why Data is the Core of Your Backtest

High-quality data for backtesting is essential for managing risk in financial markets. A trader’s success depends heavily on the quality of the data used in their models. Many traders settle for low-quality data freely available from the scores of data aggregation websites. Yet these traders will likely face problems in the long run without access to granular trade and order book data.

As cryptocurrency markets mature, more and more traders have begun realizing the importance of data, but lack the resources to access and work with it. By outsourcing data management to Kaiko, Shrimpy has more time to build the tools that any trader needs to backtest a strategy with minimal effort.

“Day traders and high-frequency traders are increasingly becoming interested in historical backtests. Having quality historical data will allow us to build any number of data products that can help traders navigate the crypto market, from analytical tools, to charting and market research tools,” said McCarty.


About Kaiko
Founded in 2014, Kaiko is a market data provider in the blockchain-based digital assets space, providing institutional investors and market participants with enterprise-grade data infrastructure. We collect, normalize, store, and distribute digital assets market data via a livestream WebSocket, REST API, and cloud-based flat file (.csv) Data Feed, to which clients connect to build data-driven applications.

About Shrimpy
Shrimpy is a social trading platform for cryptocurrency. It is designed for both professional and novice traders to come and learn about the growing crypto industry. On Shrimpy, users can copy the portfolios and trading strategies of other traders.