Best Practices for Pulling Stock Data Efficiently with APIs


 

Hi everyone,

I'm working on a project that requires pulling historical data on hundreds of stocks using the Interactive Brokers TWS API. The data includes daily OHLC (Open, High, Low, Close) values and trading volume. My current method involves sequential API calls in a loop, which is slow and inefficient for a large dataset.

Here’s my current setup:

  1. Sending requests for individual stocks sequentially using the TWS API.
  2. Saving the returned data locally after each request for further analysis (roughly as sketched below).
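For concreteness, a simplified sketch of that loop in Python with the ibapi package (contract details, error handling, and the actual file writing are trimmed, and the symbols are placeholders):

    import threading
    import time
    from ibapi.client import EClient
    from ibapi.wrapper import EWrapper
    from ibapi.contract import Contract

    class SequentialApp(EWrapper, EClient):
        def __init__(self):
            EClient.__init__(self, self)
            self.request_done = threading.Event()

        def historicalData(self, reqId, bar):
            # one daily OHLC bar; in my real code this is written to disk
            print(bar.date, bar.open, bar.high, bar.low, bar.close, bar.volume)

        def historicalDataEnd(self, reqId, start, end):
            self.request_done.set()  # unblocks the loop below

    app = SequentialApp()
    app.connect("127.0.0.1", 7497, clientId=1)
    threading.Thread(target=app.run, daemon=True).start()
    time.sleep(1)  # crude wait for the connection handshake

    for req_id, symbol in enumerate(["AAPL", "MSFT", "IBM"], start=1):
        contract = Contract()
        contract.symbol, contract.secType = symbol, "STK"
        contract.exchange, contract.currency = "SMART", "USD"
        app.request_done.clear()
        app.reqHistoricalData(req_id, contract, "", "1 Y", "1 day",
                              "TRADES", 1, 1, False, [])
        app.request_done.wait()  # one request at a time: the bottleneck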

I'd like to optimize this process to handle large batches of stocks more efficiently. I'm interested in advice on:

  • Batch Requests: Does the TWS API support batch requests, and if so, how can I implement them?
  • API Limitations: Are there known rate limits or best practices with the TWS API to avoid throttling or data loss?
  • Caching Strategies: Are there effective ways to cache or save data to avoid redundant requests during retries or updates?

If anyone has experience optimizing the Interactive Brokers API or similar APIs for high-volume data retrieval, I’d appreciate your insights. Also, if this topic has been covered before, I’d be grateful if you could point me to the relevant discussions in the archives.

Thanks in advance!


 

Batch requests
Never heard of them on IBKR:
- It looks to me like it would go against the design pattern used by the API. Data callbacks refer to a reqId, not to a symbol, so I don't see a way to resolve OHLC bars for multiple symbols from a single request.
- Since you can launch multiple requests asynchronously, "batch mode" really just means batching your own requests; a sketch follows below.
- IMHO, since it is also a trading API, a provision for batch requests would invite some dramatic coding errors.
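For illustration, a minimal sketch in Python with the ibapi package (the names and contract details are my own placeholders) of the reqId-to-symbol bookkeeping you end up doing when you launch several requests asynchronously:

    from ibapi.client import EClient
    from ibapi.wrapper import EWrapper
    from ibapi.contract import Contract

    class MultiRequestApp(EWrapper, EClient):
        def __init__(self):
            EClient.__init__(self, self)
            self.symbol_by_req_id = {}  # the only link from callback to symbol
            self.bars_by_symbol = {}

        def request_daily(self, req_id, symbol):
            c = Contract()
            c.symbol, c.secType = symbol, "STK"
            c.exchange, c.currency = "SMART", "USD"
            self.symbol_by_req_id[req_id] = symbol
            self.bars_by_symbol[symbol] = []
            self.reqHistoricalData(req_id, c, "", "1 Y", "1 day",
                                   "TRADES", 1, 1, False, [])

        def historicalData(self, reqId, bar):
            # the callback carries only reqId, so resolve the symbol yourself
            symbol = self.symbol_by_req_id[reqId]
            self.bars_by_symbol[symbol].append(
                (bar.date, bar.open, bar.high, bar.low, bar.close, bar.volume))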
 
API limitations
Yes, there are; read IBKR's documentation on historical data pacing limitations.
Optionally, study what IBKR calls "Quote Booster" (you shouldn't need it if I understood what you are looking for).
 
The reality is that it also depends on the popularity of the instrument: IBKR does not seem to cache all data, and for rarely requested instruments you may need to wake up old storage.
Try to stay under 10 requests/second if you are not requesting intraday data (you say "includes", so I don't know what else besides daily bars you may be looking for).
 
Caching Strategies
None that I have heard of, and I doubt anything exists that you could manage yourself; it would go against the API pattern (how would you manage the cache size? Purging? Cache synchronization would be a huge burden on IBKR's shoulders, etc.).
The tradition is that you handle local storage yourself, in a DB of any form, including CSV files; see the sketch below.
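As a hedged sketch of that self-managed storage (one CSV per symbol; the file layout is just an assumption), checking the cache before requesting avoids redundant requests during retries:

    import csv
    import os

    CACHE_DIR = "ohlc_cache"  # one CSV file per symbol

    def load_cached_bars(symbol):
        """Return cached bars for a symbol, or None if never downloaded."""
        path = os.path.join(CACHE_DIR, f"{symbol}.csv")
        if not os.path.exists(path):
            return None
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def save_bars(symbol, bars):
        """bars: list of dicts with date/open/high/low/close/volume keys."""
        os.makedirs(CACHE_DIR, exist_ok=True)
        path = os.path.join(CACHE_DIR, f"{symbol}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(
                f, fieldnames=["date", "open", "high", "low", "close", "volume"])
            writer.writeheader()
            writer.writerows(bars)

    # On a retry, only request what is missing:
    # todo = [s for s in symbols if load_cached_bars(s) is None]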
 
Advice:
1.0- Unless you are dealing with crypto (where precision is paramount), consider a specialized data feed for your data, one that supports batch requests, like Polygon, FinancialModelingPrep, or ActiveTick.
IBKR's primary purpose is to be a broker, not a data feed.
 
1.1- Another benefit of alternate feeds is that they report the more popularly used data.
IBKR seems to trade off with some data they compute themselves, which reflects another kind of reality, somewhere closer to the execution reality you need when you go manage orders.
I won't discuss who is right or wrong; when making decisions, what can matter is using the same data the other algos use.
If you just want OHLC, disregard this advice: IBKR's values are the same. IBKR's VWAP is different, and probably more accurate, as other feeds seem to base their VWAP on 1-minute hlc3 as the average price.
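To make the hlc3 point concrete, this is the approximation those feeds appear to use (a sketch; the bar field names are my assumptions):

    def approx_vwap(minute_bars):
        # hlc3, the "typical price" (high + low + close) / 3, weighted by
        # volume over 1-minute bars, approximates trade-by-trade VWAP
        num = sum((b["high"] + b["low"] + b["close"]) / 3 * b["volume"]
                  for b in minute_bars)
        den = sum(b["volume"] for b in minute_bars)
        return num / den if den else None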
 
2- Loop with 10 requests in parallel; don't limit yourself to 1.
You will need a "rendezvous" method to synchronize on all the historicalDataEnd callbacks, for example as sketched below. More specific advice would require knowing which language you expect to use.
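In Python, for example, one way to build that rendezvous (a sketch; the naming is mine) is a pending set drained by historicalDataEnd:

    import threading

    class Rendezvous:
        """Blocks until historicalDataEnd has arrived for every reqId."""
        def __init__(self, req_ids):
            self.pending = set(req_ids)
            self.lock = threading.Lock()
            self.done = threading.Event()

        def mark_done(self, req_id):  # call this from historicalDataEnd
            with self.lock:
                self.pending.discard(req_id)
                if not self.pending:
                    self.done.set()

        def wait(self, timeout=None):  # call this from your main thread
            return self.done.wait(timeout)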
 
 


 

Let's start with setting expectations. IBKR is a brokerage that handles your trades and not a specialized historical market data provider. While there is a lot of good historical data you can download, there will be limits, and their real-time data offerings are much richer and more reliable.

There are at least three recent posts that you should look at, but you should also spend a little effort and search the archive yourself.
The trick will be to create a "pipeline" so that you have a certain number of outstanding requests all the time:
  • You'd start a certain number of requests up front without delays or pacing (say 25 requests).
  • Each time an outstanding request finishes and before you process the provided data, you submit a request for the next symbol.
  • In my case (server processing speed, latency, network bandwidth, distance from IBKR ....) that pipeline of 25 outstanding requests "self adjusted" the request rate to about 20 requests per second over the 26 seconds it took to download the data for 500 members of the S&P 500. Larger window sizes produced similar results, and you can experiment with the exact number of outstanding requests to find the optimum for your setup; a sketch of the pattern follows below.
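A rough sketch of that pipeline in Python with the ibapi package (the window size, symbol list, and contract details are placeholders, and the historicalData storage callback is omitted):

    import threading
    from ibapi.client import EClient
    from ibapi.wrapper import EWrapper
    from ibapi.contract import Contract

    WINDOW = 25  # outstanding requests at any time; experiment with this

    def stock(symbol):
        c = Contract()
        c.symbol, c.secType = symbol, "STK"
        c.exchange, c.currency = "SMART", "USD"
        return c

    class PipelineApp(EWrapper, EClient):
        def __init__(self, symbols):
            EClient.__init__(self, self)
            self.todo = list(symbols)  # symbols not yet requested
            self.outstanding = 0
            self.next_req_id = 0
            self.finished = threading.Event()

        def submit_next(self):
            if self.todo:
                self.next_req_id += 1
                self.outstanding += 1
                self.reqHistoricalData(self.next_req_id, stock(self.todo.pop(0)),
                                       "", "1 Y", "1 day", "TRADES",
                                       1, 1, False, [])

        def nextValidId(self, orderId):
            # connection is ready: fill the pipeline up front, no pacing delays
            for _ in range(WINDOW):
                self.submit_next()

        def historicalDataEnd(self, reqId, start, end):
            # refill the window before processing the completed request
            self.outstanding -= 1
            self.submit_next()
            if self.outstanding == 0 and not self.todo:
                self.finished.set()

    app = PipelineApp(["AAPL", "MSFT", "IBM"])  # placeholder universe
    app.connect("127.0.0.1", 7497, clientId=2)
    threading.Thread(target=app.run, daemon=True).start()
    app.finished.wait()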

Hope this helps,

Jürgen


 

I experimented some time ago with downloading daily data for 200 US stocks to see what worked best for me. I tried sequential versus parallel.
By sequential I mean that I submit a reqHistoricalData() for one ticker and wait until I have received the data and stored it in a CSV file. Then I do the same for the next ticker on the list.
I repeated the experiment with parallel downloads. By parallel I mean that I assign a request ID to each ticker and then send the reqHistoricalData() requests for all of them, taking the pacing limit of 50 messages per second into account. All received data was stored with its particular request ID and then matched with the ticker symbol. Again, all data was stored in CSV files, one file per ticker.
This parallel approach was a lot faster. Since then I don't do sequential data requests any more.
Be aware though that if you request too much data you might encounter a "soft limit" imposed by IBKR. As it is soft, there are no real criteria provided by them; when you exceed these soft limits you will encounter connectivity issues (IBKR will disconnect your app). So what I do, after having downloaded the initial historical data once, is download only the recent historical data each day and merge the new data with the existing data (roughly as sketched below). Getting one month of historical data for around 200 tickers takes around 15 seconds with this parallel approach. I don't know whether this is the most efficient solution, but it is good enough for me.
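A sketch of that daily merge step (assuming one CSV per ticker with a date column; pandas is just my choice here):

    import pandas as pd

    def merge_recent(path, new_bars):
        # new_bars: freshly downloaded rows with date/open/high/low/close/volume
        new = pd.DataFrame(new_bars)
        try:
            old = pd.read_csv(path)
            # deduplicate on the date column; freshly downloaded rows win
            merged = pd.concat([old, new]).drop_duplicates(subset="date",
                                                           keep="last")
        except FileNotFoundError:
            merged = new  # first download for this ticker
        merged.sort_values("date").to_csv(path, index=False)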


 

"I submit 50 every 1015ms.  A list of messages is managed by a thread that also maintains a list of timestamps for the last 50 submitted.  I haven't had a pacing problem in years."
Further down:
"My code is MVSC C++ and my api code uses the winsock2 send function, so I know exactly when the message is sent to TWS.  It's worked with several different TWS versions.  The 1015 ms value came through trial & error, but has been rock soild"
 
I am coming from TradeStation and thinking about converting my market scanner over to TWS. I had quite a number of workarounds I was using to get TS even half working. Does anyone have implementation detail (read: code) for managing the data requests via the TWS API?
Right now, my suspicion is that I require a subscription to a higher-pace data feed like ActiveTick, though TWS support has told me they are planning on changing request rates next year.


 

There is currently no subscription option that I am aware of that permits more than 50 API requests per second (cumulative for all clients that are connected to the same IBGW/TWS). There is no limit for the number of responses TWS/IBGW send back to the clients. On busy days, my mix of L2 market book, TickByTick, and other market data subscriptions easily results in 1,200 messages per second as the average for 24hr with peak periods of 10,000 per second and much higher.

And there is nothing you need to do for pacing. In the past, the connection option PACEAPI instructed TWS/IBGW to perform pacing itself (instead of rejecting requests with an error message), while today's TWS/IBGW versions do this by default (unless you change an option in the global API settings page).

Keep in mind that IBKR is a brokerage and not a market data provider. And there is a market scanner TWS API call that, if the rules fit your needs, offloads the processing to IBKR so that you do not need to manage a large number of data requests; see the sketch below.
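For reference, that call is reqScannerSubscription; a minimal sketch with the ibapi package (the scan parameters are placeholders):

    from ibapi.client import EClient
    from ibapi.wrapper import EWrapper
    from ibapi.scanner import ScannerSubscription

    class ScannerApp(EWrapper, EClient):
        def __init__(self):
            EClient.__init__(self, self)

        def nextValidId(self, orderId):
            sub = ScannerSubscription()
            sub.instrument = "STK"            # placeholder scan parameters
            sub.locationCode = "STK.US.MAJOR"
            sub.scanCode = "TOP_PERC_GAIN"
            self.reqScannerSubscription(7001, sub, [], [])

        def scannerData(self, reqId, rank, contractDetails, distance,
                        benchmark, projection, legsStr):
            print(rank, contractDetails.contract.symbol)

        def scannerDataEnd(self, reqId):
            self.cancelScannerSubscription(reqId)
            self.disconnect()

    app = ScannerApp()
    app.connect("127.0.0.1", 7497, clientId=3)
    app.run()  # blocks until scannerDataEnd disconnects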

Just a thought

Jürgen
