How to Optimize Concurrent getBlock Requests

Updated on Oct 22, 2025

Efficiently handling concurrent getBlock requests on Solana is essential for maintaining high performance and avoiding rate limiting, especially when processing large volumes of historical data or running indexing services. This guide outlines key strategies to improve throughput, reduce latency, and minimize HTTP 429 TOO_MANY_REQUESTS errors.

There are several ways to optimize your getBlock calls for stability and scalability. These include:

  • Choosing the right encoding
  • Optimizing compression with zstd
  • Concurrency control with retry logic
  • Advanced patterns for high throughput (caching, ingestion services, dedicated clusters)

Educational Purpose Only

The code examples in this guide are for learning and demonstration purposes only. They are designed to issue getBlock requests efficiently and do not demonstrate how to process or handle the underlying block data returned by those requests.

Understanding QuickNode's Infrastructure

When you make a request to your QuickNode endpoint, it goes through several steps:


  1. Your request is received by the closest instance in QuickNode's global load balancer network
  2. Your request is automatically routed to the nearest regional datacenter
  3. An RPC node in that datacenter processes your request
  4. The response travels back to your application

Depending on your RPC plan, the underlying nodes may be shared across multiple customers, so request patterns from one customer can affect performance for others. Understanding this shared infrastructure model helps explain the common performance issues developers encounter when fetching blocks at scale. Optimizing your requests both improves your application's performance and helps maintain infrastructure stability.

Optimizing getBlock Requests

The following optimizations address common performance issues when fetching blocks at scale. Start with encoding and compression to reduce payload size and processing load, then implement concurrency control to manage request throughput.

Choosing the Right Encoding

The Problem

The encoding you choose affects both performance and infrastructure load. The jsonParsed encoding requires server-side parsing of all transaction instructions before the response is sent, which is CPU-intensive and results in larger payloads.

The Solution

Use base64 encoding to shift parsing responsibility to your client infrastructure, reducing both payload size and server load.

Implementation:

# jsonParsed example
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "getBlock",
    "params": [
        slot,
        {
            "encoding": "jsonParsed",
            "transactionDetails": "full",
            "maxSupportedTransactionVersion": 0,
            "rewards": False
        }
    ]
}

# base64 example
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "getBlock",
    "params": [
        slot,
        {
            "encoding": "base64",
            "transactionDetails": "full",
            "maxSupportedTransactionVersion": 0,
            "rewards": False
        }
    ]
}

Trade-off: jsonParsed returns human-readable data that's immediately usable but slower and larger. base64 returns compact binary data that's faster and smaller but requires client-side parsing.
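
For illustration, here is a minimal sketch (standard library only, with an illustrative helper name) of how a client might pull the raw transaction bytes out of a base64-encoded getBlock response. Fully deserializing those bytes into instructions requires a Solana SDK and is outside the scope of this guide:

import base64

def extract_raw_transactions(block_response: dict) -> list:
    """Decode the base64-encoded transactions returned by getBlock."""
    block = block_response.get("result") or {}
    raw_transactions = []
    for tx in block.get("transactions", []):
        # With "encoding": "base64", each transaction is a [data, "base64"] pair
        encoded_data, _encoding = tx["transaction"]
        raw_transactions.append(base64.b64decode(encoded_data))
    return raw_transactions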

Optimizing Compression with zstd

The Problem

Each block response can be several megabytes, so with many concurrent requests the total bandwidth requirements multiply quickly. Python's aiohttp and Go's net/http request and decompress gzip-encoded responses automatically, but gzip decompression becomes a CPU bottleneck when many responses arrive concurrently.

The Solution

For high-throughput applications, use zstd compression, which decompresses faster than gzip, reducing CPU usage and improving response times.


note

In Go, manually setting the Accept-Encoding header disables automatic decompression, so you'll need to handle decompression yourself.

Implementation:

headers = {
    "Content-Type": "application/json",
    "Accept-Encoding": "zstd"
}
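
If your HTTP client does not decompress zstd automatically (as with Go in the note above), you can handle decompression yourself. Below is a minimal sketch that assumes the optional zstandard Python package is installed and that you have the raw response body plus its Content-Encoding header:

import json
import zstandard

def decode_rpc_body(raw_body: bytes, content_encoding: str) -> dict:
    """Manually decompress a zstd-encoded JSON-RPC response body."""
    if content_encoding == "zstd":
        # decompressobj() works even when the frame header omits the content size
        raw_body = zstandard.ZstdDecompressor().decompressobj().decompress(raw_body)
    return json.loads(raw_body)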

Concurrency Control with Retry Logic

The Problem

Launching too many simultaneous requests can overwhelm the endpoint, triggering rate limits and causing requests to fail. When you send hundreds of concurrent requests, the RPC node's rate limiter activates to protect infrastructure stability, resulting in 429 TOO_MANY_REQUESTS errors.

The Solution

With encoding and compression optimized to reduce payload sizes, the final step is managing request concurrency and handling failures. This requires two components: concurrency control to prevent overwhelming the endpoint and retry logic to handle transient errors like rate limits (429 TOO_MANY_REQUESTS error) and timeouts.

Concurrency & Rate Limiting (Dual Gating):

Two control mechanisms work together to manage requests:

  • Semaphore (Concurrency Control): Limits how many requests are in-flight simultaneously. For example, MAX_CONCURRENT = 50 ensures no more than 50 requests run at the same time.

  • Token Bucket (Rate Limiting): Limits how many requests can start per second. For example, REQUESTS_PER_SECOND = 50 ensures no more than 50 requests begin each second.

Together, the semaphore prevents too many concurrent connections while the rate limiter prevents exceeding requests-per-second limits.

Implementation:

Here's a complete example incorporating the optimizations above (base64 encoding, dual gating, retry logic). You can also add the zstd Accept-Encoding header from the previous section to HEADERS if your HTTP client can decompress zstd responses:


import asyncio
import aiohttp
from typing import Optional
import time
import os

URL = os.getenv("YOUR_QUICKNODE_ENDPOINT") or None

# Configuration
MAX_CONCURRENT_REQUESTS = 50
REQUESTS_PER_SECOND = 50  # Adjust based on your QuickNode plan
MAX_RETRIES = 3
INITIAL_BACKOFF_SECONDS = 1.0
REQUEST_TIMEOUT_SECONDS = 60
TOTAL_BLOCKS_TO_FETCH = 300

HEADERS = {
    "Content-Type": "application/json"
}

class RateLimiter:
    """
    Token bucket rate limiter implementation

    Maintains a fixed-capacity bucket of tokens that refills at a constant rate.
    Each request consumes one token. Requests block when no tokens are available.
    """
    def __init__(self, rate: int):
        self.rate = rate                     # maximum requests per second
        self.tokens = rate                   # initialize with full capacity
        self.last_update = time.monotonic()  # timestamp of last token refill
        self.lock = asyncio.Lock()           # ensure thread-safe token updates

    async def acquire(self):
        """Acquire a token, blocking if none are available"""
        async with self.lock:  # serialize access to shared state
            now = time.monotonic()

            # Refill tokens based on elapsed time: tokens_to_add = elapsed_time * rate
            time_passed = now - self.last_update
            self.tokens = min(self.rate, self.tokens + time_passed * self.rate)
            self.last_update = now

            if self.tokens < 1:
                # No tokens available - calculate wait time and sleep
                sleep_time = (1 - self.tokens) / self.rate
                await asyncio.sleep(sleep_time)
                self.tokens = 0
            else:
                # Consume one token and proceed
                self.tokens -= 1

async def get_block_with_retry(
    session: aiohttp.ClientSession,
    slot: int,
    semaphore: asyncio.Semaphore,
    rate_limiter: RateLimiter,
    request_id: int
) -> Optional[dict]:

    # Gate 1: Semaphore limits concurrent connections
    # If MAX_CONCURRENT_REQUESTS are already running, wait here
    async with semaphore:
        # Retry loop
        for attempt in range(MAX_RETRIES):
            # Gate 2: Rate limiter
            await rate_limiter.acquire()

            payload = {
                "jsonrpc": "2.0",
                "id": request_id,
                "method": "getBlock",
                "params": [
                    slot,
                    {
                        "encoding": "base64",
                        "maxSupportedTransactionVersion": 0,
                        "transactionDetails": "full",
                        "rewards": False
                    }
                ]
            }

            try:
                async with session.post(URL, json=payload, timeout=REQUEST_TIMEOUT_SECONDS) as response:
                    if response.status == 200:
                        return await response.json()

                    elif response.status == 429:
                        # Rate limited - exponential backoff
                        backoff = INITIAL_BACKOFF_SECONDS * (2 ** attempt)
                        print(f"⏳ Request {request_id}: Rate limited, backing off {backoff:.1f}s")
                        await asyncio.sleep(backoff)
                        continue

                    else:
                        # Other HTTP error
                        print(f"❌ Request {request_id}: HTTP {response.status}")
                        return None

            except asyncio.TimeoutError:
                # Timeout
                if attempt < MAX_RETRIES - 1:
                    backoff = INITIAL_BACKOFF_SECONDS * (2 ** attempt)
                    print(f"⏳ Request {request_id}: Timeout, retrying in {backoff:.1f}s")
                    await asyncio.sleep(backoff)
                else:
                    print(f"❌ Request {request_id}: Failed after {MAX_RETRIES} attempts")
                    return None

            except Exception as e:
                # Unexpected error
                print(f"❌ Request {request_id}: Error - {e}")
                return None

        # All retries exhausted
        return None

async def fetch_blocks_with_rate_limiting(slot: int, num_requests: int = 200):
    """
    Fetch blocks with rate limiting and retry logic
    """
    print(f"🚀 Fetching {num_requests} blocks with rate limiting")
    print(f"⚙️ Max concurrent: {MAX_CONCURRENT_REQUESTS}")
    print(f"⚙️ Rate limit: {REQUESTS_PER_SECOND} req/s")
    print("="*60)

    # Create concurrency control (semaphore) and rate limiter
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
    rate_limiter = RateLimiter(REQUESTS_PER_SECOND)

    # Track start time for performance measurement
    start_time = time.perf_counter()

    # Create HTTP session for all requests
    async with aiohttp.ClientSession(headers=HEADERS) as session:
        # Create all tasks (they'll be gated by semaphore + rate limiter)
        tasks = [
            get_block_with_retry(session, slot, semaphore, rate_limiter, i)
            for i in range(num_requests)
        ]
        # Execute all tasks concurrently with automatic retry
        results = await asyncio.gather(*tasks)

    # Calculate performance metrics
    elapsed = time.perf_counter() - start_time
    successful = len([r for r in results if r is not None])

    # Display results
    print("="*60)
    print(f"✅ Completed: {successful}/{num_requests} blocks")
    print(f"⏱️ Time: {elapsed:.1f}s")
    print(f"📊 Actual rate: {successful/elapsed:.1f} req/s")

if __name__ == "__main__":

    async def main():
        # Step 1: Get the current slot number from Solana
        async with aiohttp.ClientSession(headers=HEADERS) as session:
            slot_response = await session.post(URL, json={
                "jsonrpc": "2.0",
                "id": 1,
                "method": "getSlot"
            })
            slot = (await slot_response.json())['result']
            print(f"📍 Current slot: {slot}\n")

        # Step 2: Fetch blocks with rate limiting + retry logic
        await fetch_blocks_with_rate_limiting(slot, num_requests=TOTAL_BLOCKS_TO_FETCH)

    # Start the async event loop
    asyncio.run(main())

How this works:

The code combines multiple optimization layers:

  1. Semaphore limits concurrent connections (50 requests in-flight simultaneously)
  2. Rate limiter controls request start rate (50 requests can start per second)
  3. Retry logic handles 429 TOO_MANY_REQUESTS errors and timeouts with exponential backoff (1s → 2s → 4s)
  4. Base64 encoding reduces server load and payload size
  5. zstd compression (optional; add the Accept-Encoding header shown earlier) provides faster decompression than gzip

Together, these optimizations maximize throughput while staying within rate limits.

Advanced Patterns for High Throughput

For applications with extremely high throughput requirements or specialized access patterns, consider these architectural patterns beyond request-level optimizations:

Client-Side Caching: If your application frequently accesses the same blocks, cache them in memory or Redis to avoid redundant requests. Historical blocks are immutable once finalized, making them ideal caching candidates. Implement cache invalidation strategies to maintain data freshness.
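
As a sketch, a minimal in-memory cache keyed by slot might look like the following; fetch_block stands in for any coroutine that performs the actual getBlock request (for example, get_block_with_retry above):

# Minimal in-memory cache sketch; finalized blocks never change, so they can be cached indefinitely
_block_cache = {}

async def get_block_cached(slot: int, fetch_block) -> dict:
    """Return a cached block if available; otherwise fetch it once and cache it."""
    if slot not in _block_cache:
        _block_cache[slot] = await fetch_block(slot)
    return _block_cache[slot]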

Block Ingestion Service: For applications that need to query across many blocks or perform complex data aggregation, build a dedicated ingestion service. This service pulls blocks from QuickNode and stores parsed data in a database (PostgreSQL, TimescaleDB) optimized for your query patterns. Your applications then query the database instead of making RPC calls for each request, enabling efficient filtering and analysis across multiple blocks.
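
To illustrate the write path, here is a simplified sketch that uses SQLite (standard library) as a stand-in for whichever database you choose; in production you would parse the base64 transaction data and store the fields your queries need:

import sqlite3

conn = sqlite3.connect("blocks.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS blocks (slot INTEGER PRIMARY KEY, blockhash TEXT, tx_count INTEGER)"
)

def store_block(slot: int, block: dict) -> None:
    """Persist summary data for a block so queries hit the database instead of the RPC."""
    conn.execute(
        "INSERT OR REPLACE INTO blocks (slot, blockhash, tx_count) VALUES (?, ?, ?)",
        (slot, block["blockhash"], len(block.get("transactions", []))),
    )
    conn.commit()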

Dedicated Clusters: For mission-critical applications with very high volumes, consider upgrading to a dedicated cluster. Dedicated clusters eliminate resource competition from shared infrastructure, providing more consistent performance during high traffic periods. While more expensive than shared plans, they offer predictable performance, exclusive resources, higher rate limits, and performance SLAs.
