Efficiently handling concurrent getBlock requests on Solana is essential for maintaining high performance and avoiding rate limiting, especially when processing large volumes of historical data or running indexing services. This guide outlines key strategies to improve throughput, reduce latency, and minimize HTTP 429 TOO_MANY_REQUESTS errors.
There are several ways to optimize your getBlock calls for stability and scalability. These include:
- Preferring base64 encoding over jsonParsed for production systems
- Enabling zstd compression to reduce response size and improve performance
- Setting appropriate concurrency limits and implementing retry logic with exponential backoff
The code examples in this guide are for learning and demonstration purposes only. They focus on issuing getBlock queries efficiently and do not demonstrate how to process the underlying block data returned by these requests.
Understanding QuickNode's Infrastructure
When you make a request to your QuickNode endpoint, it goes through several steps:
- Your request is received by the closest instance in QuickNode's global load balancer network
- Your request is automatically routed to the nearest regional datacenter
- An RPC node in that datacenter processes your request
- The response travels back to your application
Depending on your RPC plan, the underlying nodes may be shared across multiple customers, so request patterns from one customer can affect performance for others. Understanding this shared infrastructure model helps explain the common performance issues developers encounter when fetching blocks at scale. Optimizing your requests both improves your application's performance and helps maintain infrastructure stability.
Optimizing getBlock Requests
The following optimizations address common performance issues when fetching blocks at scale. Start with encoding and compression to reduce payload size and processing load, then implement concurrency control to manage request throughput.
Choosing the Right Encoding
The Problem
The encoding you choose affects both performance and infrastructure load. The jsonParsed encoding requires server-side parsing of all transaction instructions before the response is sent, which is CPU-intensive and results in larger payloads.
The Solution
Use base64 encoding to shift parsing responsibility to your client infrastructure, reducing both payload size and server load.
Implementation:
- Python
- Go
# jsonParsed example
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "getBlock",
    "params": [
        slot,
        {
            "encoding": "jsonParsed",
            "transactionDetails": "full",
            "maxSupportedTransactionVersion": 0,
            "rewards": False
        }
    ]
}

# base64 example
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "getBlock",
    "params": [
        slot,
        {
            "encoding": "base64",
            "transactionDetails": "full",
            "maxSupportedTransactionVersion": 0,
            "rewards": False
        }
    ]
}
// jsonParsed example
params := map[string]interface{}{
    "encoding":                       "jsonParsed",
    "transactionDetails":             "full",
    "maxSupportedTransactionVersion": 0,
    "rewards":                        false,
}

// base64 example
params := map[string]interface{}{
    "encoding":                       "base64",
    "transactionDetails":             "full",
    "maxSupportedTransactionVersion": 0,
    "rewards":                        false,
}
Trade-off: jsonParsed returns human-readable data that's immediately usable but slower and larger. base64 returns compact binary data that's faster and smaller but requires client-side parsing.
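To illustrate the client-side work that base64 shifts to your application, here's a minimal Go sketch. It assumes the standard getBlock response shape, where each entry in result.transactions carries its transaction as a [data, encoding] pair; the placeholder bytes and the final deserialization step are illustrative only:
package main

import (
    "encoding/base64"
    "fmt"
)

func main() {
    // In a real response this comes from result["transactions"][i]["transaction"]
    txField := []interface{}{"AAECAwQ=", "base64"} // placeholder payload for illustration

    encoded, ok := txField[0].(string)
    if !ok {
        fmt.Println("unexpected transaction format")
        return
    }

    // Decode the base64 string into raw wire-format transaction bytes
    raw, err := base64.StdEncoding.DecodeString(encoded)
    if err != nil {
        fmt.Println("decode error:", err)
        return
    }

    // From here, deserialize the bytes with a Solana SDK of your choice
    fmt.Printf("decoded %d transaction bytes\n", len(raw))
}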
Optimizing Compression with zstd
The Problem
Each block response can be several megabytes, so when running many concurrent requests, the total bandwidth requirements multiply quickly. Python's aiohttp and Go's net/http enable gzip compression by default, but gzip decompression can become a CPU bottleneck when many responses arrive concurrently.
The Solution
For high-throughput applications, use zstd compression, which decompresses faster than gzip, reducing CPU usage and improving response times.
Note that in Go, manually setting the Accept-Encoding header disables the HTTP client's automatic decompression, so you'll need to handle decompression yourself (see the sketch after the implementation snippets below).
Implementation:
- Python
- Go
headers = {
    "Content-Type": "application/json",
    "Accept-Encoding": "zstd"
}
import "github.com/klauspost/compress/zstd"
req.Header.Set("Content-Type", "application/json")
req.Header.Set("Accept-Encoding", "zstd")
Concurrency Control with Retry Logic
The Problem
Launching too many simultaneous requests can overwhelm the endpoint, triggering rate limits and causing requests to fail. When you send hundreds of concurrent requests, the RPC node's rate limiter activates to protect infrastructure stability, resulting in 429 TOO_MANY_REQUESTS errors.
The Solution
With encoding and compression optimized to reduce payload sizes, the final step is managing request concurrency and handling failures. This requires two components: concurrency control to prevent overwhelming the endpoint, and retry logic to handle transient errors like rate limits (429 TOO_MANY_REQUESTS errors) and timeouts.
Concurrency & Rate Limiting (Dual Gating):
Two control mechanisms work together to manage requests:
- Semaphore (Concurrency Control): Limits how many requests are in-flight simultaneously. For example, MAX_CONCURRENT_REQUESTS = 50 ensures no more than 50 requests run at the same time.
- Token Bucket (Rate Limiting): Limits how many requests can start per second. For example, REQUESTS_PER_SECOND = 50 ensures no more than 50 requests begin each second.
Together, the semaphore prevents too many concurrent connections while the rate limiter prevents exceeding requests-per-second limits.
Implementation:
Here's a complete example combining the optimizations above: base64 encoding, dual gating (semaphore + token bucket rate limiter), and retry logic with exponential backoff. Both clients rely on their HTTP library's default compression handling; add the zstd header from the previous section if your client can decompress zstd responses.
- Python
- Go
import asyncio
import aiohttp
from typing import Optional
import time
import os

URL = os.getenv("YOUR_QUICKNODE_ENDPOINT") or None

# Configuration
MAX_CONCURRENT_REQUESTS = 50
REQUESTS_PER_SECOND = 50  # Adjust based on your QuickNode plan
MAX_RETRIES = 3
INITIAL_BACKOFF_SECONDS = 1.0
REQUEST_TIMEOUT_SECONDS = 60
TOTAL_BLOCKS_TO_FETCH = 300

HEADERS = {
    "Content-Type": "application/json"
}


class RateLimiter:
    """
    Token bucket rate limiter implementation
    Maintains a fixed-capacity bucket of tokens that refills at a constant rate.
    Each request consumes one token. Requests block when no tokens are available.
    """
    def __init__(self, rate: int):
        self.rate = rate                     # maximum requests per second
        self.tokens = rate                   # initialize with full capacity
        self.last_update = time.monotonic()  # timestamp of last token refill
        self.lock = asyncio.Lock()           # ensure thread-safe token updates

    async def acquire(self):
        """Acquire a token, blocking if none are available"""
        async with self.lock:  # serialize access to shared state
            now = time.monotonic()
            # Refill tokens based on elapsed time: tokens_to_add = elapsed_time * rate
            time_passed = now - self.last_update
            self.tokens = min(self.rate, self.tokens + time_passed * self.rate)
            self.last_update = now
            if self.tokens < 1:
                # No tokens available - calculate wait time and sleep
                sleep_time = (1 - self.tokens) / self.rate
                await asyncio.sleep(sleep_time)
                self.tokens = 0
            else:
                # Consume one token and proceed
                self.tokens -= 1


async def get_block_with_retry(
    session: aiohttp.ClientSession,
    slot: int,
    semaphore: asyncio.Semaphore,
    rate_limiter: RateLimiter,
    request_id: int
) -> Optional[dict]:
    # Gate 1: Semaphore limits concurrent connections
    # If MAX_CONCURRENT_REQUESTS are already running, wait here
    async with semaphore:
        # Retry loop
        for attempt in range(MAX_RETRIES):
            # Gate 2: Rate limiter
            await rate_limiter.acquire()

            payload = {
                "jsonrpc": "2.0",
                "id": request_id,
                "method": "getBlock",
                "params": [
                    slot,
                    {
                        "encoding": "base64",
                        "maxSupportedTransactionVersion": 0,
                        "transactionDetails": "full",
                        "rewards": False
                    }
                ]
            }

            try:
                async with session.post(URL, json=payload, timeout=REQUEST_TIMEOUT_SECONDS) as response:
                    if response.status == 200:
                        return await response.json()
                    elif response.status == 429:
                        # Rate limited - exponential backoff
                        backoff = INITIAL_BACKOFF_SECONDS * (2 ** attempt)
                        print(f"⏳ Request {request_id}: Rate limited, backing off {backoff:.1f}s")
                        await asyncio.sleep(backoff)
                        continue
                    else:
                        # Other HTTP error
                        print(f"❌ Request {request_id}: HTTP {response.status}")
                        return None
            except asyncio.TimeoutError:
                # Timeout
                if attempt < MAX_RETRIES - 1:
                    backoff = INITIAL_BACKOFF_SECONDS * (2 ** attempt)
                    print(f"⏳ Request {request_id}: Timeout, retrying in {backoff:.1f}s")
                    await asyncio.sleep(backoff)
                else:
                    print(f"❌ Request {request_id}: Failed after {MAX_RETRIES} attempts")
                    return None
            except Exception as e:
                # Unexpected error
                print(f"❌ Request {request_id}: Error - {e}")
                return None

        # All retries exhausted
        return None


async def fetch_blocks_with_rate_limiting(slot: int, num_requests: int = 200):
    """
    Fetch blocks with rate limiting and retry logic
    """
    print(f"🚀 Fetching {num_requests} blocks with rate limiting")
    print(f"⚙️ Max concurrent: {MAX_CONCURRENT_REQUESTS}")
    print(f"⚙️ Rate limit: {REQUESTS_PER_SECOND} req/s")
    print("="*60)

    # Create concurrency control (semaphore) and rate limiter
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)
    rate_limiter = RateLimiter(REQUESTS_PER_SECOND)

    # Track start time for performance measurement
    start_time = time.perf_counter()

    # Create HTTP session for all requests
    async with aiohttp.ClientSession(headers=HEADERS) as session:
        # Create all tasks (they'll be gated by semaphore + rate limiter)
        tasks = [
            get_block_with_retry(session, slot, semaphore, rate_limiter, i)
            for i in range(num_requests)
        ]
        # Execute all tasks concurrently with automatic retry
        results = await asyncio.gather(*tasks)

    # Calculate performance metrics
    elapsed = time.perf_counter() - start_time
    successful = len([r for r in results if r is not None])

    # Display results
    print("="*60)
    print(f"✅ Completed: {successful}/{num_requests} blocks")
    print(f"⏱️ Time: {elapsed:.1f}s")
    print(f"📊 Actual rate: {successful/elapsed:.1f} req/s")


if __name__ == "__main__":
    async def main():
        # Step 1: Get the current slot number from Solana
        async with aiohttp.ClientSession(headers=HEADERS) as session:
            slot_response = await session.post(URL, json={
                "jsonrpc": "2.0",
                "id": 1,
                "method": "getSlot"
            })
            slot = (await slot_response.json())['result']
            print(f"📍 Current slot: {slot}\n")

        # Step 2: Fetch blocks with rate limiting + retry logic
        await fetch_blocks_with_rate_limiting(slot, num_requests=TOTAL_BLOCKS_TO_FETCH)

    # Start the async event loop
    asyncio.run(main())
package main
import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"math"
"net/http"
"strings"
"sync"
"time"
"os"
)
var URL, exists = os.LookupEnv("YOUR_QUICKNODE_ENDPOINT")
// Configuration
const (
MAX_CONCURRENT_REQUESTS = 50
REQUESTS_PER_SECOND = 50 // Adjust based on your QuickNode plan
MAX_RETRIES = 3
INITIAL_BACKOFF_SECONDS = 1.0
REQUEST_TIMEOUT_SECONDS = 60
TOTAL_BLOCKS_TO_FETCH = 300
)
// RPCRequest represents a JSON-RPC request
type RPCRequest struct {
JSONRPC string `json:"jsonrpc"`
ID int `json:"id"`
Method string `json:"method"`
Params []interface{} `json:"params"`
}
// RPCResponse represents a JSON-RPC response
type RPCResponse struct {
JSONRPC string `json:"jsonrpc"`
Result json.RawMessage `json:"result"`
Error *RPCError `json:"error,omitempty"`
ID int `json:"id"`
}
// RPCError represents a JSON-RPC error
type RPCError struct {
Code int `json:"code"`
Message string `json:"message"`
}
// BlockParams represents getBlock parameters
type BlockParams struct {
Encoding string `json:"encoding"`
MaxSupportedTransactionVersion int `json:"maxSupportedTransactionVersion"`
TransactionDetails string `json:"transactionDetails"`
Rewards bool `json:"rewards"`
}
// RateLimiter implements a token bucket rate limiter
// Maintains a fixed-capacity bucket of tokens that refills at a constant rate.
// Each request consumes one token. Requests block when no tokens are available.
type RateLimiter struct {
rate float64 // maximum requests per second
tokens float64 // current number of available tokens
lastUpdate time.Time // timestamp of last token refill
mu sync.Mutex // ensure thread-safe token updates
}
// NewRateLimiter creates a new rate limiter
func NewRateLimiter(rate int) *RateLimiter {
return &RateLimiter{
rate: float64(rate),
tokens: float64(rate), // initialize with full capacity
lastUpdate: time.Now(),
}
}
// Acquire waits until a token is available, blocking if none are available
func (rl *RateLimiter) Acquire() {
rl.mu.Lock() // serialize access to shared state
defer rl.mu.Unlock() // release lock when function returns
now := time.Now()
// Refill tokens based on elapsed time: tokens_to_add = elapsed_time * rate
timePassed := now.Sub(rl.lastUpdate).Seconds()
rl.tokens = math.Min(rl.rate, rl.tokens+timePassed*rl.rate)
rl.lastUpdate = now
if rl.tokens < 1 {
// No tokens available - calculate wait time and sleep
sleepTime := (1 - rl.tokens) / rl.rate
time.Sleep(time.Duration(sleepTime * float64(time.Second)))
rl.tokens = 0
} else {
// Consume one token and proceed
rl.tokens -= 1
}
}
func getBlockWithRetry(client *http.Client, slot int, semaphore chan struct{}, rateLimiter *RateLimiter, requestID int) (map[string]interface{}, error) {
// Gate 1: Semaphore limits concurrent connections
// If MAX_CONCURRENT_REQUESTS are already running, wait here
semaphore <- struct{}{}
defer func() { <-semaphore }()
// Debug: Show when request acquires semaphore
if requestID%10 == 0 { // Only print every 10th to avoid spam
fmt.Printf("🔵 Request %d: Acquired semaphore slot\n", requestID)
}
// Retry loop: Try up to MAX_RETRIES times
for attempt := 0; attempt < MAX_RETRIES; attempt++ {
// Gate 2: Rate limiter ensures we don't exceed REQUESTS_PER_SECOND
// This distributes requests smoothly over time
rateLimiter.Acquire()
// Debug: Show when request gets rate limit token
if requestID%10 == 0 {
fmt.Printf("🟢 Request %d: Got rate limit token (attempt %d)\n", requestID, attempt+1)
}
// Build the JSON-RPC request payload
params := BlockParams{
Encoding: "base64",
MaxSupportedTransactionVersion: 0,
TransactionDetails: "full",
Rewards: false,
}
rpcReq := RPCRequest{
JSONRPC: "2.0",
ID: requestID,
Method: "getBlock",
Params: []interface{}{slot, params},
}
payloadBytes, err := json.Marshal(rpcReq)
if err != nil {
return nil, err
}
// Create request with timeout context
// Context will automatically cancel the request after REQUEST_TIMEOUT_SECONDS
ctx, cancel := context.WithTimeout(context.Background(), REQUEST_TIMEOUT_SECONDS*time.Second)
defer cancel() // ✅ Cancel when function returns (cleanup)
// Create HTTP request with the timeout context attached
req, err := http.NewRequestWithContext(ctx, "POST", URL, bytes.NewBuffer(payloadBytes))
if err != nil {
return nil, err
}
req.Header.Set("Content-Type", "application/json")
// Make the HTTP POST request
resp, err := client.Do(req)
if err != nil {
// ⏰ Timeout or network error
if attempt < MAX_RETRIES-1 {
// Not the last attempt - retry with exponential backoff
// Attempt 0: wait 1s, Attempt 1: wait 2s, Attempt 2: wait 4s
backoff := INITIAL_BACKOFF_SECONDS * math.Pow(2, float64(attempt))
fmt.Printf("⏳ Request %d: Error '%v', retrying in %.1fs (attempt %d/%d)\n",
requestID, err, backoff, attempt+1, MAX_RETRIES)
time.Sleep(time.Duration(backoff * float64(time.Second)))
continue // Try again
} else {
// Last attempt failed - give up
fmt.Printf("❌ Request %d: Failed after %d attempts - Error: %v\n",
requestID, MAX_RETRIES, err)
return nil, err
}
}
defer resp.Body.Close() // Always close response body
if resp.StatusCode == http.StatusOK {
// ✅ Success - parse and return the block data
body, err := io.ReadAll(resp.Body)
if err != nil {
return nil, err
}
var rpcResp RPCResponse
if err := json.Unmarshal(body, &rpcResp); err != nil {
return nil, err
}
var result map[string]interface{}
if err := json.Unmarshal(rpcResp.Result, &result); err != nil {
return nil, err
}
// Debug: Show successful completion
if requestID%10 == 0 {
fmt.Printf("✅ Request %d: Completed successfully\n", requestID)
}
return result, nil
} else if resp.StatusCode == http.StatusTooManyRequests {
// ⚠️ Rate limited - back off exponentially
// Even with rate limiting, we can still get 429 `TOO_MANY_REQUESTS` error due to network delays
backoff := INITIAL_BACKOFF_SECONDS * math.Pow(2, float64(attempt))
fmt.Printf("⏳ Request %d: Rate limited, backing off %.1fs\n", requestID, backoff)
time.Sleep(time.Duration(backoff * float64(time.Second)))
continue // Try again
} else {
// ❌ Other HTTP error - don't retry
fmt.Printf("❌ Request %d: HTTP %d\n", requestID, resp.StatusCode)
return nil, fmt.Errorf("HTTP error: %d", resp.StatusCode)
}
}
// All retry attempts exhausted
return nil, fmt.Errorf("max retries exceeded")
}
// fetchBlocksWithRateLimiting fetches blocks with advanced rate limiting and retry logic
func fetchBlocksWithRateLimiting(slot int, numRequests int) {
fmt.Println(strings.Repeat("=", 60))
fmt.Printf("🚀 Fetching %d blocks with rate limiting\n", numRequests)
fmt.Printf("⚙️ Max concurrent: %d\n", MAX_CONCURRENT_REQUESTS)
fmt.Printf("⚙️ Rate limit: %d req/s\n", REQUESTS_PER_SECOND)
fmt.Printf("⚙️ Request timeout: %d seconds\n", REQUEST_TIMEOUT_SECONDS)
fmt.Printf("⚙️ Max retries: %d\n", MAX_RETRIES)
fmt.Println(strings.Repeat("=", 60))
// Create concurrency control (semaphore) and rate limiter
semaphore := make(chan struct{}, MAX_CONCURRENT_REQUESTS)
rateLimiter := NewRateLimiter(REQUESTS_PER_SECOND)
// Create HTTP client with connection pooling
client := &http.Client{
Timeout: REQUEST_TIMEOUT_SECONDS * time.Second,
Transport: &http.Transport{
MaxIdleConns: MAX_CONCURRENT_REQUESTS,
MaxIdleConnsPerHost: MAX_CONCURRENT_REQUESTS,
},
}
// Track start time for performance measurement
startTime := time.Now()
// WaitGroup to wait for all goroutines
var wg sync.WaitGroup
// Slice to store results
results := make([]map[string]interface{}, numRequests)
var mu sync.Mutex // Protect results slice
// Create all goroutines (they'll be gated by semaphore + rate limiter)
for i := 0; i < numRequests; i++ {
wg.Add(1)
go func(requestID int) {
defer wg.Done()
// Fetch block with automatic retry on failure
result, err := getBlockWithRetry(client, slot, semaphore, rateLimiter, requestID)
if err == nil {
mu.Lock()
results[requestID] = result
mu.Unlock()
}
}(i)
}
// Wait for all goroutines to complete
wg.Wait()
elapsed := time.Since(startTime).Seconds()
successful := 0
for _, r := range results {
if r != nil {
successful++
}
}
fmt.Println(strings.Repeat("=", 60))
fmt.Printf("✅ Completed: %d/%d blocks\n", successful, numRequests)
fmt.Printf("⏱️ Time: %.1fs\n", elapsed)
fmt.Printf("📊 Actual rate: %.1f req/s\n", float64(successful)/elapsed)
fmt.Printf("📊 Success rate: %.1f%%\n", float64(successful)/float64(numRequests)*100)
if successful == numRequests {
fmt.Println("\n💡 All requests succeeded!")
fmt.Println(" You can now increase TOTAL_BLOCKS_TO_FETCH in the configuration")
} else if successful < numRequests {
failed := numRequests - successful
fmt.Printf("\n⚠️ %d requests failed\n", failed)
fmt.Println(" Consider:")
fmt.Println(" - Increasing REQUEST_TIMEOUT_SECONDS")
fmt.Println(" - Decreasing MAX_CONCURRENT_REQUESTS")
fmt.Println(" - Checking your network connection")
}
}
// getCurrentSlot fetches the current slot number
func getCurrentSlot(client *http.Client) (int, error) {
rpcReq := RPCRequest{
JSONRPC: "2.0",
ID: 1,
Method: "getSlot",
Params: []interface{}{},
}
payloadBytes, err := json.Marshal(rpcReq)
if err != nil {
return 0, err
}
req, err := http.NewRequest("POST", URL, bytes.NewBuffer(payloadBytes))
if err != nil {
return 0, err
}
req.Header.Set("Content-Type", "application/json")
resp, err := client.Do(req)
if err != nil {
return 0, err
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
if err != nil {
return 0, err
}
var rpcResp RPCResponse
if err := json.Unmarshal(body, &rpcResp); err != nil {
return 0, err
}
var slot int
if err := json.Unmarshal(rpcResp.Result, &slot); err != nil {
return 0, err
}
return slot, nil
}
func main() {
// Create HTTP client for getting the current slot
client := &http.Client{
Timeout: REQUEST_TIMEOUT_SECONDS * time.Second,
}
// Step 1: Get the current slot number from Solana
slot, err := getCurrentSlot(client)
if err != nil {
fmt.Printf("❌ Error getting slot: %v\n", err)
return
}
fmt.Printf("📍 Current slot: %d\n\n", slot)
// Step 2: Fetch blocks with rate limiting + retry logic
// This demonstrates dual gating (semaphore + rate limiter) with exponential backoff
fetchBlocksWithRateLimiting(slot, TOTAL_BLOCKS_TO_FETCH)
}
How this works:
The code combines multiple optimization layers:
- Semaphore limits concurrent connections (no more than 50 requests in-flight simultaneously)
- Rate limiter controls the request start rate (no more than 50 requests begin each second)
- Retry logic handles 429 TOO_MANY_REQUESTS errors and timeouts with exponential backoff (1s → 2s → 4s)
- Base64 encoding reduces server load and payload size
- zstd compression, when enabled as shown earlier, decompresses faster than gzip
Together, these optimizations maximize throughput while staying within rate limits.
Advanced Patterns for High Throughput
For applications with extremely high throughput requirements or specialized access patterns, consider these architectural patterns beyond request-level optimizations:
Client-Side Caching: If your application frequently accesses the same blocks, cache them in memory or Redis to avoid redundant requests. Historical blocks are immutable once finalized, making them ideal caching candidates. Implement cache invalidation strategies to maintain data freshness.
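As a starting point, here's a minimal in-memory cache sketch in Go that builds on the full example above (it reuses getBlockWithRetry and RateLimiter from that example; a Redis-backed variant or an eviction policy is left to your own design):
// BlockCache is a simple in-memory cache keyed by slot.
// Finalized blocks are immutable, so cached entries never go stale.
type BlockCache struct {
    mu     sync.RWMutex
    blocks map[int]map[string]interface{}
}

func NewBlockCache() *BlockCache {
    return &BlockCache{blocks: make(map[int]map[string]interface{})}
}

func (c *BlockCache) Get(slot int) (map[string]interface{}, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    block, ok := c.blocks[slot]
    return block, ok
}

func (c *BlockCache) Put(slot int, block map[string]interface{}) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.blocks[slot] = block
}

// getBlockCached checks the cache before falling back to an RPC request.
func getBlockCached(cache *BlockCache, client *http.Client, slot int, semaphore chan struct{}, rateLimiter *RateLimiter, requestID int) (map[string]interface{}, error) {
    if block, ok := cache.Get(slot); ok {
        return block, nil // cache hit - no RPC request needed
    }
    block, err := getBlockWithRetry(client, slot, semaphore, rateLimiter, requestID)
    if err == nil {
        cache.Put(slot, block)
    }
    return block, err
}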
Block Ingestion Service: For applications that need to query across many blocks or perform complex data aggregation, build a dedicated ingestion service. This service pulls blocks from QuickNode and stores parsed data in a database (PostgreSQL, TimescaleDB) optimized for your query patterns. Your applications then query the database instead of making RPC calls for each request, enabling efficient filtering and analysis across multiple blocks.
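A minimal Go sketch of the write side of such a service is shown below. It assumes PostgreSQL and an illustrative table such as CREATE TABLE blocks (slot BIGINT PRIMARY KEY, blockhash TEXT, block_time BIGINT, tx_count INT); the table name, columns, and connection string are assumptions for this example, not part of any existing schema:
import (
    "database/sql"

    _ "github.com/lib/pq" // PostgreSQL driver
)

// openDB connects to the database used by the ingestion service.
func openDB(dsn string) (*sql.DB, error) {
    db, err := sql.Open("postgres", dsn)
    if err != nil {
        return nil, err
    }
    return db, db.Ping()
}

// storeBlock persists a parsed block summary; re-ingesting a slot is a no-op.
func storeBlock(db *sql.DB, slot int64, blockhash string, blockTime int64, txCount int) error {
    _, err := db.Exec(
        `INSERT INTO blocks (slot, blockhash, block_time, tx_count)
         VALUES ($1, $2, $3, $4)
         ON CONFLICT (slot) DO NOTHING`,
        slot, blockhash, blockTime, txCount,
    )
    return err
}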
Dedicated Clusters: For mission-critical applications with very high volumes, consider upgrading to a dedicated cluster. Dedicated clusters eliminate resource competition from shared infrastructure, providing more consistent performance during high traffic periods. While more expensive than shared plans, they offer predictable performance, exclusive resources, higher rate limits, and performance SLAs.