Question 1

What is blockchain indexing?

Accepted Answer

TL;DR: Blockchain indexing is the process of extracting raw data from a blockchain, transforming it into a structured format, and storing it in a database optimized for fast queries. Blockchains are designed for security and immutability, not for searching. Without indexing, answering even simple questions like "show me all transfers from this wallet" requires scanning every block from genesis, which is impractical at scale. Indexers solve this by creating queryable databases that make onchain data accessible to applications in real time.

The Simple Explanation

Blockchains are append-only ledgers. They are excellent at recording transactions in a tamper-proof sequence, but they are terrible at answering questions about those transactions. There is no built-in search function and no SQL. No standard RPC method answers a question like "find all ERC-20 transfers involving this address in the last 30 days" in a single call.

The raw tools available through RPC endpoints are primitive by design. You can fetch a specific block by number. You can fetch a specific transaction by hash. You can get an account balance at a specific block. The closest thing to search is eth_getLogs, which filters event logs by contract address and topics over a block range, but providers typically cap that range per request, and it cannot find plain transactions, aggregate data, or join information from different parts of the chain.

To find all transactions involving a specific wallet, you would need to fetch every block ever produced, extract every transaction from each block, check whether the sender or recipient matches your target address, and collect the matches. On Ethereum, that means scanning over 20 million blocks. On Solana, it means scanning hundreds of millions of slots. Without indexing, this is not merely slow; it is impossible in any reasonable timeframe.

Blockchain indexing solves this by running a process that continuously reads new blocks from a node, decodes the raw data (transactions, event logs, state changes, traces), transforms it into structured records, and writes those records to a database with proper indexes.
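The read, decode, and write loop just described can be sketched in a few lines. This is a minimal, non-production sketch under stated assumptions: the in-memory `BLOCKS` list is a hypothetical stand-in for block data fetched from a node, its field names are simplified, and SQLite stands in for a production database such as PostgreSQL. The point it illustrates is that the cost of scanning blocks is paid once, after which "all transactions sent by this wallet" becomes a single indexed lookup instead of a full-chain scan.

```python
import sqlite3

# Hypothetical stand-in for raw blocks fetched from a node over RPC.
# Real blocks carry many more fields; only what this sketch needs is kept.
BLOCKS = [
    {"number": 1, "txs": [{"hash": "0xa1", "from": "0xalice", "to": "0xbob", "value": 5}]},
    {"number": 2, "txs": [{"hash": "0xb2", "from": "0xbob", "to": "0xcarol", "value": 2}]},
    {"number": 3, "txs": [{"hash": "0xc3", "from": "0xalice", "to": "0xcarol", "value": 7}]},
]


def build_index(conn: sqlite3.Connection, blocks) -> None:
    """Extract transactions from each block, flatten them into rows, and load them."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS txs (
               hash TEXT PRIMARY KEY,
               block_number INTEGER NOT NULL,
               sender TEXT NOT NULL,
               recipient TEXT NOT NULL,
               value INTEGER NOT NULL)"""
    )
    # Secondary indexes on the columns applications filter by.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_txs_sender ON txs (sender)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_txs_recipient ON txs (recipient)")
    for block in blocks:
        rows = [
            (tx["hash"], block["number"], tx["from"], tx["to"], tx["value"])
            for tx in block["txs"]
        ]
        # INSERT OR IGNORE keyed on the tx hash makes re-running the same blocks harmless.
        conn.executemany("INSERT OR IGNORE INTO txs VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def transfers_from(conn: sqlite3.Connection, address: str):
    """Answer 'all transactions sent by this wallet' with one indexed query."""
    cur = conn.execute(
        "SELECT hash, block_number, recipient, value FROM txs "
        "WHERE sender = ? ORDER BY block_number",
        (address,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, BLOCKS)
    print(transfers_from(conn, "0xalice"))
```

Matching on `sender = ? OR recipient = ?` instead covers both directions; the recipient index exists for that case.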
Once the data is in a database, your application can query it using familiar tools like SQL or GraphQL with sub-second response times.

How Indexing Works

The indexing pipeline follows an Extract, Transform, Load (ETL) pattern.

In the extract phase, the indexer connects to a blockchain node and reads raw block data. This includes block headers, transactions, transaction receipts (which contain event logs), and optionally trace data (which captures internal contract-to-contract calls). The indexer processes blocks sequentially, starting from a configured starting block and advancing through the chain's history.

In the transform phase, the indexer decodes the raw data into meaningful structures. Raw event logs, for example, are encoded as hex strings with topic hashes and data fields. The indexer uses the smart contract's ABI (Application Binary Interface) to decode these into human-readable events with named parameters and proper data types. A raw Transfer event log becomes a structured record with a "from" address, a "to" address, a token amount, and a contract address. The transform phase can also compute derived data, like calculating USD values at the time of transfer, aggregating volume by token, or tracking running balances.

In the load phase, the structured records are written to a database, whether PostgreSQL, MongoDB, Snowflake, Elasticsearch, or another storage system. The database creates indexes on the fields that applications will query (like wallet addresses, token contracts, timestamps, and block numbers), enabling fast lookups and complex queries across the entire dataset.

Once the indexer has caught up with the chain's tip, it switches to real-time mode, processing each new block as it is produced and inserting the resulting records immediately. This keeps the database current with the live state of the chain.

The Reorg Problem

One of the most challenging aspects of blockchain indexing is handling chain reorganizations.
A reorg occurs when the blockchain's canonical chain changes, usually because competing blocks were produced at the same height and the network eventually converges on a different fork than the one your indexer initially processed. When this happens, the blocks your indexer already processed are no longer canonical, and any records derived from those blocks are incorrect.

A robust indexer must detect reorgs, roll back the affected records, and reprocess the correct blocks. Failing to handle reorgs leads to phantom transactions appearing in your database (transactions that were in the old fork but not the new one), missing transactions (transactions in the new fork that the old fork did not include), and incorrect state. For financial applications, this kind of data corruption is unacceptable.

Reorg handling adds significant complexity to the indexing pipeline. The indexer needs to track which blocks have been processed, compare the chain it sees against what it has already stored, detect divergences, and execute rollbacks cleanly. This is one of the primary reasons teams choose managed indexing solutions over building custom indexers from scratch.

Indexing Approaches

The Graph is the most widely adopted decentralized indexing protocol. Developers create "subgraphs," which are configuration files that define which smart contracts to monitor, which events to listen for, and how to map event data into a queryable schema. Independent node operators called Indexers run the processing infrastructure and serve queries via GraphQL endpoints.

The Graph works well for standard use cases like querying DeFi protocol data, NFT ownership history, and DAO governance activity. Its limitations include latency (subgraph updates are not always real-time), schema inflexibility (changing your schema often requires redeploying and resyncing), and dependency on the decentralized network's availability.
Custom indexers give teams full control over their data pipeline but require significant engineering investment. Building a production-grade custom indexer means writing block ingestion logic, event decoding, database schema design, reorg handling, error recovery, monitoring, and scaling infrastructure. For large-scale applications with unique data requirements, this investment can be worthwhile, but for most teams it represents months of engineering time that could be spent on their core product.

Push-based streaming represents a newer approach that simplifies indexing by delivering filtered blockchain data directly to your storage system, eliminating the need to build and maintain the extraction layer yourself. Instead of your indexer pulling data from a node, a streaming service pushes exactly the data you need to your database, webhook, or data warehouse.

How Quicknode Powers Blockchain Indexing

Quicknode Streams is purpose-built for blockchain data indexing. Streams provides a push-based data pipeline that delivers raw or filtered blockchain data directly to your preferred destination, including PostgreSQL, Snowflake, Amazon S3, Azure Storage, and webhooks. Instead of building and maintaining RPC polling infrastructure to extract data from the chain, you configure a Stream with your desired network, dataset (blocks, transactions, receipts, traces), and optional JavaScript filters, and Quicknode handles the rest.

Streams delivers data in finality order with exactly-once delivery guarantees, which means your database stays consistent with the canonical chain without your code needing to manage block ordering or deduplication. Built-in reorg handling automatically detects chain reorganizations and sends correction payloads, so your indexed data always reflects the true state of the chain.
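For comparison, this is roughly the detection-and-rollback work a self-built indexer has to do on its own. It is a simplified sketch, not a description of how Streams works internally, and it rests on several assumptions: `Block` is a hypothetical minimal shape, `fetch_block` is a placeholder for an RPC lookup of the node's current canonical block at a height, and derived records live in an in-memory dict. A divergence shows up when a new block's parent hash does not match the hash stored for the previous height; the indexer then walks back to the common ancestor, discards everything derived above it, and re-indexes forward.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Block:
    """Hypothetical minimal block shape; real blocks carry many more fields."""
    number: int
    hash: str
    parent_hash: str


class ReorgAwareIndexer:
    """Remembers the hash of every indexed block so it can notice when the chain changes."""

    def __init__(self, fetch_block: Callable[[int], Block]):
        # Placeholder for an RPC lookup of the node's current canonical block at a height.
        self.fetch_block = fetch_block
        self.hashes: Dict[int, str] = {}         # block number -> hash we indexed
        self.records: Dict[int, List[str]] = {}  # block number -> records derived from it

    def process(self, block: Block) -> None:
        """Index one new block, rolling back first if it does not extend our stored chain."""
        if self.hashes.get(block.number) == block.hash:
            return  # this exact block is already indexed (e.g. redelivered)
        stored_parent = self.hashes.get(block.number - 1)
        if stored_parent is not None and stored_parent != block.parent_hash:
            # Divergence: our stored chain is not this block's ancestry.
            ancestor = self._common_ancestor(block.number - 1)
            self._rollback_above(ancestor)
            # Re-index canonical blocks between the common ancestor and the new block.
            for n in range(ancestor + 1, block.number):
                self._store(self.fetch_block(n))
        else:
            # No divergence at the parent; anything stored at or above this height is stale.
            self._rollback_above(block.number - 1)
        self._store(block)

    def _common_ancestor(self, height: int) -> int:
        """Walk back until the stored hash matches the node's canonical block at that height."""
        while height in self.hashes and self.hashes[height] != self.fetch_block(height).hash:
            height -= 1
        return height

    def _rollback_above(self, height: int) -> None:
        """Delete hashes and derived records for every block above `height`."""
        for n in [n for n in self.hashes if n > height]:
            del self.hashes[n]
            self.records.pop(n, None)

    def _store(self, block: Block) -> None:
        self.hashes[block.number] = block.hash
        # Stand-in for the decode/transform/load steps: one derived record per block.
        self.records[block.number] = [f"record-from-{block.hash}"]
```

In a real deployment, the block-hash table and the derived records would likely live in the same database and be updated in one transaction, so a crash cannot leave them out of sync.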
For historical data, Streams' backfill feature lets you populate your database with any range of past blocks, syncing up to seven times faster than traditional RPC-based indexing pipelines, with the same filtering and delivery guarantees as real-time streaming.

Quicknode also publishes a step-by-step guide to building a blockchain indexer with Streams, demonstrating how to create a complete ERC-20 transfer indexer backed by PostgreSQL with a REST API, from configuration to querying. For teams that need even more sophisticated data processing, Streams integrates with Quicknode Functions to enable serverless transformations, enrichment, and automation on top of the streaming data pipeline.

What is the difference between indexing and raw RPC?

Raw RPC and indexing answer different kinds of questions. RPC is built for point lookups and writes: fetch this block, read this balance, send this transaction. Indexing is built for search and analytics: every transfer for a wallet, total volume by token, ownership history for a collection. The table below contrasts the two so you can pick the right tool for each job.

Dimension	Raw RPC	Indexed database
Best at	Point reads and transaction submission	Search, filtering, and aggregation
Query language	JSON-RPC methods	SQL or GraphQL
Historical scans	Slow, block-by-block	Fast, pre-processed and indexed
Setup effort	Connect to an endpoint	Build or configure a pipeline
Typical latency	Per-call round trip	Sub-second across large datasets

For a focused comparison of the two, see RPC vs indexing, and to understand the underlying request model, see how RPC requests work.

What kinds of queries does indexing make possible?

Once data is indexed into a structured database, questions that are impossible over raw RPC become trivial. You can list every transaction for an address, trace an NFT's full ownership history, aggregate trading volume per token per day, rank wallets by holdings, or join events across multiple contracts.
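Two of the queries listed above, daily volume per token and a wallet ranking, look like this in SQL. The sketch uses Python's built-in `sqlite3` module with a hypothetical `transfers` table and made-up sample rows; the table layout, column names, and `rank_by_net_flow` helper are illustrative assumptions, and a PostgreSQL index would use the same query shapes with its own date functions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE transfers (
    token     TEXT NOT NULL,     -- token contract address
    sender    TEXT NOT NULL,
    recipient TEXT NOT NULL,
    amount    INTEGER NOT NULL,  -- raw integer amount, token decimals not applied
    ts        INTEGER NOT NULL   -- block timestamp in Unix seconds
);
CREATE INDEX idx_transfers_token_ts ON transfers (token, ts);
"""

# Hypothetical sample rows (86400 seconds = one day).
ROWS = [
    ("0xusdc", "0xa", "0xb", 100, 0),
    ("0xusdc", "0xb", "0xc", 40, 3600),
    ("0xusdc", "0xa", "0xc", 10, 86400),
    ("0xweth", "0xc", "0xa", 3, 90000),
]


def make_db() -> sqlite3.Connection:
    """Create an in-memory database loaded with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO transfers VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def daily_volume(conn: sqlite3.Connection):
    """Total transferred amount per token per UTC day."""
    return conn.execute(
        """SELECT token, date(ts, 'unixepoch') AS day, SUM(amount)
           FROM transfers
           GROUP BY token, date(ts, 'unixepoch')
           ORDER BY token, day"""
    ).fetchall()


def rank_by_net_flow(conn: sqlite3.Connection, token: str, limit: int = 10):
    """Rank wallets by amount received minus amount sent for one token."""
    return conn.execute(
        """SELECT wallet, SUM(delta) AS net FROM (
               SELECT recipient AS wallet, amount AS delta FROM transfers WHERE token = ?
               UNION ALL
               SELECT sender AS wallet, -amount AS delta FROM transfers WHERE token = ?
           ) AS flows
           GROUP BY wallet
           ORDER BY net DESC, wallet
           LIMIT ?""",
        (token, token, limit),
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(daily_volume(conn))
    print(rank_by_net_flow(conn, "0xusdc"))
```

Over a token's complete history, including mints (which ERC-20 contracts conventionally emit as transfers from the zero address), a wallet's net flow equals its holdings; the partial sample here leaves `0xa` negative.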
These are exactly the queries that power dashboards, explorers, and analytics products. For why these queries are so hard without indexing, see querying blockchain data.

How do indexing and streaming work together?

Streaming and indexing are complementary layers. Streaming is the delivery mechanism that pushes new blocks and events to your infrastructure as they happen, while indexing is what organizes that incoming data into a queryable store. A common modern architecture uses a streaming pipeline to feed records into an indexed database, removing the brittle polling layer entirely. See what is blockchain data streaming and polling vs streaming for how data gets to the indexer in the first place.

How do indexers backfill historical data?

Backfilling is the process of populating your database with past blocks before switching to live data. A good pipeline starts at a chosen historical block, processes forward through the chain, and then transitions seamlessly to real-time once it reaches the tip, so historical and live records share one schema. Done well, backfilling is far faster than re-scanning the chain over raw RPC. See how to access historical blockchain data and real-time vs historical blockchain data for the trade-offs involved.

Frequently Asked Questions

Why can't I just use RPC instead of indexing?

RPC is excellent for fetching a specific block, transaction, or balance, but it offers no way to search, filter, or aggregate across the chain. Answering a question like "all transfers from this wallet" over RPC means scanning millions of blocks one by one, which is impractical. Indexing pre-processes that data into a database so the same question returns in a single query.

What is a subgraph?

A subgraph is a configuration used by The Graph that defines which contracts and events to index and how to map them into a queryable GraphQL schema.
It is one popular way to build an index, well suited to standard DeFi, NFT, and governance data, though it can trade off real-time latency and schema flexibility compared to custom or streaming-based pipelines.

How do indexers handle chain reorganizations?

A robust indexer detects when the canonical chain changes, rolls back records derived from the orphaned blocks, and reprocesses the correct ones. Without this, your database accumulates phantom or missing transactions. Managed pipelines handle reorgs automatically by sending correction payloads. For background, see what is a blockchain reorg.

Should I build my own indexer or use a managed service?

Building a custom indexer gives full control but requires writing ingestion, decoding, schema design, reorg handling, monitoring, and scaling, which is often months of work. A managed pipeline handles extraction, ordering, reorgs, and delivery for you, so most teams reach production faster by configuring a service and focusing engineering effort on their core product.

What databases can indexed blockchain data be stored in?

Indexed data commonly lands in PostgreSQL, MongoDB, Snowflake, Elasticsearch, or object storage such as Amazon S3, depending on whether you need transactional queries, full-text search, or analytics at scale. A streaming pipeline can deliver records directly into these destinations. For end-to-end tutorials, the Builders Guide walks through common indexing patterns.

Further Reading

How to Build a Blockchain Indexer with Streams - Quicknode Guide
Getting Started with Streams - Quicknode Docs
Blockchain Data Backfilling - Quicknode Docs
Mastering Web3 Data with Blockchain ETL - Quicknode Blog

Question 2

How to access historical blockchain data

Accepted Answer

Accessing historical blockchain data requires either an archive node (which retains the full historical state at every block since genesis) or a data backfilling tool that retrieves past blocks and stores them in a queryable database. Full nodes prune old state data: Geth in its default configuration, for example, keeps state only for roughly the most recent 128 blocks, so querying a wallet's balance from six months ago on a typical full node will fail. The two main approaches are archive node RPC access for point-in-time state queries and streaming backfills for building comprehensive historical databases.
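A point-in-time state query against an archive node is an ordinary JSON-RPC call with a historical block number. The sketch below builds an `eth_getBalance` request (a standard Ethereum JSON-RPC method that takes an address and a block parameter encoded as a hex quantity) and sends it with Python's standard library. `ARCHIVE_URL` is a placeholder, not a real endpoint; against a node that has pruned the requested state, the same call typically comes back as an error rather than a balance.

```python
import json
import urllib.request

ARCHIVE_URL = "https://your-archive-endpoint.example"  # placeholder, not a real endpoint


def balance_request(address: str, block_number: int, request_id: int = 1) -> dict:
    """Build an eth_getBalance payload for the state at a specific historical block."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBalance",
        # Block numbers are hex quantities, e.g. 17000000 -> "0x1036640".
        "params": [address, hex(block_number)],
    }


def get_historical_balance(address: str, block_number: int, url: str = ARCHIVE_URL) -> int:
    """POST the request and return the balance in wei; raise if the node reports an error."""
    body = json.dumps(balance_request(address, block_number)).encode()
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        reply = json.load(resp)
    if "error" in reply:
        # A pruned full node typically ends up here for old blocks.
        raise RuntimeError(reply["error"])
    return int(reply["result"], 16)
```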

Question 3

RPC vs indexing: What’s the difference?

Accepted Answer

RPC and indexing are two fundamentally different approaches to accessing blockchain data. RPC endpoints let you query blockchain data directly from a node in real time, one request at a time: the latest state by default, or historical state if the node is an archive node. Indexing extracts, transforms, and stores blockchain data in a searchable database so your application can run complex queries across historical records. RPC is like asking a librarian to find one specific book. Indexing is like building a card catalog for the entire library. Most production applications need both.
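The "one request at a time" constraint is easy to see with `eth_getLogs`, the most search-like standard RPC method. It filters event logs by contract address and topics over a block range, but providers commonly limit the range or result size per call, so covering a long history means many sequential requests. The sketch below only builds the request payloads; the 10,000-block cap is an assumed value (real limits vary by provider), and the token address in the usage is a placeholder.

```python
from typing import Iterator, List, Tuple

# keccak256("Transfer(address,address,uint256)"), the standard ERC-20 Transfer topic.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"
MAX_RANGE = 10_000  # assumed per-request block cap; real limits vary by provider


def block_ranges(start: int, end: int, step: int = MAX_RANGE) -> Iterator[Tuple[int, int]]:
    """Split the inclusive range [start, end] into chunks of at most `step` blocks."""
    lo = start
    while lo <= end:
        hi = min(lo + step - 1, end)
        yield lo, hi
        lo = hi + 1


def get_logs_requests(token: str, start: int, end: int) -> List[dict]:
    """One eth_getLogs payload per chunk: every Transfer log emitted by `token`."""
    return [
        {
            "jsonrpc": "2.0",
            "id": i,
            "method": "eth_getLogs",
            "params": [{
                "address": token,
                "topics": [TRANSFER_TOPIC],
                "fromBlock": hex(lo),
                "toBlock": hex(hi),
            }],
        }
        for i, (lo, hi) in enumerate(block_ranges(start, end))
    ]
```

Covering 5,000,000 blocks under that assumed cap takes 500 calls (`len(get_logs_requests("0xtoken", 0, 4_999_999)) == 500`), whereas an indexed database answers the same question with one query.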

Question 4

Why querying blockchain data is hard

Accepted Answer

Blockchains are optimized for security and immutability, not for data retrieval. There are no built-in search functions, no SQL queries, no indexes, and, beyond the narrow, range-limited log filtering of eth_getLogs, no way to filter or aggregate data across blocks using standard RPC methods. Answering even simple questions like "show all transfers from this wallet" requires scanning millions of blocks sequentially. Add chain reorganizations, varying finality models, encoded data formats, and massive data volumes, and it becomes clear why blockchain data access is a significant engineering challenge.
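To make "encoded data formats" concrete: an ERC-20 `Transfer` event arrives over RPC as a raw log whose indexed `from` and `to` addresses are left-padded to 32 bytes in `topics`, and whose `amount` is a 32-byte big-endian integer in `data`. The sketch below decodes such a log by hand with plain Python, assuming the standard `Transfer(address,address,uint256)` layout; the sample log values are made up, and a real indexer would more typically use an ABI-decoding library than slice hex strings.

```python
# keccak256("Transfer(address,address,uint256)"): topic0 of every standard ERC-20 Transfer log.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic: str) -> str:
    """An indexed address is left-padded to 32 bytes; the address is the last 20 bytes."""
    return "0x" + topic[-40:].lower()


def decode_transfer(log: dict) -> dict:
    """Turn a raw Transfer log (as returned by eth_getLogs) into a structured record."""
    topics = [t.lower() for t in log["topics"]]
    # ERC-721 Transfer shares topic0 but indexes tokenId as a 4th topic, so require exactly 3.
    if len(topics) != 3 or topics[0] != TRANSFER_TOPIC:
        raise ValueError("not a standard ERC-20 Transfer log")
    return {
        "token": log["address"].lower(),
        "from": topic_to_address(topics[1]),
        "to": topic_to_address(topics[2]),
        # The non-indexed uint256 amount fills the 32-byte data field as big-endian hex.
        "amount": int(log["data"], 16),
        "block_number": int(log["blockNumber"], 16),
    }


# Made-up sample log in the JSON-RPC shape (hex strings throughout).
SAMPLE_LOG = {
    "address": "0x1111111111111111111111111111111111111111",
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "22" * 20,  # from
        "0x" + "00" * 12 + "33" * 20,  # to
    ],
    "data": "0x" + "00" * 30 + "03e8",  # 1000 raw units
    "blockNumber": "0x10",
}

if __name__ == "__main__":
    print(decode_transfer(SAMPLE_LOG))
```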

