What Is Blockchain Indexing & How Does It Work?

Understand how blockchain indexing works and how it makes retrieving data from blockchains easier.

One of the most interesting properties of a blockchain is its immutability. It is exceptionally difficult to modify data in a blockchain because each block depends on the block that came before it. This makes blockchains ideal for tracking transactions.
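To make the "each block depends on the block before it" idea concrete, here is a minimal sketch of a hash chain. It is a toy model, not how any particular blockchain is implemented: the block fields, the all-zero starting hash, and the function names are assumptions made for illustration.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(entries):
    """Build a list of blocks where each block stores its predecessor's hash."""
    chain = []
    prev_hash = GENESIS_PREV_HASH
    for index, data in enumerate(entries):
        current = block_hash(index, data, prev_hash)
        chain.append(
            {"index": index, "data": data, "prev_hash": prev_hash, "hash": current}
        )
        prev_hash = current
    return chain


def is_valid(chain):
    """Recompute every hash and check each link back to the previous block."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


chain = build_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"])
print(is_valid(chain))  # True

chain[1]["data"] = "bob pays carol 200"  # tamper with an old block
print(is_valid(chain))  # False
```

Changing an old block invalidates its stored hash, and repairing that hash would break the link held by the next block, which is why edits to history are so hard to hide.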

However, blockchains do far more than keep track of which wallet owns what number of digital assets. And unlike with a centralized database, as the amount of data stored in a blockchain grows, it becomes increasingly difficult to search the chain for transaction details efficiently.

In this post, we’ll talk about blockchain indexing, a technology that simplifies searching for blockchain data.

Blockchain Indexing - Overview

Blockchain indexing makes it easier to find information stored in a blockchain. Instead of stepping through data block by block on every lookup, an indexer parses the chain's data and stores it in a centralized database as key-value pairs. That information can then be indexed and queried just like a normal database.
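As a rough illustration of the difference, the sketch below contrasts a block-by-block scan with a key-value index built in a single pass. The block and transaction layout is a simplified, hypothetical one (real blocks carry many more fields), and the function names are made up for this example.

```python
from collections import defaultdict

# Hypothetical, heavily simplified chain data used only for illustration.
BLOCKS = [
    {"number": 1, "transactions": [
        {"hash": "0xaa", "from": "0xalice", "to": "0xbob", "value": 5},
    ]},
    {"number": 2, "transactions": [
        {"hash": "0xbb", "from": "0xbob", "to": "0xcarol", "value": 2},
        {"hash": "0xcc", "from": "0xalice", "to": "0xcarol", "value": 1},
    ]},
]


def scan_for_address(blocks, address):
    """Unindexed lookup: walk every block and every transaction each time."""
    found = []
    for block in blocks:
        for tx in block["transactions"]:
            if address in (tx["from"], tx["to"]):
                found.append(tx["hash"])
    return found


def build_index(blocks):
    """One pass over the chain that produces key-value lookups."""
    by_hash = {}
    by_address = defaultdict(list)
    for block in blocks:
        for tx in block["transactions"]:
            by_hash[tx["hash"]] = {**tx, "block": block["number"]}
            by_address[tx["from"]].append(tx["hash"])
            by_address[tx["to"]].append(tx["hash"])
    return by_hash, by_address


by_hash, by_address = build_index(BLOCKS)
print(scan_for_address(BLOCKS, "0xalice"))  # ['0xaa', '0xcc']
print(by_address["0xalice"])                # ['0xaa', '0xcc']
print(by_hash["0xbb"]["block"])             # 2
```

Both lookups return the same answer, but the scan touches every transaction on each query, while the index pays that cost once and then answers with a dictionary lookup.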

Blockchain indexing makes it easier for end users and developers to get the information they need about historic events on the blockchain. With blockchain indexing, you get the security of onchain data combined with the convenience and efficiency of searching an offchain centralized database of information.

The Need for Blockchain Indexing

As of May 16th, 2023, a full sync of the Ethereum blockchain would require around 972 GB of storage. That figure is steadily growing, making it harder and more time-consuming to search for and find information about a specific transaction. It’s even more complicated if you’re not sure when a transaction took place.

Indexing organizes data in a structured manner, allowing for quick and efficient retrieval based on specific parameters, such as transaction ID, address, or content type.

Without indexing, users are limited to searching for just transaction hashes. However, with indexing, searching accounts, blocks, and transactions is possible. Users could also add annotations to transactions, create relationships between different elements, and use SQL-like syntax to perform more complex searches.
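The kind of SQL-style search described here can be sketched with Python's built-in sqlite3 module. The table layout, column names, and sample rows below are assumptions for illustration only, not the schema of any real indexing service.

```python
import sqlite3

# In-memory database standing in for an offchain index (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE transactions (
           tx_hash TEXT PRIMARY KEY,
           block_number INTEGER,
           sender TEXT,
           recipient TEXT,
           value INTEGER
       )"""
)
# Secondary indexes let the database find rows by address without a full scan.
conn.execute("CREATE INDEX idx_sender ON transactions (sender)")
conn.execute("CREATE INDEX idx_recipient ON transactions (recipient)")

conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        ("0xaa", 1, "0xalice", "0xbob", 5),
        ("0xbb", 2, "0xbob", "0xcarol", 2),
        ("0xcc", 2, "0xalice", "0xcarol", 1),
    ],
)


def transactions_for(conn, address):
    """All transactions touching an address, newest block first."""
    cursor = conn.execute(
        """SELECT tx_hash, block_number, value
           FROM transactions
           WHERE sender = ? OR recipient = ?
           ORDER BY block_number DESC, tx_hash""",
        (address, address),
    )
    return cursor.fetchall()


def total_received(conn):
    """Total value received per address, an aggregate a hash lookup cannot give."""
    cursor = conn.execute(
        """SELECT recipient, SUM(value)
           FROM transactions
           GROUP BY recipient
           ORDER BY recipient"""
    )
    return cursor.fetchall()


print(transactions_for(conn, "0xalice"))  # [('0xcc', 2, 1), ('0xaa', 1, 5)]
print(total_received(conn))               # [('0xbob', 5), ('0xcarol', 3)]
```

Filtering by account, sorting by block, and aggregating values are all one-line queries once the data sits in an indexed table.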

Challenges in Indexing Blockchain Data

Here are some issues that blockchain technology has with indexing data:

1. Concerns with Query Languages

One of the primary challenges in indexing blockchain data is the absence of a standardized query language, such as the SQL used with traditional databases.

Unlike conventional systems, where data retrieval is straightforward, blockchain's immutable nature makes it challenging to directly read and access information. Without a well-defined query language, extracting specific data from the blockchain becomes a time-consuming and intricate process, often requiring scanning each block individually.

2. Data Entanglement

Blockchain's decentralized architecture and node structures have introduced complexity in organizing and retrieving data. Historical records in blockchain networks are distributed across events and stored separately within nodes. This intermingling of data makes it difficult to pinpoint and extract the required information. Furthermore, some public nodes impose restrictions on accessing certain events, resulting in delays and poor query performance.

3. Limited APIs

The APIs available for interacting with blockchains often offer limited querying capabilities. While they may support simple queries like range-based searches for transactions within a specific time period, more complex queries involving advanced filtering or sorting options are often not readily available. This restricts the flexibility of retrieving precise and tailored information from the blockchain, posing challenges for applications requiring sophisticated data retrieval.

4. Scalability Concerns

As blockchain networks expand and record increasing volumes of data, maintaining efficient indexing mechanisms becomes crucial. An indexing process that does not scale turns into a performance bottleneck, so indexing solutions must be able to keep pace with the network's growth.
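One common way to keep indexing costs proportional to new data rather than to the whole chain is to remember the last block processed and only index what came after it. The sketch below assumes a simplified, hypothetical block layout and ignores real-world complications such as chain reorganizations.

```python
class IncrementalIndexer:
    """Indexes only blocks above the last processed height (the checkpoint)."""

    def __init__(self):
        self.last_indexed = 0  # height of the newest block already indexed
        self.by_address = {}   # address -> list of transaction hashes

    def sync(self, blocks):
        """Index blocks newer than the checkpoint; return how many were processed."""
        processed = 0
        for block in sorted(blocks, key=lambda b: b["number"]):
            if block["number"] <= self.last_indexed:
                continue  # already indexed on an earlier sync
            for tx in block["transactions"]:
                for address in (tx["from"], tx["to"]):
                    self.by_address.setdefault(address, []).append(tx["hash"])
            self.last_indexed = block["number"]
            processed += 1
        return processed


# Hypothetical chain data for illustration.
chain = [
    {"number": 1, "transactions": [{"hash": "0xaa", "from": "0xalice", "to": "0xbob"}]},
    {"number": 2, "transactions": [{"hash": "0xbb", "from": "0xbob", "to": "0xcarol"}]},
]

indexer = IncrementalIndexer()
print(indexer.sync(chain))  # 2 (both blocks are new)

chain.append(
    {"number": 3, "transactions": [{"hash": "0xcc", "from": "0xalice", "to": "0xcarol"}]}
)
print(indexer.sync(chain))  # 1 (only block 3 is processed)
print(indexer.by_address["0xalice"])  # ['0xaa', '0xcc']
```

Each sync does work only for blocks it has not seen, so the cost of staying current does not grow with the total size of the chain.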

The Graph for Efficient Blockchain Querying and Indexing

Several companies now offer SQL-like ways of querying Ethereum-based blockchains to counter these indexing issues. One of the most well-known is The Graph, an open-source project that uses a query language called GraphQL to let people pull information from Ethereum-based blockchains.

With The Graph, developers no longer have to step through each block whenever they want to find the information they need. Rather, they can just use The Graph's APIs to pull information from databases. In fact, they don't even have to run their own servers to get access to that information.

Just as Ethereum is decentralized and there are many nodes validating transactions, The Graph has three different types of participants in its network:

  • Indexers: who stake GRT (The Graph's token) and offer indexing services, answering queries for users and earning GRT in return
  • Delegators: who delegate GRT to Indexers, helping to secure the network even if they aren't running a node themselves
  • Curators: who use signaling to indicate to Indexers which information is worth indexing

This system encourages indexers to invest the time, network, and computing resources required to run indexing services and ensures the Indexers supply high-quality information. DeFi project developers can then subscribe to these API services and use them to feed information into their projects.

The Graph supports any EVM-compatible blockchain, and there are several companies offering services based on its APIs. Since 2022, it has served more than 483 billion queries across tens of thousands of projects and hundreds of nodes. As interest in DeFi continues to grow, those figures will only get bigger.

Unfortunately, these blockchain-focused solutions currently track only what's happening on the base layer. Tracking Layer 2 activity is more difficult since transactions are "rolled up" and settled in batches on Layer 1 at a later date. This batching is a boon for privacy and scaling, but it also makes analytics a more complex task.

Who Can Build Indexes on The Graph?

One of the challenges of indexing the blockchain is the large requirement for storage and bandwidth. Indexers are incentivized to create indexes that track data that people want to query. Given the sheer number of tokens and projects operating on EVM-compatible blockchains, it's not guaranteed that one of the existing indexing nodes will have the information a given developer wants.

Fortunately, The Graph is open source, and the network is decentralized. So, anyone with the resources to contribute to the network can set up a new indexing node, although they're required to stake tokens to join the network.

Even if you don't have the computing resources but do have ideas, you can build a subgraph and pay an indexing node to track that information. The node will then scan the Ethereum blockchain and index any information it finds that is relevant to your queries.

Several indexes already cover DeFi projects, gaming, NFTs, and general blockchain analytics, and new use cases are being found all the time. If people are interested in the information you're tracking, curators will signal that, and your indexing node will be rewarded.

Integrating subgraphs into an application is a relatively simple task, thanks to the GraphQL language. And since there's a large community of nodes, accessing data quickly and reliably should not be an issue.
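As a rough sketch of what that integration can look like, the snippet below builds a GraphQL request using only Python's standard library. The endpoint URL, the `transfers` entity, and its fields are hypothetical placeholders: each subgraph defines its own schema, so substitute the URL and entity names of the subgraph you actually query. The `first`, `orderBy`, and `orderDirection` arguments are standard in The Graph's query API.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the query URL of a real subgraph.
SUBGRAPH_URL = "https://example.com/subgraphs/name/hypothetical/transfers"

# Hypothetical entity and field names: they depend on the subgraph's schema.
QUERY = """
query RecentTransfers($first: Int!) {
  transfers(first: $first, orderBy: blockNumber, orderDirection: desc) {
    id
    from
    to
    value
  }
}
"""


def build_request(url, query, variables):
    """Package a GraphQL query and its variables as a JSON POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(url, query, variables):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(url, query, variables)) as response:
        return json.load(response)


# Building the request does not touch the network; run_query() would.
request = build_request(SUBGRAPH_URL, QUERY, {"first": 5})
print(request.get_method(), request.full_url)
```

Calling `run_query(SUBGRAPH_URL, QUERY, {"first": 5})` against a real endpoint would return a JSON document whose `data` key mirrors the shape of the query.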

Final Words

Blockchain indexing allows developers to create visualizations, track activity, and perform more detailed analytics than they were previously able to.

At Neptune Mutual, we understand how important it is to know what you're trading and investing in. As a DeFi insurance protocol, understanding risk is something that is close to our hearts. We have a cover marketplace for protecting users’ assets in some of the top DeFi and CeFi protocols.

If you run a DeFi project on Ethereum, Arbitrum, or BNB Smart Chain, we're here to help you safeguard your tokens and your community from potential threats.

If you're interested in integrating our parametric insurance solutions into your DeFi project, contact our team today to discuss your needs.