Onboarding to Bitcoin Core

Getting started with Bitcoin Core

This document was created from Bitcoin Core at commit 4b5659c6b115315c9fd2902b4edd4b960a5e066e

Building Bitcoin Core from source

When building Bitcoin Core from source, there are some platform-dependent instructions to follow.

To learn how to build for your platform, visit the Bitcoin Core bitcoin/doc directory, and read the file named "build-*.md", where "*" is the name of your platform. For Windows this is "build-windows.md", for macOS this is "build-osx.md" and for most Linux distributions this is "build-unix.md".

There is also a guide by Jon Atack about how to compile and test Bitcoin Core.

Bitcoin developer journeys

It can be interesting to hear how current contributors entered the space: the approach they took, the things they found useful, and any pitfalls they identified along the way.

amitiuttarwar, jonatack and jimmysong have kindly documented their experiences for others to read about and learn from.

Doxygen documentation

Bitcoin Core uses Doxygen to generate developer documentation automatically from its annotated C++ codebase. Developers can access documentation of the current release of Bitcoin Core online at doxygen.bitcoincore.org, or alternatively can generate documentation for their current head using make docs (see Generating Documentation for more info).

Bitcoin Core and GitHub

Bitcoin Core uses a GitHub-based workflow for development. The primary function of GitHub in the workflow is to discuss patches and connect them with review comments.

Whilst some other prominent projects, e.g. the Linux kernel, use email for soliciting feedback and review, Bitcoin Core has used GitHub for many years. Initially Satoshi distributed the code through private emails and by hosting source archives at bitcoin.org, and later by hosting on SourceForge (which used SVN but did not at that time have a pull request system like GitHub). The earliest reviewers submitted changes using patches either through email exchange with Satoshi, or by posting them on the bitcoin forum.

In August 2009, the source code was moved to GitHub by Sirius and development has remained there and used the GitHub workflows ever since.

Organisation & roles

Anyone who contributes code to the codebase is labelled a "contributor" by GitHub (and the community). As of Version 22.0 of Bitcoin Core, there are ~820 individual contributors credited with changes.

Some contributors are also labelled as "members" of the Bitcoin Core organisation. There are currently ~30 members of the organisation. These members are usually frequent contributors and have good technical knowledge of the codebase. Members also have some additional permissions over contributors, such as adding/removing tags on issues and pull requests, however being a member does not permit you to merge pull requests into the project.

Some members are also project "maintainers". There are currently 7 maintainers on the Bitcoin Core project, with that number generally slowly increasing. Pull requests (PRs) can only be merged into the main project by "maintainers". Whilst this gives the illusion that maintainers are in "control" of the project, the maintainers' role dictates that they should not unilaterally decide which PRs are merged and which aren’t. Instead they should determine the mergeability of changes primarily based on the reviews and discussions of other contributors on the PR itself, on GitHub (or less commonly in the #bitcoin-core-dev IRC channel).

Working on that basis, the maintainers' role becomes largely "janitorial" in that they are simply executing the desires of the community review process; a community which is made up of a decentralised and diverse group of contributors.

In addition to maintainers, there are certain contributors (usually members) who are listed as "suggested reviewers" for certain areas of the codebase. This is because they are deemed to have a deep technical and/or philosophical understanding of this area of the project.

Note
In a normal workflow it is not necessary (or desirable) to request reviews from suggested reviewers, and in fact doing so without a "good reason" might be interpreted as being too pushy, having the opposite result to the one intended.

A list of maintainers and suggested reviewers can be found in the REVIEWERS document. As the document states, these are NOT the only people who should be reviewing pull requests. The project needs as many reviews on each PR as possible, ideally from a diverse range of reviewers.

The objective of the Bitcoin Core Organisation is therefore to represent an entity that is decentralised as much as practically possible, on a centralised platform. One where no single contributor, member or maintainer has unilateral control over what is/isn’t merged into the project. Having multiple maintainers, members, contributors and reviewers gives this objective the best chance of being realised.

Organisation fail-safes

"Rogue" PRs are occasionally submitted by contributors, however they are almost certain to be detected as part of the community review process. There has recently been discussion on the mailing list about purposefully testing malicious pull requests to test this property of the review process even further.

In the event that a maintainer goes rogue and starts merging controversial code, or conversely not merging changes desired by the community at large, then there are two possible avenues of recourse for users:

  1. Have the "lead maintainer" remove the malicious maintainer

  2. In the case that the lead maintainer themselves is the "rogue" agent: fork the project to a new GitHub repository and continue development there without them.

In the case that GitHub itself becomes the rogue entity, there have been numerous discussions about how to move away from GitHub, should the need ever arise.

GitHub workflow

The GitHub side of the Bitcoin Core workflow for contributors consists primarily of:

  • Issues

  • Pull Requests (PRs)

  • Reviews

  • Comments

Generally, issues are used for two purposes:

  1. Posting known issues with software, e.g. bug reports, crash logs

  2. Soliciting feedback on potential changes without providing associated code, as would be required in a Pull Request.

GitHub provides their own guide on mastering issues which is worth reading to understand the feature-set available when working with an issue.

Pull requests are where contributors can submit their code against the main codebase and solicit feedback on both the concept and the code implementation. Pull requests and issues are often linked to/from one another:

One common workflow is when an issue is opened to report a bug. After replicating the issue, a contributor creates a patch and then opens a pull request with their proposed changes.

In this case the contributor should, in addition to comments about the patch, reference that the patch fixes the issue. For a patch which fixes issue 22889 this would be done by writing "fixes #22889" in the PR description or in a commit message. In this case the syntax "fixes #issue-number" is caught by GitHub’s pull request linker.
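As a rough illustration of what such a linker looks for, the sketch below scans a PR description or commit message for a closing keyword followed by an issue number. The regular expression and the function name `find_closed_issues` are assumptions made for this example; it only approximates GitHub's behaviour, using the closing keywords that GitHub documents (close, fix, resolve and their variants).

```python
import re

# Closing keywords documented by GitHub; matching is case-insensitive.
CLOSING_KEYWORDS = (
    "close", "closes", "closed",
    "fix", "fixes", "fixed",
    "resolve", "resolves", "resolved",
)

# Hypothetical pattern: a keyword, an optional colon, whitespace, then "#<number>".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(CLOSING_KEYWORDS) + r"):?\s+#(\d+)\b",
    re.IGNORECASE,
)


def find_closed_issues(text: str) -> list[int]:
    """Return the issue numbers the text claims to close, in order of appearance."""
    return [int(number) for number in _PATTERN.findall(text)]


if __name__ == "__main__":
    description = "net: avoid redundant lookups\n\nFixes #22889"
    print(find_closed_issues(description))
```

A plain mention such as "see #22889" is not matched, which mirrors the difference between referencing an issue and claiming to fix it.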

Another good use of issues is for getting feedback on ideas which might require significant changes. This helps free the project from having too many PRs open which aren’t ready for review, and might waste reviewers' time. In addition this workflow can also save contributors their own valuable time, as an idea might be identified as unlikely to be accepted before the contributor spends their time writing the code for it.

Most code changes to bitcoin are proposed directly as pull requests — there’s no need to open an issue for every idea before implementing it, unless it will require significant changes. Additionally, other contributors (and would-be reviewers) will often agree with the approach of a change, but want to "see the implementation" before they can really pass judgement on it.

Reviews store and track feedback on PRs in a public way.

Comments (inside issues, PRs, discussions etc.) are where users can discuss relevant aspects of the item and have history of those discussions preserved for future reference. Often contributors having "informal" discussions about changes on e.g. IRC will be advised that they should echo the gist of their conversation as a comment so that the rationale behind changes can be determined in the future.

Research topics/questions

  • What stops a hacker hijacking the Bitcoin Core website and hosting malicious binaries?

    • How about malicious binaries hosted by linux package managers?

  • Where can you go for help if Bitcoin Core doesn’t build on your machine?

  • Before you create a pull request to the main bitcoin core repo, what checks should you do locally?

    • Are there any additional checks you can think of which are only run in the bitcoin core repo (and not your fork)?

Solo work

Git exercises

GitHub workflow basics

  • Fork the bitcoin core repository

  • Download a clone of your fork of the bitcoin project to your local machine

  • Checkout a tag, branch or pull request

Building bitcoin from source

Review a PR

  • Find a PR (which can be open or closed) on GitHub which looks interesting and/or accessible

  • Checkout the PR locally

  • Review the changes

    • Record any questions that arise during code review

  • Build the PR

  • Test the PR

  • Break a test / add a new test

  • Leave review feedback on GitHub, possibly including:

    • ACK/NACK

    • Approach

    • How you reviewed it

    • Your system specifications if relevant

    • Suggesting nits

Create a test using test framework

Group work

  • Each submit a PR on a team member’s fork of Bitcoin Core (not the main repo)

  • Review a different team member’s PR

  • Submit your review of the PR as a GitHub comment on the PR

Removed text

Goals

  • Learn how the Bitcoin Core project uses GitHub

  • Learn how to compile the code from source

  • Learn how to run the test suite

  • Learn about other developers’ journeys into bitcoin dev

  • PR review process

Concepts

  • GitHub usage

  • Git usage

  • Building bitcoin from source code

  • Running the test suite

Overview & architecture of Bitcoin Core

Decentralised development

Olivia Lovenmark and Amiti Uttarwar describe in their blog post Developing Bitcoin how changes to bitcoin follow the pathway from proposal to being merged into the software, and finally adopted by users.

Bitcoin Core development process and documentation

The Bitcoin Core project itself contains two documents of particular interest to contributors:

  1. CONTRIBUTING.md — How to get started contributing to the project.

  2. developer-notes.md — Development guidelines, coding style etc.

Reviews

Jon Atack’s article How To Contribute Pull Requests To Bitcoin Core describes some less-obvious requirements that any pull request you make might be subjected to during peer review, for example that it needs an accompanying test, or that every intermediate commit on the branch must compile. It also describes the uncodified expectation that contributors should not only be writing code, but more importantly be reviewing others' pull requests. Most developers enjoy writing their own code more than reviewing code from others, but the decentralised review process is arguably the most critical defence Bitcoin development has against malicious actors, and it is therefore important to uphold.

Note
Jon’s estimate of "5-15 PR reviews|issues solved" per PR submitted is not a hard requirement, just what Jon personally feels would be best for the project. Don’t be put off submitting a potentially valuable pull request just because "you have not completed enough reviews"!

Gloria Zhao’s review checklist details what a 'good' review might look like, along with some examples of what she considers 'good' reviews. In addition to this, it details how potential reviewers can approach a new PR they have chosen to review, along with the sorts of questions they should be asking (and answering) in order to provide a meaningful review themselves.

Some examples of the subject areas Gloria covers include the PR’s subject area, motivation, downsides, approach, security and privacy risks, implementation of the idea, performance impact, concurrency footguns, tests and needed documentation.

Commit messages

When writing commit messages be sure to have read Chris Beams' How to Write a Git Commit Message blog post. As described in CONTRIBUTING.md, pull requests should be prefixed with the component or area the PR affects. Common areas are listed in the CONTRIBUTING.md section Creating the pull request. In addition to this, individual commit messages are also often given similar prefixes in the commit title depending on which area of the codebase the changes primarily affect.
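Some of these conventions are mechanical enough to check automatically. The following is a minimal sketch, assuming the "area: subject" title form and the commonly quoted limits from Chris Beams' post (a subject of about 50 characters, body lines wrapped at 72, a blank line after the subject). The prefix set is an illustrative subset chosen for this example, not the authoritative list from CONTRIBUTING.md, and `lint_commit_message` is a hypothetical helper rather than a tool the project ships.

```python
# Illustrative subset of area prefixes; CONTRIBUTING.md holds the authoritative list.
EXAMPLE_PREFIXES = {"consensus", "doc", "net", "p2p", "refactor", "rpc", "test", "wallet"}

MAX_SUBJECT_LENGTH = 50  # soft limit suggested in Chris Beams' post
MAX_BODY_WIDTH = 72      # suggested wrap width for body lines


def lint_commit_message(message: str) -> list[str]:
    """Return a list of human-readable problems found in a commit message."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""

    if not subject:
        problems.append("subject line is empty")
    if len(subject) > MAX_SUBJECT_LENGTH:
        problems.append(f"subject is longer than {MAX_SUBJECT_LENGTH} characters")
    if subject.endswith("."):
        problems.append("subject ends with a period")

    # Only the "area: subject" form is recognised in this sketch.
    prefix, separator, _ = subject.partition(":")
    if not separator or prefix.strip().lower() not in EXAMPLE_PREFIXES:
        problems.append("subject has no recognised area prefix")

    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between subject and body")
    for line in lines[2:]:
        if len(line) > MAX_BODY_WIDTH:
            problems.append(f"body line exceeds {MAX_BODY_WIDTH} characters")
            break
    return problems
```

Bracketed titles such as "[net processing] ..." would be flagged by this sketch, so it would need extending before being used against real history.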

Build issues

Some compile-time issues can be caused by an unclean build directory. The comments in issue 19330 provide some clarifications and tips on how other contributors clean their directories, as well as some ideas for shell aliases to boost productivity.

Debugging Bitcoin Core

Fabian Jahr has created a guide, Debugging Bitcoin Core, aimed at detailing the ways in which various Bitcoin Core components can be debugged, including the Bitcoin Core binary itself, unit tests, functional tests along with an introduction to core dumps and the Valgrind memory leak detection suite.

Of particular note are the configure flags used to build Bitcoin Core without optimisations to permit effective debugging of the Bitcoin Core binary.

Fabian has also presented on this topic a number of times: first as part of his Chaincode Labs residency, and later as part of Scaling Bitcoin 2019.

Bitcoin Core’s architecture

lsilva01 has written a deep technical dive into the architecture of Bitcoin Core as part of the bitcoin core onboarding documentation in Bitcoin Architecture.

Once you’ve gained some insight into the architecture of the program itself you can learn further details about which code files implement which functionality using the document Bitcoin Core regions.

James O’Beirne has recorded 3 videos which go into detail on how the codebase is laid out, how the build system works, what devtools there are, as well as what the primary functions of many of the files are.

Signet test network

Signet is both a tool that allows developers to create their own networks for testing interactions between different Bitcoin software and the name of the most popular of these testing networks. Signet was codified in BIP325.

To connect to the "main" Signet network, simply start bitcoind with the signet flag, e.g. bitcoind -signet. Don’t forget to also pass the signet flag to bitcoin-cli if using it to control bitcoind, e.g. bitcoin-cli -signet. Instructions on how to set up your own Signet network can be found in the Bitcoin Core Signet README.md. The Bitcoin wiki Signet page provides additional background on Signet.

BIPs

Bitcoin uses Bitcoin Improvement Proposals, or BIPs, as a design document for introducing new features or behaviour into bitcoin. Bitcoin magazine describes what a BIP is in their article What Is A Bitcoin Improvement Proposal (BIP), specifically highlighting how BIPs are not necessarily binding documents required to achieve consensus.

The BIPs are hosted on GitHub and include BIP2 which self-describes the BIP process in more detail. Of particular interest might be the sections BIP Types and BIP Workflow.

The BIP process

Bitcoin Core issue #22665 described how BIP125 was not being strictly adhered to by Bitcoin Core. This raised questions amongst developers about whether the code or the BIP should act as the specification, with most developers expressing that they felt that the code was the spec, and any BIP generated was merely a design document to aid with re-implementation by others. Note that this view was not completely unanimous in the community.

For consensus-critical code, most Bitcoin Core developers consider the code itself, rather than any BIP, to be the ultimate source of truth ("the code is the spec"). A knock-on effect of this was that there were calls for review on BIP2 itself, with newly-appointed BIP maintainer Karl-Johan Alm (a.k.a. kallewoof) posting his thoughts to the bitcoin-dev mailing list.

Getting started with development

What are the best ways to get started with Bitcoin Core development? As mentioned earlier, one of the roles most in demand from the project is that of code review, and in fact this is also one of the best ways of getting familiarised with the codebase too! Reviewing a few PRs, and importantly submitting your review to GitHub on the PR can be really valuable. This Google Code Health blog post gives some good advice on how to go about code review and getting past "feeling that you’re not as smart as the programmer who wrote the change". If you’re going to ask some questions as part of review, try and keep questions respectful.

Aside from review, there are 3 main avenues which might lead you to submitting your own pull request to the repository:

  1. Finding a good first issue, as tagged in the issue tracker

  2. Fixing a bug (you’ve found yourself?)

  3. Adding a new feature (that you want for yourself?)

Of these three, I’d highly recommend choosing a good first issue from an area of the codebase that seems interesting to you. The reason is that these have been somewhat implicitly "concept ACKed" by other contributors as "something that is likely worth working on".

Hopefully now you have an idea of roughly what your PR is going to do; often this is the hardest part to getting started! If you don’t have a bugfix or new feature in mind, and you’re struggling to find a good first issue which looks suitable for you, don’t panic. Instead keep reviewing other developers' PRs to continue improving your understanding of the process (and the codebase), while you watch the issue tracker for something which you like the look of.

Now that you’ve decided what to work on it’s time to take a look at the current behaviour of that part of the code and perhaps more importantly, try to understand why this was originally implemented in this way. This process of code "archaeology" will prove invaluable in the future when you are trying to learn about other parts of the codebase on your own.

Codebase archaeology

When considering changing code it can be helpful to try and first understand the rationale behind why it was implemented that way originally, if possible. One of the best ways to do this is by using a combination of git tools — git blame, git log -S, and less commonly git log -G — and the discussions on GitHub.

git blame

The git blame command will show you when, and by whom, a particular line of code was last changed.

For example, if we checkout Bitcoin Core at v22.0 and we are planning to make a change related to the m_addr_send_times_mutex found in src/net_processing.cpp, we might want to find out more about its history before touching it.

With git blame we can find out the last person who touched this code:

# Find the line number for blame
$ grep -n m_addr_send_times_mutex src/net_processing.cpp
233:    mutable Mutex m_addr_send_times_mutex;
235:    std::chrono::microseconds m_next_addr_send GUARDED_BY(m_addr_send_times_mutex){0};
237:    std::chrono::microseconds m_next_local_addr_send GUARDED_BY(m_addr_send_times_mutex){0};
4304:    LOCK(peer.m_addr_send_times_mutex);
$ git blame -L233,233 src/net_processing.cpp

76568a3351 (John Newbery 2020-07-10 16:29:57 +0100 233)     mutable Mutex m_addr_send_times_mutex;

With this information we can easily look up that commit to gain some additional context:

$ git show 76568a3351

───────────────────────────────────────
commit 76568a3351418c878d30ba0373cf76988f93f90e
Author: John Newbery <[email protected]>
Date:   Fri Jul 10 16:29:57 2020 +0100

    [net processing] Move addr relay data and logic into net processing

So we’ve now learned that this mutex was moved here by John from net.{cpp|h} in its most recent touch. Let’s see what else we can find out about it.

git log -S

git log -S allows us to search for commits where this line was modified (not where it was only moved; for that use git log -G). A 'modification' (vs. a 'move') in git terms means that the search term appears an unequal number of times in the added and removed sections of the commit’s diff.

$ git log -S m_addr_send_times_mutex
───────────────────────────────────────
commit 76568a3351418c878d30ba0373cf76988f93f90e
Author: John Newbery <[email protected]>
Date:   Fri Jul 10 16:29:57 2020 +0100

    [net processing] Move addr relay data and logic into net processing

───────────────────────────────────────
commit ad719297f2ecdd2394eff668b3be7070bc9cb3e2
Author: John Newbery <[email protected]>
Date:   Thu Jul 9 10:51:20 2020 +0100

    [net processing] Extract `addr` send functionality into MaybeSendAddr()

    Reviewer hint: review with

     `git diff --color-moved=dimmed-zebra --ignore-all-space`

───────────────────────────────────────
commit 4ad4abcf07efefafd439b28679dff8d6bbf62943
Author: John Newbery <[email protected]>
Date:   Mon Mar 29 11:36:19 2021 +0100

    [net] Change addr send times fields to be guarded by new mutex

We can see that John also originally added this to net.{cpp|h}, before later moving it into net_processing.{cpp|h} as part of a push to separate out addr relay data and logic from net.cpp.
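The difference between the two searches can be sketched in a few lines. This is a simplified model and not git's implementation: it assumes the input is the list of hunk lines from one file's diff (each starting with "+", "-" or a space, with the "+++"/"---" header lines already stripped), and the function names are made up for this example.

```python
import re


def pickaxe_s_matches(diff_lines: list[str], term: str) -> bool:
    """Approximate `git log -S<term>`: the number of occurrences in the file changes."""
    added = sum(line[1:].count(term) for line in diff_lines if line.startswith("+"))
    removed = sum(line[1:].count(term) for line in diff_lines if line.startswith("-"))
    return added != removed


def pickaxe_g_matches(diff_lines: list[str], pattern: str) -> bool:
    """Approximate `git log -G<regex>`: any added or removed line matches the regex."""
    regex = re.compile(pattern)
    return any(
        regex.search(line[1:])
        for line in diff_lines
        if line.startswith(("+", "-"))
    )


if __name__ == "__main__":
    term = "m_addr_send_times_mutex"
    # The declaration is moved within one file: removed once, added once.
    moved_within_file = [
        "-    mutable Mutex m_addr_send_times_mutex;",
        " ",
        "+    mutable Mutex m_addr_send_times_mutex;",
    ]
    print(pickaxe_s_matches(moved_within_file, term))  # counts are equal
    print(pickaxe_g_matches(moved_within_file, term))  # a changed line matches
```

Because the comparison is made per file, moving a line from one file to another changes the count in both files. That is consistent with the move from net.{cpp|h} to net_processing.{cpp|h} appearing in the -S output above.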

PR discussion

To get even more context we can take a look at the comments on the PR where this mutex was introduced (or at any subsequent commit where it was modified). To find the PR you can either paste the commit hash (4ad4abcf07efefafd439b28679dff8d6bbf62943) into GitHub, or list merge commits in reverse order, showing oldest merge with the commit at the top, e.g.:

$ git log --merges --reverse --oneline --ancestry-path 4ad4abcf07efefafd439b28679dff8d6bbf62943..upstream | head -n 1

d3fa42c79 Merge bitcoin/bitcoin#21186: net/net processing: Move addr data into net_processing

Reading up on PR 21186 will hopefully provide us with even more context. For example we can see from the linked issue 19398 what the motivation for this move was.
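When doing this for many commits it can be handy to pull the PR number out of the merge subject programmatically. The sketch below parses a line in the format shown above. The pattern and the helper name `parse_merge_subject` are assumptions for this example, and merge commits using a different subject format would not match.

```python
import re
from typing import Optional

# Matches subjects shaped like the one shown above:
# "<abbrev-hash> Merge <owner>/<repo>#<number>: <title>"
_MERGE_SUBJECT = re.compile(
    r"Merge (?P<repo>[\w.-]+/[\w.-]+)#(?P<number>\d+): (?P<title>.+)"
)


def parse_merge_subject(oneline: str) -> Optional[tuple[str, int, str]]:
    """Return (repository, PR number, PR title), or None if the line does not match."""
    match = _MERGE_SUBJECT.search(oneline)
    if match is None:
        return None
    return match["repo"], int(match["number"]), match["title"]


if __name__ == "__main__":
    line = (
        "d3fa42c79 Merge bitcoin/bitcoin#21186: "
        "net/net processing: Move addr data into net_processing"
    )
    print(parse_merge_subject(line))
```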

Solo work

TODO: Add questions on current architecture of Core

Group work

Signet

Either:

  • One member of the group creates a private signet as documented on the Bitcoin Wiki Custom Signet page.

  • Distribute the signetchallenge value

  • One or all group members can act as Signet miners

  • Have all group members connect to the custom signet

OR:

  • Group members request some signet coins from the signet faucet or using the getcoins.py script.

    Note
    The Signet getcoins.py script may not work if a captcha has been added to the site.

THEN:

  • Send coins around the group

Removed Text

Goals

  • How are changes made to Bitcoin Core?

  • Development environment optimisations

  • How is Bitcoin Core source code organised

  • What’s the BIP process?

    • What type of changes require a BIP?

  • Learn how to test changes on a live distributed test network

Concepts

  • Decentralised Development

  • BIPs

  • Bitcoin Core development

  • Bitcoin Core architecture

  • Signet

Consensus

One of the most fundamental concepts behind the bitcoin network is that nodes are able to maintain decentralised consensus with each other. The primary mechanism behind this relies on all nodes validating each transaction and block they learn about against their own copy of the (consensus) rules. The secondary mechanism is that all nodes should follow the chain with the most cumulative proof-of-work. The product of following these two mechanisms is that all nodes in the network will eventually converge onto a single canonical chain. For more information on how the bitcoin network’s decentralised consensus mechanism works see the Mastering Bitcoin section on decentralized consensus.

Consensus in Bitcoin Core

Review of the design of Bitcoin Core from Overview and Architecture will naturally lead to a region of the project titled "consensus/" which one might conclude contains all the logic for maintaining consensus. However, this is not entirely the case…

Aspects of consensus-enforcement code can be found across the Bitcoin Core codebase in a number of regions and files, including notably:

📂 bitcoin
  📂 src
    📂 consensus
    📂 policy
    📄 validation.h
    📄 validation.cpp

Why is such a critical function split up between many files, and how do they all interact? Part of the answer can be learned from sdaftuar’s Stack Exchange answer to the question "What is the difference between policy and consensus when it comes to a Bitcoin Core node validating scripts?"

The answer teaches us that policy checks are a superset of consensus checks; that is to say, a transaction that passes policy checks has implicitly passed consensus checks too. Nodes perform policy-level checks on all transactions they learn about before adding them to their local mempool. Many of the policy checks contained in policy are called from inside validation, in the context of adding a new transaction to the mempool.
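The superset relationship can be made concrete with a toy model. Everything in the sketch below is hypothetical: the `Tx` fields, the three limits and the function names are invented for illustration and do not correspond to Bitcoin Core's actual rules or code paths. The point is only the shape of the relationship, in which policy calls the consensus checks and then adds stricter ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    size_bytes: int
    fee_sats: int
    inputs_valid: bool  # stands in for "inputs are unspent and scripts verify"


MAX_CONSENSUS_SIZE = 1_000_000  # hypothetical consensus limit for this sketch
MAX_STANDARD_SIZE = 100_000     # hypothetical, stricter policy limit
MIN_RELAY_FEE_RATE = 1          # hypothetical policy minimum, sats per byte


def passes_consensus(tx: Tx) -> bool:
    """Rules every node must agree on for blocks to be valid."""
    return tx.inputs_valid and 0 < tx.size_bytes <= MAX_CONSENSUS_SIZE


def passes_policy(tx: Tx) -> bool:
    """Local relay rules: the consensus checks plus additional restrictions."""
    if not passes_consensus(tx):
        return False
    within_size = tx.size_bytes <= MAX_STANDARD_SIZE
    pays_enough = tx.fee_sats >= MIN_RELAY_FEE_RATE * tx.size_bytes
    return within_size and pays_enough


def accept_to_mempool(tx: Tx) -> bool:
    """An unconfirmed transaction must satisfy policy."""
    return passes_policy(tx)


def valid_in_block(tx: Tx) -> bool:
    """A transaction mined into a block only has to satisfy consensus."""
    return passes_consensus(tx)
```

In this model a large, zero-fee transaction is rejected from the mempool but would still be valid if a miner included it in a block.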

Consensus bugs

Pieter Wuille disclosed the possibility of a consensus failure related to signature verification when using OpenSSL. The issue was that OpenSSL was accepting multiple signature serialization formats (for the same transaction) as valid. This meant that a transaction’s ID (txid) could be changed, because the signature contributes to the txid hash.

There were a few main cases to consider:

  1. first party malleation: signature length descriptor is extended to 5 bytes

  2. third party malleation: signatures are "slightly" tweaked (or padded)

  3. third party malleation: negating the S value of the ECDSA signature

In the length descriptor case there is a higher risk of causing a consensus-related chainsplit. The first party (the sender) can create a valid (normal length) signature that uses a 5-byte length descriptor, meaning that it might not be accepted by OpenSSL on all platforms.
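To see how one length can have several encodings, the sketch below contrasts a lax, BER-style length parser with a strict one that insists on the shortest (DER) form. It is an illustration of the general ASN.1 encoding rules, not the check that Bitcoin Core performs, and both function names are made up for this example.

```python
def parse_length_lax(data: bytes) -> tuple[int, int]:
    """Parse a BER-style length. Returns (length, number of bytes consumed)."""
    if not data:
        raise ValueError("empty input")
    first = data[0]
    if first < 0x80:
        return first, 1  # short form: the byte itself is the length
    count = first & 0x7F  # long form: low 7 bits give the number of length bytes
    if count == 0 or len(data) < 1 + count:
        raise ValueError("unsupported or truncated length")
    return int.from_bytes(data[1:1 + count], "big"), 1 + count


def parse_length_strict(data: bytes) -> tuple[int, int]:
    """Parse a length, rejecting anything but the shortest possible encoding."""
    length, consumed = parse_length_lax(data)
    minimal = 1 if length < 0x80 else 1 + (length.bit_length() + 7) // 8
    if consumed != minimal:
        raise ValueError("non-minimal length encoding")
    return length, consumed


if __name__ == "__main__":
    short_form = bytes([0x46])                         # 70, in one byte
    padded_form = bytes([0x84, 0x00, 0x00, 0x00, 0x46])  # 70, in five bytes
    print(parse_length_lax(short_form), parse_length_lax(padded_form))
```

A lax parser accepts both descriptors as the same length, while a strict parser accepts only the one-byte form. Nodes that disagree on which parser applies can disagree on whether a transaction is valid.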

In the second case, of signature tweaking or padding, there is a lesser risk of causing a consensus-related chainsplit. However the ability of third parties to tamper with valid transactions may open up off-chain attacks related to Bitcoin services or layers (e.g. Lightning) in the event that they are relying on txids to track transactions.
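The third case in the list above, negating S, relies on a property of ECDSA: if (r, s) is a valid signature then (r, n − s) is also valid for the same message and key, where n is the order of the secp256k1 group. The sketch below does not verify any signatures. It only shows the negation, a "low S" normalisation rule that picks one canonical form, and how a hash over the serialized bytes changes when s is swapped. The `toy_txid` function is a hypothetical stand-in for real transaction serialization.

```python
import hashlib

# Order of the secp256k1 group.
SECP256K1_ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def negate_s(r: int, s: int) -> tuple[int, int]:
    """Return the other signature that is valid whenever (r, s) is."""
    return r, SECP256K1_ORDER - s


def is_low_s(s: int) -> bool:
    """True if s lies in the lower half of the valid range."""
    return 1 <= s <= SECP256K1_ORDER // 2


def normalise(r: int, s: int) -> tuple[int, int]:
    """Map both equivalent signatures onto the single low-S form."""
    return (r, s) if is_low_s(s) else negate_s(r, s)


def toy_txid(payload: bytes, r: int, s: int) -> str:
    """Hypothetical txid: double SHA-256 over the payload plus raw r and s."""
    signature = r.to_bytes(32, "big") + s.to_bytes(32, "big")
    return hashlib.sha256(hashlib.sha256(payload + signature).digest()).hexdigest()


if __name__ == "__main__":
    r, s = 5, 7  # arbitrary integers, not a real signature
    print(toy_txid(b"tx", r, s) == toy_txid(b"tx", *negate_s(r, s)))
```

Requiring the low-S form removes the ambiguity, since exactly one of the two equivalent values passes `is_low_s`.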

It is interesting to consider the order of the steps taken to fix this potential vulnerability:

  1. First the default policy in Bitcoin Core was altered (via isStandard()) to prevent the software from relaying or accepting into the mempool transactions with non-DER signature encodings.
    This was carried out in PR #2520.

  2. Following the policy change, the strict encoding rules were later enforced by consensus in PR #5713.

Do you think this approach — first altering policy, followed later by consensus — made sense for implementing the changes needed to fix this consensus vulnerability? In what circumstances might it not make sense? Having OpenSSL as a consensus-critical dependency to the project was ultimately fixed in PR #6954 which switched to using libsecp256k1 for signature verification.

Database consensus

Historically Bitcoin Core used Berkeley DB (BDB) for transaction and block indices. In 2013 a migration to LevelDB for these indices was included with Bitcoin Core v0.8. What developers at the time could not foresee is that nodes that were still using BDB for these indices (all pre 0.8 nodes), were silently consensus-bound by a relatively obscure BDB-specific database lock counter…

BDB required a configuration setting for the total number of locks available to your database. Bitcoin Core was also interpreting failure to grab the required number of locks as the block being invalid — a consensus failure. This combination caused some BDB-using nodes to mark blocks created by LevelDB-using nodes as invalid and caused a consensus split. BIP 50 provides further explanation on this incident.
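The failure mode is easier to see in miniature. In the toy model below, a storage layer raises an error when a block needs more locks than it has been configured with, and the validation wrapper reports that error as "block invalid". The class, the one-lock-per-transaction rule and all of the numbers are hypothetical, chosen only to reproduce the shape of the bug. See BIP 50 for what actually happened.

```python
from typing import Optional


class LockLimitExceeded(Exception):
    """Raised by the toy store when it runs out of locks."""


class ToyStore:
    def __init__(self, max_locks: Optional[int]):
        self.max_locks = max_locks  # None models a store with no lock limit

    def write_block(self, tx_count: int) -> None:
        locks_needed = tx_count  # hypothetical: one lock per transaction touched
        if self.max_locks is not None and locks_needed > self.max_locks:
            raise LockLimitExceeded(f"need {locks_needed} locks, have {self.max_locks}")


def block_is_valid(store: ToyStore, tx_count: int) -> bool:
    """Mirrors the flaw described above: a storage error is reported as an invalid block."""
    try:
        store.write_block(tx_count)
    except LockLimitExceeded:
        return False
    return True


if __name__ == "__main__":
    limited = ToyStore(max_locks=100)    # stands in for a BDB-backed node
    unlimited = ToyStore(max_locks=None)  # stands in for a LevelDB-backed node
    big_block_tx_count = 500
    print(block_is_valid(limited, big_block_tx_count))
    print(block_is_valid(unlimited, big_block_tx_count))
```

The two nodes disagree about the same block even though neither has evaluated a consensus rule differently, which is why a database setting ended up being consensus-critical.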

Note that this database code is not found in, or even in close proximity to, the /src/consensus region of the codebase.

Validation

Transaction validation

We can follow most of the journey of a transaction through Bitcoin Core by following glozow’s notes on transaction Validation and submission to the mempool. glozow details the different types of checks that are run on a new transaction before it’s accepted into the node’s local mempool: consensus vs policy, script vs non-script, contextual vs context-free.

glozow continues with sections on P2P transaction relay, orphans and mining, but more relevant to consensus is the following section, Block Validation, which describes the consensus checks performed on newly-learned blocks, specifically:

Since v0.8, Bitcoin Core nodes have used a UTXO set rather than blockchain lookups to represent state and validate transactions. To fully validate new blocks nodes only need to consult their UTXO set and knowledge of the current consensus rules. Since consensus rules depend on block height and time (both of which can decrease during a reorg), they are recalculated for each block prior to validation.

Regardless of whether or not transactions have already been previously validated and accepted to the mempool, nodes check block-wide consensus rules (e.g. total sigop cost, duplicate transactions, timestamps, witness commitments, block subsidy amount) and transaction-wide consensus rules (e.g. availability of inputs, locktimes, and input scripts) for each block.

Script checking is parallelized in block validation. Block transactions are checked in order (and coins set updated which allows for dependencies within the block), but input script checks are parallelizable. They are added to a work queue delegated to a set of threads while the main validation thread is working on other things. While failures should be rare - creating a valid proof of work for an invalid block is quite expensive - any consensus failure on a transaction invalidates the entire block, so no state changes are saved until these threads successfully complete.

If the node already validated a transaction before it was included in a block, no consensus rules have changed, and the script cache has not evicted this transaction’s entry, it doesn’t need to run script checks again - it just uses the script cache!

— glozow
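The flow described in the quote can be sketched as follows. This is a heavily simplified model, not Bitcoin Core's implementation: transactions are tuples of (txid, spent outpoints, output values), the script cache is a plain set of txids (the real cache key also covers the verification flags), and `check_script` is a caller-supplied stand-in for the script interpreter. All names are assumptions for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def validate_block(block, utxos, script_cache, check_script, workers=4):
    """Return the updated UTXO dict if the block is valid, otherwise None.

    The caller's `utxos` is never modified; `script_cache` is only updated
    when the whole block passes.
    """
    view = dict(utxos)  # scratch copy, discarded on any failure
    pending = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Transactions are processed in order so later ones can spend earlier outputs.
        for txid, inputs, outputs in block:
            for outpoint in inputs:
                if outpoint not in view:
                    return None  # missing or already-spent input
                del view[outpoint]
            for index, value in enumerate(outputs):
                view[(txid, index)] = value
            # Script checks are queued for worker threads unless already cached.
            if txid not in script_cache:
                pending.append((txid, pool.submit(check_script, txid)))
        results = [(txid, future.result()) for txid, future in pending]
    if not all(ok for _, ok in results):
        return None  # one bad script invalidates the entire block
    script_cache.update(txid for txid, _ in results)
    return view
```

Working on a scratch copy and returning it only at the end models the "no state changes are saved until these threads successfully complete" behaviour.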

The section on script verification also highlights how the script interpreter is called from at least 3 distinct sites within the codebase.

Having considered both transactions that were already known about (in the mempool), and any new transactions that were first learned about in the block itself (as part of block validation), we now understand both ways a transaction can be deemed consensus-valid.

Multiple chains

TODO: Reorgs, undo data, DisconnectBlock

Bitcoin nodes should ultimately converge in consensus on the most-work chain. Being able to track and monitor multiple chain (tips) concurrently is a key requirement for this to take place. There are a number of different states which the client must be able to handle:

  1. A single, most-work chain being followed

  2. Stale blocks learned about but not used

  3. Full reorganisation from one chain tip to another

BlockManager is tasked with maintaining a tree of all blocks learned about, along with their total work so that the most-work chain can be quickly determined.

CChainState is responsible for updating our local view of the best tip, including reading and writing blocks to disk, and updating the UTXO set. A single BlockManager is shared between all instances of CChainState.

ChainstateManager is tasked with managing multiple CChainStates. Currently these are just a "regular" IBD chainstate and an optional snapshot chainstate, which might in the future be used as part of the assumeUTXO project.

When a new block is learned about (from src/net_processing.cpp), it is validated by calling into ChainstateManager's ProcessNewBlockHeaders method.
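To illustrate the idea of keeping a tree of blocks together with their total work, here is a minimal sketch. It is a toy model under stated assumptions, not the BlockManager implementation: `ToyBlockIndex` and `ToyBlockTree` are hypothetical, work is a small integer rather than a 256-bit value, and ties are resolved in favour of the tip seen first.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy block index entry. `work` is a small integer standing in for the
// proof of work contributed by a single block.
struct ToyBlockIndex {
    std::string hash;
    std::string prev_hash; // empty for the genesis block
    uint64_t work;         // work contributed by this block alone
    uint64_t chain_work;   // cumulative work from genesis up to this block
};

class ToyBlockTree
{
public:
    // Adds a block whose parent is already known and updates the best tip.
    // Returns false if the parent is unknown.
    bool AddBlock(const std::string& hash, const std::string& prev_hash, uint64_t work)
    {
        assert(!hash.empty());
        uint64_t parent_work = 0;
        if (!prev_hash.empty()) {
            auto it = m_index.find(prev_hash);
            if (it == m_index.end()) return false;
            parent_work = it->second.chain_work;
        }
        const ToyBlockIndex entry{hash, prev_hash, work, parent_work + work};
        // Only switch tips on strictly more work, so the first-seen tip wins ties.
        if (m_best_tip.empty() || entry.chain_work > m_index.at(m_best_tip).chain_work) {
            m_best_tip = hash;
        }
        m_index[hash] = entry;
        return true;
    }

    const std::string& BestTip() const { return m_best_tip; }

private:
    std::map<std::string, ToyBlockIndex> m_index;
    std::string m_best_tip;
};
```

With this model the three states listed above appear naturally: extending the tip keeps following one chain, a competing block with equal work is stored but not followed, and a competing branch that accumulates more work causes the best tip to move to it.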

Exercises

  1. What is the difference between contextual and context-free validation checks?

    Contextual checks require some knowledge of the current "state", e.g. ChainState, chain tip or UTXO set.

    Context-free checks only require the information contained in the transaction itself.

    See glozow-tx-mempool-validation for more info.

  2. What are some examples of each?

    context-free:

  3. In which function(s) do UTXO-related validity checks happen?

    ConnectBlock()

  4. What type of validation checks are CheckBlockHeader() and CheckBlock() performing?

    context-free

  5. Which class is in charge of managing the current blockchain?

    ChainstateManager()

  6. Which class is in charge of managing the UTXO set?

    CCoinsViews()

  7. Which functions are called when a longer chain is found that we need to re-org onto?

    TODO

  8. Are there any areas of the codebase where the same consensus or validation checks are performed twice?

    Again, see glozow's notes for examples.

  9. Why does CheckInputsFromMempoolAndCache exist?

    To prevent us from re-checking the scripts of transactions already in our mempool during consensus validation on learning about a new block

  10. Which function(s) are in charge of validating the merkle root of a block?

    BlockMerkleRoot() and BlockWitnessMerkleRoot() construct a vector of merkle leaves, which is then passed to ComputeMerkleRoot() for calculation.

  11. Can you find any evidence (e.g. PRs) which have been made in an effort to modularize consensus code?

    A few examples: #10279, #20158

  12. What is the function of BlockManager()?

    It manages the current most-work chaintip and pruning of unneeded blocks (*.blk) and associated undo (*.rev) files

  13. What stops a malicious node from sending multiple invalid headers to try and use up a node's disk space? (hint: these might be stored in BlockManager.m_failed_blocks)

    Even invalid headers would need a valid proof of work which would be too costly to construct for a spammer

  14. Which functions are responsible for writing consensus-valid blocks to disk?

    TODO: answer

  15. Are there any other components to Bitcoin Core which, similarly to the block storage database, are not themselves performing validation but can still be consensus-critical?

    Not sure myself, sounds like an interesting question though!

  16. In which module (and class) is signature verification handled?

    src/script/interpreter.cpp#BaseSignatureChecker

  17. Which function is used to calculate the Merkle root of a block, and from where is it called?

    src/consensus/merkle.cpp#ComputeMerkleRoot is used to compute the merkle root.

    It is called from src/chainparams.cpp#CreateGenesisBlock, src/miner.cpp#IncrementExtraNonce & src/miner.cpp#RegenerateCommitments and from src/validation.cpp#CheckBlock to validate incoming blocks.

  18. Practical question on Merkle root calculation

    TODO, add exercise

Removed text

The outline of the mechanism at work is that a node relaying a transaction can slightly modify the signature in a way which is still acceptable to the underlying OpenSSL module. Once the signature has been changed, the transaction ID (hash) will also change. If the modified transaction is then included in a block before the original, the effect is that the sender will still see the original outgoing transaction as "unconfirmed" in their wallet. The sender's wallet should however also see the accepted (modified) outgoing transaction, so their balance will be calculated correctly; only a "stuck doublespend" will pollute their wallet. The receiver will not perceive anything out of the ordinary, unless they were tracking the incoming payment using the txid as given to them by the sender.

Wallet

This document was created from Bitcoin Core at commit 4b5659c6b115315c9fd2902b4edd4b960a5e066e

Overview

  1. Wallets are stored on disk as databases, using either the Berkeley Database (BDB) or the SQLite format.

  2. These wallets can be one of two types, "legacy" or "descriptor".

  3. Wallets do not have to store the private keys associated with the addresses and public keys they are monitoring.

Wallet architecture

Separation of wallet and node functionality

Both the bitcoind and bitcoin-qt programs use the same source code for wallet, networking, consensus etc. bitcoin-qt is not simply a wallet/gui "frontend" for bitcoind but a stand-alone binary which happens to share much of the same code. There has been discussion since at least as early as 2014 about splitting wallet code out from the rest of the codebase, however this has not been completed yet.

The Process Separation project is tracking development working towards separating out node, wallet and GUI code even further. In the meantime developers have preferred to focus on improving the organisation of the (wallet) source code within the project and on making wallet code more asynchronous and independent of node code, to avoid locking the node while wallet code-paths are executing.

Wallet interfaces

In order to facilitate code separation, distinct interfaces between the node and the wallet have been created:

  • The node holds a WalletImpl interface to call functions on the wallet.

  • The wallet holds a ChainImpl interface to call functions on the node.

  • The node notifies the wallet about new transactions and blocks through the CValidationInterface.
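The shape of this separation can be sketched with three small abstract classes. All names here (`ToyChain`, `ToyNotifications`, `ToyWalletClient`, `ToyNode`, `ToyWallet`) are hypothetical and the methods are far simpler than the real interfaces; the point is only that each side sees the other through an interface rather than through its concrete type.

```cpp
#include <cassert>
#include <string>
#include <vector>

// What the wallet may ask of the node (hypothetical, minimal).
class ToyChain
{
public:
    virtual ~ToyChain() = default;
    virtual int GetHeight() const = 0;
};

// Notifications the node pushes to subscribers such as the wallet.
class ToyNotifications
{
public:
    virtual ~ToyNotifications() = default;
    virtual void BlockConnected(int height) = 0;
};

// What the node may ask of the wallet (hypothetical, minimal).
class ToyWalletClient
{
public:
    virtual ~ToyWalletClient() = default;
    virtual std::string Name() const = 0;
};

class ToyNode : public ToyChain
{
public:
    int GetHeight() const override { return m_height; }
    void Subscribe(ToyNotifications* subscriber)
    {
        assert(subscriber != nullptr);
        m_subscribers.push_back(subscriber);
    }
    void ConnectBlock()
    {
        ++m_height;
        for (ToyNotifications* s : m_subscribers) s->BlockConnected(m_height);
    }

private:
    int m_height{0};
    std::vector<ToyNotifications*> m_subscribers;
};

class ToyWallet : public ToyWalletClient, public ToyNotifications
{
public:
    explicit ToyWallet(ToyChain& chain) : m_chain(chain), m_last_seen(chain.GetHeight()) {}
    std::string Name() const override { return "toy"; }
    void BlockConnected(int height) override { m_last_seen = height; }
    bool IsSynced() const { return m_last_seen == m_chain.GetHeight(); }

private:
    ToyChain& m_chain; // the wallet only sees the node through this interface
    int m_last_seen;
};
```

Because `ToyWallet` depends only on `ToyChain`, the node behind that reference could in principle be swapped for a different implementation without touching wallet code, which is the property the interfaces are meant to provide.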

Wallet component initialisation

The wallet component is initialised via the WalletInitInterface class as specified in src/walletinitinterface.h. The member functions are marked as virtual in the WalletInitInterface definition, indicating that they are going to be overridden later by a derived class.

src/walletinitinterface.h
class WalletInitInterface {
public:
    /** Is the wallet component enabled */
    virtual bool HasWalletSupport() const = 0;
    /** Get wallet help string */
    virtual void AddWalletOptions(ArgsManager& argsman) const = 0;
    /** Check wallet parameter interaction */
    virtual bool ParameterInteraction() const = 0;
    /** Add wallets that should be opened to list of chain clients. */
    virtual void Construct(NodeContext& node) const = 0;

    virtual ~WalletInitInterface() {}
};

Both wallet/init.cpp and dummywallet.cpp include derived classes which override the member functions of WalletInitInterface, depending on whether the wallet is being compiled in or not.

The primary src/Makefile.am describes which of these modules is chosen to override: if ./configure has been run with the wallet feature enabled (default), then wallet/init.cpp is added to the sources, otherwise (./configure --disable-wallet) dummywallet.cpp is added.

src/Makefile.am
if ENABLE_WALLET
libbitcoin_server_a_SOURCES += wallet/init.cpp
endif
if !ENABLE_WALLET
libbitcoin_server_a_SOURCES += dummywallet.cpp
endif

src/walletinitinterface.h declares the global g_wallet_init_interface which will handle the configured WalletInitInterface.
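The pattern of one abstract interface, two alternative implementations and a single global reference can be sketched as follows. This is a simplified illustration: the `Toy*` names and the `TOY_ENABLE_WALLET` macro are hypothetical, and a preprocessor switch inside one file stands in for the build system choosing which source file to compile.

```cpp
#include <cassert>
#include <string>

class ToyInitInterface
{
public:
    virtual ~ToyInitInterface() = default;
    virtual bool HasWalletSupport() const = 0;
    virtual std::string Describe() const = 0;
};

// Stand-in for the implementation built when the wallet is enabled.
class ToyWalletInit : public ToyInitInterface
{
public:
    bool HasWalletSupport() const override { return true; }
    std::string Describe() const override { return "wallet compiled in"; }
};

// Stand-in for the implementation built when the wallet is disabled.
class ToyDummyWalletInit : public ToyInitInterface
{
public:
    bool HasWalletSupport() const override { return false; }
    std::string Describe() const override { return "wallet not compiled in"; }
};

// Hypothetical switch; in a real build the choice is made by compiling one
// source file or the other, each of which defines the same global.
#ifndef TOY_ENABLE_WALLET
#define TOY_ENABLE_WALLET 1
#endif

#if TOY_ENABLE_WALLET
static const ToyWalletInit g_toy_init{};
#else
static const ToyDummyWalletInit g_toy_init{};
#endif

// Callers only ever use this reference and never learn which type is behind it.
const ToyInitInterface& g_toy_init_interface = g_toy_init;
```

Code that uses `g_toy_init_interface` compiles identically in both configurations, which is why the rest of the program does not need its own conditional compilation for wallet support.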

The wallet interface is created when the Construct() method is called on the g_wallet_init_interface object by AppInitInterfaces() in init.cpp. Construct() takes a reference to a NodeContext as argument, and then checks that the wallet has not been disabled by a runtime argument before calling interfaces::MakeWalletClient() with the node's chain interface. This initialises a new WalletClientImpl object which is then added to the node object in two places: it is appended to the general list of node.chain_clients (wallet processes or other clients which want chain information from the node), and it is assigned to the unique node.wallet_client role, which specifies the particular node.chain_client that should be used to load or create wallets.

src/wallet/init.cpp
void WalletInit::Construct(NodeContext& node) const
{
    ArgsManager& args = *Assert(node.args);
    if (args.GetBoolArg("-disablewallet", DEFAULT_DISABLE_WALLET)) {
        LogPrintf("Wallet disabled!\n");
        return;
    }
    auto wallet_client = interfaces::MakeWalletClient(*node.chain, args);
    node.wallet_client = wallet_client.get();
    node.chain_clients.emplace_back(std::move(wallet_client));
}

The NodeContext struct is defined as the following:

src/node/context.h

…​contains references to chain state and connection state.

…​used by init, rpc, and test code to pass object references around without needing to declare the same variables and parameters repeatedly, or to use globals…​ The struct isn’t intended to have any member functions. It should just be a collection of references that can be used without pulling in unwanted dependencies or functionality.

Wallets and program initialisation

Wallets can optionally be loaded as part of main program startup (i.e. from src/init.cpp). Any wallets loaded during the life cycle of the main program are also unloaded as part of program shutdown.

Specifying wallets loaded at startup

Wallet(s) to be loaded as part of program startup can be specified by passing -wallet= or -walletdir= arguments to bitcoind/bitcoin-qt. If the wallet has been compiled in but no -wallet*= arguments have been passed, then the default wallet directory ($datadir/wallets) will be checked as per GetWalletDir():

src/wallet/walletutil.cpp#GetWalletDir()
fs::path GetWalletDir()
{
    fs::path path;

    if (gArgs.IsArgSet("-walletdir")) {
        path = gArgs.GetArg("-walletdir", "");
        if (!fs::is_directory(path)) {
            // If the path specified doesn't exist, we return the deliberately
            // invalid empty string.
            path = "";
        }
    } else {
        path = GetDataDir();
        // If a wallets directory exists, use that, otherwise default to GetDataDir
        if (fs::is_directory(path / "wallets")) {
            path /= "wallets";
        }
    }

    return path;
}

Wallets can also be loaded after program startup via the loadwallet RPC.

VerifyWallets

Wallet verification refers to verification of the -wallet arguments as well as the underlying wallet database(s) on disk.

Wallets loaded via program arguments are first verified as part of AppInitMain(), which checks wallet database integrity by calling VerifyWallets() via the WalletClientImpl override of client->verify().

VerifyWallets() takes an interfaces::Chain object as argument, which is currently used primarily to send init and error messages (about wallet verification) back to the GUI. VerifyWallets() starts by checking that the walletdir supplied by argument, or default of "", is valid. Next it loops through all wallets it finds in the walletdir and adds them to an std::set called wallet_paths, first deduplicating them by tracking their absolute paths, and then checking that the WalletDatabase for each wallet exists (or is otherwise constructed successfully) and can be verified.

src/wallet/load.cpp#VerifyWallets()
// ...

for (const auto& wallet_file : gArgs.GetArgs("-wallet")) {
    const fs::path path = fsbridge::AbsPathJoin(GetWalletDir(), wallet_file);

    if (!wallet_paths.insert(path).second) {
        chain.initWarning(strprintf(_("Ignoring duplicate -wallet %s."), wallet_file));
        continue;
    }

    DatabaseOptions options;
    DatabaseStatus status;
    options.require_existing = true;
    options.verify = true;
    bilingual_str error_string;
    if (!MakeWalletDatabase(wallet_file, options, status, error_string)) {
        if (status == DatabaseStatus::FAILED_NOT_FOUND) {
            chain.initWarning(Untranslated(strprintf("Skipping -wallet path that doesn't exist. %s", error_string.original)));
        } else {
            chain.initError(error_string);
            return false;
        }
    }
}

// ...

If this check passes for all wallets, then VerifyWallets() is complete and will return true to the calling function AppInitMain(), otherwise false will be returned. If VerifyWallets() fails and returns false (due to a corrupted wallet database, but notably not due to an incorrect wallet path), the main program process AppInit() will be immediately interrupted and shut down.
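The de-duplication step relies on the fact that inserting into a `std::set` reports whether the element was new. A minimal sketch of just that step is shown here, assuming POSIX-style paths; `ToyAbsPathJoin()`, `ToyVerifyResult` and `ToyDeduplicateWallets()` are hypothetical helpers and no database is opened or verified.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Joins `name` onto `base` unless `name` is already absolute (POSIX-style
// paths only; a simplification of real path handling).
std::string ToyAbsPathJoin(const std::string& base, const std::string& name)
{
    assert(!base.empty() && base.front() == '/');
    if (!name.empty() && name.front() == '/') return name;
    return base + "/" + name;
}

struct ToyVerifyResult {
    std::vector<std::string> unique_paths; // wallets that would go on to be verified
    std::vector<std::string> warnings;     // one entry per ignored duplicate
};

ToyVerifyResult ToyDeduplicateWallets(const std::string& wallet_dir,
                                      const std::vector<std::string>& wallet_args)
{
    ToyVerifyResult result;
    std::set<std::string> seen;
    for (const std::string& arg : wallet_args) {
        const std::string path = ToyAbsPathJoin(wallet_dir, arg);
        // insert().second is false when the path was already in the set.
        if (!seen.insert(path).second) {
            result.warnings.push_back("Ignoring duplicate -wallet " + arg + ".");
            continue;
        }
        result.unique_paths.push_back(path);
    }
    return result;
}
```

Comparing absolute paths rather than the raw arguments is what allows a wallet named by a relative name and the same wallet named by its full path to be recognised as duplicates.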

LoadWallets

"Startup" wallet(s) are loaded when client→load() is called on each node.chain_client as part of init.cpp.

src/init.cpp#AppInitMain()
for (const auto& client : node.chain_clients) {
    if (!client->load()) {
        return false;
    }
}

The call to load() on the wallet chain_client has again been overridden, this time by WalletClientImpl's LoadWallets() method. This function works similarly to VerifyWallets(), first creating the WalletDatabase (memory) object for each wallet, although this time skipping the verify step, before creating a CWallet object from the database and adding it to the global list of wallets, the vector vpwallets, by calling AddWallet().

src/wallet/load.cpp#LoadWallets()
for (const std::string& name : gArgs.GetArgs("-wallet")) {
    if (!wallet_paths.insert(name).second) {
        continue;
    }
    DatabaseOptions options;
    DatabaseStatus status;
    options.require_existing = true;
    options.verify = false; // No need to verify, assuming verified earlier in VerifyWallets()
    bilingual_str error;
    std::vector<bilingual_str> warnings;
    std::unique_ptr<WalletDatabase> database = MakeWalletDatabase(name, options, status, error);
    if (!database && status == DatabaseStatus::FAILED_NOT_FOUND) {
        continue;
    }
    std::shared_ptr<CWallet> pwallet = database ? CWallet::Create(chain, name, std::move(database), options.create_flags, error, warnings) : nullptr;
    if (!warnings.empty()) chain.initWarning(Join(warnings, Untranslated("\n")));
    if (!pwallet) {
        chain.initError(error);
        return false;
    }
    AddWallet(pwallet);
}
Caution

There are a number of steps in init.cpp that happen before the wallet is loaded, notably the blockchain is synced first. This is a safeguard which means that wallet operations cannot be called on a wallet which has been loaded against stale blockchain data.

Note

init.cpp is run on a single thread. This means that calls to wallet code block further initialisation of the node.

The interfaces::Chain object taken as argument by LoadWallets() is used to pass back any error messages, exactly as it was in VerifyWallets(). AddWallet() is defined in src/wallet/wallet.cpp.

StartWallets

The wallet is finally ready when (all) chain_clients have been started in init.cpp, which calls the overridden client->start() method from the WalletClientImpl class, resulting in src/wallet/load.cpp#StartWallets() being called.

This calls the GetWallets() function which returns the vector of pointers to the interfaces for loaded CWallet objects, vpwallets. As part of startup postInitProcess() is called on each wallet which, after grabbing the main wallet lock cs_wallet, synchronises the wallet and mempool by adding wallet transactions not yet in a block to our mempool, and updating the wallet with any relevant transactions from the mempool.

src/wallet/wallet.cpp#CWallet::postInitProcess()
void CWallet::postInitProcess()
{
    LOCK(cs_wallet);

    // Add wallet transactions that aren't already in a block to mempool
    // Do this here as mempool requires genesis block to be loaded
    ReacceptWalletTransactions();

    // Update wallet transactions with current mempool transactions.
    chain().requestMempoolTransactions(*this);
}

Also, as part of StartWallets(), flushwallet might be scheduled (if configured by argument), and wallet transactions are scheduled to be re-broadcast every second, although this interval is delayed upstream with a random timer.

FlushWallets

All wallets loaded into the program are "flushed" (to disk) before shutdown. As part of init.cpp#Shutdown() the flush() method is called on each member of node.chain_clients in sequence. WalletClientImpl again overrides this method to call wallet/load.cpp#FlushWallets() which makes sure all wallet changes have been successfully flushed to the wallet database.

src/init.cpp#Shutdown()
// FlushStateToDisk generates a ChainStateFlushed callback, which we should avoid missing
if (node.chainman) {
    LOCK(cs_main);
    for (CChainState* chainstate : node.chainman->GetAll()) {
        if (chainstate->CanFlushToDisk()) {
            chainstate->ForceFlushStateToDisk();
        }
    }
}

Finally the stop() method is called on each member of node.chain_clients which is overridden by StopWallets(), flushing again and this time calling close() on the database file.
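Taken together, the verify, load, start, flush and stop calls form a simple lifecycle that init code drives on every chain client. The sketch below models only the call ordering; `ToyChainClient`, `ToyLoggingClient`, `ToyStartup()` and `ToyShutdown()` are hypothetical, and the real startup and shutdown sequences perform many other steps between these calls.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, minimal version of a chain client interface.
class ToyChainClient
{
public:
    virtual ~ToyChainClient() = default;
    virtual bool verify() = 0;
    virtual bool load() = 0;
    virtual void start() = 0;
    virtual void flush() = 0;
    virtual void stop() = 0;
};

// A client that records which lifecycle methods were called, in order.
class ToyLoggingClient : public ToyChainClient
{
public:
    explicit ToyLoggingClient(std::vector<std::string>& log) : m_log(log) {}
    bool verify() override { m_log.push_back("verify"); return true; }
    bool load() override { m_log.push_back("load"); return true; }
    void start() override { m_log.push_back("start"); }
    void flush() override { m_log.push_back("flush"); }
    void stop() override { m_log.push_back("stop"); }

private:
    std::vector<std::string>& m_log;
};

// Startup half: every client is verified, then loaded, then started.
// A failed verify or load aborts startup.
bool ToyStartup(std::vector<std::unique_ptr<ToyChainClient>>& clients)
{
    for (auto& client : clients) if (!client->verify()) return false;
    for (auto& client : clients) if (!client->load()) return false;
    for (auto& client : clients) client->start();
    return true;
}

// Shutdown half: every client is flushed before any client is stopped.
void ToyShutdown(std::vector<std::unique_ptr<ToyChainClient>>& clients)
{
    for (auto& client : clients) client->flush();
    for (auto& client : clients) client->stop();
}
```

Since init code only talks to the abstract client type, the wallet can plug its own verify, load, start, flush and stop behaviour into the sequence without init code knowing anything about wallets.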

Wallet Locks

Grepping the src/wallet directory for locks, conventionally of the form cs_*, yields 501 matches. For comparison the entire remainder of the codebase excluding src/wallet/* yields 925 matches. Many of these matches are asserts and declarations, however this still illustrates that the wallet code is highly reliant on locks to perform atomic operations.

The cs_wallet lock

In order to not block the rest of the program during wallet operations, each CWallet has its own recursive mutex cs_wallet:

Note
There is currently an issue tracking replacement of RecursiveMutexes with Mutexes, to make locking logic easier to follow in the codebase.
src/wallet/wallet.h
/*
 * Main wallet lock.
 * This lock protects all the fields added by CWallet.
 */
mutable RecursiveMutex cs_wallet;

Most wallet operations whether reading or writing data require the use of the lock so that atomicity can be guaranteed. Some examples of wallet operations requiring the lock include:

  1. Creating transactions

  2. Signing transactions

  3. Broadcasting/committing transactions

  4. Abandoning transactions

  5. Bumping transaction (fees)

  6. Checking IsMine

  7. Creating new addresses

  8. Calculating balances

  9. Creating new wallets

  10. Importing new {priv|pub}keys/addresses

  11. Importing/dumping wallets

In addition to these higher level functions, most of CWallet's private member functions also require a hold on cs_wallet.
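A recursive mutex allows a function that already holds the lock to call another function that takes the same lock. The following sketch shows that property with the standard library's `std::recursive_mutex`; `ToyLockedWallet` and its methods are hypothetical, and Bitcoin Core's own `RecursiveMutex` and `LOCK` wrappers are not used here.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

class ToyLockedWallet
{
public:
    void AddCoin(int64_t amount)
    {
        assert(amount > 0);
        std::lock_guard<std::recursive_mutex> lock(cs_wallet);
        m_coins.push_back(amount);
    }

    int64_t GetBalance() const
    {
        std::lock_guard<std::recursive_mutex> lock(cs_wallet);
        int64_t total = 0;
        for (int64_t coin : m_coins) total += coin;
        return total;
    }

    // Holds the lock across the balance check and the update so that no other
    // thread can change the coin set in between. The nested GetBalance() call
    // re-acquires cs_wallet, which is only safe because the mutex is recursive.
    bool Spend(int64_t amount)
    {
        std::lock_guard<std::recursive_mutex> lock(cs_wallet);
        const int64_t change = GetBalance() - amount;
        if (change < 0) return false;
        m_coins.clear();
        if (change > 0) m_coins.push_back(change);
        return true;
    }

private:
    mutable std::recursive_mutex cs_wallet;
    std::vector<int64_t> m_coins;
};
```

With a plain `std::mutex` the nested acquisition in `Spend()` would be undefined behaviour, so replacing recursive mutexes generally means restructuring such call chains so that each lock is taken exactly once.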

Other wallet locks

  1. src/wallet/bdb.cpp, which is responsible for managing BDB wallet databases on disk, has its own mutex cs_db.

  2. If external signers have been enabled (via ./configure --enable-external-signer) then they too have their own mutex cs_desc_man which is acquired when descriptors are being set up.

  3. BlockUntilSyncedToCurrentChain() has a unique lock exclude placed on it to prevent the caller from holding cs_main during its execution, and therefore prevent a possible deadlock:

    src/wallet/wallet.h
    /**
     * Blocks until the wallet state is up-to-date to /at least/ the current
     * chain at the time this function is entered
     * Obviously holding cs_main/cs_wallet when going into this call may cause
     * deadlock
     */
    void BlockUntilSyncedToCurrentChain() const LOCKS_EXCLUDED(::cs_main) EXCLUSIVE_LOCKS_REQUIRED(!cs_wallet);

Controlling the wallet

As we can see wallet component startup and shutdown is largely driven from outside the wallet codebase from src/init.cpp.

Once the wallet component is started and any wallets supplied via argument have been verified and loaded, wallet functionality ceases to be called from init.cpp and instead is controlled using external programs in a number of ways. The wallet can be controlled using bitcoin-cli, the bitcoin-qt GUI or the stand-alone bitcoin-wallet tool.

Both bitcoind and bitcoin-qt run a (JSON) RPC server which is ready to service, amongst other things, commands to interact with wallets. The command line tool bitcoin-cli allows interaction with any RPC server started by either bitcoind or bitcoin-qt.

Tip
If using bitcoin-qt there is also an RPC console built into the GUI.

If using the bitcoin-qt GUI itself then communication with the wallet is done directly via Qt’s WalletModel interface.

Commands which can be used to control the wallet via RPC are listed in rpcwallet.cpp.

Wallet via RPC

If we take a look at the loadwallet RPC we can see similarities to WalletClientImpl's LoadWallets() function.

However, this time the function first calls EnsureWalletContext() to confirm that a wallet context (in this case a reference to a chain interface) is loaded. Next it calls wallet.cpp#LoadWallet, which starts by grabbing g_wallet_loading_mutex and adding the wallet to g_loading_wallet_set, before calling LoadWalletInternal, which adds the wallet to vpwallets and sets up various event notifications.

src/wallet/rpcwallet.cpp#loadwallet()
WalletContext& context = EnsureWalletContext(request.context);
const std::string name(request.params[0].get_str());

DatabaseOptions options;
DatabaseStatus status;
options.require_existing = true;
bilingual_str error;
std::vector<bilingual_str> warnings;
std::optional<bool> load_on_start = request.params[1].isNull() ? std::nullopt : std::optional<bool>(request.params[1].get_bool());
std::shared_ptr<CWallet> const wallet = LoadWallet(*context.chain, name, load_on_start, options, status, error, warnings);
if (!wallet) {
    // Map bad format to not found, since bad format is returned when the
    // wallet directory exists, but doesn't contain a data file.
    RPCErrorCode code = RPC_WALLET_ERROR;
    switch (status) {
        case DatabaseStatus::FAILED_NOT_FOUND:
        case DatabaseStatus::FAILED_BAD_FORMAT:
            code = RPC_WALLET_NOT_FOUND;
            break;
        case DatabaseStatus::FAILED_ALREADY_LOADED:
            code = RPC_WALLET_ALREADY_LOADED;
            break;
        default: // RPC_WALLET_ERROR is returned for all other cases.
            break;
    }
    // ...
}

Further operation of the wallet RPCs is detailed in their man pages, but one thing to take note of is that whilst loadwallet() and unloadwallet() both take a wallet_name argument, the other wallet RPCs do not. Therefore, in order to control a specific wallet from an instance of bitcoin{d|-qt} that has multiple wallets loaded, bitcoin-cli must be called with the -rpcwallet argument to specify the wallet which the action should be performed against, e.g. bitcoin-cli -rpcwallet=your_wallet_name getbalance

CWallet

The CWallet object is the fundamental wallet representation inside Bitcoin Core. CWallet stores transactions and balances and has the ability to create new transactions. CWallet also contains references to the chain interface for the wallet along with storing wallet metadata such as nWalletVersion, wallet flags, wallet name and address book.

CWallet creation

The CWallet constructor takes a pointer to the chain interface for the wallet, a wallet name and a pointer to the underlying WalletDatabase:

src/wallet/wallet.h
/** Construct wallet with specified name and database implementation. */
CWallet(interfaces::Chain* chain, const std::string& name, std::unique_ptr<WalletDatabase> database)
    : m_chain(chain),
      m_name(name),
      m_database(std::move(database))
{
}

The constructor is not called directly, but instead from the public function CWallet::Create(), which is in turn itself called from CreateWallet(), LoadWallets() (or TestLoadWallet()). In addition to the arguments required by the constructor, CWallet::Create() also has a wallet_flags argument. Wallet flags are represented as a single uint64_t bit field which encodes certain wallet properties:

src/wallet/walletutil.h
enum WalletFlags : uint64_t {
    WALLET_FLAG_AVOID_REUSE = (1ULL << 0),
    WALLET_FLAG_KEY_ORIGIN_METADATA = (1ULL << 1),
    WALLET_FLAG_DISABLE_PRIVATE_KEYS = (1ULL << 32),
    WALLET_FLAG_BLANK_WALLET = (1ULL << 33),
    WALLET_FLAG_DESCRIPTORS = (1ULL << 34),
    WALLET_FLAG_EXTERNAL_SIGNER = (1ULL << 35),
};

See src/wallet/walletutil.h for additional information on the meanings of the wallet flags.
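Working with such a bit field comes down to a few bitwise operations. The sketch below copies three of the flag values from the listing above under hypothetical `TOY_` names; `ToyFlagSet` and its methods are illustrative and are not the wallet's own flag-handling functions.

```cpp
#include <cassert>
#include <cstdint>

// Three flag values mirrored from the enum above, under hypothetical names.
enum ToyWalletFlags : uint64_t {
    TOY_FLAG_AVOID_REUSE = (1ULL << 0),
    TOY_FLAG_DISABLE_PRIVATE_KEYS = (1ULL << 32),
    TOY_FLAG_DESCRIPTORS = (1ULL << 34),
};

class ToyFlagSet
{
public:
    void Set(uint64_t flag) { m_flags |= flag; }    // turn the bit on
    void Unset(uint64_t flag) { m_flags &= ~flag; } // turn the bit off
    bool IsSet(uint64_t flag) const { return (m_flags & flag) != 0; }
    uint64_t Raw() const { return m_flags; }        // the value as stored

private:
    uint64_t m_flags{0};
};
```

Because every property occupies its own bit, any combination of flags fits in one 64-bit integer and a single AND is enough to test for a property.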

CWallet::Create() will first attempt to create the CWallet object and load it, returning if any errors are encountered. If CWallet::Create() is creating a new wallet (on its 'first run'), the wallet version and wallet flags will be set, before either a LegacyScriptPubKeyMan or DescriptorScriptPubKeyMans are set up, depending on whether the WALLET_FLAG_DESCRIPTORS flag was set on the wallet.

Following successful creation, various bitcoind program arguments are checked and applied to the wallet. These include options such as "-addresstype", "-changetype", "-mintxfee" and "-maxtxfee" amongst others. It is at this stage that warnings for unusual or unsafe values of these arguments are generated to be returned to the user.

After the wallet is fully initialised and set up, its keypool will be topped up before the wallet is locked and registered with the validationinterface, which will handle callback notifications generated during the (optional) upcoming chain rescan. The rescan is smart in detecting the wallet "birthday" using metadata stored in the ScriptPubKeyMan and won’t scan blocks produced before this date:

src/wallet/wallet.cpp#CWallet::Create()
...

chain.initMessage(_("Rescanning...").translated);
walletInstance->WalletLogPrintf("Rescanning last %i blocks (from block %i)...\n", *tip_height - rescan_height, rescan_height);

// No need to read and scan block if block was created before
// our wallet birthday (as adjusted for block time variability)
std::optional<int64_t> time_first_key;
for (auto spk_man : walletInstance->GetAllScriptPubKeyMans()) {
    int64_t time = spk_man->GetTimeFirstKey();
    if (!time_first_key || time < *time_first_key) time_first_key = time;
}
if (time_first_key) {
    chain.findFirstBlockWithTimeAndHeight(*time_first_key - TIMESTAMP_WINDOW, rescan_height, FoundBlock().height(rescan_height));
}

{
    WalletRescanReserver reserver(*walletInstance);
    if (!reserver.reserve() || (ScanResult::SUCCESS != walletInstance->ScanForWalletTransactions(chain.getBlockHash(rescan_height), rescan_height, {} /* max height */, reserver, true /* update */).status)) {
        error = _("Failed to rescan the wallet during initialization");
        return nullptr;
    }
}

...

Finally, the walletinterface is set up for the wallet before the walletInstance is returned to the caller.
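The wallet "birthday" logic used by the rescan can be sketched in isolation. This is a simplified model: `ToyBlock`, `ToyWalletBirthday()` and `ToyFirstBlockToScan()` are hypothetical, `TOY_TIMESTAMP_WINDOW` is assumed to be two hours, and the chain is scanned linearly on raw block times, which is not necessarily how the chain interface performs the lookup.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct ToyBlock {
    int height;
    int64_t time; // block timestamp in seconds
};

// Assumed allowance (two hours) for block timestamps that drift from real time.
constexpr int64_t TOY_TIMESTAMP_WINDOW = 2 * 60 * 60;

// The wallet birthday is the earliest "first key" time across all managers.
int64_t ToyWalletBirthday(const std::vector<int64_t>& first_key_times)
{
    assert(!first_key_times.empty());
    return *std::min_element(first_key_times.begin(), first_key_times.end());
}

// Returns the height of the first block at or above `from_height` whose time
// is not earlier than the birthday minus the window. Returns chain.size()
// when no block qualifies, meaning there is nothing to scan.
int ToyFirstBlockToScan(const std::vector<ToyBlock>& chain, int64_t birthday, int from_height)
{
    const int64_t min_time = birthday - TOY_TIMESTAMP_WINDOW;
    for (const ToyBlock& block : chain) {
        if (block.height >= from_height && block.time >= min_time) return block.height;
    }
    return static_cast<int>(chain.size());
}
```

Subtracting the window errs on the side of scanning slightly too many blocks, which is the safe direction: an unnecessary block scan costs time, while a skipped block could hide a wallet transaction.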

ScriptPubKeyManagers (SPKM)

Each wallet contains one or more ScriptPubKeyManagers, which are in control of storing the scriptPubKeys managed by that wallet.

A CWallet in the general sense therefore becomes "a collection of ScriptPubKeyManagers", which are each managing an address type. In the current implementation, this means that a default (descriptor) wallet consists of 6 ScriptPubKeyManagers, one for each combination of {legacy | p2sh | bech32} for {receive | change} addresses.

src/wallet/wallet.cpp#SetupLegacyScriptPubKeyMan()
void CWallet::SetupLegacyScriptPubKeyMan()
{
    if (!m_internal_spk_managers.empty() || !m_external_spk_managers.empty() || !m_spk_managers.empty() || IsWalletFlagSet(WALLET_FLAG_DESCRIPTORS)) {
        return;
    }

    auto spk_manager = std::unique_ptr<ScriptPubKeyMan>(new LegacyScriptPubKeyMan(*this));
    for (const auto& type : OUTPUT_TYPES) {
        m_internal_spk_managers[type] = spk_manager.get();
        m_external_spk_managers[type] = spk_manager.get();
    }
    m_spk_managers[spk_manager->GetID()] = std::move(spk_manager);
}
Tip
SetupLegacyScriptPubKeyMan() as shown above really only has a single SPKM which is then aliased and shared between all output types.

Compare this to the equivalent descriptor wallet code fragment which sets up an SPKM for each output type:

src/wallet/wallet.cpp#SetupDescriptorScriptPubKeyMans()
...

for (bool internal : {false, true}) {
    for (OutputType t : OUTPUT_TYPES) {
        auto spk_manager = std::unique_ptr<DescriptorScriptPubKeyMan>(new DescriptorScriptPubKeyMan(*this, internal));
        if (IsCrypted()) {
            if (IsLocked()) {
                throw std::runtime_error(std::string(__func__) + ": Wallet is locked, cannot setup new descriptors");
            }
            if (!spk_manager->CheckDecryptionKey(vMasterKey) && !spk_manager->Encrypt(vMasterKey, nullptr)) {
                throw std::runtime_error(std::string(__func__) + ": Could not encrypt new descriptors");
            }
        }
        spk_manager->SetupDescriptorGeneration(master_key, t);
        uint256 id = spk_manager->GetID();
        m_spk_managers[id] = std::move(spk_manager);
        AddActiveScriptPubKeyMan(id, t, internal);
    }
}

...

Script pubkey managers are stored inside CWallet in a map according to output type:

src/wallet/wallet.h
class CWallet final : public WalletStorage, public interfaces::Chain::Notifications
{
private:

// ...

    std::map<OutputType, ScriptPubKeyMan*> m_external_spk_managers;
    std::map<OutputType, ScriptPubKeyMan*> m_internal_spk_managers;

// ...
};
Tip
"external" and "internal" (SPKMs) refer to whether the addresses generated are designated for giving out "externally" and receiving new payments to, or for "internal" change addresses.

Prior to c729afd0 the equivalent SPKM functionality (fetching new addresses and signing transactions) was contained within CWallet itself, now being split out for better maintainability and upgradability brought by modularisation as per the wallet box class structure changes. The ultimate effect of this is that the CWallet object itself no longer handles keys and addresses.

The change to a CWallet made up of (multiple) {Descriptor|Legacy}ScriptPubKeyMan's is also sometimes referred to as the "Wallet Box" model, where each SPKM is thought of as a distinct (black?) "box" within the wallet, which can be called upon to perform new address generation and signing functions.
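The difference between one shared legacy manager and one descriptor manager per output type and direction can be made concrete with a toy model. Everything below is hypothetical (`ToyOutputType`, `ToySpkMan`, `ToySpkWallet`): managers carry only an integer id, and a vector owns them instead of a map keyed by id.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <map>
#include <memory>
#include <vector>

enum class ToyOutputType { LEGACY, P2SH_SEGWIT, BECH32 };
const std::vector<ToyOutputType> TOY_OUTPUT_TYPES{
    ToyOutputType::LEGACY, ToyOutputType::P2SH_SEGWIT, ToyOutputType::BECH32};

struct ToySpkMan {
    int id;
};

class ToySpkWallet
{
public:
    // Legacy style: one manager, aliased under every type and direction.
    void SetupLegacy()
    {
        assert(m_owned.empty());
        m_owned.emplace_back(new ToySpkMan{0});
        ToySpkMan* shared = m_owned.back().get();
        for (ToyOutputType type : TOY_OUTPUT_TYPES) {
            m_external[type] = shared;
            m_internal[type] = shared;
        }
    }

    // Descriptor style: a distinct manager per (direction, type) pair.
    void SetupDescriptors()
    {
        assert(m_owned.empty());
        for (bool internal : {false, true}) {
            for (ToyOutputType type : TOY_OUTPUT_TYPES) {
                m_owned.emplace_back(new ToySpkMan{static_cast<int>(m_owned.size())});
                (internal ? m_internal : m_external)[type] = m_owned.back().get();
            }
        }
    }

    std::size_t UniqueManagers() const { return m_owned.size(); }

    const ToySpkMan* Get(ToyOutputType type, bool internal) const
    {
        return (internal ? m_internal : m_external).at(type);
    }

private:
    std::vector<std::unique_ptr<ToySpkMan>> m_owned; // owns the managers
    std::map<ToyOutputType, ToySpkMan*> m_external;  // receive addresses
    std::map<ToyOutputType, ToySpkMan*> m_internal;  // change addresses
};
```

In this model a legacy wallet reports one unique manager reachable through six map entries, whereas a descriptor wallet reports six, matching the count given earlier for a default descriptor wallet.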

Keys in the wallet

Legacy wallet keys

Legacy wallets used the "keypool" model which stored a bunch of keys. See src/wallet/scriptpubkeyman.h#L52-L100 for historical context on the "keypool" model.

The wallet would then simply iterate over each public key and generate a scriptPubKey (a.k.a. pubkey script) and address for each type of script the wallet supported. However this approach has a number of shortcomings (from least to most important):

  1. One key could have multiple addresses

  2. It was difficult to sign for multisig

  3. Adding new script functionality required adding new hardcoded script types into the wallet code for each new type of script.

Such an approach was not scalable in the long term and so a new format of wallet needed to be introduced.

Descriptor wallet keys

Descriptor wallets instead store output script "descriptors". These descriptors can be of any script type, including arbitrary scripts (which might be "unknown" to the wallet), and mean that wallets can deterministically generate addresses for any type of valid descriptor, as desired by the user.

Descriptors not only contain what is needed to generate an address, they also include all the data needed to "solve" (i.e. spend from) them, that is, to create a valid scriptSig (including knowledge of which redeemScripts and witnessScripts are needed). The document Support for Output Descriptors in Bitcoin Core provides more details and examples of these output descriptors.

IsMine

The wallet needs a way to determine whether a transaction it learns about belongs to it. When a new transaction is learned about (either entering into the mempool or in a new block) the wallet is notified through the CValidationInterface. This will call the function CWallet::SyncTransaction() which will in turn call CWallet::AddToWalletIfInvolvingMe(). AddToWalletIfInvolvingMe() will then call IsMine() on each output in the transaction, checking the return code to see if a transaction belongs to our wallet.

Note

IsMine historically was located outside of the wallet code, but now takes a more logical position as a member function of CWallet which returns an isminetype value from an enum.

More information on the IsMine semantics can be found in release-notes-0.21.0.md#ismine-semantics.
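The output-matching step can be sketched with a toy wallet that tracks a set of scripts. This is a deliberately reduced model: `ToyTxOut`, `ToyTx` and `ToyIsMineWallet` are hypothetical, scripts are plain strings, `IsMine()` returns a bool rather than an isminetype value, and only outputs are inspected.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

struct ToyTxOut {
    std::string script_pub_key; // a plain string stands in for a real script
    int64_t value;
};

struct ToyTx {
    std::string txid;
    std::vector<ToyTxOut> outputs;
};

class ToyIsMineWallet
{
public:
    void AddScript(const std::string& script) { m_scripts.insert(script); }

    // True if the output pays to a script this wallet is tracking.
    bool IsMine(const ToyTxOut& out) const { return m_scripts.count(out.script_pub_key) > 0; }

    // Stores the transaction only if at least one output is ours.
    // Returns true if the transaction was stored.
    bool AddToWalletIfInvolvingMe(const ToyTx& tx)
    {
        for (const ToyTxOut& out : tx.outputs) {
            if (IsMine(out)) {
                m_txids.insert(tx.txid);
                return true;
            }
        }
        return false;
    }

    bool HasTx(const std::string& txid) const { return m_txids.count(txid) > 0; }

private:
    std::set<std::string> m_scripts;
    std::set<std::string> m_txids;
};
```

A transaction with several outputs is kept as soon as one output matches, so payments that mix wallet and non-wallet recipients are still recorded.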

Constructing transactions

In order to construct a transaction the wallet will validate the outputs, before selecting some coins to use in the transaction. This involves multiple steps and we can follow an outline of the process by walking through the sendtoaddress RPC command, which returns by calling SendMoney(), shown below:

src/wallet/rpcwallet.cpp#SendMoney()
UniValue SendMoney(CWallet& wallet, const CCoinControl &coin_control, std::vector<CRecipient> &recipients, mapValue_t map_value, bool verbose)
{
    EnsureWalletIsUnlocked(wallet);

    // This function is only used by sendtoaddress and sendmany.
    // This should always try to sign, if we don't have private keys, don't try to do anything here.
    if (wallet.IsWalletFlagSet(WALLET_FLAG_DISABLE_PRIVATE_KEYS)) {
        throw JSONRPCError(RPC_WALLET_ERROR, "Error: Private keys are disabled for this wallet");
    }

    // Shuffle recipient list
    std::shuffle(recipients.begin(), recipients.end(), FastRandomContext());

    // Send
    CAmount nFeeRequired = 0;
    int nChangePosRet = -1;
    bilingual_str error;
    CTransactionRef tx;
    FeeCalculation fee_calc_out;
    const bool fCreated = wallet.CreateTransaction(recipients, tx, nFeeRequired, nChangePosRet, error, coin_control, fee_calc_out, true);
    if (!fCreated) {
        throw JSONRPCError(RPC_WALLET_INSUFFICIENT_FUNDS, error.original);
    }
    wallet.CommitTransaction(tx, std::move(map_value), {} /* orderForm */);
    if (verbose) {
        UniValue entry(UniValue::VOBJ);
        entry.pushKV("txid", tx->GetHash().GetHex());
        entry.pushKV("fee_reason", StringForFeeReason(fee_calc_out.reason));
        return entry;
    }
    return tx->GetHash().GetHex();
}

After initialisation SendMoney() will call wallet.CreateTransaction() (CWallet::CreateTransaction()) followed by wallet.CommitTransaction() if successful. If we follow wallet.CreateTransaction() we see that this is a public wrapper function which in its turn calls private member function CWallet::CreateTransactionInternal().

CreateTransactionInternal

It is inside CreateTransactionInternal() that a change address of an "appropriate type" is fetched, where "appropriate" means that it should try to minimise revealing that it is a change address, for example by being a different type to the other outputs. Once a suitable change address is selected, a new ReserveDestination object is created which keeps track of reserved addresses to prevent address re-use.

Tip
The address is not "fully" reserved until GetReservedDestination() is called later.

Next some basic checks on the requested transaction parameters are carried out (e.g. sanity checking of amounts and recipients) by looping through each pair of recipient : amount. After initializing a new transaction (txNew), a fee calculation (feeCalc) and variables for the transaction size, we enter into a new code block where the cs_wallet lock is acquired and the nLockTime for the transaction is set:

src/wallet/wallet.cpp#CWallet::CreateTransactionInternal()
...

CMutableTransaction txNew;
FeeCalculation feeCalc;
CAmount nFeeNeeded;
std::pair<int64_t, int64_t> tx_sizes;
int nBytes;
{
    std::set<CInputCoin> setCoins;
    LOCK(cs_wallet);
    txNew.nLockTime = GetLocktimeForNewTransaction(chain(), GetLastBlockHash(), GetLastBlockHeight());
    {
        std::vector<COutput> vAvailableCoins;
        AvailableCoins(vAvailableCoins, true, &coin_control, 1, MAX_MONEY, MAX_MONEY, 0);

    ...

Bitcoin Core chooses to set nLockTime to the current block to discourage fee sniping.
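
To see why a height-based nLockTime discourages fee sniping, consider the simplified sketch below. Both function names are hypothetical, and the rule shown is a reduced form of the real logic: it assumes a height-based locktime, ignores input sequence numbers, and assumes the wallet falls back to a locktime of 0 when it may not be synced to the chain tip.

```cpp
#include <cstdint>

// Hypothetical, simplified locktime choice for a new transaction.
// If we may be far behind the real tip, a height-based locktime offers
// no protection, so this sketch falls back to 0 (an assumption).
uint32_t LocktimeForNewTransaction(int tip_height, bool chain_is_current)
{
    if (!chain_is_current || tip_height < 0) return 0;
    return static_cast<uint32_t>(tip_height);
}

// A height-based locktime is satisfied only in blocks whose height is
// strictly greater than the locktime (sequence numbers are ignored here).
bool CanBeMinedAtHeight(uint32_t locktime, int block_height)
{
    return static_cast<int64_t>(locktime) < static_cast<int64_t>(block_height);
}
```

With the tip at height H, a transaction locked to H can be included in block H + 1 but not in a replacement for block H, so a miner who re-mines the tip cannot sweep its fee into the competing block.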

Tip

We must acquire the lock here because we are about to attempt to select coins for spending, and optionally reserve change addresses.

If we did not have the lock it would be possible for the wallet to construct two transactions which attempted to spend the same coins, or which used the same change address.
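
The role of the lock can be illustrated with a toy wallet in which coins are plain integer ids. Everything here is hypothetical (the ToyWallet class, its members and SelectCoins() are not Bitcoin Core APIs); the std::mutex simply plays the part of cs_wallet.

```cpp
#include <cstddef>
#include <mutex>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy wallet: coins are plain integer ids.
class ToyWallet
{
public:
    explicit ToyWallet(std::set<int> coins) : m_unspent(std::move(coins)) {}

    // Select and reserve `count` coins as one atomic step.
    // Returns false (and selects nothing) if too few coins remain.
    bool SelectCoins(std::size_t count, std::vector<int>& picked)
    {
        std::lock_guard<std::mutex> lock(m_mutex); // plays the role of LOCK(cs_wallet)
        if (m_unspent.size() < count) return false;
        for (std::size_t i = 0; i < count; ++i) {
            auto it = m_unspent.begin();
            picked.push_back(*it);
            m_unspent.erase(it); // once picked, no other caller can pick it
        }
        return true;
    }

private:
    std::mutex m_mutex;
    std::set<int> m_unspent;
};
```

Because the size check and the removal happen under the same lock, two callers can never both walk away with the same coin, which is the property the wallet needs while constructing a transaction.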

AvailableCoins

After this, a second new code block is entered where "available coins" are inserted into a vector of COutputs named vAvailableCoins. The concept of an "available coin" is somewhat complex, but roughly it excludes:

  1. "used" coins

  2. coins which do not have enough confirmations (differs for own change)

  3. coins which are part of an immature coinbase (< 100 confirmations)

  4. coins which have not entered into our mempool

  5. coins which are already being used to (attempt) replacement of other coins
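
The exclusion rules listed above can be sketched as a single predicate. This is only an approximation of the list, not the logic of AvailableCoins() itself: the ToyCoin struct, its field names and the two minimum-confirmation parameters are assumptions made for illustration, and the maturity constant follows the figure quoted in the list.

```cpp
// Hypothetical, flattened view of a wallet coin; field names are illustrative.
struct ToyCoin {
    bool spent;          // already used as an input elsewhere
    int depth;           // confirmations; 0 means unconfirmed
    bool in_mempool;     // unconfirmed but accepted into our mempool
    bool coinbase;       // output of a coinbase transaction
    bool from_me;        // created by our own wallet, e.g. change
    bool in_replacement; // involved in an attempted replacement
};

constexpr int COINBASE_MATURITY = 100; // as quoted in the list above

bool IsAvailable(const ToyCoin& c, int min_conf_theirs, int min_conf_mine)
{
    if (c.spent) return false;                                     // rule 1
    if (c.coinbase && c.depth < COINBASE_MATURITY) return false;   // rule 3
    if (c.depth == 0 && !c.in_mempool) return false;               // rule 4
    if (c.in_replacement) return false;                            // rule 5
    // Rule 2: our own change may be accepted with fewer confirmations.
    const int required = c.from_me ? min_conf_mine : min_conf_theirs;
    return c.depth >= required;
}
```

For example, with min_conf_theirs set to 1 and min_conf_mine set to 0, an unconfirmed change output that is in the mempool is available, while an unconfirmed payment from a third party is not.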

This call to AvailableCoins() is our first reference back to the underlying ScriptPubKeyMans controlled by the wallet. The function iterates over all coins belonging to us — found in the CWallet.mapWallet mapping — checking coin availability before querying for a SolvingProvider (ultimately calling GetSigningProvider()): essentially querying whether the active CWallet has a ScriptPubKeyMan which can sign for the given output.

src/wallet/wallet.cpp#CWallet::GetSolvingProvider()
std::unique_ptr<SigningProvider> CWallet::GetSolvingProvider(const CScript& script, SignatureData& sigdata) const
{
    for (const auto& spk_man_pair : m_spk_managers) {
        if (spk_man_pair.second->CanProvide(script, sigdata)) {
            return spk_man_pair.second->GetSolvingProvider(script);
        }
    }
    return nullptr;
}

Below is shown a subsection of the AvailableCoins() function which illustrates available coins being added to the vAvailableCoins vector, with the call to GetSolvingProvider() visible.

Note

Even if a SigningProvider is found, a second check is performed to see if the coin is "spendable" — by calling IsSolvable().

The reason for this is that whilst GetSolvingProvider() might return a SigningProvider (read: SPKM), not all SPKMs will be able to provide private keys needed for signing transactions, e.g. in the case of a watch-only wallet.

Finally, after we have determined solvability, "spendability" is calculated for each potential output along with any coin control limitations:

src/wallet/wallet.cpp#AvailableCoins()
    ...

    for (unsigned int i = 0; i < wtx.tx->vout.size(); i++) {

        ...

        std::unique_ptr<SigningProvider> provider = GetSolvingProvider(wtx.tx->vout[i].scriptPubKey);

        bool solvable = provider ? IsSolvable(*provider, wtx.tx->vout[i].scriptPubKey) : false;
        bool spendable = ((mine & ISMINE_SPENDABLE) != ISMINE_NO) || (((mine & ISMINE_WATCH_ONLY) != ISMINE_NO) && (coinControl && coinControl->fAllowWatchOnly && solvable));

        vCoins.push_back(COutput(&wtx, i, nDepth, spendable, solvable, safeTx, (coinControl && coinControl->fAllowWatchOnly)));

        // Checks the sum amount of all UTXO's.
        if (nMinimumSumAmount != MAX_MONEY) {
            nTotal += wtx.tx->vout[i].nValue;

            if (nTotal >= nMinimumSumAmount) {
                return;
            }
        }

        // Checks the maximum number of UTXO's.
        if (nMaximumCount > 0 && vCoins.size() >= nMaximumCount) {
            return;
        }

        ...

See the full CWallet::AvailableCoins() implementation for additional details and caveats.

CreateTransactionInternal continued

After available coins have been determined, we check to see if the user has provided a custom change address (using coin control), or whether the previously not-fully-reserved change address should finally be reserved (and selected) by calling GetReservedDestination(). The change output's size, discard_fee_rate and effective_fee_rate are then calculated. The discard_fee_rate determines which change outputs are not worth keeping: any change output which would be dust at the discard rate is one you would be willing to discard completely and add to the fee (as well as continuing to pay the fee that would have been needed for creating the change).
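
A rough sketch of the "dust at the discard rate" test follows. It assumes, for illustration only, that a change output counts as dust when its value is less than the fee needed to create and later spend it at the given rate. The function names, the sat/kvB rate unit and the example sizes are assumptions rather than values taken from the wallet code.

```cpp
#include <cstdint>

using Amount = int64_t; // satoshis

// Fee for `size_vbytes` at `rate_sat_per_kvb` (satoshis per 1000 virtual bytes), rounded down.
Amount FeeAtRate(int64_t size_vbytes, Amount rate_sat_per_kvb)
{
    return rate_sat_per_kvb * size_vbytes / 1000;
}

// Hypothetical check: drop the change (adding it to the fee) if it is worth
// less than the cost of creating and later spending it at the discard rate.
bool ChangeIsDiscarded(Amount change_value, int64_t change_output_size,
                       int64_t change_spend_size, Amount discard_rate)
{
    const Amount dust_threshold = FeeAtRate(change_output_size + change_spend_size, discard_rate);
    return change_value < dust_threshold;
}
```

Taking an illustrative output size of 31 vbytes, a spend size of 67 vbytes and a discard rate of 3000 sat/kvB, the threshold is 3000 × 98 / 1000 = 294 satoshis, so 293 satoshis of change would be added to the fee while 294 satoshis would be kept.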

Coin selection

Now that we have a vector of available coins, and our fee rate settings estimated, we are ready to start coin selection itself. This is still an active area of research, with two possible coin selection solving algorithms currently implemented:

  1. Branch and bound ("bnb")

  2. Knapsack

The branch and bound algorithm is well-documented in the codebase itself:

src/wallet/coinselection.cpp
/*
This is the Branch and Bound Coin Selection algorithm designed by Murch. It searches for an input
set that can pay for the spending target and does not exceed the spending target by more than the
cost of creating and spending a change output. The algorithm uses a depth-first search on a binary
tree. In the binary tree, each node corresponds to the inclusion or the omission of a UTXO. UTXOs
are sorted by their effective values and the trees is explored deterministically per the inclusion
branch first. At each node, the algorithm checks whether the selection is within the target range.
While the selection has not reached the target range, more UTXOs are included. When a selection's
value exceeds the target range, the complete subtree deriving from this selection can be omitted.
At that point, the last included UTXO is deselected and the corresponding omission branch explored
instead. The search ends after the complete tree has been searched or after a limited number of tries.

The search continues to search for better solutions after one solution has been found. The best
solution is chosen by minimizing the waste metric. The waste metric is defined as the cost to
spend the current inputs at the given fee rate minus the long term expected cost to spend the
inputs, plus the amount the selection exceeds the spending target:

waste = selectionTotal - target + inputs × (currentFeeRate - longTermFeeRate)

The algorithm uses two additional optimizations. A lookahead keeps track of the total value of
the unexplored UTXOs. A subtree is not explored if the lookahead indicates that the target range
cannot be reached. Further, it is unnecessary to test equivalent combinations. This allows us
to skip testing the inclusion of UTXOs that match the effective value and waste of an omitted
predecessor.

The Branch and Bound algorithm is described in detail in Murch's Master Thesis: https://murch.one/wp-content/uploads/2016/11/erhardt2016coinselection.pdf

@param const std::vector<CInputCoin>& utxo_pool The set of UTXOs that we are choosing from.
       These UTXOs will be sorted in descending order by effective value and the CInputCoins'
       values are their effective values.
@param const CAmount& target_value This is the value that we want to select. It is the lower
       bound of the range.
@param const CAmount& cost_of_change This is the cost of creating and spending a change output.
       This plus target_value is the upper bound of the range.
@param std::set<CInputCoin>& out_set -> This is an output parameter for the set of CInputCoins
       that have been selected.
@param CAmount& value_ret -> This is an output parameter for the total value of the CInputCoins
       that were selected.
@param CAmount not_input_fees -> The fees that need to be paid for the outputs and fixed size
       overhead (version, locktime, marker and flag)
*/
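
The search described in the comment above can be sketched in a few dozen lines. This is a simplified, recursive illustration and not the implementation in src/wallet/coinselection.cpp: the Coin and Result structs and the name SelectCoinsBnBSketch are hypothetical, the skipping of equivalent combinations is omitted, and waste is computed from per-coin fees rather than from fee rates.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <vector>

using Amount = int64_t;

// Hypothetical simplified coin, carrying the values the waste metric needs.
struct Coin {
    Amount effective_value; // value minus the fee to spend it now
    Amount fee;             // fee to spend it at the current fee rate
    Amount long_term_fee;   // fee to spend it at the long-term fee rate
};

struct Result {
    bool found = false;
    std::vector<Coin> selected;
    Amount waste = std::numeric_limits<Amount>::max();
};

Result SelectCoinsBnBSketch(std::vector<Coin> pool, Amount target, Amount cost_of_change, int max_tries = 100000)
{
    // Explore larger coins first.
    std::sort(pool.begin(), pool.end(),
              [](const Coin& a, const Coin& b) { return a.effective_value > b.effective_value; });

    // lookahead[i] is the total effective value of pool[i..end).
    std::vector<Amount> lookahead(pool.size() + 1, 0);
    for (std::size_t i = pool.size(); i-- > 0;) lookahead[i] = lookahead[i + 1] + pool[i].effective_value;

    Result best;
    std::vector<Coin> current;
    int tries = 0;

    std::function<void(std::size_t, Amount, Amount)> search =
        [&](std::size_t i, Amount value, Amount fee_waste) {
            if (++tries > max_tries) return;
            if (value > target + cost_of_change) return; // overshot the range: prune subtree
            if (value >= target) {                       // inside the target range
                const Amount waste = fee_waste + (value - target);
                if (waste < best.waste) {
                    best.found = true;
                    best.selected = current;
                    best.waste = waste;
                }
                return; // backtrack and keep looking for a less wasteful set
            }
            if (i == pool.size() || value + lookahead[i] < target) return; // target unreachable
            current.push_back(pool[i]); // inclusion branch first
            search(i + 1, value + pool[i].effective_value, fee_waste + pool[i].fee - pool[i].long_term_fee);
            current.pop_back();         // then the omission branch
            search(i + 1, value, fee_waste);
        };
    search(0, 0, 0);
    return best;
}
```

Given coins with effective values 5, 4, 3 and 1, a target of 7 and a cost of change of 0, the sketch rejects 5 + 4 and 5 + 3 as overshooting and settles on 4 + 3, an exact match which needs no change output.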

You can read a little more about the differences between these two coin selection algorithms in this StackExchange answer.

Coin selection is performed as a loop, as it may take multiple iterations to select the optimal coins for a given transaction.

Multiwallet

Work on the multiwallet project means that Bitcoin Core can now handle dynamic loading and unloading of multiple wallets while running.

Validation Interface

TODO

COutput

TODO

HWI

Relation to consensus soft forks

Much of the meat of the recently soft-forked changes (e.g. Taproot) resides not inside consensus code, but rather in improvements required to the wallet.

Removed text

  • When adding new wallet features which will be included in the GUI, it can be good practice to first implement them as RPC commands because it’s easier to create good test coverage for them.

  • Advanced transaction signature operations (e.g. signature aggregation, sighash flags) happen in the wallet code.

Concepts

  • Wallet architecture

  • key management

    • HD wallets

    • Output script descriptors

  • Separation of wallet and node functionality

  • Key Management

  • Transaction Construction

    • Taproot

    • SegWit

    • Bech32

    • PSBT

    • Coin selection

    • CPFP

    • RBF

    • Transaction batching

    • Adaptor signatures

  • Multiwallet

  • Hardware wallet interface (HWI)

  • QT

GUI

This document was created from Bitcoin Core at commit 4b5659c6b115315c9fd2902b4edd4b960a5e066e

The GUI has its own separate repo at bitcoin-core/gui. Pull requests which primarily target the GUI should be made there, after which they are merged into the primary repo. Developer Marco Falke created an issue in his fork which detailed some of the rationale for the split, but essentially it came down to:

  1. Separate issue and patch management

  2. More focused review and interests

  3. Maintain high quality assurance

He also stated that:

Splitting up the GUI (and splitting out modules in general) has been brought up often in recent years. Now that the GUI is primarily connected through interfaces with a bitcoin node, it seems an appropriate time to revive this discussion.

— Marco Falke

PR 19071 contained the documentation change now contained in the Bitcoin Core primary repository, along with details of the monotree approach that was ultimately taken. The documentation change provides guidance on what a "GUI change" is:

As a rule of thumb, everything that only modifies src/qt is a GUI-only pull request. However:

  • For global refactoring or other transversal changes the node repository should be used.

  • For GUI-related build system changes, the node repository should be used because the change needs review by the build systems reviewers.

  • Changes in src/interfaces need to go to the node repository because they might affect other components like the wallet.

For large GUI changes that include build system and interface changes, it is recommended to first open a pull request against the GUI repository. When there is agreement to proceed with the changes, a pull request with the build system and interfaces changes can be submitted to the node repository.

— src/CONTRIBUTING.md

On a related note, another issue was recently opened by Marco, to discuss the possibility of instituting the same monotree changes for wallet code.

Building the GUI

bitcoin-qt, which is the GUI version of the node software, is built automatically when the build dependencies are met. Required packages can be found in the build instructions in doc/build-*.md as appropriate for your platform. If you have the required packages installed but do not wish to build bitcoin-qt then you must run ./configure with the option --with-gui=no.

Note

If the build is configured with --enable-multiprocess then additional binaries will be built:

  1. bitcoin-node

  2. bitcoin-wallet

  3. bitcoin-gui

Qt

We can see how the Qt directory is related to the rest of the codebase from its directory dependency graph:

[Figure: directory dependency graph for the Qt directory]

Developers would ideally like to reduce these dependencies even further.

Qt documentation

There is useful documentation for developers looking to contribute to the Qt side of the codebase found at Developer Notes for Qt Code.

Main GUI program

The loading point for the GUI is src/qt/main.cpp. main() calls GuiMain() from src/qt/bitcoin.cpp, passing along any program arguments with it. GuiMain starts by calling SetupEnvironment() which amongst other things, configures the runtime locale and charset.

Next, an empty NodeContext is set up. It is then passed to interfaces::MakeNode(), which returns an interfaces::Node: a fully-fledged node interface built on that context. Recall that in Wallet component initialisation we also saw the wallet utilising a NodeContext as part of its WalletInitInterface. In both cases the NodeContext is being used to pass chain and network references around without needing to create globals.
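
The pattern of handing a context object to a factory, instead of reaching for globals, can be sketched as below. All of the types here (ToyChain, ToyNodeContext, ToyNode, ToyNodeImpl) and the MakeToyNode() factory are hypothetical stand-ins which only loosely mirror NodeContext and interfaces::MakeNode(); the real classes hold far more state.

```cpp
#include <memory>

// Hypothetical stand-in for chain state.
struct ToyChain {
    int height = 0;
};

// Hypothetical context: starts empty and is filled in during initialisation.
struct ToyNodeContext {
    std::unique_ptr<ToyChain> chain;
};

// Abstract interface the GUI would talk to.
class ToyNode
{
public:
    virtual ~ToyNode() = default;
    virtual int GetNumBlocks() const = 0;
};

class ToyNodeImpl : public ToyNode
{
public:
    explicit ToyNodeImpl(ToyNodeContext& context) : m_context(context) {}

    // Returns -1 while the context has not been populated with a chain.
    int GetNumBlocks() const override
    {
        return m_context.chain ? m_context.chain->height : -1;
    }

private:
    ToyNodeContext& m_context; // a reference handed in by the caller, not a global
};

// Factory: wraps the caller's context in an interface object.
std::unique_ptr<ToyNode> MakeToyNode(ToyNodeContext& context)
{
    return std::unique_ptr<ToyNode>(new ToyNodeImpl(context));
}
```

Since the interface object only holds a reference, state added to the context after the interface is created is visible through it, and a test can construct its own isolated context without touching process-wide state.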

After some Qt setup, command-line and application arguments are parsed. What follows can be outlined from the code comments:

  1. Application identification

  2. Initialization of translations, so that intro dialog is in user’s language

  3. Now that settings and translations are available, ask user for data directory

  4. Determine availability of data directory and parse bitcoin.conf

  5. Determine network (and switch to network specific options)

  6. URI IPC sending

  7. Main GUI initialization

GUI initialisation

After configuration the GUI is initialised. Here the Node object created earlier is passed to app.SetNode() before a window is created and the application executed.

The bulk of the Qt GUI classes are defined in src/qt/bitcoingui.{h|cpp}.

QML GUI

Since this documentation was written, focus has shifted towards rewriting the Qt code using the Qt QML framework. This should allow developers to create GUI code which is visually superior and easier to write and reason about, whilst also lowering the barriers to entry for new developers who want to focus on the GUI.

The recommendation therefore is to familiarise yourself with Qt QML and review the current codebase for the latest developments. You can follow along with the latest QML work in the specific bitcoin-core/qml-gui repo.

Bitcoin design

The Bitcoin design guide provides some guidance on common pitfalls that Bitcoin GUI designers should look out for when designing apps (like bitcoin-qt).

onboarding-to-bitcoin-core's People

Contributors

willcl-ark, adamjonas
