How to get started learning web5

July 29, 2022

The internet predictably went berserk on June 10th, when Jack Dorsey announced plans to build “web5”. To be precise, web5 would be the name of a new product launched by TBD, a Bitcoin-focused subsidiary of Block. But precision hardly mattered amid what amounted to a civilization-scale test of humans’ ability to process satire.

Surely, at some level, the name web5 was intended as a joke. And yet, the technology behind what Dorsey dubbed web5 is in fact very real, and may significantly shape the future of the internet.

How do I know? Well, by dumb luck I started exploring web5 late last year. I’ve read the seminal web5 literature, tested out web5 products, hacked on web5 code, and traveled across the country to meet members of the web5 community.

Of course, the subject of my exploration wasn’t called web5 at the time. And by most accounts, it isn’t called web5 today either. But thanks to Jack Dorsey’s publicity stunt, the term has attracted some attention to the topic of “decentralized identity” and "decentralized web nodes", the actual un-memeified technologies powering TBD’s work.

But names aside (and I'll stick with "web5" for the sake of this post), what are all of these things, and how can you get started learning about them?

Problems with web2 and web3

It may help to start off by explaining what problem web5 is attempting to solve. Most functionality on today's internet (a.k.a. "web2") is built using a simple client-server architecture. Web applications are generally designed as server-based applications, to be run by the software provider. End-users interact with those servers via client-side applications, often run in a web browser.

The client-server architecture is simple, and allows end-users to access software without having to run any of their own infrastructure. But that simplicity comes with downsides for both users and software providers.

The first downside is a lack of data portability. In the client-server architecture, each application is by default an information silo. If any given internet user (let's say "Alice") uses ten different apps, that will result in ten different representations of Alice, even though Alice is only one actual person. This situation is clearly inconvenient to Alice, who has to maintain her account information in ten different places. But it's also costly to the software providers, all ten of whom need to onboard Alice from scratch -- even if her identity and reputation are already well-established by other services Alice uses.

The second downside is lack of privacy. With all data stored server-side, users have effectively no control over how their personal information is used. Although an application's "terms of service" is a binding legal document, typical internet users don't have time to read it -- and moreover, they lack visibility into whether software providers are adhering to it. Although this dynamic is occasionally advantageous to software providers, those advantages are often outweighed by the cost of securing their data and complying with the latest data privacy regulations. In other words, nobody wins.

It's a software engineering truism that all architectures have advantages and disadvantages. The presence of disadvantages doesn't mean that using a particular architecture is a mistake. But as the importance of the internet grows, it's worth asking: do the tradeoffs we've made in the past continue to make sense? And if not, what are our alternatives?

Web3 has already shown us one potential solution to web2's data portability problem. By using the public blockchain as a sort of global storage layer, web3 applications allow users to bring their data with them from one application to the next. No more onboarding forms -- just "Connect with Metamask" and you're done.

But web3 fails quite spectacularly on the privacy front. After all, data on the public blockchain is just that: public. The same mechanism that allows web3 apps to seamlessly share data also exposes the same data to ~8 billion other humans. Granted, that radical transparency makes web3 an exciting social experiment. And lack of privacy is arguably less of a problem for users who remain pseudonymous, identified only by their crypto wallet addresses and monkey GIF's. But it also makes the standard web3 architecture unfit for use cases that require any measure of privacy.

Today's web3 has a number of other shortcomings that make it a poor successor to web2. To name a few: lack of a consent mechanism for transaction recipients, lack of support for key rotation or account recovery, and challenges with verifiability of NFT's. For a more thorough discussion, I'd refer you to this talk at ETH Denver '22 by Disco founder Evin McMullen, and this blog post by Signal founder Moxie Marlinspike, reflecting on his first impressions of web3.

All told, web3 hints at a better internet, but doesn't get us there. It gives its users frictionless data portability and strong guarantees of censorship-resistance. But it does so by sacrificing privacy, limiting its potential for use in mainstream applications.

Enter web5

Luckily, it's possible to keep the good parts of web3 while improving on its privacy properties. And as you probably saw coming, that's what web5 is all about.

Web5 empowers users to retain physical possession of their data by giving them the digital equivalent of a wallet. This approach will sound familiar to web3 users who are accustomed to using digital wallets to store their private keys. But web5 takes it to the next level, allowing users to store much more than cryptographic keys. Just as in the physical world, the web5 digital wallet is a place to store ID cards, membership cards, payment cards, key cards, professional or educational credentials, and so on -- really anything you might need to show as "proof" as you go about your day-to-day.

Conveniently, this approach solves the data portability problem from the outset. If users maintain the master copy of their own data, there is, logically speaking, no need to keep it updated elsewhere.

But giving users physical possession of their data is the easy part. The hard part is figuring out how to build working software under this paradigm, preserving the sort of functionality we enjoy in web2 and web3. Broadly speaking, two problems need to be solved to make user-held data usable for building real applications:

Communication. In web5, end-users directly mediate access to their data. This is a radical departure from both web2 and web3, both of which allow developers to assume the data they need is just a network call away. Developers of web5 applications will need a way to locate their users, communicate with them securely, and request specific information from them.
Data integrity. If users are in physical possession of their data, they can also easily modify it. Both web2 and web3 come with some data integrity safeguards. Databases can be secured at multiple levels, and blockchains provide their own data integrity guarantees. For use cases where data integrity is a concern, web5 applications will need their own mechanisms to guard against data tampering.
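
To make the communication problem a bit more concrete, here's a minimal Python sketch of a user-side agent mediating a data request. Everything in it (the Wallet class, the DataRequest shape, and the allow-list that stands in for a consent prompt) is hypothetical and invented for illustration; none of it comes from a web5 specification or library.

```python
from dataclasses import dataclass, field


@dataclass
class DataRequest:
    """A hypothetical request an application sends to a user's wallet."""
    requester: str      # who is asking, e.g. an app name
    fields: list[str]   # which pieces of data the app wants


@dataclass
class Wallet:
    """A toy user-held data store that only releases consented fields."""
    data: dict[str, str]
    # Per-requester allow-list, standing in for an interactive consent prompt.
    consents: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, requester: str, *fields: str) -> None:
        self.consents.setdefault(requester, set()).update(fields)

    def handle(self, request: DataRequest) -> dict[str, str]:
        allowed = self.consents.get(request.requester, set())
        # Release only fields that exist AND that the user approved.
        return {f: self.data[f] for f in request.fields
                if f in allowed and f in self.data}


alice = Wallet(data={"name": "Alice", "email": "alice@example.com", "dob": "1990-01-01"})
alice.grant("bookshop", "name", "email")

shared = alice.handle(DataRequest("bookshop", ["name", "email", "dob"]))
print(shared)  # the date of birth is withheld because Alice never approved it
```

The point of the sketch is the direction of control: the application asks, and the user's side decides what goes back. Note that nothing here stops Alice from editing her own data before sharing it, which is exactly the data integrity problem.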

How web5 works

The pioneers of web5 have worked hard over recent years to solve those exact problems. Today, there is a wide array of technical standards, software libraries, and commercial products available for developers who want to get started building.

Although it's too early to say what exact technologies will eventually gain traction, there is general agreement on high-level architecture and components needed to make web5 work. Here are the key concepts:

Agents / DWN's. In web5, a user's "agent" performs two essential functions: it stores their data, and it communicates with other agents to provide or request data. Agents can generally be categorized as either "wallets" or "cloud agents". Wallets are designed to be client-side applications, typically implemented as mobile apps or browser extensions. Cloud agents are web services, designed to be IP addressable and available 24/7 to respond to inbound requests. The TBD team's work is centered around Decentralized Web Nodes (DWN's), a new cloud agent specification.
DID's & DID documents. Each entity in web5, human or otherwise, is identified by one or more globally unique "decentralized identifiers" or DID's. Each DID is typically associated with an asymmetric cryptographic key pair. Entities can use their private keys to prove control over the corresponding DID. Additionally, DID's can be used to establish a secure communication channel between two DID holders. In order to communicate, DID holders must first exchange "DID documents" containing each DID's public key and the URL of its cloud agent.
Trust registries. A trust registry is basically a directory of DID holders, along with associated metadata about those entities. It's typically used in the process of "DID resolution", or looking up a DID document for a given DID holder. Not all web5 use cases require a trust registry, since in many situations DID holders can directly exchange DID documents. When trust registries are required, the implementation often uses blockchain, or a "layer 2" protocol on top of it. Blockchain-based technologies are a natural fit for this purpose because their tamper-proof properties help guard against man-in-the-middle style attacks during the DID resolution process.
Verifiable credentials. Verifiable credentials (VC's) are web5's answer to the data integrity risk inherent in having users store their own data. A VC is a piece of data created and signed by an "issuer", attesting to attributes of the recipient, or "holder". The holder can then present the VC to a "verifier", who can use the issuer's signature to confirm that the data has not been tampered with. This three-party arrangement between issuer, holder, and verifier is often described as a "trust triangle" (pictured below).
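
Here's a rough Python sketch of how these pieces fit together: an issuer publishes a DID document, signs a credential for a holder, and a verifier resolves the issuer's DID to check the signature. Treat it as a toy under stated assumptions, not as anyone's actual implementation. The registry is a plain dict, the field names (publicKey, agentUrl, claims) are made up rather than taken from the W3C data models, and the DIDs use the placeholder "did:example" method. For signing I've swapped in a Lamport one-time signature, purely because it needs nothing beyond the standard library; real stacks typically use schemes like Ed25519, and a Lamport key must never sign more than one message.

```python
import hashlib
import json
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Lamport one-time key pair: 256 pairs of secrets and their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def _bits(message: bytes):
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private, message: bytes):
    # Reveal one secret per bit of the message digest.
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    return len(signature) == 256 and all(
        _h(signature[i]) == public[i][bit] for i, bit in enumerate(_bits(message))
    )


def canonical(claims: dict) -> bytes:
    # Sorted keys give the issuer and verifier the same bytes to sign and check.
    return json.dumps(claims, sort_keys=True).encode()


# A plain dict stands in for a trust registry: DID -> DID document.
registry = {}

# Issuer: creates a key pair and publishes a DID document.
issuer_did = "did:example:university"
issuer_private, issuer_public = generate_keypair()
registry[issuer_did] = {"id": issuer_did, "publicKey": issuer_public,
                        "agentUrl": "https://agent.university.example"}

# Issuer -> holder: sign claims about the holder.
claims = {"issuer": issuer_did, "holder": "did:example:alice", "degree": "BSc"}
credential = {"claims": claims, "signature": sign(issuer_private, canonical(claims))}


def verify_credential(credential: dict) -> bool:
    """Verifier: resolve the issuer's DID, then check the signature."""
    did_document = registry.get(credential["claims"]["issuer"])
    if did_document is None:
        return False
    return verify(did_document["publicKey"],
                  canonical(credential["claims"]), credential["signature"])


print(verify_credential(credential))  # True

# Holder tampers with the data they hold; the signature no longer matches.
forged = {"claims": {**claims, "degree": "PhD"}, "signature": credential["signature"]}
print(verify_credential(forged))  # False
```

Notice that the verifier never contacts the issuer directly. It only needs the issuer's DID document, which is why the integrity of DID resolution matters so much.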

Put together, these primitives enable a new level of privacy and trust on the web. End-users benefit from fine-grained control over what information they share. Businesses and other organizations benefit by tapping into knowledge of their users' existing trust relationships. We get the portability properties of web3, and better privacy properties than web2 or web3.

But we said earlier that all architectures have advantages and disadvantages. So, what's the catch with web5? Of course, web5's not-so-hidden weakness is its complexity. The diagram below shows the basic architecture of web2, web3, and web5 side by side:

To put it mildly: this complexity won't make it any easier for web5 to gain traction in the real world. Developers will have to learn new architectural concepts and protocols, users will have to adjust to new interaction patterns, and tech companies will have to offer new products using commercial models that don't yet exist. And as Firefox CTO Eric Rescorla points out at the end of his recent blog post exploring web5, we also must face the difficult task of defining domain-specific protocols on top of web5's more general-purpose scaffolding.

For those reasons and more, web5 is not inevitable. But the work of bringing it to life is already underway, and doesn't seem to be slowing down.

Down the rabbit hole

For those interested in exploring web5 further, I've organized some of the references on the topic that I’ve found helpful. I’ve made no attempt to be comprehensive, and instead just did my best to assemble a diverse and high signal-to-noise list. It should be plenty to get started!

Learning resources

Becoming a Hyperledger Aries Developer. The only free online course on the topic of decentralized identity I know of. It’s quite good, despite being text-only. The course teaches fundamental decentralized identity concepts, and illustrates them via hands-on tutorials based on the Hyperledger ecosystem.
Identosphere. The best newsletter out there dedicated exclusively to decentralized identity. It’s written by Infominer and Kaliya Young (a.k.a. “Identity Woman”), an expert, author, and community builder in the space.
Self-Sovereign Identity. A comprehensive overview of “self-sovereign identity”, which is another term you’ll see thrown around interchangeably with “decentralized identity”. The book covers all aspects of the space, including technical architecture and standards, as well as real-world applications and implementation considerations. Chapters are contributed by various experts in the space.
SSI Orbit. Interviews with various builders and other stakeholders in the space. Hosted by Mathieu Glaude, Founder/CEO of Northern Block, a Canada-based self-sovereign identity solutions company.
DID fundamentals and deep dive. An understandable and thorough explanation of decentralized identifiers by Drummond Reed, an early pioneer in the space.

Demos

Animo. This interactive demo by Animo guides visitors through a simple workflow using decentralized identifiers and verifiable credentials. No technical knowledge required.
British Columbia Government. The BC government is at the forefront of adopting decentralized identity technology, and has made several demos available to the public.
Aries Framework JavaScript. This more technical demo walks through running decentralized identity agents in your terminal, and having them connect and communicate with each other over DIDComm.

Communities and events

Internet Identity Workshop. A twice yearly gathering, with an agenda typically centering around decentralized identity topics. The workshop follows an “unconference” format, meaning attendees determine the agenda somewhat on the fly. I went in April and it was great! The organizers also host occasional online-only events.
Decentralized Identity Foundation. A non-profit geared toward advancing decentralized identity technology. They host regular working groups to discuss and advance standards used by multiple stakeholders in the ecosystem. Meetings are open to the public.
Trust Over IP Foundation. A group within the Linux Foundation, focused on promoting organized collaboration within the space. Their focus tends to be more on implementation of trust ecosystems, and a bit less on low-level technical standards.
W3C Credentials Community Group. A W3C group focused on developing standards to support the creation, storage, verification, and exchange of verifiable credentials.

Companies

Trinsic. A SaaS-based developer toolkit for building verifiable credential ecosystems. Recently raised a funding round that included angels from the Okta and Auth0 founding teams. Probably the easiest way to build functionality based on verifiable credentials today, as a developer.
Indicio. Another company building end-to-end solutions based on decentralized identity. They’re one of many solutions shops, but are notable for their dedication to open source, and for significant contributions to open standards.
Spruce. Building open source tools for developing decentralized identity based solutions. Focused on serving the web3 ecosystem and public blockchain applications. Also led development of the Sign in with Ethereum standard and reference implementation.
Ceramic Network. Building a data storage toolkit for decentralized applications. Their tech makes use of decentralized identifiers to associate stored data with specific users or blockchain accounts. To see it in action, check out this live coding demo by Nader Dabit.
Cheqd. Building infrastructure and accompanying token-based incentive mechanisms for verifiable credential ecosystems. Bootstrapping an ecosystem can be a difficult coordination challenge between potential issuers, holders, and verifiers. Cheqd hopes a grease payment in the form of their $CHEQ token will help get people on board faster.

Standards and specs

Decentralized Identifiers. W3C standard for decentralized identifiers, the foundational primitive for uniquely identifying an individual user or entity. The standard leaves a great deal of room for the definition of “DID methods” that may differ widely in implementation, as described further in W3C’s DID primer.
Verifiable Credentials. W3C standard for “verifiable credentials”, the foundational primitive for cryptographic proofs about a DID holder. Although the standard is widely known, it does not actually account for all variation in credential formats used by implementers. This lack of real-world standardization presents challenges for achieving interoperability between implementations.
DIDComm. A protocol for secure, stateful communication between DID holders. DIDComm's "statefulness" makes it useful for defining higher-level, domain-specific protocols. It is most commonly used for orchestrating credential exchanges between agents.
Decentralized Web Node (DWN). Specification for a web service that can act on behalf of individual internet users or other DID holders. Enables development of custom service endpoints that can accept requests on behalf of the user who controls the DWN. Although the TBD team hasn’t said so explicitly, this standard is likely to be the basis for their Web5 product.
Verifiable Legal Entity Identifier (vLEI). A framework for uniquely identifying legal entities and their representatives. This is a great example of the sort of domain-specific specification that can be built using verifiable credentials and decentralized identity as primitives.
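
If you're wondering what a DID actually looks like on the page, it's a string with three colon-separated parts: the scheme "did", a method name, and a method-specific identifier. Here's a small Python sketch that splits one apart. The parse_did helper and ParsedDid type are my own hypothetical names, and the pattern is a simplification of the grammar in the W3C spec (for instance, it doesn't validate percent-encoding), so don't treat it as a conformant parser.

```python
import re
from typing import NamedTuple


class ParsedDid(NamedTuple):
    method: str
    method_specific_id: str


# Simplified pattern: scheme "did", a lowercase method name, then an identifier
# made of colon-separated segments that must not end in a colon.
_ID_CHARS = r"[A-Za-z0-9._%-]"
_DID_PATTERN = re.compile(
    rf"^did:([a-z0-9]+):((?:{_ID_CHARS}*:)*{_ID_CHARS}+)$"
)


def parse_did(did: str) -> ParsedDid:
    match = _DID_PATTERN.match(did)
    if match is None:
        raise ValueError(f"not a valid DID: {did!r}")
    return ParsedDid(method=match.group(1), method_specific_id=match.group(2))


print(parse_did("did:example:123456789abcdefghi"))
print(parse_did("did:web:example.com").method)
```

The method name is what tells a resolver where to look: each DID method defines its own rules for turning the identifier into a DID document, which is where the wide variation between implementations comes from.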

Open source projects

ACA-Py. A cloud agent implementation based on the Hyperledger Aries specification. Acts on behalf of one or more DID holders to send and receive messages and credentials using DIDComm.
dwn-sdk-js. Partial implementation of the Decentralized Web Node spec built by TBD. Appears to be a work in progress, but should be usable.
Veramo. A general-purpose JavaScript toolkit for building with decentralized identity and verifiable credentials. Highly modular, allowing developers to build applications using various DID methods, protocols, credential formats, etc.
Ion. A Bitcoin-compatible “Layer 2” network built specifically for anchoring DID information on the blockchain, without the disadvantages of using the Bitcoin network directly. Implements the Sidetree protocol, which is itself not Bitcoin-specific.

Twitter accounts

Kaliya Young. A decentralized identity expert, author, and community builder. Tweets about developments across the ecosystem, including standards, product development efforts, and real-world deployments and use cases.
Kim Hamilton Duffy. Decentralized identity engineer, currently leading identity standards at Centre, a consortium focused on governance and standards for decentralized finance. Tweets on decentralized identity tech.
Evin McMullen. Founder of Disco.xyz, a social profile for the decentralized internet. Although Disco is still pre-launch, Evin has established herself as a champion of verifiable credentials as an alternative to NFT’s in the web3 ecosystem.
Dan Buchner. Currently head of decentralized identity at Block, Dan is an editor of the original DWN spec, as well as a contributor to TBD’s open source work. Follow him on Twitter for high-signal decentralized web takes, and more than you (probably) want to know about “voluntaryist libertarianism”.
Drummond Reed. Early pioneer of the self-sovereign identity space, Drummond was Chief Trust Officer at Evernym, one of the first companies to commercialize verifiable credentials. Tweets about stuff happening in the broader decentralized identity community.