Project

High-Likelihood Peptide Combination Library

Published June 2026.

The High-Likelihood Peptide Combination Library is a public, machine-readable catalog of computationally enumerated peptide combinations. It concentrates enumeration in the regions of sequence space most likely to yield functional peptides, namely the dense neighborhoods around known active scaffolds, and publishes each candidate in a form that people, search engines, and software can inspect and reproduce.

The public library currently spans 147 peptide packages across twelve static releases. Each row describes a materialized peptide identity and exact form variant using deterministic identifiers, hashes, shard locations, and source provenance. The purpose is to make these high-likelihood combinations openly enumerable and reproducible, not to recommend any biological use.

169,650,046 Enumerated combinations

27,726,923 Backbone sequences

147 Uploaded packages

48.83 GB Public library data

Why this exists

Most of peptide sequence space is inert. The combinations worth examining cluster tightly around scaffolds already known to fold and function. This library enumerates those high-likelihood neighborhoods exhaustively, covering every close variant of a known active scaffold (catalogued here as edit-distance-2 neighborhoods), rather than sampling sequence space uniformly, so the catalog stays dense exactly where biological relevance is most probable.

Publishing the enumeration in bulk, with enough structure that any reader can identify exactly what was catalogued and where it lives, turns a private search problem into a public, inspectable resource. The project is not a legal, medical, or clinical opinion and does not assert that any row is safe, active, or fit for any use.

What is in the library

Canonical L-peptide identity records.
Dense edit-distance-2 neighborhoods around selected source scaffolds.
Linear and configured exact-form peptides, including selected disulfide, pyroglutamate, N-acetyl, C-amide, salt, and counterion forms.
JSON, Parquet, FASTA, hash manifests, source references, and sequence-to-shard indexes.
Per-package wrapper pages and machine-readable upload manifests.
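
As an illustration of what an edit-distance-2 neighborhood contains, the following is a minimal sketch, not the project's enumeration code. It assumes Levenshtein edits (single-residue substitution, insertion, and deletion) over the 20 canonical amino acids; the page does not state which edit operations or enumeration parameters the releases actually use, and the names `edits1` and `neighborhood` are hypothetical.

```python
from typing import Set

# The 20 canonical amino acids in one-letter form.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def edits1(seq: str) -> Set[str]:
    """All sequences exactly one substitution, insertion, or deletion away.

    Assumption: Levenshtein-style edits. A substitution-only (Hamming)
    definition would drop the deletions and insertions below.
    """
    splits = [(seq[:i], seq[i:]) for i in range(len(seq) + 1)]
    deletions = {a + b[1:] for a, b in splits if b}
    substitutions = {
        a + c + b[1:] for a, b in splits if b for c in AMINO_ACIDS if c != b[0]
    }
    insertions = {a + c + b for a, b in splits for c in AMINO_ACIDS}
    return deletions | substitutions | insertions


def neighborhood(scaffold: str, max_dist: int = 2) -> Set[str]:
    """Every sequence within `max_dist` edits of `scaffold`, scaffold included."""
    seen = {scaffold}
    frontier = {scaffold}
    for _ in range(max_dist):
        # Expand one edit further and keep only sequences not seen before.
        frontier = {v for s in frontier for v in edits1(s)} - seen
        seen |= frontier
    return seen


if __name__ == "__main__":
    variants = neighborhood("GAV", max_dist=2)  # toy three-residue scaffold
    print(len(variants), "sequences within edit distance 2 of GAV")
```

The neighborhood size grows quickly with scaffold length, which is consistent with the backbone counts reported above being in the tens of millions.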

How each record is specified

Every row is a complete structural specification of one molecule, not a bare string. The record is meant to stand on its own terms: a reader skilled in peptide chemistry can determine exactly which compound is described and, using only routine and well-established methods, make and verify it.

Each record fixes:

The full sequence in canonical one-letter form, with explicit length.
Stereochemistry, defaulting to canonical L unless a residue is marked otherwise.
N- and C-terminal modifications such as pyroglutamate, N-acetyl, and C-amide where present.
Side-chain and cross-link state, including specified disulfide connectivity.
Salt and counterion form where a configured form is disclosed.
A deterministic identifier and a content hash that name this exact compound and no other.

Because each enumerated member carries its own deterministic identifier and content hash, the library specifies every member individually and exactly. It is not an undifferentiated genus: a given target molecule either matches a catalogued identifier exactly or it does not.
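
The fields above can be pictured as a small immutable record with a canonical serialization. This is a sketch under stated assumptions: the field names, the JSON layout, and the use of SHA-256 are all hypothetical, since the published record schema and hashing scheme are not reproduced on this page.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PeptideRecord:
    """Hypothetical record layout; field names are illustrative only."""

    sequence: str                                   # one-letter code
    d_residues: Tuple[int, ...] = ()                # 1-based positions marked D; others are L
    n_term: Optional[str] = None                    # e.g. "acetyl", "pyroglutamate"
    c_term: Optional[str] = None                    # e.g. "amide"
    disulfides: Tuple[Tuple[int, int], ...] = ()    # 1-based Cys position pairs
    counterion: Optional[str] = None                # salt or counterion form, if configured

    def canonical_json(self) -> str:
        """Serialize with sorted keys and normalized ordering so that
        equivalent descriptions produce byte-identical output."""
        data = {
            "sequence": self.sequence,
            "length": len(self.sequence),
            "d_residues": sorted(self.d_residues),
            "n_term": self.n_term,
            "c_term": self.c_term,
            # Sort within each pair, then sort the pairs themselves.
            "disulfides": sorted(sorted(pair) for pair in self.disulfides),
            "counterion": self.counterion,
        }
        return json.dumps(data, sort_keys=True, separators=(",", ":"))

    def content_hash(self) -> str:
        """Assumed scheme: SHA-256 over the canonical JSON bytes."""
        return hashlib.sha256(self.canonical_json().encode("utf-8")).hexdigest()


if __name__ == "__main__":
    rec = PeptideRecord("CGGC", disulfides=((1, 4),), c_term="amide")
    print(rec.canonical_json())
    print(rec.content_hash())
```

Normalizing the order of disulfide pairs and D-residue positions before hashing is what lets two differently written descriptions of the same exact form resolve to one hash, while any change of form (for example adding a C-amide) yields a different one.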

Reproducing and verifying a record

Releases are deterministic. Given a source scaffold and the published enumeration parameters, the same record set is regenerated in the same canonical form and validates against the package hash manifest, so any reader can independently confirm what was catalogued and when.
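
Validating a downloaded package against its hash manifest might look like the sketch below. It assumes the manifest maps relative file paths to SHA-256 hex digests; the real manifest format and digest algorithm are not specified on this page, and `sha256_file` and `verify_package` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large shards need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_package(root: Path, expected: Dict[str, str]) -> List[str]:
    """Return the relative paths that are missing or whose digest differs.

    `expected` is an assumed in-memory form of the hash manifest:
    relative path -> hex digest. An empty result means the package validates.
    """
    failures = []
    for rel_path, digest in sorted(expected.items()):
        target = root / rel_path
        if not target.is_file() or sha256_file(target) != digest:
            failures.append(rel_path)
    return failures
```

A regenerated release would be treated as matching the published one when `verify_package` returns an empty list for every package.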

To check whether a specific molecule is in the public record:

Canonicalize the target into the same sequence-plus-form representation used here.
Compute its content hash with the published hashing scheme.
Look the hash up in the package hash manifest and the sequence-to-shard index.
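
The three steps above can be sketched as follows. The canonical string layout (`H|GAV|NH2`), the SHA-256 digest, and the `<digest> <shard path>` manifest line format are assumptions made for illustration; a real check would use the canonical form, hashing scheme, and manifest layout published with each package.

```python
import hashlib
from typing import Dict, Iterable, Optional


def content_digest(canonical: str) -> str:
    """Assumed scheme: SHA-256 over the UTF-8 bytes of the canonical form."""
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load_hash_manifest(lines: Iterable[str]) -> Dict[str, str]:
    """Parse hypothetical '<hex digest> <shard path>' lines into a mapping.

    Blank lines and lines starting with '#' are skipped.
    """
    manifest: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, shard = line.split(None, 1)
        manifest[digest] = shard.strip()
    return manifest


def find_shard(canonical: str, manifest: Dict[str, str]) -> Optional[str]:
    """Return the shard holding this exact form, or None if it is not catalogued."""
    return manifest.get(content_digest(canonical))


if __name__ == "__main__":
    # Toy manifest with a single entry for an illustrative tripeptide amide.
    toy = load_hash_manifest([f"{content_digest('H|GAV|NH2')}  shards/0001.parquet"])
    print(find_shard("H|GAV|NH2", toy))  # the amide form is present
    print(find_shard("H|GAV|OH", toy))   # the free-acid form is a different record
```

Because the lookup is by exact hash, a change to any part of the form, such as a different C-terminus, is a miss rather than a near match.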

A match establishes that the exact compound, in that exact form, was publicly and verifiably catalogued as of the release date. The members are accessible by methods already within ordinary skill: solid-phase synthesis for linear and configured peptides and recombinant expression for longer backbones, with established terminal and disulfide chemistries. No novel or operational protocol is therefore required, and none is published here.

Public releases

The releases are hosted as static files. Every package has a wrapper page, package manifest, hash manifest, artifact index, citation metadata, sample assets, and shard index.