UniProt

UniProt is a comprehensive protein sequence database; its Swiss-Prot section provides manually curated annotations.

Using biomaRt to retrieve data from UniProt

Installation

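# biocLite() has been retired; current Bioconductor releases install through BiocManager.
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("biomaRt")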
library("biomaRt")

Generalities

Once a Mart object has been created for a given mart, the datasets it offers can be listed.

uniProt <- useMart(biomart = "unimart")
listDatasets(uniProt)
##   dataset description version
## 1 uniprot     uniprot

Constructing a request

Select the UniProt dataset, which updates the Mart object that was just created.

uniProt <- useDataset("uniprot", mart = uniProt)

Alternatively, if the dataset is known already, it can be specified during the Mart creation.

uniProt <- useMart(biomart = "unimart", dataset = "uniprot")

The available filters and attributes can be listed with listFilters() and listAttributes().

listFilters(uniProt)
##                name       description
## 1  superregnum_name  Superregnum name
## 2     proteome_name Complete proteome
## 3         accession         Accession
## 4      protein_name           Protein
## 5    length_greater          Length >
## 6    length_smaller          Length <
## 7  protein_evidence Protein existence
## 8           embl_id          EMBL IDs
## 9   arrayexpress_id  ArrayExpress IDs
## 10       ensembl_id       Ensembl IDs
## 11        pdbsum_id        PDBSum IDs
## 12        intact_id        IntAct IDs
## 13      interpro_id      InterPro IDs
## 14            go_id Gene Ontology IDs
## 15        gene_name         Gene name
## 16       entry_type        Entry type
## 17        organelle         organelle
## 18        plasmid_f           Plasmid
listAttributes(uniProt)
##                       name       description
## 1                accession         Accession
## 2                     name        Entry name
## 3             protein_name      Protein name
## 4                gene_name         Gene name
## 5                 organism          Organism
## 6         protein_evidence Protein existence
## 7               entry_type            Status
## 8                    go_id             GO ID
## 9                  go_name           GO name
## 10  db2go_p__dm_primary_id          GO ID(p)
## 11 db2go_p__dm_description           GO name
## 12 db2go_f__dm_description       GO name (F)
## 13  db2go_f__dm_primary_id         GO ID (F)
## 14  db2go_c__dm_primary_id         GO ID (C)
## 15 db2go_c__dm_description       GO name (C)
## 16                 embl_id          EMBL IDs
## 17              ensembl_id       Ensembl IDs
## 18             interpro_id      InterPro IDs
## 19               pdbsum_id        PDBSum IDs
## 20                  pdb_id           PDB IDs
## 21            arrayexpress  ArrayExpress IDs
## 22                pride_id         PRIDE IDs
## 23             interact_id        IntAct IDs
## 24                comments          Comments
## 25               ec_number         Ec number
## 26                 keyword           Keyword
## 27            plasmid_name      Plasmid name
## 28          organelle_name    organelle name
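
Both listings are ordinary data frames with name and description columns, so they can be searched rather than read in full. The following is a minimal sketch. It uses a hand-copied subset of the filter listing above in place of a live listFilters(uniProt) call so that it runs offline, and the variable names filters and hits are illustrative.

```r
# Hand-copied subset of the listFilters() output shown above (stands in for a live call).
filters <- data.frame(
  name = c("accession", "protein_name", "go_id", "gene_name"),
  description = c("Accession", "Protein", "Gene Ontology IDs", "Gene name"),
  stringsAsFactors = FALSE
)

# Keep the rows whose description mentions "gene", ignoring case.
hits <- filters[grepl("gene", filters$description, ignore.case = TRUE), ]
print(hits)
```

With a live connection, the same subsetting should apply directly to the data frame returned by listFilters(uniProt) or listAttributes(uniProt).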

Filters are the building blocks of a request: they restrict which entries the request matches.
Attributes are the fields that a request can return for those entries.
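
Since a request is expected to use only names that appear in these listings, it can help to check a planned request before sending it. The following is a rough sketch, assuming the names from the listings above have been copied into character vectors. check_names is a hypothetical helper, not part of biomaRt.

```r
# Names copied from the listings above (a subset, for illustration).
known_filters <- c("accession", "protein_name", "go_id", "gene_name")
known_attributes <- c("accession", "name", "protein_name", "go_id", "go_name")

# Hypothetical helper: return the requested names that are missing from the known set.
check_names <- function(requested, known) {
  setdiff(requested, known)
}

# An empty result means every requested name is known.
bad_attributes <- check_names(c("name", "go_id", "go_name"), known_attributes)

# A misspelled filter name is reported back.
bad_filters <- check_names("accesion", known_filters)
print(bad_filters)
```

With a live Mart object, the known names could instead be taken from listFilters(uniProt)$name and listAttributes(uniProt)$name.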

Examples

Starting from a UniProt accession (AC), request the entry name (ID) together with the Gene Ontology IDs and names.

uniProt <- useMart(biomart = "unimart", dataset = "uniprot")
getBM(attributes = c("name", "go_id", "go_name"), filters = "accession", values = "E9HCD7", 
    mart = uniProt)
##         name      go_id                    go_name
## 1 NNRE_DAPPU GO:0052856 F:NADHX epimerase activity
## 2 NNRE_DAPPU GO:0046872        F:metal ion binding
## 3 NNRE_DAPPU GO:0000166       F:nucleotide binding
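
The go_name values carry a one-letter prefix before the term; in the output above it is always F. The sketch below splits that prefix into its own column, using a hand-built copy of the result so that it runs without a connection. Reading F as the molecular function aspect is an assumption based on the (F), (C) and (p) attribute descriptions listed earlier, and the column names aspect and term are illustrative.

```r
# Hand-built copy of the getBM() result shown above.
res <- data.frame(
  name = rep("NNRE_DAPPU", 3),
  go_id = c("GO:0052856", "GO:0046872", "GO:0000166"),
  go_name = c("F:NADHX epimerase activity", "F:metal ion binding", "F:nucleotide binding"),
  stringsAsFactors = FALSE
)

# First character of go_name is the aspect letter (assumed: F = molecular function).
res$aspect <- substr(res$go_name, 1, 1)

# Drop the "<letter>:" prefix to keep only the term itself.
res$term <- sub("^[A-Z]:", "", res$go_name)

print(res[, c("go_id", "aspect", "term")])
```

Keeping the aspect in a separate column makes it straightforward to subset or group the terms later, for example with res[res$aspect == "F", ].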

Creative Commons License
This work by Celine Hernandez is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.