UniProt
UniProt is a comprehensive protein sequence database; its UniProtKB/Swiss-Prot section provides manually curated annotations.
Using biomaRt to retrieve data from UniProt
Installation
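# biocLite() is no longer supported by current Bioconductor releases; BiocManager replaces it.
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("biomaRt")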
library("biomaRt")
Generalities
One can list the datasets available in a given BioMart database, here "unimart" (listMarts() lists the databases themselves).
uniProt <- useMart(biomart = "unimart")
listDatasets(uniProt)
##   dataset description version
## 1 uniprot uniprot
Constructing a request
Select the UniProt dataset by updating the Mart object that was just created.
uniProt <- useDataset("uniprot", mart = uniProt)
Alternatively, if the dataset is already known, it can be specified when the Mart is created.
uniProt <- useMart(biomart = "unimart", dataset = "uniprot")
The available filters and attributes can be listed with listFilters() and listAttributes().
listFilters(uniProt)
##    name             description
## 1  superregnum_name Superregnum name
## 2  proteome_name    Complete proteome
## 3  accession        Accession
## 4  protein_name     Protein
## 5  length_greater   Length >
## 6  length_smaller   Length <
## 7  protein_evidence Protein existence
## 8  embl_id          EMBL IDs
## 9  arrayexpress_id  ArrayExpress IDs
## 10 ensembl_id       Ensembl IDs
## 11 pdbsum_id        PDBSum IDs
## 12 intact_id        IntAct IDs
## 13 interpro_id      InterPro IDs
## 14 go_id            Gene Ontology IDs
## 15 gene_name        Gene name
## 16 entry_type       Entry type
## 17 organelle        organelle
## 18 plasmid_f        Plasmid
listAttributes(uniProt)
##    name                    description
## 1  accession               Accession
## 2  name                    Entry name
## 3  protein_name            Protein name
## 4  gene_name               Gene name
## 5  organism                Organism
## 6  protein_evidence        Protein existence
## 7  entry_type              Status
## 8  go_id                   GO ID
## 9  go_name                 GO name
## 10 db2go_p__dm_primary_id  GO ID(p)
## 11 db2go_p__dm_description GO name
## 12 db2go_f__dm_description GO name (F)
## 13 db2go_f__dm_primary_id  GO ID (F)
## 14 db2go_c__dm_primary_id  GO ID (C)
## 15 db2go_c__dm_description GO name (C)
## 16 embl_id                 EMBL IDs
## 17 ensembl_id              Ensembl IDs
## 18 interpro_id             InterPro IDs
## 19 pdbsum_id               PDBSum IDs
## 20 pdb_id                  PDB IDs
## 21 arrayexpress            ArrayExpress IDs
## 22 pride_id                PRIDE IDs
## 23 interact_id             IntAct IDs
## 24 comments                Comments
## 25 ec_number               Ec number
## 26 keyword                 Keyword
## 27 plasmid_name            Plasmid name
## 28 organelle_name          organelle name
Filters restrict which entries a request matches; they are the inputs of a query.
Attributes are the fields returned for each matching entry; they are the outputs.
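Because both listings are ordinary data frames with a name and a description column, they can be searched with base R rather than read by eye. The sketch below is only an illustration: it rebuilds a few rows of the attribute table shown above by hand so that it runs without a connection, and find_fields() is a hypothetical helper, not a biomaRt function. Against a live Mart, one would presumably pass listAttributes(uniProt) or listFilters(uniProt) instead of the hand-built table.

```r
# A few rows copied from the listAttributes(uniProt) output above,
# rebuilt by hand so the example runs offline.
attrs <- data.frame(
  name = c("accession", "name", "go_id", "go_name", "embl_id"),
  description = c("Accession", "Entry name", "GO ID", "GO name", "EMBL IDs"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of biomaRt): keep the rows whose name
# or description matches a pattern, ignoring case.
find_fields <- function(tbl, pattern) {
  hit <- grepl(pattern, tbl$name, ignore.case = TRUE) |
    grepl(pattern, tbl$description, ignore.case = TRUE)
  tbl[hit, , drop = FALSE]
}

# Look for the Gene Ontology attributes.
go_fields <- find_fields(attrs, "go")
print(go_fields)
```

The names found this way are the strings to pass as attributes (or filters) in a request.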
Examples
Starting from a UniProt accession (AC), request the entry name (ID) and the Gene Ontology IDs and names.
uniProt <- useMart(biomart = "unimart", dataset = "uniprot")
getBM(attributes = c("name", "go_id", "go_name"),
      filters = "accession", values = "E9HCD7",
      mart = uniProt)
##   name       go_id      go_name
## 1 NNRE_DAPPU GO:0052856 F:NADHX epimerase activity
## 2 NNRE_DAPPU GO:0046872 F:metal ion binding
## 3 NNRE_DAPPU GO:0000166 F:nucleotide binding
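The result is a plain data frame with one row per GO annotation, so it can be post-processed with base R. The following sketch rebuilds the three rows above by hand (so it runs offline), splits the go_name column into its one-letter prefix and the term itself, and collapses the GO IDs to one row per entry. The column names go_aspect and go_term are illustrative choices, and reading the prefix as the GO aspect (F for molecular function, following UniProt's usual convention) is an assumption about this output.

```r
# Result of the query above, rebuilt by hand so the example runs offline.
res <- data.frame(
  name = "NNRE_DAPPU",
  go_id = c("GO:0052856", "GO:0046872", "GO:0000166"),
  go_name = c("F:NADHX epimerase activity",
              "F:metal ion binding",
              "F:nucleotide binding"),
  stringsAsFactors = FALSE
)

# Split "F:term" into the prefix before the first colon and the term after it.
res$go_aspect <- sub(":.*$", "", res$go_name)
res$go_term <- sub("^[^:]*:", "", res$go_name)

# Collapse to one row per entry, joining the GO IDs with ";".
per_entry <- aggregate(go_id ~ name, data = res, FUN = paste, collapse = ";")

print(res[, c("go_id", "go_aspect", "go_term")])
print(per_entry)
```

The same steps should apply unchanged to a result holding several entries, since aggregate() groups the rows by the name column.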

This work by Celine Hernandez is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.