Overview of DBpedia Datasets
DBpedia is a cross-domain knowledge graph created from Wikipedia infoboxes. The most recent release at the time of writing, DBpedia 2015-10 (based on Wikipedia dumps from October 2015), contains 4.7 million entities and 153 million statements about these entities. The DBpedia ontology consists of 791 classes and 2,835 relations.
The DBpedia 2015-10 dump consists of several datasets, which are published in the following versions:
- one "standard" English version (en version)
- several localised versions for other languages (i18n versions)
The URIs in the standard English version were constructed from the titles of all English Wikipedia articles. The i18n versions were extracted from Wikipedia articles written in other languages. These localised versions identify things with IRIs, which, unlike URIs, may contain non-ASCII characters.
Principles of DBpedia Dataset Design
The DBpedia dump has a modular design: triples are grouped into separate datasets according to their content (e.g., links to the homepages of persons, or the titles of all Wikipedia articles).
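Because each file holds one kind of content, a quick way to see what a dataset actually contains is to tally its predicates. The sketch below assumes, beyond what is stated above, that the dump files are line-based (one complete triple per line with full IRIs, as in N-Triples) and that lines starting with # are comments; dbpedia_predicate_counts is a hypothetical helper name.

```shell
# dbpedia_predicate_counts FILE
# Hypothetical helper. Prints "<count> <predicate>" for every predicate in a
# line-based dump file, most frequent first.
# ASSUMPTION: one triple per line, fields separated by whitespace, so the
# predicate is always the second field; lines starting with '#' are comments.
dbpedia_predicate_counts() {
  awk '!/^#/ && NF > 0 { count[$2]++ }
       END { for (p in count) print count[p], p }' "$1" |
    sort -k1,1nr -k2,2   # by count (descending), then by predicate name
}
```

For example, running it on mappingbased_objects_en.ttl and piping the result through head would list the most frequent object properties. The awk array only stores one entry per distinct predicate, so memory use stays small even for multi-gigabyte files.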
The core DBpedia datasets include the following:
- labels_en.ttl (1.48 GB): rdfs:label values for all English DBpedia URIs.
- labels_en_uris_anotherlang.ttl: cross-language links between the English URIs and IRIs of other languages.
- mappingbased_literals_en.ttl (2.26 GB): statements whose values are literals (e.g., numbers, dates, strings), extracted from infoboxes by the mapping-based extraction.
- mappingbased_objects_en.ttl (2.46 GB): statements whose values are other resources (object properties), extracted from infoboxes by the mapping-based extraction.
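Since labels_en.ttl maps each English DBpedia URI to its rdfs:label, looking up a label only requires matching the subject of each line. The sketch below makes an assumption not stated above: the file is line-based, with one triple per line, the subject IRI first and English literals tagged @en. dbpedia_label is a hypothetical helper name, and escape sequences inside literals (such as \") are printed as-is rather than decoded.

```shell
# dbpedia_label FILE RESOURCE_URI
# Hypothetical helper. Prints the text of every @en label whose subject is
# exactly RESOURCE_URI.
# ASSUMPTION: lines look like
#   <subject> <http://www.w3.org/2000/01/rdf-schema#label> "Text"@en .
dbpedia_label() {
  local file="$1" uri="$2"
  # Keep only lines whose first field (the subject) equals <uri> exactly,
  # then strip everything except the literal between the quotes.
  awk -v s="<$uri>" '$1 == s' "$file" |
    sed -n 's/^<[^>]*> <[^>]*> "\(.*\)"@en \.$/\1/p'
}
```

A call such as dbpedia_label labels_en.ttl http://dbpedia.org/resource/Berlin scans the whole file, so for many lookups a triple store or the SPARQL endpoint is likely the more practical choice.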
Accessing DBpedia Dumps
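All DBpedia dumps are available from the DBpedia download server. You can download the whole DBpedia 2015-10 dump with wget; the -r option downloads recursively, and -np (no parent) stops wget from ascending into the parent directory.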
wget -r -np http://downloads.dbpedia.org/2015-10/
The core directory on the server contains all datasets that are loaded into the public DBpedia SPARQL endpoint.
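To fetch only a few core datasets instead of the whole dump, one option is to generate their URLs and hand them to wget. This sketch assumes that the core directory is http://downloads.dbpedia.org/2015-10/core/ and that each dataset is stored there as a bzip2-compressed file named <dataset>.ttl.bz2; both are assumptions to check against the server's directory listing, and dbpedia_core_urls is a hypothetical helper name.

```shell
# ASSUMPTION: location of the core directory and the .ttl.bz2 naming scheme.
DBPEDIA_CORE_URL="http://downloads.dbpedia.org/2015-10/core"

# dbpedia_core_urls DATASET...
# Hypothetical helper. Prints one download URL per dataset name.
dbpedia_core_urls() {
  local name
  for name in "$@"; do
    printf '%s/%s.ttl.bz2\n' "$DBPEDIA_CORE_URL" "$name"
  done
}

# Usage (downloads only the listed files):
#   dbpedia_core_urls labels_en mappingbased_objects_en | wget -c -i -
```

With -i -, wget reads the URL list from standard input, and -c resumes partially downloaded files, so rerunning the pipeline after an interruption does not restart large files from scratch.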