Fuzzy entity query in Wikidata with Sparql times out

Question

I'm trying to do a fuzzy (ie.. partial or case-insensitive) entity label lookup in Wikidata with Sparql (via the online endpoint). Unfortunately these return a "QueryTimeoutException: Query deadline is expired." I'm assuming this is because the query is returning too many results to run through the filter in Wikidata's 1 minute timeout.

Here's the specific query:

def findByFuzzyLabel(self, item_label):
    qstring = '''
        SELECT ?item WHERE {
            ?item rdfs:label ?label .
            FILTER( lcase(str(?label)) = "%s")
        }
        LIMIT 20
        ''' % (item_label)
    results = self.query(qstring)

Is there a way to do a partial string and/or case-insensitive label lookup on Wikidata's entity labels or will I need to do this offline on a download of raw data?

I'm looking to match labels such as "Lindbergh" to "Charles Lindbergh" and also handle case insensitivity in some instances. Any suggestions how to do this, whether via Sparql or offline in Python are appreciated.

mhham · Accepted Answer

You can now use the MediaWiki API directly from SPARQL, using a Wikidata magic service as documented here.

Example :

SELECT * WHERE {
  SERVICE wikibase:mwapi {
      bd:serviceParam wikibase:api "EntitySearch" .
      bd:serviceParam wikibase:endpoint "www.wikidata.org" .
      bd:serviceParam mwapi:search "cheese" .
      bd:serviceParam mwapi:language "en" .
      ?item wikibase:apiOutputItem mwapi:item .
      ?num wikibase:apiOrdinal true .
  }
  ?item (wdt:P279|wdt:P31) ?type
} ORDER BY ASC(?num) LIMIT 20

Fuzzy entity query in Wikidata with Sparql times out

Tags:

sparql

wikidata

bivouac0

1 Answers

mhham

Recent Activity

Donate For Us

Fuzzy entity query in Wikidata with Sparql times out

Tags:

sparql

wikidata

bivouac0

1 Answers

mhham

Related questions

Recent Activity

Donate For Us