I'm trying to do a fuzzy (i.e., partial or case-insensitive) entity label lookup in Wikidata with SPARQL (via the online endpoint). Unfortunately, these queries fail with "QueryTimeoutException: Query deadline is expired." I'm assuming this is because the FILTER has to be evaluated against too many labels to finish within Wikidata's one-minute timeout.
Here's the specific query:
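def findByFuzzyLabel(self, item_label):
    # Lower-case the search term as well; otherwise a term containing an
    # upper-case letter can never equal lcase(str(?label)).
    qstring = '''
    SELECT ?item WHERE {
      ?item rdfs:label ?label .
      FILTER( lcase(str(?label)) = "%s" )
    }
    LIMIT 20
    ''' % (item_label.lower())
    results = self.query(qstring)
    return results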
Is there a way to do a partial-string and/or case-insensitive lookup on Wikidata's entity labels, or will I need to do this offline on a downloaded dump of the raw data?
I'm looking to match labels such as "Lindbergh" to "Charles Lindbergh", and also to handle case insensitivity in some instances. Any suggestions on how to do this, whether via SPARQL or offline in Python, are appreciated.
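For the offline route, a minimal sketch of partial plus case-insensitive matching in plain Python might look like the following. It assumes the labels have already been extracted from a Wikidata dump into (item_id, label) pairs; the function name fuzzy_label_lookup, the item ids, and the sample data are hypothetical. It uses str.casefold() for case-insensitivity and a substring test for partial matches, ranking exact matches ahead of partial ones.

```python
from typing import Iterable, List, Tuple


def fuzzy_label_lookup(
    labels: Iterable[Tuple[str, str]],
    query: str,
    limit: int = 20,
) -> List[Tuple[str, str]]:
    """Return up to `limit` (item_id, label) pairs whose label contains
    `query`, ignoring case. Exact case-insensitive matches come first."""
    needle = query.casefold()
    exact: List[Tuple[str, str]] = []
    partial: List[Tuple[str, str]] = []
    for item_id, label in labels:
        folded = label.casefold()
        if folded == needle:
            exact.append((item_id, label))
        elif needle in folded:
            partial.append((item_id, label))
    return (exact + partial)[:limit]


# Hypothetical sample data; real pairs would come from a dump.
sample = [
    ("item-a", "Charles Lindbergh"),
    ("item-b", "Anne Morrow Lindbergh"),
    ("item-c", "Lindbergh"),
    ("item-d", "Spirit of St. Louis"),
]

print(fuzzy_label_lookup(sample, "lindbergh"))
# [('item-c', 'Lindbergh'), ('item-a', 'Charles Lindbergh'), ('item-b', 'Anne Morrow Lindbergh')]
```

This is a linear scan over every label, so against the full Wikidata label set you would probably want to build an index first (for example, a lower-cased label column in a database) rather than scanning on each lookup.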
You can now call the MediaWiki API directly from SPARQL, using the Wikidata Query Service's wikibase:mwapi service as documented in the Wikidata Query Service user manual (MWAPI section).
Example:
SELECT * WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "EntitySearch" .
    bd:serviceParam wikibase:endpoint "www.wikidata.org" .
    bd:serviceParam mwapi:search "cheese" .
    bd:serviceParam mwapi:language "en" .
    ?item wikibase:apiOutputItem mwapi:item .
    ?num wikibase:apiOrdinal true .
  }
  ?item (wdt:P279|wdt:P31) ?type
} ORDER BY ASC(?num) LIMIT 20
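To run this kind of MWAPI search from Python, something along these lines should work, assuming the public endpoint at https://query.wikidata.org/sparql and the standard SPARQL JSON results format. The helper names (sparql_string, build_entity_search_query, run_query) and the User-Agent string are illustrative, not part of any library. The query leaves out the (wdt:P279|wdt:P31) pattern from the example above so each matching item comes back once instead of once per type. How loosely a term matches (prefixes, aliases, case) is decided by the EntitySearch API, not by this code.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def sparql_string(value: str) -> str:
    """Quote `value` as a SPARQL string literal, escaping backslashes,
    double quotes and line breaks so user input can't break the query."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def build_entity_search_query(term: str, language: str = "en", limit: int = 20) -> str:
    """Build an MWAPI EntitySearch query; the search runs inside
    MediaWiki, so no FILTER has to scan every label."""
    return f"""
SELECT ?item ?num WHERE {{
  SERVICE wikibase:mwapi {{
    bd:serviceParam wikibase:api "EntitySearch" .
    bd:serviceParam wikibase:endpoint "www.wikidata.org" .
    bd:serviceParam mwapi:search {sparql_string(term)} .
    bd:serviceParam mwapi:language {sparql_string(language)} .
    ?item wikibase:apiOutputItem mwapi:item .
    ?num wikibase:apiOrdinal true .
  }}
}} ORDER BY ASC(?num) LIMIT {int(limit)}
"""


def run_query(query: str) -> list:
    """Send `query` to the endpoint and return the ?item URIs in order."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical User-Agent; replace with your own tool name/contact.
            "User-Agent": "fuzzy-label-example/0.1 (hypothetical contact)",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return [row["item"]["value"] for row in data["results"]["bindings"]]


# Usage (needs network access):
# for uri in run_query(build_entity_search_query("Lindbergh")):
#     print(uri)
```

Escaping the search term matters here because the question's version interpolates the raw label with %s, so any label containing a double quote would produce a malformed query.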