Fuzzy entity query in Wikidata with SPARQL times out

I'm trying to do a fuzzy (i.e. partial or case-insensitive) entity label lookup in Wikidata with SPARQL (via the online endpoint). Unfortunately, these queries return a "QueryTimeoutException: Query deadline is expired." I'm assuming this is because the query matches too many results to run through the filter within Wikidata's 1-minute timeout.

Here's the specific query:

def findByFuzzyLabel(self, item_label):
    qstring = '''
        SELECT ?item WHERE {
            ?item rdfs:label ?label .
            FILTER( lcase(str(?label)) = "%s")
        }
        LIMIT 20
        ''' % (item_label)
    results = self.query(qstring)

Is there a way to do a partial-string and/or case-insensitive lookup on Wikidata's entity labels, or will I need to do this offline on a download of the raw data?

I'm looking to match labels such as "Lindbergh" to "Charles Lindbergh" and also to handle case insensitivity in some instances. Any suggestions on how to do this, whether via SPARQL or offline in Python, are appreciated.

bivouac0 asked Sep 06 '25 00:09


1 Answer

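You can now use the MediaWiki API directly from SPARQL through the wikibase:mwapi service, which is documented in the MWAPI section of the Wikidata Query Service user manual. The search is then handled by Wikidata's entity search rather than by a FILTER over every label.

Example: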

SELECT * WHERE {
  SERVICE wikibase:mwapi {
      bd:serviceParam wikibase:api "EntitySearch" .
      bd:serviceParam wikibase:endpoint "www.wikidata.org" .
      bd:serviceParam mwapi:search "cheese" .
      bd:serviceParam mwapi:language "en" .
      ?item wikibase:apiOutputItem mwapi:item .
      ?num wikibase:apiOrdinal true .
  }
  ?item (wdt:P279|wdt:P31) ?type
} ORDER BY ASC(?num) LIMIT 20
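
Since the question builds its query from Python, here is a minimal sketch of how the same EntitySearch query might be sent to the public endpoint using only the standard library. The function names, the User-Agent string, and the choice to drop the `(wdt:P279|wdt:P31)` restriction are assumptions made for illustration, not part of the original answer. The endpoint URL `https://query.wikidata.org/sparql` and its `format=json` parameter are the ones the public query service exposes.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same SERVICE block as in the example above, without the type restriction.
QUERY_TEMPLATE = '''
SELECT ?item ?num WHERE {
  SERVICE wikibase:mwapi {
      bd:serviceParam wikibase:api "EntitySearch" .
      bd:serviceParam wikibase:endpoint "www.wikidata.org" .
      bd:serviceParam mwapi:search "%(term)s" .
      bd:serviceParam mwapi:language "%(language)s" .
      ?item wikibase:apiOutputItem mwapi:item .
      ?num wikibase:apiOrdinal true .
  }
} ORDER BY ASC(?num) LIMIT %(limit)d
'''


def escape_sparql_string(value):
    """Escape characters that would break a double-quoted SPARQL literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_entity_search_query(term, language="en", limit=20):
    """Return the SPARQL text for an EntitySearch lookup (hypothetical helper)."""
    return QUERY_TEMPLATE % {
        "term": escape_sparql_string(term),
        "language": escape_sparql_string(language),
        "limit": int(limit),
    }


def find_by_fuzzy_label(term, language="en", limit=20):
    """Run the search and return entity URIs in the order the API ranked them."""
    params = urllib.parse.urlencode(
        {"query": build_entity_search_query(term, language, limit), "format": "json"}
    )
    request = urllib.request.Request(
        WDQS_ENDPOINT + "?" + params,
        # Placeholder User-Agent; replace with one that identifies your tool.
        headers={"User-Agent": "fuzzy-label-example/0.1 (placeholder)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return [row["item"]["value"] for row in data["results"]["bindings"]]
```

Calling `find_by_fuzzy_label("Lindbergh")` would then return a list of entity URIs. Escaping the search term matters here because it is spliced into the query text, just as `item_label` is in the question's code.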
mhham answered Sep 08 '25 23:09