I'm unable to find a way to match all extended alphabet characters without listing them explicitly. For example, consider matching the tag språk:
tag = "språk"
tag:match([[%w+]])
This doesn't work because å is not contained within %w. It can be matched with tag:match([[[%wå]+]]), but then you have to add every special character explicitly.
One can also extend the range: tag:match([[[a-å]+]]) works, but I'm not 100% clear on why, or at least not on what that range actually covers in the character table.
So what is the correct way to match a range that includes all ASCII letters plus all Latin Extended letters?
The best solution I've come up with so far is:
tag = "språk"
tag:match([[[a-zA-ZÀ-ÿ]+]])
But I'm still unsure if that is completely correct, and it would be ideal if there is a shortcut class for this I'm simply overlooking.
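For what it's worth, the "why" behind the range can be checked directly. Lua patterns operate on bytes, not characters, so the following sketch (assuming the script file and the strings are UTF-8 encoded) prints what å really is and rewrites the range with explicit byte escapes. The name by_bytes is only illustrative.

```lua
-- Assumes this file and the strings in it are encoded as UTF-8.
local tag = "språk"

-- Lua patterns work on bytes, so first look at what "å" really is.
print(#tag)                        -- 6 (bytes, not characters)
print(string.byte("å", 1, -1))     -- 195 165, i.e. 0xC3 0xA5

-- "[a-å]" is therefore read as: the byte range 'a'..0xC3, plus the single byte 0xA5.
-- Spelling that out with decimal escapes gives an equivalent pattern:
local by_bytes = "[a-\195\165]+"
print(tag:match("[a-å]+"))         -- språk
print(tag:match(by_bytes))         -- språk

-- The same range also accepts bytes that are not letters, such as "{" and "~".
print(("a{b~c"):match("[a-å]+"))   -- a{b~c
```

By the same reading, [a-zA-ZÀ-ÿ] is the two ASCII ranges, the byte 0xC3, the byte range 0x80..0xC3, and the byte 0xBF. It matches språk, but only because every byte of å happens to fall inside those byte sets.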
Here is one way to build a set from the letters of the Latin-1 Supplement block. By analogy, you can build sets for the other blocks you need (Latin Extended-A, B, C, D, E).
------------------------ just generate the Latin-1 Supplement set (U+00C0..U+00FF)
local set = ""
for x = 0x80, 0xBF do
  -- in UTF-8 each of these characters is the lead byte 0xC3 followed by 0x80..0xBF
  set = set .. string.char(0xC3, x)
end
print(set)
--------------------------
--- copied from the output printed above, without the non-letters × and ÷
local ex = [[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]]
-- By analogy you can get Extended Latin A:
-- local ext_latin_A = [[ĀāĂăĄąĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıĲĳĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňŉŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſ]]
tag = "språk"
print("-----")
print( tag:match("[%w".. ex .."]+") )
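Note that a set built this way is still matched byte by byte, so it accepts any mix of the listed bytes rather than whole characters. If Lua 5.3 or later is available, a possible alternative is to walk the string by code point with the standard utf8 library and test each code point against the blocks you want. This is only a sketch: is_word_codepoint and match_word are hypothetical helper names, and the input is assumed to be valid UTF-8 (utf8.codes raises an error otherwise).

```lua
-- Requires Lua 5.3+ for the utf8 library; input is assumed to be valid UTF-8.

-- Hypothetical helper: true if code point cp counts as a "word" character here.
local function is_word_codepoint(cp)
  if cp < 0x80 then
    -- ASCII: reuse %w (letters and digits)
    return string.char(cp):match("^%w$") ~= nil
  end
  -- Latin-1 Supplement letters (U+00C0..U+00FF), minus × (U+00D7) and ÷ (U+00F7)
  if cp >= 0xC0 and cp <= 0xFF then
    return cp ~= 0xD7 and cp ~= 0xF7
  end
  -- Latin Extended-A (U+0100..U+017F)
  return cp >= 0x100 and cp <= 0x17F
end

-- Hypothetical helper: returns the leading run of word characters in s, or nil.
local function match_word(s)
  local stop = 0
  for pos, cp in utf8.codes(s) do
    if not is_word_codepoint(cp) then break end
    -- last byte of the character that starts at byte position pos
    stop = pos + #utf8.char(cp) - 1
  end
  if stop == 0 then return nil end
  return s:sub(1, stop)
end

print(match_word("språk"))       -- språk
print(match_word("Ærø-skole"))   -- Ærø
print(match_word("×2"))          -- nil
```

Unlike tag:match, this only matches at the start of the string; further ranges such as Latin Extended-B can be added to is_word_codepoint in the same way.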