I am attempting to filter company tickers out of a list of possible tickers.
The following code is what I have so far. I need to make the RegExp sophisticated enough that only certain patterns pass. See the example code below for more specific details.
Pattern tickerPattern = Pattern.compile("^[A-Z:\\.0-9]+$");
String[] tickerStrArr = {
    "JELK90#$", // NOT A TICKER
    "1",        // NOT A TICKER
    "0",        // NOT A TICKER
    "R",        // NOT A TICKER
    "25.36",    // NOT A TICKER
    "1.0",      // NOT A TICKER
    "GOOG",     // Ticker
    "NYSE:C",   // Ticker (with exchange code NYSE)
    "GOOG.BY",  // Ticker (with exchange code BY)
    "$90",      // NOT A TICKER
    "98774",    // Ticker (because more than 4 digits long)
    "789.BY"    // Ticker (because ends with .[A-Z]{2,2})
};
for (String tickerStr : tickerStrArr)
{
    Matcher matcher = tickerPattern.matcher(tickerStr);
    if (matcher.find())
    {
        System.out.println("It's a ticker=>" + matcher.group());
    }
}
Expected output
It's a ticker=>GOOG
It's a ticker=>NYSE:C
It's a ticker=>GOOG.BY
It's a ticker=>98774
It's a ticker=>789.BY
Can you formulate a RegExp that produces the expected output above?
Here are the rules for my own filtering (not necessarily applicable to everyone):
Only letters or digits can be part of a ticker; no special characters or currency symbols.
Sometimes tickers are mentioned with their exchange code as a prefix, for example NYSE:C, or with a two-character exchange code as a suffix, for example C.BY.
If it is all digits, it should be more than 4 digits long (this is to rule out millions of false positives).
But if the digits are accompanied by an exchange code, the ticker can be shorter than 4 digits, because then we have high confidence.
Let me know if you need more details.
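The following regex will match your tickers. Its character classes are lowercase, so compile it case-insensitively, and use free-spacing mode if you keep it on separate lines as written here. It reads as follows:
PreXChangeCode: an optional exchange prefix of 2 to 4 letters followed by a colon, allowed only if the letters and digits after the colon are not followed by a . somewhere later. This is to detect an invalid symbol with multiple exchange codes.
Stock: 1 to 4 letters, or 1 to 3 digits immediately followed by a ., or 4 or more digits.
PostXChangeCode: an optional exchange suffix, a . followed by a-z exactly 2 times.
^
(?<PreXChangeCode>[a-z]{2,4}:(?![a-z\d]+\.))?
(?<Stock>[a-z]{1,4}|\d{1,3}(?=\.)|\d{4,})
(?<PostXChangeCode>\.[a-z]{2})?
$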
I tested it out with REY and it correctly matches your test data, with the exception of R. I included one-character stock names since those are valid (R is Ryder System).
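For reference, here is a minimal sketch of how the regex above could be plugged into the loop from the question. The class name TickerFilter and the helper isTicker are made up for illustration; the pattern is the one above, joined into a single string and compiled with Pattern.CASE_INSENSITIVE because its character classes are lowercase.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class; only the pattern itself comes from the answer.
final class TickerFilter {
    private static final Pattern TICKER = Pattern.compile(
        "^"
        + "(?<PreXChangeCode>[a-z]{2,4}:(?![a-z\\d]+\\.))?"  // optional prefix such as NYSE:
        + "(?<Stock>[a-z]{1,4}|\\d{1,3}(?=\\.)|\\d{4,})"     // the symbol itself
        + "(?<PostXChangeCode>\\.[a-z]{2})?"                 // optional suffix such as .BY
        + "$",
        Pattern.CASE_INSENSITIVE);

    // True when the whole string is accepted by the pattern.
    static boolean isTicker(String candidate) {
        return TICKER.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] tickerStrArr = {
            "JELK90#$", "1", "0", "R", "25.36", "1.0",
            "GOOG", "NYSE:C", "GOOG.BY", "$90", "98774", "789.BY"
        };
        for (String tickerStr : tickerStrArr) {
            if (isTicker(tickerStr)) {
                System.out.println("It's a ticker=>" + tickerStr);
            }
        }
    }
}
```

As noted above, this also accepts R, because one-letter symbols are treated as valid. If you want to reject bare single letters, one option would be to change [a-z]{1,4} to [a-z]{2,4}, at the cost of also rejecting NYSE:C.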