Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to filter 2 huge list with millions of item in it with same id [duplicate]

Here is my 2 list with more than millions of item. Both has same items with same ID. ID is in String. I need only the item which is not same ID.I did this way. But I am sure there must be a better solution and with high permanence:-

    List<Transaction> differentList = new ArrayList<>();

    for(Transaction tx : foundTransactions ){
        for(Transaction aTx : ArchivedTransactions) 
        {
            if(!tx.getId().equalsIgnoreCase(aTx.getId()) ){
                differentList .add(tx);
            }
        }
    }

I tried to use stream but I couldn't do that. I guess with stream API should be better. Please suggest me any improvements.

like image 598
Masi Boo Avatar asked Oct 14 '25 17:10

Masi Boo


1 Answers

You can try converting it to a HashMap first, something like:

Set<String> collect = ArchivedTransactions.stream().map(i -> i.getId().toLowerCase())
                                           .collect(Collectors.toSet());

for(Transaction tx : foundTransactions )
    if(!collect.contains(tx.getId()))
       differentList.add(tx);

The Collectors.toSet() returns a HashSet. You can simplify the code to:

Set<String> collect = ArchivedTransactions.stream().map(i -> i.getId().toLowerCase())
                                          .collect(Collectors.toSet());

List<Transaction> differentList = foundTransactions.stream()
                                                   .filter(tx -> !collect.contains(tx.getId()))
                                                   .collect(Collectors.toList())

Adding the IDs first into a HashSet as an intermediate step will provide you with a much better overall complexity time since (source):

Time Complexity of HashSet Operations: The underlying data structure for HashSet is hashtable. So amortize (average or usual case) time complexity for add, remove and look-up (contains method) operation of HashSet takes O(1) time.

Consequently, the overall time complexity of the "HashMap" solution will be O(N + M), where N and M begin the number of elements in the lists ArchivedTransactions and foundTransactions, respectively. Nonetheless, space-wise you will pay the price of having that extra structure.

Your solution space-wise is better, but with a worst time complexity. If N = M the time complexity of your solution is O(N^2), whereas the solution with the HashSet would be O(2N), hence O(N). This is a huge difference.

Doing just

Set<Transaction> result = new LinkedHashSet<>();
result.addAll(foundTransactions);
result.addAll(ArchivedTransactions);

alone will not work, because you explicitly requested:

!tx.getId().equalsIgnoreCase(aTx.getId())
like image 137
dreamcrash Avatar answered Oct 17 '25 08:10

dreamcrash