How can I keep a data when ON DUPLICATE KEY executes?

Question

Here is the scenario: I'm building a mechanism that analyzes text comments. For example, I want to find the most used words in a set of comments. Here is my code:

function cleanWord( &$word ){
    $word = trim($word, "'\".!<>{}()-/\*&^%$#@+~ ");
}

// list of comments
$arr_str =  [
            "  this!! is     the first &test message./",
            "*Second ^message this (is) ",
            "'another\ **message*** !\"}& it is. also the favorite one (message)."
            ];

// To join array's items      
$str = implode(" ", $arr_str);

// To chop the string based on the space
$words = explode(" ",$str);

// To remove redundant character(s)
array_walk($words, 'cleanWord');

// To remove empty array elements
$words = array_filter($words);

print_r($words);

/* Output:
Array
(
    [2] => this
    [3] => is
    [8] => the
    [9] => first
    [10] => test
    [11] => message
    [12] => Second
    [13] => message
    [14] => this
    [15] => is
    [17] => another
    [18] => message
    [20] => it
    [21] => is
    [22] => also
    [23] => the
    [24] => favorite
    [25] => one
    [26] => message
)
*/

As the output shows, $words contains all the words from those comments. I also have a database table into which I insert the words like this:

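// There is a unique index on the "word" column.
// Assumes $db supports prepare()/execute() with bound parameters (e.g. PDO).
$stmt = $db->prepare("INSERT INTO words (word)
                           VALUES (?)
                      ON DUPLICATE KEY UPDATE used_num = used_num + 1");
foreach( $words as $word ){
    $stmt->execute([$word]);
}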

/* Output:
// words
+----+----------+----------+
| id |   word   | used_num |
+----+----------+----------+
| 1  | this     | 2        |
| 2  | is       | 3        |
| 3  | the      | 2        |
| 4  | first    | 1        |
| 5  | test     | 1        |
| 6  | message  | 4        |
| .  | .        | .        |
| .  | .        | .        |
| .  | .        | .        |
+----+----------+----------+
*/

Then I select the most used words like this:

SELECT * FROM words
ORDER BY used_num DESC
LIMIT $limit

What's my question?! In reality, that array looks like this:

$arr_str =  [
               ["  this!! is     the first &test message./", "Jack", "1488905152"],
               ["*Second ^message this (is) ", "Peter", "1488901178"],
               ["'another\ **message*** !\"}& it is. also the favorite one (message).", "John", "1488895116"]
            ];

As you can see, each comment also has an author and a publication time. Now I want:

- to build a filter based on that Unix timestamp (for example, getting the most used words between times x and y);
- to build a list of authors for each word (for example, the word "message" is used 4 times in those comments, and I want to access the list of those comments' authors, i.e. [Jack, Peter, John]).

Do you have any suggestions for an algorithm to implement these?

Xorifelse · Accepted Answer

You can use regex to clean the words:

$comments = [
  "  this!! is     the first &test message./",
  "*Second ^message this (is) ",
  "'another\ **message*** !\"}& it is. also the favorite one (message)."
];

$exploded = [];
foreach($comments as $str){
  preg_match_all('/([a-zA-Z]+)/', $str, $matches);
  $exploded[] = $matches[0];
}

print_r($exploded);
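
To keep the author and timestamp attached to each word, the same regex can be run over the three-element arrays from the question. The following is only a minimal sketch: `tokenizeComments` and the row keys `word`, `author` and `time` are illustrative names that do not appear in the original code, and lowercasing the words is an added assumption so that "Second" and "second" count as one word.

```php
<?php
// Hypothetical helper: turn [text, author, unix time] triples into one row per word.
function tokenizeComments(array $comments): array
{
    $rows = [];
    foreach ($comments as [$text, $author, $time]) {
        // Same pattern as above: runs of ASCII letters are treated as words.
        preg_match_all('/[a-zA-Z]+/', $text, $matches);
        foreach ($matches[0] as $word) {
            $rows[] = [
                'word'   => strtolower($word), // assumption: case-insensitive counting
                'author' => $author,
                'time'   => (int) $time,
            ];
        }
    }
    return $rows;
}

$comments = [
    ["  this!! is     the first &test message./", "Jack", "1488905152"],
    ["*Second ^message this (is) ", "Peter", "1488901178"],
];

$rows = tokenizeComments($comments);
print_r($rows[0]); // first row: word "this", author "Jack", time 1488905152
```

Each row now carries everything needed to answer "who used this word, and when", whether the rows end up in a table or are aggregated in PHP.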

However, if you want to attach data to each word, you'll have to add a table first. Your table already has a primary key for each word, which is good because we don't want to store redundant data.

Now for a second table (worddata):

+----+----------+-----------+
| id |  wordid  | commentid |
+----+----------+-----------+
| 1  | 1        | 2         |
+----+----------+-----------+
          |          \-> refers to the primary key of the comments table
          |
          -> refers to 'this'
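
One possible definition of that table is sketched below, wrapped in PHP so it can be sent through the same database handle. The column types and index names are assumptions, and `worddataSchema` is a hypothetical helper; only the column names `id`, `wordid` and `commentid` come from the diagram above.

```php
<?php
// Hypothetical helper returning MySQL DDL for the link table sketched above.
// Column types and index names are assumptions; adjust them to your schema.
function worddataSchema(): string
{
    return <<<SQL
CREATE TABLE IF NOT EXISTS worddata (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    wordid    INT UNSIGNED NOT NULL,  -- refers to words.id
    commentid INT UNSIGNED NOT NULL,  -- refers to comments.id
    KEY idx_wordid (wordid),
    KEY idx_commentid (commentid)
)
SQL;
}

echo worddataSchema(), PHP_EOL;
```

With a PDO connection, the statement could be run as `$db->exec(worddataSchema());`. The two secondary indexes are there because the join further down looks rows up by both columns.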

Now I'm presuming you have a table where all comments are stored (I'll call it comments), each linked to a posting time and an author id.

In essence, fill this table like so:

SELECT comments_id, comments_text FROM comments

Filter your words and insert them into the table:

INSERT INTO worddata (wordid, commentid) VALUES (?, ?)

I'd recommend using a temporary table for this, because each word in each comment gets its own row, which can add up to a lot of data. In the join query below, wd.wordid = 1 refers to the word 'this' according to your words table.
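
The fill step could look roughly like the sketch below. It assumes `$db` is a PDO connection to MySQL, that `words.word` has a unique index and that `used_num` defaults to 1; `extractWords` and `indexComment` are hypothetical names. The `id = LAST_INSERT_ID(id)` assignment is the idiom MySQL documents for getting the existing row's id back when the duplicate-key branch runs, which is what lets each link row point at the right word.

```php
<?php
// Split a comment into lowercase words (same regex as above).
function extractWords(string $text): array
{
    preg_match_all('/[a-zA-Z]+/', $text, $matches);
    return array_map('strtolower', $matches[0]);
}

// Sketch only: assumes a PDO connection to MySQL and the words/worddata tables.
// It is defined but not called here, so this file runs without a database.
function indexComment(PDO $db, int $commentId, string $text): int
{
    // LAST_INSERT_ID(id) makes lastInsertId() return the existing row's id
    // when the word is already present.
    $upsert = $db->prepare(
        'INSERT INTO words (word) VALUES (?)
         ON DUPLICATE KEY UPDATE used_num = used_num + 1, id = LAST_INSERT_ID(id)'
    );
    $link = $db->prepare('INSERT INTO worddata (wordid, commentid) VALUES (?, ?)');

    $words = extractWords($text);
    foreach ($words as $word) {
        $upsert->execute([$word]);
        $link->execute([(int) $db->lastInsertId(), $commentId]);
    }
    return count($words); // number of link rows written
}

print_r(extractWords("*Second ^message this (is) "));
```

A call such as `indexComment($db, 2, $commentText)` would then write one worddata row per word occurrence in comment 2.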

If the date range is already known, you can select only the comments in that range and insert only the words from those comments.
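
If the comments are already in PHP, as in the question's array, the same pre-filtering can be sketched without SQL. `commentsBetween` is a hypothetical helper, and it assumes the timestamp sits at index 2 and the author at index 1, as in the question's layout.

```php
<?php
// Hypothetical helper: keep only comments whose unix timestamp lies in [$from, $to].
function commentsBetween(array $comments, int $from, int $to): array
{
    return array_values(array_filter(
        $comments,
        function (array $c) use ($from, $to): bool {
            $time = (int) $c[2]; // index 2 holds the timestamp in the question's layout
            return $time >= $from && $time <= $to;
        }
    ));
}

$comments = [
    ["  this!! is     the first &test message./", "Jack", "1488905152"],
    ["*Second ^message this (is) ", "Peter", "1488901178"],
    ["'another\ **message*** !\"}& it is. also the favorite one (message).", "John", "1488895116"],
];

$recent = commentsBetween($comments, 1488900000, 1488910000);
echo implode(', ', array_column($recent, 1)), PHP_EOL; // prints "Jack, Peter"
```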

Now you can join table data:

SELECT c.id, c.userid, c.created
FROM `comments` AS c
  JOIN `worddata` AS wd ON wd.commentid = c.id
WHERE wd.wordid = 1

This example should return the ids of all comments that contain the word 'this'. To filter by author, add c.userid = # to the WHERE clause. To select by date, add a condition on c.created: for example, c.created > NOW() - INTERVAL 1 HOUR returns the comments from the last hour if created is a DATETIME column (for a column holding Unix timestamps, compare against UNIX_TIMESTAMP() - 3600 instead).
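
Both goals from the question (most used words in a time window, plus the authors of each word) can be combined in one grouped query. The sketch below only builds the SQL and its parameters; `topWordsQuery` is a hypothetical helper, and it assumes `comments.created` stores a Unix timestamp and that the tables are laid out as described above. Note that it counts worddata rows instead of reading `used_num`, because a stored running total cannot be restricted to a time range.

```php
<?php
// Hypothetical helper: build SQL and bound parameters for
// "most used words between two unix timestamps, with their authors".
function topWordsQuery(int $from, int $to, int $limit = 10): array
{
    // The limit is inlined as a validated integer rather than bound,
    // since bound LIMIT values can be quoted as strings by emulated prepares.
    $limit = max(1, $limit);
    $sql = "SELECT w.word,
       COUNT(*) AS used_num,
       GROUP_CONCAT(DISTINCT c.userid) AS authors
FROM worddata AS wd
  JOIN words AS w ON w.id = wd.wordid
  JOIN comments AS c ON c.id = wd.commentid
WHERE c.created BETWEEN :start AND :end
GROUP BY w.id, w.word
ORDER BY used_num DESC
LIMIT $limit";

    return [$sql, [':start' => $from, ':end' => $to]];
}

[$sql, $params] = topWordsQuery(1488895116, 1488905152, 5);
echo $sql, PHP_EOL;
```

With PDO this could be run as `$stmt = $db->prepare($sql); $stmt->execute($params);`. The `authors` column comes back as a comma-separated string of user ids, which `explode(',', ...)` turns into a list.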

Of course, you can select more data if needed, but again, this is more of a join example than copy-pasteable code.

Tags: algorithm, php, mysql

Shafizadeh
