Removing non-alphanumeric characters from the start and end of all columns

Tags:

sql

mysql

I have a few tables with more than 20 columns each, and for every column I have to:

  1. Trim whitespace first.
  2. Remove all characters that are neither alphanumeric nor ')' from the end of the value.
  3. Remove all characters that are neither alphanumeric nor '(' from the start of the value.

If the characters had to be removed from the whole string I could simply use REPLACE, but in my case they only have to be removed from the first and last positions. At the moment I am using SUBSTRING, checking for special characters and replacing them with an empty string. This feels like manual cleaning, and I am sure it is not an elegant solution.

Is there a quick approach (a query) that would help me clean up the data?

Bujji asked Nov 22 '25 22:11



1 Answer

Does the database have to stay online while you do this? Faced with this problem, I would consider dumping the data out to files and processing them with perl, awk or some other tool suited to this kind of text processing.

If that's not possible, an alternative is to put the munging logic inside a stored FUNCTION that takes a VARCHAR and returns the cleaned-up string as a VARCHAR (NB: untested code, for explanation only):

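```sql
-- Run in the mysql client: DELIMITER lets the body contain semicolons.
DELIMITER //
CREATE FUNCTION cleanup(instr VARCHAR(255)) RETURNS VARCHAR(255)
DETERMINISTIC
BEGIN
    DECLARE outstr VARCHAR(255);
    SET outstr = TRIM(instr);
    -- Strip leading characters that are neither alphanumeric nor '('
    WHILE CHAR_LENGTH(outstr) > 0 AND NOT (outstr REGEXP '^[[:alnum:](]') DO
        SET outstr = SUBSTRING(outstr, 2);
    END WHILE;
    -- Strip trailing characters that are neither alphanumeric nor ')'
    WHILE CHAR_LENGTH(outstr) > 0 AND NOT (outstr REGEXP '[[:alnum:])]$') DO
        SET outstr = LEFT(outstr, CHAR_LENGTH(outstr) - 1);
    END WHILE;
    RETURN outstr;
END //
DELIMITER ;
```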
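Before touching any tables, it may be worth sanity-checking the function on a few literal strings. This is a sketch that assumes cleanup() has been created in the current database with the "leading '(' / trailing ')'" rules described in the question; the sample strings are made up for illustration.

```sql
-- Hypothetical sample values; the expected result is shown after each call.
SELECT cleanup('  --hello world!! ');  -- 'hello world'
SELECT cleanup('**(note)**');          -- '(note)'
SELECT cleanup(')x(');                 -- 'x' (a ')' is only kept at the end, a '(' only at the start)
SELECT cleanup('$$$');                 -- '' (every character is stripped)
SELECT cleanup(NULL);                  -- NULL
```

The `')x('` case shows the asymmetry of the rules: a parenthesis survives only on the side where it can legitimately close or open the value.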

Then you can write a query that reads the system catalog, i.e. information_schema.columns, and generates the required UPDATE statements. Something along the lines of (untested):

```sql
SELECT CONCAT('UPDATE `', table_name, '` SET `', column_name,
              '` = cleanup(`', column_name, '`);')
  FROM information_schema.columns
  WHERE table_schema = 'your-database' AND collation_name IS NOT NULL;
```

Save the output of that, check it, and run it.

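The collation_name filter restricts this to character-type columns, because numeric and temporal columns have a NULL collation_name. Be aware that ENUM and SET columns also have a collation, so they will be included too unless you exclude them, for example with an extra condition on data_type. Also, cleanup() takes and returns VARCHAR(255), so longer values (e.g. in TEXT columns) would be truncated or rejected, depending on the SQL mode; widen those types if you need to. Again, this is untested, but it should give you the general idea. You could even use GROUP_CONCAT to build a version of this that creates a single UPDATE statement per table rather than one per column, but that's getting a bit fancy.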
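The per-table GROUP_CONCAT variant could look roughly like this. It is an untested sketch that reuses the same 'your-database' placeholder and collation_name filter; the group_concat_max_len value is an arbitrary assumption, chosen only to be well above the 1024-byte default, which would otherwise truncate the generated column list for wide tables.

```sql
-- Raise the GROUP_CONCAT result limit for this session (default is 1024 bytes).
SET SESSION group_concat_max_len = 1000000;

-- One UPDATE per table, setting every character column to cleanup(column).
SELECT CONCAT('UPDATE `', table_name, '` SET ',
              GROUP_CONCAT(CONCAT('`', column_name, '` = cleanup(`', column_name, '`)')
                           ORDER BY ordinal_position SEPARATOR ', '),
              ';') AS update_stmt
  FROM information_schema.columns
  WHERE table_schema = 'your-database' AND collation_name IS NOT NULL
  GROUP BY table_name;
```

Each output row is one statement covering all character columns of one table, so the output can be saved, reviewed and run in the same way as the per-column version.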

Obviously you would take a backup of the database before running anything that was going to perform such wide-ranging updates...

RET answered Nov 24 '25 21:11
