[help] Delete all characters except the English alphabet

Advanced Renamer forum
#1 : 03/11-20 19:29
Mihai
Posts: 2
Hello,

I am struggling with what I think is a simple task. I searched the whole internet and did not find any way to do it.
I have a kind of playout list that loads media from a folder.
The problem is that when a filename has non-English characters, it refuses to load.
So I need to make a BAT with Advanced Renamer (nice feature, by the way).

By non-English characters I mean anything except these:
{[A-Z], [a-z], "-", "_", "+", " ", "!", "?", ",", ";"}

So only the English alphabet and some basic punctuation marks should remain.

In other words, I need to delete all characters except the ones in the set above.

If it cannot do this, it is really not that advanced.

Thanks.

#2 : 03/11-20 21:14
David Lee
Posts: 572
The answer should be obvious if you read the User Guide:
www.advancedrenamer.com/user_guide/regular_expresions

Replace: [^A-Za-z-_+ !?,;]
With: leave blank
Use regular expressions

However, be aware that "?" is not allowed in Windows filenames.

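If you need to keep decimal digits as well, you can use the Replace string [^\w+ !?,;-] instead. \w matches letters, digits and the underscore, and the hyphen goes last in the class so it is read as a literal "-" rather than as part of a range.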
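To try these character classes outside Advanced Renamer, here is a minimal Python sketch. It assumes Python's `re` treats these simple classes the same way Advanced Renamer's regex engine does, which is not confirmed here; `strip_chars`, `LETTERS_ONLY` and `LETTERS_AND_DIGITS` are hypothetical names. The hyphen is placed last in each class so it is always a literal "-", and `re.ASCII` keeps Python's `\w` limited to [A-Za-z0-9_].

```python
import re

# Hyphen placed last in each class so it is a literal "-", not a range.
# LETTERS_ONLY mirrors [^A-Za-z-_+ !?,;]: anything NOT listed gets deleted.
LETTERS_ONLY = re.compile(r"[^A-Za-z_+ !?,;-]")

# \w adds digits (and underscore); re.ASCII limits \w to [A-Za-z0-9_],
# because Python's default \w would also keep accented and non-Latin letters.
LETTERS_AND_DIGITS = re.compile(r"[^\w+ !?,;-]", re.ASCII)


def strip_chars(name: str, pattern: re.Pattern) -> str:
    """Replace every character matched by pattern with nothing."""
    return pattern.sub("", name)


sample = "Pies\u0103 nou\u0103 2020 \U0001F3B5 - final!"  # "Piesă nouă 2020 🎵 - final!"
print(repr(strip_chars(sample, LETTERS_ONLY)))        # 'Pies nou   - final!'
print(repr(strip_chars(sample, LETTERS_AND_DIGITS)))  # 'Pies nou 2020  - final!'
```

Note the leftover double and triple spaces where digits and the emoji were removed; that is what a later Trim step or an extra space-collapsing replace would clean up.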

#3 : 03/11-20 22:42
Mihai
Posts: 2
Reply to #2:

Yeah, you are right. I found a solution after I posted the thread.

I was a little mad because I thought I needed some crazy JavaScript for that, since I also had stupid emojis in the filenames.

I wrote something like this: [^AĂÂBCDEFGHIÎJKLMNOPQRSȘTȚUVWXYZaăâbcdefghiîjklmnopqrsștțuvwxyz _!?-]


But your solution is cleaner, so I modified my approach accordingly:
1. Replace the Romanian characters with List Replace: Ă=A, Ț=T ...
2. Replace: [^A-Za-z-_+ !,;]
3. Trim the text

That nailed it.
Thanks so much.
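The three steps above can be sketched in Python to check what a given filename turns into. This is only an illustration, not how Advanced Renamer implements List Replace or Trim: the diacritic table is an assumption (the post only shows "Ă=A, Ț=T ..."), step 2 uses the same class as above with the hyphen moved last, and Trim is approximated with `str.strip()`. `clean_name`, `ROMANIAN_TO_ASCII` and `DISALLOWED` are hypothetical names.

```python
import re

# HYPOTHETICAL step-1 table: the post only shows "Ă=A, Ț=T ...".
# Both comma-below (Ș, Ț) and legacy cedilla (Ş, Ţ) forms are mapped.
ROMANIAN_TO_ASCII = str.maketrans({
    "\u0102": "A", "\u0103": "a",  # Ă ă
    "\u00C2": "A", "\u00E2": "a",  # Â â
    "\u00CE": "I", "\u00EE": "i",  # Î î
    "\u0218": "S", "\u0219": "s",  # Ș ș (comma below)
    "\u015E": "S", "\u015F": "s",  # Ş ş (legacy cedilla)
    "\u021A": "T", "\u021B": "t",  # Ț ț (comma below)
    "\u0162": "T", "\u0163": "t",  # Ţ ţ (legacy cedilla)
})

# Step 2: delete everything except ASCII letters and _ + space ! , ; -
# ("?" is left out, as in the post, since Windows forbids it in filenames).
DISALLOWED = re.compile(r"[^A-Za-z_+ !,;-]")


def clean_name(name: str) -> str:
    """Apply the three steps in order to one filename (without extension)."""
    transliterated = name.translate(ROMANIAN_TO_ASCII)  # 1. List Replace
    stripped = DISALLOWED.sub("", transliterated)       # 2. Replace with regex
    return stripped.strip()                              # 3. Trim (approximated)


print(clean_name("\u021Aar\u0103 \u0219i \u015Etefan \U0001F3B5"))  # Tara si Stefan
```

Doing the transliteration first matters: if step 2 ran first, "ă" would simply be deleted instead of becoming "a".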

#4 : 04/11-20 10:13
David Lee
Posts: 572
Reply to #3:
Mihai

You may find the regex \p{L} useful: this meta-character matches any letter from any language.

So Replace: [^\p{L}_+ !,;-] will do the same as [^A-Za-z-_+ !,;] but also retain all your Romanian diacritic characters. (Put the hyphen last here, since a hyphen right after \p{L} can be rejected as an invalid range.)

Note: \p{x} matches any Unicode character in category x, and x=L is the Unicode letter category.
See www.regular-expressions.info/unicode.html
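Python's built-in `re` module does not support `\p{L}` (the third-party `regex` package does), so this minimal sketch emulates a class of letters plus `_ + space ! , ; -` by checking each character's Unicode general category with `unicodedata.category()`. Categories starting with "L" are the letter categories that `\p{L}` covers. `keep_letters` and `EXTRA_ALLOWED` are hypothetical names, and this is not how Advanced Renamer evaluates the pattern internally.

```python
import unicodedata

# Characters kept besides letters, mirroring [^\p{L}_+ !,;-]
EXTRA_ALLOWED = set("_+ !,;-")


def keep_letters(name: str) -> str:
    """Keep Unicode letters (general category L*) and EXTRA_ALLOWED; drop the rest."""
    return "".join(
        ch for ch in name
        if unicodedata.category(ch).startswith("L") or ch in EXTRA_ALLOWED
    )


print(repr(keep_letters("Pies\u0103 nou\u0103 2020 \U0001F3B5 - final!")))  # 'Piesă nouă   - final!'
```

Digits (category Nd) and the emoji (category So) are dropped, while "ă" (category Ll) survives, which is exactly the difference from the [A-Za-z] version.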

David


04/11-20 10:13 - edited 05/11-20 15:42