How to remove Internet addresses - or entire word before and after character (or character set)

#1 : 22/08-22 06:28
I've looked through the forum, and can't find how to remove a whole word (or series of words) before a specific character.

We have a file system that spits out filenames, and appends the domain name of the source file.


Data.Reject - Code 01234ABC -
Data Reject - Code ABCxxx12345 -
Data Approve - Code Yz1.a -

Sometimes the filename also gets an additional sequential counter appended (but not always).


Data.Reject - Code 01234ABC - - 0001.doc
Data.Reject - Code 01234ABC - - 0002doc
Data Approve - Code Yz1.a - aaaz.xls

I need to just remove the domain name (I'll create separate rules for the root domains, so no need to incorporate all root domains - .com, .edu, .net, .au, .ca, etc...we don't have a lot of those, so it would be easier to just have a separate rule to search for the ".com" and then another rule to search for ".edu", etc.)

Target Filenames:

Data.Reject - Code 01234ABC.doc
Data Reject - Code ABCxxx12345.txt
Data Approve - Code Yz1.a.xls

Data.Reject - Code 01234ABC - 0001.doc
Data.Reject - Code 01234ABC - 0002doc
Data Approve - Code Yz1.a - aaaz.xls

I can figure out how to remove any trailing "-". Where I have failed is in searching for the root domain name (.com, .edu, etc.) and then deleting the ENTIRE word preceding the ".com" or ".edu". Additionally, if a subdomain is present, I don't know how to include it in the domain name to be removed. Since the domain name (and any subdomains) vary in length, how do I include those characters? Do I search for the whole word prior to ".com" until I hit a space, "-" or "_" and delete that (and how do I search for that whole word), or is there a different approach? Is there a more efficient way?
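To make the "whole word before .com" idea concrete, here is a minimal sketch in Python using a regular expression. It is not the app's own rule syntax, only an illustration of the pattern. The function name strip_domain, the ROOT_DOMAINS list, and the example.com style sample names are all assumptions made up for the example. The pattern treats space, "-" and "_" as word boundaries, as described above, so subdomains joined by dots are picked up as part of the same word.

```python
import re

# Hypothetical list of root domains to look for; extend as needed.
ROOT_DOMAINS = ("com", "edu", "net")

# Pattern, read left to right:
#   (?:\s*-\s*)?   an optional " - " separator in front of the domain
#   [^\s_-]+       a run of characters that are not space, "_" or "-"
#                  (this also swallows subdomains such as "files.example")
#   \.(?:com|...)  a dot followed by one of the root domains
#   \b             the root domain must end there (so ".common" is not hit)
DOMAIN_RE = re.compile(
    r"(?:\s*-\s*)?[^\s_-]+\.(?:" + "|".join(ROOT_DOMAINS) + r")\b",
    re.IGNORECASE,
)


def strip_domain(filename: str) -> str:
    """Remove the first domain-like word (and its leading separator)."""
    return DOMAIN_RE.sub("", filename, count=1)


# Made-up sample names; the real domains will differ.
print(strip_domain("Data.Reject - Code 01234ABC - files.example.com.doc"))
print(strip_domain("Data.Reject - Code 01234ABC - example.edu - 0001.doc"))
```

With these made-up inputs the calls print "Data.Reject - Code 01234ABC.doc" and "Data.Reject - Code 01234ABC - 0001.doc". One limitation of treating "-" as a boundary: a domain that itself contains a hyphen would only be partly removed.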

I'm not a programmer by trade, so if you can explain it in semi-layman's terms, that would be appreciated. But I also respect your time, and am not asking you to do all my work for me. ;) If you have a support document to point me to, instead of answering my question completely, I would be grateful for even that. (If you would like to go further, I won't complain.)

Bonus request: If we had a file:

Accept - - Code 12345ABC.xls

...I suppose that searching for "https://" and then deleting it and everything after it, until I come to a space, a "-" or a "_", would be easy. Maybe this could be a function that you might include in a later release: "Remove Domain Address" as a rule type.
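A rough sketch of that "https://" idea, again in Python rather than in the app's rule syntax. The function name strip_url and the sample string are hypothetical. It deletes "http://" or "https://" plus everything up to the next space, "-" or "_", and nothing else, so the separators around the address are left for a later cleanup rule.

```python
import re

# "http://" or "https://", then everything up to (but not including)
# the next whitespace, "_" or "-".
URL_RE = re.compile(r"https?://[^\s_-]*", re.IGNORECASE)


def strip_url(text: str) -> str:
    """Delete every URL-like run; surrounding separators are untouched."""
    return URL_RE.sub("", text)


# Made-up sample string; the real address will differ.
print(repr(strip_url("Accept - https://example.com - Code 12345ABC.xls")))
```

The result still contains the two "-" separators that surrounded the address, so a follow-up rule for doubled separators would be needed.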

I also commend you on such a versatile and useful app. I've proposed that we buy a few licenses for different departments because of how helpful this is.

Thank you very much, David (or Forum).

"You're My Honeybunch, SugarPlumb"

