> Problem with stripping text
Mike_nl
post Aug 17 2016, 12:22
Post #1


Member


Group: Full Members
Posts: 388
Joined: 16-February 08
From: SE-Asia
Member No.: 6480
Mp3tag Version: 2.83f



I am writing a program in VB.NET (personal project) and need your help with a regex (or another solution). I have been thinking about a solution for days but I can't solve it.

A line can look like any of these (and there may be more variants):

QUOTE
"ABCDEFGHIJK ~ AB Ab Cdefgh"
"ABCDEFGHIJK AB Abcdefhijklmnopq"
"ABCDEF-GHIJKLMNO Abcdefgh"
"AB.CDEFGHIJKL Mnop"
"ABCD/EFGHIJ A abcdef"
"ABCD EFGH I Abcd efgh
"ABCD-EFGHIJ A abcdef"
"AB.CDEFGHIJ A abcdef"
"ABC-DEFGHIJ a Abcdef"


But I only want the first part, so:

QUOTE
ABCDEFGHIJK
ABCDEFGHIJK
ABCDEF-GHIJKLMNO
AB.CDEFGHIJKL
ABCD/EFGHIJ
ABCD EFGH
ABCD-EFGHIJ
AB.CDEFGHIJ
ABC-DEFGHIJ


The only consistent thing about the format is that the text I want starts right after a " (Chr(34)). After that there are (unfortunately) no more common identifiers, only that the text I want is in ALL CAPS [A-Z] ;)

I can't seem to find a solution to this problem. The text file's formatting is just too messed up, and doing it manually would take me days and days (60,000 lines).

This post has been edited by Mike_nl: Aug 17 2016, 14:38


--------------------
Have fun while it lasts ;)

ohrenkino
post Aug 18 2016, 06:27
Post #2


Member


Group: Full Members
Posts: 9138
Joined: 9-December 09
From: Norddeutschland / Northern Germany
Member No.: 11458
Mp3tag Version: 2.84d



QUOTE (Mike_nl @ Aug 17 2016, 13:22) *
... No more common identifiers...

In the examples, IMHO the blank is the common separator.
So
$regexp('ABCDEFGHIJK ~ AB Ab Cdefgh"',(.*?) .*,$1)
returns
ABCDEFGHIJK
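
For readers trying this outside Mp3tag, the same "cut at the first blank" idea can be sketched in Python. This is only an illustration, not the poster's VB.NET code: `first_word` is a made-up helper name, and stripping the surrounding `"` characters first is an assumption about how the lines would be pre-processed.

```python
import re

def first_word(line):
    """Keep everything before the first blank, like $regexp(x,(.*?) .*,$1)."""
    # Assumption: surrounding whitespace and double quotes are not wanted.
    text = line.strip().strip('"')
    # Lazy group up to the first blank; the rest of the line is discarded.
    return re.sub(r"(.*?) .*", r"\1", text)

print(first_word('ABCDEFGHIJK ~ AB Ab Cdefgh"'))  # ABCDEFGHIJK
print(first_word('"ABCD EFGH I Abcd efgh'))       # ABCD
```

Because the lazy group stops at the very first blank, a wanted part that itself contains a blank comes back with only its first word.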


--------------------
42 - wie war die Frage / what was the question / quelle était la question
Mike_nl
post Aug 18 2016, 07:37
Post #3





Thanks for the Reply

But 1 thing.

That would fail if the text has a space in between. For example.

"ABCD EFGH I Abcd efgh

I would like to get ABCD EFGH as the result,

but the regex would only return ABCD.

And in "The Regex Coach" the (.*?) doesn't give any matches :(



Edit: I came up with this (please don't laugh, as I am ABSOLUTELY not a regex guy ;) )

[A-Z]*(\s)?(-)?(/)?[A-Z]*



But that already fails if the text has a Ü (umlaut) in it ;) and it will also fail on this text, because it grabs too much:

ABCD EFGH I (and I only want ABCD EFGH)
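
The umlaut problem comes from `[A-Z]` covering only the 26 ASCII capitals. The small Python sketch below shows the difference between that class and a Unicode-aware upper-case test. Both helper names are made up for illustration, and Python stands in for VB.NET here. In .NET the Unicode category `\p{Lu}` would likely be the closest counterpart, but that has not been tried against this data.

```python
import re

ASCII_CAPS = re.compile(r"[A-Z]+")

def leading_ascii_caps(text):
    """Leading run of ASCII capitals only, as [A-Z] would match it."""
    m = ASCII_CAPS.match(text)
    return m.group(0) if m else ""

def leading_unicode_caps(text):
    """Leading run of upper-case letters, using the Unicode-aware str.isupper()."""
    kept = []
    for ch in text:
        if not ch.isupper():
            break
        kept.append(ch)
    return "".join(kept)

print(leading_ascii_caps("EFÜGH"))    # EF
print(leading_unicode_caps("EFÜGH"))  # EFÜGH
```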

This post has been edited by Mike_nl: Aug 18 2016, 07:51


ohrenkino
post Aug 18 2016, 08:08
Post #4





QUOTE (Mike_nl @ Aug 18 2016, 08:37) *
Thanks for the Reply

But 1 thing.

That would fail if the text has a space in between. ...

That is true, but in your example none of the result examples has a string in it with a blank ... so I can only be as good as the example is.

Perhaps you have to go through the list twice:
first cut all the bits in lower case, then deal with ones in mixed case.

This regexp leaves whatever starts with a capital letter:
$regexp('ABCD EFÜGH I Abcd efgh',(?-i)(\u.*) .*,$1)

and this one
$regexp('ABCD EFÜGH I Abcd efgh',(?-i)(\u.*) \u\l.*,$1)

strips off all strings that start with a capital and continue in lower case.
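
The thread does not arrive at one finished pattern, so the following is only a possible sketch, written in Python rather than VB.NET. It walks the blank-separated tokens from the left and keeps them while they are fully upper case. The `min_len=3` cut-off is purely an assumption: it happens to reproduce the nine wanted results from post #1 (dropping stray tokens such as `AB`, `A`, and `I`), but nothing in the thread confirms it holds for the full 60,000 lines. `caps_prefix` is a made-up name.

```python
def caps_prefix(line, min_len=3):
    """Return the leading run of upper-case tokens from one input line."""
    # Assumption: surrounding whitespace and double quotes are not wanted.
    text = line.strip().strip('"')
    kept = []
    for token in text.split():
        # str.isupper() is Unicode-aware (so Ü counts) and ignores uncased
        # characters such as '-', '.', and '/'; a token like '~' fails it.
        # min_len is a guessed heuristic, not something from the thread.
        if token.isupper() and len(token) >= min_len:
            kept.append(token)
        else:
            break
    return " ".join(kept)

print(caps_prefix('"ABCDEFGHIJK ~ AB Ab Cdefgh"'))  # ABCDEFGHIJK
print(caps_prefix('"ABCD EFGH I Abcd efgh'))        # ABCD EFGH
print(caps_prefix('ABCD EFÜGH I Abcd efgh'))        # ABCD EFÜGH
```

Lines whose wanted part ends in a genuine one- or two-letter capital token would be cut short by this heuristic, so the output would still need a spot check.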


Mike_nl
post Aug 18 2016, 08:12
Post #5





QUOTE (ohrenkino @ Aug 18 2016, 14:08) *
That is true, but in your example none of the result examples has a string in it with a blank ... so I can only be as good as the example is.

<snip>


Are you sure ;) ???

Example line no. 6:

QUOTE
"ABCD EFGH I Abcd efgh


And I want this as the result:

QUOTE
ABCD EFGH


Thanks again, will try those Regexes. Great help !!

Good one about stripping first the lower cases and then the mixed cases !!

Edit: I most certainly could do it manually, but this list sometimes changes once or twice a week and then doesn't change for a long while, so I'd rather do it "semi-auto".

This post has been edited by Mike_nl: Aug 18 2016, 08:15


ohrenkino
post Aug 18 2016, 08:15
Post #6





QUOTE (Mike_nl @ Aug 18 2016, 09:12) *
Are you sure ;) ???
...

Not any more ...

