> Extract multiple IDs from different URLs with regexp
Knabbakeks
post Sep 23 2010, 14:37
Post #1


Member


Group: Full Members
Posts: 12
Joined: 23-September 10
Member No.: 12975
Mp3tag Version: 2.50



Hi,

I'm new here, and first I want to say thanks to the developer for this great program and to the forum members for the awesome support.

Now here is my problem:

Over the years I have stored two kinds of URLs, among others, in the %www% field of my MP3 collection.

For example, it looks like this:

CODE
http://www.mute.com/ http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:0cfqxq80ldde http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:3ifoxzysldde http://coverparadise.to/index.php?Module=ViewEntry&ID=7735 http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:gcfqxq9jld0e http://coverparadise.to/index.php?Module=ViewEntry&ID=69885 http://www.emimusic.de


My goal is to extract all IDs from the URLs into user-defined fields with multiple values, like this:

CODE
Field: ALLMUSICID
Value: 0cfqxq80ldde\\3ifoxzysldde\\gcfqxq9jld0e

Field: COVERPARADISEID
Value: 7735\\69885


My first attempt at solving this problem used the following regular expressions combined with functions:

CODE
$replace($trim($regexp($regexp(%www%,'http://www\.allmusic\.com/cg/amg\.dll\?p=amg&sql=10:\w+\s?',),'http://coverparadise\.to/index\.php\?Module=ViewEntry&ID=',)), ,\\)


This is for the field COVERPARADISEID. It first removes all instances of the allmusic URL, including the ID. Then it deletes the URL part of every coverparadise URL, leaving the ID. After that it trims surrounding spaces, and finally all remaining spaces are replaced by \\.

The analogous expression for the field ALLMUSICID:

CODE
$replace($trim($regexp($regexp(%www%,'http://coverparadise\.to/index\.php\?Module=ViewEntry&ID=\d+\s?',),'http://www\.allmusic\.com/cg/amg\.dll\?p=amg&sql=10:',)), ,\\)


The disadvantage of this method is that it only works correctly if only allmusic and coverparadise URLs are stored in the %www% field. If other types of URLs are present, they end up in the user-defined fields:

CODE
Field: ALLMUSICID
Value: http://www.mute.com/\\0cfqxq80ldde\\3ifoxzysldde\\gcfqxq9jld0e\\http://www.emimusic.de

Field: COVERPARADISEID
Value: http://www.mute.com/\\7735\\69885\\http://www.emimusic.de
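
For readers who want to reproduce this effect outside Mp3tag, here is a minimal Python sketch of the two-step stripping approach. It assumes that Python's `re.sub` and `str.strip` behave closely enough to Mp3tag's `$regexp` and `$trim` for this input; the names `WWW`, `ALLMUSIC`, `COVERPARADISE` and `first_attempt` are made up for illustration.

```python
import re

# Sample %www% content from the post above, URLs separated by single spaces.
WWW = " ".join([
    "http://www.mute.com/",
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:0cfqxq80ldde",
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:3ifoxzysldde",
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=7735",
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:gcfqxq9jld0e",
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=69885",
    "http://www.emimusic.de",
])

# The two URL prefixes, escaped as in the original expressions.
ALLMUSIC = r"http://www\.allmusic\.com/cg/amg\.dll\?p=amg&sql=10:"
COVERPARADISE = r"http://coverparadise\.to/index\.php\?Module=ViewEntry&ID="


def first_attempt(www, other_url_with_id, wanted_prefix):
    """Emulate the nested $regexp/$trim/$replace chain of the first attempt."""
    # Step 1: drop every URL of the other kind, including its ID.
    without_other = re.sub(other_url_with_id, "", www)
    # Step 2: drop only the prefix of the wanted kind, keeping the ID.
    ids_only = re.sub(wanted_prefix, "", without_other)
    # Step 3: trim, then turn the remaining spaces into two backslashes.
    return ids_only.strip().replace(" ", "\\\\")


coverparadise_id = first_attempt(WWW, ALLMUSIC + r"\w+\s?", COVERPARADISE)
allmusic_id = first_attempt(WWW, COVERPARADISE + r"\d+\s?", ALLMUSIC)
print(coverparadise_id)
print(allmusic_id)
```

Because nothing in the chain removes URLs of a third kind, both results still carry the mute.com and emimusic.de entries.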


A better solution would be an expression that matches every instance of one kind of URL regardless of its position and returns the IDs separated by spaces or \\. So I tried again and found this one for the allmusic field:

CODE
$regexp(%www%,'((http://www\.allmusic\.com/cg/amg\.dll\?p=amg&sql=10:)(\w+)\s?)+\s?.*','$3')


This expression matches only the ID of the first instance of the address and ignores the following ones.

CODE
Field: ALLMUSICID
Value: 0cfqxq80ldde


If someone could help me figure out how to modify the expression so that it matches all IDs from allmusic (or coverparadise), that would be great.

Kind regards

Knabbakeks


--------------------
Tomorrow the sun will shine!
DazBYorks
post Sep 24 2010, 00:10
Post #2


Member


Group: Full Members
Posts: 47
Joined: 31-October 08
From: Yorkshire, U.K.
Member No.: 7818
Mp3tag Version: 2.55a



Hi,

I haven't played with Mp3tag's regex incarnation, and I assume you may be storing more than one URL in a field:

CODE
http:.*(&sql=10:|&ID=)(\w+)


You could, of course, shorten the expression, but it is more readable to me.
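
As a rough illustration of how this pattern behaves on the sample data, here is a small Python sketch. It is an assumption that Python's `re` module treats the greedy `.*` the same way Mp3tag's regex engine does, and `re.findall` is a Python feature that is not claimed to exist in Mp3tag; `WWW`, `last_only` and `all_ids` are hypothetical names.

```python
import re

# Sample %www% content from post #1, URLs separated by single spaces.
WWW = " ".join([
    "http://www.mute.com/",
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:0cfqxq80ldde",
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:3ifoxzysldde",
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=7735",
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:gcfqxq9jld0e",
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=69885",
    "http://www.emimusic.de",
])

# The pattern as posted: the greedy ".*" runs to the end of the field and
# backtracks, so a single match only reaches the last ID in the field.
last_only = re.search(r"http:.*(&sql=10:|&ID=)(\w+)", WWW).group(2)

# Without the leading "http:.*", every occurrence can be collected in order.
all_ids = re.findall(r"(?:&sql=10:|&ID=)(\w+)", WWW)

print(last_only)
print(all_ids)
```

In this sketch a single match yields one ID only, while collecting all matches returns the IDs of both sites mixed together, so they would still have to be separated per site.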



Daz
DetlevD
post Sep 24 2010, 15:21
Post #3


Member


Group: Full Members
Posts: 4129
Joined: 26-May 06
From: Wuppertal, Germany, Planet Earth
Member No.: 3194
Mp3tag Version: 2.54



QUOTE (Knabbakeks @ Sep 23 2010, 15:37) *
... My goal is to extract all IDs from the URLs into user-defined fields with multiple values, like this:
CODE
Field: ALLMUSICID
Value: 0cfqxq80ldde\\3ifoxzysldde\\gcfqxq9jld0e
Field: COVERPARADISEID
Value: 7735\\69885

...
This is for the field COVERPARADISEID. It first removes all instances of the allmusic URL, including the ID. Then it deletes the URL part of every coverparadise URL, leaving the ID. After that it trims surrounding spaces, and finally all remaining spaces are replaced by \\.
...

So far there are some questions.

1.
What happens with such a tag-field ...
Field: COVERPARADISEID
Value: 7735\\69885
... when the file has been saved?
Is this what you want?

2.
Do you know whether it is possible to create a tag-field by using the text content from another tag-field?
Example:
There is a tag-field ...
FIELD1="coverparadise"
... which can be modified to ...
FIELD1=$upper(%FIELD1%)'ID'
... giving ...
FIELD1=COVERPARADISEID
There is a tag-field ...
FIELD2="7735".

The goal is to create a new tag-field this way ...
Field: %FIELD1%
Value: %FIELD2%


If both questions can be answered satisfactorily, then there might be a way to automate your request.
Otherwise a sort of lookup table (maintained manually) has to be built in order to provide a relation between URL and tag-field name. Once such a lookup is in place, it should be possible to write multiple values into the corresponding ID tag-field.

But all this needs a full understanding and tricky usage of the Mp3tag features.
It is a challenge mainly because there is no way to loop over an item list within a tag-field, and other item-related string functions are missing as well.

I really suggest that you export your WWW field content into a text file, let the "dictionary work" (multiple values to one key) be done by another "full-blown programming language" (or simply use a text editor, ideally one that supports macros), and import the result back into the files.
But even this approach requires manually "hard coding" the receiving tag-field names (like "COVERPARADISEID") into a format string when importing from "Textfile to Tag", and the import process needs human interaction too.
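
To make the lookup-table idea concrete, here is a minimal Python sketch of the "dictionary work" on a single WWW value. The table `LOOKUP`, the function `split_www` and the variable names are hypothetical, the receiving field names are hard-coded by hand as described above, and the export and import steps themselves are deliberately left out because their file format is not specified in this thread.

```python
# Hand-maintained lookup table: URL prefix -> receiving tag-field name.
LOOKUP = {
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:": "ALLMUSICID",
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=": "COVERPARADISEID",
}


def split_www(www):
    """Return (IDs per field, URLs that matched no prefix in the table)."""
    fields = {name: [] for name in LOOKUP.values()}
    rest = []
    for url in www.split():
        for prefix, name in LOOKUP.items():
            if url.startswith(prefix):
                # Everything after the known prefix is taken as the ID.
                fields[name].append(url[len(prefix):])
                break
        else:
            # No prefix matched: the URL stays in the WWW field.
            rest.append(url)
    return fields, rest


# Sample WWW content from post #1, URLs separated by single spaces.
WWW = " ".join([
    "http://www.mute.com/",
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:0cfqxq80ldde",
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:3ifoxzysldde",
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=7735",
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:gcfqxq9jld0e",
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=69885",
    "http://www.emimusic.de",
])

fields, rest = split_www(WWW)
print(fields)
print(rest)
```

The second return value is the shortened WWW content, so the same pass could also serve the clean-up of that field.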

DD.20100924.1621.CEST


--------------------
* Beyond that, don't ask, when you don't know what to do with the answer. *
♥ home is where the heart is ♥
Knabbakeks
post Sep 26 2010, 17:43
Post #4


Member


Group: Full Members
Posts: 12
Joined: 23-September 10
Member No.: 12975
Mp3tag Version: 2.50



Thanks for your attention and reply!

QUOTE (DetlevD @ Sep 24 2010, 16:21) *
1.
What happens with such a tag-field ...
Field: COVERPARADISEID
Value: 7735\\69885
... when the file has been saved?
Is this what you want?

The field becomes a tag-field with multiple values. This is exactly what I want. First, I aim to clean up and shorten the WWW field. Second, I intend to use this for tools that open all IDs in the browser with one click. I changed the field names ALLMUSICID to ALLMUSIC_ID and COVERPARADISEID to COVERPARADISE_ID.

This is the tool for Coverparadise:
CODE
$if(%coverparadise_id%,"$replace($trim($replace( $meta_sep(coverparadise_id, ), , http://coverparadise.to/?Module=ViewEntry&ID=)), ," ")",http://coverparadise.to/index.php?Module=ExtendedSearch&SearchString=$replace($if2(%band%,%artist%) $regexp(%album%,'\s*(\(|\[|\{).+?(\)|\]|\})',), ,+,&,%%26))

And this is the one for Allmusic:
CODE
$if(%allmusic_id%,"$replace($trim($replace( $meta_sep(allmusic_id, ), , http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:)), ," ")",http://www.allmusic.com/cg/amg.dll?p=amg&opt1=2&sql=$replace($regexp($regexp(%album%,'\s*(\(|\[|\{).+?(\)|\]|\})',),'^(The|A|An)\s(.*)$',$2), ,+,&,%%26))

I already use this one for Amazon:
CODE
$if(%asin%,"$replace($trim($replace( $meta_sep(asin, ), , http://www.amazon.$if($neql($regexp(%country%,(?i)germany,),%country%),de,com)/exec/obidos/ASIN/)), ," ")",http://www.amazon.$if($neql($regexp(%country%,(?i)germany,),%country%),de,com)/gp/search?ie=UTF8&index=music&keywords=$replace($if2(%band%,%artist%) \"$regexp(%album%,'\s*(\(|\[|\{).+?(\)|\]|\})',), ,+,&,%%26)\")


QUOTE (DetlevD @ Sep 24 2010, 16:21) *
2.
Do you know whether it is possible to create a tag-field by using the text content from another tag-field?
Example:
There is a tag-field ...
FIELD1="coverparadise"
... which can be modified to ...
FIELD1=$upper(%FIELD1%)'ID'
... giving ...
FIELD1=COVERPARADISEID
There is a tag-field ...
FIELD2="7735".

I have figured this out. It's possible with "Format Values" ("Tag-Felder formatieren" in German). But I don't know why or how this is required for automation.

QUOTE (DetlevD @ Sep 24 2010, 16:21) *
The goal is to create a new tag-field this way ...
Field: %FIELD1%
Value: %FIELD2%

I guess this is also required for full automation. I don't think it is really necessary for my needs. I can manually filter the relevant files and then apply the expression in a custom column to check the output in a preview first. After that I can write an action to do the work.

QUOTE (DetlevD @ Sep 24 2010, 16:21) *
I really suggest that you export your WWW field content into a text file, let the "dictionary work" (multiple values to one key) be done by another "full-blown programming language" (or simply use a text editor, ideally one that supports macros), and import the result back into the files.
But even this approach requires manually "hard coding" the receiving tag-field names (like "COVERPARADISEID") into a format string when importing from "Textfile to Tag", and the import process needs human interaction too.

This is an alternative solution to the problem, but I think it is too extensive. Why use a third-party tool if Mp3tag can do the work? In the meantime I have figured out an expression that does exactly what I want.

QUOTE (DazBYorks @ Sep 24 2010, 01:10) *
You could, of course, shorten the expression, but it is more readable to me.

Yes, I've shortened it a little, but it's still a bit long. Here it is:

Allmusic_ID:
CODE
$replace($trim($regexp($regexp(%www%,\r\n, ),'(?<=amg&sql=10:)(\w+\s?)\s*|.+?\s?',$1)), ,\\)

Coverparadise_ID:
CODE
$replace($trim($regexp($regexp(%www%,\r\n, ),'(?<=ViewEntry&ID=)(\d+\s?)\s*|.+?\s?',$1)), ,\\)


The first regexp changes line breaks to spaces, and the second extracts the IDs from the URLs. After trimming and replacing spaces with \\, the values can be written to the desired fields Allmusic_ID and Coverparadise_ID with a custom column or an action.
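
The trick in these two expressions is the alternation: at any position either the lookbehind branch captures an ID, or the fallback branch consumes one character of unwanted text, and only the capture is written back. Here is a minimal Python sketch of that idea, assuming Python's `re` module is close enough to Mp3tag's regex engine for this pattern; `extract_ids` and `WWW` are hypothetical names.

```python
import re


def extract_ids(www, prefix, id_pattern):
    """Keep only the IDs that directly follow `prefix`, joined by \\\\."""
    # Counterpart of the inner $regexp: line breaks become spaces.
    one_line = www.replace("\r\n", " ")
    # Branch 1: an ID right after the prefix (captured as group 1).
    # Branch 2: any other single character plus an optional space.
    pattern = "(?<=" + re.escape(prefix) + ")(" + id_pattern + r"\s?)\s*|.+?\s?"
    # Write back group 1 where it matched and nothing for branch 2.
    kept = re.sub(pattern, lambda m: m.group(1) or "", one_line)
    # Counterpart of $trim and the outer $replace.
    return kept.strip().replace(" ", "\\\\")


# Sample WWW content from post #1, URLs separated by single spaces.
WWW = " ".join([
    "http://www.mute.com/",
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:0cfqxq80ldde",
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:3ifoxzysldde",
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=7735",
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:gcfqxq9jld0e",
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=69885",
    "http://www.emimusic.de",
])

allmusic_id = extract_ids(WWW, "amg&sql=10:", r"\w+")
coverparadise_id = extract_ids(WWW, "ViewEntry&ID=", r"\d+")
print(allmusic_id)
print(coverparadise_id)
```

Unlike the first attempt, URLs of any other kind are consumed by the fallback branch and therefore no longer leak into the result.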

Surely there is room for improvement. It is certainly possible to find a single expression that does the job without the trim function. It might also be possible to generate the field names automatically from the URLs. Furthermore, this could be combined with an action that automatically removes the related URLs from the WWW field. I'm still doing this manually. I know it's not perfect, but it works for all the tags I want to change.

Greetings!

Knabbakeks

This post has been edited by Knabbakeks: Sep 26 2010, 18:10


--------------------
Tomorrow the sun will shine!
DetlevD
post Sep 27 2010, 05:34
Post #5


Member


Group: Full Members
Posts: 4129
Joined: 26-May 06
From: Wuppertal, Germany, Planet Earth
Member No.: 3194
Mp3tag Version: 2.54



QUOTE (Knabbakeks @ Sep 26 2010, 18:43) *
Thanks for your attention and reply!
The field becomes a tag-field with multiple values. This is exactly what I want. ...

OK, I was not quite sure whether you were aware of the meaning of two backslashes as a multi-value delimiter surrogate.

QUOTE (Knabbakeks @ Sep 26 2010, 18:43) *
... I guess this is also required for full automation. I don't think it is really necessary for my needs. I can manually filter the relevant files and then apply the expression in a custom column to check the output in a preview first. After that I can write an action to do the work. ...

Ok.

QUOTE (Knabbakeks @ Sep 26 2010, 18:43) *
... This is an alternative solution to the problem, but I think it is too extensive. Why use a third-party tool if Mp3tag can do the work? In the meantime I have figured out an expression that does exactly what I want. ...

Mp3tag cannot fully automate the process you need. You still have to do the main work with your own hands and eyes.

QUOTE (Knabbakeks @ Sep 26 2010, 18:43) *
... It might also be possible to generate the field names automatically from the URLs. ...

I think it is not possible to create a new tag-field whose name comes from a data value in another tag-field.
That is the caveat which breaks a possible automation process.

In my study of your problem I got these intermediate results.
Input field with items ordered by line (blanks replaced with newlines):
CODE
http://coverparadise.to/index.php?Module=ViewEntry&ID=69885
http://coverparadise.to/index.php?Module=ViewEntry&ID=7735
http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:0cfqxq80ldde
http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:3ifoxzysldde
http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:gcfqxq9jld0e
http://www.emimusic.de
http://www.mute.com/


^http://(?:www\.)?(.+?)\..+\?.+$
CODE
coverparadise
coverparadise
allmusic
allmusic
allmusic



^?(?:\?)(?:.+[=:])(.+)$
CODE
69885
7735
0cfqxq80ldde
3ifoxzysldde
gcfqxq9jld0e


That is the pure data needed in the process: the tag-field names and the values. Now the challenge is to map the n:m relation of multiple values onto unique field names.
But I could not find a tool in the Mp3tag scripting language to handle this task.
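
Outside Mp3tag, this grouping step is a few lines in a general-purpose language. The Python sketch below derives the field name with the first expression above and groups the values per name. Note the assumptions: `VALUE_RE` is a simplified adaptation of the second expression (not the exact pattern from this post), the `upper() + "ID"` naming follows the FIELD1 example from the earlier post, and `group_ids` and `WWW` are hypothetical names.

```python
import re
from collections import defaultdict

# First expression from above: the site name between "http://(www.)" and the
# next dot, only for URLs that carry a query string.
NAME_RE = re.compile(r"^http://(?:www\.)?(.+?)\..+\?.+$")
# Simplified adaptation of the second expression: the text after the last
# "=" or ":" of the query string.
VALUE_RE = re.compile(r"\?.+[=:](.+)$")


def group_ids(www):
    """Map derived field names to their multi-value strings."""
    fields = defaultdict(list)
    for url in www.split():
        name = NAME_RE.match(url)
        value = VALUE_RE.search(url)
        if name and value:
            # Build the field name as in the FIELD1 example: upper case + "ID".
            fields[name.group(1).upper() + "ID"].append(value.group(1))
    # Join the values of each field with the multi-value surrogate \\.
    return {field: "\\\\".join(ids) for field, ids in fields.items()}


# Sample WWW content from post #1, URLs separated by single spaces.
WWW = " ".join([
    "http://www.mute.com/",
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:0cfqxq80ldde",
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:3ifoxzysldde",
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=7735",
    "http://www.allmusic.com/cg/amg.dll?p=amg&sql=10:gcfqxq9jld0e",
    "http://coverparadise.to/index.php?Module=ViewEntry&ID=69885",
    "http://www.emimusic.de",
])

grouped = group_ids(WWW)
print(grouped)
```

URLs without a query string, such as the mute.com and emimusic.de entries, match neither pattern and are skipped.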

DD.20100927.0634.CEST

This post has been edited by DetlevD: Sep 27 2010, 05:35


--------------------
* Beyond that, don't ask, when you don't know what to do with the answer. *
♥ home is where the heart is ♥