> Discogs: Parse Album (Release)
Victor Kostas
post Apr 9 2011, 22:56
Post #1


Member


Group: Full Members
Posts: 95
Joined: 2-March 08
From: Belgium
Member No.: 6578
Mp3tag Version: 2.65



The Web Sources logic is to first locate the release ID (album) in ParserScriptIndex and then to parse the album info in the ParserScriptAlbum step.
Source (.src) files posted in the Mp3tag forums use two different approaches to parsing album info: HTML and XML.
I wonder why we don't always use XML, since it is much better structured than HTML. On top of that, HTML is not stable: Discogs might modify the HTML tag structure of its release pages at any time, making ParserScriptAlbum unstable or unusable.

Ideally, using XML in both ParserScriptIndex and ParserScriptAlbum, combined with appropriate customizing, would make it really powerful.
E.g. search by artist and album: first search only by artist (API artist search), then filter the XML results by the given album title. Filtering can be extended to all possible fields (Year, Format, Genre, etc.). This would need a filtering popup similar to the current .src popup, but with a dedicated label and field for each possible tag.
I know the interface between the filtering popup and the parser script is difficult, but I believe it might be feasible.
Customizing would be needed to let the user choose which fields should be extracted from the album release. The script code would check the flag from the customizing and include or exclude the tag accordingly.

Fewer script files would be needed with this approach.
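
To make the proposed filter step concrete, here is a minimal sketch in Python (not in Mp3tag's script language). The XML layout, the sample release ids and the function name `filter_releases` are all invented for illustration; the real Discogs API response may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical response layout, invented for illustration only;
# the real Discogs API response may be structured differently.
SAMPLE_XML = """
<releases>
  <release id="101"><title>Crimes Of Passion</title><year>1980</year><format>LP</format></release>
  <release id="102"><title>Precious Time</title><year>1981</year><format>LP</format></release>
  <release id="103"><title>Crimes Of Passion</title><year>1990</year><format>CD</format></release>
</releases>
"""


def filter_releases(xml_text, **criteria):
    """Return the ids of releases whose child elements match all criteria.

    Matching is exact but case-insensitive, e.g. title="crimes of passion".
    """
    root = ET.fromstring(xml_text)
    matches = []
    for release in root.iter("release"):
        if all(
            (release.findtext(field) or "").strip().lower()
            == str(wanted).strip().lower()
            for field, wanted in criteria.items()
        ):
            matches.append(release.get("id"))
    return matches
```

Each keyword argument plays the role of one field in the filtering popup, so `filter_releases(SAMPLE_XML, title="Crimes Of Passion", format="CD")` narrows the artist's releases by title and format at once.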
pone
post Apr 9 2011, 23:17
Post #2


Member


Group: Full Members
Posts: 1688
Joined: 15-March 09
From: Germany
Member No.: 9103
Mp3tag Version: 2.59b



Scripts are different because they are user-generated, and every user can write them as he wants.
In my case, I simply didn't know about the XML documents of Discogs' API when I first created the scripts, and I never changed them.
XML is more stable, yes, but not more powerful.

The filtering you propose can't be done with Mp3tag web scripts as far as I know. I have tried to run ParserScriptIndex a second time on the results of the first run, but without success. I tried this by searching for master releases in a first step and indexing all releases belonging to the master in a second step. But as I said, no success.
The filtering you propose doesn't make much sense anyway. As the API artist search is constructed now, you can sort the releases alphabetically and scroll to the releases of the album you want. That's as easy as using an extra filter to see only these releases.

Have you seen my new scripts? The customization you propose is possible there. For the first time in such an easy way for several scripts at once, if I may add this with a certain pride ;)
Victor Kostas
post Apr 10 2011, 11:22
Post #3


QUOTE (pone @ Apr 10 2011, 00:17) *
Have you seen my new scripts? The customization you propose is possible there. For the first time in such an easy way for several scripts at once, if I may add this with a certain pride ;)

Of course I've seen them (I also posted a comment there: INCLUDE command ;) ).

QUOTE (pone @ Apr 10 2011, 00:17) *
I simply didn't know about the XML documents of discogs' API when I created the scripts first and I never changed.
XML is more stable, yes, but not more powerful.

I still believe an album parser based on XML is not inferior to one based on HTML. The XML response has some great advantages (for me at least), e.g. full artist names vs. abbreviations. I prefer the full artist name (e.g. Pat Benatar instead of P. Benatar or even Benatar). In HTML, Discogs puts the abbreviated artist name between the tags (the full name can be extracted from the tag itself, but that increases the complexity of your scripts), while in XML you have both.
CODE
HTML:
<a href="/artist/Pat+Benatar?anv=P.+Benatar" class="rollover_link">P. Benatar</a>

XML:
<artist>
<name>Pat Benatar</name>
<anv>P. Benatar</anv>


Another example is the per-track info on the persons involved. In HTML you need some fancy (complex) code to assign the personnel to the correct track. In XML it is simple.
I encourage you to switch the album parser to XML. It will save you time.
Nevertheless I respect the good work you have done in your scripts. CONGRATULATIONS.
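
To illustrate why having both names in the XML is convenient, here is a small Python sketch (not Mp3tag script code) that reads `<name>` and `<anv>` from an `<artist>` element like the one shown above. The function name `artist_names` is made up for this example.

```python
import xml.etree.ElementTree as ET

# The <artist> element from the example above, closed so that it parses.
ARTIST_XML = "<artist><name>Pat Benatar</name><anv>P. Benatar</anv></artist>"


def artist_names(artist_xml):
    """Return (name, anv); anv is None when <anv> is missing or empty."""
    artist = ET.fromstring(artist_xml)
    name = artist.findtext("name")
    anv = artist.findtext("anv") or None
    return name, anv
```

A tagger could then write whichever of the two values the user prefers, falling back to `name` when `anv` is `None`.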
pone
post Apr 10 2011, 14:12
Post #4


QUOTE (Victor Kostas @ Apr 10 2011, 12:22) *
XML response has some great advantages (for me at least). E.g. Artist full name Vs abbreviations. I prefer artist full name (e.g. Pat Benatar instead of P. Benatar or even Benatar). Discogs put artist abbreviation between tags in HTML (artist name can be extracted from the tag itself but you increase the complexity of your scripts) while in XML you have them both.


Yes, you are right. That would be the big advantage of XML/API, besides the more stable character of the scripts.
It's not full artist name vs. artist abbreviation, but main artist name vs. artist name variation (anv). It appears in every place where artist names are listed: Albumartist, Artist, Track Extra Artists, Credits, Notes.

I could extract that from the HTML documents too in most cases. But I have problems there with special characters, as the main artist name is hidden in a link, and in the link special characters are URL-encoded (%21 for !, %22 for ", %23 for #, ...).
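
Outside Mp3tag, this decoding is a standard operation. A minimal Python sketch, assuming links shaped like the `/artist/Pat+Benatar?anv=P.+Benatar` example earlier in the thread (spaces written as `+`); the function name `main_artist_from_href` is made up for illustration, and nothing here is a feature of Mp3tag's script language.

```python
from urllib.parse import urlsplit, unquote_plus


def main_artist_from_href(href):
    """Recover the main artist name from a link such as
    /artist/Pat+Benatar?anv=P.+Benatar."""
    path = urlsplit(href).path          # drops the ?anv=... query part
    encoded = path.rsplit("/", 1)[-1]   # last path segment, still URL-encoded
    return unquote_plus(encoded)        # '+' becomes a space, %21 becomes '!'
```

The path is split before decoding, so an encoded slash (`%2F`) inside an artist name does not confuse the segment lookup.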

So yes, switching to XML/API is on my to-do list. I just haven't done it yet because personally I prefer the artist names as given on the release pages.
Victor Kostas
post Apr 11 2011, 01:38
Post #5


QUOTE (pone @ Apr 10 2011, 15:12) *
It's not full artist name vs. artist abbreviation, but main artist name vs. artist name variation (anv). It appears in every place where artist names are listed: Albumartist, Artist, Track Extra Artists, Credits, Notes.

I could extract that from the HTML documents too in most cases. But I have problems there with special characters, as the main artist name is hidden in a link, and in the link special characters are URL-encoded (%21 for !, %22 for ", %23 for #, ...).


Actually, you find both artist names (main name and variation) in the XML/API response.

CODE
<artist>
<name>Pat Benatar</name>
<anv>P. Benatar</anv>
<join/>
<role>Written-By</role>
<tracks>4, 5, 7, 8, 15, 19</tracks>
</artist>


There are some things for which Web Sources development should be amended.
  • It is impossible (AFAIK) to access or process a tag value that was filled earlier in the parse process. E.g. you want to merge Genre and Styles into Genre. A value separator should be added to Genre before adding Styles only if Genre is already filled, but there is no way to know whether Genre is blank.
  • I am struggling to assign the involved artists to each track, and it is a pain.
    e.g.
    <artist>
    <name>xxx yyyy</name>
    <role>Written-By</role>
    <tracks>2 to 5, 7 to 8, 15, 19</tracks>
    </artist>

    Single track numbers can be parsed with different searches and regexps, but it is impossible to process ranges "N1 to N2".
  • No return code is provided for the find commands (FindLine/FindInLine). If a specific tag is missing from the page being parsed, the whole script will fail.
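
For comparison, expanding the "N1 to N2" ranges is straightforward in a general-purpose language, for instance in an external pre-processing step. A minimal Python sketch, assuming the `<tracks>` text uses only plain numbers and "N1 to N2" ranges separated by commas, as in the example above; the function name `expand_tracks` is made up for illustration.

```python
import re


def expand_tracks(spec):
    """Expand a string like '2 to 5, 7 to 8, 15, 19' into track numbers."""
    tracks = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        match = re.fullmatch(r"(\d+)\s+to\s+(\d+)", part)
        if match:
            first, last = int(match.group(1)), int(match.group(2))
            tracks.extend(range(first, last + 1))
        elif part.isdigit():
            tracks.append(int(part))
        else:
            # Positions such as 'A1' are not handled in this sketch.
            raise ValueError(f"unsupported track position: {part!r}")
    return tracks
```

Anything other than digits and "to" ranges raises an error on purpose, so unexpected notations are noticed instead of being silently dropped.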


Discogs provides a compressed XML dump of all artists, labels and releases on a monthly basis.
I am seriously thinking of converting those 3 XML files into a format that is friendlier to Mp3tag.
An Mp3tag web source would be used to find the correct release ID, then the locally converted XML files would be used to fill the desired tags.
pone
post Apr 11 2011, 02:23
Post #6


QUOTE (Victor Kostas @ Apr 11 2011, 02:38) *
It is impossible (AFAIK) to process/access TAG value filled from parse process. E.g. you want to merge Genre and Styles in Genre. Value separator should be added to Genre if it is already filled before adding Styles. No way to know Genre is (not) blank.


I don't know what you mean. I think you are talking about my Discogs scripts. I write Style before Genre, and I found a way to check whether Style is blank, in which case the value separator is left out.
I think there is no Discogs release without a Genre. At least I have not found one, and I have looked at many, both for script development and for tagging my own music collection.


QUOTE (Victor Kostas @ Apr 11 2011, 02:38) *
I am struggling to assign involved artists to each track but it is a pain.
  • e.g.
    <artist>
    <name>xxx yyyy</name>
    <role>Written-By</role>
    <tracks>2 to 5, 7 to 8, 15, 19</tracks>
    </artist>

    Single track nrs can be parsed with different searches and regexp but is impossible to process ranges N1 to N2.
This will be hard. I think there is no standard at Discogs for the way this is written. And it's not enough to be able to parse a number: you would have to sort these credits into the order of the track listing to be able to parse them. Texts that differ for every track are difficult, because you need a do ... while loop that must also take care of the tracks for which the credits should not be written.


QUOTE (Victor Kostas @ Apr 11 2011, 02:38) *
No return code is provided for the find commands (FindLine/FindInLine). If a specific tag is missing from the page being parsed, the whole script will fail.


Use
findline "..." 1 1
and
findinline "..." 1 1
This jumps to the end of the line or script if the text is not found, and prevents the script from failing.

QUOTE (Victor Kostas @ Apr 11 2011, 02:38) *
Discogs provides compressed XML dump of all Artists/Labels/Releases on monthly basis.
I am seriously thinking to convert those 3 XML files in more friendly format to mp3tag.
mp3tag source will be used to find the correct release ID then local converted XML files will be used to fill desired tags.

What is the compressed XML dump? Can you download the whole discogs database???


It is a bit hard to answer your questions. I never know exactly whether you are
- asking how things can be done,
- proposing how things should be done,
- or proposing how Mp3tag's web sources script system should be changed by the programmer of Mp3tag.


Are you already working on a web sources script for Discogs' XML/API? I'm always ready to help if you have problems. Just post your questions here!

This post has been edited by pone: Apr 11 2011, 02:23
Victor Kostas
post Apr 11 2011, 10:08
Post #7


QUOTE (pone @ Apr 11 2011, 03:23) *
I don't know what you mean. I think you are talking about my Discogs scripts. I write Style before Genre, and I found a way to check whether Style is blank, in which case the value separator is left out.
I think there is no discogs release without Genre. At least I have not found one, and I looked at many for script developing and also with tagging my own music collection.

My point is that it would be much better if we could use some commands on an already filled tag. To achieve this today we use workarounds (as you have already done), which increase script complexity a lot.


QUOTE (pone @ Apr 11 2011, 03:23) *
use
findline "..." 1 1
and
findinline "..." 1 1
this jumps to the end of the line or script if the text is not found and prevents the script from failing

THANKS. I had seen the extension "1 1" but didn't find any info about it in the forums.

QUOTE (pone @ Apr 11 2011, 03:23) *
What is the compressed XML dump? Can you download the whole discogs database???

Yes, you can: Discogs RAW data
It is not difficult to read those XML files and amend them:
- Put credits on the correct track in the tracklist
- Add discnumber
- Add artist info into the release XML file
etc.
Then, with a simple script, we save the Discogs release ID to the files, and a second script would parse the local (huge) XML file.
It could even be possible to split the release XML file into several files by ranges of release ID.

QUOTE (pone @ Apr 11 2011, 03:23) *
Are you working already on a web sources script for discogs' XML/API? I'm always ready to help if you have some problems. Just post your questions here!

Thank you.
I have attached a simple .src file based on Florian's discogs.src, but your scripts helped a lot too.
Please note that you have to add your Discogs API_KEY to the script file. Anyone can get one for free on the Discogs site.
Attached File(s)
Attached File  _discogs___VKOSTAS__Search_by__Artist___Album.src ( 6.25K ) Number of downloads: 376
 
DetlevD
post Apr 11 2011, 10:20
Post #8


Member


Group: Full Members
Posts: 5031
Joined: 26-May 06
From: Wuppertal, Germany, Planet Earth
Member No.: 3194
Mp3tag Version: 2.63



QUOTE (Victor Kostas @ Apr 11 2011, 11:08) *
... I had seen the extension "1 1" but didn't find any info in forums. ...

See section "Web Sources Framework/List of Parser Commands" in the Mp3tag help manual.
FindLine S n Find line with first or Nth occurrence of S (starting from the current position)
FindInLine S n Find the next/Nth occurrence of S within the current line

DD.20110411.1120.CEST

This post has been edited by DetlevD: Apr 11 2011, 10:21


--------------------
* Beyond that, don't ask, when you don't know what to do with the answer. *
♥ home is where the heart is ♥
Victor Kostas
post Apr 11 2011, 10:55
Post #9


QUOTE (DetlevD @ Apr 11 2011, 11:20) *
See section "Web Sources Framework/List of Parser Commands" in the Mp3tag help manual.
FindLine S n Find line with first or Nth occurrence of S (starting from the current position)
FindInLine S n Find the next/Nth occurrence of S within the current line

DD.20110411.1120.CEST

I've seen and clearly understood that.

QUOTE (pone)
use
findline "..." 1 1
and
findinline "..." 1 1
this jumps to the end of the line or script if the text is not found and prevents the script from failing

What I couldn't find was what happens if a "Find" command fails.
Pone made that clear to me.

Thank you for your comment
DetlevD
post Apr 11 2011, 15:40
Post #10


Regarding the discussion about using Discogs' monthly updated XML database as a data source for Mp3tag, I want to mention ...
... I downloaded the three gz files and unpacked them ...
discogs_20110401_artists.xml 216 MB (227.083.986 Bytes)
discogs_20110401_labels.xml 32,9 MB (34.554.966 Bytes)
discogs_20110401_releases.xml 4,62 GB (4.961.080.036 Bytes)

The huge "releases" XML file with about a million lines cannot be opened in the text editor KEDIT, in Notepad++ or in XML Notepad, perhaps because the file is larger than 4 GB. Astonishingly, Textpad can read the file; it takes its time, good for a tea break.

I assume that there is no practical chance of using such a huge file as a source for the Mp3tag web sources scripting feature.
Even XML Path Language and high-end professional XML readers hiccup on files like this; I know that from other applications.
XML is a lame duck when it comes to large amounts of data.
For practical use, the "releases" XML file has to be split into smaller parts.
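
One way to do the splitting without loading the whole file is a streaming parser. A minimal Python sketch, assuming the dump has a single root element whose direct children are `<release>` elements; the function name `split_releases` and the `write_chunk` callback are made up for illustration.

```python
import xml.etree.ElementTree as ET


def split_releases(source, chunk_size, write_chunk):
    """Stream top-level <release> elements and hand them out in chunks.

    source:      file path or binary file object with the releases XML.
    write_chunk: callback taking (chunk_index, list_of_release_xml_strings).
    Returns the number of chunks written.
    """
    chunk, index, depth, root = [], 0, 0, None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 1:
                root = elem               # remember the document root
            continue
        depth -= 1
        if depth == 1 and elem.tag == "release":
            chunk.append(ET.tostring(elem, encoding="unicode"))
            root.clear()                  # free the releases parsed so far
            if len(chunk) == chunk_size:
                write_chunk(index, chunk)
                chunk, index = [], index + 1
    if chunk:                             # flush the last, partial chunk
        write_chunk(index, chunk)
        index += 1
    return index
```

Because `iterparse` accepts any binary file object, the compressed download could probably be passed in directly via `gzip.open(path, "rb")` without unpacking it first. Memory use stays roughly constant because the root element is cleared after every release.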

DD.20110411.1640.CEST


Victor Kostas
post Apr 11 2011, 16:39
Post #11


Yep, you are CORRECT. First I thought of splitting the huge releases XML file into one file per release ID, but then a better idea came to my mind.
Since we locate the release ID anyway, we can download and convert the XML file for just that release ;)
Actually, with the "debugging ON" option the source XML file is already downloaded (ParserScriptAlbum section).
I can write a small program to convert the Discogs XML file into a simplified XML file for the Mp3tag web source.
Now, can a web source script read the converted XML file directly from the local disk? Possibly yes, but I need to check. Can someone confirm this? Thank you.

The question is: what should be amended in the Discogs XML file?
  • Move credits, artists, etc. to track level.
  • Merge Genre/Styles under the same XML tag (Genres).
  • Add a new tag to XML tags with multiple values, containing the number of values (e.g. number of tracks, number of artists, etc.).
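
The second and third points of this list could look roughly like the following Python sketch. It assumes a release layout with `<genres><genre>`, `<styles><style>` and `<tracklist><track>` containers, and it stores the counts in a `count` attribute rather than in a separate tag; both the layout and the attribute are assumptions made for this example, as is the function name `simplify_release`.

```python
import xml.etree.ElementTree as ET


def simplify_release(release_xml):
    """Merge <genres> and <styles> into one <genres> element and add counts."""
    release = ET.fromstring(release_xml)

    # Collect genre and style names in document order, skipping duplicates.
    names = []
    for path in ("genres/genre", "styles/style"):
        for elem in release.findall(path):
            text = (elem.text or "").strip()
            if text and text not in names:
                names.append(text)

    # Replace the two original containers with a single merged one.
    for tag in ("genres", "styles"):
        for old in release.findall(tag):
            release.remove(old)
    merged = ET.SubElement(release, "genres", count=str(len(names)))
    for name in names:
        ET.SubElement(merged, "genre").text = name

    # Record the number of tracks on <tracklist>, if there is one.
    tracklist = release.find("tracklist")
    if tracklist is not None:
        tracklist.set("count", str(len(tracklist.findall("track"))))

    return ET.tostring(release, encoding="unicode")
```

With the counts available up front, a parser script would know how many values to expect before it starts reading them.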


Please send your comments/suggestions.

This post has been edited by Victor Kostas: Apr 11 2011, 16:41
DetlevD
post Apr 11 2011, 17:10
Post #12


QUOTE (Victor Kostas @ Apr 11 2011, 17:39) *
... Now, can Web Script file be read directly the converted XML file in local disk? Possibly yes but need to check. Can someone confirm this? Thank you. ...Please send your comments/suggestions.

Hmm, as I understand the Web Sources feature, it needs at least a local web server to call a web page. See the configuration in the .src header.
See also: http://forums.mp3tag.de/index.php?showtopi...ost&p=53372
But that might be easy to modify.
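
A local web server for such tests does not need much setup. A minimal sketch using only Python's standard library, assuming the converted XML files sit in one directory; the function name `start_local_server` is made up for illustration, and how the .src header has to point at the server is not covered here.

```python
import functools
import http.server
import threading


def start_local_server(directory, port=0):
    """Serve the files in `directory` on localhost in a background thread.

    port=0 lets the operating system pick a free port; the chosen port is
    available afterwards as server.server_address[1].
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Binding to 127.0.0.1 keeps the files reachable from the local machine only. Call `server.shutdown()` followed by `server.server_close()` to stop it again.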

DD.20110411.1811.CEST

This post has been edited by DetlevD: Apr 11 2011, 17:13


Victor Kostas
post Apr 12 2011, 11:16
Post #13


QUOTE (DetlevD @ Apr 11 2011, 18:10) *
Hmm, as I understand the Web Source Feature, at least it needs a local webserver to call a webpage. See configuration in the src header.
See also: http://forums.mp3tag.de/index.php?showtopi...ost&p=53372
But that might be easy to modify.

DD.20110411.1811.CEST


The local web server works OK. I am working on converting the XML file now...
I will come back in the next few days.

Thank you 8-)
pone
post Apr 12 2011, 11:38
Post #14


QUOTE (Victor Kostas @ Apr 12 2011, 12:16) *
Local Web server works ok. I am working on converting the XML file now...
In the next few days I will come back.

Thank you cool.gif


Just a question:
What do you want to do with album credits that are not assigned to specific tracks? These credits are sometimes for all tracks, and sometimes they have nothing to do with the tracks at all.

See here:
http://www.discogs.com/release/378017
You have credits for A&R, Artwork, Booking, Tour Management, ...
Which tag field do you want to write these to?
Victor Kostas
post Apr 12 2011, 13:05
Post #15


QUOTE (pone @ Apr 12 2011, 12:38) *
Just a question:
What do you want to do with Album Credits which are not assigned to special tracks? These credits are sometimes for all tracks, and sometimes they have nothing to do with the tracks at all:

See here:
http://www.discogs.com/release/378017
You have credits for A&R, Artwork, Booking, Tour Managment, ....
In what tag field do you want to write this?


Discogs_Credits_Multi is OK, I guess.
You are right. There are credits which are not directly related to the track itself. Someone might not care who the photographer is or who designed the album cover, etc.
The first step is to apply a simple rule: assign each credit to the correct track; no track info means all tracks.
Later we can add options/filters so that users can define which credits are included in the tracks. We could even split credits into different track tags.
However, I don't see a big issue here. If too much information is assigned to a track, it can be removed somehow (automatically or manually).
The first priority is to be able to convert the Discogs XML file with less hassle for the user. The rest will come.
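
The simple rule above ("no track info means all tracks") can be sketched in a few lines of Python. The input shape, a list of `(name, role, tracks)` tuples with the track numbers already parsed into a list, and the function name `credits_per_track` are assumptions made for this example.

```python
def credits_per_track(track_numbers, credits):
    """Map each track number to the credits that apply to it.

    credits: iterable of (name, role, tracks), where tracks is a list of
    track numbers, or None / an empty list meaning "applies to all tracks".
    """
    result = {number: [] for number in track_numbers}
    for name, role, tracks in credits:
        targets = tracks if tracks else track_numbers
        for number in targets:
            if number in result:          # ignore references to unknown tracks
                result[number].append(f"{role}: {name}")
    return result
```

A later filter for unwanted roles (artwork, booking and so on) would only need to skip those tuples before they reach this function.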
8-)

This post has been edited by Victor Kostas: Apr 12 2011, 13:09