> Case conversion...
DetlevD
post Mar 17 2011, 00:09
Post #16


Member


Group: Full Members
Posts: 5031
Joined: 26-May 06
From: Wuppertal, Germany, Planet Earth
Member No.: 3194
Mp3tag Version: 2.63



QUOTE (yog-sothoth @ Mar 16 2011, 21:17) *
... There is a bug in the aforementioned action. If a comma is present anywhere in the field, all characters after the comma are truncated. ...[ ] case-sensitive comparison

Yes, it looks like a bug in the action's parameter parsing, or more precisely in the chaining of the regexp with any additional function in the replacement parameter.

You can work around this buggy behaviour by using the action this way:

Action: Replace with regular expression
Field: _ALL
Regular expression: ([^,]*)
Replace matches with: $caps2($1)
[ ] case-sensitive comparison
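
For readers who want to see what the workaround does to a field, here is a rough Python sketch. It assumes that $caps2 upper-cases the first letter of every space-separated word and leaves the other letters alone. The names `caps2` and `caps_per_comma_chunk` are hypothetical, `[a-z]` only covers ASCII letters, and Python's `re` is not the regex engine Mp3tag uses, so treat this as an illustration rather than a description of Mp3tag internals.

```python
import re

def caps2(text: str) -> str:
    """Rough stand-in for $caps2 (assumption): upper-case the first letter of
    each space-separated word and leave every other character unchanged."""
    return re.sub(r"(^| )([a-z])", lambda m: m.group(1) + m.group(2).upper(), text)

def caps_per_comma_chunk(field: str) -> str:
    """Apply caps2 to each run of non-comma characters, in the spirit of
    ([^,]*), so the commas themselves never reach the function call."""
    return re.sub(r"[^,]+", lambda m: caps2(m.group(0)), field)

print(caps_per_comma_chunk("bach, johann sebastian"))
```

The sketch uses `[^,]+` instead of `[^,]*` only to skip empty matches in Python; the idea of handing the function comma-free chunks is the same.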

DD.20110317.0007.CET


--------------------
* Beyond that, don't ask, when you don't know what to do with the answer. *
♥ home is where the heart is ♥
DetlevD
post Mar 17 2011, 00:31
Post #17


Member


Group: Full Members
Posts: 5031
Joined: 26-May 06
From: Wuppertal, Germany, Planet Earth
Member No.: 3194
Mp3tag Version: 2.63



QUOTE (Doug Mackie @ Mar 16 2011, 22:20) *
... However, that action has other problems. It preserves upper-case errors in the source text, which must then be fixed by hand. More important, unlike "Mixed Case", "Replace with Regular Expression" lacks the option for custom word boundary markers. So you must then add actions to correct the resulting errors after punctuation, brackets, and so on. ...

Yes, how someone uses the Mp3tag toolbox is always a mix of personal needs, skills, and knowledge of how to find the best approach in Mp3tag.

Personally, I dislike using the pseudo tag field _ALL, because it can corrupt the content of many other tag fields, such as UNSYNCEDLYRICS, COMMENT and so on.
I prefer to focus on the tag field that needs the change.
And because you can combine as many actions as needed into one action group (or create several action groups), each problem remains a single step that can be moved to another place in the workflow, removed, or replaced by another solution without affecting the other tasks.

DD.20110317.0031.CET


yog-sothoth
post Mar 17 2011, 01:00
Post #18


Member


Group: Full Members
Posts: 64
Joined: 8-November 10
From: UK
Member No.: 13190
Mp3tag Version: 2.48



QUOTE (DetlevD @ Mar 17 2011, 01:09) *
Yes, it looks like a bug in the action's parameter parsing, or more precisely in the chaining of the regexp with any additional function in the replacement parameter.

You can work around this buggy behaviour by using the action this way:

Action: Replace with regular expression
Field: _ALL
Regular expression: ([^,]*)
Replace matches with: $caps2($1)
[ ] case-sensitive comparison

DD.20110317.0007.CET



Thank you DetlevD, that seems to have fixed the problem alright. Well done that man! *applause*
Doug Mackie
post Mar 17 2011, 01:24
Post #19


Member


Group: Full Members
Posts: 37
Joined: 27-April 09
From: New Jersey, USA
Member No.: 9952
Mp3tag Version: 2.51



QUOTE (yog-sothoth @ Mar 16 2011, 17:53) *
I have quite a large collection (around 70k tracks), a great deal of which has upper-case words that would be incorrectly formatted with the mixed case action.


Of course, running Mixed Case alone is not enough. That is why my scripts have so many additional actions. Granted, I work mostly with pop and jazz titles, but the basic idea is the same. I took your Harp Concerto title above and ran my tag action on it, and it came out perfectly except for "Op." which came out "OP." That happened because OP is in my abbreviation list for "Out of Print".

Actually, as a Latinism, I would expect that opus and op. should be lower-case in music titles. That is an example of why I use word lists: they are good for catching errors made by others.
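
To make the word-list idea concrete, here is a minimal Python sketch of "convert first, then correct from a list". The correction list, the function names, and the shortened title are all made up for illustration; they are not taken from the scripts discussed in this thread.

```python
# Hypothetical correction list: lower-cased word -> preferred spelling.
CORRECTIONS = {"op.": "op.", "ac/dc": "AC/DC", "10cc": "10cc"}

def mixed_case(text: str) -> str:
    """Naive mixed case: first letter of each space-separated word upper, rest lower."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in text.split(" "))

def apply_word_list(text: str) -> str:
    """Replace any word found in CORRECTIONS (case-insensitive) with its listed form."""
    return " ".join(CORRECTIONS.get(w.lower(), w) for w in text.split(" "))

print(apply_word_list(mixed_case("HARP CONCERTO OP. 4")))
```

The list only helps with words it already contains, which is the limitation raised in the next post.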

Cheers,
Doug
yog-sothoth
post Mar 17 2011, 01:42
Post #20


Member


Group: Full Members
Posts: 64
Joined: 8-November 10
From: UK
Member No.: 13190
Mp3tag Version: 2.48



Hi Doug, I agree with what you say, except for one caveat. Most of these upper-case words happen to be in the artist field. So, AC/DC (the correct format) becomes Ac/Dc, 10CC -> 10cc, etc. Then there are acronyms that should be interspersed with stops but aren't, and so end up being formatted to title case. There's just no way a word correction list can account for every eventuality, unfortunately. Of course, these methods are not mutually exclusive - I also use a word list. Cheers for the script though. I've merged some parts of it with mine and it's working very well.

This post has been edited by yog-sothoth: Mar 17 2011, 02:00
yog-sothoth
post Mar 17 2011, 16:54
Post #21


Member


Group: Full Members
Posts: 64
Joined: 8-November 10
From: UK
Member No.: 13190
Mp3tag Version: 2.48



Another update: I've done some more testing of the script and found a few problems. Firstly, Doug was right in saying that without the "words begin after" command in the native case conversion, it preserves or causes case errors. For instance, any word immediately after a bracket (like "(Featuring..." ) either remains or becomes lower-case after conversion. The original script I described in my first post used the mixed case function, with the following "words begin after" instruction: ({[]})-_",./+&@:;* Therefore, this somehow needs to be accounted for in the new script using a reg-ex action.

Secondly, DetlevD's solution has one minor flaw. If a bracket is left open, e.g. "(Remix", then the following error occurs: [ SYNTAX ERROR IN FORMATTING STRING ], and the field info is lost.

I really appreciate all the help I'm getting. Please help me fix this, once and for all.
DetlevD
post Mar 17 2011, 17:06
Post #22


Member


Group: Full Members
Posts: 5031
Joined: 26-May 06
From: Wuppertal, Germany, Planet Earth
Member No.: 3194
Mp3tag Version: 2.63



QUOTE (yog-sothoth @ Mar 17 2011, 16:54) *
...Secondly, DetlevD's solution has one minor flaw. If a bracket is left open, i.e. "(Remix", then the following error occurs: [ SYNTAX ERROR IN FORMATTING STRING ], and the field info is lost. ...

I want to point out that I did not offer a "solution", only a workaround for a possible bug, or for possible misuse, of the action "Replace with regular expression".
yog-sothoth, you should file a bug report about both error cases that you have found.

QUOTE (yog-sothoth @ Mar 17 2011, 16:54) *
... any word immediately after a bracket (like "(Featuring..." ) either remains or becomes lower-case after conversion. ...

You should read the manual and check what the second parameter of the $caps2 function can do for you.

DD.20110317.1706.CET

This post has been edited by DetlevD: Mar 17 2011, 19:18


dano
post Mar 17 2011, 19:04
Post #23


Moderator


Group: Moderators
Posts: 5688
Joined: 4-September 03
From: Germany
Member No.: 201
Mp3tag Version: 2.65



Instead of the first mixed case action try these two:

Action type: Replace with regular expression
Field: _TAG
Regular expression: ([-({\[\]}) _",./+&@:;*])(\l)
Replace matches with: $1$upper($2)
[x] case-sensitive comparison

Action type: Replace with regular expression
Field: _TAG
Regular expression: ^(\l)
Replace matches with: $upper($1)
[x] case-sensitive comparison
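
For anyone who wants to try the logic of these two actions outside Mp3tag, here is a minimal Python sketch. It assumes that `\l` means "one lowercase letter" and approximates it with `[a-z]`, so accented letters are not covered. The function name is hypothetical, and Python's `re` module is not the engine Mp3tag uses.

```python
import re

# Same boundary characters as the first action; [a-z] stands in for \l.
BOUNDARY_THEN_LOWER = re.compile(r'([-({\[\]}) _",./+&@:;*])([a-z])')
LEADING_LOWER = re.compile(r"^([a-z])")

def title_case_preserving_upper(text: str) -> str:
    # Step 1: upper-case a lowercase letter that follows a boundary character.
    text = BOUNDARY_THEN_LOWER.sub(lambda m: m.group(1) + m.group(2).upper(), text)
    # Step 2: upper-case a lowercase letter at the very start of the field.
    return LEADING_LOWER.sub(lambda m: m.group(1).upper(), text)

print(title_case_preserving_upper("harp concerto (featuring AC/DC)"))
```

Because only lowercase letters are ever touched, existing capitals such as AC/DC survive, and in this sketch an unclosed bracket such as "(remix" is treated like any other boundary.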


yog-sothoth
post Mar 17 2011, 19:47
Post #24


Member


Group: Full Members
Posts: 64
Joined: 8-November 10
From: UK
Member No.: 13190
Mp3tag Version: 2.48



QUOTE (dano @ Mar 17 2011, 20:04) *
Instead of the first mixed case action try these two:

Action type: Replace with regular expression
Field: _TAG
Regular expression: ([-({\[\]}) _",./+&@:;*])(\l)
Replace matches with: $1$upper($2)
[x] case-sensitive comparison

Action type: Replace with regular expression
Field: _TAG
Regular expression: ^(\l)
Replace matches with: $upper($1)
[x] case-sensitive comparison


Excellent! Thank you so much dano, great job!

One more question. You use the _TAG field, which I assume covers all tag fields but not the filename, right? As I wish to apply these actions to the filename as well, would changing the field to _ALL have any potential drawbacks? The same question applies to _DIRECTORY. Thanks again.
DetlevD
post Mar 17 2011, 21:07
Post #25


Member


Group: Full Members
Posts: 5031
Joined: 26-May 06
From: Wuppertal, Germany, Planet Earth
Member No.: 3194
Mp3tag Version: 2.63



QUOTE (yog-sothoth @ Mar 17 2011, 19:47) *
... One more question. You use the _TAG field, which I assume is all tag fields, but not the filename, right? As I wish to apply these actions to the filename also, would changing the field to _ALL have any potential drawbacks. The same with _DIRECTORY, also. Thanks again

I beg your pardon, just coming in for a moment.

You do not need to apply this cleaning procedure to the filename as well: once all the tag fields have been cleaned and hold their correct values, you can assemble the file name and all the folder names in the folder tree from the content of the tag fields.
This is the recommended workflow.

Be aware that the file name, or rather the full path name, needs additional housekeeping to become valid, because the file system forbids certain characters.

DD.20110317.2108.CET

This post has been edited by DetlevD: Mar 17 2011, 21:11


dano
post Mar 17 2011, 21:11
Post #26


Moderator


Group: Moderators
Posts: 5688
Joined: 4-September 03
From: Germany
Member No.: 201
Mp3tag Version: 2.65



I don't recommend _ALL for the first action (the second is ok). It would change your filename extensions: .mp3 -> .Mp3
Well you could make an additional action to fix that.

You can use these actions on _DIRECTORY


The first action can be changed to a Format value action so it does not mess with the extension:

Action type: Format value
Field: _FILENAME
Formatstring: $regexp(%_filename%,'([-({\[\]}) _",./+&@:;*])(\l)',$1\u$2)
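
The extension problem is easy to reproduce outside Mp3tag. The Python sketch below assumes that `%_filename%` holds the name without its extension (which is why the Format value variant leaves ".mp3" alone) and again uses `[a-z]` as a stand-in for `\l`. The function names are hypothetical.

```python
import os
import re

BOUNDARY_THEN_LOWER = re.compile(r'([-({\[\]}) _",./+&@:;*])([a-z])')

def capitalize_after_boundaries(text: str) -> str:
    """Upper-case any lowercase letter that follows a boundary character."""
    return BOUNDARY_THEN_LOWER.sub(lambda m: m.group(1) + m.group(2).upper(), text)

def capitalize_stem_only(filename: str) -> str:
    """Apply the rule to the name without its extension, then re-attach it."""
    stem, ext = os.path.splitext(filename)
    return capitalize_after_boundaries(stem) + ext

print(capitalize_after_boundaries("01 - some song.mp3"))  # 01 - Some Song.Mp3
print(capitalize_stem_only("01 - some song.mp3"))         # 01 - Some Song.mp3
```

The dot is one of the boundary characters, so running the rule over the full name turns ".mp3" into ".Mp3"; working on the stem alone avoids that.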



yog-sothoth
post Mar 17 2011, 22:14
Post #27


Member


Group: Full Members
Posts: 64
Joined: 8-November 10
From: UK
Member No.: 13190
Mp3tag Version: 2.48



QUOTE (DetlevD @ Mar 17 2011, 21:07) *
You have no need to apply this cleaning procedure for the filename too, because once all the tag fields have been cleaned and got their correct values, you can assemble the file name and all the folder names in the folder tree from the content of the tag fields.
This is the recommended work flow.


I accept that. However, I've already long ago edited the tags of my entire collection and formatted the filename and parent directory names accordingly. I have no desire to do it again if I can avoid it, because doing so would only cause more problems. For example, some tracks (when in albums with various artists) include the artist field in the filename, while tracks from single-artist albums don't. There are other idiosyncrasies that together would make it a pretty laborious task to have to differentiate between them. All that I want to do now is standardise all fields to title case.

QUOTE (DetlevD @ Mar 17 2011, 21:07) *
Be aware that the file name resp. full path name needs additional housekeeping to become a valid file name, because there are forbidden characters in the file system.

DD.20110317.2108.CET


Correct me if I'm wrong, but doesn't mp3tag automatically remove illegal characters when formatting filenames?
yog-sothoth
post Mar 17 2011, 22:17
Post #28


Member


Group: Full Members
Posts: 64
Joined: 8-November 10
From: UK
Member No.: 13190
Mp3tag Version: 2.48



QUOTE (dano @ Mar 17 2011, 21:11) *
I don't recommend _ALL for the first action (the second is ok). It would change your filename extensions: .mp3 -> .Mp3
Well you could make an additional action to fix that.

You can use these actions on _DIRECTORY


The first action can be changed to a Format value action so it does not mess with the extension:

Action type: Format value
Field: _FILENAME
Formatstring: $regexp(%_filename%,'([-({\[\]}) _",./+&@:;*])(\l)',$1\u$2)


Ah, I see. So, just to recapitulate, here is the script in chronological order:

Action type: Replace with regular expression
Field: _TAG
Regular expression: ([-({\[\]}) _",./+&@:;*])(\l)
Replace matches with: $1$upper($2)
[x] case-sensitive comparison

Action type: Format value
Field: _FILENAME
Formatstring: $regexp(%_filename%,'([-({\[\]}) _",./+&@:;*])(\l)',$1\u$2)

Action type: Replace with regular expression
Field: _ALL
Regular expression: ^(\l)
Replace matches with: $upper($1)
[x] case-sensitive comparison

Followed by...

Action type: Replace with regular expression
Field: _ALL
Regular expression: (?<![/\-:;\(\)\[\]{}])\s(a|the|of|for|as|at|an|by|off|on|from|in|to|and|with|or|nor|von|de)(?=\s)(?!\s[\-\(\)\[\]{}])
Replace matches with: $lower($0)
[ ] case-sensitive comparison

Action type: Replace with regular expression
Field: _ALL
Regular expression: (^|\s|\(|\[|/)'(.{1})
Replace matches with: $1'$upper($2)
[ ] case-sensitive comparison


Please let me know if there's anything out of place here. Thanks dano.
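
As a cross-check of the small-words step, here is the same pattern transplanted into Python. This is only a sketch: it assumes the lookbehind and lookahead constructs behave the same way in both engines, uses `re.IGNORECASE` to mirror the unticked case-sensitive box, and the function name is hypothetical.

```python
import re

SMALL_WORDS = re.compile(
    r"(?<![/\-:;\(\)\[\]{}])\s"
    r"(a|the|of|for|as|at|an|by|off|on|from|in|to|and|with|or|nor|von|de)"
    r"(?=\s)(?!\s[\-\(\)\[\]{}])",
    re.IGNORECASE,
)

def lower_small_words(text: str) -> str:
    """Lower-case listed words unless they start the field, end it,
    follow punctuation such as ':' or precede a bracket or hyphen."""
    return SMALL_WORDS.sub(lambda m: m.group(0).lower(), text)

print(lower_small_words("The Lord Of The Rings"))
```

In this sketch the first and last words are never changed, because the pattern needs whitespace on both sides of the word.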
DetlevD
post Mar 18 2011, 03:37
Post #29


Member


Group: Full Members
Posts: 5031
Joined: 26-May 06
From: Wuppertal, Germany, Planet Earth
Member No.: 3194
Mp3tag Version: 2.63



QUOTE (yog-sothoth @ Mar 17 2011, 22:14) *
...... doesn't mp3tag automatically remove illegal characters when formatting filenames?

Hmm, yes, but it does not replace the illegal characters with legal ones; it only removes them.
In addition, the probability of obtaining duplicate filenames rises.

You may get "problems" with, e.g.:
- "AC/DC"
- "Harp Concerto in B-flat Major Op. 4 Nr. 6: 1: Andante-Allegro"
... but I am not totally sure at the moment whether this will be a problem for you.
You will lose the slash and the colon.
Check it out yourself.
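
The difference between removing and replacing can be sketched in a few lines of Python. The forbidden set below is the usual Windows list; the substitute characters and the function names are purely hypothetical, and the sketch does not claim to reproduce exactly what Mp3tag does.

```python
# Characters that Windows does not allow in file names.
FORBIDDEN = r'\/:*?"<>|'

# Hypothetical substitutes; pick whatever suits your naming scheme.
SUBSTITUTES = {"/": "-", ":": " -"}

def strip_forbidden(name: str) -> str:
    """Drop forbidden characters without putting anything in their place."""
    return "".join(ch for ch in name if ch not in FORBIDDEN)

def replace_forbidden(name: str) -> str:
    """Swap selected characters for legal stand-ins, then drop the rest."""
    for bad, good in SUBSTITUTES.items():
        name = name.replace(bad, good)
    return strip_forbidden(name)

print(strip_forbidden("AC/DC"))    # ACDC
print(replace_forbidden("AC/DC"))  # AC-DC
```

With plain removal, "AC/DC" and "ACDC" end up with the same name, which is where the risk of duplicate filenames comes from.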

DD.20110318.0338.CET

This post has been edited by DetlevD: May 13 2011, 17:45


Doug Mackie
post Mar 20 2011, 00:03
Post #30


Member


Group: Full Members
Posts: 37
Joined: 27-April 09
From: New Jersey, USA
Member No.: 9952
Mp3tag Version: 2.51



QUOTE (yog-sothoth @ Mar 16 2011, 20:42) *
... There's just no way a word correction list can account for every eventuality, unfortunately. Of course, these methods are not mutually exclusive - I also use a word list....

Hello Yog,

You may be interested to hear that I have followed your lead and am now using both methods. I now begin with a modified version of Dano's excellent example, followed by corrections, some using word lists. Some of the latter handle less common situations, such as source files that are all lowercase. The combination is working very well.

I did find that Dano's first regular expression could be simplified because many of the characters that he specified as word boundary markers are never used that way in the material that I work with. He used 19 markers:
([-({\[\]}) _",./+&@:;*])(\l)
I find that just seven are enough for my file names:
([-({ _.+])(\l)
and nine are enough for my artist, album, and title tags:
([-({ _.+\x{201c}"])(\l)
The square brackets are omitted as boundaries because I reserve those for my comments, which are always lowercase. The Unicode reference is to left curly quotation marks (“).

At first, I thought that more markers might be needed, but so far not. Having fewer markers speeds up the scripts, since every character in the source string must be compared with every boundary marker.
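
If you want to experiment with different marker sets before editing the action, a small Python helper can build the character class from a plain string of markers. This is a sketch under the same assumptions as before (`[a-z]` in place of `\l`, Python's `re` in place of Mp3tag's engine); `make_capitalizer` is a hypothetical name, and the sketch says nothing about speed inside Mp3tag.

```python
import re

def make_capitalizer(markers: str):
    """Build a function that upper-cases a lowercase letter after any marker."""
    pattern = re.compile("([" + re.escape(markers) + "])([a-z])")
    return lambda text: pattern.sub(lambda m: m.group(1) + m.group(2).upper(), text)

full_set = make_capitalizer('-({[]}) _",./+&@:;*')  # the 19 markers
file_set = make_capitalizer("-({ _.+")              # the 7 markers for file names
tag_set = make_capitalizer('-({ _.+\u201c"')        # the 9 markers for tags

print(full_set("one [two] (three)"))  # one [Two] (Three)
print(file_set("one [two] (three)"))  # one [two] (Three)
```

With the shorter sets, text inside square brackets stays lowercase, which matches the convention of reserving them for lowercase comments.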

I've tweaked and refined other elements in my scripts, and have updated the zip file: Title Case Scripts

I also fixed a bug in the Latinisms sections of my previously posted scripts. There was an unescaped period that could cause the misspelled contraction "Ive" to be forced to lowercase.

Cheers,
Doug

This post has been edited by Doug Mackie: Mar 20 2011, 23:00