Case conversion...

Howdy. I've stumbled across this Action script that formats all fields to title case. It works very well, except that I would like it to ignore words that are upper-case. For instance, words (or acronyms without periods) like UFO are converted to Ufo. I've left a message for the script's author but have not yet had a reply, so I thought I'd ask the geniuses over here. :wink:

So, any ideas?

Also, I'm wondering what similar scripts others here are using, so alternative suggestions are welcome. Thanks!


Update: What began as a humble effort to format my mp3 tags to title case has become a bit of an odyssey. With perseverance and a lot of help from the wonderful people that frequent this forum, I have assembled a script that is intended as a comprehensive solution to the problem of standardising, grammatically speaking, all text fields of a digital music collection - including filenames and parent directories.
Presenting... Grammartron: Smart Case and Grammar Restorer
Features include...*
  • Trim trailing, preceding and extra spaces
  • Enforce correct spacing between words and punctuation
  • Add missing apostrophes to word contractions
  • True title case conversion (upper-case words and letters are preserved)
  • CamelCasing of common Scottish and Irish names
  • Upper-case Roman numerals (up to LXXIX)
Notes
  1. Filenames: Track numbers should ideally be separated from the title by a boundary (e.g. "-"), for instance "%track% - %title%.ext". This avoids (non-critical) problems arising from the title case function. Edit for clarification: articles, conjunctions, etc. (e.g. "the", "and") are usually lower-case, unless they are the first or last word in the group. In the example "01 The End.mp3", "The", although it is the first word of the title, is treated as the second word after "01", because all alphanumeric characters are treated equally. Thus we end up with "01 the End.mp3". To avoid this, use a non-space word boundary between the track number and title.
  2. It is advisable to run the "Directory Names" script on its own, in a separate process. The "Tags" and "Filenames" scripts can be used together if desired.
  3. This script adds missing apostrophes to word contractions. Some CD burning software (Nero 9 and earlier versions in particular) is reported to have problems handling apostrophised words, which may cause programme instability or aborted burning runs.
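
To illustrate note 1, here is a rough Python sketch of the "first or last word in the group" rule. It is not taken from the Grammartron source: the function names, the short MINOR_WORDS list and the choice of " - " as the only group boundary are all assumptions made for the example.

```python
import re

# Illustrative list only; the real script's word list is not reproduced here.
MINOR_WORDS = {"a", "an", "the", "and", "or", "of", "in"}

def title_case_group(group: str) -> str:
    """Title-case one group; minor words stay lower-case unless first or last."""
    words = group.split(" ")
    last = len(words) - 1
    out = []
    for i, word in enumerate(words):
        if word.lower() in MINOR_WORDS and 0 < i < last:
            out.append(word.lower())
        else:
            # Upper-case the first character, leave the rest as typed.
            out.append(word[:1].upper() + word[1:])
    return " ".join(out)

def title_case(text: str) -> str:
    """Split on ' - ' boundaries (assumed) and title-case each group separately."""
    parts = re.split(r"(\s+-\s+)", text)
    # With a capturing group, odd indices hold the separators themselves.
    return "".join(p if i % 2 else title_case_group(p) for i, p in enumerate(parts))

print(title_case("01 The End.mp3"))    # 01 the End.mp3
print(title_case("01 - The End.mp3"))  # 01 - The End.mp3
```

Without the boundary, "01" counts as the first word, so "The" is lower-cased; with the boundary, the title starts a new group and "The" keeps its capital.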
Acknowledgements

This script would not have been possible without the help of certain people. Special thanks go to Liquid Parallax, whose work formed the basis for this script, and to Doug Mackie, whose script (word contractions) I also ripped off. :ph34r: Thanks guys, you rock!

*For the complete list of functions, please refer to the descriptions contained within the source code by opening the files in a text editor.

Grammartron_v1.0.zip (8.06 KB)

To save already upcased letters from title casing you can use the function $caps2.

DD.20110309.1430.CET

Oh good, but how do I integrate that into the script? Thanks.

Here are my two title-case MTA files: Title Case Action Files (Mackie) (link updated 22 April 2016)

They add a series of corrections after running Mp3tag's Mixed Case function. Included are editable lists of abbreviations and acronyms that should be upper-case. If you open the MTA files in Notepad, you will see comments that describe each element in the action.

Note that my tag field actions cover just three fields and are slightly different for each field. That is by design but of course you can edit them as needed.

My regards and thanks to the posters here (including DetlevD) for the examples and explanations that I drew upon when creating these files.

Doug Mackie

Thanks, I'll give it a go. With regards to my other query, would you have an idea of how to achieve this, either using your script or the one I provided? Cheers.

Yog-Sothoth, you are quite welcome.

Your action starts with the supplied Mixed Case function, as does mine. At that point, your abbreviations have already lost capitalization after the first letter. That is why in my script I added a list of abbreviations to correct the errors. If you were writing a title-case script from scratch (i.e., not using Mixed Case), then $caps2() would be exactly what you want. My guess is that is what DetlevD meant, but of course we should let him speak for himself.
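
The difference between the two approaches can be sketched in Python. These functions are only stand-ins for the behaviour described in this thread, not Mp3tag's actual implementation; in particular, treating whitespace as the only word boundary is an assumption of the sketch.

```python
import re

def mixed_case(text: str) -> str:
    """Upper-case the first letter of each word and lower-case the rest."""
    return re.sub(
        r"\S+",
        lambda m: m.group(0)[:1].upper() + m.group(0)[1:].lower(),
        text,
    )

def caps2_like(text: str) -> str:
    """Upper-case the first letter of each word and leave the rest untouched."""
    return re.sub(
        r"\S+",
        lambda m: m.group(0)[:1].upper() + m.group(0)[1:],
        text,
    )

print(mixed_case("the UFO incident"))  # The Ufo Incident
print(caps2_like("the UFO incident"))  # The UFO Incident
```

Once the first function has run, the information that "UFO" was upper-case is gone, which is why a correction list is needed afterwards.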

I did not attempt to write my script from scratch because it was much easier to let Florian's built-in Mixed Case function do the heavy-lifting, so to speak, and then to apply corrections afterwards.

Doug Mackie

Doug, your collection of 'cleaning and repair tools' looks very impressive, thank you for sharing it!

DD.20110309.2225.CET

Cheers Doug, those scripts are very good. Thanks for your time.

Ok, I've done a little digging and found a solution. All credit to Stevest.

Action: Replace with regular expression
Field: _ALL
Regular expression: ^(.*)$
Replace matches with: $caps2($1)
[ ] case-sensitive comparison

I just replaced the Mixed Case action with this. Works like a charm. :laughing:

EDIT: THERE IS A PROBLEM WITH THIS SCRIPT. READ ON BEFORE IMPLEMENTING!

What I said above in post#2.

DD.20110314.1908.CET

Sorry, co-credit to DetlevD too. :sunglasses:

Thanks. :music:

DD.20110314.1912.CET

Update: There is a bug in the aforementioned action. If a comma is present anywhere in the field, all characters after the comma are truncated. For example, the title "Harp concerto in B-flat major, op. 4 Nr. 6: 1: Andante-Allegro" becomes "Harp ConCerto in B-Flat MaJor" after conversion. If I remove the comma beforehand, it converts to "Harp Concerto in B-flat Major Op. 4 Nr. 6: 1: Andante-Allegro", which is normal.

Humph, I guess I'm back to square one again. :frowning: Any ideas what is going wrong, anyone?

BTW, this is the offending action, just to avoid confusion:

Action: Replace with regular expression
Field: _ALL
Regular expression: ^(.*)$
Replace matches with: $caps2($1)
[ ] case-sensitive comparison

I can confirm the comma problem but I can't explain it. Perhaps someone else can.

However, that action has other problems. It preserves upper-case errors in the source text, which must then be fixed by hand. More important, unlike "Mixed Case", "Replace with Regular Expression" lacks the option for custom word boundary markers. So you must then add actions to correct the resulting errors after punctuation, brackets, and so on. Perhaps the regular expression could be rewritten to minimize the effects of this limitation, but it would be a challenge. The number of abbreviations and acronyms in my titles is limited, so in my scripts I prefer a "Mixed Case" action followed by lists with corrections. There are no problems with commas.
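
As a rough illustration of the "Mixed Case followed by lists with corrections" idea, here is a Python sketch. The abbreviation list and the function names are hypothetical, and the mixed-case step only approximates what the built-in action does (it splits on whitespace only).

```python
import re

# Hypothetical correction list; the real MTA files keep their own editable lists.
ABBREVIATIONS = ["UFO", "AC/DC"]

def mixed_case(text: str) -> str:
    """Upper-case the first letter of each word and lower-case the rest."""
    return re.sub(
        r"\S+",
        lambda m: m.group(0)[:1].upper() + m.group(0)[1:].lower(),
        text,
    )

def apply_corrections(text: str, abbreviations: list) -> str:
    """Restore the listed abbreviations wherever they appear as whole words."""
    for abbr in abbreviations:
        # Match the abbreviation in any case, but not inside a longer word.
        pattern = r"(?<!\w)" + re.escape(abbr) + r"(?!\w)"
        text = re.sub(pattern, lambda _m, a=abbr: a, text, flags=re.IGNORECASE)
    return text

cased = mixed_case("AC/DC and the UFO")
print(cased)                                    # Ac/dc And The Ufo
print(apply_corrections(cased, ABBREVIATIONS))  # AC/DC And The UFO
```

The trade-off is the one discussed in this thread: the list only fixes what it contains, so it suits collections with a limited set of abbreviations.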

Unless I've misinterpreted you, I believe the other scripts in the action group (see first post) should correct these problems. I could be wrong on that though, so please tell me if you think I'm mistaken.

Granted, but I'm not so fortunate. I have quite a large collection (around 70k tracks), a great deal of which has upper-case words that would be incorrectly formatted with the mixed case action.

Yes, it looks like a bug in the action's parameter parsing, or rather in the chaining of the regexp with any additional function in the replacement parameter.

You can work around this buggy behaviour by using the action this way:

Action: Replace with regular expression
Field: _ALL
Regular expression: ([^,]*)
Replace matches with: $caps2($1)
case-sensitive comparison
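
For anyone who wants to see the shape of this workaround outside Mp3tag, here is a small Python sketch. It does not reproduce or explain the Mp3tag bug; it only shows the same idea of applying the function to each comma-free run, so the commas themselves are never passed to it. The helper names are made up, and caps2_like is an approximation of $caps2 that treats whitespace as the only word boundary.

```python
import re

def caps2_like(text: str) -> str:
    """Upper-case the first letter of each word and leave the rest untouched."""
    return re.sub(
        r"\S+",
        lambda m: m.group(0)[:1].upper() + m.group(0)[1:],
        text,
    )

def caps2_per_segment(text: str) -> str:
    """Apply caps2_like to every run of non-comma characters."""
    # Same pattern as in the action above; commas fall between matches
    # and are copied through unchanged. Empty matches map to "".
    return re.sub(r"([^,]*)", lambda m: caps2_like(m.group(1)), text)

print(caps2_per_segment("harp concerto, op. 4"))  # Harp Concerto, Op. 4
```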

DD.20110317.0007.CET

Yes, how someone uses the Mp3tag toolbox is always a mix of personal needs, skills and knowledge of how to find the best way to do something in Mp3tag.

Personally, I dislike using the pseudo tag field _ALL, because it may corrupt the content of many other tag fields, such as UNSYNCEDLYRICS, COMMENT and so on.
I prefer to focus on the tag field that needs changing.
And because you can combine as many actions as needed into one action group (or create several action groups), each problem remains a single step that can be moved to another place in the workflow, removed, or replaced by another solution without affecting other tasks.

DD.20110317.0031.CET

Thank you DetlevD, that seems to have fixed the problem alright. Well done that man! applause

Of course, running Mixed Case alone is not enough. That is why my scripts have so many additional actions. Granted, I work mostly with pop and jazz titles, but the basic idea is the same. I took your Harp Concerto title above and ran my tag action on it, and it came out perfectly except for "Op." which came out "OP." That happened because OP is in my abbreviation list for "Out of Print".

Actually, as a Latinism, I would expect that opus and op. should be lower-case in music titles. That is an example of why I use word lists: they are good for catching errors made by others :slight_smile:

Cheers,
Doug

Hi Doug, I agree with what you say, except for one caveat. Most of these upper-case words happen to be in the artist field. So, AC/DC (the correct format) becomes Ac/Dc, 10CC -> 10cc, etc... Then there are acronyms that should be interspersed with stops but aren't, and so end up being formatted to title case. There's just no way a word correction list can account for every eventuality, unfortunately. Of course, these methods are not mutually exclusive - I also use a word list. Cheers for the script though. I've merged some parts of it with mine and it's working very well.