Notepad++: Use regular expressions to clean text with ease

Today I need to write some CMS documentation for an SDL Tridion implementation. As a programmer, I should be lazy and use the easiest method to get things done. Tridion uses XML schemas to describe the content data, and I need to document the field names. The XML looks like this:

<tcm:Label ElementName="main_title" Metadata="false">Title</tcm:Label>
<tcm:Label ElementName="start_intro_title" Metadata="false">Start intro title</tcm:Label>
<tcm:Label ElementName="start_intro_paragraph" Metadata="false">Start intro paragraph</tcm:Label>

The schema has many more elements than the example above (42 fields, one per line), but it illustrates that each line contains a field name (the text between the opening and the closing tag). I would like to filter out all the other data, so that only this remains:

Title
Start intro title
Start intro paragraph

I could do this by hand, but search and replace with regular expressions is the better way! My favorite open source text editor is Notepad++, but it should work more or less the same in other editors.

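Open the Replace dialog (Ctrl+H) and make sure that Wrap around is checked and the Regular expression search mode is selected. If your Notepad++ version has a ". matches newline" option, leave it unchecked so that .* stops at the end of each line. Enter .*>(.*)<.* in Find what and \1 in Replace with, then run Replace All. The regular expression matches every whole line and captures the text between a ‘>’ and a ‘<’ character in a group. Because .* is greedy, that group ends up being the text between the end of the opening tag and the start of the closing tag. The replacement \1 refers to this first group, so each line is replaced by its label text.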
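If you would rather script this than click through the dialog, the same find and replace can be sketched in Python. This is only an illustration using the three sample lines from above; the variable names (xml, pattern, labels) are my own and have nothing to do with Notepad++ or Tridion.

```python
import re

# Sample lines copied from the schema excerpt in this post.
xml = """\
<tcm:Label ElementName="main_title" Metadata="false">Title</tcm:Label>
<tcm:Label ElementName="start_intro_title" Metadata="false">Start intro title</tcm:Label>
<tcm:Label ElementName="start_intro_paragraph" Metadata="false">Start intro paragraph</tcm:Label>
"""

# Same pattern as in the Find what box. By default "." does not match a
# newline in Python, so every match stays within a single line.
pattern = re.compile(r".*>(.*)<.*")

# r"\1" refers to the first captured group, like \1 in the Replace with box.
labels = pattern.sub(r"\1", xml)
print(labels)
```

For the sample above this should print the three label texts, one per line. Note that the pattern is deliberately loose: it only works because every line holds exactly one element with plain text inside it.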

This is just one simple example; you can find more at Mark’s speechblog.

Hope this helps,