
SOLVED: Separate Title string with no spaces into words

Matt McManis:

I want to find and separate the words in a title that has no spaces.

Before:
ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]"Test"'Test'

After:
This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'

I'm looking for a Regex rule that can do the following.

My idea is to treat each uppercase letter as the start of a new word.

But ALL UPPERCASE words should be preserved, so they are not spaced into A L L U P P E R C A S E.

Additional rules:

  • Space a letter if it touches a number: Hello2019World -> Hello 2019 World
  • Do not space initials that contain periods, hyphens, or underscores: T.E.S.T.
  • Do not space text between brackets, parentheses, or quotes: [Test] (Test) "Test" 'Test'
  • Preserve hyphens: Hello-World
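
The rules above can be sketched as a token matcher rather than a split. This is only one possible reading of the rules, not a confirmed answer: it swaps Regex.Split for Regex.Matches, and the names TitleSpacer, AddSpaces, Word, and Token are hypothetical. It assumes quotes and brackets are balanced and not nested, and any character that fits no rule falls through as a single-character token.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TitleSpacer   // hypothetical helper, not from the question
{
    // A "word": an all-uppercase run that does not swallow the first letter
    // of a following capitalized word, or an optionally capitalized lowercase run.
    private const string Word = @"(?:[A-Z]+(?![a-z])|[A-Z]?[a-z]+)";

    // Alternatives are tried in order, so protected spans come first.
    private static readonly Regex Token = new Regex(
        @"""[^""]*""|'[^']*'"                  // text in double or single quotes
      + @"|\([^)]*\)|\[[^\]]*\]|\{[^}]*\}"     // text in (), [] or {}
      + @"|(?:[A-Za-z]\.){2,}"                 // initials with periods: T.E.S.T.
      + @"|\d+"                                // a run of digits
      + "|" + Word + "(?:[-_]" + Word + ")*"   // words, kept together across - or _
      + @"|\S");                               // fallback: any other single character

    // Collect the tokens left to right and join them with single spaces.
    public static string AddSpaces(string title) =>
        string.Join(" ", Token.Matches(title).Cast<Match>().Select(m => m.Value));
}

static class Program
{
    static void Main()
    {
        string title = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]\"Test\"'Test'";
        // Prints: This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'
        Console.WriteLine(TitleSpacer.AddSpaces(title));
    }
}
```

Matching tokens avoids having to decide whether a quote is opening or closing, which is hard to express with zero-width lookarounds alone. A known limitation of this sketch: an apostrophe inside a word (Don'tStop) has no closing quote and would be split off as its own token.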



using System;
using System.Linq;
using System.Text.RegularExpressions;

// Title without spaces
string title = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]\"Test\"'Test'";

// Detect where to space words
string[] split = Regex.Split(title, "(?<!^)(?=(?<![.\\-'\"([{])[A-Z][\\d+]?)");

// Trim each word of extra spaces before joining
split = (from e in split
         select e.Trim()).ToArray();

// Join into new title
string newtitle = string.Join(" ", split);

// Display
Console.WriteLine(newtitle);

I'm having trouble with spacing before the numbers, brackets, parentheses, and quotes.



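(?<!^)          // negative lookbehind: do not split at the start of the string

(?=             // positive lookahead: split only where the following matches

(?<![.\-'"([{]) // do not split if the previous character is punctuation
(?<![A-Z])      // do not split if the previous character is an uppercase letter
[A-Z]           // an uppercase letter starts a new word
[\d+]?          // optionally followed by one digit or a literal "+"

)               // end of lookahead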
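
One detail in the breakdown above may be worth checking: inside square brackets, + is a literal character rather than a quantifier, so [\d+]? matches at most one character (a digit or a plus sign), not a run of digits. A minimal sketch of that behavior, with a hypothetical class name used only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

static class CharClassDemo   // hypothetical name, for illustration only
{
    static void Main()
    {
        // Inside a character class, '+' is a literal plus sign.
        Console.WriteLine(Regex.IsMatch("+", @"^[\d+]$"));      // True

        // So [\d+] consumes a single character, not the whole number.
        Console.WriteLine(Regex.Match("2019", @"[\d+]").Value); // 2

        // Outside the class, '+' is the "one or more" quantifier.
        Console.WriteLine(Regex.Match("2019", @"\d+").Value);   // 2019
    }
}
```

Because the lookahead only requires an uppercase letter, nothing in the pattern creates a split point before a digit, bracket, parenthesis, or quote.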

