
SOLVED: Separate Title string with no spaces into words

Matt McManis:

I want to find and separate words in a title that has no spaces.

Before

ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]"Test"'Test'

After

This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'


I'm looking for a regex that can do the following.

My idea is to treat each uppercase letter as the start of a new word.

It should also preserve ALL UPPERCASE words, so they are not spaced into A L L U P P E R C A S E.

Additional rules:

  • Add a space where a letter touches a number: Hello2019World → Hello 2019 World
  • Do not space initials that contain periods, hyphens, or underscores: T.E.S.T.
  • Do not add spaces inside brackets, parentheses, or quotes: [Test] (Test) "Test" 'Test'
  • Preserve hyphens: Hello-World
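
One way to read the rules above is as a tokenizer rather than a single split pattern: match each protected unit (bracketed or quoted text, dotted initials, hyphenated words, digit runs) as one token, then join the tokens with spaces. The following is only a minimal sketch under that assumption, not the accepted answer; the names TitleSpacer, Space, Word, and Token are hypothetical and chosen for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original post.
public static class TitleSpacer
{
    // One "word": an ALL-CAPS run that is not followed by a lowercase letter,
    // or a lowercase run with an optional leading capital.
    private const string Word = @"(?:[A-Z]+(?![a-z])|[A-Z]?[a-z]+)";

    // Alternatives are tried left to right, so protected units come first.
    private static readonly Regex Token = new Regex(
        @"\([^)]*\)|\[[^\]]*\]|\{[^}]*\}" +        // (..), [..], {..} kept whole
        @"|""[^""]*""|'[^']*'" +                    // "..." and '...' kept whole
        @"|(?:[A-Za-z]\.){2,}" +                    // dotted initials such as T.E.S.T.
        @"|" + Word + @"(?:[-_]" + Word + @")*" +   // words, optionally joined by - or _
        @"|\d+" +                                   // digit runs
        @"|\S");                                    // assumption: any other character stands alone

    // Returns the title with a single space between tokens.
    public static string Space(string title)
    {
        var tokens = Token.Matches(title).Cast<Match>().Select(m => m.Value);
        return string.Join(" ", tokens);
    }
}
```

With the example title from this question, TitleSpacer.Space returns: This Is An Example Title HELLO-WORLD 2019 T.E.S.T. (Test) [Test] "Test" 'Test'. The sketch does not handle unbalanced quotes or apostrophes inside words, which fall through to the last alternative.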

C#

https://rextester.com/GAZJS38767

// Requires: using System; using System.Linq; using System.Text.RegularExpressions;

// Title without spaces
string title = "ThisIsAnExampleTitleHELLO-WORLD2019T.E.S.T.(Test)[Test]\"Test\"'Test'";

// Split at the zero-width positions where a new word should start
string[] split = Regex.Split(title, "(?<!^)(?=(?<![.\\-'\"([{])[A-Z][\\d+]?)");

// Trim each word of extra spaces before joining
split = split.Select(e => e.Trim()).ToArray();

// Join into new title
string newtitle = string.Join(" ", split);

// Display
Console.WriteLine(newtitle);


Regex

I'm having trouble with spacing before the numbers, brackets, parentheses, and quotes.

https://regex101.com/r/9IIYGX/1

(?<!^)(?=(?<![.\-'"([{])(?<![A-Z])[A-Z][\d+?]?)



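(?<!^)          // negative lookbehind: not at the start of the string
(?=             // positive lookahead: test the position without consuming characters
(?<![.\-'"([{]) // ignore if preceded by punctuation or an opening bracket or quote
(?<![A-Z])      // ignore if preceded by another uppercase letter
[A-Z]           // the next character is an uppercase letter (the space goes before it)
[\d+]?          // intended: space after a number (as written: an optional digit or literal "+")
)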
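
To see what the lookarounds in the breakdown above do in isolation, here is a small hedged sketch that strips the pattern down to its uppercase handling. The class and method names (SplitDemo, SplitBeforeUppercase, SplitBeforeFirstUppercase) are hypothetical and not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not from the original post.
public static class SplitDemo
{
    // Split before every uppercase letter, except at the start of the string.
    public static string[] SplitBeforeUppercase(string s)
    {
        return Regex.Split(s, @"(?<!^)(?=[A-Z])");
    }

    // Same, but skip uppercase letters that directly follow another uppercase letter.
    public static string[] SplitBeforeFirstUppercase(string s)
    {
        return Regex.Split(s, @"(?<!^)(?=(?<![A-Z])[A-Z])");
    }
}
```

For "HelloWORLD", the first method yields Hello, W, O, R, L, D, while the second yields Hello, WORLD. For "Hello2019", neither method splits at all, because the lookahead only ever accepts a position directly before [A-Z]; a digit can never start a new word with this structure, which matches the trouble with numbers described above.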



Posted in S.E.F
via StackOverflow & StackExchange Atomic Web Robots