Here you can find the source of normalizeWhitespace(String text)
Parameter | Description |
---|---|
text | the string whose whitespace should be normalized |
public static String normalizeWhitespace(String text)
//package com.java2s;
/*
 * Copyright 2016
 * Ubiquitous Knowledge Processing (UKP) Lab
 * Technische Universität Darmstadt
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

public class Main {
    /**
     * Translates multiple whitespace characters into a single space character.
     * If there is at least one new line character, the chunk is replaced by a
     * single LF (Unix new line) character.
     *
     * @param text the text to normalize
     * @return the text with normalized whitespace
     */
    public static String normalizeWhitespace(String text) {
        // convert CRLF and lone CR line endings to LF
        text = text.replaceAll("(\r\n|\r)", "\n");

        //remove multiple white spaces but keep new lines
        text = text.replaceAll("(?:(?![\n])\\s+)", " "); // or [\\s+&&[^\n])]

        //replace extra <br> (sometimes the paragraph contains <br><br>,
        //the first one will be used as new paragraph marker but the second
        //one must be removed)
        text = text.replaceAll("<br>", "");

        return text;
    }
}
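The Javadoc describes a stricter rule than the lookahead pattern enforces. `(?![\n])` only stops a match from starting at a line feed, so a run such as `" \n "` in `"a \n b"` still collapses to a single space. The following is a minimal sketch of one way to implement the documented rule literally. It is not the original authors' code: the class name `WhitespaceSketch`, the method name `normalizeStrict`, and both patterns are assumptions made for illustration, and the `<br>` removal step is left out.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original source.
final class WhitespaceSketch {

    // A whitespace run that contains at least one LF:
    // optional non-LF whitespace, one LF, then any further whitespace.
    private static final Pattern RUN_WITH_NEWLINE =
            Pattern.compile("[\\s&&[^\n]]*\n\\s*");

    // A run of whitespace characters other than LF.
    private static final Pattern RUN_WITHOUT_NEWLINE =
            Pattern.compile("[\\s&&[^\n]]+");

    static String normalizeStrict(String text) {
        // Unify line endings first so only LF has to be handled below.
        String result = text.replaceAll("\r\n|\r", "\n");
        // Runs containing a line break become a single LF.
        result = RUN_WITH_NEWLINE.matcher(result).replaceAll("\n");
        // Remaining runs (no line break) become a single space.
        result = RUN_WITHOUT_NEWLINE.matcher(result).replaceAll(" ");
        return result;
    }

    public static void main(String[] args) {
        String out = normalizeStrict("first  line \r\n\t second\tline");
        // Show the line break as a visible escape sequence.
        System.out.println(out.replace("\n", "\\n")); // prints: first line\nsecond line
    }
}
```

Note that this variant also collapses blank lines (`"a\n\nb"` becomes `"a\nb"`), which the original method leaves intact, so it only suits input where consecutive line breaks carry no meaning.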