Here you can find the source of stripAccents(final String input)
Removes diacritics (~= accents) from a string.
| Parameter | Description |
|---|---|
| input | String to be stripped |
static String stripAccents(final String input)
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.text.Normalizer;
import java.util.regex.Pattern;

public class Main {
    private static final String EMPTY = "";

    // Compiled once and reused; matches runs of combining marks (U+0300..U+036F).
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+"); //$NON-NLS-1$

    /**
     * <p>Removes diacritics (~= accents) from a string. The case will not be altered.</p>
     * <p>For instance, '&agrave;' will be replaced by 'a'.</p>
     * <p>Note that ligatures will be left as is.</p>
     *
     * <pre>
     * StringUtils.stripAccents(null)         = null
     * StringUtils.stripAccents("")           = ""
     * StringUtils.stripAccents("control")    = "control"
     * StringUtils.stripAccents("&eacute;clair")     = "eclair"
     * </pre>
     *
     * @param input String to be stripped
     * @return input text with diacritics removed
     *
     * @since 3.0
     */
    // See also Lucene's ASCIIFoldingFilter (Lucene 2.9) that replaces accented characters by their unaccented equivalent (and uncommitted bug fix: https://issues.apache.org/jira/browse/LUCENE-1343?focusedCommentId=12858907&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12858907).
    static String stripAccents(final String input) {
        if (input == null) {
            return null;
        }
        // NFD splits each accented character into a base character plus combining marks.
        final StringBuilder decomposed =
                new StringBuilder(Normalizer.normalize(input, Normalizer.Form.NFD));
        convertRemainingAccentCharacters(decomposed);
        // Note that this doesn't correctly remove ligatures...
        return DIACRITICS.matcher(decomposed).replaceAll(EMPTY);
    }

    // Handles the L with stroke (U+0141, U+0142), which NFD does not decompose.
    private static void convertRemainingAccentCharacters(StringBuilder decomposed) {
        for (int i = 0; i < decomposed.length(); i++) {
            if (decomposed.charAt(i) == '\u0141') {
                decomposed.setCharAt(i, 'L');
            } else if (decomposed.charAt(i) == '\u0142') {
                decomposed.setCharAt(i, 'l');
            }
        }
    }
}
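The short demo below is a sketch of how the method behaves on a few inputs. It is not part of the original source. The class name `StripAccentsDemo` is hypothetical, and the method body is a condensed copy of the same approach (NFD normalization, the two L-with-stroke replacements, then removal of combining marks) so that the example compiles on its own. Non-ASCII characters are written as Unicode escapes to avoid depending on the source file encoding.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class StripAccentsDemo {
    // Matches runs of combining marks in the U+0300..U+036F block.
    private static final Pattern DIACRITICS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    // Condensed version of the approach in the listing above (an illustration, not the original code).
    static String stripAccents(String input) {
        if (input == null) {
            return null;
        }
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD)
                .replace('\u0141', 'L')   // capital L with stroke has no NFD decomposition
                .replace('\u0142', 'l');  // small l with stroke, likewise
        return DIACRITICS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        String word = "\u00e9clair"; // "eclair" with an acute accent on the first letter
        String nfd = Normalizer.normalize(word, Normalizer.Form.NFD);

        // NFD turns the accented letter into 'e' followed by U+0301, so the string grows by one char.
        System.out.println(word.length() + " -> " + nfd.length()); // 6 -> 7

        System.out.println(stripAccents(word));                   // eclair
        System.out.println(stripAccents("\u0141\u00f3d\u017a"));  // Lodz

        // The oe ligature (U+0153) has no decomposition, so it passes through unchanged.
        System.out.println("\u0153uvre".equals(stripAccents("\u0153uvre"))); // true
    }
}
```

The ligature case shows the limitation noted in the Javadoc: only characters that decompose into a base letter plus combining marks, along with the two explicitly handled L-with-stroke characters, are affected.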