135 votes

Efficiently replace all accented characters in a string?

For a poor man's implementation of near-collation-correct sorting on the client side, I need a JavaScript function that does efficient single-character replacement in a string.

Here is what I mean (note that this applies to German text; other languages sort differently):

native sorting gets it wrong: a b c o u z ä ö ü
collation-correct would be:   a ä b c o ö u ü z

Basically, I need all occurrences of "ä" in a given string replaced with "a" (and so on). This way the result of native sorting would be very close to what a user would expect (or what a database would return).

Other languages have facilities for this: Python supplies str.translate(), Perl has tr/…/…/, XPath has a translate() function, and ColdFusion has ReplaceList(). But what about JavaScript?

Here is what I have right now.

// s would be a rather short string (something like 
// 200 characters at max, most of the time much less)
function makeSortString(s) {
  var translate = {
    "ä": "a", "ö": "o", "ü": "u",
    "Ä": "A", "Ö": "O", "Ü": "U"   // probably more to come
  };
  var translate_re = /[öäüÖÄÜ]/g;
  return ( s.replace(translate_re, function(match) { 
    return translate[match]; 
  }) );
}

First of all, I don't like that the regex is rebuilt on every function call. I guess a closure can help in this regard, but for some reason I don't seem to get the hang of it.

Can anyone think of something more efficient?
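
The closure idea mentioned above could look roughly like this. It is only a sketch: the mapping is the six-entry table from the question, and deriving the regex from the table's keys is an assumption made here so the two cannot drift apart.

```javascript
// Sketch: build the lookup table and the regex once, inside a closure.
var makeSortString = (function () {
  var translate = {
    "ä": "a", "ö": "o", "ü": "u",
    "Ä": "A", "Ö": "O", "Ü": "U"
  };
  // Derive the character class from the table keys (none of them is a
  // regex metacharacter, so no escaping is needed for this table).
  var translateRe = new RegExp("[" + Object.keys(translate).join("") + "]", "g");

  // The returned function closes over `translate` and `translateRe`,
  // so neither is rebuilt on later calls.
  return function (s) {
    return s.replace(translateRe, function (match) {
      return translate[match];
    });
  };
})();

console.log(makeSortString("Übergröße")); // "Ubergroße" ("ß" is not in the table)
```

The outer function runs exactly once; every later call only performs the replace.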


The answers below fall into two groups:

  1. String replacement functions of varying degrees of completeness and efficiency (what I was originally asking about)
  2. A late mention of String#localeCompare, which is now widely supported among JS engines (not so much at the time of the question) and could solve this category of problem much more elegantly.
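
For the second group, a minimal sketch of sorting with String#localeCompare might look like this. The locale tag "de" is an assumption matching the German example in the question, and the result depends on the collation data the engine ships with.

```javascript
// Sketch: compare with String#localeCompare instead of rewriting the strings.
var letters = ["a", "b", "c", "o", "u", "z", "ä", "ö", "ü"];

// Default sort compares UTF-16 code units, so the umlauts end up last.
var nativeOrder = letters.slice().sort();

// Locale-aware comparison; "de" is an assumed locale tag.
var germanOrder = letters.slice().sort(function (a, b) {
  return a.localeCompare(b, "de");
});

console.log(nativeOrder.join(" ")); // a b c o u z ä ö ü
console.log(germanOrder.join(" ")); // a ä b c o ö u ü z
```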

12 votes

You are wrong in your assumption that a user expects "ä" to sort with "a". The Swedish alphabet has 29 letters: abcdefghijklmnopqrstuvwxyzåäö, and so does the Danish/Norwegian one: abcdefghijklmnopqrstuvwxyzæøå. The expected order is: "Apelsin", "Banan", "Äpple".
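
The locale dependence described in this comment can be illustrated with Intl.Collator. This is a sketch, not part of the original thread: `sortFor` is a hypothetical helper, and the Swedish result assumes the engine has Swedish collation data available.

```javascript
// Sketch: the same list sorted under two different locales.
var words = ["Banan", "Äpple", "Apelsin"];

// Hypothetical helper: returns a sorted copy for the given locale tag.
function sortFor(locale) {
  var collator = new Intl.Collator(locale);
  return words.slice().sort(collator.compare);
}

// Swedish treats "ä" as a separate letter near the end of the alphabet.
console.log(sortFor("sv").join(", ")); // Apelsin, Banan, Äpple

// German treats "ä" as a variant of "a".
console.log(sortFor("de").join(", ")); // Apelsin, Äpple, Banan
```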

1 vote

I know. The solution was meant for sorting German text. Even there it is not correct, but it is good enough for the use case. This question was never meant as a search for the "solves all problems" algorithm.

1 vote

I've reworded the question a bit to make that clear from the start.

0 votes

eddyP23 5464 points

For those who use TypeScript and those who don't want to touch the String prototype, here is a TypeScript version of Ed.'s answer:

    // Usage example:
    "Some string".replace(/[^a-zA-Z0-9-_]/g, char => ToLatinMap.get(char) || '')

    // Map:
    export let ToLatinMap: Map<string, string> = new Map<string, string>([
        ["Á", "A"],
        ["", "A"],
        ["", "A"],
        ["", "A"],
        ["", "A"],
        ["", "A"],
        ["", "A"],
        ["", "A"],
        ["Â", "A"],
        ["", "A"],
        ["", "A"],
        ["", "A"],
        ["", "A"],
        ["", "A"],
        ["Ä", "A"],
        ["", "A"],
        ["", "A"],
        ["", "A"],
        ["", "A"],
        ["", "A"],
        ["À", "A"],
        ["", "A"],
        ["", "A"],
        ["", "A"],
        ["", "A"],
        ["Å", "A"],
        ["", "A"],
        ["", "A"],
        ["", "A"],
        ["Ã", "A"],
        ["", "AA"],
        ["Æ", "AE"],
        ["", "AE"],
        ["", "AE"],
        ["", "AO"],
        ["", "AU"],
        ["", "AV"],
        ["", "AV"],
        ["", "AY"],
        ["", "B"],
        ["", "B"],
        ["", "B"],
        ["", "B"],
        ["", "B"],
        ["", "B"],
        ["", "C"],
        ["", "C"],
        ["Ç", "C"],
        ["", "C"],
        ["", "C"],
        ["", "C"],
        ["", "C"],
        ["", "C"],
        ["", "D"],
        ["", "D"],
        ["", "D"],
        ["", "D"],
        ["", "D"],
        ["", "D"],
        ["", "D"],
        ["", "D"],
        ["", "D"],
        ["", "D"],
        ["", "D"],
        ["", "DZ"],
        ["", "DZ"],
        ["É", "E"],
        ["", "E"],
        ["", "E"],
        ["", "E"],
        ["", "E"],
        ["Ê", "E"],
        ["", "E"],
        ["", "E"],
        ["", "E"],
        ["", "E"],
        ["", "E"],
        ["", "E"],
        ["Ë", "E"],
        ["", "E"],
        ["", "E"],
        ["", "E"],
        ["È", "E"],
        ["", "E"],
        ["", "E"],
        ["", "E"],
        ["", "E"],
        ["", "E"],
        ["", "E"],
        ["", "E"],
        ["", "E"],
        ["", "E"],
        ["", "ET"],
        ["", "F"],
        ["", "F"],
        ["", "G"],
        ["", "G"],
        ["", "G"],
        ["", "G"],
        ["", "G"],
        ["", "G"],
        ["", "G"],
        ["", "G"],
        ["", "G"],
        ["", "H"],
        ["", "H"],
        ["", "H"],
        ["", "H"],
        ["", "H"],
        ["", "H"],
        ["", "H"],
        ["", "H"],
        ["", "H"],
        ["Í", "I"],
        ["", "I"],
        ["", "I"],
        ["Î", "I"],
        ["Ï", "I"],
        ["", "I"],
        ["", "I"],
        ["", "I"],
        ["", "I"],
        ["Ì", "I"],
        ["", "I"],
        ["", "I"],
        ["", "I"],
        ["", "I"],
        ["", "I"],
        ["", "I"],
        ["", "I"],
        ["", "D"],
        ["", "F"],
        ["", "G"],
        ["", "R"],
        ["", "S"],
        ["", "T"],
        ["", "IS"],
        ["", "J"],
        ["", "J"],
        ["", "K"],
        ["", "K"],
        ["", "K"],
        ["", "K"],
        ["", "K"],
        ["", "K"],
        ["", "K"],
        ["", "K"],
        ["", "K"],
        ["", "K"],
        ["", "L"],
        ["", "L"],
        ["", "L"],
        ["", "L"],
        ["", "L"],
        ["", "L"],
        ["", "L"],
        ["", "L"],
        ["", "L"],
        ["", "L"],
        ["", "L"],
        ["", "L"],
        ["", "L"],
        ["", "L"],
        ["", "LJ"],
        ["", "M"],
        ["", "M"],
        ["", "M"],
        ["", "M"],
        ["", "N"],
        ["", "N"],
        ["", "N"],
        ["", "N"],
        ["", "N"],
        ["", "N"],
        ["", "N"],
        ["", "N"],
        ["", "N"],
        ["", "N"],
        ["", "N"],
        ["Ñ", "N"],
        ["", "NJ"],
        ["Ó", "O"],
        ["", "O"],
        ["", "O"],
        ["Ô", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["Ö", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["Ò", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["Ø", "O"],
        ["", "O"],
        ["Õ", "O"],
        ["", "O"],
        ["", "O"],
        ["", "O"],
        ["", "OI"],
        ["", "OO"],
        ["", "E"],
        ["", "O"],
        ["", "OU"],
        ["", "P"],
        ["", "P"],
        ["", "P"],
        ["", "P"],
        ["", "P"],
        ["", "P"],
        ["", "P"],
        ["", "Q"],
        ["", "Q"],
        ["", "R"],
        ["", "R"],
        ["", "R"],
        ["", "R"],
        ["", "R"],
        ["", "R"],
        ["", "R"],
        ["", "R"],
        ["", "R"],
        ["", "R"],
        ["", "R"],
        ["", "C"],
        ["", "E"],
        ["", "S"],
        ["", "S"],
        ["Š", "S"],
        ["", "S"],
        ["", "S"],
        ["", "S"],
        ["", "S"],
        ["", "S"],
        ["", "S"],
        ["", "S"],
        ["", "T"],
        ["", "T"],
        ["", "T"],
        ["", "T"],
        ["", "T"],
        ["", "T"],
        ["", "T"],
        ["", "T"],
        ["", "T"],
        ["", "T"],
        ["", "T"],
        ["", "A"],
        ["", "L"],
        ["", "M"],
        ["", "V"],
        ["", "TZ"],
        ["Ú", "U"],
        ["", "U"],
        ["", "U"],
        ["Û", "U"],
        ["", "U"],
        ["Ü", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["Ù", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "U"],
        ["", "V"],
        ["", "V"],
        ["", "V"],
        ["", "V"],
        ["", "VY"],
        ["", "W"],
        ["", "W"],
        ["", "W"],
        ["", "W"],
        ["", "W"],
        ["", "W"],
        ["", "W"],
        ["", "X"],
        ["", "X"],
        ["Ý", "Y"],
        ["", "Y"],
        ["Ÿ", "Y"],
        ["", "Y"],
        ["", "Y"],
        ["", "Y"],
        ["", "Y"],
        ["", "Y"],
        ["", "Y"],
        ["", "Y"],
        ["", "Y"],
        ["", "Y"],
        ["", "Z"],
        ["Ž", "Z"],
        ["", "Z"],
        ["", "Z"],
        ["", "Z"],
        ["", "Z"],
        ["", "Z"],
        ["", "Z"],
        ["", "Z"],
        ["", "IJ"],
        ["Œ", "OE"],
        ["", "A"],
        ["", "AE"],
        ["", "B"],
        ["", "B"],
        ["", "C"],
        ["", "D"],
        ["", "E"],
        ["", "F"],
        ["", "G"],
        ["", "G"],
        ["", "H"],
        ["", "I"],
        ["", "R"],
        ["", "J"],
        ["", "K"],
        ["", "L"],
        ["", "L"],
        ["", "M"],
        ["", "N"],
        ["", "O"],
        ["", "OE"],
        ["", "O"],
        ["", "OU"],
        ["", "P"],
        ["", "R"],
        ["", "N"],
        ["", "R"],
        ["", "S"],
        ["", "T"],
        ["", "E"],
        ["", "R"],
        ["", "U"],
        ["", "V"],
        ["", "W"],
        ["", "Y"],
        ["", "Z"],
        ["á", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["â", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["ä", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["à", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["å", "a"],
        ["", "a"],
        ["", "a"],
        ["", "a"],
        ["ã", "a"],
        ["", "aa"],
        ["æ", "ae"],
        ["", "ae"],
        ["", "ae"],
        ["", "ao"],
        ["", "au"],
        ["", "av"],
        ["", "av"],
        ["", "ay"],
        ["", "b"],
        ["", "b"],
        ["", "b"],
        ["", "b"],
        ["", "b"],
        ["", "b"],
        ["", "b"],
        ["", "b"],
        ["", "o"],
        ["", "c"],
        ["", "c"],
        ["ç", "c"],
        ["", "c"],
        ["", "c"],
        ["", "c"],
        ["", "c"],
        ["", "c"],
        ["", "c"],
        ["", "d"],
        ["", "d"],
        ["", "d"],
        ["", "d"],
        ["", "d"],
        ["", "d"],
        ["", "d"],
        ["", "d"],
        ["", "d"],
        ["", "d"],
        ["", "d"],
        ["", "d"],
        ["", "d"],
        ["", "d"],
        ["", "i"],
        ["", "j"],
        ["", "j"],
        ["", "j"],
        ["", "dz"],
        ["", "dz"],
        ["é", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["ê", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["ë", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["è", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["", "e"],
        ["", "et"],
        ["", "f"],
        ["ƒ", "f"],
        ["", "f"],
        ["", "f"],
        ["", "g"],
        ["", "g"],
        ["", "g"],
        ["", "g"],
        ["", "g"],
        ["", "g"],
        ["", "g"],
        ["", "g"],
        ["", "g"],
        ["", "g"],
        ["", "h"],
        ["", "h"],
        ["", "h"],
        ["", "h"],
        ["", "h"],
        ["", "h"],
        ["", "h"],
        ["", "h"],
        ["", "h"],
        ["", "h"],
        ["", "h"],
        ["", "hv"],
        ["í", "i"],
        ["", "i"],
        ["", "i"],
        ["î", "i"],
        ["ï", "i"],
        ["", "i"],
        ["", "i"],
        ["", "i"],
        ["ì", "i"],
        ["", "i"],
        ["", "i"],
        ["", "i"],
        ["", "i"],
        ["", "i"],
        ["", "i"],
        ["", "i"],
        ["", "i"],
        ["", "d"],
        ["", "f"],
        ["", "g"],
        ["", "r"],
        ["", "s"],
        ["", "t"],
        ["", "is"],
        ["", "j"],
        ["", "j"],
        ["", "j"],
        ["", "j"],
        ["", "k"],
        ["", "k"],
        ["", "k"],
        ["", "k"],
        ["", "k"],
        ["", "k"],
        ["", "k"],
        ["", "k"],
        ["", "k"],
        ["", "k"],
        ["", "k"],
        ["", "l"],
        ["", "l"],
        ["", "l"],
        ["", "l"],
        ["", "l"],
        ["", "l"],
        ["", "l"],
        ["", "l"],
        ["", "l"],
        ["", "l"],
        ["", "l"],
        ["", "l"],
        ["", "l"],
        ["", "l"],
        ["", "l"],
        ["", "l"],
        ["", "l"],
        ["", "lj"],
        ["", "s"],
        ["", "s"],
        ["", "s"],
        ["", "s"],
        ["", "m"],
        ["", "m"],
        ["", "m"],
        ["", "m"],
        ["", "m"],
        ["", "m"],
        ["", "n"],
        ["", "n"],
        ["", "n"],
        ["", "n"],
        ["", "n"],
        ["", "n"],
        ["", "n"],
        ["", "n"],
        ["", "n"],
        ["", "n"],
        ["", "n"],
        ["", "n"],
        ["", "n"],
        ["", "n"],
        ["ñ", "n"],
        ["", "nj"],
        ["ó", "o"],
        ["", "o"],
        ["", "o"],
        ["ô", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["ö", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["ò", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["ø", "o"],
        ["", "o"],
        ["õ", "o"],
        ["", "o"],
        ["", "o"],
        ["", "o"],
        ["", "oi"],
        ["", "oo"],
        ["", "e"],
        ["", "e"],
        ["", "o"],
        ["", "o"],
        ["", "ou"],
        ["", "p"],
        ["", "p"],
        ["", "p"],
        ["", "p"],
        ["", "p"],
        ["", "p"],
        ["", "p"],
        ["", "p"],
        ["", "p"],
        ["", "q"],
        ["", "q"],
        ["", "q"],
        ["", "q"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "c"],
        ["", "c"],
        ["", "e"],
        ["", "r"],
        ["", "s"],
        ["", "s"],
        ["š", "s"],
        ["", "s"],
        ["", "s"],
        ["", "s"],
        ["", "s"],
        ["", "s"],
        ["", "s"],
        ["", "s"],
        ["", "s"],
        ["", "s"],
        ["", "s"],
        ["", "s"],
        ["", "g"],
        ["", "o"],
        ["", "o"],
        ["", "u"],
        ["", "t"],
        ["", "t"],
        ["", "t"],
        ["", "t"],
        ["", "t"],
        ["", "t"],
        ["", "t"],
        ["", "t"],
        ["", "t"],
        ["", "t"],
        ["", "t"],
        ["", "t"],
        ["", "t"],
        ["", "t"],
        ["", "t"],
        ["", "th"],
        ["", "a"],
        ["", "ae"],
        ["", "e"],
        ["", "g"],
        ["", "h"],
        ["", "h"],
        ["", "h"],
        ["", "i"],
        ["", "k"],
        ["", "l"],
        ["", "m"],
        ["", "m"],
        ["", "oe"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "r"],
        ["", "t"],
        ["", "v"],
        ["", "w"],
        ["", "y"],
        ["", "tz"],
        ["ú", "u"],
        ["", "u"],
        ["", "u"],
        ["û", "u"],
        ["", "u"],
        ["ü", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["ù", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "u"],
        ["", "ue"],
        ["", "um"],
        ["", "v"],
        ["", "v"],
        ["", "v"],
        ["", "v"],
        ["", "v"],
        ["", "v"],
        ["", "v"],
        ["", "vy"],
        ["", "w"],
        ["", "w"],
        ["", "w"],
        ["", "w"],
        ["", "w"],
        ["", "w"],
        ["", "w"],
        ["", "w"],
        ["", "x"],
        ["", "x"],
        ["", "x"],
        ["ý", "y"],
        ["", "y"],
        ["ÿ", "y"],
        ["", "y"],
        ["", "y"],
        ["", "y"],
        ["", "y"],
        ["", "y"],
        ["", "y"],
        ["", "y"],
        ["", "y"],
        ["", "y"],
        ["", "y"],
        ["", "z"],
        ["ž", "z"],
        ["", "z"],
        ["", "z"],
        ["", "z"],
        ["", "z"],
        ["", "z"],
        ["", "z"],
        ["", "z"],
        ["", "z"],
        ["", "z"],
        ["", "z"],
        ["", "z"],
        ["", "z"],
        ["", "ff"],
        ["", "ffi"],
        ["", "ffl"],
        ["", "fi"],
        ["", "fl"],
        ["", "ij"],
        ["œ", "oe"],
        ["", "st"],
        ["", "a"],
        ["", "e"],
        ["", "i"],
        ["", "j"],
        ["", "o"],
        ["", "r"],
        ["", "u"],
        ["", "v"],
        ["", "x"],
    ]);
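
Note that the usage example above drops every character that is neither matched by the character class nor present in the map, including spaces and punctuation. If that is not wanted, a variant that leaves unmapped characters untouched could look like the sketch below. `toLatin` is a hypothetical three-entry excerpt standing in for the full map, and both function names are made up for illustration.

```javascript
// Hypothetical excerpt standing in for the full ToLatinMap above.
const toLatin = new Map([["ä", "a"], ["é", "e"], ["Æ", "AE"]]);

// Behaviour of the usage example above: unmapped characters are removed.
function replaceOrDrop(s) {
  return s.replace(/[^a-zA-Z0-9-_]/g, (ch) => toLatin.get(ch) || "");
}

// Variant: only inspect non-ASCII characters and keep the ones without a mapping.
function replaceMapped(s) {
  return s.replace(/[^\x00-\x7F]/g, (ch) => (toLatin.has(ch) ? toLatin.get(ch) : ch));
}

console.log(replaceOrDrop("Ægir käse, 日本")); // "AEgirkase"
console.log(replaceMapped("Ægir käse, 日本")); // "AEgir kase, 日本"
```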

0 votes

"How do I remove all characters that are not in the mapping" was not the question. Besides, I would probably use Intl.Collator for this task by now; at the time of the question, and in the environment I needed it for, that was not an option.

0 votes

How would you use Intl.Collator to map non-Latin letters to Latin "equivalents"?

2 votes

I wouldn't. The original question was about sorting a list of strings correctly with respect to a particular language. Different languages sort strings differently, but plain JS strings lack the knowledge needed to do this properly. Mapping accented characters to their unaccented forms is a workaround. With native collation support available, character mapping becomes a relatively pointless exercise, since it can never match the correctness and speed of collation-based sorting.
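
To make this concrete: with a collator, the comparison itself can ignore accents, so no mapping step is needed. The following is only a sketch; the locale "de" and the option `sensitivity: "base"` are choices made for this illustration, not something stated in the thread.

```javascript
// Sketch: let the collator ignore accents instead of stripping them first.
// With sensitivity "base", strings that differ only in accents or case
// compare as equal under German rules.
var baseCollator = new Intl.Collator("de", { sensitivity: "base" });

console.log(baseCollator.compare("ä", "a"));     // 0
console.log(baseCollator.compare("a", "b") < 0); // true
```

Under a Swedish collator the first comparison would not be expected to return 0, because "ä" is a letter of its own there.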
