How to convert std::string to lower case?

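I want to convert a std::string to lowercase. I am aware of the function tolower(), but I have had problems with it in the past, and it is hardly ideal anyway, because using it with a std::string requires iterating over every character.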

Is there an alternative that works 100% of the time?

#include <algorithm>
#include <cctype>
#include <string>

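int main()
{
    std::string data = "Abc";
    // Take unsigned char: passing a negative char to std::tolower is undefined behavior.
    std::transform(data.begin(), data.end(), data.begin(),
        [](unsigned char c){ return std::tolower(c); });
}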

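You really won't get away without iterating through each character. There is no other way to know whether a character is lowercase or uppercase.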

If you really hate tolower(), here is a specialized ASCII-only alternative that I don't recommend you use:

char asciitolower(char in) {
    if (in <= 'Z' && in >= 'A')
        return in - ('Z' - 'z');
    return in;
}

std::transform(data.begin(), data.end(), data.begin(), asciitolower);

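Be aware that tolower() can only do single-byte character substitution, which is ill-suited to many scripts, especially when using a multi-byte encoding like UTF-8.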

Boost provides a string algorithm for this:

#include <boost/algorithm/string.hpp>

std::string str = "HELLO, WORLD!";
boost::algorithm::to_lower(str); // modifies str

Or, for a non-in-place version:

#include <boost/algorithm/string.hpp>

const std::string str = "HELLO, WORLD!";
const std::string lower_str = boost::algorithm::to_lower_copy(str);

tl;dr

Use the ICU library.

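First you have to answer a question: what is the encoding of your std::string? Is it ISO-8859-1? Or perhaps ISO-8859-8? Or Windows Codepage 1252? Does whatever you are using to convert uppercase to lowercase know that? (Or does it fail miserably for characters above 0x7f?)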

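If you are using UTF-8 (the only sane choice among the 8-bit encodings) with std::string as the container, you are already fooling yourself into believing you are still in control of things, because you are storing multibyte character sequences in a container that knows nothing about the multibyte concept. Even something as simple as .substr() can break, because splitting a multibyte sequence results in an invalid (sub-)string.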

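And as soon as you try something like std::toupper( 'ß' ), in any encoding, you are in trouble. (It is simply impossible to do this "right" with the standard library, which can only deliver one result character, not the "SS" needed here.) [1] Another example is std::tolower( 'I' ), which should yield different results depending on the locale. In Germany, 'i' would be correct; in Turkey, 'ı' (LATIN SMALL LETTER DOTLESS I) is the expected result (which, again, is more than one byte in UTF-8).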

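Then there is the point that the standard library depends on which locales are supported on the machine your software is running on... and what do you do if the one you need isn't?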

So what you are really looking for is a string class that can deal with all of this correctly, and that is not any of the std::basic_string<> variants.

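(C++11 note: std::u16string and std::u32string are better, but still not perfect. C++20 brought std::u8string, but all these do is specify the encoding. In many other respects they remain ignorant of Unicode mechanics, like normalization, collation, ...)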

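While Boost looks nice API-wise, Boost.Locale is basically a wrapper around ICU. That is, if Boost is compiled with ICU support... if it isn't, Boost.Locale is limited to the locale support compiled for the standard library.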

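And believe me, getting Boost to compile with ICU can be a real pain sometimes. (There are no pre-compiled binaries for Windows, so you would have to ship them together with your application, and that opens a whole new can of worms...)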

So personally I would recommend getting full Unicode support straight from the horse's mouth and using the ICU library directly:

#include <unicode/unistr.h>
#include <unicode/ustream.h>
#include <unicode/locid.h>

#include <iostream>

int main()
{
    char const * someString = "Eidenges\xe4\xdf";
    icu::UnicodeString someUString( someString, "ISO-8859-1" );
    // Setting the locale explicitly here for completeness.
    // Usually you would use the user-specified system locale.
    std::cout << someUString.toLower( "de_DE" ) << "\n";
    std::cout << someUString.toUpper( "de_DE" ) << "\n";
    return 0;
}

Compile (with G++ in this example):

g++ -Wall example.cpp -licuuc -licuio

This gives:

eidengesäß
EIDENGESÄSS

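[1] In 2017, the Council for German Orthography ruled that "ẞ" U+1E9E LATIN CAPITAL LETTER SHARP S may be used officially, as an option beside the traditional "SS" conversion, to avoid ambiguity, e.g. in passports (where names are capitalized). My beautiful go-to example, made obsolete by committee decision...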
