tokenizers.bpe
Byte Pair Encoding Text Tokenization
v0.1.4 · Sep 5, 2025 · MPL-2.0
Description
Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe>, a fast implementation of Byte Pair Encoding (BPE) <https://aclanthology.org/P16-1162/>.
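As a rough illustration of the BPE idea referenced in the description, the toy Python sketch below learns merge rules by repeatedly joining the most frequent adjacent symbol pair, then applies those rules to split a word into subwords. It is not the YouTokenToMe implementation and not this package's R interface; the function names `merge_pair`, `learn_bpe`, and `encode` are hypothetical and chosen only for this example, and the tie-breaking rule is an arbitrary assumption.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Start from single characters; keep one entry per distinct word with its count.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties are broken by the pair itself
        # (an arbitrary but deterministic choice made for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split `word` into subwords by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    rules = learn_bpe(["ab", "ab", "ac"], num_merges=2)
    print(rules)
    print(encode("abac", rules))
```

Production implementations such as the one this package wraps add details this sketch leaves out, for example character coverage, special tokens, and multi-threaded training.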
Downloads
| Period | Downloads |
|---|---|
| Last 30 days | 751 (4673rd) |
| Last 90 days | 751 |
| Last year | 751 |
CRAN Check Status
1 WARNING · 2 NOTE · 11 OK
| Flavor | Status |
|---|---|
| r-devel-linux-x86_64-debian-clang | WARNING |
| r-devel-linux-x86_64-debian-gcc | OK |
| r-devel-linux-x86_64-fedora-clang | OK |
| r-devel-linux-x86_64-fedora-gcc | OK |
| r-devel-macos-arm64 | OK |
| r-devel-windows-x86_64 | OK |
| r-oldrel-macos-arm64 | NOTE |
| r-oldrel-macos-x86_64 | NOTE |
| r-oldrel-windows-x86_64 | OK |
| r-patched-linux-x86_64 | OK |
| r-release-linux-x86_64 | OK |
| r-release-macos-arm64 | OK |
| r-release-macos-x86_64 | OK |
| r-release-windows-x86_64 | OK |
Check details (3 non-OK of 14 flavors)
WARNING: r-devel-linux-x86_64-debian-clang
Check: whether package can be installed
Found the following significant warnings:
  ./parallel_hashmap/phmap_base.h:1266:1: warning: 'is_always_equal' is deprecated: use 'std::allocator_traits::is_always_equal' instead [-Wdeprecated-declarations]
See ‘/home/hornik/tmp/R.check/r-devel-clang/Work/PKGS/tokenizers.bpe.Rcheck/00install.out’ for details.
* used C++ compiler: ‘Debian clang version 21.1.8 (3+b1)’
OK: r-devel-linux-x86_64-debian-gcc
OK: r-devel-linux-x86_64-fedora-clang
OK: r-devel-linux-x86_64-fedora-gcc
OK: r-devel-macos-arm64
OK: r-devel-windows-x86_64
NOTE: r-oldrel-macos-arm64
Check: installed package size
installed size is 6.1Mb
sub-directories of 1Mb or more:
  libs 5.2Mb
NOTE: r-oldrel-macos-x86_64
Check: installed package size
installed size is 6.2Mb
sub-directories of 1Mb or more:
  libs 5.3Mb
OK: r-oldrel-windows-x86_64
OK: r-patched-linux-x86_64
OK: r-release-linux-x86_64
OK: r-release-macos-arm64
OK: r-release-macos-x86_64
OK: r-release-windows-x86_64
Check History
Mar 9, 2026: WARNING (11 OK · 2 NOTE · 1 WARNING · 0 ERROR · 0 FAILURE)
Reverse Dependencies (3)
suggests
Version History
0.1.4 · Mar 9, 2026 (new)