textclean

Text Cleaning Tools

v0.9.7 · Mar 4, 2026 · GPL-2

Description

Tools to clean and process text. The tools are geared toward checking for substrings that are not optimal for analysis and either normalizing them (replacing them with more analysis-friendly substrings, or removing them; see Sproat, Black, Chen, Kumar, Ostendorf, & Richards (2001) <doi:10.1006/csla.2001.0169>) or extracting them into new variables. For example, emoticons are often used in text but are not always easily handled by analysis algorithms. The replace_emoticon() function replaces emoticons with word equivalents.
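
The normalization idea described above can be illustrated with a minimal, language-agnostic sketch. It is written in Python for illustration only and is not the package's R implementation: the `EMOTICON_WORDS` table and the `replace_emoticons` helper are hypothetical names, and the word equivalents are assumptions rather than the lookup data the package actually uses.

```python
import re

# Hypothetical emoticon-to-word table (an assumption for this sketch);
# the real package relies on its own lookup data.
EMOTICON_WORDS = {
    ":)": "smiley",
    ":(": "frown",
    ";)": "wink",
    ":D": "big grin",
}

# Try longer emoticons first so a longer token is never shadowed by a shorter one.
_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICON_WORDS, key=len, reverse=True))
)


def replace_emoticons(text: str) -> str:
    """Replace each known emoticon with its word equivalent."""
    # Pad the replacement with spaces so it never fuses with neighboring words.
    replaced = _PATTERN.sub(lambda m: " " + EMOTICON_WORDS[m.group(0)] + " ", text)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join(replaced.split())


print(replace_emoticons("Great talk :) but the demo crashed :("))
```

Replacing rather than deleting the emoticon keeps its sentiment available to downstream analysis that only understands words.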

Downloads

Last 30 days: 6.2K
Rank: 1512th
Last 90 days: 6.2K
Last year: 6.2K

CRAN Check Status

12 OK · 2 NOTE (14 flavors)
Flavor Status
r-devel-linux-x86_64-debian-clang OK
r-devel-linux-x86_64-debian-gcc OK
r-devel-linux-x86_64-fedora-clang OK
r-devel-linux-x86_64-fedora-gcc OK
r-devel-macos-arm64 OK
r-devel-windows-x86_64 OK
r-oldrel-macos-arm64 OK
r-oldrel-macos-x86_64 OK
r-oldrel-windows-x86_64 OK
r-patched-linux-x86_64 NOTE
r-release-linux-x86_64 OK
r-release-macos-arm64 OK
r-release-macos-x86_64 OK
r-release-windows-x86_64 NOTE
Check details (2 non-OK)
NOTE r-patched-linux-x86_64

Rd files

checkRd: (-1) check_text.Rd:31: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:32: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:33: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:34: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:35: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:36: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:37: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:38: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:39: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:40: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:41: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:42: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:43: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:44: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:45: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:46: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:47: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:48: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:49: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:50: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:51: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:52: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:53: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) replace_html.Rd:12: Lost braces
    12 | \item{symbol}{logical.  If code{TRUE} the symbols are retained with appropriate
       |                                ^
NOTE r-release-windows-x86_64

Rd files

checkRd: (-1) check_text.Rd:31: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:32: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:33: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:34: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:35: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:36: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:37: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:38: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:39: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:40: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:41: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:42: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:43: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:44: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:45: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:46: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:47: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:48: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:49: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:50: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:51: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:52: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) check_text.Rd:53: Lost braces in \itemize; \value handles \item{}{} directly
checkRd: (-1) replace_html.Rd:12: Lost braces
    12 | \item{symbol}{logical.  If code{TRUE} the symbols are retained with appropriate
       |                                ^

Check History

NOTE 12 OK · 2 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE Mar 9, 2026

Reverse Dependencies (8)

suggests

Dependency Network

Dependencies: data.table, english, glue, lexicon (>= 1.0.0), mgsub, qdapRegex, stringi, textshape (>= 1.0.1)

Reverse dependencies: NUSS, SemanticDistance, sentimentr, spell.replacer, text2emotion, textstem, upstartr, LilRhino

Version History

new 0.9.7 Mar 9, 2026