udpipe

Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

v0.8.16 · Jan 30, 2026 · MPL-2.0

Description

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Besides text parsing, the package also allows you to train annotation models on 'treebanks' data in 'CoNLL-U' format, as described at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functions for commonly used data manipulations on texts enriched with the output of the parser: collocations, token co-occurrence, document-term matrix handling, term frequency-inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.
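For readers unfamiliar with the 'CoNLL-U' format mentioned above, the following is a minimal sketch of how such a file is laid out: one token per line with ten tab-separated fields, `#` comment lines, and a blank line between sentences. It is written in plain Python and does not use the package's own API; the function name `parse_conllu` and the three-token sample sentence are hypothetical, hand-written for illustration only.

```python
# Illustrative sketch only: a tiny reader for the CoNLL-U layout.
# It is NOT part of udpipe; names and sample data are assumptions.

# The ten columns defined by the Universal Dependencies CoNLL-U format.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment or metadata line, e.g. "# text = ...".
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample sentence (hypothetical annotation values).
SAMPLE = "\n".join([
    "# text = Dogs bark.",
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_",
    "",
])

sentences = parse_conllu(SAMPLE)
lemmas = [tok["lemma"] for tok in sentences[0]]
```

Each token dict then exposes the annotation layers named in the description: `form` (tokenization), `upos`/`xpos` (parts of speech), `lemma` (lemmatization), and `head`/`deprel` (dependency parsing).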

Downloads

Last 30 days: 6.1K
Rank: 1526th
Last 90 days: 6.1K
Last year: 6.1K

CRAN Check Status

3 NOTE
11 OK
Flavor Status
r-devel-linux-x86_64-debian-clang OK
r-devel-linux-x86_64-debian-gcc OK
r-devel-linux-x86_64-fedora-clang OK
r-devel-linux-x86_64-fedora-gcc OK
r-devel-macos-arm64 OK
r-devel-windows-x86_64 OK
r-oldrel-macos-arm64 NOTE
r-oldrel-macos-x86_64 NOTE
r-oldrel-windows-x86_64 NOTE
r-patched-linux-x86_64 OK
r-release-linux-x86_64 OK
r-release-macos-arm64 OK
r-release-macos-x86_64 OK
r-release-windows-x86_64 OK
Check details (3 non-OK)
NOTE r-oldrel-macos-arm64

installed package size

  installed size is 25.5Mb
  sub-directories of 1Mb or more:
    dummydata   1.4Mb
    libs       21.5Mb
NOTE r-oldrel-macos-x86_64

installed package size

  installed size is 26.8Mb
  sub-directories of 1Mb or more:
    dummydata   1.4Mb
    libs       22.9Mb
NOTE r-oldrel-windows-x86_64

installed package size

  installed size is  6.5Mb
  sub-directories of 1Mb or more:
    dummydata   1.4Mb
    libs        2.5Mb

Check History

NOTE 11 OK · 3 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE Mar 9, 2026

Reverse Dependencies (19)

Dependency Network

Dependencies: Rcpp, data.table, Matrix
Reverse dependencies: MadanText, MadanTextNetwork, TextForecast, cleanNLP, corpustools, finnsurveytext, sumup, tall, BTM, birddog, doc2vec, nametagger, pseudobibeR, text2vec, textplot, +4 more

Version History

new 0.8.16 Mar 9, 2026