doc2vec

Distributed Representations of Sentences, Documents and Topics

v0.2.2 · Nov 27, 2025 · MIT + file LICENSE

Description

Learn vector representations of sentences, paragraphs or documents using the 'Paragraph Vector' algorithms, namely the distributed bag of words ('PV-DBOW') and the distributed memory ('PV-DM') models. The techniques in the package are detailed in the paper "Distributed Representations of Sentences and Documents" by Le and Mikolov (2014), available at <doi:10.48550/arXiv.1405.4053>.

The package also provides an implementation to cluster documents based on these embeddings using a technique called top2vec. Top2vec finds clusters in text documents by combining document and word embeddings with density-based clustering. It first embeds documents in the semantic space defined by the 'doc2vec' algorithm. Next it maps these document embeddings to a lower-dimensional space using the 'Uniform Manifold Approximation and Projection' (UMAP) dimensionality reduction algorithm and finds dense areas in that space using a hierarchical density-based clustering technique (HDBSCAN). These dense areas are the topic clusters. Each cluster can be represented by a topic vector, which is an aggregate of the embeddings of the documents belonging to that cluster. Because words are embedded in the same semantic space, the words closest to a topic vector can be used to describe the topic. More details can be found in the paper 'Top2Vec: Distributed Representations of Topics' by D. Angelov, available at <doi:10.48550/arXiv.2008.09470>.
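
The topic-vector step described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the package's R implementation. The toy 2-D embeddings, the cluster labels, the use of -1 as a noise label, and the helper names `centroid`, `cosine`, `topic_vectors` and `topic_words` are all assumptions made for the example. In practice the embeddings would come from doc2vec and the labels from HDBSCAN run on the UMAP projection. The mean is used here as the aggregate, which is one possible choice.

```python
import math
from collections import defaultdict

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def topic_vectors(doc_embeddings, labels, noise_label=-1):
    """Aggregate document embeddings per cluster label, skipping noise points."""
    groups = defaultdict(list)
    for vector, label in zip(doc_embeddings, labels):
        if label != noise_label:
            groups[label].append(vector)
    return {label: centroid(vectors) for label, vectors in groups.items()}

def topic_words(topic_vector, word_embeddings, top_n=2):
    """Words whose embeddings are most similar to the topic vector."""
    ranked = sorted(
        word_embeddings,
        key=lambda word: cosine(topic_vector, word_embeddings[word]),
        reverse=True,
    )
    return ranked[:top_n]

# Hypothetical toy data: five documents, two clusters, one noise point.
doc_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [0.5, 0.5]]
labels = [0, 0, 1, 1, -1]
word_embeddings = {
    "goal": [1.0, 0.0],
    "match": [0.8, 0.3],
    "vote": [0.0, 1.0],
    "party": [0.3, 0.8],
}

topics = topic_vectors(doc_embeddings, labels)
for label, vector in sorted(topics.items()):
    print(label, topic_words(vector, word_embeddings))
```

Keeping documents and words in one shared space is what makes the last step possible: the same similarity measure that compares documents can rank words against a topic vector.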

Downloads

Last 30 days: 356
Last 90 days: 608
Last year: 608
Rank: 9068th
Trend: +41.3% (30d vs prior 30d)

CRAN Check Status

11 OK · 3 NOTE (14 flavors)

Flavor Status
r-devel-linux-x86_64-debian-clang OK
r-devel-linux-x86_64-debian-gcc OK
r-devel-linux-x86_64-fedora-clang OK
r-devel-linux-x86_64-fedora-gcc OK
r-devel-macos-arm64 OK
r-devel-windows-x86_64 OK
r-oldrel-macos-arm64 NOTE
r-oldrel-macos-x86_64 NOTE
r-oldrel-windows-x86_64 NOTE
r-patched-linux-x86_64 OK
r-release-linux-x86_64 OK
r-release-macos-arm64 OK
r-release-macos-x86_64 OK
r-release-windows-x86_64 OK
Check details (3 non-OK)
NOTE r-oldrel-macos-arm64

installed package size

  installed size is  7.3Mb
  sub-directories of 1Mb or more:
    data   4.8Mb
    libs   2.3Mb
NOTE r-oldrel-macos-x86_64

installed package size

  installed size is  8.3Mb
  sub-directories of 1Mb or more:
    data   5.8Mb
    libs   2.4Mb
NOTE r-oldrel-windows-x86_64

installed package size

  installed size is  6.0Mb
  sub-directories of 1Mb or more:
    data   4.8Mb
    libs   1.1Mb

Check History

NOTE 11 OK · 3 NOTE · 0 WARNING · 0 ERROR · 0 FAILURE Mar 9, 2026

Dependency Network

Graph of dependencies and reverse dependencies; nodes shown: Rcpp, doc2vec

Version History

new 0.2.2 Mar 9, 2026