# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------

cff-version: 1.2.0
message: 'To cite package "AhoCorasickTrie" in publications use:'
type: software
license: Apache-2.0
title: 'AhoCorasickTrie: Fast Searching for Multiple Keywords in Multiple Texts'
version: 0.1.2
doi: 10.32614/CRAN.package.AhoCorasickTrie
abstract: Aho-Corasick is an optimal algorithm for finding many keywords in a text.
  It can locate all matches in a text in O(N+M) time; i.e., the time needed scales
  linearly with the number of keywords (N) and the size of the text (M). Compare this
  to the naive approach which takes O(N*M) time to loop through each pattern and scan
  for it in the text. This implementation builds the trie (the generic name of the
  data structure) and runs the search in a single function call. If you want to search
  multiple texts with the same trie, the function will take a list or vector of texts
  and return a list of matches to each text. By default, all 128 ASCII characters
  are allowed in both the keywords and the text. A more efficient trie is possible
  if the alphabet size can be reduced. For example, DNA sequences use at most 19 distinct
  characters and usually only 4; protein sequences use at most 26 distinct characters
  and usually only 20. UTF-8 (Unicode) matching is not currently supported.
authors:
- family-names: Chambers
  given-names: Matt
  email: matt.chambers42@gmail.com
- family-names: Petricek
  given-names: Tomas
repository: https://chambm.r-universe.dev
repository-code: https://github.com/chambm/AhoCorasickTrie
commit: 1181d49f57480c8c7bf88125fbdf2b446f342ebc
url: https://github.com/chambm/AhoCorasickTrie
contact:
- family-names: Chambers
  given-names: Matt
  email: matt.chambers42@gmail.com