Contributing our own creativity (in the form of text, image, audio, and video) to the pool of online information is fast becoming an essential part of the online experience. However, it is still an open question how we, as authors, can control the way that the information we create is distributed or re-used.
Rights management problems are especially serious for text, since it is particularly easy for other people to download and manipulate copyrighted text from the Internet and later re-use it free from control. There is a need for a rights protection system that "travels with the content". Digital watermarking is a mechanism that embeds the copyright information in the document itself. Besides traveling with the content of the document, digital watermarks can also be imperceptible to the user, which makes removing them from the document challenging.
The goal of this thesis is to design practical and resilient natural language watermarking systems. I have designed and implemented several natural language watermarking algorithms that use the linguistic features of the cover text to embed information. Using linguistic features provides resilience by making the message an elemental part of the content of the text, and by judiciously exploiting the ambiguity of natural language usage and the richness of features of natural language constituents. In this thesis, I propose several such systems for a variety of genres of text (short, long, edited, and cursory text) and analyze their resilience and feasibility.
2. Background on Information Hiding and Natural Language Processing
3. Lexical Natural Language Watermarking: Equmark and MarkErr
4. Sentence Level Natural Watermarking: Enigmark
5. Improving Stealthiness by Adaptive Embedding
6. Applications of Information Hiding to Private Communication and Defense against Phishing
7. Previous Work in Information Hiding into Natural Language Text