Spanish linguistic steganography based on N-gram model and Zipf law

Authors

  • Alfonso Muñoz Muñoz Universidad Politécnica de Madrid
  • Irina Argüelles Álvarez Universidad Politécnica de Madrid

DOI:

https://doi.org/10.3989/arbor.2014.768n4014

Keywords:

Linguistic steganography, automatic generation of stegotexts, N-gram, algorithm

Abstract


Linguistic steganography uses computational linguistics to design systems that protect the privacy of digital communications and digitally mark texts. Several approaches have been documented in recent years. This paper analyses whether natural-language texts in Spanish that conceal information can be generated automatically. A number of hypotheses are put forward and tested with an algorithm. Experimental evidence suggests that N-gram models and specific features of Zipf's law can be used to generate stegotexts of good linguistic quality that human readers could not distinguish from authentic texts. The stegotexts obtained conceal at least 0.5 bits per generated word.
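As a rough illustration of the general idea of hiding bits in N-gram-driven text generation, the following minimal sketch uses a bigram model over a toy Spanish corpus. It is not the authors' algorithm: the corpus, the function names (`build_bigrams`, `embed`, `extract`) and the rule of hiding one bit wherever a word has at least two possible successors are all assumptions made for this example, and the sketch does not use Zipf's law.

```python
from collections import Counter, defaultdict

def build_bigrams(tokens):
    """Map each word to its successors, most frequent first (ties alphabetical)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {w: sorted(c, key=lambda s: (-c[s], s)) for w, c in counts.items()}

def embed(bits, model, start):
    """Generate words from `start`, hiding one bit at each word with >= 2 successors."""
    words, current, i = [start], start, 0
    while i < len(bits):
        successors = model.get(current)
        if not successors:
            raise ValueError(f"no successor known for {current!r}")
        if len(successors) >= 2:
            # Hypothetical rule: bit 0 picks the top successor, bit 1 the second.
            current = successors[int(bits[i])]
            i += 1
        else:
            # Only one continuation: emit it without consuming a bit.
            current = successors[0]
        words.append(current)
    return words

def extract(words, model):
    """Recover the hidden bits by replaying the same choices over the same model."""
    bits = []
    for prev, nxt in zip(words, words[1:]):
        successors = model[prev]
        if len(successors) >= 2:
            bits.append(str(successors.index(nxt)))
    return "".join(bits)

# Toy corpus (an assumption for this sketch); every word has at least one successor.
corpus = ("el gato come pescado y el perro come carne y "
          "el gato bebe agua y el perro bebe leche y").split()
model = build_bigrams(corpus)

secret = "1011"
stego = embed(secret, model, "el")
print(" ".join(stego))
print(extract(stego, model))
```

Sender and receiver must share the same model and the same ordering of successors, otherwise the bits cannot be recovered. Words with a single possible successor carry no information, which is one reason the capacity per generated word can fall below one bit.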

Published

2014-08-30

How to Cite

Muñoz Muñoz, A., & Argüelles Álvarez, I. (2014). Spanish linguistic steganography based on N-gram model and Zipf law. Arbor, 190(768), a160. https://doi.org/10.3989/arbor.2014.768n4014

Section

Varia