Sanskrit Computational Linguistics: January 2013

Linguistics, Psycholinguistics and Semantics

Language, the storehouse of all human Knowledge, is represented by words and meanings. Language by itself has an Ontological structure, Epistemological underpinnings and a Grammar. Across languages, even though words and usages differ, the underlying meanings remain the same in the respective communications. Yet human beings understand "Meanings" on a Contextual, Relative, Tonal and Gestural basis. The dictionary or 'as it is' meanings are rarely taken into consideration; thus human language is ambiguous in one sense and flexible in another.

Computers, on the other hand, are hard-coded to go by the dictionary meanings. Thus teaching (programming) Computers to understand natural language (human language) has been the biggest challenge haunting Scientists ever since the idea of Artificial Intelligence (AI) came into existence. This has also led to the obvious question of "What is intelligence?" from a Computational perspective. Since intelligence cannot be defined precisely, this field of study has taken many shapes, such as Computational Linguistics, Natural Language Processing and Machine Learning. Artificial Intelligence, instead of being used as a blanket term, is now increasingly referred to as "Analytics" in many critical applications.

Sanskrit, being the oldest, is also the most Scientific and Structured language. Sanskrit has many hidden Algorithms built into it, as part of its vast scientific treatises, for analysing "Meanings" or "Word sense" from many perspectives; these have been in use since time immemorial. "It is perhaps our job to discover and convert the scientific methods inherent in Sanskrit into usable Computational models and Tools for Natural Language Processing rather than reinventing the wheel" - as some Scientists put it. This blog's purpose is to expose some of the hidden, intricate tools and methodologies used in Sanskrit for centuries to derive precise meanings of human language to a larger audience, particularly Computational Linguists, for further study, analysis and deployment in Natural Language Processing.

In addition, Sanskrit, even though flexible as a human language, is the least ambiguous, as the structure of the language is precisely defined from a semantic and syntactic point of view. From a Psycholinguistic perspective this blog could also give us a glimpse of the advanced linguistic capabilities of our forefathers as well as their highly disciplined approach towards structure and usage.

Saturday, January 26, 2013

Linguistics in Sanskrit - 3 distinctive perspectives

In Sanskrit, research on linguistics has existed since time immemorial. Analysis of the meanings of the Vedic statements is called Arthavada. Debates on the precise meanings of various statements have likewise existed since time immemorial. In Sage Patanjali's Mahabhashyam, the first chapter, Paspashanikam, starts with a discussion on what Sound (word) is and what is inherent in the sound (Artha - meaning). It asks, in particular: when someone says "gau" (Cow), what does this sound represent? We can see a clear overlap of cognitive science, philosophy and epistemology in these discussions; this is a generic feature of all Sanskrit scientific treatises.

The picture below gives an elementary view of the 3 important schools of Sanskrit Linguistics (or philosophy /epistemology) with respect to the analysis of the "meanings" of words in a sentence, which in Sanskrit is referred to as "Shaabda bodha". This is a level of abstraction of the words in a sentence and their relationship with each other, which makes the analysis air-tight and definitive. Each of the 3 schools of sentence-meaning analysis focuses on one of the primary blocks of the sentence - Verb, Subject and Object. The oldest school, Vyakarana, focuses on Kriya (Action); Mimamsa, which was born as a science of sentence meanings for understanding the Vedas, focuses on Kriya (Purpose); and Nyaya, the epistemological system, focuses on Karta (Actor). Each has its own merits in interpreting different kinds of treatises, and linguists use all 3 to understand a text even when there is only a minute difference between them. Debates between these 3 schools were scientific and ran along the lines of hair-splitting arguments. - CGK
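As a rough illustration of the idea that each school centres the sentence meaning on a different block, here is a minimal Python sketch. It is not a model of Shaabda bodha itself: the `SentenceRoles` class, the `SCHOOL_FOCUS` table, the `principal_element` function and the example sentence are all hypothetical names chosen for this sketch, and the mapping simply follows the summary in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentenceRoles:
    """Hypothetical container for the building blocks of one sentence."""
    karta: str    # actor (subject)
    karma: str    # object
    kriya: str    # action (verb)
    purpose: str  # what the action is meant to bring about


# Which block each school takes as the centre of the sentence meaning,
# following the summary in the text above (a deliberate simplification).
SCHOOL_FOCUS = {
    "Vyakarana": "kriya",
    "Mimamsa": "purpose",
    "Nyaya": "karta",
}


def principal_element(sentence: SentenceRoles, school: str) -> str:
    """Return the block that the given school treats as principal."""
    return getattr(sentence, SCHOOL_FOCUS[school])


# Illustrative sentence (an assumption of this sketch): "Devadatta cooks rice."
example = SentenceRoles(
    karta="Devadatta",
    karma="rice",
    kriya="cooking",
    purpose="a cooked meal",
)

for school in SCHOOL_FOCUS:
    print(f"{school}: {principal_element(example, school)}")
```

The same sentence thus yields three different "heads" depending on the school consulted, which is the sense in which the three analyses complement each other.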

Friday, January 18, 2013

Panini - Sanskrit Linguist (Grammarian) could have lived 4000 years back

There were great Vaiyaakaranaas (not just grammarians but Linguists) before and after Sage Panini. Sage Panini himself refers to about 16 Vaiyaakaranaas (linguists) in his book Ashtadyayi (some are also referred to by Sage Yaska, the etymologist who lived before Sage Panini). Sage Panini borrowed some of their rules to build Ashtadyayi - the greatest linguistic canon in existence. There were surely other Vaiyaakaranaas whose works are lost and whom Sage Panini did not refer to or use in Ashtadyayi. The names of the linguists referred to by Sage Panini (a partial list) are:

Apishaali, Audumbaraayana, Chakravarma, Gaargya, Galava, Kaasakritsna, Kasyapa, Paushkarasaadi, Shaakalya, Shaakataayana, Shaunaka, Sphotaayana, Vaarshayani, Vaarthaaksha, Vaajapyaayana, Vyaadi, and the Etymologist Yaska

Can we say that all those 16 Vaiyaakaranaas (linguists) whom Sage Panini referred to were his neighbors and lived at the same time? It would be silly to say so, but some western scholars and so-called "Indian rationalists" say that, or imply it indirectly.

First, western Indologists have fixed the time of Sage Panini at 2500 years back, or around 500 BC. (The rationale behind fixing this timeframe is not properly established.) This dating was done in the 19th Century, during British rule, with very limited data and very little understanding of Sanskrit. Buddha conveyed his message in Paali, the colloquial dialect of Eastern India spoken at that time. Paali was chosen so that the message would reach not only the educated elite (Sanskrit scholars) but also the uneducated masses; thus it is evident that the widespread scholarly language of that time was Sanskrit. If so, Sanskrit must be much older than Buddha, and since a scholarly language must have a tight grammar, the Grammar of Sanskrit must be much older too. In my view Sage Patanjali and his linguistic canon Mahabhashyam must have existed before Buddha’s /Mahavira’s time. This is evident from the fact that the Jaina texts of the Mahavira and Parswanatha discussions have no non-Paniniya usage (apaniniya prayoga), whereas the Ramayana, the Mahabharata and many puranas have many non-Paniniya usages.

Secondly, some Indologists keep writing that Sage Panini invented the Sanskrit language, etc., without any basis or research. Ashtadyayi, the linguistic canon written by Sage Panini, was descriptive and not prescriptive in those days. Only after the days of Buddha, when scholars embraced Buddhism and started writing in Paali, did it become prescriptive. So it is unwise to say that Sage Panini structured the language: the structure (grammar) already existed. Sage Panini arranged the Grammar Rules in an easy-to-read manner in a small book of about 4000 formulas (3959 to be precise). In those days Ashtadyayi was much easier than other grammar texts or Pratishakyam (vedic grammar) texts.

Thirdly, some argue that Sanskrit wasn’t a spoken language. Sage Patanjali’s Mahabhashyam, however, explains how Sanskrit was used in various regions. He highlights how the same verb /noun was used with different meanings in different parts of Ancient India.

Those 16 Vaiyaakaranaas (linguists) whom Sage Panini referred to must have lived at least 100s of years before Sage Panini, if not more. We are reading the texts of Sage Panini now, after 2500 years (this timeframe is again as per western Indologists). So it is possible that Sage Panini was reading the texts of earlier Vaiyaakaranaas (linguists) who lived 1000 years before him. Moreover, the works of earlier linguists were spread over many volumes, had regional grammatical flavors and possibly contained some outdated usages of Sanskrit. Finally, to provide an easy way of understanding the structure of the language without having to refer to many works, Sage Panini wrote a treatise in which all the rules of the language were codified in a simple manner - thus was born Ashtadyayi.

Most importantly, those 16 Vaiyaakaranaas (linguists) and their schools referred to by Sage Panini were different from the "Nava-Vyakarana" (9 grammatical traditions) referred to in the Valmiki Ramayana (Sri. Hanumaan is a Navavyakaranavettaa - a scholar of all the nine grammar schools). (The 9 grammar schools are Aindra, Kaumaara, Shaakta, Saaraswata, Chandra, Soorya, Braahma, etc.) Some Indian scholars themselves confuse the 9 Vyakarana schools (which are Devataa or God’s schools) with the 16 pre-Paninian Vyakarana schools, which are the grammar traditions of various regions /various times of Ancient Bharata (India) and not those of Devataa. These 2 groups are different.

After Sage Panini, Sage Katyayana in 300 BC (this timeframe is again as per western Indological theories) added 23,000 new words. In linguistics parlance, this many words take 100s of years to get added to a language, provided the language has in-built word generation (Morphological) capabilities. Sage Katyayana also added a few missing rules to Ashtadyayi, as the language and its usage had transformed since the time of Sage Panini. This itself proves that there is a long gap between these 2 linguists.

Later, in 200 BC (this timeframe is again as per western Indologists), Sage Patanjali, in his explanatory treatise on Ashtadyayi called Mahabhashyam, added another 28,000 new words arising from the usage patterns and transformation of the language. This proves that a) Sanskrit was widely used, and b) there was a long gap between the times of Sage Panini and Sage Patanjali. These facts are known to Sanskrit scholars of Vyakarana. It is a pity that many still choose to toe the line of western indological theories, either because they see no point in fighting with people who do surface-level research and fix timeframes for Sanskrit, or out of indifference. Whichever way, this is an injustice to the language and to our forefathers. I'm not writing this so that we can all feel proud that the language is much older than was thought, but to do justice to this great language. There is no point in simply talking about Sanskrit without putting it to use. We have a responsibility to learn Sanskrit deeply and unlock the secrets hidden in millions of Sanskrit scientific treatises, many of which are still in Palm-leaf /wooden Manuscript form.

Great Vaiyaakaranaas (linguists) like Bartrhari, Battoji Dikshita, Narayana Battathiri, Kaunta Bhatta and Nagesa Bhatta came after Panini /Katyayana /Patanjali, to quote just a few names. Each of these, and many other great Linguists, contributed much to Sanskrit linguistic science. E.g., Semantics, Psycholinguistics, Neuro-Linguistics, etc. were dealt with in detail as early as the 5th Century AD by Sage Bartrhari in his work Vakyapadiyam.

Since Vyakarana (grammar) is a Vedanga (part of Veda), Vyakarana, like the Veda and the Sanskrit language, is also Anaadi (time immemorial). So when we talk about or quote the Sanskrit Language we need to keep all this in Mind. Some myopic views do exist that Sanskrit was born in 1500 BC and not before, etc. We, as the learned, should know how to brush aside the untruth.

Thus, with all this, we can assume that Sage Panini could have lived 4000 years back or earlier, not later - after the period of rebuilding of the Vedic civilizations at the start of Kali yuga, and after the deluge due to which the Dwaraka City /state was submerged in the ocean, 5114 years back. These dates are debated in the Indian Science Congress and some are proven (accepted by a majority of scientists) based on planetary positions and astronomical calendar systems. Some info: http://en.wikipedia.org/wiki/Kurukshetra_War and http://articles.timesofindia.indiatimes.com/2007-03-10/special-report/27883505_1_mahabharata-ramayana-epics ; on Dwaraka: http://www.youtube.com/watch?v=zeDMSXOhDbY

- CGK

Sunday, January 13, 2013

Disruptive Nature of Technology

The idea that IT disrupts only others is wrong - the biggest victim (or beneficiary) is the IT industry itself. Why? Read on... At the beginning of the millennium, along with the Dot-com hype, people were raising a hue and cry about the convergence of TMT (Tech /Media /Telecom) or ICT. Money flowed, and then the whole thing disappeared from the limelight. Did it go away? NO. It is really happening now - AppleTV, Smart TV (Samsung), Amazon TV, Googleplay /Cube /TV and 3D TV are all proof that it wasn't hype. Rather, it has gone one step further by including games, cloud, education, user content (YouTube) and social networking, which weren't part of the original ICT. With technology, these giants have overtaken the old giants (the mainstream media).

By this I mean that the mainstream media now quotes opinions and views from the Electronic media and increasingly depends on the Facebooks and Twitters to get the real pulse of the masses. The power of Electronic media was very evident in the recent American elections, the Anti-corruption protests in India, Wiki-leaks, the Occupy Wall St. movement, the Arab Spring, the uprising for justice over the New Delhi rape incident, etc. All of these originated in the Electronic media, and the mainstream media just echoed them.

Similarly, some time back Web 2.0 Technologies were making noise, at least in the tech community. Was it just hype? No, certainly not. Facebook, Twitter, Wikipedia, Google (all services, including the original search /email, are in Web 2), Amazon (all services, including the original ecommerce of books, are in Web 2) - these companies, and not the traditional ones, now occupy the mind space of consumers. Similarly, Apple very quickly transformed itself: it not just adopted Web 2 concepts (embracing the concepts, not the technologies per se) but innovated on them.

Now, with this background, we'll look at the subject, the "disruptive nature of tech". Those currently stirring the waters (some known and some not) are: ARM and Nvidia on the CPU front; Samsung in the larger convergence space; Google in the Tech space; Eclipse not just in IDE platforms but also in Open-Source biz applications; WolframAlpha in the Web 3 Search space; Chromebook in the Laptop market; Ubuntu in Consumer Linux; Tizen in the Tablet OS space; and new technological innovations in 4D-Optical storage, Speech recognition and Machine translation and, most importantly, Semantic Web /Web 3.0 technologies. Sanskrit Computational Linguistics can play a major part here.

Samsung - is using a larger convergence model: on the Hardware side, TV (SmartTVs), Smartphones, Tablets, Laptops, Game console (on the cards), Chromebook and Web-connected Digi-Cams; on the Software side, Tizen (alternative to Android), TouchWiz, Samsung-cloud, etc.

Amazon - it is really amazing how this company, which showed losses from inception until the past few years, is able to take on Google and Apple. And the transformation from selling books to being a Technology Eco-system powerhouse is Amazon, oops, amazing.

Apple - to penetrate the larger mass market, it is planning a cheaper iPhone and iPad. This is the best strategy, and it could alter the landscape further. The Social web is the missing link.

Sony - has everything at its disposal: Sony Pictures, Music, Game consoles, Phones, Tablets, TVs, Laptops, Cameras, and what not... yet it has been playing catch-up for the past few years with respect to key technologies. Except for Blu-ray, there has been no substantial launch. It lost the top spot to Samsung in the consumer electronics space in some countries, and its lack of a foothold in the Software space could prove to be a setback.

Google - This technology powerhouse has the capacity to do many things, but I wish more were done. Some examples: integration of Android and Chrome OS (Chromebook); full integration of Orkut and Google plus, its two Social webs; and downloadable, locally usable Google Docs.

Microsoft - except for Kinect, none of the recent launches has really made an impact with the masses, yet the formidable combination of MSN, Zune, Skydrive, Surface, Outlook, WindowsPhone, Windows 8, etc., taken collectively as an eco-system, packs a strong punch.

IBM, Oracle, SAP, CA, HP, Dell and the other giants are focusing on the Enterprise application space or Information Services space and are not participating in the consumer ICT world. However, the enterprise world and the consumer world are actually 2 sides of the same coin. The same user who uses iOS /Android in the so-called mass IT (market) is the one who uses Blackberry in the so-called enterprise IT (market). Ease of use /experience of comfort dictates the winner in the long run. Microsoft, until WindowsNT, wasn’t a big force in the enterprise IT market. Others, like Facebook, are fully focused only on the Consumer. The ideal is to be present both in the Consumer ICT /TMT experience and in Information services /applications for Enterprises.

Finally, the David standing in front is the Open-Source Community - the one that has the capability to disrupt everything in the Technology world. Beware: Open-Source is now not just in Software but in everything that touches R&D - New Drug Discovery, Solar photovoltaic technology, Alternative energy technologies, Education (Khan Academy), Education tools (Moodle), Knowledge (Wikis, Developer works), Laptops (VIA Openbook), etc. Open-Source will eventually force all spheres of IT into commoditization.

The growing ethical investment community and Green money are flowing in this direction. Remember that in the browser war (IE vs Netscape) the final victory was achieved by Open-Source /Free products (Firefox, Chrome, Opera, Android). Similarly, in the enterprise Server OS category Linux is increasing its market share, as is Android among Smartphone OSes. Eclipse (IDE), Wikis, MySQL (Database), Joomla (CMS), Apache - powering 100 million+ websites, Hadoop (Apache) - Big Data /Data Mining, Ubuntu - Consumer Linux, GNOME - GUI, etc. are a few examples of the disruptive nature of Open-Source technology. If one notices, Java and Google were actually born in Open-Source cradles.

It is also evident that if corporations want to survive long, they need to have an Open-Source program - IBM with Linux & Apache Derby, Google with Android, Adobe with Apache Flex, etc.

It is really amazing that the same industry which rose to great heights, and which is responsible in some way for the economic inequalities in the world, is the one correcting itself, though slowly - the best possible Social responsibility. The IT industry has the reputation of producing the maximum number of Entrepreneurs, Innovators and Social entrepreneurs who are into Alternative Energy, Education, etc. Like everything Alternative - Education, Food, Energy, Economy - Open-Source is the alternative to mainstream IT. “Open-Source” and “Native Language Computing” (Sanskrit plays a major role here) are the two main pathways to bridge the digital divide, as they can make Technology “Affordable” and increase its “Reach”.

Saturday, January 12, 2013

Scientific method - flaws

Some flaws in the so-called "Scientific method" of Research

The Scientific method used in Research today, as described by Scientists, consists of the following 4 steps.

1. Question - Framing the Question
2. Hypothesis - A proposal based on reason suggesting a possible correlation between or among a set of phenomena (more than one hypothesis is expected but seldom given)
3. Prediction - The logical consequences of the hypothesis
4. Experiment - Only when one can't design an experiment that disproves the hypothesis does the hypothesis stay and become the conclusion (answer) to the question. (This is like proving the opposite!)
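The four items above can be sketched as a small falsification loop. This is only an illustration under stated assumptions: the names `first_counterexample` and `surviving`, the two candidate hypotheses and the observation data are all invented for this sketch and do not come from any real experiment.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Hypothesis = Callable[[float], float]   # maps an input to a predicted outcome
Observation = Tuple[float, float]       # (input, observed outcome)


def first_counterexample(
    hypothesis: Hypothesis,
    observations: Sequence[Observation],
    tolerance: float = 1e-9,
) -> Optional[Observation]:
    """Return the first observation that contradicts the prediction, if any."""
    for x, observed in observations:
        if abs(hypothesis(x) - observed) > tolerance:
            return (x, observed)
    return None


def surviving(
    hypotheses: Dict[str, Hypothesis],
    observations: Sequence[Observation],
) -> List[str]:
    """Keep only the hypotheses that no observation has disproved so far."""
    return [
        name
        for name, hypothesis in hypotheses.items()
        if first_counterexample(hypothesis, observations) is None
    ]


# Invented data: the question is "how does the outcome depend on the input?"
observations = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# More than one hypothesis, as the second item in the list asks for.
hypotheses = {
    "doubling": lambda x: 2 * x,
    "squaring": lambda x: x * x,
}

print(surviving(hypotheses, observations))
```

Note that a hypothesis is never confirmed here; it merely survives until some later observation disproves it, which is the sense of the "proving the opposite" remark.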

The scientific method is iterative, or is supposed to be iterative. But prior to this, what matters is that the above 4 items essentially have to deal with the 'What' and the 'Why', and only later with the 'How'. So we first question "the Premise" - the most important starting point of any Scientific method. Note that every scientific theory starts with a premise. It is seldom asked on what basis the "Premise" is chosen for a particular theory. The points to question are:

1. Language - what kind of Scientific language is used: arithmetic, symbols, algebra, FOPL, calculus or simply Natural Language (which is susceptible to ambiguity)
2. Ontology - the type of classification and the starting point: where do you stand with respect to your question? Are you in agreement with Newtonian ontology, which is primarily based on the Material world and on Reductionism, or Einsteinian, which is based on causality, or Quantum theory, which is based on Probability?
3. Epistemology - Logic of logic: when a hypothesis is made, what logical guidelines does the hypothesis adhere to, and why is such a logic chosen instead of another?
4. Computation - The scale: what are the purpose and the method of computing, and what are the parameters? This will reveal the core purpose of the hypothesis, the corresponding experiment and their relationship - what is trying to be concluded (at least for now)
5. Finally, the big question - "Is conclusion possible or necessary?" The popular opinion is that Scientists seek conclusions, but that's not true; not all Scientists rush to conclude. The prevalent practice nowadays is that a view is given, which the media takes and interprets as a conclusion

