Now that the CZ corpus is available, it is time to get acquainted with it. If CZ had been a book, I would probably have started on the first page and read on until I discovered that most of it can be read in random order. tOKo is a little quicker: it reads the whole of CZ in seconds and can summarise the results just as quickly, applying techniques from statistical Natural Language Processing (NLP). The simplest example is to count the words, ignoring stop words, and order them by frequency (see figure on the right).
In all the other corpora loaded into tOKo so far, a domain term was also the most frequent. Although counting words is rather crude, the simple fact that words such as "little" and "like" are used very frequently illustrates the writing style of CZ. Fortunately, since I am after all trying to develop a cooking ontology, some domain terms also seem to be used with high frequency: chocolate (why would that be?), food, cheese, etc. The list demonstrates that if an author is passionate about a topic, this will show up in the statistics :-).
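The word count described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not how tOKo is implemented; the stop-word list and the sample sentence are made up for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "i", "with"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lower-cased words in text, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# Made-up sample text, not taken from the CZ corpus.
sample = "I like chocolate. A little chocolate with cheese is like a little feast."
freq = word_frequencies(sample)

# Print the words ordered by frequency, most frequent first.
for word, count in freq.most_common():
    print(word, count)
```

Note that "little" and "like" survive the filter here, just as they do in the CZ list: whether a word counts as a stop word is a choice made by whoever compiles the list.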
For the ontology it seems to make sense to define some kind of division. For example, separate branches could deal with hardware to prepare food, ingredients, drinks, types (styles) of food, places to eat and so forth. One of the first posts of CZ is E. Dehillerin, which is a cooking utensils outlet. Trusting that cooking utensils is the general term for hardware related to preparing food, I selected it in the post (it turns yellow) and, after releasing the mouse, it also appears in the text entry box. A concept is created by hitting the button with a green C. We now have an ontology with two concepts: CZ concept is the top-level concept and cooking utensils is a sub-concept of it.
In the same post I read that CZ has bought some knives in the shop. knife is therefore added as a kind of cooking utensil in the ontology. Further down:
"A mezza-luna (chopping tool with two handles and two half-moon blades). In French, it's called a "berceuse" because of the cradling movement you make while using it."
This seems to suggest that chopping tool is also a cooking utensil and that a mezza-luna is a kind of chopping tool. Moreover, chopping tools and knives consist of one or more blades and handles. The latter can be modelled by using the hasPart relation (shown as a triangle with a P). See figure on the right.
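As a rough sketch, the small ontology built so far can be written down as a data structure with a sub-concept hierarchy and a hasPart relation. The class and method names below are hypothetical and chosen for this example only; they say nothing about how tOKo stores its ontologies.

```python
from collections import defaultdict

class Ontology:
    """Minimal concept hierarchy with a hasPart relation (illustrative only)."""

    def __init__(self, top):
        self.top = top
        self.parent = {top: None}       # concept -> its super-concept
        self.parts = defaultdict(set)   # concept -> parts it consists of

    def add_concept(self, name, parent=None):
        """Add a concept below parent (the top-level concept by default)."""
        parent = self.top if parent is None else parent
        if parent not in self.parent:
            raise KeyError(f"unknown parent concept: {parent}")
        self.parent[name] = parent

    def add_part(self, whole, part):
        """Record that whole hasPart part."""
        self.parts[whole].add(part)

    def ancestors(self, name):
        """Return the chain of super-concepts, nearest first."""
        chain = []
        node = self.parent[name]
        while node is not None:
            chain.append(node)
            node = self.parent[node]
        return chain

onto = Ontology("CZ concept")
onto.add_concept("cooking utensils")
onto.add_concept("knife", "cooking utensils")
onto.add_concept("chopping tool", "cooking utensils")
onto.add_concept("mezza-luna", "chopping tool")
for tool in ("knife", "chopping tool"):
    onto.add_part(tool, "blade")
    onto.add_part(tool, "handle")

print(onto.ancestors("mezza-luna"))
```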
Perhaps the most useful function in tOKo is to select a point of view and then zoom in or out. Wondering whether there are different kinds of knives, I first selected knife and then applied the prefix button. This shows all word pairs in which knife is the second word. The result is shown on the left.
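The idea behind the prefix view, collecting every word pair in which a given word comes second, can be approximated as follows. This is a minimal sketch under my own assumptions (simple lower-casing and a naive tokeniser); the function name and the sample text are invented, and tOKo may well do this differently.

```python
import re
from collections import Counter

def prefix_pairs(text, target):
    """Count adjacent word pairs whose second word is target."""
    words = re.findall(r"[a-z'-]+", text.lower())
    return Counter(
        (first, second)
        for first, second in zip(words, words[1:])
        if second == target
    )

# Made-up sample text, not taken from the CZ corpus.
sample = "I bought a paring knife and a bread knife. The paring knife is small."
pairs = prefix_pairs(sample, "knife")

for (first, second), count in pairs.most_common():
    print(first, second, count)
```

Because the tokeniser ignores punctuation, a pair can straddle a sentence boundary; a more careful version would split the text into sentences first.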
Which of these are types of knife? In most cases it is obvious, although the composition of chef knife is curious. Is a paring knife a kind of knife? tOKo provides three ways of finding out: (1) clicking on a term lists the posts in which it occurs, which can then be studied; (2) selecting the KWIC (Key Word In Context) option shows the context; and (3) a collocation algorithm can be used (more about that later). In this case KWIC answers the question: it is a kind of knife (see the figure below). The KWIC concordancing technique is very old; it was, for example, used to study religious texts.
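A KWIC listing is easy to approximate: find every occurrence of the term and show a few words on either side. The sketch below is a simplified version written for this post; the function name, the `width` parameter and the sample text are assumptions of mine and do not describe tOKo's own KWIC option.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, match, right context) for each occurrence of keyword."""
    tokens = re.findall(r"\S+", text)
    wanted = keyword.lower().split()
    n = len(wanted)
    hits = []
    for i in range(len(tokens) - n + 1):
        # Compare without surrounding punctuation and without regard to case.
        window = [t.strip(".,;:!?\"'()").lower() for t in tokens[i:i + n]]
        if window == wanted:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            hits.append((left, " ".join(tokens[i:i + n]), right))
    return hits

# Made-up sample text, not taken from the CZ corpus.
sample = "A paring knife is a small knife for peeling. Use the paring knife on fruit."
hits = kwic(sample, "paring knife")

for left, match, right in hits:
    print(f"{left:>20} | {match} | {right}")
```

Lining the matches up in a column, as the print loop does, is what makes the pattern around the term easy to spot.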
It is time for some preliminary conclusions on the experiment:
- CZ is not only a nice read, it is also very carefully worded (see the quoted example above). If you are listening, Clotilde: in English there is no space before the : (colon).
- Developing an ontology is not easy, especially finding general terms that cover a set of more specific concepts. The cooking domain suffers from this to the extreme, or so it seems: a knife can be used for food preparation and it can also be used while eating. Social tagging tools ignore problems of this kind altogether. This makes them simple to use ("this is a photo of my sister"), but who is to benefit from such tags?
- The experiment started off as "From Weblogs to Ontologies", suggesting that a weblog and common sense would be sufficient to create an ontology. Perhaps this was slightly over-ambitious and other resources are needed. Prime candidates are WordNet and, of course, Wikipedia.
Thanks for this helpful information on developing an ontology. You are right, is not easy, but when done right, it looks easy.
Posted by: Emmitt | May 13, 2011 at 06:19 AM
One must really know their subject well to set up a satisfactory ontology. When this is done poorly it is very frustrating for the user. Looks like you did a great job - I had no idea there are so many kinds of knives. - Ellyn Deuink
Posted by: Ellyn | June 19, 2011 at 01:05 AM