Hi Friends,

Even as I launch this today ( my 80th Birthday ), I realize that there is yet so much to say and do.

There is just no time to look back, no time to wonder, "Will anyone read these pages?"

With regards,
Hemen Parekh
27 June 2013

Thursday, 3 November 2016

ARDIS - Some further thoughts !




Note dt :  09  June  1998

My first note on ARDIS ( Artificial  Resume  Deciphering  Intelligent  Software  ) was written on 01 Dec 1996

Some 18 months later , I sent the following note to Yogesh / Cyril , who had translated my notes / UIs / logic into
www.3pJobs.com ... and launched it on 14  Nov  1997 - some 10 months before GOOGLE was officially launched !

--------------------------------------------------------------------------------------

Uploaded :  04  Nov  2016

-----------------------------------------------------------------------------------------


Yogesh / Cyril ,


ARDIS

While discussing the " Data Capture & Query " Module ( Module # 1 ) , a few days back , we also talked about the " Knowledge Base " already available with us . This knowledge base has been acquired / created over the last 8 years .

This knowledge base comprises English-language :

*   Words

*   Phrases

*   Sentences

*   Paragraphs


As far as " Words " are concerned , I myself worked on " Categorizing " them in different " Categories "

This was nearly 12 months ago , using the software tool " TELL  ME  " , developed by Cyril

In this connection , I enclose Annex  A / B / C / D


Under " TELL  ME  " , I have already " categorized " over 15,000 words into some 60 different " Categories "

Some of these are shown in Annex  C


In addition , Cyril had developed another simple method , under which , I could quickly categorize :

*   P  =  Person's  Name  ( " Name " of a person )

*   C  =  Company Name

*   Q  =  Edu Qualification of an individual

*   L  =  Name of a Location ( mostly , a CITY )


As far as these 4 categories ( out of 60 odd categories of words ) are concerned , I have already covered :

----------------------------------------------------------
FREQUENCY ...................  No of Words Covered
----------------------------------------------------------
>  100 ......................   7,056
51  -  100 ..................   3,913
26  -  50 ...................   5,880
11  -  25 ...................  13,246
----------------------------------------------------------
TOTAL .......................  30,246
----------------------------------------------------------


(  See  Annex  A  ) .   These are ISYS-indexed words

So , under both the tools combined , I might have already " categorized " over 30,000 words


Over the last 5 / 6 weeks , we have already scanned / OCRed and created .txt files of some 13,573 pages of bio data . And this population is growing at the rate of some 300 pages per day .


We talked about a simple software which will pick out ALL the words ( except for " common " words ) in each of these pages , and then

compare each such word with the " Knowledge Base " of 30,000 words which I have already " categorized "


If a " match " is found , the word is transferred to respective " category " and marked " KNOWN "

If there is " no match " , the word gets tagged as " NEW " and gets highlighted in the .txt file
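
The " match making " step described above could be sketched roughly as follows . This is only a minimal illustration in Python , not the software we discussed : the names COMMON_WORDS , KNOWLEDGE_BASE , tag_words and highlight_new , the sample entries and the ** highlight markers are all assumptions made for the example .

```python
import re

# Hypothetical sample data; the real knowledge base holds some 30,000 words.
COMMON_WORDS = {"a", "an", "and", "as", "at", "for", "in", "of", "the", "to", "with"}
KNOWLEDGE_BASE = {
    "larsen": "C",   # C = Company Name
    "mumbai": "L",   # L = Name of a Location
}


def tag_words(page_text, knowledge_base, common_words):
    """Return {word: (status, category)} for every non-common word on a page."""
    tags = {}
    for word in re.findall(r"[A-Za-z]+", page_text.lower()):
        if word in common_words or word in tags:
            continue  # skip common words and words already tagged on this page
        category = knowledge_base.get(word)
        if category is not None:
            tags[word] = ("KNOWN", category)
        else:
            tags[word] = ("NEW", None)
    return tags


def highlight_new(page_text, tags):
    """Return the page text with every NEW word wrapped in ** markers."""
    def mark(match):
        word = match.group(0)
        status = tags.get(word.lower(), (None, None))[0]
        return f"**{word}**" if status == "NEW" else word
    return re.sub(r"[A-Za-z]+", mark, page_text)


sample_page = "Worked at Larsen in Mumbai as Metallurgist"
sample_tags = tag_words(sample_page, KNOWLEDGE_BASE, COMMON_WORDS)
print(highlight_new(sample_page, sample_tags))
```

In this sketch a word is treated as a plain run of letters ; multi-word names ( of a Company or a Person ) would need a smarter tokenizer .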


Now , whenever a consultant is viewing that page on the screen and comes across a word marked " NEW " , whose " Meaning / Category " he knows , he will have a simple " Tool " ( on that very screen ) , with which he can go ahead and " categorize " that word . This TOOL could perhaps be " TELL  ME  "


We should debate whether we should also give any consultant the " rights " to " ADD " a new CATEGORY itself

It should be possible for any number of consultants to work on this TOOL , simultaneously , from their own individual work-stations , whenever time permits or whenever they are " Viewing " a .txt page for any reason
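
One way several consultants could categorize words at the same time is sketched below . It is a rough illustration only : the class SharedKnowledgeBase , the allow_new_categories switch ( standing in for the " rights " question above ) and the rule that the first consultant to categorize a word wins are assumptions of this example , not a description of " TELL  ME  " .

```python
import threading


class SharedKnowledgeBase:
    """Hypothetical word store that many consultants can update at once."""

    def __init__(self, categories, allow_new_categories=False):
        self._lock = threading.Lock()
        self._categories = set(categories)
        self._words = {}  # word -> (category, consultant who categorized it)
        self._allow_new_categories = allow_new_categories

    def categorize(self, word, category, consultant):
        """Store word -> category; return True if stored, False if rejected."""
        key = word.lower()
        with self._lock:
            if key in self._words:
                return False  # assumed rule: the first categorization wins
            if category not in self._categories:
                if not self._allow_new_categories:
                    return False  # consultant has no right to add a category
                self._categories.add(category)
            self._words[key] = (category, consultant)
            return True

    def lookup(self, word):
        """Return the category of a word, or None if it is still NEW."""
        with self._lock:
            entry = self._words.get(word.lower())
        return entry[0] if entry else None


kb = SharedKnowledgeBase(categories={"P", "C", "Q", "L"})


def work(consultant, items):
    for word, category in items:
        kb.categorize(word, category, consultant)


threads = [
    threading.Thread(target=work, args=("consultant_1", [("Pune", "L"), ("Metallurgy", "SUBJECT")])),
    threading.Thread(target=work, args=("consultant_2", [("Pune", "L"), ("MBA", "Q")])),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With allow_new_categories left at False , the attempt to file " Metallurgy " under a new SUBJECT category is rejected , while " Pune " and " MBA " are stored once each .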


This arrangement would " multiply " the effort several times as compared to my doing it " single-handedly " !

PLUS ,

It has the advantage of using the knowledge of several persons having different academic / experience backgrounds


We could also consider hiring " Experts " from different " Functional Areas " , to carry out ( this categorization ) in a dedicated manner


Now that we have 13,573 pages ready ( for this simple " match making " process ) , we could seriously consider " hiring " such " experts "


We could even take " Text Books " on various  SUBJECTS / CATEGORIES  and prepare an INVENTORY of all words appearing in each book and put them in the SUBJECT category
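
The text-book idea could be sketched as below , assuming the text of each book is already available as a .txt string . The function build_subject_inventory , the sample " books " and the short list of common words are invented for this illustration . Since the same word may appear in books on several subjects , the sketch keeps a set of subjects per word .

```python
import re
from collections import defaultdict


def build_subject_inventory(books, common_words):
    """Map each non-common word to the set of subjects whose book contains it.

    books: dict of subject name -> full text of a text book on that subject.
    """
    inventory = defaultdict(set)
    for subject, text in books.items():
        for word in re.findall(r"[A-Za-z]+", text.lower()):
            if word not in common_words:
                inventory[word].add(subject)
    return inventory


# Hypothetical one-line "books", only to show the shape of the result.
sample_books = {
    "Metallurgy": "Annealing of steel and alloys",
    "Finance": "Valuation of equity and steel futures",
}
sample_inventory = build_subject_inventory(sample_books, {"of", "and", "the"})
```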


Many innovations are possible - if only we could make a beginning

Such a beginning is possible now

Let us give this a serious thought and discuss soon

regards,

hcp

-------------------------------------------------------------------------------------------------------------------------

     


