Hi Friends,

Even as I launch this today ( my 80th Birthday ), I realize that there is yet so much to say and do.

There is just no time to look back, no time to wonder, "Will anyone read these pages?"

With regards,
Hemen Parekh
27 June 2013

Monday, 24 October 2016

AIR - ( PART ONE ) : ARTIFICIAL INTELLIGENCE IN RECRUITING


A few notes that I made in the margins of a book ( written in 1989 ) that I read in 2002.
Maybe by now ( in Oct 2016 ), someone has already implemented the type of EXPERT SYSTEM that I conceived in my notes.
If not, here is a great opportunity for some Indian Start Up!
I would be happy to guide, if requested.
hemen parekh
hcp@RecruitGuru.com
25  Oct 2016
-------------------------------------------------------------------------------------
Book   >        Expert Systems
Author >        Edited by RICHARD FORSYTH
When Read > Aug 2002
-------------------------------------------------------------------------------------

Page 6
Is this like our saying : IF such and such keywords appear in a resume , THEN , it may belong to such and such INDUSTRY or FUNCTION ?

Page 7
I believe " knowledge " contained in our 65,000 resumes, is good enough to develop an Expert System ( ARDIS - ARGIS )
We started work on ARDIS-ARGIS in 1996 ! But taken up seriously , only 3 months ago
ARDIS = Artificial Resume Deciphering Intelligent System
ARGIS = Artificial Resume Generating Intelligent System

Page 8
Keywords are nothing but descriptions of resumes

Page 11
I believe the ISYS manual speaks of a " Context Tree "; so does Oracle's " Context Cartridge ( Themes ) ".

Page 15
In our case , " Hypothesized Outcome " could be a resume,
- getting shortlisted by 3P / by Client,
Or ,
a Candidate getting " appointed " ( after interview )
In our case :-
The " presence of the evidence " could be presence of certain " keywords " in a given resume ( the Horse ) or certain " Edu Qualification " or certain " Age " or certain " Exp ( years ) " or certain " Current Employer " etc

Page 16
In our case, these several " pieces of evidence " could be ,
* Keywords * Age * Exp * Edu Quali * Current Industry Background * Current Function Background * Current Designation Level * Current Salary * Current Employer etc
We could " establish " ODDS ( for each piece of evidence ) and then apply " Sequentially " , to figure out the " Probability / Odds " of that particular resume getting shortlisted / getting selected
We have to examine ( statistically ), resumes of all candidates shortlisted during last 13 years , to calculate " Odds "
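Applying odds " sequentially " is, in Bayesian terms, an odds-likelihood update: start from the prior odds of a resume getting shortlisted and multiply by one likelihood ratio per piece of evidence. Below is a minimal Python sketch, assuming the pieces of evidence are independent of one another given the outcome; the prior and the likelihood ratios are made-up illustrative numbers, not figures from our records.

```python
def odds_to_prob(odds: float) -> float:
    """Convert odds, p / (1 - p), back into a probability."""
    return odds / (1.0 + odds)


def sequential_odds(prior_prob: float, likelihood_ratios: dict[str, float]) -> float:
    """Update the odds of shortlisting one piece of evidence at a time.

    Each ratio is P(evidence | shortlisted) / P(evidence | not shortlisted).
    Multiplying them assumes the pieces of evidence are conditionally independent.
    """
    odds = prior_prob / (1.0 - prior_prob)
    for lr in likelihood_ratios.values():
        odds *= lr  # ratio > 1 raises the odds, ratio < 1 lowers them
    return odds_to_prob(odds)


# Hypothetical figures: 1 in 10 resumes gets shortlisted, three pieces of evidence seen.
evidence = {"keyword: ERP": 3.0, "exp 10-15 yrs": 2.0, "age over 55": 0.5}
print(round(sequential_odds(0.10, evidence), 3))  # 0.25
```

The odds go 1:9, then ×3, ×2 and ×0.5, ending at 1:3, which is a probability of 0.25.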

Page 18
" Automating " the process of Knowledge Acquisition ?
We could do this ( automating ), by getting / inducing the jobseekers to select / fill in , keywords themselves online , in the web form

Page 19
The " Decision Support " that our consultants need is : -
" From amongst thousands of resumes in our databank, which " few " should be sent to client ? Can software locate those few automatically, which have " excellent probability " of getting shortlisted / selected ? "
Our consultants , today, spend a lot of time in doing just this , manually - which we need to automate .

Page 20
These ( few ) resumes are " GOOD " for this " VACANCY "

Page 22
According to me , this " notation " is :-
" All human thoughts / speech and action, are directed towards either increasing the happiness ( of that person ) or towards decreasing the pain , by choosing from amongst available thoughts / spoken words / actions "
This notation describes all aspects of the human race.
This ability to choose the most appropriate option ( at that point of time ) , makes a human being , " intelligent "

Page 23
There are millions of " words " in english language - used by authors of books and poets in songs and laywers in documents but the words of interest to us are those used by jobseekers & recruiters in job advts . This is our area of expertise
Data = Probabilities of 10,000 keywords occurrence amongst " past successful " candidates
Problem Description = See remarks at the bottom of page 19 for OUR problem description

Page 25
RESUMIX ( Resume Management Software ) claims to contain 100,000 " rules "
* Our expertise in " matchmaking " of jobseekers and " vacancies " of recruiters
* Our business does fall in such a " specialist " category
* Persons who have spent 15 years reading resumes / deciding their " suitability " and interviewing candidates

Page 27
* Agree . We do not expect " Expert System " to conduct interviews !
* Our consultants do spend 2 to 3 hours daily in reading / short-listing resumes
* We want a " Decision Support System " to assist our consultants , so they can spend more time in " interview " type of assessment
* If, during the last 13 years, we have placed 500 executives, then we / our clients must have short-listed 5,000 resumes. These are enough " Test Cases "

Page 28
Now ( in 2002 ), expert systems have become " essential " to the survival of all organizations. We ignore them at our peril.

Page 29
* We can become VICTORS or VICTIMS : choice is ours
* I am sure that, by 2002, there must be many " MATURE " expert system " kernels / shells " commercially available in the market ( now available for Rs 3,000 / £40 )

Page 30
Maybe we could send an email to Mr FORSYTH himself, to seek his guidance.
We will need to explicitly state > our problem > the solution which we seek from the Expert System,
and ask him which commercially available " shell " he recommends.
email : Richard.Forsyth@uwe.ac.uk

Page 32
* How many does this ( CRI-1986 ) directory list in 2002 ?
* Google still shows CRI-1986 , as the latest ! But , " Expert Systems " in Google returned 299,000 links !
*  I took a course in X-Ray Crystallography at KU in 1958

Page 33
*  When developed , our system would fall under this category
*  Most certainly, we should integrate the two

Page 35
The resumes short-listed by our proposed " Expert System " ( resumes having highest probability of getting short-listed ), must be manually " examined " - and assigned " Weightage " by our consultants & these " Weightages " fed back into the system

Page 37
I believe our system will be a simple " rule-based " one, although there may be a lot of " processing " involved in the " Sequential Computation " of probabilities for keywords related to :
Industry / Function / Designation Level / Age / Exp / Edu Quali / Attitudes / Attributes / Skills / Knowledge / Salary / Current Employer / Current posting location / family etc

Page 39
* Abhi / Rajeev :
In my notes on ARDIS / ARGIS , see separate notes on " Logic for.......... "
Here, I have listed the under-lying rules

Page 40
Expert Knowledge ( - and consequently , the rules ) contained in RESUMIX have relevance to USA jobseekers - and their " Style " of resume preparation.
These ( rules ) , may not apply in Indian context

Page 41
We are trying to establish the " relationship " between :
(A)  Probability of occurrence of a given " keyword " in a given resume,
WITH,
(B) Probability of such a resume getting short-listed
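One simple way to see this " relationship " is to compare the probability of getting short-listed among resumes that contain the keyword with the overall short-listing rate: if the first is clearly higher, the keyword is positive evidence. A minimal Python sketch follows; the tiny history of ( keyword set, was short-listed ) pairs is hypothetical.

```python
def keyword_lift(resumes: list[tuple[set[str], bool]], keyword: str) -> tuple[float, float]:
    """Return (P(shortlisted | keyword present), P(shortlisted)) over past resumes.

    Each resume is a pair: (set of keywords, whether it was shortlisted).
    """
    p_short = sum(1 for _, s in resumes if s) / len(resumes)
    with_kw = [s for kws, s in resumes if keyword in kws]
    p_short_given_kw = sum(with_kw) / len(with_kw) if with_kw else 0.0
    return p_short_given_kw, p_short


history = [
    ({"sap", "finance"}, True),
    ({"sap", "audit"}, True),
    ({"sap", "sales"}, False),
    ({"sales", "retail"}, False),
    ({"retail"}, False),
]
cond, base = keyword_lift(history, "sap")
print(cond, base)  # 2/3 of "sap" resumes were shortlisted, versus 0.4 overall
```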

Page 42
So , we will need to prepare a comprehensive list of inconsistencies , with respect to a resume

Page 43
* We should ask  both ( the Expert System and the Experts ) , to independently short-list resumes and compare
* We have to experiment with building of an Expert System which would " test / validate " the assumptions :-
If certain ( which ? ) keywords or search parameters are found in a resume, it has a higher probability of getting short-listed / selected
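Once the Expert System and the Experts have each short-listed independently, the two lists can be compared with a few standard overlap measures. A minimal Python sketch follows, assuming each short-list is a set of resume IDs; the IDs are hypothetical.

```python
def compare_shortlists(system: set[str], experts: set[str]) -> dict[str, float]:
    """Compare two independently produced short-lists of resume IDs."""
    common = system & experts
    either = system | experts
    return {
        # share of the system's picks that the experts also picked
        "precision": len(common) / len(system) if system else 0.0,
        # share of the experts' picks that the system also found
        "recall": len(common) / len(experts) if experts else 0.0,
        # overlap relative to everything picked by either side
        "jaccard": len(common) / len(either) if either else 0.0,
    }


system_picks = {"R101", "R102", "R103", "R104"}
expert_picks = {"R102", "R103", "R105"}
result = compare_shortlists(system_picks, expert_picks)
print(result)  # precision 0.5, recall 2/3, jaccard 0.4
```

Low recall would mean the system misses resumes our consultants value; low precision would mean it short-lists resumes they reject.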

Page 44
* Eg: System short-listing a " Sales " executive against a " Production " vacancy !
* What / Which  " Cause " could have produced, what / which , " Effect / Result "

Page 45
In our case, the Expert System should free our consultants to do the more " intelligent " work of assessing candidates thru personal interviewing.
" The Human Use of Human Beings " by Norbert Wiener ( read first in 1956 )

Page 47
Eg:
(1) Entering email resumes in structured database of Module 1
(2) Reconstituting a resume ( converted bio-data ) thru ARGIS
For these " tasks " , we should not need human beings at all
Read , " What Will Be " by Michael Dertouzo ( MIT Lab - 1997 )

Page 48
* Even when our own Expert System " short-lists " the resumes ( based on perceived high probability of appointment ) , our consultants would still need to go thru these resumes before sending to clients. They would need to " interpret "
* Read all my notes written over the last 13 years

Page 50
* Our future / new consultants , need to be taken thru OES ( Order Execution System ) , step by step, thru the entire process - thru simulation , ie; fake search assignment
* Our " Task Area " is quite obvious - but may not be simple _ , viz:
We must find the " right " candidates for our clients , in shortest possible time
* In 13 years, since this book was written , " Mobile Computing " has made enormous strides . Also internet arrived in a big way in 1995
By March 2004 , I envisage our consultants , carrying their laptop or even smaller mobile computers & search our MAROL database for suitable candidates ( of course , using Expert System ) , sitting across clients' table

Page 58
Resumes are " data " but when arranged as a " short-list " , they become " information " because a " short-list " is always in relation to our " Search Assignment "!
It is that " Search Assignment " that lends " meaning " to a set of resumes

Page 59
Are " resumes " , knowledge about " people " ? - and their " achievements " ?

Page 60
... but , is a human , a part and parcel of nature ?
Human did not create Nature but did Nature create Human ?
Our VEDAS say that the entire UNIVERSE is contained in an ATOM. Maybe they meant that an entire UNIVERSE can arise from an ATOM!

Page 61
* Read " Aims of Education " by A N Whitehead
* Inference is a process of drawing / reaching " conclusions " based on knowledge

Page 62
Calculating probabilities of occurrence of Keywords and then comparing with keywords contained in " Resumes " , of SUCCESSFUL candidates

Page 64-65
IF
 a resume " R " contains keywords : a : b : c :
and,
IF
 resumes of all past " Successful " candidates also contain keywords : a : b : c : ,
THEN,
resume " R " will also be " successful " ( conclusion )
Our Expert System will be such an " Automatic Theorem Proving System " ,
WHERE,
" Inference Rules " , will have to be first figured out / established , from large volumes of past " co-relations " between " keywords " and " successes "
" Successes " can be defined in a variety of ways , including :
#  Short-listing  #  Interviewing ( getting interviewed )  #  Appointing ( getting appointed )
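The IF / THEN rule above can be sketched in a few lines of Python: first " figure out " the rule as the set of keywords common to every past successful resume, then fire it on any resume that contains all of them. This is a minimal sketch on hypothetical keywords a, b, c; requiring " all " keywords is deliberately strict, and frequency-based weightages are the natural refinement.

```python
def common_keywords(successful: list[set[str]]) -> set[str]:
    """Keywords present in every past successful resume: the 'a : b : c' of the rule."""
    return set.intersection(*successful) if successful else set()


def rule_fires(resume: set[str], rule_keywords: set[str]) -> bool:
    """IF the resume contains all rule keywords THEN conclude it will be 'successful'."""
    return bool(rule_keywords) and rule_keywords <= resume


past_successes = [
    {"a", "b", "c", "x"},
    {"a", "b", "c", "y"},
    {"a", "b", "c"},
]
rule = common_keywords(past_successes)  # keywords a, b and c
print(rule_fires({"a", "b", "c", "z"}, rule))  # True
print(rule_fires({"a", "b"}, rule))            # False
```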

Page 67 { 4.3.2 / Backward Chaining }
In our case too, we are trying to " interpret / diagnose " the " symptoms " ( in our case the " Keywords " ) , contained in any given resume ( patient ? ) & then " predict " what are its chances ( ie; probabilities ) of " success " ( = getting cured ),
ie; getting " Short-listed " / " Interviewed " / " Appointed "

Page 68
These resumes can be further sub-divided according to, INDUSTRY / FUNCTION / DESIG LEVEL etc, to reduce population size of keywords
For us , there are as many " Rules " as there are " Keywords " in the resumes of the past " Successful " candidates - with the " frequency of occurrence " of each such keyword ( in say , 7,500 successful resumes ) , deciding its " weightage " while applying the rule

Page 69
(1)   Our initial assumption >  Resumes of " Successful " ( past ) candidates always contain keywords a, b, c, d ..etc
(2)   Process  >  Find all other resumes which contain..  a, b, c, d
(3)   Conclusion  >  These should " succeed " too
Our goal :
Find all resumes which have high probability of " success "
System should automatically keep adding to the database , all the " actually successful " candidates as each search assignment gets over

Page 70
* In 1957 , this ( travelling salesman problem ) , was part of " Operations Research " course at Uni of Kansas
*  With huge number crunching capacities of modern computers , computational costs are not an important consideration

Page 72
* We are plotting frequency of occurrence of keywords in specific past resumes to generalize our observations
*  We can construct a table like this from our 65,000 resumes & then try to develop " algorithms "
* Abhi / Rajeev > In above table , the last column ( Job or Actual Designation ) , can also be substituted by ,
# Industry  #  Function  #  Edu Qualification ,
and a new set of " algorithms " will emerge

Page 74
Our resumes also " leave out " a hell of a lot of " Variables " !
A resume is like a jigsaw puzzle with a lot of missing pieces !
We are basing on " Statistical Forecasting " , viz:
frequency of occurrence of certain keywords and attaching ( assigning ) probability values

Page 76
Just imagine , if we too can locate and deliver to our clients, just that candidate that he needs ! - in the first place , just those resumes , which he is likely to value / appreciate
I am sure, by now superior languages must have emerged.
Superior hardware certainly has, as have the " Conventional Tools " of database management

Page 77
Perhaps what was " specialized " hardware in 1989, must have become quite common today in 2002 - and fairly cheap too

Page 81
* We must figure out ( - and write down ) , what " logic / rules " our consultants use ( even subconsciously ) , while selecting / rejecting a resume ( as being suitable ) , for a client need . Expert System must mimic a human expert
*  We are basing ourselves ( ie: our proposed Expert System ) on this " type " ( see " patterns " in " Digital Biology " )

Page 82
This is what we are after ( Statistical Analysis / Fuzzy Forecasting / Pattern Recognition )

Page 87
Probability that this keyword ( symptom ) will be observed / found , given that the resume ( patient ) belongs to XYZ industry ( illness )

Page 88
*  Random person = any given " incoming " email resume
    Influence = " Automobile " industry
Prior probability, based on our past population = ( number of " Auto " resumes ) / ( number of all resumes )
If " symptoms " = keywords, which " symptoms " ( keywords ) , have appeared in " Auto " industry resumes , OR ,
which keywords have NEVER appeared in " Auto " resumes , OR
appeared with low frequency ?
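All three questions can be answered from simple counts. A minimal Python sketch, assuming each past resume is stored as a pair ( industry, set of keywords ); the four resumes are hypothetical.

```python
from collections import Counter


def industry_profile(resumes: list[tuple[str, set[str]]], industry: str):
    """Return the prior P(industry) and P(keyword | industry) for every keyword seen."""
    in_ind = [kws for ind, kws in resumes if ind == industry]
    prior = len(in_ind) / len(resumes)
    counts = Counter(kw for kws in in_ind for kw in kws)
    likelihood = {kw: c / len(in_ind) for kw, c in counts.items()}
    return prior, likelihood


resumes = [
    ("Auto", {"car", "engine", "plant"}),
    ("Auto", {"car", "dealer"}),
    ("Pharma", {"drug", "plant"}),
    ("Banking", {"loan", "branch"}),
]
prior, lik = industry_profile(resumes, "Auto")
all_kws = set().union(*(kws for _, kws in resumes))
never_in_auto = all_kws - set(lik)
print(prior)                  # 0.5
print(lik["car"])             # 1.0 (in every Auto resume)
print(sorted(never_in_auto))  # ['branch', 'drug', 'loan']
```

Low-frequency keywords are simply those with a small value in `lik`.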

Page 89
With addition of ALL keywords ( including new keywords , not previously recorded as " keyword " ) from each new incoming resume , the " prior probabilities " keep changing

Page 90
In this " resume " , assuming it belongs to a particular " industry " ,
>  what keywords can be expected
>  what keywords may not be expected
We can also reverse the " reasoning " viz:
>  What " industry " ( or " Function " ) might a given resume belong to , if
   *  certain keywords are present , AND
   *  certain keywords are absent  ?
*  The " result / answer " provided by Expert System , can then be tested / verified with what jobseeker has himself " clicked "

Page 91
Reiteration
So each new incoming resume would change the " prior probability " ( again and again ) for each Industry / Function / Designation / Edu Quali / Exp etc
Graph : X Axis > Cumulative No of Resumes    Y Axis > Probability No ( 0 to 1 )
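Keeping the priors current needs only running counts, updated as each resume arrives. A minimal Python sketch follows; the sequence of five incoming resumes is hypothetical, and each step produces one point for the graph ( X = cumulative resumes, Y = probability ).

```python
from collections import Counter


class RunningPriors:
    """Keeps P(industry) up to date as each new resume is added."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.total = 0

    def add(self, industry: str) -> None:
        self.counts[industry] += 1
        self.total += 1

    def prior(self, industry: str) -> float:
        return self.counts[industry] / self.total if self.total else 0.0


priors = RunningPriors()
history = []
for ind in ["Auto", "Pharma", "Auto", "Banking", "Auto"]:
    priors.add(ind)
    history.append(round(priors.prior("Auto"), 2))  # one graph point per resume
print(history)  # [1.0, 0.5, 0.67, 0.5, 0.6]
```

With only a handful of resumes the estimate swings widely; with tens of thousands it settles down, which is the converging curve sketched on page 91.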

Page 92
With 65,000 resumes, ( ie; Patients ), and 13,000 keywords ( symptoms ) , we could get a fairly accurate " estimate " of " prior probabilities "
This will keep improving / converging ( see hand-drawn graph on page 91 ) as the resume database and keywords database keep growing ( especially if we can succeed in downloading thousands of resumes ( or job advts ) from Naukri / Monster / JobsAhead etc )

Page 93
In Indian resumes , keyword " Birth date " would have the probability of 0.99999
Of course, most such keywords are of no interest to us !

Page 95
For our purpose, " keywords " are all " items of evidence "
If each and every keyword found in an incoming resume corresponds to our " hypothesis " ( viz: keywords A / B / C are always present in resumes belonging to the ' Auto ' industry ), then we obtain the max possible " Posterior Probability "
So, if our knowledge base ( not only of keywords, but of phrases / sentences / names of current and past employers / posting cities etc ) is VERY WIDE and VERY DEEP, we would be able to formulate more accurate hypotheses and obtain a higher " posterior probability "

Page 96
So, the key issue is to write down a " Set of hypotheses "

Page 97
Let us say , keyword " Derivative " may have a very low frequency of occurrence in 65,000 resumes ( of all industries put together ) but, it could be a very important keyword for the " Financial Services " industry

Page 98
Eg: certain keywords are often associated with ( found in ) certain " Industries " or certain " Functions " ( Domain keywords )

Page 99
With each incoming resume , the probability of each keyword ( in keyword database )  will keep changing
Eg:
Does " Edu Quali " have any role / effect /ightage , in the selection of a candidate ?
Eg :
What is the max age ( or Min Exp ) at which corporate will appoint a  Manager / a General Manager / a Vice President ?

Page 103
Based on frequency distribution in existing 65,000 resumes , say , every 20th incoming resume belongs to ,
>  XYZ  industry
>  ABC function

Page 104
Evidence : If an incoming resume belongs to " Auto " industry, it would contain keywords , " car / automobile " etc , OR Edu Quali = Diploma in Auto Engineering
Eg :
Probability of selection of an executive is ZERO , if he is 65 years of age !
One could assign " probability of getting short-listed " or even " probability of getting appointed " for each  AGE or for each " YEARS OF EXPERIENCE " or for each " EDU LEVEL " etc
This is very similar to asking :
" Will this executive ( incoming resume ) NOT get short-listed ? OR , NOT get appointed ?

Page 105
Obviously ,
#  a person ( incoming resume )  having no experience ( fresh graduate ) will NOT get shortlisted for the position of MANAGER ( Zero Probability )
#  a person ( incoming resume ) having NO graduate degree , will NOT get shortlisted for the position of MANAGER ( Zero Probability )
#  a person ( incoming resume ) with less than 5 years of experience , will NOT get shortlisted for the position of GENERAL MANAGER ( Zero Probability ),
but
will get shortlisted for the position of SUPERVISOR ( 0.9 probability )
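These " obvious " rules act as knock-out filters applied before any statistics. A minimal Python sketch that encodes exactly the four rules above; the position names as strings and the convention that `None` means " no rule applies, let the statistical model decide " are assumptions for illustration.

```python
from typing import Optional


def rule_based_probability(position: str, years_exp: float, is_graduate: bool) -> Optional[float]:
    """Apply the hard rules noted above.

    Returns a fixed probability when a rule applies, or None when no rule
    applies and the statistical (keyword) model must decide.
    """
    if position == "Manager" and (years_exp == 0 or not is_graduate):
        return 0.0
    if position == "General Manager" and years_exp < 5:
        return 0.0
    if position == "Supervisor" and years_exp < 5:
        return 0.9
    return None


print(rule_based_probability("Manager", 0, True))           # 0.0 (fresh graduate)
print(rule_based_probability("General Manager", 3, True))   # 0.0
print(rule_based_probability("Supervisor", 3, False))       # 0.9
print(rule_based_probability("General Manager", 12, True))  # None
```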

Page 106
So , we need to build up a database of WHO ( executive ) got shortlisted / appointed , by WHOM ( client ) and WHEN , and WHY ( to best of our knowledge ), over the last 13 years and HOW MUCH his " background " matched " Client Requirement " and Keywords in resumes

Page 107
*  How " good " or " bad " is a given resume for a given client needs ?
*  Co-relating ,
>  Search parameters used , with
>  Search results ( resumes / keywords in resumes ) ,
for each and every , online / offline Resume Search


Page 109
*  Is this like saying: " Resume A belongs to the Engineering Industry to some degree and also to the Automobile Industry to some degree "?
The same can be said about " Functions "
*  Same as our rating one resume as " Good " and another as " Bad " ( of course, in relation to client's stated needs )

Page 110
Eg: SET A  >  Resumes belonging to Eng Industry
      SET B  >  Resumes belonging to Auto Industry
A resume can belong to both the sets with " different degree of membership "
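One simple, assumed way to give a resume a " degree of membership " in a set is the share of that set's characteristic keywords found in it; fuzzy logic then takes the minimum of two memberships as the degree of belonging to both. A minimal Python sketch; the keyword lists for the two sets are hypothetical.

```python
def membership(resume: set[str], set_keywords: set[str]) -> float:
    """Degree (0..1) to which a resume belongs to a fuzzy set, taken here as
    the share of that set's characteristic keywords found in the resume."""
    return len(resume & set_keywords) / len(set_keywords) if set_keywords else 0.0


ENG = {"design", "cad", "machining", "welding"}  # hypothetical SET A keywords
AUTO = {"car", "engine", "chassis", "cad"}       # hypothetical SET B keywords
resume = {"cad", "engine", "chassis", "machining"}

m_eng = membership(resume, ENG)    # 0.5
m_auto = membership(resume, AUTO)  # 0.75
print(m_eng, m_auto, min(m_eng, m_auto))  # 0.5 0.75 0.5 (degree in "Eng AND Auto")
```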

Page 114
In our case , we have to make a " discreet decision " , viz; " Shall I shortlist & forward this resume to client or not ?

Page 115
* If a given keyword is present in a resume, then treat the resume as belonging to " Eng Ind "
Or ( better ), a resume in which a " given combination " of keywords is present or absent should be classified under ( say ) " Engg Ind "
*  For us , " real world experience " = several sets of " keywords " discovered in 65,000 resumes which are already categorized ( by human experts ) as belonging to Industry or Function , A or B or C

Page 118
Eg : A given resume is a  " very close match " / " close match " / " fairly close match " / " not a very close match " ,
with client's requirement

Page 127
*  Who knows , what " new / different " keywords would appear ( - and what will disappear ) , in a given type of resumes , over next 5 / 10 years ?
*  We may think of our Corporate Database in these terms and call it " Corporate Knowledgebase "

Page 129
*  One could replace " Colour of Teeth " = Designation Level ( MD / CEO / President / GM )
*  " Data Structure " table with following columns :
   >   Keywords
   >   Industry
   >   Function
   >   Designation
   >   Education
   >   Age
   >   Experience
   >   Current Employer
   >   Skills
   >   Attitudes
   >   Attributes
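The table can be sketched as a Python dataclass, one object per resume row. The column names come from the list above; the field types, and the choice of a set for keywords and lists for skills / attitudes / attributes, are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResumeRecord:
    """One row of the proposed data-structure table (field types are assumptions)."""
    keywords: set[str] = field(default_factory=set)
    industry: Optional[str] = None
    function: Optional[str] = None
    designation: Optional[str] = None  # e.g. MD / CEO / President / GM
    education: Optional[str] = None
    age: Optional[int] = None
    experience_years: Optional[float] = None
    current_employer: Optional[str] = None
    skills: list[str] = field(default_factory=list)
    attitudes: list[str] = field(default_factory=list)
    attributes: list[str] = field(default_factory=list)


r = ResumeRecord(keywords={"sap", "finance"}, industry="Auto", designation="GM", age=45)
print(r.industry, r.designation, r.age)  # Auto GM 45
```

`default_factory` gives each record its own empty set / list instead of one shared mutable default.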

Page 132
Is this somewhat like saying :
" This ( given ) resume belongs to Industry X or Industry Y ?
Number of times a given resume got shortlisted in years , X / X+1 / X+2 / etc
If we treat " getting shortlisted " as a measure of " Success " , ( which is as close to defining " success " as we can ),  = prize money won
Of course , in our case , " No of times getting shortlisted " ( in any given year 0 , is a function of ,
>  Industry background
>  Function background
>  Designation
>  Edu
>  Age ( Max )
>  Exp ( Min )
>  Skills
>  Knowledge etc

Page 133
Which is what the resumes are made up of ( - sentences )
See my notes on ARDIS and ARGIS

Page 136
This was written 13 years ago. In the last few months, scientists have implanted simple micro sensors in the human body and succeeded in " integrating " these ( chips ) into the human nervous system!
Eg : restoring " vision " thru chip implant ( Macular Degeneration of Retina )

Page 143
In ARDIS , I have talked about character recognition / word recognition / sentence - phrase recognition

Page 148
" You see what you want to see and hear what you want to hear "
I have no doubt these ( Object Oriented Languages ) must have made great strides in the last 20 years

Page 149
*  Are our 13,000 keywords , the " Working Memory " ?
*  Eg : Frequency distribution of keywords belonging to ( say ) 1,000 resumes belonging to " Pharma Industry " , is a " pattern "
When a new resume arrives, the software " invokes " this " pattern " to check the amount / degree of " MATCH "
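One standard way to measure the " degree of MATCH " between a resume and such a frequency pattern is cosine similarity; this is a swapped-in technique, not the book's. A minimal Python sketch, treating the resume as 0 / 1 per keyword; the three " Pharma " resumes are hypothetical stand-ins for the 1,000.

```python
import math
from collections import Counter


def profile(resumes: list[set[str]]) -> Counter:
    """Frequency distribution of keywords across a group of resumes (the 'pattern')."""
    return Counter(kw for kws in resumes for kw in kws)


def match_degree(resume: set[str], pattern: Counter) -> float:
    """Cosine similarity between a resume (0/1 per keyword) and the pattern, 0..1."""
    dot = sum(pattern[kw] for kw in resume)  # Counter returns 0 for unseen keywords
    norm_p = math.sqrt(sum(c * c for c in pattern.values()))
    norm_r = math.sqrt(len(resume))
    return dot / (norm_p * norm_r) if norm_p and norm_r else 0.0


pharma = profile([{"drug", "gmp", "fda"}, {"drug", "gmp"}, {"drug", "clinical"}])
print(round(match_degree({"drug", "gmp"}, pharma), 2))  # 0.91
print(match_degree({"car", "engine"}, pharma))          # 0.0
```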

Page 150
Eg : This is like starting with an assumption ( hypothesis ) that the next incoming resume belongs to " Auto Industry "  or to " Sales " function , and then proceed to prove / disprove it

Page 151
Keywords pertaining to any given " Industry " or " Function " will go on changing over the years, as new skills and knowledge get added, so " recent " keywords are more valid
Eg : " Balanced Score Card " was totally unknown 2 years ago !

Page 152
In case of " keywords " , is this comparable to ordering ( ie ; arranging ) by frequency of occurrence ?

Page 153
*  Treating child as an " Expert System " , capable of drawing inference
*  Eg: blocks different colours
*  Rules will remain same
*  Keywords will change over time ( working memory )
*  Gross ( or crude ) search , to be refined later on

Page 154
*   Obtaining an understanding of what the system is trying to achieve
*   I suppose this has already happened

Page 155
* See my notes on " Words and Sentences that follow "

Page 178
*  We are thinking in terms of past " Resumes " which have been " short-listed"
*  I have already written some rules
*  This must have happened in last 13 years !

Page 179
We must question our consultants as to what logic / rules they use / apply ( even subconsciously ) to short-list, or rather to select from shortlisted candidates

Page 181
" KNOWLEDGE  ACQUISITION  TOOLS " , developed by us over the years :
*   Highlighter
*   Eliminator
*   Refiner
*   Compiler
*   Educator
*   Composer
*   Matchmaker
*   Member Segregator
*   Mapper ( to be developed )

Page 186
I have written some rules but many more need to be written

Page 191
Like " weightages" in Neural Networks ?
To find " real evidence " , take resumes ( keywords ) AND " Interview Assessment Sheets " of " Successful " candidates and find co-relation
Eg : Rules re " Designation Level " could conflict with rules re: " Experience ( years ) " or " Age ( Min ) "

Page 192
Special Rule  > Eg: Edu Quali = CA / ICWA / MBA , for " Finance " function
General Rule > Basic degree in commerce
" Rule Sets " on Age ( max ) / Exp ( min ) / Edu Quali / Industry / Function / Designation level ..etc
If our Corporate Client can explicitly state " weightage " for each of the above ,  it would simplify the design of expert system
Mr Nagle had actually built this ( weightage )  factor in our HH3P search engine , way back in 1992
Instead of clients , consultants entered these " weightages "  in search engine ; but resume database was too small ( may be < 5,000 )
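The idea of client-stated weightages can be sketched as a weighted average of per-criterion match scores. This is a minimal Python sketch, not a description of how the HH3P engine actually did it; the criterion names, the 0 to 1 match scores and the weightages are all hypothetical.

```python
def weighted_match(criteria_scores: dict[str, float], client_weights: dict[str, float]) -> float:
    """Combine per-criterion match scores (0..1) using the client's stated weightages.

    Weightages are normalised, so they need not add up to 100.
    """
    total_w = sum(client_weights.values())
    if total_w == 0:
        return 0.0
    return sum(w * criteria_scores.get(c, 0.0) for c, w in client_weights.items()) / total_w


weights = {"industry": 30, "function": 30, "edu_quali": 20, "age_max": 10, "exp_min": 10}
candidate = {"industry": 1.0, "function": 1.0, "edu_quali": 0.5, "age_max": 1.0, "exp_min": 0.0}
print(weighted_match(candidate, weights))  # 0.8
```

A criterion the candidate has no score for counts as 0, so missing information lowers the match rather than being ignored.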

Page 197
Quacks sometimes do " cure " a disease , but we would never " know " why or how !

Page 198
We have databases of :
*  Successful ( or " failed " ) candidates , and
*  their " Resumes " and " Assessment  Sheets " ,  and
*  Keywords contained in these resumes

Page 199
*  " Not Fine " = Failed candidates
*  Fine = Successful candidates  ( from historical data )
*  We will need to define which are " successful " and which are " failure " candidates ( of course , in relation to a given vacancy )
*  We want to be able to predict which candidates are likely to be " successful "
*  We must have data on past 500 " successful " and 5000  " failure " candidates in our database

Page 222
Abhi :
This is exactly what our " Knowledge Tools " do , viz:
*  Highlighter
*  Eliminator
*   Refiner
*   Sorter
*   Matchmaker
*   Educator
*  Compiler
*   Member Segregator  etc

Page 227
Diebold predicted such a factory in his 1952 book " Automation: The Advent of the Automatic Factory "
Someday , we would modify our OES ( Order Execution System ) in such a way that our clients will " Self Service " themselves
" SiVA " on JobStreet.com has elements of such a self-serving system

Page 232
Statistical Analysis of thousands of job advts of the last 5 years could help us extrapolate the trend for the next 10 years

Page 236
We are planning a direct link from HR Managers to our OES ( which is our factory floor )
We will still need " consultants " - but only to interact ( talk ) with clients & candidates ; not to fill in INPUT screens of OES !

Page 237
Obviously, the author had a vision of the Internet, which I had envisioned 2 years earlier, in 1987, in my report " QUO VADIS "

Page 239
Norbert Wiener had predicted this in 1958

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
