Reports: AIR - ( PART ONE ) : ARTIFICIAL INTELLIGENCE IN RECRUITING

A few notes that I made in the margins of a book ( written in 1989 ) that I read in 2002

Maybe by now ( in Oct 2016 ) , someone has already implemented the type of EXPERT SYSTEM that I conceived in my notes

If not , here is a great opportunity for some Indian Start Up !

I would be happy to guide , if requested

hemen parekh

hcp@RecruitGuru.com

25 Oct 2016

-------------------------------------------------------------------------------------

Book > Expert Systems

Author > Edited by RICHARD FORSYTH

When Read > Aug 2002

-------------------------------------------------------------------------------------

Page 6

Is this like our saying : IF such and such keywords appear in a resume , THEN , it may belong to such and such INDUSTRY or FUNCTION ?

Page 7

I believe " knowledge " contained in our 65,000 resumes, is good enough to develop an Expert System ( ARDIS - ARGIS )

We started work on ARDIS-ARGIS in 1996 ! But it was taken up seriously only 3 months ago

ARDIS = Artificial Resume Deciphering Intelligent System

ARGIS = Artificial Resume Generating Intelligent System

Page 8

Keywords are nothing but descriptions of resumes

Page 11

I believe ISYS manual speaks of " Context Tree " - so does Oracle " Context Cartridge ( Themes ) "

Page 15

In our case , " Hypothesized Outcome " could be a resume,

- getting shortlisted by 3P / by Client,

Or ,

a Candidate getting " appointed " ( after interview )

In our case :-

The " presence of the evidence " could be presence of certain " keywords " in a given resume ( the Horse ) or certain " Edu Qualification " or certain " Age " or certain " Exp ( years ) " or certain " Current Employer " etc

Page 16

In our case, these several " pieces of evidence " could be ,

* Keywords * Age * Exp * Edu Quali * Current Industry Background * Current Function Background * Current Designation Level * Current Salary * Current Employer etc

We could " establish " ODDS ( for each piece of evidence ) and then apply " Sequentially " , to figure out the " Probability / Odds " of that particular resume getting shortlisted / getting selected

We have to examine ( statistically ), resumes of all candidates shortlisted during last 13 years , to calculate " Odds "
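
A minimal sketch of the " sequential " odds calculation described above , assuming the pieces of evidence are independent of each other ( a simplification the notes do not state ) . The prior and the likelihood ratios are invented for illustration ; real values would have to come from the statistical examination of past short-listed resumes .

```python
def odds_to_probability(odds):
    """Convert odds (p / (1 - p)) back to a probability."""
    return odds / (1.0 + odds)


def sequential_odds(prior_probability, likelihood_ratios):
    """Multiply the prior odds by each piece of evidence's ratio in turn."""
    odds = prior_probability / (1.0 - prior_probability)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds


# Hypothetical figures: 10% of all resumes get shortlisted (the prior),
# and each observed piece of evidence shifts the odds by its ratio.
evidence = {
    "keyword: supply chain": 3.0,    # 3x more common among shortlisted resumes
    "experience: 10-15 years": 2.0,
    "current salary band": 0.5,      # counts against the candidate
}
posterior_odds = sequential_odds(0.10, evidence.values())
posterior = odds_to_probability(posterior_odds)
print(round(posterior, 3))  # prints 0.25
```

Because the update is a plain multiplication , the order in which the evidence is applied does not change the result .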

Page 18

" Automating " the process of Knowledge Acquisition ?

We could do this ( automating ), by getting / inducing the jobseekers to select / fill in , keywords themselves online , in the web form

Page 19

The " Decision Support " that our consultants need is : -

" From amongst thousands of resumes in our databank, which " few " should be sent to client ? Can software locate those few automatically, which have " excellent probability " of getting shortlisted / selected ? "

Our consultants , today, spend a lot of time in doing just this , manually - which we need to automate .

Page 20

These ( few ) resumes are " GOOD " for this " VACANCY "

Page 22

According to me , this " notation " is :-

" All human thoughts / speech and action, are directed towards either increasing the happiness ( of that person ) or towards decreasing the pain , by choosing from amongst available thoughts / spoken words / actions "

This notation describes all aspects of human race

This ability to choose the most appropriate option ( at that point of time ) , makes a human being , " intelligent "

Page 23

There are millions of " words " in the English language - used by authors in books , poets in songs and lawyers in documents - but the words of interest to us are those used by jobseekers in resumes & by recruiters in job advts . This is our area of expertise

Data = Probabilities of 10,000 keywords occurrence amongst " past successful " candidates

Problem Description = See remarks at the bottom of page 19 for OUR problem description

Page 25

RESUMIX ( Resume Management Software ) claims to contain 100,000 " rules "

* Our expertise in " matchmaking " of jobseekers and " vacancies " of recruiters

* Our business does fall in such a " specialist " category

* Persons who have spent 15 years reading resumes / deciding their " suitability " and interviewing candidates

Page 27

* Agree . We do not expect " Expert System " to conduct interviews !

* Our consultants do spend 2/3 hours daily in reading / short-listing resumes

* We want a " Decision Support System " to assist our consultants , so they can spend more time in " interview " type of assessment

* If, during last 13 years , we have placed 500 executives , then we / client , must have short-listed 5,000 resumes . These are enough " Test Cases "

Page 28

Now ( in 2002 ), expert systems have become " Essential " to the survival of all organizations . We ignore them at our peril

Page 29

* We can become VICTORS or VICTIMS : choice is ours

* I am sure , by 2002 , we must have many " MATURE " expert system " kernels / shells " commercially available in the market ( now available for Rs 3,000 / pound 40 )

Page 30

Maybe we could send an email to Mr FORSYTH himself, to seek his guidance

We will need to explicitly state > our problem > solution which we seek from the Expert System ,

and ask him which commercially available " shell " he recommends

email : Richard.Forsyth@uwe.ac.uk

Page 32

* How many does this ( CRI-1986 ) directory list in 2002 ?

* Google still shows CRI-1986 , as the latest ! But , " Expert Systems " in Google returned 299,000 links !

* I took a course in X-Ray Crystallography at KU in 1958

Page 33

* When developed , our system would fall under this category

* Most certainly, we should integrate the two

Page 35

The resumes short-listed by our proposed " Expert System " ( resumes having highest probability of getting short-listed ), must be manually " examined " - and assigned " Weightage " by our consultants & these " Weightages " fed back into the system

Page 37

I believe, our system will be simple " rule-based " - although , there may be a lot of " processing " involved , in " Sequential Computation " of probabilities for keywords related to :

Industry / Function / Designation Level / Age / Exp / Edu Quali / Attitudes / Attributes / Skills / Knowledge / Salary / Current Employer / Current posting location / family etc

Page 39

* Abhi / Rajeev :

In my notes on ARDIS / ARGIS , see separate notes on " Logic for.......... "

Here, I have listed the under-lying rules

Page 40

Expert Knowledge ( - and consequently , the rules ) contained in RESUMIX have relevance to USA jobseekers - and their " Style " of resume preparation.

These ( rules ) , may not apply in Indian context

Page 41

We are trying to establish the " relationship " between :

(A) Probability of occurrence of a given " keyword " in a given resume,

WITH,

(B) Probability of such a resume getting short-listed

Page 42

So , we will need to prepare a comprehensive list of inconsistencies , with respect to a resume

Page 43

* We should ask both ( the Expert System and the Experts ) , to independently short-list resumes and compare

* We have to experiment with building of an Expert System which would " test / validate " the assumptions :-

If certain ( which ? ) keywords or search parameters are found in a resume, it has a higher probability of getting short-listed / selected
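
A small sketch of how the two independent short-lists could be compared , assuming each resume carries a unique ID . The function name , the IDs and the two measures ( share of the system's picks that the experts also chose , and share of the experts' picks that the system found ) are my own illustration , not something taken from the book .

```python
def compare_shortlists(system, experts):
    """Compare the system's shortlist with the human experts' shortlist."""
    system, experts = set(system), set(experts)
    agreed = system & experts
    return {
        "agreed": sorted(agreed),
        # Share of the system's picks that the experts also picked.
        "precision": len(agreed) / len(system) if system else 0.0,
        # Share of the experts' picks that the system also found.
        "recall": len(agreed) / len(experts) if experts else 0.0,
    }


# Hypothetical resume IDs for one search assignment.
result = compare_shortlists(
    system=["R1", "R2", "R3", "R4"],
    experts=["R2", "R3", "R5"],
)
print(result["agreed"], result["precision"])
```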

Page 44

* Eg: System short-listing a " Sales " executive against a " Production " vacancy !

* What / Which " Cause " could have produced, what / which , " Effect / Result "

Page 45

In our case , the Expert System should free our consultants to do the more " Intelligent " work of assessing candidates thru personal interviewing

" Human Use of Human Beings " by Norbert Wiener ( read first in 1956 )

Page 47

Eg:

(1) Entering email resumes in structured database of Module 1

(2) Reconstituting a resume ( converted bio-data ) thru ARGIS

For these " tasks " , we should not need human beings at all

Read , " What Will Be " by Michael Dertouzos ( MIT Lab - 1997 )

Page 48

* Even when our own Expert System " short-lists " the resumes ( based on perceived high probability of appointment ) , our consultants would still need to go thru these resumes before sending to clients. They would need to " interpret "

* Read all my notes written over last 13 years

Page 50

* Our future / new consultants , need to be taken thru OES ( Order Execution System ) , step by step, thru the entire process - thru simulation , ie; fake search assignment

* Our " Task Area " is quite obvious - but may not be simple , viz:

We must find the " right " candidates for our clients , in shortest possible time

* In 13 years, since this book was written , " Mobile Computing " has made enormous strides . Also internet arrived in a big way in 1995

By March 2004 , I envisage our consultants carrying their laptops or even smaller mobile computers & searching our MAROL database for suitable candidates ( of course , using Expert System ) , while sitting across the client's table

Page 58

Resumes are " data " but when arranged as a " short-list " , they become " information " because a " short-list " is always in relation to our " Search Assignment "!

It is that " Search Assignment " that lends " meaning " to a set of resumes

Page 59

Are " resumes " , knowledge about " people " ? - and their " achievements " ?

Page 60

... but , is a human , a part and parcel of nature ?

Human did not create Nature but did Nature create Human ?

Our VEDAS say that the entire UNIVERSE is contained in an ATOM . Maybe they meant that an entire UNIVERSE can arise from an ATOM !

Page 61

* Read " Aims of Education " by A N Whitehead

* Inference is a process of drawing / reaching " conclusions " based on knowledge

Page 62

Calculating probabilities of occurrence of Keywords and then comparing with keywords contained in " Resumes " , of SUCCESSFUL candidates

Page 64-65

a resume " R " contains keywords : a : b : c :

and,

resumes of all past " Successful " candidates also contain keywords : a : b : c : ,

THEN,

resume " R " will also be " successful " ( conclusion )

Our Expert System will be such an " Automatic Theorem Proving System " ,

WHERE,

" Inference Rules " , will have to be first figured out / established , from large volumes of past " co-relations " between " keywords " and " successes "

" Successes " can be defined in a variety of ways , including :

# Short-listing # Interviewing ( getting interviewed ) # Appointing ( getting appointed )

Page 67 { 4.3.2 / Backward Chaining }

In our case too, we are trying to " interpret / diagnose " the " symptoms " ( in our case the " Keywords " ) , contained in any given resume ( patient ? ) & then " predict " what are its chances ( ie; probabilities ) of " success " ( = getting cured ),

ie; getting " Short-listed " / " Interviewed " / " Appointed "

Page 68

These resumes can be further sub-divided according to, INDUSTRY / FUNCTION / DESIG LEVEL etc, to reduce population size of keywords

For us , there are as many " Rules " as there are " Keywords " in the resumes of the past " Successful " candidates - with the " frequency of occurrence " of each such keyword ( in say , 7,500 successful resumes ) , deciding its " weightage " while applying the rule
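
A minimal sketch of " one rule per keyword , weighted by frequency of occurrence " , assuming each resume has already been reduced to a list of keywords . The toy data stands in for the 7,500 successful resumes , and the simple additive score is only one possible way of " applying the rule " .

```python
from collections import Counter


def keyword_weights(successful_resumes):
    """Weightage of a keyword = share of successful resumes containing it."""
    counts = Counter()
    for keywords in successful_resumes:
        counts.update(set(keywords))  # count each keyword once per resume
    total = len(successful_resumes)
    return {kw: n / total for kw, n in counts.items()}


def score(resume_keywords, weights):
    """Add up the weightages of every rule (keyword) that fires."""
    return sum(weights.get(kw, 0.0) for kw in set(resume_keywords))


# Hypothetical keyword lists of four past successful resumes.
successful = [
    ["sales", "dealer", "target"],
    ["sales", "export"],
    ["sales", "dealer"],
    ["production", "dealer"],
]
weights = keyword_weights(successful)
print(score(["sales", "dealer", "cricket"], weights))  # prints 1.5
```

Sub-dividing the successful resumes by INDUSTRY / FUNCTION / DESIG LEVEL would simply mean building one such table of weightages per sub-group .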

Page 69

(1) Our initial assumption > Resumes of " Successful " ( past ) candidates , always contain keywords, a , b , c, d ..etc

(2) Process > Find all other resumes which contain.. a, b, c, d

(3) Conclusion > These should " succeed " too

Our goal :

Find all resumes which have high probability of " success "

System should automatically keep adding to the database , all the " actually successful " candidates as each search assignment gets over
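
The three steps above can be sketched as follows , assuming each resume is available as a set of keywords . The function names and the sample data are hypothetical . Note that the strict " always contain " assumption is fragile : a single successful resume lacking a keyword removes that keyword from the required set .

```python
def common_keywords(successful_resumes):
    """Step 1: keywords present in every past successful resume."""
    sets = [set(r) for r in successful_resumes]
    return set.intersection(*sets) if sets else set()


def predicted_successes(candidates, required):
    """Steps 2 and 3: resumes containing all required keywords should succeed."""
    return [name for name, kws in candidates.items() if required <= set(kws)]


successful = [{"a", "b", "c", "x"}, {"a", "b", "c", "y"}]
required = common_keywords(successful)

candidates = {"R1": {"a", "b", "c", "d"}, "R2": {"a", "b"}}
shortlist = predicted_successes(candidates, required)
print(shortlist)  # prints ['R1']

# When a search assignment is over, the actually successful resume is
# added to the database, so the next run starts from the updated set.
successful.append(candidates["R1"])
```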

Page 70

* In 1957 , this ( travelling salesman problem ) , was part of " Operations Research " course at Uni of Kansas

* With huge number crunching capacities of modern computers , computational costs are not an important consideration

Page 72

* We are plotting frequency of occurrence of keywords in specific past resumes to generalize our observations

* We can construct a table like this from our 65,000 resumes & then try to develop " algorithms "

* Abhi / Rajeev > In above table , the last column ( Job or Actual Designation ) , can also be substituted by ,

# Industry # Function # Edu Qualification ,

and a new set of " algorithms " will emerge

Page 74

Our resumes also " leave out " a hell of a lot of " Variables " !

A resume is like a jigsaw puzzle with a lot of missing pieces !

We are basing on " Statistical Forecasting " , viz:

frequency of occurrence of certain keywords and attaching ( assigning ) probability values

Page 76

Just imagine , if we too can locate and deliver to our clients, just that candidate that he needs ! - in the first place , just those resumes , which he is likely to value / appreciate

I am sure, by now superior languages must have emerged.

Superior hardware certainly has , as has " Conventional Tools " of database management

Page 77

Perhaps what was " specialized " hardware in 1989, must have become quite common today in 2002 - and fairly cheap too

Page 81

* We must figure out ( - and write down ) , what " logic / rules " our consultants use ( even subconsciously ) , while selecting / rejecting a resume ( as being suitable ) , for a client need . Expert System must mimic a human expert

* We are basing ourselves ( ie: our proposed Expert System ) on this " type " ( see " patterns " in " Digital Biology " )

Page 82

This is what we are after ( Statistical Analysis / Fuzzy Forecasting / Pattern Recognition )

Page 87

Probability that this keyword ( symptom ) will be observed / found , given that the resume ( patient ) belongs to XYZ industry ( illness )

Page 88

* Random person = any given " incoming " email resume

Influence = " Automobile " industry

Prior probability , based on our past population = ( No of " Auto " resumes ) divided by ( No of all resumes )

If " symptoms " = keywords, which " symptoms " ( keywords ) , have appeared in " Auto " industry resumes , OR ,

which keywords have NEVER appeared in " Auto " resumes , OR

appeared with low frequency ?
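
A rough sketch of the two quantities discussed on pages 87 and 88 , assuming every past resume has already been tagged with an industry by a human expert . The four sample resumes are invented .

```python
def prior(industry, labelled_resumes):
    """P(industry) = resumes tagged with that industry / all resumes."""
    hits = sum(1 for ind, _ in labelled_resumes if ind == industry)
    return hits / len(labelled_resumes)


def keyword_likelihood(keyword, industry, labelled_resumes):
    """P(keyword | industry): share of that industry's resumes with the keyword."""
    in_industry = [kws for ind, kws in labelled_resumes if ind == industry]
    if not in_industry:
        return 0.0
    return sum(1 for kws in in_industry if keyword in kws) / len(in_industry)


# Hypothetical labelled sample: (industry, set of keywords).
resumes = [
    ("Auto", {"car", "dealer"}),
    ("Auto", {"automobile", "engine"}),
    ("Pharma", {"formulation", "dealer"}),
    ("Finance", {"derivative", "audit"}),
]
print(prior("Auto", resumes))                             # prints 0.5
print(keyword_likelihood("car", "Auto", resumes))         # prints 0.5
print(keyword_likelihood("derivative", "Auto", resumes))  # prints 0.0
```

A likelihood of zero marks a keyword that has NEVER appeared in " Auto " resumes ; a small non-zero value marks one that appeared with low frequency .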

Page 89

With addition of ALL keywords ( including new keywords , not previously recorded as " keyword " ) from each new incoming resume , the " prior probabilities " keep changing

Page 90

In this " resume " , assuming it belongs to a particular " industry " ,

> what keywords can be expected

> what keywords may not be expected

We can also reverse the " reasoning " viz:

> What " industry " ( or " Function " ) might a given resume belong to , if

* certain keywords are present , AND

* certain keywords are absent ?

* The " result / answer " provided by Expert System , can then be tested / verified with what jobseeker has himself " clicked "
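
A sketch of the reversed reasoning , using a simple naive Bayes calculation in which present keywords AND absent keywords both count as evidence . This is one possible technique , not necessarily what the book describes . The training data is invented , and the " +1 / +2 " smoothing is my own assumption to avoid probabilities of exactly 0 or 1 .

```python
import math


def classify(resume_keywords, training, vocabulary):
    """Return the industry with the highest naive Bayes log-score.

    A present keyword contributes log P(kw | industry); an absent one
    contributes log (1 - P(kw | industry)).
    """
    by_industry = {}
    for industry, keywords in training:
        by_industry.setdefault(industry, []).append(keywords)

    best, best_score = None, -math.inf
    for industry, docs in by_industry.items():
        score = math.log(len(docs) / len(training))  # prior probability
        for kw in vocabulary:
            # Smoothed share of this industry's resumes containing kw.
            p = (sum(kw in d for d in docs) + 1) / (len(docs) + 2)
            score += math.log(p if kw in resume_keywords else 1 - p)
        if score > best_score:
            best, best_score = industry, score
    return best


training = [
    ("Auto", {"car", "engine"}),
    ("Auto", {"car", "dealer"}),
    ("Pharma", {"formulation", "dealer"}),
    ("Pharma", {"formulation", "tablet"}),
]
vocabulary = {"car", "engine", "dealer", "formulation", "tablet"}

predicted = classify({"car", "dealer"}, training, vocabulary)
jobseeker_clicked = "Auto"  # hypothetical value from the web form
print(predicted, predicted == jobseeker_clicked)  # prints Auto True
```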

Page 91

Reiteration

So each new incoming resume would change the " prior probability " ( again and again ), for each Industry / Function / Designation / Edu Quali / Exp etc

Graph : X Axis > Cumulative No of Resumes Y Axis > Probability No ( 0 to 1 )
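
A small sketch of the curve in the graph : the estimate of a prior probability is recomputed after every incoming resume . The sequence of industries below is invented ; with real data , the X axis would run to 65,000 and beyond .

```python
def running_prior(industries, target):
    """Yield the estimate of P(target) after each incoming resume."""
    seen = 0
    hits = 0
    for industry in industries:
        seen += 1
        if industry == target:
            hits += 1
        yield hits / seen


# Hypothetical stream of incoming resumes, already tagged by industry.
incoming = ["Auto", "Pharma", "Auto", "Finance", "Auto", "Pharma", "Auto", "Auto"]
estimates = list(running_prior(incoming, "Auto"))

for count, estimate in enumerate(estimates, start=1):
    print(count, round(estimate, 3))  # X axis value, Y axis value
```

The early estimates jump about , and the later ones move less and less as the cumulative number of resumes grows .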

Page 92

With 65,000 resumes, ( ie; Patients ), and 13,000 keywords ( symptoms ) , we could get a fairly accurate " estimate " of " prior probabilities "

This will keep improving / converging ( see hand-drawn graph on page 91 ) as resume database and keywords database keep growing ( especially if we can succeed in downloading thousands of resumes ( or job advts ) , from Naukri / Monster / JobsAhead etc )

Page 93

In Indian resumes , keyword " Birth date " would have the probability of 0.99999

Of course, most such keywords are of no interest to us !

Page 95

For our purpose, " keywords " are all " items of evidence "

If each and every keyword found in an incoming resume , corresponds to our " hypothesis " ( viz: keywords A / B / C / , are always present in resumes belonging to ' Auto ' industry ), then we obtain max possible " Posterior Probability "

So , if our knowledge base ( not only of keywords , but phrases / sentences / names of current and past employers / posting cities etc ) is VERY WIDE and VERY DEEP , we would be able to formulate more accurate hypotheses and obtain higher " posterior probability "

Page 96

So , the key issue is to write down a " Set of hypotheses "

Page 97

Let us say , keyword " Derivative " may have a very low frequency of occurrence in 65,000 resumes ( of all industries put together ) but, it could be a very important keyword for the " Financial Services " industry
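
One way to capture this is to compare a keyword's frequency inside an industry with its frequency across all resumes . The ratio below ( often called " lift " ) is my own choice of measure , and the ten sample resumes are invented .

```python
def lift(keyword, industry, labelled_resumes):
    """Ratio of P(keyword | industry) to P(keyword) across all resumes."""
    overall = sum(keyword in kws for _, kws in labelled_resumes) / len(labelled_resumes)
    inside = [kws for ind, kws in labelled_resumes if ind == industry]
    if overall == 0 or not inside:
        return 0.0
    within = sum(keyword in kws for kws in inside) / len(inside)
    return within / overall


# Hypothetical sample: 2 Financial Services resumes out of 10.
resumes = (
    [("Financial Services", {"derivative", "audit"})] * 2
    + [("Other", {"sales", "target"})] * 8
)
derivative_lift = lift("derivative", "Financial Services", resumes)
print(round(derivative_lift, 2))  # prints 5.0
```

Here " derivative " appears in only 20% of all resumes but in every " Financial Services " resume , so it is 5 times more frequent inside that industry than overall .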

Page 98

Eg: certain keywords are often associated with ( found in ) certain " Industries " or certain " Functions " ( Domain keywords )

Page 99

With each incoming resume , the probability of each keyword ( in keyword database ) will keep changing

Eg:

Does " Edu Quali " have any role / effect / weightage , in the selection of a candidate ?

Eg :

What is the max age ( or Min Exp ) at which corporate will appoint a Manager / a General Manager / a Vice President ?

Page 103

Based on frequency distribution in existing 65,000 resumes , say , every 20th incoming resume belongs to ,

> XYZ industry

> ABC function

Page 104

Evidence : If an incoming resume belongs to " Auto " industry, it would contain keywords , " car / automobile " etc , OR Edu Quali = Diploma in Auto Engineering

Eg :

Probability of selection of an executive is ZERO , if he is 65 years of age !

One could assign " probability of getting short-listed " or even " probability of getting appointed " for each AGE or for each " YEARS OF EXPERIENCE " or for each " EDU LEVEL " etc

This is very similar to asking :

" Will this executive ( incoming resume ) NOT get short-listed ? OR , NOT get appointed ? "

Page 105

Obviously ,

# a person ( incoming resume ) having no experience ( fresh graduate ) will NOT get shortlisted for the position of MANAGER ( Zero Probability )

# a person ( incoming resume ) having NO graduate degree , will NOT get shortlisted for the position of MANAGER ( Zero Probability )

# a person ( incoming resume ) with less than 5 years of experience , will NOT get shortlisted for the position of GENERAL MANAGER ( Zero Probability ),

but

will get shortlisted for the position of SUPERVISOR ( 0.9 probability )
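
The " obvious " cases above behave like knock-out rules that can be checked before any statistics are applied . A minimal sketch , using only the thresholds stated in these notes ; the function name and the idea of returning None when no rule applies are my own .

```python
def shortlist_probability(position, years_experience, has_degree):
    """Knock-out rules from the notes; None means no rule applies."""
    if position == "MANAGER" and (years_experience == 0 or not has_degree):
        return 0.0
    if years_experience < 5:
        if position == "GENERAL MANAGER":
            return 0.0
        if position == "SUPERVISOR":
            return 0.9
    return None  # fall through to the statistical scoring


print(shortlist_probability("MANAGER", 0, True))          # prints 0.0
print(shortlist_probability("GENERAL MANAGER", 3, True))  # prints 0.0
print(shortlist_probability("SUPERVISOR", 3, True))       # prints 0.9
print(shortlist_probability("MANAGER", 8, True))          # prints None
```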

Page 106

So , we need to build up a database of WHO ( executive ) got shortlisted / appointed , by WHOM ( client ) and WHEN , and WHY ( to best of our knowledge ), over the last 13 years and HOW MUCH his " background " matched " Client Requirement " and Keywords in resumes

Page 107

* How " good " or " bad " is a given resume for a given client's needs ?

* Co-relating ,

> Search parameters used , with

> Search results ( resumes / keywords in resumes ) ,

for each and every , online / offline Resume Search

Page 109

* Is this like saying : " Resume A belongs to Engineering Industry to some degree and also to Automobile Industry to some degree " ?

The same can be said about " Functions "

* Same as our rating one resume as " Good " and another as " Bad " ( of course, in relation to client's stated needs )

Page 110

Eg: SET A > Resumes belonging to Eng Industry

SET B > Resumes belonging to Auto Industry

A resume can belong to both the sets with " different degree of membership "
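
A minimal sketch of " degree of membership " , assuming each industry is described by a short list of characteristic keywords . Both keyword lists and the membership formula ( share of the industry's keywords found in the resume ) are invented for illustration ; fuzzy-set theory allows many other membership functions .

```python
def membership(resume_keywords, industry_keywords):
    """Degree of membership in [0, 1]: share of the industry's keywords found."""
    if not industry_keywords:
        return 0.0
    return len(resume_keywords & industry_keywords) / len(industry_keywords)


# Hypothetical characteristic keywords for SET A and SET B.
eng = {"machining", "fabrication", "design", "welding"}
auto = {"car", "engine", "design", "dealer"}

resume = {"design", "engine", "fabrication", "machining"}
degrees = {"Eng": membership(resume, eng), "Auto": membership(resume, auto)}
print(degrees)  # prints {'Eng': 0.75, 'Auto': 0.5}
```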

Page 114

In our case , we have to make a " discrete decision " , viz; " Shall I shortlist & forward this resume to client or not ? "

Page 115

* A given keyword is present in a resume , then treat the resume as belonging to " Eng Ind "

Or ( better ) , a " given combination " of keywords being present or absent in a resume should be classified under ( say ) , " Engg Ind "

* For us , " real world experience " = several sets of " keywords " discovered in 65,000 resumes which are already categorized ( by human experts ) as belonging to Industry or Function , A or B or C

Page 118

Eg : A given resume is a " very close match " / " close match " / " fairly close match " / " not a very close match " ,

with client's requirement

Page 127

* Who knows , what " new / different " keywords would appear ( - and what will disappear ) , in a given type of resumes , over next 5 / 10 years ?

* We may think of our Corporate Database in these terms and call it " Corporate Knowledgebase "

Page 129

* One could replace " Colour of Teeth " = Designation Level ( MD / CEO / President / GM )

* " Data Structure " table with following columns :

> Keywords

> Industry

> Function

> Designation

> Education

> Age

> Experience

> Current Employer

> Skills

> Attitudes

> Attributes
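
The columns listed above could be held in a simple record type . A sketch only : the field types and default values are my assumptions , since the notes give just the column names .

```python
from dataclasses import dataclass, field


@dataclass
class ResumeRecord:
    """One row of the 'Data Structure' table; types are assumed."""
    keywords: set = field(default_factory=set)
    industry: str = ""
    function: str = ""
    designation: str = ""
    education: str = ""
    age: int = 0
    experience_years: float = 0.0
    current_employer: str = ""
    skills: list = field(default_factory=list)
    attitudes: list = field(default_factory=list)
    attributes: list = field(default_factory=list)


record = ResumeRecord(
    keywords={"car", "dealer"},
    industry="Auto",
    function="Sales",
    designation="GM",
    age=45,
    experience_years=20,
)
print(record.industry, record.designation)  # prints Auto GM
```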

Page 132

Is this somewhat like saying :

" This ( given ) resume belongs to Industry X or Industry Y " ?

Number of times a given resume got shortlisted in years , X / X+1 / X+2 / etc

If we treat " getting shortlisted " as a measure of " Success " , ( which is as close to defining " success " as we can ), = prize money won

Of course , in our case , " No of times getting shortlisted " ( in any given year ) , is a function of ,

> Industry background

> Function background

> Designation

> Edu

> Age ( Max )

> Exp ( Min )

> Skills

> Knowledge etc

Page 133

Which is what the resumes are made up of ( - sentences )

See my notes on ARDIS and ARGIS

Page 136

This was written 13 years ago . In last few months, scientists have implanted simple micro sensors in human body and succeeded in " integrating " these ( chips ) into human nervous system !

Eg : restoring " vision " thru chip implant ( Macular Degeneration of Retina )

Page 143

In ARDIS , I have talked about character recognition / word recognition / sentence - phrase recognition

Page 148

" You see what you want to see and hear what you want to hear "

I have no doubt these ( Object oriented Language ) must have made great strides in last 20 years

Page 149

* Are our 13,000 keywords , the " Working Memory " ?

* Eg : Frequency distribution of keywords belonging to ( say ) 1,000 resumes belonging to " Pharma Industry " , is a " pattern "

When a new resume arrives, the software " invokes " this " pattern " to check the amount / degree of " MATCH "
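
A sketch of how a stored " pattern " might be invoked to measure the degree of MATCH . Cosine similarity is my own choice of measure here , not something prescribed by the book , and the two " Pharma " resumes are invented .

```python
import math
from collections import Counter


def industry_pattern(resumes):
    """Frequency distribution: share of the industry's resumes with each keyword."""
    counts = Counter()
    for keywords in resumes:
        counts.update(set(keywords))
    return {kw: n / len(resumes) for kw, n in counts.items()}


def match_degree(resume_keywords, pattern):
    """Cosine similarity between the resume's keywords and the pattern."""
    keywords = set(resume_keywords)
    dot = sum(pattern.get(kw, 0.0) for kw in keywords)
    norm_pattern = math.sqrt(sum(v * v for v in pattern.values()))
    norm_resume = math.sqrt(len(keywords))
    if norm_pattern == 0 or norm_resume == 0:
        return 0.0
    return dot / (norm_pattern * norm_resume)


# Hypothetical stand-in for the 1,000 "Pharma Industry" resumes.
pharma = [{"formulation", "gmp"}, {"formulation", "tablet"}]
pattern = industry_pattern(pharma)

close = match_degree({"formulation", "gmp"}, pattern)
far = match_degree({"car", "dealer"}, pattern)
print(round(close, 3), far)
```

A value near 1 means the new resume closely follows the industry's pattern , and 0 means it shares no keywords with it .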

Page 150

Eg : This is like starting with an assumption ( hypothesis ) that the next incoming resume belongs to " Auto Industry " or to " Sales " function , and then proceed to prove / disprove it

Page 151

Keywords pertaining to any given " Industry " or " Function " will go on changing over the years , as new skills and knowledge get added , so " recent " keywords are more valid

Eg : " Balanced Score Card " was totally unknown 2 years ago !

Page 152

In case of " keywords " , is this comparable to ordering ( ie ; arranging ) by frequency of occurrence ?

Page 153

* Treating child as an " Expert System " , capable of drawing inference

* Eg: blocks of different colours

* Rules will remain same

* Keywords will change over time ( working memory )

* Gross ( or crude ) search , to be refined later on

Page 154

* Obtaining an understanding of what the system is trying to achieve

* I suppose this has already happened

Page 155

* See my notes on " Words and Sentences that follow "

Page 178

* We are thinking in terms of past " Resumes " which have been " short-listed"

* I have already written some rules

* This must have happened in last 13 years !

Page 179

We must question our consultants as to what logic / rules do they use / apply ( even subconsciously ) - to short-list or rather select from shortlisted candidates

Page 181

" KNOWLEDGE ACQUISITION TOOLS " , developed by us over the years :

* Highlighter

* Eliminator

* Refiner

* Compiler

* Educator

* Composer

* Matchmaker

* Member Segregator

* Mapper ( to be developed )

Page 186

I have written some rules but many more need to be written

Page 191

Like " weightages" in Neural Networks ?

To find " real evidence " , take resumes ( keywords ) AND " Interview Assessment Sheets " of " Successful " candidates and find co-relation

Eg : Rules re " Designation Level " could conflict with rules re: " Experience ( years ) " or " Age ( Min ) "

Page 192

Special Rule > Eg: Edu Quali = CA / ICWA / MBA , for " Finance " function

General Rule > Basic degree in commerce

" Rule Sets " on Age ( max ) / Exp ( min ) / Edu Quali / Industry / Function / Designation level ..etc

If our Corporate Client can explicitly state " weightage " for each of the above , it would simplify the design of expert system
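
A sketch of how client-stated " weightages " could be turned into a single match figure . The criteria , thresholds and weightage values are hypothetical ; this is not a description of how the HH3P search engine worked .

```python
def weighted_match(candidate, requirement, weightages):
    """Share of the total weightage earned by the criteria the candidate meets."""
    total = sum(weightages.values())
    earned = sum(
        weight
        for criterion, weight in weightages.items()
        if requirement[criterion](candidate[criterion])
    )
    return earned / total


# Hypothetical rule set for a "Finance" vacancy: one test per criterion.
requirement = {
    "age": lambda age: age <= 45,                                   # Age ( max )
    "experience": lambda years: years >= 10,                        # Exp ( min )
    "education": lambda degree: degree in {"CA", "ICWA", "MBA"},
    "function": lambda fn: fn == "Finance",
}
# Weightages as the client might state them (here they add up to 100).
weightages = {"age": 10, "experience": 30, "education": 40, "function": 20}

candidate = {"age": 48, "experience": 15, "education": "CA", "function": "Finance"}
match = weighted_match(candidate, requirement, weightages)
print(match)  # prints 0.9
```

The candidate fails only the age criterion , which carries a weightage of 10 out of 100 , hence the match of 0.9 .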

Mr Nagle had actually built this ( weightage ) factor in our HH3P search engine , way back in 1992

Instead of clients , consultants entered these " weightages " in search engine ; but resume database was too small ( may be < 5,000 )

Page 197

Quacks sometimes do " cure " a disease , but we would never " know " why or how !

Page 198

We have databases of :

* Successful ( or " failed " ) candidates , and

* their " Resumes " and " Assessment Sheets " , and

* Keywords contained in these resumes

Page 199

* " Not Fine " = Failed candidates

* Fine = Successful candidates ( from historical data )

* We will need to define which are " successful " and which are " failure " candidates ( of course , in relation to a given vacancy )

* We want to be able to predict which candidates are likely to be " successful "

* We must have data on past 500 " successful " and 5000 " failure " candidates in our database

Page 222

Abhi :

This is exactly what our " Knowledge Tools " do , viz:

* Highlighter

* Eliminator

* Refiner

* Sorter

* Matchmaker

* Educator

* Compiler

* Member Segregator etc

Page 227

Diebold predicted such a factory in his 1958 book " Automatic Factory "

Someday , we would modify our OES ( Order Execution System ) in such a way that our clients will " Self Service " themselves

" SiVA " on JobStreet.com has elements of such a self-serving system

Page 232

Statistical Analysis of thousands of job advts of the last 5 years , could help us extrapolate the next 10 years' trend

Page 236

We are planning a direct link from HR Managers to our OES ( which is our factory floor )

We will still need " consultants " - but only to interact ( talk ) with clients & candidates ; not to fill in INPUT screens of OES !

Page 237

Obviously , author had a vision of Internet - which , I could envision 2 years earlier in 1987 , in my report " QUO VADIS "

Page 239

Norbert Wiener had predicted this in 1958

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Monday, 24 October 2016
