Fun with Data: Calculating Gender Balance Using First Names
Category: Community
In The Name Game: Step 3 we were able to compute, given a name, the percent likelihood that the name is male. The calculation divides the size of the male population for a given name by the total population for that name.

[Figure: Some Gender Neutral Names]

Building a Name Map
We can use the same table in BigQuery and this simple query to build a lookup table that maps names to their likelihood of being male.
If we don't have data for a name (later, when we join and the join finds no match), we assume a 50% probability.

Old LookML

    - view: gender_guess
      derived_table:
        sql: |
          SELECT
            UPPER(name) AS name
            , FLOAT(SUM(CASE WHEN gender = 'M' THEN number ELSE 0 END))
              / SUM(number) AS percentage_male
          FROM [fh-bigquery:popular_names.usa_1910_2013]
          GROUP EACH BY 1

      fields:
      - dimension: name

      - dimension: percentage_male
        type: number
        sql: COALESCE(${TABLE}.percentage_male, 0.5)

New LookML

    view: gender_guess {
      derived_table: {
        sql: SELECT
            UPPER(name) AS name
            , FLOAT(SUM(CASE WHEN gender = 'M' THEN number ELSE 0 END))
              / SUM(number) AS percentage_male
          FROM [fh-bigquery:popular_names.usa_1910_2013]
          GROUP EACH BY 1
          ;;
      }

      dimension: name {}

      dimension: percentage_male {
        type: number
        sql: COALESCE(${TABLE}.percentage_male, 0.5) ;;
      }
    }

[Figure: Names and likelihood they are Male]
Names with a Percentage Male of 1 appear only as male in the dataset, and names with 0 appear only as female. Names with fractional values fall somewhere in between.

Names and the United States Patent and Trademark Office (USPTO)
I recently uploaded all the USPTO data to BigQuery. The main table in this dataset is case_file, and each case file records the name of the examining attorney assigned to the case.

[Figure: Attorney Names, showing attorneys and the number of cases they've worked on]

Parsing out the First Name
It appears that names are of the form

    <last name>, <first name>

We can parse out the first name easily using a regular expression. First names appear to immediately follow the comma.
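The parsing rule just described can be checked outside BigQuery. The following is a minimal Python sketch using the standard `re` module with the same `, (\w+)` pattern. The sample names are hypothetical and assume the "LAST, FIRST ..." layout described above. In the LookML the backslash is doubled (`'\\w'`) because BigQuery string literals treat `\` as an escape. Here a raw string does the same job.

```python
import re
from typing import Optional

# Same pattern as the REGEXP_EXTRACT calls: a comma, one space, then a run of
# word characters (letters, digits, underscore), which is captured.
FIRST_NAME = re.compile(r", (\w+)")


def first_name(full_name: str) -> Optional[str]:
    """Return the captured first name, or None when the pattern does not match
    (REGEXP_EXTRACT returns NULL in that case)."""
    match = FIRST_NAME.search(full_name)
    return match.group(1) if match else None


# Hypothetical names in the assumed "LAST, FIRST ..." layout.
print(first_name("SMITH, JOHN A"))       # JOHN
print(first_name("JONES, MARY-KATE"))    # MARY  (\w stops at the hyphen)
print(first_name("ACME TRADEMARK LLC"))  # None
```

Hyphenated first names are cut at the hyphen, and a name with no space after the comma does not match at all. Both cases end up with the 50% fallback after the join. Also, `gender_guess` stores names upper-cased, so this approach assumes the attorney names are upper-case as well.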
We can codify this with a new dimension and a regular expression.

Old LookML

    - dimension: exm_attorney_first_name
      sql: REGEXP_EXTRACT(${exm_attorney_name}, ', (\\w+)')

New LookML

    dimension: exm_attorney_first_name {
      sql: REGEXP_EXTRACT(${exm_attorney_name}, ', (\\w+)') ;;
    }

And the results:

Joining the Tables
BigQuery doesn't let us join on expressions, so we have to move the dimension into a derived table. BigQuery is smart enough to optimize the extracted column away when a query against this derived table doesn't use it.

Old LookML

    - view: case_file
      derived_table:
        sql: |
          SELECT *,
            REGEXP_EXTRACT(exm_attorney_name, ', (\\w+)') AS exm_attorney_first_name
          FROM trademark.case_file

New LookML

    view: case_file {
      derived_table: {
        sql: SELECT *,
            REGEXP_EXTRACT(exm_attorney_name, ', (\\w+)') AS exm_attorney_first_name
          FROM trademark.case_file
          ;;
      }
    }

Next we join gender_guess to the case file.

Old LookML

    - explore: case_file
      joins:
      - join: exm_attorney_gender
        from: gender_guess
        sql_on: ${case_file.exm_attorney_first_name} = ${exm_attorney_gender.name}
        relationship: many_to_one

New LookML

    explore: case_file {
      join: exm_attorney_gender {
        from: gender_guess
        sql_on: ${case_file.exm_attorney_first_name} = ${exm_attorney_gender.name} ;;
        relationship: many_to_one
      }
    }

Now we see names together with their gender score (percentage male).

Add Some Measures
We'd like to be able to see the count of attorneys and the percentage of those attorneys who were male over time.
Summing the male probabilities once per distinct attorney name gives the expected number of male attorneys. Summing them over every case row gives the expected number of cases handled by male attorneys.

Old LookML

    - measure: count_male_cases
      type: number
      sql: SUM(${exm_attorney_gender.percentage_male})

    - measure: percentage_male_cases
      type: number
      sql: ${count_male_cases}/${count}
      value_format_name: percent_2

    - measure: count_attorneys
      type: count_distinct
      sql: ${exm_attorney_name}

    - measure: count_male_attorneys
      type: sum_distinct
      sql: ${exm_attorney_gender.percentage_male}
      sql_distinct_key: ${exm_attorney_name}

    - measure: percentage_male_attorneys
      type: number
      sql: ${count_male_attorneys}/${count_attorneys}
      value_format_name: percent_2

New LookML

    measure: count_male_cases {
      type: number
      sql: SUM(${exm_attorney_gender.percentage_male}) ;;
    }

    measure: percentage_male_cases {
      type: number
      sql: ${count_male_cases}/${count} ;;
      value_format_name: percent_2
    }

    measure: count_attorneys {
      type: count_distinct
      sql: ${exm_attorney_name} ;;
    }

    measure: count_male_attorneys {
      type: sum_distinct
      sql: ${exm_attorney_gender.percentage_male} ;;
      sql_distinct_key: ${exm_attorney_name} ;;
    }

    measure: percentage_male_attorneys {
      type: number
      sql: ${count_male_attorneys}/${count_attorneys} ;;
      value_format_name: percent_2
    }

Gender Mix Over Time
It looks like in 1978 the USPTO examiner staff was two-thirds male, and those men handled close to 90% of the caseload. Ten years later, in 1988, the staff was about 50% male, and male examiners handled about 50% of the caseload.
Since then, the examiner staff has become predominantly female: it is now only 40% male, with only 40% of the caseload handled by male attorneys.
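The measures combine three pieces:

- the `gender_guess` name map,
- the 0.5 fallback when the join finds no match, and
- a sum taken either over every case or once per distinct attorney.

The following is a minimal in-memory Python sketch of that arithmetic. It assumes a few hypothetical `(name, gender, number)` rows shaped like `popular_names.usa_1910_2013` and some hypothetical attorney names. It mirrors the logic of the LookML above, not the SQL that Looker actually generates.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

FIRST_NAME = re.compile(r", (\w+)")  # same pattern as the LookML examples


def first_name(full_name: str) -> Optional[str]:
    match = FIRST_NAME.search(full_name)
    return match.group(1) if match else None


def build_name_map(rows: Iterable[Tuple[str, str, int]]) -> Dict[str, float]:
    """Like gender_guess: male count / total count per upper-cased name."""
    male: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for name, gender, number in rows:
        key = name.upper()
        total[key] += number
        if gender == "M":
            male[key] += number
    return {key: male[key] / total[key] for key in total}


def gender_mix(case_attorneys: List[str], name_map: Dict[str, float]) -> Tuple[float, float]:
    """Return (percentage_male_cases, percentage_male_attorneys).

    Each entry of case_attorneys is the examining attorney on one case.
    """
    # Failed lookups fall back to 0.5, like COALESCE(..., 0.5) after the join.
    scores = [name_map.get(first_name(n), 0.5) for n in case_attorneys]

    # count_male_cases / count: the sum runs over every case row.
    percentage_male_cases = sum(scores) / len(scores)

    # sum_distinct keyed on the attorney name: each attorney contributes once.
    per_attorney = dict(zip(case_attorneys, scores))
    percentage_male_attorneys = sum(per_attorney.values()) / len(per_attorney)
    return percentage_male_cases, percentage_male_attorneys


# Hypothetical inputs, not real USPTO or baby-name data.
births = [("John", "M", 90), ("John", "F", 10), ("Mary", "F", 100)]
cases = ["SMITH, JOHN", "SMITH, JOHN", "SMITH, JOHN", "JONES, MARY", "LEE, XAVIER"]

by_case, by_attorney = gender_mix(cases, build_name_map(births))
print(f"{by_case:.2%} of cases, {by_attorney:.2%} of attorneys")
# 64.00% of cases, 46.67% of attorneys
```

In this made-up sample the one male-leaning attorney handles three of the five cases. As a result, the case-weighted share (64%) is higher than the attorney-weighted share (about 47%). That is the same kind of gap the 1978 figures show.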
Source: https://discourse.looker.com/t/fun-with-data-calculating-gender-balance-using-first-names/2526
Posted by lloydtabb, 2016-04-25.
Reply by daniel_nelson_looker, 2016-11-08: Edited to add in new LookML.