About: http://ods-qa.openlinksw.com:8896/proxy-iri/00d1d1e04beb12927391e98fb6b9ad7c0ca950bf

Not logged in : Login

Facets (new session)
Description
Metadata
Settings
- Rule:
- Inverse Functional Properties:
- "Same As":

About: http://ods-qa.openlinksw.com:8896/proxy-iri/00d1d1e04beb12927391e98fb6b9ad7c0ca950bf Goto Sponge NotDistinct Permalink

An Entity of Type : schema:DiscussionForumPosting, within Data Space : ods-qa.openlinksw.com:8896 associated with source document(s)

Attributes	Values
type	schema:BreadcrumbList schema:DiscussionForumPosting
datePublished	2016-04-25T17:43:41Z
publisher	Community
described by	Fun with Data: Calculating Gender Balance Using First Names - Data Explorers Guild - Community https://discourse.looker.com/t/fun-with-data-calculating-gender-balance-using-first-names/2526
mainEntityOfPage	https://discourse.looker.com/t/fun-with-data-calculating-gender-balance-using-first-names/2526
author	http://ods-qa.openlinksw.com:8896/proxy-iri/b43fc0710eb5d072ecbc008eaf4750df59d0d22d
itemListElement	http://ods-qa.openlinksw.com:8896/proxy-iri/4218899079d2fcead2151aa20ce3760cd9df8ad7
position	#1
keywords
dateModified	2019-01-16T19:07:57Z
interactionStatistic	http://ods-qa.openlinksw.com:8896/proxy-iri/bcb76b6046c0c60312dddccd91ac5f32ca1ed4e9 http://ods-qa.openlinksw.com:8896/proxy-iri/62169e7141915e26682153e7ea0a4a817796e8c2
articleBody	In the The Name Game: Step 3 we were able to compute, given a name, the percent likelihood that the name is male. The calculation is computed by dividing the size of the male population for a given name by the total population for that name. Some Gender Neutral Names Explore From Here Building a Name Map We can use the same table in BigQuery and this simple query to build a lookup table that maps names to their likelihood of being male. If we don’t have data (later when we join and the join fails), we assume a 50% probability. Old LookML - view: gender_guess derived_table: sql: \| SELECT UPPER(name) AS name , FLOAT(SUM(CASE WHEN gender = 'M' THEN number ELSE 0 END)) / SUM(number) AS percentage_male FROM [fh-bigquery:popular_names.usa_1910_2013] GROUP EACH BY 1 fields: - dimension: name - dimension: percentage_male type: number sql: COALESCE(${TABLE}.percentage_male, 0.5) New LookML view: gender_guess { derived_table: { sql: SELECT UPPER(name) AS name , FLOAT(SUM(CASE WHEN gender = 'M' THEN number ELSE 0 END)) / SUM(number) AS percentage_male FROM [fh-bigquery:popular_names.usa_1910_2013] GROUP EACH BY 1 ;; } dimension: name {} dimension: percentage_male { type: number sql: COALESCE(${TABLE}.percentage_male, 0.5) ;; } } Names and likelihood they are Male The names with Percentage Male = 1 are certainly male, the names with 0 are certainly female. The fractional names are somewhere between. Explore From Here Names and the United States Patent and Trademark Office (USPTO) I recently uploaded all the USPTO data to BigQuery. The main table in this dataset is ‘case_files’ and on each case file, there is the name of the attorney assigned to the case. Attorney Names Attorneys and the number of cases they’ve worked on. Explore From Here Parsing out the First Name It appears that names are of the form <LAST_NAME>, <FIRST_NAME> <INITIAL OR NAME> We can parse the pretty easily using a regular expression. First names appear to immediately follow the comma. We can codify this with a new dimension and a regular expression. Old LookML - dimension: exm_attourney_first_name sql: REGEXP_EXTRACT(${exm_attorney_name}, `, (\\w+)`) New LookML dimension: exm_attourney_first_name { sql: REGEXP_EXTRACT(${exm_attorney_name}, `, (\\w+)`) ;; } And the results: Explore From Here Joining the Tables Big query doesn’t let us join on expressions, so we have to move the dimension in a derived table. BigQuery is smart enough to optimize this out if we don’t use the expression when referencing this derived table in a query. Old LookML - view: case_file derived_table: sql: \| SELECT , REGEXP_EXTRACT(exm_attorney_name, ', (\\w+)') as exm_attorney_first_name FROM trademark.case_file New LookML view: case_file { derived_table: { sql: SELECT , REGEXP_EXTRACT(exm_attorney_name, ', (\\w+)') as exm_attorney_first_name FROM trademark.case_file ;; } } Next we join in gender_guess to the case file. Old LookML - explore: case_file joins: - join: exm_attorney_gender from: gender_guess sql_on: ${case_file.exm_attorney_first_name} = ${exm_attorney_gender.name} relationship: many_to_one New LookML explore: case_file { join: exm_attorney_gender { from: gender_guess sql_on: ${case_file.exm_attorney_first_name} = ${exm_attorney_gender.name} ;; relationship: many_to_one } } Now we see names together with gender score (percentage male). Explore From Here Add Some Measures We’d like to be able to see the count of attorneys and the percentage of those attorneys that were male over time. Summing the probabilities distinctly on the attorney name will give us those counts. Old LookML - measure: count_male_cases type: number sql: SUM(${exm_attorney_gender.percentage_male}) - measure: percentage_male_cases type: number sql: ${count_male_cases}/${count} value_format_name: percent_2 - measure: count_attornies type: count_distinct sql: ${exm_attorney_name} - measure: count_male_attornies type: sum_distinct sql: ${exm_attorney_gender.percentage_male} sql_distinct_key: ${exm_attorney_name} - measure: percentage_male_attornies type: number sql: ${count_male_attornies}/${count_attornies} value_format_name: percent_2 New LookML measure: count_male_cases { type: number sql: SUM(${exm_attorney_gender.percentage_male}) ;; } measure: percentage_male_cases { type: number sql: ${count_male_cases}/${count} ;; value_format_name: percent_2 } measure: count_attornies { type: count_distinct sql: ${exm_attorney_name} ;; } measure: count_male_attornies { type: sum_distinct sql: ${exm_attorney_gender.percentage_male} ;; sql_distinct_key: ${exm_attorney_name} ;; } measure: percentage_male_attornies { type: number sql: ${count_male_attornies}/${count_attornies} ;; value_format_name: percent_2 } Gender Mix Over Time I looks like in 1978 the USPTO Examiner staff was 2/3’s Male, and that those men handled close to 90% of the case load. 10 years later, in 1988, that balance had changed to 50%, with about 50% of the case load being male. Since then, the examiner staff has become predominately female —now only 40% male, with only 40% of the case load being handled by male attorneys. Explore From Here
headline	Fun with Data: Calculating Gender Balance Using First Names
is topic of	https://discourse.looker.com/t/fun-with-data-calculating-gender-balance-using-first-names/2526
is container of of	Fun with Data: Calculating Gender Balance Using First Names - Data Explorers Guild - Community
is object of	Embedded HTML5 Microdata Statement 1
is subject of	Embedded HTML5 Microdata Statement 8 Embedded HTML5 Microdata Statement 9 Embedded HTML5 Microdata Statement 10 Embedded HTML5 Microdata Statement 11 Embedded HTML5 Microdata Statement 12 Embedded HTML5 Microdata Statement 13 Embedded HTML5 Microdata Statement 14 Embedded HTML5 Microdata Statement 15 Embedded HTML5 Microdata Statement 16 Embedded HTML5 Microdata Statement 17 Embedded HTML5 Microdata Statement 18 Embedded HTML5 Microdata Statement 19 Embedded HTML5 Microdata Statement 20 Embedded HTML5 Microdata Statement 21

Faceted Search & Find service v1.17_git55 as of Mar 01 2021

Alternative Linked Data Documents: ODE Content Formats:

RDF

ODATA

Microdata

About

OpenLink Virtuoso version 08.03.3322 as of Mar 14 2022, on Linux (x86_64-generic-linux-glibc25), Single-Server Edition (7 GB total memory)
Data on this page belongs to its respective rights holders.
Virtuoso Faceted Browser Copyright © 2009-2024 OpenLink Software