Sunday, September 24, 2023

Median and Quartiles by Group in SQLite

Data distributions are often characterized by their range, mean, median, and quartiles.  For data sets in SQLite, the range (min and max) and mean are easily calculated with built-in aggregate functions.  Calculation of quartiles (the first quartile, the median, and the third quartile) requires custom SQL code, as there are no built-in aggregate functions for these values.  A previous post contains code to calculate the median, but that technique cannot be extended to calculate the first and third quartiles.

Other examples of quartile calculation in SQLite are available, but as of this writing all of those that I have found yield a result that exactly matches one of the values in the data set.  Such results may be only approximate for some data sets, specifically when there is an even number of rows overall or within two adjacent quartiles.

The code below takes the same approach to calculating the first and third quartiles as is standard for the median:

  • If there is an odd number of rows, take the middle value
  • If there is an even number of rows, take the average of the highest value in the lower bin (i.e., the lower half for the median) and the lowest value in the upper bin.

The solution presented here differs from others in accommodating the second of these cases.  In addition, it allows for calculation of these statistics for subgroups of the overall data set.
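As a small illustration of the two cases, here is a minimal Python sketch of the rule applied to the median of a plain list.  The function name and the sample values are made up for this example; it is not part of the SQL solution.

```python
def middle_by_rule(values):
    """Median by the two-case rule described above."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no values")
    half = n // 2
    if n % 2 == 1:
        # Odd number of rows: take the middle value.
        return ordered[half]
    # Even number of rows: average the highest value of the lower half
    # and the lowest value of the upper half.
    return (ordered[half - 1] + ordered[half]) / 2


odd_case = middle_by_rule([3, 1, 2])       # middle value of 1, 2, 3
even_case = middle_by_rule([4, 1, 3, 2])   # mean of 2 and 3
print(odd_case, even_case)
```

In the even case the result (2.5) is not one of the values in the data set, which is the situation that a "pick an existing row" approach cannot represent.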

This illustration is designed to be run on a table with the following columns:

  • "id": A categorical column used to divide the data set into subgroups.  Statistics will be calculated for each unique "id".
  • "value": A numerical column for which the median and quartiles will be calculated.

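Such a table can be produced with the following code, which relies on the generate_series table-valued function (built into the sqlite3 command-line shell).  Note that random() returns integers in SQLite, so "value" holds large positive and negative integers: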

create table testdata as
select *
from
	(select value as id from generate_series(1,8)) as d
	cross join (select random() as value from generate_series(1, 10000));
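The generate_series function may not be available outside the sqlite3 command-line shell.  The following is a minimal sketch, assuming Python's standard sqlite3 module and an in-memory database, that builds a table of the same shape with recursive common table expressions instead.  The CTE names "ids" and "seq" are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Build 8 ids x 10000 rows without generate_series, using recursive CTEs.
conn.execute("""
    create table testdata as
    with recursive
        ids(id) as (select 1 union all select id + 1 from ids where id < 8),
        seq(n) as (select 1 union all select n + 1 from seq where n < 10000)
    select ids.id as id, random() as value
    from ids cross join seq
""")

n_rows, n_ids = conn.execute(
    "select count(*), count(distinct id) from testdata"
).fetchone()
print(n_rows, n_ids)
```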

This approach uses SQLite's 'ntile()' window function (available in SQLite 3.25 and later) to subdivide the data set (or each subgroup) into equal or nearly-equal subsets.  The sizes and bounding values for each of these subsets are then extracted to determine which of the two cases listed above should be used for each quartile and the median.
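To see what "equal or nearly-equal" means in practice, the sketch below counts the rows that ntile(4) assigns to each bin for groups of 4 to 7 rows.  It assumes Python's sqlite3 module linked against SQLite 3.25 or later; the table name "demo" and its contents are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table demo (id integer, value real)")

# Hypothetical data: the group with id n holds the n values 1..n, for n = 4..7.
conn.executemany(
    "insert into demo values (?, ?)",
    [(n, float(v)) for n in range(4, 8) for v in range(1, n + 1)],
)

rows = conn.execute("""
    with nt as (
        select id, ntile(4) over (partition by id order by value) as ntile_4
        from demo
    )
    select id, ntile_4, count(*)
    from nt
    group by id, ntile_4
    order by id, ntile_4
""").fetchall()

# Collect the bin sizes for each group, in bin order.
bin_sizes = {}
for group_id, _bin, size in rows:
    bin_sizes.setdefault(group_id, []).append(size)
print(bin_sizes)
```

When the row count is not a multiple of four, the extra rows go to the earliest bins, so comparing the sizes of adjacent bins shows whether a cut point falls on a row or between two rows.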

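with nt as (
	-- Assign each row to a quartile bin (for q1 and q3) and to a half
	-- (for the median) within its id.
	SELECT
		id, value,
		ntile(4) over (partition by id order by value) as ntile_4,
		ntile(2) over (partition by id order by value) as ntile_2
	FROM
		testdata
	),
idlist as (select distinct id from nt),
-- Row count and bounding values of each quartile bin, by id.
q1 as (
	select id, max(value) as q1_max, count(*) as q1_n
	from nt where ntile_4 = 1 group by id),
q2 as (
	select id, min(value) as q2_min, max(value) as q2_max, count(*) as q2_n
	from nt where ntile_4 = 2 group by id),
q3 as (
	select id, min(value) as q3_min, max(value) as q3_max, count(*) as q3_n
	from nt where ntile_4 = 3 group by id),
q4 as (
	select id, min(value) as q4_min, count(*) as q4_n
	from nt where ntile_4 = 4 group by id),
-- Row count and bounding values of each half, by id.  Halves are used for
-- the median because, with 4k+2 rows, both middle values fall in quartile bin 2.
h1 as (
	select id, max(value) as h1_max, count(*) as h1_n
	from nt where ntile_2 = 1 group by id),
h2 as (
	select id, min(value) as h2_min, count(*) as h2_n
	from nt where ntile_2 = 2 group by id),
s as (
	SELECT
		st.id,
		q1_max, q1_n,
		q2_min, q2_max, q2_n,
		q3_min , q3_max , q3_n,
		q4_min, q4_n,
		h1_max, h1_n,
		h2_min, h2_n
	from
		idlist as st
		left join q1 on q1.id=st.id
		left join q2 on q2.id=st.id
		left join q3 on q3.id=st.id
		left join q4 on q4.id=st.id
		left join h1 on h1.id=st.id
		left join h2 on h2.id=st.id)
-- Equal bin sizes: average the two values on either side of the cut point.
-- Unequal bin sizes: the larger bin holds the value at the cut point.
-- Dividing by 2.0 avoids integer division when the values are integers.
select
	id,
	case when q1_n = q2_n then (q1_max + q2_min)/2.0
		else case when q1_n > q2_n then q1_max else q2_min end end as q1,
	case when h1_n = h2_n then (h1_max + h2_min)/2.0
		else h1_max end as median,
	case when q3_n = q4_n then (q3_max + q4_min)/2.0
		else case when q3_n > q4_n then q3_max else q4_min end end as q3
from s
order by id
;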
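Results of this kind can be checked against an independent implementation on a small table.  The sketch below is a reduced, median-only variant written for this check rather than the full query: it splits each group into two halves with ntile(2) and compares the result to Python's statistics.median.  It assumes Python's sqlite3 module linked against SQLite 3.25 or later; the table "demo", the CTE names, and the data are made up.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("create table demo (id integer, value real)")

# Hypothetical groups with 3, 4, and 6 rows, so odd and even counts are covered.
data = {
    1: [5.0, 1.0, 3.0],
    2: [4.0, 1.0, 3.0, 2.0],
    3: [6.0, 1.0, 2.0, 4.0, 3.0, 5.0],
}
conn.executemany(
    "insert into demo values (?, ?)",
    [(g, v) for g, vs in data.items() for v in vs],
)

sql_medians = dict(conn.execute("""
    with nt as (
        select id, value,
            ntile(2) over (partition by id order by value) as half
        from demo
    ),
    lo as (select id, max(value) as lo_max, count(*) as lo_n
           from nt where half = 1 group by id),
    hi as (select id, min(value) as hi_min, count(*) as hi_n
           from nt where half = 2 group by id)
    select lo.id,
        case when lo_n = hi_n then (lo_max + hi_min) / 2.0 else lo_max end
    from lo left join hi on hi.id = lo.id
    order by lo.id
""").fetchall())

expected = {g: statistics.median(vs) for g, vs in data.items()}
print(sql_medians == expected)
```

Groups of several sizes are included because the odd and even cases take different branches of the case expression.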

Monday, September 4, 2023

Why Don't People Ask More Questions?

"More than what?" said a person who asks a lot of questions.

This is not a new question. There are numerous summaries of reasons why people don't ask questions, and suggestions for how one can become a better question-asker. A lot of existing discussion is focused on general conversations; the focus here is specifically on technical conversations in professional settings, where the goal is to convey information from one party to another (that is, report-talk rather than rapport-talk; Tannen 1990).

This post summarizes commonly suggested reasons why people don't ask more questions, touches briefly on some philosophy of communication, and suggests an additional (partial) answer that doesn't appear to be discussed elsewhere.


Common Reasons that People Don't Ask Questions


Commonly-cited reasons for a person's reluctance to ask questions are:

  • Risk aversion -- This may take several forms:
    • Unwillingness to ask what may be a stupid question, and suffer embarrassment or reputational harm.
    • Unwillingness to seem pushy (this overlaps somewhat with deference to authority, below).
    • Fear of consequences -- asking questions that challenge the status quo might be perceived as having a risk to one's position or standing.
    • Avoidance of transparency -- fear of having the same question turned back on oneself, or of being asked to explain the basis of the question.
  • Deference to authority -- Feeling that one does not have the position, or background, for asking questions to be acceptable to others (imposter syndrome).
  • Egotism -- This takes two forms:
    • Desiring to present oneself as knowledgeable, and not needing to ask questions.
    • Desiring, and impatient, to speak oneself, so not wishing to preserve another person's centrality to the conversation by asking questions.
  • Apathy -- The topic is uninteresting.
  • Animus -- The speaker is uninteresting or worse.
  • Sex- or culturally-determined differences -- Women may ask fewer questions in some circumstances, particularly those that may be confrontational (Carter et al. 2018; Schmidt and Davenport 2017; Tannen 1990).

These are potential explanations for "Why don't people ask the questions that they have?", but something else to consider is "Why don't people think of questions to ask?"  Simple explanations are apathy (again) and a lack of critical thinking skills.  However, differing contexts of speaker and listener may also play a large role.


Conversational Implicature and Ambiguity


The reasons why any questions at all might need to be asked can be attributed to a communication problem: either a lack of clarity on the speaker's part, or a lack of understanding on the questioner's part. The existence of such communication problems seems like it should be an aberration in natural language. Natural languages have evolved, however, to allow ambiguity, and there may even be some advantages to this ambiguity (Piantadosi et al. 2012). Languages can be constructed to minimize or eliminate ambiguity (cf. Lojban), but natural languages have not developed to be free of ambiguity.

 

Conversational implicature is the term used to describe the disjunction in communication when what is said is not what is meant. Conversational implicature underlies many common forms of communication, ranging from humor to political dog whistles. But it is not limited to such special cases; it is common in everyday conversation. A humdrum example from Paul Grice, who coined the term (reproduced from Benotti and Blackburn [2014]) is:


Man standing by his car: "I am out of petrol."

Passer-by: "There is a garage around the corner."


The effectiveness of conversational implicature in this case rests on common context and inferences, such as:

  • Both parties know that petrol is available at garages.
  • "Around the corner" means that the garage is within walking distance.
  • The passer-by knows that the garage is open and has petrol to sell.

Implicature simplifies conversation by eliminating the need to explicitly state those things that are part of the participants' shared context, or that can be reasonably inferred.


Much--even most--meaning in conversation does not reside in the words spoken at all, but is filled in by the person listening. -- D. Tannen (1990)


Although the result of conversational implicature can be ambiguity, ambiguity may have important benefits for the simplification of communication (Piantadosi et al. 2012).


A meta-context for the use of conversational implicature is that the conversation is governed by the Cooperative Principle: all parties to the conversation are making a good-faith effort to exchange information reliably (Benotti and Blackburn 2014; Bi 2019). Therefore, underlying all the other contextual and inferential information that is required for communication is the presumption that both parties are engaging in honest communication.


Lack of clarity, lack of understanding, and even misperception arise when the participants do not have a shared context, or when the same inference is not reasonable to, or is not made by, both parties.


Commonality of context and inferential ability is assumed by the speaker unless they are actually attempting to mislead. It is incumbent on the listener (the potential questioner) to identify when the speaker's context or inferences may be different from their own. If the listener does not have any relevant context, and cannot make any relevant inference, the problem should be immediately obvious to the listener. The listener's response in such cases should be either to ask a question or to submit to losing the thread of the conversation.


An even more challenging situation occurs when the listener's context is different from that of the speaker, or when the inference made by the listener is different from that of the speaker. This will lead to misunderstanding that may not be recognized by either the speaker or the listener. If the speaker understands their audience, they should refrain from using implicatures that can be misperceived in this way. However, the listener must be constantly alert for the possibilities of differing contexts and implications. This burden requires a mode of listening that goes beyond the simple acquisition of information, and that engages the ability to imagine the existence of alternate contexts and implications.



More Reasons Why People Don't Ask More Questions


Recognition of the consequences of conversational implicature allows the identification of several possible additional reasons why people don't ask questions—specifically, why they don't have questions to ask. These reasons are not commonly discussed:

  • The listener does not recognize the existence of contextual and inferential assumptions made by the speaker (whether or not their own assumptions would be the same or different).
  • The listener recognizes the contextual and inferential assumptions but either cannot, or does not make the effort to, imagine potential alternatives.

That is, sometimes people don't ask questions because they don't realize that they don't actually understand what was said.  This can result from a failure to see through language's fog of ambiguity, or from 'seeing' the wrong thing entirely.

 

These reasons are not necessarily independent of those listed in the first section above. Some may have had the experience of working with others who comprehensively question assumptions—or at least, voice those questions—in some social circumstances but not in others.


For example, although violation of the Cooperative Principle may be a deliberate and acceptable strategy to improve communication (Bi 2019), it may also seem to be transgressive in that it is uncooperative. In combination with other factors listed above, some individuals may feel that the transgressive cost outweighs the informational benefit.

 

On the other side of the coin are those who frequently ask questions. Some individuals may be less inclined to adhere to the Cooperative Principle, because of their nature or preferred (possibly strategic) mode of interaction. This tendency may also be situational. Alternatively, some individuals may not be very sensitive to the norms of cooperative social conventions.


Summary


Conversational implicature and the ambiguity that it encompasses provide a basis for understanding why individuals do not ask questions in some circumstances.




References


Benotti, L., and P. Blackburn. 2014. Context and Implicature. Context in Computing, Springer New York, pp. 419-436.


Bi, M. 2019. Analysis of the Conversational Implicature of Violating Cooperative Principles in Daily Discourse. American Journal of History and Culture, 2:13.


Carter, A. J., A. Croft, D. Lukas, and G. M. Sandstrom. 2018. Women's visibility in academic seminars: Women ask fewer questions than men. PLOS ONE 14:e0212146.


Piantadosi, S. T., H. Tily, and E. Gibson. 2012. The communicative function of ambiguity in language. Cognition 122:280-291.


Schmidt, S. J., and J. R. Davenport. 2017. Who asks questions at astronomy meetings? Nature Astronomy 1:0153.


Tannen, D. 1990. You Just Don't Understand: Women and Men in Conversation. Harper-Collins, New York.