I don't see anything wrong with your SAS code.
However, those comparisons are case sensitive, so any record whose values use different casing for the specified phrases (for example 'possiblyrelated' or 'RELATED') would not be kept.
You can rule out case differences by using the following if statement:
if upcase(VAR1) in ('POSSIBLYRELATED', 'PROBABLYRELATED', 'RELATED') | upcase(VAR2) in ('POSSIBLYRELATED', 'PROBABLYRELATED', 'RELATED') | upcase(VAR3) in ('POSSIBLYRELATED', 'PROBABLYRELATED', 'RELATED');
If that still doesn't work, then the only thing I can think of is encoding issues with your variable values.
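One way to check for encoding problems or hidden characters is to print the raw values next to their hexadecimal representation. The following is only a diagnostic sketch: the obs=20 limit and the $hex40. width (which shows the first 20 bytes) are arbitrary choices for illustration, and the dataset and variable names are taken from the original question.

```sas
/* List every distinct value, including missing, to spot casing or spelling variants */
proc freq data=AE;
  tables VAR1 VAR2 VAR3 / missing;
run;

/* Write the first few values to the log as text and as hex.            */
/* In an ASCII-based session encoding, 20 is a blank and 09 is a tab,   */
/* so leading blanks or tabs show up at the start of the hex string.    */
data _null_;
  set AE (obs=20);
  put VAR1= VAR1= $hex40.;
  put VAR2= VAR2= $hex40.;
  put VAR3= VAR3= $hex40.;
run;
```

If the hex output shows leading blanks or other unexpected bytes before the phrase, an exact comparison with the IN operator will not match those values.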
I also wanted to point out that the IN operator looks for EXACT matches of the whole value.
If you are instead looking for a substring within the values of VAR1 through VAR3, then you need to search for the occurrence of those substrings:
if ( index(upcase(var1),'POSSIBLYRELATED')>0 | index(upcase(var1),'PROBABLYRELATED')>0 | index(upcase(var1),'RELATED')>0
   | index(upcase(var2),'POSSIBLYRELATED')>0 | index(upcase(var2),'PROBABLYRELATED')>0 | index(upcase(var2),'RELATED')>0
   | index(upcase(var3),'POSSIBLYRELATED')>0 | index(upcase(var3),'PROBABLYRELATED')>0 | index(upcase(var3),'RELATED')>0 );
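The same substring search can be written more compactly with an array and the FIND function, whose 'i' modifier ignores case. This is a minimal sketch and not the only way to do it: the array name rel and the temporary variables i and keep_flag are made up for illustration, and it assumes VAR1 through VAR3 are character variables. Because 'POSSIBLYRELATED' and 'PROBABLYRELATED' both contain 'RELATED', a single search for 'related' covers all three phrases.

```sas
data AE_related;
  set AE;
  array rel{3} VAR1-VAR3;          /* hypothetical array over the three variables */
  keep_flag = 0;
  do i = 1 to dim(rel);
    /* 'i' modifier: case-insensitive search for the substring */
    if find(rel{i}, 'related', 'i') > 0 then keep_flag = 1;
  end;
  if keep_flag;                     /* subsetting IF: keep only flagged records */
  drop i keep_flag;
run;
```

One caveat with any substring search, including the INDEX version: a value such as 'Unrelated' or 'NotRelated' also contains 'related' and would be kept, so check the distinct values in your data before relying on it.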
------------------------------
David Wilson
Senior Director, Statistics
RTI International
------------------------------
Original Message:
Sent: 12-17-2024 18:50
From: Benjamin Levinson
Subject: SAS code question: Multiple OR conditions in IF statement in DATA STEP
Why does the following SAS code not include in the new SAS dataset 'AE_related' all the observations that fulfil the conditions set forth in the IF statement, rather it only includes some observations? Is there a mistake in the code?
data AE_related;
set AE;
if VAR1 in ('PossiblyRelated', 'ProbablyRelated', 'Related') | VAR2 in ('PossiblyRelated', 'ProbablyRelated', 'Related') | VAR3 in ('PossiblyRelated', 'ProbablyRelated', 'Related');
run;
------------------------------
Benjamin A. Levinson
Research Scientist
NYU Grossman School of Medicine
------------------------------