Analytic data sets#
Catalog#
The following data is available at: /n/dominici_nsaph_l3/Lab/projects/analytic/
MedPar (Admissions)#
admissions_by_year
data_source |
MedPar |
fasse_location |
|
rce_location |
|
date_created |
Feb 20 2020 |
size |
22 GB |
files |
├── admissions_1999.fst
├── admissions_2000.fst
├── ...
└── admissions_2016.fst
header
QID : chr
AGE : int
SEX : int
RACE : int
SSA_STATE_CD : int
SSA_CNTY_CD : int
PROV_NUM : int
ADM_SOURCE : chr
ADM_TYPE : int
ADATE : chr
DDATE : chr
BENE_DOD : chr
DODFLAG : chr
ICU_DAY : int
CCI_DAY : int
ICU : int
CCI : int
DIAG1 : chr
DIAG2 : chr
DIAG3 : chr
DIAG4 : chr
DIAG5 : chr
DIAG6 : chr
DIAG7 : chr
DIAG8 : chr
DIAG9 : chr
DIAG10 : logi
diag11 : logi
diag12 : logi
diag13 : logi
diag14 : logi
diag15 : logi
diag16 : logi
diag17 : logi
diag18 : logi
diag19 : logi
diag20 : logi
diag21 : logi
diag22 : logi
diag23 : logi
diag24 : logi
diag25 : logi
YEAR : int
LOS : int
Parkinson_pdx : int
Parkinson_pdx2dx_10 : int
Parkinson_pdx2dx_25 : int
Alzheimer_pdx : int
Alzheimer_pdx2dx_10 : int
Alzheimer_pdx2dx_25 : int
Dementia_pdx : int
Dementia_pdx2dx_10 : int
Dementia_pdx2dx_25 : int
CHF_pdx : int
CHF_pdx2dx_10 : int
CHF_pdx2dx_25 : int
AMI_pdx : int
AMI_pdx2dx_10 : int
AMI_pdx2dx_25 : int
COPD_pdx : int
COPD_pdx2dx_10 : int
COPD_pdx2dx_25 : int
DM_pdx : int
DM_pdx2dx_10 : int
DM_pdx2dx_25 : int
Stroke_pdx : int
Stroke_pdx2dx_10 : int
Stroke_pdx2dx_25 : int
CVD_pdx : int
CVD_pdx2dx_10 : int
CVD_pdx2dx_25 : int
CSD_pdx : int
CSD_pdx2dx_10 : int
CSD_pdx2dx_25 : int
Ischemic_stroke_pdx : int
Ischemic_stroke_pdx2dx_10: int
Ischemic_stroke_pdx2dx_25: int
Hemo_Stroke_pdx : int
Hemo_Stroke_pdx2dx_10 : int
Hemo_Stroke_pdx2dx_25 : int
zipcode_R : int
Race_gp : chr
Sex_gp : chr
age_gp : chr
Dual : int
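Because the admissions data is split into one file per year, analysis scripts usually start by building the list of yearly file paths. The following is a minimal sketch, assuming that the data set lives in a directory named after it (admissions_by_year) under the analytic root and that every year from 1999 to 2016 is present; the helper name yearly_files is hypothetical.

```python
from pathlib import PurePosixPath

# Root of the analytic data sets, as given at the top of this catalog.
ANALYTIC_ROOT = PurePosixPath("/n/dominici_nsaph_l3/Lab/projects/analytic")


def yearly_files(dataset_dir, prefix, first_year, last_year, ext="fst"):
    """Return the expected per-year paths, e.g. .../admissions_1999.fst."""
    if first_year > last_year:
        raise ValueError("first_year must not exceed last_year")
    base = ANALYTIC_ROOT / dataset_dir
    return [base / f"{prefix}_{year}.{ext}"
            for year in range(first_year, last_year + 1)]


# Assumption: the directory is named after the data set (admissions_by_year).
admission_files = yearly_files("admissions_by_year", "admissions", 1999, 2016)
print(len(admission_files), "files, first:", admission_files[0])
```

The same helper can be reused for the other per-year data sets in this catalog by changing the directory, prefix, and year range.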
MBSF (Denominator)#
denom
data_source |
MBSF |
fasse_location |
|
size |
7.4 GB |
files |
├── qid_data_2009.fst
├── qid_data_2010.fst
├── ...
├── qid_data_2016.fst
├── qid_entry_exit.fst
└── year_zip_confounders.fst
header (qid_data_yyyy)
qid : chr
year : int
zip : int
sex : int
age : int
dual : chr
dead : logi
hmo_mo: chr
fips : int
race : chr
sexM : num
header (year_zip_confounders)
zip : num
year : int
mean_bmi : num
smoke_rate : num
hispanic : num
pct_blk : num
medhouseholdincome: num
medianhousevalue : num
poverty : num
education : num
popdensity : num
pct_owner_occ : num
summer_tmmx : num
winter_tmmx : num
summer_rmax : num
winter_rmax : num
city : chr
statecode : chr
latitude : num
longitude : num
min_year: 2000
max_year: 2016
Annual Exposure per Medicare Beneficiary#
qid_yr_exposures
rce_location |
|
fasse_location |
|
dataset_author |
Daniel Mork |
date_created |
April 2022 |
size |
139 GB |
description |
Annual exposure measurements (columns, 2000-2016) for each Medicare beneficiary (rows) tied to their zip code of residence in a given year. Exposures (xxx in file name) include: no2, ozone, pm2.5, pm2.5components, pr (precipitation), rmax (max humidity), tmmx (max temperature), zip (zip code of residence). |
files |
├── qid_yr_no2.fst
├── qid_yr_ozone.fst
├── qid_yr_pm25comp_br.fst
├── qid_yr_pm25comp_ca.fst
├── qid_yr_pm25comp_cu.fst
├── qid_yr_pm25comp_ec.fst
├── qid_yr_pm25comp_fe.fst
├── qid_yr_pm25comp_k.fst
├── qid_yr_pm25comp_nh4.fst
├── qid_yr_pm25comp_ni.fst
├── qid_yr_pm25comp_no3.fst
├── qid_yr_pm25comp_oc.fst
├── qid_yr_pm25comp_pb.fst
├── qid_yr_pm25comp_si.fst
├── qid_yr_pm25comp_so4.fst
├── qid_yr_pm25comp_v.fst
├── qid_yr_pm25comp_z.fst
├── qid_yr_pm25.fst
├── qid_yr_pr.fst
├── qid_yr_rmax.fst
├── qid_yr_tmmx.fst
└── qid_yr_zip.fst
header (qid_yr_xxx.fst):
qid : chr
2000: num
2001: num
2002: num
2003: num
2004: num
2005: num
2006: num
2007: num
2008: num
2009: num
2010: num
2011: num
2012: num
2013: num
2014: num
2015: num
2016: num
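The qid_yr_xxx files are in wide format: one row per beneficiary and one column per year. Many analyses need one row per beneficiary and year instead. The sketch below shows that reshaping on plain Python dictionaries; it assumes the rows have already been loaded from the .fst file by some other tool, and the function name wide_to_long and the sample values are made up for illustration.

```python
def wide_to_long(rows, value_name):
    """Turn one-row-per-qid records into one-row-per-(qid, year) records."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == "qid" or value is None:
                continue  # skip the identifier column and missing years
            long_rows.append(
                {"qid": row["qid"], "year": int(column), value_name: value}
            )
    return long_rows


# Made-up values for illustration only; real rows carry columns 2000..2016.
sample = [
    {"qid": "A001", "2000": 11.5, "2001": 10.25},
    {"qid": "B002", "2000": 9.0, "2001": None},
]
long_sample = wide_to_long(sample, "pm25")
for record in long_sample:
    print(record)
```

The long form can then be joined to the denominator files on qid and year.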
MBSF (Enrollment file, denominator)#
denom_by_year
data_source |
MBSF, census (interpolated), BRFSS (interpolated), PM2.5 exposure, seasonal temperature |
rce_location |
|
fasse_location |
|
git_repository |
|
dataset_author |
Ben Sabath, Xiao Wu |
spatial_resolution |
zipcode |
temporal_coverage |
1999-2016 |
processing_description |
Recommended for use. Available in both |
date_created |
Apr 2021 |
size |
7.4 GB |
files |
├── confounder_exposure_merged_nodups_health_1999.fst
├── ...
└── confounder_exposure_merged_nodups_health_2016.fst
header
zip : int
year : int
qid : chr
dodflag : chr
bene_dod : chr
sex : int
race : int
age : int
hmo_mo : chr
hmoind : chr
statecode : chr
latitude : num
longitude : num
dual : chr
death : int
dead : logi
entry_age : int
entry_year : int
entry_age_break : int
followup_year : num
followup_year_plus_one : num
pm25_ensemble : num
pm25_no_interp : num
pm25_nn : num
ozone : num
ozone_no_interp : num
zcta : int
poverty : num
popdensity : num
medianhousevalue : num
pct_blk : num
medhouseholdincome : num
pct_owner_occ : num
hispanic : num
education : num
population : num
zcta_no_interp : int
poverty_no_interp : num
popdensity_no_interp : num
medianhousevalue_no_interp : num
pct_blk_no_interp : num
medhouseholdincome_no_interp: num
pct_owner_occ_no_interp : num
hispanic_no_interp : num
education_no_interp : num
population_no_interp : int
smoke_rate : num
mean_bmi : num
smoke_rate_no_interp : num
mean_bmi_no_interp : num
amb_visit_pct : num
a1c_exm_pct : num
amb_visit_pct_no_interp : num
a1c_exm_pct_no_interp : num
tmmx : num
rmax : num
pr : num
cluster_cat : chr
fips_no_interp : int
fips : int
summer_tmmx : num
summer_rmax : num
winter_tmmx : num
winter_rmax : num
AD/ADRD Hospitalization#
hospitalization
data_source |
MedPar derived |
rce_location |
|
fasse_location |
|
dataset_author |
Daniel Mork |
description |
The first recorded hospitalization for each individual broken down by primary/secondary/any billing code (ICD). |
size |
1.2 GB |
files |
├── First_hosp_AD_any.fst
├── First_hosp_AD_primary.fst
├── First_hosp_ADRD_any.fst
├── First_hosp_ADRD_primary.fst
├── First_hosp_ADRD_secondary.fst
└── First_hosp_AD_secondary.fst
header
QID : Factor
ADATE: Date
year : num
Medicare Entry Age#
medicare_entry_age
data_source |
MBSF derived |
rce_location |
|
fasse_location |
|
size |
2.3 GB |
date_created |
Jan 26, 2021 |
dataset_author |
Ben Sabath, Whanhee Lee |
spatial_resolution |
zipcode |
git_repository |
|
files |
└── medicare_entry_age.csv
Years in Medicare#
years_in_medicare
data_source |
MBSF derived |
rce_location |
|
fasse_location |
|
description |
Number of years a beneficiary has been in Medicare (or in other words, the number of years since one has entered Medicare). Allows for grouping on how long beneficiaries have been in Medicare. |
size |
8.8 GB |
date_created |
Jan 26, 2021 |
temporal_coverage |
1999-2016 |
dataset_author |
Ben Sabath, Whanhee Lee |
spatial_resolution |
zipcode |
git_repository |
|
files |
├── follow_up_year_2000.fst
├── ...
└── follow_up_year_2016.fst
Temperature Humidity Precipitation#
temperature_seasonal_zipcode
rce_location |
|
fasse_location |
|
dataset_author |
Xiao Wu, Ben Sabath |
date_created |
Jul 23, 2020 |
data_source |
Google Earth Engine provides a single interface for interacting with a number of geospatial data sources. The sources used and links to their documentation are: GRIDMET, NLDAS, MODIS MOD10A1.006, GLDAS, NOAA CDR PATMOSX, NOAA NCEP Climate Forecast System V2 |
spatial_coverage |
contiguous US |
spatial_resolution |
zipcode |
temporal_coverage |
1999-2019 |
temporal_resolution |
annually |
description |
This dataset contains temperature, relative humidity, and total precipitation data. The data is available as raster files on Google Earth Engine. The temporal and spatial resolutions varied by data source, but all were available at a daily resolution or more frequently. Where the time resolution of the rasters is finer than daily, daily averages for each raster were calculated. Next, using Google Earth Engine’s spatial averaging algorithms and a set of polygons representing the areas of interest, the daily value for each polygon was calculated. The polygons used were the ones described in the preceding section. The results of this calculation were then downloaded as a csv file to the RCE. At this point, there is one file for each year. Following this, annual averages are calculated for each location, and these are combined into a single file. The daily values are also combined into a single file. For the |
git_repository |
|
meterological |
Temperature (K) - variable name: tmmx (Source: GRIDMET); Relative Humidity - variable name: rmax (Source: GRIDMET) |
size |
65 MB |
header |
|
files |
└── temperature_seasonal_zipcode_combined.csv
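The description above ends with daily values per location being averaged into annual values, and the temperature variable tmmx is stored in Kelvin. The sketch below illustrates only that final averaging step plus a Kelvin-to-Celsius conversion; it is not the authors' pipeline (the spatial averaging was done in Google Earth Engine), and the record layout, function names, and sample values are assumptions.

```python
from collections import defaultdict

KELVIN_OFFSET = 273.15


def annual_means(daily_records):
    """Average daily values per (location, year).

    daily_records: iterable of (location, "YYYY-MM-DD", value) tuples.
    """
    grouped = defaultdict(list)
    for location, day, value in daily_records:
        year = int(day[:4])  # the year is the first four characters
        grouped[(location, year)].append(value)
    return {key: sum(values) / len(values) for key, values in grouped.items()}


def kelvin_to_celsius(kelvin):
    """Convert a temperature from Kelvin to degrees Celsius."""
    return kelvin - KELVIN_OFFSET


# Made-up daily tmmx values (Kelvin) for a single hypothetical zip code.
daily = [
    ("02138", "2010-07-01", 300.0),
    ("02138", "2010-07-02", 302.0),
    ("02138", "2011-07-01", 299.0),
]
means = annual_means(daily)
for (location, year), value in sorted(means.items()):
    print(location, year, round(kelvin_to_celsius(value), 2))
```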
Pollution-Census-Temperature covariates#
merged_covariates_pm_census_temp
data_source |
US Census/ACS, Business Analyst Data Set, BRFSS |
rce_location |
|
fasse_location |
|
dataset_author |
Xiao Wu, Ben Sabath |
date_created |
May 29, 2019 |
spatial_coverage |
contiguous US |
spatial_resolution |
zipcode |
temporal_coverage |
2000-2016 |
temporal_resolution |
annually |
publication |
|
git_repository |
nejm_confounder_summary/nejm_confounder and rce_data_list/confounder_data |
size |
296 MB |
header |
|
files |
└── merged_covariates.csv
Population-Weighted Daily County-Level Heat Metrics#
county_heat_metrics
data_source |
ERA5-Land gridded data |
fasse_location |
|
dataset_author |
Keith Spangler |
date_created |
June 17, 2022 |
spatial_coverage |
contiguous US |
spatial_resolution |
county |
temporal_coverage |
2000-2020 |
temporal_resolution |
daily |
publication |
|
size |
1.03 GB |
header |
|
files |
└── Heatvars_County_2000-2020_v1.2.Rds
Medicaid - Respiratory Hospitalizations in Children#
medicaid_children_99-12
data_source |
Medicaid |
rce_location |
|
fasse_location |
|
dataset_author |
Jenny Lee |
date_created |
2021 |
spatial_coverage |
contiguous US |
spatial_resolution |
zipcode |
temporal_coverage |
1999-2012 |
temporal_resolution |
annually |
description |
The data prepared for this project consists of the Medicaid Fee For Service population, with unrestricted Medicaid benefits, under the age of 20 from 1999-2012. This data also includes all hospitalizations for that population, with indicators included regarding whether or not they were associated with a set of respiratory hospitalizations. See the schema for the hospitalization data below for details on specific indicators. |
git_repository |
|
exposures |
Xiao Wu’s CausalGPS PM2.5 data |
size |
14 GB |
files |
├── denom
│ ├── denom_under_20_1999.fst
│ ├── ...
│ └── denom_under_20_2012.fst
└── hosps
├── under_20_admissions_1999.fst
├── ...
└── under_20_admissions_2012.fst
Exposure-census-BRFSS confounders#
confounders
data_source |
US Census, BRFSS |
rce_location |
|
fasse_location |
|
dataset_author |
Ben Sabath, Whanhee Lee |
date_created |
Apr 23, 2021 |
spatial_coverage |
contiguous US |
spatial_resolution |
zipcode, zcta |
temporal_coverage |
2000-2016 |
temporal_resolution |
annually |
git_repository |
|
size |
247 MB |
header |
|
files |
├── merged_confounders_2000.csv
├── ...
└── merged_confounders_2016.csv
ADRD Hospitalization Records#
adrd_hospitalization
dataset_author |
Shuxin Dong |
date_created |
Jan 27, 2022 |
data_source |
MedPar (admissions) |
spatial_coverage |
US |
spatial_resolution |
zipcode (unaggregated) |
temporal_coverage |
2000-2016 |
temporal_resolution |
daily (with admission date) |
description |
ADRD hospitalizations extracted based on the Chronic Condition Warehouse list for ADRD. |
rce_location |
|
fasse_location |
|
size |
1.9 GB |
git_repository |
|
other |
The Chronic Condition Warehouse list for ADRD: https://www2.ccwdata.org/web/guest/condition-categories |
files |
├── ADRD_2000.fst
├── ...
└── ADRD_2016.fst
header
QID : chr
ADATE : Date
DDATE : Date
zipcode_R : int
DIAG1 : chr
DIAG2 : chr
DIAG3 : chr
DIAG4 : chr
DIAG5 : chr
DIAG6 : chr
DIAG7 : chr
DIAG8 : chr
DIAG9 : chr
DIAG10 : chr
AGE : int
Sex_gp : chr
Race_gp : chr
SSA_STATE_CD : int
SSA_CNTY_CD : int
PROV_NUM : int
ADM_SOURCE : chr
ADM_TYPE : int
Dual : int
year : num
AD_primary : logi
AD_any : logi
AD_secondary : logi
ADRD_primary : logi
ADRD_any : logi
ADRD_secondary: logi
Medpar File 2000-2016 Clean#
medpar_hospital_clean_0619
dataset_author |
Mahdieh Danesh Yazdi |
date_created |
May 2019 |
data_source |
MedPar (admissions) |
spatial_coverage |
US |
size |
1.8 GB |
spatial_resolution |
zipcode, city |
temporal_coverage |
2000-2016 |
temporal_resolution |
admissions date |
processing_description |
The data was limited to the years 2000-2016 (1999 was dropped). Demographic data was removed (use demographic data from the denominator file). Duplicated admission records were removed. For multiple admissions on the same day, the record with the longer length of stay and without missing diagnostic codes was kept. The data was subset to keep only the first two diagnostic codes. A diabetes variable was created (the ICD codes used should be reviewed clinically prior to use). |
rce_location |
|
fasse_location |
|
files |
└── medpar_hospital_clean_0619.rds
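The same-day rule in the processing description can be read in more than one way. The sketch below shows one possible reading: among admissions with the same beneficiary and admission date, prefer records that have a primary diagnosis code, then the longer length of stay. The column names (QID, ADATE, LOS, DIAG1) are borrowed from the admissions_by_year header, since no header is listed for this file, and both the preference order and the sample records are assumptions.

```python
def dedupe_same_day(admissions):
    """Keep one record per (QID, ADATE).

    Preference order (an assumed reading of the processing note):
    records with a primary diagnosis first, then the longer stay.
    """
    best = {}
    for record in admissions:
        key = (record["QID"], record["ADATE"])
        rank = (bool(record.get("DIAG1")), record["LOS"])
        if key not in best or rank > best[key][0]:
            best[key] = (rank, record)
    # Dictionaries keep insertion order, so output follows first appearance.
    return [record for _, record in best.values()]


# Made-up records for illustration only.
sample = [
    {"QID": "A001", "ADATE": "2005-03-01", "LOS": 2, "DIAG1": "4280"},
    {"QID": "A001", "ADATE": "2005-03-01", "LOS": 5, "DIAG1": "4280"},
    {"QID": "A001", "ADATE": "2005-03-01", "LOS": 9, "DIAG1": ""},
    {"QID": "B002", "ADATE": "2005-03-01", "LOS": 1, "DIAG1": "41071"},
]
cleaned = dedupe_same_day(sample)
for record in cleaned:
    print(record["QID"], record["ADATE"], record["LOS"])
```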
Denominator File 2000-2016 Clean#
denominator_clean_0619
dataset_author |
Mahdieh Danesh Yazdi |
date_created |
May 2019 |
data_source |
MBSF (denominator) |
spatial_coverage |
US |
size |
3.8 GB |
spatial_resolution |
zipcode |
temporal_coverage |
2000-2016 |
temporal_resolution |
annually |
processing_description |
The data was limited to the years 2000-2016 (1999 was dropped). Rows with empty or missing QID values were dropped. Those whose sex changed through follow up were dropped. Those whose race changed through follow up were assigned “Other/Unknown” category. Those who had multiple dates of death in different years were dropped. For those with multiple dates of death in the same year, earlier date of death was assigned. If duplicate rows existed, one with date of death and one without, the row with non-missing date of death was kept. Multiple QID-year rows with differing values of other variables were removed. Observations with invalid zip codes were removed. Warning: There may be excess deaths on the last day of the month due to CMS processing. Sometimes when the exact date of death is unknown, it is assigned to the last day of the month. |
rce_location |
|
fasse_location |
|
files |
└── denominator_clean_0619.rds
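Two of the cleaning rules above concern conflicting dates of death: beneficiaries with dates in different years were dropped, and for dates within the same year the earlier one was assigned. A minimal sketch of that decision for a single beneficiary follows; the function name and return convention are hypothetical, and the other cleaning steps are not covered.

```python
from datetime import date


def resolve_death_date(death_dates):
    """Apply the two date-of-death rules to one beneficiary.

    Returns (keep, date_of_death):
      - no recorded dates          -> (True, None)
      - all dates in the same year -> (True, earliest date)
      - dates in different years   -> (False, None), i.e. drop the person
    """
    dates = [d for d in death_dates if d is not None]
    if not dates:
        return True, None
    if len({d.year for d in dates}) > 1:
        return False, None
    return True, min(dates)


# Made-up dates for illustration only.
print(resolve_death_date([date(2010, 5, 31), date(2010, 5, 12)]))
print(resolve_death_date([date(2010, 5, 31), date(2012, 1, 3)]))
print(resolve_death_date([None]))
```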
Denominator Clean Merged with Exposure and Covariate Data#
merged_denominator_clean_0619_exp_conf
dataset_author |
Mahdieh Danesh Yazdi |
date_created |
February 2020 |
data_source |
MedPar (admissions), MBSF (denominator) |
spatial_coverage |
US |
size |
30 GB ( |
spatial_resolution |
zipcode |
temporal_coverage |
2000-2016 |
temporal_resolution |
annually |
description |
The clean denominator file merged with annual PM2.5, NO2, O3 levels from 1-km exposure models generated by Qian Di and Weeberb Requia aggregated to zip code level by Yaguang Wei. Also merged with covariate data from the Census, ACS, BRFSS, and Dartmouth Health Atlas created by Ben Sabath. Missing values were filled in using interpolated/extrapolated values from Liuhua Shi. (Negative values were set to 0 and values greater than 100% were set to 100%). Other missing values were dropped. The exposure values and covariate data may need to be updated depending on the study being done. |
rce_location |
|
fasse_location |
|
files |
├── denominator.rds
└── denominator.fst
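The description states that filled-in values below 0 were set to 0 and values above 100% were set to 100%. That bounding step can be sketched as follows; the function name is hypothetical, and the choice to pass missing values through unchanged is an assumption made so that the later "drop missing" step can still find them.

```python
def clamp_percent(value):
    """Clamp a percentage to [0, 100]; leave missing values (None) alone."""
    if value is None:
        return None
    return max(0.0, min(100.0, value))


# Made-up extrapolated percentages for illustration only.
extrapolated = [-3.2, 42.0, 104.5, None]
bounded = [clamp_percent(v) for v in extrapolated]
print(bounded)
```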
Hospital Admissions Merged with Denominator, Exposure, and Covariates#
national_exp_0621
dataset_author |
Mahdieh Danesh Yazdi |
date_created |
Jun 2021 |
data_source |
MedPar (admissions), MBSF (denominator) |
spatial_coverage |
US |
spatial_resolution |
zipcode |
temporal_coverage |
2000-2016 |
temporal_resolution |
annually |
description |
The clean denominator file merged with the clean hospital admissions data, limited to FFS patients, and then merged with annual PM2.5, NO2, O3 levels and warm-season O3 levels from 1-km exposure models generated by Qian Di and Weeberb Requia aggregated to zip code level by Yaguang Wei. Also merged with covariate data from the Census, ACS, BRFSS, and Dartmouth Health Atlas created by Ben Sabath. Missing values were filled in using interpolated/extrapolated values from Liuhua Shi. (Negative values were set to 0 and values greater than 100% were set to 100%). Other missing values were dropped. The exposure values and covariate data may need to be updated depending on the study being done. Individuals may have multiple admissions per year. |
size |
32 GB |
rce_location |
|
fasse_location |
|
files |
└── national_exp_0621.fst
Aggregated 2010-2016 Medicare Mortality Data with PM2.5 Exposure and ZIP code level variables#
aggregate_medicare_data_2010to2016
description |
aggregate_medicare_data_2010to2016.fst only contains data for year 2011, pm2.5 level in 2010 and 2011 and the mortality in the following 5 years. That is, the dataset contains enrollees of year 2011 and information of 2010 exposures and the outcome is |
dataset_author |
Falco J. Bargagli-Stoffi, Riccardo Cadei |
date_created |
2020 |
data_source |
Medicaid, Exposure Data, Census Data |
spatial_coverage |
US |
spatial_resolution |
zipcode |
temporal_coverage |
2011 |
temporal_resolution |
Annually |
publication |
Causal Rule Ensemble: Interpretable Discovery and Inference of Heterogeneous Treatment Effects https://arxiv.org/abs/2009.09036 |
rce_location |
|
fasse_location |
|
files |
└── aggregate_medicare_data_2010to2016.fst
Nationwide Medicare Strata#
erc_strata
dataset_author |
Kevin Josey |
date_created |
Aug 5 2022 |
data_source |
Medicare File from Xiao et al.’s Science Advances paper (see |
spatial_coverage |
contiguous US |
spatial_resolution |
zipcode |
temporal_coverage |
2000-2016 |
temporal_resolution |
annual |
description |
Data were divided and aggregated into custom strata, then subsetted depending on several individual factors. I further merged these data tables with neighborhood level covariates. |
rce_location |
|
fasse_location |
|
git_repository |
|
size |
6.9 GB |
files |
├── aggregate_data_qd.RData
├── aggregate_data_rm.RData
├── national_merged2016_qd.RData
├── national_merged2016_rm.RData
├── qd
│ ├── 0_all_qd.RData
│ ├── 0_asian_qd.RData
│ ├── 0_black_qd.RData
│ ├── 0_hispanic_qd.RData
│ ├── 0_white_qd.RData
│ ├── 1_all_qd.RData
│ ├── 1_asian_qd.RData
│ ├── 1_black_qd.RData
│ ├── 1_hispanic_qd.RData
│ ├── 1_white_qd.RData
│ ├── 2_all_qd.RData
│ ├── 2_asian_qd.RData
│ ├── 2_black_qd.RData
│ ├── 2_hispanic_qd.RData
│ └── 2_white_qd.RData
└── rm
├── 0_all_rm.RData
├── 0_asian_rm.RData
├── 0_black_rm.RData
├── 0_hispanic_rm.RData
├── 0_white_rm.RData
├── 1_all_rm.RData
├── 1_asian_rm.RData
├── 1_black_rm.RData
├── 1_hispanic_rm.RData
├── 1_white_rm.RData
├── 2_all_rm.RData
├── 2_asian_rm.RData
├── 2_black_rm.RData
├── 2_hispanic_rm.RData
└── 2_white_rm.RData
CVD Medicaid#
cvd_medicaid
dataset_author |
Ben Sabath |
date_created |
January 28, 2020 |
data_source |
Medicaid |
spatial_coverage |
US (continental) |
spatial_resolution |
zipcode |
temporal_coverage |
2002-2012 |
temporal_resolution |
daily |
size |
86 GB |
git_repository |
|
rce_location |
|
fasse_location |
|
publication |
|
files |
├── [2.3G] cvd.csv
├── [2.1G] cvd.sas7bdat
├── [6.3K] CVD-specific data dictionary-07-12-2018.docx
├── [6.8K] data_dictionary.md
├── [ 70G] merged_cvd_data.csv
├── [ 18K] merge.out
│ ├── [5.7K] log.txt
│ ├── [ 77] r_error.0
│ └── [ 12K] r_out.0
├── [1.7K] merge.R
├── [ 906] readme
└── [ 899] r.submit
Aggregated CVD cohort Medicare#
aggregated_cvd_cohort_medicare
dataset_author |
Jochem Klompmaker |
date_created |
April 2022 |
data_source |
MedPar (admissions), MBSF (denominator) |
spatial_coverage |
US |
spatial_resolution |
zipcode |
temporal_coverage |
2000-2016 |
temporal_resolution |
annually |
description |
Denominator file linked with hospitalization data and merged with confounders and exposures (NO2, PM2.5, ozone, temperature, humidity). Person records were aggregated by zip code, year and individual demographics |
rce_location |
|
fasse_location |
|
size |
38 GB |
files |
├── [4.2G] aggregate_CVD_65yrs.fst
├── [3.8G] aggregate_CVD_75yrs.fst
├── [3.1G] aggregate_CVD_85yrs.fst
├── [6.5G] aggregate_CVD.fst
├── [4.8G] aggregate_death_CVD.fst
├── [4.6G] aggregate__excl_1yrhosp_CVD.fst
├── [4.3G] aggregate_excl_1yrhosp_RES.fst
├── [1.2M] cc_zipyear_all.fst
├── [1.2M] cc_zipyear_confounder.fst
├── [941K] cc_zipyear_cvd.fst
├── [347M] CVD_count.fst
├── [354M] CVD_death_count.fst
├── [439M] time_count.fst
└── [439M] time_death_count.fst
Aggregated CHD cohort Medicare#
aggregated_chd_cohort_medicare
dataset_author |
Jochem Klompmaker |
date_created |
April 2022 |
data_source |
MedPar (admissions), MBSF (denominator) |
spatial_coverage |
US |
spatial_resolution |
zipcode |
temporal_coverage |
2000-2016 |
temporal_resolution |
annually |
description |
Denominator file linked with hospitalization data and merged with confounders and exposures (NO2, PM2.5, ozone, temperature, humidity). Person records were aggregated by zip code, year and individual demographics |
rce_location |
|
fasse_location |
|
size |
35 GB |
files |
├── [4.1G] aggregate_CHD_65yrs.fst
├── [3.8G] aggregate_CHD_75yrs.fst
├── [3.2G] aggregate_CHD_85yrs.fst
├── [ 14G] aggregate_CHD.fst
├── [4.3G] aggregate_excl_1yrhosp_CHD.fst
├── [1.2M] cc_zipyear_chd.fst
├── [ 92M] CHD_count.fst
└── [116M] time_count.fst
Aggregated CBV cohort Medicare#
aggregated_cbv_cohort_medicare
dataset_author |
Jochem Klompmaker |
date_created |
April 2022 |
data_source |
MedPar (admissions), MBSF (denominator) |
spatial_coverage |
US |
spatial_resolution |
zipcode |
temporal_coverage |
2000-2016 |
temporal_resolution |
annually |
description |
Denominator file linked with hospitalization data and merged with confounders and exposures (NO2, PM2.5, ozone, temperature, humidity). Person records were aggregated by zip code, year and individual demographics |
rce_location |
|
fasse_location |
|
size |
35 GB |
files |
├── [4.1G] aggregate_CBV_65yrs.fst
├── [3.8G] aggregate_CBV_75yrs.fst
├── [3.2G] aggregate_CBV_85yrs.fst
├── [ 14G] aggregate_CBV.fst
├── [4.4G] aggregate__excl_1yrhosp_CBV.fst
├── [ 93M] CBV_count.fst
├── [1.2M] cc_zipyear_cbv.fst
└── [117M] time_count.fst
Aggregated ADRD cohort Medicare#
aggregated_adrd_cohort_medicare
dataset_author |
Jochem Klompmaker |
date_created |
February 2022 |
data_source |
MedPar (admissions), MBSF (denominator) |
spatial_coverage |
US |
spatial_resolution |
zipcode |
temporal_coverage |
2000-2016 |
temporal_resolution |
annually |
description |
Denominator file linked with hospitalization data and merged with confounders and exposures (NDVI, blue space, park cover, NO2, PM2.5, ozone, temperature, humidity). Person records were aggregated by zip code, year and individual demographics |
rce_location |
|
fasse_location |
|
size |
28 GB |
files |
├── [3.6G] aggregate_ALZ_65yrs.fst
├── [3.4G] aggregate_ALZ_75yrs.fst
├── [2.8G] aggregate_ALZ_85yrs.fst
├── [4.5G] aggregate_ALZ.fst
├── [4.4G] aggregate_death_ALZ.fst
├── [3.8G] aggregate_excl_1yrhosp_ALZ.fst
├── [358M] ALZ_count.fst
├── [387M] ALZ_death_count.fst
├── [471M] time_count.fst
└── [472M] time_death_count.fst
Aggregated PD cohort Medicare#
aggregated_pd_cohort_medicare
dataset_author |
Jochem Klompmaker |
date_created |
February 2022 |
data_source |
MedPar (admissions), MBSF (denominator) |
spatial_coverage |
US |
spatial_resolution |
zipcode |
temporal_coverage |
2000-2016 |
temporal_resolution |
annually |
description |
Denominator file linked with hospitalization data and merged with confounders and exposures (NDVI, blue space, park cover, NO2, PM2.5, ozone, temperature, humidity). Person records were aggregated by zip code, year and individual demographics. |
rce_location |
|
fasse_location |
|
size |
29 GB |
files |
├── [4.4G] aggregate_death_PAR.fst
├── [4.4G] aggregate_excl_1yrhosp_PAR.fst
├── [3.7G] aggregate_PAR_65yrs.fst
├── [3.4G] aggregate_PAR_75yrs.fst
├── [2.9G] aggregate_PAR_85yrs.fst
├── [4.6G] aggregate_PAR.fst
├── [ 94M] PAR_count.fst
├── [405M] PAR_death_count.fst
├── [222M] time_count.fst
└── [486M] time_death_count.fst
Daily County Level Heatwave Associated Hospitalizations#
daily_county_level_heatwave_assosciated_hospitalizations
dataset_author |
Ben Sabath |
date_created |
July 10, 2020 |
size |
7.7 GB |
data_source |
MedPar (admissions), MBSF (denominator), Medicaid MAX |
spatial_coverage |
US |
spatial_resolution |
county |
temporal_coverage |
2006-2016, 1999-2016 |
temporal_resolution |
daily |
description |
FIPS code, race, sex, age, and dual eligibility were determined for each case based on the information in the patient summary file for that individual in the year of their admission. The denominator for each observation is calculated monthly and contains all individuals who are eligible for Fee for Service (FFS) hospitalization coverage and have not died prior to that month. The CCS codes included were 2, 50, 55, 114, 157, 159, and 244. ICD processing was done using the ICD package (Wasey 2018). The author of this package asks that it be cited in papers using data that was created using the package. |
rce_location |
|
fasse_location |
|
publication |
|
git_repository |
|
files |
├── 1999_2016
│ └── county_ccs_hosps
│ ├── cache_dir
│ │ ├── daily_counts
│ │ │ ├── daily_counts_by_ccs_1999.fst
│ │ │ ├── ...
│ │ │ └── daily_counts_by_ccs_2016.fst
│ │ └── denom
│ │ ├── ffs_patient_summary_by_county_1999.fst
│ │ ├── ...
│ │ └── ffs_patient_summary_by_county_2016.fst
│ ├── data
│ │ ├── daily_ccs_heatwave_counts_by_fips_1999.fst
│ │ ├── ...
│ │ └── daily_ccs_heatwave_counts_by_fips_2016.fst
│ └── data_daily_hosp_mort
│ ├── daily_only_ccs_heatwave_hosp_mort_counts_by_fips_1999.fst
│ ├── ...
│ └── daily_only_ccs_heatwave_hosp_mort_counts_by_fips_2016.fst
└── 2006_2016
└── county_ccs_hosps
├── cache_dir
│ ├── daily_counts
│ │ ├── daily_counts_by_ccs_2006.fst
│ │ ├── ...
│ │ └── daily_counts_by_ccs_2016.fst
│ └── denom
│ ├── ffs_patient_summary_by_county_2006.fst
│ ├── ...
│ └── ffs_patient_summary_by_county_2016.fst
├── data
│ ├── daily_ccs_heatwave_counts_by_fips_2006.fst
│ ├── ...
│ ├── daily_ccs_heatwave_counts_by_fips_2016.fst
│ ├── Daily_Heat_CCS_2006-2016_with_Temperature_by_WFO.Rda
│ ├── Daily_Heat_CCS_2006-2016_with_Temperature_by_WFO_v0.Rda
│ ├── Daily_Heat_CCS_2006-2016_with_Temperature_ERA5Land.Rda
│ ├── Daily_Heat_CCS_2006-2016_with_Temperature.Rda
│ └── Daily_Heat_CCS_2006-2016_with_Temperature_v0.Rda
├── readme.md
└── schema.yml
Hospitalizations for kidney disease and comorbidities#
medicare_for_kidney_diseases
dataset_author |
Ana Trisovic |
date_created |
July 10, 2022 |
data_source |
MedPar (admissions), MBSF (denominator), confounders |
spatial_coverage |
US |
spatial_resolution |
zipcode |
temporal_coverage |
2000-2016 |
temporal_resolution |
annually |
git_repository |
|
description |
Special modifications for the kidney diseases for numerators and denominators (people at risk) for the analysis by Whanhee Lee. |
rce_location |
|
fasse_location |
|
size |
31 GB |
header |
|
files |
└── [ 27G] final.csv
IHD medicare hospitalizations (2005)#
ihd_medicare_hosp_2005
dataset_name |
IHD medicare hospitalizations (2005) |
dataset_author |
Cory Zigler |
date_created |
Oct 4 2018 |
data_source |
MedPar (admissions) |
spatial_coverage |
US |
spatial_resolution |
zipcode |
temporal_coverage |
2005 |
temporal_resolution |
annually |
size |
234 MB |
rce_location |
|
fasse_location |
|
files |
├── [4.8K] 00Tree.html
├── [348K] AnnualFacilityData.Rda
├── [773K] AnnualUnitData.Rda
├── [ 12K] Create Analysis Data.R
├── [6.4K] Create HyADS Adjacency Matrix.R
├── [9.3K] Create Power Plant Data.R
├── [5.8K] Create Zip Code Data.R
├── [ 10M] data_nomed.Rda
├── [ 31K] facilities_for_analysis.Rda
├── [ 53M] HyADSmat.Rda
├── [108M] HyADSmat_replaced20191212.Rda
├── [3.1M] MonthlyFacilityData.Rda
├── [9.7M] MonthlyUnitData.Rda
├── [ 11M] out.zip_pp.rda
├── [ 114] Readme
├── [5.6M] ZipcodeData.Rda
└── [ 89K] zips_included.rda
Daily Florida Hospitalization Counts by Zip#
daily-florida-hosp-counts-zip
dataset_author |
Ben Sabath, Kate Burrows |
date_created |
February 07 2020 |
data_source |
MedPar (admissions), MBSF (denominator) |
spatial_coverage |
Florida |
spatial_resolution |
zipcode |
temporal_coverage |
1999-2016 |
temporal_resolution |
daily |
processing_description |
Denominator file linked with hospitalization data. This is the raw unprocessed data. |
size |
2.1 GB |
rce_location |
|
fasse_location |
|
files |
├── [308K] Burrows_DataRequest_September2019.pdf
├── [ 19M] death_count
│ ├── [1.0M] death_count_1999.fst
│ ├── [1.0M] ...
│ └── [1.2M] death_count_2016.fst
├── [104M] hosp_count
│ ├── [5.5M] hosp_count_1999.fst
│ ├── [5.6M] ...
│ └── [5.2M] hosp_count_2016.fst
├── [1.6G] merged_data
│ ├── [ 86M] daily_zips_1999.fst
│ ├── [106M] ...
│ └── [106M] daily_zips_2016.fst
└── [7.2M] zip_denom
├── [382K] zip_denom_1999.fst
├── [440K] ...
└── [450K] zip_denom_2016.fst
Coal PM2.5 Source Impacts#
coal_exposure_pm25
dataset_author |
Lucas Henneman |
date_created |
Sep 14, 2022 |
data_source |
HyADS exposure modeling |
spatial_coverage |
US |
spatial_resolution |
zipcode |
temporal_coverage |
1999-2020 |
temporal_resolution |
annually |
rce_location |
|
fasse_location |
|
GitHub repository/directory on how the data was processed |
|
exposures |
This was created with the HyADS model using emissions from EPA’s CAMD database. |
meterological |
NOAA/NCAR reanalysis data. |
size |
6.3 GB |
files |
├── [300M] zips_pm25_byunit_1999.fst
├── [291M] ...
├── [134M] zips_pm25_byunit_2020.fst
├── [599K] zips_pm25_total_1999.fst
├── [599K] ...
└── [599K] zips_pm25_total_2020.fst
Aggregated 2000-2016 Medicare Mortality Data with PM2.5 Exposure by ZIP code#
aggregated_2000-2016_medicare_mortality_pm25_zip
dataset_author |
Xiao Wu, Ben Sabath |
date_created |
2020 |
data_source |
Medicaid, Exposure Data, Census Data |
spatial_coverage |
US |
spatial_resolution |
zipcode |
temporal_coverage |
2000-2016 |
temporal_resolution |
Annually |
processing_description |
See Xiao’s paper for processing description. |
rce_location |
|
fasse_location |
|
publication |
|
git_repository |
|
size |
166 MB |
files |
└── [166M] aggregate_data.RDS
Warning
Space on FASSE is limited, so do not copy analytic data to your own folder! Instead, create symlinks to the data in your data folder.
Symbolic links (or symlinks) are special files that point to files or directories in other locations on your system.
You can use data through symlinks as normal.
Create the symlink in your data folder in the following way:
cd data
ln -s /n/dominici_nsaph_l3/Lab/projects/analytic/fasse_location .
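The same kind of link can also be created from Python, for example at the top of an analysis script. The following is a minimal sketch using only the standard library; the helper name link_dataset and the directory names in the demonstration are hypothetical, and the demonstration runs in a temporary directory rather than on FASSE.

```python
import tempfile
from pathlib import Path


def link_dataset(source_dir, data_dir):
    """Create data_dir/<name> as a symlink to source_dir; never copies data."""
    source = Path(source_dir)
    data = Path(data_dir)
    data.mkdir(parents=True, exist_ok=True)
    link = data / source.name
    if link.is_symlink() or link.exists():
        return link  # leave any existing link or file untouched
    link.symlink_to(source, target_is_directory=True)
    return link


# Demonstration in a throwaway directory; on FASSE, source_dir would be a
# folder under /n/dominici_nsaph_l3/Lab/projects/analytic/.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "analytic" / "some_dataset"  # hypothetical name
    source.mkdir(parents=True)
    link = link_dataset(source, Path(tmp) / "data")
    link_ok = link.is_symlink() and link.resolve() == source.resolve()
    print("symlink created:", link_ok)
```

Calling the helper a second time returns the existing link instead of raising an error, so it is safe to leave in a script that is run repeatedly.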
Note
Do you need data that is not here but exists on RCE? If so, fill in the form here to get it transferred to FASSE.
Data questions#
What data sources (MedPar, MBSF, other) were used to create this data file? How many different data sources went into it?
What, if any, processing was done to the data sources? Were any selections (cuts), data quality checks, or aggregations done?
Was this data used in any publication (add a link)?
Is there any git repository (or subfolder) related to it (add git location)?
What is the RCE source location?
When was the data created and by whom?
What are the spatial and temporal resolutions?