Commit 15da1814 authored by Riccardo Boero

Added database login parameters, updated year ranges by source, updated README with periods and DOIs for EUROSTAT data.
parent 32119629
1 merge request: !1 Fix refs in history
@@ -33,7 +33,7 @@ Data refers to **jobs**.
| Column | Description |
|---|---|
|GeoID| FIPS block id, 15 chars: STATE+COUNTY+TRACT+BLOCK |
|Year| 2002-2020|
|Year| 2002-2021|
|CNS01 | Number of jobs in NAICS sector 11 (Agriculture, Forestry, Fishing and Hunting)|
| CNS02 | Number of jobs in NAICS sector 21 (Mining, Quarrying, and Oil and Gas Extraction)|
| CNS03 | Number of jobs in NAICS sector 22 (Utilities)|
@@ -59,7 +59,7 @@ Data refers to **jobs**.
| Column | Description |
|---|---|
|GeoID| FIPS: US, STATE, COUNTY |
|Year| 2000-2022|
|Year| 2000-2023|
|Naics| Industry codes|
|Agglvl_code |14 National, by NAICS Sector; 15 National, by NAICS 3-digit; 16 National, by NAICS 4-digit; 17 National, by NAICS 5-digit; 18 National, by NAICS 6-digit; 54 Statewide, NAICS Sector; 55 Statewide, NAICS 3-digit; 56 Statewide, NAICS 4-digit; 57 Statewide, NAICS 5-digit; 58 Statewide, NAICS 6-digit; 74 County, NAICS Sector; 75 County, NAICS 3-digit; 76 County, NAICS 4-digit; 77 County, NAICS 5-digit; 78 County, NAICS 6-digit |
|Q1_establishments| Number of establishments in quarter 1|
@@ -74,34 +74,40 @@ Data refers to **employed persons**.
| Column | Description |
|---|---|
|GeoID| NUTS 0 - 3 |
|Year| 1995-2021|
|Year| 1995-2022|
|Nace | Reduced level 1 NACE, Rev. 2|
|EmpTh | Thousands of Employed Persons|
Source: [doi:10.2908/NAMA_10R_3EMPERS](https://doi.org/10.2908/NAMA_10R_3EMPERS)
#### LFS - Labour Force Survey, EUROSTAT
Data refers to people of any sex aged 15 years or over who are **employed persons**.
| Column | Description |
|---|---|
|GeoID| NUTS 0 - 2 char country code |
|Year| 2008-2021|
|Year| 2008-2023|
|Nace | Level 2 NACE, Rev. 2|
|EmpTh_Q1 | Thousands of Employed Persons in Quarter 1|
|EmpTh_Q2 | Thousands of Employed Persons in Quarter 2|
|EmpTh_Q3 | Thousands of Employed Persons in Quarter 3|
|EmpTh_Q4 | Thousands of Employed Persons in Quarter 4|
Source: [doi:10.2908/LFSQ_EGAN22D](https://doi.org/10.2908/LFSQ_EGAN22D)
#### SBS - Structural Business Statistics, EUROSTAT
Data include only G data source below for now and hence are limited to 2021.
| Column | Description |
|---|---|
|GeoID| NUTS 0 - 2 char country code |
|Year| 2008-2021|
|Year| 2008-2022|
|Nace | Level 4 NACE, Rev. 2|
|Enterprises| Number of enterprises |
|Employment| Persons employed - number |
|LaborCost| Unit labor cost per person employed, thousand euro|
Source: [doi:10.2908/SBS_OVW_ACT](https://doi.org/10.2908/SBS_OVW_ACT)
---
## Notes
Data is selected, downloaded, and reorganized from multiple data sources.
@@ -110,7 +116,7 @@ The U.S. Census publishes regularly the [Longitudinal Employer-Household Dynamic
The data considered here is the Workplace Area Characteristics (WAC) of LODES8, where the version number indicates the TIGER geographical boundary specification adopted (2020 Census blocks). Details are at https://lehd.ces.census.gov/data/lodes/LODES8/LODESTechDoc8.0.pdf.
The overall period is 2002-2020 but not all states are represented in all periods. All 51 states are available for only the period 2011-2016.
The overall period is 2002-2021 but not all states are represented in all periods. All 51 states are available for only the period 2011-2016.
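The LODES8 table keys each row on the 15-character block GeoID (STATE+COUNTY+TRACT+BLOCK). As a minimal sketch, the standard FIPS widths of 2, 3, 6, and 4 characters can be used to split it; the function name `split_geoid` and the sample GeoID are made up for illustration and are not part of this repository.

```shell
#!/bin/bash
# Split a 15-character census block GeoID into its FIPS components:
# STATE (2) + COUNTY (3) + TRACT (6) + BLOCK (4).
split_geoid() {
  local geoid=$1
  if [[ ${#geoid} -ne 15 ]]; then
    echo "Error: expected 15 characters, got ${#geoid}" >&2
    return 1
  fi
  echo "${geoid:0:2} ${geoid:2:3} ${geoid:5:6} ${geoid:11:4}"
}

# The GeoID below is invented for illustration only.
split_geoid "060371234561001"
```

The length check matters because a GeoID that has lost its leading zero (for example after being read as a number) would otherwise be split at the wrong offsets.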
### BLS
BLS data could not be downloaded through the API because of limits on the daily number of combinations of series and time periods. QCEW data, however, is also published in tabular formats at https://www.bls.gov/cew/downloadable-data-files.htm.
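As a rough sketch of the tabular route, the helper below builds the per-year archive URL following the pattern used by the BLS loader script in this commit. The function name `qcew_url` is hypothetical, and the snippet only prints URLs without downloading anything.

```shell
#!/bin/bash
# Hypothetical helper: build the URL of the QCEW quarterly single-file
# archive for a given year, following the loader script's URL pattern.
qcew_url() {
  local year=$1
  echo "https://data.bls.gov/cew/data/files/${year}/csv/${year}_qtrly_singlefile.zip"
}

# Print the URLs for the first and last year of the loaded range.
for y in 2000 2023; do
  qcew_url "$y"
done
```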
......
#!/bin/bash
for y in {2000..2022}
# Assign parameters to variables
HOST=$1
USER=$2
PASSWORD=$3
for y in {2000..2023}
do
mkdir -p ./temp
wget -N "https://data.bls.gov/cew/data/files/${y}/csv/${y}_qtrly_singlefile.zip" -P ./temp
unzip -d ./temp/ ./temp/${y}_qtrly_singlefile.zip
mariadb -h "" -u "" -p -e "CREATE TEMPORARY TABLE FACT_jobs.staging (GeoID varchar(5) NOT NULL, Year INT NOT NULL, own_code varchar(1) NOT NULL, Naics varchar(6) NOT NULL, Agglvl_code varchar(2) NOT NULL, size_code varchar(1) NOT NULL, Qtr varchar(1) NOT NULL, disclosure_code varchar(1), Quarterly_establishments INT, Jobs_month1 INT, Jobs_month2 INT, Jobs_month3 INT, Avg_weekly_wage INT); LOAD DATA LOCAL INFILE \"temp/${y}.q1-q4.singlefile.csv\" INTO TABLE FACT_jobs.staging FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2, @col3, @col4, @col5, @col6, @col7, @col8, @col9, @col10, @col11, @col12, @col13, @col14, @col15, @col16, @col17, @col18, @col19, @col20, @col21, @col22, @col23, @col24, @col25, @col26, @col27, @col28, @col29, @col30, @col31, @col32, @col33, @col34, @col35, @col36, @col37, @col38, @col39, @col40, @col41, @col42) SET GeoID=@col1, Year=${y}, own_code=@col2, Naics=@col3, Agglvl_code=@col4, size_code=@col5, Qtr=@col7, disclosure_code=@col8, Quarterly_establishments=@col9, Jobs_month1=@col10, Jobs_month2=@col11, Jobs_month3=@col12, Avg_weekly_wage=@col16; CREATE TEMPORARY TABLE FACT_jobs.staging2 (GeoID varchar(5) NOT NULL, Year INT NOT NULL, Naics varchar(6) NOT NULL, Agglvl_code varchar(2) NOT NULL, Qtr varchar(1) NOT NULL, Disclosure_percent DOUBLE, Quarterly_establishments INT, Jobs_month1 INT, Jobs_month2 INT, Jobs_month3 INT, Avg_weekly_wage DOUBLE); INSERT INTO FACT_jobs.staging2 SELECT GeoID, Year, Naics, Agglvl_code, Qtr, (1.0 - (CASE WHEN SUM(Quarterly_establishments) = 0 THEN 0 ELSE (SUM(CASE WHEN disclosure_code = \"N\" THEN Quarterly_establishments ELSE 0 END) / SUM(Quarterly_establishments)) END )), SUM(Quarterly_establishments), SUM(Jobs_month1), SUM(Jobs_month2), SUM(Jobs_month3), (CASE WHEN sum(Jobs_month1 + Jobs_month2 + Jobs_month3) = 0 THEN 0 ELSE sum(Avg_weekly_wage * Jobs_month1 + Avg_weekly_wage * Jobs_month2 + Avg_weekly_wage * Jobs_month3) / sum(Jobs_month1 + Jobs_month2 + 
Jobs_month3) END) FROM FACT_jobs.staging WHERE ((Agglvl_code > 13 AND Agglvl_code < 19) OR (Agglvl_code > 53 AND Agglvl_code < 59) OR (Agglvl_code > 73 AND Agglvl_code < 79)) AND own_code > 0 AND size_code = 0 GROUP BY GeoID, Year, Qtr, Agglvl_code, Naics; INSERT INTO FACT_jobs.QCEW (GeoID, Year, Naics, Agglvl_code, Q1_establishments, Q1_disclosure, Q1_avg_weekly_wage, Jan_jobs, Feb_jobs, Mar_jobs) SELECT GeoID, Year, Naics, Agglvl_code, Quarterly_establishments, Disclosure_percent, Avg_weekly_wage, Jobs_month1, Jobs_month2, Jobs_month3 FROM FACT_jobs.staging2 WHERE Qtr = 1 ON DUPLICATE KEY UPDATE Q1_establishments = Quarterly_establishments, Q1_disclosure = Disclosure_percent, Q1_avg_weekly_wage = Avg_weekly_wage, Jan_jobs = Jobs_month1, Feb_jobs = Jobs_month2, Mar_jobs = Jobs_month3; INSERT INTO FACT_jobs.QCEW (GeoID, Year, Naics, Agglvl_code, Q2_establishments, Q2_disclosure, Q2_avg_weekly_wage, Apr_jobs, May_jobs, Jun_jobs) SELECT GeoID, Year, Naics, Agglvl_code, Quarterly_establishments, Disclosure_percent, Avg_weekly_wage, Jobs_month1, Jobs_month2, Jobs_month3 FROM FACT_jobs.staging2 WHERE Qtr = 2 ON DUPLICATE KEY UPDATE Q2_establishments = Quarterly_establishments, Q2_disclosure = Disclosure_percent, Q2_avg_weekly_wage = Avg_weekly_wage, Apr_jobs = Jobs_month1, May_jobs = Jobs_month2, Jun_jobs = Jobs_month3; INSERT INTO FACT_jobs.QCEW (GeoID, Year, Naics, Agglvl_code, Q3_establishments, Q3_disclosure, Q3_avg_weekly_wage, Jul_jobs, Aug_jobs, Sep_jobs) SELECT GeoID, Year, Naics, Agglvl_code, Quarterly_establishments, Disclosure_percent, Avg_weekly_wage, Jobs_month1, Jobs_month2, Jobs_month3 FROM FACT_jobs.staging2 WHERE Qtr = 3 ON DUPLICATE KEY UPDATE Q3_establishments = Quarterly_establishments, Q3_disclosure = Disclosure_percent, Q3_avg_weekly_wage = Avg_weekly_wage, Jul_jobs = Jobs_month1, Aug_jobs = Jobs_month2, Sep_jobs = Jobs_month3; INSERT INTO FACT_jobs.QCEW (GeoID, Year, Naics, Agglvl_code, Q4_establishments, Q4_disclosure, Q4_avg_weekly_wage, 
Oct_jobs, Nov_jobs, Dec_jobs) SELECT GeoID, Year, Naics, Agglvl_code, Quarterly_establishments, Disclosure_percent, Avg_weekly_wage, Jobs_month1, Jobs_month2, Jobs_month3 FROM FACT_jobs.staging2 WHERE Qtr = 4 ON DUPLICATE KEY UPDATE Q4_establishments = Quarterly_establishments, Q4_disclosure = Disclosure_percent, Q4_avg_weekly_wage = Avg_weekly_wage, Oct_jobs = Jobs_month1, Nov_jobs = Jobs_month2, Dec_jobs = Jobs_month3;"
mariadb -h "$HOST" -u "$USER" -p"$PASSWORD" -e "CREATE TEMPORARY TABLE FACT_jobs.staging (GeoID varchar(5) NOT NULL, Year INT NOT NULL, own_code varchar(1) NOT NULL, Naics varchar(6) NOT NULL, Agglvl_code varchar(2) NOT NULL, size_code varchar(1) NOT NULL, Qtr varchar(1) NOT NULL, disclosure_code varchar(1), Quarterly_establishments INT, Jobs_month1 INT, Jobs_month2 INT, Jobs_month3 INT, Avg_weekly_wage INT); LOAD DATA LOCAL INFILE \"temp/${y}.q1-q4.singlefile.csv\" INTO TABLE FACT_jobs.staging FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2, @col3, @col4, @col5, @col6, @col7, @col8, @col9, @col10, @col11, @col12, @col13, @col14, @col15, @col16, @col17, @col18, @col19, @col20, @col21, @col22, @col23, @col24, @col25, @col26, @col27, @col28, @col29, @col30, @col31, @col32, @col33, @col34, @col35, @col36, @col37, @col38, @col39, @col40, @col41, @col42) SET GeoID=@col1, Year=${y}, own_code=@col2, Naics=@col3, Agglvl_code=@col4, size_code=@col5, Qtr=@col7, disclosure_code=@col8, Quarterly_establishments=@col9, Jobs_month1=@col10, Jobs_month2=@col11, Jobs_month3=@col12, Avg_weekly_wage=@col16; CREATE TEMPORARY TABLE FACT_jobs.staging2 (GeoID varchar(5) NOT NULL, Year INT NOT NULL, Naics varchar(6) NOT NULL, Agglvl_code varchar(2) NOT NULL, Qtr varchar(1) NOT NULL, Disclosure_percent DOUBLE, Quarterly_establishments INT, Jobs_month1 INT, Jobs_month2 INT, Jobs_month3 INT, Avg_weekly_wage DOUBLE); INSERT INTO FACT_jobs.staging2 SELECT GeoID, Year, Naics, Agglvl_code, Qtr, (1.0 - (CASE WHEN SUM(Quarterly_establishments) = 0 THEN 0 ELSE (SUM(CASE WHEN disclosure_code = \"N\" THEN Quarterly_establishments ELSE 0 END) / SUM(Quarterly_establishments)) END )), SUM(Quarterly_establishments), SUM(Jobs_month1), SUM(Jobs_month2), SUM(Jobs_month3), (CASE WHEN sum(Jobs_month1 + Jobs_month2 + Jobs_month3) = 0 THEN 0 ELSE sum(Avg_weekly_wage * Jobs_month1 + Avg_weekly_wage * Jobs_month2 + Avg_weekly_wage * Jobs_month3) / sum(Jobs_month1 + 
Jobs_month2 + Jobs_month3) END) FROM FACT_jobs.staging WHERE ((Agglvl_code > 13 AND Agglvl_code < 19) OR (Agglvl_code > 53 AND Agglvl_code < 59) OR (Agglvl_code > 73 AND Agglvl_code < 79)) AND own_code > 0 AND size_code = 0 GROUP BY GeoID, Year, Qtr, Agglvl_code, Naics; INSERT INTO FACT_jobs.QCEW (GeoID, Year, Naics, Agglvl_code, Q1_establishments, Q1_disclosure, Q1_avg_weekly_wage, Jan_jobs, Feb_jobs, Mar_jobs) SELECT GeoID, Year, Naics, Agglvl_code, Quarterly_establishments, Disclosure_percent, Avg_weekly_wage, Jobs_month1, Jobs_month2, Jobs_month3 FROM FACT_jobs.staging2 WHERE Qtr = 1 ON DUPLICATE KEY UPDATE Q1_establishments = Quarterly_establishments, Q1_disclosure = Disclosure_percent, Q1_avg_weekly_wage = Avg_weekly_wage, Jan_jobs = Jobs_month1, Feb_jobs = Jobs_month2, Mar_jobs = Jobs_month3; INSERT INTO FACT_jobs.QCEW (GeoID, Year, Naics, Agglvl_code, Q2_establishments, Q2_disclosure, Q2_avg_weekly_wage, Apr_jobs, May_jobs, Jun_jobs) SELECT GeoID, Year, Naics, Agglvl_code, Quarterly_establishments, Disclosure_percent, Avg_weekly_wage, Jobs_month1, Jobs_month2, Jobs_month3 FROM FACT_jobs.staging2 WHERE Qtr = 2 ON DUPLICATE KEY UPDATE Q2_establishments = Quarterly_establishments, Q2_disclosure = Disclosure_percent, Q2_avg_weekly_wage = Avg_weekly_wage, Apr_jobs = Jobs_month1, May_jobs = Jobs_month2, Jun_jobs = Jobs_month3; INSERT INTO FACT_jobs.QCEW (GeoID, Year, Naics, Agglvl_code, Q3_establishments, Q3_disclosure, Q3_avg_weekly_wage, Jul_jobs, Aug_jobs, Sep_jobs) SELECT GeoID, Year, Naics, Agglvl_code, Quarterly_establishments, Disclosure_percent, Avg_weekly_wage, Jobs_month1, Jobs_month2, Jobs_month3 FROM FACT_jobs.staging2 WHERE Qtr = 3 ON DUPLICATE KEY UPDATE Q3_establishments = Quarterly_establishments, Q3_disclosure = Disclosure_percent, Q3_avg_weekly_wage = Avg_weekly_wage, Jul_jobs = Jobs_month1, Aug_jobs = Jobs_month2, Sep_jobs = Jobs_month3; INSERT INTO FACT_jobs.QCEW (GeoID, Year, Naics, Agglvl_code, Q4_establishments, Q4_disclosure, 
Q4_avg_weekly_wage, Oct_jobs, Nov_jobs, Dec_jobs) SELECT GeoID, Year, Naics, Agglvl_code, Quarterly_establishments, Disclosure_percent, Avg_weekly_wage, Jobs_month1, Jobs_month2, Jobs_month3 FROM FACT_jobs.staging2 WHERE Qtr = 4 ON DUPLICATE KEY UPDATE Q4_establishments = Quarterly_establishments, Q4_disclosure = Disclosure_percent, Q4_avg_weekly_wage = Avg_weekly_wage, Oct_jobs = Jobs_month1, Nov_jobs = Jobs_month2, Dec_jobs = Jobs_month3;"
rm -r ./temp
done
#!/bin/bash
for y in {2008..2021}
# Assign parameters to variables
HOST=$1
USER=$2
PASSWORD=$3
for y in {2008..2023}
do
mkdir -p ./temp
for q in {1..4}
do
wget -O ./temp/temp_q${q}.tsv -N "https://ec.europa.eu/eurostat/api/dissemination/sdmx/3.0/data/dataflow/ESTAT/lfsq_egan22d/1.0?c[sex]=T&c[age]=Y_GE15&c[time_period]=${y}-Q${q}&format=tsv"
done
mariadb -h "" -u "" -p -e "CREATE TEMPORARY TABLE FACT_jobs.staging_1 (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_q1.tsv\" INTO TABLE FACT_jobs.staging_1 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; CREATE TEMPORARY TABLE FACT_jobs.staging_2 (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_q2.tsv\" INTO TABLE FACT_jobs.staging_1 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; CREATE TEMPORARY TABLE FACT_jobs.staging_3 (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_q3.tsv\" INTO TABLE FACT_jobs.staging_1 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; CREATE TEMPORARY TABLE FACT_jobs.staging_4 (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_q4.tsv\" INTO TABLE FACT_jobs.staging_1 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; INSERT INTO FACT_jobs.LFS (GeoID, Year, Nace, EmpTh_Q1) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-2),',',1), Value FROM FACT_jobs.staging_1 ON DUPLICATE KEY UPDATE EmpTh_Q1 = Value; INSERT INTO FACT_jobs.LFS (GeoID, Year, Nace, EmpTh_Q2) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-2),',',1), Value FROM FACT_jobs.staging_2 ON DUPLICATE KEY UPDATE EmpTh_Q2 = Value; INSERT INTO FACT_jobs.LFS (GeoID, Year, Nace, EmpTh_Q3) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-2),',',1), Value FROM FACT_jobs.staging_3 ON DUPLICATE KEY UPDATE EmpTh_Q3 = Value; INSERT INTO FACT_jobs.LFS (GeoID, Year, Nace, EmpTh_Q4) SELECT SUBSTRING_INDEX(Area,',',-1), Year, 
SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-2),',',1), Value FROM FACT_jobs.staging_4 ON DUPLICATE KEY UPDATE EmpTh_Q4 = Value;"
# Load each quarterly file into its own staging table (staging_1 .. staging_4).
mariadb -h "$HOST" -u "$USER" -p"$PASSWORD" -e "CREATE TEMPORARY TABLE FACT_jobs.staging_1 (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_q1.tsv\" INTO TABLE FACT_jobs.staging_1 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; CREATE TEMPORARY TABLE FACT_jobs.staging_2 (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_q2.tsv\" INTO TABLE FACT_jobs.staging_2 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; CREATE TEMPORARY TABLE FACT_jobs.staging_3 (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_q3.tsv\" INTO TABLE FACT_jobs.staging_3 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; CREATE TEMPORARY TABLE FACT_jobs.staging_4 (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_q4.tsv\" INTO TABLE FACT_jobs.staging_4 FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; INSERT INTO FACT_jobs.LFS (GeoID, Year, Nace, EmpTh_Q1) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-2),',',1), Value FROM FACT_jobs.staging_1 ON DUPLICATE KEY UPDATE EmpTh_Q1 = Value; INSERT INTO FACT_jobs.LFS (GeoID, Year, Nace, EmpTh_Q2) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-2),',',1), Value FROM FACT_jobs.staging_2 ON DUPLICATE KEY UPDATE EmpTh_Q2 = Value; INSERT INTO FACT_jobs.LFS (GeoID, Year, Nace, EmpTh_Q3) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-2),',',1), Value FROM FACT_jobs.staging_3 ON DUPLICATE KEY UPDATE EmpTh_Q3 = Value; INSERT INTO FACT_jobs.LFS (GeoID, Year, Nace, EmpTh_Q4) SELECT SUBSTRING_INDEX(Area,',',-1), Year, 
SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-2),',',1), Value FROM FACT_jobs.staging_4 ON DUPLICATE KEY UPDATE EmpTh_Q4 = Value;"
rm -r ./temp
done
#!/bin/bash
for y in {1995..2021}
# Assign parameters to variables
HOST=$1
USER=$2
PASSWORD=$3
for y in {1995..2022}
do
mkdir -p ./temp
wget -O ./temp/temp.tsv -N "https://ec.europa.eu/eurostat/api/dissemination/sdmx/3.0/data/dataflow/ESTAT/nama_10r_3empers/1.0?c[wstatus]=EMP&c[time_period]=${y}&format=tsv"
mariadb -h "" -u "" -p -e "CREATE TEMPORARY TABLE FACT_jobs.staging (Area varchar(30) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp.tsv\" INTO TABLE FACT_jobs.staging FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; INSERT INTO FACT_jobs.REA (GeoID, Year, Nace, EmpTh) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-2),',',1), Value FROM FACT_jobs.staging ON DUPLICATE KEY UPDATE EmpTh = Value;"
mariadb -h "$HOST" -u "$USER" -p"$PASSWORD" -e "CREATE TEMPORARY TABLE FACT_jobs.staging (Area varchar(30) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp.tsv\" INTO TABLE FACT_jobs.staging FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; INSERT INTO FACT_jobs.REA (GeoID, Year, Nace, EmpTh) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-2),',',1), Value FROM FACT_jobs.staging ON DUPLICATE KEY UPDATE EmpTh = Value;"
rm -r ./temp
done
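The Eurostat loader above splits the first TSV column, a comma-separated series key, with `SUBSTRING_INDEX`: the last field becomes `GeoID` and the second-to-last becomes `Nace`. A rough shell equivalent is sketched here. The helper names and the sample key are hypothetical, and the field order in real Eurostat keys may differ.

```shell
#!/bin/bash
# Shell equivalents of the SUBSTRING_INDEX calls used by the loader.

# last_field mirrors SUBSTRING_INDEX(key, ',', -1).
last_field() {
  echo "${1##*,}"
}

# second_last_field mirrors
# SUBSTRING_INDEX(SUBSTRING_INDEX(key, ',', -2), ',', 1).
second_last_field() {
  local head=${1%,*}   # drop the last field
  echo "${head##*,}"   # keep what follows the last remaining comma
}

key="A,THS_PER,EMP,C,AT123"   # hypothetical key for illustration
echo "GeoID=$(last_field "$key") Nace=$(second_last_field "$key")"
```

Checking a few downloaded keys this way before a full load is a cheap guard against a change in the dimension order of the source table.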
#!/bin/bash
for y in {2021..2021}
# Assign parameters to variables
HOST=$1
USER=$2
PASSWORD=$3
for y in {2021..2022}
do
mkdir -p ./temp
wget -O ./temp/temp_ENT.tsv -N "https://ec.europa.eu/eurostat/api/dissemination/sdmx/3.0/data/dataflow/ESTAT/sbs_ovw_act/1.0?c[indic_sbs]=ENT_NR&c[time_period]=${y}&format=tsv"
wget -O ./temp/temp_EMP.tsv -N "https://ec.europa.eu/eurostat/api/dissemination/sdmx/3.0/data/dataflow/ESTAT/sbs_ovw_act/1.0?c[indic_sbs]=EMP_NR&c[time_period]=${y}&format=tsv"
wget -O ./temp/temp_LC.tsv -N "https://ec.europa.eu/eurostat/api/dissemination/sdmx/3.0/data/dataflow/ESTAT/sbs_ovw_act/1.0?c[indic_sbs]=LC_EMP_TEUR&c[time_period]=${y}&format=tsv"
mariadb -h "" -u "" -p -e "CREATE TEMPORARY TABLE FACT_jobs.staging_ENT (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_ENT.tsv\" INTO TABLE FACT_jobs.staging_ENT FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; CREATE TEMPORARY TABLE FACT_jobs.staging_EMP (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_EMP.tsv\" INTO TABLE FACT_jobs.staging_EMP FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; CREATE TEMPORARY TABLE FACT_jobs.staging_LC (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_LC.tsv\" INTO TABLE FACT_jobs.staging_LC FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; INSERT INTO FACT_jobs.SBS (GeoID, Year, Nace, Enterprises) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-3),',',1), Value FROM FACT_jobs.staging_ENT ON DUPLICATE KEY UPDATE Enterprises = Value; INSERT INTO FACT_jobs.SBS (GeoID, Year, Nace, Employment) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-3),',',1), Value FROM FACT_jobs.staging_EMP ON DUPLICATE KEY UPDATE Employment = Value; INSERT INTO FACT_jobs.SBS (GeoID, Year, Nace, LaborCost) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-3),',',1), Value FROM FACT_jobs.staging_LC ON DUPLICATE KEY UPDATE LaborCost = Value;"
mariadb -h "$HOST" -u "$USER" -p"$PASSWORD" -e "CREATE TEMPORARY TABLE FACT_jobs.staging_ENT (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_ENT.tsv\" INTO TABLE FACT_jobs.staging_ENT FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; CREATE TEMPORARY TABLE FACT_jobs.staging_EMP (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_EMP.tsv\" INTO TABLE FACT_jobs.staging_EMP FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; CREATE TEMPORARY TABLE FACT_jobs.staging_LC (Area varchar(50) NOT NULL, Value DOUBLE, Year INT); LOAD DATA LOCAL INFILE \"temp/temp_LC.tsv\" INTO TABLE FACT_jobs.staging_LC FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2) SET Area=@col1, Value=@col2, Year=${y}; INSERT INTO FACT_jobs.SBS (GeoID, Year, Nace, Enterprises) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-3),',',1), Value FROM FACT_jobs.staging_ENT ON DUPLICATE KEY UPDATE Enterprises = Value; INSERT INTO FACT_jobs.SBS (GeoID, Year, Nace, Employment) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-3),',',1), Value FROM FACT_jobs.staging_EMP ON DUPLICATE KEY UPDATE Employment = Value; INSERT INTO FACT_jobs.SBS (GeoID, Year, Nace, LaborCost) SELECT SUBSTRING_INDEX(Area,',',-1), Year, SUBSTRING_INDEX(SUBSTRING_INDEX(Area,',',-3),',',1), Value FROM FACT_jobs.staging_LC ON DUPLICATE KEY UPDATE LaborCost = Value;"
rm -r ./temp
done
#!/bin/bash
# Assign parameters to variables
HOST=$1
USER=$2
PASSWORD=$3
declare -a state_label=("ak" "al" "ar" "az" "ca" "co" "ct" "dc" "de" "fl" "ga" "hi" "ia" "id" "il" "in" "ks" "ky" "la" "ma" "md" "me" "mi" "mn" "mo" "ms" "mt" "nc" "nd" "ne" "nh" "nj" "nm" "nv" "ny" "oh" "ok" "or" "pa" "pr" "sc" "sd" "tn" "tx" "ut" "va" "vt" "wa" "wi" "wv" "wy")
for y in {2002..2020}
for y in {2002..2021}
do
for i in "${state_label[@]}"
do
mkdir -p ./temp
wget -N "https://lehd.ces.census.gov/data/lodes/LODES8/${i}/wac/${i}_wac_S000_JT00_${y}.csv.gz" -P ./temp
FILE=./temp/${i}_wac_S000_JT00_${y}.csv
if [ -f "${FILE}.gz" ]; then
gunzip $FILE.gz
mariadb -h "" -u "" -p -e "LOAD DATA LOCAL INFILE '$FILE' INTO TABLE FACT_jobs.LODES8 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2, @col3, @col4, @col5, @col6, @col7, @col8, @col9, @col10, @col11, @col12, @col13, @col14, @col15, @col16, @col17, @col18, @col19, @col20, @col21, @col22, @col23, @col24, @col25, @col26, @col27, @col28, @col29, @col30, @col31, @col32, @col33, @col34, @col35, @col36, @col37, @col38, @col39, @col40, @col41, @col42, @col43, @col44, @col45, @col46, @col47, @col48, @col49, @col50, @col51, @col52, @col53) set GeoID=@col1,Year=${y},CNS01=@col9, CNS02=@col10, CNS03=@col11, CNS04=@col12, CNS05=@col13, CNS06=@col14, CNS07=@col15, CNS08=@col16, CNS09=@col17, CNS10=@col18, CNS11=@col19, CNS12=@col20, CNS13=@col21, CNS14=@col22, CNS15=@col23, CNS16=@col24, CNS17=@col25, CNS18=@col26, CNS19=@col27, CNS20=@col28;"
mariadb -h "$HOST" -u "$USER" -p"$PASSWORD" -e "LOAD DATA LOCAL INFILE '$FILE' INTO TABLE FACT_jobs.LODES8 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' IGNORE 1 LINES (@col1, @col2, @col3, @col4, @col5, @col6, @col7, @col8, @col9, @col10, @col11, @col12, @col13, @col14, @col15, @col16, @col17, @col18, @col19, @col20, @col21, @col22, @col23, @col24, @col25, @col26, @col27, @col28, @col29, @col30, @col31, @col32, @col33, @col34, @col35, @col36, @col37, @col38, @col39, @col40, @col41, @col42, @col43, @col44, @col45, @col46, @col47, @col48, @col49, @col50, @col51, @col52, @col53) set GeoID=@col1,Year=${y},CNS01=@col9, CNS02=@col10, CNS03=@col11, CNS04=@col12, CNS05=@col13, CNS06=@col14, CNS07=@col15, CNS08=@col16, CNS09=@col17, CNS10=@col18, CNS11=@col19, CNS12=@col20, CNS13=@col21, CNS14=@col22, CNS15=@col23, CNS16=@col24, CNS17=@col25, CNS18=@col26, CNS19=@col27, CNS20=@col28;"
fi
rm -r ./temp
done
......
#!/bin/bash
echo "*** Create database:"
mariadb -h "" -u "" -p < inputs/sql/create_db.sql
echo "*** Create tables:"
mariadb -h "" -u "" -p < inputs/sql/create_tables.sql
# Assign parameters to variables
HOST=$1
USER=$2
PASSWORD=$3
echo "*** Create database on host: $HOST with user: $USER"
mariadb -h "$HOST" -u "$USER" -p"$PASSWORD" < inputs/sql/create_db.sql
echo "*** Create tables on host: $HOST with user: $USER"
mariadb -h "$HOST" -u "$USER" -p"$PASSWORD" < inputs/sql/create_tables.sql
#!/bin/bash
# Function to display help
function show_help() {
echo "Usage: $0 <host> <user> <password>"
echo ""
echo "This script sets up the database and loads the data."
echo ""
echo "Arguments:"
echo " host Database host"
echo " user Database user"
echo " password Database password"
}
# Check if help is requested
if [[ "$1" == "-h" || "$1" == "--help" ]]; then
show_help
exit 0
fi
# Check if all three parameters are provided
if [[ $# -ne 3 ]]; then
echo "Error: Missing arguments."
show_help
exit 1
fi
# Assign parameters to variables
HOST=$1
USER=$2
PASSWORD=$3
# Run the individual scripts, passing host, user, and password
# create database
./inputs/setup_db.sh
./inputs/setup_db.sh "$HOST" "$USER" "$PASSWORD"
# load LODES data into DB
./inputs/load_lodes.sh
./inputs/load_lodes.sh "$HOST" "$USER" "$PASSWORD"
# load BLS data into DB
./inputs/load_bls.sh
./inputs/load_bls.sh "$HOST" "$USER" "$PASSWORD"
# load EUROSTAT data into DB
./inputs/load_eurostat_rea.sh
./inputs/load_eurostat_lfs.sh
./inputs/load_eurostat_sbs.sh
./inputs/load_eurostat_rea.sh "$HOST" "$USER" "$PASSWORD"
./inputs/load_eurostat_lfs.sh "$HOST" "$USER" "$PASSWORD"
./inputs/load_eurostat_sbs.sh "$HOST" "$USER" "$PASSWORD"
# dump data and drop database on 'local' instance
./inputs/dump_and_drop.sh
./inputs/dump_and_drop.sh "$HOST" "$USER" "$PASSWORD"
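Passing the password as `-p"$PASSWORD"` can expose it to other users through the process list while each client runs. One possible alternative, sketched here on the assumption that the client's `--defaults-extra-file` option suits this setup, is to write the credentials once to a private option file and point each `mariadb` call at it. The function name `write_client_cnf` and the placeholder credentials are illustrative and not part of this commit.

```shell
#!/bin/bash
# Hypothetical helper: write host, user, and password to a client option
# file readable only by the current user, then print the file's path.
# Note: passwords containing '#' or quotes may need quoting in the file.
write_client_cnf() {
  local host=$1 user=$2 password=$3
  local cnf
  cnf=$(mktemp) || return 1
  chmod 600 "$cnf"
  printf '[client]\nhost=%s\nuser=%s\npassword=%s\n' \
    "$host" "$user" "$password" > "$cnf"
  echo "$cnf"
}

# Example with placeholder credentials.
CNF=$(write_client_cnf "localhost" "loader" "example-password")
# --defaults-extra-file must be the first option on the command line, e.g.:
#   mariadb --defaults-extra-file="$CNF" < inputs/sql/create_db.sql
cat "$CNF"
rm -f "$CNF"
```

With this approach the sub-scripts would receive the option file path instead of three separate arguments, and the file would be removed once the last loader finishes.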